/usr/share/pyshared/mrjob-0.3.3.2.egg-info/PKG-INFO is in python-mrjob 0.3.3.2-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
Metadata-Version: 1.1
Name: mrjob
Version: 0.3.3.2
Summary: Python MapReduce framework
Home-page: http://github.com/Yelp/mrjob
Author: David Marin
Author-email: dave@yelp.com
License: Apache
Description: mrjob
=====
.. image:: http://github.com/yelp/mrjob/raw/master/docs/logos/logo_medium.png
mrjob is a Python 2.5+ package that helps you write and run Hadoop Streaming
jobs.
`Main documentation <http://packages.python.org/mrjob/>`_
mrjob fully supports Amazon's Elastic MapReduce (EMR) service, which allows you
to buy time on a Hadoop cluster on an hourly basis. It also works with your own
Hadoop cluster.
Some important features:
* Run jobs on EMR, your own Hadoop cluster, or locally (for testing).
* Write multi-step jobs (one map-reduce step feeds into the next; a short
  sketch follows this list)
* Duplicate your production environment inside Hadoop

  * Upload your source tree and put it in your job's ``$PYTHONPATH``
  * Run ``make`` and other setup scripts
  * Set environment variables (e.g. ``$TZ``)
  * Easily install Python packages from tarballs (EMR only)
  * Setup handled transparently by the ``mrjob.conf`` config file

* Automatically interpret error logs from EMR
* SSH tunnel to the Hadoop job tracker on EMR
* Minimal setup

  * To run on EMR, set ``$AWS_ACCESS_KEY_ID`` and ``$AWS_SECRET_ACCESS_KEY``
  * To run on your Hadoop cluster, install ``simplejson`` and make sure
    ``$HADOOP_HOME`` is set.
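As a rough illustration of the multi-step feature above, here is a sketch of a
two-step job. It assumes the 0.3.x step constructor ``MRJob.mr()``; the class
and method names are illustrative only, not part of the library::

    from mrjob.job import MRJob

    class MRWordFreqByCount(MRJob):

        # step 1: the classic word count
        def mapper_get_words(self, _, line):
            for word in line.split():
                yield word.lower(), 1

        def reducer_sum_counts(self, word, counts):
            yield word, sum(counts)

        # step 2: re-key each pair by its count
        def mapper_invert(self, word, count):
            yield count, word

        def steps(self):
            # each self.mr(...) call describes one map-reduce step;
            # the output of the first step is the input of the second
            return [self.mr(mapper=self.mapper_get_words,
                            reducer=self.reducer_sum_counts),
                    self.mr(mapper=self.mapper_invert)]

    if __name__ == '__main__':
        MRWordFreqByCount.run()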
Installation
------------
From PyPI:
``pip install mrjob``
From source:
``python setup.py install``
A Simple Map Reduce Job
-----------------------
Code for this example and more live in ``mrjob/examples``.
::
"""The classic MapReduce job: count the frequency of words.
"""
from mrjob.job import MRJob
import re
WORD_RE = re.compile(r"[\w']+")
class MRWordFreqCount(MRJob):
def mapper(self, _, line):
for word in WORD_RE.findall(line):
yield (word.lower(), 1)
def combiner(self, word, counts):
yield (word, sum(counts))
def reducer(self, word, counts):
yield (word, sum(counts))
if __name__ == '__main__':
MRWordFreqCount.run()
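The same job can also be driven from Python instead of the command line; a
minimal sketch using the runner interface (the input path is just an
example)::

    mr_job = MRWordFreqCount(args=['README.rst'])
    with mr_job.make_runner() as runner:
        runner.run()
        # stream_output() yields raw output lines; parse_output_line()
        # turns each one back into a (key, value) pair
        for line in runner.stream_output():
            word, count = mr_job.parse_output_line(line)
            print word, count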
Try It Out!
-----------
::
    # locally
    python mrjob/examples/mr_word_freq_count.py README.rst > counts
    # on EMR
    python mrjob/examples/mr_word_freq_count.py README.rst -r emr > counts
    # on your Hadoop cluster
    python mrjob/examples/mr_word_freq_count.py README.rst -r hadoop > counts
Setting up EMR on Amazon
------------------------
* Create an `Amazon Web Services account <http://aws.amazon.com/>`_
* Sign up for `Elastic MapReduce <http://aws.amazon.com/elasticmapreduce/>`_
* Get your access and secret keys (click "Security Credentials" on
  `your account page <http://aws.amazon.com/account/>`_)
* Set the environment variables ``$AWS_ACCESS_KEY_ID`` and
  ``$AWS_SECRET_ACCESS_KEY`` accordingly
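For example, in a bash shell (the values are placeholders)::

    export AWS_ACCESS_KEY_ID="your access key here"
    export AWS_SECRET_ACCESS_KEY="your secret key here"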
Advanced Configuration
----------------------
To run in other AWS regions, upload your source tree, run ``make``, and use
other advanced mrjob features, you'll need to set up ``mrjob.conf``. mrjob looks
for its conf file in:
* The location given by ``$MRJOB_CONF``
* ``~/.mrjob.conf``
* ``/etc/mrjob.conf``
See `the mrjob.conf documentation
<http://packages.python.org/mrjob/configs-conf.html>`_ for more information.
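As a rough sketch, a minimal ``mrjob.conf`` is a YAML file along these lines
(the options shown are examples only; see the documentation above for the full
list)::

    runners:
      emr:
        aws_region: us-west-1   # run jobs in another AWS region
        setup_cmds:
          - make                # run make in the job's working dir first
      hadoop:
        setup_cmds:
          - make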
Links
-----
* source: <http://github.com/Yelp/mrjob>
* documentation: <http://packages.python.org/mrjob/>
* discussion group: <http://groups.google.com/group/mrjob>
* Hadoop MapReduce: <http://hadoop.apache.org/mapreduce/>
* Elastic MapReduce: <http://aws.amazon.com/documentation/elasticmapreduce/>
* PyCon 2011 mrjob overview: <http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-mrjob-distributed-computing-for-everyone-4898987/>
Thanks to `Greg Killion <mailto:greg@blind-works.net>`_
(`blind-works.net <http://www.blind-works.net/>`_) for the logo.
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.5
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Topic :: System :: Distributed Computing
Provides: mrjob