mrjob
=====

.. image:: http://github.com/yelp/mrjob/raw/master/docs/logos/logo_medium.png

mrjob is a Python 2.5+ package that helps you write and run Hadoop Streaming
jobs.

`Main documentation <http://packages.python.org/mrjob/>`_

mrjob fully supports Amazon's Elastic MapReduce (EMR) service, which allows you
to buy time on a Hadoop cluster on an hourly basis. It also works with your own
Hadoop cluster.

Some important features:

* Run jobs on EMR, your own Hadoop cluster, or locally (for testing).
* Write multi-step jobs (one map-reduce step feeds into the next; see the
  sketch after this list)
* Duplicate your production environment inside Hadoop
    * Upload your source tree and put it in your job's ``$PYTHONPATH``
    * Run ``make`` and other setup scripts
    * Set environment variables (e.g. ``$TZ``)
    * Easily install Python packages from tarballs (EMR only)
    * Setup handled transparently by ``mrjob.conf`` config file
* Automatically interpret error logs from EMR
* SSH tunnel to the Hadoop job tracker on EMR
* Minimal setup
    * To run on EMR, set ``$AWS_ACCESS_KEY_ID`` and ``$AWS_SECRET_ACCESS_KEY``
    * To run on your Hadoop cluster, install ``simplejson`` and make sure
      ``$HADOOP_HOME`` is set.
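
As referenced above, here is a rough sketch of a multi-step job: it chains a
word-count step into a second step that picks the single most-used word. This
is an illustration rather than code shipped with mrjob; the helper method
names (``get_words``, ``count_words``, ``find_max_word``) are our own, but
the ``steps()``/``self.mr()`` pattern is how mrjob 0.3.x declares multiple
steps::

    from mrjob.job import MRJob
    import re

    WORD_RE = re.compile(r"[\w']+")


    class MRMostUsedWord(MRJob):

        def steps(self):
            # each self.mr(...) call describes one map-reduce step; the
            # output of step 1 becomes the input of step 2
            return [self.mr(mapper=self.get_words, reducer=self.count_words),
                    self.mr(reducer=self.find_max_word)]

        def get_words(self, _, line):
            for word in WORD_RE.findall(line):
                yield (word.lower(), 1)

        def count_words(self, word, counts):
            # key everything under None so the final reducer sees every
            # (count, word) pair in a single call
            yield (None, (sum(counts), word))

        def find_max_word(self, _, count_word_pairs):
            # the (count, word) pair with the highest count becomes the
            # job's single line of output
            yield max(count_word_pairs)


    if __name__ == '__main__':
        MRMostUsedWord.run()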

Installation
------------

From PyPI:

``pip install mrjob``

From source:

``python setup.py install``


A Simple Map Reduce Job
-----------------------

Code for this example, and several more, lives in ``mrjob/examples``.

::

   """The classic MapReduce job: count the frequency of words. 
   """
   from mrjob.job import MRJob
   import re

   WORD_RE = re.compile(r"[\w']+")


   class MRWordFreqCount(MRJob):

       def mapper(self, _, line):
           for word in WORD_RE.findall(line):
               yield (word.lower(), 1)

       def combiner(self, word, counts):
           yield (word, sum(counts))

       def reducer(self, word, counts):
           yield (word, sum(counts))


   if __name__ == '__main__':
       MRWordFreqCount.run()

Try It Out!
-----------

::

    # locally
    python mrjob/examples/mr_word_freq_count.py README.rst > counts
    # on EMR
    python mrjob/examples/mr_word_freq_count.py README.rst -r emr > counts
    # on your Hadoop cluster
    python mrjob/examples/mr_word_freq_count.py README.rst -r hadoop > counts
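
The same jobs can also be driven from another Python program instead of the
shell. The snippet below is a minimal sketch using mrjob's documented
``make_runner()`` pattern; ``README.rst`` is just an example input file, and
you would import your own job class::

    from __future__ import with_statement  # needed on Python 2.5

    from mr_word_freq_count import MRWordFreqCount

    # build the job as if it had been invoked from the command line;
    # add '-r', 'emr' or '-r', 'hadoop' to args to pick another runner
    mr_job = MRWordFreqCount(args=['README.rst'])

    with mr_job.make_runner() as runner:
        runner.run()
        # stream_output() yields raw output lines; parse_output_line()
        # decodes them back into (key, value) pairs
        for line in runner.stream_output():
            word, count = mr_job.parse_output_line(line)
            print '%s\t%d' % (word, count)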

Setting up EMR on Amazon
------------------------

* create an `Amazon Web Services account <http://aws.amazon.com/>`_
* sign up for `Elastic MapReduce <http://aws.amazon.com/elasticmapreduce/>`_
* Get your access and secret keys (click "Security Credentials" on
  `your account page <http://aws.amazon.com/account/>`_)
* Set the environment variables ``$AWS_ACCESS_KEY_ID`` and
  ``$AWS_SECRET_ACCESS_KEY`` accordingly

Advanced Configuration
----------------------

To run in other AWS regions, upload your source tree, run ``make``, and use 
other advanced mrjob features, you'll need to set up ``mrjob.conf``. mrjob looks 
for its conf file in:

* The contents of ``$MRJOB_CONF``
* ``~/.mrjob.conf``
* ``/etc/mrjob.conf``

See `the mrjob.conf documentation
<http://packages.python.org/mrjob/configs-conf.html>`_ for more information.
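
As a rough illustration (not a complete reference), ``mrjob.conf`` is a YAML
file with one section of options per runner. The option names below
(``aws_region``, ``setup_cmds``, ``python_archives``, ``cmdenv``) exist in
mrjob 0.3.x, but ``your-source-tree.tar.gz`` is a placeholder and the
documentation above has the full option list and exact semantics::

    runners:
      emr:
        aws_region: us-west-1        # run in a non-default AWS region
        setup_cmds:
          - make                     # run make before your task code executes
        python_archives:
          - your-source-tree.tar.gz  # tarball unpacked and added to $PYTHONPATH (placeholder name)
        cmdenv:
          TZ: America/Los_Angeles    # environment variables for your tasks
      hadoop:
        cmdenv:
          TZ: America/Los_Angeles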


Links
-----

* source: <http://github.com/Yelp/mrjob>
* documentation: <http://packages.python.org/mrjob/>
* discussion group: <http://groups.google.com/group/mrjob>
* Hadoop MapReduce: <http://hadoop.apache.org/mapreduce/>
* Elastic MapReduce: <http://aws.amazon.com/documentation/elasticmapreduce/>
* PyCon 2011 mrjob overview: <http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2011-mrjob-distributed-computing-for-everyone-4898987/>

Thanks to `Greg Killion <mailto:greg@blind-works.net>`_
(`blind-works.net <http://www.blind-works.net/>`_) for the logo.