/usr/lib/python3/dist-packages/pdfrw-0.4.egg-info/PKG-INFO

Metadata-Version: 1.1
Name: pdfrw
Version: 0.4
Summary: PDF file reader/writer library
Home-page: https://github.com/pmaupin/pdfrw
Author: Patrick Maupin
Author-email: pmaupin@gmail.com
License: MIT
Description-Content-Type: UNKNOWN
Description: ==================
        pdfrw 0.4
        ==================
        
        :Author: Patrick Maupin
        
        .. contents::
            :backlinks: none
        
        .. sectnum::
        
        Introduction
        ============
        
        **pdfrw** is a Python library and utility that reads and writes PDF files:
        
        * Version 0.4 is tested and works on Python 2.6, 2.7, 3.3, 3.4, 3.5, and 3.6
        * Operations include subsetting, merging, rotating, modifying metadata, etc.
        * The fastest pure Python PDF parser available
        * Has been used for years by a printer in pre-press production
        * Can be used with rst2pdf to faithfully reproduce vector images
        * Can be used either standalone, or in conjunction with `reportlab`__
          to reuse existing PDFs in new ones
        * Permissively licensed
        
        __ http://www.reportlab.org/
        
        
        pdfrw will faithfully reproduce vector formats without
        rasterization, so the rst2pdf package has used pdfrw
        for PDF and SVG images by default since March 2010.
        
        pdfrw can also be used in conjunction with reportlab, in order
        to re-use portions of existing PDFs in new PDFs created with
        reportlab.
        
        
        Examples
        =========
        
        The library comes with several examples that show operation both with
        and without reportlab.
        
        
        All examples
        ------------------
        
        The examples directory has a few scripts which use the library.
        Note that if these examples do not work with your PDF, you should
        try to use pdftk to uncompress and/or unencrypt them first.
        
        * `4up.py`__ will shrink pages down and place 4 of them on
          each output page.
        * `alter.py`__ shows an example of modifying metadata, without
          altering the structure of the PDF.
        * `booklet.py`__ shows an example of creating a 2-up output
          suitable for printing and folding (e.g on tabloid size paper).
        * `cat.py`__ shows an example of concatenating multiple PDFs together.
        * `extract.py`__ will extract images and Form XObjects (embedded pages)
          from existing PDFs to make them easier to use and refer to from
          new PDFs (e.g. with reportlab or rst2pdf).
        * `poster.py`__ increases the size of a PDF so it can be printed
          as a poster.
        * `print_two.py`__ Allows creation of 8.5 X 5.5" booklets by slicing
          8.5 X 11" paper apart after printing.
        * `rotate.py`__ Rotates all or selected pages in a PDF.
        * `subset.py`__ Creates a new PDF with only a subset of pages from the
          original.
        * `unspread.py`__ Takes a 2-up PDF, and splits out pages.
        * `watermark.py`__ Adds a watermark PDF image over or under all the pages
          of a PDF.
        * `rl1/4up.py`__ Another 4up example, using reportlab canvas for output.
        * `rl1/booklet.py`__ Another booklet example, using reportlab canvas for
          output.
        * `rl1/subset.py`__ Another subsetting example, using reportlab canvas for
          output.
        * `rl1/platypus_pdf_template.py`__ Another watermarking example, using
          reportlab canvas and generated output for the document.  Contributed
          by user asannes.
        * `rl2`__ Experimental code for parsing graphics.  Needs work.
        * `subset_booklets.py`__ shows an example of creating a full printable pdf
          version in a more professional and pratical way ( take a look at
          http://www.wikihow.com/Bind-a-Book )
        
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/4up.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/alter.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/booklet.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/cat.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/extract.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/poster.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/print_two.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/rotate.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/subset.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/unspread.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/watermark.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/rl1/4up.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/rl1/booklet.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/rl1/subset.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/rl1/platypus_pdf_template.py
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/rl2/
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/subset_booklets.py
        
        Notes on selected examples
        ------------------------------------
        
        Reorganizing pages and placing them two-up
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        A printer with a fancy printer and/or a full-up copy of Acrobat can
        easily turn your small PDF into a little booklet (for example, print 4
        letter-sized pages on a single 11" x 17").
        
        But that assumes several things, including that the personnel know how
        to operate the hardware and software. `booklet.py`__ lets you turn your PDF
        into a preformatted booklet, to give them fewer chances to mess it up.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/booklet.py
        
        Adding or modifying metadata
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        The `cat.py`__ example will accept multiple input files on the command
        line, concatenate them and output them to output.pdf, after adding some
        nonsensical metadata to the output PDF file.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/cat.py
        
        The `alter.py`__ example alters a single metadata item in a PDF,
        and writes the result to a new PDF.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/alter.py
        
        
        One difference is that, since **cat** is creating a new PDF structure,
        and **alter** is attempting to modify an existing PDF structure, the
        PDF produced by alter (and also by watermark.py) *should* be
        more faithful to the original (except for the desired changes).
        
        For example, the alter.py navigation should be left intact, whereas with
        cat.py it will be stripped.
        
        
        Rotating and doubling
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        If you ever want to print something that is like a small booklet, but
        needs to be spiral bound, you either have to do some fancy rearranging,
        or just waste half your paper.
        
        The `print_two.py`__ example program will, for example, make two side-by-side
        copies each page of of your PDF on a each output sheet.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/print_two.py
        
        But, every other page is flipped, so that you can print double-sided and
        the pages will line up properly and be pre-collated.
        
        Graphics stream parsing proof of concept
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        The `copy.py`__ script shows a simple example of reading in a PDF, and
        using the decodegraphics.py module to try to write the same information
        out to a new PDF through a reportlab canvas. (If you know about reportlab,
        you know that if you can faithfully render a PDF to a reportlab canvas, you
        can do pretty much anything else with that PDF you want.) This kind of
        low level manipulation should be done only if you really need to.
        decodegraphics is really more than a proof of concept than anything
        else. For most cases, just use the Form XObject capability, as shown in
        the examples/rl1/booklet.py demo.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/examples/rl2/copy.py
        
        pdfrw philosophy
        ==================
        
        Core library
        -------------
        
        The philosophy of the library portion of pdfrw is to provide intuitive
        functions to read, manipulate, and write PDF files.  There should be
        minimal leakage between abstraction layers, although getting useful
        work done makes "pure" functionality separation difficult.
        
        A key concept supported by the library is the use of Form XObjects,
        which allow easy embedding of pieces of one PDF into another.
        
        Addition of core support to the library is typically done carefully
        and thoughtfully, so as not to clutter it up with too many special
        cases.
        
        There are a lot of incorrectly formatted PDFs floating around; support
        for these is added in some cases.  The decision is often based on what
        acroread and okular do with the PDFs; if they can display them properly,
        then eventually pdfrw should, too, if it is not too difficult or costly.
        
        Contributions are welcome; one user has contributed some decompression
        filters and the ability to process PDF 1.5 stream objects.  Additional
        functionality that would obviously be useful includes additional
        decompression filters, the ability to process password-protected PDFs,
        and the ability to output linearized PDFs.
        
        Examples
        --------
        
        The philosophy of the examples is to provide small, easily-understood
        examples that showcase pdfrw functionality.
        
        
        PDF files and Python
        ======================
        
        Introduction
        ------------
        
        In general, PDF files conceptually map quite well to Python. The major
        objects to think about are:
        
        -  **strings**. Most things are strings. These also often decompose
           naturally into
        -  **lists of tokens**. Tokens can be combined to create higher-level
           objects like
        -  **arrays** and
        -  **dictionaries** and
        -  **Contents streams** (which can be more streams of tokens)
        
        Difficulties
        ------------
        
        The apparent primary difficulty in mapping PDF files to Python is the
        PDF file concept of "indirect objects."  Indirect objects provide
        the efficiency of allowing a single piece of data to be referred to
        from more than one containing object, but probably more importantly,
        indirect objects provide a way to get around the chicken and egg
        problem of circular object references when mapping arbitrary data
        structures to files. To flatten out a circular reference, an indirect
        object is *referred to* instead of being *directly included* in another
        object. PDF files have a global mechanism for locating indirect objects,
        and they all have two reference numbers (a reference number and a
        "generation" number, in case you wanted to append to the PDF file
        rather than just rewriting the whole thing).
        
        pdfrw automatically handles indirect references on reading in a PDF
        file. When pdfrw encounters an indirect PDF file object, the
        corresponding Python object it creates will have an 'indirect' attribute
        with a value of True. When writing a PDF file, if you have created
        arbitrary data, you just need to make sure that circular references are
        broken up by putting an attribute named 'indirect' which evaluates to
        True on at least one object in every cycle.
        
        Another PDF file concept that doesn't quite map to regular Python is a
        "stream". Streams are dictionaries which each have an associated
        unformatted data block. pdfrw handles streams by placing a special
        attribute on a subclassed dictionary.
        
        Usage Model
        -----------
        
        The usage model for pdfrw treats most objects as strings (it takes their
        string representation when writing them to a file). The two main
        exceptions are the PdfArray object and the PdfDict object.
        
        PdfArray is a subclass of list with two special features.  First,
        an 'indirect' attribute allows a PdfArray to be written out as
        an indirect PDF object.  Second, pdfrw reads files lazily, so
        PdfArray knows about, and resolves references to other indirect
        objects on an as-needed basis.
        
        PdfDict is a subclass of dict that also has an indirect attribute
        and lazy reference resolution as well.  (And the subclassed
        IndirectPdfDict has indirect automatically set True).
        
        But PdfDict also has an optional associated stream. The stream object
        defaults to None, but if you assign a stream to the dict, it will
        automatically set the PDF /Length attribute for the dictionary.
        
        Finally, since PdfDict instances are indexed by PdfName objects (which
        always start with a /) and since most (all?) standard Adobe PdfName
        objects use names formatted like "/CamelCase", it makes sense to allow
        access to dictionary elements via object attribute accesses as well as
        object index accesses. So usage of PdfDict objects is normally via
        attribute access, although non-standard names (though still with a
        leading slash) can be accessed via dictionary index lookup.
        
        Reading PDFs
        ~~~~~~~~~~~~~~~
        
        The PdfReader object is a subclass of PdfDict, which allows easy access
        to an entire document::
        
            >>> from pdfrw import PdfReader
            >>> x = PdfReader('source.pdf')
            >>> x.keys()
            ['/Info', '/Size', '/Root']
            >>> x.Info
            {'/Producer': '(cairo 1.8.6 (http://cairographics.org))',
             '/Creator': '(cairo 1.8.6 (http://cairographics.org))'}
            >>> x.Root.keys()
            ['/Type', '/Pages']
        
        Info, Size, and Root are retrieved from the trailer of the PDF file.
        
        In addition to the tree structure, pdfrw creates a special attribute
        named *pages*, that is a list of all the pages in the document. pdfrw
        creates the *pages* attribute as a simplification for the user, because
        the PDF format allows arbitrarily complicated nested dictionaries to
        describe the page order. Each entry in the *pages* list is the PdfDict
        object for one of the pages in the file, in order.
        
        ::
        
            >>> len(x.pages)
            1
            >>> x.pages[0]
            {'/Parent': {'/Kids': [{...}], '/Type': '/Pages', '/Count': '1'},
             '/Contents': {'/Length': '11260', '/Filter': None},
             '/Resources': ... (Lots more stuff snipped)
            >>> x.pages[0].Contents
            {'/Length': '11260', '/Filter': None}
            >>> x.pages[0].Contents.stream
            'q\n1 1 1 rg /a0 gs\n0 0 0 RG 0.657436
              w\n0 J\n0 j\n[] 0.0 d\n4 M q' ... (Lots more stuff snipped)
        
        Writing PDFs
        ~~~~~~~~~~~~~~~
        
        As you can see, it is quite easy to dig down into a PDF document. But
        what about when it's time to write it out?
        
        ::
        
            >>> from pdfrw import PdfWriter
            >>> y = PdfWriter()
            >>> y.addpage(x.pages[0])
            >>> y.write('result.pdf')
        
        That's all it takes to create a new PDF. You may still need to read the
        `Adobe PDF reference manual`__ to figure out what needs to go *into*
        the PDF, but at least you don't have to sweat actually building it
        and getting the file offsets right.
        
        __ http://www.adobe.com/devnet/acrobat/pdfs/pdf_reference_1-7.pdf
        
        Manipulating PDFs in memory
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        
        For the most part, pdfrw tries to be agnostic about the contents of
        PDF files, and support them as containers, but to do useful work,
        something a little higher-level is required, so pdfrw works to
        understand a bit about the contents of the containers.  For example:
        
        -  PDF pages. pdfrw knows enough to find the pages in PDF files you read
           in, and to write a set of pages back out to a new PDF file.
        -  Form XObjects. pdfrw can take any page or rectangle on a page, and
           convert it to a Form XObject, suitable for use inside another PDF
           file.  It knows enough about these to perform scaling, rotation,
           and positioning.
        -  reportlab objects. pdfrw can recursively create a set of reportlab
           objects from its internal object format. This allows, for example,
           Form XObjects to be used inside reportlab, so that you can reuse
           content from an existing PDF file when building a new PDF with
           reportlab.
        
        There are several examples that demonstrate these features in
        the example code directory.
        
        Missing features
        ~~~~~~~~~~~~~~~~~~~~~~~
        
        Even as a pure PDF container library, pdfrw comes up a bit short. It
        does not currently support:
        
        -  Most compression/decompression filters
        -  encryption
        
        `pdftk`__ is a wonderful command-line
        tool that can convert your PDFs to remove encryption and compression.
        However, in most cases, you can do a lot of useful work with PDFs
        without actually removing compression, because only certain elements
        inside PDFs are actually compressed.
        
        __ https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
        
        Library internals
        ==================
        
        Introduction
        ------------
        
        **pdfrw** currently consists of 19 modules organized into a main
        package and one sub-package.
        
        The `__init.py__`__ module does the usual thing of importing a few
        major attributes from some of the submodules, and the `errors.py`__
        module supports logging and exception generation.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/__init__.py
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/errors.py
        
        
        PDF object model support
        --------------------------
        
        The `objects`__ sub-package contains one module for each of the
        internal representations of the kinds of basic objects that exist
        in a PDF file, with the `objects/__init__.py`__ module in that
        package simply gathering them up and making them available to the
        main pdfrw package.
        
        One feature that all the PDF object classes have in common is the
        inclusion of an 'indirect' attribute. If 'indirect' exists and evaluates
        to True, then when the object is written out, it is written out as an
        indirect object. That is to say, it is addressable in the PDF file, and
        could be referenced by any number (including zero) of container objects.
        This indirect object capability saves space in PDF files by allowing
        objects such as fonts to be referenced from multiple pages, and also
        allows PDF files to contain internal circular references.  This latter
        capability is used, for example, when each page object has a "parent"
        object in its dictionary.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/objects/
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/objects/__init__.py
        
        Ordinary objects
        ~~~~~~~~~~~~~~~~
        
        The `objects/pdfobject.py`__ module contains the PdfObject class, which is
        a subclass of str, and is the catch-all object for any PDF file elements
        that are not explicitly represented by other objects, as described below.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/objects/pdfobject.py
        
        Name objects
        ~~~~~~~~~~~~
        
        The `objects/pdfname.py`__ module contains the PdfName singleton object,
        which will convert a string into a PDF name by prepending a slash. It can
        be used either by calling it or getting an attribute, e.g.::
        
            PdfName.Rotate == PdfName('Rotate') == PdfObject('/Rotate')
        
        In the example above, there is a slight difference between the objects
        returned from PdfName, and the object returned from PdfObject.  The
        PdfName objects are actually objects of class "BasePdfName".  This
        is important, because only these may be used as keys in PdfDict objects.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/objects/pdfname.py
        
        String objects
        ~~~~~~~~~~~~~~
        
        The `objects/pdfstring.py`__
        module contains the PdfString class, which is a subclass of str that is
        used to represent encoded strings in a PDF file. The class has encode
        and decode methods for the strings.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/objects/pdfstring.py
        
        
        Array objects
        ~~~~~~~~~~~~~
        
        The `objects/pdfarray.py`__
        module contains the PdfArray class, which is a subclass of list that is
        used to represent arrays in a PDF file. A regular list could be used
        instead, but use of the PdfArray class allows for an indirect attribute
        to be set, and also allows for proxying of unresolved indirect objects
        (that haven't been read in yet) in a manner that is transparent to pdfrw
        clients.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/objects/pdfarray.py
        
        Dict objects
        ~~~~~~~~~~~~
        
        The `objects/pdfdict.py`__
        module contains the PdfDict class, which is a subclass of dict that is
        used to represent dictionaries in a PDF file. A regular dict could be
        used instead, but the PdfDict class matches the requirements of PDF
        files more closely:
        
        * Transparent (from the library client's viewpoint) proxying
          of unresolved indirect objects
        * Return of None for non-existent keys (like dict.get)
        * Mapping of attribute accesses to the dict itself
          (pdfdict.Foo == pdfdict[NameObject('Foo')])
        * Automatic management of following stream and /Length attributes
          for content dictionaries
        * Indirect attribute
        * Other attributes may be set for private internal use of the
          library and/or its clients.
        * Support for searching parent dictionaries for PDF "inheritable"
          attributes.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/objects/pdfdict.py
        
        If a PdfDict has an associated data stream in the PDF file, the stream
        is accessed via the 'stream' (all lower-case) attribute.  Setting the
        stream attribute on the PdfDict will automatically set the /Length attribute
        as well.  If that is not what is desired (for example if the the stream
        is compressed), then _stream (same name with an underscore) may be used
        to associate the stream with the PdfDict without setting the length.
        
        To set private attributes (that will not be written out to a new PDF
        file) on a dictionary, use the 'private' attribute::
        
            mydict.private.foo = 1
        
        Once the attribute is set, it may be accessed directly as an attribute
        of the dictionary::
        
            foo = mydict.foo
        
        Some attributes of PDF pages are "inheritable."  That is, they may
        belong to a parent dictionary (or a parent of a parent dictionary, etc.)
        The "inheritable" attribute allows for easy discovery of these::
        
            mediabox = mypage.inheritable.MediaBox
        
        
        Proxy objects
        ~~~~~~~~~~~~~
        
        The `objects/pdfindirect.py`__
        module contains the PdfIndirect class, which is a non-transparent proxy
        object for PDF objects that have not yet been read in and resolved from
        a file. Although these are non-transparent inside the library, client code
        should never see one of these -- they exist inside the PdfArray and PdfDict
        container types, but are resolved before being returned to a client of
        those types.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/objects/pdfindirect.py
        
        
        File reading, tokenization and parsing
        --------------------------------------
        
        `pdfreader.py`__
        contains the PdfReader class, which can read a PDF file (or be passed a
        file object or already read string) and parse it. It uses the PdfTokens
        class in `tokens.py`__  for low-level tokenization.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/pdfreader.py
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/tokens.py
        
        
        The PdfReader class does not, in general, parse into containers (e.g.
        inside the content streams). There is a proof of concept for doing that
        inside the examples/rl2 subdirectory, but that is slow and not well-developed,
        and not useful for most applications.
        
        An instance of the PdfReader class is an instance of a PdfDict -- the
        trailer dictionary of the PDF file, to be exact.  It will have a private
        attribute set on it that is named 'pages' that is a list containing all
        the pages in the file.
        
        When instantiating a PdfReader object, there are options available
        for decompressing all the objects in the file.  pdfrw does not currently
        have very many options for decompression, so this is not all that useful,
        except in the specific case of compressed object streams.
        
        Also, there are no options for decryption yet.  If you have PDF files
        that are encrypted or heavily compressed, you may find that using another
        program like pdftk on them can make them readable by pdfrw.
        
        In general, the objects are read from the file lazily, but this is not
        currently true with compressed object streams -- all of these are decompressed
        and read in when the PdfReader is instantiated.
        
        
        File output
        -----------
        
        `pdfwriter.py`__
        contains the PdfWriter class, which can create and output a PDF file.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/pdfwriter.py
        
        There are a few options available when creating and using this class.
        
        In the simplest case, an instance of PdfWriter is instantiated, and
        then pages are added to it from one or more source files (or created
        programmatically), and then the write method is called to dump the
        results out to a file.
        
        If you have a source PDF and do not want to disturb the structure
        of it too badly, then you may pass its trailer directly to PdfWriter
        rather than letting PdfWriter construct one for you.  There is an
        example of this (alter.py) in the examples directory.
        
        
        Advanced features
        -----------------
        
        `buildxobj.py`__
        contains functions to build Form XObjects out of pages or rectangles on
        pages.  These may be reused in new PDFs essentially as if they were images.
        
        buildxobj is careful to cache any page used so that it only appears in
        the output once.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/buildxobj.py
        
        
        `toreportlab.py`__
        provides the makerl function, which will translate pdfrw objects into a
        format which can be used with `reportlab <http://www.reportlab.org/>`__.
        It is normally used in conjunction with buildxobj, to be able to reuse
        parts of existing PDFs when using reportlab.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/toreportlab.py
        
        
        `pagemerge.py`__ builds on the foundation laid by buildxobj.  It
        contains classes to create a new page (or overlay an existing page)
        using one or more rectangles from other pages.  There are examples
        showing its use for watermarking, scaling, 4-up output, splitting
        each page in 2, etc.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/pagemerge.py
        
        `findobjs.py`__ contains code that can find specific kinds of objects
        inside a PDF file.  The extract.py example uses this module to create
        a new PDF that places each image and Form XObject from a source PDF onto
        its own page, e.g. for easy reuse with some of the other examples or
        with reportlab.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/findobjs.py
        
        
        Miscellaneous
        ----------------
        
        `compress.py`__ and `uncompress.py`__
        contains compression and decompression functions. Very few filters are
        currently supported, so an external tool like pdftk might be good if you
        require the ability to decompress (or, for that matter, decrypt) PDF
        files.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/compress.py
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/uncompress.py
        
        
        `py23_diffs.py`__ contains code to help manage the differences between
        Python 2 and Python 3.
        
        __ https://github.com/pmaupin/pdfrw/tree/master/pdfrw/py23_diffs.py
        
        Testing
        ===============
        
        The tests associated with pdfrw require a large number of PDFs,
        which are not distributed with the library.
        
        To run the tests:
        
        * Download or clone the full package from github.com/pmaupin/pdfrw
        * cd into the tests directory, and then clone the package
          github.com/pmaupin/static_pdfs into a subdirectory (also named
          static_pdfs).
        * Now the tests may be run from that directory using unittest, or
          py.test, or nose.
        * travisci is used at github, and runs the tests with py.test
        
        Other libraries
        =====================
        
        Pure Python
        -----------
        
        -  `reportlab <http://www.reportlab.org/>`__
        
            reportlab is must-have software if you want to programmatically
            generate arbitrary PDFs.
        
        -  `pyPdf <https://github.com/mstamy2/PyPDF2>`__
        
            pyPdf is, in some ways, very full-featured. It can do decompression
            and decryption and seems to know a lot about items inside at least
            some kinds of PDF files. In comparison, pdfrw knows less about
            specific PDF file features (such as metadata), but focuses on trying
            to have a more Pythonic API for mapping the PDF file container
            syntax to Python, and (IMO) has a simpler and better PDF file
            parser.  The Form XObject capability of pdfrw means that, in many
            cases, it does not actually need to decompress objects -- they
            can be left compressed.
        
        -  `pdftools <http://www.boddie.org.uk/david/Projects/Python/pdftools/index.html>`__
        
            pdftools feels large and I fell asleep trying to figure out how it
            all fit together, but many others have done useful things with it.
        
        -  `pagecatcher <http://www.reportlab.com/docs/pagecatcher-ds.pdf>`__
        
            My understanding is that pagecatcher would have done exactly what I
            wanted when I built pdfrw. But I was on a zero budget, so I've never
            had the pleasure of experiencing pagecatcher. I do, however, use and
            like `reportlab <http://www.reportlab.org/>`__ (open source, from
            the people who make pagecatcher) so I'm sure pagecatcher is great,
            better documented and much more full-featured than pdfrw.
        
        -  `pdfminer <http://www.unixuser.org/~euske/python/pdfminer/index.html>`__
        
            This looks like a useful, actively-developed program. It is quite
            large, but then, it is trying to actively comprehend a full PDF
            document. From the website:
        
            "PDFMiner is a suite of programs that help extracting and analyzing
            text data of PDF documents. Unlike other PDF-related tools, it
            allows to obtain the exact location of texts in a page, as well as
            other extra information such as font information or ruled lines. It
            includes a PDF converter that can transform PDF files into other
            text formats (such as HTML). It has an extensible PDF parser that
            can be used for other purposes instead of text analysis."
        
        non-pure-Python libraries
        -------------------------
        
        -  `pyPoppler <https://launchpad.net/poppler-python/>`__ can read PDF
           files.
        -  `pycairo <http://www.cairographics.org/pycairo/>`__ can write PDF
           files.
        -  `PyMuPDF <https://github.com/rk700/PyMuPDF>`_ high performance rendering
           of PDF, (Open)XPS, CBZ and EPUB
        
        Other tools
        -----------
        
        -  `pdftk <https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/>`__ is a wonderful command
           line tool for basic PDF manipulation. It complements pdfrw extremely
           well, supporting many operations such as decryption and decompression
           that pdfrw cannot do.
        -  `MuPDF <http://www.mupdf.com/>`_ is a free top performance PDF, (Open)XPS, CBZ and EPUB rendering library
           that also comes with some command line tools. One of those, ``mutool``, has big overlaps with pdftk's - 
           except it is up to 10 times faster.
        
        Release information
        =======================
        
        Revisions:
        
        0.4 -- Released 18 September, 2017
        
            - Python 3.6 added to test matrix
            - Proper unicode support for text strings in PDFs added
            - buildxobj fixes allow better support creating form XObjects
              out of compressed pages in some cases
            - Compression fixes for Python 3+
            - New subset_booklets.py example
            - Bug with non-compressed indices into compressed object streams fixed
            - Bug with distinguishing compressed object stream first objects fixed
            - Better error reporting added for some invalid PDFs (e.g. when reading
              past the end of file)
            - Better scrubbing of old bookmark information when writing PDFs, to
              remove dangling references
            - Refactoring of pdfwriter, including updating API, to allow future
              enhancements for things like incremental writing
            - Minor tokenizer speedup
            - Some flate decompressor bugs fixed
            - Compression and decompression tests added
            - Tests for new unicode handling added
            - PdfReader.readpages() recursion error (issue #92) fixed.
            - Initial crypt filter support added
        
        
        0.3 -- Released 19 October, 2016.
        
            - Python 3.5 added to test matrix
            - Better support under Python 3.x for in-memory PDF file-like objects
            - Some pagemerge and Unicode patches added
            - Changes to logging allow better coexistence with other packages
            - Fix for "from pdfrw import \*"
            - New fancy_watermark.py example shows off capabilities of pagemerge.py
            - metadata.py example renamed to cat.py
        
        
        0.2 -- Released 21 June, 2015.  Supports Python 2.6, 2.7, 3.3, and 3.4.
        
            - Several bugs have been fixed
            - New regression test functionally tests core with dozens of
              PDFs, and also tests examples.
            - Core has been ported and tested on Python3 by round-tripping
              several difficult files and observing binary matching results
              across the different Python versions.
            - Still only minimal support for compression and no support
              for encryption or newer PDF features.  (pdftk is useful
              to put PDFs in a form that pdfrw can use.)
        
        0.1 -- Released to PyPI in 2012.  Supports Python 2.5 - 2.7
        
        
Keywords: pdf vector graphics PDF nup watermark split join merge
Platform: Independent
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Multimedia :: Graphics :: Graphics Conversion
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Text Processing
Classifier: Topic :: Printing
Classifier: Topic :: Utilities
python3-pdfrw 0.4-1 / usr / lib / python3 / dist-packages / pdfrw-0.4.egg-info / PKG-INFO