/usr/lib/python2.7/dist-packages/zfec-1.4.24.egg-info/PKG-INFO is in python-zfec 1.4.24-3.
Metadata-Version: 1.1
Name: zfec
Version: 1.4.24
Summary: a fast erasure codec which can be used with the command-line, C, Python, or Haskell
Home-page: https://tahoe-lafs.org/trac/zfec
Author: Zooko O'Whielacronx
Author-email: zooko@zooko.com
License: GNU GPL
Description:

zfec -- efficient, portable erasure coding tool
===============================================
Generate redundant blocks of information such that if some of the blocks are
lost then the original data can be recovered from the remaining blocks. This
package includes command-line tools, C API, Python API, and Haskell API.

Intro and Licence
-----------------
This package implements an "erasure code", or "forward error correction
code".
You may use this package under the GNU General Public License, version 2 or,
at your option, any later version. You may use this package under the
Transitive Grace Period Public Licence, version 1.0 or, at your option, any
later version. (You may choose to use this package under the terms of either
licence, at your option.) See the file COPYING.GPL for the terms of the GNU
General Public License, version 2. See the file COPYING.TGPPL.html for the
terms of the Transitive Grace Period Public Licence, version 1.0.
The most widely known example of an erasure code is the RAID-5 algorithm,
which allows the stored data to be completely recovered in the event of the
loss of any one hard drive. The algorithm in the zfec package has a similar
effect, but instead of tolerating the loss of only a single element, it can be
parameterized in advance with the number of elements whose loss it can
tolerate.
This package is largely based on the old "fec" library by Luigi Rizzo et al.,
which is a mature and optimized implementation of erasure coding. The zfec
package makes several changes from the original "fec" package, including
addition of the Python API, refactoring of the C API to support zero-copy
operation, a few clean-ups and optimizations of the core code itself, and the
addition of a command-line tool named "zfec".

Installation
------------
This package is managed with the "setuptools" package management tool. To
build and install the package directly into your system, just run ``python
./setup.py install``. If you prefer to keep the package limited to a
specific directory so that you can manage it yourself (perhaps by using the
"GNU stow" tool), then give it these arguments: ``python ./setup.py install
--single-version-externally-managed
--record=${specificdirectory}/zfec-install.log
--prefix=${specificdirectory}``
To run the self-tests, execute ``python ./setup.py test`` (or, if you have
Twisted Python installed, run ``trial zfec`` for nicer output and test
options). This will run the tests of the C API, the Python API, and the
command-line tools.
To run the tests of the Haskell API: ``runhaskell haskell/test/FECTest.hs``
Note that in order to run the Haskell API tests you must have installed the
library first, because the interpreter cannot process FEC.hs directly: it
references an FFI function.

Community
---------
The source is currently available via darcs on the web with the command:
``darcs get https://tahoe-lafs.org/source/zfec/trunk``
More information on darcs is available at http://darcs.net
Please post about zfec to the Tahoe-LAFS mailing list and contribute patches:
<https://tahoe-lafs.org/cgi-bin/mailman/listinfo/tahoe-dev>

Overview
--------
This package performs two operations, encoding and decoding. Encoding takes
some input data and expands its size by producing extra "check blocks", also
called "secondary blocks". Decoding takes some data -- any combination of
blocks of the original data (called "primary blocks") and secondary blocks --
and produces the original data.
The encoding is parameterized by two integers, k and m. m is the total
number of blocks produced, and k is how many of those blocks are necessary to
reconstruct the original data. m is required to be at least 1 and at most
256, and k is required to be at least 1 and at most m.
(Note that when k == m then there is no point in doing erasure coding -- it
degenerates to the equivalent of the Unix "split" utility, which simply splits
the input into successive segments. Similarly, when k == 1 it degenerates to
the equivalent of the Unix "cp" utility -- each block is a complete copy of
the input data.)
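The two degenerate cases above can be sketched in a few lines of plain Python
(a hedged illustration only; zfec itself is not needed for this):

```python
data = b"abcdefghij"

# k == m: equivalent to "split" -- the blocks are just successive
# segments of the input (real zfec also pads the last segment).
k = 5
seg = -(-len(data) // k)                      # ceiling division
split_blocks = [data[i * seg:(i + 1) * seg] for i in range(k)]
assert b"".join(split_blocks) == data

# k == 1: equivalent to "cp" -- every block is a complete copy.
m = 3
copy_blocks = [data for _ in range(m)]
assert all(block == data for block in copy_blocks)
```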
Note that each "primary block" is a segment of the original data, so its size
is 1/k'th of the size of the original data, and each "secondary block" is of
the same size, so the total space used by all the blocks is m/k times the size
of the original data (plus some padding to fill out the last primary block to
the same size as all the others). In addition to the data contained in the
blocks themselves, a few pieces of metadata are necessary for later
reconstruction: 1. the value of k, 2. the value of m, 3. the sharenum of
each block, and 4. the number of bytes of padding that were used. The "zfec"
command-line tool compresses these pieces of data and prepends them to the
beginning of each share, so each sharefile produced by the "zfec"
command-line tool is between one and four bytes larger than the share data
alone.
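The space arithmetic described above can be checked with a short calculation;
the parameters here (k=3, m=8, a 1000-byte input) are made up for
illustration:

```python
import math

k, m, data_len = 3, 8, 1000

block_size = math.ceil(data_len / k)  # every block is 1/k'th of the padded data
padding = k * block_size - data_len   # fills out the last primary block
total = m * block_size                # all m blocks: m/k expansion plus padding

assert block_size == 334 and padding == 2
assert total == 2672                  # a bit over (8/3) * 1000 bytes
```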
The decoding step requires as input k of the blocks which were produced by
the encoding step. The decoding step produces as output the data that was
earlier input to the encoding step.

Command-Line Tool
-----------------
The bin/ directory contains two Unix-style, command-line tools "zfec" and
"zunfec". Execute ``zfec --help`` or ``zunfec --help`` for usage
instructions.
Note: a Unix-style tool like "zfec" does only one thing -- in this case
erasure coding -- and leaves other tasks to other tools. Other Unix-style
tools that go well with zfec include `GNU tar`_ for archiving multiple files
and directories into one file, `lzip`_ for compression, and `GNU Privacy
Guard`_ for encryption or `sha256sum`_ for integrity. It is important to do
things in order: first archive, then compress, then either encrypt or
integrity-check, then erasure code. Note that if GNU Privacy Guard is used
for privacy, then it will also ensure integrity, so the use of sha256sum is
unnecessary in that case. Note also that you need to do integrity checking
(such as with sha256sum) on the blocks that result from the erasure coding in
*addition* to doing it on the file contents! (There are two
different subtle failure modes -- see "more than one file can match an
immutable file cap" on the `Hack Tahoe-LAFS!`_ Hall of Fame.)
The `Tahoe-LAFS`_ project uses zfec as part of a complete distributed
filesystem with integrated encryption, integrity, remote distribution of the
blocks, directory structure, backup of changed files or directories, access
control, immutable files and directories, proof-of-retrievability, and repair
of damaged files and directories.
.. _GNU tar: http://directory.fsf.org/project/tar/
.. _lzip: http://www.nongnu.org/lzip/lzip.html
.. _GNU Privacy Guard: http://gnupg.org/
.. _sha256sum: http://www.gnu.org/software/coreutils/
.. _Tahoe-LAFS: https://tahoe-lafs.org
.. _Hack Tahoe-LAFS!: https://tahoe-lafs.org/hacktahoelafs/

Performance
-----------
To run the benchmarks, execute the included bench/bench_zfec.py script with
optional --k= and --m= arguments.
On my Athlon 64 2.4 GHz workstation (running Linux), the "zfec" command-line
tool encoded a 160 MB file with m=100, k=94 (about 6% redundancy) in 3.9
seconds, where the "par2" tool encoded the file with about 6% redundancy in
27 seconds. zfec encoded the same file with m=12, k=6 (100% redundancy) in
4.1 seconds, where par2 encoded it with about 100% redundancy in 7 minutes
and 56 seconds.
The underlying C library in benchmark mode encoded from a file at about 4.9
million bytes per second and decoded at about 5.8 million bytes per second.
On Peter's fancy Intel Mac laptop (2.16 GHz Core Duo), it encoded from a file
at about 6.2 million bytes per second.
On my even fancier Intel Mac laptop (2.33 GHz Core Duo), it encoded from a
file at about 6.8 million bytes per second.
On my old PowerPC G4 867 MHz Mac laptop, it encoded from a file at about 1.3
million bytes per second.
Here is a paper analyzing the performance of various erasure codes and their
implementations, including zfec:
http://www.usenix.org/events/fast09/tech/full_papers/plank/plank.pdf
Zfec shows good performance on different machines and with different values
of K and M. It also has a nice small memory footprint.

API
---
Each block is associated with a "blocknum". The blocknum of each primary
block is its index (starting from zero), so the 0th block is the first primary
block, which is the first few bytes of the file, the 1st block is the next
primary block, which is the next few bytes of the file, and so on. The last
primary block has blocknum k-1. The blocknum of each secondary block is an
arbitrary integer between k and 255 inclusive. (When using the Python API,
if you don't specify which secondary blocks you want when invoking encode(),
then it will by default provide the blocks with ids from k to m-1 inclusive.)
- C API
fec_encode() takes as input an array of k pointers, where each pointer
points to a memory buffer containing the input data (i.e., the i'th buffer
contains the i'th primary block). There is also a second parameter which
is an array of the blocknums of the secondary blocks which are to be
produced. (Each element in that array is required to be the blocknum of a
secondary block, i.e. it is required to be >= k and < m.)
The output from fec_encode() is the requested set of secondary blocks which
are written into output buffers provided by the caller.
Note that this fec_encode() is a "low-level" API in that it requires the
input data to be provided in a set of memory buffers of exactly the right
sizes. If you are starting instead with a single buffer containing all of
the data then please see easyfec.py's "class Encoder" as an example of how
to split a single large buffer into the appropriate set of input buffers
for fec_encode(). If you are starting with a file on disk, then please see
filefec.py's encode_file_stringy_easyfec() for an example of how to read
the data from a file and pass it to "class Encoder". The Python interface
provides these higher-level operations, as does the Haskell interface. If
you implement functions to do these higher-level tasks in other languages,
please send a patch to tahoe-dev@tahoe-lafs.org so that your API can be
included in future releases of zfec.
fec_decode() takes as input an array of k pointers, where each pointer
points to a buffer containing a block. There is also a separate input
parameter which is an array of blocknums, indicating the blocknum of each
of the blocks which is being passed in.
The output from fec_decode() is the set of primary blocks which were
missing from the input and had to be reconstructed. These reconstructed
blocks are written into output buffers provided by the caller.
- Python API
encode() and decode() take as input a sequence of k buffers, where a
"sequence" is any object that implements the Python sequence protocol (such
as a list or tuple) and a "buffer" is any object that implements the Python
buffer protocol (such as a string or array). The contents that are
required to be present in these buffers are the same as for the C API.
encode() also takes a list of desired blocknums. Unlike the C API, the
Python API accepts blocknums of primary blocks as well as secondary blocks
in its list of desired blocknums. encode() returns a list of buffer
objects which contain the blocks requested. For each requested block which
is a primary block, the resulting list contains a reference to the
appropriate primary block from the input list. For each requested block
which is a secondary block, the list contains a newly created string object
containing that block.
decode() also takes a list of integers indicating the blocknums of the
blocks being passed in. decode() returns a list of buffer objects which
contain all of the primary blocks of the original data (in order). For
each primary block which was present in the input list, the result
list simply contains a reference to the object that was passed in the input
list. For each primary block which was not present in the input, the
result list contains a newly created string object containing that primary
block.
Beware of a "gotcha" that can result from the combination of mutable data
and the fact that the Python API returns references to inputs when
possible.
Returning references to its inputs is efficient since it avoids making an
unnecessary copy of the data, but if the object which was passed as input
is mutable and if that object is mutated after the call to zfec returns,
then the result from zfec -- which is just a reference to that same object
-- will also be mutated. This subtlety is the price you pay for avoiding
data copying. If you don't want to have to worry about this then you can
simply use immutable objects (e.g. Python strings) to hold the data that
you pass to zfec.
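The gotcha is ordinary Python aliasing and can be demonstrated without zfec
at all; this sketch uses a ``bytearray`` to stand in for a mutable input
buffer:

```python
# If an API hands back a reference to a mutable input rather than a copy,
# later mutation of that input is visible through the "result".
buf = bytearray(b"primary block 0")
result = [buf]                 # stand-in for a returned block: a reference

buf[0:7] = b"MUTATED"          # caller mutates the input afterwards
assert bytes(result[0]) == b"MUTATED block 0"

# Immutable inputs (e.g. bytes) avoid the problem: they cannot be
# mutated in place, only rebound, which cannot affect the result.
```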
- Haskell API
The Haskell code is fully Haddocked; to generate the documentation, run
``runhaskell Setup.lhs haddock``.

Utilities
---------
The filefec.py module has a utility function for efficiently reading a file
and encoding it piece by piece. This module is used by the "zfec" and
"zunfec" command-line tools from the bin/ directory.

Dependencies
------------
A C compiler is required. To use the Python API or the command-line tools a
Python interpreter is also required. We have tested it with Python v2.4,
v2.5, v2.6, and v2.7. For the Haskell interface, GHC >= 6.8.1 is required.

Acknowledgements
----------------
Thanks to the author of the original fec lib, Luigi Rizzo, and the folks that
contributed to it: Phil Karn, Robert Morelos-Zaragoza, Hari Thirumoorthy, and
Dan Rubenstein. Thanks to the Mnet hackers who wrote an earlier Python
wrapper, especially Myers Carpenter and Hauke Johannknecht. Thanks to Brian
Warner and Amber O'Whielacronx for help with the API, documentation,
debugging, compression, and unit tests. Thanks to Adam Langley for improving
the C API and contributing the Haskell API. Thanks to the creators of GCC
(starting with Richard M. Stallman) and Valgrind (starting with Julian
Seward) for a pair of excellent tools. Thanks to my coworkers at Allmydata
-- http://allmydata.com -- Fabrice Grinda, Peter Secor, Rob Kinninmont, Brian
Warner, Zandr Milewski, Justin Boreta, Mark Meras for sponsoring this work
and releasing it under a Free Software licence. Thanks to Jack Lloyd, Samuel
Neves, and David-Sarah Hopwood.
Enjoy!
Zooko Wilcox-O'Hearn
2012-03-31
Boulder, Colorado
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: GNU General Public License (GPL)
Classifier: License :: DFSG approved
Classifier: License :: Other/Proprietary License
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: Microsoft
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: Unix
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: POSIX
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: Microsoft :: Windows :: Windows NT/2000
Classifier: Operating System :: OS Independent
Classifier: Natural Language :: English
Classifier: Programming Language :: C
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.4
Classifier: Programming Language :: Python :: 2.5
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Topic :: Utilities
Classifier: Topic :: System :: Systems Administration
Classifier: Topic :: System :: Filesystems
Classifier: Topic :: System :: Distributed Computing
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Communications :: Usenet News
Classifier: Topic :: System :: Archiving :: Backup
Classifier: Topic :: System :: Archiving :: Mirroring
Classifier: Topic :: System :: Archiving