<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<meta name="author" content="Avery Pennarun apenwarr@gmail.com" />
<meta name="date" content="2017-04-01" />
<title>bup-margin(1) Bup debian/0.29-3</title>
<style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<div id="header">
<h1 class="title">bup-margin(1) Bup debian/0.29-3</h1>
<h2 class="author">Avery Pennarun <a href="mailto:apenwarr@gmail.com">apenwarr@gmail.com</a></h2>
<h3 class="date">2017-04-01</h3>
</div>
<h1 id="name">NAME</h1>
<p>bup-margin - figure out your deduplication safety margin</p>
<h1 id="synopsis">SYNOPSIS</h1>
<p>bup margin [options...]</p>
<h1 id="description">DESCRIPTION</h1>
<p><code>bup margin</code> iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two entries. This number, <code>n</code>, is the length of the longest SHA-1 prefix on which at least two of your object ids still collide; truncating the ids to <code>n</code>+1 bits would be enough to tell them all apart.</p>
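<p>The scan described above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not bup's implementation: <code>shared_prefix_bits</code> and <code>margin</code> are hypothetical names, and the ids are synthetic SHA-1 digests rather than entries read from a real repository index.</p>
<pre><code>import hashlib

def shared_prefix_bits(a, b):
    """Count the leading bits that two equal-length byte strings share."""
    bits = 0
    for x, y in zip(a, b):
        diff = x ^ y
        if diff == 0:
            bits += 8
            continue
        # Add the leading zero bits of the first differing byte, then stop.
        bits += 8 - diff.bit_length()
        break
    return bits

def margin(object_ids):
    """Largest shared prefix, in bits, between any two ids.

    After sorting, the pair sharing the longest prefix is always adjacent,
    so a single pass over neighbours is enough.
    """
    ordered = sorted(object_ids)
    return max(
        (shared_prefix_bits(p, q) for p, q in zip(ordered, ordered[1:])),
        default=0,
    )

# Synthetic stand-in for a repository: 10000 SHA-1 digests.
ids = [hashlib.sha1(str(i).encode()).digest() for i in range(10000)]
print(margin(ids))</code></pre>
<p>Sorting first keeps the work at one comparison per neighbouring pair instead of comparing every id against every other one.</p>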
<p>For example, one system that was tested had a collection of 11 million objects (70 GB), and <code>bup margin</code> returned 45. That means a 46-bit hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by its first 46 bits.</p>
<p>The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits, the repository in the example above (45 bits in use) still has 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible to use many more bits with far fewer objects.</p>
<p>If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running <code>bup margin</code> occasionally to see if you're getting dangerously close to 160 bits.</p>
<h1 id="options">OPTIONS</h1>
<dl>
<dt>--predict</dt>
<dd>Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer from the guess. This is potentially useful for tuning an interpolation search algorithm.
</dd>
<dt>--ignore-midx</dt>
<dd>Don't use <code>.midx</code> files; use only <code>.idx</code> files. This is only really useful in combination with <code>--predict</code>.
</dd>
</dl>
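<p>As a rough illustration of the kind of measurement <code>--predict</code> describes, the sketch below guesses each id's position in a sorted list by linear interpolation on its leading bytes and reports the worst miss. It is not bup's code: <code>max_prediction_error</code> is a hypothetical name, the ids are synthetic, the use of the first 4 bytes as the interpolation key is an assumption, and the output format merely imitates the sample output in EXAMPLES.</p>
<pre><code>import hashlib

def max_prediction_error(sorted_ids):
    """Largest gap between an id's real position and its interpolated guess."""
    n = len(sorted_ids)
    worst = 0
    for actual, oid in enumerate(sorted_ids):
        # Assumption: treat the first 4 bytes as a fraction of the hash space.
        fraction = int.from_bytes(oid[:4], "big") / 2**32
        guess = int(fraction * n)
        worst = max(worst, abs(actual - guess))
    return worst

# Synthetic stand-in for one index file: 100000 sorted SHA-1 digests.
ids = sorted(hashlib.sha1(str(i).encode()).digest() for i in range(100000))
worst = max_prediction_error(ids)
print("%d of %d (%.3f%%)" % (worst, len(ids), 100.0 * worst / len(ids)))</code></pre>
<p>A small maximum deviation suggests that an interpolation search would only need to inspect a narrow window around its first guess.</p>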
<h1 id="examples">EXAMPLES</h1>
<pre><code>$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible
Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.
$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%) </code></pre>
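<p>The figures in the first example appear to follow from simple arithmetic on the object count and the matching prefix length. The sketch below is a reconstruction from the sample output, not taken from bup's source; in particular, the population value of 6.7e9 is an assumption inferred from the printed numbers, and the variable names are illustrative.</p>
<pre><code>import math

objects = 1612581    # object count from the "Reading indexes" line
matching = 40        # longest shared prefix reported by bup margin
sha1_bits = 160
population = 6.7e9   # assumption: inferred from the sample output

# Bits consumed per doubling of the object count so far.
bits_per_doubling = matching / math.log2(objects)
remaining_bits = sha1_bits - matching
# How many more doublings fit into the unused bits at that rate.
doublings = remaining_bits / bits_per_doubling
growth = 2 ** doublings

print("%.2f bits per doubling" % bits_per_doubling)
print("%d bits (%.2f doublings) remaining" % (remaining_bits, doublings))
print("%g times larger is possible" % growth)
print("%d data sets per person" % (growth / population))</code></pre>
<p>The extrapolation assumes the observed rate of about 2 bits per doubling continues to hold, which is what one would expect for uniformly distributed hashes.</p>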
<h1 id="see-also">SEE ALSO</h1>
<p><code>bup-midx</code>(1), <code>bup-save</code>(1)</p>
<h1 id="bup">BUP</h1>
<p>Part of the <code>bup</code>(1) suite.</p>
</body>
</html>