/usr/share/doc/bup/bup-margin.html is in bup-doc 0.29-3.

This file is owned by root:root, with mode 0o644.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Style-Type" content="text/css" />
  <meta name="generator" content="pandoc" />
  <meta name="author" content="Avery Pennarun apenwarr@gmail.com" />
  <meta name="date" content="2017-04-01" />
  <title>bup-margin(1) Bup debian/0.29-3</title>
  <style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<div id="header">
<h1 class="title">bup-margin(1) Bup debian/0.29-3</h1>
<h2 class="author">Avery Pennarun <a href="mailto:apenwarr@gmail.com">apenwarr@gmail.com</a></h2>
<h3 class="date">2017-04-01</h3>
</div>
<h1 id="name">NAME</h1>
<p>bup-margin - figure out your deduplication safety margin</p>
<h1 id="synopsis">SYNOPSIS</h1>
<p>bup margin [options...]</p>
<h1 id="description">DESCRIPTION</h1>
<p><code>bup margin</code> iterates through all objects in your bup repository, calculating the largest number of prefix bits shared between any two entries. This number, <code>n</code>, is the length of the longest SHA-1 prefix you could use and still encounter a collision between your object ids.</p>
<p>For example, one system that was tested had a collection of 11 million objects (70 GB), and <code>bup margin</code> returned 45. That means a 46-bit hash would be sufficient to avoid all collisions among that set of objects; each object in that repository could be uniquely identified by its first 46 bits.</p>
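<p>The measurement described above can be sketched in a few lines of Python. This is a minimal illustration, not bup's actual implementation: the function names <code>shared_prefix_bits</code> and <code>margin</code> are hypothetical, and the object ids are stand-ins produced by hashing small integers rather than read from a repository's index files. The sketch relies on the fact that, once the ids are sorted, the two ids sharing the longest prefix are always adjacent.</p>
<pre><code>import hashlib

SHA1_BITS = 160


def shared_prefix_bits(a, b, width=SHA1_BITS):
    """Count the leading bits that two width-bit integers have in common."""
    # XOR clears every bit on which a and b agree, so the highest set bit
    # of the result marks the first position where they differ.
    return width - (a ^ b).bit_length()


def margin(digests):
    """Return the largest number of prefix bits shared by any two distinct ids."""
    # set() drops exact duplicates so an id is never compared with itself.
    ids = sorted(set(int.from_bytes(d, "big") for d in digests))
    # In sorted order the best-matching pair is adjacent, so one pass suffices.
    return max((shared_prefix_bits(x, y) for x, y in zip(ids, ids[1:])), default=0)


# Stand-in object ids (assumption: any 20-byte SHA-1 digests will do here).
digests = [hashlib.sha1(str(i).encode()).digest() for i in range(10000)]
n = margin(digests)
print(n, "matching prefix bits")
print(n + 1, "bits would identify every object in this set uniquely")
</code></pre>
<p>Sorting first keeps the cost at one comparison per object instead of one per pair of objects, which matters once a repository holds millions of entries.</p>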
<p>The number of bits needed seems to increase by about 1 or 2 for every doubling of the number of objects. Since SHA-1 hashes have 160 bits, the repository above, at 45 bits, still has 115 bits of margin. Of course, because SHA-1 hashes are essentially random, it's theoretically possible for a repository with far fewer objects to use many more bits.</p>
<p>If you're paranoid about the possibility of SHA-1 collisions, you can monitor your repository by running <code>bup margin</code> occasionally to see if you're getting dangerously close to 160 bits.</p>
<h1 id="options">OPTIONS</h1>
<dl>
<dt>--predict</dt>
<dd>Guess the offset into each index file where a particular object will appear, and report the maximum deviation of the correct answer from the guess. This is potentially useful for tuning an interpolation search algorithm.
</dd>
<dt>--ignore-midx</dt>
<dd>Don't use <code>.midx</code> files; use only <code>.idx</code> files. This is only really useful in combination with <code>--predict</code>.
</dd>
</dl>
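<p>To make the idea behind <code>--predict</code> concrete, here is a hedged Python sketch of one way to guess an object's position in a sorted index and measure how far off the guess is. It is an assumption about the general technique (linear interpolation over the full 160-bit hash space), not a copy of bup's code; the function name <code>max_prediction_error</code> is hypothetical and the ids are generated for illustration only.</p>
<pre><code>import hashlib

HASH_SPACE = 2 ** 160  # number of possible SHA-1 values


def max_prediction_error(digests):
    """Return the largest gap between the guessed and actual sorted positions."""
    ids = sorted(int.from_bytes(d, "big") for d in digests)
    count = len(ids)
    worst = 0
    for actual, value in enumerate(ids):
        # If hashes are uniformly spread, an id sitting at fraction f of the
        # hash space should sit at about fraction f of the sorted index.
        guess = value * count // HASH_SPACE
        worst = max(worst, abs(actual - guess))
    return worst


# Stand-in object ids (assumption: not read from a real index file).
digests = [hashlib.sha1(str(i).encode()).digest() for i in range(10000)]
worst = max_prediction_error(digests)
print("%d of %d (%.3f%%)" % (worst, len(digests), worst * 100.0 / len(digests)))
</code></pre>
<p>A small maximum deviation means an interpolation search can start very close to the right slot and needs to examine only a narrow window around its first guess.</p>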
<h1 id="examples">EXAMPLES</h1>
<pre><code>$ bup margin
Reading indexes: 100.00% (1612581/1612581), done.
40
40 matching prefix bits
1.94 bits per doubling
120 bits (61.86 doublings) remaining
4.19338e+18 times larger is possible

Everyone on earth could have 625878182 data sets
like yours, all in one repository, and we would
expect 1 object collision.

$ bup margin --predict
PackIdxList: using 1 index.
Reading indexes: 100.00% (1612581/1612581), done.
915 of 1612581 (0.057%) </code></pre>
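<p>The derived figures in the first example follow from the matching-bit count and the object count by simple arithmetic. The Python sketch below shows one plausible way to arrive at them; the function name <code>margin_report</code> is hypothetical, and the world population of 6.7 billion is an assumption inferred from the ratio between the last two figures in the example, not something this page states. With the inputs from the example (40 bits, 1612581 objects) the results should come out close to the values printed above.</p>
<pre><code>import math

SHA1_BITS = 160
# Assumption: inferred from the example output, not stated in this page.
POPULATION_OF_EARTH = 6.7e9


def margin_report(matching_bits, object_count):
    """Derive growth headroom from the longest shared prefix and object count."""
    # How many times one object had to double to reach the current count.
    doublings = math.log2(object_count)
    bits_per_doubling = matching_bits / doublings
    remaining_bits = SHA1_BITS - matching_bits
    # How many more doublings fit before all 160 bits would be in use.
    remaining_doublings = remaining_bits / bits_per_doubling
    growth = 2 ** remaining_doublings
    per_person = growth / POPULATION_OF_EARTH
    return bits_per_doubling, remaining_bits, remaining_doublings, growth, per_person


bpd, remain, rdoublings, growth, per_person = margin_report(40, 1612581)
print("%.2f bits per doubling" % bpd)
print("%d bits (%.2f doublings) remaining" % (remain, rdoublings))
print("%g times larger is possible" % growth)
print("about %d data sets like this one per person" % int(per_person))
</code></pre>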
<h1 id="see-also">SEE ALSO</h1>
<p><code>bup-midx</code>(1), <code>bup-save</code>(1)</p>
<h1 id="bup">BUP</h1>
<p>Part of the <code>bup</code>(1) suite.</p>
</body>
</html>