/usr/share/doc/monotone/html/Hash-Integrity.html

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!-- Created by GNU Texinfo 6.1, http://www.gnu.org/software/texinfo/ -->
<head>
<title>monotone documentation: Hash Integrity</title>

<meta name="description" content="monotone documentation: Hash Integrity">
<meta name="keywords" content="monotone documentation: Hash Integrity">
<meta name="resource-type" content="document">
<meta name="distribution" content="global">
<meta name="Generator" content="makeinfo">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="index.html#Top" rel="start" title="Top">
<link href="General-Index.html#General-Index" rel="index" title="General Index">
<link href="index.html#SEC_Contents" rel="contents" title="Table of Contents">
<link href="Special-Topics.html#Special-Topics" rel="up" title="Special Topics">
<link href="Rebuilding-ancestry.html#Rebuilding-ancestry" rel="next" title="Rebuilding ancestry">
<link href="Internationalization.html#Internationalization" rel="prev" title="Internationalization">
<style type="text/css">
<!--
a.summary-letter {text-decoration: none}
blockquote.indentedblock {margin-right: 0em}
blockquote.smallindentedblock {margin-right: 0em; font-size: smaller}
blockquote.smallquotation {font-size: smaller}
div.display {margin-left: 3.2em}
div.example {margin-left: 3.2em}
div.lisp {margin-left: 3.2em}
div.smalldisplay {margin-left: 3.2em}
div.smallexample {margin-left: 3.2em}
div.smalllisp {margin-left: 3.2em}
kbd {font-style: oblique}
pre.display {font-family: inherit}
pre.format {font-family: inherit}
pre.menu-comment {font-family: serif}
pre.menu-preformatted {font-family: serif}
pre.smalldisplay {font-family: inherit; font-size: smaller}
pre.smallexample {font-size: smaller}
pre.smallformat {font-family: inherit; font-size: smaller}
pre.smalllisp {font-size: smaller}
span.nolinebreak {white-space: nowrap}
span.roman {font-family: initial; font-weight: normal}
span.sansserif {font-family: sans-serif; font-weight: normal}
ul.no-bullet {list-style: none}
-->
</style>
<link rel="stylesheet" type="text/css" href="texinfo.css">


</head>

<body lang="en">
<a name="Hash-Integrity"></a>
<div class="header">
<p>
Next: <a href="Rebuilding-ancestry.html#Rebuilding-ancestry" accesskey="n" rel="next">Rebuilding ancestry</a>, Previous: <a href="Internationalization.html#Internationalization" accesskey="p" rel="prev">Internationalization</a>, Up: <a href="Special-Topics.html#Special-Topics" accesskey="u" rel="up">Special Topics</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="General-Index.html#General-Index" title="Index" rel="index">Index</a>]</p>
</div>
<hr>
<a name="Hash-Integrity-1"></a>
<h3 class="section">7.2 Hash Integrity</h3>

<p>Some proponents of a competing, proprietary version control system
have suggested, in a
<a href="http://www.usenix.org/events/hotos03/tech/full_papers/henson/henson_html/">usenix paper</a>, that the use of a cryptographic hash function such as
<small>SHA1</small> as an identifier for a version is unacceptably unsafe. This
section addresses the argument presented in that paper and describes
monotone&rsquo;s additional precautions.
</p>
<p>To summarize our position:
</p><ul>
<li> the analysis in the paper is wrong,
</li><li> even if it were right, monotone is sufficiently safe.
</li></ul>

<a name="The-analysis-is-wrong"></a>
<h3 class="heading">The analysis is wrong</h3>

<p>The paper displays a fundamental lack of understanding about what a
<em>cryptographic</em> hash function is, and how it differs from a
normal hash function. Furthermore it confuses accidental collision
with attack scenarios, and mixes up its analysis of the risk involved
in each. We will try to untangle these issues here.
</p>
<p>A cryptographic hash function such as <small>SHA1</small> is more than just a
uniform spread of inputs to an output range. Rather, it must be
designed to withstand attempts at:
</p>
<ul>
<li> reversal: deriving an input value from the output
</li><li> collision: finding two different inputs which hash to the same output
</li></ul>

<p>Collision is the problem the paper is concerned with. Formally, an
n-bit cryptographic hash should cost <em>2^n</em> work units to collide
against a given value, and <em>sqrt(2^n)</em> tries to find a random
pair of colliding values. This latter probability is sometimes called
the hash&rsquo;s &ldquo;birthday paradox probability&rdquo;.
</p>
<a name="Accidental-collision"></a>
<h4 class="subheading">Accidental collision</h4>

<p>One way of measuring these bounds is by measuring how single-bit
changes in the input affect bits in the hash output. The <small>SHA1</small>
hash has a strong <em>avalanche property</em>, which means that flipping
<em>any one bit</em> in the input will cause on average half the 160
bits in the output code to change. The fanciful <small>VAL1</small> hash
presented in the paper does not have such a property &mdash; flipping its
first bit when all the rest are zero causes <em>no change</em> to any of
the 160 output bits &mdash; and is completely unsuited for use as a
<em>cryptographic hash</em>, regardless of the general shape of its
probability distribution.
</p>
<p>The paper also suggests that birthday paradox probability cannot be
used to measure the chance of accidental <small>SHA1</small> collision on &ldquo;real
inputs&rdquo;, because birthday paradox probability assumes a uniformly
random sample and &ldquo;real inputs&rdquo; are not uniformly random. The paper
is wrong: the inputs to <small>SHA1</small> are not what is being measured (and
in any case can be arbitrarily long); the collision probability being
measured is of <em>output space</em>. On output space, the hash is
designed to produce uniformly random spread, even given nearly
identical inputs. In other words, it is <em>a primary design
criterion</em> of such a hash that a birthday paradox probability is a
valid approximation of its collision probability.
</p>
<p>The paper&rsquo;s characterization of risk when hashing &ldquo;non-random
inputs&rdquo; is similarly deceptive. It presents a fanciful case of a
program which is <em>storing</em> every possible 2kb block in a
file system addressed by <small>SHA1</small> (the program is trying to find a
<small>SHA1</small> collision). While this scenario <em>will</em> very likely
encounter a collision <em>somewhere</em> in the course of storing all
such blocks, the paper neglects to mention that we only expect it to
collide after storing about <em>2^{80}</em> of the <em>2^{16384}</em> possible
such blocks (not to mention the requirements for compute time to
search, or disk space to store <em>2^{80}</em> 2kb blocks).
</p>
<p>Noting that monotone can only store <em>2^{41}</em> bytes in a database,
and thus probably some lower number (say <em>2^{32}</em> or so) active
rows, we consider such birthday paradox probability well out of
practical sight. Perhaps it will be a serious concern when
multi-yottabyte hard disks are common.
</p>

<a name="Collision-attacks"></a>
<h4 class="subheading">Collision attacks</h4>

<p>Setting aside accidental collisions, then, the paper&rsquo;s underlying
theme of vulnerability rests on the assertion that someone will break
<small>SHA1</small>. Breaking a cryptographic hash usually means finding a way
to collide it trivially. While we note that <small>SHA1</small> has in fact
resisted attempts at breaking for 8 years already, we cannot say that
it will last forever. Someone might break it. We can say, however,
that finding a way to trivially collide it only changes the resistance
to <em>active attack</em>, rather than the behavior of the hash on
benign inputs.
</p>
<p>Therefore the vulnerability is not that the hash might suddenly cease
to address benign blocks well, but merely that additional security
precautions might become a requirement to ensure that blocks are
benign, rather than malicious. The paper fails to make this
distinction, suggesting that a hash becomes &ldquo;unusable&rdquo; when it is
broken. This is plainly not true, as a number of systems continue to
get useful low collision hashing behavior &mdash; just not good security
behavior &mdash; out of &ldquo;broken&rdquo; cryptographic hashes such as MD4.
</p>
<a name="Monotone-is-probably-safe-anyways"></a>
<h3 class="heading">Monotone is probably safe anyways</h3>

<p>Perhaps our arguments above are unconvincing, or perhaps you are the
sort of person who thinks that practice never lines up with
theory. Fair enough. Below we present <em>practical</em> procedures you
can follow to compensate for the supposed threats presented in the
paper.
</p>
<a name="Collision-attacks-1"></a>
<h4 class="subheading">Collision attacks</h4>

<p>A successful collision attack on <small>SHA1</small>, as mentioned, does not
disrupt the <em>probability</em> features of <small>SHA1</small> on benign
blocks. So if, at any time, you believe <small>SHA1</small> is &ldquo;broken&rdquo;, it
does <em>not</em> mean that you cannot use it for your work with
monotone. It means, rather, that you cannot base your <em>trust</em> on
<small>SHA1</small> values anymore. You must trust who you communicate with.
</p>
<p>The way around this is reasonably simple: if you do not trust
<small>SHA1</small> to prevent malicious blocks from slipping into your
communications, you can always augment it by enclosing your
communications in more security, such as tunnels or additional
signatures on your email posts. If you choose to do this, you will
still have the benefit of self-identifying blocks, you will simply
cease to trust such blocks unless they come with additional
authentication information.
</p>
<p>If in the future <small>SHA1</small> (or, indeed, <small>RSA</small>) becomes accepted as
broken we will naturally upgrade monotone to a newer hash or public
key scheme, and provide migration commands to recalculate existing
databases based on the new algorithm.
</p>
<p>Similarly, if you do not trust our vigilance in keeping up to date
with cryptography literature, you can modify monotone to use any
stronger hash you like, at the cost of isolating your own
communications to a group using the modified version. Monotone is free
software, and runs atop <code>botan</code>, so it is both legal and
relatively simple to change it to use some other algorithm.
</p>
<hr>
<div class="header">
<p>
Next: <a href="Rebuilding-ancestry.html#Rebuilding-ancestry" accesskey="n" rel="next">Rebuilding ancestry</a>, Previous: <a href="Internationalization.html#Internationalization" accesskey="p" rel="prev">Internationalization</a>, Up: <a href="Special-Topics.html#Special-Topics" accesskey="u" rel="up">Special Topics</a> &nbsp; [<a href="index.html#SEC_Contents" title="Table of contents" rel="contents">Contents</a>][<a href="General-Index.html#General-Index" title="Index" rel="index">Index</a>]</p>
</div>



</body>
</html>
monotone-doc 1.1-9 / usr / share / doc / monotone / html / Hash-Integrity.html