String Comparison — jellyfish 0.5.6 documentation
<div class="section" id="string-comparison">
<h1>String Comparison<a class="headerlink" href="#string-comparison" title="Permalink to this headline">¶</a></h1>
<p>These functions all measure how different or how similar two strings are. Several of them are forms of <cite>edit distance</cite>.</p>
<div class="section" id="levenshtein-distance">
<h2>Levenshtein Distance<a class="headerlink" href="#levenshtein-distance" title="Permalink to this headline">¶</a></h2>
<dl class="function">
<dt id="levenshtein_distance">
<code class="descname">levenshtein_distance</code><span class="sig-paren">(</span><em>s1</em>, <em>s2</em><span class="sig-paren">)</span><a class="headerlink" href="#levenshtein_distance" title="Permalink to this definition">¶</a></dt>
<dd><p>Compute the Levenshtein distance between s1 and s2.</p>
<p>Levenshtein distance is the minimum number of insertions, deletions, and substitutions required to change one word into another.</p>
<p>For example: <code class="docutils literal"><span class="pre">levenshtein_distance('berne',</span> <span class="pre">'born')</span> <span class="pre">==</span> <span class="pre">2</span></code>, representing the substitution of the first e with o and the deletion of the second e.</p>
<p>See the <a class="reference external" href="http://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein distance article at Wikipedia</a> for more details.</p>
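<p>The counting of insertions, deletions, and substitutions can be sketched with the usual dynamic-programming recurrence. This is a minimal pure-Python illustration, not jellyfish's own implementation; the name <code class="docutils literal"><span class="pre">levenshtein_sketch</span></code> is hypothetical.</p>

```python
def levenshtein_sketch(s1, s2):
    """Illustrative Levenshtein distance (hypothetical helper, not jellyfish's code)."""
    # prev[j] is the distance between the already processed prefix of s1 and s2[:j].
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        cur = [i]  # turning s1[:i] into the empty string takes i deletions
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            cur.append(min(
                prev[j] + 1,         # delete c1
                cur[j - 1] + 1,      # insert c2
                prev[j - 1] + cost,  # substitute c1 with c2, or keep a match
            ))
        prev = cur
    return prev[-1]


print(levenshtein_sketch('berne', 'born'))
```

<p>Only two rows of the table are kept at a time, so memory use grows with the length of <code class="docutils literal"><span class="pre">s2</span></code> rather than with the product of both lengths.</p>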
<div class="section" id="damerau-levenshtein-distance">
<h2>Damerau-Levenshtein Distance<a class="headerlink" href="#damerau-levenshtein-distance" title="Permalink to this headline">¶</a></h2>
<dl class="function">
<dt id="damerau_levenshtein_distance">
<code class="descname">damerau_levenshtein_distance</code><span class="sig-paren">(</span><em>s1</em>, <em>s2</em><span class="sig-paren">)</span><a class="headerlink" href="#damerau_levenshtein_distance" title="Permalink to this definition">¶</a></dt>
<dd><p>Compute the Damerau-Levenshtein distance between s1 and s2.</p>
<p>A modification of Levenshtein distance, Damerau-Levenshtein distance counts the transposition of two adjacent characters (such as ifsh for fish) as a single edit.</p>
<p>For example, <code class="docutils literal"><span class="pre">levenshtein_distance('fish',</span> <span class="pre">'ifsh')</span> <span class="pre">==</span> <span class="pre">2</span></code> because it requires a deletion and an insertion,
whereas <code class="docutils literal"><span class="pre">damerau_levenshtein_distance('fish',</span> <span class="pre">'ifsh')</span> <span class="pre">==</span> <span class="pre">1</span></code> because the swap counts as a single transposition.</p>
<p>See the <a class="reference external" href="http://en.wikipedia.org/wiki/Damerau-Levenshtein_distance">Damerau-Levenshtein distance article at Wikipedia</a> for more details.</p>
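<p>The effect of counting a transposition as one edit can be sketched as follows. Note that this is the simpler "optimal string alignment" variant, swapped in here for brevity; it is an assumption of this sketch, not a description of how jellyfish computes the distance, and the two can disagree on some inputs (for example <code class="docutils literal"><span class="pre">'ca'</span></code> versus <code class="docutils literal"><span class="pre">'abc'</span></code>). The name <code class="docutils literal"><span class="pre">osa_distance_sketch</span></code> is hypothetical.</p>

```python
def osa_distance_sketch(s1, s2):
    """Optimal string alignment distance (a restricted Damerau-Levenshtein variant).

    Hypothetical helper for illustration; not jellyfish's implementation.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # i deletions
    for j in range(cols):
        d[0][j] = j  # j insertions
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped: count the swap as a single edit.
            if (i > 1 and j > 1
                    and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance_sketch('fish', 'ifsh'))
```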
<div class="section" id="hamming-distance">
<h2>Hamming Distance<a class="headerlink" href="#hamming-distance" title="Permalink to this headline">¶</a></h2>
<dl class="function">
<dt id="hamming_distance">
<code class="descname">hamming_distance</code><span class="sig-paren">(</span><em>s1</em>, <em>s2</em><span class="sig-paren">)</span><a class="headerlink" href="#hamming_distance" title="Permalink to this definition">¶</a></dt>
<dd><p>Compute the Hamming distance between s1 and s2.</p>
<p>Hamming distance is the number of positions at which the characters of two strings differ.</p>
<p>Typically Hamming distance is undefined when strings are of different length, but this implementation
counts each extra character as a difference. For example, <code class="docutils literal"><span class="pre">hamming_distance('abc',</span> <span class="pre">'abcd')</span> <span class="pre">==</span> <span class="pre">1</span></code>.</p>
<p>See the <a class="reference external" href="http://en.wikipedia.org/wiki/Hamming_distance">Hamming distance article at Wikipedia</a> for more details.</p>
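<p>The behaviour described above, including the treatment of extra characters, can be sketched in a few lines. This is an illustration under that description only, not jellyfish's own code; the name <code class="docutils literal"><span class="pre">hamming_sketch</span></code> is hypothetical.</p>

```python
def hamming_sketch(s1, s2):
    """Illustrative Hamming distance that tolerates unequal lengths (hypothetical helper)."""
    # Count differing characters over the positions both strings share.
    differing = sum(1 for a, b in zip(s1, s2) if a != b)
    # Every character beyond the shorter string counts as one more difference.
    return differing + abs(len(s1) - len(s2))


print(hamming_sketch('abc', 'abcd'))
```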
<div class="section" id="jaro-distance">
<h2>Jaro Distance<a class="headerlink" href="#jaro-distance" title="Permalink to this headline">¶</a></h2>
<dl class="function">
<dt id="jaro_distance">
<code class="descname">jaro_distance</code><span class="sig-paren">(</span><em>s1</em>, <em>s2</em><span class="sig-paren">)</span><a class="headerlink" href="#jaro_distance" title="Permalink to this definition">¶</a></dt>
<dd><p>Compute the Jaro distance between s1 and s2.</p>
<p>Jaro distance is a string-edit distance that returns a floating point value in [0,1], where 0 represents two completely dissimilar strings and 1 represents identical strings. Unlike the distances above, a higher value therefore means the strings are more alike.</p>
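<p>How a score in [0,1] arises can be sketched with the commonly cited Jaro formula, which combines the number of matching characters with the number of matched characters that appear out of order. This is a minimal illustration, not jellyfish's implementation; the name <code class="docutils literal"><span class="pre">jaro_sketch</span></code> and the handling of empty strings are assumptions of the sketch.</p>

```python
def jaro_sketch(s1, s2):
    """Illustrative Jaro score in [0, 1] (hypothetical helper, not jellyfish's code)."""
    if not s1 or not s2:
        # Edge case chosen for this sketch: two empty strings count as identical.
        return 1.0 if s1 == s2 else 0.0

    # Characters match only if they are equal and not too far apart.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that line up in a different order are half-transpositions.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len(s1)
            + matches / len(s2)
            + (matches - transpositions) / matches) / 3


print(round(jaro_sketch('martha', 'marhta'), 4))
```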
<div class="section" id="jaro-winkler-distance">
<h2>Jaro-Winkler Distance<a class="headerlink" href="#jaro-winkler-distance" title="Permalink to this headline">¶</a></h2>
<dl class="function">
<dt id="jaro_winkler">
<code class="descname">jaro_winkler</code><span class="sig-paren">(</span><em>s1</em>, <em>s2</em><span class="sig-paren">)</span><a class="headerlink" href="#jaro_winkler" title="Permalink to this definition">¶</a></dt>
<dd><p>Compute the Jaro-Winkler distance between s1 and s2.</p>
<p>Jaro-Winkler is a modification of Jaro distance. Like Jaro, it returns a floating point value in [0,1], where 0 represents two completely dissimilar strings and 1 represents identical strings.</p>
<p>See the <a class="reference external" href="http://en.wikipedia.org/wiki/Jaro-Winkler_distance">Jaro-Winkler distance article at Wikipedia</a> for more details.</p>
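<p>The modification is usually described as a boost for strings that share a common prefix. The sketch below applies that boost to an already computed Jaro score. The prefix scale of 0.1 and the maximum prefix length of 4 are the commonly cited defaults and are assumptions here, as is the name <code class="docutils literal"><span class="pre">winkler_boost_sketch</span></code>; some implementations also apply the boost only above a score threshold, which this sketch omits.</p>

```python
def winkler_boost_sketch(jaro_score, s1, s2, prefix_scale=0.1, max_prefix=4):
    """Apply the Winkler prefix boost to a Jaro score (hypothetical helper).

    prefix_scale and max_prefix are assumed defaults, not values taken from jellyfish.
    """
    # Length of the common prefix, capped at max_prefix characters.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    # Move the score toward 1 in proportion to the shared prefix length.
    return jaro_score + prefix * prefix_scale * (1.0 - jaro_score)


# 17/18 is the plain Jaro score for this pair under the standard formula.
print(round(winkler_boost_sketch(17 / 18, 'martha', 'marhta'), 4))
```

<p>Because the boost is proportional to <code class="docutils literal"><span class="pre">1</span> <span class="pre">-</span> <span class="pre">jaro_score</span></code>, the result stays within [0,1] as long as the prefix length times the scale does not exceed 1.</p>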
<div class="section" id="match-rating-approach-comparison">
<h2>Match Rating Approach (comparison)<a class="headerlink" href="#match-rating-approach-comparison" title="Permalink to this headline">¶</a></h2>
<dl class="function">
<dt id="match_rating_comparison">
<code class="descname">match_rating_comparison</code><span class="sig-paren">(</span><em>s1</em>, <em>s2</em><span class="sig-paren">)</span><a class="headerlink" href="#match_rating_comparison" title="Permalink to this definition">¶</a></dt>
<dd><p>Compare s1 and s2 using the match rating approach algorithm. Returns <code class="docutils literal"><span class="pre">True</span></code> if the strings are considered equivalent or <code class="docutils literal"><span class="pre">False</span></code> if not. Can also return <code class="docutils literal"><span class="pre">None</span></code> if s1 and s2 are not comparable (length differs by more than 3).</p>
<p>The match rating approach (MRA) is an algorithm for determining whether or not two names are
pronounced similarly. Strings are first encoded using <a class="reference internal" href="phonetic.html#match_rating_codex" title="match_rating_codex"><code class="xref py py-func docutils literal"><span class="pre">match_rating_codex()</span></code></a>, then compared according to the MRA algorithm.</p>
<p>See the <a class="reference external" href="http://en.wikipedia.org/wiki/Match_rating_approach">Match Rating Approach article at Wikipedia</a> for more details.</p>
