This file is indexed.

/usr/share/doc/diveintopython-zh/html/performance_tuning/list_operations.html is in diveintopython-zh 5.4b-1.

This file is owned by root:root, with mode 0o644.

The actual contents of the file can be viewed below.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
<!DOCTYPE html
  PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
   <head>
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   
      <title>18.5.&nbsp;优化列表操作</title>
      <link rel="stylesheet" href="../diveintopython.css" type="text/css">
      <link rev="made" href="mailto:f8dy@diveintopython.org">
      <meta name="generator" content="DocBook XSL Stylesheets V1.52.2">
      <meta name="keywords" content="Python, Dive Into Python, tutorial, object-oriented, programming, documentation, book, free">
      <meta name="description" content="Python from novice to pro">
      <link rel="home" href="../toc/index.html" title="Dive Into Python">
      <link rel="up" href="index.html" title="第&nbsp;18&nbsp;章&nbsp;性能优化">
      <link rel="previous" href="dictionary_lookups.html" title="18.4.&nbsp;优化字典查找">
      <link rel="next" href="string_manipulation.html" title="18.6.&nbsp;优化字符串操作">
   </head>
   <body>
      <table id="Header" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td id="breadcrumb" colspan="5" align="left" valign="top">导航:<a href="../index.html">起始页</a>&nbsp;&gt;&nbsp;<a href="../toc/index.html">Dive Into Python</a>&nbsp;&gt;&nbsp;<a href="index.html">性能优化</a>&nbsp;&gt;&nbsp;<span class="thispage">优化列表操作</span></td>
            <td id="navigation" align="right" valign="top">&nbsp;&nbsp;&nbsp;<a href="dictionary_lookups.html" title="上一页: “优化字典查找”">&lt;&lt;</a>&nbsp;&nbsp;&nbsp;<a href="string_manipulation.html" title="下一页: “优化字符串操作”">&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3" id="logocontainer">
               <h1 id="logo"><a href="../index.html" accesskey="1">深入 Python :Dive Into Python 中文版</a></h1>
               <p id="tagline">Python 从新手到专家 [Dip_5.4b_CPyUG_Release]</p>
            </td>
            <td colspan="3" align="right">
               <form id="search" method="GET" action="http://www.google.com/custom">
                  <p><label for="q" accesskey="4">Find:&nbsp;</label><input type="text" id="q" name="q" size="20" maxlength="255" value=""> <input type="submit" value="搜索"><input type="hidden" name="domains" value="woodpecker.org.cn"><input type="hidden" name="sitesearch" value="www.woodpecker.org.cn/diveintopython"></p>
               </form>
            </td>
         </tr>
      </table>
      <!--#include virtual="/inc/ads" -->
      <div class="section" lang="zh_cn">
         <div class="titlepage">
            <div>
               <div>
                  <h2 class="title"><a name="soundex.stage3"></a>18.5.&nbsp;优化列表操作
                  </h2>
               </div>
            </div>
            <div></div>
         </div>
         <div class="abstract">
            <p>Soundex 算法的第三步是去除连续重复字符。怎样做是最佳方法?</p>
         </div>
         <p>这里是我们目前在 <tt class="filename">soundex/stage2/soundex2c.py</tt> 中的代码:
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits2 = digits[0]
    <span class='pykeyword'>for</span> d <span class='pykeyword'>in</span> digits[1:]:
        <span class='pykeyword'>if</span> digits2[-1] != d:
            digits2 += d
</pre></div>
         <p>这里是 <tt class="filename">soundex2c.py</tt> 的性能表现:
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage2&gt;</tt><span class="userinput">python soundex2c.py</span>
<span class="computeroutput">Woo             W000 11.437645008
Pilgrim         P426 13.2825062962
Flingjingwaller F452 18.5570110168</span>
</pre></div>
         <p>第一件事是考虑,考察在循环的每一轮都检查 <tt class="varname">digits[-1]</tt> 是否有效率。列表索引代价大吗?如果把上一个数字存在另外的变量中以便检查是否会获益?
         </p>
         <p>这里的 <tt class="filename">soundex/stage3/soundex3a.py</tt> 将回答这个问题:
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits2 = <span class='pystring'>''</span>
    last_digit = <span class='pystring'>''</span>
    <span class='pykeyword'>for</span> d <span class='pykeyword'>in</span> digits:
        <span class='pykeyword'>if</span> d != last_digit:
            digits2 += d
            last_digit = d
</pre></div>
         <p><tt class="filename">soundex3a.py</tt> 并不比 <tt class="filename">soundex2c.py</tt> 运行得快多少,而且甚至可能更会慢些 (差异还没有大到可以确信这一点):
         </p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage3&gt;</tt><span class="userinput">python soundex3a.py</span>
<span class="computeroutput">Woo             W000 11.5346048171
Pilgrim         P426 13.3950636184
Flingjingwaller F452 18.6108927252</span>
</pre></div>
         <p>为什么 <tt class="filename">soundex3a.py</tt> 不更快呢?其实 <span class="application">Python</span> 的索引功能恰恰很有效。重复使用 <tt class="varname">digits2[-1]</tt> 根本没什么问题。另一方面,手工保留上一个数字意味着我们每存储一个数字都要为<span class="emphasis"><em>两个</em></span> 变量赋值,这便抹杀了我们避开索引查找所带来的微小好处。
         </p>
         <p>让我们从本质上不同的方法来思考。如果可以把字符串当作字符列表来对待,那么使用列表遍历遍寻列表便成为可能。问题是代码需要使用列表中的上一个字符,而且使用列表遍历做到这一点并不容易。</p>
         <p>但是,使用内建的 <tt class="function">range()</tt> 函数创建一个索引数字构成的列表是可以的。使用这些索引数字一步步搜索列表并拿出与前面不同的字符。这样将使你得到一个字符串列表,使用字符串方法 <tt class="methodname">join()</tt> 便可重建字符串。
         </p>
         <p>这便是 <tt class="filename">soundex/stage3/soundex3b.py</tt></p>
         <div class="informalexample"><pre class="programlisting">
    digits2 = <span class='pystring'>""</span>.join([digits[i] <span class='pykeyword'>for</span> i <span class='pykeyword'>in</span> range(len(digits))
                       <span class='pykeyword'>if</span> i == 0 <span class='pykeyword'>or</span> digits[i-1] != digits[i]])
</pre></div>
         <p>这样快了吗?一个字,否。</p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage3&gt;</tt><span class="userinput">python soundex3b.py</span>
<span class="computeroutput">Woo             W000 14.2245271396
Pilgrim         P426 17.8337165757
Flingjingwaller F452 25.9954005327</span>
</pre></div>
         <p>有可能因为目前的这些方法都是 “<span class="quote">字符串中心化</span>” 的。<span class="application">Python</span> 可以通过一个命令把一个字符串转化为一个字符列表:<tt class="function">list('abc')</tt> 返回 <tt class="literal">['a', 'b', 'c']</tt>。更进一步,列表可以被很快地<span class="emphasis"><em>就地</em></span> 改变。与其一砖一瓦地建造一个新的列表 (或者字符串),为什么不选择操作列表的元素呢?
         </p>
         <p>这便是 <tt class="filename">soundex/stage3/soundex3c.py</tt>,就地修改列表去除连续重复元素:
         </p>
         <div class="informalexample"><pre class="programlisting">
    digits = list(source[0].upper() + source[1:].translate(charToSoundex))
    i=0
    <span class='pykeyword'>for</span> item <span class='pykeyword'>in</span> digits:
        <span class='pykeyword'>if</span> item==digits[i]: <span class='pykeyword'>continue</span>
        i+=1
        digits[i]=item
    <span class='pykeyword'>del</span> digits[i+1:]
    digits2 = <span class='pystring'>""</span>.join(digits)
</pre></div>
         <p>这比 <tt class="filename">soundex3a.py</tt><tt class="filename">soundex3b.py</tt> 快吗?不,实际上这是目前最慢的一种方法<sup>[<a name="d0e39809" href="#ftn.d0e39809">16</a>]</sup></p>
         <div class="informalexample"><pre class="screen">
<tt class="prompt">C:\samples\soundex\stage3&gt;</tt><span class="userinput">python soundex3c.py</span>
<span class="computeroutput">Woo             W000 14.1662554878
Pilgrim         P426 16.0397885765
Flingjingwaller F452 22.1789341942</span>
</pre></div>
         <p>我们在这儿除了试用了几种 “<span class="quote">聪明</span>” 的技术,根本没有什么进步。到目前为止最快的方法就是最直接的原始方法 (<tt class="filename">soundex2c.py</tt>)。有时候聪明未必有回报。
         </p>
         <div class="example"><a name="d0e39837"></a><h3 class="title">&nbsp;18.5.&nbsp;目前的最佳结果:<tt class="filename">soundex/stage2/soundex2c.py</tt></h3><pre class="programlisting"><span class='pykeyword'>
import</span> string, re

allChar = string.uppercase + string.lowercase
charToSoundex = string.maketrans(allChar, <span class='pystring'>"91239129922455912623919292"</span> * 2)

<span class='pykeyword'>def</span><span class='pyclass'> soundex</span>(source):
    <span class='pykeyword'>if</span> (<span class='pykeyword'>not</span> source) <span class='pykeyword'>or</span> (<span class='pykeyword'>not</span> source.isalpha()):
        <span class='pykeyword'>return</span> <span class='pystring'>"0000"</span>
    digits = source[0].upper() + source[1:].translate(charToSoundex)
    digits2 = digits[0]
    <span class='pykeyword'>for</span> d <span class='pykeyword'>in</span> digits[1:]:
        <span class='pykeyword'>if</span> digits2[-1] != d:
            digits2 += d
    digits3 = re.sub(<span class='pystring'>'9'</span>, <span class='pystring'>''</span>, digits2)
    <span class='pykeyword'>while</span> len(digits3) &lt; 4:
        digits3 += <span class='pystring'>"0"</span>
    <span class='pykeyword'>return</span> digits3[:4]

<span class='pykeyword'>if</span> __name__ == <span class='pystring'>'__main__'</span>:
    <span class='pykeyword'>from</span> timeit <span class='pykeyword'>import</span> Timer
    names = (<span class='pystring'>'Woo'</span>, <span class='pystring'>'Pilgrim'</span>, <span class='pystring'>'Flingjingwaller'</span>)
    <span class='pykeyword'>for</span> name <span class='pykeyword'>in</span> names:
        statement = <span class='pystring'>"soundex('%s')"</span> % name
        t = Timer(statement, <span class='pystring'>"from __main__ import soundex"</span>)
        <span class='pykeyword'>print</span> name.ljust(15), soundex(name), min(t.repeat())
</pre></div>
         <div class="footnotes">
            <h3 class="footnotetitle">Footnotes</h3>
            <div class="footnote">
               <p><sup>[<a name="ftn.d0e39809" href="#d0e39809">16</a>] </sup><tt class="filename">soundex3c.py</tt><tt class="filename">soundex3b.py</tt> 快。――译注
               </p>
            </div>
         </div>
      </div>
      <table class="Footer" width="100%" border="0" cellpadding="0" cellspacing="0" summary="">
         <tr>
            <td width="35%" align="left"><br><a class="NavigationArrow" href="dictionary_lookups.html">&lt;&lt;&nbsp;优化字典查找</a></td>
            <td width="30%" align="center"><br>&nbsp;<span class="divider">|</span>&nbsp;<a href="index.html#soundex.divein" title="18.1.&nbsp;概览">1</a> <span class="divider">|</span> <a href="timeit.html" title="18.2.&nbsp;使用 timeit 模块">2</a> <span class="divider">|</span> <a href="regular_expressions.html" title="18.3.&nbsp;优化正则表达式">3</a> <span class="divider">|</span> <a href="dictionary_lookups.html" title="18.4.&nbsp;优化字典查找">4</a> <span class="divider">|</span> <span class="thispage">5</span> <span class="divider">|</span> <a href="string_manipulation.html" title="18.6.&nbsp;优化字符串操作">6</a> <span class="divider">|</span> <a href="summary.html" title="18.7.&nbsp;小结">7</a>&nbsp;<span class="divider">|</span>&nbsp;
            </td>
            <td width="35%" align="right"><br><a class="NavigationArrow" href="string_manipulation.html">优化字符串操作&nbsp;&gt;&gt;</a></td>
         </tr>
         <tr>
            <td colspan="3"><br></td>
         </tr>
      </table>
      <div class="Footer">
         <p class="copyright">Copyright © 2000, 2001, 2002, 2003, 2004 <a href="mailto:mark@diveintopython.org">Mark Pilgrim</a></p>
         <p class="copyright">Copyright © 2001, 2002, 2003, 2004, 2005, 2006, 2007 <a href="mailto:python-cn@googlegroups.com">CPyUG (邮件列表)</a></p>
      </div>
   </body>
</html>