<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>PyTidyLib: A Python Interface to HTML Tidy — pytidylib module</title>
<link rel="stylesheet" href="_static/default.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '',
VERSION: '',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="pytidylib module" href="#" />
</head>
<body>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li><a href="#">pytidylib module</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="pytidylib-a-python-interface-to-html-tidy">
<h1>PyTidyLib: A Python Interface to HTML Tidy<a class="headerlink" href="#pytidylib-a-python-interface-to-html-tidy" title="Permalink to this headline">¶</a></h1>
<p><a class="reference external" href="http://countergram.com/open-source/pytidylib/">PyTidyLib</a> is a Python package that wraps the <a class="reference external" href="http://tidy.sourceforge.net/">HTML Tidy</a> library. This allows you, from Python code, to “fix” invalid (X)HTML markup. Some of the library’s many capabilities include:</p>
<ul class="simple">
<li>Clean up unclosed tags and unescaped characters such as ampersands</li>
<li>Output HTML 4 or XHTML, strict or transitional, and add missing doctypes</li>
<li>Convert named entities to numeric entities, which can then be used in XML documents without an HTML doctype.</li>
<li>Clean up HTML from programs such as Word (to an extent)</li>
<li>Indent the output, including proper (i.e. no) indenting for <tt class="docutils literal"><span class="pre">pre</span></tt> elements, which some (X)HTML indenting code overlooks.</li>
</ul>
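<p>As a rough illustration of what converting named entities to numeric entities means, the following sketch uses only the Python 3 standard library. It is not how HTML Tidy implements the feature, and the helper name <tt class="docutils literal"><span class="pre">to_numeric_entities</span></tt> is hypothetical. Entities that XML itself predefines are left untouched, since they remain valid without an HTML doctype.</p>

```python
import re
from html.entities import name2codepoint

# Entities that XML predefines; these stay valid without an HTML doctype.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(markup):
    """Replace named HTML entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML-safe or unknown names as-is
        return "&#%d;" % name2codepoint[name]
    return _ENTITY_RE.sub(replace, markup)


print(to_numeric_entities("f&otilde;o &amp; bar"))
```

<p>In practice HTML Tidy performs this conversion itself when the <tt class="docutils literal"><span class="pre">numeric-entities</span></tt> option is set; the sketch only shows the shape of the transformation.</p>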
<p>PyTidyLib is intended as a replacement for uTidyLib, which serves a similar purpose. The author previously used uTidyLib but found several areas for improvement, including OS X support, 64-bit platform support, Unicode support, a memory leak that needed fixing, and speed.</p>
<div class="section" id="naming-conventions">
<h2>Naming conventions<a class="headerlink" href="#naming-conventions" title="Permalink to this headline">¶</a></h2>
<p><a class="reference external" href="http://tidy.sourceforge.net/">HTML Tidy</a> is a longstanding open-source library written in C that implements the actual functionality of cleaning up (X)HTML markup. It provides a shared library (<tt class="docutils literal"><span class="pre">so</span></tt>, <tt class="docutils literal"><span class="pre">dll</span></tt>, or <tt class="docutils literal"><span class="pre">dylib</span></tt>) that can variously be called <tt class="docutils literal"><span class="pre">tidy</span></tt>, <tt class="docutils literal"><span class="pre">libtidy</span></tt>, or <tt class="docutils literal"><span class="pre">tidylib</span></tt>, as well as a command-line executable named <tt class="docutils literal"><span class="pre">tidy</span></tt>. For clarity, this document will consistently refer to it by the project name, HTML Tidy.</p>
<p><a class="reference external" href="http://countergram.com/open-source/pytidylib/">PyTidyLib</a> is the name of the Python package discussed here. As this is the package name, <tt class="docutils literal"><span class="pre">easy_install</span> <span class="pre">pytidylib</span></tt> or <tt class="docutils literal"><span class="pre">pip</span> <span class="pre">install</span> <span class="pre">pytidylib</span></tt> is correct (they are case-insensitive). The <em>module</em> name is <tt class="docutils literal"><span class="pre">tidylib</span></tt>, so <tt class="docutils literal"><span class="pre">import</span> <span class="pre">tidylib</span></tt> is correct in Python code. This document will consistently use the package name, PyTidyLib, outside of code examples.</p>
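<p>Because the package name and the module name differ, it can help to check for the module by its import name. The following is a minimal sketch for Python 3 using only the standard library; the helper name <tt class="docutils literal"><span class="pre">module_available</span></tt> is hypothetical.</p>

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


# The package is installed as "pytidylib", but the importable module is "tidylib".
print("tidylib importable:", module_available("tidylib"))
```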
</div>
<div class="section" id="installing-html-tidy">
<h2>Installing HTML Tidy<a class="headerlink" href="#installing-html-tidy" title="Permalink to this headline">¶</a></h2>
<p>You must have both <a class="reference external" href="http://tidy.sourceforge.net/">HTML Tidy</a> and <a class="reference external" href="http://countergram.com/open-source/pytidylib/">PyTidyLib</a> installed in order to use the functionality described here. There is no affiliation between the two projects. The following briefly outlines what you must do to install HTML Tidy. See the <a class="reference external" href="http://tidy.sourceforge.net/">HTML Tidy</a> web site for more information.</p>
<p><strong>Linux/BSD or similar:</strong> First, try to use your distribution’s package management system (<tt class="docutils literal"><span class="pre">apt-get</span></tt>, <tt class="docutils literal"><span class="pre">yum</span></tt>, etc.) to install HTML Tidy. It might go under the name <tt class="docutils literal"><span class="pre">libtidy</span></tt>, <tt class="docutils literal"><span class="pre">tidylib</span></tt>, <tt class="docutils literal"><span class="pre">tidy</span></tt>, or something similar. Otherwise see <em>Building from Source</em>, below.</p>
<p><strong>OS X:</strong> You may already have HTML Tidy installed. In the Terminal, run <tt class="docutils literal"><span class="pre">locate</span> <span class="pre">libtidy</span></tt> and see if you get any results, which should end in <tt class="docutils literal"><span class="pre">dylib</span></tt>. Otherwise see <em>Building from Source</em>, below.</p>
<p><strong>Windows:</strong> (Use PyTidyLib version 0.2 or later!) Prebuilt HTML Tidy DLLs are available from at least two locations. The <a class="reference external" href="http://int64.org/projects/tidy-binaries">int64.org Tidy Binaries</a> page provides binaries that were built in 2005, for both 32-bit and 64-bit Windows, against a patched version of the source. The <a class="reference external" href="http://tidy.sourceforge.net/">HTML Tidy</a> web site links to a DLL built in 2006, for 32-bit Windows only, using the vanilla source (scroll near the bottom to “Other Builds” – use the one that reads “exe/lib/dll”, <em>not</em> the “exe”-only version.)</p>
<p>Once you have a DLL (which may be named <tt class="docutils literal"><span class="pre">tidy.dll</span></tt>, <tt class="docutils literal"><span class="pre">libtidy.dll</span></tt>, or <tt class="docutils literal"><span class="pre">tidylib.dll</span></tt>), you must place it in a directory on your system path. If you are running Python from the command-line, placing the DLL in the present working directory will work, but this is unreliable otherwise (e.g. for server software).</p>
<p>See the articles <a class="reference external" href="http://www.computerhope.com/issues/ch000549.htm">How to set the path in Windows 2000/Windows XP</a> (ComputerHope.com) and <a class="reference external" href="http://www.question-defense.com/2009/06/22/modify-a-users-path-in-windows-vista-vista-path-environment-variable/">Modify a Users Path in Windows Vista</a> (Question Defense) for more information on your system path.</p>
<p><strong>Building from Source:</strong> The HTML Tidy developers have chosen to make the source code downloadable <em>only</em> through CVS, and not from the web site. Use the following CVS checkout at the command line:</p>
<div class="highlight-python"><pre>cvs -z3 -d:pserver:anonymous@tidy.cvs.sourceforge.net:/cvsroot/tidy co -P tidy</pre>
</div>
<p>Then see the instructions packaged with the source code or on the <a class="reference external" href="http://tidy.sourceforge.net/">HTML Tidy</a> web site.</p>
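<p>After installing, one way to check whether the shared library is discoverable is to ask <tt class="docutils literal"><span class="pre">ctypes</span></tt> for it under each of the names mentioned earlier. This is only a sketch: the helper name <tt class="docutils literal"><span class="pre">locate_html_tidy</span></tt> is hypothetical, and the search performed by <tt class="docutils literal"><span class="pre">find_library</span></tt> is platform-dependent, so a negative result does not prove that the library is absent.</p>

```python
from ctypes.util import find_library

# Base names under which the HTML Tidy shared library is commonly installed.
CANDIDATE_NAMES = ("tidy", "libtidy", "tidylib")


def locate_html_tidy(names=CANDIDATE_NAMES):
    """Return the first (name, location) pair found, or None."""
    for name in names:
        location = find_library(name)
        if location:
            return name, location
    return None


found = locate_html_tidy()
if found:
    print("HTML Tidy found as %r: %s" % found)
else:
    print("HTML Tidy shared library not found on the default search path")
```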
</div>
<div class="section" id="installing-pytidylib">
<h2>Installing PyTidyLib<a class="headerlink" href="#installing-pytidylib" title="Permalink to this headline">¶</a></h2>
<p>PyTidyLib is available on the Python Package Index and may be installed in the usual ways if you have <a class="reference external" href="http://pypi.python.org/pypi/pip">pip</a> or <a class="reference external" href="http://pypi.python.org/pypi/setuptools">setuptools</a> installed:</p>
<div class="highlight-python"><pre>pip install pytidylib
# or:
easy_install pytidylib</pre>
</div>
<p>You can also download the latest source distribution from the <a class="reference external" href="http://countergram.com/open-source/pytidylib/">PyTidyLib</a> web site.</p>
</div>
<div class="section" id="small-example-of-use">
<h2>Small example of use<a class="headerlink" href="#small-example-of-use" title="Permalink to this headline">¶</a></h2>
<p>The following code cleans up an invalid HTML document and sets an option:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">tidylib</span> <span class="kn">import</span> <span class="n">tidy_document</span>
<span class="n">document</span><span class="p">,</span> <span class="n">errors</span> <span class="o">=</span> <span class="n">tidy_document</span><span class="p">(</span><span class="s">'''&lt;p&gt;f&amp;otilde;o &lt;img src=&quot;bar.jpg&quot;&gt;'''</span><span class="p">,</span>
<span class="n">options</span><span class="o">=</span><span class="p">{</span><span class="s">'numeric-entities'</span><span class="p">:</span><span class="mi">1</span><span class="p">})</span>
<span class="k">print</span><span class="p">(</span><span class="n">document</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">errors</span><span class="p">)</span>
</pre></div>
</div>
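<p>Since both PyTidyLib and HTML Tidy must be installed, calling code may want to degrade gracefully when either is missing. The sketch below wraps the call shown above; the wrapper name <tt class="docutils literal"><span class="pre">tidy_document_or_none</span></tt> is hypothetical, and treating <tt class="docutils literal"><span class="pre">OSError</span></tt> as the signal for a missing shared library is an assumption that may not hold for every PyTidyLib version.</p>

```python
def tidy_document_or_none(markup, options=None):
    """Return (document, errors), or None if PyTidyLib or HTML Tidy is unavailable."""
    try:
        from tidylib import tidy_document
        return tidy_document(markup, options=options)
    except (ImportError, OSError):
        # ImportError: PyTidyLib is not installed.
        # OSError: assumed to mean the HTML Tidy shared library could not be loaded.
        return None


result = tidy_document_or_none('<p>f&otilde;o <img src="bar.jpg">',
                               {"numeric-entities": 1})
if result is None:
    print("PyTidyLib or HTML Tidy is not available")
else:
    document, errors = result
    print(document)
    print(errors)
```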
</div>
<div class="section" id="configuration-options">
<h2>Configuration options<a class="headerlink" href="#configuration-options" title="Permalink to this headline">¶</a></h2>
<p>The Python interface allows you to pass options directly to HTML Tidy. For a complete list of options, see the <a class="reference external" href="http://tidy.sourceforge.net/docs/quickref.html">HTML Tidy Configuration Options Quick Reference</a> or, from the command line, run <tt class="docutils literal"><span class="pre">tidy</span> <span class="pre">-help-config</span></tt>.</p>
<p>This module sets certain default options, as follows:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">BASE_OPTIONS</span> <span class="o">=</span> <span class="p">{</span>
<span class="s">"output-xhtml"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="c"># XHTML instead of HTML4</span>
<span class="s">"indent"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="c"># Pretty; not too much of a performance hit</span>
<span class="s">"tidy-mark"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="c"># No tidy meta tag in output</span>
<span class="s">"wrap"</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="c"># No wrapping</span>
<span class="s">"alt-text"</span><span class="p">:</span> <span class="s">""</span><span class="p">,</span> <span class="c"># Help ensure validation</span>
<span class="s">"doctype"</span><span class="p">:</span> <span class="s">'strict'</span><span class="p">,</span> <span class="c"># Little sense in transitional for tool-generated markup...</span>
<span class="s">"force-output"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span> <span class="c"># May not get what you expect but you will get something</span>
<span class="p">}</span>
</pre></div>
</div>
<p>If you do not want these defaults applied, do the following after importing <tt class="docutils literal"><span class="pre">tidylib</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">tidylib</span><span class="o">.</span><span class="n">BASE_OPTIONS</span> <span class="o">=</span> <span class="p">{}</span>
</pre></div>
</div>
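<p>A gentler alternative to clearing the defaults is to override individual options per call. The following sketch models how defaults and per-call options might combine, assuming the module copies <tt class="docutils literal"><span class="pre">BASE_OPTIONS</span></tt> and then applies the caller's options on top; this document does not spell out that behavior. The dictionary here is a stand-in holding a subset of the defaults, and <tt class="docutils literal"><span class="pre">effective_options</span></tt> is a hypothetical helper, not part of PyTidyLib.</p>

```python
# Hypothetical stand-in for tidylib.BASE_OPTIONS (a subset of the defaults).
BASE_OPTIONS = {
    "output-xhtml": 1,
    "indent": 1,
    "doctype": "strict",
}


def effective_options(options=None, base=BASE_OPTIONS):
    """Copy the defaults, then let per-call options override them."""
    merged = dict(base)
    merged.update(options or {})
    return merged


# Turn off indenting for one call and add an option that has no default.
print(effective_options({"indent": 0, "numeric-entities": 1}))
```

<p>Copying before updating leaves the shared defaults unchanged for later calls, which is the main point of the design.</p>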
</div>
<div class="section" id="function-reference">
<h2>Function reference<a class="headerlink" href="#function-reference" title="Permalink to this headline">¶</a></h2>
</div>
</div>
</div>
</div>
</div>
<div class="sphinxsidebar">
<div class="sphinxsidebarwrapper">
<h3><a href="#">Table Of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">PyTidyLib: A Python Interface to HTML Tidy</a><ul>
<li><a class="reference internal" href="#naming-conventions">Naming conventions</a></li>
<li><a class="reference internal" href="#installing-html-tidy">Installing HTML Tidy</a></li>
<li><a class="reference internal" href="#installing-pytidylib">Installing PyTidyLib</a></li>
<li><a class="reference internal" href="#small-example-of-use">Small example of use</a></li>
<li><a class="reference internal" href="#configuration-options">Configuration options</a></li>
<li><a class="reference internal" href="#function-reference">Function reference</a></li>
</ul>
</li>
</ul>
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/index.txt"
rel="nofollow">Show Source</a></li>
</ul>
<div id="searchbox" style="display: none">
<h3>Quick search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
<p class="searchtip" style="font-size: 90%">
Enter search terms or a module, class or function name.
</p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li><a href="#">pytidylib module</a> »</li>
</ul>
</div>
<div class="footer">
© Copyright 2009 Jason Stitt.
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.0.8.
</div>
</body>
</html>