/usr/share/doc/python-apsw/html/couchdb.html is in python-apsw-doc 3.7.6.3-r1-1build1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>CouchDB — APSW 3.7.6.3-r1 documentation</title>
<link rel="stylesheet" href="_static/default.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '',
VERSION: '3.7.6.3-r1',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="copyright" title="Copyright" href="copyright.html" />
<link rel="top" title="APSW 3.7.6.3-r1 documentation" href="index.html" />
<link rel="next" title="Copyright and License" href="copyright.html" />
<link rel="prev" title="Benchmarking" href="benchmarking.html" />
</head>
<body>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="copyright.html" title="Copyright and License"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="benchmarking.html" title="Benchmarking"
accesskey="P">previous</a> |</li>
<li><a href="index.html">APSW 3.7.6.3-r1 documentation</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="couchdb">
<span id="id1"></span><h1>CouchDB<a class="headerlink" href="#couchdb" title="Permalink to this headline">¶</a></h1>
<p><a class="reference external" href="http://couchdb.apache.org">CouchDB</a> is an increasingly popular
document oriented database, also known as a schema-less database. It
is also web based using <a class="reference external" href="http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol">HTTP</a> for
access and <a class="reference external" href="http://json.org/">JSON</a> for data representation.</p>
<p>The APSW source distribution now includes a <a class="reference internal" href="vtable.html#virtualtables"><em>virtual table
implementation</em></a> that lets you access CouchDB databases
from SQLite, including both read and write access.</p>
<p>Some suggested uses:</p>
<blockquote>
<div><ul class="simple">
<li>Loading data from SQLite into CouchDB and vice versa. For example
you can import CSV data into SQLite and then upload it to CouchDB.</li>
<li>Being able to use SQLite based tools for working on CouchDB data.</li>
<li>Being able to do <a class="reference external" href="http://en.wikipedia.org/wiki/Join_%28SQL%29">joins</a> between data in
SQLite and CouchDB.</li>
<li>Using the SQLite <a class="reference internal" href="extensions.html#ext-fts3"><em>FTS full text search extension</em></a>
for data in CouchDB.</li>
<li>Using the SQLite <a class="reference internal" href="extensions.html#ext-rtree"><em>RTree extension</em></a> to do spatial
queries.</li>
</ul>
</div></blockquote>
<div class="section" id="getting-it">
<h2>Getting it<a class="headerlink" href="#getting-it" title="Permalink to this headline">¶</a></h2>
<p>You can find the code in the <a class="reference internal" href="download.html#source-and-binaries"><em>source distribution</em></a> named <tt class="file docutils literal"><span class="pre">apswcouchdb.py</span></tt>, or you can get a
copy directly from <a class="reference external" href="http://code.google.com/p/apsw/source/browse/tools/apswcouchdb.py">source control</a>
(choose “View Raw File”).</p>
<p>You will need to have the <a class="reference external" href="http://code.google.com/p/couchdb-python/">Python couchdb module</a> installed as well as its
prerequisites.</p>
</div>
<div class="section" id="usage">
<h2>Usage<a class="headerlink" href="#usage" title="Permalink to this headline">¶</a></h2>
<p>To use with your code, place the <tt class="file docutils literal"><span class="pre">apswcouchdb.py</span></tt> file anywhere
you can import it. To use it with the APSW <a class="reference internal" href="shell.html#shell"><em>Shell</em></a> use the the
<tt class="docutils literal"><span class="pre">.read</span></tt> command specifying the filename. The virtual table will be
automatically installed via the <a class="reference internal" href="apsw.html#apsw.connection_hooks" title="apsw.connection_hooks"><tt class="xref py py-attr docutils literal"><span class="pre">connection_hooks</span></tt></a> mechanism and
registering with the shell if appropriate.</p>
<p>A virtual table maps to a CouchDB database. The database needs to
exist already. Use the following SQL to create the virtual table:</p>
<div class="highlight-python"><pre>create virtual table mytable using couchdb('http://localhost:5984', 'dbname', col1, col2, col3);</pre>
</div>
<p>From that point on you can do regular SQL operations against <em>mytable</em>
where each row will correspond to a document from <em>dbname</em>. When you
drop the table it does not delete the CouchDB database. If you need
usernames and passwords then specify them as part of the url:</p>
<div class="highlight-python"><pre>http://username:password@localhost:5984</pre>
</div>
</div>
<div class="section" id="examples">
<h2>Examples<a class="headerlink" href="#examples" title="Permalink to this headline">¶</a></h2>
<div class="section" id="importing-csv-data">
<h3>Importing CSV data<a class="headerlink" href="#importing-csv-data" title="Permalink to this headline">¶</a></h3>
<p>We want to import some CSV data to a CouchDB database. The first step
is to import it into a temporary SQLite table. It could be imported
directly to CouchDB but then we wouldn’t have the ability to specify
<a class="reference external" href="http://www.sqlite.org/datatype3.html">column affinity</a> and all
data would end up as strings in CouchDB. I am using a real estate CSV
file in this example against the <a class="reference internal" href="shell.html#shell"><em>SQLite shell</em></a>:</p>
<div class="highlight-python"><pre>-- Specify affinities in temporary table so numbers from
-- the CSV end up as numbers and not strings
create temporary table refixup(street char, city char,
zip char, state char, beds int, baths int, sqft real,
type char, sale_date char, price int, latitude real,
longitude real);
-- Do the actual import
.mode csv
.import realestatetransactions.csv refixup
-- Create the CouchDB virtual table with the same column names
create virtual table realestate using couchdb('http://localhost:5984',
realestate, street, city, zip, state, beds, baths,
sqft, type, sale_date, price, latitude, longitude);
-- Copy the data from the temporary table to CouchDB
insert into realestate select * from refixup;
-- No longer need the temporary table
drop table refixup;</pre>
</div>
<p>Use <tt class="docutils literal"><span class="pre">.help</span> <span class="pre">import</span></tt> for more hints on importing data.</p>
</div>
<div class="section" id="using-fts">
<h3>Using FTS<a class="headerlink" href="#using-fts" title="Permalink to this headline">¶</a></h3>
<p>In this example we have CouchDB documents that are recipes. The
CouchDB _id field is the recipe name:</p>
<div class="highlight-python"><pre>-- Virtual table
create virtual table recipes using couchdb('http://localhost:5984',
_id, ingredients);
-- FTS table
create virtual table recipesearch using fts3(name, ingredients);
-- Copy the data from CouchDB to FTS3
insert into recipesearch(name, ingredients)
select _id,ingredients from recipes;
-- Which ones have these ingredients
select name as _id from recipesearch where ingredients
MATCH 'onions cheese';</pre>
</div>
</div>
</div>
<div class="section" id="implementation-notes">
<h2>Implementation Notes<a class="headerlink" href="#implementation-notes" title="Permalink to this headline">¶</a></h2>
<p>Automatic column names</p>
<blockquote>
<div><p>If you don’t want to manually specify all columns then you can use
‘+’ as a column and the first 1,000 documents in the database will
be examined to see what keys they have. Any that you have not
already specified as a column will be added to the column list in
alphabetical order. Only keys without a leading underscore are
considered:</p>
<div class="highlight-python"><pre>create virtual table mytable using couchdb('http://localhost:5984', 'dbname', '+');</pre>
</div>
</div></blockquote>
<p>Document Ids and Rowids</p>
<blockquote>
<div><p>CouchDB uses the document id (key _id) as the unique identifier.
SQLite uses a 64 bit integer rowid. In order to map the two a
<a class="reference external" href="http://www.sqlite.org/lang_createtable.html">temporary table</a> is
used behind the scenes. The revision (_rev) is also stored. SQLite
stores temporary tables separately and discards them when the
corresponding database is closed.</p>
<p>By default the temporary tables are stored in a file. You can use a
<a class="reference external" href="http://www.sqlite.org/pragma.html#pragma_temp_store">pragma</a> to
change that. For example <tt class="docutils literal"><span class="pre">pragma</span> <span class="pre">temp_store=memory</span></tt> will use
memory instead. My 200,000 document test database resulted in a
temporary mapping table of 50MB if I accessed all rows/documents.</p>
</div></blockquote>
<p>Scalability</p>
<blockquote>
<div><p>It is intended that you can use the virtual table with large
databases. For example development and profiling were done with a
200,000 document database using over 2GB of storage.</p>
<p>This means that behind the scenes the CouchDB <a class="reference external" href="http://wiki.apache.org/couchdb/HTTP_Bulk_Document_API">bulk API</a> is used.
(Doing an HTTP request per row/document read or written would be far
too slow.) Pending updates (adds, changes, deletions) are batched
up and sent as a group which means there is a delay between when
your SQL executes and the CouchDB server taking the appropriate
action. This can also lead to a delay in error reporting.</p>
<p>By default documents are read and written in batches of 5,000. The
larger the number the fewer HTTP requests are made but more memory
is consumed. You can change this number like this:</p>
<div class="highlight-python"><pre>select couchdb_config('read-batch', 2000);
select couchdb_config('write-batch', 10000);</pre>
</div>
<p>If you are debugging code then setting them to 1 will cause an
immediate HTTP request per row/document read or written rather than
waiting till the batch is full or for a transaction boundary.</p>
</div></blockquote>
<p>Updates</p>
<blockquote>
<div><p>SQLite (and SQL) define a fixed list of columns for each row while
CouchDB can have zero or more keys per document. In normal use of
this module you would have listed a subset of the possible keys as
columns for SQLite. CouchDB does an update by supplying a complete
new document. If an update specified only the keys/columns declared
at the SQLite level then other keys would be lost. Consequently on
each update this module has to obtain the document and update the
fields specified via SQL. Obtaining these documents as needed one
at a time slows down updates, but does prevent them from losing any
fields not known to SQL. If you have specified all the fields in
SQL then you can off this behaviour saving one HTTP request per
document updated:</p>
<div class="highlight-python"><pre>select couchdb_config('deep-update', 0);</pre>
</div>
</div></blockquote>
<p>None/null/undefined</p>
<blockquote>
<div><p>In SQL null means a value is not present. No two nulls are equal to
each other plus <a class="reference external" href="http://www.sqlite.org/nulls.html">other quirks</a>.
In Python None is a value more like a non-type specific zero
although it is a singleton. Javascript has both undefined with a
SQL null like meaning and null with Python None like meaning. JSON
can represent null but not undefined.</p>
<p>Whenever a key is not present in a document but a value is required
by SQLite then null is returned. When creating or updating
documents, a value of null from SQLite is treated as meaning you do
not want the key in the document. (Reading it back will still get
you a null in SQLite even though technically the document value is
undefined.) This means that you cannot use this module to set a
value in a CouchDB document to null - the key will not be present.</p>
</div></blockquote>
<p>Transactions</p>
<blockquote>
<div>SQLite has the same transaction boundaries as SQL and supports
transactions. CouchDB is atomic but does not support transactions.
Any outstanding batched updates are also flushed on SQLite
transaction boundaries as well as other points such as when a cursor
starts from the beginning again. This means that a rollback can only
discard unflushed pending updates but not undo earlier updates
within the SQL transaction.</div></blockquote>
<p>Types</p>
<blockquote>
<div><p>Communication between this module and CouchDB uses JSON over HTTP.
All the JSON types map to Python fundamental types, but only to a
subset of SQLite types.</p>
<table border="1" class="docutils">
<colgroup>
<col width="29%" />
<col width="38%" />
<col width="33%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">JSON</th>
<th class="head">Python</th>
<th class="head">SQLite</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>Float</td>
<td>double</td>
<td>Real</td>
</tr>
<tr class="row-odd"><td>Integer
(Actually
stored as
a float)</td>
<td>int (unlimited)</td>
<td>int (max 64
bit)</td>
</tr>
<tr class="row-even"><td>String</td>
<td>unicode</td>
<td>char</td>
</tr>
<tr class="row-odd"><td>null</td>
<td>None</td>
<td>null</td>
</tr>
<tr class="row-even"><td>No
equivalent</td>
<td>buffer / bytes</td>
<td>blob</td>
</tr>
<tr class="row-odd"><td>List</td>
<td>list</td>
<td>No
equivalent</td>
</tr>
<tr class="row-even"><td>Object</td>
<td>dict</td>
<td>No
equivalent</td>
</tr>
</tbody>
</table>
<p>Values in JSON that have no SQLite equivalent such as a list are
<a class="reference external" href="http://docs.python.org/library/pickle">pickled</a> and supplied to
SQLite as a blob. Similarly any blob supplied to SQLite intended
for CouchDB must be pickled Python data. Use <tt class="docutils literal"><span class="pre">-1</span></tt> as the pickle
version (selects binary encoding).</p>
<p>Note that this module has no affinity rules. Whatever type is
supplied at the SQL level is then sent as the same type to CouchDB.
You cannot specify types when creating the virtual table. If you
need affinity then use an intermediary temporary table as the
example in the next section shows.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>Although JSON has separate Integer and Float types, Javascript
itself does not and stores everything as a floating point number
which has about 15 digits of precision. Python, Erlang and CouchDB
support arbitrary large numbers but data that passes through a
Javascript view server will lose precision. For example
9223372036854775807 will come back as 9223372036854776000
(different last 4 digits).</p>
<p class="last">For Python 2 note that Pickle encodes strings and unicode strings
differently even when they have same ASCII contents. If you are
trying to do an equality check then ensure all strings including
dictionary keys are unicode before pickling.</p>
</div>
</div></blockquote>
<p>Expressions</p>
<blockquote>
<div><p>SQL is accelerated by using indices. These are
precomputed/presorted views of the data and used when evaluating
queries like <tt class="docutils literal"><span class="pre">select</span> <span class="pre">*</span> <span class="pre">from</span> <span class="pre">items</span> <span class="pre">where</span> <span class="pre">price</span> <span class="pre">></span> <span class="pre">74.99</span> <span class="pre">and</span>
<span class="pre">quantity<=10</span> <span class="pre">and</span> <span class="pre">customer='Acme</span> <span class="pre">Widgets'</span></tt> in order to avoid
visiting every row in the table.</p>
<p>If you have constraints like the above then this module uses a
CouchDB <a class="reference external" href="http://wiki.apache.org/couchdb/HTTP_view_API">temporary view</a> so that CouchDB
chooses an appropriate subset of documents rather than having to
transfer all of them and have SQLite do the filtering.</p>
<p>With a large number of documents the view can take quite a while for
CouchDB to calculate but is still quicker than sucking down all the
data for SQLite to calculate in most cases. Because the views are
temporary they will eventually be discarded by CouchDB.</p>
<p>This also means that Javascript’s rules are used for evaluation in
circumstances such as comparing strings to integers. However SQL
rules are used for nulls - they are not equal to each other or any
other value.</p>
<p>SQLite only supports one index per query. If all the constraints
are joined by <strong>and</strong> then this module can tell SQLite it has one
index covering all of them. If you use <strong>or</strong> then the index/view
can only be used for one side of the <strong>or</strong> expression with SQLite
having to evaluate the other.</p>
<p>You can get SQLite to do the row/document filtering (which means
retreiving all documents) like this:</p>
<div class="highlight-python"><pre>select couchdb_config('server-eval', 0);</pre>
</div>
</div></blockquote>
<p>Configuration summary</p>
<blockquote>
<div><p>You can change behaviour of the module by using the
<tt class="docutils literal"><span class="pre">couchdb_config</span></tt> SQL function. If you call with one argument then
it returns the current value and with two sets the value. The
various options are described in more detail in the relevant
sections above.</p>
<table border="1" class="docutils">
<colgroup>
<col width="25%" />
<col width="13%" />
<col width="62%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Option</th>
<th class="head">Default</th>
<th class="head">Description</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>read-batch</td>
<td>5,000</td>
<td>How many documents are retrieved at a time
from the server</td>
</tr>
<tr class="row-odd"><td>write-batch</td>
<td>5,000</td>
<td>How many documents being created/changed
are saved up before sending in one request
to the bulk api on the server</td>
</tr>
<tr class="row-even"><td>deep-update</td>
<td>1 (True)</td>
<td>If a change is made to a row/document then
the original is retreived from the server
(one at a time) so that the keys not
specified as SQL level columns aren’t lost</td>
</tr>
<tr class="row-odd"><td>server-eval</td>
<td>1 (True)</td>
<td>SQL column expressions are evaluated in
the server (by claiming there is an index)
rather than downloading all documents
and having SQLite do the evaluation.</td>
</tr>
</tbody>
</table>
</div></blockquote>
</div>
</div>
</div>
</div>
</div>
<div class="sphinxsidebar">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table Of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">CouchDB</a><ul>
<li><a class="reference internal" href="#getting-it">Getting it</a></li>
<li><a class="reference internal" href="#usage">Usage</a></li>
<li><a class="reference internal" href="#examples">Examples</a><ul>
<li><a class="reference internal" href="#importing-csv-data">Importing CSV data</a></li>
<li><a class="reference internal" href="#using-fts">Using FTS</a></li>
</ul>
</li>
<li><a class="reference internal" href="#implementation-notes">Implementation Notes</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="benchmarking.html"
title="previous chapter">Benchmarking</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="copyright.html"
title="next chapter">Copyright and License</a></p>
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/couchdb.txt"
rel="nofollow">Show Source</a></li>
</ul>
<div id="searchbox" style="display: none">
<h3>Quick search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" size="18" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
<p class="searchtip" style="font-size: 90%">
Enter search terms or a module, class or function name.
</p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="copyright.html" title="Copyright and License"
>next</a> |</li>
<li class="right" >
<a href="benchmarking.html" title="Benchmarking"
>previous</a> |</li>
<li><a href="index.html">APSW 3.7.6.3-r1 documentation</a> »</li>
</ul>
</div>
<div class="footer">
© <a href="copyright.html">Copyright</a> 2004-2010, Roger Binns <rogerb@rogerbinns.com>.
Last updated on May 20, 2011.
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.1pre/4b8d012cf82e.
</div>
</body>
</html>
|