/usr/share/doc/coop-computing-tools/manual/parrot.html is in coop-computing-tools-doc 4.0-1ubuntu5.

This file is owned by root:root, with mode 0o644.

<html>

<head>
<title>Parrot User's Manual</title>
</head>

<h1>Parrot User's Manual</h1>
<b>October 2011</b>
<p>

Parrot is Copyright (C) 2003-2004 Douglas Thain and
Copyright (C) 2005- The University of Notre Dame.
All rights reserved.
This software is distributed under the GNU General Public License.
See the file COPYING for details.
<p>
<b>Please use the following citation to refer to Parrot:</b><br>
<dir>
<li> Douglas Thain and Miron Livny, <a href=http://www.cse.nd.edu/~dthain/papers/parrot-scpe.pdf>Parrot: An Application Environment for Data-Intensive Computing</a>, Scalable Computing: Practice and Experience, Volume 6, Number 3, Pages 9--18, 2005.
</dir>

<h2>Overview</h2>

Parrot is a tool for attaching old programs to new storage systems.  Parrot makes a remote storage system appear as a file system to a legacy application.  Parrot does not require any special privileges, any recompiling, or any change whatsoever to existing programs.  It can be used by normal users doing normal tasks.  For example, an anonymous FTP service is made available to <code>vi</code> like so:
<dir>
<pre>
% parrot_run vi /anonftp/ftp.gnu.org/pub/README
</pre>
</dir>
<p>
Parrot is useful to <i>users</i> of distributed systems, because it frees them from rewriting code to work with new systems and relying on remote administrators to trust and install new software.
Parrot is also useful to <i>developers</i> of distributed systems, because it allows rapid deployment of new code to real applications and real users that do not have the time, inclination, or permissions to build a kernel-level filesystem.
<p>
Parrot currently supports a variety of remote I/O systems, all detailed below.
We welcome contributions of new remote I/O drivers from others.
However, if you are working on a protocol driver
please drop us a note so that we can make sure work
is not duplicated.
<p>
Almost any application - whether static or dynamically linked,
standard or commercial, command-line or GUI - should work with
Parrot.  There are a few exceptions.  Because Parrot relies on
the Linux <code>ptrace</code> interface,
any program that itself relies on ptrace cannot run under Parrot.
This means Parrot cannot run a debugger, nor can it run itself recursively.
In addition, Parrot cannot run setuid programs, as the operating
system considers this a security risk.
<p>
Parrot also provides a new experimental feature called <i>identity boxing</i>.
This feature allows you to securely run a visiting application within
a protection domain without becoming root or creating a new account.
Read below for more information on identity boxing.
<p>
Parrot currently runs on the Linux operating system with either
Intel compatible (i386) or AMD compatible (x86_64) processors.
It relies on some fairly low-level details in order to implement
system call trapping.  Ports to other platforms and processors
may be possible in the future.
<p>
Like any software, Parrot is bound to have some bugs.
Please check the <a href=http://www.cse.nd.edu/~ccl/software/parrot/known-bugs.html>known bugs</a> page for the latest scoop.

<h2>Installation</h2>

Parrot is distributed as part of the
<a href=http://www.cse.nd.edu/~ccl/software>Cooperative Computing Tools</a>.
To install, please read the <a href=install.html>cctools installation instructions</a>.

<h2>Examples</h2>

To use Parrot, simply run the <code>parrot_run</code> command followed by any other Unix program.  For example, to run a Parrot-enabled <code>vi</code>, execute this command:
<dir>
<pre>
% parrot_run vi /anonftp/ftp.gnu.org/pub/README
</pre>
</dir>
Of course, it can be clumsy to put <code>parrot_run</code> before every command you run, so try starting a shell with Parrot already loaded:
<dir>
<pre>
% parrot_run bash
</pre>
</dir>
Now, you should be able to run any standard command using Parrot filenames.  Here are some examples to get you thinking:
<dir>
<pre>
% cp /http/www.nd.edu/~dthain/papers/parrot-agm2003.pdf .
% grep Yahoo /http/www.yahoo.com
% set autolist
% cat /anonftp/ftp.gnu.org/pub<b>[Press TAB here]</b>
</pre>
</dir>
<p>

<center>
<table width=75% border=2>
<tr bgcolor=#ffffaa>
<td>
Hint: You may find it useful to have some visual indication
of when Parrot is active, so we recommend that you modify
your shell startup scripts to change the prompt when Parrot is enabled.
If you use <tt>tcsh</tt>, you might add something like this to your <tt>.cshrc</tt>:
<pre>
        if ( $?PARROT_ENABLED ) then
                set prompt = " (Parrot) %n@%m%~%# "
        else
                set prompt = " %n@%m%~%# "
        endif
</pre>
</table>
</center>
<p>

We have limited the examples so far to HTTP and anonymous FTP, as they are
the only services we know that absolutely everyone is familiar with.
There are a number of other more powerful and secure remote services
that you may be less familiar with.  Parrot supports them in the same form:
The filename begins with the service type, then the host name,
then the file name.  Here are all the currently supported services:

<hr>
<table>
<tr><td><b>example path</b><td><b>remote service</b><td><b>more info</b>
<tr><td>/http/www.somewhere.com/index.html<td>Hypertext Transfer Protocol<td>included
<tr><td>/grow/www.somewhere.com/index.html<td>GROW - Global Read-Only Web Filesystem <td>included
<tr><td>/ftp/ftp.cs.wisc.edu/RoadMap<td>File Transfer Protocol<td>included
<tr><td>/anonftp/ftp.cs.wisc.edu/RoadMap<td>Anonymous File Transfer Protocol<td>included
<tr><td>/gsiftp/ftp.globus.org/path<td>Globus Security + File Transfer Protocol<td><a href=http://www.globus.org/gridftp>more info</a>
<tr><td>/irods/host:port/zone/home/user/path<td>iRODS<td><a href=http://irods.sdsc.edu>more info</a>
<tr><td>/hdfs/namenode:port/path<td>Hadoop Distributed File System (HDFS)<td><a href=http://hadoop.apache.org/common/docs/current/hdfs_user_guide.html>more info</a>
<tr><td>/xrootd/host:port/path<td>XRootD/Scalla Distributed Storage System (xrootd)<td><a href=http://project-arda-dev.web.cern.ch/project-arda-dev/xrootd/site/index.html>more info</a>
<tr><td>/cvmfs/grid.cern.ch/path<td>CernVM-FS<td><a href=http://cernvm.cern.ch/portal/filesystem>more info</a> 
<tr><td>/chirp/target.cs.wisc.edu/path<td>Chirp Storage System<td>included + <a href=chirp.html>more info</a> 
</table>
<hr>
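<p>
The naming convention in the table above (service type, then host name, then file name) can be illustrated with a short Python sketch. The function name <tt>split_parrot_path</tt> is hypothetical and is not part of Parrot; it only shows how such a path decomposes. For services without a host component, such as <tt>/lfn</tt> or <tt>/guid</tt>, the second field is simply the next path component.

```python
def split_parrot_path(path):
    """Split '/service/host/rest' into (service, host, rest).

    Illustration only: this mirrors the naming convention described
    in the manual, not Parrot's internal parser.
    """
    if not path.startswith("/"):
        raise ValueError("Parrot paths are absolute: " + path)
    parts = path.split("/", 3)  # ['', service, host, rest]
    service = parts[1]
    host = parts[2] if len(parts) > 2 else ""
    rest = "/" + parts[3] if len(parts) > 3 else "/"
    return service, host, rest


print(split_parrot_path("/http/www.somewhere.com/index.html"))
print(split_parrot_path("/hdfs/namenode:port/path"))
```

Note that a <tt>host:port</tt> pair stays together in the second field.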
<p>
The following protocols have been supported in the past, but are not currently in active use.
<p>
<hr>
<table>
<tr><td>/nest/nest.cs.wisc.edu/path<td>Network Storage Technology<td><a href=http://www.cs.wisc.edu/condor/nest>more info</a>
<tr><td>/rfio/host.cern.ch/path<td>Castor Remote File I/O<td><a href=http://castor.web.cern.ch/castor/Welcome.html>more info</a>
<tr><td>/dcap/dcap.cs.wisc.edu/pnfs/cs.wisc.edu/path<td>DCache Access Protocol<td><a href=http://dcache.desy.de>more info</a>
<tr><td>/lfn/logical/path<td>Logical File Name - Grid File Access Library<td><a href=http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html>more info</a>
<tr><td>/srm/server/path<td>Site File Name - Grid File Access Library<td><a href=http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html>more info</a>
<tr><td>/guid/abc123<td>Globally Unique File Name - Grid File Access Library<td><a href=http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html>more info</a>
<tr><td>/gfal/protocol://host//path<td>Grid File Access Library<td><a href=http://grid-deployment.web.cern.ch/grid-deployment/gis/GFAL/GFALindex.html>more info</a>
</table>
<hr>
<p>
You will notice quite quickly that not all remote I/O systems
provide all of the functionality common to an ordinary file system.
For example, HTTP is incapable of listing files.
If you attempt to perform a directory listing on an HTTP
server, Parrot will attempt to keep <code>ls</code> happy
by producing a bogus directory entry:
<pre>
    % parrot_run ls -la /http/www.yahoo.com/
    -r--r--r--    1 dthain   dthain          0 Jul 16 11:50 /http/www.yahoo.com
</pre>

A less-drastic example is found in FTP.  If you attempt
to perform a directory listing of an FTP server, Parrot
fills in the available information -- the file names and
their sizes -- but again inserts bogus information to fill the rest out:
<pre>
    % parrot_run ls -la /anonftp/ftp.gnu.org/pub
    -rwxrwxrwx 1 dthain dip    580 Oct  5 16:00 CRYPTO.README
    -rwxrwxrwx 1 dthain dip 166697 Oct  5 16:00 find.txt.gz
    drwxrwxrwx 1 dthain dip      0 Oct  5 16:00 gnu
...
</pre>

If you would like to get a better idea of the underlying behavior
of Parrot, try running it with the <code>-d remote</code> option,
which will display all of the remote I/O operations that it performs
on a program's behalf:

<pre>
    % parrot_run -d remote ls -la /anonftp/ftp.cs.wisc.edu
    ...
    ftp.cs.wisc.edu <-- TYPE I
    ftp.cs.wisc.edu --> 200 Type set to I.
    ftp.cs.wisc.edu <-- PASV
    ftp.cs.wisc.edu --> 227 Entering Passive Mode (128,105,2,28,194,103)
    ftp.cs.wisc.edu <-- NLST /
    ftp.cs.wisc.edu --> 150 Opening BINARY mode data connection for file list.
    ...
</pre>

If your program is upset by the unusual semantics of such
storage systems, then consider using the Chirp protocol and
server, described in the next section.

<h2>The Chirp Protocol and Server</h2>

Although Parrot works with many different protocols,
it is limited by the capabilities provided by each underlying system.
(For example, HTTP does not have reliable directory listings.)
Thus, we have developed a custom protocol, <b>Chirp</b>,
which provides secure remote file access with all of the capabilities
needed for running arbitrary Unix programs.
<b>Chirp</b> is included with the distribution of Parrot,
and requires no extra steps to install.
<p>
To start a Chirp server, simply do the following:
<pre>
    % chirp_server -d all
</pre>
The <b>-d all</b> option turns on debugging, which helps you to understand
how it works initially.  You may remove this option once everything
is working.
<p>
Suppose the Chirp server is running on <tt>bird.cs.wisc.edu</tt>.
Using Parrot, you may access all of the Unix features of
that host from elsewhere:
<pre>
    % parrot_run tcsh
    % cd /chirp/bird.cs.wisc.edu
    % ls -la
    % ...
</pre>
In general, Parrot gives better performance and usability
with Chirp than with other protocols.
You can read extensively about the Chirp server and protocol
<a href=chirp.html>in the Chirp manual</a>.
<p>
In addition, Parrot provides several custom command line tools
(<tt>parrot_getacl</tt>, <tt>parrot_setacl</tt>, <tt>parrot_lsalloc</tt>,
and <tt>parrot_mkalloc</tt>) that can be used to manage the
access control and space allocation features of Chirp from the Unix command line.

<h2>Name Resolution</h2>

In addition to accessing remote storage, Parrot allows you
to create a custom namespace for any program.  All file name
activity passes through the Parrot <b>name resolver</b>, which
can transform any given filename according to a series of rules
that you specify.
<p>
The simplest name resolver is the <b>mountlist</b>, given
by the <b>-m mountfile</b> option.  This file corresponds closely
to <tt>/etc/fstab</tt> in Unix.  A mountlist is simply a file with two
columns.  The first column gives a logical directory or file name,
while the second gives the physical path that it must be connected
to.
<p>
For example, if a database is stored at an FTP server under the
path <tt>/anonftp/ftp.cs.wisc.edu/db</tt>, it may be spliced into the
filesystem under <tt>/dbase</tt> with a mount list like this:
<pre>
     /dbase       /anonftp/ftp.cs.wisc.edu/db
</pre>
Instruct Parrot to use the mountlist as follows:
<pre>
    % parrot_run -m mountfile tcsh
    % cd /dbase
    % ls -la
</pre>
A single mount entry may be given on the command line
with the <b>-M</b> option as follows:
<pre>
    % parrot_run -M /dbase=/anonftp/ftp.cs.wisc.edu/db tcsh
</pre>
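<p>
To make the two-column mapping concrete, here is a minimal Python sketch of how a mountlist could be applied to a file name. It assumes that the longest matching logical prefix wins; this manual does not state how Parrot orders its rules, so treat that choice, and the function names, as assumptions.

```python
def parse_mountlist(text):
    """Return a list of (logical, physical) pairs from two-column text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            mounts.append((fields[0], fields[1]))
    return mounts


def resolve(path, mounts):
    """Rewrite path using the longest matching logical prefix.

    Assumption: longest prefix wins. Paths with no match are returned
    unchanged.
    """
    best = None
    for logical, physical in mounts:
        prefix = logical.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, physical)
    if best is None:
        return path
    return best[1] + path[len(best[0]):]


mounts = parse_mountlist("/dbase       /anonftp/ftp.cs.wisc.edu/db\n")
print(resolve("/dbase/1523.data", mounts))
# prints /anonftp/ftp.cs.wisc.edu/db/1523.data
```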
A more sophisticated way to perform name binding is with
an <i>external resolver</i>.  This is a program executed
whenever Parrot needs to locate a file or directory.
The program accepts a logical file name and then returns
the physical location where it can be found.
<p>
Suppose that you have a database service that locates
the nearest copy of a file for you.  If you run the
command <tt>locate_file</tt>, it will print out the nearest
copy of a file.  For example:
<pre>
    % locate_file /1523.data
    /chirp/server.nd.edu/mix/1523.data
</pre>

To connect the program <tt>locate_file</tt> to Parrot,
simply give a mount string that specifies the program
as a resolver:
<pre>
    % parrot_run -M /dbase=resolver:/path/to/locate_file tcsh
</pre>

Now, if you attempt to access files under /dbase,
Parrot will execute <tt>locate_file</tt> and access
the data stored there:
<pre>
    % cat /dbase/1523.data
    (see contents of /chirp/server.nd.edu/mix/1523.data)
</pre>
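<p>
A resolver such as <tt>locate_file</tt> can be any executable. The Python sketch below follows the behavior shown in the example above: it takes the logical name as its single argument and prints the physical location. The catalog contents are hypothetical, and the use of a non-zero exit status for an unknown name is an assumption, since this manual does not specify how a resolver reports failure.

```python
import sys

# Hypothetical catalog: logical name -> nearest physical copy.
CATALOG = {
    "/1523.data": "/chirp/server.nd.edu/mix/1523.data",
}


def locate(logical_name):
    """Return the physical location, or None if the name is unknown."""
    return CATALOG.get(logical_name)


def main(argv):
    """Print the physical location for argv[1]; return an exit status."""
    physical = locate(argv[1])
    if physical is None:
        return 1  # assumption: non-zero status signals "not found"
    print(physical)
    return 0


# Only act as a command when exactly one logical name is supplied.
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

In a real deployment the dictionary would be replaced by a query to whatever database service tracks the file replicas.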

<h2>More Efficient Copies with <tt>parrot_cp</tt></h2>

If you are using Parrot to copy lots of files across the network,
you may see better performance using the <tt>parrot_cp</tt> tool.
This program looks like an ordinary <tt>cp</tt>, but it makes
use of an optimized Parrot system call that streams entire files
over the network, instead of copying them block by block.
<p>
To use <tt>parrot_cp</tt>, simply use your shell to alias
<tt>cp</tt> to <tt>parrot_cp</tt>:
<p>
<pre>
% parrot_run tcsh
% alias cp parrot_cp
% cp /tmp/mydata /chirp/server.nd.edu/joe/data
% cp -rR /chirp/server.nd.edu/joe /tmp/joe
 </pre>
<p>
If run outside of Parrot, <tt>parrot_cp</tt> will operate as
an ordinary <tt>cp</tt> without any performance gain or loss.

<h2>Notes on Protocols</h2>

<h3>HTTP Proxy Servers</h3>

HTTP, CVMFS, and GROW can take advantage of standard HTTP proxy servers.
To route requests through a single proxy server, set the HTTP_PROXY
environment variable to the server name and port:
<pre>
    % setenv HTTP_PROXY "http://proxy.nd.edu:8080"
</pre>
Multiple proxy servers can be given, separated by a semicolon.
This will cause Parrot to try each proxy in order until one succeeds.
If <tt>DIRECT</tt> is given as the last name in the list, then
Parrot will fall back on a direct connection to the target web server.
For example:
<pre>
    % setenv HTTP_PROXY "http://proxy.nd.edu:8080;http://proxy.wisc.edu:1000;DIRECT"
</pre>
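<p>
The try-each-in-order behavior described above can be sketched in Python as follows. This is an illustration of the rule, not Parrot's code: <tt>try_connect</tt> is a hypothetical callback standing in for an actual connection attempt.

```python
def proxy_attempt_order(http_proxy):
    """Split an HTTP_PROXY value into the ordered list of attempts."""
    return [entry.strip() for entry in http_proxy.split(";") if entry.strip()]


def first_working(http_proxy, try_connect):
    """Return the first entry for which try_connect(entry) is true.

    try_connect is a hypothetical callback; the entry 'DIRECT' stands
    for a direct connection to the target web server.
    """
    for entry in proxy_attempt_order(http_proxy):
        if try_connect(entry):
            return entry
    return None


value = "http://proxy.nd.edu:8080;http://proxy.wisc.edu:1000;DIRECT"
print(proxy_attempt_order(value))
```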


<h3>GROW - Global Read Only Web Filesystem</h3>

Although the strict HTTP protocol does not allow for correct
structured directory listings, it is possible to emulate
directory listings with a little help from the underlying filesystem.
We call this technique GROW, a global filesystem based on the Web.
GROW requires the exporter of data to run a script (<tt>make_growfs</tt>)
that generates a complete directory listing of the data that you wish to
export.  This directory listing is then used to produce reliable metadata.
Of course, if the data changes, the script must be run again,
so GROW is only useful for data that changes infrequently.
<p>
To set up a GROW filesystem, you must run <tt>make_growfs</tt>
on the web server machine with the name of the local storage directory
as the argument.  For example, suppose that the web server
<tt>my.server.com</tt> stores pages for the URL
<tt>http://my.server.com/~fred</tt> in the local directory <tt>/home/fred/www</tt>.
In this case, you should run the following command:
<p>
<pre>
    % make_growfs /home/fred/www
</pre>
Now, others may perceive the web server as a file server under the /grow
hierarchy.  For example:
<pre>
    % parrot_run tcsh
    % cd /grow/my.server.com/~fred
    % ls -la
</pre>

In addition to providing precise directory metadata, GROW offers
two additional advantages over plain HTTP:
<dir>
<li><b>Aggressive Caching.</b>  GROW caches files in an on-disk
cache, but unlike plain HTTP, does not need to issue up-to-date
checks against the server.   Using the cached directory metadata,
it can tell if a file is up-to-date without any network communication.
The directory is only checked for changes at the beginning of program
execution, so changes become visible only to newly executed programs.
<li><b>SHA-1 Integrity.</b>
<tt>make_growfs</tt> generates SHA-1 checksums on the directory
and each file so that the integrity of the system can be verified at
runtime.
If a checksum fails, GROW will attempt to reload the file or directory
listing in order to repair the error, trying until the master timeout
(set by the -T option) expires.  This will also occur if the underlying
files have been modified and <tt>make_growfs</tt> has not yet been re-run.
If necessary, checksums can be disabled by giving the <tt>-k</tt> option
to either Parrot or <tt>make_growfs</tt>.
</dir>
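<p>
The integrity check can be pictured with the following Python sketch, which computes the SHA-1 digest of a file and compares it with a recorded value. The format of the listing written by <tt>make_growfs</tt> is not described in this manual, so the sketch assumes the expected checksum is already available as a hexadecimal string.

```python
import hashlib


def sha1_of_file(path, block_size=65536):
    """Return the hexadecimal SHA-1 digest of the file at path."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read in blocks so large files need not fit in memory.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def is_intact(path, expected_hex):
    """True if the file's SHA-1 matches the recorded checksum."""
    return sha1_of_file(path) == expected_hex.lower()
```

A mismatch corresponds to the case in which GROW would reload the file or directory listing.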

<h3>CVMFS - CernVM-FS</h3>

<p>CVMFS is a read-only filesystem, which was initially based on GROW.
It is used within CernVM-FS to provide access to software
repositories.  It may be used outside of CernVM-FS by mounting it as a
FUSE module.  Parrot makes it possible to access CVMFS in cases where
mounting CVMFS via FUSE is not an option.  Like GROW, CVMFS makes use
of web proxies and local disk caching for scalability.  For security,
data integrity is verified with cryptographic checksums.  For
increased reliability and performance, CVMFS repositories may also be
mirrored in multiple locations and accessed via groups of
load-balanced web proxies, with fail-over between groups.

<p>The CVMFS repositories hosted by the <a
href=http://cernvm.cern.ch/>CernVM project</a> are enabled by default.
To access a different repository, you must tell Parrot how to reach
it.  This may be done with the
<tt>-r</tt> option or with the <tt>PARROT_CVMFS_REPO</tt> environment
variable.  The repository configuration syntax is

<pre>
repo_name:options  repo_name2:option2  ...
</pre>

<p>The repository with <tt>repo_name</tt> is used when the parrot user
attempts to access the matching path <tt>/cvmfs/repo_name/...</tt>.
The configured repository name may begin with <tt>*</tt>, which acts
as a wildcard, matching one or more characters in the requested
repository name.  This is useful when multiple repositories are hosted
at the same site and all configuration details are the same except the
beginnings of the repository names, as in <tt>atlas.cern.ch</tt> and
<tt>cms.cern.ch</tt>.  Any <tt>*</tt> appearing in the options is
replaced by the characters in the requested path that were matched by
the <tt>*</tt> in the configured repository name.  If a cvmfs path
matches more than one configured repository, the last one appearing in
the configuration takes precedence.
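<p>
The wildcard and precedence rules just described can be sketched in Python. This is a reading of the rules in the paragraph above, not Parrot's implementation; the function names and the example URLs are hypothetical, and backslash-escaped asterisks in the options are not handled.

```python
def match_repo(requested, configured):
    """Return the text matched by a leading '*', '' for an exact match,
    or None if the configured name does not match."""
    if configured.startswith("*"):
        suffix = configured[1:]
        matched = requested[:len(requested) - len(suffix)]
        # The wildcard must match one or more characters.
        if matched and requested.endswith(suffix):
            return matched
        return None
    return "" if requested == configured else None


def select_repo(requested, entries):
    """entries is a list of (name, options); the last match wins."""
    chosen = None
    for name, options in entries:
        matched = match_repo(requested, name)
        if matched is not None:
            if name.startswith("*"):
                chosen = options.replace("*", matched)
            else:
                chosen = options
    return chosen


entries = [
    ("*.cern.ch", "url=http://cvmfs.example.org/cvmfs/*.cern.ch"),
    ("cms.cern.ch", "url=http://other.example.org/cms"),
]
print(select_repo("atlas.cern.ch", entries))
# prints url=http://cvmfs.example.org/cvmfs/atlas.cern.ch
```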

<p>The format of the repository options is

<pre>
option1=value1,option2=value2,...
</pre>

<p>Literal spaces, tabs, newlines, asterisks, equal signs, or commas
in the options must be preceded by a backslash to avoid being
interpreted as delimiters.  If the same option is specified more than
once, the last value takes precedence.  The possible options are
listed in the table below.  The <tt>url</tt> option is required and
has no default.  The <tt>proxies</tt> option is required and defaults
to the proxy used by parrot, if any.

<p>
<table>
<tr><td><b>url=URL</b> <td>The URL of the CernVM-FS server(s): 'url1;url2;...'
<tr><td><b>proxies=HTTP_PROXIES</b>   <td>Set the HTTP proxy list, such as 'proxy1|proxy2'; default is given by -P option (HTTP_PROXY)
<tr><td>                              <td>Proxies separated by '|' are randomly chosen for load balancing.
                                          Groups of proxies separated by ';' may be specified for failover.
                                          If the first group fails, the second group is used, and so on down the chain.
<tr><td><b>cachedir=DIR</b>           <td>Where to store disk cache; default is within parrot temp directory (-t option)
<tr><td><b>timeout=SECONDS</b> <td>Timeout for network operations; default is given by -T option (PARROT_TIMEOUT)
<tr><td><b>timeout_direct=SECONDS</b> <td>Timeout for network operations without proxy; default is given by -T option (PARROT_TIMEOUT)
<tr><td><b>max_ttl=MINUTES</b>        <td>Maximum TTL for file catalogs; default: take from catalog
<tr><td><b>allow_unsigned</b>         <td>Accept unsigned catalogs (allows man-in-the-middle attacks)
<tr><td><b>whitelist=URL</b>          <td>HTTP location of trusted catalog certificates (default is /.cvmfswhitelist)
<tr><td><b>pubkey=PEMFILE</b>         <td>Public RSA key that is used to verify the whitelist signature.
<tr><td><b>rebuild_cachedb</b>        <td>Force rebuilding the quota cache db from cache directory
<tr><td><b>quota_limit=MB</b>         <td>Limit size of cache. -1 (the default) means unlimited.
<tr><td>                              <td>If not -1, files larger than quota_limit-quota_threshold will not be readable.
<tr><td><b>quota_threshold=MB</b>     <td>Cleanup cache until size is &lt;= threshold
<tr><td><b>deep_mount=prefix</b>      <td>Path prefix if a repository is mounted on a nested catalog
<tr><td><b>repo_name=NAME</b>         <td>Unique name of the mounted repository; default is the name used for this configuration entry
<tr><td><b>mountpoint=PATH</b>        <td>Path to root of repository; default is /cvmfs/repo_name
<tr><td><b>blacklist=FILE</b>         <td>Local blacklist for invalid certificates.  Has precedence over the whitelist.
</table>
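<p>
When building an options string from a program, the backslash rule for special characters is easy to get wrong. The Python sketch below applies that rule to option values. The helper names are hypothetical. Whether a literal backslash itself needs escaping is not stated in this manual, so the sketch leaves backslashes untouched, and it does not cover value-less options such as <tt>allow_unsigned</tt>.

```python
# Characters the manual lists as needing a preceding backslash.
SPECIAL = " \t\n*=,"


def escape_value(text):
    """Put a backslash before each character listed as special."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)


def format_options(options):
    """Join (name, value) pairs as name=value,name=value,..."""
    return ",".join(name + "=" + escape_value(value) for name, value in options)


print(format_options([
    ("url", "http://cvmfs.server.edu/cvmfs/my.repo.edu"),
    ("cachedir", "/tmp/my cache"),
]))
```

The result can then be appended to a repository name, as in <tt>my.repo:options</tt>.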

<p>Setting the CVMFS configuration overrides the default
configuration.  If it is desired to configure additional repositories
but still retain the default repositories, the configuration entry
<nobr><tt>&lt;default-repositories&gt;</tt></nobr> may be put in the
configuration string.  Example:

<pre>
parrot_run -r '&lt;default-repositories&gt; my.repo:url=http://cvmfs.server.edu/cvmfs/my.repo.edu' ...
</pre>

<p>The configuration of the default repositories may be modified by
specifying additional options using the syntax
<nobr><tt>&lt;default-repositories&gt;:option1=value1,option2=value2</tt></nobr>.

<p>The CVMFS library can only support one active repository open at a time.
If a program attempts to access more than one repository in a session,
Parrot will emit an error.  If the optional argument --cvmfs-repo-switching
is given to <tt>parrot_run</tt>, then Parrot will switch repositories at
run-time by re-initializing the CVMFS library.  This feature is experimental.
</p>

<h3>Hadoop Distributed File System (HDFS)</h3>

HDFS is the primary distributed filesystem used in the <a
    href="http://hadoop.apache.org">Hadoop</a> project.  Parrot supports read
and write access to HDFS systems using the <tt>parrot_run_hdfs</tt> wrapper.  This
script checks that the appropriate environmental variables are defined and
calls <tt>parrot</tt>.
<p>
In particular, you must ensure that you define the following environmental
variables:

<pre>
JAVA_HOME     Location of your Java installation.
HADOOP_HOME   Location of your Hadoop installation.
</pre>

Based on these environmental variables, <tt>parrot_run_hdfs</tt> will attempt to
find the appropriate paths for <tt>libjvm.so</tt> and <tt>libhdfs.so</tt>.
These paths are stored in the environmental variables <tt>LIBJVM_PATH</tt> and
<tt>LIBHDFS_PATH</tt>, which are used by the HDFS Parrot module to load the
necessary shared libraries at run-time.  To avoid the startup overhead of
searching for these libraries, you may set the paths manually in your
environment before calling <tt>parrot_run_hdfs</tt>, or you may edit the script
directly.

<p>

Note that while Parrot supports read access to HDFS, it only provides
write-once support on HDFS.  This is because the current implementations of
HDFS do not provide reliable append operations.  Likewise, files can only be
opened in either read (<tt>O_RDONLY</tt>) or write mode (<tt>O_WRONLY</tt>),
and not both (<tt>O_RDWR</tt>).
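<p>
An application can check its own open flags against this restriction before touching a path under <tt>/hdfs</tt>. The Python sketch below shows one way to do so on Linux; the function name is hypothetical and the check is only an illustration of the rule stated above, not part of Parrot.

```python
import os


def hdfs_open_allowed(flags):
    """True for read-only or write-only opens, False for read-write."""
    # O_ACCMODE masks out everything except the access mode bits.
    mode = flags & os.O_ACCMODE
    return mode in (os.O_RDONLY, os.O_WRONLY)


print(hdfs_open_allowed(os.O_WRONLY | os.O_CREAT))
print(hdfs_open_allowed(os.O_RDWR))
```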

<h2>Identity Boxing</h2>

Parrot provides a unique feature known as <i>identity boxing</i>.
This feature allows you to run a (possibly) untrusted program
within a protection domain, as if it were run in a completely separate account.
Using an identity box, you do not need to become root or even
to manage account names: you can create any identity that you like on the fly.
<p>
For example, suppose that you wish to allow a friend to log into your
private workstation.  Instead of creating a new account, simply use a
script supplied with Parrot to create an identity box:
<p>
<pre>
% whoami
dthain
% parrot_identity_box MyFriend
% whoami
MyFriend
% touch ~dthain/private-data
touch: creating `~dthain/private-data': Permission denied
</pre>
<p>
Note that the shell running within the identity box cannot change or
modify any of the supervising user's data.  In fact, the contained
user can only access items that are world-readable or world-writable.
<p>
You can give the contained user access to other parts of the filesystem
by creating access control lists (ACLs).  An ACL is a list of users
and the resources that they are allowed to access.  Each directory
has its own ACL in the file <tt>.__acl</tt>.  This file does not
appear in a directory listing, but you can read and write it just the same.
<p>
For example, <tt>MyFriend</tt> above can see his initial ACL as follows:
<p>
<pre>
% cat .__acl
MyFriend rwlxa
</pre>

This means that <tt>MyFriend</tt> can read, write, list, execute, and
administer items in the current directory.  Now, suppose that <tt>MyFriend</tt>
wants to allow <tt>Freddy</tt> read access to the same directory.
Simply edit the ACL file to read:
<pre>
MyFriend rwlxa
Freddy   rl
</pre>
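<p>
The ACL format shown above (one subject per line, followed by its rights letters) is simple enough to inspect from a script. The Python sketch below parses such text and answers whether a user holds a given right. It covers only the exact-name entries shown here; any richer matching rules are described in the Chirp manual and are not modeled, and the function names are hypothetical.

```python
def parse_acl(text):
    """Map each subject name to its set of rights letters."""
    acl = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            acl[fields[0]] = set(fields[1])
    return acl


def allows(acl, user, right):
    """True if user holds the given single-letter right.

    Letters follow the order given in the text: r(ead), w(rite),
    l(ist), x (execute), a(dminister).
    """
    return right in acl.get(user, set())


acl = parse_acl("MyFriend rwlxa\nFreddy   rl\n")
print(allows(acl, "Freddy", "r"), allows(acl, "Freddy", "w"))
# prints True False
```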

Identity boxing and ACLs are particularly useful when using distributed
storage.  You can read more about ACLs and identity boxing in the
<a href=chirp.html>Chirp</a> manual.

<h2>64-Bit Support</h2>

In all modes, Parrot supports applications that access large (>2GB) files
that require 64-bit seek pointers.   However, we have found that many tools
and filesystems do not manipulate such large files properly.  If possible,
we advise users to break up files into smaller pieces for processing.
<p>
Parrot supports 64 bit programs and processors in the following combinations:
<center>
<table border=1>
<tr> <td colspan=2 align=center> <b>Program Type</b>
<tr> <td> 32-bit <td> 64-bit <td> <b>CPU Type</b>
<tr> <td><font size=+1 color=green>YES</font> <td> <font size=+1 color=red>NO</font><td> Parrot for 32-bit X86 CPU    <br> Pentium, Xeon, Athlon, Sempron
<tr> <td><font size=+1 color=green>YES</font> <td> <font size=+1 color=green>YES</font> <td > Parrot for 64-bit X86_64 CPU <br> Opteron, Athlon64, Turion64, Sempron64
</table>
</center>

<h2>For More Information</h2>

For the latest information on Parrot, visit the home page:

<dir>
<li> <a href=http://www.nd.edu/~ccl/software/parrot>Parrot Home Page</a>
</dir>

An exhaustive list of all options and commands can be found in the manual pages:

<dir>
<li> <a href=man/parrot_run.html>parrot_run</a>
<li> <a href=man/parrot_run_hdfs.html>parrot_run_hdfs</a>
<li> <a href=man/parrot_cp.html>parrot_cp</a>
<li> <a href=man/parrot_md5.html>parrot_md5</a>
<li> <a href=man/parrot_getacl.html>parrot_getacl</a>
<li> <a href=man/parrot_setacl.html>parrot_setacl</a>
<li> <a href=man/parrot_mkalloc.html>parrot_mkalloc</a>
<li> <a href=man/parrot_lsalloc.html>parrot_lsalloc</a>
<li> <a href=man/parrot_locate.html>parrot_locate</a>
<li> <a href=man/parrot_timeout.html>parrot_timeout</a>
<li> <a href=man/parrot_whoami.html>parrot_whoami</a>
</dir>