/usr/share/fastdnaml/fastdnaml_help is in fastdnaml 1.2.2-11+b1.

                             fastDNAml 1.2


Gary J. Olsen, Department of Microbiology
University of Illinois, Urbana, IL
gary@phylo.life.uiuc.edu

Ross Overbeek, Mathematics and Computer Science
Argonne National Laboratory, Argonne, IL
overbeek@mcs.anl.gov



Citing fastDNAml

If you publish work using fastDNAml, please cite the following publications:

   Olsen, G. J., Matsuda, H., Hagstrom, R., and Overbeek, R.  1994.  fastDNAml:
   A tool for construction of phylogenetic trees of DNA sequences using maximum
   likelihood.  Comput. Appl. Biosci. 10: 41-48.

   Felsenstein, J.  1981.  Evolutionary trees from DNA sequences:
   A maximum likelihood approach.  J. Mol. Evol. 17: 368-376.



What is fastDNAml?

fastDNAml is a program derived from Joseph Felsenstein's version 3.3 DNAML
(part of his PHYLIP package).  Users should consult the documentation for
DNAML before using this program.

fastDNAml is an attempt to solve the same problem as DNAML, but to do so
faster and using less memory, so that larger trees and/or more bootstrap
replicates become tractable.  Much of fastDNAml is merely a recoding of the
PHYLIP 3.3 DNAML program from PASCAL to C.

DNAML includes the following notice:

version 3.3. (c) Copyright 1986, 1990 by the University of Washington and
Joseph Felsenstein.  Written by Joseph Felsenstein.  Permission is granted to
copy and use this program provided no fee is charged for it and provided that
this copyright notice is not removed.



Why is fastDNAml faster?

Some recomputation of values has been eliminated (Joe Felsenstein has done
much of this in version 3.4 DNAML).

The optimization of branch lengths has been accelerated by changing from an EM
method to Newton's method (Joe Felsenstein has done much of this in version 3.4
DNAML).

The strategy for simultaneously optimizing all of the branches on the tree has
been modified to spend less time getting an individual branch right before
improving the other branches.



Other new features in fastDNAml

fastDNAml includes a checkpoint feature to regularly save its progress toward
finding a large tree.  If the program is interrupted, a minor change to the
input file and adding the R (restart) option permits the work to be resumed
from the last checkpoint.

The new R (restart) option can also be used for more rapid addition of new
sequences to a previously computed tree (when new sequences are added to the
alignment, it is best if the relative alignment of the previous sequences is
not altered).

The G (global) option has been generalized to permit crossing any number of
branches during tree rearrangements.  In addition, it is possible to modify
the extent of rearrangement explored during the sequential addition phase of
tree building.

The G U (global and user tree) option combination instructs the program to
find the best of the user trees, and then look for rearrangements that are
better still.

The number of available rate categories has been raised from 9 to 35.

The weighting mask accepts values from 0 through 35.

The new B (bootstrap) option causes generation of a bootstrap sample, drawn
from the input data.

The program includes "P4" code for distributing the problem over multiple
processors (either within one machine, or across multiple machines).



Do DNAML and fastDNAml give the same answer?

Generally yes, though there are some reservations:

One or the other might find a better tree due to minor changes in the ways
trees are searched.  When sequence addition is replicated with different
values of the jumble random number seed, they have about the same probability
of finding the best tree, but any given seed might give different trees.

The likelihoods and branch lengths sometimes differ very slightly due to
different criteria for stopping the optimization process.

Little has been done to check the confidence limits on branch lengths.  There
seem to be some instances in which they disagree, and we think that fastDNAml
is correct.  However, do not take the "significantly greater than zero" too
seriously.

If you are concerned, you can supply a tree inferred by fastDNAml as a user
tree to DNAML and let it (1) reoptimize branch lengths, (2) tell you
the confidence limits and (3) tell you the tree likelihood.



Changes and new features in version 1.2

The program can now calculate the likelihood of extremely large user trees.
The largest tree we have tested had 3200 taxa.  Generally, you will run out
of computer memory before you exceed an intrinsic limitation.  (With this,
it is possible to compare trees found by whatever your favorite methods are
under the likelihood criterion.)

The computation has been changed to permit ease of implementing new models
of evolution and analysis of amino acid sequences (though these have not yet
been done).  This has slowed down the program 5-10%.



Changes and new features in version 1.1

The quickadd option is now the default.  This has the ugly effect of reversing
the meaning of putting a Q on the option line.  (Sorry about this, and the
next note, but in the long run it is the better behavior.)

Use of empirical base frequencies is now the default.  This reverses the
meaning of the F option, making the default behavior more like that of PHYLIP.

The tree output file is now generated by default and should be more compatible
with the files written and read by the PHYLIP programs.  In particular, the
comments with information about the tree, its likelihood, etc. are removed, and
there are no quotation marks around names unless there are unusual characters
within the name.  (There are two things to be very careful about in names:
there is no completely consistent way to handle both blanks and underscores in
names without quotation marks, and when a name is spaced in from the margin in
the input file, there are leading blank spaces in the name, which can be very
hard to make compatible with some programs.)

The program now maintains a list of the several best trees, not just the
(single) best.  In particular, when evaluating user-supplied trees, the
program tries to save information about all of the trees and provides a
Hasegawa and Kishino type test of whether each tree is significantly worse
than the best.  Note, the current version of the program prints the report in
the order of tree likelihood, NOT in the order the trees are supplied to the
program.  The best way (at present) to figure out which tree is which is to
look at the likelihoods.  This is the same test used in PHYLIP, but I had
removed access in version 1.0 of fastDNAml due to differences in how the
programs handle multiple trees.  The difference is that fastDNAml can maintain
nearly optimal trees all the time, so you can get a list of the N best trees
found by using the new K option (below).

The program should accept rooted trees (strictly bifurcating), as well as
unrooted trees (with a trifurcation at the deepest level).  This is not fully
tested, but it seems to work.



Features in the works

Test subtree exchanges (as well as moving a single subtree) in the search for
better trees.

Allowing the program to optimize any user-defined subset of branches when user
lengths are supplied.



Input and Options


Basics

The input to fastDNAml is similar to that used by DNAML (and the other PHYLIP
programs).  The user should consult the PHYLIP documentation for a basic
description of the format.

This version of fastDNAml expects to get its input from stdin (standard input)
and writes its output to stdout (standard output).  (There are compile time
options to modify this, for those who care to get into such things.)

On a UNIX or DOS system, it is a simple matter to redirect input from a file
and output to a file:

  fastDNAml < infile > outfile

On a VMS system it is only slightly more difficult.  Immediately before
running the program, one includes two commands that define the input and
output files:

  $ Define/User  Sys$Input   infile
  $ Define/User  Sys$Output  outfile
  $ Run fastDNAml

The default input data format is Interleaved (see I option).  To help get data
from a GenBank or similar format, the interleaved option can be switched off
with the I option.  Numbers in the sequence data (i.e., sequence position
numbers) will be ignored, so they need not be stripped out.

(Note that the program also writes a file called checkpoint.PID.  See the R
option below for more description.)


1 -- Print Data

By default, fastDNAml does not echo the sequence data to the output file.
Option 1 reverses this.


3 -- Do Not Print Tree

By default, fastDNAml prints the final tree to the output file.  Option 3
reverses this.


4 -- Do Not Write Tree to File  (*****  Changed in version 1.1 *****)

By default, fastDNAml versions 1.1 and 1.2 write a machine readable (Newick
format) copy of the final tree to an output file.  Option 4 reverses this.
The tree output file will be called treefile.PID (where PID is the process ID
under which fastDNAml is running).  Look at the Y option below for more
information on alternative tree formats.


B -- Bootstrap

Generates a bootstrap sample of the input data.  Requires auxiliary data line
of the form:

  B  random_number_seed

Example:

  5  114  B
  B  137
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

If the W option is used, only positions that have nonzero weights are used in
computing the bootstrap sample.  Warning:  For a given random number seed, the
sample will always be the same.

PHYLIP DNAML does not include a bootstrap option.  (Use the SEQBOOT program.)


C -- Categories

Requires auxiliary data of the form:

  C  number_of_categories  list_of_category_rates

The maximum number of categories is 35.  This line is followed by a list of
the categories for each site:

  Categories  list_of_categories  [per site, one or more lines]

Category "numbers" are ordered: 1, 2, 3, ..., 9, A, B, ..., Y, Z.  Category
zero (undefined rate) is permitted at sites with a zero in a user-supplied
weighting mask.

Example:

  5  114  C
  C  12  0.0625  0.125  0.25  0.5  1  2  4  8  16  32  64  128
  Categories  5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
              633792246624457364222574877188898132984963499AA9899975
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

PHYLIP DNAML is limited to categories 1 through 9.  Also, in PHYLIP version
3.3, the categories data came after all the other auxiliary data, but before
the user-supplied base frequencies and sequence data.  If you make the C line
your last auxiliary data line, the programs will behave the same.
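
To illustrate how the category characters map onto rates, here is a minimal
Python sketch.  It is not part of fastDNAml, and the names category_number and
site_rates are made up for this example.  It assumes only the ordering given
above (1 through 9, then A through Z, with 0 meaning an undefined rate).

```python
import string

# Category symbols in the order given above: 1..9, then A..Z (35 in all).
CATEGORY_SYMBOLS = string.digits[1:] + string.ascii_uppercase


def category_number(symbol):
    """Return the category number (0..35) for one Categories character."""
    if symbol == "0":
        return 0  # undefined rate; only sensible where the weight is zero
    return CATEGORY_SYMBOLS.index(symbol) + 1  # ValueError if unknown


def site_rates(c_line, categories):
    """Map each site's category symbol to its rate (None for category 0)."""
    fields = c_line.split()
    count = int(fields[1])  # fields[0] is the letter C
    rates = [float(x) for x in fields[2:2 + count]]
    if len(rates) != count:
        raise ValueError("fewer rates than categories on the C line")
    result = []
    for symbol in "".join(categories.split()):
        number = category_number(symbol)
        if number > count:
            raise ValueError("category %r exceeds %d" % (symbol, count))
        result.append(None if number == 0 else rates[number - 1])
    return result


c_line = "C  12  0.0625  0.125  0.25  0.5  1  2  4  8  16  32  64  128"
print(site_rates(c_line, "51A0"))
```

With the C line from the example above, category 5 selects the rate 1 and
category A (the tenth category) selects the rate 32.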


F -- Empirical Frequencies  (*****  Changed in version 1.1 *****)

By default (starting with version 1.1), the program uses base frequencies
derived from the sequence data (called empirical base frequencies).  Therefore
the input file should normally NOT include a base frequencies line preceding
the data.  If you want to supply your own base frequency data, it is now
necessary to use the F option and add a line to the input file that supplies
the frequency data.

The F option instructs the program to use user-supplied base frequencies
rather than frequencies derived from the sequence data.  In the basic format,
the base frequencies line comes IMMEDIATELY before the sequence data:

  5  114  F
  0.25  0.30  0.20  0.25
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

There is an alternative format: the frequencies can be anywhere in the list of
auxiliary data lines if they are preceded by an F in the first column:

  5  114  F C W
  F 0.25  0.30  0.20  0.25
  C ...
    ...
  W ...
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...
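
For intuition about what "empirical base frequencies" means, the following
Python sketch counts bases in a set of sequences.  It is only an illustration:
the function name is hypothetical, ambiguity codes (N, R, Y, ...) are simply
skipped, and the A, C, G, T order is assumed from the PHYLIP convention.  The
way fastDNAml itself treats ambiguous positions may differ.

```python
from collections import Counter


def empirical_base_frequencies(sequences):
    """Return [fA, fC, fG, fT] computed from unambiguous bases only."""
    counts = Counter()
    for seq in sequences:
        for base in seq.upper():
            if base == "U":
                base = "T"  # treat RNA U as T
            if base in ("A", "C", "G", "T"):
                counts[base] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return [counts[b] / total for b in "ACGT"]


freqs = empirical_base_frequencies(["AACG", "TTNN"])
print("  ".join("%.3f" % f for f in freqs))
```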


G -- Global

If the global option is specified, there may also be an [optional] auxiliary
data line of form:

  G  N1

or

  G  N1  N2

N1 is the number of branches to cross in rearrangements of the completed tree.
The value of N2 is the number of branches to cross in testing rearrangements
during the sequential addition phase of tree inference.

  N1 = 1:            local rearrangement (default without G option)

  1 < N1 < numsp-3:  regional rearrangements (crossing N1 branches)

  N1>= numsp-3:      global rearrangements (default with G option)



  N2 <= N1           the default N2 is 1, local rearrangements.
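
The thresholds above can be restated as a small Python sketch.  The function
name is made up for this illustration, and the treatment of a value of 0 as
"no rearrangements" is taken from the "G 0 0" example in the Q option
description rather than from the list above.

```python
def rearrangement_extent(n, numsp):
    """Classify a branch-crossing count (N1 or N2) for a tree of numsp taxa."""
    if n < 0:
        raise ValueError("number of branches to cross cannot be negative")
    if n == 0:
        return "none"      # as in "G 0 0"
    if n == 1:
        return "local"     # default without the G option
    if n >= numsp - 3:
        return "global"    # default with the G option
    return "regional"      # 1 < n < numsp - 3


for n1 in (0, 1, 4, 17):
    print(n1, rearrangement_extent(n1, 20))
```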

The G option can also be used to force branch swapping on user trees, that is,
a combination of G and U options.

If the auxiliary line is supplied, it cannot be the last line of auxiliary
data.  (It may be necessary to add the T option with an auxiliary data line of

  T 2.0

if no other auxiliary data are used.)

Examples:

Do local rearrangements after each addition, and global after last addition:

  5  114  G
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

Do local rearrangements after each addition, and regional (crossing 4
branches) after last addition:

  5  114  G T
  G  4
  T  2.0
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

Do no rearrangements after each addition, and local after last addition:

  5  114  G T
  G  1 0
  T  2.0
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

PHYLIP DNAML does not support the auxiliary data line or branch swapping on a
user tree.


I -- Not Interleaved

By default, fastDNAml 1.2 expects data lines for the various sequences in an
interleaved format (as did PHYLIP 3.3 DNAML).  The I option reverses the
expected format (to non-interleaved data, in which all the data lines for one
sequence appear before the next sequence begins).  This is particularly useful
for editing a GenBank or equivalent format into a valid input file (note that
numbers within the sequence data are ignored, so it is not necessary to remove
them).

If all the data for each sequence are on one line, then the interleaved and
non-interleaved formats are identical.  (This is the way David Swofford's
PAUP program writes PHYLIP format output files.)  The drawback is that many
programs do not handle long lines of text.  This includes the vi and EDT text
editors, many electronic mail programs, and some versions of FTP for VAX/VMS
systems.

PHYLIP 3.3 DNAML expects interleaved data, and does not include an I option to
alter this.  PHYLIP 3.4 DNAML accepts an I option, but the default format is
reversed.
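
As a rough illustration of the interleaved layout, the Python sketch below
reassembles each sequence from an interleaved data file.  It is not the
program's own reader.  It assumes a file with no option letters, no auxiliary
data lines and no trailing trees, and it assumes that names occupy the first
10 columns (the PHYLIP convention).  Blank lines, white space and digits in
the data are skipped, as described above.

```python
def read_interleaved(text, name_width=10):
    """Return a list of (name, sequence) pairs from interleaved data."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    numsp, nsites = int(header[0]), int(header[1])
    names = []
    chunks = [[] for _ in range(numsp)]
    for i, line in enumerate(lines[1:]):
        if i < numsp:
            # First block: the name field precedes the sequence data.
            names.append(line[:name_width].strip())
            line = line[name_width:]
        # Drop white space and position numbers, keep everything else.
        data = "".join(c for c in line if not (c.isspace() or c.isdigit()))
        chunks[i % numsp].append(data)
    sequences = ["".join(parts) for parts in chunks]
    for name, seq in zip(names, sequences):
        if len(seq) != nsites:
            raise ValueError("%s has %d sites, expected %d"
                             % (name, len(seq), nsites))
    return list(zip(names, sequences))


example = """2  12
Alpha       ACGT 10 AC
Beta        ACGA    AC

            GTACGT
            GTACGA
"""
for name, seq in read_interleaved(example):
    print(name, seq)
```

Writing each pair back out on consecutive lines gives the non-interleaved
form that the I option selects.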


J -- Jumble

Randomize the sequence addition order.  Requires an auxiliary input line of
the form:

  J  random_number_seed

Example:

  5  114  J
  J  137
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

Note that fastDNAml explores a very small number of alternative tree
topologies relative to a typical parsimony program.  There is a very real
chance that the search procedure will not find the tree topology with the
highest likelihood.  Altering the order of taxon addition and comparing the
trees found is a fairly efficient method for testing convergence.  Typically,
it would be nice to find the same best tree at least twice (if not three
times), as opposed to simply performing some fixed number of jumbles and
hoping that at least one of them will be the optimum.
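
The stopping rule suggested above can be sketched in Python as follows.  This
is only an illustration with hypothetical names.  It assumes that each jumble
run has already been summarized as a pair (topology, log likelihood), and that
the topology strings have been put into some canonical form so that identical
trees compare equal; fastDNAml does not produce such a summary itself.

```python
def best_tree_replicated(results, required=2):
    """True if the best-scoring topology was found at least `required` times.

    results is a list of (canonical_topology, log_likelihood) pairs,
    one per jumble seed.
    """
    if not results:
        return False
    best_topology, _ = max(results, key=lambda pair: pair[1])
    hits = sum(1 for topology, _ in results if topology == best_topology)
    return hits >= required


runs = [
    ("((A,B),C,(D,E))", -1234.5),
    ("((A,C),B,(D,E))", -1240.1),
    ("((A,B),C,(D,E))", -1234.5),
]
print(best_tree_replicated(runs[:2]))  # best tree seen only once so far
print(best_tree_replicated(runs))      # best tree now seen twice
```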


K -- Keep multiple best trees  (***** New in version 1.1 *****)

The program can keep a list of the best trees that it has found.  When the
program is done, it prints a list of these, from best to worst, and prints
a Hasegawa and Kishino type test as to which trees are significantly worse
than the best tree found.  When evaluating user-supplied trees, the program
automatically keeps all trees.  In other situations, the program keeps only
the best tree that it has found.  The K option, and associated auxiliary data
line, can be used to define an alternative number:

Example, to keep the 15 best trees found:

  5  114  K
  K  15
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

Example, to keep only the one best tree of possibly numerous user-supplied
trees:

  5  114  K  U
  K  1
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...



L -- User Lengths

Causes user trees to be read with branch lengths (and it is an error to omit
any of them).  Without the L option, branch lengths in user trees are not
required, and are ignored if present.

Example:

  5  114  U L
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

(The U is for user tree and the L for user lengths)


O -- Outgroup

Use the specified sequence number for the outgroup.  Requires an auxiliary
data line of the form:

  O  outgroup_number

Example:

  5  114  O
  O 5
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

This option only affects the way the tree is drawn (and written to the
treefile).



Q -- Quickadd  (***** Changed in version 1.1 *****)

The quickadd feature greatly decreases the time in initially placing a new
sequence in the growing tree (but does not change the time required to
subsequently test rearrangements).  The overall time savings seems to be about
30%, based on a number of test cases.  Its downside, if any, is unknown.  This
is now (starting in version 1.1) the default program behavior.

If the analysis is run with a global option of "G 0 0", so that no
rearrangements are permitted, the tree is built very approximately, but very
quickly.  This may be of greatest interest if the question is, "Where does
this one new sequence fit into this known tree?"  The known tree is provided
with the restart option (below).

PHYLIP DNAML does not include anything comparable to the quickadd feature.

The quickadd feature can be turned OFF by adding a Q to the first line of the
input file.



R -- Restart

The R option causes the program to read a user-supplied tree with fewer than
the full number of taxa as the starting point for sequential addition of the
remaining taxa.  Thus, the sequence data must be followed by a valid (Newick
format) tree.  (The phylip_tree/2, prolog fact format, is now also supported.)

The restart option can also be used to increase the range of the search for
alternative (better) trees.  For example, you can take a tree produced with
only "local" tree rearrangements, and increase the rearrangements to
"regional" or "global" by combining the appropriate global option with the
restart option.  If the starting tree was written by fastDNAml, then the
extent of rearrangements is saved with the tree, and will be used as the
starting point for the additional search.  If the tree was already globally
optimized, then no additional searching will be performed.

To support the R option, after each taxon is added to the growing tree, and
after each round of rearrangements, the program appends a checkpoint tree to a
file called checkpoint.PID, where PID is the process number of the running
fastDNAml program.  The last line of this file needs to be appended to the
input file when the R option is used.  (This should not be confused with the U
(user tree) option, which expects a number followed by that number of trees.
No additional taxa are added to user trees.)

The UNIX utility tail can be used to extract the last tree from the checkpoint
file, and the utility cat can be used to append it to the input.  For example,
the following script can be used to add a starting tree and the R option to a
data file, and restart fastDNAml:

  #! /bin/sh
  # Usage:  restart checkpoint_file < infile
  if test $# -ne 1
    then echo "Usage:  restart checkpoint_file" >&2
    exit 1
  fi
  IFS= read -r first_line     # first line of data file
  echo "$first_line R"        # add restart option
  cat -                       # rest of data file
  tail -n 1 "$1"              # append last tree in checkpoint file

If this shell script is in the file called restart, then one might use the
command:

  restart  checkpoint.21312  < infile  | fastDNAml  > new_outfile
   ^script  ^checkpoint tree    ^data     ^dnaml program  ^output_file

If this is too opaque, don't worry about it, or talk with your local UNIX
wizard.  In the meantime, this and other useful shell scripts are provided
with the program.

PHYLIP DNAML does not write checkpoint trees and does not have a restart
option.
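
For those who prefer not to use the shell, the same restart preparation can
be sketched in Python.  The function name is hypothetical.  It assumes the
data file and the checkpoint file have already been read into strings, and,
unlike tail, it takes the last non-blank line of the checkpoint file.

```python
def build_restart_input(data_text, checkpoint_text):
    """Add the R option to the first line and append the last checkpoint tree."""
    first_line, _, rest = data_text.partition("\n")
    trees = [ln for ln in checkpoint_text.splitlines() if ln.strip()]
    if not trees:
        raise ValueError("checkpoint file contains no trees")
    if rest and not rest.endswith("\n"):
        rest += "\n"
    return first_line.rstrip() + " R\n" + rest + trees[-1] + "\n"


data = "5  114\nSequence1   ACGT\n"
checkpoint = "(A,B,C);\n(A,B,(C,D));\n"
print(build_restart_input(data, checkpoint), end="")
```

The resulting text is what would be supplied to fastDNAml on standard input.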



T -- Transition/transversion ratio

Use a user-specified ratio of transition to transversion type substitutions.
Without the T option, a value of 2.0 is used.  Requires an auxiliary data line
of the form:

  T  ratio

Example:

  5  114  T
  T  1.0
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

(Note that a T option with a value of 2.0 does nothing, but it can provide
a last auxiliary data line following optional auxiliary data.  See the
examples for G and Y.)



U -- User Tree(s)

Read an input line with the number of user-specified trees, followed by the
specified number of trees.  These data immediately follow the sequence data.

The trees must be in Newick format, and terminated with a semicolon.  (The
program also accepts a pseudo_newick format, which is a valid prolog fact.)

The tree reader in this program is more powerful than that in PHYLIP 3.3.  In
particular, material enclosed in square brackets, [ like this ], is ignored as
comments; taxa names can be wrapped in single quotation marks to support the
inclusion of characters that would otherwise end the name (i.e., '(', ')',
':', ';', '[', ']', ',' and ' '); names of internal nodes are properly
ignored; and exponential notation (such as 1.0E-6) for branch lengths is
supported.
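
To show how comments and quoted names interact, here is a small Python sketch
that removes bracketed comments from a tree string while leaving brackets
inside quoted names alone.  It is not the program's tree reader, the function
name is made up, and nested comments are not handled.

```python
def strip_newick_comments(tree):
    """Remove [comments] from a Newick string, respecting 'quoted names'."""
    out = []
    in_quote = False
    in_comment = False
    for ch in tree:
        if in_comment:
            if ch == "]":
                in_comment = False
            continue  # everything inside a comment is dropped
        if ch == "'":
            # A doubled quote inside a name toggles twice, so it is harmless.
            in_quote = not in_quote
            out.append(ch)
        elif ch == "[" and not in_quote:
            in_comment = True
        else:
            out.append(ch)
    return "".join(out)


tree = "[a comment](A:0.1,'B [x]':1.0E-6,C[note]:0.2);"
print(strip_newick_comments(tree))
print(float("1.0E-6"))  # exponential notation parses as an ordinary number
```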



W -- Weights

Read user-specified column weighting information.  This option requires
auxiliary data of the form:

  Weights     list_of_weight_values    [per site, one or more lines]

Example:

  5  114  W
  Weights     111111111111001100000100011111100000000000000110000110000000
              111101111111111111111111011100000111001011100000000011
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

It is necessary that the weight values not start before the 11th character in
the line, or some of them will be lost.  Weights from 0 to 35 are indicated by
the series: 0, 1, 2, 3, ..., 9, A, B, ..., Y, Z.

PHYLIP DNAML does not support user weights with values other than 1 or 0.
This limit has been removed in fastDNAml to permit the use of user weights
as a mechanism for representing a bootstrap sample (that is, only the
auxiliary data lines change, not the body of the data file).
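
As an illustration of that idea, the Python sketch below turns a 0/1 weighting
mask into a bootstrap sample expressed as weights.  The names are hypothetical
and the mask is assumed to contain only 0 and 1.  Because it uses Python's own
random number generator, a given seed will not reproduce the sample that the
B option draws for that seed.

```python
import random

# Weight symbols for 0..35, in the order described above.
WEIGHT_SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def bootstrap_weights(mask, seed):
    """Resample the sites with nonzero weight; return the new weight string."""
    rng = random.Random(seed)
    eligible = [i for i, w in enumerate(mask) if w != "0"]
    counts = [0] * len(mask)
    # Draw as many sites as are eligible, with replacement.
    for _ in eligible:
        counts[rng.choice(eligible)] += 1
    if counts and max(counts) > 35:
        raise ValueError("a site was drawn more than 35 times")
    return "".join(WEIGHT_SYMBOLS[c] for c in counts)


mask = "1111001111"
print("Weights     " + bootstrap_weights(mask, 137))
```

Sites masked with 0 keep a weight of 0, and the new weights sum to the number
of sites that were eligible for sampling.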



Y -- Write Tree  (*****  Changed in version 1.1  *****)

fastDNAml writes the final tree to an output file called treefile.PID.  By
default the tree is in PHYLIP format.  The Y option allows turning this off,
or changing the format of the tree.

The Y option by itself toggles the saving of the tree, on or off.  If there
is also an auxiliary input line of the form:

  Y number

where number can be 1, 2, or 3, the number selects one of three tree output
formats:

  1  Newick
  2  Prolog
  3  PHYLIP (default)

Newick is the tree standard used by PAUP, MacClade, and several other
programs.  The tree includes a comment about the analysis that the tree is
based upon.  fastDNAml uses this comment when it reads a tree.  In addition,
the names of the taxa are enclosed in quotation marks.  Both of these
features of the file make it incompatible with the PHYLIP package.

PHYLIP is the subset of the Newick tree standard used by programs in the
PHYLIP package.  There are no comments and no quotation marks around names.
(If a name includes unusual characters, such as a comma, fastDNAml will put
it in quotation marks, making it a valid tree, but it cannot be read by the
PHYLIP programs.)

The Prolog format is very similar to the Newick format, but it is a valid prolog
fact that permits direct loading into some sequence analysis tools that we
use.  The structure of the term is:

  pseudo_newick([Comment], (Subtree1, Subtree2, Subtree3): Length).

where each subtree is either

  (Subtree1,Subtree2): Length

or

  Label: Length

The comment is a valid prolog term when && is defined as a unary operator.
Label is a prolog atom (it is a valid Newick label, with single quotation
marks).  Length is a number.

Because the Y auxiliary input line is optional, it cannot be the last auxiliary
data line.

Examples.  To turn off the saving of the tree,

  5  114  Y
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

or, to change the output to the full Newick format,

  5  114  Y T
  Y 1
  T 2.0
  Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
  ...

PHYLIP DNAML does not append the PID (process ID) to the tree file name and
does not support the full Newick standard or the prolog format output.

=============================================================================

Acknowledgements:

The origin and development of fastDNAml as a program to extend the use of
maximum likelihood phylogenetic inference to larger sets of DNA sequences
was encouraged by Carl Woese.  Through the development and evolution of the
program, Joseph Felsenstein has been extremely helpful and encouraging.

Numerous users have made suggestions and/or reported program bugs:

   Gary Nunn
   Tom Schmidt
   Ross Overbeek
   Hideo Matsuda
   Mitchell Sogin
   Brenden Rielly

=============================================================================

Examples:

Data file with empirical frequencies (generic analysis) (notice that blank
lines are permitted in the data):

5  114
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG


Data file with empirical frequencies and a random addition order:

5  114  J
J 137
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG


Data file with empirical frequencies and a bootstrap resampling:

5  114  B
B 137
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG


Data with weighting mask and rate categories:

5  114  W C
Weights     111111111111001100000100011111100000000000000110000110000000
            111101111111111111111111011100000111001011100000000011
C  10  0.0625  0.125  0.25  0.5  1  2  4  8  16  32
Categories  5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
            633792246624457364222574877188898132984963499AA9899975
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
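
To read the Weights and Categories lines above by eye, it can help to expand them per site. The Python sketch below assumes that category characters 1 through 9 mean categories 1 to 9 and that A continues the series at 10 (consistent with the ten rates on the C line), and that a weight of 1 keeps a site while 0 drops it. The helper name is hypothetical.

```python
rates = [0.0625, 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32]   # from the C line
categories = ("5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9"
              "633792246624457364222574877188898132984963499AA9899975")
weights = ("111111111111001100000100011111100000000000000110000110000000"
           "111101111111111111111111011100000111001011100000000011")

def category_number(ch):
    """Assumed coding: '1'-'9' are 1-9, 'A' onward continue from 10."""
    return int(ch, 36)   # base-36 digit value: '9' -> 9, 'A' -> 10

# Relative rate assigned to each site, and the 1-based sites that are kept
site_rates = [rates[category_number(c) - 1] for c in categories]
used_sites = [i + 1 for i, w in enumerate(weights) if w == "1"]
print(site_rates[:5], used_sites[:5])
```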


Data with three user-specified tree branching orders:

5  114  U
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
3
(Sequence1,(Sequence2,Sequence3),(Sequence4,Sequence5));
(Sequence2,(Sequence1,Sequence3),(Sequence4,Sequence5));
(Sequence3,(Sequence1,Sequence2),(Sequence4,Sequence5));
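
A user tree is only usable if its leaf labels match the sequence names in the data. The Python sketch below is one way to check that before a run; it handles simple trees like the three above and is not a full Newick parser. The function name is hypothetical.

```python
import re

names = ["Sequence1", "Sequence2", "Sequence3", "Sequence4", "Sequence5"]
trees = [
    "(Sequence1,(Sequence2,Sequence3),(Sequence4,Sequence5));",
    "(Sequence2,(Sequence1,Sequence3),(Sequence4,Sequence5));",
    "(Sequence3,(Sequence1,Sequence2),(Sequence4,Sequence5));",
]

def tree_taxa(newick):
    """Return the leaf labels of a simple Newick string, in order."""
    # A leaf label directly follows '(' or ',' and stops at ':', ',' or ')'.
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)

for tree in trees:
    missing = sorted(set(names) - set(tree_taxa(tree)))
    print(tree_taxa(tree), "missing:", missing)
```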


Data with a transition/transversion ratio and base frequencies chosen to
approximate the Jukes & Cantor model:

5  114  T F
T 0.501
F 0.25 0.25 0.25 0.25
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
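
Under the Jukes & Cantor model the four bases are equally frequent and every base change is equally likely, which is why the F line above sets each frequency to 0.25. The Python sketch below counts the possible changes to show the transition/transversion ratio such a model implies; the T value of 0.501 in the sample sits just above it.

```python
from itertools import permutations

PURINES = {"A", "G"}

def is_transition(x, y):
    """A transition stays within purines (A, G) or within pyrimidines (C, T)."""
    return (x in PURINES) == (y in PURINES)

pairs = list(permutations("ACGT", 2))   # the 12 ordered base changes
transitions = sum(is_transition(x, y) for x, y in pairs)
transversions = len(pairs) - transitions
print(transitions, transversions, transitions / transversions)   # 4 8 0.5
```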


Non-interleaved data:

5  114  I
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
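
In the non-interleaved layout above, each name is followed by all of that sequence's lines before the next name appears. The Python sketch below reads such a layout, assuming a ten-character name field and that blanks inside the sequence are ignored. The function name and its arguments are hypothetical.

```python
def read_sequential(lines, n_taxa, n_sites, name_width=10):
    """Read name-then-sequence records; each sequence may span several lines."""
    seqs = {}
    it = iter(lines)
    for _ in range(n_taxa):
        first = next(it)
        name = first[:name_width].strip()
        seq = "".join(first[name_width:].split())
        while len(seq) < n_sites:            # continuation lines
            seq += "".join(next(it).split())
        seqs[name] = seq
    return seqs

# First record of the sample above
lines = [
    "Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG",
    "            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG",
]
seqs = read_sequential(lines, n_taxa=1, n_sites=114)
print(len(seqs["Sequence1"]))
```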


Non-interleaved data produced by editing a GenBank-format file (make sure that
the names are padded to at least ten characters with blanks):

5  114  I
Sequence1  
        1 ACACGGTGTC GTATCATGCT GCAGGATGCT AGACTGCGTC ANATGTTCGT ACTAACTGTG
       61 AGCTCGATGA TCGGTGACGT AGACTCAGGG GCCATGCCGC GAGTTTGCGA TGCG
Sequence2  
        1 ACGCGGTGTC GTGTCATGCT ACATTATGCT AGACTGCGTC GGATGCTCGT ATTGACTGCG
       61 AGCACGGTGA TCAATGACGT AGNCTCAGGR TCCACGCCGT GACTTTGTGA TNCG
Sequence3  
        1 ACGCGGTGCC GTGTNATGCT GCATTATGCT CGACTGCGRC GGATGCTAGT ATTGACTGCG
       61 AGCACGATGA CCGATGACGT AGACTGAGGG TCCGTGCCGC GACTTTGTGA TGCG
Sequence4  
        1 ACGCGCTGCC GTGTCATCCT ACACGATGCY AGACAGCGTC AGCTGCTAGT ACTGGCTGAG
       61 ACCTCGGTGA TTGATGACGT AGACTGCGGG TCCATGCCGC GATTTTGCGR TGCG
Sequence5  
        1 ACGCGCTGTC GTGTCATACT GCAGGATGCT AGACTGCGTC AGCTGCTAGT ACTGGCTGAG
       61 ACCTCGATGC TCGATGACGT AGACTGCGGG TCCATGCCGT GATTTTGCGA TGCG
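
The hand edit described above can also be checked mechanically. The Python sketch below collects name lines and GenBank-style numbered sequence lines, drops the position numbers and blanks, verifies the site count, and pads each name to ten characters. It assumes names contain no blanks; the function name is hypothetical.

```python
def genbank_to_sequences(lines, n_sites):
    """Return (padded name, sequence) pairs from name lines followed by
    GenBank-style numbered sequence lines."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].isdigit():              # numbered sequence line
            records[-1][1] += "".join(fields[1:])
        else:                                # a name line
            records.append([fields[0], ""])
    for name, seq in records:
        if len(seq) != n_sites:
            raise ValueError(f"{name}: {len(seq)} sites, expected {n_sites}")
    return [(name.ljust(10), seq) for name, seq in records]

# First record of the sample above
sample = [
    "Sequence1  ",
    "        1 ACACGGTGTC GTATCATGCT GCAGGATGCT AGACTGCGTC ANATGTTCGT ACTAACTGTG",
    "       61 AGCTCGATGA TCGGTGACGT AGACTCAGGG GCCATGCCGC GAGTTTGCGA TGCG",
]
result = genbank_to_sequences(sample, 114)
print(repr(result[0][0]), len(result[0][1]))
```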


Data analysis restarted from a four-taxon tree (which happens to be wrong,
but it will be corrected by local rearrangements after the tree is read):

5  114  R
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;
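
The restart tree above holds four of the five taxa, so one sequence remains to be added once the tree has been read. The Python sketch below picks the leaf labels out of the tree and lists the names that are not yet in it. It assumes every leaf carries a branch length, as in the sample, and is not a general Newick parser.

```python
import re

names = ["Sequence1", "Sequence2", "Sequence3", "Sequence4", "Sequence5"]
restart_tree = ("(Sequence4:0.1,Sequence2:0.1,"
                "(Sequence1:0.1,Sequence5:0.1):0.1):0.0;")

# A leaf label is a name immediately followed by ':' and its branch length.
in_tree = re.findall(r"([A-Za-z_]\w*):", restart_tree)
still_to_add = [n for n in names if n not in in_tree]
print(still_to_add)   # ['Sequence3']
```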


Data analysis restarted from a four-taxon tree (which is wrong, and which
will not be corrected after the tree is read, because the G 0 0 option
suppresses all rearrangements):

5  114  R G T
G 0 0
T 2.0
Sequence1   ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2   ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3   ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4   ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5   ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG

            AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
            AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
            AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
            ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
            ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;