DNA parsimony

version 3.5c

DNAMOVE - Interactive DNA parsimony
     (c) Copyright 1986-1993 by Joseph Felsenstein and  by  the  University  of
Washington.  Written by Joseph Felsenstein.  Permission is granted to copy this
document provided that no fee is charged for it and that this copyright  notice
is not removed.

     DNAMOVE is  an  interactive  DNA  parsimony  program,  inspired  by  Wayne
Maddison and David Maddison's marvellous program MacClade, which is written for
Apple MacIntosh computers.  DNAMOVE reads in a data set which  is  prepared  in
almost the same format as one for the DNA parsimony program DNAPARS.  It allows
the user to choose an initial tree, and displays this tree on the screen.   The
user  can  look  at  different  sites  and  the  way  the nucleotide states are
distributed on that tree, given the most parsimonious reconstruction  of  state
changes for that particular tree.  The user then can specify how the tree is to
be rearraranged, rerooted or written out to a file.  By  looking  at  different
rearrangements  of  the  tree  the  user  can  manually  search  for  the  most
parsimonious tree, and can get a feel for how different sites are  affected  by
changes in the tree topology.

     This program is compatible with fewer  computer  systems  than  the  other
programs  in PHYLIP.  It can be adapted to PCDOS systems or to any system whose
screen or terminals emulate DEC VT52 or VT100 terminals (such as, for  example,
Zenith  Z19,  Z29,  and  Z49 terminals Telnet programs for logging in to remote
computers over a TCP/IP network, VT100-compatible windows in  the  X  windowing
system,  and  any  terminal compatible with ANSI standard terminals).   For any
other screen types, there is a generic option which does not make use of screen
graphics  characters  to  display  the  nucleotide  states.   This will be less
effective, as the nucleotide states will be less easy to see when displayed.

     The input data file is set up almost identically to  the  data  files  for
DNAPARS.   The  code for nucleotide sequences is the standard one, as described
in the molecular sequence programs document.  As in DNAPARS,  the  only  option
whose  presence  needs  to  be  signalled  in the input file is the W (Weights)
option, which functions as described in the main documentation file.  If it  is
used  then  there  must  be a W on the first line of the input file.  The user-
defined trees, as described below, are not specified in the input file but in a
separate tree file.



     The user interaction starts with the program presenting a menu.  The  menu
looks like this:


Interactive DNA parsimony, version 3.5c

Settings for this run:
  O                           Outgroup root?  No, use as outgroup species  1
  T                 Use Threshold parsimony?  No, use ordinary parsimony
  I             Input sequences interleaved?  Yes
  U Initial tree (arbitrary, user, specify)?  Arbitrary
  0      Graphics type (IBM PC, VT52, ANSI)?  ANSI
  L               Number of lines on screen?    24

Are these settings correct? (type Y or the letter for one to change)



The O (Outgroup), T (Threshold), and 0 (Graphics type) options  are  the  usual
ones  and  are  described  in the main documentation file.  The I (Interleaved)
option is the usual one and is described in the main documentation file and the
molecular  sequences  programs documentation file.  The U (initial tree) option
allows the user to  choose  whether  the  initial  tree  is  to  be  arbitrary,
interactively specified by the user, or read from a tree file.  Typing U causes
the program to change among the three possibilities in turn.  I would recommend
that  for  a  first  run,  you  allow  the  tree  to be set up arbitrarily (the
default), as the "specify" choice is difficult  to  use  and  the  "user  tree"
choice  requires  that you have available a tree file with the tree topology of
the initial tree, which must be a rooted tree.  If you  wish  to  set  up  some
particular  tree  you  can also do that by the rearrangement commands specified
below.

     The T (threshold) option allows a continuum of methods  between  parsimony
and  compatibility.   Thresholds  less  than  or  equal  to 1.0 do not have any
meaning and should not be used: they will result in a tree  dependent  only  on
the input order of species and not at all on the data!

     The L (screen Lines) option allows the user to change the  height  of  the
screen (in lines of characters) that is assumed to be available on the display.
This may be particularly helpful when displaying large trees on terminals  that
have  more  than  24  lines per screen, or on workstation or X-terminal screens
that can emulate the ANSI terminals with more than 24 lines.

     After the initial menu is displayed and the choices are made, the  program
then sets up an initial tree and displays it.  Below it will be a one-line menu
of possible commands, which looks like this:

NEXT? (Options: R # + - S . T U W O F C H ? X Q) (H or ? for Help)

If you type H or ? you will get a single screen showing a description  of  each
of   these   commands  in  a  few  words.   Here  are  slightly  more  detailed
descriptions:

R    ("Rearrange").  This command asks for the number of a node which is to  be
    removed from the tree.  It and everything to the right of it on the tree is
    to be removed (by breaking the branch immediately below it).   The  command
    also  asks  for  the  number  of  a  node  below  which that group is to be
    inserted.  If an impossible number is given, the program refuses  to  carry
    out  the  rearrangement and asks for a new command.  The rearranged tree is
    displayed: it will  often  have  a  different  number  of  steps  than  the
    original.   If  you wish to undo a rearrangement, use the Undo command, for
    which see below.

#    This command, and the +, - and S commands described below, determine which
    site  has  its  states displayed on the branches of the trees.  The initial
    tree displayed by the program does not show states of  sites.   When  #  is
    typed,  the  program  does  not  ask the user which site is to be shown but
    automatically shows the states of the next site that is not compatible with
    the tree (the next site that does not perfectly fit the current tree).  The
    search for this site "wraps around" so that if it  reaches  the  last  site
    without  finding  one  that  is  not  compatible  with the tree, the search
    continues at the first site; if no incompatible site is found  the  current
    site  is  shown again, and if no current site is being shown then the first
    site is shown.  The display takes the form of different symbols or textures
    on  the  branches  of  the  tree.  The state of each branch is actually the
    state of the node above it.  A key of the  symbols  or  shadings  used  for
    states A, C, G, T (U) and ? are shown next to the tree.  State ? means that
    more than one possible nucleotide could exist at that point  on  the  tree,
    and  that  the user may want to consider the different possibilities, which



    are usually apparent by inspection.

+    This command is the same as #  except  that  it  goes  forward  one  site,
    showing  the  states  of the next site.  If no site has been shown, using +
    will cause the first site to  be  shown.   Once  the  last  site  has  been
    reached, using + again will show the first site.

-    This command is the same as + except that it goes backwards,  showing  the
    states of the previous site.  If no site has been shown, using - will cause
    the last site to be shown.  Once site number 1 has been  reached,  using  -
    again will show the last site.

S    ("Show").  This command is the same as + and - except that it  causes  the
    program  to  ask  you for the number of a site.  That site is the one whose
    states will be displayed.  If you give the site number as  0,  the  program
    will go back to not showing the states of the sites.

 .    This command simply causes the current tree to be redisplayed.  It is  of
    use when the tree has partly disappeared off of the top of the screen owing
    to too many responses to commands being printed out at the  bottom  of  the
    screen.

T    ("Try rearrangements").  This command asks for the name of  a  node.   The
    part  of  the  tree  at  and above that node is removed from the tree.  The
    program tries to re-insert it in each possible location on the  tree  (this
    may  take  some time, and the program reminds you to wait).  Then it prints
    out a summary.  For each possible  location  the  program  prints  out  the
    number of the node to the right of the place of insertion and the number of
    steps required in each case.  These are divided into those that are  better
    then  or tied with the current tree.  Once this summary is printed out, the
    group that was removed is reinserted into its original position.  It is  up
    to  you  to use the R command to actually carry out any of the arrangements
    that have been tried.

U     ("Undo").   This  command  reverses  the  effect  of  the   most   recent
    rearrangement, outgroup re-rooting, or flipping of branches.  It returns to
    the previous tree topology.  It will be of great use when  rearranging  the
    tree  and  when  a  rearrangement proves worse than the preceding one -- it
    permits you to abandon the new one and return to the previous  one  without
    remembering its topology in detail.

W    ("Write").  This command writes out the current tree onto  a  tree  output
    file.   If  the file already has been written to by this run of DNAMOVE, it
    will ask you whether you want to replace the contents of the file, add  the
    tree  to  the end of the file, or  not write out the tree to the file.  The
    tree is written in the standard format used by PHYLIP (a subset of the  New
    Hampshire  standard).   It  is  in  the proper format to serve as the User-
    Defined Tree for setting up the initial tree in a  subsequent  run  of  the
    program.   Note  that  if  you provided the initial tree topology in a tree
    file and replace its contents, that initial tree will be lost.

O    ("Outgroup").  This asks for the number of a  node  which  is  to  be  the
    outgroup.   The  tree  will  be  redisplayed  with  that  node  as the left
    descendant of the bottom fork.  Note that it is possible  to  use  this  to
    make  a  multi-species group the outgroup (i.e., you can give the number of
    an interior node of the tree as the outgroup, and the program will  re-root
    the tree properly with that on the left of the bottom fork.

F    ("Flip").  This asks for a node number and then flips the two branches  at
    that  node,  so  that  the  left-right  order  of  branches at that node is
    changed.  This does not actually change the tree topology (or the number of



    steps on that tree) but it does change the appearance of the tree.

C    ("Clade").  When the data consist of more than 12 species  (or  more  than
    half  the  number  of  lines  on  the  screen if this is not 24), it may be
    difficult to display the tree on one screen.  In that case the tree will be
    squeezed  down  to  one line per species.  This is too small to see all the
    interior states of the tree.  The C command instructs the program to  print
    out  only  that  part  of the tree (the "clade") from a certain node on up.
    The program will prompt you for the number of  this  node.   Remember  that
    thereafter you are not looking at the whole tree.  To go back to looking at
    the whole tree give the C command again and enter "0" for the  node  number
    when asked.  Most users will not want to use this option unless forced to.

H    ("Help").  Prints a one-screen summary of what  the  commands  do,  a  few
    words for each command.

?    ("?").  A synonym for H.  Same as Help command.

X    ("Exit").  Exit from program.  If the current tree has not yet been  saved
    into a file, the program will first ask you whether it should be saved.

Q    ("Quit").  A synonym for X.  Same as the eXit command.


            ADAPTING THE PROGRAM TO YOUR COMPUTER AND TO YOUR TERMINAL

     As we have seen, the initial menu of the  program  allows  you  to  choose
among  four  screen  types  (PCDOS, Ansi, VT52 and none).  If you want to avoid
having to make this choice every time, you can change some of the constants  at
the  beginning  of  the program to have it initialize itself in the proper way.
At the beginning of the program are three constants that determine  which  kind
of  screen  graphics the program will use.   They are ibmpc0, vt520, and ansi0.
In the distribution version of the programs, ansi0 is set  to  "true"  and  the
others to "false", in the statements:

#define ibmpc0          false
#define ansi0           true
#define vt520           false

so that the version will work with ANSI compatible terminals.

     On the other hand if you have a terminal compatible with DEC's  VT52,  but
not with the ANSI terminal, you should change the constant ansi0 to "false" and
vt520 to "true".   If you have instead a terminal which is compatible with  IBM
PC  graphics,  you should set the constant "ibmpc0" to "true" and the others to
"false".   If your terminal is compatible with none of these, you will have  to
set  the  constants all "false", in which case special graphics characters will
not be used to indicate nucleotide states, but only letters will  be  used  for
the four nucleotides.  This is less easy to look at.

                      MORE ABOUT THE PARSIMONY CRITERION

     This program carries out unrooted parsimony (analogous  to  Wagner  trees)
(Eck  and  Dayhoff, 1966; Kluge and Farris, 1969) on DNA sequences.  The method
of Fitch (1971) is used to count the number of changes  of  base  needed  on  a
given  tree.   The assumptions of this method are exactly analogous to those of
MIX:

     1.  Each site evolves independently.





     2.  Different lineages evolve independently.

     3.  The probability of a base substitution at a given site is  small  over
the lengths of time involved in a branch of the phylogeny.

     4.  The expected amounts of change in different branches of the  phylogeny
do not vary by so much that two changes in a high-rate branch are more probable
than one change in a low-rate branch.

     5.  The expected amounts of change do not vary enough among sites that two
changes in one site are more probable than one change in another.

     That these are the assumptions of parsimony methods has been documented in
a  series of papers of mine: (1973a, 1978b, 1979, 1981b, 1983b, 1988b).  For an
opposing  view  arguing  that  the  parsimony  methods  make   no   substantive
assumptions  such  as  these, see the papers by Farris (1983) and Sober (1983a,
1983b), but also read the exchange between Felsenstein and Sober (1986).

     Change from an occupied site to a  deletion  is  counted  as  one  change.
Reversion from a deletion to an occupied site is allowed and is also counted as
one change.

     Below is a test data set, but we  cannot  show  the  output  it  generates
because of the interactive nature of the program.

-------------------------------TEST DATA SET----------------------------

   5   13
Alpha     AACGUGGCCA AAU
Beta      AAGGUCGCCA AAC
Gamma     CAUUUCGUCA CAA
Delta     GGUAUUUCGG CCU
Epsilon   GGGAUCUCGG CCC