version 3.5c

DOLMOVE -- Interactive Dollo and Polymorphism Parsimony
(c) Copyright  1986-1993  by  Joseph  Felsenstein  and  by  the  University  of
Washington.  Written by Joseph Felsenstein.  Permission is granted to copy this
document provided that no fee is charged for it and that this copyright  notice
is not removed.

     DOLMOVE is an interactive parsimony  program  which  uses  the  Dollo  and
Polymorphism  parsimony  criteria.   It is inspired on Wayne Maddison and David
Maddison's marvellous program MacClade, which is written  for  Apple  MacIntosh
computers.   DOLMOVE  reads  in a data set which is prepared in almost the same
format as one for the Dollo  and  polymorhism  parsimony  program  DOLLOP.   It
allows  the  user  to  choose  an  initial  tree, and displays this tree on the
screen.  The user can look at different characters and the way their states are
distributed  on  that tree, given the most parsimonious reconstruction of state
changes for that particular tree.  The user then can specify how the tree is to
be  rearraranged,  rerooted  or written out to a file.  By looking at different
rearrangements  of  the  tree  the  user  can  manually  search  for  the  most
parsimonious tree, and can get a feel for how different characters are affected
by changes in the tree topology.

     This program is compatible with fewer  computer  systems  than  the  other
programs  in PHYLIP.  It can be adapted to PCDOS systems or to any system whose
screen or terminals emulate DEC VT52 or VT100 terminals (such as, for  example,
Zenith  Z19,  Z29,  and  Z49 terminals Telnet programs for logging in to remote
computers over a TCP/IP network, VT100-compatible windows in  the  X  windowing
system,  and  any  terminal compatible with ANSI standard terminals).   For any
other screen types, there is a generic option which does not make use of screen
graphics  characters  to  display  the  character  states.   This  will be less
effective, as the states will be less easy to see when displayed.

     The input data file is set up almost identically to  the  data  files  for
DOLLOP.   The  sole  exception  is that the user trees are not contained in the
input file, but in the tree file, which is used both for input of the  starting
tree  (if desired) and for output of the final tree.  Note that this means that
the user tree supplied on input will possibly be overwritten.  The W (Weights),
A (Ancestors), and F (Factors) are possible options specified in the input file
(some must also be chosen in the menu).

     The user interaction starts with the program presenting a menu.  The  menu
looks like this:


Interactive Dollo or polymorphism parsimony, version 3.5c

Settings for this run:
  P                        Parsimony method?  Dollo
  T                 Use Threshold parsimony?  No, use ordinary parsimony
  A      Use ancestral states in input file?  No
  U Initial tree (arbitrary, user, specify)?  Arbitrary
  0      Graphics type (IBM PC, VT52, ANSI)?  ANSI
  L               Number of lines on screen?  24

Are these settings correct? (type Y or the letter for one to change)

The P (Parsimony Method) option is the one that  toggles  between  polymorphism
parsimony and Dollo parsimony.  The program defaults to Dollo parsimony.



     The T (Threshold) and 0 (Graphics type) options are the usual ones and are
described  in  the main documentation file.  The L option allows the program to
take advantage of larger screens if available.  The X (Mixed Methods option  is
not  available  in  DOLMOVE.   The  U  (initial tree) option allows the user to
choose whether the initial tree is to be arbitrary, interactively specified  by
the  user,  or  read  from  a tree file.  Typing U causes the program to change
among the three possibilities in turn.  I would recommend that for a first run,
you  allow  the  tree  to be set up arbitrarily (the default), as the "specify"
choice is difficult to use and the "user tree" choice requires  that  you  have
available  a tree file with the tree topology of the initial tree.  If you wish
to set up some particular tree you  can  also  do  that  by  the  rearrangement
commands  specified  below.   The  T  (threshold)  option allows a continuum of
methods between parsimony and compatibility.  Thresholds less than or equal  to
0  do  not  have any meaning and should not be used: they will result in a tree
dependent only on the input order of species and not at all on the data!

     After the initial menu is displayed and the choices are made, the  program
then sets up an initial tree and displays it.  Below it will be a one-line menu
of possible commands, which looks like this:

NEXT? (Options: R # + - S . T U W O F C H ? X Q) (H or ? for Help)

If you type H or ? you will get a single screen showing a description  of  each
of   these   commands  in  a  few  words.   Here  are  slightly  more  detailed
descriptions:

R    ("Rearrange").  This command asks for the number of a node which is to  be
    removed from the tree.  It and everything to the right of it on the tree is
    to be removed (by breaking the branch immediately below it).   The  command
    also  asks  for  the  number  of  a  node  below  which that group is to be
    inserted.  If an impossible number is given, the program refuses  to  carry
    out  the  rearrangement and asks for a new command.  The rearranged tree is
    displayed: it will  often  have  a  different  number  of  steps  than  the
    original.   If  you wish to undo a rearrangement, use the Undo command, for
    which see below.

#    This command, and the +, - and S commands described below, determine which
    character  has  its  states  displayed  on  the branches of the trees.  The
    initial tree displayed by the program does not show states of sites.   When
    #  is  typed,  the  program  does not ask the user which character is to be
    shown but automatically shows the states of the next binary character  that
    is not compatible with the tree (the next character that does not perfectly
    fit the current tree).  The search for this  character  "wraps  around"  so
    that  if  it  reaches  the  last  character without finding one that is not
    compatible with the tree, the search continues at the first  character;  if
    no  incompatible  character is found the current character is shown, and if
    no current character is shown then the first character is  shown.   If  the
    last  character  has been reached, using + again causes the first character
    to be shown.  The display takes the form of different symbols  or  textures
    on  the  branches  of  the  tree.  The state of each branch is actually the
    state of the node above it.  A key of the  symbols  or  shadings  used  for
    states  0,  1  and ? are shown next to the tree.  State ? means that either
    state 0 or state 1 could exist at that point on the tree, and that the user
    may  want  to  consider  the  different  possibilities,  which  are usually
    apparent by inspection.

+    This command is the same as # except that it goes forward  one  character,
    showing  the states of the next character.  If no character has been shown,
    using + will cause  the  first  character  to  be  shown.   Once  the  last
    character has been reached, using + again will show the first character.




-    This command is the same as + except that it goes backwards,  showing  the
    states  of the previous character.  If no character has been shown, using -
    will cause the last character to be shown.  Once  character  number  1  has
    been reached, using - again will show the last character.

S    ("Show").  This command is the same as + and - except that it  causes  the
    program  to  ask  you for the number of a character.  That character is the
    one whose states will be displayed.  If you give the character number as 0,
    the program will go back to not showing the states of the characters.

 .    This command simply causes the current tree to be redisplayed.  It is  of
    use when the tree has partly disappeared off of the top of the screen owing
    to too many responses to commands being printed out at the  bottom  of  the
    screen.

T    ("Try rearrangements").  This command asks for the name of  a  node.   The
    part  of  the  tree  at  and above that node is removed from the tree.  The
    program tries to re-insert it in each possible location on the  tree  (this
    may  take  some time, and the program reminds you to wait).  Then it prints
    out a summary.  For each possible  location  the  program  prints  out  the
    number of the node to the right of the place of insertion and the number of
    steps required in each case.  These are divided into those that are better,
    tied,  or  worse  than the current tree.  Once this summary is printed out,
    the group that was removed is inserted into its original position.   It  is
    up  to  you to use the R command to actually carry out any the arrangements
    that have been tried.

U     ("Undo").   This  command  reverses  the  effect  of  the   most   recent
    rearrangement, outgroup re-rooting, or flipping of branches.  It returns to
    the previous tree topology.  It will be of great use when  rearranging  the
    tree  and  when  a  rearrangement proves worse than the preceding one -- it
    permits you to abandon the new one and return to the previous  one  without
    remembering its topology in detail.

W    ("Write").  This command writes out the current tree onto  a  tree  output
    file.   If  the file already has been written to by this run of DOLMOVE, it
    will ask you whether you want to replace the contents of the file, add  the
    tree  to  the end of the file, or  not write out the tree to the file.  The
    tree is written in the standard format used by PHYLIP (a subset of the  New
    Hampshire  standard).   It  is  in  the proper format to serve as the User-
    Defined Tree for setting up the initial tree in a  subsequent  run  of  the
    program.

O    ("Outgroup").  This asks for the number of a  node  which  is  to  be  the
    outgroup.   The  tree  will  be  redisplayed  with  that  node  as the left
    descendant of the bottom fork.  The number of steps required  on  the  tree
    may  change  on re-rooting.  Note that it is possible to use this to make a
    multi-species group the outgroup (i.e., you  can  give  the  number  of  an
    interior node of the tree as the outgroup, and the program will re-root the
    tree properly with that on the left of the bottom fork).

F    ("Flip").  This asks for a node number and then flips the two branches  at
    that,  so  that  the  left-right order of branches at that node is changed.
    This does not actually change the tree topology (or the number of steps  on
    that tree) but it does change the appearance of the tree.

C    ("Clade").  When the data consist of more than 12 species  (or  more  than
    half  the  number  of  lines  on  the  screen if this is not 24), it may be
    difficult to display the tree on one screen.  In that case the tree will be
    squeezed  down  to  one line per species.  This is too small to see all the
    interior states of the tree.  The C command instructs the program to  print



    out  only  that  part  of the tree (the "clade") from a certain node on up.
    The program will prompt you for the number of  this  node.   Remember  that
    thereafter you are not looking at the whole tree.  To go back to looking at
    the whole tree give the C command again and enter "0" for the  node  number
    when asked.  Most users will not want to use this option unless forced to.

H    ("Help").  Prints a one-screen summary of what  the  commands  do,  a  few
    words for each command.

?    ("?").  A synonym for H.  Same as Help command.

X    ("Exit").  Exit from program.  If the current tree has not yet been  saved
    into a file, the program will ask you whether it should be saved.

Q    ("Quit").  A synonym for X.  Same as the eXit command.


                                      OUTPUT

     If the A option is used, then the program will infer,  for  any  character
whose  ancestral state is unknown ("?") whether the ancestral state 0 or 1 will
give the fewest changes (according to the criterion  in  use).   If  these  are
tied,  then  it  may  not be possible for the program to infer the state in the
internal nodes, and many of these will be shown as "?".  If the A option is not
used, then the program will assume 0 as the ancestral state.

     When reconstructing the placement of forward changes and reversions  under
the  Dollo  method,  keep in mind that each polymorphic state in the input data
will require one "last minute" reversion.  This  is  included  in  the  counts.
Thus  if  we  have  both  states  0 and 1 at a tip of the tree the program will
assume that the lineage had state 1 up to the last minute,  and  then  state  0
arose in that population by reversion, without loss of state 1.

     When DOLMOVE calculates the number of characters compatible with the tree,
it  will  take the F option into account and count the multistate characters as
units, counting a character as compatible with the tree only when  all  of  the
binary characters corresponding to it are compatible with the tree.

     In the input file the W (Weights) option is available, as usual.   It  and
the  A  (Ancestral  states)  and  F  (Factors) options require the option to be
declared on the first line of the  input  file  and  other  information  to  be
present in the input file.  If the Ancestral States option is used the A option
must also be chosen in the menu.  The F (Factors) option is available  in  this
program.  It is used to inform the program which groups of characters are to be
counted together in computing the number  of  characters  compatible  with  the
tree.   Thus  if three binary characters are all factors of the same multistate
character, the multistate character will be counted as compatible with the tree
only if all three factors are compatible with it.


          ADAPTING THE PROGRAM TO YOUR COMPUTER AND TO YOUR TERMINAL

     As we have seen, the initial menu of the  program  allows  you  to  choose
among  four  screen  types  (PCDOS, Ansi, VT52 and none).  If you want to avoid
having to make this choice every time, you can change some of the constants  at
the  beginning  of  the program to have it initialize itself in the proper way,
and recompile it.  Among the constants at the beginning of the program you will
find  three  that determine which kind of screen graphics the program will use.
These constants are "ibmpc0",  "vt520",  and  "ansi0".    In  the  distribution
version  of  the  programs,  "ansi0" is set to true and the others to false, so
that the version will work with ANSI compatible terminals.



     On the other hand if you have a terminal compatible with DEC's  VT52,  but
not with the ANSI terminal, you should change the constant "ansi0" to false and
"vt520" to true.   If you have instead a terminal which is compatible with  IBM
PC  graphics,  you  should  set the constant "ibmpc0" to true and the others to
false.   If your terminal is compatible with none of these, you  will  have  to
set the constants all false, in which case special graphics characters will not
be used to indicate character states, but only "*" for 1, "=" for  0,  and  "."
for ? states.  This is less easy to look at.

     The program should work successfully on DEC VAX systems under  either  the
VMS or the Unix operating systems without any other changes.

                      MORE ABOUT THE PARSIMONY CRITERION

     DOLMOVE uses  as  its  numerical  criterion  the  Dollo  and  polymorphism
parsimony methods.  The program defaults to carrying out Dollo parsimony.

     The Dollo parsimony method was first suggested in print in verbal form  by
Le  Quesne (1974) and was first well-specified by Farris (1977).  The method is
named after Louis Dollo since he was  one  of  the  first  to  assert  that  in
evolution  it  is  harder  to  gain  a  complex  feature  than to lose it.  The
algorithm explains the presence of the state 1 by allowing up  to  one  forward
change  0-->1  and  as  many  reversions  1-->0 as are necessary to explain the
pattern of states seen.  The program attempts to minimize the number  of  1-->0
reversions necessary.

     The assumptions of this method are in effect:

     1. We know which state is the ancestral one (state 0).

     2. The characters are evolving independently.

     3. Different lineages evolve independently.

     4. The  probability  of  a  forward  change  (0-->1)  is  small  over  the
evolutionary times involved.

     5. The probability of a reversion (1-->0) is also  small,  but  still  far
larger  than  the  probability of a forward change, so that many reversions are
easier to envisage than even one extra forward change.

     6. Retention  of  polymorphism  for  both  states  (0  and  1)  is  highly
improbable.

     7. The lengths of the segments of the true tree are not  so  unequal  that
two changes in a long segment are as probable as one in a short segment.

     One problem can arise when using additive binary recoding to  represent  a
multistate  character  as  a series of two-state characters.  Unlike the Camin-
Sokal, Wagner, and Polymorphism  methods,  the  Dollo  method  can  reconstruct
ancestral states which do not exist.  An example is given in my 1979 paper.  It
will be necessary to check the output to make sure that this has not occurred.

     The polymorphism parsimony method was first used by me,  and  the  results
published  (without  a clear specification of the method) by Inger (1967).  The
method was published by Farris (1978a) and by me (1979).   The  method  assumes
that  we can explain the pattern of states by no more than one origination (0--
>1) of state 1, followed by retention of polymorphism along as many segments of
the  tree  as  are  necessary,  followed by loss of state 0 or of state 1 where
necessary.  The program tries to  minimize  the  total  number  of  polymorphic
characters,  where  each  polymorphism  is counted once for each segment of the



tree in which it is retained.

     The assumptions of the polymorphism parsimony method are in effect:

     1. The ancestral state (state 0) is known in each character.

     2. The characters are evolving independently of each other.

     3. Different lineages are evolving independently.

     4. Forward change (0-->1) is highly improbable over  the  length  of  time
involved in the evolution of the group.

     5. Retention of polymorphism is also improbable,  but  far  more  probable
that  forward change, so that we can more easily envisage much polymorhism than
even one additional forward change.

     6. Once state 1 is reached, reoccurrence of state 0  is  very  improbable,
much less probable than multiple retentions of polymorphism.

     7. The lengths of segments in the true tree are not so unequal that we can
more  easily  envisage  retention events occurring in both of two long segments
than one retention in a short segment.

     That these are the assumptions of parsimony methods has been documented in
a  series of papers of mine: (1973a, 1978b, 1979, 1981b, 1983b, 1988b).  For an
opposing  view  arguing  that  the  parsimony  methods  make   no   substantive
assumptions  such  as  these, see the papers by Farris (1983) and Sober (1983a,
1983b), but also read the exchange between Felsenstein and Sober (1986).

     At the beginning of the program are a series of constants,  which  can  be
changed  to help adapt the program to different computer systems.  "nmlngth" is
the length of the species names.  "screenlines" specifies the number  of  lines
per  screen,  which you will normally want to leave at its default value of 24.
I have already described the  constants  "ibmpc0",  "vt520",  and  "ansi0"  for
specifying the terminal type.

     Below is a test data set, but we  cannot  show  the  output  it  generates
because of the interactive nature of the program.

-------------------------------TEST DATA SET----------------------------

     5    6
Alpha     110110
Beta      110000
Gamma     100110
Delta     001001
Epsilon   001110