OLDDISTANCES

Table of Contents

FUNCTION

DESCRIPTION

FUNCTION

OldDistances makes a table of the pairwise similarities within a group of aligned sequences.

DESCRIPTION

OldDistances writes a matrix of the pairwise similarities between up to 50 different sequences in a multiple sequence alignment. The similarity value is the number of matches between each sequence pair divided by the sequence length.

Matches

A match occurs if the value in the scoring matrix for a pair of bases or amino acids is greater than or equal to a set match threshold.

Denominator

The denominator can be any of four functions of sequence length: 1) the length of the shorter sequence of the pair; 2) the length without gaps of the shorter sequence of the pair; 3) the average of the sequence lengths; or 4) the average of the sequence lengths without gaps. The program's menu also offers a fifth choice, Nothing, which leaves the sum of the matches undivided.
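The denominator choices can be sketched in Python as follows. This is an illustration only, not the program's source: the function names are hypothetical, the gap symbol is taken to be ".", choice 2 is assumed to mean the smaller of the two ungapped lengths, and the menu's fifth choice (Nothing) is modeled as dividing by 1.

```python
def ungapped_length(seq, gap="."):
    """Number of symbols in seq that are not the gap character."""
    return sum(1 for symbol in seq if symbol != gap)


def denominator(seq_a, seq_b, choice, gap="."):
    """Return the divisor for menu choices 1 through 5 (hypothetical helper)."""
    len_a, len_b = len(seq_a), len(seq_b)
    ung_a = ungapped_length(seq_a, gap)
    ung_b = ungapped_length(seq_b, gap)
    if choice == 1:
        # length of the shorter sequence, gaps included
        return min(len_a, len_b)
    if choice == 2:
        # assumption: "shorter" is judged by ungapped length
        return min(ung_a, ung_b)
    if choice == 3:
        # average length, gaps included
        return (len_a + len_b) / 2
    if choice == 4:
        # average length, gaps excluded
        return (ung_a + ung_b) / 2
    if choice == 5:
        # "Nothing": modeled here as no division at all
        return 1
    raise ValueError("choice must be 1, 2, 3, 4, or 5")
```

For example, denominator("AC..GT", "ACGTGT", 2) looks only at the ungapped lengths of the pair (4 and 6) and returns the smaller one.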

EXAMPLE

Here is a session using OldDistances to determine similarities between aligned sequences in the file hsp70.msf:


% olddistances

 OLDDISTANCES within what multiple sequence alignment ?  hsp70.msf{*}

 What is the threshold for a match (* 0.6 *) ?

 Divide the sum of the matches by:

     1)  Length of shorter sequence including gaps
     2)  Length of shorter sequence excluding gaps
     3)  Average sequence length including gaps
     4)  Average sequence length excluding gaps
     5)  Nothing

 Please choose one (* 2 *) :

     hsp70.msf{hs70_plafa} 720
     hsp70.msf{hs70_thean} 720
     hsp70.msf{hs70_leido} 720

     //////////////////////////

     hsp70.msf{hs75_yeast} 720
     hsp70.msf{hs77_yeast} 720
     hsp70.msf{dnak_ecoli} 720

 What should I call the output file (* hsp70.distances *) ?

%

OUTPUT

Here is part of the output file; it contains a 28 x 28 matrix (not all of which is shown):


 OLDDISTANCES within: hsp70.msf{*}  August  9, 1994 13:42

Threshold of comparison: 0.60
            Denominator: "Length of shorter sequence without gaps"
    Number of sequences: 28
Symbol Comparison Table: GenRunData:pepdistances.cmp

Key for column and row indices:

  1     hsp70.msf{Hs70_Plafa}  Length: 720       Length without gaps: 681
  2     hsp70.msf{Hs70_Thean}  Length: 720       Length without gaps: 646
  3     hsp70.msf{Hs70_Leido}  Length: 720       Length without gaps: 653

 ////////////////////////////////////////////////////////////////////////

 28     hsp70.msf{Dnak_Ecoli}  Length: 720       Length without gaps: 637

 Distance Matrix Part: 1

                 1         2         3         4         5   ...
____________________________________________________________ ... ..
|    1   |    1.0000    0.8421    0.7933    0.8372    0.7655 ...
|    2   |              1.0000    0.7554    0.8081    0.7477 ...
|    3   |                        1.0000    0.9884    0.9096 ...
|    4   |                                  1.0000    0.9496 ...

//////////////////////////////////////////////////////////// ...

RELATED PROGRAMS

PileUp creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. Gap makes sequence alignments. LineUp edits multiple sequence alignments. ProfileGap aligns a new sequence to an existing multiple sequence alignment. Pretty displays multiple sequence alignments.

Distances (which is the replacement for this program) writes a matrix of the pairwise genetic distances between sequences in a multiple sequence alignment. The distances can be corrected for multiple substitutions at a single site using one of several distance correction methods. The distance value is expressed as the number of nucleotide or amino acid substitutions per 100 residues.

RESTRICTIONS

Unknown. The sequences must be aligned properly for OldDistances to work.

ALGORITHM

OldDistances compares each pair of aligned sequences symbol by symbol, from the first symbol to the last symbol of the shorter sequence. The sequences must already have been aligned for the comparison to make sense. OldDistances simply counts the matches, that is, the positions where the scoring matrix value is greater than or equal to a set match threshold. The sum of the matches is then divided by the chosen denominator, such as the length of the shorter sequence.

Gaps are treated like any other symbol. The gap symbol (.) matches another symbol if that pair's value in the scoring matrix meets the threshold.
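The comparison described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the program's implementation: the scoring table TOY_SCORES and its values are hypothetical, pairs missing from the table are assumed to score 0.0, the match test uses the "greater than or equal to" wording from the DESCRIPTION section, and only the first denominator (length of the shorter sequence, gaps included) is shown.

```python
def count_matches(seq_a, seq_b, scores, threshold):
    """Count aligned positions whose score is at or above the threshold."""
    matches = 0
    # zip stops at the last symbol of the shorter sequence
    for a, b in zip(seq_a, seq_b):
        # assumption: pairs absent from the table score 0.0
        score = scores.get((a, b), scores.get((b, a), 0.0))
        if score >= threshold:
            matches += 1
    return matches


def similarity_table(seqs, scores, threshold):
    """Upper-triangular table of matches / length of the shorter sequence."""
    table = {}
    for i in range(len(seqs)):
        for j in range(i, len(seqs)):
            shorter = min(len(seqs[i]), len(seqs[j]))
            matches = count_matches(seqs[i], seqs[j], scores, threshold)
            table[(i, j)] = matches / shorter
    return table


# Hypothetical identity-style scoring table for nucleotides.
TOY_SCORES = {(s, s): 1.0 for s in "ACGT"}
# The gap symbol is looked up like any other symbol; here it scores 0.0.
TOY_SCORES[(".", ".")] = 0.0

seqs = ["ACGT.A", "ACGTTA", "AGGT.A"]
table = similarity_table(seqs, TOY_SCORES, threshold=0.6)
```

Because the gap symbol goes through the same table lookup as every other symbol, whether a gap counts as a match depends entirely on the scoring matrix and the threshold. In this toy table a gap never matches, so a gapped sequence compared with itself scores below 1.0 unless gaps are excluded from the denominator.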

CONSIDERATIONS

OldDistances chooses a default match threshold that is appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program will adjust the default match threshold accordingly.

SUGGESTIONS

If the sequences are not in an MSF file, use Pretty to display the aligned sequences before you pass them to OldDistances. If they look right in the Pretty display, they should work sensibly with OldDistances.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
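The capitalization convention can be illustrated with a short Python sketch. It is not the package's parser: the function names are hypothetical, and the sketch assumes that matching is case-insensitive and that you may type more of the name than the required letters as long as what you type remains a prefix of the full name.

```python
def required_prefix(param):
    """Return the leading capitalized letters of a name such as '-THReshold'."""
    name = param.lstrip("-")
    prefix = ""
    for ch in name:
        if not ch.isupper():
            break
        prefix += ch
    return prefix


def matches_param(typed, param):
    """True if typed is an acceptable abbreviation of param (assumed rules)."""
    full = param.lstrip("-").lower()
    typed_name = typed.lstrip("-").lower()
    minimum = required_prefix(param).lower()
    # must include the required letters and stay a prefix of the full name
    return typed_name.startswith(minimum) and full.startswith(typed_name)
```

Under these assumptions, required_prefix("-THReshold") gives "THR", so matches_param("-thr", "-THReshold") is accepted while matches_param("-th", "-THReshold") is not.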


Minimal Syntax: % olddistances [-INfile=]hsp70.msf{*} -Default

Prompted Parameters:

-THReshold=0.6       minimum symbol comparison score for a match
-MENu=2              divides the sum of the matches by:
                     1=length of the shorter sequence
                     2=length of the shorter sequence without gaps
                     3=average length
                     4=average length without gaps
                     5=nothing
[-OUTfile=]hsp70.distances  output file

Local Data Files:

-MATRix=blosum62.cmp      scoring matrix for peptide sequences
-MATRix=dnadistances.cmp  scoring matrix for nucleotide sequences

Optional Parameters: None

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program default scoring matrix file in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Chapter 4, Using Data Files in the User's Guide.

OldDistances reads the scoring matrix file blosum62.cmp for peptide comparisons and dnadistances.cmp for nucleotide comparisons.
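The search order for a matrix named without a directory specification can be sketched in Python as follows. This is an illustration under assumptions: the function find_matrix is hypothetical, the logical names MyData, GenMoreData, and GenRunData are represented by a plain dictionary that maps each name to a directory, and the demonstration uses throwaway directories rather than a real installation.

```python
import tempfile
from pathlib import Path

# Search order after the local directory, as given in the text above.
LOGICAL_ORDER = ["MyData", "GenMoreData", "GenRunData"]


def find_matrix(name, cwd, logical_dirs):
    """Return the first existing path for name, or None if it is not found."""
    candidates = [Path(cwd)]
    candidates += [Path(logical_dirs[k]) for k in LOGICAL_ORDER if k in logical_dirs]
    for directory in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None


# Demonstration with temporary directories (all paths are hypothetical).
with tempfile.TemporaryDirectory() as root:
    root = Path(root)
    dirs = {}
    for logical in LOGICAL_ORDER:
        dirs[logical] = root / logical.lower()
        dirs[logical].mkdir()
    cwd = root / "work"
    cwd.mkdir()
    (dirs["GenRunData"] / "mymatrix.cmp").write_text("public copy\n")
    (dirs["MyData"] / "mymatrix.cmp").write_text("personal copy\n")
    found = find_matrix("mymatrix.cmp", cwd, dirs)
    found_in = found.parent.name
    missing = find_matrix("absent.cmp", cwd, dirs)
```

In the demonstration the copy in the MyData directory is found before the public copy in GenRunData, because MyData comes earlier in the search order.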

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-MATRix=mymatrix.cmp: allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see the Local Scoring Matrices topic above.

Printed: November 18, 1996 13:05 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.