OldDistances makes a table of the pairwise similarities within a group of aligned sequences.
OldDistances writes a matrix of the pairwise similarities between up to 50 different sequences in a multiple sequence alignment. The similarity value is the number of matches between each sequence pair divided by the sequence length.
A match occurs if the value in the scoring matrix for a pair of bases or amino acids is greater than or equal to a set match threshold.
The denominator can be any of four functions of sequence length: 1) the length of the shorter sequence of the pair; 2) the length without gaps of the shorter sequence of the pair; 3) the average of the sequence lengths; or 4) the average of the sequence lengths without gaps.
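The four denominators can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the function name denominator is hypothetical, the gap symbol is taken to be a period, and choice 2 is assumed to mean the smaller of the two gap-free lengths.

```python
def denominator(seq_a, seq_b, choice, gap="."):
    """Return the divisor for denominator choices 1 to 4 (hypothetical helper)."""
    len_a, len_b = len(seq_a), len(seq_b)
    # Length without gaps: total length minus the number of gap symbols.
    nogap_a = len_a - seq_a.count(gap)
    nogap_b = len_b - seq_b.count(gap)
    if choice == 1:    # length of the shorter sequence
        return min(len_a, len_b)
    if choice == 2:    # shorter length without gaps (assumed: smaller gap-free length)
        return min(nogap_a, nogap_b)
    if choice == 3:    # average of the sequence lengths
        return (len_a + len_b) / 2
    if choice == 4:    # average of the sequence lengths without gaps
        return (nogap_a + nogap_b) / 2
    raise ValueError("choice must be 1, 2, 3, or 4")


# Two aligned toy sequences: the first has two gap symbols.
for choice in (1, 2, 3, 4):
    print(choice, denominator("AC..GT", "ACGTGT", choice))
# prints: 1 6 / 2 4 / 3 6.0 / 4 5.0 (one pair per line)
```

In a multiple sequence alignment every sequence usually has the same total length, so choices 1 and 3 tend to give the same divisor and the gap-free choices are the ones that distinguish sequence pairs.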
Here is a session using OldDistances to determine similarities between aligned sequences in the file hsp70.msf:
% olddistances

 OLDDISTANCES within what multiple sequence alignment ?  hsp70.msf{*}

 What is the threshold for a match (* 0.6 *) ?

 Divide the sum of the matches by:

    1) Length of shorter sequence including gaps
    2) Length of shorter sequence excluding gaps
    3) Average sequence length including gaps
    4) Average sequence length excluding gaps
    5) Nothing

 Please choose one (* 2 *) :

    hsp70.msf{hs70_plafa}   720
    hsp70.msf{hs70_thean}   720
    hsp70.msf{hs70_leido}   720

    //////////////////////////

    hsp70.msf{hs75_yeast}   720
    hsp70.msf{hs77_yeast}   720
    hsp70.msf{dnak_ecoli}   720

 What should I call the output file (* hsp70.distances *) ?

%
Here is part of the output file; it contains a 28 x 28 matrix (not all of which is shown):
OLDDISTANCES within: hsp70.msf{*}  August 9, 1994 13:42

Threshold of comparison: 0.60
Denominator: "Length of shorter sequence without gaps"
Number of sequences: 28
Symbol Comparison Table: GenRunData:pepdistances.cmp

Key for column and row indices:

   1  hsp70.msf{Hs70_Plafa}  Length: 720  Length without gaps: 681
   2  hsp70.msf{Hs70_Thean}  Length: 720  Length without gaps: 646
   3  hsp70.msf{Hs70_Leido}  Length: 720  Length without gaps: 653

////////////////////////////////////////////////////////////////////////

  28  hsp70.msf{Dnak_Ecoli}  Length: 720  Length without gaps: 637

Distance Matrix Part: 1

               1        2        3        4        5  ...
       ____________________________________________________________ ...
 | 1 |    1.0000   0.8421   0.7933   0.8372   0.7655  ...
 | 2 |             1.0000   0.7554   0.8081   0.7477  ...
 | 3 |                      1.0000   0.9884   0.9096  ...
 | 4 |                               1.0000   0.9496  ...

////////////////////////////////////////////////////////////  ...
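The matrix in the output file is printed as an upper triangle, with 1.0000 on the diagonal and nothing below it. The following Python sketch shows one way to lay out such a table. The function name format_upper_triangle and the column widths are assumptions made for illustration; they are not taken from the program.

```python
def format_upper_triangle(values):
    """Format a square similarity matrix as an upper triangle (hypothetical layout)."""
    n = len(values)
    # Column indices, right-aligned in 8-character columns.
    lines = [" " * 7 + "".join(f"{j + 1:>8d}" for j in range(n))]
    for i in range(n):
        # Cells below the diagonal are left blank.
        cells = "".join(
            f"{values[i][j]:8.4f}" if j >= i else " " * 8 for j in range(n)
        )
        lines.append(f"| {i + 1:>3d} |" + cells)
    return "\n".join(lines)


# The first three rows and columns of the example output above.
example = [
    [1.0000, 0.8421, 0.7933],
    [0.8421, 1.0000, 0.7554],
    [0.7933, 0.7554, 1.0000],
]
table = format_upper_triangle(example)
print(table)
```

Only the values on and above the diagonal are read, so the lower half of the input can hold anything.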
PileUp creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments. It can also plot a tree showing the clustering relationships used to create the alignment. Gap makes sequence alignments. LineUp edits multiple sequence alignments. ProfileGap aligns a new sequence to an existing multiple sequence alignment. Pretty displays multiple sequence alignments. Distances (which is the replacement for this program) writes a matrix of the pairwise genetic distances between sequences in a multiple sequence alignment. The distances can be corrected for multiple substitutions at a single site using one of several distance correction methods. The distance value is expressed as the number of nucleotide or amino acid substitutions per 100 residues.
Unknown. The sequences must be aligned properly for OldDistances to work.
OldDistances compares each pair of aligned sequences symbol by symbol from the first symbol to the last symbol of the shorter sequence. The sequences must have already been aligned for the comparison to make sense. OldDistances simply counts the matches where the scoring matrix value is greater than a set match threshold. The sum of the matches is divided by the chosen denominator, such as the length of the shorter sequence.
Gaps are treated like any other symbol. The gap symbol (.) matches another symbol if that pair's value in the scoring matrix is above the threshold.
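The counting procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's implementation: the scoring matrix is a toy dictionary with made-up values, pairs missing from it are assumed to score 0, and a score equal to the threshold is counted as a match (the text describes the comparison both as "greater than or equal to" and as "greater than").

```python
def similarity(seq_a, seq_b, scores, threshold, divisor):
    """Count positions whose score reaches the threshold, then divide (sketch)."""
    matches = 0
    # zip stops at the end of the shorter sequence.
    for a, b in zip(seq_a, seq_b):
        # Look the pair up in either order; unknown pairs score 0 (assumption).
        score = scores.get((a, b), scores.get((b, a), 0.0))
        # The gap symbol gets no special treatment: it is looked up like any other.
        if score >= threshold:
            matches += 1
    return matches / divisor


# Toy scoring matrix with made-up values, for illustration only.
toy_scores = {
    ("A", "A"): 1.0, ("I", "I"): 1.0, ("V", "V"): 1.0,
    ("K", "K"): 1.0, ("R", "R"): 1.0,
    ("I", "V"): 0.7, ("K", "R"): 0.4,
    (".", "."): 0.0,
}

# A-A, I-V, and V-V reach the 0.6 threshold; the gap pair and K-R do not.
# The divisor 4 is the length of either sequence without gaps.
result = similarity("AIV.K", "AVV.R", toy_scores, threshold=0.6, divisor=4)
print(result)  # 0.75
```

Because gaps are ordinary symbols here, whether two aligned gaps count as a match depends entirely on the value the scoring matrix assigns to that pair.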
OldDistances chooses a default match threshold that is appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program will adjust the default match threshold accordingly.
If the sequences are not in an MSF file, use Pretty to display the aligned sequences you pass to OldDistances. If they look right in the Pretty display, they will work sensibly with OldDistances.
All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
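The capitalization convention can be read as a prefix rule: what you type must include at least the capitalized letters and must not contradict the rest of the name. The Python sketch below illustrates that reading. The function names are hypothetical, and treating the comparison as case-insensitive is an assumption, not something stated in this manual.

```python
from itertools import takewhile


def required_prefix(spec):
    """Return the leading capitalized letters of a parameter name such as 'MATRix'."""
    return "".join(takewhile(str.isupper, spec))


def matches_parameter(typed, spec):
    """Check whether a typed parameter is an acceptable spelling of spec (sketch)."""
    # Drop the leading hyphen and any '=value' part, then ignore case (assumption).
    name = typed.lstrip("-").split("=", 1)[0].lower()
    required = required_prefix(spec).lower()
    # The typed name must contain the required letters and be a prefix of the full name.
    return name.startswith(required) and spec.lower().startswith(name)


print(matches_parameter("-MATR=mymatrix.cmp", "MATRix"))  # True
print(matches_parameter("-matrix=mymatrix.cmp", "MATRix"))  # True
print(matches_parameter("-MAT=mymatrix.cmp", "MATRix"))  # False
```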
Minimal Syntax: % olddistances [-INfile=]hsp70.msf{*} -Default

Prompted Parameters:

-THReshold=0.6              minimum symbol comparison score for a match
-MENu=2                     divides the sum of the matches by:
                              1=length of the shorter sequence
                              2=length of the shorter sequence without gaps
                              3=Average length
                              4=Average length without gaps
                              5=Nothing
[-OUTfile=]hsp70.distances  output file

Local Data Files:

-MATRix=blosum62.cmp        scoring matrix for peptide sequences
-MATRix=dnadistances.cmp    scoring matrix for nucleotide sequences

Optional Parameters: None
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program default scoring matrix file in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Chapter 4, Using Data Files in the User's Guide.
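The search order for a file named with -MATRix and no directory specification can be sketched as a first-hit lookup. The Python below is an illustration only: the function name find_scoring_matrix is hypothetical, and the mapping from the logical names MyData, GenMoreData, and GenRunData to real directories is supplied by the caller because it depends on the installation. The demonstration builds temporary directories rather than touching a real installation.

```python
import tempfile
from pathlib import Path

# Search order described above: local directory first, public directories last.
SEARCH_ORDER = ["local", "MyData", "GenMoreData", "GenRunData"]


def find_scoring_matrix(filename, directories):
    """Return (label, path) for the first directory holding filename, else None."""
    for label in SEARCH_ORDER:
        directory = directories.get(label)
        if directory is None:
            continue  # this logical name is not defined for the caller
        candidate = Path(directory) / filename
        if candidate.is_file():
            return label, candidate
    return None


# Demonstration with throwaway directories standing in for the real ones.
with tempfile.TemporaryDirectory() as root:
    dirs = {}
    for label in SEARCH_ORDER:
        dirs[label] = Path(root) / label
        dirs[label].mkdir()
    # The same file name exists in MyData and in GenRunData.
    (dirs["GenRunData"] / "mymatrix.cmp").write_text("public copy\n")
    (dirs["MyData"] / "mymatrix.cmp").write_text("personal copy\n")
    found_label, found_path = find_scoring_matrix("mymatrix.cmp", dirs)
    missing = find_scoring_matrix("absent.cmp", dirs)

print(found_label)  # MyData
```

The copy in MyData wins because that directory is searched before either public data directory.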
OldDistances reads the scoring matrix file blosum62.cmp for peptide comparisons and dnadistances.cmp for nucleotide comparisons.
The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-MATRix=mymatrix.cmp allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see the Local Scoring Matrices topic above.
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.