Pretty displays multiple sequence alignments and calculates a consensus sequence. It does not create the alignment; it simply displays it.
Pretty prints sequences with their columns aligned and can display a consensus for the alignment, allowing you to look at relationships among the sequences. This program can be used for aligned sequences in an MSF (multiple sequence format) or RSF (rich sequence format) file, or for separate sequences that have had gaps added to make them all align.
You can edit the alignment in a Pretty output file with a text editor. The edited output can then be separated into individual sequence files by running Pretty with the command-line parameter -UGLy.
By repeatedly using the Gap program with the command-line parameter -OUT, gaps were added to a group of picorna virus capsid proteins in the antigenic region to make them align with each other and with a growing consensus sequence. Here is a session using Pretty to display the alignment and calculate a consensus sequence of the antigenic region from those picorna virus capsid protein sequences.
% pretty -CONsensus -CASe

 PRETTY format what sequence(s) ?  @pretty.list

          fa10.ugly  len:   349  wgt: 0.50
          fa12.ugly  len:   349  wgt: 0.50
 //////////////////////////////
           r14.ugly  len:   349  wgt: 0.50
            r2.ugly  len:   349  wgt: 0.50

                  Begin (*    1 *) ?
                    End (*  349 *) ?

 Find consensus to what minimum plurality (* 2.00 *) ?

 What should I call the output file (* pretty.pretty *) ?

%
 Plurality:  2.00     Threshold: 4     AveWeight 0.55  AveMatch 2.91  AvMisMatch -2.00

 PRETTY of: @pretty.list  October 3, 1996 10:35  ..

           1                                                   50
 fa10.ugly .......... .......... .......... ..TTttGESA D.PvtTtVE.
 fa12.ugly .......... .......... .......... ..TTatGESA D.PvtTtVE.
 fo1k.ugly .......... .......... .......... ..TTsaGESA D.PvtTtVE.
    e.ugly Gvenae.kgv tEnTna.Tad fvaqpvyLPe .nqT...... kv.Affynrs
  p1m.ugly GlgqmlEsmI .dnTvreTvg AatsrdaLPn teasGPthSk eiPALTAVET
  p1s.ugly GlgqmlEsmI .dnTvreTvg AatsrdaLPn teasGPahSk eiPALTAVET
  p2s.ugly GigdmiEgav .Egitknalv pptstnsLPg hkpsGPahSk eiPALTAVET
  p3s.ugly Giedliseva .qgal..Tls lpkqqdsLPd tkasGPahSk evPALTAVET
  cb3.ugly ...gpvEdaI .......T.. Aaigr..vad tvgTGPtnSe aiPALTAaET
  r14.ugly GlgdelEevI vEkT.kqTv. Asi....... ..ssGPkhtq kvPiLTAnET
   r2.ugly ...npvEnyI dEvlnevlv. .......vPn inssnPttSn saPALdAaET
 Consensus G-----E--I -E-T---T-- A------LP- --TTGPGESA D-PALTAVET

 /////////////////////////////////////////////////////////////////

           301                                               349
 fa10.ugly aElyCPRPll AIkvtsqdRy KqKI.iAPa. ..KQll.... .........
 fa12.ugly aElyCPRPll AIevssqdRh KqKI.iAPg. ..KQll.... .........
 fo1k.ugly aEtyCPRPll AIhpt.eaRh KqKI.vAPv. ..KQTl.... .........
    e.ugly krvfCPRPtv ffPwpTsG.D Kidmtpragv lmlespnald isrty....
  p1m.ugly irvWCPRPPR AlaYygpGvD ykdgtltPls tkdlTTy... .........
  p1s.ugly irvWCPRPPR AvaYygpGvD ykdgtltPls tkdlTTy... .........
  p2s.ugly VrvWCPRPPR AvPYfgpGvD ykdg.ltPlp ekglTTy... .........
  p3s.ugly VrvWCPRPPR AvPYygpGvD yrn.nldPls ekglTTy... .........
  cb3.ugly VkaWiPRPPR lcqYekakn. vnfrssgvtt trqsiTtmtn tgaiwtti.
  r14.ugly VEaWiPRaPR AlPY.Tsigr tny..pknte pvikkrk.gd i.ksy....
   r2.ugly VkaWCPRPPR AleY.Trahr tnfkiedrsi qtaivTrpii ttagpsdmy
 Consensus VE-WCPRPPR AIPY-T-GRD K-KI--AP-- --KQTT---- ---------
Pretty accepts one or more aligned nucleotide sequences or aligned protein sequences as input. You can specify an MSF file, such as the output file from a session with PileUp, as input to Pretty with a command like % pretty pileup.msf{*}. Similarly, you can specify an RSF file, such as the output file from a session with PileUp in SeqLab, as input to Pretty with a command like % pretty pileup.rsf{*}. Weights can be specified for sequences in both MSF and RSF files. (See the Vote Weight discussion below.) Multiple sequence alignments can also be represented with list files. For Pretty, these files may include a vote weight for each sequence with the wgt: sequence attribute.
Here is the input file of sequence names (pretty.list) from the example session:
!!SEQUENCE_LIST 1.0

A multiple sequence alignment represented as a list file for input
to the programs PRETTY, PROFILEMAKE and LINEUP.  7/30/94  ..

GenDocData:fa10.ugly  wgt: 0.5
GenDocData:fa12.ugly  wgt: 0.5
GenDocData:fo1k.ugly  wgt: 1.0
GenDocData:e.ugly     wgt: 1.0
GenDocData:p1m.ugly   wgt: 0.25
GenDocData:p1s.ugly   wgt: 0.25
GenDocData:p2s.ugly   wgt: 0.25
GenDocData:p3s.ugly   wgt: 0.25
GenDocData:cb3.ugly   wgt: 1.0
GenDocData:r14.ugly   wgt: 0.5
GenDocData:r2.ugly    wgt: 0.5
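To make the list-file layout concrete, here is a small Python sketch that reads the heading, the ".." divider and the per-sequence wgt: attributes. It is an illustration written for this page, not GCG's reader, and the function name parse_list_file is hypothetical. It assumes every entry sits on its own line after the first line ending in "..". Averaging the weights gives 6.0 / 11 = 0.55, which happens to match the AveWeight value in the output heading above. Reading AveWeight as the mean vote weight is an assumption.

```python
# Sketch only: parses the list-file layout shown above. It is not GCG's own
# reader, and reading "AveWeight" as the mean vote weight is an assumption.
from typing import List, Tuple

def parse_list_file(text: str) -> List[Tuple[str, float]]:
    """Return (sequence specification, vote weight) pairs from a list file."""
    lines = text.splitlines()
    # The heading ends with the first line whose text ends in "..".
    start = next((i + 1 for i, line in enumerate(lines)
                  if line.rstrip().endswith("..")), 0)
    entries = []
    for line in lines[start:]:
        tokens = line.split()
        if not tokens:
            continue
        weight = 1.0  # default vote weight when no wgt: attribute is given
        if "wgt:" in tokens:
            weight = float(tokens[tokens.index("wgt:") + 1])
        entries.append((tokens[0], weight))
    return entries

EXAMPLE_LIST = """!!SEQUENCE_LIST 1.0
A multiple sequence alignment represented as a list file for input
to the programs PRETTY, PROFILEMAKE and LINEUP.  7/30/94  ..
GenDocData:fa10.ugly  wgt: 0.5
GenDocData:fa12.ugly  wgt: 0.5
GenDocData:fo1k.ugly  wgt: 1.0
GenDocData:e.ugly     wgt: 1.0
GenDocData:p1m.ugly   wgt: 0.25
GenDocData:p1s.ugly   wgt: 0.25
GenDocData:p2s.ugly   wgt: 0.25
GenDocData:p3s.ugly   wgt: 0.25
GenDocData:cb3.ugly   wgt: 1.0
GenDocData:r14.ugly   wgt: 0.5
GenDocData:r2.ugly    wgt: 0.5
"""

entries = parse_list_file(EXAMPLE_LIST)
average = sum(w for _, w in entries) / len(entries)
print(len(entries), f"{average:.2f}")  # 11 0.55
```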
The function of Pretty depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
PileUp creates a multiple sequence alignment of a group of related sequences. If you run Gap with the command-line parameters for sequence output, it writes sequence files with the sequences expanded by the addition of gaps. LineUp is an editor that allows you to edit multiple sequence alignments.
PlotSimilarity plots the average similarity of two or more aligned sequences at each position in the alignment.
Pretty displays sequences that have already been aligned. You can use up to 500 sequences, although the total length of all sequences combined must be less than 2,000,000 characters.
If you use one of the command-line parameters -CONsensus, -DIFferences, or -CASe, Pretty calculates a consensus for each column of the alignment using the scoring matrix blosum62.cmp for peptides or prettydna.cmp for nucleic acids. The consensus symbol for a column is determined in two steps:
1) The program finds the symbol whose comparison to all of the symbols in the column (including itself) yields the greatest number of votes. A vote is cast for each symbol comparison whose value is at or above the threshold; votes can be either 1.0 or some vote weight assigned to the sequence from which the vote comes.
2) Among the coalition of symbols that voted for the winning symbol, the most common symbol is chosen as the consensus.
If there is no coalition of votes that is larger than all of the other coalitions, or if the largest coalition of votes is below the minimum plurality, then there is no consensus for the column.
The weights for each sequence and the minimum plurality are floating point numbers. The threshold value is an integer.
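The two-step vote can be sketched in Python as shown below. This is a reading of the description above, not Pretty's source. The gap characters, the rule that candidates backed by the same rows form a single coalition, and the tie-breaking in step 2 are all assumptions. toy_score is a hypothetical stand-in for a real scoring matrix such as blosum62.cmp. As the text says, weights and plurality are floats and the threshold is an integer.

```python
# Sketch of the two-step vote described above. The gap symbols, the
# treatment of ties and the toy scoring function are assumptions made for
# illustration; Pretty itself reads blosum62.cmp or prettydna.cmp.
import math
from collections import Counter
from typing import Callable, Optional, Sequence

GAPS = {".", "-", "~"}  # assumed: gap characters neither vote nor win

def column_consensus(column: Sequence[str], weights: Sequence[float],
                     score: Callable[[str, str], int],
                     threshold: int, plurality: float) -> Optional[str]:
    """Return the consensus symbol for one column, or None if there is none."""
    rows = [i for i, s in enumerate(column) if s not in GAPS]
    # Step 1: every candidate symbol gets the weight of each row whose symbol
    # scores at or above the threshold against it. Candidates backed by the
    # same set of rows form one coalition.
    coalitions = {}
    for candidate in {column[i] for i in rows}:
        voters = frozenset(i for i in rows
                           if score(candidate, column[i]) >= threshold)
        if voters:
            coalitions[voters] = sum(weights[i] for i in voters)
    if not coalitions:
        return None
    best = max(coalitions.values())
    leaders = [c for c, v in coalitions.items() if math.isclose(v, best)]
    # No consensus on a tie between coalitions or below the minimum plurality.
    if len(leaders) > 1 or best < plurality:
        return None
    # Step 2: the most common symbol in the winning coalition is the consensus
    # (ties go to the symbol seen first, top row downward: an assumption).
    counts = Counter(column[i] for i in sorted(leaders[0]))
    return counts.most_common(1)[0][0]

def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for a scoring matrix such as blosum62.cmp."""
    similar = {frozenset("IV"): 3, frozenset("IL"): 2, frozenset("VL"): 1}
    return 4 if a == b else similar.get(frozenset(a + b), -1)

print(column_consensus("IIVL", [1, 1, 1, 1], toy_score, 2, 2.0))  # I
print(column_consensus("AAGG", [1, 1, 1, 1], toy_score, 2, 2.0))  # None
```

In the first call, I draws votes from all four rows, because V and L score at or above the threshold against it. I therefore wins both steps. In the second call, the A and G coalitions tie, so the column gets no consensus.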
If you use -IDEntity, a consensus symbol is chosen only when all of the sequence symbols in a column of the alignment are identical, regardless of their votes.
If you use -CASe, Pretty shows the symbols in a column in uppercase when their comparison value with the consensus symbol meets or exceeds the threshold. All other symbols are in lowercase.
If you use -DIFferences, Pretty only shows those symbols in a column whose comparison value with the consensus symbol is lower than the threshold. These symbols are shown in lowercase; all other positions in the column are left blank.
If you use -CONsensus, Pretty adds a line to your alignment with the consensus sequence.
The threshold (-THReshold) determines the scoring matrix value below which a symbol may not vote for a coalition. Pretty chooses a default threshold that is appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program will adjust the default threshold accordingly. Use -THReshold to specify an alternative threshold if you don't want to accept the default value.
The minimum plurality (-PLUrality) defines the number of votes (vote weights) below which there is no consensus.
If several of your sequences are very similar, you may not want their votes to dominate the consensus for the column. If your input file specification to Pretty is a list file, you can assign each sequence a vote weight with the wgt: sequence attribute. The vote weight is the vote that each row casts for the consensus. A weight of 1.0 is assumed if no vote weight is specified. (The list file used to run the example above, pretty.list, is shown earlier in this document.) Note how each kind of sequence is assigned a vote weight so that their combined impact on the election is never more than one vote. For more information about list files, see "Using List Files" in Chapter 2, Using Sequence Files and Databases in the User's Guide.
You can assign vote weights to sequences in an MSF file by editing the MSF file and modifying the weight on the name/weight line for each sequence at the top of the file. (See "Using Multiple Sequence Format (MSF) Files" in Chapter 2, Using Sequence Files and Databases in the User's Guide for a complete description of MSF files.)
You can assign vote weights to sequences in an RSF (rich sequence format) file by modifying the weight attribute for each sequence within SeqLab. (See "Using Rich Sequence Format (RSF) Files" in Chapter 2, Using Sequence Files and Databases in the User's Guide for a complete description of RSF files. Also see "Viewing Sequence Attribute and Reference Information" in Chapter 2, Editing Sequences in the SeqLab Guide for more information about modifying the weight attribute for each sequence within an RSF file.)
If a sequence from an MSF or RSF file is listed in a list file with a vote weight, the vote weight in the list file is used; the sequence weight in the MSF or RSF file is ignored. If you add -WEIGHT=1.0 to the command line, Pretty ignores weights specified for individual sequences and gives all of the sequences in the alignment equal weight.
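The precedence rules for vote weights can be summarized in a few lines of Python. This sketch only restates the rules given above. It is not Pretty's code, and effective_weight is a hypothetical name.

```python
# Sketch of the weight-precedence rules stated above (not Pretty's code).
from typing import Optional

def effective_weight(weight_option: Optional[float] = None,
                     list_file_wgt: Optional[float] = None,
                     msf_or_rsf_weight: Optional[float] = None) -> float:
    """Return the vote weight one sequence casts in the consensus."""
    if weight_option is not None:      # -WEIGHT=... overrides everything
        return weight_option
    if list_file_wgt is not None:      # a list-file wgt: beats the MSF/RSF weight
        return list_file_wgt
    if msf_or_rsf_weight is not None:  # weight stored in the MSF or RSF file
        return msf_or_rsf_weight
    return 1.0                         # default when no weight is given

print(effective_weight(list_file_wgt=0.25, msf_or_rsf_weight=1.0))  # 0.25
print(effective_weight(1.0, 0.25, 0.5))                             # 1.0
```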
All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
Minimal Syntax: % pretty [-INfile=]@Pretty.List -Default

Prompted Parameters:

-BEGin=1 -END=349          range of interest
[-OUTfile=]pretty.pretty   output file

Local Data Files:

-MATRix=prettydna.cmp      consensus scoring matrix for nucleotides
-MATRix=blosum62.cmp       consensus scoring matrix for peptides

Optional Parameters:

-CONsensus           generates (displays) a consensus sequence
-IDEntity[=*]        only shows positions of unanimous agreement in the consensus
-DIFferences[="-"]   only shows positions disagreeing with the calculated consensus
-CASe                shows positions agreeing with the calculated consensus in upper case
-THReshold=1         sets minimum comparison value for symbol to vote in consensus
-PLUrality=2.0       defines the minimum number of votes for a consensus to exist
-LINesize=50         sets the number of residues per line
-WEIGHT=1.0          sets the weight for all input sequences
-BLOcksize=10        sets the number of residues per block
-UGLy                writes the individual sequences into new files
We are very grateful to Ann Palmenberg of the UW Biophysics lab for help with the design of Pretty. The sequences in the example were aligned for Dr. Palmenberg's work.
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program default scoring matrix file in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Chapter 4, Using Data Files in the User's Guide.
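The search order for a matrix named with -MATRix can be sketched in Python as follows. The code checks your local directory first, then MyData, GenMoreData and GenRunData, and stops at the first match. Treating these logical names as environment variables is an assumption about the installation. The function names are hypothetical, and the demonstration uses temporary directories in place of real data directories.

```python
# Sketch of the -MATRix search order described above. Treating the logical
# names MyData, GenMoreData and GenRunData as environment variables is an
# assumption about the local installation.
import os
import tempfile
from pathlib import Path
from typing import List, Optional, Sequence

def search_dirs() -> List[str]:
    """Current directory first, then each logical name that is defined."""
    dirs = [os.getcwd()]
    for logical in ("MyData", "GenMoreData", "GenRunData"):
        value = os.environ.get(logical)
        if value:
            dirs.append(value)
    return dirs

def find_matrix(name: str, dirs: Sequence[str]) -> Optional[Path]:
    """Return the first file called `name` found in `dirs`, or None."""
    given = Path(name)
    if given.parent != Path("."):  # a directory was named: no search
        return given if given.is_file() else None
    for d in dirs:
        candidate = Path(d) / name
        if candidate.is_file():
            return candidate
    return None

with tempfile.TemporaryDirectory() as local, tempfile.TemporaryDirectory() as mydata:
    (Path(mydata) / "mymatrix.cmp").write_text("placeholder\n")
    first = find_matrix("mymatrix.cmp", [local, mydata])   # found in MyData
    (Path(local) / "mymatrix.cmp").write_text("placeholder\n")
    second = find_matrix("mymatrix.cmp", [local, mydata])  # local copy wins
    results = (first.parent == Path(mydata), second.parent == Path(local),
               find_matrix("nosuch.cmp", [local, mydata]) is None)
print(results)  # (True, True, True)
```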
If you use one of the command-line parameters -CONsensus, -DIFferences, or -CASe, Pretty calculates a consensus for each column using a scoring matrix (see Chapter 4, Using Data Files in the User's Guide). You can provide your own matrix called either blosum62.cmp for peptides or prettydna.cmp for nucleic acids. You can specify some other matrix with the command-line parameter -MATRix=filename.
The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-MATRix=filename
allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information, see the description of local scoring matrices above.
-CONsensus
causes Pretty to show a consensus sequence for the set of sequences you are displaying. (Read how Pretty finds the consensus above.)
-IDEntity[=*]
causes Pretty to show a consensus indicating where there is complete agreement among all of the sequences. If an optional character is added after the command-line parameter, Pretty uses that character to indicate complete agreement. Otherwise, the consensus contains the completely conserved sequence symbol.
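The -IDEntity rule is simple enough to sketch directly in Python. This sketch makes two assumptions. First, symbols are compared without regard to case. Second, a gap in any row breaks unanimity. Columns without agreement are marked "-", as in the Consensus line of the example output. The mark argument stands in for the optional character (for example, -IDEntity=*).

```python
# Sketch of -IDEntity: a column gets a consensus symbol only when every row
# holds the same residue. Ignoring case and treating any gap as disagreement
# are assumptions; "-" marks a column without consensus, as in the example.
from typing import Optional, Sequence

GAPS = {".", "-", "~"}

def identity_consensus(rows: Sequence[str], mark: Optional[str] = None) -> str:
    """Build the consensus line; `mark` plays the role of -IDEntity=*."""
    line = []
    for column in zip(*rows):
        symbols = {s.upper() for s in column}
        if len(symbols) == 1 and not (symbols & GAPS):
            line.append(mark if mark is not None else symbols.pop())
        else:
            line.append("-")
    return "".join(line)

rows = ["PALTAVET", "PALTAVET", "PiLTAnET", "PALdAaET"]
print(identity_consensus(rows))       # P-L-A-ET
print(identity_consensus(rows, "*"))  # *-*-*-**
```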
-DIFferences[="-"]
causes Pretty to print only those symbols in each column whose comparison value with the consensus symbol is lower than the threshold (see -THReshold below), and to print blank spaces at all other positions. If an optional character is added, Pretty prints that character instead of blank spaces. The optional character must be enclosed in quotes.
-CASe
causes Pretty to print in uppercase all those symbols in each column whose comparison value with the consensus symbol is greater than or equal to the threshold (see -THReshold below), and to print all other symbols in lowercase. This parameter overrides -DIFferences if both are used.
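As an illustration of how -CASe and -DIFferences mark a single row against the consensus line, here is a Python sketch. Three choices in it are assumptions: gap characters are printed unchanged, a "-" in the consensus (no consensus) counts as disagreement, and the identity function is a hypothetical stand-in for a real scoring matrix. The test row is the uppercased first block of p2s.ugly. With these choices, -CASe reproduces the casing shown for that row in the example output.

```python
# Sketch of how -CASe and -DIFferences mark one row against the consensus
# line. The identity scoring function is a hypothetical stand-in for a real
# matrix; printing gaps unchanged and treating "-" in the consensus (no
# consensus) as disagreement are assumptions.
from typing import Callable

GAPS = {".", "~"}

def choose_mode(case: bool, differences: bool) -> str:
    """-CASe overrides -DIFferences when both are given."""
    if case:
        return "case"
    return "differences" if differences else "plain"

def render_row(row: str, consensus: str, score: Callable[[str, str], int],
               threshold: int, mode: str, fill: str = " ") -> str:
    out = []
    for sym, cons in zip(row, consensus):
        if sym in GAPS:
            out.append(sym)
            continue
        agrees = cons != "-" and score(sym.upper(), cons.upper()) >= threshold
        if mode == "case":
            out.append(sym.upper() if agrees else sym.lower())
        elif mode == "differences":
            out.append(fill if agrees else sym.lower())  # fill: -DIFferences="x"
        else:
            out.append(sym)
    return "".join(out)

def identity(a: str, b: str) -> int:
    """Hypothetical scoring function: 1 for a match, 0 otherwise."""
    return 1 if a == b else 0

mode = choose_mode(case=True, differences=True)
print(render_row("GIGDMIEGAV", "G-----E--I", identity, 1, mode))  # GigdmiEgav
print(render_row("GIGDMIEGAV", "G-----E--I", identity, 1, "differences", "-"))  # -igdmi-gav
```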
-THReshold=1
determines the scoring matrix value below which a symbol may not vote for a coalition (see the description of how Pretty calculates a consensus above). Pretty chooses a default threshold that is appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program will adjust the default threshold accordingly. Use -THReshold to specify an alternative threshold if you don't want to accept the default value.
-PLUrality=2.0
defines the number of votes (vote weights) below which there is no consensus (see the description of how Pretty calculates a consensus above).
-WEIGHT=1.0
sets the sequence weight for all input sequences. When the weight is set from the command line, Pretty ignores weights specified for individual sequences in a list file, a multiple sequence format (MSF) file, or a rich sequence format (RSF) file.
-LINesize=50
specifies the number of sequence symbols to display on each line.
-BLOcksize=10
specifies the number of sequence symbols to put into each block.
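The interaction of -LINesize and -BLOcksize can be sketched in Python. Each group of lines begins with a line showing its first and last positions. Below it comes one row per sequence, with the name right-justified and the residues split into blocks. The spacing is modelled loosely on the example output and is an assumption. format_alignment is a hypothetical name.

```python
# Sketch of the -LINesize / -BLOcksize layout: each line group starts with
# its first and last positions, then one right-justified name per row. The
# exact spacing Pretty uses is assumed, modelled on the example output.
from typing import Dict, List

def format_alignment(rows: Dict[str, str], linesize: int = 50,
                     blocksize: int = 10) -> List[str]:
    width = max(len(name) for name in rows)
    length = max(len(seq) for seq in rows.values())
    out: List[str] = []
    for offset in range(0, length, linesize):
        if out:
            out.append("")  # blank line between line groups
        body = []
        for name, seq in rows.items():
            chunk = seq[offset:offset + linesize]
            blocks = [chunk[i:i + blocksize]
                      for i in range(0, len(chunk), blocksize)]
            body.append(f"{name:>{width}} " + " ".join(blocks))
        span = max(len(line) for line in body) - width - 1
        first, last = str(offset + 1), str(min(offset + linesize, length))
        out.append(" " * (width + 1)
                   + first.ljust(max(span - len(last), len(first) + 1)) + last)
        out.extend(body)
    return out

lines = format_alignment({"a": "ACGTACGTAC", "bb": "ACGTACGTAA"}, 4, 2)
for line in lines:
    print(line)
```

With the defaults of 50 and 10, this produces five 10-residue blocks per line, as in the example output.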
-UGLy
rewrites the sequences in a Pretty output file into individual sequence files in GCG format. The Pretty output file must have a line with two periods (..) separating the text in the heading from the sequences. -UGLy also causes Pretty to write a list file to go with the new sequence files.
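The idea behind -UGLy can be sketched in Python. The code skips the heading up to the ".." line, ignores position-number lines, and joins each named row's blocks into one gapped sequence. It does not write GCG-format files or a list file. Dropping the Consensus row and keeping gap characters as they appear are assumptions, and split_pretty_output is a hypothetical name.

```python
# Sketch of the idea behind -UGLy: collect each named row of a Pretty output
# file into one gapped sequence. Dropping the Consensus row and keeping gap
# characters are assumptions; writing GCG-format files is not attempted.
from typing import Dict, List

def split_pretty_output(text: str, skip=("Consensus",)) -> Dict[str, str]:
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.rstrip().endswith(".."):  # divider between heading and rows
            body = lines[i + 1:]
            break
    else:
        raise ValueError("no '..' line separates the heading from the sequences")
    pieces: Dict[str, List[str]] = {}
    for line in body:
        tokens = line.split()
        # Skip blank lines, position-number lines and name-only lines.
        if len(tokens) < 2 or all(t.isdigit() for t in tokens):
            continue
        if tokens[0] in skip:
            continue
        pieces.setdefault(tokens[0], []).append("".join(tokens[1:]))
    return {name: "".join(parts) for name, parts in pieces.items()}

example = """PRETTY of: example  ..

          1         10
   s1 AC.GT ACGTA
   s2 ACAGT ACCTG
Consensus AC-GT AC-T-

         11  12
   s1 TT
   s2 TA
Consensus T-
"""
result = split_pretty_output(example)
print(result)  # {'s1': 'AC.GTACGTATT', 's2': 'ACAGTACCTGTA'}
```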
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks

Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.