FRAMESEARCH⁽⁺⁾

FrameSearch searches a group of protein sequences for similarity to one or more nucleotide query sequences, or searches a group of nucleotide sequences for similarity to one or more protein query sequences. For each sequence comparison, the program creates the optimal local alignment of the best region of similarity between the protein sequence and all possible codons on each strand of the nucleotide sequence. Because FrameSearch can match the protein to codons in different reading frames of the nucleotide sequence as part of the same alignment, it can identify sequence similarity even when the nucleotide sequence contains reading frame shifts.

In standard sequence alignment programs, you routinely specify gap creation and extension penalties. In addition to these penalties, FrameSearch also allows you to specify a separate frameshift penalty for the creation of gaps that result in reading frame shifts in the nucleotide sequence. (See the ALGORITHM topic for a more detailed explanation of how gaps are penalized.)

By default, the search proceeds as a local alignment between the query sequence and each sequence in the search set. Optionally, you can search using a global alignment procedure where FrameSearch inserts gaps to optimize the alignment between the entire nucleotide sequence and the entire protein sequence.

The search output contains an ordered list of the sequences in the search set that have the highest comparison scores when aligned to the query sequence. The actual alignments for these top-scoring matches are displayed after the list.

You can specify multiple query sequences (such as a list file or a sequence specification using an asterisk (*) wildcard) as input to FrameSearch. The program compares each query sequence separately to the sequences specified in the search set, and it writes a separate output file for each query search. If you use a list file as your query, you can add begin and end sequence attributes to specify the range for each query sequence. For more information about list files, see "Using List Files" in Chapter 2, Using Sequence Files and Databases in the User's Guide.

EXAMPLE

Here is a session using FrameSearch to find sequences in SWISS-PROT with similarities to the translation product of the cDNA sequence EST:Atts0012.


% FrameSearch

 FRAMESEARCH with what query sequence(s) ? EST:Atts0012

                  Begin (* 1 *) ?
                End (*   286 *) ?

 Search for query in what sequence(s) (* SwissProt:* *) ?

 What is the gap creation penalty (* 12 *) ?

 What is the gap extension penalty (* 4 *) ?

 What is the frameshift penalty (* 0 *) ?

 This program can plot the distribution of alignment search scores graphically.
 Do you want to:

     A) Plot to a FIGURE file called "atts0012.figure"
     B) Plot graphics on LaserWriter attached to /dev/tty10
     C) Suppress the plot

 Please choose one (* A *):

 What should I call the output file (* atts0012.framesearch *) ?

          1 Sequences         924 aa searched    SW:104k_thepa
        101 Sequences      36,727 aa searched    SW:a1d_psesp

        //////////////////////////////////////////////////////

     52,101 Sequences  18,494,206 aa searched    SW:ZG58_XENLA
     52,201 Sequences  18,529,247 aa searched    SW:ZPB_RABIT

Aligning........................................

 FIGURE instructions are now being written into atts0012.figure.

 CPU time used:
        Search time:  2:53: 5.6
   Post-search time:  0: 0: 6.9
     Total CPU time:  2:53:12.5

 Output File: Atts0012.Framesearch

%

OUTPUT

Here is some of the output file:


!!SEQUENCE_LIST 1.0
  FRAMESEARCH of: Gb_Est1:Atts0012  check: 2422   from: 1  to: 286

LOCUS       ATTS0012      286 bp    RNA             EST       31-OCT-1992
DEFINITION  A. thaliana transcribed sequence; clone TAT1B11, 5' end; similar to
            GLYCERALDEHYDE 3-PHOSPHATE DEHYDROGENASE.
ACCESSION   Z17438
NID         g16580
KEYWORDS    expressed sequence tag; partial cDNA sequence. . . .

 TO: SwissProt:*  Sequences: 52,205  Total-length: 18,531,385
    September 20, 1996 17:09

 Databases searched:
        swissprot, Release 33.0, Released on 22Mar96, Formatted on 22Jul1996

 Scoring matrix: GenRunData:blosum62.cmp
 Translation table: GenRunData:translate.txt

  Gap creation penalty:     12
 Gap extension penalty:      4
    Frameshift penalty:      0

The best scores are:                                                  ..

Sw:G3pc_Arath  P25858 arabidopsis thaliana (mouse-ear cress). glyce...  343
Sw:G3pc_Sinal  P04796 sinapis alba (white mustard). glyceraldehyde ...  331
Sw:G3pc_Ranac  P26521 ranunculus acer (common buttercup). glycerald...  313

///////////////////////////////////////////////////////////////////////////

Sw:G3pc_Chlre  P49644 chlamydomonas reinhardtii. glyceraldehyde 3-p...  228
Sw:G3p_Klula  P17819 kluyveromyces lactis (yeast). glyceraldehyde 3...  227
Sw:G3p_Pig  P00355 sus scrofa (pig). glyceraldehyde 3-phosphate deh...  227

\\End of list

        Match display thresholds for the alignment(s):
                    | = IDENTITY
                    : =   2
                    . =   1

atts0012
G3pc_Arath

            Quality:    343             Length:    240
              Ratio:  4.397               Gaps:      2
 Percent Similarity: 98.718   Percent Identity: 97.436

                  .         .         .         .         .
       3 GAAATCAAGAAGGCCATCAAGGAGGAATCTGAAGGCAAAATGAAGGGAAT 52
         |||||||||||||||||||||||||||||||||||||||:::||||||||
     261 GluIleLysLysAlaIleLysGluGluSerGluGlyLysLeuLysGlyIl 277
                  .         .         .         .         .
      53 TTTGGGATACTCTGAGGATGATGTTGTGTCTACCGACTTTGTTGGTGACA 102
         ||||||||||...|||||||||||||||||||||||||||||||||||||
     278 eLeuGlyTyrThrGluAspAspValValSerThrAspPheValGlyAspA 294
                  .         .         .         .         .
     103 ACAGGTCAAGCATTTTCGATGCCAAGGCTGGATTGCATTGCATTGAGCGA 152
         ||||||||||||||||||||||||||||||||    ||||||||||||||
     295 snArgSerSerIlePheAspAlaLysAlaGly....IleAlaLeuSerAs 309
                  .         .         .         .         .
     153 CAAGTTTGTGAAGTTGGTGTCATGGTACGACAACGAATGGGGTTACACAG 202
         ||||||||||||||||||||||||||||||||||||||||||||||  ||
     310 pLysPheValLysLeuValSerTrpTyrAspAsnGluTrpGlyTyr..Se 325
                  .         .         .         .
     203 TTCTCGTGTCGTTGACCTTATCGTTCACATGTCAAAGGCC 242
         ||||||||||||||||||||||||||||||||||||||||
     326 rSerArgValValAspLeuIleValHisMetSerLysAla 338

  /////////////////////////////////////////////////////////////

! CPU time used:
!        Search time:  2:53: 5.6
!   Post-search time:  0: 0: 6.9
!     Total CPU time:  2:53:12.5

The FrameSearch output is an ordered list of those sequences with the highest alignment scores when compared to the query sequence. It reports each high-scoring sequence name along with a short line of sequence documentation and the alignment score. If /rev follows the sequence name, the match is to the reverse-complement strand of the nucleotide sequence.

By default, each line of the output list has space for 70 characters, including the sequence name and documentation. You can increase this space for documentation that accompanies each reported sequence by specifying a larger number with the -LINesize command-line parameter.

Following the list of best scores, FrameSearch displays the optimal alignments between the query sequence and the top-scoring sequences in the search list. The alignment output displays sequence similarity by printing one of three characters between a codon and an amino acid: a pipe character (|), a colon (:), or a period (.). Normally, a pipe character is put between a codon and an amino acid when the translated codon is identical to the amino acid. A colon is put between a codon and an amino acid when the comparison value between the translated codon and the amino acid is greater than or equal to the average positive non-identical comparison value in the amino acid substitution matrix. A period is put between a codon and an amino acid when the comparison value between the translated codon and the amino acid is greater than or equal to 1. You can change these match display thresholds from the command line by specifying the -PAIr command-line parameter. (See Appendix VII for more information about comparison values in scoring matrices.)

The FrameSearch output file can be used as a list file for input to other Wisconsin Package programs.

If you specify multiple query sequences as input (see the INPUT FILES topic), FrameSearch writes a separate text output file for each query sequence used to search the search set.

SCORE DISTRIBUTION PLOT

By default, FrameSearch plots a histogram showing the number of sequence comparisons with each different score. This plot can help you judge which of the sequences in your output list are significant and whether the output list was large enough to contain all of the significant scores. Here is the score distribution plot from the example session:

By looking at this plot, you can conclude that comparisons with a score of less than about 65 are probably part of the population of sequences with only random similarity to EST:Atts0012.
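The kind of tally behind such a plot can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the function names, the bin width of 5, and the sample scores are assumptions made for the example. The cutoff of 65 is the approximate value read from the plot in the paragraph above.

```python
from collections import Counter

def score_histogram(scores, bin_width=5):
    """Count how many comparison scores fall into each bin (bin label = lower edge)."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    return dict(sorted(counts.items()))

def above_cutoff(scores, cutoff=65):
    """Keep the scores that stand out from the bulk of random-similarity scores."""
    return [s for s in scores if s >= cutoff]

# Hypothetical scores, invented for illustration only.
sample = [31, 33, 38, 41, 64, 343]
histogram = score_histogram(sample)
candidates = above_cutoff(sample)
```

With the sample above, only the score of 343 lies beyond the cutoff; the others fall in the low-scoring bins that make up the bulk of the distribution.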

If you specify multiple query sequences as input (see the INPUT FILES topic), or you add either -BATch or -Default to the FrameSearch command line, the score distribution plot for each query search is written to its own Figure file. Each Figure file is named after the query sequence and given the .figure file name extension. You can then use the Figure program to display any of the score distribution plots on the supported graphics device of your choice.

INPUT FILES

The input to FrameSearch is one or more query sequences and one or more search set sequences. If the query input is one or more nucleotide sequences, the program will search a set of protein sequences; if the query input is one or more protein sequences, the program will search a set of nucleotide sequences. If the query input contains both nucleotide and protein sequences, the program will skip those query sequences that are not of the same type as the first sequence in the group. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*.

If you use a list file to specify query sequences, you can add begin and end sequence attributes to specify a range for each sequence.

RELATED PROGRAMS

BLAST searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. BLAST can search databases on your own computer or databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.

FastA does a Pearson and Lipman search for similarity between a query sequence and a group of sequences of the same type (nucleic acid or protein). For nucleotide searches, FastA may be more sensitive than BLAST. TFastA does a Pearson and Lipman search for similarity between a query peptide sequence and any group of nucleotide sequences. TFastA translates the nucleotide sequences in all six reading frames before performing the comparison. It is designed to answer the question, "What implied peptide sequences in a nucleotide sequence database are similar to my peptide sequence?"

ProfileSearch uses a profile (representing a group of aligned sequences) as a query to search the database for new sequences with similarity to the group. The profile is created with the program ProfileMake.

FindPatterns, LookUp, StringSearch, and Names are other sequence identification programs.

FrameAlign creates an optimal alignment of the best segment of similarity (local alignment) between a protein sequence and the codons in all possible reading frames of a nucleotide sequence. Optimal alignments may include reading frame shifts.

ALGORITHM

FrameSearch aligns the query sequence to each sequence in the search set. The alignment procedure is an extension of the local alignment algorithm of Smith and Waterman (Advances in Applied Mathematics 2; 482-489 (1981)) that is modified to determine the score of the best segment of similarity between a protein sequence and the codons in a nucleotide sequence.

Scoring Matrix

To create the alignments, FrameSearch requires a scoring matrix that contains values for matches between all possible amino acids and codons. FrameSearch derives this amino acid - codon scoring matrix on the fly from a translation table and an amino acid substitution matrix. The translation table contains a list of all possible codons for each amino acid. The amino acid substitution matrix contains match values for the comparison of all possible amino acids.

In the derived amino acid - codon scoring matrix, the value of a match between any amino acid and any codon is the value of the match between the amino acid and the translated codon in the amino acid substitution matrix. If a codon contains IUB nucleotide ambiguity symbols (described in Appendix III), and all possible unambiguous representations of the codon translate to the same amino acid (e.g. MGR always translates to arginine in the standard genetic code), then the value of a match between that codon and any amino acid can be similarly determined. If the possible unambiguous representations of the codon do not all translate to the same amino acid, then the value of a match between that codon and any amino acid is 0.
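The derivation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the program's implementation: the function names are hypothetical, the codon table is only a fragment of the standard genetic code, and the substitution values are the standard BLOSUM62 entries, which the blosum62.cmp file is assumed to match.

```python
from itertools import product

# IUB nucleotide ambiguity symbols and the bases each one stands for.
IUB = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "AG", "Y": "CU", "M": "AC", "K": "GU", "S": "CG", "W": "AU",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}

# Fragment of the standard genetic code (assumption: just enough for the demo).
CODON_TABLE = {
    "AGA": "R", "AGG": "R", "CGA": "R", "CGG": "R",
    "GAA": "E", "GAG": "E", "GAU": "D", "GAC": "D",
}

# Fragment of BLOSUM62, keyed by the alphabetically sorted residue pair.
SUBSTITUTION = {
    ("R", "R"): 5, ("K", "R"): 2,
    ("E", "E"): 5, ("D", "E"): 2, ("D", "D"): 6,
}

def translate_ambiguous(codon):
    """Return the one amino acid that every expansion of the codon encodes, else None."""
    expansions = ("".join(p) for p in product(*(IUB[b] for b in codon.upper())))
    residues = {CODON_TABLE.get(c) for c in expansions}
    if len(residues) == 1 and None not in residues:
        return residues.pop()
    return None

def codon_score(amino_acid, codon):
    """Match value between an amino acid and a codon; 0 if the codon is ambiguous."""
    residue = translate_ambiguous(codon)
    if residue is None:
        return 0
    return SUBSTITUTION[tuple(sorted((amino_acid, residue)))]
```

Here codon_score("R", "MGR") takes the arginine-arginine value because all four expansions of MGR encode arginine, while codon_score("E", "GAN") is 0 because GAN expands to codons for both glutamate and aspartate.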

FrameSearch chooses default gap creation and extension penalties that are appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program will adjust the default gap penalties accordingly. (See Appendix VII for information about how to set the default gap penalties for any scoring matrix.) You can respond to the program prompts or use -GAPweight and -LENgthweight to specify alternative gap penalties if you don't want to accept the default values.

Protein-Nucleotide Alignment

FrameSearch uses the values in the amino acid - codon scoring matrix to determine the score of the best alignment between the protein and nucleotide sequences. If you consider a graph, or path matrix, with the nucleotide sequence placed on the X axis and the protein sequence placed on the Y axis, then every point on the path matrix represents the best alignment between the sequences that ends at that point. For any point on the path matrix, the X coordinate is the first nucleotide of the final codon in the alignment, and the Y coordinate is the final amino acid in the alignment. Each possible alignment end point is associated with a path, which is a series of steps (insertions, deletions, matches) through the path matrix required to create the alignment. Each step has its own score, and the scores for all the steps in an alignment path determine the quality score for the alignment. The quality score for an alignment is equal to the sum of the scoring matrix values of the matches in the alignment, minus the gap creation penalty multiplied by the number of gaps in the alignment, minus the frameshift penalty multiplied by the number of gaps in the alignment that change the reading frame, minus the gap extension penalty multiplied by the total length of all gaps in the alignment. (You can set the value for each of the penalties.)


quality = SUM(scoring matrix values of the matches in the alignment) -
          gap creation penalty  x  number of gaps in the alignment -
          frameshift penalty    x  number of gaps in the alignment
                                   that change the reading frame -
          gap extension penalty x  total length of all gaps
                                   in the alignment

For example, the following protein-nucleotide alignment consists of six steps:


       1 UGUUGUAUUCG....UGGUGG 17
         ||||||:::      ||||||
       1 CysCysValGlnIleTrpTrp 7

The first two steps are UGU-Cys matches. The third step is an AUU-Val match. The fourth step is a four nucleotide deletion. The last two steps are UGG-Trp matches. The quality score for this alignment is the sum of the scoring matrix values for two UGU-Cys matches, one AUU-Val match, and two UGG-Trp matches, minus one gap creation penalty, minus four gap extension penalties, minus one frameshift penalty.
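As a rough check of the arithmetic, the quality of this six-step alignment can be computed in Python. The function below is a sketch of the formula only, with hypothetical names. The match values (Cys-Cys 9, Ile-Val 3 for the AUU-Val step, Trp-Trp 11) are the standard BLOSUM62 entries and are assumed to be what blosum62.cmp contains. The penalties default to the values shown in the example session (12, 4, and 0).

```python
def quality(match_values, gap_lengths, frameshift_gaps,
            gap_creation=12, gap_extension=4, frameshift=0):
    """Quality = matches - creation*gaps - frameshift*shifting gaps - extension*gap length."""
    return (sum(match_values)
            - gap_creation * len(gap_lengths)
            - frameshift * frameshift_gaps
            - gap_extension * sum(gap_lengths))

# Two UGU-Cys matches, one AUU-Val match, two UGG-Trp matches (assumed BLOSUM62 values).
matches = [9, 9, 3, 11, 11]

# One gap of four nucleotides, and that gap changes the reading frame.
default_quality = quality(matches, gap_lengths=[4], frameshift_gaps=1)
penalized_quality = quality(matches, gap_lengths=[4], frameshift_gaps=1, frameshift=5)
```

With the default penalties the result is 43 - 12 - 0 - 16 = 15. A frameshift penalty of 5, chosen here only for illustration, lowers it to 10.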

Matches between an amino acid and a partial codon, like

         CG.
         Gln

in the above example, do not add any match value to the alignment score. By convention, all gap characters in partial codons are placed at the end of the codon. For example, the partial codon CG. in the above example will never be written as C.G (with the gap inside the codon).

If the best alignment ending at any point has a negative value, a zero is put at that position of the path matrix; otherwise, the quality score for the alignment is put at that position. After the path matrix is completely filled, the highest value in the matrix represents the score of the best region of similarity between the sequences (optimal local alignment). This highest value is reported as the comparison score between the nucleotide and protein sequences. The alignment itself can be reconstructed for display by following the best path from this point of highest value backward to the point where the path matrix has a value of zero.
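The zero floor and the "highest value in the matrix" rule can be illustrated with a plain Smith-Waterman score calculation in Python. This sketch is deliberately simpler than FrameSearch: it compares two sequences of the same kind, uses a single linear gap penalty instead of separate creation, extension, and frameshift penalties, and does not step between reading frames. The function names and the toy scoring values are assumptions made for the example.

```python
def local_score(a, b, score, gap=4):
    """Best local alignment score of a and b, filling the path matrix row by row."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cell = max(0,                          # negative paths are reset to zero
                       prev[j - 1] + score(x, y),  # match or mismatch step
                       prev[j] - gap,              # gap in b
                       cur[j - 1] - gap)           # gap in a
            cur.append(cell)
            best = max(best, cell)                 # highest value anywhere in the matrix
        prev = cur
    return best

def toy_score(x, y):
    """Hypothetical scoring: +5 for identical symbols, -4 otherwise."""
    return 5 if x == y else -4

shared_region = local_score("AAACCVWAAA", "TTTCCVWTTT", toy_score)
```

Only the shared CCVW segment contributes to the score here. The flanking mismatches would drive the running total negative, so those positions are reset to zero and do not reduce the reported score.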

ALIGNMENT METRICS

Four figures of merit are displayed along with the optimal alignments between the query sequence and the top-scoring search sequences: Quality, Ratio, Identity, and Similarity.

The Quality score (described above in the ALGORITHM topic) is the measure that is maximized in order to align the sequences. Ratio is the Quality divided by the smaller of one-third the number of bases in the alignment and the number of amino acids in the alignment. Gap symbols are ignored in the calculation of Ratio. Identity is the percent of identical matches between amino acids and codons in the alignment (i.e. the amino acid is identical to the translated codon). Similarity is the percent of matches between amino acids and codons in the alignment whose comparison values equal or exceed the similarity threshold. By default, this threshold is the average positive non-identical comparison value in the scoring matrix. FrameSearch uses this same threshold to decide when to put a colon (:) between an aligned codon and amino acid in the alignment display. You can reset this threshold with the -PAIr command-line parameter.
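These definitions can be checked against the alignment in the OUTPUT topic with a short Python sketch. The function names are hypothetical. The counts are read or inferred from the example rather than produced by the program: 240 bases (positions 3 to 242), 78 amino acids (positions 261 to 338), and 76 identical and 77 similar pairs, the last two back-calculated from the reported percentages.

```python
def ratio(quality, aligned_bases, aligned_residues):
    """Quality divided by the smaller of bases/3 and the number of amino acids."""
    return quality / min(aligned_bases / 3, aligned_residues)

def percent(count, total):
    """Share of aligned codon/amino acid pairs, as a percentage."""
    return 100.0 * count / total

example_ratio = ratio(343, aligned_bases=240, aligned_residues=78)
example_identity = percent(76, 78)    # 76 identical pairs (inferred)
example_similarity = percent(77, 78)  # 77 pairs at or above the threshold (inferred)
```

Rounded to three decimals these give 4.397, 97.436, and 98.718, which agree with the Ratio, Percent Identity, and Percent Similarity printed for the G3pc_Arath alignment.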

CONSIDERATIONS

FrameSearch displays the alignments between each query sequence and the top-scoring sequences in the search set. If the program cannot gain access to enough computer memory to display the alignments, the program stops after listing the top-scoring sequences in the output file.

FrameSearch can take several hours to search the protein database for sequences similar to the translation product of a single nucleotide query sequence (see the SUGGESTIONS topic for details).

SUGGESTIONS

Searching Only the Top Strand of Nucleotide Sequences

By default, FrameSearch searches both strands of nucleotide sequences. If your nucleotide query sequence is known to represent the coding strand, you can use the -ONEstrand command-line parameter to search using only the top strand of the query sequence. This reduces the time required to search the protein database by 50 percent. If you are searching a nucleotide sequence database for similarity to a protein query sequence, -ONEstrand will search only the top strand of each sequence in the database.
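What "both strands" means can be shown with a small Python sketch. It is an illustration only: the function names are hypothetical, and the one_strand flag merely mimics the effect described for -ONEstrand.

```python
# Complement table for DNA and RNA symbols; U is complemented to A.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")

def reverse_complement(seq):
    """Return the bottom strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def strands_to_search(seq, one_strand=False):
    """Top strand only, or top strand plus its reverse complement."""
    return [seq] if one_strand else [seq, reverse_complement(seq)]

both = strands_to_search("GAAATC")
top_only = strands_to_search("GAAATC", one_strand=True)
```

Searching one strand instead of two halves the number of nucleotide sequences that have to be aligned, which is where the time saving comes from.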

Global Similarity

By default, FrameSearch uses a local alignment algorithm to determine the best segment of similarity between the query sequence and each sequence in the search set (see the ALGORITHM topic for details). If you specify -GLObal on the command line, FrameSearch uses a global alignment procedure to determine similarity between the entire length of each query sequence and the entire length of each sequence in the search set.

Nucleotide Sequences Using Nonstandard Genetic Codes

If the nucleotide sequence(s) involved in the search are from an organism or organelle that uses a nonstandard genetic code, then you should specify an appropriate translation table using the -TRANSlate command-line parameter. Different translation tables are discussed in Appendix VII.

Batch Queue and Execution Speed

FrameSearch may take a considerable amount of time to run. For instance, a search of the SWISS-PROT protein database (Release 33.0, containing 52,205 sequence entries comprising 18,531,385 total amino acids) with a 286-base nucleotide query sequence took about 3 hours of CPU time on a DEC 3000/500. It would take twice as long if you doubled the size of either the query sequence or the database. Very large comparisons may exceed the CPU limit set by some systems.
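A back-of-the-envelope estimate follows from the statement that the time doubles with either size. The Python sketch below is simply a linear extrapolation from the reference run described above (about 3 CPU hours for a 286-base query against the database size reported in the example output). The function name is hypothetical, and the figures apply only to the hardware of that run.

```python
def estimated_cpu_hours(query_bases, database_residues,
                        ref_hours=3.0, ref_bases=286, ref_residues=18_531_385):
    """Scale the reference run linearly in both query length and database size."""
    return ref_hours * (query_bases / ref_bases) * (database_residues / ref_residues)

# A query twice as long as the example, searched against the same database.
doubled_query = estimated_cpu_hours(572, 18_531_385)
```

An estimate like this can help you decide beforehand whether a search is likely to exceed the CPU limit on your system.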

Because of the extensive search time, you should probably run most searches in the batch queue. You can specify that this program run at a later time in the batch queue by using the command-line parameter -BATch. Run this way, the program prompts you for all the required parameters and then automatically submits itself to the batch or at queue. For more information, see "Using the Batch Queue" in Chapter 3, Using Programs in the User's Guide.

If you specify a non-zero frameshift penalty in response to the program prompt, FrameSearch takes about 40% longer to complete a search than if you accept the default frameshift penalty of 0. Our experience using the default search parameters suggests that specifying a non-zero frameshift penalty does not significantly improve the search results.

Interrupting a Search: <Ctrl>C

You can type <Ctrl>C to interrupt a search and see the results from the part of the search that has already been completed. Once you've interrupted a search, you cannot resume it.

GRAPHICS

The Wisconsin Package must be configured for graphics before you run any program with graphics output! If the % setplot command is available in your installation, this is the easiest way to establish your graphics configuration, but you can also use commands like % postscript that correspond to the graphics languages the Wisconsin Package supports. See Chapter 5, Using Graphics in the User's Guide for more information about configuring your process for graphics.

<CTRL>C

If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C. The graphics device should stop plotting the current page and start plotting the next page. If the current page is the last page, plotters should put the pen away and graphic terminals should return to interactive mode.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
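The capitalization convention can be read as a rule for the shortest acceptable abbreviation of a parameter name. The Python sketch below illustrates that reading and is not the package's actual parser. The function names are hypothetical. Two further points are assumptions: that matching is case-insensitive, and that any extra letters typed must agree with the full name. Only the parameter name is considered, not a value such as =70.

```python
def required_letters(spec):
    """Leading run of capital letters in a parameter name such as '-LINesize'."""
    name = spec.lstrip("-")
    count = 0
    for ch in name:
        if not ch.isupper():
            break
        count += 1
    return name[:count]

def accepts(typed, spec):
    """True if the typed name contains the required letters and abbreviates the full name."""
    typed_name = typed.lstrip("-").upper()
    full_name = spec.lstrip("-").upper()
    return (typed_name.startswith(required_letters(spec))
            and full_name.startswith(typed_name))

shortest = required_letters("-LINesize")
```

Under these assumptions -LIN and -LINes both select -LINesize, while -LI is too short and -LIS matches a different parameter (-LIStsize).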


Minimal Syntax: % framesearch [-INfile1=]EST:Atts0012 -Default

Prompted Parameters:

-BEGin1=1 -END1=286              range of interest for a single
                                   query sequence
[-INfile2=]SwissProt:*           search set
-GAPweight=12                    gap creation penalty
-LENgthweight=4                  gap extension penalty
-FRAmeweight=0                   frameshift gap penalty
[-OUTfile=]atts0012.framesearch  output file name

Local Data Files: -MATRix=blosum62.cmp      amino acid substitution matrix
                  -TRANSlate=translate.txt  contains the genetic code

Optional Parameters:

-BEGin1=1 -END1=100   range of interest for each query sequence
-ONEstrand            searches only the top strand of nucleotide sequences
-LIStsize=40          number of scores to show
-ALIgn=40             number of alignments to show
                        (-NOALIgn suppresses alignments)
-GLObal               searches by global alignment
  -ENDWeight          penalizes end gaps in global alignments like
                        other gaps
-HIGhroad             among equally optimal alignments, shows one
                        with maximum gaps in protein sequence
-LOWroad              among equally optimal alignments, shows one
                        with maximum gaps in nucleotide sequence
-LINesize=70          length of documentation for each sequence in the
                        output list
-PAIr=x,2,1           thresholds for displaying '|', ':', and '.'
-WIDth=50             the number of sequence symbols per line
-PAGe=60              adds a line with a form feed every 60 lines
-NOBIGGaps            suppresses abbreviation of large gaps with '.'s
-NOPLOt               suppresses the plot of the search score distribution
-BATch                submits program to the batch queue
-NOMONitor            suppresses the screen trace of program progress
-NOSUMmary            suppresses the screen summary

All GCG graphics programs accept these and other switches. See the Using
Graphics chapter of the USERS GUIDE for descriptions.

-FIGure[=FileName]  stores plot in a file for later input to FIGURE
-FONT=3             draws all text on the plot using font 3
-COLor=1            draws entire plot with pen in stall 1
-SCAle=1.2          enlarges the plot by 20 percent (zoom in)
-XPAN=10.0          moves plot to the right 10 platen units (pan right)
-YPAN=10.0          moves plot up 10 platen units (pan up)
-PORtrait           rotates plot 90 degrees

ACKNOWLEDGEMENT

FrameSearch was written by Irv Edelman.

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program default scoring matrix file in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Chapter 4, Using Data Files in the User's Guide.
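The search order for a file named with -MATRix but without a directory specification amounts to a first-match lookup over an ordered list of directories. The Python sketch below shows that idea only. The function name is hypothetical, and the caller supplies the directories; how the logical names MyData, GenMoreData, and GenRunData map to real directories on your system is not modeled here.

```python
from pathlib import Path

def find_data_file(name, search_dirs):
    """Return the first existing file called `name` in the directories, in order."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None
```

You would pass the directories in the documented order: your local directory first, then the directories behind MyData, GenMoreData, and GenRunData. A copy in your local directory therefore takes precedence over the public one.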

FrameSearch creates a scoring matrix on the fly that contains values for matches between all possible amino acids and all possible codons. (See the ALGORITHM topic for details.) FrameSearch creates this amino acid - codon scoring matrix from a translation table and an amino acid substitution matrix. The translation table, containing a list of all possible codons for each amino acid, is defined in the file translate.txt. If the standard genetic code does not apply to your sequence, you can provide a modified version of this file with exactly the same name in your working directory or name an alternative file on the command line with an expression like -TRANSlate=mycode.txt. The amino acid substitution matrix, containing match values for the comparison of all possible amino acids, is defined in the file blosum62.cmp. This matrix is a copy of the BLOSUM62 scoring matrix described by Henikoff and Henikoff (Proc. Natl. Acad. Sci. USA 89; 10915-10919 (1992)). You can use the Fetch program to copy this file to your local directory and modify the match values to suit your own needs. (See Appendix VII for more information about translation tables and scoring matrices.)

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-MATRix=mymatrix.cmp

allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see the Local Scoring Matrices topic above.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

-BEGin=1

sets the beginning position for all query sequences. When the beginning position is set from the command line, FrameSearch ignores beginning positions specified for individual sequences in a list file.

-END=100

sets the ending position for all query sequences. When the ending position is set from the command line, FrameSearch ignores ending positions specified for individual sequences in a list file.

-ONEstrand

uses only the top strand of nucleotide sequences in searches.

-LIStsize=40

sets the number of top-scoring entries to save in the output list.

-ALIgn=40

sets the number of top-scoring sequence alignments to display in the output file.

Use -NOALIgn to suppress the sequence alignments.

-GLObal

aligns the entire lengths of the nucleotide and protein sequences (global alignment). By default, FrameSearch determines a local alignment of the best region of similarity between the protein sequence and the codons in the nucleotide sequence.

-ENDWeight

penalizes gaps placed before the beginning of a sequence and after the end of a sequence the same as gaps inserted within a sequence. By default, gaps placed at the very ends of sequences in global alignments are not penalized at all.

-HIGhroad

displays the optimal alignment with the maximal number of gaps in the protein sequence when several equally optimal alignments are possible.

-LOWroad

displays the optimal alignment with the maximal number of gaps in the nucleotide sequence when several equally optimal alignments are possible.

-LINesize=70

sets the length of documentation for each sequence in the output list.

-PAIr=x,2,1

changes the thresholds for the display of sequence similarity in the alignment output.

In the program output, the paired alignment displays sequence similarity by printing one of three characters between similar sequence symbols: a pipe character (|), a colon (:), or a period (.). Normally, a pipe character is put between a codon and an amino acid when the translated codon is identical to the amino acid. A colon is put between a codon and an amino acid when the comparison value between the translated codon and the amino acid is greater than or equal to the average positive non-identical comparison value in the amino acid substitution matrix. A period is put between a codon and an amino acid when the comparison value between the translated codon and the amino acid is greater than or equal to 1.

The three parameter values for -PAIr are the display thresholds for the pipe character, colon, and period, respectively. By default, a pipe character is inserted between identical sequence symbols. If you specify a numerical threshold as the first value, a pipe character will no longer be inserted between identical symbols unless their comparison value is greater than or equal to this threshold. If you want to specify a threshold for the display of colons and periods, but you still want a pipe character to connect identical symbols, use x instead of a number as the first value. (See Appendix VII for more information about comparison values in scoring matrices.)
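
The threshold logic described above can be sketched as a small function. This is a hedged illustration only: the function name and arguments are hypothetical, and the sketch assumes that all three thresholds are tested with "greater than or equal to" and in the order pipe, colon, period.

```python
def pair_symbol(score, identical, pipe="x", colon=2, period=1):
    """Pick the character shown between a translated codon and an amino acid.

    `score` is the comparison value from the substitution matrix and
    `identical` says whether the translated codon equals the amino acid.
    The three thresholds mirror the three values given to the pairing option.
    """
    if pipe == "x":
        # "x" keeps the default rule: a pipe marks identical symbols.
        if identical:
            return "|"
    elif score >= pipe:
        # A numeric first value replaces the identity rule with a threshold.
        return "|"
    if score >= colon:
        return ":"
    if score >= period:
        return "."
    return " "
```

With the values 4,2,1, for example, an identical pair whose comparison value is 3 would be marked with a colon rather than a pipe under this sketch, because 3 is below the pipe threshold of 4.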

-WIDth=50

sets the number of sequence symbols on each line of the alignment display.

-PAGe=60

adds form feeds to the output file so that each alignment begins at the top of a new page. Also, a form feed is added after every 60 lines of each alignment output. You can change the number of lines per page for each alignment display by specifying a number after the -PAGe parameter.
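
The paging behavior can be pictured with a short sketch. The function name is hypothetical, and the sketch assumes that a form feed is written only when more alignment lines follow the page break; the program's exact handling of the last page is not stated here.

```python
FORM_FEED = "\f"

def paginate(alignment_lines, lines_per_page=60):
    """Return the output lines for one alignment with form feeds inserted."""
    out = [FORM_FEED]  # each alignment starts at the top of a new page
    for index, line in enumerate(alignment_lines):
        if index > 0 and index % lines_per_page == 0:
            # A full page has been written, so start another one.
            out.append(FORM_FEED)
        out.append(line)
    return out
```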

-NOBIGGaps

Normally, if one of the sequences is aligned opposite gap characters for one or more complete lines of the alignment, then that portion of the alignment is abbreviated with three dots arranged in a vertical line. -NOBIGGaps displays the entire alignment without abbreviation.

-NOPLOt

suppresses the histogram plot of the search score distribution.

-BATch

submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

-MONitor=100

monitors this program's progress on your screen. Use this parameter to see this same monitor in the log file for a batch process. If the monitor is slowing down the program because your terminal is connected to a slow modem, suppress it with -NOMONitor.

The monitor is updated every time the program processes 100 sequences or files. You can use a value after the parameter to set this monitoring interval to some other number.
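
The monitoring interval amounts to a simple counter, sketched below. The function name, the message text, and the `report` callback are hypothetical and stand in for whatever the program actually writes to the screen or log file.

```python
def run_with_monitor(entries, interval=100, report=print):
    """Process `entries`, calling `report` after every `interval` entries."""
    processed = 0
    for _entry in entries:
        # The comparison of the query against this entry would happen here.
        processed += 1
        if interval and processed % interval == 0:
            report(f"{processed} sequences processed")
    return processed
```

In this sketch, passing an interval of 0 plays the role of suppressing the monitor altogether.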

-SUMmary

writes a summary of the program's work to the screen when you've used the -Default parameter to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

The parameters below apply to all GCG graphics programs. These and many others are described in detail in Chapter 5, Using Graphics, of the User's Guide.

-FIGure=programname.figure

writes the plot as a text file of plotting instructions suitable for input to the Figure program instead of drawing the plot on your plotter.

-FONT=3

draws all text characters on the plot using Font 3 (see Appendix I).

-COLor=1

draws the entire plot with the pen in stall 1.

The parameters below let you expand or reduce the plot (zoom), move it in either direction (pan), or rotate it 90 degrees (rotate).

-SCAle=1.2

expands the plot by 20 percent by resetting the scaling factor (normally 1.0) to 1.2 (zoom in). You can expand the axes independently with -XSCAle and -YSCAle. Numbers less than 1.0 contract the plot (zoom out).

-XPAN=30.0

moves the plot to the right by 30 platen units (pan right).

-YPAN=30.0

moves the plot up by 30 platen units (pan up).
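
The zoom and pan parameters above can be read as a scale followed by a shift of each plotted point. The sketch below assumes that order of operations and assumes scaling is done about the plot origin; neither detail is stated in this section, and the function name is hypothetical.

```python
def place_point(x, y, scale=1.0, xscale=None, yscale=None, xpan=0.0, ypan=0.0):
    """Map a point (in platen units) to its position after zoom and pan.

    `scale` applies to both axes unless an axis-specific factor is given.
    """
    sx = scale if xscale is None else xscale
    sy = scale if yscale is None else yscale
    # Scale first, then shift right by xpan and up by ypan.
    return (x * sx + xpan, y * sy + ypan)
```

Under these assumptions, a scaling factor of 1.2 with a pan of 30 units to the right would move a point at (10, 20) to roughly (42, 24).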

-PORtrait

rotates the plot 90 degrees. Usually, plots are displayed with the horizontal axis longer than the vertical (landscape). Note that plots are reduced or enlarged, depending on the platen size, to fill the page.

Printed: November 18, 1996 13:05 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Licenses and Trademarks

Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.
