STEMLOOP

[ Program Manual | User's Guide | Data Files | Databases ]

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
ALGORITHM
CONSIDERATIONS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
OPTIONAL PARAMETERS

FUNCTION

StemLoop finds stems (inverted repeats) within a sequence. You specify the minimum stem length, minimum and maximum loop sizes, and the minimum number of bonds per stem. All loops or only the best loops can be displayed on your screen or written into a file.

DESCRIPTION

StemLoop searches for inverted repeats in your sequence after you choose a minimum stem length and minimum and maximum loop sizes. You must also specify a minimum number of bonds per stem; G-T, A-T, and G-C pairs are scored as 1, 2, and 3 bonds, respectively. The stems found can be sorted by position, size (stem length), or quality (number of bonds) and can be either filed or displayed on the screen. StemLoop tells you the number of stems found for your settings of minimum stem size, maximum loop size, minimum loop size, and minimum bonds per stem. If you feel there are too many stems, you may reset the parameters without reviewing the stems found, or you may view only the best stems. You can view only the best stems when more than 25 stems are found and you sort them by quality or size. (See the ALGORITHM topic below to understand precisely what StemLoop does.)

EXAMPLE

Here is a session using StemLoop to find the inverted repeats in Vi:Mcvsatrn5 that have at least 21 bonds within 10 bases and a loop of at most 100 bases, and to file them sorted by quality:


% stemloop

  STEMLOOP of what sequence ?  Vi:Mcvsatrn5

                   Begin (* 1 *) ?
                 End (*   334 *) ?

  What minimum stem length (* 6 *) ?  10

  What minimum number of bonds/stem (* 20 *) ?  21

  What maximum loop size (* 20 *) ?  100

  What minimum loop size (* 3 *) ?

              just a second  ...

  There are 9 stems.  Would you like to:

        1) See the stems
        2) See the stem coordinates
        3) File the stems
        4) File the stems as points for DOTPLOT
        5) Choose new parameters,
        6) Get a different sequence

        Q)uit ?

  Please choose one (* 1 *):  3

  Sort Stems by:

        1) Position
        2) Quality
        3) Size

        Q)uit

  Please choose one (* 1 *):  2

  What should I call the output file (* mcvsatrn5.stem *) ?

  There are 9 stems.  Would you like to:

        1) See the stems,
        2) See the stem coordinates,
        3) File the stems,
        4) File the stems as points for DOTPLOT
        5) Choose new parameters,
        6) Get a different sequence

        Q)uit ?

  Please choose one (* Q *):

%

OUTPUT

StemLoop creates an output file if you choose to file the stems from any search; otherwise, you may view the stems on your screen. In either case, the stem is shown, as below, with the first and last coordinates of the stem at the left, and the length of the stem (size), the number of bonds in the stem (quality), and the loop size on the right. Here is part of the file mcvsatrn5.stem created by the example session above:


  STEMLOOP of: Mcvsatrn5  check: 3205  from: 1  to: 334

J02061 Cucumber mosaic virus-associated satellite RNA 5 ((1)caRNA5). 2/85
LOCUS       MCVSATRN5     334 bp ss-RNA             VRL       22-FEB-1985
DEFINITION  Cucumber mosaic virus-associated satellite RNA 5 ((1)caRNA5).
ACCESSION   J02061
NID         g331710
KEYWORDS    . . . .

 Minimum Stem: 10  Minimum bonds/stem: 21  Maximum loop size: 100
 Stems found: 9  Stems shown: 9
 Average Match: 1.80  Average Mismatch: 0.00  Nibbling Threshold:  1

                           October 14, 1996 07:59  ..

     49 TCTGTCACTCGGC  GGTGTGGGTTACCT    13, 26
        | |||||||||||
    102 ACGCAGTGAGTTG  GGCGGCATCGTCCC    28

    106 CGGACTGGGGAC  CGCTGGCTTGCGAGCTATGTCCGCT    12, 24
        | || ||||| |                           A
    180 GACTCGCCCCCG  AGTTTACTCGCGCATCACGACTCTC    51

    /////////////////////////////////////////////////////

     55 ACTCGGCGGT  GTGGGTTACCTCCCTGCTACGGCGG    10, 21
        | ||||||||                           G
    125 TCGGTCGCCA  GGGGTCAGGCTCCACGCAGTGAGTT    51

     23 GTAGAGGGGT  TATATCTACGTGAGGATCT    10, 21
        |||| |||||                     G
     81 CGTCCCTCCA  TTGGGTGTGGCGGCTCACT    39
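
The figures printed beside each stem can be checked by hand. The sketch below is a minimal Python illustration and is not part of StemLoop. It assumes only the public bond values stated in this manual (G-T = 1, A-T = 2, G-C = 3, every other pair 0) and the nibbling threshold of 1 shown in the listing header. The names BONDS, pair_score, and describe_stem are hypothetical.

```python
# Bond values stated in this manual for the public scoring matrix:
# G-T = 1, A-T = 2, G-C = 3; every other pair scores 0.
BONDS = {
    frozenset("GT"): 1,
    frozenset("AT"): 2,
    frozenset("GC"): 3,
}


def pair_score(a, b):
    """Return the bond value for one base pair (0 if the bases do not pair)."""
    return BONDS.get(frozenset((a, b)), 0)


def describe_stem(start, end, upper, lower, threshold=1):
    """Recompute size, quality, loop size, and the bar line for one stem.

    upper and lower are the two stem strands exactly as printed in the
    listing, so column i of upper pairs with column i of lower.
    start and end are the coordinates printed at the left.
    """
    size = len(upper)
    quality = sum(pair_score(a, b) for a, b in zip(upper, lower))
    # Everything between the two arms is the loop.
    loop = (end - start + 1) - 2 * size
    bars = "".join(
        "|" if pair_score(a, b) >= threshold else " "
        for a, b in zip(upper, lower)
    )
    return size, quality, loop, bars


print(describe_stem(49, 102, "TCTGTCACTCGGC", "ACGCAGTGAGTTG"))
print(describe_stem(106, 180, "CGGACTGGGGAC", "GACTCGCCCCCG"))
```

For the first stem in the listing this reproduces size 13, quality 26, and loop size 28, and the bar line marks every column except the C-C pair.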

To see only the numbers defining each stem on your screen, choose option '2' in the first menu. This is the screen output when you choose option '2' in the first menu and then sort by quality in the second menu:


        Loop     Start       End      Size     Quality
           1        49       102        13          26
           2       106       180        12          24
           3       127       210        11          23
           4       123       189        11          23
           5        57       109        11          23
           6        22        85        10          23
           7       186       297        10          21
           8        55       125        10          21
           9        23        81        10          21
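
The three sort orders can be illustrated with the rows of the listing above. This is a sketch under assumptions, not StemLoop's own code: it assumes that "position" means ascending start coordinate, that quality and size are sorted from highest to lowest, and that ties keep their input order. The manual does not state how StemLoop breaks ties. The names STEMS, sort_stems, and best_stems are hypothetical.

```python
from operator import itemgetter

# (start, end, size, quality) rows copied from the coordinate listing above.
STEMS = [
    (49, 102, 13, 26),
    (106, 180, 12, 24),
    (127, 210, 11, 23),
    (123, 189, 11, 23),
    (57, 109, 11, 23),
    (22, 85, 10, 23),
    (186, 297, 10, 21),
    (55, 125, 10, 21),
    (23, 81, 10, 21),
]


def sort_stems(stems, order):
    """Return the stems sorted by 'position', 'quality', or 'size'."""
    if order == "position":
        return sorted(stems, key=itemgetter(0))  # ascending start
    if order == "quality":
        return sorted(stems, key=itemgetter(3), reverse=True)
    if order == "size":
        return sorted(stems, key=itemgetter(2), reverse=True)
    raise ValueError("order must be 'position', 'quality', or 'size'")


def best_stems(stems, order, max_stems=25):
    """Keep only the first max_stems stems of a quality or size sort."""
    if order not in ("quality", "size"):
        raise ValueError("best stems need a quality or size sort")
    return sort_stems(stems, order)[:max_stems]


for start, end, size, quality in best_stems(STEMS, "quality", max_stems=3):
    print(start, end, size, quality)
```

The default of 25 mirrors the -MAXSTems=25 value in the command-line summary; the limit of 3 in the loop is only there to keep the printout short.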

StemLoop can also make an output file with points for plotting with DotPlot.

INPUT FILES

StemLoop accepts a single nucleotide sequence as input. If StemLoop rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

MFold predicts optimal and suboptimal secondary structures for an RNA molecule using the most recent energy minimization method of Zuker. PlotFold displays the optimal and suboptimal secondary structures for an RNA molecule predicted by MFold. FoldRNA predicts a single optimal secondary structure for an RNA molecule by the older method of Zuker. Circles, Domes, Mountains, Squiggles, and DotPlot all make graphic secondary structure representations with the .connect output files from FoldRNA and PlotFold.

Using Compare-DotPlot to create a dot-plot of the similarities between a nucleotide sequence and its reverse-complement strand is functionally equivalent to running StemLoop. Repeat uses the same algorithm as StemLoop to find repeats that are not inverted. DotPlot shows you the output from Compare, StemLoop or FoldRNA on a surface of comparison.

RESTRICTIONS

StemLoop searches for loops only within a range equal to twice the minimum stem length plus the maximum loop size. You may extend the search range by increasing the maximum loop size; however, the maximum range for the search may not exceed 2,000 bases. StemLoop cannot find more than 1,000 loops.
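
As a worked example of this arithmetic, the following Python sketch computes the search range for a pair of settings and checks it against the 2,000-base limit. It is an illustration only; the names search_range and within_limit are hypothetical, and how StemLoop itself reacts when the limit is exceeded is not described here.

```python
MAX_SEARCH_RANGE = 2000  # limit on the search range stated in this section


def search_range(min_stem, max_loop):
    """Twice the minimum stem length plus the maximum loop size."""
    return 2 * min_stem + max_loop


def within_limit(min_stem, max_loop):
    """True if the settings stay inside the 2,000-base search range."""
    return search_range(min_stem, max_loop) <= MAX_SEARCH_RANGE


# Settings from the example session: stem length 10, maximum loop size 100.
print(search_range(10, 100), within_limit(10, 100))
# Program defaults: stem length 6, maximum loop size 20.
print(search_range(6, 20), within_limit(6, 20))
```

With the example session's settings the range is 120 bases; with the program defaults it is 32 bases.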

ALGORITHM

StemLoop uses a window and stringency match criterion in exactly the same manner as Compare. For every position in each register shift, a window whose length you set as the minimum stem length is moved along the sequence. If the bonds under the window equal or exceed the minimum number of bonds per stem, a stem is recorded covering all of the bases under the window. The number of bonds at each window position is the sum, over the base pairs under the window, of the values in the scoring matrix file stemloop.cmp (see the LOCAL DATA FILES topic below). Mismatches can be scored negatively, although the public data file simply scores matches, with G-T, A-T, and G-C worth 1, 2, and 3, respectively. Several adjacent mismatches may be found within a long stem if there are strong matches on either side. In short, the criterion for a stem is that the minimum number of bonds occurs within the length you set as the minimum stem length.
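
The window criterion can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not StemLoop's implementation: it tries every start position and loop size rather than walking register shifts, it uses only the public bond values (G-T = 1, A-T = 2, G-C = 3, all other pairs 0), and it neither merges overlapping windows into longer stems nor nibbles or extends them. The names BONDS, pair_score, and find_windows are hypothetical.

```python
# Public bond values stated in this manual; every other pair scores 0.
BONDS = {
    frozenset("GT"): 1,
    frozenset("AT"): 2,
    frozenset("GC"): 3,
}


def pair_score(a, b):
    """Return the bond value for one base pair (0 if the bases do not pair)."""
    return BONDS.get(frozenset((a, b)), 0)


def find_windows(seq, min_stem, min_bonds, min_loop, max_loop):
    """Return (start, end, bonds) for every window meeting the bond minimum.

    start and end are 1-based coordinates of the outermost pair. The
    window pairs base start+k with base end-k for k = 0 .. min_stem-1.
    """
    hits = []
    n = len(seq)
    for i in range(n):  # 0-based index of the first base of the 5' arm
        for loop in range(min_loop, max_loop + 1):
            # 0-based index of the last base of the 3' arm.
            j = i + 2 * min_stem + loop - 1
            if j >= n:
                break
            bonds = sum(
                pair_score(seq[i + k], seq[j - k]) for k in range(min_stem)
            )
            if bonds >= min_bonds:
                hits.append((i + 1, j + 1, bonds))
    return hits


# Six G-C pairs (18 bonds) closed by a four-base loop.
print(find_windows("GGGGGGAAAACCCCCC", 6, 18, 3, 20))
```

Lowering min_bonds in this toy sequence reports the neighbouring, partly overlapping windows as well, which is why the real program goes on to tidy the ends of each stem.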

Stem Extension and Nibbling

Before the stems are presented, they are extended (or nibbled) from both ends so that the first base on each end participates in a bond. The criterion for a bond between pairing bases is that the value in the scoring matrix file (stemloop.cmp) for the pair is greater than or equal to the average positive non-identical comparison value in the scoring matrix. You can reset the threshold for nibbling with the command-line parameter -PAIr. You could set a pairing threshold high enough so that all stems are nibbled away!

Because of nibbling, stems shorter than the minimum stem length are commonly reported. If, on the other hand, extra pairing bases are found adjacent to the stem, the stem is extended until a pair of bases does not have a bond between them. If, after nibbling a stem, the enlarged loop is longer than the specified maximum loop size, the stem is not reported.
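
The nibbling and extension rules can be sketched as follows. This is a minimal Python illustration under assumptions and not StemLoop's own code: it uses the public bond values, treats a pair as bonded when its score is at least the threshold, and stops inward extension when the loop would shrink below the minimum loop size (the manual does not say how inward extension is bounded). The names pair_score and adjust_stem are hypothetical.

```python
# Public bond values stated in this manual; every other pair scores 0.
BONDS = {
    frozenset("GT"): 1,
    frozenset("AT"): 2,
    frozenset("GC"): 3,
}


def pair_score(a, b):
    """Return the bond value for one base pair (0 if the bases do not pair)."""
    return BONDS.get(frozenset((a, b)), 0)


def adjust_stem(seq, start, end, size, threshold=1, min_loop=3, max_loop=20):
    """Nibble, then extend, one stem; return (start, end, size, loop) or None.

    start and end are 1-based coordinates of the outermost pair and
    size is the number of paired columns in the stem.
    """
    i, j, w = start - 1, end - 1, size  # switch to 0-based indices

    def bonded(a, b):
        return pair_score(seq[a], seq[b]) >= threshold

    # Nibble unbonded pairs from the outer end, then from the loop end.
    while w > 0 and not bonded(i, j):
        i, j, w = i + 1, j - 1, w - 1
    while w > 0 and not bonded(i + w - 1, j - w + 1):
        w -= 1
    if w == 0:
        return None  # the whole stem was nibbled away

    # Extend outward while the neighbouring pair is bonded.
    while i > 0 and j < len(seq) - 1 and bonded(i - 1, j + 1):
        i, j, w = i - 1, j + 1, w + 1
    # Extend inward (assumption: never below the minimum loop size).
    while (j - i + 1) - 2 * (w + 1) >= min_loop and bonded(i + w, j - w):
        w += 1

    loop = (j - i + 1) - 2 * w
    if loop > max_loop:
        return None  # nibbling left a loop longer than the maximum
    return i + 1, j + 1, w, loop


print(adjust_stem("TGGGGGGAAACCCCCCT", 1, 17, 6))
print(adjust_stem("TGGGGGGAAACCCCCCT", 1, 17, 6, threshold=4))
```

In the first call the unpaired T-T at the outer end is nibbled and a G-C pair next to the loop is added, so the stem keeps six pairs but shifts inward. In the second call the threshold is higher than any bond value, so the stem is nibbled away entirely.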

CONSIDERATIONS

StemLoop chooses a default minimum number of bonds per stem that is appropriate for the scoring matrix it reads. If you select a different scoring matrix with the -MATRix command-line parameter, the program will adjust the default minimum number of bonds per stem accordingly.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % stemloop [-INfile=]Vi:Mcvsatrn5 -Default

Prompted Parameters:

-BEGin=1 -END=334           range of interest
-STEMlength=6               minimum stem length
-BONds=12                   minimum bonds per stem
-MINLoopsize=3              minimum loop size
-MAXLoopsize=20             maximum loop size (distance to furthest
                              inverted repeat)
-MENu1=1                    output: See stems=1, See coordinates=2,
                                    File=3, DotPlot file=4
-MENu2=1                    sort by: Position=1, Quality=2, Size=3
-MAXSTems=25                maximum number of stems to show (quality or size
                              sorts only)
[-OUTfile=]Mcvsatrn5.stem   output file name

Local Data Files:

-MATRix=stemloop.cmp        scoring matrix for finding bonds/stem

Optional Parameters:

-PAIr=1                     threshold for nibbling, match (|),
                              and point display

Note: StemLoop does not cycle through the menus repeatedly if you specify either -MENu1 or -MENu2 on the command line.

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program default scoring matrix file in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Chapter 4, Using Data Files in the User's Guide.

StemLoop uses a scoring matrix of the kind described in Appendix VII to find the number of bonds between any possible pair of bases. Every non-zero value is defined in the scoring matrix. StemLoop reads the scoring matrix file stemloop.cmp in your local directory, or if it fails to find such a file there, it uses the public file of the same name. The file can be customized so that any score, positive or negative, can be assigned to any possible pair of bases (GCG symbols). You can get the public file with % fetch stemloop.cmp. The values in the file assign G-T, A-T, and G-C pairs 1, 2, and 3 bonds, respectively, with all other pairs valued at zero. A more realistic set of values might assign some negative score to the mismatches, especially purine-purine pairs. This would make the output sorted by quality more significant.
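
To see why negative mismatch scores change a quality sort, the Python sketch below rescores two stems from the example output. The penalties (-3 for a purine-purine pair, -2 for any other non-pairing pair) are hypothetical values chosen for illustration. They are not taken from any GCG file, and the sketch does not reproduce the stemloop.cmp file format. The names public_score, penalized_score, and quality are also hypothetical.

```python
# Public bond values stated in this manual; every other pair scores 0.
PUBLIC = {
    frozenset("GT"): 1,
    frozenset("AT"): 2,
    frozenset("GC"): 3,
}
PURINES = {"A", "G"}


def public_score(a, b):
    """Score one pair with the public values (mismatches score 0)."""
    return PUBLIC.get(frozenset((a, b)), 0)


def penalized_score(a, b):
    """Hypothetical alternative: same bonds, but mismatches cost points."""
    score = public_score(a, b)
    if score > 0:
        return score
    # Illustrative penalties only: purine-purine pairs cost the most.
    return -3 if a in PURINES and b in PURINES else -2


def quality(upper, lower, score):
    """Sum the pair scores column by column along a displayed stem."""
    return sum(score(a, b) for a, b in zip(upper, lower))


# Two stems from the example output, keyed by start coordinate.
stems = {
    106: ("CGGACTGGGGAC", "GACTCGCCCCCG"),  # three mismatched columns
    55: ("ACTCGGCGGT", "TCGGTCGCCA"),       # one mismatched column
}

for start, (upper, lower) in stems.items():
    print(
        start,
        quality(upper, lower, public_score),
        quality(upper, lower, penalized_score),
    )
```

With the public values the stem at 106 outranks the stem at 55 (24 bonds against 21). With the illustrative penalties the order reverses (17 against 19), because the longer stem contains three mismatches and the shorter one only one.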

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-MATRix=mymatrix.cmp

allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see the Local Scoring Matrices topic above.

-PAIr=1

The output from this program has a '|' (vertical bar) between sequence symbols that match. This match display character is added to the output whenever the symbol comparison value for the two symbols in your scoring matrix is greater than or equal to the average positive non-identical comparison value in the matrix. The -PAIr parameter lets you specify a match display threshold appropriate for the scoring matrix you are using.

Stem structure nibbling also uses the threshold value set by this command-line parameter to decide what pairs should be nibbled away from the structure. You can set a pairing threshold high enough so that all stems are nibbled away!

Printed: November 18, 1996 13:07 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com