FOLDRNA

Table of Contents
FUNCTION
DESCRIPTION
COPYRIGHT
EXAMPLE
OUTPUT TEXT
OUTPUT DATA
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
BATCH QUEUE
FOLDING CONSTRAINTS
FOLDING FRAGMENTS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
OPTIONAL PARAMETERS

FUNCTION

FoldRNA predicts a single optimal secondary structure for an RNA molecule by the older method of Zuker.

DESCRIPTION

FoldRNA finds a secondary structure of minimum free energy for an RNA molecule based on published values of stacking and loop destabilizing energies. FoldRNA is the program of Michael Zuker (Methods in Enzymology 180, 262-288 (1989)). The energies used by Zuker's program were first described by Winston Salser (CSHSQB 42; 985) and are now defined by Turner (Freier et al., Proc. Natl. Acad. Sci. USA 83: 9373-9377 (1986)).

You should be aware of the limitations of energy minimizing algorithms in predicting real secondary structures. The structure reported in the output file is only one of a family of structures that have the same or nearly the same energy. The number of structures that have similar energies to the optimal structure reported by FoldRNA may be very large when several hundred bases are folded or when the secondary structure is not strong.

COPYRIGHT

GCG is allowed to distribute a GCG-compatible implementation of FoldRNA under a license agreement with the National Research Council of Canada, Institute for Biological Sciences, Ottawa, Canada, K1A 0R6 (613)-993-4830. The copyright to FoldRNA, however, belongs to the Government of Canada. If you use FoldRNA for published research, cite Dr. Zuker's Nucleic Acids Research paper. Any communication of the FoldRNA program must be approved by the National Research Council of Canada. FoldRNA was adapted to work with the Wisconsin Package at the University of Wisconsin by Yonah Karp.

EXAMPLE

Here is a session using FoldRNA to predict an optimal secondary structure for the sequence Vi:mcvsatrn5:


% foldrna

 FOLDRNA on what sequence ?  Vi:Mcvsatrn5

 What is the structure output file (* mcvsatrn5.fld *) ?

 What is the base-by-base output file (* mcvsatrn5.connect *) ?

           Begin (* 1 *) ?
           End (* 334 *) ?

%

OUTPUT TEXT

FoldRNA writes two output files. The base-by-base output file, mcvsatrn5.connect, can be used as input to the Squiggles, Circles, Domes, Mountains, and DotPlot programs to create graphic output. mcvsatrn5.fld is a text representation of the folded molecule that can be displayed on most terminals and printers. Here is an example of the text output:

OUTPUT DATA

Here is part of the base-by-base output file from the example session:


FOLDRNA of: gb_vi:mcvsatrn5 Check: 3205 from: 1 to: 334  October 13, 1996 11:22

Length: 334  Energy: -94.0
 ..
    1 G       0    2  332    1
    2 U       1    3  331    2
    3 U       2    4  330    3

  ////////////////////////////

  332 C     331  333    1  332
  333 C     332  334    0  333
  334 C     333    0    0  334
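The records in this file can be read with a few lines of code. The following is a minimal Python sketch, not part of FoldRNA. It assumes, judging only from the excerpt above, that the records begin after the line containing "..", that each record has six whitespace-separated fields, and that the fifth field is the number of the pairing partner, with 0 meaning the base is unpaired. The function names parse_connect and base_pairs are made up for this illustration.

```python
def parse_connect(text):
    """Parse the base-by-base records that follow the '..' divider line."""
    records = []
    in_body = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_body:
            # Header lines end at the line holding only '..'
            if stripped == "..":
                in_body = True
            continue
        fields = stripped.split()
        # Skip blank lines and anything that is not a six-field record
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        records.append({
            "index": int(fields[0]),
            "base": fields[1],
            "partner": int(fields[4]),  # assumption: 0 means unpaired
        })
    return records


def base_pairs(records):
    """Return each pair once as (i, j) with i < j."""
    return [(r["index"], r["partner"]) for r in records
            if r["partner"] > r["index"]]


# Only the six records shown in the excerpt above; the elided ones are omitted.
sample = """FOLDRNA of: gb_vi:mcvsatrn5 Check: 3205 from: 1 to: 334  October 13, 1996 11:22

Length: 334  Energy: -94.0
 ..
    1 G       0    2  332    1
    2 U       1    3  331    2
    3 U       2    4  330    3
  332 C     331  333    1  332
  333 C     332  334    0  333
  334 C     333    0    0  334
"""

records = parse_connect(sample)
print(base_pairs(records))
```

Reporting each pair only from its lower-numbered base avoids listing the same pair twice.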

INPUT FILES

FoldRNA accepts a single nucleotide sequence as input. If FoldRNA rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

MFold predicts optimal and suboptimal secondary structures for an RNA molecule using the most recent energy minimization method of Zuker. PlotFold displays the optimal and suboptimal secondary structures for an RNA molecule predicted by MFold. FoldRNA predicts a single optimal secondary structure for an RNA molecule by the older method of Zuker. Circles, Domes, Mountains, Squiggles, and DotPlot all make graphic secondary structure representations with the .connect output file from FoldRNA and PlotFold.

The RNA secondary structure prediction algorithm and the folding energies used by MFold are more refined than the algorithm and energies used by FoldRNA. You cannot use the MFold energy files (see the LOCAL DATA FILES topic, below) with FoldRNA.

StemLoop finds all possible stems (inverted repeats) above some minimum quality that you can set, but StemLoop cannot recognize a structure with gaps (bulge loops or uneven bifurcation loops). The stems can be plotted with DotPlot.

RESTRICTIONS

The maximum length for the folded range of a sequence is 2,000 bases. The original sequence from which the folded range comes may not be more than 5,000 bases long. Sequences should contain only G, A, T/U, and C.
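These limits can be checked before a run is submitted. The sketch below is a hypothetical pre-flight check in Python, not something FoldRNA provides. The function name check_fold_request and the wording of the messages are invented for illustration, and begin and end are assumed to be 1-based and inclusive, as in the example session.

```python
MAX_FOLDED = 2000    # longest range that can be folded
MAX_SEQUENCE = 5000  # longest source sequence


def check_fold_request(sequence, begin, end):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if len(sequence) > MAX_SEQUENCE:
        problems.append("sequence longer than 5,000 bases")
    if not (1 <= begin <= end <= len(sequence)):
        problems.append("begin/end outside the sequence")
    elif end - begin + 1 > MAX_FOLDED:
        problems.append("folded range longer than 2,000 bases")
    # Anything other than G, A, T/U, and C is reported
    unexpected = sorted(set(sequence.upper()) - set("GATUC"))
    if unexpected:
        problems.append("unexpected symbols: " + "".join(unexpected))
    return problems


print(check_fold_request("GUUCCAN", 1, 7))
```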

Do not use FoldRNA with -REMOve if you want to plot the results.

FoldRNA was not written by GCG. Incompatibilities may be found that we do not know about. We would appreciate hearing about any misbehavior you experience.

The behavior of FoldRNA when folding constraints are imposed is completely unknown at this writing.

BATCH QUEUE

FoldRNA uses an algorithm that computes in time proportional to the cube of the sequence length. It takes a DEC 5000/300 about nine seconds of CPU time to fold 220 bases. You can predict, therefore, that 400 bases will take about six times as long or a little less than one minute of CPU time. Because of this, you might want to consider running FoldRNA in the batch queue for long sequences. You can specify that this program run at a later time in the batch queue by using the command-line parameter -BATch. Run this way, the program prompts you for all the required parameters and then automatically submits itself to the batch or at queue. For more information, see "Using the Batch Queue" in Chapter 3, Using Programs in the User's Guide. Very large RNA secondary structure predictions may exceed the CPU limit set by some systems.
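The cubic scaling above can be turned into a rough estimate for other lengths. This Python sketch simply extrapolates from the one reference timing quoted in this section (220 bases in about nine seconds on a DEC 5000/300). The function name estimate_cpu_seconds is made up here, and timings on other hardware will differ.

```python
def estimate_cpu_seconds(n_bases, ref_bases=220, ref_seconds=9.0):
    """Scale the reference timing by the cube of the length ratio."""
    return ref_seconds * (n_bases / ref_bases) ** 3


# (400 / 220) ** 3 is about 6.0, so 400 bases come to roughly 54 seconds
print(round(estimate_cpu_seconds(400)))
```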

FOLDING CONSTRAINTS

You can fold a molecule in such a way that certain bases do or do not pair.

Removing Bases

Before your fragment is folded, you can exclude regions from it by using the optional parameter -REMOve=i,j to remove bases i through j. Additional regions (up to a limit of nine) can be excised with -REMOve2=k,l ... -REMOve9=y,z.
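The effect of removing a single region can be pictured with a short Python sketch. It is an illustration only, not FoldRNA's own code. It treats i and j as 1-based, inclusive positions and joins base i-1 directly to base j+1. How the coordinates of several removed regions interact is not covered, so only one region is handled. The function name remove_region is made up.

```python
def remove_region(sequence, i, j):
    """Excise bases i..j (1-based, inclusive), joining base i-1 to base j+1."""
    if not (1 <= i <= j <= len(sequence)):
        raise ValueError("need 1 <= i <= j <= sequence length")
    return sequence[:i - 1] + sequence[j:]


print(remove_region("GGGAAAACCC", 4, 7))  # prints GGGCCC
```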

Forcing Bases to Pair

You can insist that the folding includes a particular stem by forcing certain bases to pair to one another. To do this, specify the first base pair, between bases i and j, and the length of the helix, k, using the -FORCe1=i,j,k command-line parameter. This forces base pairs s(i)-s(j), s(i+1)-s(j-1),..., s(i+k-1)-s(j-k+1).

You can insist that a group of consecutive bases be double-stranded without specifying the pairing partner for each base. To do this, specify the first base of the forced region, i, and the length of the forced region, k, using -FORCe1=i,0,k. The 0 between i and k is necessary to tell the program that you are forcing a group of contiguous bases to be double-stranded, rather than forcing a specific helix. This forces bases s(i), s(i+1),..., s(i+k-1) to be double-stranded.

You can force up to eight additional regions to pair with -FORCe2=l,m,n ... -FORCe9=x,y,z.

The only allowable base pairs are A-T/U, G-C, and G-T/U. If you force other base pairing, the program produces undefined results.
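Because forcing a disallowed pair gives undefined results, it can help to list the pairs implied by i,j,k before using them. The following Python sketch enumerates the pairs s(i)-s(j) through s(i+k-1)-s(j-k+1) and reports any that are not A-T/U, G-C, or G-T/U. It is a hypothetical helper, not part of FoldRNA. The names helix_pairs and disallowed_pairs are invented, and positions are assumed to be 1-based and to lie within the sequence.

```python
# T is treated as U, so A-T/U, G-C, and G-T/U reduce to these three sets
ALLOWED = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def helix_pairs(i, j, k):
    """List the k stacked pairs (i, j), (i+1, j-1), ..., (i+k-1, j-k+1)."""
    return [(i + m, j - m) for m in range(k)]


def disallowed_pairs(sequence, i, j, k):
    """Return the pairs of the helix whose two bases cannot pair."""
    seq = sequence.upper().replace("T", "U")
    bad = []
    for a, b in helix_pairs(i, j, k):
        if frozenset((seq[a - 1], seq[b - 1])) not in ALLOWED:
            bad.append((a, b))
    return bad


print(helix_pairs(2, 20, 3))
print(disallowed_pairs("GGGAAAACCC", 1, 10, 4))
```

In the second call the fourth pair would join two A's, so it is reported.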

Preventing Bases From Pairing

You can prevent a specified stem from forming in the predicted folding. To do this, specify the first base pair of the helix you want to prevent, between bases i and j, and the length of the helix, k, using the -PREVent1=i,j,k command-line parameter. This prevents the helix containing base pairs s(i)-s(j), s(i+1)-s(j-1),..., s(i+k-1)-s(j-k+1) from forming. Only a specific, single helix is prevented; the prevented bases are still free to participate in other helices.

You can prevent a group of consecutive bases from being involved in any helix, forcing them to remain single-stranded in the predicted folding. To do this, specify the first base of the single-stranded region, i, and the length of the single-stranded region, k, using -PREVent1=i,0,k. The 0 between i and k is necessary to tell the program that you are forcing a single-stranded region, rather than preventing a specific helix from forming. This will force bases s(i), s(i+1),..., s(i+k-1) to be single-stranded.

You can prevent up to eight additional regions from pairing with -PREVent2=l,m,n ... -PREVent9=x,y,z.

If you want to specify multiple regions for any folding constraint discussed above, you must number that constraint sequentially. For instance, if you want to exclude two regions from folding, you would need to specify -REMOve1 and -REMOve2 on the command line; specifying -REMOve1 and -REMOve3 would cause the program to recognize only the first excluded region.
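One way to avoid a gap in the numbering is to generate the parameters from a list. The Python sketch below is a hypothetical helper, not part of the Wisconsin Package. It numbers the regions 1, 2, 3 and so on in list order and refuses more than nine. The function name numbered_constraints is made up, and the parameter spellings are taken from this section.

```python
def numbered_constraints(name, regions):
    """Build e.g. ['-REMOve1=10,20', '-REMOve2=50,60'] from a list of regions."""
    if len(regions) > 9:
        raise ValueError("at most nine regions per constraint")
    return ["-%s%d=%s" % (name, n, ",".join(str(v) for v in region))
            for n, region in enumerate(regions, start=1)]


print(" ".join(numbered_constraints("REMOve", [(10, 20), (50, 60)])))
print(" ".join(numbered_constraints("FORCe", [(5, 0, 4)])))
```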

FOLDING FRAGMENTS

You can predict the optimal secondary structure for a large molecule and then find the optimal folding for any part of the molecule by running FoldRNA on the whole molecule with -SAVe on the command line (save run) and then running FoldRNA repeatedly with -CONTinue on the command line (continuation run). The optimal folding is not recalculated for continuation runs so they are very fast.

For instance, if you are interested in what a folding of bases 75 to 129 of Mcvsatrn5 would have looked like without the rest of the molecule, you would first run FoldRNA with -SAVe=mcvsatrn5.sav on the command line. Then, run FoldRNA again with -CONTinue=mcvsatrn5.sav, and set begin and end to 75 and 129. You should get the same folding as if you had folded this region by itself.
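The two runs described above could be assembled as follows. This Python sketch only builds the command lines and does not execute anything. It assumes that the sequence name is given again on the continuation run and that -Default suppresses the prompts, as in the Minimal Syntax shown in the COMMAND-LINE SUMMARY. The function names save_run and continuation_run are invented for illustration.

```python
def save_run(sequence, save_file):
    """Command line for the save run over the whole molecule."""
    return ["foldrna", sequence, "-Default", "-SAVe=" + save_file]


def continuation_run(sequence, save_file, begin, end):
    """Command line for a continuation run restricted to begin..end."""
    return ["foldrna", sequence, "-Default",
            "-CONTinue=" + save_file,
            "-BEGin=%d" % begin, "-END=%d" % end]


print(" ".join(save_run("Vi:Mcvsatrn5", "mcvsatrn5.sav")))
print(" ".join(continuation_run("Vi:Mcvsatrn5", "mcvsatrn5.sav", 75, 129)))
```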

If you run FoldRNA with the -SAVe parameter, it ignores the -REMOve, -FORCe and -PREVent parameters.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax:  % foldrna [-INfile=]Vi:Mcvsatrn5 -Default

Prompted Parameters:

-BEGin=1 -END=334              the range of interest
[-OUTfile1=]mcvsatrn5.fld      the text representation of folding
[-OUTfile2=]mcvsatrn5.connect  the base-by-base output file

Local Data Files: -DATa=foldrna.energy    has the energy rules

Optional Parameters:

-SAVe=mcvsatrn5.sav       saves folding matrix in Mcvsatrn5.Sav
-CONTinue=mcvsatrn5.sav   use a previously saved folding matrix from
                               Mcvsatrn5.Sav
-REMOve=i,j               exclude bases i through j from folding,
                               ligating bases i-1 and j+1 together
-FORCe=i,j,k              force k consecutive base pairs, starting with
                               the base pair between i and j
-FORCe=i,0,k              force k consecutive bases, beginning with i,
                               to form base pairs
-PREVent=i,j,k            prevent k consecutive base pairs, starting
                               with the base pair between i and j
-PREVent=i,0,k            prevent k consecutive bases, beginning
                               with i, from base pairing
-BATch                    submits the program to run in the batch queue

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

FoldRNA reads the file foldrna.energy for the stacking and loop destabilizing energies. The public file contains the Turner energies (Freier et al., Proc. Natl. Acad. Sci. USA 83; 9373-9377 (1986)). The original energies for FoldRNA, as described by Salser (CSHSQB 42; 985), are available in the file salser.energy. The Salser rules, as modified by Tinoco (Cech et al., Proc. Natl. Acad. Sci. USA 80; 3903), are available in the file salser_cech.energy.

Unlike most GCG data files, the FoldRNA energy file is formatted; that is, the data in it must be in specific columns. You can change the numeric values but not the columns in which they are found!

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-SAVe=mcvsatrn5.sav

saves the matrix calculated by FoldRNA for future runs with -CONTinue (see the FOLDING FRAGMENTS topic above). You can name the file on the command line; otherwise FoldRNA builds a name from the name of the input sequence and the file name extension .sav.

-CONTinue=mcvsatrn5.sav

makes a new folding on any part of a previously folded molecule, using the matrix saved in the named file during an earlier run by FoldRNA (see the FOLDING FRAGMENTS topic above).

-REMOve=i,j ... -REMOve9=x,y

excludes the sequence range from base i through base j from folding, "ligating" base i-1 to j+1 before folding the molecule.

You can exclude up to 9 regions from folding in this manner by specifying sequential numbers with the -REMOve parameter (-REMOve1=i,j ... -REMOve9=y,z).

-FORCe=i,j,k ... -FORCe9=x,y,z

forces the helix that begins with the base pair between bases i and j and extends for k bases to the base pair between i+k-1 and j-k+1.

If j is 0, then the sequence of k consecutive bases, beginning with base i, is forced to be double-stranded (although the pairing partner for each base is not specified).

You can force up to 9 regions to pair by specifying sequential numbers with the -FORCe parameter (-FORCe1=l,m,n ... -FORCe9=x,y,z).

The only allowable base pairs are A-T/U, G-C, and G-T/U. Attempts to force other base pairing produce undefined results.

-PREVent=i,j,k ... -PREVent9=x,y,z

prevents the helix that begins with the base pair between bases i and j and extends for k bases to the base pair between bases i+k-1 and j-k+1.

If j is 0, then the k consecutive bases beginning at base i are prevented from participating in any helix, forcing them to remain single-stranded.

You can prevent up to 9 regions from pairing by specifying sequential numbers with the -PREVent parameter (-PREVent1=l,m,n ... -PREVent9=x,y,z).

-BATch

submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

Printed: November 18, 1996 13:07 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com