FoldRNA predicts a single optimal secondary structure for an RNA molecule by the older method of Zuker.
FoldRNA finds a secondary structure of minimum free energy for an RNA molecule based on published values of stacking and loop destabilizing energies. FoldRNA is the program of Michael Zuker (Methods in Enzymology 180, 262-288 (1989)). The energies used by Zuker's program were first described by Winston Salser (CSHSQB 42; 985) and are now defined by Turner (Freier et al., Proc. Natl. Acad. Sci. USA 83: 9373-9377 (1986)).
You should be aware of the limitations of energy minimizing algorithms in predicting real secondary structures. The structure reported in the output file is only one of a family of structures that have the same or nearly the same energy. The number of structures that have similar energies to the optimal structure reported by FoldRNA may be very large when several hundred bases are folded or when the secondary structure is not strong.
GCG is allowed to distribute a GCG-compatible implementation of FoldRNA under a license agreement with the National Research Council of Canada, Institute for Biological Sciences, Ottawa, Canada, K1A 0R6 (613)-993-4830. The copyright to FoldRNA, however, belongs to the Government of Canada. If you use FoldRNA for published research, cite Dr. Zuker's Nucleic Acids Research paper. Any communication of the FoldRNA program must be approved by the National Research Council of Canada. FoldRNA was adapted to work with the Wisconsin Package at the University of Wisconsin by Yonah Karp.
Here is a session using FoldRNA to predict an optimal secondary structure for the sequence Vi:mcvsatrn5:
  % foldrna

   FOLDRNA on what sequence ?  Vi:Mcvsatrn5

   What is the structure output file (* mcvsatrn5.fld *) ?

   What is the base-by-base output file (* mcvsatrn5.connect *) ?

                    Begin (* 1 *) ?
                    End (* 334 *) ?

  %
FoldRNA writes two output files. The base-by-base output file, mcvsatrn5.connect, can be used as input to the Squiggles, Circles, Domes, Mountains, and DotPlot programs to create graphic output. mcvsatrn5.fld is a text representation of the folded molecule that can be displayed on most terminals and printers. Here is an example of the text output:
Here is part of the base-by-base output file from the example session:
  FOLDRNA of: gb_vi:mcvsatrn5  Check: 3205  from: 1  to: 334  October 13, 1996 11:22

   Length: 334  Energy: -94.0 ..

      1 G    0    2  332    1
      2 U    1    3  331    2
      3 U    2    4  330    3
  ////////////////////////////
    332 C  331  333    1  332
    333 C  332  334    0  333
    334 C  333    0    0  334
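If you want to read the base-by-base file with your own scripts, the following Python sketch shows one way to do it. The column meanings are an assumption inferred from the excerpt above (position, base, previous position, next position, pairing partner with 0 for unpaired, position again); the manual does not define them. The names parse_connect and paired_positions are hypothetical helpers, not part of the Wisconsin Package.

```python
from typing import Dict, List, Tuple

# A shortened copy of the excerpt above, used only for illustration.
SAMPLE = """\
FOLDRNA of: gb_vi:mcvsatrn5  Check: 3205  from: 1  to: 334  October 13, 1996 11:22

 Length: 334  Energy: -94.0 ..

    1 G    0    2  332    1
    2 U    1    3  331    2
    3 U    2    4  330    3
////////////////////////////
  332 C  331  333    1  332
  333 C  332  334    0  333
  334 C  333    0    0  334
"""


def parse_connect(text: str) -> Dict[int, Tuple[str, int]]:
    """Map base position -> (base letter, pairing partner or 0).

    Assumed column order (not stated in the manual): position, base,
    previous position, next position, pairing partner, position again.
    """
    records: Dict[int, Tuple[str, int]] = {}
    in_body = False
    for line in text.splitlines():
        if not in_body:
            # The header ends with the line that carries the ".." divider.
            in_body = line.rstrip().endswith("..")
            continue
        fields = line.split()
        # Skip blank lines and anything that is not a six-column record.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        records[int(fields[0])] = (fields[1], int(fields[4]))
    return records


def paired_positions(records: Dict[int, Tuple[str, int]]) -> List[Tuple[int, int]]:
    """Return each base pair once, as (i, j) with i < j."""
    return sorted((i, j) for i, (_, j) in records.items() if 0 < i < j)


if __name__ == "__main__":
    for i, j in paired_positions(parse_connect(SAMPLE)):
        print(i, j)
```

Reporting each pair only once (i < j) keeps the listing short, because a paired base appears twice in the file, once for each partner.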
FoldRNA accepts a single nucleotide sequence as input. If FoldRNA rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.
MFold predicts optimal and suboptimal secondary structures for an RNA molecule using the most recent energy minimization method of Zuker. PlotFold displays the optimal and suboptimal secondary structures for an RNA molecule predicted by MFold. FoldRNA predicts a single optimal secondary structure for an RNA molecule by the older method of Zuker. Circles, Domes, Mountains, Squiggles, and DotPlot all make graphic secondary structure representations with the .connect output file from FoldRNA and PlotFold.
The RNA secondary structure prediction algorithm and the folding energies used by MFold are more refined than the algorithm and energies used by FoldRNA. You cannot use the MFold energy files (see the LOCAL DATA FILES topic, below) with FoldRNA.
StemLoop finds all possible stems (inverted repeats) above some minimum quality that you can set, but StemLoop cannot recognize a structure with gaps (bulge loops or uneven bifurcation loops). The stems can be plotted with DotPlot.
The maximum length for the folded range of a sequence is 2,000 bases. The original sequence from which the folded range comes may not be more than 5,000 bases long. Sequences should contain only G, A, T/U, and C.
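You may find it convenient to check a sequence against these limits before starting a run. The Python sketch below is only an illustration of the limits stated above; check_fold_request is a hypothetical helper and FoldRNA's own checks may differ in detail.

```python
MAX_FOLDED_RANGE = 2000    # longest range FoldRNA will fold
MAX_SOURCE_LENGTH = 5000   # longest sequence the folded range may come from
ALLOWED_BASES = set("GATUC")


def check_fold_request(sequence: str, begin: int, end: int) -> list:
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if len(sequence) > MAX_SOURCE_LENGTH:
        problems.append("sequence longer than 5,000 bases")
    if not (1 <= begin <= end <= len(sequence)):
        problems.append("begin/end outside the sequence")
    elif end - begin + 1 > MAX_FOLDED_RANGE:
        problems.append("folded range longer than 2,000 bases")
    unexpected = sorted(set(sequence.upper()) - ALLOWED_BASES)
    if unexpected:
        problems.append("unexpected symbols: " + "".join(unexpected))
    return problems


if __name__ == "__main__":
    print(check_fold_request("GAUC" * 600, 1, 2400))
```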
Do not use FoldRNA with -REMOve if you want to plot the results.
FoldRNA was not written by GCG. Incompatibilities may be found that we do not know about. We would appreciate hearing about any misbehavior you experience.
The behavior of FoldRNA when folding constraints are imposed is completely unknown at this writing.
FoldRNA uses an algorithm that computes in time proportional to the cube of the sequence length. It takes a DEC 5000/300 about nine seconds of CPU time to fold 220 bases. You can predict, therefore, that 400 bases will take about six times as long or a little less than one minute of CPU time. Because of this, you might want to consider running FoldRNA in the batch queue for long sequences. You can specify that this program run at a later time in the batch queue by using the command-line parameter -BATch. Run this way, the program prompts you for all the required parameters and then automatically submits itself to the batch or at queue. For more information, see "Using the Batch Queue" in Chapter 3, Using Programs in the User's Guide. Very large RNA secondary structure predictions may exceed the CPU limit set by some systems.
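The estimate above follows directly from the cubic scaling, and you can repeat it for any length. The Python sketch below uses the DEC 5000/300 timing quoted above as its reference point; estimate_cpu_seconds is a hypothetical helper, and the result is only a rough guide because other machines and system loads will differ.

```python
def estimate_cpu_seconds(n_bases, ref_bases=220, ref_seconds=9.0):
    """Scale a reference timing by the cube of the length ratio."""
    return ref_seconds * (n_bases / ref_bases) ** 3


# Ratio of the 400-base run to the 220-base reference run.
ratio = (400 / 220) ** 3

if __name__ == "__main__":
    print(f"400 bases: about {ratio:.1f} times the 220-base time, "
          f"or {estimate_cpu_seconds(400):.0f} seconds")
    print(f"2,000 bases: about {estimate_cpu_seconds(2000) / 3600:.1f} hours")
```

For 400 bases the ratio is (400/220) cubed, or about 6, which gives the figure of a little less than one minute quoted above.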
You can fold a molecule in such a way that certain bases do or do not pair.
Before your fragment is folded, you can exclude regions from it by using the optional parameter -REMOve=i,j to remove bases i through j. Additional regions, up to a total of nine, can be excised with -REMOve2=k,l ... -REMOve9=y,z.
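The effect of removing a region can be pictured as cutting bases i through j out of the sequence and joining the ends. The Python sketch below illustrates this on a plain string. It assumes that every range is numbered against the original sequence and that the ranges do not overlap; the manual does not say how FoldRNA numbers multiple regions, and excise is a hypothetical helper.

```python
def excise(sequence: str, regions):
    """Remove 1-based inclusive ranges (i, j) and join the flanking bases.

    Assumption: every range is numbered against the original sequence
    and the ranges do not overlap.
    """
    if len(regions) > 9:
        raise ValueError("at most nine regions can be removed")
    result = sequence
    # Work from the highest range down so earlier coordinates stay valid.
    for i, j in sorted(regions, reverse=True):
        if not (1 <= i <= j <= len(sequence)):
            raise ValueError(f"bad range {i},{j}")
        result = result[: i - 1] + result[j:]
    return result


if __name__ == "__main__":
    # Removing bases 4 through 6 joins base 3 directly to base 7.
    print(excise("GGGAAACCC", [(4, 6)]))
```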
You can insist that the folding includes a particular stem by forcing certain bases to pair to one another. To do this, specify the first base pair, between bases i and j, and the length of the helix, k, using the -FORCe1=i,j,k command-line parameter. This forces base pairs s(i)-s(j), s(i+1)-s(j-1),..., s(i+k-1)-s(j-k+1).
You can insist that a group of consecutive bases be double-stranded without specifying the pairing partner for each base. To do this, specify the first base of the forced region, i, and the length of the forced region, k, using -FORCe1=i,0,k. The 0 between i and k is necessary to tell the program that you are forcing a group of contiguous bases to be double-stranded, rather than forcing a specific helix. This forces bases s(i), s(i+1),..., s(i+k-1) to be double-stranded.
You can force up to eight additional regions to pair with -FORCe2=l,m,n ... -FORCe9=x,y,z.
The only allowable base pairs are A-T/U, G-C, and G-T/U. If you force other base pairing, the program produces undefined results.
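Because forcing a disallowed pair gives undefined results, you may want to list the pairs that i,j,k implies and check them against your sequence before a run. The Python sketch below does this under the pairing rule stated above; forced_pairs and disallowed_pairs are hypothetical helpers and do not reproduce any check made by FoldRNA itself.

```python
ALLOWED_PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def forced_pairs(i: int, j: int, k: int):
    """Expand i,j,k into the pairs (i, j), (i+1, j-1), ..., (i+k-1, j-k+1)."""
    if j == 0:
        raise ValueError("j = 0 names a double-stranded run, not a helix")
    return [(i + n, j - n) for n in range(k)]


def disallowed_pairs(sequence: str, i: int, j: int, k: int):
    """Return the forced pairs that are not A-T/U, G-C, or G-T/U."""
    rna = sequence.upper().replace("T", "U")
    bad = []
    for a, b in forced_pairs(i, j, k):
        # Positions are 1-based, as in the manual.
        if frozenset((rna[a - 1], rna[b - 1])) not in ALLOWED_PAIRS:
            bad.append((a, b))
    return bad


if __name__ == "__main__":
    print(forced_pairs(3, 20, 4))
    print(disallowed_pairs("GGGAAAACCC", 1, 7, 2))
```

An empty list from disallowed_pairs means that every forced pair is one of the three allowed kinds.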
You can prevent a specified stem from forming in the predicted folding. To do this, specify the first base pair of the helix you want to prevent, between bases i and j, and the length of the helix, k, using the -PREVent1=i,j,k command-line parameter. This prevents the helix containing base pairs s(i)-s(j), s(i+1)-s(j-1),..., s(i+k-1)-s(j-k+1) from forming. Only a specific, single helix is prevented; the prevented bases are still free to participate in other helices.
You can prevent up to eight additional regions from pairing with -PREVent2=l,m,n ... -PREVent9=x,y,z.
If you want to specify multiple regions for any folding constraint discussed above, you must number that constraint sequentially. For instance, if you want to exclude two regions from folding, you would need to specify -REMOve1 and -REMOve2 on the command line; specifying -REMOve1 and -REMOve3 would cause the program to recognize only the first excluded region.
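The numbering rule can be checked before a run. The Python sketch below reports which constraint numbers form an unbroken run starting at 1, which matches the example above (1 and 3 given, only 1 recognized). How FoldRNA behaves when number 1 itself is missing is not stated in this manual; the sketch assumes nothing is recognized in that case, and recognized_constraints is a hypothetical helper.

```python
def recognized_constraints(numbers):
    """Return the constraint numbers that form an unbroken run 1, 2, 3, ...

    Assumption: the program stops reading at the first missing number.
    """
    present = set(numbers)
    recognized = []
    n = 1
    while n in present and n <= 9:
        recognized.append(n)
        n += 1
    return recognized


if __name__ == "__main__":
    # -REMOve1 and -REMOve3 given: only the first region is recognized.
    print(recognized_constraints([1, 3]))
```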
You can predict the optimal secondary structure for a large molecule and then find the optimal folding for any part of the molecule by running FoldRNA on the whole molecule with -SAVe on the command line (save run) and then running FoldRNA repeatedly with -CONTinue on the command line (continuation run). The optimal folding is not recalculated for continuation runs so they are very fast.
For instance, if you are interested in what a folding of bases 75 to 129 of Mcvsatrn5 would have looked like without the rest of the molecule, you would first run FoldRNA with -SAVe=mcvsatrn5.sav on the command line. Then, run FoldRNA again with -CONTinue=mcvsatrn5.sav, and set begin and end to 75 and 129. You should get the same folding as if you had folded this region by itself.
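The two runs in this example can be written down as command lines. The Python sketch below only assembles the argument lists from parameters named in this document; it does not run FoldRNA. Whether a continuation run needs the input sequence named again, and whether -Default combines with these parameters exactly as shown, are assumptions. The names save_run and continuation_run are hypothetical.

```python
def save_run(sequence, save_file):
    """Arguments for a run that folds the whole molecule and saves the matrix."""
    return ["foldrna", f"-INfile={sequence}", f"-SAVe={save_file}", "-Default"]


def continuation_run(sequence, save_file, begin, end):
    """Arguments for a run that reuses a saved matrix for one range."""
    return [
        "foldrna",
        f"-INfile={sequence}",
        f"-CONTinue={save_file}",
        f"-BEGin={begin}",
        f"-END={end}",
        "-Default",
    ]


if __name__ == "__main__":
    print(" ".join(save_run("Vi:Mcvsatrn5", "mcvsatrn5.sav")))
    print(" ".join(continuation_run("Vi:Mcvsatrn5", "mcvsatrn5.sav", 75, 129)))
```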
If you run FoldRNA with the -SAVe parameter, it ignores the -REMOve, -FORCe and -PREVent parameters.
All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
Minimal Syntax: % foldrna [-INfile=]Vi:Mcvsatrn5 -Default

Prompted Parameters:

-BEGin=1 -END=334              the range of interest
[-OUTfile1=]mcvsatrn5.fld      the text representation of folding
[-OUTfile2=]mcvsatrn5.connect  the base-by-base output file

Local Data Files:

-DATa=foldrna.energy           has the energy rules

Optional Parameters:

-SAVe=mcvsatrn5.sav       saves folding matrix in Mcvsatrn5.Sav
-CONTinue=mcvsatrn5.sav   use a previously saved folding matrix from
                            Mcvsatrn5.Sav
-REMOve=i,j               exclude bases i through j from folding, ligating
                            bases i-1 and j+1 together
-FORCe=i,j,k              force k consecutive base pairs, starting with the
                            base pair between i and j
-FORCe=i,0,k              force k consecutive bases, beginning with i, to
                            form base pairs
-PREVent=i,j,k            prevent k consecutive base pairs, starting with the
                            base pair between i and j
-PREVent=i,0,k            prevent k consecutive bases, beginning with i, from
                            base pairing
-BATch                    submits the program to run in the batch queue
The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
FoldRNA reads the file foldrna.energy for the stacking and loop destabilizing energies. The public file contains the Turner energies (Freier et al., Proc. Natl. Acad. Sci. USA 83; 9373-9377 (1986)). The original energies for FoldRNA, as described by Salser (CSHSQB 42; 985), are available in the file salser.energy. The Salser rules, as modified by Tinoco (Cech et al., Proc. Natl. Acad. Sci. USA 80; 3903), are available in the file salser_cech.energy.
Unlike most GCG data files, the FoldRNA energy file is formatted; that is, the data in it must be in specific columns. You can change the numeric values but not the columns in which they are found!
The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-SAVe=mcvsatrn5.sav
saves the matrix calculated by FoldRNA for future runs with -CONTinue (see the FOLDING FRAGMENTS topic above). You can set the name of the file on the command line; otherwise FoldRNA makes up a name by using the name of the input sequence for the file name and .sav for the file name extension.
-CONTinue=mcvsatrn5.sav
makes a new folding on any part of a previously folded molecule, using the matrix saved in the named file during an earlier run by FoldRNA (see the FOLDING FRAGMENTS topic above).
-REMOve=i,j
excludes the sequence range from base i through base j from folding, "ligating" base i-1 to j+1 before folding the molecule.
You can exclude up to 9 regions from folding in this manner by specifying sequential numbers with the -REMOve parameter (-REMOve1=i,j ... -REMOve9=y,z).
-FORCe=i,j,k
forces the helix that begins with the base pair between bases i and j and extends for k bases to the base pair between bases i+k-1 and j-k+1.
If j is 0, then the sequence of k consecutive bases, beginning with base i, is forced to be double-stranded (although the pairing partner for each base is not specified).
You can force up to 9 regions to pair by specifying sequential numbers with the -FORCe parameter (-FORCe1=l,m,n ... -FORCe9=x,y,z).
The only allowable base pairs are A-T/U, G-C, and G-T/U. Attempts to force other base pairing produce undefined results.
-PREVent=i,j,k
prevents the formation of the helix that begins with the base pair between bases i and j and extends for k bases to the base pair between bases i+k-1 and j-k+1.
If j is 0, then the sequence of k consecutive bases, beginning at base i, is prevented from participating in any helix, forcing them to remain single-stranded.
You can prevent up to 9 regions from pairing by specifying sequential numbers with the -PREVent parameter (-PREVent1=l,m,n ... -PREVent9=x,y,z).
-BATch
submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.