MFOLD

MFold is an adaptation of the mfold package by Zuker and Jaeger that has been modified to work with the Wisconsin Package™. Their method uses the energy rules developed by Turner and colleagues to determine optimal and suboptimal secondary structures for an RNA molecule. (See the ACKNOWLEDGEMENTS topic for references.)

Using energy minimization criteria, any predicted "optimal" secondary structure for an RNA molecule depends on the model of RNA folding and the specific folding energies used to calculate that structure. Different optimal foldings may be calculated if the folding energies are changed even slightly. Because of uncertainties in the folding model and the folding energies, the "correct" folding may not be the "optimal" folding determined by the program. You may therefore want to view many optimal and suboptimal structures within a few percent of the minimum energy. You can use the variation among these structures to determine which regions of the secondary structure you can predict reliably. For instance, a region of the RNA molecule containing the same helix in most calculated optimal and suboptimal secondary structures may be more reliably predicted than other regions with greater variation.
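
As a rough illustration of this reasoning, the sketch below keeps the structures whose energy lies within a chosen percentage of the minimum and then reports how often each base pair occurs among them. The energies and base-pair lists are hypothetical toy data, and the function names are invented for this example; this is not how MFold or PlotFold is implemented.

```python
from collections import Counter


def within_percent(energies, percent):
    """Return indices of structures within `percent` of the minimum energy.

    Energies are free energies (more negative is more stable), so the
    cutoff lies above the minimum by `percent` of its magnitude.
    """
    e_min = min(energies)
    cutoff = e_min + abs(e_min) * percent / 100.0
    return [idx for idx, energy in enumerate(energies) if energy <= cutoff]


def pair_frequencies(structures):
    """Return the fraction of structures in which each base pair occurs.

    Each structure is a list of (i, j) base pairs (hypothetical format).
    """
    counts = Counter()
    for pairs in structures:
        counts.update(set(pairs))
    total = len(structures)
    return {pair: n / total for pair, n in counts.items()}


# Toy data: three foldings with made-up energies and base pairs.
energies = [-80.0, -78.5, -70.0]
structures = [[(1, 10), (2, 9)], [(1, 10)], [(3, 8)]]

kept = within_percent(energies, 5)
frequencies = pair_frequencies([structures[idx] for idx in kept])
```

A base pair with a frequency near 1.0 across the retained structures belongs to a region that is predicted consistently; pairs with low frequencies mark regions of greater variation.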

MFold calculates energy matrices that determine all optimal and suboptimal secondary structures for an RNA molecule. The program writes these energy matrices to an output file. A companion program, PlotFold, reads this output file and displays a representative set of optimal and suboptimal secondary structures for the RNA molecule within any increment of the computed minimum free energy you choose. You can choose any of several different graphic representations for displaying the secondary structures in PlotFold.

EXAMPLE

Here is a session using MFold to predict optimal and suboptimal secondary structures for an Alu consensus sequence.


% mfold

 (Linear) MFOLD what sequence ? alucons.seq

                  Begin (* 1 *) ?
                End (*   290 *) ?

 What should I call the energy matrix output file (* alucons.mfold *) ?

   Folding .........................................................

               CPU time: 01:03.52

            Output file: alucons.mfold

%

OUTPUT

The output file produced by MFold contains the calculated energy matrices that determine all optimal and suboptimal secondary structures for the folded RNA molecule. You cannot read this output file directly; it is read by the companion program, PlotFold, which can display any of several different graphic representations of optimal and suboptimal secondary structures for the folded RNA molecule.

INPUT FILES

MFold accepts a single nucleotide sequence as input. If MFold rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

MFold predicts optimal and suboptimal secondary structures for an RNA molecule using the most recent energy minimization method of Zuker. PlotFold displays the optimal and suboptimal secondary structures for an RNA molecule predicted by MFold. FoldRNA predicts a single optimal secondary structure for an RNA molecule by the older method of Zuker. Circles, Domes, Mountains, Squiggles, and DotPlot all make graphic secondary structure representations with the .connect output file from FoldRNA and PlotFold.

The RNA secondary structure prediction algorithm and the folding energies used by MFold are more refined than the algorithm and energies used by FoldRNA. You cannot use the MFold energy files (see the LOCAL DATA FILES topic, below) with FoldRNA.

StemLoop finds all possible stems (inverted repeats) above some minimum quality that you can set, but StemLoop cannot recognize a structure with gaps (bulge loops or uneven bifurcation loops). The stems can be plotted with DotPlot.

ALGORITHM

The general algorithm for determining multiple optimal and suboptimal secondary structures is described by the author of the program, Dr. Michael Zuker (Science 244, 48-52 (1989)). A description of the folding parameters used in the algorithm is presented in Jaeger, Turner, and Zuker (Proc. Natl. Acad. Sci. USA, 86, 7706-7710 (1989)).

FOLDING CONSTRAINTS

You may want to constrain the computed foldings to require specific helices and/or unpaired regions based on experimental data.

Forcing Bases to Pair

You can insist that all optimal and suboptimal foldings include a specified helix. (This is equivalent to the double force option in Zuker's original version of the program.) To do this, specify the first base pair, between bases i and j, and the length of the helix, k, using the -FORCe1=i,j,k command-line parameter. This forces base pairs s_(i)-s_(j), s_(i+1)-s_(j-1),..., s_(i+k-1)-s_(j-k+1).
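
The base pairs implied by a helix specification i,j,k can be enumerated with a few lines of Python. This is only a sketch of the index arithmetic described above; the function name is invented for illustration and is not part of MFold.

```python
def forced_helix_pairs(i, j, k):
    """List the k base pairs of a helix whose first pair is (i, j).

    Positions are 1-based. The pairs run (i, j), (i+1, j-1), ...,
    (i+k-1, j-k+1), matching the description in the text.
    """
    if k < 1:
        raise ValueError("helix length k must be at least 1")
    if i + (k - 1) >= j - (k - 1):
        raise ValueError("the two strands of the helix would overlap")
    return [(i + n, j - n) for n in range(k)]


# A helix of length 4 whose first base pair joins bases 10 and 50.
pairs = forced_helix_pairs(10, 50, 4)
```

Here `pairs` holds (10, 50), (11, 49), (12, 48), and (13, 47), the pairs that a value of 10,50,4 would describe.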

You can insist that a group of consecutive bases be double-stranded without specifying the pairing partner for each base. (This is equivalent to the single force option in Zuker's original version of the program.) To do this, specify the first base of the forced region, i, and the length of the forced region, k, using -FORCe1=i,0,k. The 0 between i and k is necessary to tell the program that you are forcing a group of contiguous bases to be double-stranded, rather than forcing a specific helix. This forces bases s_(i), s_(i+1),..., s_(i+k-1) to be double-stranded.

You can force up to eight additional regions to pair with -FORCe2=l,m,n ... -FORCe9=x,y,z.

The only allowable base pairs are A-T/U, G-C, and G-T/U. If you try to force any other base pair, the program ignores it.
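
Before forcing a helix, it can be useful to check that every pair in it is one of the allowable pairs. The following is a minimal sketch of such a check, treating T and U as equivalent as the text does; the names are hypothetical and the check is independent of MFold itself.

```python
# The allowable pairs named in the text, with T treated as U.
ALLOWED_PAIRS = {frozenset("AU"), frozenset("GC"), frozenset("GU")}


def can_pair(x, y):
    """Return True if bases x and y form A-T/U, G-C, or G-T/U."""
    x = x.upper().replace("T", "U")
    y = y.upper().replace("T", "U")
    return frozenset((x, y)) in ALLOWED_PAIRS


def helix_is_allowable(sequence, i, j, k):
    """Check all k pairs of the helix starting at (i, j), 1-based."""
    return all(can_pair(sequence[i + n - 1], sequence[j - n - 1])
               for n in range(k))
```

For example, `can_pair("G", "T")` is true, while `can_pair("A", "C")` is false.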

Preventing Bases from Pairing

You can prevent a specified helix from forming in all optimal and suboptimal foldings. (This is equivalent to the double prevent option in Zuker's original version of the program.) To do this, specify the first base pair of the helix you want to prevent, between bases i and j, and the length of the helix, k, using the -PREVent1=i,j,k command-line parameter. This prevents the helix containing base pairs s_(i)-s_(j), s_(i+1)-s_(j-1),..., s_(i+k-1)-s_(j-k+1) from forming. Only a specific, single helix is prevented; the prevented bases are still free to participate in other helices.

You can prevent a group of consecutive bases from being involved in any helix, forcing them to remain single-stranded in all predicted foldings. (This is equivalent to the single prevent option in Zuker's original version of the program.) To do this, specify the first base of the single-stranded region, i, and the length of the single-stranded region, k, using -PREVent1=i,0,k. The 0 between i and k is necessary to tell the program that you are forcing a single-stranded region, rather than preventing a specific helix from forming. This will force bases s_(i), s_(i+1),..., s_(i+k-1) to be single-stranded.
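
The role of the 0 in the middle position can be made concrete with a small interpreter for an i,j,k value. This is a sketch of the convention described above, not MFold's own parsing code, and the function name is an assumption made for the example.

```python
def interpret_triple(value):
    """Interpret an 'i,j,k' value as a helix or as a run of bases.

    If j is 0, the value names k consecutive bases starting at i.
    Otherwise it names the k base pairs starting with the pair (i, j).
    """
    i, j, k = (int(part) for part in value.split(","))
    if j == 0:
        return ("region", list(range(i, i + k)))
    return ("helix", [(i + n, j - n) for n in range(k)])


single = interpret_triple("20,0,5")   # bases 20 through 24
double = interpret_triple("10,50,2")  # pairs (10, 50) and (11, 49)
```

The same reading applies to both the forcing and the preventing parameters; only the effect on the listed bases or pairs differs.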

You can prevent up to eight additional regions from pairing with -PREVent2=l,m,n ... -PREVent9=x,y,z.

Removing Bases

You can exclude a region of the RNA molecule from folding when a secondary structure model for that region already exists. (This is equivalent to the closed excision option in Zuker's original version of the program.) To do this, specify the base pair that closes off the excluded region, between bases i and j, using the -CLOSedexcise1=i,j command-line parameter. MFold folds the remainder of the sequence, including the base pair between i and j. The only allowable base pairs are A-T/U, G-C, and G-T/U. Attempts to force other base pairing produce undefined results. You can specify up to eight additional regions for closed excisions with -CLOSedexcise2=k,l ... -CLOSedexcise9=y,z.

You can also exclude a region of the RNA molecule from participating in a secondary structure as if that region were spliced from the molecule before folding. (This is equivalent to the open excision option in Zuker's original version of the program.) To do this, specify the beginning and ending base numbers, i and j respectively, of the excluded region using the -OPENexcise1=i,j command-line parameter. The region from i to j, inclusive, is removed and base i-1 is "ligated" to j+1 before folding the molecule. You can specify up to eight additional regions for open excisions with -OPENexcise2=k,l ... -OPENexcise9=y,z.
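
The two kinds of excision differ in which bases are left out and how the remaining bases are numbered. The sketch below illustrates that bookkeeping on a plain string; the helper names are hypothetical, and the code only mirrors the index arithmetic given in the text.

```python
def closed_excise_excluded(i, j):
    """Bases excluded by a closed excision: i+1 through j-1 (1-based).

    Bases i and j themselves remain and form the closing base pair.
    """
    return list(range(i + 1, j))


def open_excise(sequence, i, j):
    """Remove bases i through j (1-based, inclusive).

    Base i-1 ends up adjacent to base j+1, as if ligated.
    """
    if not 1 <= i <= j <= len(sequence):
        raise ValueError("excision range lies outside the sequence")
    return sequence[:i - 1] + sequence[j:]


def original_position(pos, i, j):
    """Map a 1-based position in the excised sequence to the original."""
    return pos if pos < i else pos + (j - i + 1)


shortened = open_excise("ACGUACGU", 3, 5)  # removes G, U, A
```

In this toy case `shortened` is "ACCGU", and position 3 of the shortened sequence corresponds to position 6 of the original.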

See RESTRICTIONS for constraints in plotting the secondary structures of RNA molecules in which a region has been excluded from folding with either the -CLOSedexcise or -OPENexcise parameter.

If you want to specify multiple regions for any folding constraint discussed above, you must number that constraint sequentially. For instance, if you want to specify two excluded regions for open excisions, you would need to specify -OPENexcise1 and -OPENexcise2 on the command line; specifying -OPENexcise1 and -OPENexcise3 would cause the program to recognize only the first excluded region.
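
When a command line is assembled by a script, numbering the constraints from a list avoids accidental gaps. The helper below is a hypothetical convenience, not part of the Wisconsin Package; it only builds parameter strings in the form shown in this manual.

```python
def numbered_parameters(name, regions):
    """Build sequentially numbered parameters such as -OPENexcise1=20,30.

    `name` is the parameter name without the leading dash or number.
    `regions` is a list of tuples of integers, one tuple per region.
    """
    if len(regions) > 9:
        raise ValueError("at most nine regions can be given per constraint")
    return [
        "-{}{}={}".format(name, n, ",".join(str(v) for v in region))
        for n, region in enumerate(regions, start=1)
    ]


args = numbered_parameters("OPENexcise", [(20, 30), (100, 120)])
```

Here `args` contains -OPENexcise1=20,30 and -OPENexcise2=100,120, numbered without a gap.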

If you don't specify any folding constraints as described above, yet an optimal folding is inconsistent with the experimental data, then one of the predicted suboptimal foldings may be consistent.

RESTRICTIONS

A maximum of 1400 bases can be folded.

Sequences should only contain the symbols A, C, G, and T/U.

If you exclude a region of the RNA molecule from folding with either the -CLOSedexcise or -OPENexcise parameter, you can display the predicted secondary structures using only the text output option in PlotFold; do not use any of the graphic plotting options of PlotFold to display the results.

MFold does not predict RNA secondary structures containing pseudoknots.

BATCH QUEUE

MFold uses an algorithm whose running time is proportional to the cube of the length of the folded sequence. It takes a DEC 5000/300 about one minute to fold 290 bases. You can predict, therefore, that 500 bases will take a little more than five times as long. Because of this, you might want to consider running MFold in the batch queue for long sequences. You can specify that this program run at a later time in the batch queue by using the command-line parameter -BATch. Run this way, the program prompts you for all the required parameters and then automatically submits itself to the batch or at queue. For more information, see "Using the Batch Queue" in Chapter 3, Using Programs in the User's Guide. Very large RNA secondary structure predictions may exceed the CPU limit set by some systems.
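
The cubic scaling can be turned into a quick estimate. The sketch below uses the reference point given above (about one minute for 290 bases) as its default; actual times depend on the machine, so treat the result as a rough guide only. The function name is invented for this example.

```python
def estimated_seconds(length, ref_length=290, ref_seconds=60.0):
    """Estimate folding time, assuming time grows as the cube of length.

    The defaults are the reference point quoted in the text: roughly
    one minute to fold 290 bases.
    """
    return ref_seconds * (length / ref_length) ** 3


# Ratio of the estimated time for 500 bases to the time for 290 bases.
ratio = estimated_seconds(500) / estimated_seconds(290)
```

The ratio comes out at about 5.1, which agrees with "a little more than five times as long" for 500 bases.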

CONSIDERATIONS

There are several differences between the GCG implementation of MFold and Dr. Zuker's mfold package. Dr. Zuker's lrna and crna programs, which fold linear and circular sequences, respectively, are combined into a single GCG program. By default, MFold treats the input sequence as a linear molecule. To fold a circular sequence, use the -CIRcular command-line parameter.

In Dr. Zuker's original implementation, the program takes an RNA sequence as input, computes the energy matrices, and then displays representations of optimal and suboptimal secondary structures. Dr. Zuker's program allows you the option of storing the energy matrices in a save run of the program and later displaying the secondary structures in a separate continue run. The GCG version of MFold always saves the energy matrices into an output file. A separate program, PlotFold, reads these energy matrices and displays representative secondary structures. Depending on the size of the RNA sequence, the file containing the energy matrices can be very large. For example, the output file created in the MFold example session requires approximately 0.35 megabytes of disk storage. You should consider deleting files that you no longer need.

The default energy files are used by the program to predict folding at 37°C. Dr. Zuker's newtemp program allows you to generate energy files for folding RNA molecules at any temperature between 0°C and 100°C. The GCG version of MFold does not require separate energy files for folding at another temperature. You can specify another folding temperature by adding -TEMperature=45 to the MFold command line (to fold at 45°C, for example).

In Dr. Zuker's original implementation, the symbols B, Z, H, and V/W represent, respectively, the bases A, C, G, and U that are accessible to single-strand nuclease cleavage. The GCG version of MFold does not recognize these symbols as nuclease-sensitive bases; sequences should only contain the symbols A, C, G, and T/U.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
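
The capitalization convention can be read mechanically: the leading capital letters of a parameter name are the letters you must type. The sketch below extracts them. The `accepts` function goes further and assumes that longer spellings and any letter case are also accepted; that behavior is an assumption made for this illustration and is not stated in this manual. Both functions expect bare parameter names without a value.

```python
from itertools import takewhile


def required_letters(parameter):
    """Return the leading capital letters of a parameter name."""
    name = parameter.lstrip("-")
    return "".join(takewhile(str.isupper, name))


def accepts(typed, parameter):
    """Guess whether `typed` would select `parameter`.

    ASSUMPTION (not from the manual): matching ignores case, and any
    spelling from the required letters up to the full name is accepted.
    """
    name = parameter.lstrip("-").upper()
    typed = typed.lstrip("-").upper()
    return typed.startswith(required_letters(parameter)) and name.startswith(typed)


minimum = required_letters("-TEMperature")
```

For -TEMperature the required letters are TEM, and for -MAXLoopsize they are MAXL.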


Minimal Syntax:  % mfold [-INfile=]alucons.seq -Default

Prompted Parameters:

-BEGin=1 -END=290          range of interest
[-OUTfile=]alucons.mfold   energy matrix output file

Local Data Files:

-DATa1=dangle.mfold037     energies for single base stacking
-DATa2=loop.mfold037       destabilizing energies for internal, bulge,
                             and hairpin loops
-DATa3=stack.mfold037      energies for base stacking
-DATa4=tstack.mfold037     energies for terminal mismatched pairs in
                             interior and hairpin loops
-DATa5=tloop.mfold037      bonus energies for recognized "tetraloops"
-DATa6=miscloop.mfold037   energies for multi-branched and asymmetric
                             interior loops

Optional Parameters:

-TEMperature=37.0     folding temperature (Celsius)
-CIRcular             folds a circular RNA molecule
-EXTension=mfold037   default extension for all local data files
-MAXLoopsize=30       maximum size of interior loop
-LOPsidedness=30      maximum lopsidedness of an interior loop
-FORCe=i,j,k          forces k consecutive base pairs, starting
                        with the base pair between i and j
-FORCe=i,0,k          forces k consecutive bases, beginning with i,
                        to form base pairs
-PREVent=i,j,k        prevents k consecutive base pairs, starting
                        with the base pair between i and j
-PREVent=i,0,k        prevents k consecutive bases, beginning with i,
                        from base pairing
-CLOSedexcise=i,j     excludes bases i+1 through j-1 from folding,
                        forcing a base pair between i and j
-OPENexcise=i,j       excludes bases i through j from folding,
                        ligating bases i-1 and j+1 together
-NOMONitor            suppresses screen trace of program progress
-NOSUMmary            suppresses screen summary at the end of the
                        program
-BATch                submits program to the batch queue

ACKNOWLEDGEMENTS

GCG is licensed to distribute MFold by the National Research Council of Canada. If you use MFold for published research, please cite Dr. Zuker's Science paper (reference below). We are very grateful to Dr. Zuker both for making his work available to GCG and for helping us incorporate his work into the Wisconsin Package.

MFold is an adaptation of the mfold package by Zuker and Jaeger (Zuker, M. (1989). Science 244, 48-52; Jaeger, J.A., Turner, D.H., and Zuker, M. (1989). Proc. Natl. Acad. Sci. USA 86, 7706-7710; Jaeger, J.A., Turner, D.H., and Zuker, M. (1990). In Methods in Enzymology 183, 281-306) that has been modified to work with the Wisconsin Package. Their method uses the energy rules developed by Turner and colleagues (Freier, S.M., Kierzek, R., Jaeger, J.A., Sugimoto, N., Caruthers, M.H., Neilson, T., and Turner, D.H. (1986). Proc. Natl. Acad. Sci. USA 83, 9373-9377; Turner, D.H., Sugimoto, N., Jaeger, J.A., Longfellow, C.E., Freier, S.M., and Kierzek, R. (1987). Cold Spring Harbor Symp. Quant. Biol. 52, 123-133; Turner, D.H., Sugimoto, N., and Freier, S.M. (1988). Annu. Rev. Biophys. Biophys. Chem. 17, 167-192) to determine optimal and suboptimal secondary structures for an RNA molecule.

MFold was modified to work with version 7.2 of the Wisconsin Package by Irv Edelman.

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

MFold reads the file dangle.mfold037 for the single base stacking energies; loop.mfold037 for the internal, bulge, and hairpin loop energies; stack.mfold037 for the base stacking energies; tstack.mfold037 for the energies for mismatched pairs in interior and hairpin loops; tloop.mfold037 for the bonus energies for recognized tetraloops; and miscloop.mfold037 for the energies for multi-branched and asymmetric interior loops.
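
The lookup rule for data files can be summarized as a short decision procedure. The sketch below is an illustration only: the directory arguments are hypothetical, and giving the command-line name priority over a local file of the same name is an assumption, since the text does not say which of the two takes precedence.

```python
from pathlib import Path


def resolve_data_file(default_name, public_dir, working_dir,
                      command_line_name=None):
    """Pick the data file a run would use, following the rule in the text.

    ASSUMPTION: a file named on the command line wins over a local file
    with the default name. Directory locations are hypothetical.
    """
    if command_line_name is not None:
        return Path(command_line_name)
    local = Path(working_dir) / default_name
    if local.is_file():
        return local
    return Path(public_dir) / default_name


# With no local copy and no command-line name, the public copy is used.
chosen = resolve_data_file("stack.mfold037", "/public/data",
                           "/no/such/working/dir")
```

Placing a file named stack.mfold037 in the working directory would change the result to that local copy.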

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-TEMperature=37

lets you select the folding temperature in degrees Celsius. The default folding temperature is 37°C.

-CIRcular

tells MFold to treat the RNA molecule as circular.

-EXTension=mfold037

selects a file name extension for all local data files.

-MAXLoopsize=30

sets the maximum size for an interior or bulge loop in the predicted secondary structures. An interior loop is an unpaired region interrupting a helix, with unpaired bases on both strands of the interrupted region. A bulge loop is a loop-out in a helix involving only one of the helix strands. The size of the loop is the total number of unpaired bases in the loop.

-LOPsidedness=30

sets the maximum lopsidedness for an interior or bulge loop in the predicted secondary structures. For an interior loop, this is the maximum difference between the number of single-stranded bases on one side of the loop and the number of single-stranded bases on the other side. For a bulge loop, this is the maximum number of bases in the loop.
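
The two limits can be expressed together as a simple test on the number of unpaired bases on each side of a loop. This is a sketch of the definitions given for -MAXLoopsize and -LOPsidedness, with an invented function name; it does not reproduce MFold's internal checks.

```python
def loop_allowed(left_unpaired, right_unpaired,
                 max_loop_size=30, max_lopsidedness=30):
    """Apply the loop size and lopsidedness limits to one loop.

    Size is the total number of unpaired bases. Lopsidedness is the
    difference between the two sides; for a bulge loop one side is 0,
    so lopsidedness equals the number of bases in the loop.
    """
    size = left_unpaired + right_unpaired
    lopsidedness = abs(left_unpaired - right_unpaired)
    return size <= max_loop_size and lopsidedness <= max_lopsidedness


interior_ok = loop_allowed(3, 5)                        # size 8, difference 2
bulge_ok = loop_allowed(12, 0, max_lopsidedness=10)     # bulge of 12 bases
```

With the default limits the small interior loop passes, while the 12-base bulge fails once the lopsidedness limit is lowered to 10.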

-FORCe1=i,j,k ... -FORCe9=x,y,z

forces the helix that begins with the base pair between bases i and j and extends for k base pairs to the base pair between bases i+k-1 and j-k+1.

If j is 0, then the sequence of k consecutive bases, beginning with base i, is forced to be double-stranded (although the pairing partner for each base is not specified).

You can force up to 9 regions to pair by specifying sequential numbers with the -FORCe parameter (-FORCe1=l,m,n ... -FORCe9=x,y,z).

The only allowable base pairs are A-T/U, G-C, and G-T/U. Attempts to force other base pairing produce undefined results.

-PREVent1=i,j,k ... -PREVent9=x,y,z

prevents the helix that begins with the base pair between bases i and j and extends for k base pairs to the base pair between bases i+k-1 and j-k+1.

If j is 0, then the sequence of k consecutive bases, beginning at base i, is prevented from participating in any helix, forcing them to remain single-stranded.

You can prevent up to 9 regions from pairing by specifying sequential numbers with the -PREVent parameter (-PREVent1=l,m,n ... -PREVent9=x,y,z).

-CLOSedexcise1=i,j ... -CLOSedexcise9=y,z

excludes the sequence range from base i+1 through base j-1 from folding, forcing a base pair between the bases i and j.

You can exclude up to 9 regions from folding in this manner by specifying sequential numbers with the -CLOSedexcise parameter (-CLOSedexcise1=i,j ... -CLOSedexcise9=y,z).

The only allowable base pairs are A-T/U, G-C, and G-T/U. Attempts to force other base pairing produce undefined results.

-OPENexcise1=i,j ... -OPENexcise9=y,z

excludes the sequence range from base i through base j from folding, "ligating" base i-1 to j+1 before folding the molecule.

You can exclude up to 9 regions from folding in this manner by specifying sequential numbers with the -OPENexcise parameter (-OPENexcise1=i,j ... -OPENexcise9=y,z).

-MONitor

shows the progress of MFold on your screen. Use this parameter to see this same monitor in the log file for a batch process. If the monitor is slowing down the program because your terminal is connected to a slow modem, suppress it by including -NOMONitor on the command line.

-SUMmary

writes a summary of the program's work to the screen when you've used the -Default parameter to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

-BATch

submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

Printed: November 18, 1996 13:07 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.