FITCONSENSUS


Table of Contents

FUNCTION

DESCRIPTION

FUNCTION

FitConsensus uses a consensus table written by Consensus as a probe to find the best examples of the consensus in a DNA sequence. You can specify the number of fits you want to see, and FitConsensus tabulates them with their position, frame, and a statistical measure of their quality.

DESCRIPTION

FitConsensus uses a consensus table, generated by the Consensus program, as a probe. The program checks every possible alignment of the table against the top strand of your nucleotide sequence and reports those with the best fits. You can select the number of fits you want to see. If any position of the consensus table has a value of 100%, FitConsensus lets you choose whether a match at those positions is a necessary condition for a fit. The fits are reported in ascending order of the position in the sequence where the fit was found. The technique used by FitConsensus is discussed by Staden (Nucl. Acids Res. 12(1); 505-519 (1984)). FitConsensus cannot put gaps in the alignments of the table to the sequence.
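The selection and reporting order described above can be sketched in Python. This is only an illustration under stated assumptions, not the program's actual implementation: the function name best_fits and the (position, quality) tuples are hypothetical, and how FitConsensus breaks ties between equal qualities is not documented here. The sample values are copied from the example output in this document.

```python
import heapq

def best_fits(scored, list_size):
    """Keep the list_size highest-quality fits, then order them by position.

    scored is an iterable of (position, quality) tuples (hypothetical layout).
    """
    top = heapq.nlargest(list_size, scored, key=lambda fit: fit[1])
    return sorted(top, key=lambda fit: fit[0])

# (position, quality) pairs copied from the sample output in this document.
scored = [(416, 51.42), (607, 50.75), (1430, 50.33), (2612, 58.25), (3132, 48.00)]

top3 = best_fits(scored, 3)
print(top3)  # [(416, 51.42), (607, 50.75), (2612, 58.25)]
```

Selecting by quality first and sorting by position afterwards keeps the report in sequence order even though the cut-off is made on quality.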

EXAMPLE

Here is a session using FitConsensus to find the best examples of intervening sequence donor splice sites in the sequence gamma.seq:


% fitconsensus

  FITCONSENSUS into what sequence file ?  gamma.seq

             Begin (* 1 *) ?
           End (* 11375 *) ?

  Using what consensus table file ?  donor.csn

  Your consensus table has position(s) with 100% certainty,

  Are these necessary conditions for a fit (* Yes *) ?

  Show how many fits (* 40 *) ?

  What should I call the output file (* gamma.fit *) ?

        .................................................
        .................................................
        .................

%

OUTPUT

Here is part of the output file:


 FITCONSENSUS of: gamma.seq  Check: 6474  from: 1  to: 11375

Human fetal beta globins G and A gamma
from Shen, Slightom and Smithies,  Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.

 Using Consensus: donor.csn

!!AA_SEQUENCE 1.0
 CONSENSUS from:
Splice site sequences
from Stephen Mount NAR 10(2) 459;472 figure 1 page 460

 List-size: 40  Average quality: 38.26      October 17, 1996 14:52   ..

  position:   416   607  1430  1452  1764  2229  2267  2612  3120  3132  4267
     frame:     2     1     2     3     3     3     2     2     3     3     1
   quality: 51.42 50.75 50.33 49.08 48.50 51.75 51.42 58.25 50.83 48.00 51.25

   //////////////////////////////////////////////////////////////////////////

  position:  9801 10333 10420 10433 11059 11315 11334
     frame:     3     1     1     2     1     2     3
   quality: 48.25 48.92 51.33 47.75 51.42 48.75 48.92
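The frame row in this sample can be cross-checked against the position row. The sketch below assumes, as an inference from this sample rather than a documented rule, that the frame is ((position - Begin) mod 3) + 1 with Begin = 1; the helper name frame_of is hypothetical.

```python
# Position and frame rows copied from the sample output above.
positions = [416, 607, 1430, 1452, 1764, 2229, 2267, 2612, 3120, 3132, 4267,
             9801, 10333, 10420, 10433, 11059, 11315, 11334]
frames = [2, 1, 2, 3, 3, 3, 2, 2, 3, 3, 1,
          3, 1, 1, 2, 1, 2, 3]

def frame_of(position, begin=1):
    """Hypothetical helper: frame 1, 2, 3 cycling from the Begin position."""
    return (position - begin) % 3 + 1

# Collect any listed fit whose frame disagrees with the assumed rule.
mismatches = [p for p, f in zip(positions, frames) if frame_of(p) != f]
print(mismatches)  # []
```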

INPUT FILES

FitConsensus takes the output file from Consensus as one of its input files. The other input is a single nucleotide sequence file. If FitConsensus rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

The Consensus program writes a consensus table from a group of prealigned sequences into a file with the correct format for input to FitConsensus. FindPatterns finds short sequence patterns allowing ambiguity and mismatch.

ProfileMake creates a position-specific scoring table, called a profile, that quantitatively represents the information from a group of aligned sequences. The profile can then be used for database searching (ProfileSearch) or sequence alignment (ProfileGap). ProfileGap makes an optimal alignment between a profile and one or more sequences.

RESTRICTIONS

FitConsensus searches only the top strand of a nucleotide sequence. The statistic used is under study and will probably change as more sensitive measures are found.

STATISTICS USED

A program very similar to FitConsensus is described by Staden (Nucl. Acids Res. 12(1); 505-519 (1984)). A table of the kind shown below is aligned over the input sequence in every frame. If you require that any 100% values in your table are necessary conditions for a fit, each alignment is first checked to see whether the sequence has the required nucleotide at those positions.

For each alignment that passes this test (or for every alignment, if you did not make the 100% positions necessary), the nucleotide at each position of the input sequence is given a score equal to its value in the table. T in the first position would get a score of 20 in this example. An ambiguity code gets the average of the values for the nucleotides it represents: R (representing A or G) in the second position would get a score of 24.5 in this example. The scores for all positions of an alignment are summed and divided by the size of the consensus table (12 in this case).

Starting at the first position in the range of interest in your sequence, the frame cycles through the values 1, 2, and 3 repeatedly to the end of the sequence range.
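The scoring described above can be sketched in Python as follows. This is a minimal illustration, not the FitConsensus source. All function and variable names are hypothetical. The sketch also assumes that the %U row of the table is used for T, that a 100% position must hold exactly the required base (no ambiguity code) when it is a necessary condition, and that the reported position is the first base of the alignment. The percentages are copied from the donor.csn table shown in this document.

```python
# Percent table copied from donor.csn in this document (U row stored as T).
TABLE = {
    "G": [20, 9, 11, 74, 100, 0, 29, 12, 84, 9, 18, 20],
    "A": [30, 40, 64, 9, 0, 0, 61, 67, 9, 16, 39, 24],
    "T": [20, 7, 13, 12, 0, 100, 7, 11, 5, 63, 22, 27],
    "C": [30, 44, 11, 6, 0, 0, 2, 9, 2, 12, 20, 28],
}
WIDTH = 12

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

# Columns (0-based) where one base has 100%, mapped to that base.
REQUIRED = {col: base for base, row in TABLE.items()
            for col, value in enumerate(row) if value == 100}

def symbol_score(symbol, column):
    """Table value for a base, or the average over an ambiguity code."""
    bases = IUPAC[symbol.upper()]
    return sum(TABLE[b][column] for b in bases) / len(bases)

def fit_quality(window, necessary=True):
    """Quality of one ungapped alignment, or None if a required base is missing."""
    if necessary:
        for col, base in REQUIRED.items():
            if IUPAC.get(window[col].upper()) != base:
                return None
    return sum(symbol_score(s, c) for c, s in enumerate(window)) / WIDTH

def scan(sequence, begin=1, necessary=True):
    """Yield (position, frame, quality) for every alignment that qualifies."""
    for offset in range(len(sequence) - WIDTH + 1):
        quality = fit_quality(sequence[offset:offset + WIDTH], necessary)
        if quality is not None:
            yield begin + offset, offset % 3 + 1, quality

print(symbol_score("T", 0))  # 20.0
print(symbol_score("R", 1))  # 24.5
```

For example, list(scan("TTCAAGGTAAGTAT")) yields a single fit at position 3 in frame 3, because only that alignment has G and T at the two 100% positions.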

INPUT FILES

Input files are written by the Consensus program. There are preassembled donor and acceptor splice site consensus tables in the public data files called donor.csn and acceptor.csn. Here is the donor.csn file:


!!AA_SEQUENCE 1.0
 CONSENSUS from:

Splice site sequences
from Stephen Mount NAR 10(2) 459;472 figure 1 page 460

                                            *****

 %G      20    9   11   74  100    0   29   12   84    9   18   20
 %A      30   40   64    9    0    0   61   67    9   16   39   24
 %U      20    7   13   12    0  100    7   11    5   63   22   27
 %C      30   44   11    6    0    0    2    9    2   12   20   28

 Total  140  140  140  140  140  140  140  140  140  140  137  137

                                             *****

  CONSENSUS sequence to a certainty level of  75 percent at each position:

  Donor.Csn  Length: 12  July 20, 1994 11:03  Type: N  Check: 6055  ..

       1  VMWKGTRRGW HH
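The percentage rows of a table like this one can be read with a few lines of Python. The sketch below is an assumption about the layout based only on the listing above (rows beginning with %G, %A, %U, or %C followed by whitespace-separated integers); it is not the parser FitConsensus uses, and parse_percent_rows is a hypothetical name.

```python
def parse_percent_rows(text):
    """Return {base: [percentages]} from lines that start with %G, %A, %U or %C."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("%G", "%A", "%U", "%C"):
            table[fields[0][1]] = [int(value) for value in fields[1:]]
    return table

# The four percentage rows copied from donor.csn above.
SAMPLE = """\
 %G      20    9   11   74  100    0   29   12   84    9   18   20
 %A      30   40   64    9    0    0   61   67    9   16   39   24
 %U      20    7   13   12    0  100    7   11    5   63   22   27
 %C      30   44   11    6    0    0    2    9    2   12   20   28
"""

table = parse_percent_rows(SAMPLE)

# 1-based positions where a single base has 100%, mapped to that base.
required = {col + 1: base for base, row in table.items()
            for col, value in enumerate(row) if value == 100}
print(required)  # {5: 'G', 6: 'U'}
```

The two positions found this way are the ones the program asks about when it reports that the table has position(s) with 100% certainty.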

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % fitconsensus [-INfile1=]gamma.seq \
                  [-INfile2=]donor.csn -Default

Prompted Parameters:

-BEGin=1 -END=11375   range of interest
-NONECessary          100% positions do not have to match exactly
-LIStsize=40          number of fits to show
[-OUTfile=]gamma.fit  output file name

Local Data Files:     None

Optional Parameters:  None

LOCAL DATA FILES

None.

OPTIONAL PARAMETERS

None.

Printed: November 18, 1996 13:06 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.