[ Program Manual | User's Guide | Data Files | Databases ]
FitConsensus uses a consensus table written by Consensus as a probe to find the best examples of the consensus in a DNA sequence. You can specify the number of fits you want to see, and FitConsensus tabulates them with their position, frame, and a statistical measure of their quality.
FitConsensus uses a consensus table, generated by the Consensus program, as a probe. The program checks all possible alignments of the top strand of your nucleotide sequence to the table and reports those with the best fits. You can select the number of fits you want to see. If any positions of the consensus table have values of 100%, FitConsensus lets you choose whether a match at those positions is a necessary condition for a fit. The fits are reported in ascending order of the position in the sequence where the fit was found. The technique used by FitConsensus is discussed by Staden (Nucl. Acids Res. 12(1); 505-519 (1984)). FitConsensus cannot put gaps in the alignments of the table to the sequence.
Here is a session using FitConsensus to find the best examples of intervening sequence donor splice sites in the sequence gamma.seq:
% fitconsensus

 FITCONSENSUS into what sequence file ?  gamma.seq

                  Begin (* 1 *) ?
                End (* 11375 *) ?

 Using what consensus table file ?  donor.csn

 Your consensus table has position(s) with 100% certainty,
 Are these necessary conditions for a fit (* Yes *) ?

 Show how many fits (* 40 *) ?

 What should I call the output file (* gamma.fit *) ?

 .................................................
 .................................................
 .................

%
FITCONSENSUS of: gamma.seq  Check: 6474  from: 1  to: 11375

Human fetal beta globins G and A gamma
from Shen, Slightom and Smithies, Cell 26; 191-203.
Analyzed by Smithies et al. Cell 26; 345-353.

Using Consensus: donor.csn

!!AA_SEQUENCE 1.0
CONSENSUS from: Splice site sequences from Stephen Mount NAR 10(2) 459;472
 figure 1 page 460

List-size: 40  Average quality: 38.26  October 17, 1996 14:52  ..

position:     416    607   1430   1452   1764   2229   2267   2612   3120   3132   4267
frame:          2      1      2      3      3      3      2      2      3      3      1
quality:    51.42  50.75  50.33  49.08  48.50  51.75  51.42  58.25  50.83  48.00  51.25

//////////////////////////////////////////////////////////////////////////

position:    9801  10333  10420  10433  11059  11315  11334
frame:          3      1      1      2      1      2      3
quality:    48.25  48.92  51.33  47.75  51.42  48.75  48.92
FitConsensus takes the output file from Consensus as one of its input files. The other input is a single nucleotide sequence file. If FitConsensus rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.
The Consensus program writes a consensus table from a group of prealigned sequences into a file with the correct format for input to FitConsensus. FindPatterns finds short sequence patterns allowing ambiguity and mismatch.
ProfileMake creates a position-specific scoring table, called a profile, that quantitatively represents the information from a group of aligned sequences. The profile can then be used for database searching (ProfileSearch) or sequence alignment (ProfileGap). ProfileGap makes an optimal alignment between a profile and one or more sequences.
FitConsensus searches only the top strand of a nucleotide sequence. The statistic used is under study and will probably change as more sensitive instruments are found.
A program very similar to FitConsensus is described by Staden (Nucl. Acids Res. 12(1); 505-519 (1984)). A table of the kind shown below is aligned over the input sequence in every frame. If you require that any 100% values in your table are necessary conditions for a fit, then each alignment is first checked to see whether the sequence is correct in these known positions. If the known-positions test is passed, each position of the input sequence in the alignment is given a score equal to the table value for that base at that position. T in the first position would get a score of 20 in this example (the table lists T in the %U row). An ambiguity code gets the average value for the nucleotides it represents. R (representing A or G) in the second position would get a score of 24.5 in this example, the average of 40 and 9. The values for each position of an alignment are summed and divided by the size of the consensus table (12 in this case) to give the quality of the fit. Starting at the first position in the range of interest in your sequence, the frame cycles through 1, 2, and 3 repeatedly to the end of the sequence range.
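The scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FitConsensus source: the function names, the treatment of ambiguity codes at 100% positions (a symbol passes only if its averaged score is exactly 100), the use of the alignment's first base as the reported position, and the tie-breaking when the list is trimmed are all assumptions. The table values are copied from the donor.csn listing in this document.

```python
from typing import Dict, List, Optional, Tuple

# Percentages copied from the donor.csn table in this document.
# The %U row is stored under "T" so DNA sequences can be scored directly.
DONOR: Dict[str, List[float]] = {
    "G": [20, 9, 11, 74, 100, 0, 29, 12, 84, 9, 18, 20],
    "A": [30, 40, 64, 9, 0, 0, 61, 67, 9, 16, 39, 24],
    "T": [20, 7, 13, 12, 0, 100, 7, 11, 5, 63, 22, 27],
    "C": [30, 44, 11, 6, 0, 0, 2, 9, 2, 12, 20, 28],
}

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC: Dict[str, str] = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def position_score(table: Dict[str, List[float]], symbol: str, column: int) -> float:
    """Table value for one symbol; ambiguity codes get the average of their bases."""
    bases = IUPAC[symbol.upper()]
    return sum(table[b][column] for b in bases) / len(bases)


def fit_quality(table: Dict[str, List[float]], window: str,
                necessary: bool = True) -> Optional[float]:
    """Quality of one alignment, or None if a required 100% position fails.

    `window` must be exactly as long as the table.
    """
    size = len(table["A"])
    scores = [position_score(table, s, i) for i, s in enumerate(window)]
    if necessary:
        for i in range(size):
            is_known = max(table[b][i] for b in "ACGT") == 100
            # Assumption: an ambiguous symbol does not satisfy a 100% position.
            if is_known and scores[i] != 100:
                return None
    return sum(scores) / size


def fit_consensus(table: Dict[str, List[float]], sequence: str, begin: int = 1,
                  list_size: int = 40,
                  necessary: bool = True) -> List[Tuple[int, int, float]]:
    """Return the best fits as (position, frame, quality), in position order."""
    size = len(table["A"])
    fits = []
    for start in range(begin - 1, len(sequence) - size + 1):
        quality = fit_quality(table, sequence[start:start + size], necessary)
        if quality is not None:
            position = start + 1                 # 1-based, first base of the alignment
            frame = (position - begin) % 3 + 1   # cycles 1, 2, 3 from `begin`
            fits.append((position, frame, quality))
    # Keep the highest-quality fits, then report them in ascending position.
    best = sorted(fits, key=lambda f: f[2], reverse=True)[:list_size]
    return sorted(best)


if __name__ == "__main__":
    for position, frame, quality in fit_consensus(DONOR, "TTACAGGTAAGTACTT"):
        print(position, frame, round(quality, 2))
```

The frame formula in the sketch agrees with the sample output in this document: with a range beginning at 1, position 416 falls in frame 2 and position 607 in frame 1.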
Input files are written by the Consensus program. There are preassembled donor and acceptor splice site consensus tables in the public data files called donor.csn and acceptor.csn. Here is the donor.csn file:
!!AA_SEQUENCE 1.0
CONSENSUS from: Splice site sequences from Stephen Mount NAR 10(2) 459;472
 figure 1 page 460

*****

 %G      20    9   11   74  100    0   29   12   84    9   18   20
 %A      30   40   64    9    0    0   61   67    9   16   39   24
 %U      20    7   13   12    0  100    7   11    5   63   22   27
 %C      30   44   11    6    0    0    2    9    2   12   20   28
 Total  140  140  140  140  140  140  140  140  140  140  137  137

*****

CONSENSUS sequence to a certainty level of 75 percent at each position:

Donor.Csn  Length: 12  July 20, 1994 11:03  Type: N  Check: 6055  ..

       1  VMWKGTRRGW HH
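As an illustration of how the percentage rows of such a table might be read into a program, here is a minimal Python sketch. It assumes each of the %G, %A, %U, and %C rows sits on its own line followed by whitespace-separated numbers, as in the listing above; the real file layout written by Consensus may differ, and the names `parse_consensus_rows` and `required_positions` are hypothetical helpers, not part of the Wisconsin Package.

```python
from typing import Dict, List

# Rows copied from the donor.csn listing; the layout (one row per line) is assumed.
SAMPLE = """\
 %G      20    9   11   74  100    0   29   12   84    9   18   20
 %A      30   40   64    9    0    0   61   67    9   16   39   24
 %U      20    7   13   12    0  100    7   11    5   63   22   27
 %C      30   44   11    6    0    0    2    9    2   12   20   28
 Total  140  140  140  140  140  140  140  140  140  140  137  137
"""


def parse_consensus_rows(text: str) -> Dict[str, List[float]]:
    """Collect the percentage rows into a dict keyed by base (U is stored as T)."""
    table: Dict[str, List[float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("%G", "%A", "%U", "%T", "%C"):
            base = fields[0][1:].replace("U", "T")
            table[base] = [float(value) for value in fields[1:]]
    lengths = {len(row) for row in table.values()}
    if len(table) != 4 or len(lengths) != 1:
        raise ValueError("expected four percentage rows of equal length")
    return table


def required_positions(table: Dict[str, List[float]]) -> List[int]:
    """1-based positions where some base has a value of 100%."""
    size = len(table["A"])
    return [i + 1 for i in range(size)
            if any(table[base][i] == 100 for base in table)]


if __name__ == "__main__":
    donor = parse_consensus_rows(SAMPLE)
    print(required_positions(donor))
```

For the donor table, the 100% positions are the G at position 5 and the T (listed as U) at position 6, which are the positions FitConsensus asks about when it prompts whether they are necessary conditions for a fit.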
All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
Minimal Syntax: % fitconsensus [-INfile1=]gamma.seq \
                  [-INfile2=]donor.csn -Default

Prompted Parameters:

-BEGin=1 -END=11375     range of interest
-NONECessary            100% positions do not have to match exactly
-LIStsize=40            number of fits to show
[-OUTfile=]gamma.fit    output file name

Local Data Files: None

Optional Parameters: None
Local Data Files: None.
Optional Parameters: None.
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.