TESTCODE(+)

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
FICKETT'S TESTCODE STATISTIC
CONSIDERATIONS
GRAPHICS
<CTRL>C
COMMAND-LINE SUMMARY
LOCAL DATA FILES
OPTIONAL PARAMETERS

FUNCTION

TestCode helps you identify protein coding sequences by plotting a measure of the non-randomness of the composition at every third base. The statistic does not require a codon frequency table.

DESCRIPTION

TestCode helps identify genes when you do not have specific knowledge of codon preferences for the DNA being examined. TestCode plots a measure of the period three constraint of each region of a DNA molecule using a statistic developed by Dr. James Fickett at Los Alamos (Nucl. Acids Res. 10(17); 5303-5318 (1982)).

The statistic is independent of the reading frame. It is based on measurements of the period three compositional constraints in the regions of the sequence database that were thought to be coding or non-coding when the statistic was derived. The plot is divided into three regions for which the statistic makes predictions. For windows of 200 nucleotides or more, the top region is intended to predict coding regions at a 95 percent level of confidence. The bottom region is intended to predict non-coding regions at the same confidence level. The middle region is the method's window of vulnerability, where the statistic can make no significant prediction.

In the plot, there are markings above the curve that identify the potential start codons (ATG) and stop codons for each reading frame of the sequence. Starts are indicated by short vertical lines and stops by small diamonds.
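The marking scheme described above can be approximated outside the program. The sketch below is not TestCode's own code: it assumes the three forward reading frames, the standard stop codons TAA, TAG and TGA, and 1-based positions, none of which this entry states explicitly. The function name codon_marks is hypothetical.

```python
# Illustrative sketch only; not the TestCode implementation.
# Assumption: stops are the standard-code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_marks(seq):
    """Return 1-based positions of ATG starts and stop codons per forward frame."""
    seq = seq.upper()
    marks = {frame: {"starts": [], "stops": []} for frame in (1, 2, 3)}
    for i in range(len(seq) - 2):
        codon = seq[i:i + 3]
        frame = i % 3 + 1  # frame 1 begins at the first base of the sequence
        if codon == "ATG":
            marks[frame]["starts"].append(i + 1)
        elif codon in STOP_CODONS:
            marks[frame]["stops"].append(i + 1)
    return marks
```

For the short sequence ATGAAATAA, this reports a start at position 1 and a stop at position 7 in frame 1, and a stop (TGA) at position 2 in frame 2.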

EXAMPLE

Here is a session using TestCode to plot the TestCode statistic for the E. coli outer membrane proteins in the sequence Bacterial:EcoOmpa:


% testcode

  Plot TESTCODE for what sequence ?  Bacterial:EcoOmpa

               Begin (* 1 *) ?
             End (*  2270 *) ?
            Reverse (* No *) ?

  What window size in bp (* 200 *) ?

  The minimum density for a one page plot is: 2270.0 bases/page
  A typical density is about 3000.0 bases/page

  What density would you like (* 2270.0 *) ?

  When your LaserWriter attached to tty07 is ready, press <Return>.

%

OUTPUT

The plot from this session is shown in the figure at the end of this program entry.

INPUT FILES

TestCode accepts a single nucleotide sequence as input. If TestCode rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

The method of Gribskov et al. (Nucl. Acids Res. 12(1); 539-549 (1984)) is available in the CodonPreference program if you have a table of codon choices. The codon-preference plot in CodonPreference differentiates the reading frames.

RESTRICTIONS

Unknown.

FICKETT'S TESTCODE STATISTIC

Fickett's TestCode statistic was described by James Fickett in Nucleic Acids Research 10(17); 5303-5318 (1982). We believe that TestCode is a formal implementation of Fickett's method.

The statistic is high when measures of compositional bias with a periodicity of three are high. The key measures of bias are simply the four measures, one for each nucleotide:

Maximum(n(1), n(2), n(3)) / Minimum(n(1), n(2), n(3))

where, for a given nucleotide n, n(1), n(2) and n(3) are its composition at positions (1,4,7,...), (2,5,8,...) and (3,6,9,...). The composition is simply the number of observations of n at those positions in the window.
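As a rough illustration of the ratio above, the following Python sketch computes one Maximum/Minimum value for each of A, C, G and T in a single window. It covers only this bias measure, not the full TestCode statistic, whose remaining steps are not given in this entry. The function name position_asymmetry is hypothetical, and returning infinity when the minimum count is zero is an arbitrary choice made here, since this entry does not say how that case is handled.

```python
# Illustrative sketch only; computes the bias ratio, not the full TestCode statistic.
from collections import Counter


def position_asymmetry(window):
    """Return Maximum(n(1), n(2), n(3)) / Minimum(n(1), n(2), n(3)) for each nucleotide."""
    seq = window.upper()
    # counts[k] tallies the bases found at positions k+1, k+4, k+7, ...
    counts = [Counter(seq[phase::3]) for phase in range(3)]
    ratios = {}
    for base in "ACGT":
        n = [counts[phase][base] for phase in range(3)]
        lowest, highest = min(n), max(n)
        # Assumption: a zero minimum (including an absent base) yields infinity.
        ratios[base] = highest / lowest if lowest else float("inf")
    return ratios
```

For the six-base window AAAACA, the counts of A at the three positions are 2, 1 and 2, so the ratio for A is 2.0.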

The path to the final TestCode statistic is quite tortuous, but for good reason. Fickett measured the biases for the coding and non-coding sequences that were then in the database and derived an empirical statistic that would separate coding sequences from non-coding sequences. He did not take a sliding-window approach to that measurement but instead used whole coding sequences. Unfortunately, the exons of many eukaryotic coding sequences are considerably shorter than the resolution of the method. The TestCode statistic does not claim to make a significant prediction for windows of less than 200 bases.

Fickett also found that compositional constraint is characteristic of coding sequences, and his TestCode statistic takes composition into account. However, we have received two personal communications suggesting that the TestCode statistic is actually more sensitive when composition is ignored. We have done no experiments to confirm this.

CONSIDERATIONS

The method was designed to detect coding regions that are more than 200 bases long. Therefore, the method misses many eukaryotic exons, which are often considerably shorter than this. The statistic is very sensitive when coding regions have strong codon preferences.

Frameshift errors in the data reduce the TestCode statistic as the window passes over them.

Plotting at a density of more than 5,000 bases per page may make a pattern difficult to read.

GRAPHICS

The Wisconsin Package must be configured for graphics before you run any program with graphics output! If the % setplot command is available in your installation, this is the easiest way to establish your graphics configuration, but you can also use commands like % postscript that correspond to the graphics languages the Wisconsin Package supports. See Chapter 5, Using Graphics in the User's Guide for more information about configuring your process for graphics.

<CTRL>C

If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C. The graphics device should stop plotting the current page and start plotting the next page. If the current page is the last page, plotters should put the pen away and graphic terminals should return to interactive mode.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % testcode [-INfile=]Bacterial:EcoOmpa -Default

Prompted Parameters:

-BEGin=1            first base in plot
-END=2270           last base in plot
-REVerse            use the reverse strand
-WINdow=200         sets the window size
-DENsity=2270       sets the density in bp per 100 platen units

Local Data Files:

-MARk=ecoompa.mrk   marks the plot with regions of known interest

Optional Parameters:

-INCrement=3        lets you set the window slide increment
-POInts             makes points instead of a curve

All GCG graphics programs accept these and other switches. See Chapter 5,
Using Graphics in the User's Guide for descriptions.

-FIGure[=FileName]  stores plot in a file for later input to FIGURE
-FONT=3             draws all text on the plot using font 3
-COLor=1            draws entire plot with pen in stall 1
-SCAle=1.2          enlarges the plot by 20 percent (zoom in)
-XPAN=10.0          moves plot to the right 10 platen units (pan right)
-YPAN=10.0          moves plot up 10 platen units (pan up)
-PORtrait           rotates plot 90 degrees
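The capitalization convention used in the summary above can be expressed as a small matching rule. The sketch below is an illustration, not the Wisconsin Package's parser: it assumes that matching ignores case and that you may type more of the name than the capitalized letters, as long as what you type is a prefix of the full name. The function names required_prefix and matches are hypothetical.

```python
# Illustrative sketch only; not the Wisconsin Package's command-line parser.
def required_prefix(name):
    """Return the leading capitalized letters of a name such as '-INCrement'."""
    prefix = ""
    for ch in name.lstrip("-"):
        if not ch.isupper():
            break
        prefix += ch
    return prefix


def matches(typed, name):
    """True if the typed parameter (with optional =value) selects the named one."""
    typed_body = typed.lstrip("-").split("=", 1)[0].upper()
    full_name = name.lstrip("-").upper()
    minimum = required_prefix(name)
    # Assumption: the typed text must include the required letters and must
    # not contradict the rest of the parameter name.
    return typed_body.startswith(minimum) and full_name.startswith(typed_body)
```

Under these assumptions, matches("-inc=6", "-INCrement") is true, while matches("-in", "-INCrement") is false because the required letters INC are incomplete.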

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.
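The lookup rule in the paragraph above can be pictured as a short function. This is a sketch under assumptions, not the package's own logic: the paragraph does not say which case wins when both a local file and a command-line file are present, so the sketch assumes the command line takes priority. The function name resolve_data_file and all of its arguments are hypothetical.

```python
# Illustrative sketch only; not the Wisconsin Package's file lookup code.
from pathlib import Path


def resolve_data_file(name, public_dir, cwd=".", command_line_file=None):
    """Pick the data file to read, following the rule described in the text."""
    # Assumption: a file named on the command line overrides everything else.
    if command_line_file is not None:
        return Path(command_line_file)
    # A file with exactly the same name in the working directory comes next.
    local = Path(cwd) / name
    if local.is_file():
        return local
    # Otherwise fall back to the public data directory.
    return Path(public_dir) / name
```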

If you are studying a sequence with known features, this program can mark the plot with small boxes showing the positions of these features. The presence of a file in your directory with the same name as your sequence and the filename extension .mrk causes the program to mark each range specified in the file. You can provide a marking file on the command line with an expression like -MARk=gamma.mrk. The file gamma.mrk contains information about the format of marking files. The figure for the example session shows marked regions.

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-POInts

causes TestCode to plot unconnected points instead of a continuous line.

-INCrement=3

allows you to set the distance that the window is moved after each TestCode measurement. The default is three.

-MARk=ecoompa.mrk

If you are studying a sequence with known features, this program can mark the plot with small boxes showing the positions of these features. The presence of a file in your directory with the same name as your sequence and the file name extension .mrk causes the program to mark each range specified in the file. The file gamma.mrk contains information about the format of marking files.
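To make the interaction between the window size and -INCrement concrete, here is a minimal sketch of where successive windows would begin. It assumes that only complete windows are measured and that positions are 1-based; this entry does not say how TestCode treats the ends of the sequence. The defaults of 200 and 3 are the ones given in this entry, and the function name window_starts is hypothetical.

```python
# Illustrative sketch only; edge handling in TestCode itself is not documented here.
def window_starts(begin, end, window=200, increment=3):
    """Return the 1-based start of every complete window between begin and end."""
    if window <= 0 or increment <= 0:
        raise ValueError("window and increment must be positive")
    last_start = end - window + 1  # last position where a full window still fits
    return list(range(begin, last_start + 1, increment))
```

Under these assumptions, the example sequence (bases 1 to 2270) with the default settings gives 691 windows, the last one starting at base 2071.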

The parameters below apply to all GCG graphics programs. These and many others are described in detail in Chapter 5, Using Graphics of the User's Guide.

-FIGure=programname.figure

writes the plot as a text file of plotting instructions suitable for input to the Figure program instead of drawing the plot on your plotter.

-FONT=3

draws all text characters on the plot using Font 3 (see Appendix I).

-COLor=1

draws the entire plot with the pen in stall 1.

The parameters below let you expand or reduce the plot (zoom), move it in either direction (pan), or rotate it 90 degrees (rotate).

-SCAle=1.2

expands the plot by 20 percent by resetting the scaling factor (normally 1.0) to 1.2 (zoom in). You can expand the axes independently with -XSCAle and -YSCAle. Numbers less than 1.0 contract the plot (zoom out).

-XPAN=30.0

moves the plot to the right by 30 platen units (pan right).

-YPAN=30.0

moves the plot up by 30 platen units (pan up).

-PORtrait

rotates the plot 90 degrees. Usually, plots are displayed with the horizontal axis longer than the vertical (landscape). Note that plots are reduced or enlarged, depending on the platen size, to fill the page.

-DENsity=1000

sets the number of bases or amino acids per 100 platen units (PU). This is usually equivalent to the number of bases or amino acids per page. Output from different GCG graphics programs that are run at the same density can be compared by lining up the plots on a light box.
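As a quick check on the density arithmetic, the sketch below estimates how many pages a plot would occupy. It assumes that one page corresponds to 100 platen units, which the description above says is usually but not always the case. The function name pages_needed is hypothetical.

```python
# Illustrative sketch only; assumes one page equals 100 platen units.
import math


def pages_needed(n_bases, density):
    """Estimate the number of pages for n_bases plotted at the given density."""
    if density <= 0:
        raise ValueError("density must be positive")
    return math.ceil(n_bases / density)
```

For the 2,270-base example sequence, a density of 2270 gives one page, while -DENsity=1000 would spread the plot over three pages.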

Printed: November 18, 1996 13:06 (1162)



Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com