BACKTRANSLATE

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
CONSIDERATIONS
FUTURE DEVELOPMENT OF BACKTRANSLATE
COMMAND-LINE SUMMARY
LOCAL DATA FILES
OPTIONAL PARAMETERS

FUNCTION

BackTranslate back-translates an amino acid sequence into a nucleotide sequence. The output helps you recognize minimally ambiguous regions that might be good for constructing synthetic probes.

DESCRIPTION

BackTranslate uses a peptide sequence and a codon usage file to make a table of possible back-translations. The program can write all of the codons for each amino acid in the peptide sequence, showing the frequency of each within its synonymous group.

BackTranslate also calculates either the most probable back-translation or the fully ambiguous depiction of the nucleotide sequence. In either case, it writes the implied sequence such that the output file can be used by other Wisconsin Package programs that accept nucleotide sequence input.
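As a rough illustration of the most probable back-translation, the sketch below picks the highest-frequency codon for each amino acid. The frequencies are copied from the sample output in the OUTPUT section of this document. The names CODON_FREQ and most_probable are hypothetical, and writing T in place of U is an assumption based on the sequence line in that sample output. This is not BackTranslate's own code.

```python
# Codon frequencies within each synonymous group, copied from the sample
# output in this manual (ecohigh.cod). Only the residues that appear in
# that example are included.
CODON_FREQ = {
    "Ser": {"UCC": 0.37, "UCU": 0.34, "AGC": 0.20,
            "UCG": 0.04, "AGU": 0.03, "UCA": 0.02},
    "Phe": {"UUC": 0.76, "UUU": 0.24},
    "Gln": {"CAG": 0.86, "CAA": 0.14},
    "Pro": {"CCG": 0.77, "CCA": 0.15, "CCU": 0.08, "CCC": 0.00},
    "Trp": {"UGG": 1.00},
}


def most_probable(residues):
    """Join the most frequent codon for each residue into one sequence."""
    codons = []
    for aa in residues:
        group = CODON_FREQ[aa]
        codons.append(max(group, key=group.get))
    # Assumption: the sequence is written with T rather than U.
    return "".join(codons).replace("U", "T")


peptide = ["Ser", "Phe", "Ser", "Gln", "Pro", "Trp"]
print(most_probable(peptide))  # TCCTTCTCCCAGCCGTGG
```

Where two codons have nearly equal frequencies (UCC and UCU for serine here), the single most probable sequence hides a close second choice, which is why the full table is worth reading.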

If you choose one of the options that includes the table of back-translations, the codons are written in order of their preference in the codon frequency table. Below each block of synonymous codons is a number between 0 and 1,000: the product of the probabilities of the most likely codon for each of the next four amino acids, starting with the amino acid above the number, multiplied by 1,000. The higher the number, the more likely it is that the next 12 nucleotides contain the most-preferred codons. Where fewer than four amino acids remain in the range, the sample output in the OUTPUT section shows 0. All of the codons and their preferences are included to help you look critically at the alternative oligonucleotides that you might want to synthesize. For instance, if the codon preference for one amino acid is not strong, you could consider making a mixture that contains all of the different possibilities.
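The number below each block can be reproduced by hand. The sketch below does so for the six residues in the sample output of the OUTPUT section, using the top codon frequency for each residue from that output. The names TOP_CODON_FREQ and window_scores are hypothetical, and truncating rather than rounding the result is an assumption (both give the same values for this example).

```python
from math import prod

# Frequency of the most-preferred codon for each residue, copied from the
# ecohigh.cod values shown in this manual's sample output.
TOP_CODON_FREQ = {"Ser": 0.37, "Phe": 0.76, "Gln": 0.86,
                  "Pro": 0.77, "Trp": 1.00}


def window_scores(residues, window=4):
    """Return one score per residue.

    Each score is the product of the top-codon frequencies over `window`
    residues starting at that residue, multiplied by 1,000. Positions
    without a full window get 0, as in the sample output.
    """
    scores = []
    for i in range(len(residues)):
        block = residues[i:i + window]
        if len(block) < window:
            scores.append(0)
            continue
        probability = prod(TOP_CODON_FREQ[aa] for aa in block)
        scores.append(int(probability * 1000))  # truncation is an assumption
    return scores


peptide = ["Ser", "Phe", "Ser", "Gln", "Pro", "Trp"]
print(window_scores(peptide))  # [89, 186, 245, 0, 0, 0]
```

The `window` argument plays the role of the -WINdow parameter described under OPTIONAL PARAMETERS.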

EXAMPLE

To make a back-translation of the ilvI protein showing all possible back-translations from amino acids one to six, using codon frequencies from the file ecohigh.cod, you would do the following:


% backtranslate

  BACKTRANSLATE what sequence ?  ilvhiaa.pep

                 Begin (* 1 *) ?
                 End (* 956 *) ?  6

  Would you like to see:

      a) table of back-translations and most probable sequence
      b) table of back-translations and most ambiguous sequence
      c) most probable sequence only
      d) most ambiguous sequence only

  Please choose one (* b *):

  Use what codon frequency file (* GenRunData:ecohigh.cod *) ?

  What should I call the output file (* ilvhiaa.seq *) ?

%

OUTPUT

Here is part of the output file:


!!NA_SEQUENCE 1.0
 BACKTRANSLATE of: : ilvhiaa.pep  check: 2165  from: 1  to: 6

E Coli. ilvI - ilvH (peptide)

 Using codon frequencies from:
 /package/share/9.0/gcgcore/data/rundata/ecohigh.cod
 CheckFile: 9032

Codon usage for enteric bacterial (highly expressed) genes 7/19/83

    Ser        Phe        Ser        Gln        Pro        Trp

  UCC 0.37   UUC 0.76   UCC 0.37   CAG 0.86   CCG 0.77   UGG 1.00
  UCU 0.34   UUU 0.24   UCU 0.34   CAA 0.14   CCA 0.15
  AGC 0.20              AGC 0.20              CCU 0.08
  UCG 0.04              UCG 0.04              CCC 0.00
  AGU 0.03              AGU 0.03
  UCA 0.02              UCA 0.02
  89         186        245        0          0          0

ilvhiaa.seq  Length: 18  October 17, 1996 17:08  Type: N  Check: 2929  ..

       1  WSNTTYWSNC ARCCNTGG

INPUT FILES

BackTranslate accepts a single protein sequence and a single codon frequency table as input. Look at the CodonFrequency program for information about how to create or modify a codon frequency file. If BackTranslate rejects your protein sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

Prime selects oligonucleotide primers for a template DNA sequence. The primers may be useful for the polymerase chain reaction (PCR) or for DNA sequencing. You can allow Prime to choose primers from the whole template or limit the choices to a particular set of primers listed in a file.

CodonFrequency tabulates codon usage from sequences and existing codon frequency tables. Composition counts trinucleotides from any set of sequences. The mapping programs can be run with the command-line parameter -ALL to identify all potential restriction sites in back-translated sequences. If you run the mapping programs with the command-line parameter -SILent, they identify potential restriction sites where introducing the site by site-specific mutagenesis does not change the translation.

RESTRICTIONS

No checking is done to see that your codon frequency table and your translation table agree. The most ambiguous back-translated sequence comes from the translation table. The most probable back-translated sequence comes from the codon frequency table. The table of codon choices also comes from the codon frequency table.

CONSIDERATIONS

You should realize that the most ambiguous back-translation uses three IUB codes (see Appendix III) to represent each codon. These codes cannot correctly represent sets of codons where more than one of the bases is incompletely permuted. This is the case for the stop codons and for the residues with six synonymous codons. For instance, serine should back-translate into the codons TCT, TCC, TCA, TCG, AGT, or AGC. These can be represented precisely as either TCN or AGY. The codon shown by BackTranslate for serine is WSN, which has 16 permutations, six of which are correct and ten of which are not!
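To see how much a three-code representation over-covers serine, you can enumerate the codons it stands for. The sketch below expands WSN (the form written in the sample output of the OUTPUT section) and compares it with the six real serine codons. The IUB table is a subset limited to the codes this example needs, and the function name expand is hypothetical.

```python
from itertools import product

# IUB ambiguity codes needed for this example (a subset of the full set).
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG", "Y": "CT", "R": "AG", "N": "ACGT",
}

SERINE_CODONS = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}


def expand(ambiguous_codon):
    """List every concrete codon that an ambiguous codon can stand for."""
    choices = (IUB[code] for code in ambiguous_codon)
    return ["".join(bases) for bases in product(*choices)]


wsn = expand("WSN")
not_serine = sorted(set(wsn) - SERINE_CODONS)
print(len(wsn), len(not_serine))  # 16 10

# TCN and AGY together cover serine exactly.
print(set(expand("TCN") + expand("AGY")) == SERINE_CODONS)  # True
```

A probe mixture synthesized from WSN would therefore contain sequences that do not encode serine; splitting the design into TCN and AGY pools avoids this.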

FUTURE DEVELOPMENT OF BACKTRANSLATE

We continue to be interested in supporting back-translation to create probes. If there are features of BackTranslate that you need for probe design, please tell us.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % backtranslate [-INfile1=]ilvhiaa.pep -Default

Prompted Parameters:

-BEGin=1
-END=6
-MENu=A       menu for what kind of output you want, where:

      A is for table of all back-translations and most probable sequence
      B is for table of all back-translations and most ambiguous sequence
      C is for most probable sequence only
      D is for most ambiguous sequence only

[-INfile2=]ecohigh.cod    codon frequency table
[-OUTfile=]ilvhiaa.seq    output file name

Local Data Files:

-TRANSlate=translate.txt    defines most ambiguous representation for
                              each codon family

Optional Parameters:

-WINdow=4    shows probability of the preferred codons for next "4"
               amino acids occurring together by chance
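
The capitalization rule described at the start of this section can be sketched in a few lines. The helper names min_abbreviation and matches are hypothetical. The matching rule (any case-insensitive prefix of the full name that includes all of the capitalized letters) is an assumption, not something this document states. Parameter values after "=" are not handled.

```python
def min_abbreviation(documented):
    """Return the letters that must be typed, e.g. '-CHEck' -> '-CHE'."""
    required = ""
    for ch in documented.lstrip("-"):
        if not ch.isupper():
            break
        required += ch
    return "-" + required


def matches(typed, documented):
    """True if `typed` selects the parameter written as `documented`.

    Assumption: a typed name is accepted when it contains at least the
    capitalized letters and is a prefix of the full name, ignoring case.
    """
    full = documented.lstrip("-").lower()
    need = min_abbreviation(documented).lstrip("-").lower()
    got = typed.lstrip("-").lower()
    return got.startswith(need) and full.startswith(got)


for name in ["-CHEck", "-BEGin", "-END", "-MENu", "-WINdow", "-TRANSlate"]:
    print(name, "->", min_abbreviation(name))

print(matches("-che", "-CHEck"))  # True
print(matches("-ch", "-CHEck"))   # False
```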

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

BackTranslate uses translate.txt to make the most ambiguous back-translation and to find the relation between the three-letter codes for the amino acids used in your codon frequency table and the one-letter codes used in your peptide sequence. If the standard translation table does not apply to your sequence, you can provide the file translate.txt in your current working directory or name it on the command line with an expression like -TRANSlate=mycode.txt. Translation tables are discussed in more detail in Appendix VII.
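
The lookup order for a data file such as translate.txt can be pictured with the short sketch below. The function resolve_data_file and its arguments are hypothetical, the location of the public data directory is site-specific, and giving a file named on the command line priority over a local copy is an assumption, since this document does not say which wins when both are present.

```python
import tempfile
from pathlib import Path


def resolve_data_file(name, work_dir, public_dir, override=None):
    """Pick the data file to read.

    override   -- a file named on the command line (e.g. -TRANSlate=mycode.txt)
    work_dir   -- the current working directory
    public_dir -- the public data directory (location is site-specific)
    """
    if override is not None:
        # Assumption: a file named on the command line takes priority.
        return Path(override)
    local = Path(work_dir) / name
    if local.is_file():
        # A file with exactly the same name in the working directory.
        return local
    return Path(public_dir) / name


with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as public:
    (Path(public) / "translate.txt").write_text("public copy")
    chosen = resolve_data_file("translate.txt", work, public)
    print(chosen.parent == Path(public))  # True

    (Path(work) / "translate.txt").write_text("local copy")
    chosen = resolve_data_file("translate.txt", work, public)
    print(chosen.parent == Path(work))  # True
```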

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-WINdow=4

BackTranslate normally displays the probability of the preferred codons for the next four amino acids occurring together by chance, based on your codon frequency table. You can set the number of codons used in this display to some number other than four with this parameter.

-TRANSlate=filename.txt

Usually, translation is based on the translation table in a default or local data file called translate.txt. This parameter allows you to use a translation table in a different file. (See Appendix VII for information about translation tables.)

Printed: November 18, 1996 13:08 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com