PEPTIDESORT

[ Program Manual | User's Guide | Data Files | Databases ]

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
CONSIDERATIONS
RESTRICTIONS
SELECTING ENZYMES
COMMAND-LINE SUMMARY
ACKNOWLEDGEMENT
LOCAL DATA FILES
OPTIONAL PARAMETERS

FUNCTION

PeptideSort shows the peptide fragments from a digest of an amino acid sequence. It sorts the peptides by weight, position, and HPLC retention at pH 2.1, and shows the composition of each peptide. It also prints a summary of the composition of the whole protein.

DESCRIPTION

PeptideSort cuts a peptide sequence with any or all of the proteolytic enzymes and reagents listed in the public or local data file proenzall.dat. The peptides from each digest are sorted by position, by weight, and by retention time in a high-pressure liquid chromatograph at pH 2.1. For each peptide in each sorting, the following data are displayed: beginning and ending positions, molecular weight, HPLC retention at pH 2.1, HPLC retention at pH 7.4, charge, number of aromatic residues, number of acidic residues, number of basic residues, number of residues containing sulfur, number of hydrophilic residues, and number of hydrophobic residues. The amino acid composition, isoelectric point, and molar extinction coefficient at 280 nm of each peptide are shown with the table of peptides sorted by position. The composition can be displayed in the order of expected elution from an amino acid analyzer.
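
As a rough illustration of the digest step described above, the following Python sketch cuts a sequence after every K or R and reports the position and composition of each fragment. The cleavage rule is a simplification assumed for this example (the rules PeptideSort actually applies come from proenzall.dat), and the sequence and the function names digest and composition are hypothetical.

```python
from collections import Counter

def digest(seq, cut_after="KR", begin=1):
    """Split seq after each residue in cut_after.

    Returns (start, end, fragment) tuples with 1-based, inclusive positions.
    Assumption: a plain "cut after K or R" rule standing in for trypsin.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        # Do not cut after the last residue; the tail is appended below.
        if residue in cut_after and i < len(seq) - 1:
            peptides.append((begin + start, begin + i, seq[start:i + 1]))
            start = i + 1
    peptides.append((begin + start, begin + len(seq) - 1, seq[start:]))
    return peptides

def composition(fragment):
    """Format residue counts alphabetically, for example 'A2,K1,L1'."""
    counts = Counter(fragment)
    return ",".join(f"{aa}{counts[aa]}" for aa in sorted(counts))

# Hypothetical sequence, not taken from gzeinaa.pep.
for start, end, frag in digest("AALKGSPRMLLQ"):
    print(f"{start:4d} - {end:4d}  {composition(frag)}")
```

The begin argument plays the role of the Begin prompt, so that positions are reported relative to the full sequence rather than to the selected range.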

EXAMPLE

Here is a session using PeptideSort to sort the tryptic peptides from the corn storage protein sequence in the file gzeinaa.pep:


% peptidesort

  PEPTIDESORT of what sequence ?  gzeinaa.pep

              Begin (* 1 *) ?  18
            End (*   283 *) ?  243

 Select the enzymes:  Type nothing or "*" to get all enzymes. Type "?"
 for help on which enzymes are available and how to select them.


                             Enzyme (* * *):  trypsin

  Trypsin

  "TRYPSIN" selected 1 enzyme, new total: 1.  Enzyme:

  What should I call the output file (* gzeinaa.pepsort *) ?

%

OUTPUT

Here is the output file:


 PEPTIDESORT of: gzeinaa.pep  check: 2106  from: 18  to: 243

Corn Storage Protein Am. Ac. (19,000, genomic)
extracted from GZEIN.SEQ, checksum 2842, row a

 With Enzymes: TRYPSIN

                         October 2, 1996 10:42  ..

               Digest with: Trypsin.  Peptides Sorted by Position

 Pos  From     To   Mol Wt  Ret2.1  Ret7.4    Chg  Aro Acid Base Sulf Phil Phob
   1    18 -   65   4991.8   173.4   167.6    0.0    3    1    1    4   21   27
   A7,C2,E1,F1,G1,I3,L8,M2,N1,P7,Q2,R1,S8,T1,V1,Y2 Iso=6.11 Ext=2800
   2    66 -  103   4117.9   160.0   146.2    1.0    1    0    1    0   16   22
   A5,F1,G1,H1,I4,L10,N1,P3,Q7,R1,S3,V1 Iso=10.53 Ext=0
   3   104 -  156   5919.7   153.6   115.4    1.0    6    0    1    0   24   29
   A11,F3,L11,N3,P3,Q14,R1,S3,V1,Y3 Iso=9.50 Ext=3840
   4   157 -  243   9608.0   364.4   291.6   -1.0   11    1    0    0   38   49
   A12,D1,F8,G3,H2,I2,L18,N4,P8,Q16,S3,T4,V3,Y3 Iso=6.50 Ext=3840

               Digest with: Trypsin.  Peptides Sorted by Weight

 Pos  From     To   Mol Wt  Ret2.1  Ret7.4    Chg  Aro Acid Base Sulf Phil Phob
   2    66 -  103   4117.9   160.0   146.2    1.0    1    0    1    0   16   22
   1    18 -   65   4991.8   173.4   167.6    0.0    3    1    1    4   21   27
   3   104 -  156   5919.7   153.6   115.4    1.0    6    0    1    0   24   29
   4   157 -  243   9608.0   364.4   291.6   -1.0   11    1    0    0   38   49

               Digest with: Trypsin.  Peptides Sorted by Retention

 Pos  From     To   Mol Wt  Ret2.1  Ret7.4    Chg  Aro Acid Base Sulf Phil Phob
   3   104 -  156   5919.7   153.6   115.4    1.0    6    0    1    0   24   29
   2    66 -  103   4117.9   160.0   146.2    1.0    1    0    1    0   16   22
   1    18 -   65   4991.8   173.4   167.6    0.0    3    1    1    4   21   27
   4   157 -  243   9608.0   364.4   291.6   -1.0   11    1    0    0   38   49

     Summary for whole sequence:

Molecular weight =   24583.35     Residues =    226
Average Residue Weight = 108.776     Charged =    1
Isoelectric point =  8.12
Extinction coefficient =  10360

Residue           Number      Mole Percent     ..

A = Ala            35           15.487
B = Asx             0            0.000
C = Cys             2            0.885
D = Asp             1            0.442
E = Glu             1            0.442
F = Phe            13            5.752
G = Gly             5            2.212
H = His             3            1.327
I = Ile             9            3.982
K = Lys             0            0.000
L = Leu            47           20.796
M = Met             2            0.885
N = Asn             9            3.982
P = Pro            21            9.292
Q = Gln            39           17.257
R = Arg             3            1.327
S = Ser            17            7.522
T = Thr             5            2.212
V = Val             6            2.655
W = Trp             0            0.000
Y = Tyr             8            3.540
Z = Glx             0            0.000

A + G              40           17.699
S + T              22            9.735
D + E               2            0.885
D + E + N +  Q     50           22.124
H + K + R           6            2.655
D + E + H + K + R   8            3.540
I + L + M + V      64           28.319
F + W + Y          21            9.292

 Enzymes that do cut:

  Trypsin

 Enzymes that do not cut:

   NONE
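
The three orderings in the listing above can be reproduced from the tabulated values alone. The following Python sketch copies the four peptide rows from the listing and re-sorts them; it also rechecks two figures from the whole-sequence summary. The variable names are chosen for this example only.

```python
from operator import itemgetter

# (peptide number, from, to, mol wt, Ret2.1, Ret7.4), copied from the listing above.
peptides = [
    (1, 18, 65, 4991.8, 173.4, 167.6),
    (2, 66, 103, 4117.9, 160.0, 146.2),
    (3, 104, 156, 5919.7, 153.6, 115.4),
    (4, 157, 243, 9608.0, 364.4, 291.6),
]

by_position = sorted(peptides, key=itemgetter(1))
by_weight = sorted(peptides, key=itemgetter(3))
by_retention = sorted(peptides, key=itemgetter(4))  # pH 2.1; index 5 would sort on pH 7.4

print([p[0] for p in by_weight])     # [2, 1, 3, 4]
print([p[0] for p in by_retention])  # [3, 2, 1, 4]

# Figures from the whole-sequence summary.
residues = 226
ala_mole_percent = round(100 * 35 / residues, 3)       # 35 Ala out of 226 residues
average_residue_weight = round(24583.35 / residues, 3)
print(ala_mole_percent, average_residue_weight)        # 15.487 108.776
```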

INPUT FILES

PeptideSort accepts a single protein sequence as input. If PeptideSort rejects your protein sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

PeptideMap creates a peptide map with an output format similar to the DNA restriction maps. Isoelectric plots the charge as a function of pH for any peptide sequence.

CONSIDERATIONS

The algorithm used by PeptideSort to estimate HPLC retention times (Meek, Proc. Natl. Acad. Sci. USA 77; 1632 (1980)) is based on the assumption that the retention of a peptide correlates with its amino acid composition. This assumption holds for peptides of up to about 20 amino acids, but steric and conformational factors can affect the retention of longer peptides. Retention times calculated by PeptideSort for peptides longer than 20 amino acids should not be considered accurate.

The formula for estimating the retention time is the sum of the retention coefficients for the amino acids in the peptide, plus the coefficients for the end groups, plus a value t0, which is the time for elution of unretained compounds. The retention time reported by PeptideSort does not include the t0 value. You will have to determine this time for your HPLC system and add it to the reported times.
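
The formula described above can be sketched in Python as follows. The coefficients, the end-group value, and t0 in this example are placeholders chosen for illustration; they are not Meek's published values, and the function name estimated_retention is hypothetical.

```python
def estimated_retention(peptide, coefficients, end_groups, t0=0.0):
    """Sum the per-residue coefficients, then add the end-group term and t0."""
    return sum(coefficients[aa] for aa in peptide) + end_groups + t0

# Placeholder coefficients for illustration only; these are NOT Meek's values.
coefficients = {"A": 1.0, "L": 20.0, "K": -3.0}
end_groups = 2.0  # placeholder for the combined end-group coefficients
t0 = 2.5          # hypothetical elution time of unretained compounds on your system

# PeptideSort reports the value without t0; you add t0 yourself.
reported = estimated_retention("ALK", coefficients, end_groups)
with_t0 = estimated_retention("ALK", coefficients, end_groups, t0)
print(reported, with_t0)
```

Keeping t0 as a separate argument mirrors the way the program leaves it out of the reported time.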

Meek's paper does not report retention coefficients for cysteine, only for cystine. PeptideSort assumes that these are the same. Therefore the estimated retention time for a peptide containing cysteines may be inaccurate.

The retention times reported by PeptideSort should be regarded as estimates, since the actual retention times can vary according to the elution conditions. Meek's retention coefficients were determined empirically using a linear gradient of acetonitrile, starting at 0% at 0 min and increasing to 60% at 80 min (0.75% per min). Increasing the gradient rate to 1.5% acetonitrile per min resulted in retention times that were 70 percent of normal. Decreasing the gradient rate to 0.5% per min resulted in retention times that were 120 percent of normal. Meek also noted minor differences in relative retention rates with columns made by different manufacturers.
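
If you run a gradient at one of the rates mentioned above, a rough adjustment of the reported times might look like the following Python sketch. It uses only the three relative values quoted in the paragraph above and deliberately refuses to guess at other rates; the names RELATIVE_RETENTION and scale_retention are hypothetical.

```python
# Relative retention quoted in the text, keyed by gradient rate (% acetonitrile per min).
RELATIVE_RETENTION = {0.75: 1.00, 1.5: 0.70, 0.5: 1.20}

def scale_retention(reported_time, gradient_rate):
    """Scale a reported retention time for one of the three documented gradient rates."""
    try:
        factor = RELATIVE_RETENTION[gradient_rate]
    except KeyError:
        # No interpolation: the text gives no data for other rates.
        raise ValueError(f"no documented factor for {gradient_rate}% per min") from None
    return reported_time * factor

print(round(scale_retention(160.0, 1.5), 1))
```

The result is still only an estimate, since column type and other elution conditions also affect retention.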

RESTRICTIONS

A digest may not produce more than 1,000 peptides. If you choose all enzymes by typing * at the enzyme prompt and your protein sequence is over 500 residues long, there may be a great deal of output. Remember to delete the output file when you are finished looking at the data to free disk space.

SELECTING ENZYMES

The program presents you with an enzyme selection prompt that lets you enter enzymes individually or collectively. To get help with selecting enzymes, type a ? at the enzyme prompt. Here is what you see:


Select enzymes:

Type "*" to select all enzymes.
Type "**" to select all enzymes including isoschizomers.
Type individual names like "AluI" to select specific enzymes.
Type "?" to see this message and all available enzymes.
Type "??" to see the available enzymes AND their recognition sites.
Type "?A*" to see what enzymes start with "A."
Type "A*" to select all enzymes starting with "A."
Type parts of names like "Al*" to select all enzymes starting with "AL."
Type "~A*" to unselect all selected enzymes starting with "A."
Type "/*" to see what enzymes you have selected so far.
Type "#" to select no enzymes at all.

Press <Return> after each selection.
Press <Return> and nothing else to end your selections.
Spaces are allowed; upper and lower case are equivalent.

We maintain our enzyme files with a semicolon (;) character in front of all but one member of a family of isoschizomers. (Isoschizomers are restriction endonucleases with the same recognition site.) The isoschizomers beginning with a semicolon are normally not displayed by our mapping programs unless you specifically select them by name or type "**" instead of "*" at the enzyme prompt.

There is more information on enzyme files in Appendix VII.

A command-line expression like -ENZymes=AluI,EcoRII would choose AluI and EcoRII and suppress interactive enzyme selection.
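
The way successive entries at the enzyme prompt build up a selection can be sketched in Python as follows. This is only an approximation of the behavior described in the help text: it handles names, trailing-* patterns, ~ for unselecting, and #, but not the ? and /* queries or the isoschizomer rules. The enzyme names other than Trypsin and the function name apply_selection are hypothetical; the real list comes from proenzall.dat.

```python
from fnmatch import fnmatchcase

def apply_selection(entry, available, selected):
    """Apply one line typed at the enzyme prompt to the current selection."""
    # Spaces are allowed; upper and lower case are equivalent.
    entry = entry.replace(" ", "").upper()
    if entry == "#":  # select no enzymes at all
        return set()
    unselect = entry.startswith("~")
    pattern = entry.lstrip("~")
    matches = {name for name in available if fnmatchcase(name.upper(), pattern)}
    return selected - matches if unselect else selected | matches

# Hypothetical enzyme names for illustration.
available = ["Trypsin", "Chymotrypsin", "CnBr", "Pepsin"]
selected = set()
for typed in ["c*", "trypsin", "~cn*"]:
    selected = apply_selection(typed, available, selected)
print(sorted(selected))  # ['Chymotrypsin', 'Trypsin']
```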

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
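
The capitalization convention can be illustrated with a short Python sketch that checks whether something you typed is an acceptable spelling of a parameter name. It assumes that any spelling between the required capitalized prefix and the full name is accepted and that case is ignored; that assumption and the function names are not taken from the manual.

```python
def required_prefix(spec):
    """Return the leading capitals (or digits) after the dash, e.g. 'MINC' for '-MINCuts'."""
    prefix = ""
    for ch in spec.lstrip("-"):
        if ch.isupper() or ch.isdigit():
            prefix += ch
        else:
            break
    return prefix

def matches(spec, typed):
    """True if typed is an acceptable spelling of spec (case-insensitive).

    Assumption: anything from the required prefix up to the full name is accepted.
    """
    full = spec.lstrip("-").upper()
    given = typed.lstrip("-").upper()
    return given.startswith(required_prefix(spec)) and full.startswith(given)

print(matches("-MINCuts", "-minc"))  # True
print(matches("-MINCuts", "-min"))   # False
```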


Minimal Syntax: % peptidesort [-INfile=]gzeinaa.pep -Default

Prompted Parameters:

-BEGin=18 -END=243           range of interest
-ENZymes=*[,...]             enzymes of interest
[-OUTfile=]gzeinaa.pepsort   output file name

Local Data Files:

-DATa1=proenzall.dat     contains enzyme data
-DATa2=aminoacid.dat     contains amino acid data
-DATa3=isoelectric.dat   contains amino acid pK data
-DATa4=extinctcoef.dat   contains extinction coefficient data

Optional parameters:

-7                       sorts on HPLC retention at pH 7.4 instead of pH 2.1
-MINCuts=2               shows only enzymes that cut at least 2 times
-MAXCuts=4               shows only enzymes that cut no more than 4 times
-ELUtion[=DNEQSGHRTAPYVMCILFKW]    sets the order of the composition display
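
If you drive PeptideSort from a script, the command line for the example session could be assembled as in the following Python sketch. The parameter names are the ones listed in the summary above; the helper name peptidesort_command is hypothetical, and the sketch only builds and prints the command without running it.

```python
import shlex

def peptidesort_command(infile, begin, end, enzymes, outfile):
    """Assemble an argument list from the parameters listed in the summary above."""
    return [
        "peptidesort",
        f"-INfile={infile}",
        f"-BEGin={begin}",
        f"-END={end}",
        f"-ENZymes={','.join(enzymes)}",
        f"-OUTfile={outfile}",
        "-Default",
    ]

argv = peptidesort_command("gzeinaa.pep", 18, 243, ["trypsin"], "gzeinaa.pepsort")
print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids quoting problems if a file name contains spaces.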

ACKNOWLEDGEMENT

PeptideSort was written by John Devereux in the GCG laboratory. It was designed to handle several suggestions made to us by Drs. Michael Gribskov and Roland Rueckert. HPLC retention is from Meek, Proc. Natl. Acad. Sci. USA 77; 1632 (1980). Molar extinction coefficient is from Gill, S.C. and von Hippel, P.H., Anal. Biochem. 182; 319-326 (1989).

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

PeptideSort needs four data files that can be either local or public. proenzall.dat (see Appendix VII) contains information about the enzymes and proteolytic reagents. aminoacid.dat has information on the physical properties of the amino acids. isoelectric.dat contains pK values for the relevant amino acids. extinctcoef.dat contains extinction coefficient data for the amino acids.
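
The order in which a data file is located, as described at the start of this section, can be sketched in Python as follows. The function name resolve_data_file and the public directory path in the usage line are hypothetical; the actual location of the public data directory depends on your installation.

```python
from pathlib import Path

def resolve_data_file(name, public_dir, command_line=None, cwd=None):
    """Pick the data file using the precedence described above."""
    if command_line is not None:            # e.g. the value given with -DATa1=myfile.dat
        return Path(command_line)
    local = Path(cwd or Path.cwd()) / name  # same name in the current working directory
    if local.is_file():
        return local
    return Path(public_dir) / name          # otherwise fall back to the public copy

# Hypothetical public directory.
print(resolve_data_file("proenzall.dat", "/hypothetical/gcg/data"))
```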

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-7

causes PeptideSort to sort each digest on HPLC retention at pH 7.4 instead of on HPLC retention at pH 2.1 (default).

-MINCuts=n

excludes enzymes that do not cut at least n times.

-MAXCuts=n

excludes enzymes that cut more than n times.
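
The effect of -MINCuts and -MAXCuts can be pictured with the following Python sketch, which treats both limits as inclusive, as the descriptions above state. The cut counts for enzymes other than Trypsin are made up for illustration, and the function name filter_by_cuts is hypothetical.

```python
def filter_by_cuts(cut_counts, mincuts=None, maxcuts=None):
    """Keep enzymes whose cut count lies between mincuts and maxcuts, inclusive."""
    kept = {}
    for enzyme, cuts in cut_counts.items():
        if mincuts is not None and cuts < mincuts:
            continue  # does not cut at least mincuts times
        if maxcuts is not None and cuts > maxcuts:
            continue  # cuts more than maxcuts times
        kept[enzyme] = cuts
    return kept

# Trypsin makes 3 cuts in the example digest (4 peptides); the other counts are hypothetical.
cut_counts = {"Trypsin": 3, "Chymotrypsin": 12, "CnBr": 1}
print(filter_by_cuts(cut_counts, mincuts=2, maxcuts=4))  # {'Trypsin': 3}
```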

-ELUtion=DNEQSGHRTAPYVMCILFKW

sets the order for the composition data display. If you use the -ELUtion parameter without the optional value, the order is changed from alphabetical to DNE... as expected from the Waters analyzer.
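
As an illustration of what reordering the composition means, the following Python sketch lists residue counts in the order given by the -ELUtion string. It assumes that residues absent from the peptide are omitted, as in the alphabetical display, and it ignores any residue code not named in the order string; the sequence and the names WATERS_ORDER and composition_in_order are hypothetical.

```python
from collections import Counter

WATERS_ORDER = "DNEQSGHRTAPYVMCILFKW"  # the value shown for -ELUtion above

def composition_in_order(fragment, order=WATERS_ORDER):
    """List residue counts in the given order, skipping residues that are absent."""
    counts = Counter(fragment)
    return ",".join(f"{aa}{counts[aa]}" for aa in order if counts[aa])

# Hypothetical peptide for illustration.
print(composition_in_order("AALKGSPRD"))  # D1,S1,G1,R1,A2,P1,L1,K1
```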

Printed: November 18, 1996 13:07 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com