COMPOSITION

[ Program Manual | User's Guide | Data Files | Databases ]

Table of Contents

FUNCTION

DESCRIPTION

NAMING SETS OF SEQUENCES

COMMAND-LINE SUMMARY

LOCAL DATA FILES

OPTIONAL PARAMETERS

FUNCTION [ Top | Next ]

Composition determines the composition of sequence(s). For nucleotide sequence(s), Composition also determines dinucleotide and trinucleotide content.

DESCRIPTION [ Previous | Top | Next ]

Composition measures the composition of one or a group of sequences. If you specify only one sequence, you can choose a range within the sequence. Lowercase letters are converted to uppercase and counted with their uppercase equivalents. If you specify a group of sequences, Composition displays the name of each sequence as it finishes the measurement for that sequence.
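The counting described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual implementation: the function name composition and the parameter k are hypothetical, and the sketch assumes that words are counted in overlapping windows that never span two sequences.

```python
from collections import Counter

def composition(sequences, k=1):
    """Count overlapping k-letter words in each sequence (hypothetical helper)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()  # lowercase letters are counted with their uppercase equivalents
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

mono = composition(["acgtAC"], k=1)  # single-letter composition
di = composition(["acgtAC"], k=2)    # dinucleotide content
print(mono["A"], di["AC"], sum(di.values()))
```

With k=3 the same function would give trinucleotide content under the same assumptions.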

EXAMPLE [ Previous | Top | Next ]

Here is a session using Composition to measure the composition and di- and trinucleotide content for all of the bacterial sequences in GenEMBL:


% composition

  COMPOSITION on what sequence(s) ?  Bacterial:*

  What should I call the output file (* bacterial.composition *) ?

  A33344
  A33349
  A34992

  /////////

  ZSSSURNAS1
  ZSSSURNAS2
  ZYP16SRNA

  COMPOSITION complete.

        Sequences: 28,541
     Total Length: 55,156,451
         CPU time: 01:24.50
      Output file: bacterial.composition

%

OUTPUT [ Previous | Top | Next ]

Here is the output file:


 COMPOSITION of: Primate:*  July 19, 1994 15:28

 Sequences: 33,698  Total_Length: 32,144,633  CPU_Time: 37.62

                            *****

     A: 8,339,308    B: 9            C: 7,917,038    D: 21
     G: 7,954,939    H: 27           K: 32           M: 43
     N: 43,720       R: 113          S: 80           T: 7,889,081
     V: 5            W: 45           Y: 172

                          Other: 0

                          Total: 32,144,633

                            *****

     GG: 2,297,373   GA: 2,147,959   GT: 1,580,677   GC: 1,911,703
     AG: 2,402,359   AA: 2,378,202   AT: 1,799,282   AC: 1,741,330
     TG: 2,432,942   TA: 1,370,032   TT: 2,162,321   TC: 1,905,484
     CG: 806,401     CA: 2,422,350   CT: 2,332,522   CC: 2,343,636

                          Other: 76,362

                          Total: 32,110,935

                            *****

     GGG: 632,266    GGA: 672,183    GGT: 414,196    GGC: 573,039
     GAG: 678,442    GAA: 630,211    GAT: 409,841    GAC: 423,632
     GTG: 582,551    GTA: 274,209    GTT: 366,014    GTC: 354,367
     GCG: 224,924    GCA: 536,353    GCT: 553,150    GCC: 595,113

     AGG: 667,936    AGA: 710,925    AGT: 448,152    AGC: 569,432
     AAG: 640,829    AAA: 811,477    AAT: 495,798    AAC: 424,242
     ATG: 525,722    ATA: 377,278    ATT: 499,219    ATC: 393,907
     ACG: 169,295    ACA: 620,142    ACT: 461,184    ACC: 488,540

     TGG: 729,399    TGA: 593,158    TGT: 563,753    TGC: 542,692
     TAG: 270,594    TAA: 385,558    TAT: 394,363    TAC: 317,694
     TTG: 486,686    TTA: 387,384    TTT: 740,428    TTC: 542,156
     TCG: 149,275    TCA: 557,400    TCT: 611,642    TCC: 583,427

     CGG: 262,896    CGA: 165,933    CGT: 152,181    CGC: 224,224
     CAG: 805,496    CAA: 545,454    CAT: 495,156    CAC: 572,359
     CTG: 834,582    CTA: 328,964    CTT: 552,004    CTC: 611,663
     CCG: 261,470    CCA: 704,422    CCT: 702,026    CCC: 671,946

                          Other: 106,285

                          Total: 32,077,239

                            *****
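The totals in the sample output above can be cross-checked with a little arithmetic. The sketch below copies the figures from that output. The interpretation is an inference from the numbers and is not stated in this manual: if words are counted in overlapping windows within each sequence, a sequence of length L contributes L-1 dinucleotides and L-2 trinucleotides.

```python
# Figures copied from the sample output file.
sequences = 33_698
total_length = 32_144_633
di_total = 32_110_935
tri_total = 32_077_239

monomers = {
    "A": 8_339_308, "B": 9, "C": 7_917_038, "D": 21,
    "G": 7_954_939, "H": 27, "K": 32, "M": 43,
    "N": 43_720, "R": 113, "S": 80, "T": 7_889_081,
    "V": 5, "W": 45, "Y": 172,
}

# The single-letter counts (plus Other: 0) account for every symbol.
print(sum(monomers.values()) == total_length)  # True

# Assumed model: overlapping windows that never span two sequences.
expected_di = total_length - sequences
expected_tri = total_length - 2 * sequences
print(di_total - expected_di)    # 0
print(tri_total - expected_tri)  # 2
```

The dinucleotide total matches the assumed model exactly. The trinucleotide total is higher by 2, which the manual does not explain; a few sequences shorter than two symbols would be one possible cause.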

RESTRICTIONS [ Previous | Top | Next ]

Unknown.

CONSIDERATIONS [ Previous | Top | Next ]

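You can infer the composition of the bottom strand of a nucleic acid sequence from the composition of the top strand. The -BOTHstrands parameter measures both strands, but strand-specific information is lost because the combined counts are symmetric: the count for G equals the count for C, the count for A equals the count for T, and each di- or trinucleotide count equals that of its reverse complement.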
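As a rough illustration of this point, the following Python sketch derives bottom-strand counts from top-strand counts and then adds the two. The function names are hypothetical, only A, C, G, and T are handled, and summing the strands is an assumption about how combined counts would be formed, not a description of the program's code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(word):
    """Return the word as it reads on the opposite strand."""
    return word.translate(COMPLEMENT)[::-1]

def bottom_strand_counts(top_counts):
    """Each word on the top strand appears as its reverse complement below."""
    return {reverse_complement(w): n for w, n in top_counts.items()}

def both_strands(top_counts):
    """Add top-strand and inferred bottom-strand counts (assumed behavior)."""
    bottom = bottom_strand_counts(top_counts)
    words = set(top_counts) | set(bottom)
    return {w: top_counts.get(w, 0) + bottom.get(w, 0) for w in words}

top = {"A": 5, "C": 1, "G": 3, "T": 2}
both = both_strands(top)
print(both["A"], both["T"], both["C"], both["G"])  # 7 7 4 4
```

The top strand in this example has more A than T, but the combined counts no longer show that difference.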

INPUT FILES [ Previous | Top | Next ]

Composition takes either a single or a multiple sequence specification. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*.

The function of Composition depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
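The type check on the heading line can be sketched as follows. This is a minimal illustration, not the package's own code: the function name sequence_type is hypothetical, the sample file text is invented for the example, and the sketch assumes that the last line of the text heading is the first line ending with two periods (..).

```python
import re

def sequence_type(file_text):
    """Return "N", "P", or None from the last line of the text heading.

    Assumption: the heading ends at the first line that ends with "..".
    """
    for line in file_text.splitlines():
        if line.rstrip().endswith(".."):
            match = re.search(r"Type:\s*([NP])\b", line)
            return match.group(1) if match else None
    return None

# Invented sample file for illustration only.
sample = """Example sequence for illustration

 Example.Seq  Length: 12  July 19, 1994 15:28  Type: N  Check: 1234  ..

       1  ACGTACGTAC GT
"""
print(sequence_type(sample))  # N
```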

RELATED PROGRAMS [ Previous | Top | Next ]

CodonFrequency tabulates codon frequencies for any range of a sequence in a particular reading frame, as opposed to counting all trinucleotides.
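The difference between the two kinds of count can be shown with a short Python sketch. Both functions are hypothetical illustrations and do not reproduce either program; the frame parameter is an assumed zero-based offset into the sequence.

```python
from collections import Counter

def trinucleotides(seq):
    """Count every overlapping three-letter window."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))

def codons(seq, frame=0):
    """Count only the non-overlapping triplets of one reading frame."""
    seq = seq.upper()
    return Counter(seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))

seq = "ATGAAATGA"
tri = trinucleotides(seq)
cod = codons(seq, frame=0)
print(sum(tri.values()), sum(cod.values()))  # 7 3
print(tri["ATG"], cod["ATG"])                # 2 1
```

The second ATG straddles two codons of frame 0, so it appears in the trinucleotide count but not in the codon count.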

<CTRL>C [ Previous | Top | Next ]

If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C.

BATCH QUEUE [ Previous | Top | Next ]

You can run this program in the batch queue using a script that we supply. Use Fetch with a filename that starts with this program's name and ends with the filename extension .csh. Modify the file with any text editor so that it specifies the experiment you want to do and queue the script.

NAMING SETS OF SEQUENCES [ Previous | Top | Next ]

See the sections on specifying sequences in Chapter 2, Using Sequence Files and Databases of the User's Guide.

COMMAND-LINE SUMMARY [ Previous | Top | Next ]

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % composition [-INfile=]Bacterial:* -Default

Prompted Parameters:

-BEGin=1 -END=1000                range (for single sequences only)
[-OUTfile=]bacterial.composition  output file name

Local Data Files: None

Optional Parameters:

-BOTHstrands  determines composition of both strands of nucleic acids
-NOCOMmas     removes the commas from the numbers in the output
-NOMONitor    suppresses the screen monitor showing each sequence
-NOSUMmary    suppresses the screen summary at the end of the program

LOCAL DATA FILES [ Previous | Top | Next ]

None.

OPTIONAL PARAMETERS [ Previous | Top | Next ]

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-BOTHstrands

measures the composition of both strands of a nucleic acid sequence.

-NOCOMmas

Composition normally displays numbers greater than 999 with commas to make them easier to read; for example, the number 1234567 would look like 1,234,567. Many programs cannot read numbers that contain commas. If you are going to use the output file from this program as input to another program, you can suppress the commas with this parameter.
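If you already have an output file that was written with commas, a reading program can also strip them itself. The Python sketch below parses one row of counts in the layout shown in the sample output; the function name parse_counts is hypothetical, and the pattern is meant only for rows of label and count pairs.

```python
import re

# A label, a colon, then digits that may contain commas.
PAIR = re.compile(r"([A-Za-z]+):\s*([\d,]+)")

def parse_counts(line):
    """Turn 'A: 8,339,308    B: 9' into {'A': 8339308, 'B': 9}."""
    return {label: int(number.replace(",", ""))
            for label, number in PAIR.findall(line)}

row = "     A: 8,339,308    B: 9            C: 7,917,038    D: 21"
counts = parse_counts(row)
print(counts["A"] + counts["B"])  # 8339317
```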

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-SUMmary

writes a summary of the program's work to the screen when you've used the -Default parameter to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

Printed: November 18, 1996 13:06 (1162)


Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.