Composition determines the composition of sequence(s). For nucleotide sequence(s), Composition also determines dinucleotide and trinucleotide content.
Composition measures the composition of one or a group of sequences. If you specify only one sequence, you can choose a range within the sequence. Lowercase letters are converted to uppercase and counted with their uppercase equivalents. If you specify a group of sequences, Composition displays the name of each sequence as it finishes the measurement for that sequence.
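The counting described above can be sketched in a few lines of Python. This is a minimal illustration, not Composition's implementation. It assumes that -BEGin/-END are 1-based and inclusive, that dinucleotides and trinucleotides are counted as overlapping windows, and that any word containing a symbol other than A, C, G, or T goes into an "Other" bucket (our reading of the Other rows in the example output). The function names are hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple

BASES = set("ACGT")


def kmer_counts(region: str, k: int) -> Counter:
    """Count overlapping k-letter words; words with any non-ACGT symbol go to 'Other'."""
    counts: Counter = Counter()
    for i in range(len(region) - k + 1):
        word = region[i:i + k]
        counts[word if set(word) <= BASES else "Other"] += 1
    return counts


def composition(seq: str, begin: int = 1,
                end: Optional[int] = None) -> Tuple[Counter, Counter, Counter]:
    """Return (symbol, dinucleotide, trinucleotide) counts for seq[begin..end].

    begin and end are treated as 1-based and inclusive (an assumption).
    """
    if end is None:
        end = len(seq)
    # Lowercase letters are counted together with their uppercase equivalents.
    region = seq[begin - 1:end].upper()
    return Counter(region), kmer_counts(region, 2), kmer_counts(region, 3)


mono, di, tri = composition("acgTAC")
print(mono["A"], di["AC"], sum(tri.values()))  # 2 2 4
```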
Here is a session using Composition to measure the composition and di- and trinucleotide content for all of the bacterial sequences in GenEMBL:
 % composition

  COMPOSITION on what sequence(s) ?  Bacterial:*

  What should I call the output file (* bacterial.composition *) ?

  A33344
  A33349
  A34992
  /////////
  ZSSSURNAS1
  ZSSSURNAS2
  ZYP16SRNA

  COMPOSITION complete.

  Sequences:  28,541  Total Length:  55,156,451  CPU time:  01:24.50

  Output file: bacterial.composition

 %
The following output file comes from a separate Composition run on Primate:*.

 COMPOSITION of: Primate:*  July 19, 1994 15:28

 Sequences: 33,698  Total_Length: 32,144,633  CPU_Time: 37.62

 *****

     A: 8,339,308       B:         9       C: 7,917,038       D:        21
     G: 7,954,939       H:        27       K:        32       M:        43
     N:    43,720       R:       113       S:        80       T: 7,889,081
     V:         5       W:        45       Y:       172   Other:         0

 Total: 32,144,633

 *****

    GG: 2,297,373      GA: 2,147,959      GT: 1,580,677      GC: 1,911,703
    AG: 2,402,359      AA: 2,378,202      AT: 1,799,282      AC: 1,741,330
    TG: 2,432,942      TA: 1,370,032      TT: 2,162,321      TC: 1,905,484
    CG:   806,401      CA: 2,422,350      CT: 2,332,522      CC: 2,343,636

 Other:    76,362
 Total: 32,110,935

 *****

   GGG: 632,266     GGA: 672,183     GGT: 414,196     GGC: 573,039
   GAG: 678,442     GAA: 630,211     GAT: 409,841     GAC: 423,632
   GTG: 582,551     GTA: 274,209     GTT: 366,014     GTC: 354,367
   GCG: 224,924     GCA: 536,353     GCT: 553,150     GCC: 595,113
   AGG: 667,936     AGA: 710,925     AGT: 448,152     AGC: 569,432
   AAG: 640,829     AAA: 811,477     AAT: 495,798     AAC: 424,242
   ATG: 525,722     ATA: 377,278     ATT: 499,219     ATC: 393,907
   ACG: 169,295     ACA: 620,142     ACT: 461,184     ACC: 488,540
   TGG: 729,399     TGA: 593,158     TGT: 563,753     TGC: 542,692
   TAG: 270,594     TAA: 385,558     TAT: 394,363     TAC: 317,694
   TTG: 486,686     TTA: 387,384     TTT: 740,428     TTC: 542,156
   TCG: 149,275     TCA: 557,400     TCT: 611,642     TCC: 583,427
   CGG: 262,896     CGA: 165,933     CGT: 152,181     CGC: 224,224
   CAG: 805,496     CAA: 545,454     CAT: 495,156     CAC: 572,359
   CTG: 834,582     CTA: 328,964     CTT: 552,004     CTC: 611,663
   CCG: 261,470     CCA: 704,422     CCT: 702,026     CCC: 671,946

 Other: 106,285
 Total: 32,077,239

 *****
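The dinucleotide and trinucleotide Total lines are smaller than Total_Length, presumably because a sequence of length n contains only n - 1 overlapping dinucleotides and n - 2 overlapping trinucleotides. The sketch below computes the expected number of windows for a group of sequences. It assumes, as the numbers suggest but the manual does not state, that no window spans the boundary between two sequences; the helper name window_total is hypothetical.

```python
from typing import Iterable


def window_total(lengths: Iterable[int], k: int) -> int:
    """Overlapping k-letter windows in a group of sequences, none spanning two sequences."""
    # A sequence shorter than k contributes no windows at all.
    return sum(max(0, n - k + 1) for n in lengths)


print(window_total([5, 3, 1], 2))  # 6
print(window_total([5, 3, 1], 3))  # 4
```

For the Primate run above, 32,144,633 - 33,698 = 32,110,935, which matches the dinucleotide Total; that agreement is what suggests the no-spanning assumption.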
Unknown.
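You can infer the composition of the bottom strand of a nucleic acid sequence from the composition of the top strand. The -BOTHstrands parameter measures both strands, but information is lost: because each base on one strand pairs with its complement on the other, the combined counts are always equal for complementary symbols (G=C, A=T) and for complementary words (for example, AG and CT).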
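Because each top-strand word is read on the bottom strand as its reverse complement, bottom-strand counts can be derived by relabeling the top-strand counts. The sketch below does this for counts keyed by A/C/G/T words and then adds the two strands, assuming that is roughly what -BOTHstrands reports. The function names are hypothetical; the last line shows the G=C and A=T equality described above.

```python
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Reverse complement of an A/C/G/T word (read 5' to 3' on the other strand)."""
    return word.translate(COMPLEMENT)[::-1]


def bottom_strand(top: Dict[str, int]) -> Dict[str, int]:
    """Bottom-strand counts: each top-strand word is seen as its reverse complement."""
    return {reverse_complement(w): n for w, n in top.items()}


def both_strands(top: Dict[str, int]) -> Dict[str, int]:
    """Add top- and bottom-strand counts word by word."""
    bottom = bottom_strand(top)
    return {w: top.get(w, 0) + bottom.get(w, 0) for w in set(top) | set(bottom)}


print(bottom_strand({"AG": 5, "GC": 1}))  # {'CT': 5, 'GC': 1}
print(sorted(both_strands({"A": 4, "T": 1, "G": 3, "C": 0}).items()))
# [('A', 5), ('C', 3), ('G', 3), ('T', 5)]
```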
Composition takes either a single or a multiple sequence specification. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*. The function of Composition depends on whether your input sequence(s) are protein or nucleotide. Programs determine the type of a sequence by the presence of either Type: N or Type: P on the last line of the text heading just above the sequence. If your sequence(s) are not the correct type, turn to Appendix VI for information on how to change or set the type of a sequence.
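To illustrate the type check, here is a sketch that reads the sequence type from a GCG-style heading. Beyond what this section states, it assumes that the last heading line is a dividing line ending in two periods (..) and that the sequence follows it; the sample entry, including its Check value, is made up for illustration, and the function name sequence_type is hypothetical.

```python
import re
from typing import Optional

TYPE_RE = re.compile(r"\bType:\s*([NP])\b")


def sequence_type(text: str) -> Optional[str]:
    """Return 'N', 'P', or None, read from the heading's dividing line.

    Assumes the heading ends at the first line ending in '..'; the sequence follows it.
    """
    for line in text.splitlines():
        if line.rstrip().endswith(".."):
            match = TYPE_RE.search(line)
            return match.group(1) if match else None
    return None  # no dividing line found


# Made-up example entry, for illustration only.
entry = """Example heading text
 EXAMPLE  Length: 12  Type: N  Check: 1234  ..

       1  ACGTACGTAC GT
"""
print(sequence_type(entry))  # N
```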
CodonFrequency tabulates codon frequencies for any range of a sequence in a particular reading frame, as opposed to counting all trinucleotides.
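The difference between the two kinds of counts shows up clearly in a short sketch: overlapping trinucleotide windows advance one base at a time, while codons advance three bases at a time from a chosen reading frame. This is an illustration only, not CodonFrequency's implementation; choosing the frame with a 1-based begin position is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Optional


def overlapping_trinucleotides(seq: str) -> Counter:
    """Every three-base window, advancing one base at a time."""
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def in_frame_codons(seq: str, begin: int = 1, end: Optional[int] = None) -> Counter:
    """Non-overlapping triplets read from 1-based position `begin` (an assumption)."""
    region = seq[begin - 1:end]
    # Step by three; a trailing partial codon is ignored.
    return Counter(region[i:i + 3] for i in range(0, len(region) - 2, 3))


seq = "ATGAAATAG"
print(sum(overlapping_trinucleotides(seq).values()))  # 7
print(sorted(in_frame_codons(seq)))                   # ['AAA', 'ATG', 'TAG']
```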
If you need to stop this program, use <Ctrl>C to reset your terminal and session as gracefully as possible. Searches and comparisons write out the results from the part of the search that is complete when you use <Ctrl>C.
You can run this program in the batch queue using a script that we supply. Use Fetch to obtain the script; its filename starts with this program's name and ends with the extension .csh. Edit the file with any text editor so that it specifies the experiment you want to do, and then submit the script to the batch queue.
See the sections on specifying sequences in Chapter 2, Using Sequence Files and Databases of the User's Guide.
All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
Minimal Syntax: % composition [-INfile=]Bacterial:* -Default

Prompted Parameters:

-BEGin=1 -END=1000                range (for single sequences only)
[-OUTfile=]bacterial.composition  output file name

Local Data Files: None

Optional Parameters:

-BOTHstrands  determines composition of both strands of nucleic acids
-NOCOMmas     removes the commas from the numbers in the output
-NOMONitor    suppresses the screen monitor showing each sequence
-NOSUMmary    suppresses the screen summary at the end of the program
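The abbreviation rule from the summary (type at least the capitalized letters of a parameter name) can be sketched as a prefix match. These are hypothetical helpers for illustration; they assume matching is case-insensitive and ignore anything after "=", and the Wisconsin Package's own parser may handle ambiguous abbreviations differently.

```python
def required_letters(spec: str) -> str:
    """Leading capital letters of a parameter name, e.g. 'BOTH' for '-BOTHstrands'."""
    name = spec.lstrip("-")
    n = 0
    while n < len(name) and name[n].isupper():
        n += 1
    return name[:n]


def matches(typed: str, spec: str) -> bool:
    """True if `typed` (e.g. '-both' or '-beg=1') abbreviates parameter `spec`."""
    word = typed.split("=", 1)[0].lstrip("-").upper()
    full = spec.lstrip("-").upper()
    # Must cover the capitalized part and must not run past or diverge from the full name.
    return len(word) >= len(required_letters(spec)) and full.startswith(word)


print(matches("-both", "-BOTHstrands"))  # True
print(matches("-bot", "-BOTHstrands"))   # False
print(matches("-nocom", "-NOCOMmas"))    # True
```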
Composition does not use any local data files.
The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
-BOTHstrands

measures the composition of both strands of a nucleic acid sequence.
-NOCOMmas

Composition normally displays numbers greater than 999 with commas to make them easier to read; for example, the number 1234567 appears as 1,234,567. These commas can make the numbers difficult for other programs to read, so if you plan to use the output file from this program as input to another program, use this parameter to suppress them.
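If you do keep the commas, a downstream script can still recover the numbers. The sketch below uses a hypothetical helper, parse_counts, that pulls "KEY: value" pairs from one section of Composition output and strips the commas; it also accepts output written with -NOCOMmas. Pass it one section at a time, since keys such as Other and Total repeat across sections. It deliberately skips decimal and clock values such as CPU_Time.

```python
import re
from typing import Dict

# KEY: integer, where the integer may contain commas; values like 37.62 or 01:24.50 are skipped.
PAIR_RE = re.compile(r"([A-Za-z_]+):\s*(\d[\d,]*)(?![\d.:])")


def parse_counts(section: str) -> Dict[str, int]:
    """Map each 'KEY: 1,234' pair in one section of output to an int."""
    return {key: int(num.replace(",", "")) for key, num in PAIR_RE.findall(section)}


counts = parse_counts("A: 8,339,308 B: 9 C: 7,917,038 Total: 32,144,633")
print(counts["A"], counts["Total"])  # 8339308 32144633
```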
-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.
-SUMmary

writes a summary of the program's work to the screen when you've used the -Default parameter to suppress all program interaction. When you run a program interactively, a summary is normally displayed at the end; you can suppress it with -NOSUMmary.
You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks

Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.