9.0 What's New (UNIX)


Table of Contents

New Programs

SeqLab, the Successor to WPI

Changes and Enhancements

Bug Fixes

New Programs

The programs listed below are new to Version 9 of the Wisconsin Package^(TM).

Manipulation

Seg

Seg replaces low complexity regions in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.

Xnu

Xnu replaces statistically significant tandem repeats in protein sequences with X characters. If a resulting protein sequence is used as a query for a BLAST search, the regions with X characters are ignored.

Sequence Exchange

BreakUp

BreakUp reads a GCG-format sequence file containing more than 350,000 sequence characters and writes it as a set of separate, shorter, overlapping sequence files that can be analyzed by Wisconsin Package programs.
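The idea of splitting a long sequence into overlapping pieces can be sketched in Python. This is only an illustration, not BreakUp's actual algorithm: the function name split_overlapping and the segment and overlap sizes are assumptions made for the example, since these notes give only the 350,000-character limit.

```python
def split_overlapping(sequence, segment_length, overlap):
    """Split a sequence into consecutive segments sharing `overlap` characters.

    Hypothetical helper: BreakUp's real segment and overlap sizes are not
    stated in these release notes.
    """
    if not 0 <= overlap < segment_length:
        raise ValueError("overlap must be non-negative and smaller than segment_length")
    step = segment_length - overlap
    segments = []
    start = 0
    while True:
        segments.append(sequence[start:start + segment_length])
        if start + segment_length >= len(sequence):
            break  # this segment reached the end of the sequence
        start += step
    return segments


# Toy 10-character sequence split into 4-character pieces that overlap by 1.
pieces = split_overlapping("ACGTACGTAC", segment_length=4, overlap=1)
print(pieces)
```

Because neighboring pieces share characters, a feature that spans a cut point still appears whole in at least one piece, provided it is no longer than the overlap.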

Graphics program for the Macintosh

GCGFigure

A new GCG program is available for the Macintosh that allows you to display and print high-quality GCG graphics. You are able to save graphic images as PICT files, a standard file format for importing into word processing, desktop publishing, or graphic/drawing programs.

GCGFigure is freely available to all GCG users. You can download it by anonymous FTP from ftp://alanine.gcg.com in the /pub/mac directory. It is also available on the software CD in the /gcgunsupported directory. A README file is also available. See your system manager for assistance.

SeqLab, the Successor to WPI

SeqLab^(TM), a graphical user interface based on OSF/Motif^(TM), is new to Version 9 of the Wisconsin Package. SeqLab combines the best of the Wisconsin Package Interface (WPI), released in Version 8.0 of the Wisconsin Package, and the Genetic Data Environment (GDE). GDE was originally developed in the Department of Microbiology, University of Illinois, at Urbana-Champaign, Illinois, USA (Smith et al., CABIOS, 10(6):671-675 (1994)).

SeqLab lets you view and edit sequence features represented by color highlighting or schematic figures. SeqLab is also a powerful sequence selector, allowing you to select multiple ranges based on features. For example, you can select and delete all introns from a DNA sequence before translating it, or select all the promoters from an alignment of similar genes. You can add features to sequences as the result of a sequence analysis, such as secondary structure predictions or pattern searching. You can display the results from these analyses over a multiple sequence alignment as color highlighting.

With the addition of SeqLab, the Wisconsin Package now offers fully integrated editing, analysis, and annotation capabilities.

Name Change for WPI Resource Files

The graphical user interface to the Wisconsin Package is no longer called WPI; the new name is SeqLab. Therefore, if you have modified WPI resource files in your own login directory to customize fonts and colors, you will need to create SeqLab resource files to do the equivalent. To create a new version of a resource file, copy the old WPI resource file to the new SeqLab equivalent, and replace every instance of Wpi within the file with SeqLab. (Note that the replacement is case sensitive.)

Old WPI Filename    New SeqLab Filename
----------------    -------------------
OpenVMS
Wpi.Dat             SeqLab.Dat
WpiSmall.Dat        SeqLabSmall.Dat
WpiLarge.Dat        SeqLabLarge.Dat
----------------    -------------------
UNIX
Wpi                 SeqLab
WpiSmall            SeqLabSmall
WpiLarge            SeqLabLarge
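If you have several resource files to convert, the copy-and-replace step can be scripted. The following Python sketch uses the UNIX file names from the table above; the function name migrate_resource_files and the sample resource line are made up for the demonstration, and you would pass your own login directory in place of the temporary one.

```python
import tempfile
from pathlib import Path

# UNIX resource file names from the table above.
RENAMES = {
    "Wpi": "SeqLab",
    "WpiSmall": "SeqLabSmall",
    "WpiLarge": "SeqLabLarge",
}


def migrate_resource_files(directory):
    """Copy each WPI resource file to its SeqLab name, replacing Wpi with SeqLab."""
    directory = Path(directory)
    created = []
    for old_name, new_name in RENAMES.items():
        old_path = directory / old_name
        if not old_path.is_file():
            continue  # no customized file of this name to migrate
        text = old_path.read_text()
        # str.replace is case sensitive, which is what the notes require.
        (directory / new_name).write_text(text.replace("Wpi", "SeqLab"))
        created.append(new_name)
    return created


# Demonstration in a throwaway directory with a hypothetical resource line.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Wpi").write_text("Wpi*fontList: fixed\n")
    created = migrate_resource_files(tmp)
    migrated_text = Path(tmp, "SeqLab").read_text()

print(created, migrated_text)
```

The old files are left in place, so you can compare them with the new ones before deleting anything.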

RSF Files New to Version 9

SeqLab creates Rich Sequence Format (RSF) files, which can contain one or more sequences as well as their sequence feature annotations. For more information, see "New File Format: RSF Files" in the Package-Wide Enhancements section of these release notes.

Menu Changes

In the Main List, the Sequences menu has been replaced by the Edit menu. In addition, you may find some of the functions originally located under the Sequences menu in the File menu.

The Wisconsin Package programs available from within the Functions menu have been reorganized. You may find programs grouped under different and multiple functional headings.

Adding Sequences to Your Main List

The function that enables you to add sequences to the list file loaded in the Main List is now found under the File menu. Previously this function was available from the Sequences menu.

User Preference Options Moved

A new User Preferences dialog box is available from the Options menu of the SeqLab Main Window. From this dialog box you can customize General, Output, and Editor Properties options. Some of the preferences that were available elsewhere in WPI have been moved to this dialog box.

General. The Working Directory option was moved from the Main Window to the General option in the User Preferences dialog box.

Output. Options that determine how and when the output from a program is displayed on your screen were moved from the Output Manager to the Output option in the User Preferences dialog box. In addition, the Global Qualifiers options now appear in the Output option.

General Changes to Running Programs

When you click the Run button in a program window, the window now automatically closes. SeqLab maintains the state of the selected parameters in the window. That is, the next time you open the program window during the session, the parameter values appear as they were the last time you ran the program.

In addition, you must now choose the input sequences for an application before you open a program window. The Change/Select button for input sequences is no longer available.

Programs Removed from SeqLab

LineUp, SeqEd, and Publish are no longer available from SeqLab. They have been replaced by SeqLab Editor mode. However, they are still available from the command line.

Known Bugs

MacX

Individuals using MacX (version 1.5 tested, others unknown) may have problems launching SeqLab without first starting the xterm or DECterm program. We have seen several instances of this during the field test for Version 9. If SeqLab is launched from one of these terminal programs, then you should not see a problem.

The MacX local window manager with Macintosh style window decorations is not recommended. This window manager ignores the window stacking order imposed by Motif and can cause serious difficulties in using SeqLab. Use the Motif style window manager or use a "rooted" session with the Motif window manager (mwm) running on the server.

FastA File Compatibility

You can directly load FastA files into SeqLab's Editor mode by using the Import... option in the File menu. You can also use the FromFastA program under the Importing/Exporting section of the Functions menu to convert FastA sequences to GCG format.

On a similar note, setting the global switch for the default sequence format with the SeqFormat program is not supported by SeqLab. Import any sequences in a similar manner as above.

Changes and Enhancements

You will find the following information in this section:

- Program Enhancements

- Package-Wide Enhancements

- Documentation Enhancements

Program Enhancements

The programs listed below are changed from the last version of the Wisconsin Package.

Editing

SeqEd

Change: <Ctrl>H now deletes the character to the left of the cursor; previously it moved the cursor to the beginning of the line.

Fragment Assembly

GelMerge

Enhancement: The program can now assemble a maximum of 1,650 fragments; the previous maximum was 750 fragments. No additional computer memory is required for this program in Version 9 of the Wisconsin Package.

Enhancement: The maximum size for a consensus sequence created by GelMerge is now 100,000 bases; the previous maximum was 75,000 bases.

GelAssemble

Enhancement: The program can now accept contigs containing a maximum of 1,650 fragments; the previous maximum was 1000 fragments.

Change: <Ctrl>H now deletes the character to the left of the cursor; previously it moved the cursor to the beginning of the line.

Mapping

Map

Enhancement: This program now displays enzyme names horizontally in the output, improving their readability. Previously Map output displayed enzyme names vertically. The new format also lets you show the exact cut positions on the bottom strand of nucleotide sequences.

Enhancement: This program now supports the following optional parameters:

-VERtical displays enzyme names vertically over the cut points, as in previous versions of the Wisconsin Package. In Version 9, enzyme names are displayed horizontally over the cut sites by default.

-BOTtom displays cut sites on both the forward and reverse strands of nucleotide sequences.

-NOCUTline suppresses the line of pipe (|) symbols that indicates the cut positions on the sequence in the output file.

-TABle writes a table of cut positions sorted by position along the sequence. If you specify a nucleotide sequence as input, then all cut positions on both strands are written in the table. You can use this table as input to other programs, such as spreadsheets.

-SORtbyenzyme, when used with -TABle, writes a table of cut positions that is sorted alphabetically by enzyme name rather than by cut position along the sequence.
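The difference between the two table orderings can be illustrated with a short Python sketch. The cut records below are hypothetical, and the real -TABle output has its own layout that is not reproduced here; the sketch shows only the two sort orders.

```python
# Hypothetical cut records: (enzyme name, cut position).
cuts = [("PstI", 120), ("EcoRI", 45), ("BamHI", 300), ("EcoRI", 210)]

# Ordering described for -TABle alone: by position along the sequence.
by_position = sorted(cuts, key=lambda cut: cut[1])

# Ordering described for -TABle with -SORtbyenzyme: alphabetical by enzyme
# name. Breaking ties by position is an assumption of this sketch.
by_enzyme = sorted(cuts, key=lambda cut: (cut[0].lower(), cut[1]))

print(by_position)
print(by_enzyme)
```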

Map, MapSort, and MapPlot

Enhancement: These programs now support the following optional parameters:

-MINSitelen=6 selects enzymes with at least six bases in their recognition sites. You can specify any minimum length for the recognition site with this parameter.

-OVErhang=0 selects only those restriction endonucleases that leave blunt-end cuts. You can select enzymes that leave either 5' or 3' overhangs by using 5 or 3, respectively, with this parameter. You can also select enzymes that leave more than one type of overhang; for instance -OVErhang=5,3 selects restriction endonucleases that leave either 5' or 3' overhangs but not blunt ends.

-CUTters writes an enzyme data file containing those enzymes that cut the input sequence. You can then use this enzyme data file as input to other mapping programs.

-NONCUTters writes an enzyme data file containing those enzymes that did not cut the input sequence. You can then use this enzyme data file as input to other mapping programs.

-EXCUTters writes an enzyme data file containing those enzymes that did cut the input sequence but were not displayed because they failed to meet the criteria specified with the -MINCuts, -MAXCuts, or -EXCLude command-line parameters. You can then use this enzyme data file as input to other mapping programs.
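The selection logic behind -MINSitelen and -OVErhang can be pictured as a simple filter. The Python sketch below is an illustration only: the Enzyme class, the select_enzymes function, and the four-enzyme list are assumptions of the example and do not reflect the format of GCG enzyme data files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enzyme:
    name: str
    site: str      # recognition sequence
    overhang: int  # 0 = blunt, 5 = 5' overhang, 3 = 3' overhang


# A small illustrative set of well-known enzymes.
ENZYMES = [
    Enzyme("EcoRI", "GAATTC", 5),
    Enzyme("SmaI", "CCCGGG", 0),
    Enzyme("PstI", "CTGCAG", 3),
    Enzyme("HaeIII", "GGCC", 0),
]


def select_enzymes(enzymes, min_site_len=0, overhangs=None):
    """Return names of enzymes passing both the site-length and overhang filters."""
    selected = []
    for enzyme in enzymes:
        if len(enzyme.site) < min_site_len:
            continue  # recognition site too short
        if overhangs is not None and enzyme.overhang not in overhangs:
            continue  # wrong kind of end
        selected.append(enzyme.name)
    return selected


# Comparable in spirit to -MINSitelen=6 -OVErhang=0.
blunt_six_cutters = select_enzymes(ENZYMES, min_site_len=6, overhangs={0})
# Comparable in spirit to -OVErhang=5,3.
sticky = select_enzymes(ENZYMES, overhangs={5, 3})

print(blunt_six_cutters, sticky)
```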

Prime

Enhancement: The maximum range of interest for the input template sequence is now 350,000 bp; previously the maximum length was 32,000.

Comparison

Compare

Enhancement: When comparing two sequences using the parameter -WORdsize on the command line, the maximum range of interest for the vertical sequence is now 350,000 sequence characters. Previously the vertical sequence in a word comparison was limited to a maximum range of 32,000.

Enhancement: Previously the default stringency in window/stringency comparisons was hardwired into the program and may not have been appropriate if you chose an alternate scoring matrix. Now the default stringency is calculated from the symbol comparison values in the scoring matrix. As always, you can override the default stringency in response to the program prompt or with the -STRIngency command-line parameter. The stringency is now an integer value; previously it was a floating point number.

BestFit and Gap

Enhancement: These programs now support the following optional parameter:

-PENAlizedlength allows you to specify the maximum penalized length for any gap in an alignment. For instance, if you specify -PENAlizedlength=20, then all gaps longer than 20 characters will be penalized the same as a gap of length 20. This parameter may be useful, for instance, when you are aligning a cDNA with the corresponding genomic DNA containing large introns.
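The effect of a maximum penalized length can be shown with a small calculation. The Python sketch below assumes the common affine form, a gap creation penalty plus a gap extension penalty multiplied by the gap length; these notes do not state the exact formula the programs use, and the function name and the penalty values 50 and 3 are made up for the example.

```python
def gap_penalty(length, gap_weight, length_weight, penalized_length=None):
    """Affine gap penalty with an optional cap on the length that is penalized.

    Assumed form: gap_weight + length_weight * min(length, penalized_length).
    """
    if length <= 0:
        return 0
    if penalized_length is None:
        counted = length
    else:
        counted = min(length, penalized_length)
    return gap_weight + length_weight * counted


# Hypothetical penalties: creation 50, extension 3, cap at 20 characters.
at_cap = gap_penalty(20, 50, 3, penalized_length=20)
long_capped = gap_penalty(500, 50, 3, penalized_length=20)
long_uncapped = gap_penalty(500, 50, 3)

print(at_cap, long_capped, long_uncapped)
```

With the cap, a 500-character gap costs no more than a 20-character gap, which is why a long intron no longer dominates the alignment score.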

Enhancement: You can create longer alignments than in the previous version. Instead of restricting the amount of computer memory used in the alignment to a fixed size, the programs now allow you to use all available computer memory for longer alignments. As in the previous version, input sequences may not be more than 30,000 sequence characters long.

FrameAlign

Enhancement: This program now supports the following optional parameter:

-BATch allows you to submit the program to the batch queue for processing after the program prompts you for all the required information. (This program previously supported the -BATch parameter, but this support was undocumented.)

Overlap

Enhancement: The maximum length for any of the input query sequences is now 350,000 bp; previously the maximum length was 32,000.

NoOverlap

Enhancement: The maximum length for any input sequence is now 350,000 bp; previously the maximum length was 32,000.

Database Searching

BLAST

Enhancement: When you search a local database, you can use the resulting BLAST output file as input to all Wisconsin Package programs that accept list files. Previously you had to hand edit the output file and delete everything below the list of database hits before you could use it as an input list file to other programs. This feature does not apply to output files from remote BLAST searches.

FastA and TFastA

Enhancement: This release includes newer versions of FastA and TFastA, based on Dr. William Pearson's version 2.0 of FastA. There are now explicit statistical estimates for similarity scores. Each sequence in the output list of best matches is reported along with a normalized z-score and an expectation value. The expectation value indicates how many sequences in the search set you could expect to give a z-score at least as good as the observed score purely by chance.

Enhancement: You are no longer prompted for the number of matches you want reported in the output list file. Instead you are prompted for the maximum expectation value. Matches appear in the output list only if their z-scores are expected less frequently by chance than this value. If you explicitly set an output list size on the command line with the -LIStsize command-line parameter, then you are not prompted for the maximum expectation value.

Enhancement: The final alignments produced by FastA protein searches now allow unlimited gaps. Previously alignments were restricted to a band of 32 residues. To allow unlimited gaps in alignments produced by FastA nucleotide searches and TFastA, use the new -SWalign command-line parameter.

Enhancement: In addition to the new -SWalign command-line parameter mentioned above, these programs now support the following optional parameters:

-MINLength specifies the minimum length of a sequence to be searched in the search set.

-MAXLength specifies the maximum length of a sequence to be searched in the search set.

Enhancement: You can use the output from FastA and TFastA as input to all Wisconsin Package programs that accept list files. Previously you had to specify -NOALIGN on the FastA or TFastA command line to produce a list file other programs could use as input.

Enhancement: The list file created by FastA and TFastA includes the Begin:, End:, and Strand: attributes for each sequence in the list. These attributes indicate the region of each sequence in the search set that was aligned with the query sequence.

Enhancement: The FastA and TFastA output files now include a list of those databases that were searched.

Enhancement: You can save the alignment output in a format that other programs and scripts can easily parse if you use the -MARKx=10 command-line parameter. Programmers and script writers may find this feature useful, but most users of FastA and TFastA can ignore it.

Change: These programs no longer support the -NOINCrease and -SCAle optional parameters.

Change: The default scoring matrix for protein searches in FastA and TFastA is BLOSUM50; previously the default was the PAM250 scoring matrix.

The default scoring matrix for nucleotide sequence searches in FastA has changed slightly. Matches now have a value of +5 and mismatches have a value of -4; previously matches had a value of +4 and mismatches had a value of -3.
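To see what the new nucleotide values mean for a score, consider an ungapped comparison of two short sequences. The Python sketch below is an arithmetic illustration only, not FastA's search algorithm; the function name and the example sequences are hypothetical.

```python
def ungapped_score(first, second, match, mismatch):
    """Sum match and mismatch values over two equal-length sequences."""
    if len(first) != len(second):
        raise ValueError("sequences must have the same length")
    return sum(match if a == b else mismatch for a, b in zip(first, second))


# Seven matches and one mismatch (position 5: A versus T).
new_score = ungapped_score("ACGTACGT", "ACGTTCGT", match=5, mismatch=-4)
old_score = ungapped_score("ACGTACGT", "ACGTTCGT", match=4, mismatch=-3)

print(new_score, old_score)
```

Scores from Version 9 nucleotide searches are therefore not directly comparable with scores from earlier versions.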

Change: By default, FastA and TFastA now determine a rigorous local alignment score for those matches with an initn score above a given threshold. The programs then use these scores as the basis for retaining the best matches. Previously you had to add -OPTall to the program command line to determine the list of best scores in this manner.

WordSearch

Enhancement: The WordSearch output file now includes a list of those databases that were searched.

Segments

Change: Previously, the default gap creation penalty was calculated from the word size chosen in WordSearch, and the default gap extension penalty was hardwired into the program. Now, both gap penalties are determined from the scoring matrix in the same manner as they are for other alignment programs. For more information see "New Scoring Matrices" in the Package-Wide Enhancements section of these release notes.

FrameSearch

Enhancement: You can use the output from this program as input to all Wisconsin Package programs that accept list files. Previously, you had to specify -NOALIgn on the FrameSearch command line to produce a list file that you could use as input to other programs.

Enhancement: The FrameSearch output file now includes a list of those databases that were searched.

Enhancement: If you specify multiple query sequences as input, and you request that the score distribution histogram for each search be written to a figure file, the program now writes a separate figure file for each query sequence. Previously all score distribution histograms were written to a single figure file.

Change: The program plots a score distribution histogram for each search by default. Previously, you had to specify -PLOt, -FIGure, or -PSINClude on the command line to plot the histogram.

FindPatterns

Enhancement: The FindPatterns output file now includes a list of those databases that were searched.

ToBLAST

ToBLAST has been renamed GCGToBLAST.

Multiple Sequence Analysis

PileUp

Enhancement: This program now supports the following optional parameter:

-INSitu allows you to realign a portion of an existing alignment without changing the remainder of the alignment. You specify the portion to realign with the -BEGin and -END command-line parameters.

Enhancement: PileUp can now take into account the Strand: sequence attribute (+ or -) to align each sequence in a list file. As always, you can restrict the range for each sequence in a list file using the Begin: and End: sequence attributes.

Change: When you create a non-end-weighted alignment (the default), the gaps at the ends of each sequence are written as tildes (~). Tildes represent differences in input sequence lengths rather than missing characters. When you create an end-weighted alignment in PileUp by adding -ENDWeight to the command line, gaps at the ends of each sequence are written as periods (.) since those gaps are significant and may represent missing characters in the sequence. For more information see "New Gap Character" in the Package-Wide Enhancements section of these release notes.

LineUp

Change: <Ctrl>H now deletes the character to the left of the cursor; previously it moved the cursor to the beginning of the line.

Pretty

Change: If you use the -CASe command-line parameter, a sequence character is shown in uppercase only when its comparison with the consensus symbol has a value that is at least equal to the threshold specified with -THReshold. Previously, the program showed a sequence character in uppercase when its comparison with the coalition-defining symbol had a value at least equal to this threshold. Since the coalition-defining symbol was not necessarily the consensus symbol, the case of a sequence character was not easily related to the consensus symbol.
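The new -CASe rule can be sketched for a single alignment column. In this Python illustration the compare function is a toy stand-in for a scoring matrix (5 for identical symbols, -4 otherwise), and the names compare and apply_case are hypothetical; how Pretty chooses the consensus symbol is not shown.

```python
def compare(first, second):
    """Toy comparison value (hypothetical): 5 for identical symbols, else -4."""
    return 5 if first.upper() == second.upper() else -4


def apply_case(column, consensus, threshold):
    """Uppercase each character whose comparison with the consensus meets the threshold."""
    return "".join(
        residue.upper() if compare(residue, consensus) >= threshold else residue.lower()
        for residue in column
    )


# One column of four sequences, consensus symbol A, threshold 5.
shown = apply_case("aaga", consensus="A", threshold=5)
print(shown)
```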

Change: If you use the -DIFferences command-line parameter, a sequence character is shown in the alignment when its comparison with the consensus symbol has a value less than the threshold specified with -THReshold. Previously, the program showed a sequence character in the output alignment only when its comparison with the coalition-defining symbol had a value less than this threshold. Since the coalition-defining symbol was not necessarily the same as consensus symbol, the displayed symbols were not easily related to the consensus symbol.

Change: Previously if all of the sequences in a multiple sequence alignment were not of equal length, Pretty padded the shorter sequences at the end with period (.) gap characters to the length of the longest sequence. Now, the program pads the shorter sequences at the end with tilde (~) gap characters to signify that the gaps do not represent missing characters but rather differences in input sequence lengths. For more information see "New Gap Character" in the Package-Wide Enhancements section of these release notes.

Enhancement: Previously the default threshold for consensus calculation was hardwired into the program and may not have been appropriate if you chose an alternate scoring matrix. Now the default threshold is calculated from the symbol comparison values in the scoring matrix. As always, you can override the default threshold with the -THReshold command-line parameter. The threshold is now an integer value; previously it was a floating point number.

PlotSimilarity

Enhancement: This program now considers sequence weights specified in an MSF file, an RSF file, or a list file. The comparison score between any two sequence characters at a position in the alignment is now the comparison value of those characters in the scoring matrix multiplied by the weight of each of the two sequences.
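The weighted comparison score can be written out in a few lines. In the Python sketch below, compare is a toy stand-in for a scoring matrix and the weights are made up. Averaging over all pairs of sequences in a column is an assumption of the sketch; these notes describe only how each pairwise score is weighted.

```python
from itertools import combinations


def compare(first, second):
    """Toy comparison value (hypothetical): 5 for identical symbols, else -4."""
    return 5 if first == second else -4


def weighted_pair_score(first, second, weight_first, weight_second):
    """Comparison value multiplied by the weight of each of the two sequences."""
    return compare(first, second) * weight_first * weight_second


def column_average(column, weights):
    """Average weighted score over all sequence pairs at one alignment position."""
    pairs = list(combinations(range(len(column)), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    total = sum(
        weighted_pair_score(column[i], column[j], weights[i], weights[j])
        for i, j in pairs
    )
    return total / len(pairs)


pair = weighted_pair_score("A", "A", 1.0, 0.5)
average = column_average("AAG", [1.0, 0.5, 2.0])
print(pair, average)
```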

Enhancement: This program now supports the following optional parameters:

-OUTfile writes an output file with the average similarity value at each position in the alignment.

-NOPLOt suppresses the plot of the average similarity value at each position in the alignment.

-CMASK writes a grayscale colormask file according to the average similarity value at each position in the alignment. This can be used to shade each column of the alignment in the Editor mode of SeqLab, where darker regions represent regions of high conservation and lighter regions represent regions of low conservation.

Change: When you specify multiple sequences as input in response to the program prompt, you are no longer prompted for the sequence range. You can still modify the sequence range with the -BEGin and -END command-line parameters.

ProfileMake

Enhancement: This program now accepts up to 5,000 sequences as input. The previous limit was 100 sequences.

ProfileSearch

Enhancement: This program can now search up to 100,000 protein sequences or 50,000 nucleotide sequences. The previous limit was 60,000 protein sequences or 30,000 nucleotide sequences.

ProfileSegments

Enhancement: This program now supports the following optional parameter:

-MSF writes an MSF (multiple sequence format) file with all of the input sequences aligned to each other and to the profile consensus sequence.

ProfileGap

Enhancement: This program can now accept multiple sequence input, such as list files, MSF or RSF files, or specifications using the * wildcard character. If you specify multiple sequences as input, the output file contains a separate alignment of each sequence to the input profile.

Enhancement: This program now supports the following optional parameter:

-MSF writes an MSF (multiple sequence format) file with all of the input sequences aligned to each other and to the profile consensus sequence.

Evolutionary Analysis

Distances

Enhancement: You can now calculate a matrix of the pairwise distances between aligned nucleotide sequences using the method of Tamura.

Enhancement: This program now accepts sequences of unequal length as input. For each pairwise comparison between the sequences, the shorter sequence is treated as though it was padded at the end with gap characters to the length of the longer sequence.
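The padding described above amounts to the following Python sketch. The function name pad_pair is hypothetical, and the choice of the period as the padding character is an assumption, since these notes do not say which gap character Distances uses internally.

```python
def pad_pair(first, second, gap="."):
    """Pad the shorter of two sequences at the end to the length of the longer."""
    width = max(len(first), len(second))
    return first.ljust(width, gap), second.ljust(width, gap)


padded = pad_pair("ACGTAC", "ACG")
print(padded)
```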

NewDiverge

Change: This program has been renamed Diverge. The program formerly known as Diverge is no longer supported.

Enhancement: This program now accepts sequences of unequal length as input. Each input sequence is treated as though it was padded at the end with gap characters to the length of the longest input sequence. Since the program ignores codon pairs that contain gaps when calculating the synonymous and nonsynonymous codon substitution statistics, the padding will not affect those calculations.

Enhancement: This program can now accept multiple sequence input, such as list files, MSF or RSF files, or specifications using the * wildcard character. If multiple sequences are specified in a list file, you can specify the range and strand for each sequence with the Begin:, End:, and Strand: sequence attributes.

Enhancement: This program now supports the following optional parameters:

-TOFiles writes two additional output files when you specify at least three sequences as input to the program. One additional output file has a .ks file extension and contains a matrix of the estimated number of synonymous substitutions between each pair of input sequences. The other additional output file has a .ka file extension and contains a matrix of the estimated number of nonsynonymous substitutions between each pair of input sequences. You can use either of these additional matrix files as input to the GrowTree program.

Pattern Recognition

Repeat

Enhancement: Previously, the default minimum stringency was hardwired into the program and may not have been appropriate if you chose an alternate scoring matrix. Now, the default minimum stringency is calculated from the symbol comparison values in the scoring matrix. As always, you can override the default minimum stringency in response to the program prompt. The minimum stringency is now an integer value; previously it was a floating point number.

Enhancement: Previously, the match display threshold was hardwired into the program and may not always have been appropriate if you chose an alternate scoring matrix. Now, the default match display threshold is calculated from the symbol comparison values in the scoring matrix. As always, you can override the default match display threshold with the -PAIr command-line parameter.

RNA Secondary Structure

MFold

Enhancement: You are no longer limited to selecting the region to fold from the first 10,000 bases of the input sequence; you can select a region of up to 1,400 bases from any part of the sequence.

StemLoop

Enhancement: Previously, the default minimum number of bonds per stem was not calculated from the values in the scoring matrix and may not have been appropriate if you chose an alternate scoring matrix. Now, the default minimum number of bonds per stem is calculated from the symbol comparison values in the scoring matrix. As always, you can override the default value with the -BONds command-line parameter or by typing a different value in response to the program prompt.

Enhancement: Previously, the match display threshold was hardwired into the program and may not have been appropriate if you chose an alternate scoring matrix. Now, the default match display threshold is calculated from the symbol comparison values in the scoring matrix. As always, you can override the default match display threshold with the -PAIr command-line parameter.

Sequence Exchange

Reformat

Enhancement: Reformat now accepts input from stdin if you specify -INfile=- on the command line. If the stdin input does not contain a heading that is separated from the sequence by a line containing two dots (..), then add -NOHEAding to the Reformat command line.

Enhancement: This program now supports the following optional parameters:

-RSF allows you to reformat one or more sequences into a new RSF file. For more information see "New File Format: RSF Files" in the Package-Wide Enhancements section of these release notes.

Several new optional parameters are concerned with reformatting scoring matrices:

-OLDCMPformat converts a pre-Version 9 triangular scoring matrix (containing floating point values) to the rectangular scoring matrix format (containing integer values) that you can use in Version 9 of the Wisconsin Package. By default, when you use this parameter, each floating point value in the input matrix is first multiplied by 10 and then rounded to the nearest integer in the output matrix.

-SCAle, when used with either -COMParison or -OLDCMPformat, allows you to scale each value in the scoring matrix by a constant value. For instance, -SCAle=5 creates an output scoring matrix in which each comparison value is fivefold greater than in the input matrix.

-EQUALSformat, when used with either -COMParison or -OLDCMPformat, converts a scoring matrix to a format that is less compact but that some find easier to read. Any program that reads scoring matrices can read this equals format file.

-GAPweight, when used with either -COMParison or -OLDCMPformat, allows you to specify the default gap creation penalty that will be associated with the reformatted scoring matrix. For more information see "New Scoring Matrices" in the Package-Wide Enhancements section of these release notes.

-LENgthweight, when used with either -COMParison or -OLDCMPformat, allows you to specify the default gap extension penalty that will be associated with the reformatted scoring matrix. For more information see "New Scoring Matrices" in the Package-Wide Enhancements section of these release notes.

-PROtein or -NUCleotide, when used with either -COMParison or -OLDCMPformat, allows you to specify the type of the reformatted scoring matrix. For more information see "File Typing in Version 9" in the Package-Wide Enhancements section of these release notes.
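The default -OLDCMPformat conversion (multiply each value by 10, round to the nearest integer, and write a rectangular matrix) can be sketched in Python. This is an illustration under stated assumptions, not Reformat's implementation: the function name to_rectangular, the dictionary representation of the triangular matrix, and the example values are hypothetical, and the way exact halves are rounded in the real program is not specified in these notes.

```python
def to_rectangular(triangle, factor=10):
    """Expand a triangular float matrix into a symmetric integer matrix.

    `triangle` maps (symbol, symbol) pairs to floating point values and
    holds each unordered pair once. Pairs that are absent stay 0.
    """
    symbols = sorted({symbol for pair in triangle for symbol in pair})
    index = {symbol: position for position, symbol in enumerate(symbols)}
    matrix = [[0] * len(symbols) for _ in symbols]
    for (first, second), value in triangle.items():
        # Python's round() sends exact halves to the nearest even integer;
        # the real program's tie handling is unknown.
        scaled = round(value * factor)
        matrix[index[first]][index[second]] = scaled
        matrix[index[second]][index[first]] = scaled
    return symbols, matrix


# Hypothetical two-symbol triangular matrix with floating point values.
old_matrix = {("A", "A"): 1.5, ("A", "C"): -0.3, ("C", "C"): 0.9}
symbols, matrix = to_rectangular(old_matrix)
print(symbols, matrix)
```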

FromFastA

Enhancement: The default output filename extension is now .pep for protein sequences and .seq for nucleotide sequences. Previously, the output files were written with no filename extension by default.

Protein Analysis

ProfileScan

Enhancement: The output files now contain information about only those profiles that match the query sequence; previously the program wrote information about every profile searched.

Enhancement: When a match is found to a profile derived from a motif defined in the PROSITE Dictionary of Protein Sites and Patterns, the corresponding PROSITE abstract is now written to the .scan output file along with the alignment between the query sequence and the profile. You can suppress writing the PROSITE abstract with the new -NOREFerence command-line parameter.

Enhancement: In addition to the new -NOREFerence parameter, this program now supports the following optional parameter:

-BATch allows you to submit the program to the batch queue for processing after the program prompts you for all the required information.

Manipulation

CompTable

Change: Because all scoring matrices are now assigned a nucleotide or protein type (see "File Typing in Version 9" in the Package-Wide Enhancements section of these release notes), you must now indicate the type of scoring matrix you are creating when running this program. To do this, respond to the new program prompt or add either -PROtein or -NUCleotide to the program command line.

Change: This program calculates default gap creation and extension penalties from the symbol comparison values in the scoring matrix and writes them in an auxiliary data block in the output matrix file. (For more information see "New Scoring Matrices" in the Package-Wide Enhancements section of these release notes.) When you use the output matrix file with other programs, you can override the default values with the -GAPweight and -LENgthweight command-line parameters.

Display

Red

Enhancement: Red now supports the following optional parameter:

-A4 moves all left margins to the left 9/72 inch and raises all top and bottom margins up by 24/72 inch. This command centers documents on A4 paper without changing their pagination or filling in any way.

Package-Wide Enhancements

The changes in this section describe conditions and situations that can affect the Package as a whole.

Program Name Changes

- ToBLAST is renamed GCGToBlast.

- NewDiverge is renamed Diverge. The program formerly named Diverge is no longer supported.

- WPI, the graphical user interface to the Wisconsin Package, is enhanced and renamed SeqLab. For more information see the "SeqLab, the Successor to WPI" section in these release notes.

Commands No Longer Supported

GCG no longer supports the HuntFor command, which was used to find the sequence name(s) that correspond to any accession number in GenBank. You can use the LookUp program to perform the same function.

New Gap Character

In addition to the existing period (.) gap character, the Wisconsin Package now supports a new gap sequence character, the tilde (~). Programs in the Wisconsin Package run from the command line or from the Main List mode of SeqLab treat the two gap characters identically in input sequences. Programs in the Wisconsin Package run from the Editor mode of SeqLab remove any tilde gap characters from the right end of each input sequence before performing their analyses.

In the future, programs run from either the command line or from SeqLab may differentiate the two gap characters in their analyses. The period gap character will increasingly be used as a space holder that may represent a missing character in a sequence. For example, the period gap character may represent a missed base call in a contig alignment in fragment assembly. The tilde gap character will increasingly be used as a simple place holder that never represents an actual character in a sequence. For example, two tildes may be used in a translated sequence to align each codon in a nucleotide sequence with its corresponding single-letter amino acid symbol. As another example, gaps at the ends of sequences in an alignment may be written as tildes when those gaps are due to differences in input sequence lengths rather than missing characters in the input sequences. See Appendix III in the Program Manual for a list of all supported GCG sequence characters.
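The gap-character handling described in this section can be illustrated with a short Python sketch. This is not GCG code. The function names are hypothetical, and the sketch models only the behavior stated in these notes: both characters are treated identically from the command line, and trailing tildes are removed in the Editor mode of SeqLab.

```python
GAP_CHARS = ".~"  # period and tilde, the two gap characters named in these notes


def is_gap(symbol: str) -> bool:
    """Command-line / Main List behavior: both characters count as gaps."""
    return symbol in GAP_CHARS


def strip_trailing_tildes(sequence: str) -> str:
    """Editor-mode behavior: remove tilde gaps from the right end only."""
    return sequence.rstrip("~")


def ungapped_length(sequence: str) -> int:
    """Count the symbols that are not gap characters."""
    return sum(1 for symbol in sequence if not is_gap(symbol))
```

Note that leading tildes and internal periods are left in place by strip_trailing_tildes; only the right end of the sequence is trimmed.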

The Plus Symbol (+) Is No Longer a Valid Sequence Character

The Wisconsin Package no longer supports the plus symbol (+) as a valid sequence character. Analysis programs will not recognize existing sequences containing plus symbols as valid GCG sequences. You can use the Reformat program to remove the plus symbols from any input sequence. If you want to replace the plus symbols in existing sequences with another valid sequence character, you must manually edit the sequences to make the substitutions and then reformat the sequences with the Reformat program. See Appendix III in the Program Manual for a list of all supported GCG sequence characters.

FastA-Format User Sequences

Sequence analysis programs in the Wisconsin Package now accept input sequences from files in FastA format when you add -FASTA to the program command line. Alternatively, you can use the global switch % seqformat fasta to automatically set the programs to accept sequences from files in FastA format. Warning: If the FastA-format sequence file contains multiple sequences, only the first one is read by the analysis program.
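To make the warning about multi-sequence files concrete, here is a minimal Python sketch of a reader that returns only the first record of a FastA-format file. It is an illustration, not the Wisconsin Package reader. The function name is hypothetical, and the sketch assumes the usual FastA convention that each record begins with a header line starting with >.

```python
from typing import Iterable, Tuple


def read_first_fasta_record(lines: Iterable[str]) -> Tuple[str, str]:
    """Return (header, sequence) for the first record; later records are ignored."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record begins here, so stop reading
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    if header is None:
        raise ValueError("no FastA header line found")
    return header, "".join(chunks)
```

If you need every sequence in such a file analyzed, you would have to split the file into single-sequence files first.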

File Typing in Version 9

Many of the output files created by Wisconsin Package programs in Version 9 will indicate the file type on the top line of the file. This line begins with two exclamation points (!!) and is followed by text specifying the type of data in the file and the version number of the file. For example, an individual sequence file created in Version 9 will display either

!!NA_SEQUENCE 1.0 or !!AA_SEQUENCE 1.0

as the first line of the file. The file type must remain the first line of the file and you should not alter it in any way. Files created without file types before Version 9 will work in Version 9 of the Wisconsin Package. File formats new to Version 9, like RSF files, are required to have file types.

Many of the data files used by Wisconsin Package programs in Version 9 will similarly contain file types on the top line of the file. As with sequence file types, data file types must remain the first line of the file, and you should not alter them in any way. SeqLab may not recognize data files created without file types before Version 9 of the Wisconsin Package. The new scoring matrix file format in Version 9 is required to have a file type as the first line in the file to be recognized by any Wisconsin Package program.
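As an illustration of the file-type line, the following Python sketch reads the type and version from the first line of a file. The function name is hypothetical, and the sketch assumes the type name and version number are separated by white space, as in the !!NA_SEQUENCE 1.0 example above.

```python
from typing import Optional, Tuple


def parse_file_type(first_line: str) -> Optional[Tuple[str, str]]:
    """Return (type_name, version) for a '!!' file-type line, else None."""
    if not first_line.startswith("!!"):
        return None  # for example, a file created before Version 9
    fields = first_line[2:].split()
    if len(fields) < 2:
        return None  # no version number follows the type name
    return fields[0], fields[1]
```

A result of None corresponds to a file without a file type, which sequence-reading programs still accept but which SeqLab may not recognize for data files.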

New Scoring Matrices

BLOSUM Matrices

The default scoring matrix for most protein sequence comparisons is now the BLOSUM62 scoring matrix. A complete series of BLOSUM matrices is provided in the GenMoreData directory. The new command-line parameter -MATRix allows you to specify an alternate scoring matrix to any program that reads a scoring matrix. If the name of the matrix you provide after this parameter does not contain any directory specification, then the program searches for this matrix first in your local directory, then in the directory with the logical name MyData, then in the GenMoreData directory, and finally in the GenRunData directory. For instance, if you want to use the alternate BLOSUM45 matrix GCG supplies in the GenMoreData directory with a particular program, you need only specify -MATRix=blosum45.cmp on the command line.
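The search order for a matrix name given without a directory specification can be sketched in Python as follows. The function name is hypothetical, and the caller supplies the directories in search order; how the logical names MyData, GenMoreData, and GenRunData resolve to actual paths on your system is not modeled here.

```python
import os
from typing import Optional, Sequence


def find_matrix(name: str, search_dirs: Sequence[str]) -> Optional[str]:
    """Return the first existing path for name, trying search_dirs in order."""
    if os.path.dirname(name):
        # A directory specification was given, so no search takes place.
        return name if os.path.isfile(name) else None
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With search_dirs listed as local directory, MyData, GenMoreData, GenRunData, a copy of blosum45.cmp in your local directory takes precedence over the one GCG supplies.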

New Scoring Matrix Format

The format and content of scoring matrices is changed in Version 9 of the Wisconsin Package. To see an example of the new default scoring matrix format, copy a representative scoring matrix to your local directory by typing % fetch blosum45.cmp and then view the contents of the file with any text editor. In Version 9, the scoring matrix in the data file is rectangular; previously the scoring matrix was triangular. Also, the values in the scoring matrix are now integers; previously the values were floating point numbers. These changes make the format and content of scoring matrices provided by the Wisconsin Package more similar to scoring matrices provided by others.
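The difference between the triangular and rectangular layouts can be pictured with the Python sketch below. It is only an illustration and does not reproduce what Reformat does. The function name is hypothetical, the input is assumed to be lower-triangular, and simple rounding to the nearest integer is an assumption; the actual conversion may scale the values differently.

```python
from typing import List


def triangular_to_rectangular(rows: List[List[float]]) -> List[List[int]]:
    """Expand a lower-triangular matrix (row i holds i + 1 values) to a full square."""
    n = len(rows)
    full = [[0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i} should hold {i + 1} values")
        for j, value in enumerate(row):
            # Mirror each value across the diagonal; rounding is an assumption.
            full[i][j] = full[j][i] = round(value)
    return full
```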

You can convert an old-style scoring matrix to the format required for Version 9 with % reformat -OLDCMPformat. See the Reformat notes in the Changes to Existing Programs section of these release notes for a listing of other new command-line parameters that affect the reformatting of scoring matrices.

The very top line of the scoring matrix file is the file type. (For more information see "File Typing in Version 9" in this section of these release notes.)

In Version 9, each scoring matrix can optionally specify its own default gap creation and extension penalties in an auxiliary data block. Just like the symbol comparison values in the scoring matrix, these penalties are now integers. To see the format of the auxiliary data block, look at the representative matrix you've already fetched. Any program that requires gap penalties will use the defaults found in the auxiliary data block. If optional default gap penalties are not specified for a scoring matrix, any program that requires gap penalties will calculate defaults from the symbol comparison values in the matrix. As always, you can override the default gap creation and extension penalties in response to the program prompts or on the program command line with the -GAPweight and -LENgthweight command-line parameters.

Changes to Alignment Display

In Version 9 of the Wisconsin Package, default thresholds for the placement of the symbols . : and | between paired characters in an alignment display are calculated from the values in the scoring matrix used to create the alignment. Previously, these values were hardwired into each program and may not always have been appropriate for alternate matrices you chose to use. As always, you can override the default display thresholds with the -PAIr command-line parameter. The pair display thresholds are now usually listed in the alignment output.

BLAST-Format Scoring Matrices

Using the -MATRix command-line parameter, you can specify BLAST-format scoring matrices as alternate scoring matrices in Wisconsin Package programs. However, you cannot specify default gap penalties in an auxiliary data block in a BLAST-format scoring matrix file. Any program that reads a BLAST-format scoring matrix will calculate its own default gap penalties from the symbol comparison values in the matrix. To convert a native BLAST-format scoring matrix to the standard format used by the Wisconsin Package in Version 9, use % reformat -COMParison.

Translation Tables

The alternate translation tables provided with the Wisconsin Package in the GenMoreData directory have been renamed and supplemented with additional tables. See Appendix VII in the Program Manual for a complete list of the tables provided.

New Graphics Formats

You can initialize your graphics configuration to create color EPS file output from Wisconsin Package graphics programs. At the system prompt, type % postscript CEPSF. When the computer prompts you for the name of the port to which your CEPSF (Color Encapsulated PostScript Format) device is connected, respond with the name of the file you want to contain the color EPS instructions.

In addition, you can initialize your graphics configuration to create GIF (Graphics Interchange Format(c)) file output from Wisconsin Package graphics programs. At the system prompt, type % gif. The computer prompts you to select either GIF87a or GIF89a (GIF89a is a newer GIF version with extensions that are not supported by all GIF viewers), the name of the GIF output file, and the graphics image width and height.

For Version 9.0, GIF is an optional graphics driver sold separately from the Wisconsin Package. The Graphics Interchange Format is the Copyright property of CompuServe Incorporated. GIF is the Service Mark property of CompuServe Incorporated. The GIF-LZW compression software is licensed under U.S. Patent 4,558,302 and foreign counterparts.

X Windows Graphics Output

Previously, you were unable to change the background color of the X Windows graphics window on a monochrome display. Now if you type

% xwindows mono window_name bgcolor

the graphics window background on a monochrome display will switch from its default color (either white or black) to the opposite color.

Using Tilde (~) in Specifying Directory Paths

You can now use the tilde (~) when you specify directory paths in Wisconsin Package programs. For instance, % map ~/test.seq finds the file test.seq in your login directory. The tilde is a standard UNIX character that, when used at the beginning of a filename, refers to a home directory.
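For readers unfamiliar with the convention, the following Python sketch shows the same kind of tilde expansion using the standard library. It illustrates the UNIX convention only and is not how the Wisconsin Package implements it; the function name is hypothetical.

```python
import os


def expand_home(path: str) -> str:
    """Expand a leading ~ to the home directory; other paths are returned unchanged."""
    return os.path.expanduser(path)
```

A tilde that is not at the beginning of the path, as in data/~test.seq, is left as it is.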

Specifying Graphics Output Filenames

When using SetPlot or graphics configuration commands, you can now specify filenames that change according to the program, time of day, username, or hostname of the computer. For example, the following command will cause plotting programs to create PostScript files that are named for the program.

    % postscript laserwriter '$program$.ps'

The new tokens include:

    token      generates filename with
    ---------  -----------------------------
    $program$  the name of the program (e.g. mapplot)
    $host$     the name of the computer
    $user$     the name of the user running the program
    $time$     the time of day in numeric format

In addition, you can use two or more of these tokens together to create unique files, for example

   % postscript laserwriter '$program$-$time$.ps'
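The token substitution can be pictured with the Python sketch below. It is an illustration only. The function names are hypothetical, and the HHMMSS layout used for $time$ is an assumption, since these notes say only that the time is written in numeric format.

```python
import getpass
import socket
import time
from typing import Dict


def default_tokens(program: str) -> Dict[str, str]:
    """Collect values for the four tokens from the running environment."""
    return {
        "program": program,
        "host": socket.gethostname(),
        "user": getpass.getuser(),
        "time": time.strftime("%H%M%S"),  # assumed numeric format
    }


def expand_tokens(template: str, tokens: Dict[str, str]) -> str:
    """Replace each $name$ token in template with its value."""
    for name, value in tokens.items():
        template = template.replace(f"${name}$", value)
    return template
```

For example, expand_tokens('$program$-$time$.ps', default_tokens('mapplot')) would produce a name that begins with mapplot- and ends with .ps.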

New File Format: RSF Files

A Rich Sequence Format (RSF) file contains one or more sequences that may or may not be related. In addition to the sequence data, each sequence can be annotated with the following descriptive sequence information:

- Creator/author of the sequence

- Sequence weight

- Creation date

- One-line description of the sequence

- Offset, or the number of leading gaps in a sequence that is part of an alignment or fragment assembly project

- Known sequence features

RSF files are useful within SeqLab, the graphical user interface to the Wisconsin Package. Because they store positional information, you can display RSF files within SeqLab's Editor to view and edit sequence alignments and features. The features annotation allows you to graphically view and align sequences based on features as well as run programs on sequence regions selected by features. You also will find RSF files useful for distributing sequences to colleagues, since these files contain each sequence's data and descriptive information. For more information on RSF files, see "Using Rich Sequence Format (RSF) Files" in Chapter 2, Using Sequence Files and Databases of the User's Guide.

Documentation Enhancements

The changes in this section describe enhancements to the Wisconsin Package documentation.

Program Manual Reorganization

The Program Manual has grown to two manuals. The programs are now organized alphabetically, instead of grouped in functional sections. For those unfamiliar with the functionality the Package offers, a tabbed division titled "Programs by Function" contains a table of the programs grouped within the following functions:

Comparison - Pairwise or Multiple    Database Searching - Reference or Sequence
Editing and Publication              Evolution
Fragment Assembly                    Gene Finding and Pattern Recognition
Importing/Exporting                  Mapping
Primer Selection                     Protein Analysis
RNA Secondary Structure              Translation
Utilities

Some programs appear within multiple functional categories.

Online Help

Online help for the Wisconsin Package now includes the User's Guide as well as the Program Manual. In addition, the online help has been converted to HTML, and typing % genhelp or % genmanual now displays the text-only browser Lynx for navigating between topics and links. To use a different browser, such as Netscape, use the documentation URL in the banner that displays on your screen when you initialize the Package. As was available with the previous version of online help, you will still be able to navigate to a specific online help topic with a command like % genhelp map. For more information, see the "Improved Online Help" document accompanying this release.

Program Documentation Removed from the Program Manual

To help focus the Program Manual on sequence analysis, we have removed the documentation for a number of utility programs: Fonts, EchoKey, FileCheck, Examine, Count, Crypt, and GetText. Only the documentation has been removed; these utilities are still available for use.

In addition, the DBIndex program has moved from the Program Manual to the Database Utilities chapter of the System Support Manual.

Data Files Manual Discontinued

The Data Files manual is not available in Version 9. Its information instead has been condensed and incorporated into the Program Manual as Appendix VII.

Bug Fixes

You will find the following information in this section:

- Program Bug Fixes

- Package-Wide Bug Fixes

Program Bug Fixes

The following notes describe problems and errors in existing programs that are corrected in Version 9.0 of the Wisconsin Package.

Editing

SeqEd

Problem: You were unable to enter the tilde (~) to indicate NOT syntax when you specified a pattern to find in the sequence you were editing.

Update: You can enter the tilde (~) to indicate NOT syntax when you specify a pattern to find in the sequence you are editing.

Mapping

Map

Problem: The reading frames for the forward strand protein translations were not changed if you specified a beginning position other than the first position in the sequence. For instance, frame a always began at the first position in the sequence rather than at the beginning of the selected range. Similarly, reading frames for the reverse strand protein translations were not changed if you specified an ending position other than the last position in the sequence. For instance, frame f always began at the last position in the sequence rather than at the ending of the selected range.

Update: If you specify a beginning position other than the first position in the sequence, the forward strand reading frames are changed so that the a frame starts at the beginning of the selected range. If you specify an ending position other than the last position in the sequence, the reverse strand reading frames are changed so that the f frame starts at the end of the selected range.
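The corrected frame anchoring can be illustrated with a small Python sketch. This is not Map's code. The function names are hypothetical, positions are 1-based and inclusive, and only frames a and f are modeled because they are the two frames these notes describe.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def _codons(segment: str) -> List[str]:
    """Split a segment into complete codons, starting at its first symbol."""
    return [segment[i:i + 3] for i in range(0, len(segment) - 2, 3)]


def frame_a_codons(sequence: str, begin: int, end: int) -> List[str]:
    """Forward frame a: codons start at the beginning of the selected range."""
    return _codons(sequence[begin - 1:end])


def frame_f_codons(sequence: str, begin: int, end: int) -> List[str]:
    """Reverse frame f: codons start at the end of the selected range."""
    segment = sequence[begin - 1:end]
    return _codons(segment.translate(_COMPLEMENT)[::-1])
```

With the sequence AATGGCCTAA and a range of 2 to 10, frame a now reads ATG, GCC, TAA. Under the old behavior the frame was anchored at position 1, so the codons would have been read as AAT, GGC, CTA.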

MapSort

Problem: Unless you added the undocumented -PROtein command-line parameter, the program recognized all input sequences as nucleic acids.

Update: The program determines the sequence type from the Type: field of the input sequence (see Appendix VI in the Program Manual) in the same way the sequence type is determined by other Wisconsin Package analysis programs.

Problem: If you looked for silent restriction sites by adding -SILent to the command line, and you specified a sequence range that did not extend to the end of the sequence, the program sometimes crashed if a silent restriction site was identified at the very end of the sequence range.

Update: The program identifies silent restriction sites at all appropriate sites in the sequence without crashing.

Prime

Problem: In rare cases, Prime calculated incorrect annealing scores between two primers. Also in rare cases, Prime calculated incorrect annealing scores between a primer and possible false priming sites on the template sequence when you specified either -ALLANNEALTemplate or -ENDANNEALTemplate on the command line. Because of this, some primers were incorrectly rejected from consideration and others might not have been ranked at their appropriate positions in the output list.

Update: Prime correctly calculates annealing scores between two primers or between a primer and possible false priming sites on the template sequence.

Problem: If you specified an input file of primer sequences with the -PRImers command-line parameter, and you also specified either -ALLANNEALTemplate or -ENDANNEALTemplate on the command line, Prime sometimes calculated incorrect annealing scores between the primers and possible false priming sites on the template sequence. Because of this, some primers were incorrectly rejected from consideration and others may not have been ranked at their appropriate positions in the output list.

Update: Prime correctly calculates annealing scores when you specify an input file of primer sequences and you also specify either -ALLANNEALTemplate or -ENDANNEALTemplate on the command line.

Problem: If you chose an alternative output filename and you specified -BATch on the command line, the output filename you chose was ignored and the output file was given the default name.

Update: Prime writes its output to the file you name when you specify -BATch on the command line.

Comparison

BestFit and Gap

Problem: If you aligned sequences contained in an MSF file, the output listed the name of the MSF file and not the names of the sequences.

Update: If you align sequences in an MSF file, the output lists the names of the sequences.

FrameAlign

Problem: If either input sequence was contained in an MSF file, the output alignment listed the name of the MSF file instead of the name of the sequence within the MSF file.

Update: The output alignment lists the name of the sequence within the MSF file.

Overlap

Problem: If you specified a group of sequences within an MSF file as input, each overlap listed the name of the MSF file instead of the names of the overlapping sequences.

Update: Each overlap lists the names of the overlapping sequences within the MSF file.

Database Searching

BLAST

Problem: If you specified a database to search on the command line with an expression like -INfile2=MyDir:nuc and a database of the same name, but in a different directory, was also found in one of the database menu files (blast.rdbs, blast.ldbs, or blast.sdbs), the database in the menu file was searched.

Update: The database you specify on the command line is searched.

Known Problem: With the increasing sizes of the databases, it is possible that you will not have enough memory to run a local BLAST search. We have already seen this happen on some machines when a nucleotide sequence is used to search GenEMBL. The program terminates before the search begins and the program output file contains an "out of memory" error message.

Possible Work-arounds: Increase the limits on your account by typing % unlimit before running BLAST. This may help if the query sequence is short. If this doesn't work, search each strand of a nucleotide query sequence separately using the commands % blast -TOPstrand and % blast -BOTtomstrand. If this doesn't work, speak to your system manager about either partitioning the large BLAST database into two or more smaller BLAST databases or increasing swap space.

LookUp

Problem: If you specified a query on the command line without a parameter (e.g. % lookup Smithies), the program stopped after complaining that no valid libraries exist.

Update: If you specify a query on the command line without a parameter, the program behaves as if you had used the -ALLtext parameter with the query and proceeds normally.

Problem: If your output list file contained comments exactly eighty characters in length, you could have problems if you tried to use the file as input to other Wisconsin Package programs. If you tried to do so, all sequence entries following those with long comment lines were skipped. Also, the long comment lines may have been missing a character at the end of the line.

Update: If your output list file contains comments exactly eighty characters in length, you can use the file as input to other Wisconsin Package programs without problem. Also, the long comment lines are no longer missing any part of the comment.

Problem: If you searched for sequences in SWISS-PROT, the program occasionally crashed.

Update: The program finds the appropriate matches in SWISS-PROT to your query and completes normally.

Problem: If you searched for some authors associated with sequence entries in SWISS-PROT, no matching entries were found.

Update: You can search for any authors associated with sequence entries in SWISS-PROT and the appropriate matching entries are reported.

Problem: If you searched for feature names containing apostrophes (e.g. 5'UTR) in GenBank, the program displayed a message claiming a syntax error in the feature name, and matching entries were not found.

Update: You can search for feature names containing apostrophes in GenBank, and the program reports the appropriate matching entries.

Problem: If you searched PIR and selected fragment output by adding -FRAgments to the command line, the program crashed.

Update: The program stops normally after displaying a message reporting that fragment output from PIR is currently unavailable.

FastA and TFastA

Problem: If you added -PAMfactor to the command line when searching with a nucleic acid query sequence, the command-line parameter was ignored. Instead of using a scoring matrix for the calculation of initial diagonal scores as you requested, the program used a constant factor for each match.

Update: When you add -PAMfactor to the command line in a nucleic acid search, the program uses a scoring matrix for the calculation of initial diagonal scores.

Problem: If either the query or matching search set sequence contained more than six letters, only the first six were displayed to the left of the sequences in the alignment output.

Update: There is now space for up to twelve letters in the sequence names to the left of the sequences in the alignment output.

FrameSearch

Problem: If you specified a search set that contained no valid sequences, and you added -Default to the command line, the program automatically substituted SwissProt:* (for a nucleotide query) or EST:* (for a protein query) as the search set.

Update: If you specify a search set containing no valid sequences and you add -Default to the command line, the program displays an error message and stops.

Problem: The program occasionally miscalculated the "Percent similarity" reported for each alignment when you specified an identity threshold with the -PAIr command-line parameter.

Update: The program correctly calculates the "Percent similarity" reported for each alignment under all circumstances.

Problem: If you ran FrameSearch with -BATch on the command line, and you specified multiple query sequences as input, the program used the sequence length of the first query sequence as the ending position for all subsequent query sequences.

Update: If you specify multiple query sequences as input, the program uses the entire length of each sequence in the search (unless you specify -BEGin or -END on the command line).

FindPatterns

Problem: If you allowed mismatches between the pattern and sequence by specifying -MISmatch on the command line, and a mismatch occurred in that portion of the pattern containing OR matching, the program sometimes missed finding all the appropriate matches.

Update: All appropriate matches are found when you allow mismatches and your pattern contains OR matching.

Problem: If you specified -PERFect on the command line in a search of nucleotide sequences, only the forward (top) strand of each nucleotide sequence was searched for matches to each pattern.

Update: Both strands of each nucleotide sequence are searched for perfect (non-ambiguous) matches to each pattern.

Problem: If no matches were found, then no output file was created.

Update: If no matches are found, then an output file is written indicating this result.

Problem: If you specified patterns containing spaces on the command line using the -PATterns command-line parameter, no matches were reported. However, if you specified patterns containing spaces in response to the program prompt, the spaces were automatically removed before searching and the appropriate matches were reported.

Update: If you specify patterns containing spaces on the command line, the spaces are removed before searching and the appropriate matches are reported.

Problem: If you searched for patterns in sequences contained in an MSF file, the output listed the name of the MSF file and not the names of the sequences in which the patterns were found.

Update: The output lists the names of the sequences within the MSF file in which the patterns are found.

StringSearch

Problem: If you entered a character pattern consisting of two words separated by more than one space, and you added -BATch to the command line, the program removed all but one of the spaces separating the words before searching for the pattern.

Update: StringSearch no longer removes any spaces between words in a character pattern when you add -BATch to the command line.

Problem: If you created a list file of sequence names with LookUp, and then used that list as input to StringSearch for a definitions search, StringSearch was unable to find matches to any text patterns.

Update: You can use a LookUp output file as input to StringSearch for a definitions search, and appropriate matches to the text patterns you specify are found.

GCGToBLAST (formerly ToBLAST)

Problem: If you tried to create a BLAST-searchable database from nucleic acid sequences containing X sequence symbols, GCGToBLAST complained that X was an invalid nucleic acid code because BLAST does not recognize the X as a nucleic acid ambiguity code.

Update: GCGToBLAST converts each occurrence of X (or x) in nucleotide sequences into an N (or n) nucleic acid ambiguity symbol in the BLAST-searchable database.
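The substitution can be sketched in a few lines of Python. This is an illustration of the case-preserving conversion stated in the update, not GCGToBLAST's own code, and the function name is hypothetical.

```python
_X_TO_N = str.maketrans({"X": "N", "x": "n"})


def replace_x_with_n(nucleotides: str) -> str:
    """Convert X to N and x to n, leaving every other symbol unchanged."""
    return nucleotides.translate(_X_TO_N)
```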

Multiple Sequence Analysis

PileUp

Problem: You could not increase the combined length of all gaps that could be added to each sequence in the alignment with the -MAXGap command-line parameter unless you also decreased the maximum segment length of each input sequence with the -MAXSeg command-line parameter.

Update: When you increase the combined length of all gaps that can be added to each sequence in the alignment with the -MAXGap command-line parameter, the maximum segment length of each input sequence is automatically reduced so that the sum of the maximum segment length and the maximum gap length is equal to 7,000.
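The relationship between the two limits amounts to simple arithmetic, sketched here in Python. The function name is hypothetical; the only value taken from these notes is the combined total of 7,000.

```python
COMBINED_LIMIT = 7000  # maximum segment length plus maximum gap length


def max_segment_length(max_gap: int) -> int:
    """Return the segment length left over once max_gap is reserved for gaps."""
    if not 0 <= max_gap <= COMBINED_LIMIT:
        raise ValueError(f"max_gap must be between 0 and {COMBINED_LIMIT}")
    return COMBINED_LIMIT - max_gap
```

For example, allowing 2,000 gap characters with -MAXGap leaves a maximum segment length of 5,000 for each input sequence.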

LineUp

Problem: On the command line, you were unable to use a normal MSF sequence specification like myseqs.msf{*}. Instead, you had to use a LineUp-specific syntax like % lineup myseqs.msf.

Update: LineUp accepts any single or multiple sequence specification on the command line using the normal syntax.

Problem: If your local directory contained a set.keys file specifying keyboard key redefinitions for editing nucleotide sequences and the first sequence you entered into LineUp was a nucleotide sequence, the key redefinitions were ignored.

Update: If the first sequence entered into LineUp is a nucleotide sequence, the keyboard keys are redefined according to the specifications in the Set.Keys file in your local directory.

Problem: When you used the ZIp command to align a protein sequence to an existing protein consensus sequence, the program could propose meaningless alignments involving the reverse complement (-) strand of the protein sequence.

Update: The program does not propose alignments that involve the reverse complement strand of a protein sequence.

Problem: You were unable to enter the tilde (~) to indicate NOT syntax when you specified a pattern to find in the sequence you were editing.

Update: You can enter the tilde (~) to indicate NOT syntax.

PlotSimilarity

Problem: If you added -PROFile to the command line and specified non-profile input, the program displayed an error message and continued prompting for additional parameters. If you responded to the additional program prompts, the program crashed.

Update: If you add -PROFile to the command line and specify non-profile input, the program displays an error message and stops.

Problem: If you tried to plot the running average similarity among the sequences in an alignment longer than 11,500 symbols, the program rejected any density for the plot you specified in response to the program prompt and continued to prompt you for a new density. You had to use <Ctrl>C to exit the program.

Update: You can plot the running average similarity among the sequences in an alignment of any length. However, since the entire plot must fit on a single page, the plot becomes more difficult to read as the length of the alignment increases.

ProfileMake

Problem: If you entered more than the maximum limit of 100 sequences as input, the program displayed an error message for each additional sequence it tried to read.

Update: You can now enter up to 5,000 sequences as input. If this limit is exceeded, the program stops immediately.

ProfileSearch

Problem: If you specified a search set that contained no valid sequences, and you added -Default to the command line, the program automatically substituted SwissProt:* (for a protein profile) or EMBL:* (for a nucleotide profile) as the search set.

Update: If you specify a search set containing no valid sequences and you add -Default to the command line, the program displays an error message and stops.

Problem: If you specified a search set containing sequences whose lengths were all very similar, the program crashed while trying to normalize the scores.

Update: The program no longer tries to normalize the scores when the lengths of all the sequences are very similar.

Evolutionary Analysis

GrowTree

Problem: If you chose to reconstruct a tree from a distance matrix using the UPGMA method, the branch lengths in the output tree were incorrect.

Update: The branch lengths in the output tree are correct.

Pattern Recognition

CodonPreference

Problem: If you specified a range of the input sequence to analyze, and you chose to reverse that specified segment either by adding -REVerse to the command line or by responding to the program prompt, the entire sequence was first reversed and then the specified segment was selected from the reverse sequence strand.

Update: If you specify a range of the input sequence to analyze, and you choose to reverse the specified segment, the specified segment is first chosen from the forward sequence strand and then this segment is reversed. This is consistent with the behavior of other Wisconsin Package programs that offer you the option to analyze the reverse strand of the input sequence.
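The difference between the two orders of operation can be sketched in a few lines of Python. This is only an illustration, not the CodonPreference implementation: the function names are hypothetical, coordinates are assumed to be 1-based and inclusive, and "reverse" is assumed to mean the reverse-complement strand.

```python
# Illustrative sketch only; the names below are hypothetical and are not
# part of the Wisconsin Package.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_strand(seq: str) -> str:
    """Return the reverse complement of a nucleotide sequence (assumption:
    this is what 'reverse' means here)."""
    return seq.translate(COMPLEMENT)[::-1]


def select_then_reverse(seq: str, begin: int, end: int) -> str:
    """Updated behavior: take begin..end from the forward strand,
    then reverse that segment."""
    return reverse_strand(seq[begin - 1:end])


def reverse_then_select(seq: str, begin: int, end: int) -> str:
    """Earlier behavior: reverse the entire sequence first, then take
    begin..end from the reversed strand."""
    return reverse_strand(seq)[begin - 1:end]


seq = "ATGGCCTTAGAC"
print(select_then_reverse(seq, 1, 3))  # segment ATG from the forward strand, reversed
print(reverse_then_select(seq, 1, 3))  # first three symbols of the reversed strand
```

The two functions return different segments for the same range, which is why the order matters.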

Repeat

Problem: If a repeat was longer than 55 bases or residues, a length of 55 was reported in the output file to the right of the repeat alignment.

Update: If a repeat is longer than 55 bases or residues, the correct length is reported in the output file to the right of the repeat alignment. However, only the first 55 bases of the repeat are actually displayed in the alignment.
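As a rough illustration of the updated behavior, the reported length and the displayed portion can be treated as two separate values. The function name below is hypothetical; only the limit of 55 comes from the note above.

```python
# Illustrative sketch only; describe_repeat is a hypothetical name.
DISPLAY_LIMIT = 55  # number of symbols shown in the alignment, per the note above


def describe_repeat(repeat: str) -> tuple[str, int]:
    """Return the portion of the repeat that is displayed and the full
    length that is reported next to it."""
    displayed = repeat[:DISPLAY_LIMIT]
    reported_length = len(repeat)  # no longer capped at the display limit
    return displayed, reported_length


displayed, reported = describe_repeat("AC" * 40)
print(len(displayed), reported)
```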

CodonFrequency

Problem: If you tabulated the codon usage of a single sequence specified in a list file, any attributes associated with that sequence were ignored unless you added -Default to the command line.

Update: If you tabulate the codon usage of a single sequence specified in a list file, any begin, end, and strand attributes associated with that sequence are used as default input values by the program.

Protein Analysis

PlotStructure

Problem: If you created a figure file of the 1-dimensional panel graph plot by specifying -FIGure on the command line, and you also specified a font for all text characters in the plot using -FONT, the program crashed.

Update: You can specify both -FIGure and -FONT on the PlotStructure command line and create a figure file of a 1-dimensional panel graph plot without problem.

Translation

Translate

Problem: If you ran Translate noninteractively by specifying multiple sequences as input on the command line or by adding -Default to the command line, the program occasionally gave the output file a nucleotide sequence type. This occurred when the translated sequence contained only amino acid symbols that could be recognized as IUPAC-IUB nucleotide ambiguity symbols.

Update: Translate always writes an output file with a protein sequence type.
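To see how a translated sequence could be mistaken for a nucleotide sequence, consider a naive type check based only on the symbols present. This sketch is an assumption about the general idea, not the sequence-type detection that the Wisconsin Package actually uses; the function name is hypothetical.

```python
# Illustrative sketch only; this is NOT the Package's type-detection logic.
# IUPAC-IUB nucleotide symbols, including the ambiguity codes.
IUPAC_NUCLEOTIDE = set("ACGTURYKMSWBDHVN")


def looks_like_nucleotide(seq: str) -> bool:
    """Naive guess: a sequence 'looks like' nucleotide data if every
    symbol is a valid IUPAC-IUB nucleotide symbol."""
    return bool(seq) and all(ch in IUPAC_NUCLEOTIDE for ch in seq.upper())


# A peptide made up only of residues whose one-letter codes are also
# nucleotide ambiguity symbols passes the naive check ...
print(looks_like_nucleotide("MKVAGHT"))
# ... while a single residue such as L (leucine) makes it fail.
print(looks_like_nucleotide("MKVLAGHT"))
```

Writing the output with a fixed protein type, as the update describes, avoids relying on a content-based guess of this kind.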

Problem: If you translated a single sequence specified in a list file, any attributes associated with that sequence were ignored unless you added -Default to the command line.

Update: If you translate a single sequence specified in a list file, any begin, end, strand, and join attributes associated with that sequence are used as default input values by the program.

BackTranslate

Problem: If your protein input sequence contained gap characters and you selected one of the table of back-translations menu choices (option a or b), then the table of back translations contained three periods in a row (...). Even though the output file also contained a GCG sequence appended after the table, the file was not recognized as a GCG sequence file by analysis programs.

Update: If your protein input sequence contains gap characters and you select one of the table of back-translations menu choices, the gap characters are back-translated to three tildes in a row (~~~). The output file, which still contains a GCG sequence appended after the table, is now recognized as a GCG sequence file by analysis programs.
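A minimal sketch of the gap handling follows. The codon table is deliberately tiny and hypothetical (one codon per residue), the set of gap characters is an assumption, and the function name is not part of the Package.

```python
# Illustrative sketch only; not the BackTranslate implementation.
# Hypothetical one-codon-per-residue table, just large enough for the example.
CODON_FOR = {"M": "ATG", "W": "TGG", "K": "AAA"}
GAP_CHARS = {".", "~"}  # assumption: symbols treated as gaps


def back_translate(protein: str) -> str:
    """Back-translate residue by residue, writing each gap as three tildes
    so that no run of periods appears in the output."""
    codons = []
    for residue in protein:
        if residue in GAP_CHARS:
            codons.append("~~~")
        else:
            codons.append(CODON_FOR[residue.upper()])
    return "".join(codons)


print(back_translate("M.WK"))
```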

Manipulation

Assemble

Problem: If you interactively assembled a user sequence and then chose to G)et segments from another sequence, but specified the same user sequence again, the program didn't recognize the sequence the second time.

Update: You can repeatedly assemble fragments from a single sequence by specifying the same sequence name each time in response to the program prompt.

Problem: If you assembled a single sequence specified in a list file, any attributes associated with that sequence were ignored unless you added -Default to the command line.

Update: If you assemble a single sequence specified in a list file, any begin, end, strand, and join attributes associated with that sequence are used as default input values by the program.

Simplify

Problem: If you entered a multiple sequence specification as input (for example, a sequence specification with an asterisk (*) wildcard) and that multiple sequence specification referenced only a single sequence, the program didn't read the sequence.

Update: You can enter any valid single or multiple sequence specification as input, even if the multiple sequence specification references only a single sequence. This is consistent with the behavior of other Wisconsin Package programs that accept either single or multiple sequences as input.

Display

Publish

Problem: If you selected one of the translation menu choices with numbering (option F or G), the translations may have been numbered incorrectly in several instances. For example, if the translation began in the middle of a row of the nucleotide sequence, it was numbered as if it began at the beginning of the row. If you chose three-letter translations (option F), and an amino acid began at the end of one row and stopped at the beginning of the next row, the numbering was incorrect. If you selected more than one discontinuous translation range (e.g. translations of exons separated by introns), they were numbered as if the entire sequence from the beginning of the first range to the end of the last range had been translated.

Update: The translation numbering is now correct in Publish.
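For the discontinuous-range case, one way to picture the difference is to count only the bases that are actually translated when numbering residues. The sketch below assumes that the correct numbering continues consecutively across ranges, that coordinates are 1-based and inclusive, and that each range is a whole number of codons; the function names are hypothetical.

```python
# Illustrative sketch only; not the Publish implementation.
def range_start_numbers(ranges: list[tuple[int, int]]) -> list[int]:
    """Residue number of the first amino acid in each translated range,
    counting only bases that were actually translated (assumed behavior)."""
    starts = []
    translated_bases = 0
    for begin, end in ranges:
        starts.append(translated_bases // 3 + 1)
        translated_bases += end - begin + 1
    return starts


def span_based_start_numbers(ranges: list[tuple[int, int]]) -> list[int]:
    """Numbering as if everything from the start of the first range had been
    translated, which is the problem described above."""
    first_begin = ranges[0][0]
    return [(begin - first_begin) // 3 + 1 for begin, _ in ranges]


exons = [(1, 30), (61, 90)]  # two exons separated by a 30-base intron
print(range_start_numbers(exons))
print(span_based_start_numbers(exons))
```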

Sequence Exchange

Reformat

Problem: If your attempt to reformat a sequence did not succeed, the input file was deleted.

Update: If your attempt to reformat a sequence does not succeed, the input file is not deleted.

Problem: If you reformatted a scoring matrix found in a directory other than your local directory by specifying a directory path along with the filename, the output scoring matrix was written to that same directory by default.

Update: If you reformat a scoring matrix found in a directory other than your local directory, the output scoring matrix is written to your local directory by default.

FromStaden

Problem: Lowercase IUPAC-IUB sequence characters in Staden-format nucleotide sequences were converted to periods (.) in the GCG-format output sequence files.

Update: Lowercase IUPAC-IUB sequence characters in Staden-format nucleotide sequences are unchanged in the GCG-format output sequence files. Appendix III of the Program Manual contains an updated list of the mappings between Staden and GCG sequence characters.

FromFasta

Problem: In a FastA-format input file, if the documentation following the sequence name contained two adjacent periods (..), the output sequence file had two lines containing two adjacent periods. This file was not recognized as a GCG sequence file by analysis programs.

Update: If the documentation following the sequence name contains two adjacent periods (..) in a FastA-format input file, the program inserts a blank space between the periods (. .) in the output file. GCG analysis programs will then recognize the output sequence file as a GCG sequence file.
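The substitution can be sketched as a small text transformation. The function name is hypothetical, and the handling of runs of three or more periods is an assumption, since the note only describes the two-period case.

```python
# Illustrative sketch only; not the FromFasta implementation.
def separate_adjacent_periods(documentation: str) -> str:
    """Insert a blank between adjacent periods so that the documentation
    text never contains two periods in a row."""
    # Repeat until no adjacent pair remains; this also covers runs of three
    # or more periods (assumed behavior for that case).
    while ".." in documentation:
        documentation = documentation.replace("..", ". .")
    return documentation


print(separate_adjacent_periods("partial cds.. see notes"))
```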

Problem: If the documentation following the sequence name for a FastA-format sequence was longer than 511 characters, the program crashed.

Update: If the documentation following the sequence name for a FastA-format sequence is longer than 511 characters, it is written as several shorter documentation lines in the GCG-format output sequence file and the program completes normally.
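Splitting an overlong documentation string into shorter lines might look like the following sketch. The line width of 70 characters and the function name are assumptions made for illustration; the note does not say how long the resulting lines are or where they are broken.

```python
# Illustrative sketch only; not the FromFasta implementation.
def split_documentation(documentation: str, width: int = 70) -> list[str]:
    """Break a long documentation string into consecutive pieces of at most
    `width` characters (hypothetical width), preserving all of the text."""
    if not documentation:
        return [""]
    return [documentation[i:i + width] for i in range(0, len(documentation), width)]


long_doc = "x" * 600  # longer than the 511-character limit mentioned above
lines = split_documentation(long_doc)
print(len(lines), max(len(line) for line in lines))
```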

Package-Wide Bug Fixes

The following notes describe problems and errors that affected the whole Package. These errors are corrected in Version 9.0 of the Wisconsin Package.

Graphics

Problem: In any Wisconsin Package plotting program, if you used the -COPies command-line parameter to specify more than one copy for a plot sent to a PostScript device, only a single copy was plotted.

Update: All of the plot copies you specify with the -COPies command-line parameter are actually plotted.

List Files

Problem: In any program that recognizes the begin: and end: sequence attributes in a list file, if you specified -BEGin or -END on the command line without any value, the program ignored that command-line parameter.

Update: In any program that recognizes the begin: sequence attribute in a list file, if you specify -BEGin without any value, the program uses a beginning position of 1 for each sequence; beginning positions specified for individual sequences in the list file are ignored. In any program that recognizes the end: sequence attribute in a list file, if you specify -END without any value, the program uses the end of each sequence as the ending position; ending positions specified for individual sequences in the list file are ignored.
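The precedence described above can be sketched as a small function. The function and parameter names are hypothetical, and the sketch only models the two cases the note covers: a parameter that is absent, and a parameter given without a value.

```python
# Illustrative sketch only; names are hypothetical, not Package internals.
from typing import Optional


def resolve_range(
    seq_length: int,
    list_begin: Optional[int] = None,  # begin: attribute from the list file, if any
    list_end: Optional[int] = None,    # end: attribute from the list file, if any
    bare_begin: bool = False,          # -BEGin given on the command line without a value
    bare_end: bool = False,            # -END given on the command line without a value
) -> tuple[int, int]:
    """Return the (begin, end) positions used for one sequence."""
    # A bare -BEGin forces position 1 and overrides the list-file attribute.
    begin = 1 if bare_begin or list_begin is None else list_begin
    # A bare -END forces the end of the sequence and overrides the attribute.
    end = seq_length if bare_end or list_end is None else list_end
    return begin, end


print(resolve_range(100, list_begin=10, list_end=50))
print(resolve_range(100, list_begin=10, list_end=50, bare_begin=True))
print(resolve_range(100, list_begin=10, list_end=50, bare_end=True))
```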

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Licenses and Trademarks

Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.