LINEUP

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
PRETTY
STARTING OUT
EDITING EXISTING GROUPS
SCREEN MODE
COMMAND MODE
COMMAND DESCRIPTIONS
HEADING MODE
PROTEIN AND NUCLEOTIDE SEQUENCE GROUPS
THE CONSENSUS SEQUENCE
PULL-OVER AND PUSH-OVER
MULTIPLE SEQUENCE FORMAT (MSF) FILES
EDITING INDIVIDUAL SEQUENCE FILES
EMBEDDED COMMENTS
LINEUP AND PRETTY
THE LINEUP DISPLAY
FILE NAME CONVENTIONS
SYSTEM CRASH OR HANGUP
RESTRICTIONS
ACKNOWLEDGEMENTS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
OPTIONAL PARAMETERS

FUNCTION

LineUp is a screen editor for editing multiple sequence alignments. You can edit up to 30 sequences simultaneously. New sequences can be typed in by hand or added from existing sequence files. A consensus sequence identifies places where the sequences are in conflict.

DESCRIPTION

LineUp lets you edit several overlapping or aligned sequences simultaneously, in the context of the alignment, so you can see how each change affects the alignment.

As in SeqEd, you can move the cursor with the arrow keys and insert or delete symbols or gaps in the sequences. In LineUp, the cursor can travel from one sequence to another. You can add new sequences by hand or from existing sequence files, and you can move sequences from one position to another.

LineUp provides a surface on which you can arrange and edit many sequences. This surface resembles a piece of graph paper with 31 rows and as many columns as you need. The screen acts as a window behind which the LineUp surface is scrolled.

Sequences can be placed anywhere on the surface as long as two sequences in the same row do not collide. Several sequences can be placed on the same row.

Sequences placed on the LineUp surface become part of a sequence group. A new sequence group is formed by running LineUp with a new sequence group name. Sequences already stored in files can be placed anywhere on the surface with the Get command. New sequences (not already in sequence files) can be typed in anywhere on the surface.

When you end a session with LineUp, it writes out each sequence in a file and then writes a list file with the name and position of each sequence in the group. (See Chapter 2, Using Sequence Files and Databases in the User's Guide for more information about list files.) When you edit the group again, the sequences reappear on the LineUp surface where you left them.

You can have a consensus sequence display the dominant character at each column where sequences overlap. The consensus uses uppercase where overlapping sequences are in agreement, lowercase to show disagreement, and periods to show where there is no consensus at all.

EXAMPLE

Here is a session using LineUp to edit the same sequence group displayed in the example session for the Pretty program. First use Fetch to copy the files *.frg and picorna.fil to your default directory.


% lineup picorna


R2       Column: 1  Row: 5     No AutoCons  FOSN:  PICORNA  Protein

15: ................................ttttgesad.pvtttve....n..yggdt.q....vq
14: ................................ttatgesad.pvtttve....n..ygget.q....vq
13: ................................ttsagesad.pvtttve....n..ygget.q....iq
12: gvenae.kgvtentna.tadfvaqpvylpe.nqt......kv.affynrs...spi.gaftvks.....
11: glgqmlesmi.dntvretvgaatsrdalpnteasgpthskeipaltavetgatnplvpsdtvqtrhvvq
10: glgqmlesmi.dntvretvgaatsrdalpnteasgpahskeipaltavetgatnplvpsdtvqtrhvvq
 9: gigdmiegav.egitknalvpptstnslpghkpsgpahskeipaltavetgatnplvpsdtvqtrhviq
 8: giedliseva.qgal..tlslpkqqdslpdtkasgpahskevpaltavetgatnplapsdtvqtrhvvq
 7: ...gpvedai.......t..aaigr..vadtvgtgptnseaipaltaaetghtsqvvpgdtmqtrhvkn
 6: glgdeleevivekt.kqtv.asi.........ssgpkhtqkvpiltanetgatmpvlpsdsietrttym
 5: ...npvenyidevlnevlv........vpninssnpttsnsapaldaaetghtssvqpedvietryvqt

 ..|.........|.........|.........|.........|.........|.........|.........
   0        10        20        30        40        50        60

"picorna.fil" successfully loaded.

PRETTY

LineUp does not insist that all your sequences start in the same column, but this is a requirement of Pretty.

STARTING OUT

To create a new sequence group, use the LineUp command with a new group name such as myseqs. If you use % lineup myseqs, LineUp looks in your current directory for the file myseqs.fil. If you use the command % lineup -MSF myseqs, then LineUp looks for the file myseqs.msf. If it doesn't find a file with this name, LineUp starts a new group with one sequence, the consensus, having the same name as the group. (If you do not want to have a consensus sequence in your group, run LineUp with the command-line parameter -NOCONsensus.) To construct the group, use the Get command to add sequences from existing sequence files or use the New command so LineUp lets you type in a new sequence. LineUp prompts you for a unique name of up to ten characters for each new sequence.

EDITING EXISTING GROUPS

You can start LineUp with the name of an existing group. If you have a file of sequence names called myseqs.fil, which was created in a previous session with LineUp, use % lineup myseqs. If you have a multiple sequence format (MSF) file called myseqs.msf, which was created in a previous session with LineUp, use % lineup -MSF myseqs. You may specify a file name extension if the default extension LineUp adds is not appropriate. You also can use any single or multiple sequence specification as input to LineUp. Multiple sequences can be specified as a list file or as a sequence specification using a wildcard. (See Chapter 2, Using Sequence Files and Databases in the User's Guide for help in specifying sequences.) LineUp loads the sequences into the multiple sequence editor and starts with its window at the left end of the group. You can add more sequences, modify existing ones, delete sequences, rename sequences, and move any sequence to a new position.

SCREEN MODE

In Screen Mode, commands are typically single keystrokes. Except for the search command, Screen Mode commands do not require a <Return>.

Entering Sequence Characters

In Screen Mode, the cursor shows your position in one of the sequences in the group. You can insert any valid GCG sequence symbol (see Appendix III) into the sequence by typing the symbol. It is inserted at the cursor.

Deleting Sequence Characters

The <Delete> key and <Ctrl>H delete the symbols to the left of the cursor, one by one. The remainder of the sequence slides over to fill the gap.

Moving the Cursor Horizontally

To move the cursor to the right one symbol, use the <Right-arrow> key; to move to the left, use the <Left-arrow> key. Moving the cursor past the end of the sequence moves it to the next sequence on the row.

You can type a number followed by a <Return> and the cursor moves to that position in the current row. If you specify a position that is not occupied by a character, the cursor moves to the nearest occupied position.

The <Left-arrow> and <Right-arrow> keys can be preceded by a number, telling how many symbols to move to the left or right. For example, 10<Right-arrow> moves ten symbols to the right.

You can use the angle brackets (< and >) to skip 50 characters to the left or right. If you precede an angle bracket with a number, the cursor skips that many characters instead, and it keeps skipping that many until you change the number.

Moving the Cursor Vertically

The <Up-arrow> and <Down-arrow> keys move the cursor up or down to the next row.

In contrast with the horizontal arrows, if you precede either the <Up-arrow> or the <Down-arrow> with a number, the cursor moves to the row with that number. For example, 10<Up-arrow> moves to row ten, not up ten rows.
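Because a numeric prefix means a relative distance for some keys and an absolute position for others, the following Python sketch models the difference. It is a hypothetical illustration (the class and method names are invented, not part of LineUp) that ignores sequence ends, unoccupied positions, and the limits of the surface, and it assumes that moving up increases the row number, as in the example display where row 15 is drawn above row 5.

```python
class CursorModel:
    """Hypothetical model of numeric prefixes on LineUp cursor keys.

    Simplifications: sequence ends, unoccupied positions, and the limits of
    the surface are ignored; only the meaning of the prefix is modeled.
    """

    def __init__(self, column=1, row=0):
        self.column = column
        self.row = row
        self.skip = 50  # default distance for < and >

    def right(self, n=1):
        """[n]<Right-arrow>: move n columns to the right (relative)."""
        self.column += n

    def left(self, n=1):
        """[n]<Left-arrow>: move n columns to the left (relative)."""
        self.column -= n

    def up(self, n=None):
        """[n]<Up-arrow>: with a prefix, n is an absolute row number."""
        self.row = n if n is not None else self.row + 1

    def down(self, n=None):
        """[n]<Down-arrow>: with a prefix, n is an absolute row number."""
        self.row = n if n is not None else self.row - 1

    def goto_column(self, n):
        """n<Return>: move to column n in the current row (absolute)."""
        self.column = n

    def angle(self, direction, n=None):
        """[n]< (direction -1) or [n]> (direction +1).

        A prefix replaces the skip distance, which is remembered until changed.
        """
        if n is not None:
            self.skip = n
        self.column += direction * self.skip


c = CursorModel(column=1, row=5)
c.right(10)      # 10<Right-arrow>: column 11
c.up(10)         # 10<Up-arrow>: row 10, not row 15
c.angle(+1)      # >: column 61
c.angle(-1, 20)  # 20<: column 41, skip distance is now 20
c.angle(+1)      # >: column 61
```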

Moving a Whole Sequence

When the cursor is at the left end of a sequence, you can move the sequence to the right with the space bar and to the left with the <Delete> key or <Ctrl>H. If you want to move a sequence to another row or to the other side of a sequence on the same row, you must use the MOve command in Command Mode.

Finding Patterns

To search for a pattern, type a / (slash) in Screen Mode. You are prompted for the sequence pattern you wish to find. LineUp searches only the current sequence. You can repeat the last search by typing /<Return>.

The command-line parameter -NUCleotide or the NUCleotide command in Command Mode makes LineUp treat all nucleic acid sequences as circular and find your pattern even if it wraps from the end of the sequence into the beginning. LineUp uses the same rules for pattern definition and recognition as the FindPatterns, MapPlot, Map, and MapSort programs.

The command-line parameter -PROtein or the PROtein command in Command Mode makes LineUp's searches linear and disables the nucleic acid ambiguity meanings of the GCG sequence symbols. The parameter and the command also change the way the consensus sequence is defined (see the topic THE CONSENSUS SEQUENCE below).

Even if LineUp treats your sequence as a nucleotide sequence, you can request a perfect match by typing an = right after the /. For example, if you type /=RTC, only RTC is matched, whether you have a protein or a nucleotide sequence.
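The following Python sketch illustrates the three search behaviors described above: IUB ambiguity matching, wrap-around searching of circular nucleotide sequences, and the = prefix for a literal match. It is an illustration under stated assumptions, not LineUp's implementation: the function names are hypothetical, only single-symbol IUB codes are handled (not the full FindPatterns pattern syntax), an ambiguity code in the sequence itself is assumed to match only the identical pattern symbol, U is treated as T, and nucleotide sequences are assumed to be searched circularly even with =.

```python
# Standard IUB nucleotide ambiguity codes (U is folded into T below).
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def symbol_matches(p, s, use_iub):
    """Does pattern symbol p match sequence symbol s?"""
    if p == s:
        return True
    if not use_iub:
        return False
    s = "T" if s == "U" else s
    # An ambiguity code in the sequence only matches itself (assumption).
    return s in IUB.get(p, "")


def find_pattern(seq, pattern, start=0, nucleotide=True):
    """Return the 0-based start of the next match at or after start, or -1.

    A leading "=" requests a literal match. Nucleotide sequences are
    searched as circular, so a match may wrap from the end to the start.
    """
    exact = pattern.startswith("=")
    if exact:
        pattern = pattern[1:]
    pattern = pattern.upper()
    seq = seq.upper()
    n, m = len(seq), len(pattern)
    if m == 0 or m > n:
        return -1
    use_iub = nucleotide and not exact
    circular = nucleotide
    # Append the first m - 1 symbols so matches can run past the end.
    text = seq + seq[: m - 1] if circular else seq
    last = n if circular else n - m + 1
    for i in range(start, last):
        if all(symbol_matches(p, text[i + k], use_iub)
               for k, p in enumerate(pattern)):
            return i
    return -1


print(find_pattern("GGATCC", "RTC"))                     # 2 (R matches A)
print(find_pattern("GGATCC", "=RTC"))                    # -1 (literal only)
print(find_pattern("TCAAAG", "GTC"))                     # 5 (wraps past the end)
print(find_pattern("TCAAAG", "GTC", nucleotide=False))   # -1 (linear search)
```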

Leaving Screen Mode

Use <Ctrl>D to leave Screen Mode and enter Command Mode.

Screen Mode Summary

Here is the summary of Screen Mode commands you would see in the on-line help:


                                 Screen Mode

                      [n] is an optional numeric parameter.

      G, A, T, C ....  - inserts a sequence character
      <Delete>         - deletes a sequence character, "drags" a
                         sequence to the left if cursor is at its start
      <Ctrl>H          - deletes a sequence character, "drags" a
                         sequence to the left if cursor is at its start
      <Space bar>      - "pushes" a sequence to the right if cursor is at
                         its start
      /TAACG<Return>   - finds the next occurrence of "TAACG", last
                         pattern is the default when none is specified
      [n]<Right-arrow> - move ahead [n characters]
      [n]<Left-arrow>  - move back  [n characters]
      [n]<Up-arrow>    - move up to next sequence [or to row specified]
      [n]<Down-arrow>  - move down to next sequence [or to row specified]
      [n]<Return>      - move to column n
      1<Return>        - move to start of current sequence
      <Ctrl>E          - move to end of current sequence
      <Ctrl>R          - redraw the screen
      <Ctrl>D          - enter Command Mode
      <Ctrl>I          - push over all seqs starting past current column
      <Ctrl>P          - pull over all seqs starting past current column
      [n]<             - move 50 [or n] positions to left
      [n]>             - move 50 [or n] positions to right

COMMAND MODE

In Command Mode, you enter commands followed by a <Return>.

Entering Command Mode

Use <Ctrl>D to leave Screen Mode and enter Command Mode.

Editing LineUp Commands

LineUp command editing is modeled on OpenVMS DCL command line editing. The <Left-arrow> and <Right-arrow> keys let you move your cursor around in a command that you have typed so you can insert or delete characters at any position. <Ctrl>E moves the cursor to the end of the line. <Ctrl>U deletes all the characters from the current cursor position to the start of the line.

Editing Previous LineUp Commands

LineUp lets you modify and execute previous commands. The <Up-arrow> key displays previous commands.

Returning to Screen Mode

If you simply press <Return>, LineUp returns to Screen Mode, described above. If you have -SINGlecommand on the command line or in your command-line initializing file, LineUp returns to Screen Mode immediately after executing each command.

Commands May Be Truncated

Only the capitalized portion of the commands described in the documentation below must be typed.

Parameters Are Used With Some Commands

Some commands may be preceded or followed by optional numeric parameters or a file name. The square brackets ([ and ]) in the documentation below show optional command arguments: s and f refer to starting and finishing rows or offsets on the surface; x and y refer to offset and row coordinates. When an optional parameter is omitted, some commands prompt you for the value. Other commands make default assumptions that are explained in each command description.

Missing Position Parameters: Spacewalk

Several commands need position parameters to know where to put a sequence. If these parameters are omitted, LineUp enters a mode, called Spacewalk, that allows you to move the cursor anywhere on the surface to select a position for the new sequence. In Spacewalk Mode, the arrow keys and <Return> can be preceded with numbers as in Screen Mode. <Ctrl>D cancels the command when you are in Spacewalk Mode. If you prefer to provide numeric coordinates rather than position the cursor, you can eliminate Spacewalk Mode by using the command-line parameter -NOSPACewalk or the command NOSPacewalk. You are then prompted for numeric coordinates if you omit them from commands.

Other Missing Parameters

If a required name is omitted or illegal, LineUp prompts you for a name. If you respond with a blank name, LineUp cancels the command.

Working Directory

You must have write privileges in your current working directory to use LineUp; otherwise, LineUp will not accept any name you try to give a sequence.

Default Values for LineUp Prompts

Often when it prompts for sequence or file names, LineUp presents a default value in a manner different from other GCG programs; when the prompt appears, it looks like you have already typed in the default value. You can just press <Return> if you want to accept the default. If you want to change it, proceed as with command-line editing. Make small changes by using the arrow keys to move within the offered response or delete the response with <Ctrl>U and type your desired response.

Command Mode Summary

Here is the summary of Command Mode commands you would see in the on-line help:


                                 Command Mode

               x and y represent numbers for column and row.
           Only the capitalized part of the command is necessary.

[x,y] Get [filename]     - add sequence [at position x,y] [from filename]
[x,y] New [seqname]      - add empty sequence [at position x,y] [named
                           seqname]
[x,y] MOve [seqname]     - move current or specified sequence [to x,y]
      REMove [seqname]   - delete current or specified sequence entirely
      REName [old] [new] - change sequence name (changing consensus name
                           changes the group)
      REDraw             - redraw the screen
      HEAding [seqname]  - edit documentary heading of current or
                           specified sequence
      screen             - enter screen mode (pressing <Return> is
                           sufficient)
      NUCleotide         - use nucleotide ambiguity codes in find and
                           consensus
      PROtein            - do not use nucleotide ambiguity codes
      SPacewalk          - use spacewalk to position sequences
      NOSPacewalk        - DO NOT use spacewalk to position sequences
      FOSN               - use list file format when writing
      MSF                - use multiple sequence format files when writing
[n]   SLide              - add n to all sequence columns
[s,f] ROWMove [n]        - move a set of rows (s to f) up or down
                           [n rows]
[s,f] PRint   [filename] - write the sequence group to a Pretty format
                           file
      SUMmary [filename] - write the sequence names and positions
                           in a file or on the terminal screen
      GOto [seqname]     - put cursor on start of named sequence
[s,f] CONSensus          - calculate consensus [from s to f]
      AUtoconsensus      - automatically calculate consensus (slow)
      NOAUtoconsensus    - turn off automatic consensus
      FLip               - reverse complement the current group
      ZIp [filename]     - align and gap a sequence to the current group
      Write [filename]   - write the current sequence group to a file
      EXit  [filename]   - write the current group to a file and stop
      Quit               - quit the editor without writing out the group

COMMAND DESCRIPTIONS

:[x,y] Get [filename]

adds the sequence in the specified file to the group at column x in row y. The screen is erased and you are prompted to enter the range and strand. Unlike the Write and EXit commands, Get does not assume any file extension. You must type the file name plus any extension it requires.

:[x,y] New [SeqName]

adds an empty sequence named SeqName at column x in row y. If you omit SeqName, LineUp prompts you for a name.

:[x,y] MOve [SeqName]

moves the sequence to start at column x in row y. If the SeqName parameter is omitted, the sequence at the current cursor position is moved.

:REMove [SeqName]

deletes the entire sequence from the group. If the SeqName parameter is omitted, the sequence at the current cursor position is removed.

:REName [OldName] [NewName]

changes the name of the sequence. If no names are provided in the command, the sequence at the current cursor position is renamed and you are prompted for the new name. If only one name is provided, it is assumed to be the old name and you are prompted for the new name.

:REDraw

redraws your terminal screen. This is useful if noise in the line between your terminal and the computer has changed the screen in some unreasonable way or if a system message appears on your screen.

:[s] HEAding [SeqName]

enters Heading Mode to let you view and edit the documentary heading. You can modify any part of the heading. Heading Mode is terminated with <Ctrl>D. The optional SeqName parameter specifies which sequence heading you want to edit. If omitted, the sequence at the current cursor position is assumed. The optional numeric parameter specifies which line of the heading you want to start editing.

:screen

returns your session to Screen Mode. Just pressing <Return> also returns you to Screen Mode. (If you prefer to return to Screen Mode after every command is executed, use the command-line parameter -SINGlecommand.)

:NUCleotide

sets the sequence type for each sequence in the sequence group to be nucleotide. This enables the nucleic acid ambiguity meanings of the GCG sequence symbols in pattern searches (with /) and consensus definition (see the topic THE CONSENSUS SEQUENCE below). Also, LineUp treats nucleic acid sequences as circular when searching for a pattern. When the sequences are saved to files with either the Write or EXit command, they are written as nucleotide sequences if their sequence type is nucleotide.

:PROtein

sets the sequence type for each sequence in the sequence group to be protein. This forces LineUp to treat all sequences as linear in pattern searches and does not interpret any sequence characters as nucleotide ambiguity symbols in pattern searches and consensus definition (see the topic THE CONSENSUS SEQUENCE below). When the sequences are saved to files with either the Write or EXit command, they are written as protein sequences if their sequence type is protein.

:SPacewalk

tells LineUp to use Spacewalk Mode, which lets you move the cursor anywhere on the surface to select a position for a new sequence when you omit the position parameters from a command.

:NOSPacewalk

tells LineUp not to use Spacewalk Mode but to prompt for numerical surface coordinates.

:FOSN

tells LineUp to use the list file format when loading or storing the sequence group.

:MSF

tells LineUp to use the multiple sequence format (MSF) file when loading or storing the sequence group.

:[s,f] PRint [filename]

writes a file of the formatted sequence group from position s to f. The format resembles that of Pretty.

:SUMmary [filename]

writes a list of the names and beginning positions of the sequences loaded into the LineUp editor. This list can go either to a file or to your screen (by typing Term for filename).

:[n] SLide

shifts all the sequence starting positions by n. The coordinate ruler appears to slide under the sequences. n can be either a positive or negative number to shift the sequences to the right or left, respectively.

:[s,f] ROWMove [n]

moves a block of rows up or down. The sequences on rows s through f are moved up n rows; a negative value of n moves them down by the absolute value of n. This command can be used to open a row in the middle of the surface for another sequence. LineUp will not let you move sequences onto rows containing other sequences that are not being moved at the same time.
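To make the placement rules concrete, here is a minimal Python model of the LineUp surface. It is a sketch under assumptions: the class and method names are hypothetical, a sequence occupies columns offset through offset + length - 1 on a single row, the 31 rows are numbered 0 through 30 (the consensus sits at row 0 by default), a positive n in ROWMove adds n to the row number, and any row holding a sequence that is not being moved is treated as off limits, following the ROWMove description literally.

```python
from dataclasses import dataclass


@dataclass
class Placed:
    name: str
    offset: int  # starting column on the surface
    row: int
    length: int

    @property
    def end(self):
        return self.offset + self.length  # first column past the sequence


def overlaps(a, b):
    """Two sequences collide if they share a row and any column."""
    return a.row == b.row and a.offset < b.end and b.offset < a.end


class Surface:
    ROWS = 31  # the surface has 31 rows, assumed numbered 0 through 30

    def __init__(self):
        self.seqs = {}

    def place(self, seq):
        """Add or reposition a sequence, refusing collisions (Get, New, MOve)."""
        if not 0 <= seq.row < self.ROWS:
            raise ValueError(f"row {seq.row} is off the surface")
        for other in self.seqs.values():
            if other.name != seq.name and overlaps(seq, other):
                raise ValueError(f"{seq.name} would overlap {other.name}")
        self.seqs[seq.name] = seq

    def slide(self, n):
        """SLide: add n to every starting column."""
        for seq in self.seqs.values():
            seq.offset += n

    def rowmove(self, s, f, n):
        """ROWMove: move rows s..f by n rows, all or nothing."""
        moving = [q for q in self.seqs.values() if s <= q.row <= f]
        blocked = {q.row for q in self.seqs.values() if not s <= q.row <= f}
        for q in moving:
            target = q.row + n
            if not 0 <= target < self.ROWS or target in blocked:
                raise ValueError(f"cannot move {q.name} to row {target}")
        for q in moving:
            q.row += n


surface = Surface()
surface.place(Placed("a", 0, 1, 10))
surface.place(Placed("b", 12, 1, 5))  # same row, no overlap: allowed
surface.slide(3)                      # a now starts at column 3, b at 15
surface.place(Placed("d", 0, 2, 4))
surface.rowmove(1, 2, 2)              # rows 1-2 move to rows 3-4
```

Checking every target row before moving anything keeps a rejected ROWMove from leaving the surface half-changed.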

:GOto [SeqName]

moves the cursor to the beginning of the named sequence.

:[s,f] CONsensus

calculates the consensus sequence between positions s and f. If the optional positions are omitted, the entire consensus is calculated. This command only works when LineUp is not in the Auto Consensus state. (See the topic THE CONSENSUS SEQUENCE below for further details.)

:AUtoconsensus

makes LineUp recalculate the consensus sequence each time there is a change in any of the other sequences. When LineUp is in the Auto Consensus state, the consensus is strictly a function of the other sequences and cannot be changed directly. When the sequence group is large, recomputing the consensus uses a lot of machine time and makes LineUp appear sluggish.

:NOAUtoconsensus

turns off the Auto Consensus state. This allows you to change the consensus directly.

:ZIp [filename]

aligns a new sequence to the current consensus.

:Write [filename]

records the current sequence group. In FOSN mode (see the FILE NAME CONVENTIONS topic below), LineUp writes the current surface configuration to a list file and saves the current version of each sequence in its own file; in MSF mode, it writes a multiple sequence format (MSF) file instead. If the filename parameter is omitted, LineUp uses the sequence group name specified when the program was first run. If you specify a file in another directory, all files are created there.

:EXit [filename]

works like the Write command but stops the session after writing out the sequences. The filename parameter behaves as in the Write command.

:Quit

terminates a session with LineUp without saving any changes you've made since the last time you used the Write command.

:Help

shows the commands available in Screen and Command Modes of LineUp.

HEADING MODE

Heading Mode allows you to view and edit the documentation that precedes the sequence in the sequence file. All headings are lost if you write the sequences into a multiple sequence format (MSF) file.

Entering Heading Mode

To enter Heading Mode, use the HEAding command.

Leaving Heading Mode

Use <Ctrl>D to return to Command Mode.

Moving the Cursor

You can move around using the arrow keys. Although the editing window is only twenty lines long, it scrolls over the heading vertically to let you see and modify any part. <Ctrl>E positions the cursor at the end of the current line.

Editing in Heading Mode

As in many text editors, typing inserts text at the cursor, and the <Delete> key and <Ctrl>H delete characters to the left of the cursor. <Ctrl>U deletes everything from the current cursor position to the start of the line. Pressing <Return> creates a new line starting at the current position in the heading.

Unlike many text editors, LineUp asks whether you need more storage before letting you edit the heading. You must enter the maximum number of lines that you expect to add. If you are in Heading Mode and find that you do not have enough storage for your changes and additions, you can exit Heading Mode and enter it again, specifying a larger number of lines.

PROTEIN AND NUCLEOTIDE SEQUENCE GROUPS

LineUp behaves differently depending on whether you are working with a protein or nucleotide sequence group.

If you are working with a nucleotide sequence group, then pattern searches (see "Finding Patterns" under the SCREEN MODE topic) and the consensus definition (see the topic THE CONSENSUS SEQUENCE) assume the IUB nucleotide ambiguity meanings for the GCG sequence symbols. Also, LineUp treats nucleic acid sequences as circular when searching for patterns. When the sequences are saved to files with either the Write or EXit command, they are written as nucleotide sequences if their sequence type is nucleotide.

If you are working with a protein sequence group, then LineUp treats all sequences as linear in pattern searches and does not interpret any sequence characters as nucleotide ambiguity symbols in pattern searches and the consensus definition. When the sequences are saved to files with either the Write or EXit command, they are written as protein sequences if their sequence type is protein.

By default, if the first sequence entered into the LineUp editor screen is from an existing sequence file, then the type of that sequence determines the type for the entire group. If the first sequence in a sequence group is entered interactively from the keyboard, then LineUp sets the sequence type for the entire sequence group to be protein, by default. LineUp indicates the type of the sequence group (protein or nucleotide) in the upper-right corner of the editor screen.

You can specify the sequence type for the entire group from the command line with the -PROtein and -NUCleotide command-line parameters. Once you are viewing the editor screen, you can change the sequence type for the entire sequence group with the PROtein and NUCleotide command mode commands.
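The order in which these rules apply can be summarized in a short Python sketch. The function and argument names are hypothetical, and the sketch assumes that a -PROtein or -NUCleotide parameter on the command line takes precedence over the type of the first sequence, since the manual describes the first-sequence rule as the default.

```python
from typing import Optional


def initial_group_type(cmdline_type: Optional[str] = None,
                       first_file_type: Optional[str] = None) -> str:
    """Return "protein" or "nucleotide" for a new LineUp session (sketch).

    cmdline_type:    "protein" for -PROtein, "nucleotide" for -NUCleotide,
                     or None if neither parameter was given
    first_file_type: type of the first sequence if it came from a file,
                     or None if it was typed in at the keyboard
    """
    if cmdline_type is not None:
        return cmdline_type      # explicit command-line choice
    if first_file_type is not None:
        return first_file_type   # first sequence read from a file decides
    return "protein"             # first sequence typed at the keyboard
```

Once the editor screen is showing, the PROtein and NUCleotide commands simply replace this value for the whole group.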

THE CONSENSUS SEQUENCE

An optional consensus sequence can be generated as a function of the rest of the sequences in a sequence group, or, like any other sequence, typed in by you.

By default, new sequence groups contain an empty consensus at row 0 unless LineUp is run with the -NOCONsensus command-line parameter.

If the sequence group has no consensus, you can create one using the New command and giving the new sequence the same name as the sequence group. The CONsensus and AUtoconsensus commands then work on the row you chose for the consensus with the New command.

When a consensus sequence is generated by LineUp, either by issuing the CONsensus command or AUtoconsensus command, each consensus character in the consensus sequence is replaced with a character that is a function of the other characters in its column. If all the characters in the column are the same letter and at least one of them is uppercase, the consensus character is the uppercase equivalent of that letter. If there is more than one letter in the column, but one occurs more frequently than any other, or if all letters are the same, but none are uppercase, then the consensus character is the lowercase of that letter. Otherwise, the consensus character is a dot (.).

The consensus definition also depends on whether LineUp is working with a nucleotide or protein sequence group. If a protein sequence group is loaded into the multiple sequence editor, the above description is complete. If a nucleotide sequence group is loaded, the ambiguity codes are ignored for the purpose of consensus definition; in effect, every ambiguity code is treated as the code N. LineUp indicates the type of the sequence group (protein or nucleotide) in the upper-right corner of the editor screen.
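The column rule above can be expressed as a short Python sketch. It is an illustration, not LineUp's code, and it makes several assumptions the manual does not spell out: letters are compared without regard to case, gap symbols (. and ~) and positions where a sequence is absent do not count, a tie for the most frequent letter gives a dot, and in nucleotide mode any symbol other than A, C, G, T, or U is an ambiguity code that does not contribute.

```python
from collections import Counter

GAPS = {".", "~", " "}  # assumed gap/blank symbols
BASES = set("ACGTU")    # non-ambiguous nucleotide symbols


def consensus_char(column, nucleotide=False):
    """Consensus symbol for one column of characters (one per sequence)."""
    letters = [c for c in column if c not in GAPS]
    if nucleotide:
        # Ambiguity codes are ignored, as if they were all N.
        letters = [c for c in letters if c.upper() in BASES]
    if not letters:
        return "."
    ranked = Counter(c.upper() for c in letters).most_common()
    top, top_count = ranked[0]
    if len(ranked) == 1:
        # Every letter is the same: uppercase only if one was uppercase.
        return top if any(c.isupper() for c in letters) else top.lower()
    if ranked[1][1] < top_count:
        return top.lower()  # mixed column with one most frequent letter
    return "."              # tie for most frequent letter


def consensus(rows, nucleotide=False):
    """Consensus of aligned rows; shorter rows are padded with blanks."""
    width = max(len(r) for r in rows)
    padded = [r.ljust(width) for r in rows]
    return "".join(consensus_char(col, nucleotide) for col in zip(*padded))


print(consensus(["ACGT", "aCgt", "AtG."]))  # AcGT
```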

The consensus sequence is distinguished by having the same name as the sequence group. If you rename the consensus sequence with the REName command, the name of the sequence group changes as well. (You can rename the group even if you have no consensus sequence.) Conversely, if you specify a file name in a Write or EXit command, this changes the name for the sequence group being saved and also changes the consensus sequence name.

The consensus sequence is unique in that, because it will likely extend to all columns determined by the other sequences, no other sequence may share its row. You can delete the consensus sequence from your group, and you can later create a new consensus sequence. However, an existing sequence cannot become the consensus sequence, either through the REName or Get commands.

PULL-OVER AND PUSH-OVER

If you use LineUp with sequence groups that have different sequences starting at several different columns, a problem will arise when you make deletions or insertions to whole columns. For example, suppose you are assembling a group of sequences with LineUp. When a new sequence is added, you may decide a previous sequence reading was incorrect. You may decide to delete a base from an old sequence near the left end of the assembly. The tail of that sequence slides left one column, destroying its alignment with any sequence starting to the right of the deletion site.

The general problem is that insertions or deletions shift the register between sequences. Sequences that overlap the changed column appear to need adjustment, but sequences that start farther to the right do not appear misaligned.

LineUp warns you of potential alignment problems by producing a warning sign at the top of the screen. The sign says either PUSH-OVER WARNING or PULL-OVER WARNING, depending on whether there was an insertion or a deletion. The warning is only displayed if there are other sequences that start to the right of your change. To make it easy for you to correct the alignment problem, LineUp provides Screen Mode commands to PULLOVER the sequences starting to the right of the cursor (<Ctrl>P, after a deletion) or to PUSHOVER them (<Ctrl>I, after an insertion). You must decide whether the deletion or insertion requires adjustments and then ensure that the adjustments are correct. Do not trust the warning sign blindly; treat it as a reminder to check the alignment.
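The warning logic and the two corrective commands can be sketched in Python as follows. The function names are hypothetical; the sketch models only each sequence's starting column, keyed by name, and assumes that each <Ctrl>I or <Ctrl>P keystroke shifts the affected sequences by one column.

```python
def starting_past(starts, column):
    """Names of sequences that start to the right of column."""
    return [name for name, start in starts.items() if start > column]


def warning(starts, column, inserted):
    """Warning sign shown after an insertion or deletion at column, or None."""
    if not starting_past(starts, column):
        return None
    return "PUSH-OVER WARNING" if inserted else "PULL-OVER WARNING"


def push_over(starts, column):
    """<Ctrl>I: shift sequences starting past column one place right."""
    return {name: s + 1 if s > column else s for name, s in starts.items()}


def pull_over(starts, column):
    """<Ctrl>P: shift sequences starting past column one place left."""
    return {name: s - 1 if s > column else s for name, s in starts.items()}


# A base is deleted from "old" at column 5; "new" starts at column 20.
starts = {"old": 0, "new": 20}
print(warning(starts, 5, inserted=False))  # PULL-OVER WARNING
print(pull_over(starts, 5))                # {'old': 0, 'new': 19}
```

As the text cautions, the sketch only reports that sequences start to the right of the change; deciding whether they should actually move is still up to you.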

MULTIPLE SEQUENCE FORMAT (MSF) FILES

By default, LineUp reads and writes individual sequence files, grouped in a list file (FOSN format). Using the command-line parameter -MSF causes LineUp to expect a multiple sequence format (MSF) file when reading a sequence group, and to write out an MSF file when storing a sequence group (MSF format). For instance, the command % lineup -MSF hsp70 reads the sequences in the file hsp70.msf into the LineUp editor and names the sequence group hsp70. (See Chapter 2, Using Sequence Files and Databases in the User's Guide for a complete description of MSF files.) When LineUp writes an MSF file, leading gap characters (.) are added to those sequences that do not start at the beginning of the alignment so that all sequences are left-justified in the output file.
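A minimal Python sketch of that left-justification step follows. The function name and the {name: (starting column, residues)} input layout are hypothetical; the sketch pads only the leading end, as described above, and assumes the alignment begins at the smallest starting column in the group.

```python
def msf_left_justify(placed):
    """Pad each sequence with leading '.' so all start in the same column.

    placed maps a sequence name to (starting column, residues).
    """
    first = min(offset for offset, _ in placed.values())
    return {name: "." * (offset - first) + residues
            for name, (offset, residues) in placed.items()}


print(msf_left_justify({"a": (0, "ACGT"), "b": (2, "GT")}))
# {'a': 'ACGT', 'b': '..GT'}
```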

The current sequence group format is indicated as either FOSN: or MSF: on the top line of the screen editor. You can toggle between these two formats using the FOSN and MSF commands in command mode.

EDITING INDIVIDUAL SEQUENCE FILES

There is no harm in using SeqEd to change a sequence file that has been written by LineUp. Provided the name is the same, LineUp accepts the new version. The only restriction on replacing members of a sequence group is that the new members must not overlap with other sequences on the same row. Because the position where each sequence starts is stored in the list file, changing the sequence file can only change the length of the sequence. You can change where a sequence starts on the surface by using a text editor to modify the Offset and Row columns of its entry in the list file. If two sequences overlap on the same row, LineUp refuses to load one of them.

EMBEDDED COMMENTS

LineUp does not handle embedded comments. LineUp can read files containing embedded comments, but the comments are lost and will not appear in any file written by LineUp.

LINEUP AND PRETTY

If your sequences all start at the same column, you can use Pretty to generate a consensus sequence for a sequence group created by LineUp. Pretty uses a more sophisticated algorithm than LineUp to generate a consensus sequence and you have more control over the consensus calculation. However, Pretty can only handle sequence groups whose left ends are aligned.

Pretty and LineUp can each read the other's files of sequence names, so you can use Pretty to get a consensus sequence in Pretty format; running % pretty -UGLy then writes a file of sequence names that LineUp can read. However, LineUp does not recognize the consensus sequence defined by Pretty as its own consensus sequence: Pretty names its consensus Consensus, whereas LineUp names its consensus sequence after the sequence group. This is reasonable, since LineUp does not define the consensus in the same way, so the names should be different.

If you alternate between Pretty and LineUp on a sequence group that has a LineUp consensus sequence, you must keep the old sequence group name when running Pretty -UGLy so that LineUp still recognizes the consensus sequence. If you give the group a new name, LineUp no longer recognizes the consensus sequence as such.

THE LINEUP DISPLAY

Several indicators for LineUp are displayed on the top row of the screen. The left-most word indicates the name of the sequence on which the cursor currently rests. Next, the cursor's position on the surface is displayed. Then the display shows whether LineUp calculates the consensus automatically every time you add or delete a character. The sequence group name is next, preceded by either FOSN or MSF, indicating the file format to be used for reading and writing the sequence group. Finally, LineUp indicates whether the type of the sequence group is nucleotide or protein.

LineUp frequently displays the PULL-OVER WARNING sign (see the PULL-OVER AND PUSH-OVER topic above).

The screen provides a window onto the sequence surface. Through this window, 16 of the 31 surface rows can be viewed at one time. For example, if you move the cursor near the top row of the window and there are occupied surface rows above it, the surface scrolls down, showing more rows at the top of the window and fewer at the bottom.

When there are more rows in use than can be displayed at once, some rows are hidden above or below the window. When this happens a '+' is displayed next to the top or bottom row number indicating hidden rows in that direction.

Although the window also scrolls horizontally, there is no analogous sign indicating that you cannot see the whole length of the surface.
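The '+' indicators can be modelled in a few lines of Python. This sketch assumes the 31 surface rows are numbered 0 through 30 (consistent with the default consensus row of 0) and that the window always shows 16 consecutive rows; it deliberately does not model the automatic scrolling itself, and the function name is hypothetical.

```python
SURFACE_ROWS = 31  # assumption: rows numbered 0..30
WINDOW_ROWS = 16


def window_markers(top, occupied):
    """Return (top_marker, bottom_marker) for a window whose first visible
    row is `top`. A '+' means occupied rows are hidden in that direction."""
    if not 0 <= top <= SURFACE_ROWS - WINDOW_ROWS:
        raise ValueError("window must lie within the surface")
    bottom = top + WINDOW_ROWS - 1
    above = any(row < top for row in occupied)
    below = any(row > bottom for row in occupied)
    return ("+" if above else "", "+" if below else "")


print(window_markers(0, {0, 5, 20}))   # row 20 is hidden below the window
print(window_markers(10, {0, 5, 20}))  # rows 0 and 5 are hidden above it
```

As the manual notes, there is no equivalent indicator for the horizontal direction, so the sketch tracks rows only.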

FILE NAME CONVENTIONS

When you save a sequence group in FOSN format, the list file (FOSN) is named with the sequence group name followed by the extension .fil, and each sequence file is named with its LineUp sequence name followed by the extension .frg. A consensus sequence saved in FOSN format uses the extension .con. When you save a sequence group in MSF format, the MSF file is named with the sequence group name followed by the extension .msf.

These file name extensions are the defaults for LineUp, but you can specify your own with the command-line parameters -FOSNEXtension, -FRAGEXtension, -CONSEXtension, and -MSFEXtension (see below). You can also override them when you specify an output file name: if the name includes a file extension, that extension is used instead of the one given on the command line or the default.
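The naming rules can be summarized as a small Python function. It is a sketch under stated assumptions: the kind labels ('fosn', 'frag', 'cons', 'msf') and the function name are hypothetical, Unix-style extension parsing stands in for whatever the host system uses, and a typed output name without an extension is assumed to receive the configured extension.

```python
import os

# Defaults from the command-line summary, given without the dot.
DEFAULT_EXTENSIONS = {"fosn": "fil", "frag": "frg", "cons": "con", "msf": "msf"}


def output_name(base, kind, overrides=None, explicit=None):
    """Build a file name for a group or sequence.

    overrides: extensions from the -...EXtension parameters (no dot).
    explicit: an output name typed by the user; if it already carries an
    extension, that extension takes precedence over everything else.
    """
    extensions = {**DEFAULT_EXTENSIONS, **(overrides or {})}
    if explicit is not None:
        if os.path.splitext(explicit)[1]:
            return explicit
        base = explicit
    return f"{base}.{extensions[kind]}"


print(output_name("hsp70", "fosn"))                           # default list file
print(output_name("hsp70", "msf", overrides={"msf": "aln"}))  # -MSFEXtension=aln
print(output_name("hsp70", "msf", explicit="mygroup.txt"))    # typed extension wins
```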

SYSTEM CRASH OR HANGUP

The current version of LineUp cannot recover from a system crash. If you are disconnected from LineUp, you lose everything you have done since the last time you saved the group using the Write or EXit commands. Therefore, we recommend that you save your work frequently using the Write command so that little is lost in the event of a crash.

RESTRICTIONS

There is a Finite Amount of Storage Space

LineUp has a total of one million bytes of storage for sequences and their headings. While this is large, it is finite. If your sequence group exceeds this limit, the parameter MEMSIZE can be changed in the file GenInclude:mem.inc and LineUp can be recompiled and linked. If you make MEMSIZE larger, you may notice that the computer performs a large number of page swaps, making the program needlessly expensive and inefficient to run. If you suspect this is happening, discuss with your system manager the possibility of raising the size of your working set; a larger working set reduces the number of page swaps required.

ACKNOWLEDGEMENTS

LineUp was designed and implemented by Dr. William Winsborough. We are very grateful for the collaboration of Drs. William Boorstein and Lynn Manseau of the UW Department of Physiological Chemistry.

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
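The abbreviation rule (type at least the capitalized letters) can be expressed as a small Python check. This is an illustration only, not GCG's command-line parser: it assumes case-insensitive matching and ignores "=value" suffixes and the -NO negated forms such as -NOSINGlecommand.

```python
def min_abbreviation(spec):
    """Length of the shortest accepted abbreviation: the leading '-' plus
    the run of capital letters at the start of the parameter name."""
    n = 0
    for ch in spec:
        if ch == "-" or ch.isupper():
            n += 1
        else:
            break
    return n


def matches(spec, typed):
    """True if `typed` (in any case) is an acceptable abbreviation of `spec`."""
    typed = typed.lower()
    return len(typed) >= min_abbreviation(spec) and spec.lower().startswith(typed)


print(matches("-MSFEXtension", "-msfex"))  # True: all capitals typed
print(matches("-MSFEXtension", "-msfe"))   # False: too short
print(matches("-MSF", "-msf"))             # True: -msf selects -MSF
```

Note how the rule keeps -MSF and -MSFEXtension distinct: typing -msf is an exact match for -MSF but too short to select -MSFEXtension.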


Minimal Syntax: % lineup [-INfile1=]Picorna

Prompted Parameters: None

Local Data Files:

set.keys (must be in your current working directory to be used)

-MATRix=blosum62.cmp       scoring matrix for Zipping peptides
-MATRix=swgapdna.cmp       scoring matrix for Zipping nucleic acids

Optional Parameters:

-MSF                reads and writes sequence groups in MSF format
-SINGlecommand      automatically returns to screen mode after each
                       command
-PROtein            sets sequence type to protein, and sets find to
                       search for perfect symbol matches
-NUCleotide         sets sequence type to nucleotide, and sets find to
                       allow nucleotide ambiguity code matches
-CONSROW=0          sets the consensus row for a new sequence group
-NOCONsensus        starts new sequence groups without a consensus row
-LINesize=50        sets line length for output with the PRint command
-BLOcksize=10       sets block length for output with the PRint command
-FRAGEXtension=frg  sets the file extension for each sequence when using
                       FOSN format
-CONSEXtension=con  sets the file extension for the consensus when using
                       FOSN format
-FOSNEXtension=fil  sets the file extension for the list file when
                       using FOSN format
-MSFEXtension=msf   sets the file extension for the multiple sequence
                       file when using MSF format

LOCAL DATA FILES

The files described below supply auxiliary data to this program. The program automatically reads them from a public data directory unless you either 1) have a data file with exactly the same name in your current working directory; or 2) name a file on the command line with an expression like -DATa1=myfile.dat. For more information see Chapter 4, Using Data Files in the User's Guide.

Customizing Your Keyboard With SetKeys

You can use the program SetKeys to create a set.keys file that tells the SeqEd, GelEnter, LineUp, GelAssemble, and SeqLab sequence editors how to interpret the letters you type at the terminal. When entering gel readings, it is useful to have the symbols for G, A, T, and C under the fingers of one hand in the same positions as the lanes in your gel. SeqEd, GelEnter, LineUp, GelAssemble, and the SeqLab sequence editor automatically read the file set.keys if it is present in your local directory. If set.keys is absent, or if the sequence type is set to Protein (in SeqEd and LineUp only) the terminal keys retain their conventional meanings.

If you have a set.keys file in your directory, SeqEd, GelEnter, LineUp, and GelAssemble only respond to the keys that it redefines. You can edit the file set.keys with a text editor if some of the keys you want to use are not in it. Any keys not mentioned in set.keys appear to be dead in these sequence editors. In the SeqLab sequence editor, keys that are not redefined retain their normal meanings.

Several keys are vital for the control of SeqEd, LineUp, GelEnter, and GelAssemble, so you are not allowed to redefine the following keys: /, [, ], {, }, (, ), the colon (:), the comma (,), the digits 0 through 9, <Ctrl>R, <Ctrl>D, <Ctrl>H, <Return>, and <Ctrl>E.
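The key-handling rules above can be modelled in Python. This is a hedged sketch: a plain dict stands in for the contents of set.keys (the real file format is not modelled), the example key assignments are hypothetical, passing no keymap represents both "no set.keys" and the protein state, and reserved control keys are assumed to keep working because they cannot be redefined.

```python
RESERVED = set("/[]{}():,1234567890") | {
    "<Ctrl>R", "<Ctrl>D", "<Ctrl>H", "<Return>", "<Ctrl>E"}


def build_keymap(definitions):
    """Validate a mapping from typed key to produced symbol."""
    bad = sorted(k for k in definitions if k in RESERVED)
    if bad:
        raise ValueError(f"cannot redefine reserved keys: {bad}")
    return dict(definitions)


def interpret(key, keymap, seqlab=False):
    """Return the symbol a key produces, or None if the key is dead."""
    if keymap is None or key in RESERVED:
        return key                     # conventional meaning
    if key in keymap:
        return keymap[key]
    return key if seqlab else None     # unmentioned keys are dead outside SeqLab


# Hypothetical layout: G, A, T, C under one hand, like gel lanes.
km = build_keymap({"j": "G", "k": "A", "l": "T", ";": "C"})
print(interpret("j", km), interpret("x", km), interpret("x", km, seqlab=True))
```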

Local Scoring Matrices

This program reads one or more scoring matrices for the comparison of sequence characters. The program automatically reads the program default scoring matrix file in a public data directory unless you either 1) have a data file with exactly the same name as the program default scoring matrix in your current working directory; or 2) have a data file with exactly the same name as the program default scoring matrix in the directory with the logical name MyData; or 3) name a file on the command line with an expression like -MATRix=mymatrix.cmp. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see "Using a Special Kind of Data File: A Scoring Matrix" in Chapter 4, Using Data Files in the User's Guide.

When you use the :ZIp command to align a new sequence to the current consensus, LineUp reads a scoring matrix file containing values for every possible comparison between sequence symbols. By default, LineUp reads the file swgapdna.cmp for nucleotide sequence alignments and blosum62.cmp for protein sequence alignments.
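The matrix selection and lookup order can be sketched in Python. The logical names MyData, GenMoreData, and GenRunData are modelled here as a plain dict of directories (a hypothetical stand-in for the host system's logical names), the existence check is injectable so the sketch can be exercised without real files, and the search follows the order given for -MATRix.

```python
import os

SEARCH_ORDER = ("MyData", "GenMoreData", "GenRunData")


def default_matrix(seq_type):
    """Default :ZIp scoring matrix for the group's sequence type."""
    return "blosum62.cmp" if seq_type == "protein" else "swgapdna.cmp"


def find_matrix(name, logical_dirs, exists=os.path.isfile):
    """Return the first existing candidate path for `name`, or None."""
    if os.path.dirname(name):
        return name if exists(name) else None   # directory given explicitly
    candidates = [name]                          # current working directory first
    for logical in SEARCH_ORDER:
        directory = logical_dirs.get(logical)
        if directory:
            candidates.append(os.path.join(directory, name))
    for path in candidates:
        if exists(path):
            return path
    return None
```

For example, with a private copy of blosum62.cmp in the MyData directory, a protein group would pick up that copy rather than the public one.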

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-MSF

sets LineUp to use MSF format. LineUp reads a sequence group from an MSF (multiple sequence format) file and writes an MSF file when storing a sequence group. The default FOSN format reads and writes individual sequence files, grouped in a list file. (See Chapter 2, Using Sequence Files and Databases in the User's Guide for a complete description of list files and MSF files.)

-SINGlecommand

sets LineUp to return automatically to Screen Mode after every command in Command Mode. -NOSINGlecommand is the default.

-PROtein and -NUCleotide

sets the sequence type for each sequence in the sequence group to be either protein or nucleotide. By default, if the first sequence in a sequence group is read from an existing sequence file, then the type of that sequence determines the type for the entire group. Also by default, if the first sequence in a sequence group is entered interactively from the keyboard, then LineUp sets the sequence type for the entire sequence group to be protein.
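The defaulting rule reduces to a short decision, sketched here in Python with a hypothetical function name; the command-line choice is assumed to take precedence, since the file-based and keyboard rules apply only "by default".

```python
def group_type(first_from_file, first_file_type=None, override=None):
    """Decide the sequence type of a new group.

    override: 'protein' or 'nucleotide' if -PROtein or -NUCleotide was given.
    first_from_file: True if the first sequence came from a sequence file,
    False if it was typed in at the keyboard.
    """
    if override is not None:
        return override
    if first_from_file:
        return first_file_type
    return "protein"  # first sequence entered interactively
```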

You can change the sequence type for the entire group from within LineUp with the PROtein and NUCleotide commands. PROtein tells LineUp to make pattern searches using perfect symbol matches. NUCleotide tells LineUp to let nucleotide ambiguity codes match the bases they represent: for example, in the nucleotide state, typing /GARC in Screen Mode finds either GAAC or GAGC. In the protein state, LineUp treats sequences as linear and does not find patterns that start at the end of a sequence and continue into its beginning. In the nucleotide state, sequences are searched as though they are circular.
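The two search modes can be illustrated with a Python sketch. It uses the standard IUPAC ambiguity codes and assumes that a sequence symbol matches a pattern symbol when every base the sequence symbol stands for is allowed by the pattern symbol (so a sequence N matches only a pattern N); that subset rule is an assumption, not a description of LineUp's matcher.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def symbol_matches(pattern_sym, seq_sym, nucleotide):
    p, s = pattern_sym.upper(), seq_sym.upper()
    if not nucleotide or p not in IUPAC or s not in IUPAC:
        return p == s                       # perfect symbol match
    return set(IUPAC[s]) <= set(IUPAC[p])   # ambiguity-aware match


def find(pattern, sequence, nucleotide):
    """Return 0-based start positions of the pattern. In the nucleotide
    state the sequence is circular, so a match may wrap around its end."""
    n, m = len(sequence), len(pattern)
    if m == 0 or m > n:
        return []
    last_start = n if nucleotide else n - m + 1
    return [start for start in range(last_start)
            if all(symbol_matches(pattern[i], sequence[(start + i) % n], nucleotide)
                   for i in range(m))]


print(find("GARC", "TTGAACTTGAGCTT", nucleotide=True))  # GAAC and GAGC
print(find("CGA", "GATTC", nucleotide=True))            # wraps from end to start
```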

The automatic consensus definition is also different in the protein state than in the nucleotide state. In the nucleotide state, ambiguity codes make no contribution to the consensus. They are treated as if they were all Ns and are ignored. In the protein state, all characters have the same status.
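The difference between the two states lies in which symbols are counted. The Python sketch below tallies the contributing symbols in one alignment column, assuming the column contains only sequence symbols (no gaps). How LineUp turns these counts into a consensus symbol is covered in THE CONSENSUS SEQUENCE topic and is not reproduced here; the function name is hypothetical.

```python
from collections import Counter

NUCLEOTIDE_BASES = set("ACGTU")


def column_counts(column, nucleotide):
    """Count the symbols in one column that contribute to the consensus.
    In the nucleotide state, ambiguity codes are treated as N and skipped;
    in the protein state every symbol counts equally."""
    counts = Counter()
    for sym in column.upper():
        if nucleotide and sym not in NUCLEOTIDE_BASES:
            continue  # ambiguity code: no contribution
        counts[sym] += 1
    return counts


print(column_counts("AARN", nucleotide=True))   # only the two A's count
print(column_counts("AARN", nucleotide=False))  # R and N count too
```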

-CONsensus and -NOCONsensus

tell LineUp whether new sequence groups should start with a consensus sequence. (Remember that you can create or remove the consensus sequence at any time, so this is only a matter of convenience.) The default is -CONsensus.

-CONSROW=0

tells LineUp on which row to put the consensus sequence in a new sequence group. This parameter takes effect only if -NOCONsensus is not on the command line. The default is row 0.

-LINesize=n

sets the line length for pretty-style output created by the PRint command. The value must be in the range from 10 to 110. The default value is 50.

-BLOcksize=n

sets the block length (number of symbols between spaces) for pretty-style output created by the PRint command. The value of n must be in the range from 1 to the line size. The default value is 10.
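The interaction of -LINesize and -BLOcksize can be shown with a Python sketch that enforces the documented ranges and splits one sequence into lines of blocks. It is only a model of the line and block spacing; real PRint output also carries sequence names and position numbers, which are omitted here, and the function name is hypothetical.

```python
def print_layout(sequence, linesize=50, blocksize=10):
    """Split a sequence into lines of `linesize` symbols, with a space
    after every `blocksize` symbols within each line."""
    if not 10 <= linesize <= 110:
        raise ValueError("LINesize must be in the range 10 to 110")
    if not 1 <= blocksize <= linesize:
        raise ValueError("BLOcksize must be in the range 1 to LINesize")
    lines = []
    for start in range(0, len(sequence), linesize):
        chunk = sequence[start:start + linesize]
        blocks = [chunk[i:i + blocksize] for i in range(0, len(chunk), blocksize)]
        lines.append(" ".join(blocks))
    return lines


for line in print_layout("ACGT" * 6, linesize=10, blocksize=5):
    print(line)
```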

-MATRix=mymatrix.cmp

allows you to specify a scoring matrix file name other than the program default. If you don't include a directory specification when you name a file on the command line with -MATRix, the program searches for the file first in your local directory, then in the directory with the logical name MyData, then in the public data directory with the logical name GenMoreData, and finally in the public data directory with the logical name GenRunData. For more information see the Local Scoring Matrices topic above.

-FRAGEXtension=frg

sets the file extension that LineUp uses when reading and writing sequence files while in the FOSN format state. Do not include the dot that separates the file name from the extension. The default value is 'frg'.

-CONSEXtension=con

sets the file extension that LineUp uses when reading and writing consensus sequence files while in the FOSN format state. Do not include the dot that separates the file name from the extension. The default value is 'con'.

-FOSNEXtension=fil

sets the file extension that LineUp uses when reading and writing list files while in the FOSN format state. Do not include the dot that separates the file name from the extension. The default value is 'fil'.

-MSFEXtension=msf

sets the file extension that LineUp uses when reading and writing multiple sequence format files while in the MSF format state. Do not include the dot that separates the file name from the extension. The default value is 'msf'.

Printed: November 18, 1996 13:05 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks

Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com