NOOVERLAP

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
CONSIDERATIONS
COMMAND-LINE SUMMARY
ACKNOWLEDGEMENT
LOCAL DATA FILES
OPTIONAL PARAMETERS

FUNCTION

NoOverlap identifies the regions in a group of nucleotide sequences that share no common subsequences with the other sequences in the group.

DESCRIPTION

This program determines whether a group of nucleotide sequences contains regions that share no common subsequences with the other sequences in the group. Witkiewicz, Bolander, and Edwards assert that hybridization probes specific enough to detect individual members of a gene family can be prepared if a region of 100 bases or longer can be found that has no perfect match of nine or more bases with any other member of the family (BioTechniques 14(3): 458-463). NoOverlap is designed to find out whether such regions occur in a group of sequences.

To use NoOverlap, you name a group of related sequences in which you want to find regions that do not share any 9-mer with any other sequence in the group. The resulting output is a list of the sequences that have such regions and the coordinates of the regions where no common 9-mers occur.
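
The search described above can be sketched in a few lines of Python. This is a minimal illustration and not NoOverlap's actual implementation: the function names are hypothetical, only the top strand is examined, and it assumes that a reported region must avoid every base covered by a word that also occurs in another sequence.

```python
from collections import defaultdict


def shared_word_mask(seqs, word=9):
    """Return {name: [bool, ...]} flagging bases covered by a shared word.

    A word counts as shared when it occurs in at least two different
    sequences; repeats inside a single sequence are ignored.
    """
    owners = defaultdict(set)
    for name, seq in seqs.items():
        for i in range(len(seq) - word + 1):
            owners[seq[i:i + word]].add(name)

    mask = {name: [False] * len(seq) for name, seq in seqs.items()}
    for name, seq in seqs.items():
        for i in range(len(seq) - word + 1):
            if len(owners[seq[i:i + word]]) > 1:
                for j in range(i, i + word):
                    mask[name][j] = True
    return mask


def no_overlap_regions(seqs, word=9, min_len=100):
    """Return {name: [(start, end), ...]} with 1-based inclusive coordinates."""
    regions = {}
    for name, flags in shared_word_mask(seqs, word).items():
        found, start = [], None
        # The trailing True is a sentinel that closes a run ending at the
        # last base of the sequence.
        for pos, hit in enumerate(flags + [True]):
            if not hit and start is None:
                start = pos
            elif hit and start is not None:
                if pos - start >= min_len:
                    found.append((start + 1, pos))
                start = None
        regions[name] = found
    return regions


# Toy input: the two sequences share only the 4-mer ACGT.
toy = {"a": "TTTTTACGTGGGGG", "b": "CCCCCACGTAAAAA"}
print(no_overlap_regions(toy, word=4, min_len=3))
```

With the toy input, each sequence yields the two regions on either side of the shared word ACGT. Raising min_len above 5 leaves no region long enough to report.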

EXAMPLE

Here is a session using NoOverlap to find all of the regions of length 100 or greater that contain no common 9-mers in the sequences named in the file of sequence names inhibit.list.


% nooverlap

 (Double-stranded) NOOVERLAP among what sequences ?  @inhibit.list

 What is the word size (* 9 *) ?

 What minimum region length with no 9-mers (* 100 *) ?

 What should I call the output file (* nooverlap.dat *) ?

   Reading ..
 Comparing ..

 NOOVERLAP complete!

             Sequences:       2
          Total Length:   1,844
        Common  9-mers:      22
 Regions of no overlap:       7

%

OUTPUT

NoOverlap makes an output file with a list of all the non-overlapping regions in every sequence that meet your requirements for word size and length. Here is the output file from this session:


 (Double-stranded) NOOVERLAP of: @inhibit.list  September 20, 1996 10:57

 Window: 9  Minimum No-hit region: 100  Sequences: 2

Sequence   Ranges     ..

X03124
               1-116
             422-583
             593-772

J05593
               1-195
             275-402
             493-599
             691-790
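
If you want to post-process this file, a small reader along the following lines may help. It is a sketch that assumes the layout shown above (a header ending in a line that closes with "..", unindented sequence names, and indented start-end ranges); the function name parse_nooverlap is hypothetical and not part of the package.

```python
import re

# An indented "start-end" pair such as "             422-583".
RANGE = re.compile(r"^\s+(\d+)-(\d+)\s*$")

SAMPLE = """\
 (Double-stranded) NOOVERLAP of: @inhibit.list  September 20, 1996 10:57

 Window: 9  Minimum No-hit region: 100  Sequences: 2

Sequence   Ranges     ..

X03124
               1-116
             422-583
             593-772

J05593
               1-195
             275-402
             493-599
             691-790
"""


def parse_nooverlap(text):
    """Return {sequence name: [(start, end), ...]} from a NoOverlap output file."""
    regions, current, in_body = {}, None, False
    for line in text.splitlines():
        if not in_body:
            # Everything up to and including the ".." divider is header text.
            in_body = line.rstrip().endswith("..")
            continue
        match = RANGE.match(line)
        if match and current is not None:
            regions[current].append((int(match.group(1)), int(match.group(2))))
        elif line.strip():
            current = line.strip()
            regions[current] = []
    return regions


regions = parse_nooverlap(SAMPLE)
print(sum(len(ranges) for ranges in regions.values()))  # 7
```

The count of 7 agrees with the "Regions of no overlap" line in the session summary, and every listed range spans at least the requested 100 bases.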

INPUT FILES

NoOverlap accepts multiple (two or more) nucleotide sequences as input. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*. If NoOverlap rejects your nucleotide sequence, turn to Appendix VI to see how to change or set the type of a sequence.

RELATED PROGRAMS

Compare compares two protein or nucleic acid sequences and creates a file of the points of similarity between them for plotting with DotPlot. Compare finds the points using either a window/stringency or a word match criterion. The word comparison is 1,000 times faster than the window/stringency comparison, but somewhat less sensitive.

RESTRICTIONS

NoOverlap works only with nucleotide sequences. The combined length of all the sequences cannot exceed 350,000 bases.

CONSIDERATIONS

If the minimum region length you set is greater than the length of the longest sequence in the set you search, NoOverlap adjusts it downward to the length of that longest sequence.
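
The adjustment, together with the input limits stated under RESTRICTIONS and INPUT FILES, can be expressed as a short pre-flight check. This is a sketch only: the function name preflight and the error messages are hypothetical, and the program's own checks may differ in detail.

```python
MAX_TOTAL_BASES = 350_000  # limit stated under RESTRICTIONS


def preflight(lengths, min_length=100):
    """Validate sequence lengths and return the effective minimum region length."""
    if len(lengths) < 2:
        raise ValueError("NoOverlap needs two or more sequences")
    if sum(lengths) > MAX_TOTAL_BASES:
        raise ValueError("total sequence length exceeds 350,000 bases")
    # A region cannot be longer than the longest sequence, so the
    # requested minimum is clamped to that length.
    return min(min_length, max(lengths))


print(preflight([80, 60]))    # 80: clamped to the longest sequence
print(preflight([900, 944]))  # 100: the requested minimum is kept
```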

NoOverlap converts each ambiguity code to a single, unambiguous base before it compares sequences. Different ambiguity codes therefore do not necessarily match one another: two ambiguity codes match only if both have been converted to the same unambiguous base.

RNA and DNA are treated the same way; that is, T is equivalent to U.
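
The effect of these two rules can be illustrated with a small normalization step. This manual does not say which base NoOverlap substitutes for each ambiguity code, so the substitution table below (the alphabetically first base of each IUPAC set) is purely an assumption made for illustration, as are the names HYPOTHETICAL_CHOICE and normalize.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# ASSUMPTION: pick the first listed base for each code. The substitution
# that NoOverlap really uses is not documented here.
HYPOTHETICAL_CHOICE = {code: bases[0] for code, bases in IUPAC.items()}


def normalize(seq):
    """Upper-case a sequence, treat U as T, and replace ambiguity codes."""
    out = []
    for base in seq.upper():
        if base == "U":
            base = "T"
        out.append(HYPOTHETICAL_CHOICE.get(base, base))
    return "".join(out)


print(normalize("acgu"))                 # ACGT: RNA and DNA compare equal
print(normalize("R") == normalize("M"))  # True: both become A in this table
print(normalize("R") == normalize("Y"))  # False: A versus C in this table
```

Under any such table, two different ambiguity codes compare equal only when they happen to be replaced by the same base.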

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.


Minimal Syntax: % nooverlap [-INfile1=]@Inhibit.List -Default

Prompted Parameters:

-WORdsize=9                 the length of words that must not occur
-MINlength=100              minimum size of region with no common words
[-OUTfile=]nooverlap.dat    the output file name

Local Data Files: None

Optional Switches:

-ONEstrand   searches only the top strand of your sequences
-NOMONitor   suppresses the screen trace: "Reading ..."
-NOSUMmary   suppresses the summary at the end of the program
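
For scripted runs, the parameters in this summary can be assembled into a single command line. The helper below is a hypothetical convenience, not part of the package; it uses only parameter names listed above and assumes that -Default can be combined with explicit values, as the minimal syntax suggests.

```python
def nooverlap_command(infile, wordsize=9, minlength=100,
                      outfile="nooverlap.dat", one_strand=False):
    """Build an argument list for a non-interactive NoOverlap run."""
    argv = [
        "nooverlap",
        f"-INfile1={infile}",
        f"-WORdsize={wordsize}",
        f"-MINlength={minlength}",
        f"-OUTfile={outfile}",
    ]
    if one_strand:
        argv.append("-ONEstrand")  # search only the top strand
    argv.append("-Default")        # suppress interactive prompts
    return argv


print(" ".join(nooverlap_command("@inhibit.list")))
```

The list form can be passed to subprocess.run on a system where the package is installed; the sketch itself only prints the command.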

ACKNOWLEDGEMENT

NoOverlap was written by John Devereux in collaboration with Dr. Halina Witkiewicz at the Mayo Clinic.

LOCAL DATA FILES

None.

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-ONEstrand

searches only for regions in the top strand of each of your sequences.
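
To see what the default double-stranded search adds, consider the reverse complement of a word. This manual does not describe how NoOverlap handles the second strand internally; the sketch below assumes that a word also counts as shared when its reverse complement occurs in another sequence, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the bottom-strand reading of a top-strand sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def strand_neutral(word):
    """Map a word and its reverse complement to the same key."""
    return min(word, reverse_complement(word))


print(reverse_complement("AACG"))                        # CGTT
print(strand_neutral("AACG") == strand_neutral("CGTT"))  # True
```

Comparing strand-neutral keys treats a match on either strand as a shared word, while comparing the raw words corresponds to a top-strand-only search of the kind -ONEstrand requests.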

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-SUMmary

writes a summary of the program's work to the screen when you've used the -Default parameter to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.

Printed: November 18, 1996 13:05 (1162)



Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com