GCGToBLAST

Table of Contents
FUNCTION
DESCRIPTION
EXAMPLE
OUTPUT
INPUT FILES
RELATED PROGRAMS
RESTRICTIONS
SPECIFYING DATABASES TO BLAST
CONSIDERATIONS
COMMAND-LINE SUMMARY
LOCAL DATA FILES
OPTIONAL PARAMETERS

FUNCTION

GCGToBLAST combines any set of GCG sequences into a database that you can search with BLAST.

DESCRIPTION

BLAST can search only databases that have been compressed into a special format. Such databases must be searched in their entirety. GCGToBLAST is provided to allow you to create a BLAST-searchable database from a group of sequences that interest you.

GCGToBLAST accepts any GCG multiple sequence specification as input and creates the three or four output files necessary for BLAST. These files share a common base name (the database name) and must be kept together in the same directory.

The output is written into your current working directory. To write it into another directory, use the -DIRectory command-line parameter, for example -DIRectory=/usr/user/burgess/seq/.

EXAMPLE

Here is a session with GCGToBLAST that converts all the sequences specified by hsp70.list into a database suitable for input to BLAST.


% gcgtoblast

 GCGTOBLAST of what input sequence(s) ?  @hsp70.list

 What should I call the database ?  hsp70

        SW:HS70_BRELC     676 characters.
        SW:HS70_CHICK     634 characters.

        /////////////////////////////////

        SW:GR78_YEAST     682 characters.
        SW:HS74_YEAST     641 characters.
        SW:DNAK_ECOLI     637 characters.
hsp70 ==> 28 sequences totalling 18,014 letters
Maximum sequence length 682

 GCGTOBLAST complete:

         Sequences: 28
           Symbols: 18,014
   Output files in: .

%
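
If you capture the screen output of a session like the one above, a short script can pull the totals out of it. The following Python sketch is not part of GCGToBLAST. It assumes the total line is worded exactly as in the session shown here, which may not hold for other versions of the program.

```python
import re

# Pattern for the total line in the session above, for example:
#   hsp70 ==> 28 sequences totalling 18,014 letters
TOTAL_LINE = re.compile(
    r"^(?P<name>\S+) ==> (?P<count>[\d,]+) sequences "
    r"totalling (?P<letters>[\d,]+) letters$"
)

def parse_total_line(line):
    """Return (database name, sequence count, letter count), or None."""
    match = TOTAL_LINE.match(line.strip())
    if match is None:
        return None
    count = int(match.group("count").replace(",", ""))
    letters = int(match.group("letters").replace(",", ""))
    return match.group("name"), count, letters

print(parse_total_line("hsp70 ==> 28 sequences totalling 18,014 letters"))
# prints ('hsp70', 28, 18014)
```

Lines that do not match the pattern, such as the monitor lines for individual sequences, return None.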

OUTPUT

GCGToBLAST writes three or four files in your current working directory unless you redirect the output with the -DIRectory parameter.
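
Because the output files share the database name as a common base name, you can list them with a short script before moving or deleting a database. The Python sketch below assumes the files differ only in their extension. This manual does not name the extensions, so the ones used in the demonstration (.aaa, .bbb, .ccc) are purely hypothetical.

```python
import os
import tempfile

def database_files(directory, db_name):
    """List files in directory named db_name plus exactly one extension."""
    found = []
    for entry in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(entry)
        if stem == db_name and ext:
            found.append(entry)
    return found

# Demonstration in a scratch directory with made-up file names.
with tempfile.TemporaryDirectory() as scratch:
    for ext in (".aaa", ".bbb", ".ccc"):      # hypothetical extensions
        open(os.path.join(scratch, "hsp70" + ext), "w").close()
    open(os.path.join(scratch, "other.txt"), "w").close()
    files = database_files(scratch, "hsp70")
    print(files)
    print("three or four files found:", len(files) in (3, 4))
```

Note that any unrelated file with the same base name, such as an input list file called hsp70.list, would also be listed, so treat the result as a starting point only.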

INPUT FILES

GCGToBLAST accepts multiple sequences of the same type. You can specify multiple sequences in a number of ways: by using a list file, for example @project.list; by using an MSF or RSF file, for example project.msf{*}; or by using a sequence specification with an asterisk (*) wildcard, for example GenEMBL:*.
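
The three forms of specification can be told apart by their punctuation. The following Python sketch is a rough illustration based only on the three examples above. It is not the parser that GCG programs use, and the function name is made up for this example.

```python
def classify_specification(spec):
    """Guess which form a multiple sequence specification uses."""
    if spec.startswith("@"):
        return "list file"
    # Check for braces before the wildcard, since project.msf{*}
    # also contains an asterisk.
    if "{" in spec and spec.endswith("}"):
        return "MSF or RSF file"
    if "*" in spec:
        return "wildcard"
    return "single sequence or unrecognized"

for spec in ("@project.list", "project.msf{*}", "GenEMBL:*"):
    print(spec, "->", classify_specification(spec))
```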

RELATED PROGRAMS

DataSet creates a GCG data library from any set of sequences in GCG format.

BLAST searches for sequences similar to a query sequence. The query and the database searched can be either peptide or nucleic acid in any combination. BLAST can search databases on your own computer or databases maintained at the National Center for Biotechnology Information (NCBI) in Bethesda, Maryland, USA.

RESTRICTIONS

All the sequences compressed by GCGToBLAST must be the same type, that is, all nucleotide or all protein. The output files must be kept together in the same directory.
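
If you assemble your input from several sources, you may want to check for mixed types before running the program. The Python sketch below guesses the type of each sequence from its letters alone. This heuristic is an assumption made for illustration: it is not how GCG programs determine sequence type, and a short peptide made up only of letters such as A, C, G, and T would be misread as nucleotide.

```python
# IUPAC nucleotide codes, including the ambiguity symbols.
NUCLEOTIDE_SYMBOLS = set("ACGTURYKMSWBDHVN")

def guess_type(sequence):
    """Heuristic: nucleotide if only IUPAC nucleotide letters are used."""
    letters = {ch for ch in sequence.upper() if ch.isalpha()}
    return "nucleotide" if letters <= NUCLEOTIDE_SYMBOLS else "protein"

def require_same_type(sequences):
    """Return the common type of all sequences, or raise ValueError."""
    types = {guess_type(seq) for seq in sequences.values()}
    if len(types) != 1:
        raise ValueError("mixed or empty input: " + ", ".join(sorted(types)))
    return types.pop()

print(require_same_type({"seq1": "ACGTNNAC", "seq2": "GGCCRYAT"}))
# prints nucleotide
```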

SPECIFYING DATABASES TO BLAST

By default, BLAST does local searches by reading files from the directory whose logical name is BLASTDB. Each database known to BLAST is named in one of three local data files: blast.rdbs, blast.ldbs, or blast.sdbs. If your BLAST-searchable database is in some other directory, you must name that directory as part of the search set specification to BLAST. For instance, the specification /usr/user/burgess/seq/mydatabase includes both the directory name and the name of the BLAST-searchable database (mydatabase in this example).
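
If you build the specification in a script, joining the two parts with a path function avoids doubled or missing slashes. This Python sketch only assembles the string. The function name is made up for this example, and the directory and database name are the ones from the example above.

```python
import posixpath

def search_set_specification(directory, db_name):
    """Join a directory and a database name into one specification."""
    return posixpath.join(directory, db_name)

print(search_set_specification("/usr/user/burgess/seq", "mydatabase"))
# prints /usr/user/burgess/seq/mydatabase
```

A trailing slash on the directory, as in /usr/user/burgess/seq/, gives the same result.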

CONSIDERATIONS

The compressed representation of nucleotide sequences in the output from GCGToBLAST is not rich enough to represent nucleotide ambiguity codes accurately. So in addition to the compressed form of the database, GCGToBLAST writes an ASCII version of the data in what is becoming known as FastA format. If the sequences in the database are nucleotide, and if they contain ambiguous symbols, GCGToBLAST saves this file. BLAST uses it to display any sequences found in a search that contain ambiguous symbols.
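
For readers unfamiliar with FastA format, each record is a header line beginning with a greater-than sign (>) followed by lines of sequence symbols. The Python sketch below writes one such record. The header contents and line width that GCGToBLAST itself uses are not documented here, so the name and the width of 60 are illustrative choices only.

```python
def to_fasta(name, sequence, width=60):
    """Format one sequence as a FastA record with wrapped sequence lines."""
    lines = [">" + name]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"

# Ambiguity codes such as N, R, and Y are kept as plain letters.
print(to_fasta("example1", "ACGTNNRYACGT", width=8), end="")
```

With a width of 8, the 12-symbol sequence is written as one line of 8 symbols and one line of 4.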

The FastA format file can be large. If the display of the correct original ambiguity codes in your segment pair output is not important to you, you might want to delete this file or use GCGToBLAST with the -DELete parameter so that GCGToBLAST will delete it for you once the database is created. GCGToBLAST automatically deletes this file if the sequences in the data set are proteins, since the compressed amino acid codes can express ambiguity.
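
The rules above for when the FastA format file remains on disk can be summarized in a few lines. This Python sketch restates those rules and is not taken from the program itself. It assumes that any letter other than A, C, G, T, or U counts as an ambiguous nucleotide symbol.

```python
UNAMBIGUOUS = set("ACGTU")   # assumption: all other letters are ambiguity codes

def has_ambiguity(sequence):
    """True if a nucleotide sequence contains a letter outside A, C, G, T, U."""
    return any(ch.isalpha() and ch not in UNAMBIGUOUS for ch in sequence.upper())

def fasta_file_kept(seq_type, sequences, delete=False):
    """Decide whether the FastA format file remains after the run."""
    if seq_type == "protein":
        return False          # deleted automatically for protein databases
    if delete:
        return False          # -DELete was given
    return any(has_ambiguity(seq) for seq in sequences)

print(fasta_file_kept("nucleotide", ["ACGT", "ACGN"]))               # True
print(fasta_file_kept("nucleotide", ["ACGT", "ACGN"], delete=True))  # False
print(fasta_file_kept("protein", ["MKTAYIAKQR"]))                    # False
```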

COMMAND-LINE SUMMARY

All parameters for this program may be put on the command line. Use the parameter -CHEck to see the summary below and to have a chance to add things to the command line before the program executes. In the summary below, the capitalized letters in the parameter names are the letters that you must type in order to use the parameter. Square brackets ([ and ]) enclose parameter values that are optional. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.
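
The capitalization convention can be illustrated with a small matching function. The Python sketch below treats the leading capital letters of a documented name as the required part. Two points are assumptions rather than documented behavior: that matching ignores case, and that any extra letters you type must continue the full parameter name.

```python
def required_prefix(parameter):
    """Return the leading capital letters of a name such as 'DIRectory'."""
    prefix = ""
    for ch in parameter:
        if not ch.isupper():
            break
        prefix += ch
    return prefix

def matches(typed, parameter):
    """True if typed has the required letters and is a prefix of the name."""
    typed_upper, full_upper = typed.upper(), parameter.upper()
    return (typed_upper.startswith(required_prefix(parameter))
            and full_upper.startswith(typed_upper))

for typed in ("di", "dir", "direct", "dirx"):
    print(typed, matches(typed, "DIRectory"))
```

Under these assumptions, "dir" and "direct" match DIRectory, while "di" (too short) and "dirx" (wrong continuation) do not.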


Minimal Syntax: % gcgtoblast [-INfile=]@Hsp70.List [-OUTfile=]Hsp70 -Default

Prompted Parameters: None

Local Data Files:  None

Optional Switches:

-DIRectory=DirName writes into a directory other than the current directory
-DELete            deletes the FastA-format data after database generation
-NOMONitor         suppresses the screen monitor
-NOSUMmary         suppresses the screen summary
-BATch             submits the program to run in the batch queue
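
If you run GCGToBLAST from a script, it can help to assemble the command line in one place. The Python sketch below builds an argument list from the parameters summarized above and does not execute anything. The function name is made up for this example, and the order in which the switches are appended is an arbitrary choice.

```python
def gcgtoblast_arguments(infile, outfile, directory=None,
                         delete=False, batch=False):
    """Assemble a gcgtoblast argument list; nothing is executed."""
    args = ["gcgtoblast", "-INfile=" + infile, "-OUTfile=" + outfile,
            "-Default"]
    if directory is not None:
        args.append("-DIRectory=" + directory)
    if delete:
        args.append("-DELete")
    if batch:
        args.append("-BATch")
    return args

print(" ".join(gcgtoblast_arguments("@hsp70.list", "hsp70",
                                    directory="/usr/user/burgess/seq/",
                                    delete=True)))
```

Printing the joined list lets you inspect the command before you decide to run it.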

LOCAL DATA FILES

None.

OPTIONAL PARAMETERS

The parameters listed below can be set from the command line. For more information, see "Using Program Parameters" in Chapter 3, Using Programs in the User's Guide.

-DIRectory=DirName

This parameter allows you to redirect the output files written by GCGToBLAST to a directory other than your current working directory.

-DELete

Use this parameter to direct GCGToBLAST to delete the FastA-format version of a nucleotide sequence database that it creates in addition to the compressed database. You would do this to free up disk space if the display of the correct original ambiguity codes in the output of a BLAST search is not important to you.

-BATch

Submits the program to the batch queue for processing after prompting you for all required user inputs. Any information that would normally appear on the screen while the program is running is written into a log file. Whether that log file is deleted, printed, or saved to your current directory depends on how your system manager has set up the command that submits this program to the batch queue. All output files are written to your current directory, unless you direct the output to another directory when you specify the output file.

-MONitor

This program normally monitors its progress on your screen. However, when you use -Default to suppress all program interaction, you also suppress the monitor. You can turn it back on with this parameter. If you are running the program in batch, the monitor will appear in the log file.

-SUMmary

Writes a summary of the program's work to the screen when you've used the -Default parameter to suppress all program interaction. A summary typically displays at the end of a program run interactively. You can suppress the summary for a program run interactively with -NOSUMmary.

You can also use this parameter to cause a summary of the program's work to be written in the log file of a program run in batch.
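
The way -Default, -MONitor, and -SUMmary interact can be restated as a small decision function. The Python sketch below follows the descriptions above, with two simplifying assumptions: parameters are recognized only when spelled out in full, and an explicit -NOMONitor or -NOSUMmary wins if both forms are given.

```python
def screen_output(args):
    """Return whether the monitor and the summary would appear."""
    given = {arg.upper() for arg in args}
    quiet = "-DEFAULT" in given        # -Default suppresses interaction
    result = {}
    for name in ("MONITOR", "SUMMARY"):
        if "-NO" + name in given:
            result[name.lower()] = False      # explicitly suppressed
        elif "-" + name in given:
            result[name.lower()] = True       # explicitly requested
        else:
            result[name.lower()] = not quiet  # on unless -Default is used
    return result

print(screen_output(["-Default", "-MONitor"]))
# prints {'monitor': True, 'summary': False}
```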

Printed: November 18, 1996 13:08 (1162)

Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com