APPENDIX II


Data Submission to GenBank or EMBL

Sequence data may be submitted to the EMBL or GenBank databases using the following form. To get a blank copy of the form as a file, use the command $ Fetch GenBank.Form, then fill it out with a text editor. Mailing instructions are included on the form. You can include the sequence itself in GCG format if you wish.

GenBank.Form


GBDAT.FRM          Genetic Sequence Data Bank
                         15 August 1994

                  GenBank Flat File Release 84.0

                  GenBank Data Submission Form

  196703 loci, 201815802 bases, from 196703 reported sequences

                        AUTHORIN from GenBank

        If you have an IBM PC or Macintosh, we request that you use the
        Authorin submission program rather than the submission form
        below.

        GENBANK ENLISTING COMMUNITY SUPPORT

        To meet the goal of an accurate and current data bank, GenBank
        and the scientific journals are requesting members of the
        molecular biology community to submit their data directly to
        the data bank.

        Efforts to encourage author-initiated direct submission of
        sequence data have been successful to date. Over 75% of new
        sequences enter GenBank by these means.

        THE SEQUENCE SUBMISSION PROGRAM: AUTHORIN

        To facilitate authors' direct submission of data, GenBank has
        developed an IBM PC and Macintosh program called Authorin. This is
        an easy-to-use software tool designed to help researchers prepare
        their sequence and annotation data for computer-readable
        submission to GenBank, EMBL, DDBJ, or PIR.

        Authors may enter their data in any order and may make
        revisions at any time prior to submission. Partially completed
        entries may be saved and finished in a later session. Menus are
        provided for many of the fields to standardize terminology and
        reduce typing.

        SUBMITTING WITH AUTHORIN

        Files generated by Authorin are simple text files and may be
        copied to disk and mailed to the appropriate data bank.

        Alternatively, the submission file may be transferred from your
        personal computer to a computer system connected to the BITNET
        or INTERNET networks and mailed electronically to the data
        banks.

        FULLY AUTOMATED DATA ENTRY

        Authorin submissions received via electronic mail are
        checked automatically by a suite of computer programs
        designed to validate the sequence data and their
        associated information. Validated data are passed directly
        to the database.  The submitter then receives an
        electronic mail response containing the accession number
        and a copy of the data as they appear in the database.

        AUTHORIN AVAILABLE FREE

        Authorin is available from GenBank at no charge. The program
        is distributed for the IBM-PC and the Macintosh.  If you
        would like to receive a copy of the Authorin program and
        documentation, send your name and address to:

                National Center for Biotechnology Information
                National Library of Medicine
                Room 8N-803  Bldg. 38A
                8600 Rockville Pike
                Bethesda, MD 20894

                Tel: (301) 496-2475
                E-mail: authorin@ncbi.nlm.nih.gov

****************************************************************************

                       SEQUENCE DATA SUBMISSION FORM

This form solicits the information needed for a  nucleotide and/or amino  acid
sequence  data bank entry.  By completing and returning it to us promptly you
help us to enter your data in the database accurately  and  rapidly.

Please answer all questions which apply to your data.  If you submit two or
more  non-contiguous  sequences,  please copy  and  fill  out  this  form
for each additional sequence.  Please  include  in  your  submission  any
additional sequence data which are not reported in your manuscript but which
have been reliably determined (for example, introns or flanking sequences).
When submitting nucleic acid sequences containing protein coding regions,
please include a translation (SEPARATELY from the  nucleic  acid  sequence).
Then send  (1) this form, (2) a copy of your manuscript (if available) and
(3) your sequence data (in machine readable form) to the address  shown  below.
Information  about  the  various  ways  you  can send us your data and about
formats for the sequence data is given in the following sections.
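
The translation requested above can be prepared in many ways. The following is only a minimal Python sketch, not part of the GenBank form or of the Wisconsin Package. It assumes the standard genetic code, so it is not suitable for organelle or other variant codes, and the names translate and CODON_TABLE are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, start=1):
    """Translate a nucleotide sequence into one-letter amino acid codes.

    `start` is the 1-based position of the first codon. Stop codons are
    written as '*' and codons with ambiguous bases as 'X'.
    """
    seq = "".join(dna.split()).upper().replace("U", "T")
    protein = []
    for i in range(start - 1, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))  # prints MA*
```

The translation is kept as a separate string so that it can be sent separately from the nucleic acid sequence, as the form asks.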

                        SUBMITTING DATA TO GENBANK

We can process sequence and annotation data submitted in any of the following
ways:
1. ELECTRONIC  FILE  TRANSFER:  files can be sent via computer network
to gb-sub@ncbi.nlm.nih.gov. This address can  be  reached  via  various gateways
from  BITNET, INTERNET, USENET, JANET, JUNET, etc.  Ask your local network
expert how to send it or phone us for help at (301) 496-2475.
2. FLOPPY DISKS:  Macintosh or DOS systems (all sizes and densities): if using
word processing software, the file should be sent as an ASCII text file rather
than as a software-specific file.
3. PRINTED COPY: as a last resort only! Please do not reduce the size of the
letters in the sequence.
Our address is:

       GenBank Submissions
       National Center for Biotechnology Information
       National Library of Medicine
       Room 8N-803, Bldg. 38A
       8600 Rockville Pike
       Bethesda, MD 20894

       E-MAIL:  gb-sub@ncbi.nlm.nih.gov

                                ACCESSION NUMBERS

An accession number is permanently assigned to each sequence submitted to the
database.  We will assign an accession number upon receipt of this form
and return it to you within seven days, or contact you if there are
errors.  We recommend that you cite this number when referring to both these
data and the article where they were originally reported.  If you are
forwarding this number on to a journal, please send a photocopy or facsimile
of the notification received from GenBank; do not send the number over the
telephone.

If your manuscript has already been accepted for publication, the  accession
number should  be included at the galley proof stage as a note added in proof.
If the journal has not already provided a format, we suggest that the note
added to the manuscript or in the galley proof should be inserted as a
footnote on the title page and read approximately  as  follows:   "The
nucleotide sequence data reported in this paper have been submitted to
GenBank and assigned the accession number M12345."

                        FORMATS FOR SUBMITTED DATA

We would appreciate receiving the sequence data in a form which conforms  as
closely as possible to the following standards:

 o Each sequence should include the names of the authors.

 o Each distinct sequence should be listed separately using  the  same number
   of bases/residues per line and clearly indicating its length in
   bases/residues.

 o Enumeration of distinct sequences should begin with a "1" and ascend in the
   direction 5' to 3' (amino- to carboxy-terminus).

 o Amino acid sequences should be listed using the one-letter code.  The code
   for representing the sequence characters should conform to the IUPAC-IUB
   standards, which are described in the following references:  Nucl. Acids
   Res.  13: 3021-3030 (1985) for nucleotides, and J. Biol. Chem. 243:
   3557-3559 (1968) for amino acids.
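
As an illustration of the standards listed above, here is a minimal Python sketch that checks a nucleotide sequence against the IUPAC-IUB one-letter codes and lists it with the same number of bases per line, numbered from 1. The function name format_sequence, the default of 60 bases per line, and the layout of the numbering column are assumptions made for this example; GenBank does not prescribe them. Amino acid sequences are not handled.

```python
# IUPAC-IUB one-letter nucleotide codes, including ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYSWKMBDHVN")


def format_sequence(seq, per_line=60):
    """Return (length, lines) for a nucleotide sequence.

    Every full line holds `per_line` bases and starts with the number of
    its first base, counting from 1 in the 5' to 3' direction.
    """
    cleaned = "".join(seq.split()).upper()
    invalid = sorted(set(cleaned) - IUPAC_NUCLEOTIDES)
    if invalid:
        raise ValueError("non-IUPAC characters: " + "".join(invalid))
    lines = []
    for start in range(0, len(cleaned), per_line):
        chunk = cleaned[start:start + per_line]
        lines.append(f"{start + 1:>9} {chunk}")
    return len(cleaned), lines


length, lines = format_sequence("atgc" * 20, per_line=30)
print(f"length: {length} bases")
print("\n".join(lines))
```

Stating the length alongside the listing lets the recipient confirm that no bases were lost in transfer.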

                _________________________________________________

These data will be shared among the following databases:  EMBL Data Library
(Heidelberg, Federal Republic of Germany); GenBank (NCBI, NIH, Bethesda, MD,
USA); DNA Data Bank of Japan (DDBJ; Mishima, Japan); National Biomedical
Research Foundation Protein Identification Resource (NBRF-PIR; Washington, D.C.,
U.S.A.); Martinsried Institute for Protein Sequence Data (MIPS; Martinsried,
Federal Republic of Germany) and International Protein Information Database in
Japan (JIPID; Noda, Japan).

I.  GENERAL INFORMATION
==============================================================================
Your last name                   first name                middle initials
------------------------------------------------------------------------------
Institution
------------------------------------------------------------------------------
Address

------------------------------------------------------------------------------
Computer mail address                  Telex number
------------------------------------------------------------------------------
Telephone                              Telefax number
==============================================================================
On what medium and in what format are you sending us your sequence data?
(see instructions at the beginning of this form)
  [ ] electronic mail
  [ ] diskette
        computer:                       operating system:
        editor:                         filename:
  [ ] magnetic tape (specify format)
==============================================================================

II.  CITATION INFORMATION
==============================================================================
These data represent
[ ]new submission  [ ]correction (if correction, Accession number:          )
==============================================================================
These data are  [ ] published  [ ] in press  [ ] submitted  [ ] in preparation
                [ ] no plans to publish
------------------------------------------------------------------------------
authors
------------------------------------------------------------------------------
title of paper

------------------------------------------------------------------------------
journal                     volume, first-last pages, year
------------------------------------------------------------------------------
Do you agree that these  data can be made  available in the  database before
they appear in print?
  [ ] yes    [ ] no, they can be made available after:              (date)
==============================================================================
Does the sequence  which you are  sending with this form  include  data that
does NOT appear in the above citation?
  [ ] no
  [ ] yes, from position _______ to _______  [ ] bases OR
                                             [ ] amino acid residues
     (If your sequence contains 2 or more such spans,  use the feature table
     in section IV to indicate their positions)
If so, how should these data be cited in the database?
  [ ] published  [ ] in press  [ ] submitted  [ ] in preparation
  [ ] no plans to publish
------------------------------------------------------------------------------
authors
------------------------------------------------------------------------------
address (if different from that given in section I)

------------------------------------------------------------------------------
title of paper

------------------------------------------------------------------------------
journal                     volume, first-last pages, year
==============================================================================
List references to papers  and/or  database  entries which report sequences
overlapping with that submitted here.

1st author     journal, vol., pages, year and/or database, accession number
------------------------------------------------------------------------------

------------------------------------------------------------------------------

==============================================================================

III.  DESCRIPTION OF SEQUENCED SEGMENT

Wherever possible, please use standard  nomenclature or conventions.  If  a
question  is not applicable to your sequence, answer by writing N.A. in the
appropriate space; if the information is relevant but not available,  write
a question mark (?).
==============================================================================
What kind of molecule did you sequence?   (check all boxes which apply)

 [ ] genomic DNA    [ ] genomic RNA   [ ] cDNA to mRNA  [ ]cDNA to genomic RNA
 [ ] organelle DNA  [ ] organelle RNA  please specify organelle:
 [ ] tRNA           [ ] rRNA          [ ] snRNA          [ ] scRNA
 for viruses: [ ] virus  or  [ ] provirus  or  [ ] viroid   [ ] DNA or [ ] RNA
              [ ] ds     or  [ ] ss        or  [ ] circular [ ] enveloped
                                                         or [ ] nonenveloped
 [ ] other nucleic acid.  please specify:

 [ ] peptide  [ ] sequence assembled by  [ ] overlap of sequenced fragments
                                         [ ] homology with related sequence
                                         [ ] other.  please specify:

              [ ] partial:               [ ] N-terminal
                                         [ ] C-terminal
                                         [ ] internal fragment
==============================================================================
length of sequence              [ ] bases or  [ ] amino acid residues
------------------------------------------------------------------------------
gene name(s) (e.g., lacZ)
------------------------------------------------------------------------------
gene product name(s) (e.g., beta-D-galactosidase)
------------------------------------------------------------------------------
Enzyme Commission number (e.g., EC 3.2.1.23)
------------------------------------------------------------------------------
gene product subunit structure (e.g., hemoglobin alpha-2 beta-2)
==============================================================================
The following items refer to the  original source of the  molecule you have
sequenced.
  organism (species) (e.g., Mus musculus)             plant cultivar
------------------------------------------------------------------------------
  strain (e.g., K12, BALB/c)                          substrain
------------------------------------------------------------------------------
  name/number of individual/isolate (e.g., patient 123; influenza virus
  A/PR/8/34)
------------------------------------------------------------------------------
  developmental stage                        [ ] germ line   [ ] rearranged
------------------------------------------------------------------------------
  haplotype                    tissue type                cell type
------------------------------------------------------------------------------
  allele                       variant                    [ ] macronuclear
==============================================================================
The  following  items  refer  to the  immediate experimental  source of the
submitted sequence.
  name of cell line (e.g., Hela; 3T3-L1) or plant cultivar
------------------------------------------------------------------------------
  clone library                         clone(s), subclone(s)
==============================================================================
The following items refer to the  position of the submitted sequence in the
genome.
  chromosome (or segment) name/number
------------------------------------------------------------------------------
  map position                   units:  [ ] genome %  [ ] nucleotide number
                                         [ ] other:
==============================================================================
Using single words or short phrases, describe the properties of the sequence
in terms of:

  -  its associated phenotype(s);
  -  the biological/enzymatic activity of its product;
  -  the general functional  classification of the gene  and/or gene product;
  -  macromolecules to which the gene product can bind  (e.g., DNA, calcium,
     other proteins);
  -  subcellular localization of the gene product;
  -  any other relevant information.

Example (for the viral erbB nucleotide sequence): transforming capacity; EGF
receptor-related; tyrosine kinase; oncogene; transmembrane protein.

==============================================================================

IV.  FEATURES OF THE SEQUENCE

Please  list  below  the  types  and  locations of all significant  features
experimentally  identified within the sequence.   Be sure that your sequence
is numbered beginning with "1."  Use < or > if a feature extends beyond the
beginning or end of the indicated sequence span.

In the column marked                   fill in

      feature          type of feature (see information below)
      from             number of first base/amino acid in the feature
      to               number of last base/amino acid in the feature
      bp               an "x" if numbering refers to position of a base pair
                       in a nucleotide sequence
      aa               an "x" if  numbering  refers to  position of an amino
                       acid residue in a peptide sequence
      id               indicate  method by which the feature was identified.
                       E  =  experimentally;  S  =  by similarity with known
                       sequence or to an established consensus sequence; P =
                       by similarity  to  some  other  pattern,  such  as an
                       open reading frame
      comp             an  "x"  for a nucleotide sequence feature located on
                       strand complementary to that reported here
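
Before filling in the table, the entries can be checked for consistency. The following minimal Python sketch is one possible check, written under assumptions: the function names parse_position and check_feature and the dictionary layout are hypothetical and are not part of the form. It accepts positions written with < or > and the id codes E, S, and P described above.

```python
import re


def parse_position(text):
    """Split a position such as '9', '<1' or '>264' into (number, beyond_span)."""
    match = re.fullmatch(r"([<>]?)(\d+)", text.strip())
    if match is None:
        raise ValueError(f"not a valid position: {text!r}")
    marker, digits = match.groups()
    number = int(digits)
    if number < 1:
        raise ValueError("numbering begins with 1")
    return number, marker != ""


def check_feature(name, start, end, method=""):
    """Validate one feature table row and return it as a dictionary."""
    first, first_open = parse_position(start)
    last, last_open = parse_position(end)
    if first > last:
        raise ValueError(f"{name}: 'from' is greater than 'to'")
    if method not in ("", "E", "S", "P"):
        raise ValueError(f"{name}: id must be E, S, or P")
    return {
        "feature": name,
        "from": first,
        "to": last,
        "extends_beyond_span": first_open or last_open,
        "id": method,
    }


print(check_feature("TATA box", "1", "8", "S"))
print(check_feature("exon 1", "9", ">264"))
```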

Significant features include:

  -  regulatory signals (e.g., promoters, attenuators, enhancers)
  -  transcribed  regions  (e.g., mRNA, rRNA, tRNA).  (indicate reading frame
     if start and stop codons are not present)
  -  regions  subject to  post-transcriptional  modification  (e.g.,  introns,
     modified bases)
  -  translated regions
  -  extent of  signal  peptide,  prepropeptide,  propeptide,  mature peptide
  -  regions subject to post-translational modification  (e.g.,  glycosylated
     or phosphorylated sites)
  -  other  domains/sites  of  interest  (e.g.,  extracellular  domain,  DNA-
     binding domain, active site, inhibitory site)
  -  sites involved in bonding (disulfide, thiolester, intrachain, interchain)
  -  regions of protein secondary structure  (e.g., alpha helix or beta sheet)
  -  conflicts with sequence data reported by other authors
  -  variations and polymorphisms

The first 2 lines of the table are filled in with examples.

==============================================================================
Numbering for features on submitted sequence  [ ] matches manuscript
                                              [ ] does not match manuscript
==============================================================================
             feature               from        to         bp  aa   id    comp
------------------------------------------------------------------------------
EXAMPLE     TATA box              1             8          x        S
------------------------------------------------------------------------------
EXAMPLE      exon 1               9           >264         x
==============================================================================

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

==============================================================================

V.    SEQUENCE DATA

Please enter the nucleotide sequence data here:

Please enter the translated amino acid sequence here:

E6/11.90 (Last change: 09-Jan-1992)

                    GenBank ERROR / SUGGESTION REPORT FORM

GENERAL INSTRUCTIONS

This form should be used to report errors in GenBank data and to submit
suggestions.  Your suggestions help us to keep GenBank data up-to-date and
accurate.  We welcome your input.

Please answer all questions which apply to the problem or suggestion.  If you
report two or more separate problems, please copy and fill out this form for
each additional report.  You may fill out the computer-readable form using a
text editor or print the form and fill it out by hand.
Please send the form(s) to:

       GenBank Updates
       National Center for Biotechnology Information
       National Library of Medicine
       Room 8N-803, Bldg. 38A
       8600 Rockville Pike
       Bethesda, MD 20894

       E-mail:  update@ncbi.nlm.nih.gov
       Phone:   (301) 496-2475

Please be sure to include the primary (first) accession number and locus name
of all entries affected.  The form is reproduced below.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Name: _________________________________________________________________________

Department: ________________________  Institution: ____________________________

Mailing Address: _______________________________________   Phone: _____________

City: ______________________________  State: ___________   Zip: _______________

Electronic Mail Address: ______________________________________________________
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Type of Report: [ ] error  [ ] problem  [ ] suggestion  [ ] comment  [ ] other

Release of GenBank to which this applies: ___________________________

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entry Information:

Primary Accession Number(s):                   Entry Name(s):

Division (check one or more):

  [ ] BCT    [ ] INV    [ ] MAM    [ ] ORG    [ ] PHG    [ ] PLN    [ ] PRI

       [ ] ROD    [ ] RNA    [ ] SYN    [ ] UNA    [ ] VRL    [ ] VRT

_______________________________________________________________________________

Field Type (check one or more):

     [ ] Locus line    [ ] Source       [ ] Comment       [ ] Origin
     [ ] Accession     [ ] Organism     [ ] Features      [ ] Sequence
     [ ] Keywords      [ ] Reference    [ ] Base Count    [ ] Other ___________

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Report / Suggestion
      (please be as precise as possible - attach pages if necessary)

  ___________________________________________________________________________
  Data presently in field:

  ___________________________________________________________________________
  Proposed Change:

  ___________________________________________________________________________
  Reason for Change:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

GenBank Staff Use Only:

  [ ] Change made, Release __________         [ ] Reply, Date _______________

  [ ] Approved by ___________________________________________________________

Printed: November 17, 1996 13:22 (1162)



Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com

Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc., a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.

Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.

All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.

Genetics Computer Group

www.gcg.com