[ Program Manual | User's Guide | Data Files | Databases ]
Sequence data may be submitted to the EMBL or GenBank databases using the following form. To obtain a file containing an empty copy of the form, use the command $ Fetch GenBank.Form, then fill the form out with a text editor. Mailing instructions are included on the form. You can include the sequence itself in GCG format if you wish.
GBDAT.FRM          Genetic Sequence Data Bank
                         15 August 1994

               GenBank Flat File Release 84.0

                Genbank Data Submission Form

  196703 loci, 201815802 bases, from 196703 reported sequences

AUTHORIN from GenBank

If you have an IBM PC or Macintosh, we request that you use the Authorin
submission program rather than the submission form below.

GENBANK ENLISTING COMMUNITY SUPPORT

To meet the goal of an accurate and current data bank, GenBank and the
scientific journals are requesting members of the molecular biology
community to submit their data directly to the data bank. Efforts to
encourage author-initiated direct submission of sequence data have been
successful to date. Over 75% of new sequences enter GenBank by these means.

THE SEQUENCE SUBMISSION PROGRAM: AUTHORIN

To facilitate authors' direct submission of data, GenBank has developed an
IBM PC and Macintosh program called Authorin. This is an easy-to-use
software tool designed to help researchers prepare their sequence and
annotation data for computer-readable submission to GenBank, EMBL, DDBJ, or
PIR. Authors may enter their data in any order and may make revisions at
any time prior to submission. Partially completed entries may be saved and
finished in a later session. Menus are provided for many of the fields to
standardize terminology and reduce typing.

SUBMITTING WITH AUTHORIN

Files generated by Authorin are simple text files and may be copied to disk
and mailed to the appropriate data bank. Alternatively, the submission file
may be transferred from your personal computer to a computer system
connected to the BITNET or INTERNET networks and mailed electronically to
the data banks.

FULLY AUTOMATED DATA ENTRY

Authorin submissions received via electronic mail are checked automatically
by a suite of computer programs designed to validate the sequence data and
their associated information. Validated data are passed directly to the
database.
The submitter then receives an electronic mail response containing the
accession number and a copy of the data as they appear in the database.

AUTHORIN AVAILABLE FREE

Authorin is available from GenBank at no charge. The program is distributed
for the IBM-PC and the Macintosh. If you would like to receive a copy of
the Authorin program and documentation, send your name and address to:

     National Center for Biotechnology Information
     National Library of Medicine
     Room 8N-803 Bldg. 38A
     8600 Rockville Pike
     Bethesda, MD 20894
     Tel: (301) 496-2475
     E-mail: authorin@ncbi.nlm.nih.gov

****************************************************************************

SEQUENCE DATA SUBMISSION FORM

This form solicits the information needed for a nucleotide and/or amino
acid sequence data bank entry. By completing and returning it to us
promptly you help us to enter your data in the database accurately and
rapidly. Please answer all questions which apply to your data. If you
submit two or more non-contiguous sequences, please copy and fill out this
form for each additional sequence.

Please include in your submission any additional sequence data which is not
reported in your manuscript but which has been reliably determined (for
example, introns or flanking sequences). When submitting nucleic acid
sequences containing protein coding regions, please include a translation
(SEPARATELY from the nucleic acid sequence). Then send (1) this form, (2) a
copy of your manuscript (if available) and (3) your sequence data (in
machine readable form) to the address shown below. Information about the
various ways you can send us your data and about formats for the sequence
data is given in the following sections.

SUBMITTING DATA TO GENBANK

We can process sequence and annotation data submitted in any of the
following ways:

1. ELECTRONIC FILE TRANSFER: files can be sent via computer network to
   gb-sub@ncbi.nlm.nih.gov.
   This address can be reached via various gateways from BITNET, INTERNET,
   USENET, JANET, JUNET, etc. Ask your local network expert how to send it
   or phone us for help at (301) 496-2475

2. FLOPPY DISKS: Macintosh or DOS systems (all sizes and densities): if
   using word processing software, the file should be sent as an ASCII
   text file rather than as a software-specific file.

3. PRINTED COPY: as a last resort only! Please do not reduce the size of
   the letters in the sequence.

Our address is:

     GenBank Submissions
     National Center for Biotechnology Information
     National Library of Medicine
     Room 8N-803, Bldg. 38A
     8600 Rockville Pike
     Bethesda, MD 20894
     E-MAIL: gb-sub@ncbi.nlm.nih.gov

ACCESSION NUMBERS

An accession number is permanently assigned to each sequence submitted to
the database. We will assign an accession number upon receipt of this form
and return it to you within seven days, or contact you if there are errors.
We recommend that you cite this number when referring to both these data
and the article where they were originally reported. If you are forwarding
this number on to a journal, please send a photocopy or facsimile of the
notification received from GenBank; do not send the number over the
telephone.

If your manuscript has already been accepted for publication, the accession
number should be included at the galley proof stage as a note added in
proof. If the journal has not already provided a format, we suggest that
the note added to the manuscript or in the galley proof should be inserted
as a footnote on the title page and read approximately as follows:

     "The nucleotide sequence data reported in this paper have been
     submitted to GenBank and assigned the accession number M12345."

FORMATS FOR SUBMITTED DATA

We would appreciate receiving the sequence data in a form which conforms as
closely as possible to the following standards:

o Each sequence should include the names of the authors.
o Each distinct sequence should be listed separately using the same number
  of bases/residues per line and clearly indicating its length in
  bases/residues.

o Enumeration of distinct sequences should begin with a "1" and ascend in
  the direction 5' to 3' (amino- to carboxy-terminus).

o Amino acid sequences should be listed using the one-letter code.

The code for representing the sequence characters should conform to the
IUPAC-IUB standards, which are described in the following references:
Nucl. Acids Res. 13: 3021-3030 (1985) for nucleotides, and J. Biol. Chem.
243: 3557-3559 (1968) for amino acids.

_________________________________________________

These data will be shared among the following databases: EMBL Data Library
(Heidelberg, Federal Republic of Germany); GenBank (NCBI, NIH, Bethesda,
MD, USA); DNA Data Bank of Japan (DDBJ; Mishima, Japan); National
Biomedical Research Foundation Protein Identification Resource (NBRF-PIR;
Washington, D.C., U.S.A.); Martinsried Institute for Protein Sequence Data
(MIPS; Martinsried, Federal Republic of Germany) and International Protein
Information Database in Japan (JIPID; Noda, Japan).

I. GENERAL INFORMATION
==============================================================================

Your last name              first name              middle initials
------------------------------------------------------------------------------

Institution
------------------------------------------------------------------------------

Address
------------------------------------------------------------------------------

Computer mail address                               Telex number
------------------------------------------------------------------------------

Telephone                                           Telefax number
==============================================================================
On what medium and in what format are you sending us your sequence data?
(see instructions at the beginning of this form)

[ ] electronic mail
[ ] diskette        computer:              operating system:
                    editor:                filename:
[ ] magnetic tape (specify format)
==============================================================================

II. CITATION INFORMATION
==============================================================================
These data represent  [ ] new submission  [ ] correction
                      (if correction, Accession number:            )
==============================================================================
These data are  [ ] published  [ ] in press  [ ] submitted
                [ ] in preparation  [ ] no plans to publish
------------------------------------------------------------------------------

authors
------------------------------------------------------------------------------

title of paper
------------------------------------------------------------------------------

journal           volume, first-last pages, year
------------------------------------------------------------------------------
Do you agree that these data can be made available in the database before
they appear in print?
[ ] yes  [ ] no, they can be made available after:            (date)
==============================================================================
Does the sequence which you are sending with this form include data that
does NOT appear in the above citation?
[ ] no  [ ] yes, from position _______ to _______
                 [ ] bases OR [ ] amino acid residues
(If your sequence contains 2 or more such spans, use the feature table in
section IV to indicate their positions)

If so, how should these data be cited in the database?
[ ] published  [ ] in press  [ ] submitted
[ ] in preparation  [ ] no plans to publish
------------------------------------------------------------------------------

authors
------------------------------------------------------------------------------

address (if different from that given in section I)
------------------------------------------------------------------------------

title of paper
------------------------------------------------------------------------------

journal           volume, first-last pages, year
==============================================================================
List references to papers and/or database entries which report sequences
overlapping with that submitted here.

1st author        journal, vol., pages, year and/or database, accession number
------------------------------------------------------------------------------

------------------------------------------------------------------------------

==============================================================================

III. DESCRIPTION OF SEQUENCED SEGMENT

Wherever possible, please use standard nomenclature or conventions. If a
question is not applicable to your sequence, answer by writing N.A. in the
appropriate space; if the information is relevant but not available, write
a question mark (?).
==============================================================================
What kind of molecule did you sequence? (check all boxes which apply)

[ ] genomic DNA      [ ] genomic RNA
[ ] cDNA to mRNA     [ ] cDNA to genomic RNA
[ ] organelle DNA    [ ] organelle RNA     please specify organelle:
[ ] tRNA  [ ] rRNA  [ ] snRNA  [ ] scRNA
for viruses:  [ ] virus or [ ] provirus or [ ] viroid
              [ ] DNA or [ ] RNA
              [ ] ds or [ ] ss or [ ] circular
              [ ] enveloped or [ ] nonenveloped
[ ] other nucleic acid. please specify:
[ ] peptide
[ ] sequence assembled by  [ ] overlap of sequenced fragments
                           [ ] homology with related sequence
                           [ ] other.
                               please specify:
[ ] partial:  [ ] N-terminal  [ ] C-terminal  [ ] internal fragment
==============================================================================

length of sequence              [ ] bases or [ ] amino acid residues
------------------------------------------------------------------------------

gene name(s) (e.g., lacZ)
------------------------------------------------------------------------------

gene product name(s) (e.g., beta-D-galactosidase)
------------------------------------------------------------------------------

Enzyme Commission number (e.g., EC 3.2.1.23)
------------------------------------------------------------------------------

gene product subunit structure (e.g., hemoglobin alpha-2 beta-2)
==============================================================================
The following items refer to the original source of the molecule you have
sequenced.

organism (species) (e.g., Mus musculus)             plant cultivar
------------------------------------------------------------------------------

strain (e.g., K12, BALB/c)                          substrain
------------------------------------------------------------------------------

name/number of individual/isolate (e.g., patient 123; influenza virus
A/PR/8/34)
------------------------------------------------------------------------------

developmental stage             [ ] germ line  [ ] rearranged
------------------------------------------------------------------------------

haplotype                tissue type                cell type
------------------------------------------------------------------------------

allele                   variant                    [ ] macronuclear
==============================================================================
The following items refer to the immediate experimental source of the
submitted sequence.

name of cell line (e.g., Hela; 3T3-L1) or plant cultivar
------------------------------------------------------------------------------

clone library                   clone(s), subclone(s)
==============================================================================
The following items refer to the position of the submitted sequence in the
genome.

chromosome (or segment) name/number
------------------------------------------------------------------------------

map position            units: [ ] genome %  [ ] nucleotide number  [ ] other:
==============================================================================
Using single words or short phrases, describe the properties of the
sequence in terms of:

- its associated phenotype(s);
- the biological/enzymatic activity of its product;
- the general functional classification of the gene and/or gene product
- macromolecules to which the gene product can bind (e.g., DNA, calcium,
  other proteins);
- subcellular localization of the gene product;
- any other relevant information.

Example (for the viral erbB nucleotide sequence): transforming capacity;
EGF receptor-related; tyrosine kinase; oncogene; transmembrane protein.

==============================================================================

IV. FEATURES OF THE SEQUENCE

Please list below the types and locations of all significant features
experimentally identified within the sequence. Be sure that your sequence
is numbered beginning with "1." Use < or > if a feature extends beyond the
beginning or end of the indicated sequence span.

In the column
marked          fill in

feature         type of feature (see information below)
from            number of first base/amino acid in the feature
to              number of last base/amino acid in the feature
bp              an "x" if numbering refers to position of a base pair in a
                nucleotide sequence
aa              an "x" if numbering refers to position of an amino acid
                residue in a peptide sequence
id              indicate method by which the feature was identified.
                E = experimentally; S = by similarity with known sequence
                or to an established consensus sequence; P = by similarity
                to some other pattern, such as an open reading frame
comp            an "x" for a nucleotide sequence feature located on strand
                complementary to that reported here

Significant features include:

- regulatory signals (e.g., promoters, attenuators, enhancers)
- transcribed regions (e.g., mRNA, rRNA, tRNA). (indicate reading frame if
  start and stop codons are not present)
- regions subject to post-transcriptional modification (e.g., introns,
  modified bases)
- translated regions
- extent of signal peptide, prepropeptide, propeptide, mature peptide
- regions subject to post-translational modification (e.g., glycosylated
  or phosphorylated sites)
- other domains/sites of interest (e.g., extracellular domain, DNA-binding
  domain, active site, inhibitory site)
- sites involved in bonding (disulfide, thiolester, intrachain, interchain)
- regions of protein secondary structure (e.g., alpha helix or beta sheet)
- conflicts with sequence data reported by other authors
- variations and polymorphisms

The first 2 lines of the table are filled in with examples.
==============================================================================
Numbering for features on submitted sequence
[ ] matches manuscript  [ ] does not match manuscript
==============================================================================
feature                          from        to      bp    aa    id    comp
------------------------------------------------------------------------------
EXAMPLE TATA box                    1         8       x           S
------------------------------------------------------------------------------
EXAMPLE exon 1                      9      >264       x
==============================================================================

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

------------------------------------------------------------------------------

==============================================================================

V. SEQUENCE DATA

Please enter the nucleotide sequence data here:


Please enter the translated amino acid sequence here:


E6/11.90 (Last change: 09-Jan-1992)


GenBank ERROR / SUGGESTION REPORT FORM

GENERAL INSTRUCTIONS

This form should be used to report errors in GenBank data and to submit
suggestions. Your suggestions help us to keep GenBank data up-to-date and
accurate. We welcome your input.

Please answer all questions which apply to the problem or suggestion. If
you report two or more separate problems, please copy and fill out this
form for each additional report. You may fill out the computer-readable
form using a text editor or print the form and fill it out by hand.
Please send the form(s) to:

     GenBank Updates
     National Center for Biotechnology Information
     National Library of Medicine
     Room 8N-803, Bldg. 38A
     8600 Rockville Pike
     Bethesda, MD 20894
     E-mail: update@ncbi.nlm.nih.gov
     Phone: (301) 496-2475

Please be sure to include the primary (first) accession number and locus
name of all entries affected. The form is reproduced below.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Name: _________________________________________________________________________
Department: ________________________ Institution: ____________________________
Mailing Address: _______________________________________ Phone: _____________
City: ______________________________ State: ___________ Zip: _______________
Electronic Mail Address: ______________________________________________________
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Type of Report:
     [ ] error  [ ] problem  [ ] suggestion  [ ] comment  [ ] other

Release of GenBank to which this applies: ___________________________
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Entry Information:

Primary Accession Number(s):

Entry Name(s):

Division (check one or more):
     [ ] BCT  [ ] INV  [ ] MAM  [ ] ORG  [ ] PHG  [ ] PLN  [ ] PRI
     [ ] ROD  [ ] RNA  [ ] SYN  [ ] UNA  [ ] VRL  [ ] VRT
_______________________________________________________________________________
Field Type (check one or more):
     [ ] Locus line  [ ] Source    [ ] Comment    [ ] Origin
     [ ] Accession   [ ] Organism  [ ] Features   [ ] Sequence
     [ ] Keywords    [ ] Reference [ ] Base Count [ ] Other ___________
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Report / Suggestion (please be as precise as possible - attach pages if
necessary)
___________________________________________________________________________
Data presently in field:

___________________________________________________________________________
Proposed Change:

___________________________________________________________________________
Reason for Change:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
GenBank Staff Use Only:
     [ ] Change made, Release __________  [ ] Reply, Date _______________
     [ ] Approved by ___________________________________________________________
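
The FORMATS FOR SUBMITTED DATA section of the submission form above asks that each sequence be listed with the same number of bases/residues per line, that its length be stated, that numbering begin with 1, and that the characters conform to the IUPAC-IUB codes. The following Python sketch shows one way a nucleotide sequence might be checked and laid out before it is pasted into section V of the form. It is only an illustration: the function names, the 60-bases-per-line default, and the position-prefix layout are assumptions made for this example, not something the form or GenBank prescribes.

```python
# Hypothetical helper for preparing a nucleotide sequence for the form.
# The layout chosen here (position prefix, 60 bases per line) is an
# assumption; the form only asks for a constant number of bases per line.

# IUPAC-IUB nucleotide codes (Nucl. Acids Res. 13: 3021-3030, 1985).
IUPAC_NUCLEOTIDES = set("ACGTURYMKSWHBVDN")


def check_nucleotides(seq):
    """Return (position, character) pairs that are not IUPAC codes.

    Positions are 1-based, matching the numbering the form asks for.
    """
    return [
        (pos, ch)
        for pos, ch in enumerate(seq.upper(), start=1)
        if ch not in IUPAC_NUCLEOTIDES
    ]


def format_sequence(seq, per_line=60):
    """Lay the sequence out with the same number of bases on every line.

    Each line is prefixed with the 1-based position of its first base.
    Only the last line may be shorter than per_line.
    """
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        lines.append(f"{start + 1:>9} {chunk}")
    return "\n".join(lines)


def describe(seq, authors, per_line=60):
    """Combine author names, the stated length, and the formatted sequence."""
    header = f"Authors: {authors}\nLength: {len(seq)} bases"
    return header + "\n" + format_sequence(seq, per_line)


if __name__ == "__main__":
    example = "ATGACCATGATTACGGATTCACTGGCC"  # made-up sequence for the demo
    problems = check_nucleotides(example)
    if problems:
        print("Non-IUPAC characters found:", problems)
    else:
        print(describe(example, "Smith, J. and Jones, K.", per_line=10))
```

Because positions are counted from 1, the numbers reported by check_nucleotides and the line prefixes produced by format_sequence can be used directly in the "from" and "to" columns of the feature table in section IV.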
Documentation Comments: doc-comments@gcg.com
Technical Support: help@gcg.com
Copyright (c) 1982, 1983, 1985, 1986, 1987, 1989, 1991, 1994, 1995, 1996, 1997 Genetics Computer Group, Inc. a wholly owned subsidiary of Oxford Molecular Group, Inc. All rights reserved.
Licenses and Trademarks: Wisconsin Package is a trademark of Genetics Computer Group, Inc. GCG and the GCG logo are registered trademarks of Genetics Computer Group, Inc.
All other product names mentioned in this documentation may be trademarks, and if so, are trademarks or registered trademarks of their respective holders and are used in this documentation for identification purposes only.