Perl 4 - Loops and Filehandles

Reading: Deitel 3.7->, Deitel 4.6, Deitel 5.1, 5.2, 5.3, Deitel 10.3 -> 10.7


What is a loop?

So far we have gone from programs with a single flow of execution (Lecture 1) to programs with decision points (Lecture 2).

What we have lacked so far is the ability to do repetition.

while loops


program 1
#!/usr/local/bin/perl
# sum.pl - adds a list of numbers

print "Enter a list of numbers, separated by spaces:\n";
$line = <STDIN>;

# use the split function to break the line
# up into an array of numbers
@numbers = split( ' ', $line );

# initialize the running total to zero
$sum = 0;

while ( @numbers ) {    # execute loop until array is empty

    # get the next number from the array
    $number = shift @numbers;

    # increment the running total
    $sum += $number;
    print "Current Running total is $sum\n";

}

print "\n\nThe sum of your numbers is: $sum\n";

% perl sum.pl
Enter a list of numbers, separated by spaces:
4 3 77 21
Current Running total is 4
Current Running total is 7
Current Running total is 84
Current Running total is 105
 
 
The sum of your numbers is: 105

How does this work?

A while loop is composed of a test condition and a block of code.

The block of code is executed repeatedly for as long as the test condition evaluates to true. The condition is tested before each pass through the block.

while ( test condition ) { statements }
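
As a further sketch of the same structure, the loop below strips a run of trailing A bases from a sequence. The sequence and the variable names are made up for illustration. Because the test condition is checked before every pass, the block never runs at all if the sequence does not end in A.

```perl
use strict;
use warnings;

# hypothetical sequence with a poly-A tail
my $seq     = 'GATTACAAAAA';
my $trimmed = 0;

# keep looping while the sequence is non-empty and its last base is an A
while ( length($seq) > 0 && substr( $seq, -1 ) eq 'A' ) {
    chop $seq;    # remove the last character
    $trimmed++;
}

print "removed $trimmed bases, leaving $seq\n";
```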


Foreach loops


program 2
# print_bases.pl
$i = 1;
foreach $base ( 'A', 'C', 'G', 'T' ) {
        print "base $i = $base\n";
        $i++;
}

this outputs:

base 1 = A
base 2 = C
base 3 = G
base 4 = T

The structure of a foreach loop is:

foreach $loop_variable ( list ) { statements }

Here is an example from the Deitel book:


program 3
#!/usr/local/bin/perl
# square_array.pl
#
# Using foreach to iterate over an array.

@array = ( 1 .. 10 ); # create array containing 1-10

foreach $number ( @array ) {    # for each element in @array
    $number **= 2;              # square the value
}

print "@array\n";    # display the results

this outputs:
1 4 9 16 25 36 49 64 81 100
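
Note that @array itself now holds the squares: inside a foreach loop the loop variable is an alias for each element in turn, so assigning to it changes the array. If the original values are still needed, one option is to collect the results in a second array. This is a sketch with made-up variable names, not code from the Deitel book.

```perl
use strict;
use warnings;

my @original = ( 1 .. 5 );
my @squares;

foreach my $number (@original) {
    # $number is only read here, so @original is left untouched
    push @squares, $number**2;
}

print "original = @original\n";
print "squares  = @squares\n";
```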


The $_ (dollar underscore) variable

Perl has reserved a number of variable names which have special meaning.


program 4
# seq_lengths.pl
@seqs = qw( AATGG CAAT GCGTTAAC ATATATATATATA );
foreach ( @seqs ) {
        $seq_length = length($_);
        print "sequence = $_ length = $seq_length\n";
}

this outputs:


program 5
sequence = AATGG length = 5
sequence = CAAT length = 4
sequence = GCGTTAAC length = 8
sequence = ATATATATATATA length = 13

If we don't give foreach a loop variable, the special scalar variable $_ is used instead.

Whether you make use of these special variables is up to you and your
own style of programming. Some people find this style leads to more
succinct, condensed programs. Some people (myself included) like this
style because it appeals to their laziness - less typing. On the other
hand, some people detest this style of programming and say it leads to
ugly code that is difficult to read.
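
To make the comparison concrete, here is the same loop written both ways. This is only a sketch with invented array names. In the second version both foreach and length fall back on $_ because neither is given a variable.

```perl
use strict;
use warnings;

my @seqs = qw( AATGG CAAT );

# explicit style: the loop variable is named
my @explicit;
foreach my $seq (@seqs) {
    push @explicit, length($seq);
}

# implicit style: foreach sets $_, and length with no argument reads $_
my @implicit;
foreach (@seqs) {
    push @implicit, length;
}

print "explicit: @explicit\n";
print "implicit: @implicit\n";
```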


The for Loop


program 6
# for_loop.pl
for ( $i = 0 ; $i < 10; $i++ ) {
        print "the value of \$i is  $i\n";
}

the value of $i is  0
the value of $i is  1
the value of $i is  2
the value of $i is  3
the value of $i is  4
the value of $i is  5
the value of $i is  6
the value of $i is  7
the value of $i is  8
the value of $i is  9


program 7
# forloop_decrement.pl
for ( $i = 10 ; $i > -10; $i -= 2 ) {
        print "the value of \$i is  $i\n";
}

the value of $i is  10
the value of $i is  8
the value of $i is  6
the value of $i is  4
the value of $i is  2
the value of $i is  0
the value of $i is  -2
the value of $i is  -4
the value of $i is  -6
the value of $i is  -8

Make sure you get the test condition right: if you write less than (<), are you sure you
mean less than, and not less than or equal to (<=)?
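
The difference is exactly one pass through the loop, which is easy to check by counting. A small sketch (the counter names are made up):

```perl
use strict;
use warnings;

# count the passes made with a strict less-than test
my $strict_count = 0;
for ( my $i = 0 ; $i < 10 ; $i++ ) {
    $strict_count++;
}

# count the passes made with less-than-or-equal
my $inclusive_count = 0;
for ( my $i = 0 ; $i <= 10 ; $i++ ) {
    $inclusive_count++;
}

print "with <  the loop body ran $strict_count times\n";
print "with <= the loop body ran $inclusive_count times\n";
```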


Loops - map and grep

map


program 8
# map_example.pl  - squares each number in a list
@numbers = (1, 2, 3, 4, 5);
@numbers_squared = map { $_ * $_ } @numbers;
print "number  = @numbers \n";            # 1 2 3 4 5
print "squared = @numbers_squared \n";    # 1 4 9 16 25
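
map evaluates its block once for each element of a list, with $_ set to that element, and returns the list of results. As a rough sketch, the foreach loop below builds the same array as the map call; the variable names are invented for the comparison.

```perl
use strict;
use warnings;

my @numbers = ( 1 .. 5 );

# one-line version using map
my @via_map = map { $_ * $_ } @numbers;

# the equivalent written out as an explicit loop
my @via_loop;
foreach my $n (@numbers) {
    push @via_loop, $n * $n;
}

print "map:  @via_map\n";
print "loop: @via_loop\n";
```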

grep


program 9
# grep_example.pl  - filters expect values

# initialize an array of blast expect values
@expect_values = (0, 3.1e-20, 1.2e-17, 4.7e-5, 0.098, 1.2);

# set an arbitrary threshold
$threshold = 1e-5;

# loop through all values filtering ones that meet threshold
@passed = grep { $_ < $threshold } @expect_values;

print scalar(@passed), " hits passed the threshold\n";    # 3
print "values = @passed \n";                               # 0 3.1e-20 1.2e-17
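
grep evaluates its block for each element, again with $_ set to that element, and keeps only the elements for which the block is true. In scalar context it returns the number of matches instead of the matches themselves. A small sketch using the sequences from earlier, with an arbitrary minimum length chosen for illustration:

```perl
use strict;
use warnings;

my @seqs       = qw( AATGG CAAT GCGTTAAC ATATATATATATA );
my $min_length = 5;    # arbitrary cut-off for this example

# list context: the sequences that pass the test
my @long = grep { length($_) >= $min_length } @seqs;

# scalar context: just how many passed
my $count = grep { length($_) >= $min_length } @seqs;

print "$count sequences have at least $min_length bases: @long\n";
```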


Nested Loops


program 10
# nested_loop.pl

for ( my $x = 0; $x < 10; $x++ ) {
    for ( my $y = 0; $y < 10; $y++ ) {
        print "$x * $y = ", ( $x * $y ), "\n";
    }
}

Always indent your code so that you know which statement block each line of code belongs to.
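
Nested foreach loops work the same way. As a sketch, the following builds all 16 dinucleotides by pairing every base with every base (the array names are made up). The indentation shows at a glance that the push belongs to the inner loop.

```perl
use strict;
use warnings;

my @bases = qw( A C G T );
my @dinucleotides;

foreach my $first (@bases) {
    foreach my $second (@bases) {
        # runs 4 x 4 = 16 times in total
        push @dinucleotides, $first . $second;
    }
}

print scalar(@dinucleotides), " dinucleotides: @dinucleotides\n";
```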


Escaping from loops: next, last, redo

next

Reading: Deitel 5.6

All the following examples are from the Deitel book:


program 11
# using_next.pl
foreach ( 1 .. 10 ) {
 
   if ( $_ == 5 ) {
      $skipped = $_;  # store skipped value
      next;  # skip remaining code in loop only if $_ is 5
   }
 
   print "$_ ";
}
 
print "\n\nUsed 'next' to skip the value $skipped.\n";

last


program 12
# using_last.pl
foreach ( 1 .. 10 ) {
   if ( $_ == 5 ) {
      $number = $_;  # store current value before loop ends
      last;          # jump to end of foreach structure
   }
 
   print "$_ ";
}
 
print "\n\nUsed 'last' to terminate loop at $number.\n";

redo


program 13
# using_redo.pl
$number = 1;
 
while ( $number <= 5 ) {
 
   if ( $number <= 10 ) {
      print "$number ";
      ++$number;
      redo;  # Continue loop without testing ( $number <= 5 )
   }
}
 
print "\nStopped when \$number became $number.\n";

1 2 3 4 5 6 7 8 9 10
Stopped when $number became 11.
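
In practice next and last often appear together when scanning data: next skips items that are not wanted, and last stops as soon as an end marker is seen. The following is a sketch only, using a made-up list of lines rather than a real file.

```perl
use strict;
use warnings;

# hypothetical lines, as they might be read from a file
my @lines = ( 'LOCUS  TEST1', '', 'ORIGIN', 'gattaca', '//', 'LOCUS  TEST2' );
my @kept;

foreach my $line (@lines) {
    next if $line eq '';      # skip blank lines, carry on with the next one
    last if $line eq '//';    # end-of-record marker: leave the loop entirely
    push @kept, $line;
}

print scalar(@kept), " lines kept before the end of the first record\n";
```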

Block Labels (optional)

It is possible to label statement blocks to gain greater control over program flow: next, last and redo can then name the loop they apply to. While this is unnecessary, some people prefer this style.

Described in Deitel 5.9
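
As a brief sketch of what a label looks like (this example is not from the Deitel book, and the label name OUTER is just a convention), the following searches for the first pair of numbers whose product is 12 and uses the label to leave both loops at once:

```perl
use strict;
use warnings;

my ( $found_x, $found_y );

OUTER: foreach my $x ( 1 .. 10 ) {
    foreach my $y ( 1 .. 10 ) {
        if ( $x * $y == 12 ) {
            ( $found_x, $found_y ) = ( $x, $y );
            last OUTER;    # a plain 'last' would only leave the inner loop
        }
    }
}

print "first pair found: $found_x * $found_y = 12\n";
```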


Filehandles

We have already encountered the standard input filehandle STDIN.

Recall:


program 14
#!/usr/local/bin/perl
# echo.pl

$line = <STDIN>;
print "echo:$line";

This program reads a line of input from the user's terminal and writes the line back.

STDIN is the name of a special filehandle for reading standard input, which by default is whatever the user types at the terminal.

The next program reads in multiple lines and prints them all
program 15
#!/usr/local/bin/perl
# echo2.pl

# keeps echoing back what the user types
while ( $line = <STDIN> ) {
    print "echo:$line";
}


$_ as the default input variable


program 16
#!/usr/local/bin/perl
# echo3.pl

# keeps echoing back what the user types
while ( <STDIN> ) {
    print "echo:$_";
}

Here is another example:


program 17
#!/usr/local/bin/perl
# wc.pl - imitates unix wc (word count) program

$lines = 0;
$words = 0;
$characters = 0;
while ( <> ) {

    $lines++;

    @words = split(' ', $_);
    $words += scalar(@words);

    $characters += length($_);

}
print " $lines $words $characters\n";

The empty angle brackets <> indicate that input should come either from files specified on the command line or, if none are given, from standard input.

You can use the program this way (with a pipe):
% cat myfile | wc.pl

or this way:
% wc.pl myfile
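
Behind the scenes, <> takes its file names from the @ARGV array. The sketch below imitates the second command by writing a small temporary file and placing its name in @ARGV by hand. The temporary file and the variable names are assumptions made so that the example is self-contained; File::Temp is a standard Perl module.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# create a small throwaway file to stand in for myfile
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print $tmp_fh "one two\nthree\n";
close($tmp_fh);

# pretend the file name was given on the command line
@ARGV = ($tmp_name);

my $lines = 0;
while ( my $line = <> ) {    # reads from the files named in @ARGV
    $lines++;
}

print "read $lines lines\n";
```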


Input from a file


program 18
#!/usr/local/bin/perl
# sort.pl - sorts the lines in a file alphabetically

# open a file and assign the filehandle F
open(F, "myfile.txt") or die("can't open myfile.txt: $!\n");

# read in the whole file into an array of lines
@lines = ();
while(<F>) {
    push(@lines, $_);
}
close(F);    # close the filehandle

# sort the lines
@lines = sort @lines;

# print out the lines, now sorted
foreach (@lines) {
    print;    # means the same as print $_;
}

Notice the open ... or die construct in the first line of code; this is a common idiom in Perl programs. If the open function returns a false value, indicating the file could not be opened, then the program exits with a meaningful message. The special variable $! holds the system's description of what went wrong.
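
Many Perl programmers now prefer the three-argument form of open with a filehandle stored in an ordinary scalar variable, rather than a bareword like F. This is a stylistic alternative and not what sort.pl uses. A sketch, with a temporary file standing in for myfile.txt (the file contents and variable names are made up):

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# create a throwaway input file so the example is self-contained
my ( $tmp_fh, $file ) = tempfile( UNLINK => 1 );
print $tmp_fh "frizzle\nzen\nabacus\n";
close($tmp_fh);

# the mode ('<' for reading) is given separately from the file name
open( my $fh, '<', $file ) or die "can't open $file: $!\n";
my @lines = <$fh>;    # in list context this reads all the remaining lines
close($fh);

my @sorted = sort @lines;
print @sorted;
```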

Let's create a file then sort it using our program:

% cat > myfile.txt
frizzle
zen
abacus
quibble
jellybean
% sort.pl
abacus
frizzle
jellybean
quibble
zen


Writing to a file


program 19
# filewrite.pl
open(F, ">output.txt") or die( "cannot write to output.txt : $!\n");
print F "Hello World!\n";
close(F);

appending


program 20
open(F, ">>output.txt") or die( "cannot append to output.txt : $!\n");
print F "Goodbye world!\n";
close(F);

% cat output.txt
Hello World!
Goodbye world!
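
The difference between the two modes matters: > empties the file before writing, while >> keeps what is already there. The sketch below shows both, using a temporary file in place of output.txt and a small helper called read_file, which is a made-up name for this example.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# hypothetical scratch file standing in for output.txt
my ( $tmp_fh, $file ) = tempfile( UNLINK => 1 );
close($tmp_fh);

# helper: return the whole contents of a file as one string
sub read_file {
    my ($name) = @_;
    open( my $in, '<', $name ) or die "cannot read $name: $!\n";
    my $contents = join( '', <$in> );
    close($in);
    return $contents;
}

open( my $out, '>', $file ) or die "cannot write to $file: $!\n";
print $out "Hello World!\n";
close($out);

open( $out, '>>', $file ) or die "cannot append to $file: $!\n";
print $out "Goodbye world!\n";
close($out);
my $after_append = read_file($file);

# opening with '>' again throws away the two lines written so far
open( $out, '>', $file ) or die "cannot write to $file: $!\n";
print $out "Fresh start\n";
close($out);
my $after_overwrite = read_file($file);

print "after appending:\n$after_append";
print "after reopening with '>':\n$after_overwrite";
```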


Input record separator

So far we have been reading in files (or user input) one line at a time.

the construct
program 21
while ( $line = <F> ) {
    # do something here
}

Loops through the filehandle F, each time reading in all the characters up to and including the newline character \n.

What if we want to split up a file using a different separator?

This is where some of Perl's more unusual features become extremely useful and powerful. The special variable $/ holds the input record separator, which is a newline by default; assigning a different string to it changes what counts as one record. The next program reads in a series of GenBank records, with each record terminated by a double forward slash (//). It asks the user for a record number and prints that record.

See the appendix for a test dataset


program 22
#!/usr/local/bin/perl
# genbank_fetch.pl

print "What record number do you want to fetch?\n";
my $record = <STDIN>;    # read this before changing $/, so input still ends at the newline
chomp $record;

$/ = "//";    # genbank record separator

$current_record = 1;
while ( <> ) {
    chomp;    # removes the trailing // rather than a newline
    print if $record == $current_record;
    $current_record++;
}
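
Because $/ is a global variable, changing it affects every read that follows, including reads from STDIN. One way to limit the effect is to change it with local inside a block, so that the old value comes back when the block ends. The sketch below shows this on a string opened as an in-memory file. The record text is made up, and the separator used here is "//\n" rather than "//" so that the newline after each // is consumed as well.

```perl
use strict;
use warnings;

# made-up data: three tiny records, each terminated by // on its own line
my $data = "record one\n//\nrecord two\n//\nrecord three\n//\n";

# a reference to a string can be opened like a file
open( my $fh, '<', \$data ) or die "can't open in-memory file: $!\n";

my @records;
{
    local $/ = "//\n";    # the previous value is restored at the end of this block
    while ( my $record = <$fh> ) {
        chomp $record;    # chomp removes whatever $/ currently is
        push @records, $record;
    }
}
close($fh);

print scalar(@records), " records read\n";
print "second record: $records[1]";
```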


We can modify the above program, using our knowledge of regular expressions and pattern matching:


program 23
#!/usr/local/bin/perl
# genbank_filter.pl - filters a file of genbank entries

$/ = "//";    # genbank record separator

$filter = shift @ARGV;    # get filter pattern from user

while ( <> ) {
    print if /$filter/s;    # only print record if it matches pattern
}

Try running this program on a file of genbank entries, filtering by organism:

% cat test.genbank | genbank_filter.pl "Homo sapiens"

or by gene name:
% cat test.genbank | genbank_filter.pl "beta-globin"

or multiple filters using your knowledge of unix pipes:

% cat test.genbank | genbank_filter.pl "Homo sapiens" | genbank_filter.pl "beta-globin"
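
Chaining works because each run prints the matching records with their // terminators intact, so the output is itself a valid input for the next run. An alternative is to test every pattern inside one program. The following is only a sketch of that idea: the records are shortened, made-up stand-ins for real GenBank entries, and matches_all is a hypothetical helper name.

```perl
use strict;
use warnings;

# three shortened, made-up records, each ending in the // separator
my @records = (
    "ORGANISM  Homo sapiens\nKEYWORDS  beta-globin.\n//",
    "ORGANISM  Homo sapiens\nKEYWORDS  neurotrophic factor.\n//",
    "ORGANISM  Dictyostelium discoideum\nKEYWORDS  .\n//",
);
my @filters = ( 'Homo sapiens', 'beta-globin' );

# true only if the record matches every one of the patterns
sub matches_all {
    my ( $record, @patterns ) = @_;
    foreach my $pattern (@patterns) {
        return 0 unless $record =~ /$pattern/s;
    }
    return 1;
}

my @passed = grep { matches_all( $_, @filters ) } @records;

print scalar(@passed), " record(s) matched every filter\n";
```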


Exercises

Write a program to loop through a file of sequences, and give the count of the number of sequences containing EcoRI sites.

Assume a sequence file of one line per sequence.

Extend the program to give a count for each of the restriction enzymes in this hash:


program 24
%re_lookup = (
          'Eco47III'=> 'AGCGCT',
          'EcoRI'   => 'GAATTC',
          'HindIII' => 'AAGCTT',
);

(Notice that we exclude all enzymes whose recognition sites contain ambiguous bases: N, purines or pyrimidines.)

You should write the program in such a way that it is easy to extend with other restriction enzymes (again focusing only on ones with A/C/G/T recognition) just by changing the hash.


If you wrote the medline lookup program in the last set of exercises, extend it to read its data from a file.

Otherwise, extend the restriction enzyme count program above to read the restriction enzyme sequences from a file.


Appendix - data

Here is an example file of 5 GenBank entries for use in some of the programs above.

LOCUS       DDU63596      310 bp    DNA             INV       14-MAY-1999
DEFINITION  Dictyostelium discoideum Tdd-4 transposable element flanking
            sequence, clone p427/428 right end.
ACCESSION   U63596
NID         g2393749
KEYWORDS    .
SOURCE      Dictyostelium discoideum.
  ORGANISM  Dictyostelium discoideum
            Eukaryota; Dictyosteliida; Dictyostelium.
REFERENCE   1  (bases 1 to 310)
  AUTHORS   Wells,D.J.
  TITLE     Tdd-4, a DNA transposon of Dictyostelium that encodes proteins
            similar to LTR retroelement integrases
  JOURNAL   Nucleic Acids Res. 27 (11), 2408-2415 (1999)
REFERENCE   2  (bases 1 to 310)
  AUTHORS   Wells,D.J. and Welker,D.L.
  TITLE     Dictyostelium discoideum Tdd-4 transposable element, right end
            flanking sequence from clone p427/428
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 310)
  AUTHORS   Wells,D.J. and Welker,D.L.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-JUL-1996) Biology, Utah State Univ., Logan, UT
            84322-5305, USA
FEATURES             Location/Qualifiers
     source          1..310
                     /organism="Dictyostelium discoideum"
                     /strain="AX4"
                     /db_xref="taxon:44689"
                     clone="p427428"
     misc_feature    5.12
                     /note="Fuzzy location"
     misc_feature    J00194:(100..202),1..245,300..422
                     /note="Location partly in another entry"
BASE COUNT      118 a     46 c     67 g     79 t
ORIGIN      
        1 gtgacagttg gctgtcagac atacaatgat tgtttagaag aggagaagat tgatccggag
       61 taccgtgata gtattttaaa aactatgaaa gcgggaatac ttaatggtaa actagttaga
      121 ttatgtgacg tgccaagggg tgtagatgta gaaattgaaa caactggtct aaccgattca
      181 gaaggagaaa gtgaatcaaa agaagaagag tgatgatgaa tagccaccat tactgcatac
      241 tgtagccctt acccttgtcg caccattagc cattaataaa aataaaaaat tatataaaaa
      301 ttacacccat 
//
LOCUS       DDU63595       83 bp    DNA             INV       14-MAY-1999
DEFINITION  Dictyostelium discoideum Tdd-4 transposable element flanking
            sequence, clone p427/428 left end.
ACCESSION   U63595
NID         g2393748
KEYWORDS    .
SOURCE      Dictyostelium discoideum.
  ORGANISM  Dictyostelium discoideum
            Eukaryota; Dictyosteliida; Dictyostelium.
REFERENCE   1  (bases 1 to 83)
  AUTHORS   Wells,D.J.
  TITLE     Tdd-4, a DNA transposon of Dictyostelium that encodes proteins
            similar to LTR retroelement integrases
  JOURNAL   Nucleic Acids Res. 27 (11), 2408-2415 (1999)
REFERENCE   2  (bases 1 to 83)
  AUTHORS   Wells,D.J. and Welker,D.L.
  TITLE     Dictyostelium discoideum Tdd-4 transposable element, left end
            flanking sequence from clone p427/428
  JOURNAL   Unpublished
REFERENCE   3  (bases 1 to 83)
  AUTHORS   Wells,D.J. and Welker,D.L.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-JUL-1996) Biology, Utah State Univ., Logan, UT
            84322-5305, USA
FEATURES             Location/Qualifiers
     source          1..83
                     /organism="Dictyostelium discoideum"
                     /strain="AX4"
                     /db_xref="taxon:44689"
                     clone="p427428"
BASE COUNT       31 a     16 c     12 g     24 t
ORIGIN      
        1 ttcgaaggat atctcaaggc agttaataat tactatgatg attgtaaaat attccaaagt
       61 ttcccagacc caccaataat gac
//
LOCUS       HUMBDNF       918 bp    DNA             PRI       31-OCT-1994
DEFINITION  Human brain-derived neurotrophic factor (BDNF) gene, complete cds.
ACCESSION   M37762
VERSION     M37762.1  GI:179402
KEYWORDS    neurotrophic factor.
SOURCE      Human DNA.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
            Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 918)
  AUTHORS   Jones,K.R. and Reichardt,L.F.
  TITLE     Molecular cloning of a human gene that is a member of the nerve
            growth factor family
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 87 (20), 8060-8064 (1990)
  MEDLINE   91045937
COMMENT     Draft entry and computer-readable sequence for [Proc. Natl. Acad.
            Sci. U.S.A. (1990) In press] kindly submitted
            by K.R.Jones, 13-AUG-1990.
FEATURES             Location/Qualifiers
     source          1..918
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /dev_stage="adult"
     sig_peptide     76..123
                     /gene="NTF3"
                     /note="G00-125-917; putative"
                     /product="brain-derived neurotrophic factor"
     CDS             76..819
                     /gene="BDNF"
                     /note="putative"
                     /codon_start=1
                     /db_xref="GDB:G00-125-916"
                     /product="brain-derived neurotrophic factor"
                     /protein_id="AAA51820.1"
                     /db_xref="GI:179403"
                     /translation="MTILFLTMVISYFGCMKAAPMKEANIRGQGGLAYPGVRTHGTLE
                     SVNGPKAGSRGLTSLADTFEHVIEELLDEDQKVRPNEENNKDADLYTSRVMLSSQVPL
                     EPPLLFLLEEYKNYLDAANMSMRVRRHSDPARRGELSVCDSISEWVTAADKKTAVDMS
                     GGTVTVLEKVPVSKGQLKQYFYETKCNPMGYTKEGCRGIDKRHWNSQCRTTQSYVRAL
                     TMDSKKRIGWRFIRIDTSCVCTLTIKRGR"
     gene            76..816
                     /gene="NTF3"
                     /map="12p13"
     gene            76..819
                     /gene="BDNF"
                     /map="11p13"
     mat_peptide     124..816
                     /gene="NTF3"
                     /note="G00-125-917; putative"
                     /product="brain-derived neurotrophic factor"
BASE COUNT      269 a    192 c    237 g    220 t
ORIGIN
        1 ggtgaaagaa agccctaacc agttttctgt cttgtttctg ctttctccct acagttccac
       61 caggtgagaa gagtgatgac catccttttc cttactatgg ttatttcata ctttggttgc
      121 atgaaggctg cccccatgaa agaagcaaac atccgaggac aaggtggctt ggcctaccca
      181 ggtgtgcgga cccatgggac tctggagagc gtgaatgggc ccaaggcagg ttcaagaggc
      241 ttgacatcat tggctgacac tttcgaacac gtgatagaag agctgttgga tgaggaccag
      301 aaagttcggc ccaatgaaga aaacaataag gacgcagact tgtacacgtc cagggtgatg
      361 ctcagtagtc aagtgccttt ggagcctcct cttctctttc tgctggagga atacaaaaat
      421 tacctagatg ctgcaaacat gtccatgagg gtccggcgcc actctgaccc tgcccgccga
      481 ggggagctga gcgtgtgtga cagtattagt gagtgggtaa cggcggcaga caaaaagact
      541 gcagtggaca tgtcgggcgg gacggtcaca gtccttgaaa aggtccctgt atcaaaaggc
      601 caactgaagc aatacttcta cgagaccaag tgcaatccca tgggttacac aaaagaaggc
      661 tgcaggggca tagacaaaag gcattggaac tcccagtgcc gaactaccca gtcgtacgtg
      721 cgggccctta ccatggatag caaaaagaga attggctggc gattcataag gatagacact
      781 tcttgtgtat gtacattgac cattaaaagg ggaagatagt ggatttatgt tgtatagatt
      841 agattatatt gagacaaaaa ttatctattt gtatatatac ataacagggt aaattattca
      901 gttaagaaaa aaataatt
//
LOCUS       NT_010368  161485 bp    DNA             CON       16-NOV-2000
DEFINITION  Homo sapiens chromosome 15 working draft sequence segment, complete
            sequence.
ACCESSION   NT_010368
VERSION     NT_010368.1  GI:11433101
KEYWORDS    HTG.
SOURCE      human.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 161485)
  AUTHORS   International Human Genome Project collaborators.
  TITLE     Toward the complete sequence of the human genome
  JOURNAL   Unpublished
COMMENT     GENOME ANNOTATION REFSEQ:  NCBI contigs are derived from assembled
            genomic sequence data. They may include both draft and finished
            sequence.
            COMPLETENESS: not full length.
FEATURES             Location/Qualifiers
     source          1..310
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /chromosome="15"
     source          order(1..100,251..300,300..310)
                     /note="Doctored from Accession AC011224 
	             sequenced by Whitehead Institute
                     for Biomedical Research"
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /clone="RP11-10K20"
     variation       244
                     /replace="T"
                     /replace="A"
                     /db_xref="dbSNP:140670"
ORIGIN      
        1 gtgacagttg gctgtcagac atacaatgat tgtttagaag aggagaagat tgatccggag
       61 taccgtgata gtattttaaa aactatgaaa gcgggaatac ttaatggtaa actagttaga
      121 ttatgtgacg tgccaagggg tgtagatgta gaaattgaaa caactggtct aaccgattca
      181 gaaggagaaa gtgaatcaaa agaagaagag tgatgatgaa tagccaccat tactgcatac
      241 tgtagccctt acccttgtcg caccattagc cattaataaa aataaaaaat tatataaaaa
      301 ttacacccat 
//
LOCUS       HUMBETGLOA   3002 bp    DNA             PRI       26-AUG-1994
DEFINITION  Human haplotype C4 beta-globin gene, complete cds.
ACCESSION   L26462
VERSION     L26462.1  GI:432453
KEYWORDS    beta-globin.
SOURCE      Homo sapiens DNA.
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
            Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 3002)
  AUTHORS   Fullerton,S.M., Harding,R.M., Boyce,A.J. and Clegg,J.B.
  TITLE     Molecular and population genetic analysis of allelic sequence
            diversity at the human beta-globin locus
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 91 (5), 1805-1809 (1994)
  MEDLINE   94173918
FEATURES             Location/Qualifiers
     source          1..3002
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
                     /haplotype="C4"
                     /note="sequence found in a Melanesian population"
     variation       replace(111,"t")
     variation       replace(263,"t")
                     /note="Rsa I polymorphism"
     variation       replace(273,"c")
     variation       replace(286..287,"")
                     /note="2 bp insertion of AT"
     variation       replace(288,"t")
     variation       replace(295..296,"")
                     /note="1 bp deletion of C or 2 bp deletion of CT"
     variation       replace(347,"c")
     variation       replace(476,"t")
     variation       replace(500,"c")
     CDS             join(866..957,1088..1310,2161..2289)
                     /codon_start=1
                     /product="beta-globin"
                     /protein_id="AAA21100.1"
                     /db_xref="GI:532506"
                     /translation="MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFE
                     SFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPE
                     NFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
     exon            <866..957
                     /number=1
     variation       replace(874,"c")
     intron          958..1087
                     /number=1
     exon            1088..1310
                     /number=2
     intron          1311..2160
                     /number=2
     variation       replace(1326,"g")
                     /note="Ava II polymorphism"
     variation       replace(1384,"g")
     variation       replace(1391,"t")
     variation       replace(1976,"t")
     exon            2161..>2289
                     /number=3
     variation       replace(2522,"c")
     variation       replace(2602,"a")
     variation       replace(2604,"c")
     variation       replace(2760,"t")
                     /note="Hinf I polymorphism"
     variation       replace(2913,"g")
BASE COUNT      810 a    601 c    599 g    992 t
ORIGIN
        1 acctcctatt tgacaccact gattacccca ttgatagtca cactttgggt tgtaagtgac
       61 tttttattta tttgtatttt tgactgcatt aagaggtctc tagtttttta cctcttgttt
      121 cccaaaacct aataagtaac taatgcacag agcacattga tttgtattta ttctattttt
      181 agacataatt tattagcatg catgagcaaa ttaagaaaaa caacaacaaa tgaatgcata
      241 tatatgtata tgtatgtgtg tacatataca catatatata tatatatatt ttttcttttc
      301 ttaccagaag gttttaatcc aaataaggag aagatatgct tagaactgag gtagagtttt
      361 catccattct gtcctgtaag tattttgcat attctggaga cgcaggaaga gatccatcta
      421 catatcccaa agctgaatta tggtagacaa aactcttcca cttttagtgc atcaacttct
      481 tatttgtgta ataagaaaat tgggaaaacg atcttcaata tgcttaccaa gctgtgattc
      541 caaatattac gtaaatacac ttgcaaagga ggatgttttt agtagcaatt tgtactgatg
      601 gtatggggcc aagagatata tcttagaggg agggctgagg gtttgaagtc caactcctaa
      661 gccagtgcca gaagagccaa ggacaggtac ggctgtcatc acttagacct caccctgtgg
      721 agccacaccc tagggttggc caatctactc ccaggagcag ggagggcagg agccagggct
      781 gggcataaaa gtcagggcag agccatctat tgcttacatt tgcttctgac acaactgtgt
      841 tcactagcaa cctcaaacag acaccatggt gcatctgact cctgaggaga agtctgccgt
      901 tactgccctg tggggcaagg tgaacgtgga tgaagttggt ggtgaggccc tgggcaggtt
      961 ggtatcaagg ttacaagaca ggtttaagga gaccaataga aactgggcat gtggagacag
     1021 agaagactct tgggtttctg ataggcactg actctctctg cctattggtc tattttccca
     1081 cccttaggct gctggtggtc tacccttgga cccagaggtt ctttgagtcc tttggggatc
     1141 tgtccactcc tgatgctgtt atgggcaacc ctaaggtgaa ggctcatggc aagaaagtgc
     1201 tcggtgcctt tagtgatggc ctggctcacc tggacaacct caagggcacc tttgccacac
     1261 tgagtgagct gcactgtgac aagctgcacg tggatcctga gaacttcagg gtgagtctat
     1321 gggacccttg atgttttctt tccccttctt ttctatggtt aagttcatgt cataggaagg
     1381 ggataagtaa cagggtacag tttagaatgg gaaacagacg aatgattgca tcagtgtgga
     1441 agtctcagga tcgttttagt ttcttttatt tgctgttcat aacaattgtt ttcttttgtt
     1501 taattcttgc tttctttttt tttcttctcc gcaattttta ctattatact taatgcctta
     1561 acattgtgta taacaaaagg aaatatctct gagatacatt aagtaactta aaaaaaaact
     1621 ttacacagtc tgcctagtac attactattt ggaatatatg tgtgcttatt tgcatattca
     1681 taatctccct actttatttt cttttatttt taattgatac ataatcatta tacatattta
     1741 tgggttaaag tgtaatgttt taatatgtgt acacatattg accaaatcag ggtaattttg
     1801 catttgtaat tttaaaaaat gctttcttct tttaatatac ttttttgttt atcttatttc
     1861 taatactttc cctaatctct ttctttcagg gcaataatga tacaatgtat catgcctctt
     1921 tgcaccattc taaagaataa cagtgataat ttctgggtta aggcaatagc aatatctctg
     1981 catataaata tttctgcata taaattgtaa ctgatgtaag aggtttcata ttgctaatag
     2041 cagctacaat ccagctacca ttctgctttt attttatggt tgggataagg ctggattatt
     2101 ctgagtccaa gctaggccct tttgctaatc atgttcatac ctcttatctt cctcccacag
     2161 ctcctgggca acgtgctggt ctgtgtgctg gcccatcact ttggcaaaga attcacccca
     2221 ccagtgcagg ctgcctatca gaaagtggtg gctggtgtgg ctaatgccct ggcccacaag
     2281 tatcactaag ctcgctttct tgctgtccaa tttctattaa aggttccttt gttccctaag
     2341 tccaactact aaactggggg atattatgaa gggccttgag catctggatt ctgcctaata
     2401 aaaaacattt attttcattg caatgatgta tttaaattat ttctgaatat tttactaaaa
     2461 agggaatgtg ggaggtcagt gcatttaaaa cataaagaaa tgaagagcta gttcaaacct
     2521 tgggaaaata cactatatct taaactccat gaaagaaggt gaggctgcaa acagctaatg
     2581 cacattggca acagccctga tgcatatgcc ttattcatcc ctcagaaaag gattcaagta
     2641 gaggcttgat ttggaggtta aagttttgct atgctgtatt ttacattact tattgtttta
     2701 gctgtcctca tgaatgtctt ttcactaccc atttgcttat cctgcatctc tcagccttga
     2761 ctccactcag ttctcttgct tagagatacc acctttcccc tgaagtgttc cttccatgtt
     2821 ttacggcgag atggtttctc ctcgcctggc cactcagcct tagttgtctc tgttgtctta
     2881 tagaggtcta cttgaagaag gaaaaacagg ggtcatggtt tgactgtcct gtgagccctt
     2941 cttccctgcc tcccccactc acagtgaccc ggaatctgca gtgctagtct cccggaacta
     3001 tc
//


Chris Mungall cjm@fruitfly.org
Berkeley Drosophila Genome Project