Reading: Deitel 3.7->, Deitel 4.6, Deitel 5.1, 5.2, 5.3, Deitel 10.3 -> 10.7
#!/usr/local/bin/perl
# sum.pl - adds a list of numbers
% perl sum.pl
Enter a list of numbers, separated by spaces: 4 3 77 21
Current Running total is 4
Current Running total is 7
Current Running total is 84
Current Running total is 105
The sum of your numbers is: 105
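Only the first lines of sum.pl appear here. The following is a minimal sketch, not the original program, of the running-total logic that produces the session above. The helper name running_totals is hypothetical, and a fixed list stands in for the numbers the real program reads from the keyboard.

```perl
#!/usr/local/bin/perl
# sum_sketch.pl - hypothetical sketch of the running-total logic in sum.pl
use strict;
use warnings;

# running_totals (hypothetical helper): returns the running total
# after each number in the list it is given
sub running_totals {
    my @numbers = @_;
    my $total   = 0;
    my @totals;
    foreach my $n (@numbers) {
        $total += $n;
        push @totals, $total;
    }
    return @totals;
}

# sum.pl reads its numbers from the keyboard, roughly like this:
#   print "Enter a list of numbers, separated by spaces: ";
#   my @numbers = split ' ', <STDIN>;
# a fixed list stands in for that input here
my @numbers = ( 4, 3, 77, 21 );
my @totals  = running_totals(@numbers);

foreach my $total (@totals) {
    print "Current Running total is $total\n";
}
print "The sum of your numbers is: $totals[-1]\n";
```

Keeping the arithmetic in a subroutine separates it from the input handling, so the same code works whether the numbers come from STDIN, a file or a fixed list.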
# print_bases.pl
$i = 1;
foreach $base ( 'A', 'C', 'G', 'T' ) {
print "base $i = $base\n";
$i++;
}
this outputs:
base 1 = A
base 2 = C
base 3 = G
base 4 = T
the structure of this command is
foreach loop variable ( list ) { statements }
Here is an example from the Deitel book:
program 3
#!/usr/local/bin/perl
# square_array.pl
#
# Using foreach to iterate over an array.
this outputs:
1 4 9 16 25 36 49 64 81 100
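Only the header of square_array.pl survives. A minimal sketch that produces the output above, assuming the array holds the numbers 1 to 10 (this is a reconstruction, not Deitel's listing): the foreach loop variable is an alias for each element, so squaring it changes the array itself.

```perl
#!/usr/local/bin/perl
# square_array_sketch.pl - hypothetical reconstruction, not Deitel's listing
use strict;
use warnings;

my @array = ( 1 .. 10 );

# $number is an alias for each element in turn, so assigning
# to it modifies @array in place
foreach my $number ( @array ) {
    $number **= 2;
}

print "@array\n";    # 1 4 9 16 25 36 49 64 81 100
```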
# seq_lengths.pl
@seqs = qw( AATGG CAAT GCGTTAAC ATATATATATATA );
foreach ( @seqs ) {
$seq_length = length($_);
print "sequence = $_ length = $seq_length\n";
}
this outputs:
program 5
sequence = AATGG length = 5
sequence = CAAT length = 4
sequence = GCGTTAAC length = 8
sequence = ATATATATATATA length = 13
If we don't give foreach a loop variable, the special scalar variable $_ is used instead.
Whether you make use of these special variables is up to you and your own style of programming. Some people find this style leads to more succinct, condensed programs. Others (myself included) like this style because it appeals to their laziness - less typing. On the other hand, some people detest this style of programming and say it leads to ugly code that is difficult to read.
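To compare the two styles, here is the seq_lengths.pl loop written both with a named loop variable and with the default $_; both collect the same lengths. The array names @named and @implicit are just for illustration.

```perl
#!/usr/local/bin/perl
# styles.pl - the same loop written with and without a loop variable
use strict;
use warnings;

my @seqs = qw( AATGG CAAT GCGTTAAC ATATATATATATA );

# explicit loop variable: the name says what each element is
my @named;
foreach my $seq ( @seqs ) {
    push @named, length($seq);
}

# no loop variable: foreach puts each element in $_
my @implicit;
foreach ( @seqs ) {
    push @implicit, length($_);
}

print "@named\n";       # 5 4 8 13
print "@implicit\n";    # 5 4 8 13
```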
# for_loop.pl
for ( $i = 0 ; $i < 10; $i++ ) {
print "the value of \$i is $i\n";
}
the value of $i is 0
the value of $i is 1
the value of $i is 2
the value of $i is 3
the value of $i is 4
the value of $i is 5
the value of $i is 6
the value of $i is 7
the value of $i is 8
the value of $i is 9
program 7
# forloop_decrement.pl
for ( $i = 10 ; $i > -10; $i -= 2 ) {
print "the value of \$i is $i\n";
}
the value of $i is 10
the value of $i is 8
the value of $i is 6
the value of $i is 4
the value of $i is 2
the value of $i is 0
the value of $i is -2
the value of $i is -4
the value of $i is -6
the value of $i is -8
Make sure you get the test condition right: if you write less than (<), are you sure you don't mean less than or equal to (<=)?
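A quick way to see the difference is to count how many times each version of the loop body runs. This is an illustrative sketch; the counter names are arbitrary.

```perl
#!/usr/local/bin/perl
# off_by_one.pl - counts the passes made with < and with <=
use strict;
use warnings;

my ( $lt_count, $le_count ) = ( 0, 0 );

# $i < 10 runs for $i = 0 .. 9
for ( my $i = 0; $i < 10; $i++ ) {
    $lt_count++;
}

# $i <= 10 runs for $i = 0 .. 10, one extra pass
for ( my $i = 0; $i <= 10; $i++ ) {
    $le_count++;
}

print "with <  : $lt_count passes\n";    # 10
print "with <= : $le_count passes\n";    # 11
```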
# map_example.pl - squares each number in a list using map
@numbers = (1, 2, 3, 4, 5);
@numbers_squared = map { $_ * $_ } @numbers;
print "number = @numbers \n"; # 1 2 3 4 5
print "squared = @numbers_squared \n"; # 1 4 9 16 25
# grep_example.pl - filters values from a list
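Only the header of grep_example.pl survives, so the following is a hypothetical illustration rather than the original program. grep uses the same block syntax as map, but instead of transforming each element it keeps only the elements for which the block is true. The sequences and tests are made up for illustration.

```perl
#!/usr/local/bin/perl
# grep_sketch.pl - hypothetical example of filtering with grep
use strict;
use warnings;

my @seqs = qw( AATGG CAAT GCGTTAAC ATATATATATATA );

# keep only the sequences containing "GC" (/GC/ matches against $_)
my @has_gc = grep { /GC/ } @seqs;

# keep only the sequences longer than 5 bases
my @long = grep { length($_) > 5 } @seqs;

print "has GC = @has_gc\n";    # GCGTTAAC
print "long   = @long\n";      # GCGTTAAC ATATATATATATA
```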
# nested_loop.pl
Always indent your code so that you know which statement block each line of code belongs to.
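The body of nested_loop.pl is missing. As a hypothetical illustration, not the original program, here are two nested foreach loops that build every two-base combination; the indentation shows at a glance which statements belong to the inner loop.

```perl
#!/usr/local/bin/perl
# nested_sketch.pl - hypothetical nested-loop example
use strict;
use warnings;

my @bases = ( 'A', 'C', 'G', 'T' );
my @dinucleotides;

# the inner loop runs completely for every pass of the outer loop
foreach my $first ( @bases ) {
    foreach my $second ( @bases ) {
        push @dinucleotides, $first . $second;
    }
}

print "@dinucleotides\n";    # AA AC AG AT CA ... TT
```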
Escaping from loops: next, last
Reading: Deitel 5.6
All the following examples are from the Deitel book:
next
program 11
# using_next.pl
foreach ( 1 .. 10 ) {
if ( $_ == 5 ) {
$skipped = $_; # store skipped value
next; # skip remaining code in loop only if $_ is 5
}
print "$_ ";
}
print "\n\nUsed 'next' to skip the value $skipped.\n";
# using_last.pl
foreach ( 1 .. 10 ) {
if ( $_ == 5 ) {
$number = $_; # store current value before loop ends
last; # jump to end of foreach structure
}
print "$_ ";
}
print "\n\nUsed 'last' to terminate loop at $number.\n";
# using_redo.pl
$number = 1;
while ( $number <= 5 ) {
if ( $number <= 10 ) {
print "$number ";
++$number;
redo; # Continue loop without testing ( $number <= 5 )
}
}
print "\nStopped when \$number became $number.\n";
1 2 3 4 5 6 7 8 9 10
Stopped when $number became 11.
Block Labels (optional)
It is possible to label statement blocks to have greater control over
program flow. While this is unnecessary, some people prefer this style.
Described in Deitel 5.9
Filehandles
We have already encountered the standard input filehandle STDIN.
Recall:
program 14
#!/usr/local/bin/perl
# echo.pl
this program reads a line of input from the user's terminal, and writes back the line.
STDIN is the name of a special filehandle for reading standard input, which is usually the user's terminal but can also be a file or pipe redirected into the program.
The next program reads in multiple lines and prints them all
program 15
#!/usr/local/bin/perl
# echo2.pl
$_ as the default input variable
program 16
#!/usr/local/bin/perl
# echo3.pl
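The echo programs are not reproduced here. A sketch of the idea behind using $_ as the default input variable, assuming echo3.pl reads lines in a while loop without naming a variable: `while (<FH>)` stores each line in $_, and print with no arguments prints $_. So that the example runs without typing, a filehandle opened on a string (supported since perl 5.8) stands in for STDIN; @echoed is only there to make the result checkable.

```perl
#!/usr/local/bin/perl
# echo_sketch.pl - hypothetical illustration of $_ as the input variable
use strict;
use warnings;

my $input = "first line\nsecond line\n";
my @echoed;

# open a filehandle on a string; in echo3.pl this would be STDIN
open( my $fh, '<', \$input ) or die "cannot open string: $!\n";

# while (<$fh>) is shorthand for while (defined($_ = <$fh>))
while ( <$fh> ) {
    print;               # print with no arguments prints $_
    push @echoed, $_;
}
close($fh);
```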
Here is another example:
program 17
#!/usr/local/bin/perl
# wc.pl - imitates unix wc (word count) program
The empty angle brackets <> indicate that input should come from the files named on the command line or, if none are given, from standard input.
you can use the program this way (with a pipe):
% cat myfile | wc.pl
or this way:
% wc.pl myfile
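The body of wc.pl is not shown. A minimal sketch of how such a program might count lines, words and characters with <>, not the original listing: to stay self-contained it writes a small temporary file with the core File::Temp module and puts that file's name into @ARGV, which has the same effect as typing `wc.pl myfile`.

```perl
#!/usr/local/bin/perl
# wc_sketch.pl - hypothetical line/word/character counter using <>
use strict;
use warnings;
use File::Temp qw(tempfile);

# make a small test file so the sketch runs on its own
my ( $tmp, $filename ) = tempfile( UNLINK => 1 );
print $tmp "frizzle zen\nabacus quibble jellybean\n";
close($tmp);

# <> reads the files named in @ARGV, or STDIN if @ARGV is empty;
# here @ARGV is filled in by hand instead of from the command line
@ARGV = ($filename);

my ( $lines, $words, $chars ) = ( 0, 0, 0 );
while ( my $line = <> ) {
    $lines++;
    $chars += length($line);           # includes the newline
    my @fields = split ' ', $line;     # ' ' splits on runs of whitespace
    $words += @fields;                 # array in numeric context = count
}

print "$lines $words $chars\n";    # 2 5 37
```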
#!/usr/local/bin/perl
# sort.pl - sorts the lines in a file alphabetically
Notice the construct in the first line of code; this is a common construct in perl programs. If the open function returns FALSE, indicating the file could not be opened, then the program dies/exits with a meaningful message.
Let's create a file then sort it using our program:
% cat > myfile.txt
frizzle
zen
abacus
quibble
jellybean
% sort.pl
abacus
frizzle
jellybean
quibble
zen
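The body of sort.pl is not shown. A minimal sketch, under the assumption that it reads a whole file into an array, sorts it and prints it: to keep the example self-contained it first writes the words above to a temporary file (File::Temp is a core module), where the real program would open its own input file. Note the open ... or die construct.

```perl
#!/usr/local/bin/perl
# sort_sketch.pl - hypothetical sketch of sorting the lines of a file
use strict;
use warnings;
use File::Temp qw(tempfile);

# create the test data in a temporary file
my ( $tmp, $file ) = tempfile( UNLINK => 1 );
print $tmp "frizzle\nzen\nabacus\nquibble\njellybean\n";
close($tmp);

# stop with a message (including the system error in $!)
# if the file cannot be opened
open( F, '<', $file ) or die "cannot open $file : $!\n";
my @lines = <F>;    # list context reads every line at once
close(F);

my @sorted = sort @lines;    # default sort is string (alphabetical) order
print @sorted;
```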
# filewrite.pl
open(F, ">output.txt") or die( "cannot write to output.txt : $!\n");
print F "Hello World!\n";
close(F);
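# ">>" opens output.txt for appending instead of overwriting it
open(F, ">>output.txt") or die( "cannot append to output.txt : $!\n");
print F "Goodbye world!\n";
close(F);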
% cat output.txt
Hello World!
Goodbye world!
while ( $line = <F> ) {
# do something here
}
This loops through the filehandle F, each time reading into $line all the characters up to and including the next newline character \n.
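A short sketch of that loop in action, writing the same two lines as filewrite.pl to a temporary file (rather than output.txt) and reading them back. It shows that each $line still ends with "\n"; chomp removes it. The temporary file and the @lines array are choices made for this illustration only.

```perl
#!/usr/local/bin/perl
# readback_sketch.pl - hypothetical example of reading a file line by line
use strict;
use warnings;
use File::Temp qw(tempfile);

# write two lines, as filewrite.pl does, but to a temporary file
my ( $out, $file ) = tempfile( UNLINK => 1 );
print $out "Hello World!\n";
print $out "Goodbye world!\n";
close($out);

open( F, '<', $file ) or die "cannot read $file : $!\n";
my @lines;
while ( my $line = <F> ) {
    # $line still ends with "\n"; chomp removes it
    chomp($line);
    push @lines, $line;
}
close(F);

print scalar(@lines), " lines: @lines\n";
```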
What if we want to split up a file using a different separator?
This is where some of perl's more unusual features become extremely useful and powerful. The next program reads in a series of genbank records, with each record terminated by a double forward slash (//). It asks the user for a record number and prints that record.
See the appendix for a test dataset.
program 22
#!/usr/local/bin/perl
# genbank_fetch.pl
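The listing for genbank_fetch.pl did not survive. A minimal sketch of the technique, assuming the program changes $/, perl's input record separator, so that each read returns a whole record ending in "//\n" instead of a single line. fetch_record is a hypothetical name; the record number is passed as an argument instead of being asked for, and two tiny stand-in records replace the appendix data.

```perl
#!/usr/local/bin/perl
# fetch_sketch.pl - hypothetical record fetcher using $/
use strict;
use warnings;

# two tiny stand-in records; real GenBank records also end with //
my $data = "LOCUS   ONE\nORIGIN\n//\nLOCUS   TWO\nORIGIN\n//\n";

# return record number $wanted (counting from 1) read from $fh,
# or undef if there are fewer records than that
sub fetch_record {
    my ( $fh, $wanted ) = @_;
    local $/ = "//\n";    # a "line" now ends at //\n, not at \n
    my $count = 0;
    while ( my $record = <$fh> ) {
        $count++;
        return $record if $count == $wanted;
    }
    return;    # undef in scalar context
}

open( my $fh, '<', \$data ) or die "cannot open string: $!\n";
my $record = fetch_record( $fh, 2 );
close($fh);
print $record if defined $record;
```

Because $/ is set with local, the normal line-by-line behaviour comes back as soon as the subroutine returns.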
We can modify the above program, using our knowledge of regular expressions and pattern matching:
program 23
#!/usr/local/bin/perl
# genbank_filter.pl - filters a file of genbank entries
Try running this program on a file of genbank entries, filtering by organism:
% cat test.genbank | genbank_filter.pl "Homo sapiens"
or by gene name:
% cat test.genbank | genbank_filter.pl "beta-globin"
or multiple filters using your knowledge of unix pipes:
% cat test.genbank | genbank_filter.pl "Homo sapiens" | genbank_filter.pl "beta-globin"
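The listing for genbank_filter.pl is missing. A minimal sketch, assuming it reads whole records by setting $/ (the input record separator) to "//\n" and keeps each record that contains the text given as its first command-line argument. filter_records is a hypothetical helper, and a string-backed filehandle with two made-up records stands in for standard input.

```perl
#!/usr/local/bin/perl
# filter_sketch.pl - hypothetical GenBank record filter
use strict;
use warnings;

# return the records read from $fh that contain $pattern
sub filter_records {
    my ( $fh, $pattern ) = @_;
    local $/ = "//\n";    # read one whole record at a time
    my @kept;
    while ( my $record = <$fh> ) {
        # \Q...\E treats the pattern as literal text, so characters
        # such as . or ( in the user's filter are not regex syntax
        push @kept, $record if $record =~ /\Q$pattern\E/;
    }
    return @kept;
}

# as a command-line filter this could be: print filter_records(\*STDIN, $ARGV[0]);
my $data = "LOCUS A\n  ORGANISM  Homo sapiens\n//\n"
         . "LOCUS B\n  ORGANISM  Dictyostelium discoideum\n//\n";

open( my $fh, '<', \$data ) or die "cannot open string: $!\n";
my @human = filter_records( $fh, 'Homo sapiens' );
close($fh);
print @human;
```

Because each kept record still ends with its // line, the output is itself a valid stream of records, which is what lets two filters be chained with a pipe.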
%re_lookup = (
'Eco47III'=> 'AGCGCT',
'EcoRI' => 'GAATTC',
'HindIII' => 'AAGCTT',
);
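%re_lookup maps restriction enzyme names to their recognition sequences. As a hypothetical example of using such a hash (the test sequence is made up), this loop reports where each site occurs, calling index repeatedly to find successive matches. Positions are reported counting from 1, as in GenBank files.

```perl
#!/usr/local/bin/perl
# re_sites.pl - hypothetical use of the %re_lookup hash
use strict;
use warnings;

my %re_lookup = (
    'Eco47III' => 'AGCGCT',
    'EcoRI'    => 'GAATTC',
    'HindIII'  => 'AAGCTT',
);

# made-up sequence containing one EcoRI and one HindIII site
my $seq = 'TTGAATTCGGAAGCTTAA';

my %found;
foreach my $enzyme ( sort keys %re_lookup ) {
    my $site = $re_lookup{$enzyme};
    my $pos  = index( $seq, $site );    # -1 means not found
    while ( $pos != -1 ) {
        push @{ $found{$enzyme} }, $pos + 1;    # 1-based position
        $pos = index( $seq, $site, $pos + 1 );
    }
    my $where = $found{$enzyme} ? "@{ $found{$enzyme} }" : 'none';
    print "$enzyme ($site): $where\n";
}
```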
Here is an example file of 5 GenBank entries for use with some of the programs above.
LOCUS DDU63596 310 bp DNA INV 14-MAY-1999
DEFINITION Dictyostelium discoideum Tdd-4 transposable element flanking
sequence, clone p427/428 right end.
ACCESSION U63596
NID g2393749
KEYWORDS .
SOURCE Dictyostelium discoideum.
ORGANISM Dictyostelium discoideum
Eukaryota; Dictyosteliida; Dictyostelium.
REFERENCE 1 (bases 1 to 310)
AUTHORS Wells,D.J.
TITLE Tdd-4, a DNA transposon of Dictyostelium that encodes proteins
similar to LTR retroelement integrases
JOURNAL Nucleic Acids Res. 27 (11), 2408-2415 (1999)
REFERENCE 2 (bases 1 to 310)
AUTHORS Wells,D.J. and Welker,D.L.
TITLE Dictyostelium discoideum Tdd-4 transposable element, right end
flanking sequence from clone p427/428
JOURNAL Unpublished
REFERENCE 3 (bases 1 to 310)
AUTHORS Wells,D.J. and Welker,D.L.
TITLE Direct Submission
JOURNAL Submitted (11-JUL-1996) Biology, Utah State Univ., Logan, UT
84322-5305, USA
FEATURES Location/Qualifiers
source 1..310
/organism="Dictyostelium discoideum"
/strain="AX4"
/db_xref="taxon:44689"
clone="p427428"
misc_feature 5.12
/note="Fuzzy location"
misc_feature J00194:(100..202),1..245,300..422
/note="Location partly in another entry"
BASE COUNT 118 a 46 c 67 g 79 t
ORIGIN
1 gtgacagttg gctgtcagac atacaatgat tgtttagaag aggagaagat tgatccggag
61 taccgtgata gtattttaaa aactatgaaa gcgggaatac ttaatggtaa actagttaga
121 ttatgtgacg tgccaagggg tgtagatgta gaaattgaaa caactggtct aaccgattca
181 gaaggagaaa gtgaatcaaa agaagaagag tgatgatgaa tagccaccat tactgcatac
241 tgtagccctt acccttgtcg caccattagc cattaataaa aataaaaaat tatataaaaa
301 ttacacccat
//
LOCUS DDU63595 83 bp DNA INV 14-MAY-1999
DEFINITION Dictyostelium discoideum Tdd-4 transposable element flanking
sequence, clone p427/428 left end.
ACCESSION U63595
NID g2393748
KEYWORDS .
SOURCE Dictyostelium discoideum.
ORGANISM Dictyostelium discoideum
Eukaryota; Dictyosteliida; Dictyostelium.
REFERENCE 1 (bases 1 to 83)
AUTHORS Wells,D.J.
TITLE Tdd-4, a DNA transposon of Dictyostelium that encodes proteins
similar to LTR retroelement integrases
JOURNAL Nucleic Acids Res. 27 (11), 2408-2415 (1999)
REFERENCE 2 (bases 1 to 83)
AUTHORS Wells,D.J. and Welker,D.L.
TITLE Dictyostelium discoideum Tdd-4 transposable element, left end
flanking sequence from clone p427/428
JOURNAL Unpublished
REFERENCE 3 (bases 1 to 83)
AUTHORS Wells,D.J. and Welker,D.L.
TITLE Direct Submission
JOURNAL Submitted (11-JUL-1996) Biology, Utah State Univ., Logan, UT
84322-5305, USA
FEATURES Location/Qualifiers
source 1..83
/organism="Dictyostelium discoideum"
/strain="AX4"
/db_xref="taxon:44689"
clone="p427428"
BASE COUNT 31 a 16 c 12 g 24 t
ORIGIN
1 ttcgaaggat atctcaaggc agttaataat tactatgatg attgtaaaat attccaaagt
61 ttcccagacc caccaataat gac
//
LOCUS HUMBDNF 918 bp DNA PRI 31-OCT-1994
DEFINITION Human brain-derived neurotrophic factor (BDNF) gene, complete cds.
ACCESSION M37762
VERSION M37762.1 GI:179402
KEYWORDS neurotrophic factor.
SOURCE Human DNA.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 918)
AUTHORS Jones,K.R. and Reichardt,L.F.
TITLE Molecular cloning of a human gene that is a member of the nerve
growth factor family
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 87 (20), 8060-8064 (1990)
MEDLINE 91045937
COMMENT Draft entry and computer-readable sequence for [Proc. Natl. Acad.
Sci. U.S.A. (1990) In press] kindly submitted
by K.R.Jones, 13-AUG-1990.
FEATURES Location/Qualifiers
source 1..918
/organism="Homo sapiens"
/db_xref="taxon:9606"
/dev_stage="adult"
sig_peptide 76..123
/gene="NTF3"
/note="G00-125-917; putative"
/product="brain-derived neurotrophic factor"
CDS 76..819
/gene="BDNF"
/note="putative"
/codon_start=1
/db_xref="GDB:G00-125-916"
/product="brain-derived neurotrophic factor"
/protein_id="AAA51820.1"
/db_xref="GI:179403"
/translation="MTILFLTMVISYFGCMKAAPMKEANIRGQGGLAYPGVRTHGTLE
SVNGPKAGSRGLTSLADTFEHVIEELLDEDQKVRPNEENNKDADLYTSRVMLSSQVPL
EPPLLFLLEEYKNYLDAANMSMRVRRHSDPARRGELSVCDSISEWVTAADKKTAVDMS
GGTVTVLEKVPVSKGQLKQYFYETKCNPMGYTKEGCRGIDKRHWNSQCRTTQSYVRAL
TMDSKKRIGWRFIRIDTSCVCTLTIKRGR"
gene 76..816
/gene="NTF3"
/map="12p13"
gene 76..819
/gene="BDNF"
/map="11p13"
mat_peptide 124..816
/gene="NTF3"
/note="G00-125-917; putative"
/product="brain-derived neurotrophic factor"
BASE COUNT 269 a 192 c 237 g 220 t
ORIGIN
1 ggtgaaagaa agccctaacc agttttctgt cttgtttctg ctttctccct acagttccac
61 caggtgagaa gagtgatgac catccttttc cttactatgg ttatttcata ctttggttgc
121 atgaaggctg cccccatgaa agaagcaaac atccgaggac aaggtggctt ggcctaccca
181 ggtgtgcgga cccatgggac tctggagagc gtgaatgggc ccaaggcagg ttcaagaggc
241 ttgacatcat tggctgacac tttcgaacac gtgatagaag agctgttgga tgaggaccag
301 aaagttcggc ccaatgaaga aaacaataag gacgcagact tgtacacgtc cagggtgatg
361 ctcagtagtc aagtgccttt ggagcctcct cttctctttc tgctggagga atacaaaaat
421 tacctagatg ctgcaaacat gtccatgagg gtccggcgcc actctgaccc tgcccgccga
481 ggggagctga gcgtgtgtga cagtattagt gagtgggtaa cggcggcaga caaaaagact
541 gcagtggaca tgtcgggcgg gacggtcaca gtccttgaaa aggtccctgt atcaaaaggc
601 caactgaagc aatacttcta cgagaccaag tgcaatccca tgggttacac aaaagaaggc
661 tgcaggggca tagacaaaag gcattggaac tcccagtgcc gaactaccca gtcgtacgtg
721 cgggccctta ccatggatag caaaaagaga attggctggc gattcataag gatagacact
781 tcttgtgtat gtacattgac cattaaaagg ggaagatagt ggatttatgt tgtatagatt
841 agattatatt gagacaaaaa ttatctattt gtatatatac ataacagggt aaattattca
901 gttaagaaaa aaataatt
//
LOCUS NT_010368 161485 bp DNA CON 16-NOV-2000
DEFINITION Homo sapiens chromosome 15 working draft sequence segment, complete
sequence.
ACCESSION NT_010368
VERSION NT_010368.1 GI:11433101
KEYWORDS HTG.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 161485)
AUTHORS International Human Genome Project collaborators.
TITLE Toward the complete sequence of the human genome
JOURNAL Unpublished
COMMENT GENOME ANNOTATION REFSEQ: NCBI contigs are derived from assembled
genomic sequence data. They may include both draft and finished
sequence.
COMPLETENESS: not full length.
FEATURES Location/Qualifiers
source 1..310
/organism="Homo sapiens"
/db_xref="taxon:9606"
/chromosome="15"
source order(1..100,251..300,300..310)
/note="Doctored from Accession AC011224
sequenced by Whitehead Institute
for Biomedical Research"
/organism="Homo sapiens"
/db_xref="taxon:9606"
/clone="RP11-10K20"
variation 244
/replace="T"
/replace="A"
/db_xref="dbSNP:140670"
ORIGIN
1 gtgacagttg gctgtcagac atacaatgat tgtttagaag aggagaagat tgatccggag
61 taccgtgata gtattttaaa aactatgaaa gcgggaatac ttaatggtaa actagttaga
121 ttatgtgacg tgccaagggg tgtagatgta gaaattgaaa caactggtct aaccgattca
181 gaaggagaaa gtgaatcaaa agaagaagag tgatgatgaa tagccaccat tactgcatac
241 tgtagccctt acccttgtcg caccattagc cattaataaa aataaaaaat tatataaaaa
301 ttacacccat
//
LOCUS HUMBETGLOA 3002 bp DNA PRI 26-AUG-1994
DEFINITION Human haplotype C4 beta-globin gene, complete cds.
ACCESSION L26462
VERSION L26462.1 GI:432453
KEYWORDS beta-globin.
SOURCE Homo sapiens DNA.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 3002)
AUTHORS Fullerton,S.M., Harding,R.M., Boyce,A.J. and Clegg,J.B.
TITLE Molecular and population genetic analysis of allelic sequence
diversity at the human beta-globin locus
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 91 (5), 1805-1809 (1994)
MEDLINE 94173918
FEATURES Location/Qualifiers
source 1..3002
/organism="Homo sapiens"
/db_xref="taxon:9606"
/haplotype="C4"
/note="sequence found in a Melanesian population"
variation replace(111,"t")
variation replace(263,"t")
/note="Rsa I polymorphism"
variation replace(273,"c")
variation replace(286..287,"")
/note="2 bp insertion of AT"
variation replace(288,"t")
variation replace(295..296,"")
/note="1 bp deletion of C or 2 bp deletion of CT"
variation replace(347,"c")
variation replace(476,"t")
variation replace(500,"c")
CDS join(866..957,1088..1310,2161..2289)
/codon_start=1
/product="beta-globin"
/protein_id="AAA21100.1"
/db_xref="GI:532506"
/translation="MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFE
SFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPE
NFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
exon <866..957
/number=1
variation replace(874,"c")
intron 958..1087
/number=1
exon 1088..1310
/number=2
intron 1311..2160
/number=2
variation replace(1326,"g")
/note="Ava II polymorphism"
variation replace(1384,"g")
variation replace(1391,"t")
variation replace(1976,"t")
exon 2161..>2289
/number=3
variation replace(2522,"c")
variation replace(2602,"a")
variation replace(2604,"c")
variation replace(2760,"t")
/note="Hinf I polymorphism"
variation replace(2913,"g")
BASE COUNT 810 a 601 c 599 g 992 t
ORIGIN
1 acctcctatt tgacaccact gattacccca ttgatagtca cactttgggt tgtaagtgac
61 tttttattta tttgtatttt tgactgcatt aagaggtctc tagtttttta cctcttgttt
121 cccaaaacct aataagtaac taatgcacag agcacattga tttgtattta ttctattttt
181 agacataatt tattagcatg catgagcaaa ttaagaaaaa caacaacaaa tgaatgcata
241 tatatgtata tgtatgtgtg tacatataca catatatata tatatatatt ttttcttttc
301 ttaccagaag gttttaatcc aaataaggag aagatatgct tagaactgag gtagagtttt
361 catccattct gtcctgtaag tattttgcat attctggaga cgcaggaaga gatccatcta
421 catatcccaa agctgaatta tggtagacaa aactcttcca cttttagtgc atcaacttct
481 tatttgtgta ataagaaaat tgggaaaacg atcttcaata tgcttaccaa gctgtgattc
541 caaatattac gtaaatacac ttgcaaagga ggatgttttt agtagcaatt tgtactgatg
601 gtatggggcc aagagatata tcttagaggg agggctgagg gtttgaagtc caactcctaa
661 gccagtgcca gaagagccaa ggacaggtac ggctgtcatc acttagacct caccctgtgg
721 agccacaccc tagggttggc caatctactc ccaggagcag ggagggcagg agccagggct
781 gggcataaaa gtcagggcag agccatctat tgcttacatt tgcttctgac acaactgtgt
841 tcactagcaa cctcaaacag acaccatggt gcatctgact cctgaggaga agtctgccgt
901 tactgccctg tggggcaagg tgaacgtgga tgaagttggt ggtgaggccc tgggcaggtt
961 ggtatcaagg ttacaagaca ggtttaagga gaccaataga aactgggcat gtggagacag
1021 agaagactct tgggtttctg ataggcactg actctctctg cctattggtc tattttccca
1081 cccttaggct gctggtggtc tacccttgga cccagaggtt ctttgagtcc tttggggatc
1141 tgtccactcc tgatgctgtt atgggcaacc ctaaggtgaa ggctcatggc aagaaagtgc
1201 tcggtgcctt tagtgatggc ctggctcacc tggacaacct caagggcacc tttgccacac
1261 tgagtgagct gcactgtgac aagctgcacg tggatcctga gaacttcagg gtgagtctat
1321 gggacccttg atgttttctt tccccttctt ttctatggtt aagttcatgt cataggaagg
1381 ggataagtaa cagggtacag tttagaatgg gaaacagacg aatgattgca tcagtgtgga
1441 agtctcagga tcgttttagt ttcttttatt tgctgttcat aacaattgtt ttcttttgtt
1501 taattcttgc tttctttttt tttcttctcc gcaattttta ctattatact taatgcctta
1561 acattgtgta taacaaaagg aaatatctct gagatacatt aagtaactta aaaaaaaact
1621 ttacacagtc tgcctagtac attactattt ggaatatatg tgtgcttatt tgcatattca
1681 taatctccct actttatttt cttttatttt taattgatac ataatcatta tacatattta
1741 tgggttaaag tgtaatgttt taatatgtgt acacatattg accaaatcag ggtaattttg
1801 catttgtaat tttaaaaaat gctttcttct tttaatatac ttttttgttt atcttatttc
1861 taatactttc cctaatctct ttctttcagg gcaataatga tacaatgtat catgcctctt
1921 tgcaccattc taaagaataa cagtgataat ttctgggtta aggcaatagc aatatctctg
1981 catataaata tttctgcata taaattgtaa ctgatgtaag aggtttcata ttgctaatag
2041 cagctacaat ccagctacca ttctgctttt attttatggt tgggataagg ctggattatt
2101 ctgagtccaa gctaggccct tttgctaatc atgttcatac ctcttatctt cctcccacag
2161 ctcctgggca acgtgctggt ctgtgtgctg gcccatcact ttggcaaaga attcacccca
2221 ccagtgcagg ctgcctatca gaaagtggtg gctggtgtgg ctaatgccct ggcccacaag
2281 tatcactaag ctcgctttct tgctgtccaa tttctattaa aggttccttt gttccctaag
2341 tccaactact aaactggggg atattatgaa gggccttgag catctggatt ctgcctaata
2401 aaaaacattt attttcattg caatgatgta tttaaattat ttctgaatat tttactaaaa
2461 agggaatgtg ggaggtcagt gcatttaaaa cataaagaaa tgaagagcta gttcaaacct
2521 tgggaaaata cactatatct taaactccat gaaagaaggt gaggctgcaa acagctaatg
2581 cacattggca acagccctga tgcatatgcc ttattcatcc ctcagaaaag gattcaagta
2641 gaggcttgat ttggaggtta aagttttgct atgctgtatt ttacattact tattgtttta
2701 gctgtcctca tgaatgtctt ttcactaccc atttgcttat cctgcatctc tcagccttga
2761 ctccactcag ttctcttgct tagagatacc acctttcccc tgaagtgttc cttccatgtt
2821 ttacggcgag atggtttctc ctcgcctggc cactcagcct tagttgtctc tgttgtctta
2881 tagaggtcta cttgaagaag gaaaaacagg ggtcatggtt tgactgtcct gtgagccctt
2941 cttccctgcc tcccccactc acagtgaccc ggaatctgca gtgctagtct cccggaacta
3001 tc
//
Berkeley Drosophila Genome Project
Date: Fri May 25 09:25:37 2001