Reading:
Deitel ch4 (leave 4.6 until next lecture)
Advanced: man perldata (perl data structures manual)
So far we have been looking at scalar variables. Recall that a scalar is a
single-valued variable.
Limitations of scalar variables
Imagine we want to find the average of a list of numbers. We could do it
like this:
program 1
| $number1 = 5.4;
$number2 = 7.3;
$number3 = 4.1;
$average = ( $number1 + $number2 + $number3 ) / 3;
|
But this is extremely limited: every extra number needs another variable,
and the divisor has to be changed by hand.
Lists
program 2
| ( 5.6, 8.22, 14.9 ); # list of floating point numbers
( "hello", "brazil" ); # list of strings
( "hello", $country );
( "blah", 18, 22, "x", 3.14 ); # mixed list
( 0 .. 5 ); # list of integers between 0 and 5
( 'a' .. 'z' ); # list of strings a,b,c,d......
|
Array variables
program 3
| @numbers = (5.6, 8.22, 14.9); # list of floating point numbers
@words = ("Hello", "Brazil!"); # list of strings
@qual = (100, 100, 100, 75, 75, 75);
@greeting = ("hello", $country);
@list = ("blah", 18, 22, "x", 3.14); # mixed list
@range = (0..5); # list of integers between 0 and 5
|
As we can see, we use the special character @ to denote an array.
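Returning to the averaging problem from program 1, here is a minimal sketch of how an array removes the limitation. It uses a foreach loop and `my` declarations, neither of which this section has introduced, so treat those as assumptions about what you already know.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my @numbers = (5.4, 7.3, 4.1);

# add up every element, however many there are
my $total = 0;
foreach my $n (@numbers) {
    $total += $n;
}

# scalar(@numbers) is the number of elements, so nothing is hardcoded
my $average = $total / scalar(@numbers);
print "average = $average\n";
```

Adding a fourth number now only means changing the list on the first line.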
Accessing array elements
program 4
| # alphabet_index.pl
print "Enter an index number between 0 and 25\n";
$index = <STDIN>;
chomp $index;
@letters = ('A'..'Z');
print "letter index $index = $letters[$index] \n";
|
The number in square brackets is the index.
Arrays are indexed from zero, not one.
The arrays above are lists of scalars. We use the scalar sign $ to indicate
that the element we are accessing is a scalar.
Setting the values in an array
program 5
| # array_set.pl
@words = ('The', 'quick', 'brown', 'fox');
$words[1] = 'small';
$words[2] = 'furry';
print "@words \n"; # The small furry fox
|
Let's go through
this line by line and see what is happening
program 6
| @words = ('The', 'quick', 'brown', 'fox');
|
This constructs an array that looks like this:
|           +------+
$words[0]   |The   |
            +------+
$words[1]   |quick |
            +------+
$words[2]   |brown |
            +------+
$words[3]   |fox   |
            +------+
|
program 7
| $words[1] = 'small';
|
This sets element
1 (counting from zero) of the array, so our array now looks like this:
|           +------+
$words[0]   |The   |
            +------+
$words[1]   |small |
            +------+
$words[2]   |brown |
            +------+
$words[3]   |fox   |
            +------+
|
program 8
| $words[2] = 'furry';
|
This sets element
2 (counting from zero) of the array, so our array ends up like this:
|           +------+
$words[0]   |The   |
            +------+
$words[1]   |small |
            +------+
$words[2]   |furry |
            +------+
$words[3]   |fox   |
            +------+
|
Indexing arrays with negative numbers
You can index backwards from the end of an array by using negative numbers
as the index:
program 9
| # negative_index.pl
@letters = ('A'..'Z');
print " last letter = $letters[-1] \n"; # Z
print "penultimate letter = $letters[-2] \n"; # Y
|
Getting the length of an array
You can use the function scalar to evaluate an array in scalar context;
the result is the length of the array (its number of elements).
program 10
| @numbers = (0..100);
print scalar(@numbers); # prints 101
|
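The same length is obtained whenever an array is used where Perl expects a single value. The sketch below (variable names are illustrative only) contrasts that with assigning to a list, which copies elements instead.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my @numbers = (0..100);

my $count   = @numbers;     # scalar assignment: the length, 101
my ($first) = @numbers;     # list assignment: the first element, 0

print "count = $count, first = $first\n";   # count = 101, first = 0

# a numeric comparison also puts the array in scalar context
if (@numbers > 100) {
    print "more than 100 elements\n";
}
```

Writing scalar(@numbers) explicitly, as in program 10, is the safer habit inside print, because print takes a list and would otherwise print every element.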
The index count $#
You can also get the value of the last index by preceding the array
variable name with $#. Because indexing starts at zero, $#numbers is one
less than the length of @numbers.
program 11
| # index2.pl
@numbers = (0..100);
@numbers = reverse @numbers;
print "index = $#numbers \n"; # prints 100
print "$numbers[$#numbers] \n"; # prints 0
|
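A common use of $# is to walk over every index of an array. The following is a small sketch of that idea; the foreach loop and the range 0 .. $#words are assumptions about constructs not covered in this section.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my @words = ('the', 'quick', 'brown', 'fox');

print "last index = $#words\n";               # last index = 3
print "length     = ", scalar(@words), "\n";  # length     = 4

# visit each index from 0 up to the last one
foreach my $i (0 .. $#words) {
    print "$i: $words[$i]\n";
}

# $words[$#words] and $words[-1] are the same element
print "$words[$#words] $words[-1]\n";         # fox fox
```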
Taking a "slice" of an array
program 12
| @words = ('the', 'quick', 'brown', 'fox');
@cut = @words[1,3]; # same as ($words[1], $words[3])
print "@cut \n"; # quick fox
|
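Slices accept any list of indices, including ranges and negative numbers, and a slice can also appear on the left of an assignment. A short sketch with illustrative variable names:

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my @words = ('the', 'quick', 'brown', 'fox');

my @middle = @words[1..2];    # a range of indices
my @ends   = @words[0, -1];   # first and last element

print "@middle\n";            # quick brown
print "@ends\n";              # the fox

# assigning to a slice sets several elements at once
@words[1, 2] = ('slow', 'grey');
print "@words\n";             # the slow grey fox
```

Note the @ in front of the slice: it yields a list, whereas $words[1] yields a single scalar.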
Functions that act on arrays
Push
program 13
| # push_example.pl
@numbers = (1, 2, 3);
push(@numbers, 4, 5);
print "@numbers \n"; # prints 1 2 3 4 5
|
Pop
program 14
| # pop_example.pl
@words = ('the', 'quick', 'brown', 'fox');
print pop(@words); # fox
print pop(@words); # brown
print pop(@words); # quick
print pop(@words); # the
|
Shift
program 15
| # shift_example.pl
@words = ('the', 'quick', 'brown', 'fox');
print shift(@words); # the
print shift(@words); # quick
print shift(@words); # brown
print shift(@words); # fox
|
Unshift
program 16
| # unshift_example.pl
@words = ('quick', 'brown', 'fox');
unshift(@words, 'the');
print "@words\n"; # the quick brown fox
|
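Taken together, push and pop work on the end of an array while shift and unshift work on the front. As a rough sketch, pairing push with pop gives last-in-first-out behaviour (a stack), and pairing push with shift gives first-in-first-out behaviour (a queue). The names @stack and @queue are illustrative only.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

# stack: add at the end, remove from the end
my @stack = ();
push(@stack, 'A', 'B', 'C');
my $top = pop(@stack);
print "stack gave $top, leaving @stack\n";    # stack gave C, leaving A B

# queue: add at the end, remove from the front
my @queue = ();
push(@queue, 'A', 'B', 'C');
my $front = shift(@queue);
print "queue gave $front, leaving @queue\n";  # queue gave A, leaving B C
```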
Reverse
program 17
| # reverse_example.pl
@words = ('the', 'quick', 'brown', 'fox');
print reverse(@words), "\n"; # foxbrownquickthe
|
| Note: an array inside double quotes is interpolated with a space placed
between the elements. If we print an array (or any list) without the
quotes, the elements are all squashed together.
|
Sort
program 18
| # sort_example.pl
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
@sorted = sort(@words);
print "sorted words = @sorted\n"; # The brown fox jumped quick
|
You can optionally specify a code block to use as the sort comparison.
Inside the block, the special variables $a and $b hold the two elements
being compared.
program 19
| # sort_example2.pl
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
@sorted = sort { lc($a) cmp lc($b) } @words;
print "sorted words = @sorted\n"; # brown fox jumped quick The
|
lc is a function that takes a string as an argument and returns the string
in lower case.
cmp is a new operator. It compares two strings and returns
-1 if the left side is less than (alphabetically before) the right side
0 if the left side is the same as the right side
+1 if the left side is greater than (alphabetically after) the right side
cmp compares character codes, so upper-case letters sort before lower-case
ones; this is why 'The' came first in program 18.
What do you think the outcome of the following program is?
program 20
| # sort_example3.pl
@numbers = (100, 101, 102, 10, 11, 12, 1, 2, 3);
@sorted = sort @numbers;
print "sorted numbers = @sorted\n";
|
The default sort comparison is string (alphabetical) order, so the numbers
are compared character by character:
| sorted numbers = 1 10 100 101 102 11 12 2 3
|
To sort in numeric order:
program 21
| # sort_example4.pl
@numbers = (100, 101, 102, 10, 11, 12, 1, 2, 3);
@sorted = sort { $a <=> $b } @numbers;
print "sorted numbers = @sorted\n"; # 1 2 3 10 11 12 100 101 102
|
<=> is a new operator. It compares two numbers and returns
-1 if the left side is less than the right side
0 if the left side is equal to the right side
+1 if the left side is greater than the right side
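Because the sort block only has to return -1, 0 or +1, swapping $a and $b reverses the order. The sketch below shows this for both operators and prints the raw result of each comparison; the variable names are illustrative.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my @numbers = (100, 101, 102, 10, 11, 12, 1, 2, 3);
my @descending = sort { $b <=> $a } @numbers;    # largest first
print "@descending\n";   # 102 101 100 12 11 10 3 2 1

my @words = ('The', 'quick', 'brown', 'fox', 'jumped');
my @backwards = sort { lc($b) cmp lc($a) } @words;
print "@backwards\n";    # The quick jumped fox brown

# the same pair of values compared as numbers and as strings
my $as_numbers = (2 <=> 10);     # -1: 2 is less than 10
my $as_strings = ('2' cmp '10'); #  1: '2' comes after '1'
print "$as_numbers $as_strings\n";   # -1 1
```

The last two lines are the reason program 20 puts 10 before 2.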
Splice
program 22
| # splice_example.pl
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
@spliced = splice(@words, 1, 2, 'happy', 'red');
print "spliced words = @spliced\n"; # quick brown
print " words = @words\n"; # The happy red fox jumped
|
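The arguments to splice are the array, the starting index, the number of elements to remove, and an optional list to put in their place. Two further cases follow from that, sketched here: a length of 0 inserts without removing anything, and leaving out the replacement list removes without inserting.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my @words = ('The', 'quick', 'brown', 'fox');

# length 0: nothing is removed, 'very' is inserted before index 1
splice(@words, 1, 0, 'very');
print "@words\n";      # The very quick brown fox

# no replacement list: two elements are removed starting at index 1
my @removed = splice(@words, 1, 2);
print "@removed\n";    # very quick
print "@words\n";      # The brown fox
```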
Join
program 23
| # join_example.pl
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
print join("+", @words), "\n"; # The+quick+brown+fox+jumped
|
Split
program 24
| # split_example.pl
$sentence = "The+++quick+++brown+++fox+++jumped";
@words = split(/\+\+\+/, $sentence);
print "@words \n"; # The quick brown fox jumped
|
Often we want to break up a sentence separated by spaces into an array of
words. If some words are separated by more than one space, splitting on a
single space gives empty elements:
program 25
| # split_example2.pl
$sentence = "The    quick brown fox jumped";
@words = split(/ /, $sentence);
print "word0 = '$words[0]'\n"; # 'The'
print "word1 = '$words[1]'\n"; # ''
print "word2 = '$words[2]'\n"; # ''
print "word3 = '$words[3]'\n"; # ''
print "word4 = '$words[4]'\n"; # 'quick'
|
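What has happened here is that the split function is splitting on each
individual space character, so every extra space produces an empty string.
To remedy this, use the special pattern ' ' (a single space in quotes),
which splits on runs of whitespace and ignores leading whitespace:
program 26
| # split_example3.pl
$sentence = "The    quick brown fox jumped";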
@words = split(' ', $sentence);
print "word0 = '$words[0]'\n"; # 'The'
print "word1 = '$words[1]'\n"; # 'quick'
print "word2 = '$words[2]'\n"; # 'brown'
print "word3 = '$words[3]'\n"; # 'fox'
print "word4 = '$words[4]'\n"; # 'jumped'
|
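The quoted single space is a special case and is not quite the same as the pattern /\s+/. A small sketch of the difference, using a made-up sentence with leading spaces and a tab:

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my $sentence = "  The quick\tbrown fox";

my @special = split(' ', $sentence);     # skips the leading whitespace
my @pattern = split(/\s+/, $sentence);   # leading whitespace gives an empty first field

print scalar(@special), "\n";            # 4
print scalar(@pattern), "\n";            # 5
print "first = '$pattern[0]'\n";         # first = ''
```

For splitting ordinary text into words, split(' ', ...) is usually the form you want.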
Specifying an empty split pattern will break a string into individual
characters:
program 27
| # split_example4.pl
$alphabet = "ABCDEF";
@words = split(//, $alphabet); # @words = ('A', 'B', 'C', 'D', 'E', 'F')
|
The qw operator
qw ("quote words") is an operator, not a function.
It is used purely for convenience when specifying a list of words: it
splits its argument on whitespace and quotes each word for you.
program 28
| # using_qw.pl
@words = qw(The quick brown fox jumped);
printf "The number of words is = %d\n", scalar(@words); # 5
print "@words\n"; # The quick brown fox jumped
|
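To make the equivalence concrete, the sketch below builds the same list both ways. It also shows two details worth knowing: qw accepts delimiters other than parentheses, and nothing inside qw is interpolated. The variable names are illustrative.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my @quoted = ('The', 'quick', 'brown', 'fox');
my @short  = qw(The quick brown fox);
print "same list\n" if "@quoted" eq "@short";   # same list

# other delimiters work too
my @enzymes = qw/EcoRI HindIII HinfI/;
print scalar(@enzymes), "\n";                    # 3

# no interpolation: the first word is the literal text $name
my @literal = qw($name and);
print length($literal[0]), "\n";                 # 5
```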
Hashes
Hashes (also known as hashtables, dictionaries or associative arrays) are
a common data structure in programming. They are built into the language
in Perl.
What is a hash?
With an array, you index values by a numeric index. With hashes, you can
use a symbolic index. (Think of a telephone directory for hashes, and a
row of numbered houses for arrays.)
The symbolic index is known as the key.
The result is known as the value.
The hash is a mapping between a set of keys and values.
program 29
| # re_hash.pl
# initialize the lookup table
%re_lookup = (
'Eco47III'=> 'AGCGCT',
'EcoNI' => 'CCTNNNNNAGG',
'EcoRI' => 'GAATTC',
'EcoRII' => 'CCWGG',
'HincII' => 'GTYRAC',
'HindII' => 'GTYRAC',
'HindIII' => 'AAGCTT',
'HinfI' => 'GANTC'
);
print "Enter restriction enzyme name\n";
$re=<STDIN>;
chomp $re;
$seq = $re_lookup{$re};
if (defined($seq)) {
    print "RE sequence for $re is: $seq\n";
}
else {
    print "Sorry, I don't know about \"$re\"\n";
}
|
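The symbol to indicate a hash table is %.
Hashes can be specified in a similar way to arrays; use the parentheses ()
to construct a hash, and use => to separate each key from its value.
The construct for looking up a value in a hash is:
$value = $hashvariable{$key}
The lookup uses the scalar sign $ because the value is a scalar, and curly
brackets {} rather than the square brackets used for arrays.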
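Program 29 tests the looked-up value with defined. A related check, sketched below under the assumption that you want to know whether the key itself is present, is exists. The sketch also shows that => behaves like a comma, so a hash can be built from a plain list of alternating keys and values. BamHI is just an example of a key that is not in the table.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my %re_lookup = ('EcoRI' => 'GAATTC', 'HindIII' => 'AAGCTT');

my $seq = $re_lookup{'EcoRI'};
print "$seq\n";                       # GAATTC

# looking up a missing key gives undef and does not add the key
my $missing = $re_lookup{'BamHI'};
print "BamHI is not in the table\n" unless exists $re_lookup{'BamHI'};

# the same hash written as a plain list of key, value, key, value
my %same = ('EcoRI', 'GAATTC', 'HindIII', 'AAGCTT');
print "$same{'HindIII'}\n";           # AAGCTT
```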
The keys and values functions
The keys function takes a hash as argument and returns a list of the keys
in that hash.
The values function takes a hash as argument and returns a list of the
values in that hash.
program 30
| # keys_example.pl
# create a lookup table of GenBank accessions
# keyed by Clone ID
%accession_hash =
(
"BACR01A01" => "AC005555",
"BACR48E02" => "AC005577",
"BACR24K17" => "AC005101",
);
# get all the keys in the hash (hash is keyed by clone ID)
@clones = keys %accession_hash;
print "Clone IDs: @clones\n"; # prints the three clone IDs, e.g. BACR01A01 BACR48E02 BACR24K17 (hash order is not guaranteed)
# get all the values in the hash (hash is a lookup for accessions)
@accs = values %accession_hash;
print "GenBank Accessions: @accs\n"; # prints the accessions in the same order as the keys above
|
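Since the order returned by keys is not predictable, a common approach is to sort the keys before using them. The sketch below combines keys with sort and a foreach loop (the loop is an assumption about constructs not covered in this section) to print the table in a stable order.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my %accession_hash = (
    "BACR01A01" => "AC005555",
    "BACR48E02" => "AC005577",
    "BACR24K17" => "AC005101",
);

# sort the keys so the output order is always the same
my @clones = sort keys %accession_hash;
foreach my $clone (@clones) {
    print "$clone => $accession_hash{$clone}\n";
}

# keys in scalar context gives the number of entries
my $count = keys %accession_hash;
print "$count entries\n";    # 3 entries
```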
Reverse on hashes
You can use the reverse function to reverse a hash. Unlike with arrays,
this has nothing to do with order, because hashes are unordered. Instead,
reverse swaps the roles of keys and values: each value becomes a key that
maps back to its original key.
program 31
| %re_lookup_by_seq = reverse %re_lookup;
print "I recognise GAATTC as being $re_lookup_by_seq{GAATTC}\n";
# the above should print EcoRI
|
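For this to work, there must be a one-to-one mapping between keys and
values. In program 29, HincII and HindII share the sequence GTYRAC, so
only one of those two names can survive as the value for GTYRAC in the
reversed hash.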
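The effect of a value that is not unique can be seen by counting the keys before and after reversing. This is a small sketch using three entries taken from program 29; %by_seq is an illustrative name.

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my %re_lookup = (
    'HincII' => 'GTYRAC',
    'HindII' => 'GTYRAC',   # same sequence as HincII
    'EcoRI'  => 'GAATTC',
);

my %by_seq = reverse %re_lookup;

print scalar(keys %re_lookup), "\n";   # 3
print scalar(keys %by_seq), "\n";      # 2: the two GTYRAC entries collapsed into one
print "$by_seq{'GAATTC'}\n";           # EcoRI
```

Which of HincII and HindII ends up stored under GTYRAC depends on the internal hash order, so it should not be relied on.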
Removing elements from a hash
program 32
| delete $re_lookup{"EcoRI"};
|
We can also set individual values in a hash:
program 33
| # translate1.pl
%translate = (); # initialize the hash
$translate{'atg'} = 'M';
$translate{'taa'} = '*';
$translate{'ctt'} = 'L';
print $translate{'atg'}; # prints M
|
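Setting and deleting can be combined with exists to see exactly what is in the hash at each point. A minimal sketch, with illustrative values:

```perl
use strict; use warnings;   # optional safety checks; 'my' declares each variable

my %translate = ('atg' => 'M', 'taa' => '*');

# assigning to an existing key overwrites the old value
$translate{'taa'} = 'Stop';
print "$translate{'taa'}\n";            # Stop

# delete removes the key and returns the value that was stored
my $gone = delete $translate{'atg'};
print "removed $gone\n";                # removed M

if (exists $translate{'atg'}) {
    print "atg is still there\n";
}
else {
    print "atg is no longer a key\n";   # this branch runs
}
```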
Exercises:
Deitel exercises 4.5, 4.6
Write a program that sorts DNA sequences by size. The output should be one
sequence per line, like this:
| % perl sort_by_seqsize.pl AAA TCCAAAGGGT ATTGG
Sorted seqs:
AAA
ATTGG
TCCAAAGGGT
|
| Hint: you may want to use the length function; see Deitel p290
|
Write an English to Portuguese translator, or a program that translates
between any two languages. Keep the vocabulary down to ten words or so,
and forget about grammar/context altogether.
Extend it so that you can go either way.
EITHER
Deitel 4.7 (do two kinds of shuffling: the kind that is mentioned in the
book, and the shuffle you would get from cutting the cards)
OR
Write a molecular evolution simulation program(!) Do it for only one
generation. (Choose artificially exaggerated probabilities for testing.)
1. Define a parameter: chance of single point mutation
This should either be hardcoded or come from the user
| % perl recomb.pl 0.5 AAAAATTTTTTT
after one generation:
AAAAATGTTTTT
|
Assume that, when a point mutation occurs, each base is equally likely to
be the outcome.
2. Add other mutation/recombination events; feel free to be biologically
unrealistic in order to make a fun simulation.
If you've made it this far you're doing extremely well. The rest of the
exercises are optional.
Write a mini-medline system. Create a hash of journal article titles keyed
by medline ID. Just populate it with 3 or so entries; you can make them up
if you like, e.g.
program 34
| 1000 => "Made up article title",
|
1. Write a program to allow people to look up journal article titles by ID.
2. Extend the program to allow people to get the medline ID if they give
the exact title. You should ask the user the question: search by ID/title?
3. (Extra) Extend the "database" (e.g. add other hashtables) such that
titles can be looked up by author and/or journal name. What happens when
an author has written more than one article? Discuss the limitations of
the system.
Chris Mungall cjm@fruitfly.org
Berkeley Drosophila Genome Project