# Perl 3 - Arrays and Hashes

Advanced reading: `man perldata` (the Perl data structures manual)

So far we have been looking at scalar variables. Recall that a scalar is a single-valued variable.

## Limitations of scalar variables

Imagine we want to find the average of a list of numbers. We could do it like this:

program 1
```perl
$number1 = 5.4;
$number2 = 7.3;
$number3 = 4.1;
$average = ( $number1 + $number2 + $number3 ) / 3;
```

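But this is obviously extremely limited: every extra number needs its own variable, and the divisor has to be changed by hand.

## Lists

A list is a sequence of scalar values separated by commas and enclosed in parentheses: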

program 2
```perl
( 5.6, 8.22, 14.9 );            # list of floating point numbers
( "hello", "brazil" );          # list of strings
( "hello", $country );          # a list can contain variables
( "blah", 18, 22, "x", 3.14 );  # mixed list
( 0 .. 5 );                     # list of integers between 0 and 5
( 'a' .. 'z' );                 # list of strings a, b, c, d, ...
```

## Array variables

An array is a variable that holds a list:

program 3
```perl
@numbers  = (5.6, 8.22, 14.9);           # list of floating point numbers
@words    = ("Hello", "Brazil!");         # list of strings
@qual     = (100, 100, 100, 75, 75, 75);
@greeting = ("hello", $country);
@list     = ("blah", 18, 22, "x", 3.14);  # mixed list
@range    = (0..5);                       # list of integers between 0 and 5
```

As we can see, we use the special character `@` to denote an array.

## Accessing array elements

program 4
```perl
# alphabet_index.pl
print "Enter an index number between 0 and 25\n";
$index = <STDIN>;          # read a line from the keyboard
chomp $index;              # remove the trailing newline
@letters = ('A'..'Z');
print "letter index $index = $letters[$index] \n";
```

The number in square brackets is the index. Arrays are indexed from zero, not one, so `$letters[0]` is `A` and `$letters[25]` is `Z`.

The arrays above are lists of scalars. When we access a single element we use the scalar sign `$` rather than `@` (as in `$letters[$index]`), because the element we are getting back is a scalar.
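Here is a small sketch of zero-based indexing and the `$` sigil. Unlike the programs in these notes, it declares its variables with `my` under `use strict`; the variable names are illustrative. It also shows something worth knowing early: reading an index past the end of an array is not an error, it just gives back `undef`.

```perl
use strict;
use warnings;

my @letters = ('A' .. 'Z');   # 26 elements, indexes 0 to 25

my $first = $letters[0];      # element 0 is 'A'
my $last  = $letters[25];     # element 25 is 'Z'
print "first = $first, last = $last\n";   # first = A, last = Z

# Reading past the end does not stop the program: the result is undef
my $beyond = $letters[26];
print defined($beyond) ? "index 26 is defined\n" : "index 26 is undef\n";   # index 26 is undef
```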

## Setting the values in an array

program 5
```perl
# array_set.pl
@words = ('The', 'quick', 'brown', 'fox');
$words[1] = 'small';
$words[2] = 'furry';
print "@words \n";   # The small furry fox
```

Let's go through this line by line and see what is happening.

program 6
```perl
@words = ('The', 'quick', 'brown', 'fox');
```

This constructs an array that looks like this:

```
           +-------+
$words[0]  | The   |
           +-------+
$words[1]  | quick |
           +-------+
$words[2]  | brown |
           +-------+
$words[3]  | fox   |
           +-------+
```

program 7
```perl
$words[1] = 'small';
```

This sets element 1 (counting from zero) of the array, so our array now looks like this:

```
           +-------+
$words[0]  | The   |
           +-------+
$words[1]  | small |
           +-------+
$words[2]  | brown |
           +-------+
$words[3]  | fox   |
           +-------+
```

program 8
```perl
$words[2] = 'furry';
```

This sets element 2 (counting from zero) of the array, so our array ends up like this:

```
           +-------+
$words[0]  | The   |
           +-------+
$words[1]  | small |
           +-------+
$words[2]  | furry |
           +-------+
$words[3]  | fox   |
           +-------+
```

## Indexing arrays with negative numbers

You can index from the end of an array backwards by using negative numbers to index the array:

program 9
```perl
# negative_index.pl
@letters = ('A'..'Z');
print "last letter        = $letters[-1] \n";   # Z
print "penultimate letter = $letters[-2] \n";   # Y
```

## Getting the length of an array

The `scalar` function turns an array into a single value: used on an array, it returns the number of elements, i.e. the length of the array.

program 10
```perl
@numbers = (0..100);
print scalar(@numbers);   # prints 101
```
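With arrays and `scalar` in hand, the averaging problem from program 1 no longer needs one variable per number. This sketch assumes a `foreach` loop to add up the elements, and uses `my` declarations under `use strict`; the variable names are illustrative.

```perl
use strict;
use warnings;

# The numbers from program 1, now stored in one array instead of three scalars
my @numbers = (5.4, 7.3, 4.1);

# Add up the elements; this loop works for a list of any length
my $total = 0;
foreach my $n (@numbers) {
    $total += $n;
}

# scalar(@numbers) is the number of elements, so the divisor is never hard-coded
my $average = $total / scalar(@numbers);
printf "average of %d numbers = %.2f\n", scalar(@numbers), $average;   # average of 3 numbers = 5.60
```

Adding a fourth number to `@numbers` now needs no other change to the program.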

## The last index: `$#`

You can also get the last index of an array by putting `$#` in front of the array name. For `@numbers`, `$#numbers` is the last index, which is always one less than `scalar(@numbers)`.

program 11
```perl
# index2.pl
@numbers = (0..100);
@numbers = reverse @numbers;
print "index = $#numbers \n";     # prints 100
print "$numbers[$#numbers] \n";   # prints 0
```
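Because `scalar` and `$#` track the size of an array, they also make it easy to see what happens when you assign to an index that does not exist yet: Perl grows the array, filling any gap with `undef`. A minimal sketch (with `my` declarations under `use strict`; the words are just examples):

```perl
use strict;
use warnings;

my @words = ('The', 'small', 'furry', 'fox');

# $#words is always one less than the number of elements
print "length = ", scalar(@words), ", last index = $#words\n";   # length = 4, last index = 3

# $words[-1] and $words[$#words] are the same element
print $words[-1], " ", $words[$#words], "\n";                    # fox fox

# Assigning one past the end grows the array by one element
$words[4] = 'jumped';
print "length = ", scalar(@words), "\n";                         # length = 5

# Assigning further ahead fills the gap with undef elements
$words[7] = 'away';
print "length = ", scalar(@words), "\n";                         # length = 8
print defined($words[5]) ? "element 5 is defined\n" : "element 5 is undef\n";   # element 5 is undef
```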

## Taking a "slice" of an array

program 12
```perl
@words = ('the', 'quick', 'brown', 'fox');
@cut = @words[1,3];   # same as ($words[1], $words[3])
print "@cut \n";      # quick fox
```

A slice uses the `@` sigil because it gives back a list, not a single scalar.

## Functions that act on arrays

## Push

`push` adds one or more elements to the end of an array.

program 13
```perl
# push_example.pl
@numbers = (1, 2, 3);
push(@numbers, 4, 5);
print "@numbers \n";   # prints 1 2 3 4 5
```

## Pop

`pop` removes the last element of an array and returns it.

program 14
```perl
# pop_example.pl
@words = ('the', 'quick', 'brown', 'fox');
print pop(@words), "\n";   # fox
print pop(@words), "\n";   # brown
print pop(@words), "\n";   # quick
print pop(@words), "\n";   # the
```

## Shift

`shift` removes the first element of an array and returns it.

program 15
```perl
# shift_example.pl
@words = ('the', 'quick', 'brown', 'fox');
print shift(@words), "\n";   # the
print shift(@words), "\n";   # quick
print shift(@words), "\n";   # brown
print shift(@words), "\n";   # fox
```

## Unshift

`unshift` adds one or more elements to the beginning of an array.

program 16
```perl
# unshift_example.pl
@words = ('quick', 'brown', 'fox');
unshift(@words, 'the');
print "@words\n";   # the quick brown fox
```
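`push`, `pop`, `shift` and `unshift` are often combined to treat an array as a stack (last in, first out) or a queue (first in, first out). A minimal sketch, with `my` declarations under `use strict`; the names `@stack` and `@queue` are illustrative only:

```perl
use strict;
use warnings;

# Stack: push onto the end, pop off the end (last in, first out)
my @stack = ();
push(@stack, 'a', 'b', 'c');
my $top = pop(@stack);           # 'c' comes off first
print "popped $top, stack is now: @stack\n";     # popped c, stack is now: a b

# Queue: push onto the end, shift off the front (first in, first out)
my @queue = ();
push(@queue, 'a', 'b', 'c');
my $front = shift(@queue);       # 'a' comes off first
print "shifted $front, queue is now: @queue\n";  # shifted a, queue is now: b c
```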

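## Reverse

`reverse` returns the elements of a list in the opposite order. It does not change the original array.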

program 17
```perl
# reverse_example.pl
@words = ('the', 'quick', 'brown', 'fox');
print reverse(@words), "\n";   # foxbrownquickthe
```

An array inside double quotes is interpolated, i.e. its elements are printed with spaces between them. If we print an array without the quotes, as above, the elements are all squashed together.
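A short sketch of the difference, using `my` declarations under `use strict`. The space that appears inside quotes comes from Perl's special variable `$"` (the list separator, a single space by default); changing it with `local`, as below, is an extra not covered in these notes.

```perl
use strict;
use warnings;

my @words = ('the', 'quick', 'brown', 'fox');

print "@words\n";     # in quotes: the quick brown fox
print @words, "\n";   # no quotes: thequickbrownfox

# The separator placed between elements inside quotes is the special
# variable $" (a single space by default)
{
    local $" = '-';   # change it only until the end of this block
    print "@words\n"; # the-quick-brown-fox
}
print "@words\n";     # the quick brown fox again
```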

## Sort

`sort` returns a sorted copy of a list; like `reverse`, it does not change the original array.

program 18
```perl
# sort_example.pl
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
@sorted = sort(@words);
print "sorted words = @sorted\n";   # The brown fox jumped quick
```

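You can optionally give `sort` a code block that defines the comparison. Inside the block, the special variables `$a` and `$b` stand for the two elements being compared, and the block should return a negative number, zero or a positive number to say whether `$a` sorts before, the same as, or after `$b`.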

program 19
```perl
# sort_example2.pl
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
@sorted = sort { lc($a) cmp lc($b) } @words;
print "sorted words = @sorted\n";   # brown fox jumped quick The
```

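`lc` is a function that takes a string as an argument and returns it in lower case. `cmp` is a new operator that compares two strings character by character. This is roughly alphabetical order, except that uppercase letters sort before lowercase ones (which is why `The` came first in program 18). It returns:

-1 if the left side is less than (comes before) the right side

0 if the left side is the same as the right side

+1 if the left side is greater than (comes after) the right side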

What do you think the outcome of the following program is?

program 20
```perl
# sort_example3.pl
@numbers = (100, 101, 102, 10, 11, 12, 1, 2, 3);
@sorted = sort @numbers;
print "sorted numbers = @sorted\n";
```

The default sort comparison is string order (the same as `cmp`), not numeric order, so the numbers are compared character by character:

```
sorted numbers = 1 10 100 101 102 11 12 2 3
```

To sort in numeric order:

program 21
```perl
# sort_example4.pl
@numbers = (100, 101, 102, 10, 11, 12, 1, 2, 3);
@sorted = sort { $a <=> $b } @numbers;
print "sorted numbers = @sorted\n";   # 1 2 3 10 11 12 100 101 102
```

`<=>` (often called the "spaceship" operator) is the numeric counterpart of `cmp`. It returns:

-1 if the left side is numerically less than the right side

0 if the left side is equal to the right side

+1 if the left side is greater than the right side

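## Splice

`splice(@array, OFFSET, LENGTH, LIST)` removes LENGTH elements starting at index OFFSET, puts the elements of LIST in their place, and returns the removed elements.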

program 22
```perl
# splice_example.pl
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
@spliced = splice(@words, 1, 2, 'happy', 'red');
print "spliced words = @spliced\n";   # quick brown
print "        words = @words\n";     # The happy red fox jumped
```
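`splice` can also insert without removing anything (a LENGTH of 0) or remove without inserting anything (no LIST). A minimal sketch with `my` declarations under `use strict`; the word `very` is just an example:

```perl
use strict;
use warnings;

my @words = ('The', 'quick', 'brown', 'fox');

# LENGTH 0: remove nothing, just insert 'very' before index 1
splice(@words, 1, 0, 'very');
print "@words\n";   # The very quick brown fox

# No replacement list: delete 2 elements starting at index 2
my @removed = splice(@words, 2, 2);
print "removed = @removed; left = @words\n";   # removed = quick brown; left = The very fox
```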

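## Join

`join` glues the elements of a list together into one string, putting its first argument between each pair of elements.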

program 23
```perl
# join_example.pl
@words = ('The', 'quick', 'brown', 'fox', 'jumped');
print join("+", @words), "\n";   # The+quick+brown+fox+jumped
```

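## Split

`split` does the opposite of `join`: it breaks a string into a list, using the pattern between the slashes to find the separators.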

program 24
```perl
# split_example.pl
$sentence = "The+++quick+++brown+++fox+++jumped";
@words = split(/\+\+\+/, $sentence);   # + is special in a pattern, so escape it
print "@words \n";   # The quick brown fox jumped
```
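Since `join` and `split` are opposites, a list joined with a separator can be split back on that same separator. A minimal sketch using the sequences from the exercises below, with `my` declarations under `use strict`:

```perl
use strict;
use warnings;

my @seqs = ('AAA', 'TCCAAAGGGT', 'ATTGG');

# join puts the separator between the elements to make one string...
my $line = join(',', @seqs);
print "$line\n";   # AAA,TCCAAAGGGT,ATTGG

# ...and split on the same separator turns it back into a list
my @back = split(/,/, $line);
print "@back\n";   # AAA TCCAAAGGGT ATTGG
```

One caveat: by default `split` discards trailing empty fields, so a list that ends in empty strings will not survive this round trip unchanged.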

Often we want to break up a sentence into an array of words separated by spaces. Here the words happen to be separated by several spaces each:

program 25
```perl
# split_example2.pl
$sentence = "The    quick    brown    fox    jumped";
@words = split(/ /, $sentence);
print "word0 = '$words[0]'\n";   # 'The'
print "word1 = '$words[1]'\n";   # ''
print "word2 = '$words[2]'\n";   # ''
print "word3 = '$words[3]'\n";   # ''
print "word4 = '$words[4]'\n";   # 'quick'
```

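What has happened here is that `split` is splitting on each individual space character, so every extra space produces an empty field. To remedy this, give `split` the string `' '` (a single space in quotes) instead of a pattern. This special case splits on any run of whitespace and ignores leading whitespace: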

program 26
```perl
# split_example3.pl
$sentence = "The    quick    brown    fox    jumped";
@words = split(' ', $sentence);
print "word0 = '$words[0]'\n";   # 'The'
print "word1 = '$words[1]'\n";   # 'quick'
print "word2 = '$words[2]'\n";   # 'brown'
print "word3 = '$words[3]'\n";   # 'fox'
print "word4 = '$words[4]'\n";   # 'jumped'
```

Specifying an empty pattern, `//`, breaks a string into individual characters:

program 27
```perl
# split_example4.pl
$alphabet = "ABCDEF";
@words = split(//, $alphabet);   # @words = ('A', 'B', 'C', 'D', 'E', 'F')
```
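Splitting into characters combines nicely with the other list functions. This sketch (with `my` declarations under `use strict`) counts the bases of a sequence and reverses it using `split`, `reverse` and `join`. Note that this is a plain reversal, not a reverse complement.

```perl
use strict;
use warnings;

my $seq = "GAATTC";

# Break the string into single characters, one base per element
my @bases = split(//, $seq);
print scalar(@bases), " bases\n";   # 6 bases

# Reverse the list of bases and glue them back together with no separator
my $reversed = join('', reverse(@bases));
print "$reversed\n";                # CTTAAG
```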

## The qw operator

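`qw` (short for "quote words") is an operator, not a function. It is used purely for convenience when writing a list of words: `qw(The quick brown fox)` is the same as `('The', 'quick', 'brown', 'fox')`, without the quotes and commas.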

program 28
```perl
# using_qw.pl
@words = qw(The quick brown fox jumped);
printf "The number of words is = %d\n", scalar(@words);   # 5
print "@words\n";                                          # The quick brown fox jumped
```

## Hashes

Hashes (also known as hash tables, dictionaries or associative arrays) are a common data structure in programming. In Perl they are built into the language.

## What is a hash?

With an array, you index values by a numeric index. With a hash, you use a symbolic index instead. (Think of a telephone directory for hashes, and a row of numbered houses for arrays.)

The symbolic index is known as the key, and the result is known as the value.

A hash is a mapping between a set of keys and their values.

program 29
```perl
# re_hash.pl
# initialize the lookup table
%re_lookup = (
    'Eco47III' => 'AGCGCT',
    'EcoNI'    => 'CCTNNNNNAGG',
    'EcoRI'    => 'GAATTC',
    'EcoRII'   => 'CCWGG',
    'HincII'   => 'GTYRAC',
    'HindII'   => 'GTYRAC',
    'HindIII'  => 'AAGCTT',
    'HinfI'    => 'GANTC'
);
print "Enter restriction enzyme name\n";
$re = <STDIN>;             # read the enzyme name from the keyboard
chomp $re;
$seq = $re_lookup{$re};    # undef if the name is not in the hash
if (defined($seq)) {
    print "RE sequence for $re is: $seq\n";
}
else {
    print "Sorry, I don't know about \"$re\"\n";
}
```

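The symbol to indicate a hash is `%`.

Hashes can be specified in a similar way to arrays: use parentheses `()` to construct a hash from a list of key/value pairs, with `=>` between each key and its value.

The construct for looking up a value in a hash is `$value = $hashvariable{$key};`. Note the `$` sigil (the result is a single scalar) and the curly braces (arrays use square brackets). If the key is not in the hash, the lookup returns `undef`, which is why program 29 checks the result with `defined`.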
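A minimal sketch of building a hash and looking values up, using a two-entry subset of program 29's table and `my` declarations under `use strict`. The key `NotAnEnzyme` is a made-up name used only to show a failed lookup.

```perl
use strict;
use warnings;

# A two-entry subset of the table in program 29
my %re_lookup = (
    'EcoRI'   => 'GAATTC',
    'HindIII' => 'AAGCTT',
);

# Look up one value: $ sigil, hash name, key in curly braces
my $seq = $re_lookup{'EcoRI'};
print "EcoRI recognises $seq\n";   # EcoRI recognises GAATTC

# A key that is not in the hash gives back undef
my $missing = $re_lookup{'NotAnEnzyme'};
print defined($missing) ? "found\n" : "not found\n";   # not found
```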

## The keys and values functions

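The `keys` function takes a hash as an argument and returns a list of the keys in that hash.

The `values` function takes a hash as an argument and returns a list of the values in that hash.

Both lists come back in no particular order, because hashes are unordered. For the same unchanged hash, though, `keys` and `values` use the same order as each other.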

program 30
```perl
# keys_example.pl
# create a lookup table of GenBank accessions
# keyed by Clone ID
%accession_hash = (
    "BACR01A01" => "AC005555",
    "BACR48E02" => "AC005577",
    "BACR24K17" => "AC005101",
);

# get all the keys in the hash (hash is keyed by clone ID)
@clones = keys %accession_hash;
print "Clone IDs: @clones\n";
# prints the three clone IDs, e.g. BACR01A01 BACR48E02 BACR24K17 (order may vary)

# get all the values in the hash (hash is a lookup for accessions)
@accs = values %accession_hash;
print "GenBank Accessions: @accs\n";
# prints the three accessions, e.g. AC005555 AC005577 AC005101 (order may vary)
```
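Because `keys` returns the keys in no fixed order, a common pattern is to sort them first and then loop over them to print each key with its value. A minimal sketch with `my` declarations under `use strict`, assuming a `foreach` loop:

```perl
use strict;
use warnings;

my %accession_hash = (
    "BACR01A01" => "AC005555",
    "BACR48E02" => "AC005577",
    "BACR24K17" => "AC005101",
);

# Sort the clone IDs so the report comes out in the same order every run
my @sorted_clones = sort keys %accession_hash;
foreach my $clone (@sorted_clones) {
    print "$clone\t$accession_hash{$clone}\n";
}
# BACR01A01  AC005555
# BACR24K17  AC005101
# BACR48E02  AC005577
```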

## Reverse on hashes

You can use the `reverse` function on a hash, too. Since hashes are unordered, this does not reverse any order as it does for arrays; instead, `reverse` swaps the keys and values, so the old values become the new keys.

program 31
```perl
%re_lookup_by_seq = reverse %re_lookup;
print "I recognise GAATTC as being $re_lookup_by_seq{GAATTC}\n";
# the above should print EcoRI
```

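For this to work, there must be a one-to-one mapping between keys and values. If two keys share the same value, as `HincII` and `HindII` both map to `GTYRAC` in program 29, only one of them is kept in the reversed hash.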
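A small sketch of what goes wrong when the mapping is not one-to-one, using three entries from program 29 and `my` declarations under `use strict`:

```perl
use strict;
use warnings;

# Two enzymes (from program 29) that recognise the same site
my %re_lookup = (
    'EcoRI'  => 'GAATTC',
    'HincII' => 'GTYRAC',
    'HindII' => 'GTYRAC',
);

my %re_lookup_by_seq = reverse %re_lookup;

# 3 keys went in, but only 2 come out: one of HincII/HindII was overwritten
print scalar(keys %re_lookup), " -> ", scalar(keys %re_lookup_by_seq), "\n";   # 3 -> 2

# Which of the two survives depends on the hash's internal order, so don't rely on it
print "GTYRAC -> $re_lookup_by_seq{GTYRAC}\n";
```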

## Removing elements from a hash

program 32
```perl
delete $re_lookup{"EcoRI"};   # removes the key "EcoRI" and its value
```
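`delete` also returns the value it removed, and the related `exists` function (not otherwise covered in these notes) tells you whether a key is present at all. A minimal sketch with `my` declarations under `use strict`:

```perl
use strict;
use warnings;

my %re_lookup = (
    'EcoRI'   => 'GAATTC',
    'HindIII' => 'AAGCTT',
);

# delete removes the key and returns the value it had
my $removed = delete $re_lookup{'EcoRI'};
print "removed EcoRI ($removed)\n";   # removed EcoRI (GAATTC)

# exists asks whether a key is present in the hash
my $has_ecori   = exists $re_lookup{'EcoRI'};     # false: it was just deleted
my $has_hindiii = exists $re_lookup{'HindIII'};   # true: never touched
print "EcoRI present? ",   ($has_ecori   ? "yes" : "no"), "\n";   # no
print "HindIII present? ", ($has_hindiii ? "yes" : "no"), "\n";   # yes
```

`exists` differs from `defined` (used in program 29): a key can exist in a hash while its value is `undef`.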

We can also set the value in a hash:

program 33
```perl
# translate1.pl
%translate = ();            # initialize an empty hash
$translate{'atg'} = 'M';
$translate{'taa'} = '*';
$translate{'ctt'} = 'L';
print $translate{'atg'};    # prints M
```

## Exercises

Deitel exercises 4.5, 4.6

Write a program that sorts DNA sequences by size (length). The output should be one sequence per line, like this:

```
% perl sort_by_seqsize.pl
AAA
TCCAAAGGGT
ATTGG
Sorted seqs:
AAA
ATTGG
TCCAAAGGGT
```

Hint: you may want to use the `length` function; see Deitel p290.

Write an English to Portuguese translator, or a program that translates between any two languages. Keep the vocabulary down to ten words or so, and forget about grammar and context altogether.

Extend it so that you can go either way.

EITHER

Deitel 4.7 (do two kinds of shuffling; the kind that is mentioned in the book, and the shuffle you would get from cutting the cards)

OR

Write a molecular evolution simulation program(!) Do it for only one generation. (Choose artificially exaggerated probabilities for testing).

1. Define a parameter: the chance of a single point mutation.

This should either be hard-coded or come from the user.

```
% perl recomb.pl 0.5
AAAAATTTTTTT
after one generation:
AAAAATGTTTTT
```

Assume that, in the event of a point mutation, each base is equally likely to be the outcome.

2. Add other mutation/recombination events; feel free to be biologically unrealistic in order to make a fun simulation.

If you've made it this far you're doing extremely well. The rest of the exercises are optional.

Write a mini-MEDLINE system. Create a hash of journal article titles keyed by MEDLINE ID. Just populate it with 3 or so entries (you can make them up if you like), e.g.

program 34
```perl
1000 => "Made up article title",
```