Bioinfo Helpdesk & On-line Training

Perl 7 - References

Reading: Deitel 13.1-13.4; 13.6, 13.7

Advanced reading: man perlref (not the easiest of guides)

What is a reference?

So far we have dealt with different kinds of variables; we have encountered scalars, which are single valued, and arrays and hashes, which are multiple valued.

scalars

program 1

$x      = 12;
$seq    = "ATATATATGGGATTTT";
$score  = 78.87 ;
$expect = 2.3e-78;

arrays

program 2

@sequences = ( "GTTGATTGC", "CGCTTGNNNN", "ATATAGGATTCC" );
push(@sequences, "ATGGCTGTTGCTAAT");

@orfs = ( "560-1189", "1567-2791", "5401-9623" );
print "@orfs\n";

@scores = ( 1.34, 12.82, 54.02, 98.77 );
print $scores[3];

hashes

program 3

%codons = (
   ATG => 'M',
   TCA => 'S',
   TCG => 'S',
   TCC => 'S',
   TCT => 'S',
   TTT => 'F',
   TTC => 'F',
   TTA => 'L',
   TTG => 'L'
);
print $codons{"ATG"};

Scalars can be numeric or they can be strings; scalars can also hold a reference. Think of a reference as a pointer to another variable.

Creating a reference to another variable

We use the backslash symbol to create a reference to another variable.

program 4

# ref_to_array.pl

@words = ( 'The', 'quick', 'brown', 'fox' );
$ref_to_array = \@words;

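A reference and the variable it refers to share the same underlying data. The following is a minimal sketch of that idea, not part of the original lecture programs: the pushed word 'jumps' is illustrative only, and the @$ prefix used to get back to the array is the dereferencing syntax covered later in these notes.

```perl
use strict;
use warnings;

my @words = ( 'The', 'quick', 'brown', 'fox' );
my $ref_to_array = \@words;

# Modify the array through the reference ('jumps' is an illustrative value)...
push @$ref_to_array, 'jumps';

# ...and the change is visible through the original variable.
print "@words\n";              # The quick brown fox jumps
print scalar(@words), "\n";    # 5
```

Because no copy is made when the reference is created, taking a reference is cheap even for very large arrays.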

We can take a reference to any type of variable:

program 5

# ref_to_hash.pl

%re_hash = (
          'Eco47III'=> 'AGCGCT',
          'EcoNI'   => 'CCTNNNNNAGG',
          'EcoRI'   => 'GAATTC',
          'EcoRII'  => 'CCWGG',
          'HincII'  => 'GTYRAC',
          'HindII'  => 'GTYRAC',
          'HindIII' => 'AAGCTT',
          'HinfI'   => 'GANTC'
);
$ref_to_hash = \%re_hash;

Initializing references to arrays

We use square brackets to create a reference to a new, anonymous array.

program 6

$words = [ 'The', 'quick', 'brown', 'fox' ];

We use curly brackets to create a reference to a new, anonymous hash.

program 7

$re_hash = {
          'Eco47III'=> 'AGCGCT',
          'EcoNI'   => 'CCTNNNNNAGG',
          'EcoRI'   => 'GAATTC',
          'EcoRII'  => 'CCWGG',
          'HincII'  => 'GTYRAC',
          'HindII'  => 'GTYRAC',
          'HindIII' => 'AAGCTT',
          'HinfI'   => 'GANTC'
};

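When it is unclear what a scalar holds, Perl's built-in ref function reports the kind of thing a reference points to. The sketch below is an illustration under simple assumptions (a shortened enzyme hash and a variable named $plain that is not in the lecture programs); it shows the result for an array reference, a hash reference, and an ordinary string.

```perl
use strict;
use warnings;

my $words   = [ 'The', 'quick', 'brown', 'fox' ];             # array reference
my $re_hash = { 'EcoRI' => 'GAATTC', 'HindIII' => 'AAGCTT' }; # hash reference
my $plain   = "ATATATATGGGATTTT";                             # ordinary string

print ref($words),   "\n";   # ARRAY
print ref($re_hash), "\n";   # HASH

# For a scalar that is not a reference, ref returns the empty string.
print ref($plain) eq '' ? "not a reference\n" : "a reference\n";   # not a reference
```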

Indexing elements in an array reference

We use the arrow operator -> to index an element in an array reference. Think of it as an arrow pointing to the array that the reference variable refers to.

program 8

$genes = [ "CDC1", "ACT1", "ORC1" ];

print $genes->[0];   
print $genes->[1];   
print $genes->[2];   
print $genes->[-1];

This will output:

CDC1ACT1ORC1ORC1

We can also dereference the array reference and access an element like this:

program 9

$genes = [ "CDC1", "ACT1", "ORC1" ];

print $$genes[0];  
print $$genes[1]; 
print $$genes[2];
print $$genes[-1];

This will output:

CDC1ACT1ORC1ORC1
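
The $$genes[0] form is shorthand for ${$genes}[0]: the reference is dereferenced first and the result is then indexed. As a small sketch of that equivalence, the three expressions below all fetch the same element.

```perl
use strict;
use warnings;

my $genes = [ "CDC1", "ACT1", "ORC1" ];

# All three expressions fetch the element at index 1.
print $genes->[1],  "\n";   # ACT1  (arrow notation)
print $$genes[1],   "\n";   # ACT1  (shorthand dereference)
print ${$genes}[1], "\n";   # ACT1  (explicit braces around the reference)
```

The braces become necessary when the thing being dereferenced is more complicated than a simple scalar variable.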

Alternatively, we can turn an array reference into an array like this:

program 10

$genes = [ "CDC1", "ACT1", "ORC1" ];

@array = @$genes;      # turn array reference into an array

print $array[0]; 
print $array[1]; 
print $array[2]; 
print $array[-1];

This will output:

CDC1ACT1ORC1ORC1
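
Note that assigning @$genes to a named array makes a copy of the elements. The sketch below illustrates the difference between changing the copy and changing the referenced array; the gene names TUB1 and CDC28 are added purely for illustration.

```perl
use strict;
use warnings;

my $genes = [ "CDC1", "ACT1", "ORC1" ];

my @copy = @$genes;        # copies the three elements
$copy[0] = "TUB1";         # changes only the copy (TUB1 is illustrative)
print "$genes->[0]\n";     # CDC1

push @$genes, "CDC28";     # changes the referenced array itself
print scalar(@$genes), "\n";   # 4
print scalar(@copy),   "\n";   # 3

# Looping over the referenced array without copying it first.
for my $gene (@$genes) {
    print "$gene\n";
}
```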

Accessing hash references

Similar rules apply to hash references too.

program 11

# gene_hash_refence.pl

$gene_ref = {
    name        => "PHDP",
    function    => "transcription factor",
    cytology    => "60A8",
    accession   => "FBgn0025334",
    contig      => "AE003462",
    start       => 105855,
    end         => 106880,
    strand      => "+"
};

# accessing element with the key "name"
print $gene_ref->{"name"},            "\n";   # PHDP

# accessing element with the key "cytology"
print $gene_ref->{"cytology"},        "\n";   # 60A8

# or we can dereference this way:
print $$gene_ref{"name"},             "\n";   # PHDP
print $$gene_ref{"function"},         "\n";   # transcription factor

# turn the hash reference into a hash
%gene_hash = %$gene_ref;

print $gene_hash{"name"},             "\n";   # PHDP
print $gene_hash{"accession"},        "\n";   # FBgn0025334

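The same dereferencing rules let us loop over a hash reference or add to it. The following is a minimal sketch using a shortened version of the gene record above; the choice of keys and the messages printed are illustrative assumptions.

```perl
use strict;
use warnings;

my $gene_ref = {
    name     => "PHDP",
    cytology => "60A8",
    start    => 105855,
    end      => 106880,
};

# keys %$gene_ref dereferences the whole hash; sort gives a stable order.
for my $key ( sort keys %$gene_ref ) {
    print "$key = $gene_ref->{$key}\n";
}

# Add a new key/value pair through the reference.
$gene_ref->{strand} = "+";

# exists checks for a key without creating it.
if ( exists $gene_ref->{accession} ) {
    print "accession is $gene_ref->{accession}\n";
}
else {
    print "no accession stored\n";
}
```

The loop prints the keys in alphabetical order (cytology, end, name, start), and the final test prints "no accession stored" because that key was never set in this shortened record.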

Uses of references

passing multiple nonscalars to a subroutine

Let's say we want to write a subroutine that multiplies each element of one array a by the corresponding element of another array b, i.e.

( a₁ × b₁, a₂ × b₂, ..., aₙ × bₙ )

program 12

#!/usr/local/bin/perl
# test.pl

# to be written...
sub arraymult {
    # ... ?
}

@a = (1, 2, 3);
@b = (4, 5, 6);
@c = arraymult(@a, @b);

What is wrong with the program above? It turns out that it is impossible to write a subroutine that will work in the desired way if it is called in the manner above.

This is because a subroutine takes a single list of arguments; the subroutine call above is equivalent to

@c = arraymult((1, 2, 3), (4, 5, 6));

which is equivalent to

@c = arraymult(1, 2, 3, 4, 5, 6);

because Perl flattens the two lists into a single list of six scalars; the subroutine cannot tell where one array ends and the other begins.

The solution

We can instead pass in references to the arrays. We create a reference to a variable using the backslash operator (just when you thought Perl couldn't get any more full of typographical symbols...)

@c = arraymult(\@a, \@b);

Remember, references are just scalar variables that hold pointers. We are still passing the arraymult subroutine a list of two scalars.
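
The difference between the two calling styles can be seen by counting what arrives in @_. The helper count_args below is a hypothetical subroutine written only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: report how many arguments the subroutine received.
sub count_args {
    return scalar(@_);
}

my @a = (1, 2, 3);
my @b = (4, 5, 6);

print count_args(@a, @b),   "\n";   # 6 : the two arrays are flattened
print count_args(\@a, \@b), "\n";   # 2 : two scalars, each a reference
```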

That's how we call the subroutine. How do we actually write the subroutine? Well, we have to do the opposite: we have to dereference the references to get the arrays they point to.

program 13

sub arraymult {
    my ($listref1, $listref2) = @_;

    # dereference references
    my @list1 = @$listref1;
    my @list2 = @$listref2;

    # passing lists of different sizes is a 
    # logic error
    if (scalar(@list1) != scalar(@list2)) {
        die("lists must be same size!");
    }

    my @multlist = ();   # initialize the list of multiplied vals

    # loop through all indices, multiplying
    for ( my $i = 0; $i < @list1; $i++ ) {
        $multlist[$i] = $list1[$i] * $list2[$i];
    }
    return @multlist;
}

@a = (1, 2, 3);
@b = (4, 5, 6);
@array = arraymult( \@a, \@b );
print "@array\n";                  # prints 4 10 18

@array = arraymult( [8, 12], [4, 7] );
print "@array\n";                  # prints 32 84

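Program 13 copies both arrays before multiplying them. As an alternative sketch (the name arraymult_by_ref is hypothetical, and this is one possible variation rather than the lecture's version), the elements can be read directly through the references with the arrow operator, which avoids the copies.

```perl
use strict;
use warnings;

# Hypothetical variant of arraymult that indexes through the references
# instead of copying the arrays first.
sub arraymult_by_ref {
    my ($listref1, $listref2) = @_;

    # In numeric comparison, @$ref gives the number of elements.
    die "lists must be same size!\n" if @$listref1 != @$listref2;

    # $#{$listref1} is the last index of the referenced array.
    return map { $listref1->[$_] * $listref2->[$_] } 0 .. $#{$listref1};
}

my @product = arraymult_by_ref( [1, 2, 3], [4, 5, 6] );
print "@product\n";    # 4 10 18
```

For short lists the difference is negligible; avoiding the copy matters mainly when the arrays are large.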

Altering the value of variables passed to subroutine

program 14

# swapvars.pl

# correct attempt at swapping two variables
sub swap {
    my ($v1, $v2) = @_;
    my $tmp = $$v1;
    $$v1 = $$v2;
    $$v2 = $tmp;
}

$x = 5;
$y = 10;

swap(\$x, \$y);
print "x=$x; y=$y\n";      # outputs x=10; y=5

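To see why the references are needed, compare with a version that receives the values themselves. In the sketch below (bad_swap is a hypothetical name used only here), the assignment from @_ copies the values into $v1 and $v2, so swapping them has no effect on the caller's variables.

```perl
use strict;
use warnings;

# Hypothetical incorrect attempt: swaps private copies only.
sub bad_swap {
    my ($v1, $v2) = @_;    # $v1 and $v2 are copies of the arguments
    my $tmp = $v1;
    $v1 = $v2;
    $v2 = $tmp;
    return;
}

my $x = 5;
my $y = 10;

bad_swap($x, $y);
print "x=$x; y=$y\n";      # outputs x=5; y=10 (unchanged)
```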

Nested Data Structures

program 15

# annotations.pl

# hash of annotations keyed by gene symbol
%annotations_hash =
  (
        "ACT1"    =>  [ "actin modification",
                        "cell wall organization and biogenesis",
                        "endocytosis" ],
        "ORC1"    =>  [ "DNA replication initiation",
                        "S phase of mitotic cell cycle" ],
        "CDC1"    =>  [ "metal ion homeostasis",
                        "DNA replication",
                        "DNA recombination" ]
  );

print "Please enter a gene symbol\n";
$symbol = <STDIN>;
chomp $symbol;

# lookup the annotations hash, keyed by gene symbol
$annotation_listref = $annotations_hash{$symbol};

# turn the array reference into a normal array
@annotation_list = @$annotation_listref;

print "Annotations for $symbol = @annotation_list\n";

Let's try running this program:

Please enter a gene symbol
CDC1
Annotations for CDC1 = metal ion homeostasis DNA replication DNA recombination
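
Because each value in the hash is an array reference, individual annotations can also be reached directly by chaining subscripts. The sketch below uses a shortened copy of the annotations hash and fixed symbols instead of keyboard input; the symbol XYZ9 is made up to show what happens when a key is missing.

```perl
use strict;
use warnings;

my %annotations_hash = (
    "ACT1" => [ "actin modification",
                "cell wall organization and biogenesis",
                "endocytosis" ],
    "ORC1" => [ "DNA replication initiation",
                "S phase of mitotic cell cycle" ],
);

# Hash key first, then array index.
print $annotations_hash{"ACT1"}->[0], "\n";   # actin modification
print $annotations_hash{"ACT1"}[2],   "\n";   # endocytosis (arrow is optional between subscripts)

# Count the annotations for one gene.
print scalar( @{ $annotations_hash{"ORC1"} } ), "\n";   # 2

# Guard against symbols that are not in the hash ("XYZ9" is a made-up symbol).
for my $symbol ( "ORC1", "XYZ9" ) {
    if ( exists $annotations_hash{$symbol} ) {
        print "$symbol: ", join( "; ", @{ $annotations_hash{$symbol} } ), "\n";
    }
    else {
        print "$symbol: no annotations found\n";
    }
}
```

Checking with exists before dereferencing avoids trying to treat an undefined value as an array reference when the user types an unknown symbol.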

Exercises

Rewrite the re_hash.pl program from Lecture 3 to use a hash reference.

Deitel exercise 13.5

Create a program that reads in hexamer output, or the output from some other prediction program. Turn each line into a hash reference, with keys such as start, end, strand, score. Store the hash references in an array. After the program has done this, it should prompt the user for a range. The program will then loop through the array, finding all predictions in that range, and put these into another array. It should then iterate through this array, printing the results in GFF format.

Chris Mungall cjm@fruitfly.org
Berkeley Drosophila Genome Project
