Reading: Deitel Chapter 2
Optional: Jambek Chapter 12
Perl and Bioinformatics
Perl: the beginning
Invented by Larry Wall, a linguist, in 1986 - whilst working for the US National Security Agency!
Quickly adopted by bioinformaticians for its text processing capabilities
How perl saved the human genome project - http://www.bioperl.org/GetStarted/tpj_ls_bio.html
What are the distinguishing features of perl?
- interpreted
- powerful, high-level
- modular, object-oriented
- cross platform
- easygoing, relaxed
- data munging
- third party libraries
- www
- bioinformatics
- bioperl
Some other programming languages
- C: fast; "low-level"; easy to make mistakes; the standard language for implementing algorithms (blast, sim4); not so portable
- C++: extension of C; object-oriented
- Java: compiled/interpreted; object-oriented; applets; portable; good bioinformatics support (biojava)
- Python: interpreted; object-oriented; aesthetically pleasing; fewer users
- Lisp: AI; functional; emacs; few users
- Others: FORTRAN, Pascal, Haskell, Prolog...
Perl in action
Perl is a highly versatile language. It is equally happy performing tasks that range from handy timesaving chores, for instance splitting a fasta file into multiple files, to entire bioinformatics infrastructures: interfacing with relational databases, serving web pages, running analysis pipelines and building graphical user interfaces.

Hello World
The tradition in computer programming is for one's first program to be a greeting to the world.
program 1
| # hello_world.pl
print "Hello World!\n";
|
When you execute this program, this is what you will see:
| % perl hello_world.pl
Hello World!
|
So far so good.
We've encountered our first perl function: print.
Let's look at another example, from the Deitel book:
program 2
| # welcome.pl
print ( "1. Welcome to Perl!\n" );
print "2. Welcome to Perl!\n" ;
print "3. Welcome ", "to ", "Perl!\n";
print "4. Welcome ";
print "to Perl!\n";
print "5. Welcome to Perl!\n";
print "6. Welcome\n to\n\n Perl!\n";
|
| 1. Welcome to Perl!
2. Welcome to Perl!
3. Welcome to Perl!
4. Welcome to Perl!
5. Welcome to Perl!
6. Welcome
 to

 Perl!
|
Here is another example, this time using mathematical operators:
program 3
| #!/usr/local/bin/perl
# math.pl
print " 9 x 9 = ", 9 * 9, "\n";
print " 8 - 12 = ", 8 - 12, "\n";
print " 2 to the 5 = ", 2 ** 5, "\n";
print " 30 / 5 = ", 30 / 5, "\n";
print " pi x 5 * 5 = ", 3.141 * 5 * 5, "\n";
print "\n";
# now let's use some of perl's handy numeric functions:
print " log(1e-50) = ", log(1e-50), "\n";
print " log10(1e-50) = ", log(1e-50) / log(10), "\n";
print " sqrt of 256 = ", sqrt(256), "\n";
print "3.14 rounded down = ", int(3.14), "\n";
print " random number = ", rand(100), "\n";
print " sin(3.1414) = ", sin(3.1414), "\n";
|
Here is the output:
| 9 x 9 = 81
8 - 12 = -4
2 to the 5 = 32
30 / 5 = 6
pi x 5 * 5 = 78.525
log(1e-50) = -115.129254649702
log10(1e-50) = -50
sqrt of 256 = 16
3.14 rounded down = 3
random number = 91.0604610880004
sin(3.1414) = 0.000192653588601532
|
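A note on "rounded down": int() actually truncates toward zero, so it rounds down only for positive numbers. The sketch below contrasts int with floor and ceil from the standard POSIX module, which is assumed to be installed alongside perl, as it normally is. It also uses POSIX's log10 instead of dividing two logarithms, and shows sprintf as one way to round to the nearest whole number.

```perl
#!/usr/local/bin/perl
# rounding.pl - int() versus floor() and ceil() from the standard POSIX module
use strict;
use warnings;
use POSIX qw(floor ceil log10);

my $neg = -3.14;
my $truncated = int($neg);     # truncates toward zero: -3
my $down      = floor($neg);   # rounds down (toward minus infinity): -4
my $up        = ceil($neg);    # rounds up (toward plus infinity): -3

print "int($neg) = $truncated\n";
print "floor($neg) = $down\n";
print "ceil($neg) = $up\n";

# log10 directly, rather than log(x) / log(10)
print "log10(1e-50) = ", log10(1e-50), "\n";

# round to the nearest whole number with sprintf
my $nearest = sprintf("%.0f", 3.7);   # "4"
print "3.7 rounded to nearest = $nearest\n";
```

For positive numbers int and floor agree. The difference only shows up with negative values.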

Writing and Executing Perl
Creating your masterpiece
First of all, think of a name for your program. Then fire up a text editor to compose your program.
Any text editor will do, but emacs is recommended: it has a special mode for perl that intelligently highlights your program in nice colours and helps you lay out the program.
Running your program
There are generally two ways to execute (run) your program:
- run it from the command line using the perl command (e.g. perl myprogram.pl)
- or put a header line at the top of the program that tells unix which program should run it
program 4
| #!/usr/local/bin/perl
# myprogram.pl - my first perl program
print "Ola!\n";
|
Then make the program executable using the unix chmod command, and run it just by typing the name of the program. If the current directory is not in your PATH, type ./myprogram.pl instead.
| % chmod +x myprogram.pl
% myprogram.pl
Ola!
|
Tips
- It is often a good idea to give your perl scripts names that end in .pl, for example myprogram.pl. This helps organise your files, and lets emacs know to use perl mode.
- You can find out what version of perl you are running by using the perl command with the -v switch (-version, as below, also works):
| % perl -version
This is perl, v5.6.0 built for sun4-solaris
Copyright 1987-2000, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5.0 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'. If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.
|
- If you are using any version older than 5.00404, you should consider upgrading. The latest stable version of perl at the time of writing is 5.6.1.
- You can find out where the perl binary is installed on your system by using the unix which command:
| % which perl
/usr/local/bin/perl
|
- If you have perl installed somewhere non-standard, you may have to change the first line of your perl scripts to reflect this.

Introducing variables
Let's look at another example from Deitel:
program 5
| #!/usr/local/bin/perl
# addition.pl
# A simple addition program
print "Please enter first number:\n";
$number1 = <STDIN>;
chomp $number1;
print "Please enter second number:\n";
$number2 = <STDIN>;
chomp $number2;
$sum = $number1 + $number2;
print "The sum is $sum.\n";
|
This program prompts the user to enter two numbers. The variable $sum is assigned the sum of $number1 and $number2, and then the answer is displayed.
We have a few new concepts here:
- flow - the program is a series of instructions, executed in order from beginning to end
- variables - $number1, $number2 and $sum are scalar variables
- assignment - the equals symbol = assigns a value to a variable
- user input - <STDIN> reads one line typed at the keyboard (the print statement before it is what prompts the user)
- chomp - removes the newline character (from pressing Enter) at the end of the input
See Deitel p32-34.
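<STDIN> returns the whole line the user typed, including the newline produced by the Enter key. That is why the program calls chomp. A small sketch shows the difference. It uses a hard-coded string as a stand-in for keyboard input, so it can run without anyone typing.

```perl
#!/usr/local/bin/perl
# chomp_demo.pl - what chomp does to a line read from the keyboard
use strict;
use warnings;

# Pretend the user typed 42 and pressed Enter (a hard-coded stand-in for <STDIN>)
my $line = "42\n";
print "before chomp: [$line]\n";   # the closing ] appears on the next line

my $removed = chomp $line;         # chomp returns how many characters it removed
print "after chomp:  [$line]\n";   # [42]
print "characters removed: $removed\n";
```

Without chomp, a message such as "The sum is $sum." would still work, because perl ignores the trailing newline when it turns "42\n" into a number. Printing the raw input back out, however, would leave an unexpected line break.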
variables and memory
The computer's memory can be thought of as stacks of boxes, into which you can place your data.
Creating a variable is like sticking a label on the box.
This snippet of code...
program 6
| $number1 = 5;
$number2 = 10;
|
...can be thought of as having the following effect:
| $number1     $number2
+-----+      +-----+
|  5  |      | 10  |
+-----+      +-----+
|
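The box picture also explains what happens when one variable is assigned to another. The value is copied into the other box, and the two boxes stay independent afterwards. A minimal sketch:

```perl
#!/usr/local/bin/perl
# boxes.pl - assignment copies a value from one box into another
use strict;
use warnings;

my $number1 = 5;
my $number2 = 10;

$number2 = $number1;   # copy the 5 into $number2's box; the old 10 is gone
$number1 = 7;          # put 7 into $number1's box; $number2 still holds 5

print "number1 = $number1, number2 = $number2\n";   # number1 = 7, number2 = 5
```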
We shall discuss variables more later on in this lecture.
Operators
Perl has different kinds of operators; so far we have encountered arithmetic operators.
You can get a full list of operators by typing man perlop.
Often the order of evaluation is important; Perl has precedence rules for deciding what order to evaluate parts of an expression. Most of these are fairly intuitive, but it doesn't do any harm to force precedence using brackets/parentheses:
program 7
| 2 + 3*4; # 14
2 +(3*4); # 14
(2 +3)*4; # 20
|
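Two of perl's precedence rules are less intuitive than the rest. First, ** (exponentiation) binds more tightly than a leading minus sign. Second, it groups from right to left. The sketch below shows both, in the same spirit as the example above, and uses parentheses to force the other reading:

```perl
#!/usr/local/bin/perl
# precedence.pl - two precedence rules that surprise people
use strict;
use warnings;

my $neg_square = -2 ** 2;      # ** binds tighter than unary minus: -(2**2) = -4
my $pos_square = (-2) ** 2;    # parentheses force the intended order: 4
my $tower      = 2 ** 3 ** 2;  # ** groups right to left: 2**(3**2) = 2**9 = 512

print "-2 ** 2 = $neg_square\n";
print "(-2) ** 2 = $pos_square\n";
print "2 ** 3 ** 2 = $tower\n";
```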
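Logical Operators
Comparison operators (==, !=, <, <=) and logical operators (!, and, or) are like arithmetic operators, except that what they return is treated as TRUE or FALSE. Perl has no separate boolean type. Comparisons and ! return 1 for true. For false, they return a special value that prints as the empty string but counts as 0 in arithmetic, which is why some lines of the output below end in a blank. The logical operators and and or return the value of the last operand they evaluated, which is why the final line prints 0.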
program 8
| # logical_operators.pl
print "truth value of (2 == 2) = ", (2 == 2), "\n";
print "truth value of (1+1 == 3) = ", (1+1 == 3), "\n";
print "truth value of (5 != 4) = ", (5 != 4), "\n";
print "truth value of !(5 == 4) = ", !(5 == 4), "\n";
print "truth value of 3 < 3 = ", (3 < 3), "\n";
print "truth value of 3 <= 3 = ", (3 <= 3), "\n";
print "truth value of (1 == 1) AND ( 1==2) = ", (1 == 1 and 1==2), "\n";
print "truth value of (1 == 2) OR ( 1==1) = ", (1 == 2 or 1==1), "\n";
print "truth value of (((1 AND 0) OR 1) AND 0) = ",
(((1 and 0) or 1) and 0), "\n";
|
This outputs:
| truth value of (2 == 2) = 1
truth value of (1+1 == 3) =
truth value of (5 != 4) = 1
truth value of !(5 == 4) = 1
truth value of 3 < 3 =
truth value of 3 <= 3 = 1
truth value of (1 == 1) AND ( 1==2) =
truth value of (1 == 2) OR ( 1==1) = 1
truth value of (((1 AND 0) OR 1) AND 0) = 0
|
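You can check the behaviour of perl's false value yourself. The sketch also lists which values perl treats as false when deciding a test: 0, "0", "" and an undefined value. Everything else counts as true, including the strings "0.0", "00" and " ". It uses the ?: operator, which is described next.

```perl
#!/usr/local/bin/perl
# truth.pl - how perl represents true and false
use strict;
use warnings;

my $false = (1 == 2);
print "false as a string: [$false]\n";          # []  (the empty string)
print "false as a number: ", $false + 0, "\n";  # 0

# Only 0, "0", "" and undef are false; every other value is true
for my $value (0, "0", "", "0.0", "00", " ", 1) {
    print "[$value] is ", ($value ? "true" : "false"), "\n";
}
```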
The ?: Operator (advanced)
program 9
| $val = 5;
print ( $val == 5 ? "val equals 5\n" : "val doesn't equal 5\n" );
|
This operator has the following structure:
test condition ? value-if-true : value-if-false
The part before the ? is an expression. Depending on the truth value of that expression, only one of the following two expressions will be evaluated. If the test condition is true, the expression preceding the : is evaluated; if it is false, the expression after the : is evaluated. The value of the whole ?: expression is the value of whichever branch was evaluated.
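?: returns a value, so it is often used on the right-hand side of an assignment to choose between two values. A short sketch follows. The feature coordinates and the convention that a start after the end means the reverse strand are hypothetical, chosen only for illustration.

```perl
#!/usr/local/bin/perl
# ternary.pl - using ?: to choose between two values
use strict;
use warnings;

my $x = 7;
my $y = 3;
my $larger = $x > $y ? $x : $y;          # 7

# hypothetical feature coordinates: a feature whose start is after its end
# is taken (by assumption) to lie on the reverse ("-") strand
my $start = 500;
my $end   = 120;
my $strand = $start < $end ? "+" : "-";  # "-"

print "larger = $larger, strand = $strand\n";
```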
File Operators
| -e name             # tests if a file or directory of that name exists
-f file name        # tests if it exists and is a plain file
-d directory name   # tests if it exists and is a directory
-x file name        # tests if file is executable
-w file name        # tests if file is writable
|
program 10
| # filetests.pl
print "Enter file/directory name:\n";
$filename = <STDIN>;
chomp $filename;
print "$filename ", -f $filename ? "IS a file\n" : "is NOT a file\n";
print "$filename ", -d $filename ? "IS a directory\n" : "is NOT a directory\n";
print "$filename ", -x $filename ? "IS executable\n" : "is NOT executable\n";
|
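-f is not simply an existence test. It is false for a directory even though the directory exists. The sketch below compares -e, -f and -d on the current directory "." and on a second name, no_such_file.txt, which is hypothetical and assumed not to exist.

```perl
#!/usr/local/bin/perl
# filetests2.pl - -e (exists) versus -f (plain file) and -d (directory)
use strict;
use warnings;

# "." is the current directory, which always exists; the second name is a
# hypothetical file that is assumed NOT to exist
for my $path (".", "no_such_file.txt") {
    print "$path: exists? ", (-e $path ? "yes" : "no"),
          "  plain file? ",  (-f $path ? "yes" : "no"),
          "  directory? ",   (-d $path ? "yes" : "no"), "\n";
}
```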

Functions
Already we have met some perl functions: print, and chomp for acting on strings. We have also used various arithmetic functions (log, sqrt, int, rand and sin).
The format
for a function call is
| function name ( argument 1, argument 2,... argument n )
|
A function
call can have anything from zero to an unlimited number of arguments.
With perl's builtin functions, the parentheses are optional, e.g.
program 11
| print("Hello", "World", "\n");
# is equivalent to
print "Hello", "World", "\n";
|
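Optional parentheses have one well-known trap. If print is immediately followed by "(", perl takes those parentheses to enclose all of print's arguments, so anything after the closing parenthesis is not printed. A minimal sketch:

```perl
#!/usr/local/bin/perl
# print_parens.pl - a pitfall of optional parentheses
# The parentheses right after print are taken to hold ALL of its arguments,
# so this prints 5; the "* 4" multiplies print's return value and is thrown
# away (perl -w warns about this line)
print (2 + 3) * 4;
print "\n";

# To print 20, put the whole argument list inside the parentheses ...
print((2 + 3) * 4, "\n");
# ... or leave the parentheses off print and use them only in the expression
print "result: ", (2 + 3) * 4, "\n";
```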
You can get a full list of functions by typing man perlfunc.
You can get specific documentation on a function by typing perldoc -f followed by the function name, e.g. perldoc -f chomp.
The list is extensive enough to cover many useful tasks, and you can always make your own. Later on we'll discuss how to create your own functions to do useful bioinformatics tasks.

Variables, part II
So far, we have only been discussing scalar variables. Perl actually has other variable types, such as arrays and hashes. We will return to these later; let's focus on scalars just now.
Scalars
- begin with $
- are single-valued
- can be used to store strings, integers and floating point numbers - and more
strings
program 12
| $greeting = "Hello World";
$newline = "\n";
print $greeting, $newline;
|
integers
program 13
| $x = 8;
$y = 7;
$product = $x * $y;
print $product;
|
floating point
program 14
| $val = log(1e40);
print $val;
|
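A single scalar is not tied to one of these kinds. Perl converts between strings and numbers according to how the value is used. A minimal sketch (. is the string-joining operator, covered later in this lecture):

```perl
#!/usr/local/bin/perl
# conversion.pl - the same scalar can be used as a string or as a number
use strict;
use warnings;

my $count = "42";              # stored as a string (in quotes)
my $total = $count + 8;        # used as a number: 50
my $label = "n=" . $total;     # used as a string again: "n=50"
my $ratio = $total / 4;        # division can give a floating point value: 12.5

print "$label, ratio = $ratio\n";
```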
Operators on scalar variables
assignment operator =
program 15
| $x = 3; # assigns 3 to $x
$y = $x + 1; # assigns 4 to $y
$y = $y * 2; # assigns 8 to $y
|
other operators:
program 16
| # += add-and-assign operator
$x += 5; # same as $x = $x + 5
# -= subtract-and-assign operator
$x -= 5; # same as $x = $x - 5
# ++ increment by 1 operator
$x++; # same as $x = $x + 1
# -- decrement by 1 operator
$x--; # same as $x = $x - 1
# /= divide-and-assign operator
$x /= 5; # same as $x = $x / 5
# *= multiply-and-assign operator
$x *= 5; # same as $x = $x * 5
|
program 17
| # scalar_operators.pl
# assignment
$val1 = 10;
$val2 = 2;
# change the value of a variable
$val2 = $val2 + 2;
# modify $val1 by adding $val2
$val1 += $val2;
# decrement by 1
$val1 --;
# multiply val1 by val2
$val1 = $val1 * $val2;
# this could also have been written as
# $val1 *= $val2;
print "\n And the answer is... $val1\n\n";
|
What do you think the output of this program will be?
You can step through this program one line at a time by using the perl debugger. Type n to execute the next line, p followed by an expression to print its value, and q to quit:
| % perl -d scalar_operators.pl
main::(scalar_operators.pl:4): $val1 = 10;
DB<1> n
main::(scalar_operators.pl:5): $val2 = 2;
DB<1> p $val1
10
DB<2> n
main::(scalar_operators.pl:8): $val2 = $val2 + 2;
DB<2> n
main::(scalar_operators.pl:11): $val1 += $val2;
DB<2> n
main::(scalar_operators.pl:14): $val1 --;
DB<2> p $val1
14
DB<3> p $val2
4
DB<4> n
main::(scalar_operators.pl:19): $val1 = $val1 * $val2;
DB<4> n
main::(scalar_operators.pl:23): print "\n And the answer is... $val1\n\n";
DB<4> n
And the answer is... 52
|
Some operators can act on strings, too: . joins (concatenates) two strings, and .= appends a string to a variable.
program 18
| $word1 = "Hello";
$word2 = "World";
$space = " ";
$newline = "\n";
$sentence = $word1 . $space;
$sentence .= $word2;
print $sentence . $newline;
|
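Here are a few more string tools that come in handy for sequence work. The x operator repeats a string, and the length function counts its characters. Strings also need their own comparison operator: == compares values as numbers, while eq compares them character by character. A minimal sketch:

```perl
#!/usr/local/bin/perl
# string_ops.pl - more operators and functions for strings
use strict;
use warnings;

my $repeat = "AT" x 3;          # x repeats a string: "ATATAT"
my $dna    = "GATTACA";
my $len    = length $dna;       # length counts characters: 7

print "$repeat has ", length($repeat), " bases; $dna has $len\n";

# == compares numerically, eq compares character by character
print "10 == 10.0 ? ",     (10 == 10.0     ? "yes" : "no"), "\n";   # yes
print "'10' eq '10.0' ? ", ("10" eq "10.0" ? "yes" : "no"), "\n";   # no
```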
You can get a full list of operators by typing man perlop.

Command Line Arguments
You can write programs that take their arguments from the command line.
When a program is executed, the command line arguments go into a special array called @ARGV.
We will learn how to manipulate arrays in a later lecture. For now, just treat the following lines as magic:
program 19
| # user_input.pl
$first_arg = shift @ARGV;
$second_arg = shift @ARGV;
print " $first_arg x $second_arg = ", $first_arg * $second_arg, "\n";
|
| % perl user_input.pl 7 8
7 x 8 = 56
|
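If the user forgets an argument, shift @ARGV returns an undefined value, and the multiplication then triggers warnings. One way to cope is sketched below. It relies on the fact that @ARGV, used where a single number is expected, gives the number of arguments. It substitutes a default for anything missing; the default of 1 is a hypothetical choice made only for illustration.

```perl
#!/usr/local/bin/perl
# user_input2.pl - cope with missing command line arguments
use strict;
use warnings;

# An array used where a single number is expected gives its number of
# elements, so this tells us how many arguments were typed
my $n_args = @ARGV;
print "you gave $n_args argument(s)\n";

# Fall back to a (hypothetical) default of 1 for any missing argument
my $first_arg  = @ARGV ? shift @ARGV : 1;
my $second_arg = @ARGV ? shift @ARGV : 1;
print " $first_arg x $second_arg = ", $first_arg * $second_arg, "\n";
```

Running it as perl user_input2.pl 7 8 should behave like the program above. Running it with no arguments prints 1 x 1 = 1 instead of warnings.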

Avoiding mistakes
You can control perl's level of "strictness" using command line switches and pragmas.
the -w switch
program 20
| #!/usr/local/bin/perl -w
# deliberate_mistake.pl
$variable = 5;
$varaible++;
print "new value = $variable\n";
|
download
This outputs:
| % deliberate_mistake.pl
Name "main::varaible" used only once: possible typo at
lectures/1_intro/deliberate_mistake.pl line 5.
new value = 5
|
the strict pragma
This forces all variables to be explicitly declared, for example using the my keyword.
program 21
| #!/usr/local/bin/perl -w
# deliberate_mistake2.pl
use strict; # forces all variables to be declared
my $variable = 5;
$varaible = $variable + 1;
print "new value = $variable\n";
|
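Notice that strict turns the typo into a fatal compile-time error. The -w switch only prints a warning, and only when the misspelled name happens to be used just once: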
| % deliberate_mistake2.pl
Global symbol "$varaible" requires explicit package name at
lectures/1_intro/deliberate_mistake2.pl line 7.
Execution of lectures/1_intro/deliberate_mistake2.pl aborted due to compilation errors.
|
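For comparison, here is a sketch of the corrected program. It turns on both kinds of checking. The use warnings pragma (available from perl 5.6) is an alternative to -w that applies only to the file it appears in, whereas -w affects every piece of code the program runs.

```perl
#!/usr/local/bin/perl
# deliberate_mistake2_fixed.pl - the typo corrected; strict and warnings both on
use strict;     # every variable must be declared, e.g. with my
use warnings;   # like -w, but limited to this file
my $variable = 5;
$variable = $variable + 1;          # spelled correctly this time
print "new value = $variable\n";    # new value = 6
```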

Exercises
- Deitel Ex 2.7, 2.8
- write a program that computes the hypotenuse of a right-angled triangle; the lengths of the two sides should come from the command line, or be prompted for by the program
- write an arithmetic test program. It should prompt the user for the answer to a random multiplication sum. If the user answers correctly, print a suitably rewarding message.

Chris Mungall cjm@fruitfly.org
Berkeley Drosophila Genome Project