Bioinfo Helpdesk & On-line Training

Analysis of Low Complexity

I: Exercises to explore the SEG/PSEG algorithm:

Explore the SEG algorithm using single protein file(s) from directory ______. Remember: seg FASTAfilename WindowLength TriggerComplexity ExtensionComplexity -p.
Run seg using window length 45, trigger complexity 3.3 and extension complexity 3.6. Use option -p for "pretty print."
Vary the window length from 45 to 30, 20, 10.
What is the impact of window length?
Merge 5 proteins from the directory ______ using SEALS "cat", determine the 2 proteins with the highest complexity (use the parameters in 1a), then remove those 2 proteins with "fanot" to obtain a 3-protein file.
Using your 3-protein file, use PSEG to search for simple sequence repeats (SSR). Remember: pseg FASTAfilename WindowLength TriggerComplexity ExtensionComplexity -z___.
1. Use -z1, -z2, -z3, -z4, -z5.
2. Try the following trigger/extension complexity parameters:
  1. 0/0, 0.5/0.8, 1.0/1.3
  Examine the output file with the UNIX command "more".
3. What does the capital-letter/lowercase-letter scheme mean, and how does the capital-letter frequency change with different trigger/extension complexity parameters?
4. How does the capital-letter frequency change with different window lengths?
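
To build intuition for the window-length questions above, here is a minimal Python sketch of a sliding-window complexity scan. It assumes that complexity is measured as Shannon entropy in bits and that a window "triggers" when its entropy is at or below the trigger value. The function names are hypothetical, and the sketch covers only the triggering idea, not the extension or refinement steps of the real seg program.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def window_complexities(sequence, window):
    """Entropy of every full-length window along the sequence."""
    return [
        shannon_entropy(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]


def trigger_positions(sequence, window, trigger):
    """Start positions of windows whose entropy is at or below the trigger.

    Assumption: the comparison is 'less than or equal to'.
    """
    return [
        i
        for i, h in enumerate(window_complexities(sequence, window))
        if h <= trigger
    ]
```

A window of length L containing at most 20 residue types cannot exceed log2(min(L, 20)) bits, so it is worth keeping that ceiling in mind when comparing results for windows of 45, 30, 20 and 10.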

II: Exercises to find repetitive proteins:

Form groups, each group being assigned one taxon.
As you saw in I 3a, a segment can be assigned to different periods. How could you list each possible period and then choose one?
It is possible to write a UNIX script to successively determine the segments with differing periods.
1. View shell script executable in :__________________. It is a good idea to think through how you would do this more elegantly using Perl.
2. Run pseg.P1_9 using the organism assigned in II.
3. Transfer the "export" file to your PC by ftp and view it in Excel (or, if that is not possible, view it in xemacs).
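
The idea of successively scanning with differing periods can be sketched as follows. This is not the contents of pseg.P1_9, which are not shown here. It is a hypothetical Python illustration (the handout suggests Perl; Python is used here only for the sketch) that builds one pseg command line per period, using the argument order given in section I. The default period range 1 to 9 is an assumption based on the script name, and the function name is made up for this example.

```python
def pseg_commands(fasta, window, trigger, extension, periods=range(1, 10)):
    """Build one pseg argument list per period (dry run; nothing is executed).

    Argument order follows the handout:
    pseg FASTAfilename WindowLength TriggerComplexity ExtensionComplexity -zN
    """
    commands = []
    for period in periods:
        commands.append(
            ["pseg", fasta, str(window), str(trigger), str(extension), f"-z{period}"]
        )
    return commands
```

Each returned list could be printed with " ".join(cmd) to check it by eye, or passed to subprocess.run on a machine where pseg is installed, with the output of each run redirected to its own file so that the periods can be compared afterwards.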

III: Exercises to see differences between taxa: