
Analysis of Low Complexity
I: Exercises to explore the SEG/PSEG algorithm:
- Explore the SEG algorithm using single protein file(s) from directory ______. Remember:
  seg FASTAfilename WindowLength TriggerComplexity ExtensionComplexity -p
  - Run seg using window length 45, trigger complexity 3.3 and extension complexity 3.6. Use option -p for "pretty print."
  - Vary the window length from 45 to 30, 20, 10.
  - What is the impact of window length?
- Merge 5 proteins from the directory ______ using SEALS "cat", determine the 2 proteins with the highest complexity (use the parameters in 1a), and remove those 2 proteins with "fanot" to obtain a 3-protein file.
- Using your 3-protein file, use PSEG to search for repeat sequences (SSR). Remember:
  pseg FASTAfilename WindowLength TriggerComplexity ExtensionComplexity -z___
  - Use -z1, -z2, -z3, -z4, -z5.
  - Try the following trigger/extension complexity parameters: 0/0, 0.5/0.8, 1.0/1.3. Examine the file with the UNIX command "more".
  - What does the capital letter/lowercase letter scheme mean, and how does the capital letter frequency change with different trigger/extension complexity parameters?
  - How does the capital letter frequency change with different window lengths?
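To build intuition for the window-length question in I, here is a minimal Python sketch. It is not the seg implementation: it assumes complexity is the Shannon entropy (in bits) of the residue composition of each window, and it only checks the trigger stage, with no extension stage. The function names and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fraction_below_trigger(seq: str, window_length: int, trigger: float) -> float:
    """Fraction of full-length windows whose entropy is at or below the trigger."""
    if len(seq) < window_length:
        return 0.0
    starts = range(len(seq) - window_length + 1)
    low = sum(
        1 for i in starts
        if window_entropy(seq[i:i + window_length]) <= trigger
    )
    return low / len(starts)


# Hypothetical toy sequence: a glutamine run flanked by mixed residues.
toy = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 20 + "LEERLGLIEVQAPILSRVGDGT"

# Same trigger as in the handout (3.3), with the window lengths from the exercise.
for w in (45, 30, 20, 10):
    print(w, round(fraction_below_trigger(toy, w, trigger=3.3), 3))
```

One thing the sketch makes visible: under this entropy measure a window of length 10 can never exceed log2(10), about 3.32 bits, so a trigger of 3.3 that is selective at length 45 flags almost every window at length 10. Whether seg output behaves the same way is what the exercise asks you to check.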
II: Exercises to find repetitive proteins:
- Form groups, each group being assigned one taxon.
- As you saw in I 3a, a segment can be assigned to different periods. How could you list each of the possible periods and then choose among them?
- It is possible to write a UNIX script to successively determine the segments with differing periods.
- View the shell script executable in :__________________. It is a good idea to think through how you would do this more elegantly using Perl.
- Run pseg.P1_9 using the organism assigned in II.
- Transfer the "export" file to a PC by ftp and view it in Excel (or, if that is not possible, view it in xemacs).
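As one way to think through the "successive periods" step in II, the following minimal sketch builds one pseg command line per period. It is written in Python rather than Perl purely for illustration, and it is not the pseg.P1_9 script. It follows the command syntax given in I (`pseg FASTAfilename WindowLength TriggerComplexity ExtensionComplexity -zN`); the file name, parameter values, and the function name are hypothetical.

```python
from typing import List


def pseg_commands(fasta: str, window: int, trigger: float, extension: float,
                  max_period: int = 5) -> List[List[str]]:
    """Build one pseg argument list per period (1..max_period).

    Uses the handout's syntax:
    pseg FASTAfilename WindowLength TriggerComplexity ExtensionComplexity -zN
    """
    return [
        ["pseg", fasta, str(window), str(trigger), str(extension), f"-z{period}"]
        for period in range(1, max_period + 1)
    ]


# Hypothetical file name and parameters, for illustration only.
for cmd in pseg_commands("three_proteins.fa", 45, 0.5, 0.8):
    print(" ".join(cmd))
    # To actually run each command (assumes pseg is installed and on PATH):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Keeping each command as a list of arguments rather than a single string avoids shell quoting problems if a file name contains spaces. Choosing among the periods reported for the same segment is left open here, as in the exercise.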
III: Exercises to see differences between taxa:
- Consider your organism assignment from II.
- Using window length 45 and extension complexity = trigger + 0.3, determine the appropriate trigger complexity for your organism. Use the steps:
  - Use dbcomp.
  - Use shuffledb.
  - Use seg, starting with the trigger complexity estimate given for your organism.
  - Vary the trigger complexity slightly to determine what complexity results in the shuffled database having 4% of its AA in low-complexity segments.
- Using the correct trigger complexity, determine the % of AA your organism has in low-complexity regions.
- Report to me, and we will have a class discussion.
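The calibration loop in III (shuffle, scan, adjust the trigger until about 4% of residues fall in low-complexity segments) can be sketched in Python as follows. This is only an illustration under loudly stated assumptions: complexity is taken to be the Shannon entropy of each window, there is no extension stage, and the shuffle simply permutes residues within each sequence. It does not call dbcomp, shuffledb, or seg, so its numbers will not match theirs. All function names and the toy database are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List


def window_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of a window's residue composition."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_fraction(seqs: List[str], window: int, trigger: float) -> float:
    """Fraction of residues covered by at least one window with entropy <= trigger.

    Simplification: trigger stage only, no extension stage.
    """
    covered = total = 0
    for seq in seqs:
        flags = [False] * len(seq)
        for i in range(len(seq) - window + 1):
            if window_entropy(seq[i:i + window]) <= trigger:
                for j in range(i, i + window):
                    flags[j] = True
        covered += sum(flags)
        total += len(seq)
    return covered / total if total else 0.0


def shuffle_sequences(seqs: List[str], seed: int = 0) -> List[str]:
    """Permute residues within each sequence (assumed stand-in for shuffledb)."""
    rng = random.Random(seed)
    return ["".join(rng.sample(s, len(s))) for s in seqs]


def calibrate_trigger(shuffled: List[str], window: int = 45, target: float = 0.04,
                      steps: int = 30) -> float:
    """Bisect for the largest trigger whose low-complexity fraction stays <= target."""
    lo, hi = 0.0, math.log2(20)  # entropy range for a 20-letter alphabet
    for _ in range(steps):
        mid = (lo + hi) / 2
        if low_complexity_fraction(shuffled, window, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical toy database of random sequences, for illustration only.
rng = random.Random(1)
alphabet = "ACDEFGHIKLMNPQRSTVWY"
toy_db = ["".join(rng.choice(alphabet) for _ in range(200)) for _ in range(5)]

shuffled = shuffle_sequences(toy_db)
t = calibrate_trigger(shuffled)
print(f"trigger ~ {t:.2f}, extension ~ {t + 0.3:.2f}")
print(f"shuffled fraction: {low_complexity_fraction(shuffled, 45, t):.3f}")
print(f"original fraction: {low_complexity_fraction(toy_db, 45, t):.3f}")
```

Bisection works here because the covered fraction can only grow as the trigger is raised. In the exercise itself the same search is done by hand: rerun seg on the shuffled database with slightly different trigger values until the 4% target is reached, then apply that trigger to the real database.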