Analysis of Low Complexity

I: Exercises to explore the SEG/PSEG algorithm:

  1. Explore the SEG algorithm using single protein file(s) from directory ______. Remember: seg FASTAfilename WindowLength TriggerComplexity ExtensionComplexity -p.
    Run seg using window length 45, trigger complexity 3.3, and extension complexity 3.6. Use option -p for "pretty print."
    Vary the window length from 45 to 30, 20, and 10.
    What is the impact of window length?
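
    One way to build intuition for the window-length question is to compute a compositional complexity by hand. The Python sketch below assumes that complexity is measured as the Shannon entropy, in bits, of the residue composition of each window. It is not the seg program itself, which also applies the trigger/extension steps and refines segment boundaries. The function names and the toy sequence are invented for illustration.

```python
import math
from collections import Counter


def window_complexity(window: str) -> float:
    """Shannon entropy (bits per residue) of the composition of one window."""
    counts = Counter(window)
    n = len(window)
    # Written as p * log2(1/p) so that a single-residue window gives 0.0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def min_window_complexity(seq: str, window_length: int) -> float:
    """Lowest complexity among all windows of the given length."""
    if window_length > len(seq):
        raise ValueError("window longer than sequence")
    return min(
        window_complexity(seq[i:i + window_length])
        for i in range(len(seq) - window_length + 1)
    )


# Hypothetical toy sequence: a poly-Q run embedded in mixed residues.
toy = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 12 + "LEERLGLIEVQAPILSRVGDGT"

for w in (45, 30, 20, 10):
    print(w, round(min_window_complexity(toy, w), 2))
```

    A short window can sit entirely inside the repeat and score very low, while a long window always includes flanking residues that raise its complexity. Compare this with what seg reports at each window length.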


  2. Merge 5 proteins from the directory ______ using SEALS "cat". Determine the 2 proteins with the highest complexity (use the parameters from exercise 1: window length 45, trigger complexity 3.3, extension complexity 3.6), then remove those 2 proteins with "fanot" to obtain a 3-protein file.


  3. Using your 3-protein file, use PSEG to search for simple sequence repeats (SSRs). Remember: pseg FASTAfilename WindowLength TriggerComplexity ExtensionComplexity -z___.
    1. Use -z1, -z2, -z3, -z4, -z5.
    2. Try the following trigger/extension complexity parameters: 0/0, 0.5/0.8, 1.0/1.3. Examine each output file with the UNIX command "more".
    3. What does the uppercase/lowercase letter scheme mean, and how does the uppercase letter frequency change with different trigger/extension complexity parameters?
    4. How does the uppercase letter frequency change with different window lengths?
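
    To compare letter-case frequencies across runs without counting by eye, a small helper can tally them. The Python sketch below assumes that header lines in the output start with ">" and that every other alphabetic character is a residue; check this against your actual output before relying on it. The function name and the example lines are hypothetical.

```python
def uppercase_fraction(lines):
    """Fraction of residue letters that are uppercase, skipping '>' header lines."""
    upper = 0
    lower = 0
    for line in lines:
        if line.startswith(">"):
            continue  # assumed header line, not sequence
        for ch in line:
            if ch.isupper():
                upper += 1
            elif ch.islower():
                lower += 1
    total = upper + lower
    return upper / total if total else 0.0


# Hypothetical output fragment, for illustration only.
example = [">toy(1-8)", "qqqqqqqq", ">toy(9-12)", "MKTA"]
print(uppercase_fraction(example))
```

    For a real file, pass the open file object, for example uppercase_fraction(open("myoutput.txt")), where the file name is a placeholder.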


II: Exercises to find repetitive proteins:

    1. Form groups, each group being assigned one taxon.
    2. As you saw in I.3.1, a segment can be assigned to different periods. How could you list each possible period and then choose among them?
    3. It is possible to write a UNIX script to successively determine the segments with differing periods.
      1. View the executable shell script in :__________________. It is a good idea to think through how you would do this more elegantly using Perl.
      2. Run pseg.P1_9 using the organism assigned in II.
      3. Transfer the "export" file to a PC by ftp and view it in Excel (or, if that is not possible, view it in xemacs).
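
    The idea of stepping through periods in turn can be sketched as follows. This is not the class script pseg.P1_9, whose contents are not shown here; it is a Python illustration that only builds the command lines, using the pseg argument order and the -z flag form given in section I. The function name, the file name "taxon.fa", and the parameter values passed in are placeholders.

```python
def pseg_commands(fasta, window, trigger, extension, periods=range(1, 10)):
    """Build one pseg argument list per period, in the order used in this handout."""
    return [
        ["pseg", fasta, str(window), str(trigger), str(extension), f"-z{p}"]
        for p in periods
    ]


# Print the commands for periods 1 through 9 without running them.
for argv in pseg_commands("taxon.fa", 45, 1.0, 1.3):
    print(" ".join(argv))
```

    To actually run the commands, each list could be passed to subprocess.run with its output redirected to a separate file per period, which makes the results easy to compare afterwards.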

III: Exercises to see differences between taxa:

    1. Consider organism assignment from II.
    2. Using window length 45, and extension complexity = trigger + 0.3, determine the appropriate trigger complexity for your organism. Use the steps:
      1. Use dbcomp
      2. Use shuffledb
      3. Use seg, start with trigger complexity estimation given for your organism.
      4. Vary the trigger complexity slightly to determine what complexity results in the shuffled database having 4% of its amino acids (AA) in low-complexity segments.
      5. Using the correct trigger complexity, determine the percentage of AA your organism has in low-complexity regions.
      6. Report to me, and we will have a class discussion.
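
    Steps 4 and 5 come down to two small calculations: the percentage of residues that fall in low-complexity segments, and the choice of the trigger complexity whose shuffled-database percentage is closest to 4%. The Python sketch below assumes the seg output was written in a masked form where low-complexity residues are replaced by a lowercase "x" and header lines start with ">". Both function names are hypothetical, and the numbers in the example are placeholders, not measurements.

```python
def percent_masked(lines, mask_char="x"):
    """Percentage of residues equal to mask_char, skipping '>' header lines."""
    masked = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += line.count(mask_char)
    return 100.0 * masked / total if total else 0.0


def closest_trigger(results, target=4.0):
    """Pick the trigger whose low-complexity percentage is nearest the target.

    results maps trigger complexity -> % of shuffled residues in
    low-complexity segments, one entry per seg run.
    """
    return min(results, key=lambda trigger: abs(results[trigger] - target))


# Placeholder values for illustration only.
shuffled_runs = {3.2: 2.1, 3.3: 3.8, 3.4: 6.0}
print(closest_trigger(shuffled_runs))
print(percent_masked([">s", "MKxxxxTA", "xx"]))
```

    Once a trigger is chosen from the shuffled database, the same percent_masked call on the masked output for the real database gives the figure to report in step 5.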