Fundamentals in Bioinformatics Oriented Statistics

Some Statistical Methods for Detecting Clustering in Biological Sequences
John Spouge (NCBI). Presented Tue May 29, 2001 at NCBI. [Microsoft PowerPoint]

Many individuals within NCBI have come to me with statistical questions. Recently, many of these questions have shared a common theme: the detection of various kinds of clusters within biological sequences. This talk will not emphasize the biological background of the problems presented to me (although my collaborators are certainly welcome at the talk to provide it). Rather, it presents some of the generic statistical methods used to handle these problems. The biological problems addressed include the evaluation of clusters of genes and of restriction sites within bacterial genomes, of clusters of intergenic conserved nucleotides in different organisms, and of clusters of PSSM motifs representing promoter binding sites. The statistical methods discussed will include the minimum distance between randomly placed markers, Kolmogorov-Smirnov tests, scan tests, local run (BLAST) statistics, and compound Poisson process models.
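As an illustration of the first method listed, the minimum distance between randomly placed markers, here is a minimal sketch in Python. It is not taken from the talk. It assumes n markers placed independently and uniformly on a linear interval of continuous length L, for which the smallest gap d between neighbouring markers satisfies P(min gap > d) = (1 - (n-1)d/L)^n. A circular bacterial genome or discrete base positions would need a modified formula. The function names `min_gap_pvalue` and `simulate_pvalue` and the example coordinates are hypothetical.

```python
import random


def _min_gap(positions):
    """Smallest distance between neighbouring markers."""
    pts = sorted(positions)
    return min(b - a for a, b in zip(pts, pts[1:]))


def min_gap_pvalue(positions, length):
    """P(min gap <= observed) for n i.i.d. uniform markers on [0, length].

    Assumes a linear (not circular) interval and continuous positions.
    """
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two markers")
    d = _min_gap(positions) / length  # observed minimum gap, rescaled to [0, 1]
    survival = max(0.0, 1.0 - (n - 1) * d) ** n  # P(min gap > d)
    return 1.0 - survival


def simulate_pvalue(positions, length, trials=20000, seed=0):
    """Monte Carlo check of the same p-value under the same null model."""
    rng = random.Random(seed)
    n = len(positions)
    observed = _min_gap(positions)
    hits = 0
    for _ in range(trials):
        sim = [rng.uniform(0, length) for _ in range(n)]
        if _min_gap(sim) <= observed:
            hits += 1
    return hits / trials


# Hypothetical example: four markers on a 10,000-unit interval,
# two of them only 10 units apart.
markers = [100, 5000, 5010, 9000]
exact = min_gap_pvalue(markers, 10000)
approx = simulate_pvalue(markers, 10000)
print(f"exact p = {exact:.4f}, simulated p = {approx:.4f}")
```

A small p-value indicates that at least one pair of markers lies closer together than uniform random placement would typically produce. The simulation is included only as a check on the closed form; it also gives a starting point when the null model is changed and no closed form is available.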