Quiz Questions File

Basic Molecular Biology Questions

1 - The non-coding regions of spliced mRNA are:

(a)    CDS

(b)    3’UTR and 5’UTR

(c)    Introns

(d)    Promoter region

(e)    All of the above


2 - Which of the following are possible binding sites for the regulatory elements of a gene?

(a)    CDS

(b)    3’UTR and 5’UTR

(c)    Upstream region of a gene

(d)    Downstream region of a gene

(e)    All of the above


3 - In which regions of DNA can a repeat element (such as Alu) be present?

(a)    CDS

(b)    3’UTR and 5’UTR

(c)    Introns

(d)    Any DNA segment outside of a gene

(e)    All of the above


4 - In which kinds of biomolecule can a low-complexity sequence be found?

(a)    Protein

(b)    DNA

(c)    mRNA

(d)    EST

(e)    All of the above
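A low-complexity region is a stretch dominated by only a few residue types, such as poly-A runs or short tandem repeats. Real filters such as DUST (nucleotides) and SEG (proteins) use their own scoring. The sketch below swaps in a simpler measure, plain Shannon entropy over a sliding window, to show the idea. The window size, the threshold, and the function names are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 1.5):
    """Yield (start, window_seq, entropy) for windows whose entropy is below threshold.

    Works on any string of residue letters, nucleotide or amino acid alike.
    """
    seq = seq.upper()
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        h = shannon_entropy(w)
        if h < threshold:
            yield start, w, h


# A poly-A stretch embedded in otherwise mixed sequence.
dna_hits = list(low_complexity_windows("ACGT" + "A" * 12 + "ACGT"))
# A poly-Q stretch in a toy protein fragment.
protein_hits = list(low_complexity_windows("MKLV" + "Q" * 14 + "RSTW"))
for start, w, h in dna_hits[:3]:
    print(start, w, round(h, 3))
```

A window drawn evenly from all four nucleotides scores 2 bits, while a single repeated residue scores 0, so lower entropy means lower complexity.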


5 - Which element(s) is/are not part of the DNA sequence of a gene?

(a)    Donor/acceptor sites of introns

(b)    PolyA tail

(c)    Exons

(d)    Promoter

(e)    All of the above


6 - Where is a signal peptide located?

(a)    The end of a protein

(b)    The 3’UTR

(c)    Before the mature peptide

(d)    On the tRNA

(e)    All of the above


7 - What is an open reading frame (ORF)?

(a)    A segment that starts with a start codon and goes until the next stop codon.

(b)    A partial coding region with no start or stop codon.

(c)    A partial coding region with no stop codon.

(d)    A coding region with a gap in it.

(e)    All of the above
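One common operational definition of an ORF can be sketched as a scan: in each forward reading frame, open a candidate at an ATG and close it at the next in-frame stop codon. This is a simplified sketch that assumes the standard genetic code's stop codons (TAA, TAG, TGA) and ignores the reverse strand and alternative start codons. The function name `find_orfs` and its `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1):
    """Return (frame, start, end) tuples for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    Only the first ATG after the previous stop opens an ORF, so nested
    ATGs inside an open ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step codon by codon; stop before a trailing partial codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


seq = "CCATGAAATAGCC"
for frame, start, end in find_orfs(seq):
    print(frame, start, end, seq[start:end])
```

To cover the other strand, the same scan can be run on the reverse complement of the sequence.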


Basic Web Questions

8 - Use a search engine to find the URL of the CENSOR program for masking repeats. What is that URL?

9 - Use a search engine to find a large laboratory facility that houses hundreds of mouse strains and is also heavily involved in sequencing mouse genes and the mouse genome. (Hint: it is located on the coast of Maine.) What is the name of that laboratory?


Basic Entrez Questions

10 - How do you prevent words from being term-mapped in Entrez?

(a)    Put them in parentheses

(b)    Put a ‘+’ sign in front of each term.

(c)    Put them within single quotes

(d)    Put them within double quotes

(e)    Add butnot pubref [PROP] to the query.


11 - How would you restrict a PubMed search to publications with Sicotte as an author?


12 - What is a MeSH term?

(a)    It’s a term used in the meshed index of Entrez

(b)    A controlled vocabulary of keywords assigned manually by indexers: Medical Subject Headings.

(c)    It’s a synonym dictionary used by Entrez: Medical Synonyms and Homonyms.

(d)    It’s a set of keywords that the submitter of an article assigns to it.

(e)    All of the above.


13 - What is UniGene?

(a)    A curated collection of gene loci

(b)    A database of the genomic location of genes.

(c)    A clustering of sequences that attempts to provide one record for each gene by using sequence similarity.

(d)    The BLAST database of all ESTs

(e)    All of the above.


14 - Based on format alone, which of the following accession numbers could be a RefSeq protein accession number? (It may not exist in the database.)

(a)    NM_001241

(b)    NP_001241

(c)    P001241

(d)    AAA001241

(e)    NT_001241


15 - Based on format alone, which of the following accession numbers could be a Swiss-Prot protein accession number? (It may not exist in the database.)

(a)    A50517

(b)    AAA01241

(c)    P23356

(d)    A23D561

(e)    NP_001241


16 - Based on format alone, which of the following is NOT a valid accession number in Entrez? (These sequences may not exist in the database. Locus names do not count as valid accession numbers, even though they are indexed in Entrez.)

(a)    A23D561

(b)    I12345

(c)    NX_001241

(d)    R01241

(e)    AAA26521
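The format rules behind questions 14 to 16 can be written as regular expressions. The sketch below is simplified and assumption-laden: the RefSeq lists cover the common two-letter prefixes, the UniProtKB pattern is the one UniProt publishes for its accessions, and the INSDC (GenBank/EMBL/DDBJ) patterns cover only the classic one-letter-plus-five-digit, two-letters-plus-six-digit, and three-letters-plus-five-digit layouts, not newer or WGS formats. The names `ACCESSION_PATTERNS` and `classify_accession` are hypothetical.

```python
import re

# Simplified accession-format patterns; a teaching sketch, not an official validator.
ACCESSION_PATTERNS = {
    # RefSeq protein prefixes (common ones only).
    "refseq_protein": re.compile(r"(AP|NP|XP|YP|WP)_\d+(\.\d+)?"),
    # RefSeq nucleotide/genomic prefixes (common ones only).
    "refseq_other": re.compile(r"(AC|NC|NG|NT|NW|NZ|NM|NR|XM|XR)_\d+(\.\d+)?"),
    # UniProtKB accession format as published by UniProt.
    "uniprot": re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
    ),
    # Classic INSDC nucleotide layouts: 1 letter + 5 digits, 2 letters + 6 digits.
    "insdc_nucleotide": re.compile(r"([A-Z]\d{5}|[A-Z]{2}\d{6})(\.\d+)?"),
    # Classic INSDC protein layout: 3 letters + 5 digits.
    "insdc_protein": re.compile(r"[A-Z]{3}\d{5}(\.\d+)?"),
}


def classify_accession(acc: str) -> list[str]:
    """Return the names of every pattern that the whole accession string matches."""
    acc = acc.strip().upper()
    return [name for name, pat in ACCESSION_PATTERNS.items() if pat.fullmatch(acc)]


for example in ["NP_000271", "P12345", "ABC12345", "ZZ_1"]:
    print(example, classify_accession(example))
```

P12345, for instance, fits both the UniProtKB layout and the one-letter-plus-five-digit nucleotide layout, which is why the function returns every matching label rather than a single answer.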


17 - Which database is the source of protein structures?

(a)    StructBase

(b)    PDB

(c)    PRF

(d)    Genbank

(e)    Swissprot


18 - Which NCBI feature would you use if you wanted only RefSeq sequences?

(a)    The Limits field in Entrez, restricted to RefSeq sequences

(b)    In Entrez, limit to srcdb_refseq [PROP]

(c)    Search using the LocusLink resource

(d)    All of the above


Alignments, Gene Prediction, Advanced Topics

19 - What kind of alignment method is the Smith-Waterman method?

(a)    Hash-indexed

(b)    Global Alignment

(c)    Multiple-alignment

(d)    Local Alignment

(e)    Gibbs Sampling
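The Smith-Waterman dynamic-programming recurrence can be sketched in a few lines. Two details distinguish it from a Needleman-Wunsch style recurrence: every cell is floored at zero, and the traceback starts from the highest-scoring cell anywhere in the matrix and stops at the first zero. This sketch assumes a simple match/mismatch score and a linear gap penalty with arbitrary illustrative values; production tools use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, aligned_a, aligned_b) using the Smith-Waterman recurrence.

    Scoring values are illustrative assumptions; the gap penalty is linear.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Floor at zero so a poor prefix never drags down a good segment.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTAAA", "GGACGTGG"))
```

Only the shared ACGT segment is reported; the unrelated flanking residues of both sequences are left out of the alignment.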


20 - Which of the following methods cannot be used for gene prediction?

(a)    Hidden-Markov modeling

(b)    Regular expression searching.

(c)    Blast alignments

(d)    Codon Usage

(e)    Profile searching.


21 - SAGE uses 10-nt tags. How many unique 10-mers are there?

(a)    10e10

(b)    65536

(c)    16777216

(d)    1048576

(e)    1024
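Counting distinct k-mers is a product rule: each of the k positions can hold any letter of the alphabet independently. The sketch below computes the count and, for small k, enumerates the k-mers to confirm it; it uses smaller k values so the arithmetic for the question itself is left to the reader. The function names are hypothetical.

```python
from itertools import product


def count_kmers(k: int, alphabet: str = "ACGT") -> int:
    """Number of distinct strings of length k over the alphabet (|alphabet| ** k)."""
    return len(alphabet) ** k


def enumerate_kmers(k: int, alphabet: str = "ACGT") -> list[str]:
    """List every k-mer explicitly; only practical for small k."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


for k in (1, 2, 3):
    print(k, count_kmers(k), len(enumerate_kmers(k)))
```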


22 - Which method allows you to analyze the expression of a large number of genes, highlighting those that are differentially expressed?

(a)    SAGE

(b)    EST+UniGene+DDD

(c)    Affymetrix microarrays

(d)    2D gels

(e)    All of the above


23 - What is the frequency of polymorphism in a single diploid individual?

(a)    About 1 variation every 3 bases in the coding region.

(b)    About 1 variation per million bases

(c)    About 1 variation per 1000 bases

(d)    Polymorphism is possible at every base outside the coding region.

(e)    All of the above.


24 - I want to search a 5’ human EST against 5’ Drosophila ESTs. Which tool should I use?

(a)    Blastn

(b)    Blastx

(c)    PSI-blast

(d)    Tblastx

(e)    Megablast
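Several BLAST programs (blastx, tblastn, tblastx) work on conceptual translations of nucleotide sequences in all six reading frames: three on the given strand and three on its reverse complement. The sketch below shows what such a six-frame translation produces, assuming the standard genetic code (NCBI translation table 1) and mapping any codon containing a non-ACGT character to X. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons from position 0; '*' marks a stop codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(seq: str) -> dict[str, str]:
    """Return translations keyed +1..+3 (given strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frame_translations("ATGGCCTAA").items():
    print(frame, protein)
```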


25 - I am using blastn to align genomic DNA from yeast and humans. What am I most likely trying to do?

(a)    Find evidence of horizontally transferred genes.

(b)    Find evidence of yeast infection in the human patients.

(c)    Find human exons, since the introns will have evolved away.

(d)    Find sequencing contamination in human sequence.

(e)    All of the above


26 - For the previous query (yeast against human using blastn), which masking options should I choose?

(a)    NO filtering whatsoever

(b)    Low-complexity filtering AND human repeat filtering

(c)    Low-complexity filtering only.

(d)    Human repeat filtering only.

(e)    No filtering, but mask for lookup table only.


Problems

P1 - (5 pts, 1-10 minutes) Find one rodent genomic sequence with a length between 10040 and 10050 bases that contains at least one CDS feature. Give me the accession number.

Accession:_____________________
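Entrez queries like the ones these problems call for combine terms with field tags and Boolean operators. The sketch below builds an ESearch URL for NCBI E-utilities from a list of tagged terms. It makes no network request. The example uses a different organism, length range, and feature key from P1 so the problem is left to solve; the field tags [ORGN], [SLEN], and [FKEY] are assumed here to be accepted by the nucleotide database, so check the current Entrez help before relying on them. The helper name `build_esearch_url` is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, terms: list[str], retmax: int = 20) -> str:
    """Join field-tagged terms with AND and return an ESearch query URL (no request is sent)."""
    query = " AND ".join(terms)
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": query, "retmax": retmax})


url = build_esearch_url(
    "nuccore",
    [
        "Mus musculus[ORGN]",  # organism
        "500:600[SLEN]",       # sequence length range
        "mRNA[FKEY]",          # feature key present in the record
    ],
)
print(url)
```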

P2 - (10 pts, 5-30 minutes) After much analysis, your collaborators have determined that a gene involved in diabetes lies between the Genethon markers AFM242ZG5 and AFM266YB5. Of all the genes in the gene_seq map between those two markers, which gene is most likely to be involved in diabetes? (Don’t forget to set the display settings so you can see what you want: zoom in far enough, or change enough parameters, to see ALL the genes and markers, and use verbose mode.)

Gene name or defline:______________________________________________

Accession:_______________

P3 - (10 pts, 5-10 minutes, mostly waiting) BLAST the PAX6 isoform a protein, NP_000271, against nr. You should find a hit for PAX2a; if not, increase the number of definitions displayed.

a) What is the E-value of the PAX2 (PAX gene 2) hit? (It is near the bottom of the hit list.)

E-value:_____________
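As a reminder of what the reported number means, the E-value of a BLAST hit follows Karlin-Altschul statistics. This is the textbook form, not a recipe for reproducing NCBI's exact values, which use effective lengths and edge corrections. Here $m$ and $n$ are the query and database lengths, $S$ is the raw alignment score, $\lambda$ and $K$ are statistical parameters of the scoring system, and $S'$ is the bit score:

```latex
E = K\, m\, n\, e^{-\lambda S}
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2}
\qquad
E = m\, n\, 2^{-S'}
```

Substituting $S'$ into the last form gives back the first, since $2^{-S'} = K e^{-\lambda S}$. A hit with a lower score, such as a more distant paralog, therefore has an exponentially larger E-value.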

Use the FASTA sequence of NP_000271 (without the defline) to search PROSITE for patterns at the ExPASy website (use a search engine if you don’t remember the URL). Exclude patterns with a high probability of occurrence from the search.

b)      What are the two longest patterns you find in this sequence? (Give the PROSITE names.)

Pattern 1: __________________________

Pattern 2:___________________________

c)      Run PHI-BLAST with NP_000271 (without the defline) and the following PROSITE pattern (one of those you already found):

[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]

Do you see PAX2a? (If you see it, what is the E-value?)

Yes/No  (E-value =            )
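A PROSITE pattern like the one in part c) maps almost directly onto a regular expression: `x` is any residue, `[...]` is an allowed set, `{...}` is an excluded set, `(n)` or `(n,m)` is a repeat count, `-` separates positions, and `<` / `>` anchor the N- or C-terminus. The sketch below handles only those core elements, assuming uppercase residue letters and no terminus markers inside brackets (such as `[G>]`). It reports every start position, including overlapping matches, by wrapping the pattern in a lookahead. The function names are hypothetical.

```python
import re

# One PROSITE element: optional '<', a residue / x / [set] / {excluded set},
# an optional repeat count (n) or (n,m), and an optional '>'.
ELEMENT = re.compile(r"(<)?(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>)?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a (simplified) PROSITE pattern into a Python regular expression string."""
    out = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        start_anchor, core, lo, hi, end_anchor = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if lo and hi:
            core += "{" + lo + "," + hi + "}"
        elif lo:
            core += "{" + lo + "}"
        out.append(("^" if start_anchor else "") + core + ("$" if end_anchor else ""))
    return "".join(out)


def find_pattern(pattern: str, protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text), one match per start position, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]


paired_box = ("[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-"
              "[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]")
print(prosite_to_regex(paired_box))
print(find_pattern("C-x(2)-C", "ACAACGGC"))
```

Running `find_pattern` with the pattern above on the NP_000271 sequence is one way to check offline whether a PROSITE hit reported on the website matches the expected residues.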