Bioinfo Helpdesk & On-line Training

Quiz Questions File

Basic Molecular Biology Questions

1 - The non-coding regions of spliced mRNA are:

(a) CDS

(b) 3’UTR and 5’UTR

(d) Promoter region

(e) All of the above

2 - Possible binding sites for regulatory elements of a gene are:

(a) CDS

(b) 3’UTR and 5’UTR

(d) Downstream region of a gene

(e) All of the above

3 - A repeat element (such as Alu) can be present in which regions of DNA?

(a) CDS

(b) 3’UTR and 5’UTR

(d) Any DNA segment outside of a gene

(e) All of the above

4 - A low-complexity sequence can be found in which kind of biomolecule?

(a) Protein

(b) DNA

(d) EST

(e) All of the above

5 - Which element(s) is/are not part of the DNA sequence of a gene?

(a) Donor/acceptor sites of introns

(b) PolyA tail

(d) Promoter

(e) All of the above

6 - Where is a signal peptide located?

(a) The end of a protein

(b) The 3’UTR

(d) On the tRNA

(e) All of the above

7 - What is an Open Reading Frame?

(a) A segment that starts with a start codon and goes until the next stop codon.

(b) A partial coding region with no start or stop codon.

(d) A coding region with a gap in it.

(e) All of the above

Basic Web Questions

8 - Use a search engine to find the URL for the CENSOR program, which masks repeats. What is that URL?

9 - Use a search engine to find a large laboratory facility that specializes in hosting hundreds of mouse strains and is also heavily involved in sequencing mouse genes and the mouse genome. (Hint: it is located in coastal Maine.) What is the name of that laboratory?

Basic Entrez Questions

10 - How do you prevent words from being term-mapped in Entrez?

(a) Put them in parentheses

(b) Put a ‘+’ sign in front of each term.

(d) Put them within double quotes

(e) Add butnot pubref [PROP] to the query.

11 - How would you restrict a PubMed search to only publications with Sicotte as an author?

12 - What is a MeSH term?

(a) It’s a term used in the meshed index of Entrez

(b) A controlled vocabulary of keywords assigned manually by indexers: Medical Subject Headings.

(d) It’s a set of keywords that the submitter of an article assigns to that article.

(e) All of the above.

13 - What is UniGene?

(a) A curated collection of gene loci

(b) A database of the genomic location of genes.

(d) The BLAST database of all ESTs

(e) All of the above.

14 - Based simply on the format, which of the following accession numbers could be a RefSeq accession number for a protein? (It may not exist in the database.)

(a) NM_001241

(b) NP_001241

(d) AAA001241

(e) NT_001241

15 - Based simply on the format, which of the following accession numbers could be a Swiss-Prot protein? (It may not exist in the database.)

(a) A50517

(b) AAA01241

(d) A23D561

(e) NP_001241

16 - Based simply on the format, which is NOT a valid accession number in Entrez? (These sequences may not exist in the database, and locus names do not count as valid accession numbers, even though they are indexed in Entrez.)

(a) A23D561

(b) I12345

(d) R01241

(e) AAA26521

17 - Which database contains the source of protein structures?

(a) StructBase

(b) PDB

(d) GenBank

(e) Swiss-Prot

18 - Which feature of NCBI would you use if you wanted only RefSeq sequences?

(a) The Limits field in Entrez, restricted to RefSeq sequences

(b) In Entrez, limit to srcdb_refseq [PROP]

(d) All of the above

Alignments, Gene Prediction, Advanced Topics

19 - What kind of alignment method is the Smith-Waterman method?

(a) Hash-indexed

(b) Global Alignment

(d) Local Alignment

(e) Gibbs Sampling

20 - Which of the following methods cannot be used for gene prediction?

(a) Hidden-Markov modeling

(b) Regular expression searching.

(d) Codon Usage

(e) Profile searching.

21 - In SAGE, one uses 10-nt-long tags. How many unique 10-mers are there?

(a) 10e10

(b) 65536

(d) 1048576

(e) 1024

22 - Which method allows you to analyze the expression of a large number of genes, highlighting the ones that are differentially expressed?

(a) SAGE

(b) EST+UniGene+DDD

(d) 2D GELS

(e) All of the above

23 - What is the frequency of polymorphism in a single diploid individual?

(a) About 1 variation every 3 bases in the coding region.

(b) About 1 variation per million bases

(d) Polymorphism is possible at every base outside the coding region.

(e) All of the above.

24 - I want to search a 5’ human EST against 5’ Drosophila ESTs. Which tool should I use?

(a) Blastn

(b) Blastx

(d) Tblastx

(e) Megablast

25 - I am using blastn to align genomic DNA from yeast and humans. What am I most likely trying to do?

(a) Find evidence of horizontally transferred genes.

(b) Find evidence of yeast infection in the human patients.

(d) Find sequencing contamination in human sequence.

(e) All of the above

26 - When doing the previous query (yeast against human using blastn), which masking options should I choose?

(a) No filtering whatsoever

(b) Low-complexity filtering AND human repeat filtering

(d) Human repeat filtering only.

(e) No filtering, but mask for lookup table only.

Problems

P1 - (5 pts, 1-10 minutes) Find one rodent genomic sequence whose length is between 10040 and 10050 and which contains at least one CDS feature. Give the accession number.

Accession:_____________________

P2 - (10 pts, 5-30 minutes) After much analysis, your collaborators have determined that a gene involved in diabetes lies between Genethon markers AFM242ZG5 and AFM266YB5. Of all the genes on the gene_seq map between those two markers, which gene is most likely to be involved in diabetes? (Don’t forget to set the display settings to see what you want: zoom in enough, or change enough parameters, to see ALL the genes and markers, and use the verbose mode.)

Gene name or defline:______________________________________________

Accession:_______________

P3 - (10 pts, 5-10 minutes, mostly waiting) BLAST the protein for the PAX6 isoform a gene, NP_000271, against nr. You should find a hit for PAX2a; if not, increase the number of definitions to display.

a) What is the E-value of the PAX2 (pax gene 2) hit (near the bottom of the hit list)?

E-value:_____________

Use the FASTA sequence of NP_000271 (without the defline) to search for PROSITE patterns at the ExPASy website (use a search engine if you don’t remember the URL). Exclude patterns with a high probability of occurrence from the search.

b) What are the two longest patterns that you find in this sequence? (Give the PROSITE names.)

Pattern 1: __________________________

Pattern 2:___________________________
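
The "FASTA without the defline" asked for in this problem is simply the sequence lines with the leading ">" line removed. A minimal Python sketch of that step, assuming a single-record FASTA file; the function name and the example record are made up for illustration and are not NP_000271:

```python
def strip_defline(fasta_text: str) -> str:
    """Return the bare sequence of a single-record FASTA string."""
    sequence_lines = []
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the definition line, which starts with ">".
        if not line or line.startswith(">"):
            continue
        sequence_lines.append(line)
    return "".join(sequence_lines)


# Hypothetical placeholder record, used only to show the transformation.
example_record = """>placeholder_id made-up protein record
MKTAYIAKQR
QISFVKSHFS
"""

print(strip_defline(example_record))
```

The result is one unbroken string of residues, which is the form to paste into a sequence search box.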

c) Use PHI-BLAST with NP_000271 (without the defline) and the following PROSITE pattern (one of the ones you already found):

[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]

Do you see PAX2a? (If you see it, what is the E-value?)

Yes/No (E-value = _____________)
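
To read the PROSITE pattern in part c), it can help to translate it into an ordinary regular expression. The Python sketch below is only an illustration: it handles the elements that appear in this pattern (single residues, bracketed sets, x, and repeat counts) plus {…} exclusions, and it does not handle the < and > anchors. The function name and the test peptide are made up; the peptide is a synthetic string built to match, not a real PAX sequence.

```python
import re

# One PROSITE element: a residue, x, [allowed set] or {excluded set},
# optionally followed by a repeat count such as (2) or (2,4).
_ELEMENT = re.compile(
    r"^(?P<core>[A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})"
    r"(?:\((?P<lo>\d+)(?:,(?P<hi>\d+))?\))?$"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a dash-separated PROSITE pattern into a Python regex."""
    pieces = []
    for element in pattern.strip().split("-"):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core = match.group("core")
        if core == "x":
            core = "."  # x means any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {..} means any residue except these
        lo, hi = match.group("lo"), match.group("hi")
        if lo and hi:
            core += "{%s,%s}" % (lo, hi)
        elif lo:
            core += "{%s}" % lo
        pieces.append(core)
    return "".join(pieces)


PATTERN = (
    "[LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-"
    "[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW]"
)
regex = prosite_to_regex(PATTERN)
print(regex)

# Synthetic peptide constructed to satisfy the pattern (not a real protein).
toy_peptide = "MM" + "LAGGLGLGGGGLRLWFGNGGGGGR" + "KK"
hit = re.search(regex, toy_peptide)
print(hit.span() if hit else None)
```

Running the same regex over the defline-free NP_000271 sequence would show where the pattern occurs, which is the position PHI-BLAST uses to seed its search.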