FIOCRUZ

Unless otherwise indicated, use the Bioinformatics resources at http://www.ncbi.nlm.nih.gov/ .

In accordance with the ever-changing nature of Bioinformatics resources, you are expected to find and read the help documentation to figure out on your own how to execute a given query.

Here is a quick guide to the peculiarities of the Entrez system. Read it carefully and refer to it throughout the lab.

When using the Entrez/Pubmed databases, you can enter 3 types of terms.

        An unqualified term is any simple word or combination of words not restricted to a particular field.

        e.g. baltimore

          Unqualified terms may be remapped to other terms in the database and will not necessarily search all the fields in the database.

        A qualified term is a term restricted to a search field.

                        e.g. baltimore [AUTH]

        In Pubmed, both qualified and unqualified terms may be expanded. As of today, the only qualified terms that may be remapped in Pubmed are those restricted to [All Fields].

        Both qualified and unqualified terms may be forced to remain uninterpreted.

        This is achieved by enclosing the search term in quotes.

                        e.g. "baltimore" [All Fields]        or         "baltimore"
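The three kinds of terms above can be written out with a small helper. This is only a sketch of the query syntax: the function name entrez_term and its parameters are made up for illustration and are not part of any NCBI tool.

```python
def entrez_term(term, field=None, literal=False):
    """Build one Entrez-style search term as a string.

    term:    the search word(s), e.g. "baltimore"
    field:   optional field tag such as "AUTH"; None gives an unqualified term
    literal: if True, wrap the term in double quotes so it stays uninterpreted
    """
    text = f'"{term}"' if literal else term
    return f"{text} [{field}]" if field else text


print(entrez_term("baltimore"))                                    # unqualified
print(entrez_term("baltimore", field="AUTH"))                      # qualified
print(entrez_term("baltimore", field="All Fields", literal=True))  # quoted, qualified
print(entrez_term("baltimore", literal=True))                      # quoted, unqualified
```

The strings it prints are the same ones you would type into the search box by hand.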

Some of the most commonly used search fields are Author [AUTH], Organism [ORGN], Title word [TITL], and Text word [TEXT]. Remember that you can restrict a search to a field because Entrez creates a separate index for each indexed field of a record (thus one record is pointed to by many different indexes).
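To picture what "a separate index for each field" means, here is a toy sketch. The two records and the function build_field_indexes are invented for illustration; the real Entrez indexes are of course built differently and are far larger.

```python
from collections import defaultdict

# Hypothetical toy records; the field names mirror the tags used in the text.
records = {
    1: {"AUTH": "baltimore d", "TITL": "viral rna polymerase"},
    2: {"AUTH": "varmus h", "TITL": "retroviruses in baltimore"},
}


def build_field_indexes(records):
    """Return {field: {word: set of record ids}}, i.e. one index per field."""
    indexes = defaultdict(lambda: defaultdict(set))
    for rec_id, fields in records.items():
        for field, value in fields.items():
            for word in value.split():
                indexes[field][word].add(rec_id)
    return indexes


indexes = build_field_indexes(records)
print(sorted(indexes["AUTH"]["baltimore"]))  # records with the word in the author field
print(sorted(indexes["TITL"]["baltimore"]))  # records with the word in the title field
```

The same word leads to different records depending on which field index is consulted, which is why restricting a term to a field changes the result.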

In Pubmed, the [All Fields] search field is treated differently from other fields (an unqualified term is the same as a term qualified with [All Fields]). When the term is interpreted (no double quotes), it does NOT in fact map to all the fields in the database! Rather, the term is mapped to [Text Words], [MeSH] terms, and substance names, and if the term "looks" like an author name, it is mapped to the author field as well. Pubmed thinks a term "looks" like an author name if it is one long word (more than 2 letters) followed by one or two letters. For Pubmed, 'baltimore d' looks like an author, but 'baltimore' does not (both without quotes).
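The "looks like an author" rule can be approximated with a regular expression. This is a sketch of the rule exactly as stated above (a word of more than 2 letters, then one or two letters); the pattern and the function name looks_like_author are assumptions, and the actual Pubmed test may differ in detail.

```python
import re

# Assumed pattern: a word of at least 3 letters, one space, then 1 or 2 letters.
AUTHOR_LIKE = re.compile(r"[A-Za-z]{3,} [A-Za-z]{1,2}")


def looks_like_author(term):
    """Return True if the whole term has the 'surname + initials' shape."""
    return AUTHOR_LIKE.fullmatch(term.strip()) is not None


for term in ["baltimore d", "baltimore dv", "baltimore"]:
    print(term, looks_like_author(term))
```

With this pattern 'baltimore d' and 'baltimore dv' are author-like, while 'baltimore' alone is not.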

Without quotes, the [AUTH] field does substring matching. E.g. if you type (without quotes) 'baltimore' [AUTH], it will match all indexed entries for 'baltimore', 'baltimore d', 'baltimore dv', etc. On the other hand, when you use quotes, as in "baltimore" [AUTH], you had better be sure that this is the way the author's name is always written, since the query will not match entries with 'baltimore d' as author.

URL stands for Uniform Resource Locator, also known in the vernacular as an internet address.
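The difference between the quoted and unquoted author search can be mimicked on a small list of names. This sketch assumes that the unquoted form matches the bare surname plus any entry that continues with a space and initials; the list author_index and the function match_authors are hypothetical.

```python
# Hypothetical author index entries, in the "surname initials" form used above.
author_index = ["baltimore", "baltimore d", "baltimore dv", "bishop jm"]


def match_authors(term, index, quoted=False):
    """Return the index entries that a given [AUTH] term would match.

    quoted=True  : exact match only, as with "baltimore" [AUTH]
    quoted=False : the surname alone or followed by initials (assumed behaviour)
    """
    if quoted:
        return [name for name in index if name == term]
    return [name for name in index
            if name == term or name.startswith(term + " ")]


print(match_authors("baltimore", author_index))               # surname plus initials variants
print(match_authors("baltimore", author_index, quoted=True))  # exact entry only
```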

e.g. http://www.infoseek.com/ , http://www.yahoo.com

To look at the list of available terms, you can click on the Preview/Index button (below the search box) in the new Entrez system.

To restrict by molecule type or database type (or some other field), you can use the Limits feature. It is activated by clicking on the Limits button, below the search box in the new Entrez system. Do not forget to turn it off (by unchecking the check-box next to "index") when you are done with it.

The History button lists all the queries you have done within one database. You can use the Boolean operators AND, OR, and NOT to combine previous searches by using the reference number on the left of the history. E.g. if you have previously done two searches, you can type #1 AND #2 as a new search. For example, you could combine a simple search on "kinase" [TITL] (#1) with a second, more complicated search restricted (using limits) to the organism worm and to messenger RNA molecules (#2).
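Combining history entries behaves like set arithmetic on the lists of records each search returned. The sketch below uses made-up record numbers and a hypothetical combine function; it only illustrates the Boolean logic, not how Entrez stores its history.

```python
# Hypothetical result sets keyed by history number; the record ids are made up.
history = {
    1: {101, 102, 103, 104},  # e.g. a search on "kinase" [TITL]
    2: {103, 104, 105},       # e.g. a second search restricted with limits
}


def combine(history, left, op, right):
    """Combine two earlier searches, as in '#1 AND #2'."""
    a, b = history[left], history[right]
    if op == "AND":
        return a & b  # records found by both searches
    if op == "OR":
        return a | b  # records found by either search
    if op == "NOT":
        return a - b  # records found by the first search but not the second
    raise ValueError(f"unknown operator: {op}")


print(sorted(combine(history, 1, "AND", 2)))
print(sorted(combine(history, 1, "OR", 2)))
print(sorted(combine(history, 1, "NOT", 2)))
```

Note that combining a set with itself using NOT always gives the empty set, whatever the search was.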

1) For a Boolean query A NOT A, one expects no results.

                   Thus the query "Varmus" NOT "Varmus" yields no results.

     On the other hand, execute the following query:

                "Varmus" NOT "varmus" [AUTH]

-      Given that you obtained some articles (instead of nothing), which fields are searched (or which ones are not) with this query?

-      What types of articles/publications are most of them (review, news, refereed papers, conference proceedings, clinical studies)?


2)

a) How many articles are there for (don't search with the ":")

                bishop [All Fields]                    

                "bishop" [All Fields]

b) You should have gotten different numbers for the last two queries. Which fields are searched in those two types of queries? (Hint: try using the Details button, which tells you how the system interpreted your query.)

3)

a) You are searching for a document containing the nucleotide sequence encoding bovine serum albumin. Suppose you want to use the word albumin for your search. In which field are the matches most significant? Rank the following fields in order of significance (for searching with the word albumin) or say if they are irrelevant:

Title field

Text field

Organism field

Author field

b) What is the accession number of the Swiss-Prot record for the bovine serum albumin protein (or its precursor protein)? (Swiss-Prot records are well-annotated protein records from the Swiss-Prot database.)

c) What search would you formulate to get only one record as a result (not knowing the accession!)?

(Hint: use the limit filter to limit the search to sequences only from the SWISS-PROT database. Use the Details button to see how you would type this in.)

4) [15pts: 5,5,5]

a) Search the NCBI LocusLink resource for the human gene for alcohol dehydrogenase 1.

(What is the URL that retrieves the LocusLink report for the alcohol dehydrogenase 1 gene?)

b) Once you have found this report, follow (click on) the link of the E.C. number (1.1.1.1) to the Enzyme Classification database entry at the ExPASy web server. Look through the record and follow the link to the Kyoto database. At the KEGG database, click on the map for the Glycolysis pathway. (What is the URL of the map?)

c) From this map, switch to the view of the pathways for human. The enzymes present in human are highlighted in green. Find out which enzyme, degrading the product of alcohol dehydrogenase in yeast (Saccharomyces cerevisiae), is missing in humans.

5) [note: b & c of this question are particularly hard.]

a) By searching one of the various resources (Pubmed, Entrez, OMIM), figure out what generic type of protein is encoded by the human CD45 gene (e.g. enzyme, transcription factor, protein kinase, DNA-binding protein, ...).

b) How many exons does the complete gene have? (Be careful if using sequence records! A segment can be more than one exon. Read the annotation carefully!)

c) How many exons code for the mature peptide? If you used Entrez nucleotide, tell me which accession number you are referring to. If you used Pubmed, then tell me which paper gives you the answer.

6)

a) The MYC gene contains a DNA-binding motif. What is the name of this DNA-binding motif? (Hint: look at the annotation in a nucleotide record or at some Pubmed abstract.)

b) The Leucine Zipper is a motif rich in leucine that allows two proteins to form a strong link with each other, i.e. to dimerize. According to the Swiss-Prot entry for the human MYC gene, at which end is the Leucine Zipper located?

c) Find two articles that deal with gene regulation in MYC using antisense oligonucleotides. Write down their titles and their Pubmed IDs.

7) [10pts: 3,3,4]

You are interviewing tomorrow at Celera Genomics. Use any search engine to find their website.

a) What is their URL?

b) The scientific founder, Craig Venter, was made famous by his first shotgun sequencing of the H. influenzae genome. Search for that bacterial genome in the Genome division of Entrez. What is the accession number of the genome?

c) Using one of the pre-made views of the RNA genes, what is the position of the tRNA gene that codes for cysteine?