Lab1

docx

School

Johns Hopkins University *

*We aren’t endorsed by this school

Course

633

Subject

Biology

Date

Jan 9, 2024

Type

docx

Pages

10

Uploaded by murphydanyael

Intro to Bioinformatics Lab 1
Name: Danyael Murphy

1. Search the MeSH database with “Dwarfin-C.”
a. (0.2) What is the preferred term on the MeSH record that appears? Smad5 Protein

Click the subheading “genetics” and add to search builder to search PubMed. You should find a 2022 paper by Y. Ning and others.
b. (0.1) Which two genes in the title belong to the mothers against decapentaplegic protein family? SMAD1 and SMAD5

Goal: Use the revision history for cattle SMAD5 to answer some questions.
Resources:
The NCBI Gene database
The RefSeq protein record for cattle SMAD5
Revision History for an NCBI record
The NCBI Nucleotide database
Hint: Use the transcript table in the gene database to get the accession numbers for the transcript and the protein.
c. (0.1) At what numerical position of the protein is the C-terminal Mad Homology 2 (MH2) domain? 265..465
d. (0.1) What is the accession number with version of the mRNA that codes for this protein? NM_001077107.4
e. (0.2) When did version 4 of the mRNA replace version 3? May 30, 2018
f. (0.2) List the sequence length of both version 2 and version 3. 6870, 6869
g. (0.2) In the Gene database linked to this record, list the Official Symbol and the Official Full Name. SMAD5, SMAD family member 5
2. Find a 2010 paper from the Journal of Biological Chemistry whose first author is Maxim V. Gerashchenko. Open the abstract.
a. (0.1) The authors suggest that an unusual CUG codon is capable of initiating translation in the mouse TGR RNA. What start codon usually initiates translation? AUG
b. (0.1) The authors speculate that the inefficient CUG start codon allows for what phenomenon involving the protein product(s)? Generation of protein isoforms

Goal: Find four protein isoforms of TGR in mouse (Mus musculus) to answer questions and complete the table below.
Given:
1. TGR is also known as Txnrd3 or TR2. That should help you find the correct record in the NCBI Gene database.
2. The longer mRNA contains two extra coding exons.
Resource: All relevant information for the table should be in the appropriate NCBI Gene record. Information about the isoforms is in the section of that record titled “mRNA and proteins.”
c. (0.1) The SECIS element allows for the insertion of what amino acid for a UGA codon? Selenocysteine
d. Open the mRNA record for isoform 1 (TGR-L). What is the EC number of TGR? EC 1.8.1.9
e. (0.1) The beginning of the CDS is the beginning of the start codon. What codon is in that position in the GenBank record? CTG
f. (0.1) The beginning of the CDS is the beginning of the start codon. What codon is in that position in the GenBank record? CTG
g. (0.4) Complete the table below.

Isoform   NM               NP               Start codon   2 internal coding exons
1         NM_001178058     NP_001171529     CTG           includes those extra exons
2         NM_153162.3      NP_694802.2      ATG           includes
3         NM_001178059     NP_001171530.1   CTG           missing those extra exons
4         NM_001178060.1   NP_001171531.1   ATG           missing
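GenBank shows mRNA sequences in DNA letters, so the CUG start codon discussed in the abstract appears as CTG at the first position of the CDS. The sketch below illustrates that lookup and conversion. The sequence is a made-up toy string, not the Txnrd3 record, and the function names are hypothetical helpers, not part of any NCBI tool.

```python
def start_codon(sequence: str, cds_start: int) -> str:
    """Return the three letters at the start of the CDS.

    cds_start is 1-based, as in a GenBank CDS location such as 5..13.
    """
    i = cds_start - 1  # convert to Python's 0-based indexing
    return sequence[i:i + 3].upper()


def as_rna(codon: str) -> str:
    """Rewrite a DNA-letter codon in RNA letters (T becomes U)."""
    return codon.upper().replace("T", "U")


# Hypothetical toy sequence with a CDS assumed to begin at position 5.
toy_sequence = "ggcactggcgtaa"
codon = start_codon(toy_sequence, 5)
print(codon, as_rna(codon))
```

The same helper applied to a record whose CDS begins with ATG would report AUG in RNA letters, the usual start codon.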
3. Goal: Enter a series of searches in the NCBI Nucleotide database and then use Boolean operators to accomplish certain search goals.
Given: Three searches:
“factor A domain” (filtered for RefSeq only) (use the quotes)
rodents [orgn] (no filters)
mouse [orgn] (no filters)
Resource: NCBI Nucleotide, Advanced Search Builder
a. (0.2) List the number of records for all three searches.

Search                                          # of records
“factor A domain” (filtered for RefSeq only)    9796
rodents [orgn] (no filters)                     20819962
mouse [orgn] (no filters)                       11244658

b. (0.1) What does using quotes in the protein 5A search accomplish? It narrows the results: the words must appear as an exact phrase.
c. (0.1) Describe in detail a search strategy for rodent records that meet the “factor A domain” criteria but are NOT from mouse. "factor A domain" AND rodents[orgn] NOT mouse[orgn]
d. (0.1) How many non-mouse rodent records for “factor A domain” are there in RefSeq? 820
e. (0.1) RefSeq sequences that have PREDICTED in the description have accession numbers that start with what letter? XM
f. (0.1) Use the filters at the upper right to limit to Rattus norvegicus. Then filter for mRNA. How many results? 48
g. (0.1) Click the record (not marked PREDICTED) with “5A” (not “5A-like”) in the title. What is the CDS location on this mRNA? 115..2505
h. (0.1) Follow the protein link. What are the numerical locations of the metal ion-dependent adhesion? 287, 289, 291, 367, 394
i. (0.1) What is the “human homolg” (“homolog” is misspelled once in the record) described as? Gene A

4. Search by description for the enzyme commission entry for “mixed linkage beta-glucanase.” https://enzyme.expasy.org/
a. (0.1) List the accepted name of the enzyme and the EC number for the first listed match. Licheninase, EC 3.2.1.73

Goal: Perform a filtered search of the NCBI protein database based on an EC identifier.
Given: The EC number you found in your previous search.
Resource: The NCBI protein database.
b. (0.3) Search NCBI Protein using the EC number, limit your search to proteins from bacteria ONLY and also limit your output to records in the Swiss-Prot database. Write out (describe) your complete search strategy. Include how you limited to Swiss-Prot. 3.2.1.73[ECNO] AND (bacteria[filter] AND swissprot[filter])
c. (0.1) How many records in the bacterial search above? 12
d. (0.1) Use the filters to limit your results to records from Acetivibrio thermocellus. You should have two records. List the protein lengths for both. 334
e. (0.1) The two proteins differ only at amino acid 149. Find the amino acid at that position for each (one-letter code is fine). V, A
f. (0.1) In the comment section of one record, it states that this protein may form part of a multienzyme complex. What is the name of that complex? Cellulosome
g. (0.1) List the numerical location of the Type I dockerin repeat domain. 271
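The fielded searches in questions 3 and 4 follow one pattern: each term carries a field tag in square brackets, and the clauses are joined with Boolean operators. As a rough illustration, the sketch below assembles the two query strings from the answers above. The helper names are hypothetical, and the code only builds text to paste into the Advanced Search Builder; it does not contact NCBI.

```python
def fielded(term: str, field: str) -> str:
    """Attach a search field tag to a term, e.g. 3.2.1.73[ECNO]."""
    return f"{term}[{field}]"


def all_of(*clauses: str) -> str:
    """Join clauses with AND inside one pair of parentheses."""
    return "(" + " AND ".join(clauses) + ")"


# Question 4b: EC number, bacteria only, Swiss-Prot only.
ec_query = (
    fielded("3.2.1.73", "ECNO")
    + " AND "
    + all_of(fielded("bacteria", "filter"), fielded("swissprot", "filter"))
)

# Question 3c: exact phrase, rodents, but not mouse.
rodent_query = (
    '"factor A domain" AND '
    + fielded("rodents", "orgn")
    + " NOT "
    + fielded("mouse", "orgn")
)

print(ec_query)
print(rodent_query)
```

Building the string from small pieces makes it easier to swap one clause, for example a different organism, without retyping the brackets and quotes.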
5. Find mouse proteins (proteins from mouse) in NCBI with a molecular weight between 55,250 and 55,260 daltons.
a. (0.2) Write out the search strategy in detail. ("Mus"[Organism] OR "Mus musculus"[Organism] OR mouse[All Fields]) AND (000055250[MOLWT] : 000055260[MOLWT])
b. (0.1) How many results? 223
c. (0.1) Open the L-amino acid oxidase precursor record. How many amino acids? 523

Goal: Find the amino oxidase region and examine the revision history for changes to the protein sequence.
Given: Your previous search should have given you a protein from Mus musculus.
Resource: The NCBI protein record that you found and its revision history.
d. (0.1) Looking at the title of the 2021 reference, L-amino acid oxidase 1 is associated with what process in male mice and bulls? Reproductive performance
e. (0.1) What is the numerical location of the amino-oxidase region? 67..508
f. (0.1) When was the accession number first seen at NCBI? AK052781.1
g. (0.1) When did version 3 replace version 2? April 25, 2009
h. (0.2) The amino acid sequence appears to have changed at position 771. What was the change?

6. Find a paper in Journal of Biological Chemistry (www.jbc.org) in the May 2022 issue by Mark F. Mabanglo and Walid A. Houry.
a. (0.1) What highly conserved serine protease is the focus of this study?
b. (0.2) List two ATPase hexamers that form larger complexes with ClpP.

Goal: Search the NCBI protein database for more information on the protease from part a.
Given: You are looking for the protein in Escherichia coli O157:H7 str. Sakai.
Resources: NCBI Gene Database (good starting point), NCBI Protein Database (RefSeq)
c. (0.1) What is the gene description from the gene database?
d. (0.2) List the protein accession number (NP).
e. (0.2) What is the numerical location and amino acid of the active site?
f. (0.2) Click the identical proteins link. What organism other than E. coli shows up in the table?

7. Using the nucleotide database, find all sequences that are between 3200 and 3300 base pairs in length that are from Drosophila melanogaster and have the word “homeobox” in the title.
a. (0.3) Write your search strategy.
b. (0.1) Open the record for the retinal homeobox protein. What consortium is listed on the first reference?
c. (0.1) On what chromosome is this gene?
d. (0.1) Follow the protein link. How many amino acids?
e. (0.2) What are the numerical locations of the OAR domain?
f. (0.2) Where is the homeobox domain?

8. Open the NCBI nucleotide record with accession NM_004323.1.
a. (0.05) How many base pairs?
b. (0.1) What is the modification date (top line)?
c. (0.1) At what nucleotide location is the CDS?
d. (0.1) At the top, it states that this record has been updated. Click on the current version. What is the version number?
e. (0.05) What is the CDS location?
f. (0.1) Click the protein link. How many amino acids?
g. (0.1) Follow the PubMed link (not RefSeq or Weighted). Open the 2022 abstract by Kilbas et al. Overexpression of Bag-1 decreases the efficiency of what drugs?
h. (0.2) Design a search for the human Bag1 protein at NCBI. Bag1 should be limited to the title field and ONLY human records should appear. Write out your exact search strategy below.
i. (0.1) Open the result for the protein variant. What is the numerical location of the ubiquitin-like domain?
j. (0.1) The ubiquitin-like domain begins with what amino acid (name the amino acid, not the number)?

9. Find a paper in PubMed authored by N. Fa-Hui.
a. (0.1) What is the publication date?
b. (0.1) For what gene did they identify a transcript variant?
c. (0.1) What amino acids are in the new transcript variant that are not found in isoform a?
d. (0.1) What tissues highly express this isoform?

Goal: Examine authors’ implication that this “novel transcript” was a new discovery.
Resources: Links to the Gene database from PubMed and links from there to various protein records, OMIM database
Hint: Examine the section of the gene record marked mRNA and Proteins.
e. (0.1) From the gene record, what is the official full name?
f. (0.1) Which isoform has six more amino acids than isoform a?
g. (0.1) From the protein record’s revision histories, when was the longer isoform first seen at NCBI?
h. (0.2) The extra six amino acids in this isoform are SFPLKQ. Knowing that, was the “novel” transcript isolated by Fa-Hui et al. in 2008 really a new discovery? Briefly explain.
i. (0.1) From OMIM: According to Marion et al., what might be the dual role of this protein?

10. Go to the MeSH database and enter “Pecten Oculi.”
a. (0.1) What is the title of the record that comes up?
b. (0.1) Click the checkbox by metabolism and then add to the search builder and search PubMed. How many results?
c. (0.2) Add “wu [1au]” to the search window and find a March 2014 paper from those results. From what journal?
d. (0.2) Follow the Protein (RefSeq) link. You should see four records. List each protein length.

Goal: Determine why RefSeq, a non-redundant database, has two records for each protein isoform.
Given:
1. The four records represent alternative splice products from the same gene.
2. There are two records for protein isoform 1. Each is identical in amino acid sequence.
3. Likewise, there are two records for protein isoform 2, each identical in amino acid sequence.
4. Each of the four protein sequences has a corresponding mRNA record.
Resources: NCBI Protein database, NCBI Nucleotide database, RefSeq
e. (0.2) Record the accession number and sequence length of each of the four corresponding mRNA sequence records.

Accession   Sequence length

f. (0.2) Looking only at isoform 1, if the mRNAs are different but the coding regions are identical, then in what regions of the mRNA are the differences between those two mRNA variants?

AN#      gender     chromosomes   autosome pairs   sex chromosomes   organism    taxid
MM1302   male       7             2                X,Y1,Y2           muntjac     9888
MM1303   female     6             2                X,X               muntjac     9888
CA123    none       14            7                none              yeast       5476
CF13     male       78            38               X,Y               dog         9615
CF14     female     78            38               X,X               dog         9615
HS1      male       46            22               X,Y               human       9606
HS2      female     46            22               X,X               human       9606
GG236    male       18            8                Z,Z               chicken     9031
GG237    female     18            8                Z,W               chicken     9031
DB478    male       134           66               X,Y               rhino       9805
DB479    female     134           66               X,X               rhino       9805
CE13     hermaph.   12            5                X,X               roundworm   6239
CE14     male       11            5                X                 roundworm   6239

taxid   species                  common name        domain      kingdom   class
5476    Candida albicans         yeast              eukaryota   fungi     saccharomycetes
6239    Caenorhabditis elegans   roundworm          eukaryota   metazoa   chromadorea
9031    Gallus gallus            chicken            eukaryota   metazoa   aves
9606    Homo sapiens             human              eukaryota   metazoa   mammalia
9615    Canis familiaris         dog                eukaryota   metazoa   mammalia
9805    Diceros bicornis         black rhinoceros   eukaryota   metazoa   mammalia
9888    Muntiacus muntjak        Indian muntjac     eukaryota   metazoa   mammalia

11. Use the tables above to answer the following questions:
a. (0.2) Because the databases are linked, what is the name for this kind of database? Relational
b. (0.1) What is a primary key for the top database? AN#
c. (0.1) What is a secondary key for the top database? taxid
d. (0.2) The black rhinoceros and the Indian muntjac represent the extremes in chromosome number for mammals. What is a muntjac (use the web)? A barking deer, also called a rib-faced deer
e. (0.1) Why do male and female muntjacs have different numbers of chromosomes (not looking for the mechanism, just a simple explanation from viewing the above table)? Males have three sex chromosomes (X, Y1, Y2); females have two (X, X).
f. (0.1) Why do male and hermaphrodite roundworms have different numbers (again, a simple explanation)? Hermaphrodites have two sex chromosomes (X, X); males have one (X).
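The two tables in question 11 are linked through the shared taxid column, which is what lets a query on one table pull details from the other. The sketch below loads a few of the rows into an in-memory SQLite database and joins them. The table and column names are illustrative choices, and SQL's term for the linking column is "foreign key", where the lab says "secondary key".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE taxonomy (
        taxid INTEGER PRIMARY KEY,
        species TEXT,
        common_name TEXT
    );
    CREATE TABLE karyotype (
        an TEXT PRIMARY KEY,          -- the AN# column
        gender TEXT,
        chromosomes INTEGER,
        taxid INTEGER REFERENCES taxonomy(taxid)
    );
    """
)

# A subset of the rows from the two tables above.
conn.executemany(
    "INSERT INTO taxonomy VALUES (?, ?, ?)",
    [
        (9888, "Muntiacus muntjak", "Indian muntjac"),
        (9805, "Diceros bicornis", "black rhinoceros"),
    ],
)
conn.executemany(
    "INSERT INTO karyotype VALUES (?, ?, ?, ?)",
    [
        ("MM1302", "male", 7, 9888),
        ("MM1303", "female", 6, 9888),
        ("DB478", "male", 134, 9805),
    ],
)

# Follow taxid from the karyotype table to the species name.
rows = conn.execute(
    """
    SELECT k.an, k.chromosomes, t.species
    FROM karyotype AS k
    JOIN taxonomy AS t ON t.taxid = k.taxid
    WHERE t.common_name = 'Indian muntjac'
    ORDER BY k.an
    """
).fetchall()
print(rows)
```

Storing the species name once in the taxonomy table and referring to it by taxid avoids repeating it on every specimen row.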