Graded HW 4

School

Johns Hopkins University

Course

633

Subject

Biology

Date

Jan 9, 2024

Uploaded by murphydanyael

Name: Danyael Murphy
Modules 7-8: Graded Homework 4

1. (2 points) Compare PSI-BLAST and JACKHMMER using the chicken TIA-1 protein. In both cases, use the Swiss-Prot database, limited to Birds. Use a PSI-BLAST cutoff of 0.0001.

>chicken protein TIA-1
MEDEMPKTLYVGNLSRDVTEALILQLFSQIGPCKNCKMIMDTAGNDPYCFVEFYEHRHAASALAAMNGRK
IMGKEVKVNWATTPSSQKKDTSNHFHVFVGDLSPEITTEDIKAAFAPFGRISDARVVKDMATGKSKGYGF
VSFFNKWDAENAIQQMGGQWLGGRQIRTNWATRKPPAPKSTYESNTKQLSYDDVVNQSSPSNCTVYCGGV
TSGLTEQLMRQTFSPFGQIMEIRVFPDKGYSFVRFNSHESAAHAIVSVNGTTIEGHVVKCYWGKETPDMI
SPVQQNQIGYPQAYGQWGQWYGNAQLGQYVPNGWQVPAYGMYGQAWNQQGFNQTQSSAAWMGANYSVQPP
QGQNGSVLTNQAGYRVAGFETQ

Iteration          PSI-BLAST   JACKHMMER
First iteration    16          14
Second iteration   22          22

a. (0.8 points) Fill in the table above with the number of matches above threshold for both PSI-BLAST (0.0001 cutoff) and JACKHMMER (default settings).

b. (0.4) Compare the new matches on the second iteration using both methods. Which proteins are also found on iteration 2 using the other method?

Insulin-like growth factor 2 mRNA-binding protein 3
Insulin-like growth factor 2 mRNA-binding protein 1
RNA-binding protein with multiple splicing 2
Histone-lysine N-methyltransferase SETD1B
THO complex subunit 4
Cleavage and polyadenylation specificity factor subunit 6

c. (0.4) Identify and list any protein found using one method that was NOT found using the other method.

Pre-mRNA-splicing factor RBM22
Epithelial splicing regulatory protein 2
Rad52 motif-containing protein 1
Meiosis regulator and mRNA stability factor 1
d. (0.4) For proteins found using only one of the methods, how would you determine if that protein was "correctly" matched?

Check the E-value of the hit and run an individual pairwise alignment of that protein against the query sequence.

2. (1) Run PHI-BLAST with the cow clusterin sequence as a query. Search the Swiss-Prot database, no organism limit. Set the PSI-BLAST iteration threshold to 0.0001. The pattern for clusterin to use in PHI-BLAST is C-L-[RK]-M-[RK]-x-[EQ]-C-[ED]-K-C.

>Cow Clusterin
MKTLLLLMGLLLSWESGWAISDKELQEMSTEGSKYVNKEIKNALKEVKQIKTQIEQTNEE
RKLLLSSLEEAKKKKEDALNDTRDSENKLKASQGVCNETMTALWEECKPCLKQTCMKFYA
RVCRSGSGLVGHQLEEFLNQSSPFYFWINGDRIDSLMENDREQSHVMDVMEDSFTRASSI
MDELFQDRFFLRRPQDTQYYSPFSSFPRGSLFFNPKSRFARNVMPFPLLEPFNFHDVFQP
FYDMIHQAQQAMDAHLQRTPYHFPTMEFTENNDRTVCKEIRHNSTGCLRMKDQCEKCQEI
LEVDCSASNPTQTLLRQQLNASLQLAEKFSRLYDQLLQSYQQKMLNTSALLKQLNEQFTW
VSQLANLTQSDDQHYLQVFTVNSHNSDPSIPSGLTKVIVKLFNSFPITVTVPQEVSSPNF
MENVAEKALQQYRRKSQEE

a. (0.4) List each organism matched in the first iteration.

Bos taurus
Sus scrofa
Equus caballus
Canis lupus familiaris
Homo sapiens
Rattus norvegicus
Mus musculus
Mesocricetus auratus

b. (0.4) List the total number of sequences at each iteration.

Iteration 1: 9
Iteration 2: 16
c. (0.2) Why might some of those sequences that are new in iteration 2 not have shown up in iteration 1?

Each iteration rebuilds the position-specific scoring matrix from the previous round's hits, so specific sites are given more or less weight and conservation at those sites counts more or less when choosing hits. As a result, more distant homologs that fell below the threshold in iteration 1 can be found in iteration 2, increasing the number of hits.

3. (1) Take the FASTA formatted sequence set below. Discussion topic: Briefly describe anything you can find about this protein family. Are there any published multiple alignments?

>human_CDGSH_iron-sulfur_domain-containing_protein_3,_mitochondrial_precursor
MRGAGAILRPAARGARDLNPRRDISSWLAQWFPRTPARSVVALKTPIKVELVAGKTYRWCVCGRSKKQPF
CDGSHFFQRTGLSPLKFKAQETRMVALCTCKATQRPPYCDGTHRSERVQKAEVGSPL
>chimpanzee_PREDICTED:_similar_to_conserved_hypothetical_protein
MRGVGAILRPAARGARDLNPRRDISSWLARWFPRTPARSVVALKTPIKVELVAGKTYRWCVCGRSKKQPF
CDGSHFFQRTGLSPLKFKAQETRMVALCTCKATQRPPYCDGTHRSERVQKAEVGSPL
>dog_PREDICTED:_CDGSH_iron-sulfur_domain-containing_protein_3,_mitochondrial-like
MSGVGAAVRPAAWRLSQRRDLSSWLSRWFPKTPAKSVVALKAPIKVELVAGKTYRWCVCGRSKKQPFCDG
SHFFQRTGLSPLKFKAQETRVMALCTCKATQKPPYCDGTHRSERVQKAELGSPL
>cow_PREDICTED:_similar_to_hCG31406
MGSVRGVGTLVRPAAWTLNQRRDITSWLARWFPKTPAKSVVALKTPIKVELVAGKTYRWCVCGRSKKQPF
CDGSHFFKRTGLSPLKFKAQETRTVALCTCKATQKPPYCDGTHRSEQVQKAELGSSL
>mouse_CDGSH_iron-sulfur_domain-containing_protein_3,_mitochondrial_precursor
MGFRRLSFPTDFIFLFPNHICLPALSKPYQRREISSWLARWFPKDPAKPVVAQKTPIRLELVAGKTYRWC
VCGRSKNQPFCDGSHFFQRTGLSPLKFKAQETRTVALCTCKATQRPPYCDGTHKSEQVQKAEVGSPL
>rat_CDGSH_iron_sulfur_domain-containing_protein_3,_mitochondrial
MGFRCLPFSTDRIFLFPNHICLSALTKLCQRREISSWLARWFPKDPAKPVVAQKTPIRLELVAGKTYRWC
VCGRSKNQSSLCRSLSLSFWRTLPHFPVLCGWDEKPSSEGLLCGLWTALLRWLPLLPAHWPFPTQVQGRR
DTHSGPLYLQGHSAAPLL

Run Clustal Omega and MUSCLE on the sequences. Both programs should be able to align a long stretch of identical amino acids across all sequences.

a. (0.3) How long is that region with 100% identity?

ELVAGKTYRWCVCGRSK
17 residues

b. (0.2) One sequence has two consecutive leucines at the C-terminus. From what species?

Rat

c. (0.2) Use BLAST to find three more homologs to add to your sequences. Try to include one somewhat distantly related protein. List the source of your three proteins, species and accession number.

Gorilla gorilla gorilla NP_001129970.1
Orcinus orca XP_004282752.1
Sorex fumeus XP_055983799.1

d. (0.3) Pick either Clustal Omega or MUSCLE and compare the output before including the three new sequences to the alignment with the three new sequences. Things to consider: how long is the longest region of identity, do the original conserved regions still appear to be conserved in the new alignment? Are there gaps not previously there?

ELVAGKTYRWCVCG
14 residues

The longest region of identity is shorter, and there are added gaps. Most, but not all, of the previously conserved regions are still there.

4. (1) This alignment comes from a 2018 paper (Sirisena et al. Fish and Shellfish Immunol. 84:73-82). The authors used Clustal Omega. Try that and two other MSA programs to align these sequences (sod_alignment_files.txt).
a. (0.1) List the three programs you chose.

Clustal Omega
MUSCLE
T-Coffee

b. (0.3) Which result looks closest to the result in the paper? Briefly explain how you determined this.

Clustal Omega looks closest. I determined this based on the number of similar regions in the alignment.

c. (0.6) The diagram shows two yellow boxes and four purple arrows. They highlight six amino acids that are identical across all species. For each of the three programs, list how many of those six amino acids were properly aligned.

Clustal Omega: 6
MUSCLE: 6
T-Coffee: 6
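
The identity checks in questions 3 and 4 (longest region of 100% identity, and whether given residues line up in every sequence) can also be scripted rather than read off the alignment by eye. The following is a minimal Python sketch and was not part of the submitted homework. The function name `longest_identical_run` is hypothetical, and the rows are 25-residue excerpts cut by hand from the six sequences in question 3, assumed to be aligned without gaps in this window. For real use, the rows would come from the full Clustal Omega or MUSCLE output.

```python
def longest_identical_run(rows):
    """Return (start, run) for the longest stretch of alignment columns
    in which every row carries the same residue (gap columns never count)."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    best_start, best_len = 0, 0
    run_start = None  # column index where the current identical run began
    for i, column in enumerate(zip(*rows)):
        conserved = len(set(column)) == 1 and column[0] != "-"
        if conserved:
            if run_start is None:
                run_start = i
            if i - run_start + 1 > best_len:
                best_start, best_len = run_start, i - run_start + 1
        else:
            run_start = None
    return best_start, rows[0][best_start:best_start + best_len]


# Hand-cut excerpts (assumption: gap-free in this window), one per species.
excerpt_rows = [
    "PIKVELVAGKTYRWCVCGRSKKQPF",  # human
    "PIKVELVAGKTYRWCVCGRSKKQPF",  # chimpanzee
    "PIKVELVAGKTYRWCVCGRSKKQPF",  # dog
    "PIKVELVAGKTYRWCVCGRSKKQPF",  # cow
    "PIRLELVAGKTYRWCVCGRSKNQPF",  # mouse
    "PIRLELVAGKTYRWCVCGRSKNQSS",  # rat
]

start, run = longest_identical_run(excerpt_rows)
print(start, run, len(run))
```

On these excerpts the longest run is the same 17-residue stretch reported in 3a. Rerunning the function on an alignment that includes the three added homologs would be one way to check the shorter region reported in 3d.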