Lab 4

School

Johns Hopkins University

Course

633

Subject

Biology

Date

Jan 9, 2024

Type

docx

Pages

9

Uploaded by murphydanyael

Lab 4 - Protein Analysis
Name: Danyael Murphy

1. Goal: Find patterns, profiles and HMMs in a sequence.
Given: The LETM1 protein.
Prosite ( http://prosite.expasy.org/ )
MotifScan ( https://myhits.sib.swiss/cgi-bin/motif_scan/ )

a. (0.2) List all patterns and profiles from Prosite that are above the high probability match bar.
PS00018 – EF-hand calcium-binding domain
PS50222 – EF-hand calcium-binding domain profile
PS51758 – Letm1 ribosome-binding (RBD) domain

b. (0.1) Find a short (high probability of occurrence) pattern with a secondary structure motif (e.g. zinc finger or leucine zipper). Which one is found?
PS00029 – Leucine zipper pattern

c. (0.1) What is the consensus pattern for that motif?
L-x(6)-L-x(6)-L-x(6)-L

d. (0.1) Give the numerical locations of each amino acid residue that fits the pattern.
548..569

e. (0.1) Find what HMMs (local models) are matched by this sequence. Which web server did you select and why?
Pfam HMMs (local models). Pfam is a database with a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models.

f. (0.1) What is the strongest E-value (local models) and its description?
1.9e-180. LETM1-like protein.

g. (0.1) Members of this family are typically found in what organelle's membrane? (Hint: copy the PF number and search the Pfam database)
Inner mitochondrial membrane

h. (0.2) Some HMMs match with an E-value that may fall short of significance. Would these be false positives? Please explain.
Possibly, but not necessarily. E-values depend strongly on the query sequence length and the database size, so a short but genuine match may receive a high E-value and be wrongly regarded as a "false positive."
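The consensus in 1c can be checked mechanically. The sketch below is a minimal illustration, not PROSITE's own scanner: the helper names `prosite_to_regex` and `find_pattern` and the toy sequence are assumptions made for this example, and only the common PROSITE syntax elements are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate common PROSITE pattern syntax into a Python regex string."""
    regex = pattern.rstrip(".").replace("-", "")
    # {ABC} means "any residue except A, B or C"
    regex = re.sub(r"\{([A-Z]+)\}", r"[^\1]", regex)
    # (n) or (n,m) is a repeat count
    regex = re.sub(r"\((\d+(?:,\d+)?)\)", r"{\1}", regex)
    # x is any residue; < and > anchor to the N- and C-terminus
    return regex.replace("x", ".").replace("<", "^").replace(">", "$")


def find_pattern(pattern: str, sequence: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) positions of all matches, overlaps included."""
    # A lookahead lets the scan report matches that overlap each other.
    scanner = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [(m.start(1) + 1, m.end(1)) for m in scanner.finditer(sequence)]


# Hypothetical toy sequence (not LETM1): leucines at positions 4, 11, 18 and 25.
toy = "AAA" + "L" + "AAAAAA" + "L" + "AAAAAA" + "L" + "AAAAAA" + "L" + "GG"
print(prosite_to_regex("L-x(6)-L-x(6)-L-x(6)-L"))
print(find_pattern("L-x(6)-L-x(6)-L-x(6)-L", toy))
```

Any match of this pattern spans 22 residues, which is consistent with the 548..569 range reported in 1d.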
2. Goal: Determine whether a membrane protein is likely alpha-helical or beta-strand, then predict transmembrane regions.
Given: membrane_protein.txt from Piscirickettsia salmonis
HNN: ( https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_hnn.html )
TMHMM: ( http://www.cbs.dtu.dk/services/TMHMM/ )
Phobius: ( http://phobius.sbc.su.se/ )
PRED-TMBB: ( http://biophysics.biol.uoa.gr/PRED-TMBB/ )

a. (0.3) Is this protein predicted to be more alpha-helical or more beta-strand?
Beta-strand

b. (0.4) Based on your prediction, which transmembrane prediction program did you choose and why?
PRED-TMBB, because it is capable of predicting and discriminating beta-barrel outer membrane proteins.

c. (0.3) How many transmembrane regions are predicted?
8

3. Goal: Find all potential tyrosine phosphorylation sites in a protein.
Given: mgf.txt
Prosite: ( http://prosite.expasy.org )
NetPhos: ( https://services.healthtech.dtu.dk/service.php?NetPhos-3.1 )

a. (0.4) Which program did you select and why?
NetPhos. The server predicts serine, threonine or tyrosine phosphorylation sites.

b. (0.3) Which tyrosine residues are predicted phosphorylated with > 50% probability?
Residues at positions: 74, 123, 198, 219, 281, 305, 325

c. (0.3) What are the advantages and disadvantages of lowering the threshold to 35% probability?
Increased sensitivity (more true sites are recovered) but decreased specificity (more false positives are reported).
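The trade-off in 3c can be made concrete with a small calculation. This is a sketch on invented data: the scores and true/false labels below are hypothetical and are not NetPhos output for mgf.txt, and `sensitivity_specificity` is a helper name chosen for this example.

```python
def sensitivity_specificity(predictions, threshold):
    """Compute (sensitivity, specificity) for (score, is_true_site) pairs."""
    tp = sum(1 for score, truth in predictions if score > threshold and truth)
    fn = sum(1 for score, truth in predictions if score <= threshold and truth)
    fp = sum(1 for score, truth in predictions if score > threshold and not truth)
    tn = sum(1 for score, truth in predictions if score <= threshold and not truth)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predictor scores paired with made-up ground truth.
toy_predictions = [
    (0.92, True), (0.61, True), (0.44, True), (0.30, True),
    (0.55, False), (0.38, False), (0.20, False), (0.12, False),
]

for cutoff in (0.50, 0.35):
    sens, spec = sensitivity_specificity(toy_predictions, cutoff)
    print(f"threshold {cutoff:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On this toy set, lowering the cutoff from 0.50 to 0.35 recovers one more true site and admits one more false one, so sensitivity rises while specificity falls.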
4. (0.8 points) Goal: Find and examine a recent PDB entry.
Given: Author is C.M. Hoel; http://www.rcsb.org/pdb

a. (0.1) How many entries for C.M. Hoel?
5407

b. (0.1) Sort by release date. On what date was the most recent structure released?
11/08/2023

c. (0.1) What method was used to determine the structure?
X-ray diffraction

d. (0.1) At what resolution?
1.48 angstroms

e. (0.1) From what organism is this protein?
Severe acute respiratory syndrome coronavirus 2

f. (0.1) What is the expression system?
Escherichia coli

g. (0.1) How many Ramachandran outliers?
0

h. (0.1) There is no SCOP/CATH classification data in this record in the Annotations tab. Why might that be?
Because it is a non-structural protein.

5. (1.2 points) Goal: Run and analyze localization predictions.
Given: Problem 5 Sequence. It is an animal sequence. Predict its subcellular location using six different location predictors. For each one, simply list the result of the subcellular location.

a. (0.2) DeepLoc Tree ( https://services.healthtech.dtu.dk/service.php?DeepLoc-2.0 ). List the most likely location and probability.
Cytoplasm, 0.7072

b. (0.2) Predotar ( https://urgi.versailles.inra.fr/predotar/ ). List the most likely probability & prediction.
Elsewhere, 0.99

c. (0.2) Euk-mPLoc 2.0 ( http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/ ). List the predicted location(s).
Endoplasmic reticulum

d. (0.2) DeepMito ( http://busca.biocomp.unibo.it/deepmito/ ). List all results in table.
Not mitochondrial, score 0.15

e. (0.2) CELLO ( http://cello.life.nctu.edu.tw/ ). List the CELLO Prediction: first and second place values.
Cytoplasmic 2.599
Nuclear 1.001

f. (0.2) WoLF PSORT ( https://wolfpsort.hgc.jp/ ). List the locations of the two most common neighbors.
Cysk 956.2
Extr 1148.66
6. Goal: Examine the structural comparison of two MMDB structures.
Given: NCBI structure database. The structures are 2IFQ and 1GH2. Start with 2IFQ and find 1GH2 among VAST neighbors (it is not on the front page of results).
Cn3D: use Style>Coloring Shortcuts>Aligned

a. (0.1) What is the sequence identity of the pair of sequences?
42.16%

b. (0.1) Comment on the strength of the 3-D alignment (visualize superposition).
The 3D structures of the two proteins superimpose closely and appear very similar. The domains of 1GH2 are larger than those of 2IFQ.

c. (0.1) What is more identical: sequence or structure?
Structure

d. (0.2) List the alignment length (aligned residues), RMSD and sequence identity.
Alignment length – 104
RMSD – 1.01
Identity – 41.3

e. (0.2) Which is more common (and briefly explain):
    High sequence identity with lower structure alignment
    Lower sequence identity with high structure alignment
Lower sequence identity with high structure alignment. Structural alignment requires no prior knowledge of equivalent pairs of residues, does not rely on the sequence alignment, and ignores the type of residue when the correspondence is established.

Goal: Examine a structure from the NCBI structure database.
Given: 3A5T in the NCBI Structure database. View the structure in Cn3D.

f. (0.1) There should be one stretch of three leucines in Chain A where leucine appears every seventh position. Give the numerical (PDB) locations of those three leucines.
63, 70, 77

g. (0.1) What is the nature of Chains C&D (fundamentally different molecule type from Chains A&B)?
Nucleotides, "other DNA"

h. (0.1) Are the C&D chains helical in structure?
Yes
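Question 6d reports an RMSD and a sequence identity. As a reminder of what those numbers measure, here is a minimal sketch, assuming the two structures are already superimposed and the aligned residues are paired in order. The function names and toy inputs are hypothetical, and VAST may define its identity denominator differently.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def percent_identity(aligned_a, aligned_b):
    """Percent of identical residues over alignment columns without gaps."""
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    identical = sum(1 for a, b in columns if a == b)
    return 100 * identical / len(columns)


# Toy inputs: every atom is displaced by exactly 1 unit along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))
print(percent_identity("ACD-F", "ACE-F"))
```

A low RMSD over many aligned residues combined with a modest identity is the pattern described in 6e: the backbone positions agree even where the residue types differ.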
7. Goal: Analyze very complete reports on a protein sequence.
Given: Use the "unknown gorilla protein."
PredictProtein: ( https://predictprotein.org )
Phyre2: ( http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index )

a. (0.2) How many total alpha helices are predicted by the RePROF program?
3

b. (0.1) What does PredictProtein predict for protein localization (it's an animal protein)?
Membrane protein

c. (0.1) From Phyre2, the depicted model is based on the template from what PDB structure (ignore the "c" at the beginning)?
7jozR

d. (0.2) What is the confidence in the structure model for that structure?
99.3

e. (0.2) What is the percent identity?
44

f. (0.2) How many alpha-helices were predicted (be careful of wrap-around)?
3
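The wrap-around warning in 7f matters because a helix that continues across a line break in the printed output must be counted once, not twice. A minimal sketch, assuming the prediction is available as lines of one-letter states with `H` for helix; the function name and the toy lines are hypothetical, not Phyre2 output.

```python
from itertools import groupby


def count_helices(ss_lines, helix_char="H"):
    """Count helix segments after joining wrapped lines into one string."""
    joined = "".join(line.strip() for line in ss_lines)
    # groupby yields one group per run of identical consecutive states.
    return sum(1 for state, _run in groupby(joined) if state == helix_char)


# Toy prediction: the second helix wraps from line 1 onto line 2.
toy_lines = [
    "CCHHHHCCHH",
    "HHCCCEEECC",
    "CHHHHHCCCC",
]
print(count_helices(toy_lines))                            # joined count
print(sum(count_helices([line]) for line in toy_lines))   # naive per-line count
```

Counting each line separately overcounts here because the wrapped helix is seen twice.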
8. Transmembrane protein
Goal: Start with a sequence and predict patterns and transmembrane domains.
Given: problem 8 sequence; PROSITE, TMHMM or Phobius

a. (0.2) Find all long (not high-frequency match) patterns that match this sequence. List the accession number and the description for each.
PS00014

b. (0.2) Why is there no E value for this pattern?
Because the pattern sequence is very short, it may be a false positive.

c. (0.2) Interpret (explain in a sentence) the regular expression for this consensus pattern (> means "at the C-terminus").
Position 1 is one of the residues K, R, H, Q, S or A; position 2 is one of D, E, N or Q; position 3 is E; and position 4 is L, which must be the last residue at the C-terminal end of the protein.

d. (0.2) Which program(s) would you choose to find transmembrane domains?
TMHMM or Phobius

e. (0.2) Where is the transmembrane domain located? What is the predicted location (inside or outside) of the C-terminal domain?
338..359, inside

9. InterProScan search
Goal: Examine InterProScan results.
Given: AAC99858.1; InterProScan ( http://www.ebi.ac.uk/interpro/search/sequence-search )

a. (0.2) Looking at the sequence, what amino acid tends to be most prevalent?
Proline

b. (0.2) Examine the results from the Prints database (follow the external link). These proteins account for how much of the dry weight of the cell wall?

c. (0.2) Does the species for this sequence have cell walls? Based on your answer, do you believe that the prediction is correct?
No

d. (0.1) Go back to the InterProScan results. Click the link beginning with PF (follow external link). What is the name of the motif?
WH2 (Wiskott-Aldrich syndrome homology region 2)

e. (0.1) WASP proteins help control the polymerization of what cytoskeletal protein?
Actin

f. (0.2) With what syndrome is the domain associated?
Wiskott-Aldrich
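Question 9a is answered by eye, but residue prevalence is easy to tabulate. A minimal sketch using only the standard library; `composition` is a helper name chosen here, and the toy string is an invented proline-rich sequence, not AAC99858.1.

```python
from collections import Counter


def composition(sequence):
    """Return (residue, percent) pairs sorted from most to least frequent."""
    residues = [aa for aa in sequence.upper() if aa.isalpha()]
    counts = Counter(residues)
    return [(aa, 100 * n / len(residues)) for aa, n in counts.most_common()]


toy_sequence = "PPPAPGPPLP"  # hypothetical example sequence
for residue, percent in composition(toy_sequence):
    print(f"{residue}: {percent:.1f}%")
```

The same tally applied to the real record would give the ranking that 9a asks about.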
10. Goal: Find interacting proteins to a human protein.
Given: The human PPARA protein, MENTHA database ( http://www.mentha.uniroma2.it/ ), STRING database ( https://string-db.org/ ).

a. (0.1) What is the PPARA protein (what does PPARA stand for)?
Peroxisome proliferator-activated receptor alpha

b. (0.2) Find the top five interacting proteins for human PPARA from Mentha and from STRING. Enter the gene symbols in the table. There may be fewer than five.

Top 5 from Mentha    Top 5 from STRING
NCOR1                PPARGC1A
NCOA1                RXRA
RXRA                 NCOA1
EP300                NCOR2
NCOR2                NCOR1

c. (0.1) Which interacting proteins did STRING find by experiments?
All 5 proteins

Goal: Examine the structure of 6KYP.
Given: PDB# 6KYP, PDB database ( https://www.rcsb.org/ )

d. (0.1) More alpha-helical or mostly beta-strand?
More alpha-helical

e. (0.1) Solution Structure or Crystal Structure?
Crystal

f. (0.2) Looking at the Protein Feature View, at what position is the "artifact?"
Position 1-4

g. (0.2) Open the full validation report. Under Entry composition, list the numbers of each type of atom in the A chain.
C – 1380
N – 363
O – 393
S – 19
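The two top-five lists in 10b overlap heavily, which a set comparison shows directly. This sketch assumes the flattened table reads row by row, with Mentha in the left column and STRING in the right; `compare_partners` is a helper name chosen for this example.

```python
def compare_partners(list_a, list_b):
    """Return (shared, only_in_a, only_in_b), each as a sorted list."""
    set_a, set_b = set(list_a), set(list_b)
    return sorted(set_a & set_b), sorted(set_a - set_b), sorted(set_b - set_a)


# Column assignment is an assumption about how the table was laid out.
mentha_top5 = ["NCOR1", "NCOA1", "RXRA", "EP300", "NCOR2"]
string_top5 = ["PPARGC1A", "RXRA", "NCOA1", "NCOR2", "NCOR1"]

shared, mentha_only, string_only = compare_partners(mentha_top5, string_top5)
print("shared:", shared)
print("Mentha only:", mentha_only)
print("STRING only:", string_only)
```

Under that reading, four of the five partners are common to both databases.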
11. Goal: Examine some theoretical trypsin digests and collect data on a protein.
Given: Mascot protein attached.
Peptide Cutter: ( http://web.expasy.org/peptide_cutter/ ). Select trypsin as the only enzyme.
Protein Prospector: ( http://prospector.ucsf.edu )
Peptides package in R

a. (0.1) How many trypsin cleavage sites from Peptide Cutter?
77

b. (0.1) Run Protein Prospector and do a trypsin digest (MS digest). Use Raw format for the sequence. What are the isoelectric point (pI) and the molecular weight?
pI: 5.0
MW: 73624

c. (0.1) What is the most common amino acid in this protein?
Alanine

d. (0.1) What are the smallest and largest fragments (monoisotopic) in the m/z (mi) column?
Smallest: (K)NVLEKAK(R)
Largest: (R)QQQQQQWFMDEQQQQQQHFNATNNQYGDQR(G)

e. (0.1) What is the start and end position for the fragment with sequence FQANKSNGR?
132..140

f. (0.2) Blast that fragment. What is the most likely known protein that has this fragment?
Transport protein SEC9

g. (0.1) What is the molecular weight of the mascot protein from the R package, monoisotopic?
Monoisotopic = FALSE: 73623.76
Monoisotopic = TRUE: 73579.14

h. (0.1) What is the isoelectric point using the Stryer methodology?
5.127907

i. (0.1) What is the hydrophobicity using the Kyte-Doolittle scale?
-1.26851
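Two of the quantities above follow from simple rules. The sketch below uses the simplified trypsin rule (cleave after K or R unless the next residue is P) and computes Kyte-Doolittle hydrophobicity as the mean of the per-residue values. It is an illustration only: PeptideCutter applies additional exceptions, the function names are chosen here, and the toy sequence is not the Mascot protein.

```python
import re

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def tryptic_digest(sequence):
    """Split after K or R, except when the following residue is P."""
    # Splitting on a zero-width match requires Python 3.7 or later.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [piece for piece in pieces if piece]


def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy of a sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


toy_protein = "MAKTRPLLKQER"  # hypothetical example sequence
peptides = tryptic_digest(toy_protein)
print(peptides)
print(len(peptides) - 1, "internal cleavage sites")
print(round(gravy(toy_protein), 3))
```

A negative mean, like the -1.26851 reported in 11i, indicates a protein that is hydrophilic overall.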
12. At the AlphaFold database, find human SERTA domain-containing protein 3.
Goal: Examine a predicted structure and compare to another prediction.
Given: AlphaFold ( https://www.alphafold.ebi.ac.uk )
HNN ( https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_hnn.html )

a. (0.4) How many alpha helices appear to be present in the AlphaFold Prediction?
4

b. (0.2) How many alpha helices appear to be present in the HNN prediction?
Website not working

c. (0.4) Fill the table below looking at four amino acids, as predicted by AlphaFold and by HNN.

Amino Acid   Confidence Score - AlphaFold   Helix, sheet or coil - AlphaFold   Helix, sheet or coil - HNN
Arg35        90.38                          coil
Val132       62.78                          coil
Ala136       62.06                          coil
Pro170       63.78                          helix
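The confidence scores in the table are per-residue pLDDT values on a 0 to 100 scale. The sketch below sorts them into the bands commonly shown in the AlphaFold database legend (above 90, 70 to 90, 50 to 70, below 50). The band labels are paraphrased and `plddt_band` is a helper name chosen for this example.

```python
def plddt_band(score):
    """Map a pLDDT score (0-100) to a confidence band label."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must lie between 0 and 100")
    if score > 90:
        return "very high"
    if score > 70:
        return "high"
    if score > 50:
        return "low"
    return "very low"


# Scores copied from the table above.
residue_scores = {"Arg35": 90.38, "Val132": 62.78, "Ala136": 62.06, "Pro170": 63.78}
for residue, score in residue_scores.items():
    print(f"{residue}: {score} -> {plddt_band(score)}")
```

Read this way, only Arg35 falls in the top band. The other three residues fall in the low-confidence band, so their secondary-structure assignments should be treated with caution.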