Week6Lab-2023-Bioinfo

School

Rutgers University

Course

11:126:484

Subject

Biology

Date

Jan 9, 2024

Uploaded by DrTarsier3954

Tools for Bioinformatic Analysis
Week 6 Lab: Domains, Patterns, and Profile (30 points)

PART I

1. Name the different kinds of domains that you retrieved in your sequence. Also give their InterPro ID numbers. (2 points)
Kazal_dom domain: InterPro ID IPR002350
EGF-like_dom domain: InterPro ID IPR000742

2. For each kind of domain that you answered in question 1, list below how many of such domains are present in this sequence. Also list the amino acid range over which the domains span. (2 points)
There are 3 domains present in this sequence: two Kazal domains and one EGF-like domain.
Kazal_dom domain: residues 98-145
Kazal_dom domain: residues 189-237
EGF-like_dom domain: residues 271-311

3. For each of the domains with an IPR ID, list the individual databases the information was retrieved from and the accession number/ID number for that database. Certain domains will be listed in more than one database; list all the databases and their accession IDs for that domain. (4 points)
Kazal_dom domain (IPR002350):
PROSITE database: Kazal domain profile: PS51465
PFAM database: Kazal domain profile: PF07648
SMART database: Kazal_3: SM00280
EGF-like_dom domain (IPR000742):
PROSITE patterns: EGF-like domain signature 2: PS01186
PROSITE profiles: EGF-like domain profile: PS50026
PROSITE patterns: EGF-like domain signature 1: PS00022

4. Read the overview briefly and list below the class of proteins that has this domain. (1 point)
The class of proteins is serine proteinase inhibitors.

5. Next, look at the Proteins tab. State how many proteins were matched to contain this domain (doesn't have to be precise). (1 point)
About 54K proteins were matched to contain this domain.

6. List three species that have proteins with this domain. (2 points)
Brucella anthropi
Metarhizium robertsii
Ancylostoma ceylanicum

7. Look at the characteristic structural pattern of this domain and list which amino acid(s) is most conserved in this pattern and how many times it is repeated in this pattern. (1 point)
Cysteine is the most conserved residue. It appears 4 times in the consensus pattern and 6 times across the full domain.

8. Is the residue listed above involved in a special kind of bond? How many such bonds are there? What is the pattern of these bonds? (4 points)
Yes, cysteine is involved in the formation of disulfide bonds. The Kazal inhibitor has six cysteine residues engaged in three disulfide bonds, arranged as shown in the following schematic representation:

               +------------------+
               |                  |
               *******************|***
xxxxxxxxCxxxxxxCx#xxxxxCxxxxxxxxxxCxxCxxxxxxxxxxxxxxxxxC
        |              |             |                 |
        |              +-------------|-----------------+
        +----------------------------+

'C': conserved cysteine involved in a disulfide bond.
'#': active site residue.
'*': position of the pattern.

Reading the schematic, the cysteines pair as 1-5, 2-4, and 3-6. Each disulfide bond forms when two cysteine residues come close together in the protein's folded structure and their sulfur atoms are oxidized to form a covalent bond, which stabilizes the protein.
Consensus pattern: C-x(4)-{C}-x(2)-C-x-{A}-x(4)-Y-x(3)-C-x(2,3)-C

9. List the names of any two proteins on this page that contain this domain. (1 point)
Pancreatic secretory trypsin inhibitor
Mammalian seminal acrosin inhibitors

10. How many amino acids are the most conserved (at least half height)? What amino acid(s) are they, and in what position(s)? (2 points)
Four positions are the most conserved, and all four are cysteine. They are at the following positions:
Model Column: 6
Model Column: 15
Model Column: 26
Model Column: 49
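The consensus pattern quoted in question 8 uses PROSITE pattern syntax, which maps closely onto regular expressions. The following is a minimal sketch of that translation, not PROSITE's own scanner: the function name `prosite_to_regex` and the toy sequence are assumptions made for illustration, and the translator handles only the syntax elements that appear in this pattern.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        # Separate the residue spec from an optional repeat such as (4) or (2,3).
        m = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        spec, count = m.groups()
        if spec == "x":
            token = "."                      # x: any residue
        elif spec.startswith("{"):
            token = "[^" + spec[1:-1] + "]"  # {C}: any residue except C
        else:
            token = spec                     # literal residue or [ABC] class
        if count:
            token += "{" + count + "}"       # (4) -> {4}, (2,3) -> {2,3}
        parts.append(token)
    return "".join(parts)

KAZAL_PATTERN = "C-x(4)-{C}-x(2)-C-x-{A}-x(4)-Y-x(3)-C-x(2,3)-C"
regex = prosite_to_regex(KAZAL_PATTERN)
print(regex)

# Number of cysteines fixed by the pattern itself.
pattern_cysteines = KAZAL_PATTERN.split("-").count("C")
print(pattern_cysteines)

# Hypothetical toy sequence built to satisfy the pattern (not a real protein).
toy = "MK" + "CAAAAGAACAGAAAAYAAACAAC" + "LL"
match = re.search(regex, toy)
print(match.start(), match.end())
```

The pattern fixes only four of the six cysteines shown in the schematic, which is why the pattern line of asterisks covers only part of the domain.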
PART II

1. Can you explain what PROSITE means by profile versus pattern? Scan the result carefully to answer this question. (1 point)
A pattern is a regular-expression-style description of a short amino acid motif that is conserved in a group of related proteins; a sequence either matches it or does not. A profile is a more thorough description: it scores every possible amino acid at each position and also accounts for positions that may be present or absent. Both patterns and profiles can be used to characterize conserved sequence motifs, but profiles offer a more flexible and nuanced representation because they consider the frequency of the various amino acids at each position.

2. What is the most common pattern that PROSITE found under "hits by pattern with high probability of occurrence"? What does this pattern indicate in terms of where this protein might be located: cytoplasmic or membrane? Why? (You can use other resources like PubMed, Google Scholar, or UniProt to see what kind of proteins show this modification.)
The most common pattern found was MYRISTYL, PS00008 (N-myristoylation site). This indicates that the protein is found in the cytoplasm, where it can associate with membranes. N-myristoylation is a lipid modification in which myristic acid, a 14-carbon saturated fatty acid, is added to the N-terminal glycine of a subset of proteins. The modification promotes binding to cell membranes from the cytoplasmic side and supports a variety of biological functions, such as protein interactions.
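The pattern-versus-profile distinction in question 1 can be made concrete with a toy comparison. This is only a sketch under stated assumptions: the four aligned motif instances, the regular expression, and the uniform background are invented for illustration and are not PROSITE data, and real PROSITE profiles also model insertions and deletions, which this example omits.

```python
import math
import re

# Hypothetical aligned motif instances (illustrative only, not from PROSITE).
ALIGNED = ["CAGY", "CSGY", "CAGF", "CTGY"]
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# A pattern is a yes/no rule: C, then A/S/T, then G, then Y.
PATTERN = re.compile(r"C[AST]GY")

def build_profile(seqs, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column."""
    background = 1.0 / len(ALPHABET)  # assumed uniform background frequency
    profile = []
    for column in zip(*seqs):
        total = len(column) + pseudocount * len(ALPHABET)
        scores = {}
        for aa in ALPHABET:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        profile.append(scores)
    return profile

def profile_score(profile, segment):
    """Sum the per-position scores of a segment as long as the profile."""
    return sum(col[aa] for col, aa in zip(profile, segment))

profile = build_profile(ALIGNED)
for candidate in ["CAGY", "CAGF", "WWWW"]:
    is_hit = bool(PATTERN.fullmatch(candidate))
    print(candidate, is_hit, round(profile_score(profile, candidate), 2))
```

The pattern rejects CAGF outright because the last residue is not Y, while the profile still gives it a positive score that is lower than the score for CAGY and well above the score for an unrelated segment such as WWWW.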
PART III

Q1. What is the most common secondary structure present in this unknown protein? What percentage of this secondary structure is present in this protein? (1 point)
The most common secondary structure is the helix, which makes up 51.27% of the protein.

Q2. Do you think this unknown protein is highly accessible to water/solvent? Why/why not? (1 point)
I think this protein has limited solvent accessibility because most of its residues (69.82%) are predicted to be buried. A residue is buried when its side chain sits in the interior core of the protein rather than on the solvent-exposed surface.

Q3. Based on the prediction result under "Topology", where do you think this protein is located in a cell? Why? (2 points)
The protein is a transmembrane protein located in a membrane, and it is made of transmembrane helices. The interior of a membrane is hydrophobic, and most of the residues in this protein are non-polar, as seen in the pie chart, which supports a transmembrane location.

Q4. Is it likely that this protein will be a DNA- or RNA-binding protein? Explain your answer. (1 point)
Yes, this protein may bind DNA or RNA because it contains arginine residues. Arginine side chains can form hydrogen bonds with DNA and RNA.

Q5. Find the best matched (aligned) structure with the highest sequence identity. Click to open that structure in PDB. Read a bit of the summary of this protein on this page. What is the name of the protein? (1 point)
The protein is the Major Intrinsic Protein of lens fiber.

Q6. Looking at the structure of the matched protein in PDB and reading about it briefly, can you justify your answer to Q3 about the location of your unknown protein? Explain. (1 point)
Yes. The PDB literature says that this protein is involved in the transport of ions, metabolites, and other molecules, which is consistent with a protein that spans a membrane.
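The percentages reported in Q1 and Q2 are composition summaries of per-residue predictions. As a rough sketch of how such figures are derived, the example below tallies state letters in a prediction string. The strings, the state letters (H/E/L for helix/strand/loop, b/e for buried/exposed), and the function name `state_percentages` are assumptions for illustration; they are not the output of the prediction server used in the lab.

```python
from collections import Counter

def state_percentages(states: str) -> dict:
    """Return the percentage of residues assigned to each predicted state."""
    counts = Counter(states)
    total = len(states)
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}

# Hypothetical 30-residue secondary structure prediction.
secondary = "LL" + "H" * 10 + "LL" + "E" * 4 + "LL" + "H" * 8 + "LL"
ss_pct = state_percentages(secondary)
most_common_ss = max(ss_pct, key=ss_pct.get)
print(most_common_ss, ss_pct)

# Hypothetical 10-residue solvent accessibility prediction.
accessibility = "b" * 7 + "e" * 3
acc_pct = state_percentages(accessibility)
print(acc_pct)
```

A protein whose buried fraction exceeds its exposed fraction, as in the toy accessibility string, would be read the same way as the 69.82% buried figure in Q2.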