2_Proj_S23

School

Northeastern University

Course

2301

Subject

Biology

Date

Apr 3, 2024

Type

docx

Pages

6

Uploaded by KidBeeMaster1068

Project Two: The ABCs (or ACGTs) of Life (25 pts possible)

Life revolves around an unbroken chain of transmission of information, encoded in molecules and containing the instructions to build and maintain the code's housing (the organism), and to replicate and transmit the message the code carries to keep the chain going. In this lab session you'll review the basics of the message and its molecules, then explore how information about particular messages (genes) is represented in a GenBank record. You'll also get to formulate and test a couple of hypotheses concerning GC content.

Learning goals:
~Given a DNA sequence, analyze the six potential reading frames to predict the mRNA and peptide sequences.
~Use the NCBI website to obtain specific information on a given protein and its nucleotide sequences.
~Given starting information, formulate a hypothesis and test it by analyzing online data.

I. Review of DNA basics (10 questions worth one pt each)

1. What happens if a nucleotide that is missing the 3' -OH group is added to a DNA strand that is being copied?
Without a 3' -OH group, the growing strand cannot form a phosphodiester bond with the 5' phosphate of the next incoming nucleotide, so elongation of that strand stops (chain termination).

2. How many 60 bp pieces of DNA would fit into a length of 60 Mbp? (bp stands for base pair)
1 Mbp = 1,000,000 bp, so 60 Mbp = 60,000,000 bp. Dividing by 60 bp per piece gives 60,000,000 / 60 = 1,000,000 pieces.

3. The genome of SARS-CoV-2 is described as being ~30,000 bases in length, rather than 30 bp. What is the difference?
"Bases" counts the nucleotides along a single strand, whereas "base pairs" counts the paired nucleotides in a double-stranded molecule such as dsDNA. The SARS-CoV-2 genome is a single strand of RNA, so it has no complementary strand to pair with, and its length is given in bases rather than base pairs.

4. What is the difference between a codon and a reading frame (RF)?
A codon is a set of three consecutive nucleotides that specifies one amino acid (or a stop signal). A reading frame is one of the ways of dividing a nucleotide sequence into consecutive, non-overlapping codons, and it is determined by the position at which reading starts.

5. How many distinct codons are possible with 4 nucleotides?
There are 64 possible codons: each of the three positions can hold any of the 4 nucleotides, so 4 x 4 x 4 = 64.

6. How many naturally occurring amino acids are known to exist?
Around 500 amino acids are known to exist in nature, but only 20 of those are specified by the standard genetic code.

7. How many codons code for methionine?
Only one codon codes for methionine, AUG.

8. How many codons code for leucine?
There are 6 different codons that code for leucine.

9. What do the amino acids A, C, G, and T each stand for?
As one-letter amino acid codes, A stands for alanine, C for cysteine, G for glycine, and T for threonine. (The same four letters denote adenine, cytosine, guanine, and thymine when they refer to nucleotide bases.)

10. If the strand given is the template strand, draw the corresponding mRNA sequence, again labeling the 5' and 3' ends:
5'-CCAGCGTAAGCGGGAGCAAG-3'
3'-GGUCGCAUUCGCCCUCGUUC-5'

It's important to have a clear understanding of the difference between the two sets of chromosomes you inherit from your mother and father, and the two complementary strands of a DNA molecule. Take chromosome 6, for example. You have two copies in each cell. Both of these are made of dsDNA, like the helix at the left in the figure above. So one double helix is the maternal copy, one is the paternal copy. If you "denature" the DNA of, say, the paternal copy (separate the two strands of the double helix), you have two separate strands. These strands are complementary to one another. Please don't think that one strand is paternal and one maternal! Again, do not confuse the two complementary strands of DNA with paternal/maternal copies (known as "alleles").

II. Reading frames (7 questions worth 1 pt each)

The concept of open reading frames is very important in genetics and is key in mutational analysis. We will revisit this concept throughout this semester, but for now let's get acquainted with how codons, reading frames and the translation tool ExPASy all fit together.

In the research world, sequences are usually presented in FASTA format. FASTA format consists of two parts: a "header" and a nucleotide or amino acid sequence. The header includes text or numbers to identify the sequence-- such as its name or the organism from which it comes-- and begins with a greater-than sign (>). This symbol lets any program that is processing the sequence know that the text on that line should not be treated as part of the sequence. For example, if your sequence is identified as being from a cat, you don't want ExPASy to translate "cat" into histidine! The sequence itself begins on the second line.

Please read this overview of reading frames ("How to find ORF" under "Theory" tab only). You may also find this page helpful.

Assume the following sequence in FASTA format is given 5' to 3':

>seq_frag
CCAGCGTAAGCGGGAGCAAGTTGCATTACG

Reverse complement, grouped in the -3 frame: CG TAA TGC AAC TTG CTC CCG CTT ACG CTG G

11. What is the second codon of the 2nd forward (+2) reading frame?
The second codon of the [+2] reading frame is CGT.

12. What is the first codon of the 2nd reverse (-2) reading frame?
The first codon of the [-2] reading frame is GTA.

13. What is the third codon of the 3rd reverse (-3) reading frame?
The third codon of the [-3] reading frame is AAC. Reading the reverse complement from its third nucleotide gives TAA, TGC, AAC, and so on.

The sequence below ("sequence x") is a partial protein-coding sequence, so it might not include the start and stop codons. It comes from a bacterium, so it lacks introns. The most likely reading frame out of the six is the one that produces the longest stretch of uninterrupted translation, representing part of a functional protein. Use Expasy translate and examine all six possible reading frames.

14. Which reading frame is most likely to be the correct one and why?
The most likely reading frame is frame 2. In Expasy it gives the longest open reading frame of the six: it translates the whole fragment without hitting a stop codon, which is what is expected for part of a functional protein.

15. What is the DNA codon in "sequence_x" that codes for the first amino acid of the most likely reading frame? (You do NOT need a codon chart for this.)
Frame 2 begins at the second nucleotide of sequence_x, so the DNA codon that codes for its first amino acid is GCC.
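The frame questions for seq_frag (11 to 13) can also be checked with a few lines of code. What follows is a minimal Python sketch and not part of the original handout; the helper names `reverse_complement`, `codons`, and `six_frames` are hypothetical. It assumes the Expasy-style convention in which the forward frames read the given strand starting at nucleotide 1, 2, or 3, and the reverse frames read the reverse complement in the same way.

```python
SEQ_FRAG = "CCAGCGTAAGCGGGAGCAAGTTGCATTACG"

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into complete codons, skipping the first `offset` bases."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Map frame labels (+1..+3, -1..-3) to their lists of codons."""
    rc = reverse_complement(seq)
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = codons(seq, k)   # forward strand
        frames[f"-{k + 1}"] = codons(rc, k)    # reverse complement
    return frames


if __name__ == "__main__":
    for label, frame in six_frames(SEQ_FRAG).items():
        print(label, " ".join(frame))
```

Trailing bases that do not fill a complete codon are dropped, which matches how the frames are read by hand.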
>sequence_x
TGCCAAATCTCCAAAGAGCCAGGACAAGCAGCGTCCACTGCAGGTTCCAGTGTCAGTGGCCATGATGAGT
CCCCAAGTGATCACGCCACAACAGATGCAGCAGATCCTCCAGCAGCAGGTTCTTTCCCCACAGCAGCTCC
AGGCTTTGCTCCAGCAGCAGCAGGCAGTGATGCTGCAGCAGCAACACCTGCAGGAGTTTTATAAGAAACA
GCAGGAGCAGCTTCATCTGCAGCTTCTCCAGCAACAGCACCCCGGCAAGCAGGCTAAAGTGCTGCATCTG
CAACAACAGCAGGCTCTACAAGCAGCGAGGCAATTACTCTTGCAGCAGCCAGGCAGTGGCCTG

cDNA is mRNA that has been reverse transcribed into DNA. Remember that introns are removed from the pre-mRNA during mRNA processing. In other words, cDNA only includes exons.
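The "longest stretch of uninterrupted translation" criterion for sequence_x can be approximated by counting stop codons in each of the six frames. The following is a minimal Python sketch, not part of the original handout. The helper name `stop_counts` is hypothetical, frames are numbered Expasy-style (reverse frames read the reverse complement), and the sketch only counts the stop codons TAA, TAG, and TGA rather than producing a translation.

```python
SEQUENCE_X = (
    "TGCCAAATCTCCAAAGAGCCAGGACAAGCAGCGTCCACTGCAGGTTCCAGTGTCAGTGGCCATGATGAGT"
    "CCCCAAGTGATCACGCCACAACAGATGCAGCAGATCCTCCAGCAGCAGGTTCTTTCCCCACAGCAGCTCC"
    "AGGCTTTGCTCCAGCAGCAGCAGGCAGTGATGCTGCAGCAGCAACACCTGCAGGAGTTTTATAAGAAACA"
    "GCAGGAGCAGCTTCATCTGCAGCTTCTCCAGCAACAGCACCCCGGCAAGCAGGCTAAAGTGCTGCATCTG"
    "CAACAACAGCAGGCTCTACAAGCAGCGAGGCAATTACTCTTGCAGCAGCCAGGCAGTGGCCTG"
)

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def stop_counts(seq):
    """Count stop codons in each of the six reading frames of seq."""
    rc = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    counts = {}
    for sign, strand in (("+", seq), ("-", rc)):
        for k in range(3):
            # Complete codons only, starting k bases into the strand.
            frame = [strand[i:i + 3] for i in range(k, len(strand) - 2, 3)]
            counts[f"{sign}{k + 1}"] = sum(c in STOP_CODONS for c in frame)
    return counts


if __name__ == "__main__":
    for label, n in stop_counts(SEQUENCE_X).items():
        print(f"frame {label}: {n} stop codon(s)")
```

A frame with zero stop codons is open across the whole fragment. For a fragment where every frame has at least one stop, the length of the longest stop-free run would be the better measure.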
Expressed sequence tags (ESTs) are short pieces of cDNA representing only the ends of an mRNA sequence. The Wikipedia article about ESTs is actually pretty good. Read it here. The following sequence is an EST.

>EST_1
ACTTTTTAAATCAGTTCAAGCTGGTCCTCCTCGAGTTCTTGTGAGTTTAGAC

Use the ExPASy translate tool to answer the questions below:

16. Which reading frame most likely represents the true reading frame for EST_1?
The reading frame that most likely represents the true reading frame for EST_1 is frame 2. It is the only one of the six frames that contains no stop codon.

17. Which codon corresponds to the second amino acid of that reading frame?
The codon that corresponds to the second amino acid is TTT.

III. Deciphering a GenBank record (4 questions worth 1 pt each)

These questions will give you experience searching an NCBI Gene page. There is a lot of information about genes in the NCBI gene database. It is a valuable tool to become familiar with.

Navigate to the NCBI website. NCBI stands for the National Center for Biotechnology Information. It is a federally funded group of databases. Change the database to "gene", as shown below. Use the search bar to locate the record for one of the proteins mutated in one of the diseases from Week 1. After clicking "search", answer the following questions about the gene.

18. What functions are carried out by the protein product of this gene?
The HBB gene encodes beta-globin, one of the 2 types of polypeptide chains (alpha and beta) in adult hemoglobin. Mutant HBB causes sickle cell anemia.

19. How many exons does this gene have?
This gene has 3 exons.

Scroll down the page to the "NCBI Reference Sequences (RefSeq)" section. Look at the "mRNA and protein(s)" subsection. An accession number is a unique identifier assigned to a record of interest in a database, such as a gene or protein. In the NCBI databases, specifically RefSeq, accession numbers that start with 'NM' refer to mRNA, and those starting with 'NP' refer to proteins.
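The prefix rule for RefSeq accession numbers can be written as a small lookup. This is a minimal Python sketch, not part of the original handout. The function name `refseq_molecule_type` is hypothetical, only the two prefixes named in the text are covered, and the accession strings in the usage lines are made-up placeholders rather than real records.

```python
# Only the two prefixes mentioned in the text; RefSeq defines others as well.
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NP": "protein",
}


def refseq_molecule_type(accession):
    """Return the molecule type implied by a RefSeq accession prefix."""
    # The prefix is the part before the first underscore, e.g. "NM".
    prefix = accession.strip().upper().split("_", 1)[0]
    return REFSEQ_PREFIXES.get(prefix, "unknown")


if __name__ == "__main__":
    # Placeholder accessions for illustration only.
    for acc in ("NM_000000.0", "NP_000000.0", "ZZ_000000.0"):
        print(acc, "->", refseq_molecule_type(acc))
```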
Find the link to the protein record (should start with NP).

20. How long is the protein (in amino acids)?
The protein is 147 amino acids long.
21. What is the title of the most recent journal article involving this protein?
The title of the most recent article is “Sickle cell allele HBB-rs334(T) is associated with decreased risk of childhood Burkitt lymphoma in East Africa”.

IV. GC-content and testing GC-content hypotheses (9 questions worth 2 pts each)

22. Review this page about GC-content if you haven't already. Which nucleotide base pair has more hydrogen bonds and greater thermostability: G-C or T-A?
G-C has more hydrogen bonds (three, versus two for T-A) and greater thermostability.

The Human Genome Project revealed that genes are randomly scattered throughout the genome. Therefore, the density of genes among chromosomes varies. Please note that gene density refers to how many genes per unit length and NOT to the absolute number of genes. Look at the following table, which includes the gene density per chromosome (see Genes/Mb column). You will use this table to answer the following four questions.

23. What is the difference between the last column and the next to last column? Which one of these two columns gives the value for gene density?
The last column (Genes/Mb) gives the number of genes per megabase, while the next-to-last column (Genes) gives only the absolute number of genes. The last column gives the value for gene density.

24. Which autosome has the greater number of genes per unit length? Which autosome has the least number of genes per unit length?
The autosome with the greatest number of genes per unit length is 19 and the one with the least number of genes per unit length is Y.

25. Which of the two autosomes from question 24 do you predict will have the higher GC content? Use this webpage to check your answer.
Of the two, I predict that chromosome 19 will have the higher GC content, because it has the higher gene density and gene-dense regions of the genome tend to be GC-rich.
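GC content, as used in questions 22 to 25, is the fraction of bases in a sequence that are G or C. The following is a minimal Python sketch for computing it on any DNA string, not part of the original handout. The function name `gc_content` is hypothetical, and the choice to ignore characters other than A, C, G, and T (such as N) is an assumption. Testing the chromosome-level prediction would require the chromosome sequences themselves, which are not included here.

```python
def gc_content(seq):
    """Return the fraction of A/C/G/T bases in seq that are G or C."""
    # Ignore anything that is not an unambiguous base (e.g. N or whitespace).
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = sum(b in "GC" for b in bases)
    return gc / len(bases)


if __name__ == "__main__":
    seq_frag = "CCAGCGTAAGCGGGAGCAAGTTGCATTACG"  # seq_frag from Part II
    print(f"GC content of seq_frag: {gc_content(seq_frag):.1%}")
```

Returning a fraction rather than a percentage keeps the function easy to reuse; formatting as a percentage is left to the caller.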