I cannot figure out this code. I am having issues with test len, test counts, and test consensus. My code is pasted below:

class DNAMOTIF:
  def __init__(self):
    self.instances=[]
    self.consensus=""
    self.counts={'A':[],'C':[],'G':[],'T':[]}
  
  def __str__(self):
    for i in self.consensus:
      string += i 
    return string
  
  def __len__(self):
    return len(self.instances)
  
  def count(self):
      for i in self.instances:
          up = i.upper()
          self.counts["A"].append(up.count("A"))
          self.counts["C"].append(up.count("C"))
          self.counts["G"].append(up.count("G"))
          self.counts["T"].append(up.count("T"))

  def compute_consensus(self):
    A = self.counts["A"] #computing the most frequent nucleotide in this position
    C = self.counts["C"]
    G = self.counts["G"]
    T = self.counts["T"]
    for i in range(len(A)):
      if(A[i] >= C[i] and A[i] >= G[i] and A[i] >= T[i]):
        self.consensus.append("A")
      elif (C[i] >= G[i] and C[i] >= T[i]):
        self.consensus.append("C")
      elif (G[i] >= T[i]):
        self.consensus.append("G")
      else:
        self.consensus.append("T")
  
  def parse(self,filename):
    with open(filename,'r') as f: #reading the data from the file
        for i in f:
            if ">" in i:
                continue
            else:
                self.instances.append(i) #stores data from file in instances
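
A possible explanation for the three failing tests, judging only from the code above and the assignment text: __len__ returns the number of sequences (19) rather than the length of one sequence (20); count() appends one total per sequence instead of one tally per position; compute_consensus() calls .append on a string; parse() keeps the trailing newline on every line; and __str__ uses an undefined variable and walks the consensus instead of the instances. The following is a minimal corrected sketch, not the official solution. It assumes consensus is stored as a string (the spec initializes it as [] but describes it as a string), and the NUCLEOTIDES class attribute is a hypothetical helper that is not part of the assignment.

```python
class DNAMOTIF:
    NUCLEOTIDES = "ACGT"  # hypothetical helper: lexicographic order, reused for tie-breaking

    def __init__(self):
        self.instances = []  # DNA sequence strings, headers excluded
        self.consensus = ""  # assumption: kept as a string, filled by compute_consensus()
        self.counts = {n: [] for n in self.NUCLEOTIDES}

    def __str__(self):
        # One sequence per line, in the case it was read from the file.
        return "\n".join(self.instances)

    def __len__(self):
        # Motif length is the length of one sequence, not the number of sequences.
        return len(self.instances[0]) if self.instances else 0

    def parse(self, filename):
        self.instances = []
        with open(filename) as handle:
            for line in handle:
                line = line.strip()  # drop the trailing newline
                if line and not line.startswith(">"):  # skip blank lines and headers
                    self.instances.append(line)

    def count(self):
        # Rebuild from scratch so repeated calls do not accumulate.
        self.counts = {n: [0] * len(self) for n in self.NUCLEOTIDES}
        for seq in self.instances:
            for pos, base in enumerate(seq.upper()):
                self.counts[base][pos] += 1  # raises KeyError on anything but A/C/G/T

    def compute_consensus(self):
        self.count()  # the consensus relies on counts, so compute them first
        # max() returns the first maximal item, so scanning "ACGT" in order
        # resolves ties lexicographically.
        self.consensus = "".join(
            max(self.NUCLEOTIDES, key=lambda n: self.counts[n][pos])
            for pos in range(len(self))
        )
        return self.consensus
```

Calling compute_consensus() directly after parse() works here because it runs count() itself, which is one way to satisfy the requirement that a function runs whatever it depends on.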

Building and using DNA Motifs
Background
Sequence motifs are short, recurring (meaning conserved) patterns in DNA that are presumed to have a biological function. Often they
indicate sequence-specific binding sites for proteins and other important markers. However, sometimes they are not exactly conserved,
meaning some mutations can happen in a motif in a particular organism. Mutations can be DNA substitutions/deletions/insertions.
Therefore, sequences are usually aligned and a consensus pattern of a motif is calculated over all examples from organisms.
The following are examples of a transcription factor binding (TFB) site for the lexA repressor in E. coli, located in a file called lexA.fasta:
>dinD 32->52
aactgtatataaatacagtt
>ding 15->35
tattggctgtttatacagta
>dinH 77->97
tcctgttaatccatacagca
>dinI 19->39
acctgtataaataaccagta
>lexA-1 28->48
tgctgtatatactcacagca
>lexA-2 7->27
aactgtatatacacccaggg
>polB (dinA) 53->73
gactgtataaaaccacagcc
>recA 59->79
tactgtatgagcatacagta
>recN-1 49->69
tactgtatataaaaccagtt
>recN-2 27->47
tactgtacacaataacagta
>recN-3 9-29
TCCTGTATGAAAAACCATTA
>ruvAB 49->69
cgctggatatctatccagca
>308C 18->38
tactgatgatatatacaggt
>803D 14->34
cactggatagataaccagca
>sulA 22->42
tactgtacatccatacagta
>umuDC 20->40
tactgtatataaaaacagta
>uvrA 83->103
tactgtatattcattcaggt
>uvrB 75->95
aactgtttttttatccagta
>uvrD 57->77
atctgtatatatacccagct
Each line that starts with ">" is a header that states which gene the sequence was upstream of and where it is located relative to that gene.
(For your purposes, we can ignore this, and your code should skip these lines when parsing the DNA sequences.) Each line in between
is the nucleotide sequence of one TFB site. Each nucleotide has a position in the sequence. You can assume that all sequences will be the
same length.
You only need to do very minimal input error checking; I won't be testing extensively for it. However, do make sure
that if a function relies on another function being run first, you have it do that.
Creating DNAMOTIF class
You will create a DNAMOTIF class that has the following attributes and functions:
1. __init__(self): Initialize the class.
self.instances=[]
#These are a list of DNA sequence strings (no header)
self.consensus=[]
# A DNA sequence String
self.counts= {'A': [], 'C': [], 'G':[],'T':[]}
# A dictionary of nucleotide counts
2. __str__: Return a string with the sequence instances of the motif on each line
3. __len__: Return the length of a motif, which is the length of one of the sequences in the collection.
Example Input:
lexA = DNAMOTIF()
lexA.parse("lexA.fasta")
print(len(lexA))
Output:
20
4. parse(self, filename): Read in DNA instances from a FASTA file
Example Usage:
lexA.parse("lexA.fasta")
print(lexA)
aactgtatataaatacagtt
tattggctgtttatacagta
tcctgttaatccatacagca
acctgtataaataaccagta
tgctgtatatactcacagca
aactgtatatacacccaggg
gactgtataaaaccacagcc
tactgtatgagcatacagta
tactgtatataaaaccagtt
tactgtacacaataacagta
TCCTGTATGAAAAACCATTA
cgctggatatctatccagca
tactgatgatatatacaggt
cactggatagataaccagca
tactgtacatccatacagta
tactgtatataaaaacagta
tactgtatattcattcaggt
aactgtttttttatccagta
atctgtatatatacccagct
5. count(self): Count occurrences of A's, C's, G's, and T's in each position and store them in a dictionary. Convert all sequences to upper case
for consistency.
Example Input:
lexA.count()
To Access Result:
lexA.counts
{'A': [5, 13, 0, 0, 0, 1, 15, 1, 15, 4, 12, 6, 16, 6, 10, 0, 19, 0, 0, 12],
 'C': [2, 3, 18, 0, 0, 0, 1, 2, 0, 1, 3, 6, 1, 4, 8, 19, 0, 0, 6, 1],
 'G': [1, 2, 0, 0, 19, 3, 0, 1, 3, 1, 1, 0, 0, 0, 0, 0, 0, 18, 3, 1],
 'T': [11, 1, 1, 19, 0, 15, 3, 15, 1, 13, 3, 7, 2, 9, 1, 0, 0, 1, 10, 5]}
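
Each list in the counts dictionary has one entry per motif position (20 entries here), not one entry per sequence. As an illustration of that shape, the sketch below tallies columns by transposing the sequences with zip. The function name column_counts is hypothetical and not part of the assignment, and only the first three lexA sequences are used to keep the example short.

```python
from collections import Counter


def column_counts(sequences):
    """Count A/C/G/T per position; assumes all sequences have the same length."""
    counts = {base: [] for base in "ACGT"}
    # zip(*...) transposes the rows into columns, one tuple per motif position.
    for column in zip(*(seq.upper() for seq in sequences)):
        tally = Counter(column)
        for base in counts:
            counts[base].append(tally[base])  # a Counter yields 0 for absent keys
    return counts


first_three = [
    "aactgtatataaatacagtt",
    "tattggctgtttatacagta",
    "tcctgttaatccatacagca",
]
print(column_counts(first_three)["T"][:5])  # [2, 0, 1, 3, 0]
```

Position 3 is T in all three sequences, which is why the fourth entry of the T list is 3.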
6. compute_consensus(self): Return an UPPERCASE sequence of the most frequent nucleotide in each position of the motif. If more
than one nucleotide is tied, return the first one lexicographically.
Example Input:
lexA.compute_consensus()
To Access Result:
print(lexA.consensus)
TACTGTATATATATACAGTA
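
As a cross-check, the expected consensus can be derived directly from the expected counts listed for lexA.count(). The sketch below is one way to do that. The function name consensus_from_counts is hypothetical, and the tie-breaking relies on Python's max() returning the first maximal item it encounters.

```python
# Counts copied from the expected result of lexA.count() in the assignment.
counts = {
    "A": [5, 13, 0, 0, 0, 1, 15, 1, 15, 4, 12, 6, 16, 6, 10, 0, 19, 0, 0, 12],
    "C": [2, 3, 18, 0, 0, 0, 1, 2, 0, 1, 3, 6, 1, 4, 8, 19, 0, 0, 6, 1],
    "G": [1, 2, 0, 0, 19, 3, 0, 1, 3, 1, 1, 0, 0, 0, 0, 0, 0, 18, 3, 1],
    "T": [11, 1, 1, 19, 0, 15, 3, 15, 1, 13, 3, 7, 2, 9, 1, 0, 0, 1, 10, 5],
}


def consensus_from_counts(counts):
    """Pick the most frequent base per position; ties go to the earlier letter."""
    length = len(counts["A"])
    # Scanning "ACGT" in order means the first maximal base is also the
    # lexicographically smallest one.
    return "".join(
        max("ACGT", key=lambda base: counts[base][pos]) for pos in range(length)
    )


print(consensus_from_counts(counts))  # TACTGTATATATATACAGTA
# C and G tie at 2, so C is chosen.
print(consensus_from_counts({"A": [1], "C": [2], "G": [2], "T": [0]}))  # C
```

Position 11 is a close call in the lexA data: A and C both occur 6 times, but T occurs 7 times, so T is chosen there and no tie-break is needed.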