Write a function in Python that takes a DNA sequence and kmer size (integer) as input, and returns a dictionary of all kmers (keys) in the string with a list of positions as values. The positions should start at 1. Use your function to make a dictionary of the 'seq' string below and print the dictionary. The following sequence with size = 3 should return:

seq = 'ATCGTTCATCG'
kmerdict(seq, 3)
{'ATC': [1, 8], 'CAT': [7], 'CGT': [3], 'GTT': [4], 'TCA': [6], 'TCG': [2, 9], 'TTC': [5]}

Note that the order in the output is not important. Use your function and the second string and print the positions of all ATGs:

seq = 'ATCGTTCATCG'
def kmerdict(sequence, size):
    index = {}
    return index
seq2 = 'CACTTCACTCCATGGCCCATCTCTCATGAATCAGTACCAAATGCACTCACATCATTATGCACGGCACTTGCCTCAGCGGTCTATACCCTGTGCCATTTACCCATAACGCCC'
print("Here are all the ATG positions in seq2: ")
Could you help with the code and explanation?
The approach I used is as follows:
- First, find all possible substrings of the string and collect them in a list
- Then keep only the substrings of the requested size (3 in our case)
- Then find the occurrences of each of those substrings in the sequence using startswith
- Then add the occurrences one by one to a list, counting positions from 1
- Add a record to the dictionary with the substring as key and the list of occurrences as value
- Finally return the dictionary
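The steps above can be sketched roughly as follows. This is only a minimal sketch of the described approach, not the original code from the answer: the variable names (candidates, piece, positions) are assumptions, and duplicate substrings are skipped so that each kmer is searched only once.

```python
def kmerdict(sequence, size):
    index = {}

    # Steps 1-2: enumerate every substring, keep only those of the requested size.
    candidates = []
    for i in range(len(sequence)):
        for j in range(i + 1, len(sequence) + 1):
            piece = sequence[i:j]
            if len(piece) == size:
                candidates.append(piece)

    # Steps 3-5: for each distinct kmer, collect every 1-based start position.
    for kmer in candidates:
        if kmer in index:
            continue  # already processed this kmer
        positions = []
        for start in range(len(sequence)):
            # str.startswith accepts an optional start offset
            if sequence.startswith(kmer, start):
                positions.append(start + 1)
        index[kmer] = positions

    # Step 6: return the finished dictionary.
    return index


seq = 'ATCGTTCATCG'
print(kmerdict(seq, 3))
```

Enumerating all substrings does more work than the task needs, but it mirrors the steps as listed.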
Everything is explained in the code comments.
The code is added in step 2, along with screenshots of the code and output.
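For comparison, here is a shorter alternative that is not the approach described above: a sliding window that reads each kmer directly at every start position. It is a sketch under the assumption that only plain Python built-ins are wanted. The last line looks up 'ATG' with dict.get so that a missing kmer yields an empty list instead of an error; the same lookup would apply to seq2.

```python
def kmerdict(sequence, size):
    index = {}
    # There are len(sequence) - size + 1 windows of the given size.
    for start in range(len(sequence) - size + 1):
        kmer = sequence[start:start + size]
        # setdefault creates the list on first sight; positions are 1-based.
        index.setdefault(kmer, []).append(start + 1)
    return index


seq = 'ATCGTTCATCG'
index = kmerdict(seq, 3)
print(index)
print("Here are all the ATG positions in seq:", index.get('ATG', []))
```

This visits each position once, so it avoids building the full list of substrings first.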
#To find all the substrings of a string
We use an approach in which one index, i, holds the position of the current starting character and a second index, j, extends the substring
Ex. Hello
First i holds H and j also holds H, so the first substring = H
Now i stays where it is and j increments to e, so the second substring = He; j increments again and we get substring = Hel
We continue like this until the end, but at the end j will be at o and we would get Hello, the whole string; we eliminate that and do not add it
Then i increments to e and j = e as well, so the next substring = e; j again keeps incrementing until the end
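The two-index walk on Hello can be sketched like this. The function name proper_substrings is hypothetical, chosen here only for illustration; i marks the starting character and j the last character of the current substring, and the whole string is left out as described.

```python
def proper_substrings(text):
    subs = []
    for i in range(len(text)):          # i: index of the starting character
        for j in range(i, len(text)):   # j: index of the last character
            piece = text[i:j + 1]
            if piece != text:           # skip the whole string itself
                subs.append(piece)
    return subs


print(proper_substrings('Hello'))
```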