C++ Code:
In checkpoint B, you will build on checkpoint A to load (from standard input) a database of individuals and their known counts for several STRs (Short Tandem Repeats). Given a query DNA sequence, you will then check the database to find the name of the individual most likely to have that DNA.
You will implement the following functions:
- /**
   * Reads from standard input a list of Short Tandem Repeats (STRs)
   * and their known counts for several individuals
   *
   * @param nameSTRs the STRs (e.g. AGAT, AATG, TATC)
   * @param nameIndividuals the names of individuals (e.g. Alice, Bob, Charlie)
   * @param STRcounts the count of the longest consecutive occurrences of each STR in the DNA sequence for each individual
   * @pre nameSTRs, nameIndividuals, and STRcounts are empty
   * @post nameSTRs, nameIndividuals, and STRcounts are populated with data read from stdin
   **/
  void readData(vector<string>& nameSTRs, vector<string>& nameIndividuals, vector<vector<int>>& STRcounts)
For example, consider the input:
3 AGAT AATG TATC
Alice 5 2 8
Bob 3 7 4
Charlie 6 1 5
The first line gives the number of STRs followed by the names of those STRs, which will be populated into the vector nameSTRs. The remaining lines contain data for a number of individuals. Their names will be populated into the vector nameIndividuals, and the longest consecutive counts of STRs will be stored in the 2D vector STRcounts (a vector of vectors of ints). Each element of a 2D vector is itself a vector. Check this resource for learning more about 2D Vectors in C++.
Note that an empty line at the end of the input denotes the end of data. In other words, the code should stop reading names and STR counts as soon as an empty line is encountered.
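The reading rules above can be sketched as follows. This is only one possible approach, not the reference solution: readDataFrom is a hypothetical helper (not part of the assignment) that takes any input stream so the logic can be tried without real stdin, and the sketch assumes the stream is positioned at the start of the header line (if main reads the query sequence with >>, a leftover newline may need to be skipped first).

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: same logic as readData, but reads from any stream.
void readDataFrom(istream& in, vector<string>& nameSTRs,
                  vector<string>& nameIndividuals,
                  vector<vector<int>>& STRcounts) {
    // Precondition from the spec: all three containers start out empty.
    assert(nameSTRs.empty() && nameIndividuals.empty() && STRcounts.empty());

    string line;
    // First line: the number of STRs followed by their names.
    if (!getline(in, line)) return;
    istringstream header(line);
    int numSTRs = 0;
    header >> numSTRs;
    for (int i = 0; i < numSTRs; ++i) {
        string name;
        if (header >> name) nameSTRs.push_back(name);
    }

    // Remaining lines: a name followed by one count per STR.
    // An empty line (or end of input) ends the data.
    while (getline(in, line) && !line.empty()) {
        istringstream row(line);
        string person;
        row >> person;
        vector<int> counts;
        int c;
        while (row >> c) counts.push_back(c);
        nameIndividuals.push_back(person);
        STRcounts.push_back(counts);
    }
}

// The function named in the assignment simply forwards to stdin.
void readData(vector<string>& nameSTRs, vector<string>& nameIndividuals,
              vector<vector<int>>& STRcounts) {
    readDataFrom(cin, nameSTRs, nameIndividuals, STRcounts);
}
```

Reading line by line with getline is what makes the empty-line terminator detectable; extracting everything with >> alone would skip over blank lines silently.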
- /**
   * Prints a list of Short Tandem Repeats (STRs) and their
   * known counts for several individuals
   *
   * @param nameSTRs the STRs (e.g. AGAT, AATG, TATC)
   * @param nameIndividuals the names of individuals (e.g. Alice, Bob, Charlie)
   * @param STRcounts the STR counts
   * @pre nameSTRs, nameIndividuals, and STRcounts hold the data intended to be printed
   * @post the names of individuals and their STR counts are printed to stdout in a column-major format
   **/
  void printData(vector<string>& nameSTRs, vector<string>& nameIndividuals, vector<vector<int>>& STRcounts)
This function will print out the information that has been previously read (using the function readData) in a format that aligns an individual's STR counts along a column. For example, the output for the above input will be:
name      Alice     Bob       Charlie
----------------------------------------
AGAT      5         3         6
AATG      2         7         1
TATC      8         4         5
This output uses text manipulators to left-align each name and count within a field of 10 characters. The row of dashes is 40 characters long.
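A minimal sketch of this layout, using the left and setw manipulators from <iomanip>, might look like the following. formatData is a hypothetical helper (not required by the assignment) that builds the table as a string so it is easy to inspect. The sketch assumes every row of STRcounts holds one count per STR.

```cpp
#include <cassert>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Hypothetical helper: builds the table as text instead of printing it.
string formatData(vector<string>& nameSTRs, vector<string>& nameIndividuals,
                  vector<vector<int>>& STRcounts) {
    // One row of counts is expected per individual.
    assert(nameIndividuals.size() == STRcounts.size());

    ostringstream out;
    out << left;  // left-alignment stays in effect for this stream

    // Header row: the word "name", then one column per individual.
    out << setw(10) << "name";
    for (const string& person : nameIndividuals) out << setw(10) << person;
    out << "\n" << string(40, '-') << "\n";

    // One output row per STR; column i holds individual i's count.
    for (size_t j = 0; j < nameSTRs.size(); ++j) {
        out << setw(10) << nameSTRs[j];
        for (size_t i = 0; i < STRcounts.size(); ++i) {
            out << setw(10) << STRcounts[i][j];
        }
        out << "\n";
    }
    return out.str();
}

void printData(vector<string>& nameSTRs, vector<string>& nameIndividuals,
               vector<vector<int>>& STRcounts) {
    cout << formatData(nameSTRs, nameIndividuals, STRcounts);
}
```

Note that setw applies only to the next item written, so it is repeated before every field, while left persists once set. The data is stored one row per individual but printed one column per individual, which is why the inner loop indexes STRcounts[i][j] with the individual index first.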
- /**
   * Computes the longest consecutive occurrences of several STRs in a DNA sequence
   *
   * @param sequence a DNA sequence of an individual
   * @param nameSTRs the STRs (e.g. AGAT, AATG, TATC)
   * @returns the count of the longest consecutive occurrences of each STR in nameSTRs
   **/
  vector<int> getSTRcounts(string& sequence, vector<string>& nameSTRs)
For example, if the sequence is
AACCCTGCGCGCGCGCGATCTATCTATCTATCTATCCAGCATTAGCTAGCATCAAGATAGATAGATGAATTTCGAAATGAATGAATGAATGAATGAATGAATG
and the vector nameSTRs is {"AGAT", "AATG", "TATC"}, then the output is the vector {3, 7, 4}.
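One straightforward way to compute these counts is a brute-force scan: from every starting position, count how many back-to-back copies of the STR follow, and keep the maximum. The sketch below is an illustration under that approach, not necessarily what checkpoint A used; it treats an empty STR name as having a count of 0, which is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>
using namespace std;

vector<int> getSTRcounts(string& sequence, vector<string>& nameSTRs) {
    vector<int> counts;
    for (const string& str : nameSTRs) {
        int best = 0;
        // An empty STR would never advance the scan, so it is skipped.
        if (!str.empty()) {
            for (size_t start = 0; start + str.size() <= sequence.size(); ++start) {
                int run = 0;
                size_t pos = start;
                // Count consecutive copies of str beginning at this position.
                while (sequence.compare(pos, str.size(), str) == 0) {
                    ++run;
                    pos += str.size();
                }
                best = max(best, run);
            }
        }
        counts.push_back(best);
    }
    // Sanity check: exactly one count per STR.
    assert(counts.size() == nameSTRs.size());
    return counts;
}
```

For the example sequence, the longest runs are AGATAGATAGAT (3 copies), seven copies of AATG at the end, and four copies of TATC near the start, giving {3, 7, 4}.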
- /**
   * Checks whether two vectors of STR counts are identical
   *
   * @param countQuery STR counts that are being queried (such as those computed from an input DNA sequence)
   * @param countDB STR counts that are known for an individual (such as those stored in a database)
   * @returns a boolean indicating whether they are the same or not
   **/
  bool compareSTRcounts(vector<int>& countQuery, vector<int>& countDB)
For example, if countQuery is the vector {3, 7, 4}, and countDB is the vector {3, 3, 4}, the function returns false.
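As a sketch, the comparison can be written as an explicit loop. It assumes that vectors of different lengths count as "not the same", which the description does not state. The one-liner `return countQuery == countDB;` behaves the same way, since operator== on std::vector compares sizes and then elements.

```cpp
#include <string>
#include <vector>
using namespace std;

bool compareSTRcounts(vector<int>& countQuery, vector<int>& countDB) {
    // Assumption: vectors of different lengths never match.
    if (countQuery.size() != countDB.size()) return false;

    // Any differing position means the profiles are not identical.
    for (size_t i = 0; i < countQuery.size(); ++i) {
        if (countQuery[i] != countDB[i]) return false;
    }
    return true;
}
```

With countQuery = {3, 7, 4} and countDB = {3, 3, 4}, the loop stops at index 1 (7 versus 3) and returns false.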
Bringing it all together in main
- Part of the main() function is already written for you: it reads a query DNA sequence and also reads (and prints) the database of STR counts for several individuals. Do NOT change that code. You should only need to make modifications beyond this point. In particular, your code should display the counts for each STR in the query DNA sequence. If there is a match with one of the individuals, it should display their name. If there is no match with any individual, then the program should output No Match found. For example, for the above query sequence, the output is:
Counts of the STRs in the DNA sequence is: 3 7 4
Found Match: Bob
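The matching step in main could be sketched along these lines. The provided part of main is not shown here, so matchReport is a hypothetical helper that takes the already-computed query counts and the loaded database and returns the text to print. The exact spacing and line breaks of the expected output are assumed from the example above.

```cpp
#include <cassert>
#include <string>
#include <vector>
using namespace std;

bool compareSTRcounts(vector<int>& countQuery, vector<int>& countDB) {
    return countQuery == countDB;  // same size and same elements
}

// Hypothetical helper: builds the report for one query.
string matchReport(vector<int>& queryCounts, vector<string>& nameIndividuals,
                   vector<vector<int>>& STRcounts) {
    // One row of counts is expected per individual.
    assert(nameIndividuals.size() == STRcounts.size());

    string report = "Counts of the STRs in the DNA sequence is:";
    for (int c : queryCounts) report += " " + to_string(c);
    report += "\n";

    // Report the first individual whose counts all match the query.
    for (size_t i = 0; i < STRcounts.size(); ++i) {
        if (compareSTRcounts(queryCounts, STRcounts[i])) {
            return report + "Found Match: " + nameIndividuals[i] + "\n";
        }
    }
    return report + "No Match found\n";
}
```

In main, this would be used after the provided code has read the sequence and the database, for example `vector<int> q = getSTRcounts(sequence, nameSTRs); cout << matchReport(q, nameIndividuals, STRcounts);`, where sequence is whatever variable the provided code stores the query in.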