
P.S. I cannot attach the chr02.vcf.gz file, so here is a link where you can access it: https://easyupload.io/261jzv

Please make sure that your graph matches the one attached. DO NOT DEFINE ANY FUNCTIONS (NO DEF).

[Figure 1: Chromosome 6 Polymorphism Density Plot. Y-axis: Number of Polymorphisms (log scale, ticks 10 to 10000); x-axis: Chromosome Position (0 to 5e+07).]
Figure 1 - Polymorphism density plot created using a window size of 1,000,000 with an increment of 100,000
HINTS:
1. The VCF file is tab delimited
2. You don't have to use Biopython to parse the file
3. Use the library matplotlib to create the image
4. In the VCF file the polymorphism data for each individual starts with the genotype call. 0/0
means the individual does NOT have a polymorphism at that location and 1/1 means that it does
have a polymorphism at that location.
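To illustrate hint 4, here is a minimal sketch that pulls the genotype call out of each individual's column in a single record. The record, the sample layout, and the choice to also accept the phased form "1|1" are assumptions for illustration; it is written without def, as the question asks.

```python
# One hypothetical, tab-delimited record with two individuals (columns 10 and 11).
record = "chr06\t12345\t.\tA\tG\t99\tPASS\t.\tGT:DP\t1/1:14\t0/0:11"

fields = record.split("\t")
position = int(fields[1])  # POS is the second column
# Each individual's data starts with the genotype call, before the first ":"
calls = [sample.split(":")[0] for sample in fields[9:]]
# Assumption: only homozygous-alternate calls ("1/1", or phased "1|1") count
has_polymorphism = [call in ("1/1", "1|1") for call in calls]
```

Calls such as "0/1" or "./." are treated as "no polymorphism" here; adjust the set if your file contains them and the assignment expects otherwise.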
When resequencing a genome, a researcher is often interested in how the polymorphisms in a genome
are positioned relative to the reference. They may have questions like "are the polymorphisms evenly
distributed, or are they concentrated in particular regions of the genome?" For this assignment, you will
write a Python program to parse a Variant Call Format (VCF) file and then create a polymorphism density
plot from the data extracted from the file.
Briefly, a VCF file is a standard text file format for recording information about polymorphisms found in a
genome. The file begins with a header section (lines beginning with the '#' symbol), followed by a title
line, with the polymorphism records appearing after that. There is one record per line, and each record
captures information such as where in the genome the polymorphism was found, what the polymorphism is
relative to the reference, and what kind of data supports the 'calling' of the polymorphism.
This could be information like the number of sequencing reads supporting the call and the quality score
assigned to the call. A single record can hold variant call information for more than one individual.
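The file layout described above can be read with a loop like the one below: a minimal sketch that skips the '##' meta lines, takes the sample names from the '#CHROM' title line (in the VCF standard, individuals start at the tenth column, after FORMAT), and keeps every record. The demo file, its name, and its contents are hypothetical and are written to a temporary directory so the sketch runs on its own; gzip.open handles the compressed .vcf.gz form.

```python
import gzip
import os
import tempfile

# Hypothetical two-individual demo file, written as .vcf.gz so the sketch is standalone.
demo_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tP1\tP2",
    "chr02\t150\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:12\t0/0:9",
    "chr02\t2300\t.\tC\tT\t60\tPASS\t.\tGT:DP\t0/0:7\t1/1:15",
]
vcf_name = os.path.join(tempfile.mkdtemp(), "demo.vcf.gz")
with gzip.open(vcf_name, "wt") as out:
    out.write("\n".join(demo_lines) + "\n")

sample_names = []
records = []
# Compressed files need gzip.open; plain .vcf files can use the built-in open
opener = gzip.open if vcf_name.endswith(".gz") else open
try:
    with opener(vcf_name, "rt") as vcf:
        for line in vcf:
            line = line.rstrip("\n")
            if not line or line.startswith("##"):
                continue  # skip blank and meta-information lines
            fields = line.split("\t")  # the file is tab delimited
            if line.startswith("#"):
                sample_names = fields[9:]  # title line: individuals follow FORMAT
                continue
            records.append(fields)
except OSError as err:  # missing file, no permission, or corrupt gzip data
    raise SystemExit(f"Could not read {vcf_name}: {err}")

# The assignment assumes a single chromosome; warn if that does not hold
if len({rec[0] for rec in records}) > 1:
    print("Warning: more than one chromosome found in the VCF file.")
```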
To calculate the polymorphism density, do the following for each individual:
1. Establish a window X bases wide and count the number of polymorphisms in that window.
2. Record the polymorphism count and the start position of the window.
3. Shift the window down the chromosome by Y bases and count the number of polymorphisms in the
window. You will be counting many of the same polymorphisms you counted in the previous
window.
4. Record the polymorphism count and the current start position of the window.
5. Continue moving the window down the chromosome by Y bases, counting the polymorphisms
and recording the count and position data, until your window reaches the end of the
chromosome.
6. Do this for all of the individuals in the VCF file.
7. Create a line graph of the (count, position) data for each individual. The graph should present
one line for each parent.
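Steps 1 to 6 can be sketched as follows, again without def. The positions, chromosome length, and window values are hypothetical; the chromosome length is assumed to be the largest POS in the file (or a contig length from the header). Because overlapping windows re-count the same polymorphisms, this sketch uses two binary searches (Python's bisect module) on each individual's sorted positions instead of rescanning the whole list for every window.

```python
from bisect import bisect_left

# Hypothetical sorted 1/1 positions per individual (real values come from the VCF).
positions = {"P1": [150, 900, 1_200, 2_300, 2_350], "P2": [2_300, 4_100]}
chrom_length = 5_000  # assumption: largest POS, or the contig length in the header
window = 1_000        # X: window width in bases
step = 500            # Y: distance the window moves each time (must be > 0)

density = {}
for name, pos_list in positions.items():
    starts, counts = [], []
    start = 1  # VCF positions are 1-based
    while True:
        end = start + window  # half-open window [start, end)
        # Positions below `end` minus positions below `start` = positions in the window
        counts.append(bisect_left(pos_list, end) - bisect_left(pos_list, start))
        starts.append(start)
        if end > chrom_length:  # this window already covers the last base
            break
        start += step
    density[name] = (starts, counts)
```

With these values each individual gets windows starting at 1, 501, ..., 4001, and the last window covers base 5,000. VCF records are normally sorted by POS, so each individual's list is already in the sorted order bisect requires.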
NOTE: Assume the VCF file will only have data for 1 chromosome.
The program will prompt the user for the name of the VCF file, the window size, and the increment
value.
The program will create a polymorphism density plot similar to the example given.
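Since error handling is part of the marking, the window size and increment prompts could be validated roughly like this. It is a sketch under stated assumptions: the answers are hard-coded stand-ins for input() so it runs unattended, and accepting commas ("1,000,000") is a convenience that the assignment does not require.

```python
# Hypothetical answers standing in for input(); in the real program, e.g.
# raw_answers = {"window size": input("Window size: "), "increment": input("Increment: ")}
raw_answers = {"window size": "1,000,000", "increment": "100000"}

settings = {}
errors = []
for label, raw in raw_answers.items():
    try:
        # Strip commas and spaces so "1,000,000" and "1000000" both parse
        value = int(raw.replace(",", "").strip())
    except ValueError:
        errors.append(f"The {label} must be a whole number, not {raw!r}.")
        continue
    if value <= 0:
        errors.append(f"The {label} must be greater than zero, not {value}.")
    else:
        settings[label] = value

if errors:
    print("\n".join(errors))  # a real program would re-prompt or exit here
```

Rejecting zero and negative values matters for the sliding window: a step of 0 would never move the window, so the loop would never end.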
This assignment will be marked on the following:
1. Correctness of function
2. Clearly written, formatted and documented code
3. Proper error handling
4. Formatting of the polymorphism density image
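For the image itself, a minimal matplotlib sketch that mirrors the example figure is shown below. The logarithmic y-axis is inferred from the example's tick labels (10, 100, 1000, 10000); the series values, individual names, and output file name are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render straight to a file; no display needed
import matplotlib.pyplot as plt

# Hypothetical (window starts, counts) per individual, e.g. from the sliding-window step.
density = {
    "P1": ([1, 100_001, 200_001, 300_001], [850, 1200, 40, 7]),
    "P2": ([1, 100_001, 200_001, 300_001], [12, 30, 2100, 900]),
}

fig, ax = plt.subplots(figsize=(10, 5))
for name, (starts, counts) in density.items():
    ax.plot(starts, counts, label=name)  # one line per individual
ax.set_yscale("log")  # matches the example's 10 ... 10000 tick spacing
ax.set_xlabel("Chromosome Position")
ax.set_ylabel("Number of Polymorphisms")
ax.set_title("Chromosome 6 Polymorphism Density Plot")
ax.legend()
fig.tight_layout()
fig.savefig("polymorphism_density.png", dpi=150)  # hypothetical output name
plt.close(fig)
```

Windows with a count of 0 cannot be drawn on a log axis, so those points will not appear on the lines.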