Write a program (call it intersection.py) that finds all gene symbols that appear both in the chr21_genes.txt file and in the HUGO_genes.txt file. These gene symbols should be printed to a file in alphabetical order (you can hard-code the output file OUTPUT/intersection_output.txt). The program should also print on the terminal how many common gene symbols were found. Use Lists or Sets to solve the problem. It is fine to use a temporary Dictionary to find the intersection of two Lists, but this can be simplified with Sets. Note: HUGO_genes.txt could have some duplicate entries.
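The Set-based approach mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the required solution: it assumes each input has exactly one header line and tab-separated columns with the gene symbol first, and the names `read_symbols` and `common_symbols` are hypothetical helpers invented for this example. The sample rows are made up.

```python
import io


def read_symbols(lines):
    """Collect unique gene symbols from the first column, skipping the header."""
    symbols = set()
    for line_number, line in enumerate(lines):
        if line_number == 0:
            continue  # assumption: the first line is a header
        first_field = line.rstrip("\n").split("\t")[0].strip()
        if first_field:  # ignore blank lines
            symbols.add(first_field)
    return symbols


def common_symbols(first, second):
    """Return the symbols present in both sets, in alphabetical order."""
    return sorted(first & second)


# Tiny in-memory stand-ins for the two input files (made-up rows).
chr21 = io.StringIO("Gene Symbol\tDescription\nABCG1\tx\nADAMTS1\ty\n")
hugo = io.StringIO("Gene Symbol\tName\nADAMTS1\ta\nADAMTS1\ta\nTP53\tb\n")

chr21_symbols = read_symbols(chr21)
hugo_symbols = read_symbols(hugo)
shared = common_symbols(chr21_symbols, hugo_symbols)
print(f"Number of common gene symbols found: {len(shared)}")
```

Because a set stores each symbol once, duplicate entries in either file collapse automatically, so `len()` of each set gives the unique-name counts the program has to report.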
Remember to have these command line options:
$ python3 intersection.py -h
usage: intersection.py [-h] -i1 FILE1 -i2 FILE2

Provide two gene list (ignore header line), find intersection

optional arguments:
  -h, --help            show this help message and exit
  -i1 INFILE1, --infile1 INFILE1
                        Gene list 1 to open
  -i2 INFILE2, --infile2 INFILE2
                        Gene list 2 to open
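A command-line interface along these lines could be built with `argparse`. The sketch below is one possible arrangement, not the official solution: the function name `get_cli_args` is hypothetical, and the options are marked required only because the usage line shows them without brackets. If the program must also run with defaults, `default=` values could be used instead of `required=True`.

```python
import argparse


def get_cli_args(argv=None):
    """Parse the two input-file options; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Provide two gene list (ignore header line), find intersection"
    )
    parser.add_argument("-i1", "--infile1", dest="infile1", type=str,
                        help="Gene list 1 to open", required=True)
    parser.add_argument("-i2", "--infile2", dest="infile2", type=str,
                        help="Gene list 2 to open", required=True)
    return parser.parse_args(argv)


# Passing a list here stands in for what the shell would supply.
args = get_cli_args(["-i1", "chr21_genes.txt", "-i2", "HUGO_genes.txt"])
print(args.infile1, args.infile2)
```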
$ python3 intersection.py -i1 chr21_genes.txt -i2 HUGO_genes.txt # the N's below are an integer and bolded for illustration only
Number of unique gene names in chr21_genes.txt: N
Number of unique gene names in HUGO_genes.txt: N
Number of common gene symbols found: N
Output stored in OUTPUT/intersection_output.txt
STDOUT is shown above, and the actual output of the intersection goes to the file (OUTPUT/intersection_output.txt) from this program:
ABCG1
ADAMTS1
ADAMTS5
...
If you implemented intersection.py correctly, this program could take any gene file that has the gene in the first column (even if it's the only column)
(additional examples: hgnc_complete_set_reduced.txt and gene_age.txt)
$ python3 intersection.py -i1 hgnc_complete_set_reduced.txt -i2 HUGO_genes.txt
Number of unique gene names in hgnc_complete_set_reduced.txt: 43547
Number of unique gene names in HUGO_genes.txt: 11815
Number of common gene symbols found: 8654
Output stored in OUTPUT/intersection_output.txt
$ python3 intersection.py -i1 gene_age.txt -i2 chr21_genes.txt
Number of unique gene names in gene_age.txt: 307
Number of unique gene names in chr21_genes.txt: 285
Number of common gene symbols found: 4
Output stored in OUTPUT/intersection_output.txt
- Pay attention to the outputs above
- Do not forget to implement the my_io.py module and place that in a subdirectory of assignment4, i.e. assignment4/assignment4/my_io.py
- Whenever you need to open a filehandle in any of your programs, the code should call the my_io.get_fh function from my_io.py
- Make sure to include documentation for the my_io.py module
- Make sure to include the __init__.py (see above)
- Do not forget to implement the test_my_io.py test script and put that in a subdirectory of assignment4, i.e. assignment4/tests/unit/test_my_io.py
- Create a .coveragerc file in the working directory like I showed you above and submit that with your final submission
- To receive credit for your test_my_io.py, you must achieve coverage >=70%.
- Use relative paths for outputs, i.e. do not use absolute paths like: /home/your_user_name/programming6200/assignment4/OUTPUT
- Instead, use relative paths like outfile = "OUTPUT/intersection_output.txt"
- Make sure your programs run with defaults on Defiance, but also have the options provided above
- HINT: If you use the same data structure in problems 1 and 2 (like a Dictionary of Dictionaries) or have the same code to get the data, wrap it up in another Module.
- This is not required, but great practice
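The assignment text does not spell out what `my_io.get_fh` looks like, so the following is only a sketch under assumptions: a two-argument signature `get_fh(file, mode)`, support for read and write modes only, and exceptions re-raised with a clearer message. The real module may be specified differently elsewhere in the course material.

```python
import os
import tempfile


def get_fh(file=None, mode=None):
    """Open `file` with `mode` ('r' or 'w') and return the filehandle.

    Hypothetical signature and behavior, assumed for illustration.
    """
    if mode not in ("r", "w"):
        raise ValueError(f"Unsupported mode: {mode!r} (expected 'r' or 'w')")
    try:
        return open(file, mode)
    except OSError as err:
        raise OSError(f"Could not open {file} with mode {mode!r}: {err}") from err


# Round trip in a temporary directory: write one symbol, read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "intersection_output.txt")
    with get_fh(out_path, "w") as fh_out:
        fh_out.write("ABCG1\n")
    with get_fh(out_path, "r") as fh_in:
        first_line = fh_in.readline().strip()

# An unsupported mode should be rejected before any file is touched.
try:
    get_fh("no_such_file.txt", "x")
    bad_mode_rejected = False
except ValueError:
    bad_mode_rejected = True
```

Checks like the round trip and the rejected mode are the kind of cases a test_my_io.py script could assert on, since exercising both the success path and the error paths is what raises coverage.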