Write a program (call it intersection.py) that finds all gene symbols that appear in both the chr21_genes.txt file and the HUGO_genes.txt file. Print these gene symbols to a file in alphabetical order (you can hard-code the output file as OUTPUT/intersection_output.txt). The program should also print to the terminal how many common gene symbols were found. Use Lists or Sets to solve the problem. It is fine to use a temporary Dictionary to find the intersection of two Lists, but this can be simplified with Sets. Note: HUGO_genes.txt could have some duplicate entries.
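
As a minimal sketch of the Set-based approach (an illustration, not the official solution): converting each list to a set drops duplicate entries such as those HUGO_genes.txt may contain, and the `&` operator returns the symbols common to both. The helper name `common_symbols` and the gene lists below are made up for the example.

```python
def common_symbols(genes1, genes2):
    """Return the symbols present in both lists, sorted alphabetically."""
    # set() removes duplicate entries; & computes the set intersection
    return sorted(set(genes1) & set(genes2))


# Made-up sample data, for illustration only
list1 = ["ABCG1", "ADAMTS1", "ADAMTS5", "APP"]
list2 = ["ADAMTS1", "TP53", "ABCG1", "ABCG1"]  # contains a duplicate

common = common_symbols(list1, list2)
print(f"Number of common gene symbols found: {len(common)}")
```

With two plain Lists, the same result needs an explicit membership check per element (or a temporary Dictionary), which is why Sets simplify the problem.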

Remember to have these command line options:

$ python3 intersection.py -h
usage: intersection.py [-h] -i1 FILE1 -i2 FILE2

Provide two gene list (ignore header line), find intersection

optional arguments:
  -h, --help            show this help message and exit
  -i1 INFILE1, --infile1 INFILE1
                        Gene list 1 to open
  -i2 INFILE2, --infile2 INFILE2
                        Gene list 2 to open
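
One way to produce options like these is with the standard-library argparse module. The sketch below assumes both options are required, since the usage line shows them without brackets; the function name `get_cli_args` is hypothetical, and the exact help layout depends on the Python version.

```python
import argparse


def get_cli_args(argv=None):
    """Parse the two input-file options (argv=None means use sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Provide two gene list (ignore header line), find intersection"
    )
    # Assumption: both options are required, as the usage line suggests
    parser.add_argument("-i1", "--infile1", dest="infile1", type=str,
                        required=True, help="Gene list 1 to open")
    parser.add_argument("-i2", "--infile2", dest="infile2", type=str,
                        required=True, help="Gene list 2 to open")
    return parser.parse_args(argv)


# Passing a list here stands in for the real command line
args = get_cli_args(["-i1", "chr21_genes.txt", "-i2", "HUGO_genes.txt"])
print(args.infile1, args.infile2)
```

If the program must also run with no arguments, replacing `required=True` with `default="chr21_genes.txt"` and `default="HUGO_genes.txt"` would be one option, at the cost of a slightly different usage line.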

$ python3 intersection.py -i1 chr21_genes.txt -i2 HUGO_genes.txt # the N's below are integers, bolded for illustration only

Number of unique gene names in chr21_genes.txt: N
Number of unique gene names in HUGO_genes.txt: N
Number of common gene symbols found: N
Output stored in OUTPUT/intersection_output.txt

STDOUT is shown above; the actual intersection is written by the program to the file OUTPUT/intersection_output.txt:

ABCG1
ADAMTS1
ADAMTS5
...
...
...

If you implemented intersection.py correctly, the program can take any gene file that has the gene symbol in the first column (even if it's the only column).
(Additional examples: hgnc_complete_set_reduced.txt and gene_age.txt)
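
A rough sketch of the file handling this implies: read only the first column, skip the header line, and write the sorted symbols one per line. It assumes tab-delimited input and one symbol per output line, neither of which the assignment states explicitly. The function names are hypothetical, and plain `open` is used only to keep the example self-contained; the assignment asks for my_io.get_fh instead.

```python
import os
import tempfile


def read_first_column(path):
    """Return the set of symbols in column 1, ignoring the header line."""
    symbols = set()
    with open(path, encoding="utf-8") as fh:
        next(fh, None)  # skip the header line (no error on an empty file)
        for line in fh:
            symbol = line.rstrip("\n").split("\t")[0].strip()
            if symbol:  # ignore blank lines
                symbols.add(symbol)
    return symbols


def write_symbols(symbols, path):
    """Write the symbols to path in alphabetical order, one per line."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        for symbol in sorted(symbols):
            fh.write(symbol + "\n")


# Demonstration on a made-up file inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    infile = os.path.join(tmp, "genes.txt")
    with open(infile, "w", encoding="utf-8") as fh:
        fh.write("Gene Symbol\tDescription\n"
                 "TP53\ttumor protein\n"
                 "ABCG1\ttransporter\n"
                 "TP53\tduplicate row\n")
    genes = read_first_column(infile)
    outfile = os.path.join(tmp, "OUTPUT", "intersection_output.txt")
    write_symbols(genes, outfile)
    with open(outfile, encoding="utf-8") as fh:
        written = fh.read().splitlines()

print(f"Number of unique gene names in genes.txt: {len(genes)}")
```

Because a line with a single column contains no tab, `split("\t")[0]` returns the whole line, so the same reader handles one-column files.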

$ python3 intersection.py -i1 hgnc_complete_set_reduced.txt -i2 HUGO_genes.txt

Number of unique gene names in hgnc_complete_set_reduced.txt: 43547
Number of unique gene names in HUGO_genes.txt: 11815
Number of common gene symbols found: 8654
Output stored in OUTPUT/intersection_output.txt

$ python3 intersection.py -i1 gene_age.txt -i2 chr21_genes.txt

Number of unique gene names in gene_age.txt: 307
Number of unique gene names in chr21_genes.txt: 285
Number of common gene symbols found: 4
Output stored in OUTPUT/intersection_output.txt

  • Pay attention to the outputs above
  • Do not forget to implement the my_io.py module and place that in a subdirectory of assignment4, i.e. assignment4/assignment4/my_io.py
    • Whenever you need to open a filehandle in any of your programs, the code should call the my_io.get_fh function from my_io.py
    • Make sure to include documentation for the my_io.py module
    • Make sure to include the __init__.py (see above)
  • Do not forget to implement the test_my_io.py test script and put that in a subdirectory of assignment4, i.e. assignment4/tests/unit/test_my_io.py
  • Create a .coveragerc file in the working directory like I showed you above and submit that with your final submission
  • To receive credit for your test_my_io.py, you must achieve coverage >=70%.
  • Use relative paths for outputs, i.e. do not use absolute paths like: /home/your_user_name/programming6200/assignment4/OUTPUT
  • Instead, use relative paths like outfile = "OUTPUT/intersection_output.txt"
  • Make sure your programs run with defaults on Defiance, but also have the options provided above
  • HINT: If you use the same data structure in problems 1 and 2 (like a Dictionary of Dictionaries) or have the same code to get the data, wrap it up in another Module.
    • This is not required, but great practice.
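
The assignment does not give the signature of my_io.get_fh, so the following is only a guess at its shape: a function that takes a file name and a mode ("r" or "w") and returns an open filehandle, raising an error on bad input. The short check after it shows the kind of behaviour a test in tests/unit/test_my_io.py could exercise; a real test script would more likely use pytest functions.

```python
import os
import tempfile


def get_fh(file=None, mode=None):
    """
    Open `file` in `mode` and return the filehandle.
    Hypothetical signature: the assignment does not specify one.
    """
    if mode not in ("r", "w"):
        raise ValueError(f"Unsupported mode: {mode!r} (expected 'r' or 'w')")
    try:
        return open(file, mode, encoding="utf-8")
    except OSError as err:
        # Re-raise with the file name so the caller sees what failed
        raise OSError(f"Could not open {file!r} with mode {mode!r}") from err


# Round trip: write a file through get_fh, then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with get_fh(path, "w") as fh:
        fh.write("ABCG1\n")
    with get_fh(path, "r") as fh:
        content = fh.read()

# An unsupported mode should be rejected before any file is touched
try:
    get_fh("demo.txt", "x")
    bad_mode_rejected = False
except ValueError:
    bad_mode_rejected = True
```

Routing every open through one function keeps error handling in a single place, which also makes the 70% coverage target easier to reach with a handful of tests.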