BIO257_PS3.2023

School

University of Rochester

Course

257W

Subject

Biology

Date

Jan 9, 2024

Problem Set 3
BIO 257/457
Due 9/28/23

You may discuss this problem set with your classmates, but the work must be done independently. Please refer to the Academic Honesty statement on Blackboard for additional details.

Be sure to include your modified versions of the Python scripts for each question and their output in the tar file that you submit. Please annotate your code with comments using ‘#’.

*DO ALL WORK IN A FOLDER TITLED 'user_id.PS3' IN YOUR OWN DIRECTORY: /scratch/bio257_2023/Users/user_id/*
Hint: you will need to use the 'mkdir' command to create this directory.

Please paste your command lines for each answer below. To turn in your assignment, move your tarfile (see #4) to /scratch/bio257_2023/Assignment_dump/PS3.

Hint: refer to the Lab handout for example code that you can modify!

Total points: 9

1) The file /scratch/bio257_2023/Module6-Lab/gene_lengths.txt has the format:

UniqueIdentifier=gene.name=CommonGeneName GeneLength

Use an awk command to create a new file named ‘only.lengths.txt’ that contains each CommonGeneName (i.e. what follows gene.name=) and its GeneLength. For example, “FBgn0000210.type=gene.name=br 89738” should be printed as “br 89738”.

awk -F'[.]name=' '{print $2}' /scratch/bio257_2023/Module6-Lab/gene_lengths.txt > only.lengths.txt

2) The fasta headers in /scratch/bio257_2023/Example_data/reference.genome/dmel-all-chromosome-r6.13.NoN.fasta have ";" separating fields with information. Use awk or python to print out only the "loc" field, e.g. loc=2R:1..25286936. Redirect your output to a file. (Hint: modify the example script parse_fasta_header.py.)

awk -F'; *' '/^>/ {for (i = 1; i <= NF; i++) if ($i ~ /^loc=/) print $i}' /scratch/bio257_2023/Example_data/reference.genome/dmel-all-chromosome-r6.13.NoN.fasta > only.loc.txt

3) The following genome file is a fasta file that contains headers with sequences on a single line: /scratch/bio257_2023/Example_data/reference.genome/dmel-all-chromosome-r6.13.NoN.fasta. Use grep to create a new fasta file that only has the major chromosome arms: 2L, 2R, 3L, 3R, 4, X and Y. NOTE: a proper fasta file contains both headers AND sequence, e.g.

>header
AGTGTCGTCGT.....

4) Make a tarfile of your directory that includes your user_id and *move* (not copy) this tarfile to the following directory: /scratch/bio257_2023/Assignment_dump/PS3.

Hint: you can use the command ‘tar -tvf mytarfile.tar’ to list the contents of your tar file after archiving your folder. Use this command to check that your tar file has all of its contents before you move it to the Assignment_dump/PS3 directory.

I have worked with the following students on this assignment:
___________________________________________________________________________________
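
Question 3 asks for grep, so the following Python sketch only illustrates the filtering logic: keep a header whose sequence name is one of the major arms, together with the sequence that follows it. The function name `filter_major_arms` is hypothetical, and the sketch assumes the sequence name is the first whitespace-delimited token after ">", which the problem set does not state.

```python
MAJOR_ARMS = {"2L", "2R", "3L", "3R", "4", "X", "Y"}


def filter_major_arms(lines, keep=MAJOR_ARMS):
    """Yield header and sequence lines for records whose name is in keep."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            # Assumed layout: '>NAME rest-of-header'; compare the whole name so
            # that, for example, '2Lextra' does not count as '2L'.
            tokens = line[1:].split()
            keeping = bool(tokens) and tokens[0] in keep
        if keeping:
            yield line


if __name__ == "__main__":
    # Made-up records, not real file contents.
    example = [
        ">2L type=a; loc=2L:1..4;\n", "ACGT\n",
        ">scaffold_1 type=a;\n", "TTTT\n",
        ">X type=a;\n", "GGCC\n",
    ]
    for kept in filter_major_arms(example):
        print(kept, end="")
```

Matching the whole name rather than a prefix is the main design point, and the same care applies to a grep pattern: an unanchored pattern such as ">2L" would also match any header that merely begins with those characters.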