Group Project I Fall 2022(1) (2)

School

George Washington University

Course

6305

Subject

Computer Science

Date

Dec 6, 2023

DNSC 6305 Dr. Ali Obaidi

Group Project I
DNSC 6305 Section 80/81
Fall 2022

Write one Jupyter notebook with your solutions to all of the following problems. Document each step of your process in a reproducible manner, including downloads and other file changes. When you are finished, your instructors and anyone else should be able to run your notebook from start to finish without error.

Add text notes on your process as appropriate, documenting any assumptions and explaining key decisions you make along the way. Use markdown cells for this text, formatting your notes so they are easy to read. Be sure to answer each question directly and precisely, using the data to justify your answers and showing all of your work along the way.

This is a group project. Please work in your assigned groups. Only one group member, the lead, should submit on behalf of all group members. As always, you are welcome to seek and give assistance to others who might become stuck along the way. Please acknowledge any assistance you receive. At the same time, each group must perform and submit its own work, in accordance with the GWU Code of Academic Integrity.

This assignment is due on Monday, Oct 31, at 4 pm.

In addition to your Jupyter notebook, we would like you to create a PDF file of your main Jupyter notebook showing all comments and output. The group lead also needs to fill in the participation detail table provided below. Three files are therefore required:

1. Project_I_Group_#.ipynb (Jupyter notebook)
2. Project_I_Group_#.pdf (PDF version of the Jupyter notebook)
3. Participation_detail_Group_#.pdf (see the participation requirements at the end of this file)

There is no need to attach publicly available data files to your final answer.
This project is a continuation of Assignment II.

Problem 1 – Finalizing your Schema and Physical Model (10 points)

Discuss among your group the ER model and schema designs you created in Assignment II and come up, as a group, with a final schema diagram that all members agree on. Make sure that you document (in your Jupyter notebook) any disagreements among the group and how you resolved them. If a disagreement can’t be resolved, the group lead should seek a meeting with me and all group members.

Problem 2 – Creating your Database Objects (20 points)

Using the agreed schema diagram from Problem 1, create the physical tables for all entities and their relationships. Make sure the table structures contain all primary, foreign, and unique keys (if applicable) as well as any default and/or check constraints. The tables must also indicate the not null constraint for all applicable attributes. Provide comments on your tables and their attributes. All entities from Assignment II are required.

Problem 3 – Constructing your Database – Bulk Data Loading (20 points)

Once you have created the tables and their constraints, construct (i.e., populate) the tables in bulk using the csv data files provided to you in Assignment II.

Problem 4 – Checking your Data (10 points)

For all tables, find the total number of rows loaded. Check your answer (using select queries) against the original text files (using Linux or csvkit commands).

Problem 5 – Sponsor Request to Add (15 points)

Your sponsor asked you to include extra information in your data model and database. The information is related to property and its loss type and description. The sponsor provided four files for this purpose: NIBRS_PROPERTY.csv, NIBRS_PROP_LOSS_TYPE.csv, NIBRS_PROP_DESC_TYPE.csv, and NIBRS_PROPERTY_DESC.csv. Study the four files and provide your assessment of them and your recommendation on how to proceed with this request.
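As an illustration of the flow in Problems 2 through 4 (create constrained tables, load CSV rows in bulk, then compare row counts), here is a minimal sketch. Everything in it is an assumption made for illustration: it uses Python's built-in sqlite3 module rather than whatever database the course uses, and the table names, column names, constraint values, and sample rows are hypothetical stand-ins for your group's agreed schema and the Assignment II files.

```python
import csv
import io
import sqlite3

# Hypothetical two-table slice of a schema. The real tables, columns, and
# constraint values come from your group's agreed diagram, not this sketch.
DDL = """
CREATE TABLE offense_type (
    offense_code  TEXT PRIMARY KEY,
    offense_name  TEXT NOT NULL UNIQUE,
    crime_against TEXT NOT NULL
        CHECK (crime_against IN ('Person', 'Property', 'Society'))
);
CREATE TABLE offense (
    offense_id   INTEGER PRIMARY KEY,
    incident_id  INTEGER NOT NULL,
    offense_code TEXT NOT NULL REFERENCES offense_type (offense_code),
    attempt_flag TEXT NOT NULL DEFAULT 'C' CHECK (attempt_flag IN ('A', 'C'))
);
"""

# Made-up stand-ins for the CSV files; in a notebook these would be open() handles.
OFFENSE_TYPE_CSV = (
    "offense_code,offense_name,crime_against\n"
    "13A,Aggravated Assault,Person\n"
    "220,Burglary,Property\n"
)
OFFENSE_CSV = "offense_id,incident_id,offense_code\n1,100,13A\n2,100,220\n3,101,220\n"


def bulk_load(conn, table, columns, handle):
    """Insert every CSV row with one executemany call; return the CSV row count."""
    rows = [tuple(rec[c] for c in columns) for rec in csv.DictReader(handle)]
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    with conn:  # one transaction per file: all rows load or none do
        conn.executemany(sql, rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(DDL)

# Parent table first, so the foreign key on offense can be satisfied.
csv_counts = {
    "offense_type": bulk_load(
        conn, "offense_type",
        ["offense_code", "offense_name", "crime_against"],
        io.StringIO(OFFENSE_TYPE_CSV),
    ),
    "offense": bulk_load(
        conn, "offense",
        ["offense_id", "incident_id", "offense_code"],
        io.StringIO(OFFENSE_CSV),
    ),
}

# Problem 4 style check: rows in each table versus rows read from each file.
db_counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in csv_counts
}
for table in csv_counts:
    print(table, "csv rows:", csv_counts[table], "table rows:", db_counts[table])
```

Loading parent tables before child tables matters once foreign keys are enforced. When you cross-check from the command line, keep in mind that `wc -l` counts the header line as well, so its result is one higher than the number of data rows.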
Problem 6 – Basic Data Analysis (25 points)

Write a query to determine the top 5 offenses and bottom 5 offenses in terms of the total number of offenses committed. In the query, provide the offense code, offense name, offense category, and the number of offenses.
Write a query to find the distribution of victims by sex, race, and offense location. Only include victims of type “Individual”. In the query, provide the sex, race, location, and count of offenses.

Write a query to provide a list of victims who suffered at least 4 offenses in an incident. For each victim, provide the victim id, sex, race, ethnicity, and location.

Write a query to provide a list of incidents with more than 9 victims. Provide a distribution of the victims (counts) based on their sex, race, and ethnicity.
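Two query shapes recur in Problem 6: a ranked count (top and bottom N) and a threshold on a grouped count ("at least 4 offenses", "more than 9 victims"). The sketch below shows one possible form of each. It assumes a hypothetical three-table schema with made-up sample rows, runs on Python's built-in sqlite3 module so it is self-contained, and uses small limits so the tiny sample produces output. Your own table and column names will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables; substitute the names from your group's schema.
conn.executescript("""
CREATE TABLE offense_type (
    offense_code     TEXT PRIMARY KEY,
    offense_name     TEXT NOT NULL,
    offense_category TEXT NOT NULL
);
CREATE TABLE offense (
    offense_id   INTEGER PRIMARY KEY,
    incident_id  INTEGER NOT NULL,
    offense_code TEXT NOT NULL REFERENCES offense_type (offense_code)
);
CREATE TABLE victim_offense (
    victim_id  INTEGER NOT NULL,
    offense_id INTEGER NOT NULL REFERENCES offense (offense_id),
    PRIMARY KEY (victim_id, offense_id)
);
""")

with conn:  # made-up sample rows, for illustration only
    conn.executemany("INSERT INTO offense_type VALUES (?, ?, ?)", [
        ("09A", "Murder", "Homicide Offenses"),
        ("13A", "Aggravated Assault", "Assault Offenses"),
        ("220", "Burglary", "Burglary Offenses"),
        ("23H", "All Other Larceny", "Larceny/Theft Offenses"),
    ])
    conn.executemany("INSERT INTO offense VALUES (?, ?, ?)", [
        (1, 100, "23H"), (2, 100, "23H"), (3, 100, "13A"),
        (4, 101, "23H"), (5, 101, "220"), (6, 102, "13A"),
    ])
    conn.executemany("INSERT INTO victim_offense VALUES (?, ?)", [
        (1, 1), (1, 2), (1, 3), (2, 4), (2, 5), (3, 6),
    ])


def ranked_offenses(conn, n, descending=True):
    """Return the n most (or least) frequent offense types with their counts."""
    direction = "DESC" if descending else "ASC"
    sql = f"""
        SELECT t.offense_code, t.offense_name, t.offense_category,
               COUNT(o.offense_id) AS n_offenses
        FROM offense_type AS t
        LEFT JOIN offense AS o ON o.offense_code = t.offense_code
        GROUP BY t.offense_code, t.offense_name, t.offense_category
        ORDER BY n_offenses {direction}, t.offense_code
        LIMIT ?
    """
    return conn.execute(sql, (n,)).fetchall()


def victims_with_min_offenses(conn, minimum):
    """Return (victim_id, incident_id, count) where count >= minimum."""
    sql = """
        SELECT vo.victim_id, o.incident_id, COUNT(*) AS n_offenses
        FROM victim_offense AS vo
        JOIN offense AS o ON o.offense_id = vo.offense_id
        GROUP BY vo.victim_id, o.incident_id
        HAVING COUNT(*) >= ?
        ORDER BY vo.victim_id
    """
    return conn.execute(sql, (minimum,)).fetchall()


top = ranked_offenses(conn, 2, descending=True)
bottom = ranked_offenses(conn, 2, descending=False)
repeat_victims = victims_with_min_offenses(conn, 3)
print(top)
print(bottom)
print(repeat_victims)
```

The LEFT JOIN is a deliberate choice in this sketch: it keeps offense types that never occur, so they can appear in the bottom list with a count of zero. If your group decides that only offenses with at least one occurrence should be ranked, an inner join gives that result instead; either way, state the assumption in your notebook. The secondary sort on offense code only makes ties deterministic.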
The lead for each group needs to fill in the following table.

Participation Details:

Submission Date: < Enter Submission Date and Time >
Group Lead for this assignment: < Enter first, last name of the group lead for this assignment >

All Participants: Fill in the following table (group lead responsibility).

| Student Name | Question No | Group Participation in discussions (min)/week | Final submission date | Percent Participation in discussions (0-100) | Percent Participation in final proofreading and editing |
|---|---|---|---|---|---|
| First, last | Q1 | Ideally no less than 60 min/week | MM/DD/YY | Ideally 100 if participated in all discussions | Ideally 100 if you have proofread the final paper and provided edits |
| | Q2 | | | | |
| | Q3 | | | | |
| | .. | | | | |
| | .. | | | | |

The group has used the following tools for discussions: < list tools here (e.g. blogs, wikis, email, phone) >