Question 1. Let's clean up our data. First, filter out the HOM, HYP, and NEP rows from the table for the reasons described in the above paragraph. Next, join together the abbreviations table and our causes of death table so that we have a more detailed description of each disease in each row. Lastly, drop the column which contains the acronym of the disease, and rename the column with the full description 'Cause of Death'. Assign the variable cleaned_causes to the resulting table.

Question
How can I solve this with Python and NumPy? The tables are given. Only the instructions in the exercise need to be carried out.
In [147]: causes.drop('HOM', 'HYP', 'NEP')
          abbreviations.drop('HOM', 'HYP', 'NEP')
          cleaned_causes = (causes, 'Cause of Death', abbreviations)
          abbreviations.relabeled('Cause of Death (Full Description)', 'Cause of Death')
          cleaned_causes=causes.drop(Cause of Death (Full Description)
          cleaned_causes
  File "<ipython-input-147-df2a08c571a0>", line 6
    cleaned_causes=causes.drop(Cause of Death (Full Description)
SyntaxError: invalid syntax
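The SyntaxError comes from the last drop call: the column label is not quoted and the closing parenthesis is missing. Beyond that, the attempt has two further issues as far as the `datascience` Table API goes: `drop` removes columns rather than rows, and Table methods return new tables, so results that are not assigned are lost. In that API the three steps would presumably map to `where`, `join`, and then `drop` plus `relabeled`. Since the question asks about NumPy, here is a minimal sketch of the same three steps on plain arrays. The toy rows and the dictionary form of the abbreviations table are assumptions made for illustration, not the notebook's real data.

```python
import numpy as np

# Toy stand-ins for the columns of the causes table (illustrative values only).
years = np.array([1999, 1999, 1999, 2000, 2000])
codes = np.array(["SUI", "HOM", "ALZ", "ALZ", "NEP"])
counts = np.array([1, 1, 1, 2, 3])

# A subset of the abbreviations table, held as a code -> description mapping.
abbreviations = {
    "ALZ": "Alzheimer's Disease",
    "HOM": "Homicide",
    "NEP": "Kidney Disease (Nephritis)",
    "SUI": "Intentional Self Harm (Suicide)",
}

# Step 1: keep only the rows whose code is not HOM, HYP, or NEP.
keep = ~np.isin(codes, ["HOM", "HYP", "NEP"])

# Steps 2 and 3: look up the full description for each remaining code.
# The acronym column is not carried over, and the description column
# takes the name 'Cause of Death'.
cleaned_causes = {
    "Year": years[keep],
    "Cause of Death": np.array([abbreviations[c] for c in codes[keep]]),
    "Count": counts[keep],
}
```

Filtering before the lookup means the excluded codes never need a description, which mirrors the order of the steps in the exercise.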
We're going to examine the changes in causes of death over time. To make a plot of those numbers, we need to have a table with one row per year, and the
information about all the causes of death for each year.
Question 2. Create a table with one row for each year and a column for each kind of death, where each cell contains the number of deaths by that cause in
that year. Call the table cleaned_causes_by_year.
In [ ]: cleaned_causes_by_year = ...
        cleaned_causes_by_year.show()
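What Question 2 describes is a pivot: years become rows, causes become columns, and the counts that share a (year, cause) pair are summed. In the `datascience` API this is presumably a job for `pivot` with `sum` as the collecting function. The sketch below shows the same reshaping in plain NumPy; the arrays are made-up stand-ins for the cleaned table's columns.

```python
import numpy as np

# Toy stand-ins for the cleaned table's columns (illustrative values only).
years = np.array([1999, 1999, 1999, 2000, 2000])
causes = np.array(["Stroke", "Diabetes", "Stroke", "Stroke", "Diabetes"])
counts = np.array([1, 2, 3, 4, 5])

# Map every year and every cause to an integer position.
year_labels, year_idx = np.unique(years, return_inverse=True)
cause_labels, cause_idx = np.unique(causes, return_inverse=True)

# One row per year, one column per cause, starting from zero deaths.
by_year = np.zeros((len(year_labels), len(cause_labels)), dtype=int)

# Accumulate counts; np.add.at handles repeated (year, cause) pairs correctly.
np.add.at(by_year, (year_idx, cause_idx), counts)
```

Here `year_labels` gives the row order and `cause_labels` the column order of `by_year`. A plain `by_year[year_idx, cause_idx] += counts` would silently drop repeated pairs, which is why `np.add.at` is used.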
Question 3. Make a plot of all the causes of death by year, using your cleaned-up version of the dataset. There should be a single plot with one line per cause
of death.
Hint: Use the Table method plot. If you pass only a single argument, a line will be made for each of the other columns.
In [ ]:...
After seeing the plot above, we would now like to examine the distributions of diseases over the years using percentages. Below, we have assigned
distributions to a table with all of the same columns, but the raw counts in the cells are replaced by the percentage of the total number of deaths by a
particular disease that happened in that specific year.
Try to understand the code below.
In [8]: def percents(array_x):
            return np.round((array_x / sum(array_x)) * 100, 2)

        labels = cleaned_causes_by_year.labels
        distributions = Table().with_columns(labels[0], cleaned_causes_by_year.column(0),
                                             labels[1], percents(cleaned_causes_by_year.column(1)),
                                             labels[2], percents(cleaned_causes_by_year.column(2)),
                                             labels[3], percents(cleaned_causes_by_year.column(3)),
                                             labels[4], percents(cleaned_causes_by_year.column(4)),
                                             labels[5], percents(cleaned_causes_by_year.column(5)),
                                             labels[6], percents(cleaned_causes_by_year.column(6)),
                                             labels[7], percents(cleaned_causes_by_year.column(7)),
                                             labels[8], percents(cleaned_causes_by_year.column(8)),
                                             labels[9], percents(cleaned_causes_by_year.column(9)),
                                             labels[10], percents(cleaned_causes_by_year.column(10)),
                                             labels[11], percents(cleaned_causes_by_year.column(11)))
        distributions.show()
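To see what the cell above computes, it may help to run `percents` on a single small column. Each entry is divided by the total of its own column, so the percentages describe how one cause's deaths are spread across the years and add up to roughly 100 down the column. The input values below are invented for illustration.

```python
import numpy as np

def percents(array_x):
    # Share of the column total held by each entry, as a percentage
    # rounded to two decimal places.
    return np.round((array_x / sum(array_x)) * 100, 2)

# Hypothetical deaths from one cause in three consecutive years.
one_cause_by_year = np.array([10, 30, 60])
shares = percents(one_cause_by_year)

# A column whose shares do not divide evenly shows the rounding.
uneven = percents(np.array([1, 2]))
```

Because each column is rounded independently, a column of percentages can sum to slightly more or less than 100.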
Out[28]:
Year  ZIP Code  Cause of Death  Count  Location
1999  90002     SUI             1      (33.94969, -118.246213)
1999  90005     HOM             1      (34.058508, -118.301197)
1999  90006     ALZ             1      (34.049323, -118.291687)
1999  90007     ALZ             1      (34.029442, -118.287095)
1999  90009     DIA             1      (33.9452, -118.3832)
1999  90009     LIV             1      (33.9452, -118.3832)
1999  90009     OTH             1      (33.9452, -118.3832)
1999  90010     STK             1      (34.060633, -118.302664)
1999  90010     CLD             1      (34.060633, -118.302664)
1999  90010     DIA             1      (34.060633, -118.302664)
... (320142 rows omitted)
The causes of death in the data are abbreviated. We've provided a table called abbreviations.csv to translate the
In [29]: abbreviations = Table.read_table('abbreviations.csv')
abbreviations.show()
Cause of Death  Cause of Death (Full Description)
AID             Acquired Immune Deficiency Syndrome (AIDS)
ALZ             Alzheimer's Disease
CAN             Malignant Neoplasms (Cancers)
CLD             Chronic Lower Respiratory Disease (CLRD)
CPD             Chronic Obstructive Pulmonary Disease (COPD)
DIA             Diabetes Mellitus
HIV             Human Immunodeficiency Virus Disease (HIVD)
HOM             Homicide
HTD             Diseases of the Heart
HYP             Essential Hypertension and Hypertensive Renal Disease
INJ             Unintentional Injuries
LIV             Chronic Liver Disease and Cirrhosis
NEP             Kidney Disease (Nephritis)
OTH             All Other Causes
PNF             Pneumonia and Influenza
STK             Cerebrovascular Disease (Stroke)
SUI             Intentional Self Harm (Suicide)