Lab 5 - Health Risks - Instructions-3
School
University of Illinois, Urbana Champaign
Course
MISC
Subject
Statistics
Date
Apr 3, 2024
Lab 5 – Comparing Health Risks
NAME 1 – NETID
NAME 2 – NETID [if applicable]
NAME 3 – NETID [if applicable]
Formatting Instructions
- Please submit your lab report as a PDF to Gradescope.
- When you upload to Gradescope, please match pages with the question number.
- Be sure that all group members are added in your submission to Gradescope (after matching pages, click view/edit group at the top right of the page that shows your final submission).
Assignment Overview
- We’ll be investigating the heart dataset, which collected data on the health factors of 303 patients being screened for heart disease. We’ll use this data to address the following three research questions (one on each page):
  o Do people with fasting blood sugar levels above 120 mg/dL have a higher risk for heart disease?
  o Do people who have experienced exercise induced angina have a higher risk for heart disease?
  o Do people who experience exercise induced angina have different cholesterol levels on average?
STEP 0
- Pre-lab work
  o Complete the pre-lab tutorial (Comparing Groups) for Lab 5 first: https://stat212-learnr.stat.illinois.edu/
- Download the heart.csv file to your computer and then import it into your RStudio session.
- Create a new R script (or use the RMarkdown file if you are using that option).
- Remember to run library(tidyverse) so that you can use the ggplot function!
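The setup steps above might look something like the sketch below. It assumes heart.csv sits in the current working directory and uses `heart` as the data frame name; both are illustrative choices, not requirements stated in the lab.

```r
library(tidyverse)

# Assumed location: heart.csv saved in the current working directory.
heart_path <- "heart.csv"

if (file.exists(heart_path)) {
  # read_csv() comes from readr, which is loaded with the tidyverse
  heart <- read_csv(heart_path)
  glimpse(heart)  # quick look at column names and types
} else {
  message("heart.csv not found in ", getwd())
}
```

If the file is not found, checking the working directory with getwd() (or using RStudio's Import Dataset button) is usually the quickest fix.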
Variables
Each row of this dataset represents one patient being screened, and the following variables were documented for each patient:
- age: age in years
- sex: biological sex (0 if female, 1 if male)
- cp: chest pain type (0 if typical angina, 1 if atypical angina, 2 if non-anginal pain, 3 if asymptomatic)
- exang: binary variable documenting whether the patient experienced exercise induced angina
- trestbps: resting systolic blood pressure (in mm Hg on admission to hospital)
- chol: serum cholesterol (mg/dL)
- fbs: binary variable documenting whether fasting blood sugar was high (“yes” if > 120 mg/dL and “no” if <= 120 mg/dL)
- restecg: resting electrocardiographic results (0 if normal, 1 if having ST-T wave abnormality, 2 if showing probable or definite left ventricular hypertrophy)
- thalach: maximum heart rate achieved
- oldpeak: ST depression induced by exercise relative to rest
- slope: the slope of the peak exercise ST segment
- ca: number of major vessels (0-3) colored by fluoroscopy
- target: whether the patient was found to have angiographic disease status (heart disease) as determined by amount of blood vessel narrowing (“positive” if heart disease diagnosis, “negative” if no heart disease diagnosis)
Research Question 1: Do people who are diabetic (fasting blood sugar levels above 120 mg/dL) have a higher risk for heart disease?
Question 1 (5pts): Let’s first investigate visually. Create a 100% stacked barplot to compare the proportion of patients with heart disease based on whether their fasting blood sugar level was above 120 mg/dL. Include an image of your barplot in the report and include your R code.
- One bar should represent those who are diabetic, and the other should represent those who are not. Each bar should be shaded to reflect what proportion in each group have heart disease.
- Give the bars a black border, and adjust the width to be between 0.2 and 0.5.
- Add an appropriate x axis label, y axis label, and title.
Question 2 (5pts): Now, let’s use a test for two proportions to make a statistical inference. Using a pipe, create a frequency table to get counts of how many people have or don’t have heart disease based on whether they are diabetic or not. Copy or screenshot the frequency table into your report and include your R code.
- If done correctly, this table will have 4 rows.
- You can screenshot the table exactly as it appears in R output, or you can re-format it in your document if you wish to.
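One way to build a piped frequency table like the one described is with dplyr's count(). The sketch below runs on a small made-up data frame (`toy_heart` is hypothetical and is not the real heart data) so that it is self-contained; in the lab, the imported dataset would take its place. The course tutorial may use a different function, so treat this as one option among several.

```r
library(tidyverse)

# Toy stand-in with made-up values, NOT the real heart dataset.
toy_heart <- tibble(
  fbs    = c("yes", "yes", "yes", "no", "no", "no", "no", "no"),
  target = c("positive", "negative", "positive", "positive",
             "negative", "negative", "positive", "negative")
)

# One row per combination of the two variables, with its count in column n
freq_table <- toy_heart %>%
  count(fbs, target)

freq_table
```

With two binary variables, the result has one row for each of the four fbs/target combinations, which matches the "4 rows" check in the instructions.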
Run a proportions test to answer research question 1 and include your R code.
- Tip: Is this a directional or non-directional test? Read the research question again!
- Remember that you need to enter two vectors into your code: the first vector includes the numbers in each group who have heart disease, and the second vector includes the totals for each group.
- Copy+paste or screenshot the summary output from your proportions test.
In your own words, interpret the results and make a conclusion in context. A full response should:
- Identify the proportion with heart disease in each group
- Identify the p-value
- Briefly summarize your answer to our first research question using these results.
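The two-vector setup described above can be sketched with base R's prop.test(). The counts below are hypothetical placeholders, not values from heart.csv, and the vector names are illustrative. The `alternative` argument accepts "two.sided", "less", or "greater"; which one fits depends on the wording of the research question and on the order in which the groups are entered.

```r
# Hypothetical counts for illustration only (not taken from heart.csv).
# Order matters: group 1 is listed first in both vectors.
disease_counts <- c(30, 40)    # number with heart disease in group 1, group 2
group_totals   <- c(50, 100)   # total number of patients in group 1, group 2

# "greater" tests whether the group 1 proportion exceeds the group 2 proportion
result <- prop.test(x = disease_counts, n = group_totals,
                    alternative = "greater")

result           # full summary output
result$estimate  # sample proportion in each group
result$p.value   # p-value for the chosen alternative
```

The printed output lists the sample proportions as "prop 1" and "prop 2", in the same order as the input vectors.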
Research Question 2: Do people who have experienced exercise induced angina have a higher risk for heart disease?
Question 3 (5pts): Repeat the procedures for Question 1, but with this new predictor variable. Include an image of your 100% stacked barplot in the report and include your R code.
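A 100% stacked barplot with the formatting requested in Question 1 (black border, bar width between 0.2 and 0.5, axis labels, and a title) could be sketched as follows. The data frame `toy_heart` is made up for illustration, the label text is only a suggestion, and the sketch assumes exang is stored as text labels; if it is stored as 0/1 in your file, wrapping it in factor() would be needed.

```r
library(tidyverse)

# Toy stand-in with made-up values, NOT the real heart dataset.
toy_heart <- tibble(
  exang  = c("yes", "yes", "yes", "no", "no", "no", "no", "no"),
  target = c("positive", "positive", "negative", "positive",
             "negative", "negative", "positive", "negative")
)

bar_plot <- ggplot(toy_heart, aes(x = exang, fill = target)) +
  # position = "fill" rescales every bar to 100% so shading shows proportions
  geom_bar(position = "fill", color = "black", width = 0.4) +
  labs(x = "Exercise induced angina",
       y = "Proportion of patients",
       fill = "Heart disease",
       title = "Heart disease status by exercise induced angina")

bar_plot
```

Swapping the x variable (for example to fbs) gives the corresponding plot for the first research question.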
Question 4 (5pts): Follow the same procedures as in Question 2 to address our second research question statistically.
Copy or screenshot the frequency table into your report and include your R code.
Run a proportions test designed to answer your second research question and include your R code.
In your own words, interpret the results and make a conclusion in context (same as Question 2).
Question 5 (5pts): Let’s now estimate the relative risks for heart disease for the two situations we explored. Rather than code this computationally in R, we will use an online calculator! See the calculator link below.
Report the relative risk (and 95% confidence interval) for heart disease when the patient is diabetic as compared to when they are not diabetic.
Report the relative risk (and 95% confidence interval) for heart disease when the patient had experienced exercise induced angina as compared to if they hadn’t.
Use any online calculator you’d like, but here is the calculator we used in class: https://istats.shinyapps.io/Association_Categorical/ . You can also web search “Art of Stat Web Apps” and choose “Association Between Two Categorical Variables”.
- Choose contingency table setup in the first drop-down.
- Have your rows represent your explanatory variable (diabetes status or angina status) and your columns represent your response variable (heart disease status). Labels are optional, but might be helpful to keep track of what to plug in!
- Be sure to choose ratio of proportions from the drop-down and check the “95% confidence interval” option below.
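The lab asks for the online calculator, but the arithmetic behind a relative risk is short enough to check by hand. The sketch below uses hypothetical counts (not values from heart.csv) and the common log-based normal approximation for the 95% interval; the calculator may use a slightly different interval method, so small differences in the bounds would not be surprising.

```r
# Hypothetical counts for illustration only (not taken from heart.csv).
exposed_cases   <- 30;  exposed_total   <- 50    # e.g. group with the risk factor
unexposed_cases <- 40;  unexposed_total <- 100   # e.g. group without it

risk_exposed   <- exposed_cases / exposed_total
risk_unexposed <- unexposed_cases / unexposed_total

# Relative risk = ratio of the two proportions
relative_risk <- risk_exposed / risk_unexposed

# Standard error of log(RR), then a 95% interval back-transformed with exp()
se_log_rr <- sqrt(1 / exposed_cases - 1 / exposed_total +
                  1 / unexposed_cases - 1 / unexposed_total)
ci_95 <- exp(log(relative_risk) + c(-1, 1) * qnorm(0.975) * se_log_rr)

relative_risk
ci_95
```

An interval that lies entirely above 1 suggests a higher risk in the first group; an interval that contains 1 is consistent with no difference in risk.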
Let’s next consider cholesterol level as our response variable. Notice that cholesterol is a numeric variable, so our approach to this question will be slightly different.
Research Question 3: Do people who experience exercise induced angina have different cholesterol levels on average? Let’s say the researchers believe either a drop or an increase in cholesterol is possible and noteworthy to report!
Question 6 (5pts): Create a jittered plot to compare cholesterol levels between the angina and no angina groups. Include an image of your jittered plot in the report and include your R code.
- Keep the width of your jitter small (like between 0.02 and 0.10).
- Color each group of points differently (one color for “No” and one color for “Yes”).
- Add an appropriate x axis label, y axis label, and title.
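A jittered plot with these settings might be sketched as below. The data frame `toy_heart` and its values are made up for illustration, and the label text is only a suggestion. Setting height = 0 is an extra choice not mentioned in the lab: it keeps the cholesterol values themselves from being shifted vertically.

```r
library(tidyverse)

# Toy stand-in with made-up values, NOT the real heart dataset.
toy_heart <- tibble(
  exang = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  chol  = c(210, 245, 198, 260, 230, 275, 250, 220)
)

jitter_plot <- ggplot(toy_heart, aes(x = exang, y = chol, color = exang)) +
  # small horizontal jitter only, so y values stay at their true positions
  geom_jitter(width = 0.05, height = 0) +
  labs(x = "Exercise induced angina",
       y = "Serum cholesterol (mg/dL)",
       color = "Angina",
       title = "Cholesterol levels by exercise induced angina")

jitter_plot
```

Mapping color to the grouping variable inside aes() gives each group its own color automatically.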
Question 7 (5pts): Complete a t-test to address the research question posed. Even though we have enough observations that a z-test would be reasonable, it’s easier in R to just run a t-test, and the results will be approximately the same! We will not assume equal variances (software handles this situation easily, and this is the “safer” testing option).
Copy or screenshot the summary output from your t-test.
In your own words, interpret the results and make a conclusion in context. A full response should:
- Identify the average cholesterol level for each group
- Identify the p-value
- Briefly summarize how this result helps you address research question 3.
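A two-sample t-test without the equal-variance assumption (Welch's t-test) can be run with base R's t.test(). The sketch below uses a made-up data frame (`toy_heart` is hypothetical, not the real heart data); var.equal = FALSE and alternative = "two.sided" are already the defaults and are written out only to make the choices visible.

```r
# Toy stand-in with made-up values, NOT the real heart dataset.
toy_heart <- data.frame(
  exang = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  chol  = c(210, 245, 198, 260, 230, 275, 250, 220)
)

# Formula interface: numeric response ~ two-level grouping variable
welch <- t.test(chol ~ exang, data = toy_heart,
                alternative = "two.sided", var.equal = FALSE)

welch           # full summary output
welch$estimate  # mean cholesterol in each group
welch$p.value   # two-sided p-value
```

The summary output reports the group means at the bottom, which covers the "average cholesterol level for each group" part of the write-up.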