Lab 5 - Health Risks - Instructions-3
School: University of Illinois, Urbana Champaign
Course: MISC
Subject: Statistics
Date: Apr 3, 2024
Lab 5 – Comparing Health Risks
NAME 1 – NETID
NAME 2 – NETID [if applicable]
NAME 3 – NETID [if applicable]
Formatting Instructions
- Please submit your lab report as a PDF to Gradescope.
- When you upload to Gradescope, please match pages with the question number.
- Be sure that all group members are added in your submission to Gradescope (click view/edit group on the top right of the page once shown your final submission after matching pages).
Assignment Overview
- We’ll be investigating the heart dataset, which collected data on the health factors of 303 patients being screened for heart disease. We’ll use this data to address the following three research questions (one on each page):
  o Do people with fasting blood sugar levels above 120 mg/dL have a higher risk for heart disease?
  o Do people who have experienced an exercise-induced angina have a higher risk for heart disease?
  o Do people who experience exercise-induced anginas have different cholesterol levels on average?
STEP 0
- Pre-lab work
  o Complete the pre-lab tutorial (Comparing Groups) for Lab 5 first: https://stat212-learnr.stat.illinois.edu/
- Download the heart.csv file to your computer and then import it into your RStudio session.
- Create a new R script (or use the RMarkdown file if you are using that option).
- Remember to run library(tidyverse) so that you can use the ggplot function!
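The import step in STEP 0 might look roughly like the sketch below. Because heart.csv is not reproduced in this handout, the code writes a tiny made-up CSV to a temporary file and reads that instead; the rows, the object names `toy_path` and `heart_toy`, and the "Yes"/"No" coding of exang are assumptions for illustration only.

```r
# Hypothetical stand-in for heart.csv: a tiny CSV written to a temporary file.
# These rows are invented for illustration and are NOT the real data.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "age,chol,fbs,exang,target",
  "63,233,yes,No,positive",
  "41,204,no,No,positive",
  "57,354,no,Yes,negative",
  "56,294,no,No,negative",
  "62,268,yes,Yes,negative",
  "44,263,no,No,positive"
), toy_path)

# Base R import; with the real file this would be read.csv("heart.csv"),
# assuming heart.csv sits in the current working directory.
heart_toy <- read.csv(toy_path)

# Quick checks that the import worked: column types and the first few rows.
str(heart_toy)
head(heart_toy)
```

With the tidyverse loaded, read_csv() could be used in place of read.csv(); either way, checking the structure right after import helps catch a wrong file path or a mis-typed column early.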
Variables
Each row of this dataset represents one patient being screened, and the following variables were documented for each patient:
- age: age in years
- sex: biological sex (0 if female, 1 if male)
- cp: chest pain type (0 if typical angina, 1 if atypical angina, 2 if non-anginal pain, 3 if asymptomatic)
- exang: binary variable documenting whether the patient experienced exercise-induced angina
- trestbps: resting systolic blood pressure (in mm Hg on admission to hospital)
- chol: serum cholesterol (mg/dL)
- fbs: binary variable documenting whether fasting blood sugar was high (“yes” if > 120 mg/dL and “no” if <= 120 mg/dL)
- restecg: resting electrocardiographic results (0 if normal, 1 if having ST-T wave abnormality, 2 if showing probable or definite left ventricular hypertrophy)
- thalach: maximum heart rate achieved
- oldpeak: ST depression induced by exercise relative to rest
- slope: the slope of the peak exercise ST segment
- ca: number of major vessels (0-3) colored by fluoroscopy
- target: whether the patient was found to have angiographic disease status (heart disease) as determined by amount of blood vessel narrowing (“positive” if heart disease diagnosis, “negative” if no heart disease diagnosis)
Research Question 1: Do people who are diabetic (fasting blood sugar levels above 120 mg/dL) have a higher risk for heart disease?
Question 1 (5pts): Let’s first investigate visually. Create a 100% stacked barplot to compare the proportion of patients with heart disease based on whether their fasting blood sugar level was above 120 mg/dL. Include an image of your barplot in the report and include your R code.
- One bar should represent those who are diabetic, and the other should represent those who are not. The bar should be shaded to reflect what proportion in each group have heart disease.
- Give the bars a black border, and adjust the width to be between 0.2 and 0.5.
- Add an appropriate x axis label, y axis label, and title.
Question 2 (5pts): Now, let’s use a test for two proportions to make a statistical inference. Using a pipe, create a frequency table to get counts of how many people have or don’t have heart disease based on whether they are diabetic or not. Copy or screenshot the frequency table into your report and include your R code.
- If done correctly, this table will have 4 rows.
- You can screenshot the table exactly as it appears in R output, or you can re-format it in your document if you wish to.
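A piped frequency table of the kind described above might be sketched as follows. The data frame `toy` is invented for illustration (it is not the heart dataset), and the choice of dplyr's count() is an assumption about which tidyverse function the course intends.

```r
library(tidyverse)

# Hypothetical toy data (NOT the heart dataset): 12 made-up patients.
toy <- tibble(
  fbs    = c(rep("yes", 5), rep("no", 7)),
  target = c(rep("positive", 3), rep("negative", 2),
             rep("positive", 3), rep("negative", 4))
)

# Pipe the data into count() to get one row per fbs/target combination.
freq_table <- toy %>%
  count(fbs, target)

freq_table
```

Two binary variables give four combinations, which is why the resulting table has 4 rows; the `n` column holds the count for each combination.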
Run a proportions test to answer research question 1 and include your R code.
- Tip: Is this a directional or non-directional test? Read the research question again!
- Remember that you need to enter two vectors into your code: the first vector includes the numbers in each group who have heart disease, and the second vector includes the totals for each group.
- Copy+paste or screenshot the summary output from your proportions test.
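The two-vector setup described above can be illustrated with base R's prop.test(). The counts below are made up purely to show the call's shape and are not taken from heart.csv; the `alternative` argument accepts "two.sided", "greater", or "less", and the value shown here is only one possibility, so the right choice for the lab still depends on how the research question is worded.

```r
# Hypothetical counts (NOT from heart.csv), used only to show the call's shape.
disease_counts <- c(30, 40)    # number with heart disease in group 1, group 2
group_totals   <- c(50, 100)   # total number of patients in group 1, group 2

# alternative = "greater" tests whether group 1's proportion exceeds group 2's.
# Swap in "two.sided" or "less" if the research question calls for it.
prop_result <- prop.test(x = disease_counts, n = group_totals,
                         alternative = "greater")

prop_result            # full summary output
prop_result$estimate   # sample proportion in each group
prop_result$p.value    # p-value for the chosen alternative
```

The order of the groups matters for a directional test: the first entry of each vector is treated as group 1, so both vectors need to list the groups in the same order.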
In your own words, interpret the results and make a conclusion in context. A full response should:
- Identify the proportion with heart disease in each group
- Identify the p-value
- Briefly summarize your answer to our first research question using these results.
Research Question 2: Do people who have experienced an exercise-induced angina have a higher risk for heart disease?
Question 3 (5pts): Repeat the procedures for Question 1, but with this new predictor variable. Include an image of your 100% stacked barplot in the report and include your R code.
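The 100% stacked barplot requested in Questions 1 and 3 might be sketched as below. The data frame `toy`, its "Yes"/"No" coding for exang, and the label text are all assumptions for illustration; the real plot would use the imported heart data and labels of your own choosing.

```r
library(tidyverse)

# Hypothetical toy data (NOT the heart dataset): 10 made-up patients.
toy <- tibble(
  exang  = c(rep("Yes", 4), rep("No", 6)),
  target = c("positive", "negative", "negative", "negative",
             "positive", "positive", "positive", "positive",
             "negative", "negative")
)

# position = "fill" rescales each bar to height 1, so the shading shows
# the proportion of each target value within each exang group.
bar_plot <- ggplot(toy, aes(x = exang, fill = target)) +
  geom_bar(position = "fill", color = "black", width = 0.4) +
  labs(
    x = "Exercise-induced angina",
    y = "Proportion of patients",
    title = "Heart disease status by angina group (toy data)",
    fill = "Diagnosis"
  )

bar_plot
```

Here `color` controls the bar border and `width` the bar width, which correspond to the black border and the 0.2 to 0.5 width range asked for in Question 1.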
Question 4 (5pts): Follow the same procedures in Question 2 to address our second research question statistically.
Copy or screenshot the frequency table into your report and include your R code.
Run a proportions test designed to answer your second research question and include your R code.
In your own words, interpret the results and make a conclusion in context (same as Question 2).
Question 5 (5pts): Let’s now estimate the relative risks for heart disease for the two situations we explored. Rather than code this computationally in R, we will use an online calculator! See the calculator link below.
Report the relative risk (and 95% confidence interval) for heart disease when the patient is diabetic as compared to when they are not diabetic.
Report the relative risk (and 95% confidence interval) for heart disease when the patient had experienced an exercise-induced angina as compared to if they hadn’t.
Use any online calculator you’d like, but here is the calculator we used in class: https://istats.shinyapps.io/Association_Categorical/ . Web search “Art of Stat Web Apps” and choose “Association Between Two Categorical Variables”.
Choose contingency table setup in the first drop-down.
Have your rows represent your explanatory variables (diabetes status, or angina status), and columns represent your response variable (heart disease status). Labels are optional, but might be helpful to keep track of what to plug in!
Be sure to choose ratio of proportions from the drop-down and check the “95% confidence interval” option below.
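Although the lab uses an online calculator, it can help to see what a "ratio of proportions" is numerically. The sketch below computes a relative risk and a 95% interval using the common log-based (Wald-type) formula; the counts are made up, and the calculator may use a different interval method, so small differences from its output would not be surprising.

```r
# Hypothetical counts (NOT from heart.csv).
disease_exposed   <- 30;  total_exposed   <- 50    # e.g. the "exposed" row
disease_unexposed <- 40;  total_unexposed <- 100   # e.g. the comparison row

# Risk (proportion with disease) in each group, and their ratio.
risk_exposed   <- disease_exposed / total_exposed
risk_unexposed <- disease_unexposed / total_unexposed
relative_risk  <- risk_exposed / risk_unexposed

# Standard error of log(RR) under the usual large-sample approximation.
se_log_rr <- sqrt(1 / disease_exposed   - 1 / total_exposed +
                  1 / disease_unexposed - 1 / total_unexposed)

# 95% interval: build it on the log scale, then exponentiate back.
z_crit <- qnorm(0.975)
rr_ci  <- exp(log(relative_risk) + c(-1, 1) * z_crit * se_log_rr)

relative_risk
rr_ci
```

A relative risk above 1 means the first group has the higher estimated risk; if the interval excludes 1, the data are inconsistent with equal risks at roughly the 5% level.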
Let’s next consider cholesterol levels as our target response variable. Notice that cholesterol will be a numeric variable, so our approach to this question will be slightly different.
Research Question 3: Do people who experience exercise-induced anginas have different cholesterol levels on average? Let’s say the researchers believe either a drop or an increase in cholesterol is possible and noteworthy to report!
Question 6 (5pts): Create a jittered plot to compare cholesterol levels between the angina and no angina groups. Include an image of your jittered plot in the report and include your R code.
- Keep the width of your jitter small (like between 0.02 and 0.10)
- Color each group of points differently (one color for “No” and one color for “Yes”)
- Add an appropriate x axis label, y axis label, and title
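A jittered plot along the lines of Question 6 might be sketched as follows. The cholesterol values and group labels in `toy` are invented for illustration and are not from the heart dataset; the label text is likewise only a placeholder.

```r
library(tidyverse)

# Hypothetical toy data (NOT the heart dataset).
toy <- tibble(
  exang = rep(c("No", "Yes"), each = 5),
  chol  = c(210, 245, 230, 198, 260,
            275, 240, 255, 290, 225)
)

# width keeps the horizontal jitter small; height = 0 leaves the
# cholesterol values themselves unchanged.
jitter_plot <- ggplot(toy, aes(x = exang, y = chol, color = exang)) +
  geom_jitter(width = 0.05, height = 0) +
  labs(
    x = "Exercise-induced angina",
    y = "Serum cholesterol (mg/dL)",
    title = "Cholesterol by angina group (toy data)",
    color = "Angina"
  )

jitter_plot
```

Mapping `color` to the grouping variable gives each group its own color automatically; setting `height = 0` is a deliberate choice so that points are only spread sideways and their plotted cholesterol values stay exact.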
Question 7 (5pts): Complete a t-test to address the research question posed. Even though we have enough observations such that a z-test would be reasonable, it’s easier in R to just run a t-test, and the results will be approximately the same! We will not assume equal variances (software can handle this situation more easily, and this is the “safer” testing option).
Copy or screenshot the summary output from your t-test.
In your own words, interpret the results and make a conclusion in context. A full response should:
- Identify the average cholesterol level for each group
- Identify the p-value
- Briefly summarize how this result helps you address research question 3.
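The unequal-variance t-test described in Question 7 might be sketched with base R's t.test() as below. The data frame `toy` is made up for illustration (it is not the heart dataset), and the "No"/"Yes" group labels are an assumption about how exang is coded.

```r
# Hypothetical toy data (NOT the heart dataset).
toy <- data.frame(
  exang = rep(c("No", "Yes"), each = 5),
  chol  = c(210, 245, 230, 198, 260,
            275, 240, 255, 290, 225)
)

# var.equal = FALSE requests the Welch test (no equal-variance assumption);
# alternative = "two.sided" matches a "different on average" question.
t_result <- t.test(chol ~ exang, data = toy,
                   var.equal = FALSE, alternative = "two.sided")

t_result            # full summary output
t_result$estimate   # mean cholesterol in each group
t_result$p.value    # two-sided p-value
```

The formula `chol ~ exang` reads as "cholesterol split by angina group", and the group means reported in the output are the values a written interpretation would typically quote alongside the p-value.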