Problem Set 1-2
School: Indiana University, Purdue University, Indianapolis
Course: 551
Subject: Statistics
Date: Feb 20, 2024
Problem Set 1
PBHL-P551
Problem set policies. Please provide concise, clear answers for each question. Note that only
writing the result of a calculation (e.g., "SD = 3.3") without explanation is not sufficient.
For problems involving R, include the code in your solution, along with any plots.
Please submit your problem set via Canvas as a PDF, along with the R Markdown source file.
We encourage you to discuss problems with other students (and, of course, with the instructor
and the TAs), but you must write your final answer in your own words.
Solutions prepared "in committee" are not acceptable.
If you do collaborate with classmates on a problem, please list
your collaborators on your solution.
Problem 1.
Since states with larger numbers of elderly residents would naturally have more
nursing home residents, the number of nursing home residents in a state is often
adjusted for the number of people 65 years or older (65+). That adjustment is
usually given as the number of nursing home residents age 65+ per 1,000
members of the population age 65+. For example, a hypothetical state with 200
nursing home residents age 65+ and 50,000 people age 65+ would have the same
adjusted number of residents as a state with 400 residents and a total age 65+
population of 100,000 – 4 residents per 1,000.
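The adjustment above is simple arithmetic. A minimal sketch in R for the two hypothetical states in the example follows; the function name adjusted_rate is an illustrative assumption and is not part of the course materials.

```r
# Hypothetical helper: nursing home residents age 65+ per 1,000 people age 65+.
adjusted_rate <- function(residents, population_65plus) {
  1000 * residents / population_65plus
}

# The two hypothetical states from the example in the text.
rate_a <- adjusted_rate(200, 50000)
rate_b <- adjusted_rate(400, 100000)
print(c(rate_a, rate_b))  # both are 4 residents per 1,000
```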
The data file nursing.home.Rdata contains this adjusted number of residents for each
state in the United States. The state names are saved under the variable name
state and the adjusted number of residents under the variable name resident.[1]
a) Which state has the smallest number of nursing home residents per 1,000
population 65 years of age and over? Which state has the largest number?
Hint: use the R functions which.min() and which.max(). Alternatively, look
directly at the data in RStudio.
b) What factors might influence the substantial amount of variability among
different states? This question cannot be answered from the data; speculate
using what you know about the demographics of the United States.
c) Construct a boxplot for the number of nursing home residents per 1,000 population.
d) Is the distribution of nursing home residents per 1,000 population symmetric or
skewed? Are there any states that could be considered outliers?
e) Display the number of nursing home residents per 1,000 population using a
histogram. Do you find this graph to be more or less informative than the box
plot? Explain your answer.
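As a rough sketch of how which.min(), which.max(), boxplot(), and hist() behave, the following uses a small made-up data frame. The object name toy and all of its values are hypothetical stand-ins, not the contents of nursing.home.Rdata; only the column names state and resident come from the problem statement.

```r
# Made-up data for illustration only (not the real nursing home data).
toy <- data.frame(
  state    = c("A", "B", "C", "D", "E", "F"),
  resident = c(21.5, 35.0, 48.2, 40.1, 90.3, 37.7),
  stringsAsFactors = FALSE
)

# which.min() and which.max() return the position of the smallest and largest value.
lowest  <- toy$state[which.min(toy$resident)]
highest <- toy$state[which.max(toy$resident)]
print(c(lowest, highest))

# Boxplot and histogram of the same variable.
boxplot(toy$resident, ylab = "Residents per 1,000 population age 65+")
hist(toy$resident, xlab = "Residents per 1,000 population age 65+", main = "")

# Values that boxplot() draws as individual points
# (more than 1.5 times the box length away from the box).
flagged <- boxplot.stats(toy$resident)$out
print(flagged)
```

With the real file, a call to load("nursing.home.Rdata") would come first; the name of the object it creates is not stated in the problem, so it is left out here.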
[1] The data originally appeared in Chapter 12 of Case Studies in Biometry, 1994, by Lange et al.
Problem 2.
The file adolescent.fertility.Rdata contains data on the number of children born to women aged 15-19 from 189 countries around the world for the years 1997, 2000, 2002, 2005, and 2006.[2]
The data are defined using a scaling similar to that used in the nursing home data.
The values for the annual adolescent fertility rates represent the number of live
births among women aged 15-19 per 1,000 women members of the population of
that age.
For the years 2000-2006, the adolescent fertility rate for Iraq is coded NA, or
missing. When calculating a mean or standard deviation in R for a variable x which
has missing data, add the argument na.rm=TRUE to perform the calculations
without the missing observations: mean(x, na.rm=TRUE); sd(x, na.rm=TRUE).
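A minimal sketch of the effect of na.rm on a made-up vector (the values are hypothetical and chosen only to keep the arithmetic easy to follow):

```r
# Made-up vector with one missing value (illustration only).
x <- c(10, 20, NA, 40)

# Without na.rm, the missing value propagates and the result is NA.
print(mean(x))

# With na.rm = TRUE, the calculation uses the three observed values only.
m <- mean(x, na.rm = TRUE)
s <- sd(x, na.rm = TRUE)
print(c(mean = m, sd = s))
```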
a) Calculate the mean, standard deviation, and five-number summary for the
distribution of adolescent fertility in 2006 (fert_2006). Note that the summary()
command in R produces six numbers; specify which five belong in the
five-number summary as defined in lecture.
b) What is the 75th percentile of the distribution? Write a sentence explaining the
75th percentile in the context of these data.
c) Why might those observations for Iraq be missing between 2000 and 2006?
Would the five-number summary have been affected very much if the values
had been available?
d) Use a single boxplot command to produce side-by-side boxplots of the fertility
rates for each of the five years in the dataset. What pattern do you see?
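The commands mentioned in this problem can be sketched on a small made-up data frame. The values below are hypothetical, and the column name fert_1997 is assumed by analogy with fert_2006, which is the only column name given in the problem.

```r
# Made-up rates for illustration only (not the real fertility data).
toy <- data.frame(
  fert_1997 = c(12, 35, 60, 88, NA, 140),
  fert_2006 = c(10, 30, 52, 80, 95, 120)
)

# summary() prints six numbers for a numeric variable.
s2006 <- summary(toy$fert_2006)
print(s2006)

# 75th percentile; na.rm = TRUE matters for columns with missing values.
p75 <- quantile(toy$fert_2006, probs = 0.75, na.rm = TRUE)
print(p75)

# A single boxplot() call on several columns draws side-by-side boxplots.
boxplot(toy[, c("fert_1997", "fert_2006")],
        ylab = "Births per 1,000 women aged 15-19")
```

Note that quantile() supports several percentile definitions through its type argument; the default may differ slightly from the definition used in lecture, so it is worth checking which one is expected.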
Problem 3.
A recently published analysis examined 10 studies that measured optimism and
pessimism by asking participants about their level of agreement with statements
like “In uncertain times, I usually expect the best,” or “I rarely expect good things
to happen to me”. Optimistic people tend to expect that they will encounter
favorable outcomes, whereas less optimistic people tend to expect that they will
encounter unfavorable outcomes.[3]
These studies also measured other variables on participants, including factors
related to heart disease. The analysis found that compared with pessimists, people
with the most optimistic outlook had a 35% lower risk for cardiovascular events
(e.g., heart attacks). The studies, on average, observed people over a 14-year period
and compared the rate of cardiovascular events between those classified as
optimists versus pessimists.
a) A popular newspaper reports on the analysis with the headline “Thinking
Positively Improves Cardiovascular Health”. Write a short response to the
editor explaining clearly why the headline is potentially misleading. Be sure to
use language accessible to a general audience without a statistics
background. Limit your answer to at most five sentences.
b) Briefly describe a plausible study design that has the potential to demonstrate
the effect of thinking positively on cardiovascular health.
c) Suppose someone who is very optimistic reads about the analysis and
concludes that the findings suggest he has a 35% lower risk for
cardiovascular events than his friend who is extremely pessimistic.
Explain why this is not necessarily the case.
[2] Data from the CIA World Factbook.
[3] Alan Rozanski, MD, et al. Association of optimism with cardiovascular events and all-cause mortality. JAMA Network Open 2019; 2(9):e1912200.
Problem 4.
Suppose that you are interested in determining whether a relationship exists between the fluoride content in a public water supply and the dental caries experience of children using this water. The
file water.Rdata contains the data from a study examining 7,257 children in 21 cities from the
Flanders region in Belgium.
The fluoride content of the public water supply in each city, measured in parts per
million (ppm), is saved under the variable name fluoride; the number of dental
caries per 100 children examined is saved under the name caries. The total dental
caries number is obtained by summing the numbers of filled teeth, teeth with
untreated dental caries, teeth requiring extraction, and missing teeth.[4]
a) Construct a two-way scatterplot for these data, with fluoride as the x-variable
and caries as the y-variable.
b) Do fluoride and caries appear to be positively or negatively associated?
Explain your answer.
c) Later in the course, we will study methods for fitting a straight line to data.
i. If you were to add a straight line to the plot that you think best fits the data, what would be its x-intercept and y-intercept? (Hint: Be sure to look at the limits on the axes...)
ii. Based on the appearance of the plot, do you think that a straight line would be a reasonable way to represent these data? Explain your answer.
Problem 5.
This problem features data from the FAMuSS (Functional SNPs Associated with
Muscle Size and Strength) study discussed in lecture. The study examined the
possible genetic determinants of skeletal muscle size and strength, before and
after training.
This problem uses the following variables from the FAMuSS data:
– ndrm.ch: the percent change in strength in a participant’s non-dominant arm, from before training to after.
– drm.ch: the percent change in strength in a participant’s dominant arm.
– actn3.r577x: the genotype at residue r577x within the ACTN3 gene.
– race: race of the participant, with values stored as text strings.
The famuss dataset is in the oibiostat package.
a) Make a table of the genotypes for the SNP actn3.r577x.
b) Construct a table of actn3.r577x by race, with the genotypes in the columns of
the table and races in the rows. The command for creating a two-way table of
categorical variables x and y is: table(x, y).
c) If you were to use numerical summaries to describe the ndrm.ch variable,
would you prefer the mean and standard deviation or the five-number
summary? Why?
d) Produce a graphical summary that shows the association between age and
genotype at the SNP actn3.r577x. Describe what you see.
[4] These data appear in Table B21 in Principles of Biostatistics, 2nd ed., by Pagano and Gauvreau.
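To illustrate how table() lays out its output, here is a minimal sketch on made-up categorical vectors. The names genotype and group and all of the values are hypothetical; with the real data, the two vectors would be columns of the famuss dataset.

```r
# Made-up categorical data for illustration only (not the FAMuSS data).
genotype <- c("CC", "CT", "TT", "CT", "CC", "CT")
group    <- c("g1", "g1", "g2", "g2", "g2", "g1")

# One-way table of counts.
one_way <- table(genotype)
print(one_way)

# Two-way table: the first argument gives the rows, the second the columns.
two_way <- table(group, genotype)
print(two_way)
```

Swapping the order of the two arguments transposes the table, which is the quickest way to control which variable ends up in the rows.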
Problem 6.
Does smoking have the same association with cardiovascular disease in women as
it does in men? Epidemiologists typically use data from observational studies to
investigate possible causes of disease.
Aortic stenosis is a narrowing or stricture of the aorta that impedes blood flow
to the body.[5]
The dataset contains three variables, for 215 study participants:
– disease: coded Yes if stenosis is present, No if it is absent.
– smoke: coded Smoker if the participant is a current or former smoker, NonSmoker if the participant has never smoked.
– sex: coded as either Male or Female.
Use the data in stenosis.Rdata to answer the following questions.
a) Construct a two-way table for smoking status and disease presence. What
percentage of the 215 participants were both smokers and had aortic
stenosis? This percentage is one component of the joint distribution of smoking
and stenosis; what are the other three numbers of the joint distribution?
b) Among the smokers, what proportion have aortic stenosis? This number is a
component of the conditional distribution of stenosis for the two categories of
smokers. What proportion of non-smokers have aortic stenosis?
c) Repeat part b) for males and females separately. To do this, first subset the
data to create two datasets: one with only males, and one with only females.
Include the tables in your solution. Are there any differences by sex in the
proportion of smokers who suffer from aortic stenosis?
d) Epidemiologists sometimes use a statistic called relative risk. In this context,
relative risk is the ratio of the proportion of smokers with stenosis to the
proportion of non-smokers with stenosis. Relative risks greater than 1
indicate that smokers are at a higher risk for aortic stenosis than non-smokers,
because, among smokers, a higher proportion of them will suffer from stenosis
than the proportion among non-smokers.
The interpretation of relative risk is a bit subtle. Suppose, for example, that
among men with high cholesterol, 30% develop heart disease, while among
men with low cholesterol, 24% develop heart disease. The relative risk of
heart disease, comparing high versus low cholesterol, is 0.30/0.24 = 1.25.
Epidemiologists would say that high cholesterol is associated with a 25%
increase in the probability of heart disease. Relative risks of 1.2 or higher are
generally considered cause for alarm.
i. Calculate the relative risk of stenosis among all participants, comparing smokers to non-smokers.
ii. Repeat the relative risk calculation for males and females separately.
iii. Describe the apparent discrepancy between the overall relative risk from part i. and the relative risks calculated in part ii.
iv. Bonus: Provide an explanation for the apparent discrepancy described in part iii.
[5] The data appear in Table B20, Principles of Biostatistics, 2nd ed., by Pagano and Gauvreau.
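The relative risk arithmetic from the cholesterol example, and the difference between a joint and a conditional distribution, can be sketched in R as follows. The 2x2 counts and the labels Exposed and Unexposed are made up for illustration and are not the stenosis data; prop.table() is one of several ways to obtain these proportions.

```r
# Relative risk from the cholesterol example in the text.
risk_high <- 0.30  # proportion developing heart disease, high cholesterol
risk_low  <- 0.24  # proportion developing heart disease, low cholesterol
rr <- risk_high / risk_low
print(rr)  # 1.25, i.e. a 25% increase

# Made-up 2x2 counts for illustration only (not the stenosis data).
counts <- matrix(c(30, 20,
                   10, 40),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(exposure = c("Exposed", "Unexposed"),
                                 disease  = c("Yes", "No")))

# Joint distribution: each cell divided by the grand total (cells sum to 1).
joint <- prop.table(counts)

# Conditional distribution of disease given exposure (each row sums to 1).
conditional <- prop.table(counts, margin = 1)

# Relative risk: P(disease | exposed) / P(disease | unexposed).
toy_rr <- conditional["Exposed", "Yes"] / conditional["Unexposed", "Yes"]
print(toy_rr)
```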