Biometrics Lab 6 - Fall2023 WC.docx
School: Beloit College
Course: 247
Subject: Industrial Engineering
Date: Jan 9, 2024
Prof Cary & Werner
Fall 2023
Biometrics Lab 6, Part I
Name(s): Alexander Klemp & Grace Louise Suttman
You will complete Part I with a partner. Clearly identify the contribution of each member to each
question; additionally, identify any contributions made by other classmates (i.e., how did April help
you complete this question?). Submit one file for your group and make sure that the file name clearly
identifies the group members. When you have completed Part I, you may move on to Part II.
Please read the statements below. When you have completed the entire lab, sign the
statement by typing your name in an appropriate blank. By signing this contract, you
acknowledge your commitment to the academic honesty policy.
Academic Honesty Policy of Beloit College:
“In an academic institution, few offenses against the community are as serious as academic
dishonesty. Such behavior is a direct attack upon the concept of learning and inquiry and casts
doubts upon all measures of achievement. Beloit insists that only those who are committed to
principles of honest scholarship may study at the college.”
Acts of Academic Dishonesty
“Cheating is an act of deception by which a student misrepresents that he/she has mastered
information on an academic exercise that he/she has not mastered. For example, intentionally
using or attempting to use unauthorized materials, information, or study aids in any academic
exercise is considered cheating.”
I, Alexander Klemp, hereby acknowledge that the academic work presented in this exam is an
honest reflection of my own learning.
I, Grace Suttman, hereby acknowledge that the academic work presented in this exam is an honest
reflection of my own learning.
For this question, you will select data and perform an ANCOVA. Limit the number of levels of
the factor to no more than 3, and include at least n=10 for each level. Use the Gapminder World
Data database to choose a topic and find data that interests you. Approach finding data by using
year and country as a uniting feature of the independent variables (i.e., your continuous numeric
independent variable should be sourced from the same year as your response (dependent)
variable) – this will minimize confounding variables in your study. You may assume that ALL
assumptions are met to run the ANCOVA. (45 points)
a. State your question of interest:
Is there a relationship between the price of gasoline (USD per liter) and population density
(people per square km) in countries of Europe and Asia in 2010?
Alexander and Grace each selected one variable and agreed on the year. Alexander wrote out the
question based on his and Grace’s discussion.
b. Write your scientific hypothesis (or explanation for what you think might explain an
expected outcome). Note: A “scientific hypothesis” is a testable statement about the way the
world works. It is not a statistical null hypothesis. The scientific hypothesis usually
corresponds to the alternate hypothesis in the statistical test.
Gas prices will be higher in countries with higher population densities (people per square km).
Alexander typed out the statement based on his and Grace’s discussion.
c. Download the dataset in Excel format (.xlsx or .csv) and edit the file to include the
information needed for your analysis. Enter the data into JMP and paste the datasheet here.
Grace read out the data and Alexander entered it into the Excel sheet.
d. Describe the data you chose (you should include the type of data, the units, factors/levels,
and replication). Justify why using an ANCOVA is appropriate for addressing your scientific
hypothesis.
● Type of data:
  ○ Our independent variable is population density (people per square km), which is numeric
    continuous data.
  ○ Our dependent variable is the gasoline price per liter, which is also numeric continuous data.
  ○ Our factor is region, with two levels: Asia and Europe.
● Units:
  ○ Gas price is in USD per liter
  ○ Population density is in people per square km
● Factors/Levels:
  ○ Factor: Region
    ■ Level: Asia
    ■ Level: Europe
● Replication: we have no replication
After discussing the data, Grace wrote out the types of data, while Alexander wrote out the units
and factors. The two then discussed whether the data was replicated or not, and Alexander wrote
out the statement.
e. Write your null hypotheses statements here.
$H_0$: $\mu_A = \mu_E$
$H_A$: $\mu_A \neq \mu_E$
$H_0$: There is no functional relationship between gas price and population density
$H_A$: There is a functional relationship between gas price and population density
$H_0$: There is no interaction between population density and region on gas price
$H_A$: There is an interaction between population density and region on gas price
Grace brought up her notes for reference and read off how the hypotheses should be written.
Alexander wrote out the hypotheses.
f. Run the analysis in R Studio and paste the commands/output here.
> setwd("C:/Users/blaxh/OneDrive/Documents/R(biometrics)")
> foo<-read.csv("C:/Users/blaxh/OneDrive/Documents/R(biometrics)/lab6data.csv")
> shapiro.test(foo$Gasprice)

    Shapiro-Wilk normality test

data:  foo$Gasprice
W = 0.9351, p-value = 0.1741

> shapiro.test(foo$Popdensity)

    Shapiro-Wilk normality test

data:  foo$Popdensity
W = 0.33159, p-value = 7.901e-09
> fit<-lm(Gasprice ~ Popdensity, data=foo)
> summary(fit)

Call:
lm(formula = Gasprice ~ Popdensity, data = foo)

Residuals:
    Min      1Q  Median      3Q     Max
-1.3648 -0.3212  0.1307  0.4528  1.0570

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.461e+00  1.432e-01  10.200 3.82e-09 ***
Popdensity  2.543e-05  2.823e-05   0.901    0.379
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6252 on 19 degrees of freedom
Multiple R-squared:  0.04098,  Adjusted R-squared:  -0.009499
F-statistic: 0.8118 on 1 and 19 DF,  p-value: 0.3789
> fit2<-lm(Gasprice ~ Popdensity+Region, data=foo)
> summary(fit2)

Call:
lm(formula = Gasprice ~ Popdensity + Region, data = foo)

Residuals:
     Min       1Q   Median       3Q      Max
-1.05632 -0.22462  0.03411  0.24410  0.72960

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.115e+00  1.676e-01   6.654 3.04e-06 ***
Popdensity   1.525e-05  2.401e-05   0.635  0.53348
RegionEurope 6.894e-01  2.324e-01   2.967  0.00826 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5264 on 18 degrees of freedom
Multiple R-squared:  0.3559,  Adjusted R-squared:  0.2843
F-statistic: 4.973 on 2 and 18 DF,  p-value: 0.01908
> fit3<-lm(Gasprice~Popdensity, data=asia)
> summary(fit3)

Call:
lm(formula = Gasprice ~ Popdensity, data = asia)

Residuals:
     Min       1Q   Median       3Q      Max
-0.91488 -0.15182  0.08597  0.13721  0.79419

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.005e+00  1.824e-01   5.510 0.000567 ***
Popdensity  1.501e-04  8.835e-05   1.699 0.127778
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5295 on 8 degrees of freedom
Multiple R-squared:  0.2651,  Adjusted R-squared:  0.1733
F-statistic: 2.886 on 1 and 8 DF,  p-value: 0.1278
> fit4<-lm(Gasprice~Popdensity, data=Europe)
> summary(fit4)

Call:
lm(formula = Gasprice ~ Popdensity, data = Europe)

Residuals:
     Min       1Q   Median       3Q      Max
-1.07909 -0.08753  0.09027  0.19593  0.69098

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.829e+00  1.518e-01  12.043 7.47e-07 ***
Popdensity  4.349e-06  2.257e-05   0.193    0.851
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4759 on 9 degrees of freedom
Multiple R-squared:  0.004109,  Adjusted R-squared:  -0.1065
F-statistic: 0.03713 on 1 and 9 DF,  p-value: 0.8515
> ANCOVA<-lm(Gasprice ~ Popdensity*Region, data=foo)
> anova(ANCOVA)
Analysis of Variance Table

Response: Gasprice
                  Df Sum Sq Mean Sq F value   Pr(>F)
Popdensity         1 0.3173 0.31727  1.2598 0.277284
Region             1 2.4385 2.43852  9.6829 0.006343 **
Popdensity:Region  1 0.7060 0.70599  2.8034 0.112363
Residuals         17 4.2812 0.25184
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Alexander ran the data through R studios and Grace checked the code to make sure it looked correct.
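The transcript does not show how the `asia` and `Europe` data frames were created, and the data file itself is not reproduced here. As a rough cross-check, the sample sizes can be recovered from the residual degrees of freedom that R reports. The sketch below uses only numbers printed in the output; the `subset()` calls in the comments are an assumption about how the per-region data frames might have been built (the level name "Asia" in particular is a guess; "Europe" appears in the `RegionEurope` coefficient).

```r
# Residual degrees of freedom as reported by summary() for fit, fit3 and fit4
resid_df <- c(pooled = 19, asia = 8, europe = 9)

# Each of these simple regressions estimates two coefficients
# (intercept and slope), so n = residual df + 2
n <- resid_df + 2
print(n)

# The two regional samples should add up to the pooled sample
stopifnot(n[["asia"]] + n[["europe"]] == n[["pooled"]])

# The interaction model estimates four coefficients (intercept, slope,
# region offset, slope-by-region term), which leaves the residual df
# shown in the anova() table
ancova_resid_df <- n[["pooled"]] - 4
print(ancova_resid_df)

# Hypothetical commands for the per-region data frames (not in the transcript):
# asia   <- subset(foo, Region == "Asia")
# Europe <- subset(foo, Region == "Europe")
```

Under these assumptions the output implies 21 countries in total, 10 in Asia and 11 in Europe.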
g. Write the methods and results sections of a scientific report based upon your analysis. Be sure
to include all necessary information – see WS on Writing Scientific Reports and “Notes on
reporting regression analysis” for guidance. Include a figure and figure legend/caption in the
results section.
Methods: To assess the effects of population density and region on the price of gasoline,
population density (people per square km) and gasoline price (USD per liter) for 2010 were taken
from 21 countries in two regions (10 in Asia and 11 in Europe), chosen using a random online
generator. Each variable was tested for normality with a Shapiro-Wilk test: gasoline price was
consistent with a normal distribution (W = 0.9351, p = 0.1741), whereas population density was
not (W = 0.3316, p < 0.001). As permitted by the assignment, all assumptions were treated as met
and an ANCOVA was then run. All statistical tests were performed using RStudio (R 4.3.2) and the
significance level, α, was set at 0.05.
Results: Data were collected from 10 countries in Asia and 11 countries in Europe in 2010
(n = 21). A simple linear regression revealed no significant association between gas price and
population density per square km ($F_{1,19} = 0.8118$, p = 0.379), and the overall model fit was
weak (adjusted $R^2 = -0.0095$). After adding region as a factor, the model was significant
($F_{2,18} = 4.973$, p = 0.019; adjusted $R^2 = 0.2843$). Region had a significant effect on gas
price, with higher prices in Europe than in Asia (p = 0.00826). Within regions, there was no
significant relationship between gas price and population density per square km for Asia
(p = 0.1278) or for Europe (p = 0.8515). The ANCOVA indicated that the interaction between
population density per square km and region was not statistically significant
($F_{1,17} = 2.8034$, p = 0.1124).
This figure illustrates the relationship between population density (foo$Popdensity) and gas
prices (foo$Gasprice) based on the dataset. Each data point is one of 21 countries. The
x-axis is a country's population density per square km and the y-axis is a country's average
price of gas in USD per liter. (Interaction term: df = 17, p = 0.1124; model with region:
adjusted $R^2 = 0.2843$)
Grace wrote the methods section and Alexander wrote the results section.
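The F values and p-values in the ANCOVA table can be checked by hand, since each F is the term's mean square divided by the residual mean square. The following is a minimal sketch that uses only the rounded numbers printed by `anova(ANCOVA)`, so the last digit may differ slightly from R's own output.

```r
# Mean squares as printed in the anova(ANCOVA) table
ms <- c(Popdensity = 0.31727, Region = 2.43852, interaction = 0.70599)
ms_resid <- 0.25184   # residual mean square
df_resid <- 17        # residual degrees of freedom

# F = MS(term) / MS(residual); each term has 1 numerator df
f_value <- ms / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df1 = 1, df2 = df_resid, lower.tail = FALSE)

print(round(cbind(F = f_value, p = p_value), 4))
```

Only the Region term falls below α = 0.05, which matches the conclusion that the interaction is not significant.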
Biometrics Lab 6, Part II
You may continue to work in partners or you may choose to complete the remainder of the
lab individually.
Word identification: Fill in the blank with the term that is defined. (2 points each; 12pts)
1. Type II Error: The type of error committed when one fails to reject a null hypothesis
   that is false and it should be rejected.
2. Pearson’s Correlation Coefficient: The name (not symbol) of the parameter that measures the
   strength of the association between two variables that do not have a functional relationship.
3. $R^2$: A measure of the proportion of the variation in the values of a dependent variable
   explained by the independent variable in regression.
4. simple linear regression: A statistical analysis that tests for a functional relationship
   between two continuous variables.
5. ANCOVA: A statistical test used to test whether a dependent variable is functionally related
   to an independent variable under two different conditions.
6. simple linear correlation: A statistical test that determines whether two variables are
   associated with one another.
Alexander and Grace reviewed their quizzes and the textbook to find the answers.
7. A group of students conducted an experiment that was designed to examine whether there is a functional
relationship between the amount of food (g) eaten by rats and the carbohydrate composition (%) of the food.
Please help these students by analyzing the data in the table and answering the questions. (23 points)
Rat   Food eaten (g)   Carbohydrate composition (%)
1     452              21.7
2     488              25.7
3     490              32.0
4     546              34.3
5     446              33.2
6     495              29.2
7     452              34.5
8     488              33.8
9     490              38.6
10    546              41.6
11    430              21.7
12    465              25.7
13    496              32.0
14    510              34.3
15    534              33.2
16    542              29.2
17    580              34.5
18    585              33.8
19    604              38.6
20    624              41.6
a) What parametric statistical inference test should be performed? Justify the selection of this
analysis based upon the description of the experiment. Please be specific.
We will run a simple linear regression test because we are assessing if there is a functional
relationship between two continuous variables, in this case food eaten and carbohydrate
composition.
Grace wrote out the statement based on her and Alexander's discussion.
b) What are the null and alternative hypothesis(es) for this statistical test? Be sure to include all
hypothesis statements (hint: consider whether this is a replicated design).
$H_0$: $\beta = 0$, or the amount of food eaten by rats does not depend on the food’s carbohydrate
content.
$H_A$: $\beta \neq 0$, or the amount of food eaten by rats is dependent on the food’s carbohydrate
content.
(Unreplicated design)
Grace found the hypotheses in her notes and Alexander wrote them out.
c) What are the assumptions for this test?
- Sampling methods are random and independent
- Bivariate normality
- All populations of Y values have equal variance
- The measurements of X were made without error
- Unreplicated: the means of the Y populations lie along a straight line (the relationship
  between X and Y is linear)
Grace added the assumption from her notes and Alexander made sure it was correct.
d) You may assume these data have met the assumptions you provided in c). Run the statistical
test and paste the R commands/output here.
> fit<- lm(Food~Carb, data = rat)
> summary(fit)

Call:
lm(formula = Food ~ Carb, data = rat)

Residuals:
    Min      1Q  Median      3Q     Max
-74.110 -21.475  -1.322  28.261  63.337

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  306.934     57.479   5.340 4.48e-05 ***
Carb           6.353      1.746   3.639  0.00188 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 43.01 on 18 degrees of freedom
Multiple R-squared:  0.4239,  Adjusted R-squared:  0.3918
F-statistic: 13.24 on 1 and 18 DF,  p-value: 0.001877
Alexander ran the data through R studios and Grace checked the code to make sure it looked correct.
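The transcript assumes a data frame named `rat` already exists. A minimal sketch of how the analysis could be reproduced from the table in question 7 is shown here; the column names `Food` and `Carb` follow the transcript, while typing the values in by hand (instead of reading a file) is an assumption made so the example is self-contained.

```r
# Data transcribed from the table in question 7 (20 rats)
rat <- data.frame(
  Food = c(452, 488, 490, 546, 446, 495, 452, 488, 490, 546,
           430, 465, 496, 510, 534, 542, 580, 585, 604, 624),
  # The ten carbohydrate percentages appear twice in the table
  Carb = rep(c(21.7, 25.7, 32.0, 34.3, 33.2, 29.2, 34.5, 33.8, 38.6, 41.6),
             times = 2)
)

# Simple linear regression of food eaten (g) on carbohydrate composition (%)
fit <- lm(Food ~ Carb, data = rat)

# Pull out the quantities reported in the results
slope     <- unname(coef(fit)["Carb"])
intercept <- unname(coef(fit)["(Intercept)"])
r_squared <- summary(fit)$r.squared

print(round(c(slope = slope, intercept = intercept, r_squared = r_squared), 4))
```

Fitting the model this way should give the same slope, intercept, and $r^2$ as the pasted output.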
f) Report the results of this test. Include the text and all statistical information that would appear
in the results section of a scientific paper.
We conducted a simple linear regression analysis and rejected the null hypothesis because the
p-value (0.001877) is less than the significance level of 0.05. This tells us that the amount of
food eaten by rats is dependent on the food’s carbohydrate content ($F_{1,18} = 13.24$,
$r^2 = 0.4239$, $p = 0.001877$). We can also conclude that around 42.39% of the variation in
food eaten (g) was accounted for by the percentage of carbohydrates in the food. The regression
equation for this problem is $\hat{y} = 6.353x + 306.934$, where $y$ is food eaten (g) and $x$ is
carbohydrate composition (%).
Grace reported the results and Alexander checked if the values were right
g) What proportion of the variation in food eaten (g) was accounted for by the % carbohydrate
in food?
$r^2 = 0.4239$
This shows that 42.39% of the variation in food eaten was accounted for by the percentage of
carbohydrates in food.
Alexander added the $r^2$ value and Grace added the written statement.
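In a simple linear regression, $r^2$, the F statistic, and the t value of the slope are tied together: $F = \dfrac{r^2}{1 - r^2}(n - 2)$ and $F = t^2$. The sketch below is a consistency check on the reported values using only the rounded numbers from the `summary(fit)` output, so small rounding differences are expected.

```r
r_squared <- 0.4239   # Multiple R-squared from summary(fit)
df_resid  <- 18       # residual degrees of freedom (n - 2, with n = 20 rats)
t_slope   <- 3.639    # t value reported for the Carb coefficient

# F statistic recovered from r-squared
f_from_r2 <- r_squared / (1 - r_squared) * df_resid

# With a single predictor, F is also the square of the slope's t value
f_from_t <- t_slope^2

# Upper-tail F probability gives the overall p-value
p_from_f <- pf(f_from_r2, df1 = 1, df2 = df_resid, lower.tail = FALSE)

print(round(c(F_from_r2 = f_from_r2, F_from_t = f_from_t, p = p_from_f), 4))
```

Both routes land close to the reported F of 13.24.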
h) Use Excel to create a publication quality figure to illustrate your answer and include an
appropriate figure legend/caption.
This scatter plot illustrates the relationship between the amount of food eaten by rats (g) and
the carbohydrate composition (%) of their diet. Each point represents an individual rat. The
solid line represents the linear regression fit to the data ($F_{1,18} = 13.24$,
$r^2 = 0.4239$, $p = 0.001877$).
Alexander made the graph and Grace wrote the figure legend
8. Ecologists were interested in analyzing the association between liver length (mm) and body
mass (g) of the yellow perch, Perca flavescens. Their collected data are reported in the following
table. Help them draw conclusions by answering the questions below. (20 points)
Fish   Liver length (mm)   Body mass (g)
1      126                 14.4
2      175                 15.2
3      106                 10.6
4      96                  5.4
5      147                 22.7
6      138                 14.9
7      78                  11.4
8      120                 14.81
9      98                  5.19
10     132                 15.39
11     140                 17.25
12     123                 11.52
13     108                 11.5
14     124                 14.8
15     156                 18.3
a) What parametric statistical test should the researchers use to analyze these data and why is
this an appropriate test?
We will conduct a simple linear correlation analysis because we are testing whether two continuous
variables, body mass and liver length in yellow perch, are associated, without assuming that one
depends on the other.
Alexander wrote out this statement based on his and Grace’s discussion.
b) What is the null hypothesis for this test?
$H_0$: $\rho = 0$, or there is no correlation between body mass and liver length in yellow perch.
$H_A$: $\rho \neq 0$, or there is a correlation between body mass and liver length in yellow perch.
Alexander added the first section with rho and Grace added in the written statement.
c) What are the assumptions for this test?
- Sampling methods are random and independent
- Bivariate normality
Grace added in the assumption and Alexander double checked them.
d) You may assume that these data have met the assumptions you listed in c). Run the statistical
test and paste the R commands/output here.
> cor.test(lab6b$Liver, lab6b$mass)

    Pearson's product-moment correlation

data:  lab6b$Liver and lab6b$mass
t = 3.8503, df = 13, p-value = 0.002006
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3476544 0.9041236
sample estimates:
      cor
0.7299246
Alexander ran the data through R studios and Grace checked the code to make sure it looked correct.
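The `lab6b` data frame is not defined in the transcript. A minimal sketch of how the test could be reproduced from the table in question 8 follows; the names `lab6b`, `Liver`, and `mass` are taken from the transcript, and entering the values directly is an assumption made to keep the example self-contained.

```r
# Data transcribed from the table in question 8 (15 yellow perch)
lab6b <- data.frame(
  Liver = c(126, 175, 106, 96, 147, 138, 78, 120, 98, 132,
            140, 123, 108, 124, 156),
  mass  = c(14.4, 15.2, 10.6, 5.4, 22.7, 14.9, 11.4, 14.81, 5.19, 15.39,
            17.25, 11.52, 11.5, 14.8, 18.3)
)

# Pearson correlation test (two-sided, the default)
res <- cor.test(lab6b$Liver, lab6b$mass)

# Extract the pieces reported in the answers
r      <- unname(res$estimate)    # correlation coefficient
t_stat <- unname(res$statistic)   # t statistic
df     <- unname(res$parameter)   # degrees of freedom (n - 2)

print(round(c(r = r, t = t_stat, df = df, p = res$p.value), 4))
```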
e) From your R output, obtain the coefficient value and use this to calculate the test statistic.
Include the formula for the test statistic and show any mathematical work necessary to calculate
test statistic, including any terms not reported in R. Compare this value to the critical value for
the test statistic.
r = 0.7299246
$r^2 = 0.5328$
$s_r = \sqrt{\dfrac{1 - r^2}{n - 2}} = \sqrt{\dfrac{1 - 0.5328}{15 - 2}} = \sqrt{\dfrac{0.4672}{13}} = \sqrt{0.0359} = 0.1895$
$t = \dfrac{r}{s_r} = \dfrac{0.7299}{0.1895} \approx 3.85$
Critical value: $t_{0.05(2),13} = 2.16$
Since our critical value, 2.16, is less than the test statistic, 3.85, we will reject our null
hypothesis.
Alexander did the calculation on the calculator and Grace wrote the statement.
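The hand calculation can be repeated in R without intermediate rounding. This sketch assumes only the values of r and n given in the output and table; `qt()` supplies the two-tailed critical value at α = 0.05.

```r
r <- 0.7299246   # correlation coefficient from cor.test()
n <- 15          # number of fish

# Standard error of r
s_r <- sqrt((1 - r^2) / (n - 2))

# Test statistic
t_stat <- r / s_r

# Two-tailed critical value at alpha = 0.05 with n - 2 degrees of freedom
t_crit <- qt(1 - 0.05 / 2, df = n - 2)

print(round(c(s_r = s_r, t = t_stat, t_crit = t_crit), 4))

# Reject H0 when the absolute test statistic exceeds the critical value
reject_h0 <- abs(t_stat) > t_crit
print(reject_h0)
```

Carrying full precision gives a t value that agrees with the one reported by `cor.test()`; the small difference from the calculator result comes from rounding $s_r$.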
f) Report the result of testing the null hypothesis. Provide a clear, concluding statement about
the data, which is supported by the appropriate statistical output. Also include the 95%
confidence interval for the coefficient.
Since the p-value = 0.002 < 0.05, we reject the null hypothesis and conclude that there is a
significant positive correlation between body mass and liver length in yellow perch (n = 15;
r = 0.7299, t = 3.85, df = 13, $t_{0.05(2),13} = 2.16$). The 95% confidence interval for the
correlation coefficient is 0.3477 to 0.9041.
Grace wrote the statement out based on Alexander and Grace’s discussion.
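The confidence interval reported by `cor.test()` is based on Fisher's z transformation: r is transformed with `atanh()`, a normal-theory interval is built with standard error $1/\sqrt{n - 3}$, and the limits are transformed back with `tanh()`. A minimal sketch using the r and n from this question:

```r
r <- 0.7299246   # correlation coefficient from cor.test()
n <- 15          # number of fish

# Fisher z transformation of r and its standard error
z    <- atanh(r)
se_z <- 1 / sqrt(n - 3)

# 95% interval on the z scale, then back-transformed to the r scale
z_crit <- qnorm(0.975)
ci <- tanh(z + c(-1, 1) * z_crit * se_z)

print(round(ci, 4))
```

The interval is not symmetric around r because the back-transformation compresses values near 1.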
g) Use Excel to create a publication quality figure to illustrate your answer and include an
appropriate figure legend/caption.
This scatter plot illustrates the correlation between body mass and liver length in yellow perch.
Each point represents an individual fish, with body mass (g) plotted on the x-axis and liver
length (mm) on the y-axis. A significant positive linear correlation was found between the two
variables using a Pearson correlation analysis (r = 0.7299, t = 3.85, df = 13,
$t_{0.05(2),13} = 2.16$).
Alexander created the graph and Grace wrote the figure legend