Biometrics Lab 6 - Fall2023 WC.docx

Prof Cary & Werner Fall 2023
Biometrics Lab 6, Part I
Name(s): Alexander Klemp & Grace Louise Suttman

You will complete Part I with a partner. Clearly identify the contribution of each member to each question; additionally, identify any contributions made by other classmates (i.e., how did April help you complete this question?). Submit one file for your group and make sure that the file name clearly identifies the group members. When you have completed Part I, you may move on to Part II.

Please read the statements below. When you have completed the entire lab, sign the statement by typing your name in an appropriate blank. By signing this contract, you acknowledge your commitment to the academic honesty policy.

Academic Honesty Policy of Beloit College: “In an academic institution, few offenses against the community are as serious as academic dishonesty. Such behavior is a direct attack upon the concept of learning and inquiry and casts doubts upon all measures of achievement. Beloit insists that only those who are committed to principles of honest scholarship may study at the college.”

Acts of Academic Dishonesty
“Cheating is an act of deception by which a student misrepresents that he/she has mastered information on an academic exercise that he/she has not mastered. For example, intentionally using or attempting to use unauthorized materials, information, or study aids in any academic exercise is considered cheating.”

I, Alexander Klemp, hereby acknowledge that the academic work presented in this exam is an honest reflection of my own learning.
I, Grace Suttman, hereby acknowledge that the academic work presented in this exam is an honest reflection of my own learning.
For this question, you will select data and perform an ANCOVA. Limit the number of levels of the factor to no more than 3, and include at least n=10 for each level. Use the Gapminder World Data database to choose a topic and find data that interests you. Approach finding data by using year and country as a uniting feature of the independent variables (i.e., your continuous numeric independent variable should be sourced from the same year as your response (dependent) variable) – this will minimize confounding variables in your study. You may assume that ALL assumptions are met to run the ANCOVA. (45 points)

a. State your question of interest:
Is there a relationship between the price of gas (USD per liter) and population density (people per square km) in the regions of Europe and Asia in 2010?
Alexander and Grace each selected one variable and agreed on the year. Alexander wrote out the question based on his and Grace's discussion.

b. Write your scientific hypothesis (or explanation for what you think might explain an expected outcome). Note: A “scientific hypothesis” is a testable statement about the way the world works. It is not a statistical null hypothesis. The scientific hypothesis usually corresponds to the alternate hypothesis in the statistical test.
Gas prices will be higher in countries with higher population densities (people per square km).
Alexander typed out the statement based on his and Grace's discussion.

c. Download the dataset in Excel format (.xlsx or .csv) and edit the file to include the information needed for your analysis. Enter the data into JMP and paste the datasheet here.
Grace read out the data and Alexander entered it into the Excel sheet.
d. Describe the data you chose (you should include the type of data, the units, factors/levels, and replication). Justify why using an ANCOVA is appropriate for addressing your scientific hypothesis.
Type of data: Our independent variable (the covariate), population density, is numeric continuous data. Our dependent variable, gasoline price per liter, is also numeric continuous data. Our factor is region, with two levels: Asia and Europe. An ANCOVA is appropriate because it tests whether gas price (a continuous response) is functionally related to population density (a continuous covariate) and whether that relationship differs between the levels of region (a categorical factor).
Units: Gas price is in USD per liter; population density is in people per square km.
Factors/Levels: Factor: Region. Levels: Asia, Europe.
Replication: We have no replication.
After discussing the data, Grace wrote out the types of data, while Alexander wrote out the units and factors. The two then discussed whether the data was replicated or not, and Alexander wrote out the statement.

e. Write your null hypotheses statements here.
$H_0: \mu_A = \mu_E$
$H_A: \mu_A \neq \mu_E$
$H_0$: There is no functional relationship between population density and gas price.
$H_A$: There is a functional relationship between population density and gas price.
$H_0$: There is no interaction between population density and region on gas price.
$H_A$: There is an interaction between population density and region on gas price.
Grace brought up her notes for reference and read off how the hypotheses should be written. Alexander wrote out the hypotheses.

f. Run the analysis in R Studio and paste the commands/output here.
> setwd("C:/Users/blaxh/OneDrive/Documents/R(biometrics)")
> foo<-read.csv("C:/Users/blaxh/OneDrive/Documents/R(biometrics)/lab6data.csv")
> shapiro.test(foo$Gasprice)

Shapiro-Wilk normality test

data: foo$Gasprice
W = 0.9351, p-value = 0.1741

> shapiro.test(foo$Popdensity)

Shapiro-Wilk normality test

data: foo$Popdensity
W = 0.33159, p-value = 7.901e-09
> fit<-lm(Gasprice ~ Popdensity, data=foo)
> summary(fit)

Call:
lm(formula = Gasprice ~ Popdensity, data = foo)

Residuals:
    Min      1Q  Median      3Q     Max
-1.3648 -0.3212  0.1307  0.4528  1.0570

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.461e+00  1.432e-01  10.200 3.82e-09 ***
Popdensity  2.543e-05  2.823e-05   0.901    0.379
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6252 on 19 degrees of freedom
Multiple R-squared:  0.04098, Adjusted R-squared:  -0.009499
F-statistic: 0.8118 on 1 and 19 DF,  p-value: 0.3789

> fit2<-lm(Gasprice ~ Popdensity+Region, data=foo)
> summary(fit2)

Call:
lm(formula = Gasprice ~ Popdensity + Region, data = foo)

Residuals:
     Min       1Q   Median       3Q      Max
-1.05632 -0.22462  0.03411  0.24410  0.72960

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.115e+00  1.676e-01   6.654 3.04e-06 ***
Popdensity   1.525e-05  2.401e-05   0.635  0.53348
RegionEurope 6.894e-01  2.324e-01   2.967  0.00826 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5264 on 18 degrees of freedom
Multiple R-squared:  0.3559, Adjusted R-squared:  0.2843
F-statistic: 4.973 on 2 and 18 DF,  p-value: 0.01908

> fit3<-lm(Gasprice~Popdensity, data=asia)
> summary(fit3)

Call:
lm(formula = Gasprice ~ Popdensity, data = asia)

Residuals:
     Min       1Q   Median       3Q      Max
-0.91488 -0.15182  0.08597  0.13721  0.79419

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.005e+00  1.824e-01   5.510 0.000567 ***
Popdensity  1.501e-04  8.835e-05   1.699 0.127778
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5295 on 8 degrees of freedom
Multiple R-squared:  0.2651, Adjusted R-squared:  0.1733
F-statistic: 2.886 on 1 and 8 DF,  p-value: 0.1278

> fit4<-lm(Gasprice~Popdensity, data=Europe)
> summary(fit4)

Call:
lm(formula = Gasprice ~ Popdensity, data = Europe)

Residuals:
     Min       1Q   Median       3Q      Max
-1.07909 -0.08753  0.09027  0.19593  0.69098
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.829e+00  1.518e-01  12.043 7.47e-07 ***
Popdensity  4.349e-06  2.257e-05   0.193    0.851
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4759 on 9 degrees of freedom
Multiple R-squared:  0.004109, Adjusted R-squared:  -0.1065
F-statistic: 0.03713 on 1 and 9 DF,  p-value: 0.8515

> ANCOVA<-lm(Gasprice ~ Popdensity*Region, data=foo)
> anova(ANCOVA)
Analysis of Variance Table

Response: Gasprice
                  Df Sum Sq Mean Sq F value   Pr(>F)
Popdensity         1 0.3173 0.31727  1.2598 0.277284
Region             1 2.4385 2.43852  9.6829 0.006343 **
Popdensity:Region  1 0.7060 0.70599  2.8034 0.112363
Residuals         17 4.2812 0.25184
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Alexander ran the data through R Studio and Grace checked the code to make sure it looked correct.

g. Write the methods and results sections of a scientific report based upon your analysis. Be sure to include all necessary information – see WS on Writing Scientific Reports and “Notes on reporting regression analysis” for guidance. Include a figure and figure legend/caption in the results section.

Methods: To assess the effects of population density and region on the price of gasoline, population density (people per square km) and gasoline price (USD per liter) for 2010 were obtained for 21 countries in two regions, 10 in Asia and 11 in Europe, selected using a random online generator. Each variable was tested for normality with a Shapiro-Wilk test: gas prices were consistent with a normal distribution (W = 0.9351, p = 0.1741), but population density was not (W = 0.3316, p < 0.001). As permitted by the lab instructions, all ANCOVA assumptions were assumed to be met. Simple linear regressions (overall and within each region) and an ANCOVA including the population density × region interaction were then run. All statistical tests were performed using R Studio (R 4.3.2), and the significance level, α, was set at 0.05.

Results: Data were collected from 21 countries in 2010, 10 in Asia and 11 in Europe (n = 21).
A simple linear regression revealed no significant association between gas price and population density ($F_{1,19} = 0.81$, $p = 0.379$), and the model fit was weak (adjusted $R^2 = -0.0095$). After region was added as a factor, the model became significant ($F_{2,18} = 4.97$, $p = 0.019$) and its fit improved (adjusted $R^2 = 0.2843$). Region had a significant effect on gas price: at a given population density, gas in European countries cost on average 0.689 USD per liter more than in Asian countries ($p = 0.00826$). Within regions, there was no significant relationship between gas price and population density for Asia ($F_{1,8} = 2.89$, $p = 0.1278$) or for Europe ($F_{1,9} = 0.037$, $p = 0.8515$). The ANCOVA showed a significant effect of region ($F_{1,17} = 9.68$, $p = 0.0063$), no significant effect of population density ($F_{1,17} = 1.26$, $p = 0.277$), and no significant interaction between population density and region ($F_{1,17} = 2.80$, $p = 0.1124$), so the slopes for Asia and Europe do not differ significantly.

This figure illustrates the relationship between population density and gas price in 21 Asian and European countries in 2010. Each data point is one country. The x-axis is a country's population density (people per square km) and the y-axis is a country's average price of gas (USD per liter). (ANCOVA interaction: $F_{1,17} = 2.80$, $p = 0.1124$; additive model adjusted $R^2 = 0.2843$)
Grace wrote the methods section and Alexander wrote the results section.
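As an optional cross-check (not part of the original R analysis), adjusted R² and the ANCOVA F ratios can be recomputed by hand from numbers R already printed. This is a minimal Python sketch; the helper names are hypothetical, and it assumes n = 21 countries, which is what the residual degrees of freedom imply (19 residual df + 2 coefficients in the simple regression).

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_ratio(ss_term: float, df_term: int, ss_resid: float, df_resid: int) -> float:
    """F = (SS_term / df_term) / (SS_resid / df_resid), as in R's anova() table."""
    return (ss_term / df_term) / (ss_resid / df_resid)


n = 21  # assumption: 10 Asian + 11 European countries, inferred from residual df

# Multiple R-squared values copied from summary(fit) and summary(fit2)
adj_simple = adjusted_r2(0.04098, n, 1)  # Gasprice ~ Popdensity
adj_region = adjusted_r2(0.3559, n, 2)   # Gasprice ~ Popdensity + Region

# Sums of squares copied from anova(ANCOVA); residual df = 17
f_region = f_ratio(2.4385, 1, 4.2812, 17)
f_interaction = f_ratio(0.7060, 1, 4.2812, 17)

print(f"adjusted R^2, Popdensity only: {adj_simple:.4f}")
print(f"adjusted R^2, with Region:     {adj_region:.4f}")
print(f"F Region = {f_region:.3f}, F interaction = {f_interaction:.3f}")
```

The recomputed values should match R's printed adjusted R² (-0.0095 and 0.2843) and F values (9.68 and 2.80) up to rounding of the copied inputs, which supports the sample size of 21.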
Biometrics Lab 6, Part II
You may continue to work in partners or you may choose to complete the remainder of the lab individually.

Word identification: Fill in the blank with the term that is defined. (2 points each; 12 pts)
1. Type II error: The type of error committed when one fails to reject a null hypothesis that is false and should have been rejected.
2. Pearson's correlation coefficient: The name (not symbol) of the parameter that measures the strength of the association between two variables that do not have a functional relationship.
3. $R^2$ (coefficient of determination): A measure of the proportion of the variation in the values of a dependent variable explained by the independent variable in regression.
4. Simple linear regression: A statistical analysis that tests for a functional relationship between two continuous variables.
5. ANCOVA: A statistical test used to test whether a dependent variable is functionally related to an independent variable under two different conditions.
6. Simple linear correlation: A statistical test that determines whether two variables are associated with one another.
Alexander and Grace reviewed their quizzes and the textbook to find the answers.
7. A group of students conducted an experiment that was designed to examine whether there is a functional relationship between the amount of food (g) eaten by rats and the carbohydrate composition (%) of the food. Please help these students by analyzing the data in the table and answering the questions. (23 points)

Rat   Food eaten (g)   Carbohydrate composition (%)
1     452              21.7
2     488              25.7
3     490              32.0
4     546              34.3
5     446              33.2
6     495              29.2
7     452              34.5
8     488              33.8
9     490              38.6
10    546              41.6
11    430              21.7
12    465              25.7
13    496              32.0
14    510              34.3
15    534              33.2
16    542              29.2
17    580              34.5
18    585              33.8
19    604              38.6
20    624              41.6
a) What parametric statistical inference test should be performed? Justify the selection of this analysis based upon the description of the experiment. Please be specific.
We will run a simple linear regression because we are testing whether there is a functional relationship between two continuous variables: food eaten (g) as the dependent variable and carbohydrate composition (%) as the independent variable.
Grace wrote out the statement based on her and Alexander's discussion.

b) What are the null and alternative hypotheses(es) for this statistical test? Be sure to include all hypothesis statements (hint: consider whether this is a replicated design).
$H_0: \beta = 0$, or the amount of food eaten by rats does not depend on the food's carbohydrate content.
$H_A: \beta \neq 0$, or the amount of food eaten by rats depends on the food's carbohydrate content.
(Unreplicated design)
Grace found the hypotheses in her notes and Alexander wrote them out.

c) What are the assumptions for this test?
- Sampling is random and independent
- For each value of X, the Y values are normally distributed
- All populations of Y values have equal variance
- The measurements of X were made without error
- Unreplicated: the means of the Y populations lie along a straight line (the relationship between X and Y is linear)
Grace added the assumptions from her notes and Alexander made sure they were correct.

d) You may assume these data have met the assumptions you provided in c). Run the statistical test and paste the R commands/output here.
> fit<- lm(Food~Carb, data = rat)
> summary(fit)

Call:
lm(formula = Food ~ Carb, data = rat)

Residuals:
    Min      1Q  Median      3Q     Max
-74.110 -21.475  -1.322  28.261  63.337

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  306.934     57.479   5.340 4.48e-05 ***
Carb           6.353      1.746   3.639  0.00188 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 43.01 on 18 degrees of freedom
Multiple R-squared:  0.4239, Adjusted R-squared:  0.3918
F-statistic: 13.24 on 1 and 18 DF,  p-value: 0.001877

Alexander ran the data through R Studio and Grace checked the code to make sure it looked correct.
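As an optional cross-check of the R output (not part of the students' original work), the least-squares slope, intercept, and r² can be computed directly from the rat table. This is a minimal Python sketch using the textbook sums-of-squares formulas; the function name least_squares is hypothetical.

```python
# Rat feeding data from the table in question 7.
# Rats 11-20 received the same carbohydrate levels, in the same order, as rats 1-10.
carb = [21.7, 25.7, 32.0, 34.3, 33.2, 29.2, 34.5, 33.8, 38.6, 41.6] * 2
food = [452, 488, 490, 546, 446, 495, 452, 488, 490, 546,
        430, 465, 496, 510, 534, 542, 580, 585, 604, 624]


def least_squares(x, y):
    """Ordinary least-squares slope, intercept, and r^2 for y = b*x + a."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


slope, intercept, r2 = least_squares(carb, food)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r^2 = {r2:.4f}")
```

The printed values should agree with the Carb estimate, the intercept, and the Multiple R-squared in the R summary.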
f) Report the results of this test. Include the text and all statistical information that would appear in the results section of a scientific paper.
We conducted a simple linear regression and rejected the null hypothesis because the p-value (0.001877) is less than the significance level of 0.05. The amount of food eaten by rats depends on the food's carbohydrate content ($F_{1,18} = 13.24$, $r^2 = 0.4239$, $p = 0.001877$). About 42.39% of the variation in food eaten (g) was accounted for by the percentage of carbohydrates in the food. The regression equation is $\hat{y} = 6.353x + 306.934$, so food eaten increases by about 6.35 g for each 1% increase in carbohydrate composition.
Grace reported the results and Alexander checked that the values were correct.

g) What proportion of the variation in food eaten (g) was accounted for by the % carbohydrate in food?
$r^2 = 0.4239$. This shows that 42.39% of the variation in food eaten was accounted for by the percentage of carbohydrates in the food.
Alexander added the $r^2$ and Grace added the written statement.

h) Use Excel to create a publication quality figure to illustrate your answer and include an appropriate figure legend/caption.
This scatter plot illustrates the relationship between the amount of food eaten by rats (g) and the carbohydrate composition of their diet (%). Each point represents an individual rat. The solid line is the fitted linear regression, $\hat{y} = 6.353x + 306.934$ ($F_{1,18} = 13.24$, $r^2 = 0.4239$, $p = 0.001877$).
Alexander made the graph and Grace wrote the figure legend.
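In simple linear regression the reported F statistic, r², and degrees of freedom are linked by F = r²(n − 2)/(1 − r²), and the fitted equation can be used to predict food eaten at a given carbohydrate level. The Python sketch below uses only values from the R output; the function names are hypothetical, and the 30% input is an illustrative value chosen because it lies inside the observed range (21.7% to 41.6%).

```python
def f_from_r2(r2: float, n: int) -> float:
    """F statistic (1 and n - 2 df) for simple linear regression, from r^2."""
    return r2 * (n - 2) / (1 - r2)


def predict_food(carb_percent: float) -> float:
    """Predicted food eaten (g) from y = 6.353x + 306.934 (coefficients from R)."""
    return 6.353 * carb_percent + 306.934


f_stat = f_from_r2(0.4239, 20)   # 20 rats
food_at_30 = predict_food(30.0)  # illustrative input within the observed range
print(f"F(1, 18) = {f_stat:.2f}; predicted food eaten at 30% = {food_at_30:.1f} g")
```

Predictions outside 21.7% to 41.6% carbohydrate would be extrapolation and are not supported by these data.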
8. Ecologists were interested in analyzing the association between liver length (mm) and body mass (g) of the yellow perch, Perca flavescens. Their collected data are reported in the following table. Help them draw conclusions by answering the questions below. (20 points)

Fish   Liver length (mm)   Body mass (g)
1      126                 14.4
2      175                 15.2
3      106                 10.6
4      96                  5.4
5      147                 22.7
6      138                 14.9
7      78                  11.4
8      120                 14.81
9      98                  5.19
10     132                 15.39
11     140                 17.25
12     123                 11.52
13     108                 11.5
14     124                 14.8
15     156                 18.3
a) What parametric statistical test should the researchers use to analyze these data and why is this an appropriate test?
We will conduct a simple linear correlation analysis because the ecologists want to measure the strength of the association between liver length and body mass in yellow perch, not to test whether one variable is functionally dependent on the other.
Alexander wrote out this statement based on his and Grace's discussion.

b) What is the null hypothesis for this test?
$H_0: \rho = 0$, or there is no correlation between body mass and liver length in yellow perch.
$H_A: \rho \neq 0$, or there is a correlation between body mass and liver length in yellow perch.
Alexander added the first section with rho and Grace added the written statement.

c) What are the assumptions for this test?
- Sampling is random and independent
- Bivariate normality
Grace added the assumptions and Alexander double-checked them.

d) You may assume that these data have met the assumptions you listed in c). Run the statistical test and paste the R commands/output here.
> cor.test(lab6b$Liver, lab6b$mass)

Pearson's product-moment correlation

data: lab6b$Liver and lab6b$mass
t = 3.8503, df = 13, p-value = 0.002006
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.3476544 0.9041236
sample estimates:
      cor
0.7299246

Alexander ran the data through R Studio and Grace checked the code to make sure it looked correct.

e) From your R output, obtain the coefficient value and use this to calculate the test statistic. Include the formula for the test statistic and show any mathematical work necessary to calculate test statistic, including any terms not reported in R. Compare this value to the critical value for the test statistic.
$r = 0.7299246$
$r^2 = 0.5328$
$s_r = \sqrt{\frac{1 - r^2}{n - 2}} = \sqrt{\frac{1 - 0.5328}{15 - 2}} = \sqrt{\frac{0.4672}{13}} = \sqrt{0.03594} = 0.1896$
$t = \frac{r}{s_r} = \frac{0.7299}{0.1896} = 3.850$
Critical value: $t_{0.05(2),13} = 2.160$
Since the test statistic, 3.850, is greater than the critical value, 2.160, we reject the null hypothesis.
Alexander did the calculation on the calculator and Grace wrote the statement.

f) Report the result of testing the null hypothesis. Provide a clear, concluding statement about the data, which is supported by the appropriate statistical output. Also include the 95% confidence interval for the coefficient.
Since p = 0.002 < 0.05, we reject the null hypothesis and conclude that there is a significant positive correlation between body mass and liver length in 15 yellow perch ($r = 0.7299$, $t = 3.850$, df = 13, $p = 0.002$; $t_{0.05(2),13} = 2.160$). The 95% confidence interval for the correlation coefficient is 0.3477 to 0.9041.
Grace wrote the statement out based on Alexander and Grace's discussion.

g) Use Excel to create a publication quality figure to illustrate your answer and include an appropriate figure legend/caption.
This scatter plot illustrates the correlation between body mass and liver length in yellow perch. Each point represents an individual fish, with body mass (g) on the x-axis and liver length (mm) on the y-axis. A significant positive linear correlation was found between the two variables (Pearson correlation: $r = 0.7299$, $t = 3.850$, df = 13, $p = 0.002$; $t_{0.05(2),13} = 2.160$).
Alexander created the graph and Grace wrote the figure legend.
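The hand calculation in part e) can be reproduced from the raw perch table as a check. This is a minimal Python sketch (not the students' method, which used a calculator and R): it computes Pearson's r from sums of squares, then the standard error of r and the t statistic with n − 2 df. The critical value 2.160 is taken from part e) rather than computed; the function name pearson_r is hypothetical.

```python
import math

# Yellow perch data from the table in question 8
liver_mm = [126, 175, 106, 96, 147, 138, 78, 120, 98, 132, 140, 123, 108, 124, 156]
mass_g = [14.4, 15.2, 10.6, 5.4, 22.7, 14.9, 11.4, 14.81, 5.19, 15.39,
          17.25, 11.52, 11.5, 14.8, 18.3]


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


n = len(liver_mm)
r = pearson_r(liver_mm, mass_g)
s_r = math.sqrt((1 - r ** 2) / (n - 2))  # standard error of r
t = r / s_r                              # test statistic with n - 2 = 13 df
T_CRIT = 2.160                           # t_0.05(2),13, as used in part e)
print(f"r = {r:.4f}, s_r = {s_r:.4f}, t = {t:.4f}, reject H0: {abs(t) > T_CRIT}")
```

Working from unrounded r gives the same t that R reports (3.8503); rounding r and s_r to four decimals before dividing shifts t only in the third or fourth decimal place.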