Assignment 4 NO solutions

docx

School

University of Guelph *

Course

2230

Subject

Statistics

Date

Feb 20, 2024

Uploaded by BarristerKangaroo2917

University of Guelph
STAT*2230 - Biostatistics for Integrative Biology
Assignment (4)

Please hand in a typed (not handwritten!) version of the assignment. Show/Provide (Copy/Paste in the Word document) both your R code and the result/output. No screenshots.

Question 1 [10 marks]

As part of a large study of body composition, researchers captured 20 male Monarch butterflies at Oceano Dunes State Park in California and measured wing area (in cm²). The data are given in the table below:

Sample (1) - Wing area (cm²)
33.9 33.2 30.0 36.6 35.5 34.0 34.2 32.0 28.0 32.0
32.2 32.2 32.3 30.0 33.1 30.2 35.5 36.5 34.0 36.1

a) [2 marks] Create a density (probability, not frequency) histogram for the wing area (in cm²) of 20 male Monarch butterflies. Use a suitable label for the x-axis [0.5 mark]. Use color red for the bins [0.5 mark]. Use a suitable title for your histogram [0.5 mark].

Note: Show/Provide (Copy/Paste in the Word document) both the R code [0.5 mark] and the graph. Do not take a screenshot.

> wing_area <- c(33.9, 33.2, 30.0, 36.6, 35.5, 34.0, 34.2, 32.0, 28.0, 32.0,
+                32.2, 32.2, 32.3, 30.0, 33.1, 30.2, 35.5, 36.5, 34.0, 36.1)
> hist(wing_area, freq = FALSE, col = 'red',
+      main = 'Density Histogram of Wing Area for Male Monarch Butterflies',
+      xlab = 'Wing Area (cm^2)')
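Part (a) asks for a density histogram rather than a frequency histogram. The sketch below is one way to check the difference numerically: with `freq = FALSE` the bar heights are densities, so the bar areas (height times bin width) should add up to 1, while the counts add up to the sample size. It assumes the default bins chosen by `hist()`; the variable names `h`, `bin_widths`, `total_area`, and `total_count` are illustrative only.

```r
# Wing areas (cm^2) copied from Sample (1)
wing_area <- c(33.9, 33.2, 30.0, 36.6, 35.5, 34.0, 34.2, 32.0, 28.0, 32.0,
               32.2, 32.2, 32.3, 30.0, 33.1, 30.2, 35.5, 36.5, 34.0, 36.1)

# plot = FALSE returns the histogram object without drawing anything
h <- hist(wing_area, plot = FALSE)

# Width of each bin, taken from consecutive break points
bin_widths <- diff(h$breaks)

# Total area under the density histogram (should equal 1)
total_area <- sum(h$density * bin_widths)

# Total of the frequency counts (should equal the sample size)
total_count <- sum(h$counts)

print(total_area)
print(total_count)
```

The same object `h` holds both `density` and `counts`, so `freq = FALSE` only changes which of the two is drawn on the y-axis.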

b) [2 marks] Create and interpret [1 mark] a 95% confidence interval for the average wing area (in cm²) of 20 male Monarch butterflies.

Note (1): Do the calculations one time using R [0.5 mark] and the other time by hand [0.5 mark] (show your work).

Note (2): When you create the 95% C.I. using R, code manually (write the code for each step, not using numbers). Do not use a function in R to calculate the C.I., and show your code.
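For the by-hand part of (b), one possible layout of the work is sketched below. It assumes the usual t-based interval for a mean with unknown population standard deviation; the intermediate values are rounded, and the critical value is read from a t table with 19 degrees of freedom.

```latex
\begin{align*}
\bar{x} &= \frac{\sum x_i}{n} = \frac{661.5}{20} = 33.075 \\
s &= \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{106.1175}{19}} \approx 2.3633 \\
SE &= \frac{s}{\sqrt{n}} = \frac{2.3633}{\sqrt{20}} \approx 0.5284 \\
t_{0.975,\,19} &\approx 2.093 \\
\bar{x} \pm t_{0.975,\,19} \cdot SE &= 33.075 \pm 2.093 \times 0.5284 \approx 33.075 \pm 1.106 \\
\text{95\% CI} &\approx (31.97,\ 34.18)\ \text{cm}^2
\end{align*}
```

A common way to read such an interval: we are 95% confident that the mean wing area of male Monarch butterflies in the sampled population lies between about 31.97 and 34.18 cm².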

> mean_wing_area <- mean(wing_area)
> sd_wing_area <- sd(wing_area)
> t_score <- qt(0.975, df = length(wing_area) - 1)
> se <- sd_wing_area / sqrt(length(wing_area))
> margin_error <- t_score * se
> ci_lower <- mean_wing_area - margin_error
> ci_upper <- mean_wing_area + margin_error
> list(ci_lower = ci_lower, ci_upper = ci_upper)
$ci_lower
[1] 31.96895

$ci_upper
[1] 34.18105

c) [4 marks] Suppose we picked another SRS of 10 male Monarch butterflies from Oceano Dunes State Park in California. The data are given in the table below:

Sample (2) - Wing area (cm²)
33.7 33.3 30.2 36.5 34.5 35.0 34.2 32.0 28.0 36.0

I) [1 mark] Do you think that the 95% confidence interval for this sample will be wider or narrower compared to the first sample of 20 male Monarch butterflies from Oceano Dunes State Park in California? Why?

When two samples come from the same population and the confidence level is the same, the width of the confidence interval depends on two factors: the sample standard deviation and the sample size. The sample standard deviation (s) affects the margin of error; a larger standard deviation leads to a wider confidence interval. The sample size (n) affects the standard error of the mean; a larger sample size decreases the standard error and thus narrows the confidence interval.

Since the second sample has a smaller sample size (n = 10) than the first sample (n = 20), the standard error of the mean for the second sample will be larger, because the standard error is inversely proportional to the square root of the sample size. The t critical value is also larger with fewer degrees of freedom (about 2.262 for df = 9 versus about 2.093 for df = 19). As a result, even if the sample standard deviations were similar, the 95% confidence interval for the second sample of 10 butterflies is expected to be wider.

II) [1 mark] Create a 95% confidence interval (using R) for the average wing area (in cm²).

> wing_area_sample2 <- c(33.7, 33.3, 30.2, 36.5, 34.5, 35.0, 34.2, 32.0, 28.0, 36.0)
> mean_wing_area_sample2 <- mean(wing_area_sample2)
> sd_wing_area_sample2 <- sd(wing_area_sample2)
> t_score_sample2 <- qt(0.975, df = length(wing_area_sample2) - 1)
> se_sample2 <- sd_wing_area_sample2 / sqrt(length(wing_area_sample2))
> margin_error_sample2 <- t_score_sample2 * se_sample2
> ci_lower_sample2 <- mean_wing_area_sample2 - margin_error_sample2
> ci_upper_sample2 <- mean_wing_area_sample2 + margin_error_sample2
> list(ci_lower = ci_lower_sample2, ci_upper = ci_upper_sample2)
$ci_lower
[1] 31.45934

$ci_upper
[1] 35.22066

III) [2 marks] Create a density (probability) histogram for the wing area (in cm²) of the 10 male Monarch butterflies. Use a suitable label for the x-axis [0.5 mark]. Use color blue for the bins [0.5 mark]. Use a suitable title for your histogram [0.5 mark] and [0.5 mark] for R code.

> hist(wing_area_sample2, freq = FALSE, col = 'blue',
+      main = 'Density Histogram of Wing Area for 10 Male Monarch Butterflies',
+      xlab = 'Wing Area (cm^2)')
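The width argument in part (c)-I can be checked against the two intervals directly. The sketch below wraps the same manual steps in a small helper, `t_ci()`, which is a hypothetical name introduced here for illustration and not part of the assignment; it reports the width alongside the bounds for each sample.

```r
# Hypothetical helper: t-based confidence interval for a mean, built step by step
t_ci <- function(x, conf_level = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                                # standard error of the mean
  t_crit <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # two-sided critical value
  margin <- t_crit * se                                # margin of error
  c(lower = mean(x) - margin, upper = mean(x) + margin, width = 2 * margin)
}

sample1 <- c(33.9, 33.2, 30.0, 36.6, 35.5, 34.0, 34.2, 32.0, 28.0, 32.0,
             32.2, 32.2, 32.3, 30.0, 33.1, 30.2, 35.5, 36.5, 34.0, 36.1)
sample2 <- c(33.7, 33.3, 30.2, 36.5, 34.5, 35.0, 34.2, 32.0, 28.0, 36.0)

ci1 <- t_ci(sample1)
ci2 <- t_ci(sample2)

# One row per sample: lower bound, upper bound, interval width
print(round(rbind(sample1 = ci1, sample2 = ci2), 3))

# Cross-check against R's built-in interval (for verification only)
builtin1 <- as.numeric(t.test(sample1)$conf.int)
builtin2 <- as.numeric(t.test(sample2)$conf.int)
```

The `width` column makes the comparison explicit: the interval for the 10-butterfly sample comes out wider than the one for the 20-butterfly sample, in line with the reasoning in (c)-I.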

d) [2 marks] Combine the two plots (histograms) from the previous two parts (a and c-III) into one figure with two panels. Place one graph below the other, using the par() function in R. You can check how the par() function works in R using the help tool by entering: ?par

Note: Show your code [1 mark] and the graph (Copy/Paste) [1 mark]. Do not take a screenshot of the graph.

> wing_area_sample1 <- c(33.9, 33.2, 30.0, 36.6, 35.5, 34.0, 34.2, 32.0, 28.0, 32.0,
+                        32.2, 32.2, 32.3, 30.0, 33.1, 30.2, 35.5, 36.5, 34.0, 36.1)
> wing_area_sample2 <- c(33.7, 33.3, 30.2, 36.5, 34.5, 35.0, 34.2, 32.0, 28.0, 36.0)
> par(mfrow = c(2, 1))
> hist(wing_area_sample1, freq = FALSE, col = 'red',
+      main = 'Density Histogram of Wing Area for 20 Male Monarch Butterflies',
+      xlab = 'Wing Area (cm^2)')
> hist(wing_area_sample2, freq = FALSE, col = 'blue',
+      main = 'Density Histogram of Wing Area for 10 Male Monarch Butterflies',
+      xlab = 'Wing Area (cm^2)')
> par(mfrow = c(1, 1))
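A possible variation on the two-panel figure in (d), offered as a sketch rather than as the required answer: giving both histograms the same break points puts the panels on a common x-axis, and saving the value returned by `par()` restores the previous layout afterwards. The break points `seq(28, 37, by = 1)` are an assumption chosen to cover both samples, and the `pdf(NULL)` call is only there so the code runs without opening a plot window; drop it (and `dev.off()`) in an interactive session.

```r
sample1 <- c(33.9, 33.2, 30.0, 36.6, 35.5, 34.0, 34.2, 32.0, 28.0, 32.0,
             32.2, 32.2, 32.3, 30.0, 33.1, 30.2, 35.5, 36.5, 34.0, 36.1)
sample2 <- c(33.7, 33.3, 30.2, 36.5, 34.5, 35.0, 34.2, 32.0, 28.0, 36.0)

# Throwaway graphics device so the sketch runs non-interactively
pdf(NULL)

# Assumed common bins (1 cm^2 wide) covering the range of both samples
shared_breaks <- seq(28, 37, by = 1)

# par() returns the previous settings, which are restored further down
old_par <- par(mfrow = c(2, 1))

h1 <- hist(sample1, breaks = shared_breaks, freq = FALSE, col = "red",
           main = "Density Histogram of Wing Area for 20 Male Monarch Butterflies",
           xlab = "Wing Area (cm^2)")
h2 <- hist(sample2, breaks = shared_breaks, freq = FALSE, col = "blue",
           main = "Density Histogram of Wing Area for 10 Male Monarch Butterflies",
           xlab = "Wing Area (cm^2)")

# Restore the layout that was active before the two-panel figure
par(old_par)
restored_layout <- par("mfrow")

dev.off()
```

With identical breaks, a bar in the top panel and the bar directly beneath it cover the same wing-area range, which makes the two samples easier to compare by eye.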

Question (2) [9 marks]

For each of 584 longleaf pine trees in the Wade Tract in Thomas County, Georgia, researchers measured the diameter at breast height (DBH). This is the diameter of the tree at a height of 4.5 feet, and the units are centimeters (cm). Only trees with DBH greater than 1.5 cm were sampled. Here are the diameters of a random sample of 40 of these trees:

10.5, 13.3, 26.00, 18.3, 52.2, 9.2, 26.1, 17.6, 40.5, 31.8,
47.2, 11.4, 2.7, 69.3, 44.4, 16.9, 35.7, 5.4, 44.2, 2.2,
4.3, 7.8, 38.1, 2.2, 11.4, 51.5, 4.9, 39.7, 32.6, 51.8,
43.6, 2.3, 44.6, 31.5, 40.3, 22.3, 43.3, 37.5, 29.1, 27.9

a) [2.5 marks] Combine three plots (a frequency histogram, a boxplot, and a Q-Q plot) into one figure to examine the distribution of DBHs. Place all of them in one row using the par() function in R. You can check how the par() function works in R using the help tool by entering: ?par

Note (1): Show your code [0.25 mark] and the figure (Copy/Paste). Do not take a screenshot of the figure.

Note (2): Use a suitable label for the x-axis [0.75 mark] for each graph. Use color red for the bins of the histogram, light blue for the boxplot, and color red for the Q-Q plot [0.75 mark]. Use a suitable title for your histogram [0.75 mark].

> dbh_data <- c(10.5, 13.3, 26.0, 18.3, 52.2, 9.2, 26.1, 17.6, 40.5, 31.8,
+               47.2, 11.4, 2.7, 69.3, 44.4, 16.9, 35.7, 5.4, 44.2, 2.2,
+               4.3, 7.8, 38.1, 2.2, 11.4, 51.5, 4.9, 39.7, 32.6, 51.8,
+               43.6, 2.3, 44.6, 31.5, 40.3, 22.3, 43.3, 37.5, 29.1, 27.9)
> par(mfrow = c(1, 3))
> hist(dbh_data, col = 'red', main = 'Frequency Histogram of DBH',
+      xlab = 'Diameter at Breast Height (cm)')
> boxplot(dbh_data, col = 'lightblue', main = 'Boxplot of DBH',
+         xlab = 'Diameter at Breast Height (cm)')
> qqnorm(dbh_data, main = 'Q-Q Plot of DBH', xlab = 'Theoretical Quantiles',
+        ylab = 'Sample Quantiles', col = 'red')
> qqline(dbh_data, col = 'red')
> par(mfrow = c(1, 1))
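Parts (c) through (e) of this question, which follow, reuse the t-based recipe from Question 1 with a 90% level. The sketch below shows one way the pieces could fit together, assuming a two-sided test of H0: μ = 23 against Ha: μ ≠ 23 at α = 0.10; the variable names are illustrative, and the interpretation is left to the reader.

```r
# DBH values (cm) for the random sample of 40 trees
dbh <- c(10.5, 13.3, 26.0, 18.3, 52.2, 9.2, 26.1, 17.6, 40.5, 31.8,
         47.2, 11.4, 2.7, 69.3, 44.4, 16.9, 35.7, 5.4, 44.2, 2.2,
         4.3, 7.8, 38.1, 2.2, 11.4, 51.5, 4.9, 39.7, 32.6, 51.8,
         43.6, 2.3, 44.6, 31.5, 40.3, 22.3, 43.3, 37.5, 29.1, 27.9)

n <- length(dbh)
x_bar <- mean(dbh)
se <- sd(dbh) / sqrt(n)          # standard error of the mean
alpha <- 0.10
mu_0 <- 23                       # hypothesized mean DBH (cm)

# Part (c): 90% confidence interval built from mean, sd, and qt()
t_crit <- qt(1 - alpha / 2, df = n - 1)
ci_90 <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)

# Part (d): two-sided one-sample t test of H0: mu = 23
t_stat <- (x_bar - mu_0) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
reject_by_p <- p_value < alpha

# Part (e): the same decision read off the interval (is 23 outside it?)
reject_by_ci <- unname(mu_0 < ci_90["lower"] || mu_0 > ci_90["upper"])

print(ci_90)
print(c(t_stat = t_stat, p_value = p_value))
print(c(reject_by_p = reject_by_p, reject_by_ci = reject_by_ci))
```

Because the 90% interval and the two-sided test at α = 0.10 use the same critical value, `reject_by_p` and `reject_by_ci` always agree, which is the idea behind part (e).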

b) [0.75 marks] Write a careful description of the distribution for the graphs in (a), commenting on the distribution of the data.

c) [1.75 marks] Assuming that the Normality assumption holds (even if it does not), use the mean, standard deviation, and qt() from R to calculate a 90% confidence interval for the mean DBH of all trees in the Wade Tract. Interpret your finding.

d) [2 marks] Is the average DBH of all trees in the Wade Tract different from 23 (cm)? Use α = 0.10 and interpret your findings.

Notes: State the null and alternative hypotheses that you are testing [0.5 mark] and your conclusion (include both statistical and biological interpretation) [1.5 marks].

e) [2 marks] Use the confidence interval found in (c) to test whether the true average DBH of all trees in the Wade Tract is different from 23 (cm) [0.5 mark]. Justify your answer [0.5 mark]. State your null and alternative hypotheses [1 mark].

Note: DO NOT perform a full hypothesis test.

Question 3 [6 marks]

The Classic Bottling Company has just installed a new bottling process that will fill 344 ml cans of its Classic Cola soft drink. Under-filling leads to customer complaints, and over-filling costs the company considerable money. The bottling company wants to set up a hypothesis to test if on average the machine underfills the bottles. To test this hypothesis the company drew a random sample of 36 filled cans and found that the sample average was 344.06. Assume that the amount filled by the machine is normally distributed with a population standard deviation of 0.1. Does the data suggest that on average the machine underfills the cans? Test using a 20% level of significance.

a) [1 mark] State the two hypotheses of interest.

The two hypotheses of interest are the null hypothesis and the alternative hypothesis.
Null: the machine does not underfill the cans on average (the mean fill is μ = 344 ml).
Alternative: the machine underfills the cans on average (the mean fill is μ < 344 ml).

b) [2 marks] Calculate an appropriate test statistic for (a).

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

$$Z = \frac{344.06 - 344}{0.1 / \sqrt{36}}$$

c) [3 marks] Write your conclusion using the p-value [1 mark] method (include both statistical [1 mark] and biological interpretation [1 mark]).
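The test statistic in (b) and the p-value needed for (c) can be evaluated in a few lines. The sketch below assumes the lower-tailed alternative stated in (a), so the p-value is the area to the left of the observed z under the standard Normal curve; the variable names are illustrative.

```r
x_bar <- 344.06   # sample mean fill (ml)
mu_0  <- 344      # hypothesized mean fill under H0 (ml)
sigma <- 0.1      # known population standard deviation (ml)
n     <- 36       # number of cans sampled
alpha <- 0.20     # significance level

# One-sample z statistic with known sigma
z <- (x_bar - mu_0) / (sigma / sqrt(n))

# Lower-tailed p-value for Ha: mu < 344, i.e. P(Z <= z)
p_value <- pnorm(z)

# Decision rule: reject H0 only when the p-value falls below alpha
reject_h0 <- p_value < alpha

print(c(z = z, p_value = p_value))
print(reject_h0)
```

Here z works out to 3.6, a positive value, because the sample mean lies above 344 ml. The lower-tail p-value is therefore close to 1, far above 0.20, so this sample gives no evidence that the machine underfills the cans on average.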
