HWK5_Soln

School: University of Wisconsin, Madison
Course: 371
Subject: Statistics
Date: Feb 20, 2024

Stat 371 Homework #5 SOLUTIONS

Submit your homework to Canvas by the due date and time. Email your lecturer if you have extenuating circumstances and need to request an extension.

If an exercise asks you to use R, include a copy of all relevant code and output in your submitted homework file. You can copy/paste your code, take screenshots, or compile your work in an Rmarkdown document.

If a problem does not specify how to compute the answer, you may use any appropriate method. I may ask you to use R or use manual calculations on your exams, so practice accordingly.

You must include an explanation and/or intermediate calculations for an exercise to be complete.

Be sure to submit the HWK5 Autograde Quiz, which will give you ~20 of your 40 accuracy points.

50 points total: 40 points accuracy, and 10 points completion

Sampling Distributions and CLT

Exercise 1. Retail stores experience their heaviest volume of transactions that include returns on December 26th and December 27th each year. The distribution of the Number of Items Returned (X) by Macy’s customers who did a return transaction on those days last year is given in the table below. It has mean \(\mu = 2.61\) and variance \(\sigma^2 \approx 1.80\).

    Number of Items Returned in Transaction (x)    Probability
    1                                              0.25
    2                                              0.28
    3                                              0.20
    4                                              0.17
    5                                              0.08
    6                                              0.02

a. Is this population distribution left skewed, symmetric, or right skewed? How do you know?

This distribution is right skewed: most of the probability is clustered on the smaller values, with a longer tail stretching toward the larger values.

b. What proportion of returns had three or more items?

\(P(X \ge 3) = 0.20 + 0.17 + 0.08 + 0.02 = 0.47\)

c. Identify which histogram below displays (1) the population X values, (2) the simulated sampling distribution of the sample mean \(\bar{X}\), (3) the simulated sampling distribution of the sample total T. Briefly explain how you know.
[Figure: three histograms. Histogram A: y-axis Frequency, x-axis from about 60 to 160. Histogram B: y-axis Density, x-axis from 1 to 6. Histogram C: y-axis Frequency, x-axis from about 1.5 to 3.5.]

Histogram A: sample total; Histogram B: population; Histogram C: sample mean. We can see the right-skewed nature of the data in histogram B, and the others appear more normal. Histogram C is centered at 2.61, the original population mean, and histogram A has a much larger center, as expected for a total.

d. Describe the sampling distribution (shape, mean, and standard deviation) of the sample mean number of items returned in 45 return transactions, \(\bar{X} = \frac{X_1 + X_2 + \dots + X_{45}}{45}\), according to theory. Make sure to name any theorems you are using.

With a fairly large sample (n = 45), the CLT applies, so \(\bar{X}\) is approximately normal with mean \(\mu_{\bar{X}} = 2.61\) and variance \(\sigma^2_{\bar{X}} = \frac{1.80}{45} = 0.04\), so its standard deviation is \(\sqrt{0.04} = 0.2\). That is, \(\bar{X} \approx N(2.61, 0.2^2)\).

e. What is the probability that the mean number of items returned in the 45 return transactions reviewed will be 3 or more items?

\(P(\bar{X} \ge 3) = P\left(Z \ge \frac{3 - 2.61}{\sqrt{1.80/45}}\right) = P\left(Z \ge \frac{3 - 2.61}{0.2}\right) = P(Z \ge 1.95) = 1 - 0.9744119 = 0.02558806\).

    # Two different ways to find the probability with pnorm
    1 - pnorm(3, 2.61, sqrt(1.80 / 45))
    ## [1] 0.02558806
    pnorm(3, 2.61, sqrt(1.80 / 45), lower.tail = FALSE)
    ## [1] 0.02558806

f. Explain why the value you found in (e) was so much smaller than the value found in (b).

Sample means have less variability around the mean value of 2.61 than individual observations do. That is, extreme values (\(\ge 3\)) are more likely among individual population values than among sample means.

g. Consider the total number of items returned in 45 customer return transactions. Describe the sampling distribution (shape, center, and spread) of the total number of items returned, \(T = X_1 + X_2 + \dots + X_{45}\). Make sure to name any theorems you are using.

We can use the CLT to approximate \(T = n\bar{X}\) as being Normally distributed.
\(E(T) = E(X_1 + X_2 + \dots + X_{45}) = 45\,E(X_1) = 45(2.61) = 117.45\)

\(V(T) = V(X_1 + X_2 + \dots + X_{45}) = V(X_1) + V(X_2) + \dots + V(X_{45}) = 45\,V(X_1) = 45(1.80) = 81\)

The variance of the sum splits into a sum of variances because the transactions are treated as independent. So, \(T \approx N(117.45, 9^2)\).

h. Find an upper bound b such that the total number of items returned in 45 customers’ return transactions will be less than b with probability 0.95.

\(P(T < b) = P\left(Z < \frac{b - 117.45}{9}\right) = 0.95\). So we need the quantile z such that \(P(Z \le z) = 0.95\).

\(z = 1.645 = \frac{b - 117.45}{9}\), so \(b = 1.645(9) + 117.45 \approx 132.25\). We can round up to 133 as the bound.

    # Two different ways to find the 95th percentile with qnorm
    qnorm(0.95)
    ## [1] 1.644854
    qnorm(0.95) * 9 + 117.45
    ## [1] 132.2537
    qnorm(0.95, 117.45, 9)
    ## [1] 132.2537

Interval estimation for a population mean

Exercise 2. Consider the tree data set in R, trees. trees is a data frame object, which contains multiple vectors. We can access a specific vector by using the $ operator. For example:

    # The data frame contains 3 columns (vectors)
    trees
    # This is how to access the "Girth" vector specifically
    trees$Girth
    # We can use this vector in our usual R functions
    mean(trees$Girth)

a. Construct histograms and qqnorm plots for all three of the quantitative variables recorded on the 31 trees. For which of the three variables do we have the strongest evidence that the population of values may not be well approximated by a normal random variable?

From the histogram and qqnorm plots, we have evidence that the population of Volume values of cherry trees felled for lumber is probably not normal, as our sample of data is right skewed.

    par(mfrow = c(1, 2))
    hist(trees$Girth, main = "Girth", xlab = "Girth"); qqnorm(trees$Girth); qqline(trees$Girth)
[Figure: histogram of Girth (x-axis Girth, y-axis Frequency) beside a Normal Q-Q plot of Girth (Theoretical Quantiles vs. Sample Quantiles).]

    hist(trees$Height, main = "Height", xlab = "Height"); qqnorm(trees$Height); qqline(trees$Height)

[Figure: histogram of Height (x-axis Height, y-axis Frequency) beside a Normal Q-Q plot of Height (Theoretical Quantiles vs. Sample Quantiles).]

    hist(trees$Volume, main = "Volume", xlab = "Volume"); qqnorm(trees$Volume); qqline(trees$Volume)

[Figure: histogram of Volume (x-axis Volume, y-axis Frequency) beside a Normal Q-Q plot of Volume (Theoretical Quantiles vs. Sample Quantiles).]

    par(mfrow = c(1, 1))

b. Since n = 31 for each of these variables, we believe the CLT will make \(\bar{X}\) approximately normal even for the possibly non-normal populations referenced above. Construct 90% t confidence intervals “by hand” for all three variables using the sample data found in the trees data set. Summaries of the variables are given below, and you should use an R function to find the relevant t critical value.

    mean(trees$Girth); sd(trees$Girth); length(trees$Girth)
    ## [1] 13.24839
    ## [1] 3.138139
    ## [1] 31
    mean(trees$Height); sd(trees$Height); length(trees$Height)
    ## [1] 76
    ## [1] 6.371813
    ## [1] 31
    mean(trees$Volume); sd(trees$Volume); length(trees$Volume)
    ## [1] 30.17097
    ## [1] 16.43785
    ## [1] 31

We can use the sample averages, sample standard deviations, and sample sizes to compute t CIs for the population means. The multiplier will be 1.697 on each CI, since they are all at the 90% level and n = 31 (so df = 30) for each variable.

Girth: \(13.25 \pm (1.697)\frac{3.138}{\sqrt{31}} = (12.294, 14.206)\)

Height: \(76 \pm (1.697)\frac{6.372}{\sqrt{31}} = (74.058, 77.942)\)

Volume: \(30.17 \pm (1.697)\frac{16.44}{\sqrt{31}} = (25.16, 35.18)\)

    # t critical value for 90% confidence
    qt(0.95, df = 30)
    ## [1] 1.697261

c. Construct the same confidence intervals that you constructed in (b) above using the t.test() command in R. Confirm that you get very similar endpoints.

    t.test(trees$Girth, conf.level = 0.9)
    ##
    ##  One Sample t-test
    ##
    ## data:  trees$Girth
    ## t = 23.506, df = 30, p-value < 2.2e-16
    ## alternative hypothesis: true mean is not equal to 0
    ## 90 percent confidence interval:
    ##  12.29177 14.20501
    ## sample estimates:
    ## mean of x
    ##  13.24839
    t.test(trees$Height, conf.level = 0.9)
    ##
    ##  One Sample t-test
    ##
    ## data:  trees$Height
    ## t = 66.41, df = 30, p-value < 2.2e-16
    ## alternative hypothesis: true mean is not equal to 0
    ## 90 percent confidence interval:
    ##  74.05764 77.94236
    ## sample estimates:
    ## mean of x
    ##        76
    t.test(trees$Volume, conf.level = 0.9)
    ##
    ##  One Sample t-test
    ##
    ## data:  trees$Volume
    ## t = 10.219, df = 30, p-value = 2.753e-11
    ## alternative hypothesis: true mean is not equal to 0
    ## 90 percent confidence interval:
    ##  25.16010 35.18183
    ## sample estimates:
    ## mean of x
    ##  30.17097

d. Suppose this data came from 31 trees cut down by a single logger. How does that affect the conclusions we can draw? Suppose this data came from 31 trees selected at the saw mill from a variety of logging companies. How does that affect the conclusions we can draw?

Knowing the trees were all taken by a single logger means I would only feel comfortable making an inference about the average values of trees that that logger selects. Knowing the trees were all taken from a single mill, I would only feel comfortable making an inference about the average values of trees that that mill collects or that its loggers supply. We need to consider what population we have a random sample from and only make inferences about that population’s parameter.

e. Suppose the 31 trees in the trees data set are a random sample from those at a saw mill. The mill would like to use this sample to estimate the proportion of trees that they have at their mill with Volume over 65 cubic ft. Use the code below to determine what count of trees in this sample have Volume over 65 cubic ft. Then, explain why they should not do a large-sample z confidence interval for the proportion of trees at their mill with Volume over 65 cubic feet with this sample of 31 trees.

    sum(trees$Volume > 65)
    ## [1] 1

In order for the normal approximation for \(\hat{p}\) to be good, we would want \(n\hat{p} \ge 5\) and \(n(1 - \hat{p}) \ge 5\). This is not true for this sample: only one tree is above 65 cubic feet, so \(n\hat{p} = 1 < 5\). They should consider taking a larger sample from their stockyard.
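The "by hand" intervals from Exercise 2(b) and the large-sample check from 2(e) can also be reproduced in a few lines of R. The following is a minimal sketch and is not part of the original solutions. The helper name `t_ci` is hypothetical and introduced only for illustration; everything else uses base R and the built-in `trees` data.

```r
# Hypothetical helper (not from the original solutions): a one-sample
# t interval computed "by hand" as xbar +/- t* times s / sqrt(n).
t_ci <- function(x, conf.level = 0.90) {
  n <- length(x)
  t_star <- qt(1 - (1 - conf.level) / 2, df = n - 1)  # upper critical value
  margin <- t_star * sd(x) / sqrt(n)
  c(lower = mean(x) - margin, upper = mean(x) + margin)
}

# 90% intervals for all three columns of the built-in trees data
by_hand <- sapply(trees, t_ci)
print(round(by_hand, 3))

# Compare with t.test() for one variable; the endpoints should agree
builtin <- t.test(trees$Girth, conf.level = 0.90)$conf.int
print(builtin)

# Large-sample condition from part (e):
# want n * p_hat >= 5 and n * (1 - p_hat) >= 5
n <- nrow(trees)
successes <- sum(trees$Volume > 65)  # equals n * p_hat
counts <- c(successes = successes, failures = n - successes)
print(counts)
print(all(counts >= 5))  # TRUE only if both counts reach 5
```

Because the helper uses the unrounded mean, standard deviation, and critical value, its endpoints should match the t.test() output more closely than the rounded hand calculations do. The final check makes the reasoning in (e) explicit by comparing the success and failure counts against the threshold of 5.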