Lab 8 Pre-Lab and Exercises

11/26/2019 Statistics 213 Lab Exercises – Confidence Intervals and a Bootstrapped Sample
https://scott-robison.rstudio.cloud/a8c401512bf9446cb9e30f29e3a95885/file_show?path=%2Fcloud%2Fproject%2FLab8.nb.html

© Jim Stallard, Scott Robison, and Claudia Mahler 2019 all rights reserved.

Pre-Lab Exercise 1:

To determine the amount of caffeine (in milligrams) in a medium ‘light-roast’ cup of coffee from Good Dirt Café, a random sample of 12 medium cups of light-roast blend was inspected over the course of a week. The amount of caffeine in each cup was observed. The resulting data are provided.

    x=c(112.8, 86.4, 45.9, 110.3, 100.3, 93.3, 101.9, 115.7, 92.5, 117.3, 105.6, 81.6)
    x

The mean and standard deviation of this sample were computed:

    mean(x)
    sd(x)
    n=length(x)
    n

a. Compute a 95% confidence interval for μ, the mean amount of caffeine in a medium-sized cup of light-roast blend from Good Dirt Café, if you assume the data are normally distributed.
    mean(x)-qt(0.975,n-1)*sd(x)/n^.5
    mean(x)+qt(0.975,n-1)*sd(x)/n^.5

Or by using tables:

    96.96667-2.201*19.75262/12^.5
    96.96667+2.201*19.75262/12^.5

Notice that the table gives less precision due to rounding!

Or by bootstrapping the sample:

    library(mosaic)
    RNGkind(sample.kind="Rejection")
    set.seed(1) # so that you will get the same random sample as we get below
    B=do(1000) * mean(resample(x, n))
    quantile(B$mean,0.025)
    quantile(B$mean,0.975)
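As a check on the table-based calculation above, the arithmetic can be written out step by step. The sample mean, standard deviation, and t-multiplier are the values quoted in the lab; the rounded bounds are worked out here and are not taken from the original handout.

```latex
\[
\bar{x} \pm t_{0.975,\,11}\,\frac{s}{\sqrt{n}}
  = 96.96667 \pm 2.201 \times \frac{19.75262}{\sqrt{12}}
  \approx 96.97 \pm 12.55
  \;\Longrightarrow\; (84.42,\ 109.52)
\]
```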
We now have produced three different confidence intervals for μ:

1. Assuming the data are normally distributed (with σ unknown) and using the computer, which gives more digits for the t distribution than tables would. 95% confidence interval: the bounds returned by the qt() code above.
2. Assuming the data are normally distributed (with σ unknown) and using the t tables. 95% confidence interval: the bounds from the table calculation above.
3. Making no assumptions and letting the computer resample repeatedly, with replacement, from the sample we have (“bootstrapping”). 95% confidence interval: the 2.5th and 97.5th percentiles of B$mean.

So which interval is best? Typically, the fewer assumptions we make the better, so unless we know a sample is normally distributed, the bootstrapped confidence interval is likely the best. However, in order to bootstrap, you must have a computer and the raw data from the sample. Therefore, when possible, bootstrapping is a great idea, but the other methods are good too. In this scenario it was possible to bootstrap, and we did not know whether the data were normal, so let’s use the bootstrapped confidence interval as the best option.

b. Now look at the lower and upper bounds of the confidence interval found in (a) and consider the following statement: “The probability that μ falls between the lower and upper bounds is about 0.95.” Is this statement true or false? Why do you think this is true or false?

False! Once the interval has been computed, μ is either in the interval or it is not. The confidence level is not a probability about this particular interval but a measure of how sure we are of the method: about 95% of intervals built this way from repeated samples would contain μ. We are 95% confident that μ falls between the lower and upper bounds found in (a).

c. What do you think would happen to the 95% confidence interval you found in (a) if the sample size were larger than 12 medium cups of light-roast blend?
Or, if the sample size were the same and the level of confidence was smaller, like 90%?
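One way to explore part (c) numerically is to compare the half-width of the t-based interval under different settings. This is only a sketch: `t_margin` is a hypothetical helper, not part of the lab, and it assumes the sample standard deviation stays at the value reported in part (a) even when the sample size changes.

```r
# Half-width (margin of error) of a t-based confidence interval.
# s: sample standard deviation, n: sample size, level: confidence level.
# t_margin is a hypothetical helper introduced for illustration only.
t_margin <- function(s, n, level = 0.95) {
  alpha <- 1 - level
  qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)
}

s <- 19.75262  # sample standard deviation reported in part (a)

t_margin(s, n = 12, level = 0.95)  # the setting used in part (a)
t_margin(s, n = 48, level = 0.95)  # hypothetical larger sample, same s assumed
t_margin(s, n = 12, level = 0.90)  # same sample size, lower confidence level
```

Comparing the three printed values shows how the width responds to a larger sample and to a lower confidence level.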
When the sample size (n) goes up, the interval width goes down, since the precision of our estimate increases with the number of data points. When the confidence level goes up, the interval width increases: to be more confident, we have to include more possible values. So a 90% interval from the same sample would be narrower than the 95% interval.

Pre-Lab Exercise 2:

Is the probability of getting heads for a particular coin really 0.50? You decide to flip a particular coin 100 times, each time observing whether the upper side of the coin shows ‘heads’ or ‘tails’. If the upper side shows ‘heads’, you record a ‘1’. Otherwise, you record a ‘0’. After the 100 tosses, you observe 61 heads. Find a 95% confidence interval for p, the probability that this coin will show ‘heads’. From your answer, can you say that the probability of this coin showing ‘heads’ is 0.50? Why or why not?

    61/100-qnorm(0.975)*((61/100)*(1-61/100)/100)^.5
    61/100+qnorm(0.975)*((61/100)*(1-61/100)/100)^.5

Or by using tables:
    61/100-1.96*((61/100)*(1-61/100)/100)^.5
    61/100+1.96*((61/100)*(1-61/100)/100)^.5

Notice that the table gives less precision due to rounding!

Or by bootstrapping the sample:
    library(mosaic)
    RNGkind(sample.kind="Rejection")
    set.seed(1) # so that you will get the same random sample as we get below
    B=do(1000) * mean(resample(c(rep(1,61),rep(0,100-61)), 100))
    quantile(B$mean,0.025)
    quantile(B$mean,0.975)

We now have produced three different confidence intervals for p:

1. Assuming the data are binomially distributed and using the computer, which gives more digits for the Z distribution than tables would. 95% confidence interval: the bounds returned by the qnorm() code above.
2. Assuming the data are binomially distributed and using the Z tables. 95% confidence interval: the bounds from the 1.96 calculation above.
3. Making no assumptions and letting the computer resample repeatedly, with replacement, from the sample we have (“bootstrapping”). 95% confidence interval: the 2.5th and 97.5th percentiles of B$mean.

So which interval is best? Typically, the fewer assumptions we make the better, so unless we know the distributional assumption holds, the bootstrapped confidence interval is likely the best. However, to bootstrap you must have a computer. Therefore, when possible, bootstrapping is a great idea, but the other methods are good too.

Since 0.50 does not fall between the bounds of the 95% confidence interval, it does not appear that this coin is “fair” (flips heads or tails 50% of the time).

Lab Exercise 1:

A random sample of 16 flights offered by a certain national air carrier is taken. For each flight chosen, the number of minutes the flight was delayed was observed. The flight delay is defined as the difference between the time the plane was scheduled to pull away from the jet way and the actual time the plane pulled away from the jet way (with positive values indicating that the flight is late). For now, assume the flight-delay variable is normally distributed.

Data: 0, 1, -6, -6, 157, -3, 178, -3, -10, 42, -2, 120, 5, 59, 0, -2

a. Analyze the data by copying and pasting it into RStudio.
    x=c(0, 1, -6, -6, 157, -3, 178, -3, -10, 42, -2, 120, 5, 59, 0, -2)

Compute the value of the sample mean and the sample standard deviation.

b. Using RStudio, find the t-multiplier needed for the t-version of the confidence interval for the population mean μ. Calculate the 95% confidence interval for μ using this t-multiplier.

c. Now consider a 99% bootstrapped confidence interval for μ. Before doing any computation, do you expect this confidence interval to be wider, narrower, or to have the same width as the 95% confidence interval you computed in part b?

d. Copy and paste the following code into RStudio to bootstrap the sample:

    library(mosaic)
    RNGkind(sample.kind="Rejection")
    set.seed(1) # so you will get the "same" random sampling as me
    B=do(1000) * mean(resample(c(0, 1, -6, -6, 157, -3, 178, -3, -10, 42, -2, 120, 5, 59, 0, -2), 16))

Compute a 99% bootstrap confidence interval for μ by copying the following two pieces of code into RStudio.
    quantile(B$mean,0.005)
    quantile(B$mean,0.995)

Lab Exercise 2:

The following data resulted from a cluster sample of 109 students taking Statistics 213. Each student was asked if they support differential tuition fees. If a student did support differential fees, their response was coded with a “1”. A “non support” was coded with a “0”. The data are as follows:

0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1

a. Use RStudio to find the number of Statistics 213 students in this sample who “support” differential tuition fees, which you will record as the observed value of the random variable X, a binomially distributed random variable. Rather than simply counting the number of ‘1’s, follow the RStudio steps:

    x=c(0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1)
    sum(x)
    length(x)

b. Compute the 97% confidence interval for p, the proportion of all Statistics 213 students who support differential tuition fees (when the data follow a binomial distribution).

c. Use RStudio to compute a 97% bootstrapped confidence interval.
    library(mosaic)
    RNGkind(sample.kind="Rejection")
    set.seed(1) # so you will get the "same" random sampling as me
    B=do(1000) * mean(resample(c(rep(1,55),rep(0,109-55)), 109))
    quantile(B$mean,0.015)
    quantile(B$mean,0.985)

d. A figure recently quoted by an executive of the Students’ Union (SU) was that 30% of all U of C students support differential tuition fees. From your findings in parts (b) and (c), is this figure supported?

Use the skills you have learned in this lab to complete the lab quiz.
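The bootstrap recipe used throughout this lab (resample with replacement, record the mean, take percentiles) can also be sketched in base R, for readers without the mosaic package. This is a minimal sketch, not the lab's own method: `boot_ci` is a hypothetical helper name, and because the random draws are generated differently, its bounds may not match the mosaic output exactly.

```r
# Percentile bootstrap confidence interval for a mean, using base R only.
# boot_ci is a hypothetical helper, not part of the lab or of mosaic.
boot_ci <- function(x, level = 0.95, reps = 1000) {
  # Resample x with replacement (same size as x) and record each mean.
  boot_means <- replicate(reps, mean(sample(x, size = length(x), replace = TRUE)))
  # Cut off alpha/2 of the bootstrap means in each tail.
  alpha <- 1 - level
  quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2))
}

set.seed(1)  # for reproducibility of this sketch only

# Caffeine data from Pre-Lab Exercise 1.
caffeine <- c(112.8, 86.4, 45.9, 110.3, 100.3, 93.3,
              101.9, 115.7, 92.5, 117.3, 105.6, 81.6)
ci <- boot_ci(caffeine, level = 0.95)
ci
```

The same helper works for 0/1 data such as the coin flips, since the mean of a 0/1 vector is the sample proportion; changing `level` to 0.99 or 0.97 gives the other confidence levels used in the lab exercises.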