asg2 (PDF, 5 pages)
School: University of Alberta
Course: 151
Subject: Statistics
Date: Apr 3, 2024
Uploaded by BarristerDanger12880

LAB 2 ASSIGNMENT
Due: Friday, October 20 at 9:59 pm

IMPORTANT:
1) In this lab, you will need to use R (or R Commander) to generate the outputs.
2) For all graphs and charts, label the axes and use proper titles.
3) For all tables, make sure the correct variable name(s) are used.
4) Each group is expected to create a Google document for the lab report, in which students type their answers (in full sentences) and paste the R Commander output (where necessary) for each lab question.
5) Completed assignments will be saved as a PDF file, submitted, graded, and returned on eClass.
6) Each lab group MUST upload and submit only ONE lab report, so group members MUST work together to complete the lab assignment.
7) See the Lab Submission Info tab (under the Lab Information link in the Labs section on eClass) for details on how to submit your lab report on eClass.

SAMPLING DISTRIBUTIONS, CENTRAL LIMIT THEOREM

In this lab assignment, you will explore important properties of the sampling distribution of a sample mean in the context of a filling process. In particular, you will use sampling procedures in R Commander to demonstrate the Central Limit Theorem. You will see that the distribution of the sample mean for samples drawn from a highly skewed distribution becomes approximately normal as the sample size increases. You will also investigate how the sample size affects the spread of the sampling distribution of the sample mean.

How Much Cola in the Bottle?

These days, soft drinks such as cola are dispensed by filling machines. These machines are set to deliver a certain amount of drink, which we will call the target amount, and the contents of individual bottles vary around this mean value. The amount of variation depends on the efficiency of the machine itself as well as on certain properties of the drink, such as its density.
The bottler may be able to reduce this variation, but no amount of expertise or effort can remove it completely. A company uses a filling machine to fill plastic bottles with a popular cola. The bottles are supposed to contain 8 ounces (oz) of the drink. However, when we buy a bottle of cola stamped with a claimed amount of 8 oz, should we expect it to contain exactly 8 oz? Probably not: we would expect an amount close to, but not exactly equal to, 8 oz. If the amount of drink dispensed by the filling machine follows a symmetric distribution and the target mean is set equal to the claimed amount of 8 oz, half of the bottles would be underfilled and half would be overfilled. This may seem perfectly reasonable to the bottler, but consumers may feel differently, particularly if they happen to buy underfilled bottles. To keep customers happy, the bottler may decide to overfill the bottles slightly, so that the target fill of the machine is above the claimed amount. However, even a small increase in the target fill represents a loss of many thousands of dollars to the bottler. For this lab, suppose the bottles are shipped either in packs of 6 bottles or in boxes of 30 bottles. How does the amount of drink vary from bottle to bottle? How does the average amount of drink vary from box to box when the boxes contain the same number of bottles? How does the number of bottles in a box affect the distribution of the means? You will answer all of these questions in this lab.
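The underfilling argument above can be made concrete with a short R sketch. It assumes, as Question 1 does, that the fill amount X is normally distributed, so the proportion of underfilled bottles is P(X < 8), which R's pnorm() returns. The helper name underfill_rate() is hypothetical, and the parameter values are simply the ones used in Question 1.

```r
# Proportion of bottles filled below the claimed amount, assuming the fill
# amount X ~ Normal(mean = mu, sd = sigma). underfill_rate() is a hypothetical helper.
underfill_rate <- function(mu, sigma, claim = 8) {
  pnorm(claim, mean = mu, sd = sigma)
}

# Target equal to the claim: P(X < 8) = 0.5 for every sigma
underfill_rate(mu = 8, sigma = c(0.1, 0.05, 0.04))

# Target slightly above the claim: the underfill rate is below 0.5 and
# shrinks as sigma shrinks (proportions rounded to three decimals)
round(underfill_rate(mu = 8.1, sigma = c(0.1, 0.05, 0.04)), 3)

# Fixed sigma = 0.05, target mean moved from 8 to 8.1 oz
round(underfill_rate(mu = c(8, 8.05, 8.1), sigma = 0.05), 3)
```

Multiplying by 100 converts the proportions to percentages. In R Commander, the same probabilities should come from the Normal probabilities dialog under Distributions.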
Answer the following questions:

1. Suppose the amount of cola dispensed by a filling machine follows a normal distribution with mean μ and standard deviation σ. In the R Commander menu, select Distributions, then choose Normal distribution from the continuous distributions options. This lets you plot the normal density function and calculate normal probabilities once the parameters μ and σ are provided. Use R Commander to answer the following questions. (Hint: Numerical answers for parts (b) and (c) should be rounded to three decimal places.)
(a) Assume that the mean amount dispensed by the machine is set at μ = 8 oz. What happens to the percentage of underfilled bottles (bottles containing less than 8 oz) when σ decreases or increases? In general, how does the magnitude of the standard deviation affect the filling process?
(b) Now assume that the mean amount dispensed by the machine is set at μ = 8.1 oz, and enter σ = 0.1 oz. Calculate the percentage of underfilled bottles (bottles containing less than 8 oz) in this case. What is the percentage of underfilled bottles if σ is 0.05 oz or 0.04 oz? In general, what is the effect of decreasing σ on the percentage of underfilled bottles?
(c) Now set the standard deviation to 0.05 oz and change the mean: enter μ = 8, then 8.05, and finally 8.1 oz. Calculate the percentage of underfilled bottles in each case. Briefly describe how the shape of the corresponding curve changes. How does changing the value of μ affect the filling process? Does the percentage of underfilled bottles increase or decrease? Do not print the density curves.

2. Consider a random sample of 400 bottles obtained from the population of all bottles filled by the machine over a specific short time period. The volume of cola in each bottle is determined. The 400 observations are recorded in the first column, Volume, of the data file Lab2-Q2.txt on eClass.
Given the very large sample size, we may assume that the distribution of the volume of cola in the sampled bottles (data file) is close to the population distribution, and that its mean and standard deviation are close to the population parameters μ and σ.
(a) Obtain a frequency histogram of the 400 observations with bins starting at 8.07, ending at 8.18, and with a width of 0.01. (Hint: By default, R includes the right endpoint of each interval; your histogram should include the left endpoints instead.) Paste the histogram into your report. Format the histogram the same way as the histogram in the Lab 1 Instructions (axis labels, title).
(b) Describe the shape of the histogram obtained in part (a). Does the histogram support the company's claim that the bottles are slightly overfilled?
(c) Obtain a Q-Q plot and a boxplot for the 400 observations. Add a title to each plot and paste both plots into your report. (TIP: When you make the boxplot in R Commander, click “Options” and select Outliers “(Interactively) with mouse” to see which observation each outlier corresponds to.) Are there any outliers? Based on the Q-Q plot, does the distribution of the volume of cola in the bottles appear to be normal? What conclusions can you draw about the shape of the distribution from the Q-Q plot and the boxplot? What do the relative lengths of the two whiskers tell us about the shape of the distribution? Do the plots collectively confirm your findings in part (b) about the shape of the distribution?
(d) Obtain the summary statistics (mean, standard deviation, IQR, min, Q1, median, Q3, max, and n) of the 400 observations and paste them into your report. Briefly describe the relationship between the mean and the median, as well as the relationship among the three quartiles. Are these relationships consistent with the shape of the histogram observed in part (b)?
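Outside R Commander, the Question 2 output can be produced with base R graphics. Below is a minimal sketch, assuming `volume` is a numeric vector holding the 400 Volume observations. The function names are hypothetical. The right = FALSE argument of hist() implements the hint about including left endpoints.

```r
# Sketch for Question 2 in plain R (an alternative to the R Commander menus).
# Assumption: `volume` is a numeric vector holding the 400 Volume observations.

# (a) Frequency histogram with left-closed bins [8.07, 8.08), ..., [8.17, 8.18].
# right = FALSE puts each left endpoint inside its bin (R's default is right-closed).
plot_volume_hist <- function(volume) {
  hist(volume,
       breaks = seq(8.07, 8.18, by = 0.01), right = FALSE,
       main = "Histogram of Cola Volume (400 Bottles)",
       xlab = "Volume (oz)", ylab = "Frequency")
}

# (c) Normal Q-Q plot with a reference line, then a boxplot
plot_volume_shape <- function(volume) {
  qqnorm(volume, main = "Normal Q-Q Plot of Cola Volume")
  qqline(volume)
  boxplot(volume, main = "Boxplot of Cola Volume", ylab = "Volume (oz)")
}

# (d) The summary statistics listed in the question, as a named vector
volume_summary <- function(volume) {
  q <- quantile(volume, c(0.25, 0.5, 0.75))
  c(n = length(volume), mean = mean(volume), sd = sd(volume),
    IQR = IQR(volume), min = min(volume), Q1 = unname(q[1]),
    median = unname(q[2]), Q3 = unname(q[3]), max = max(volume))
}
```

If Lab2-Q2.txt has a header row and whitespace-separated columns (an assumption), `volume <- read.table("Lab2-Q2.txt", header = TRUE)$Volume` should load the data before you call these functions. hist() stops with an error if any observation lies outside 8.07 to 8.18, so check min(volume) and max(volume) first.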
Suppose that 200 packs are randomly selected, each consisting of 6 bottles of cola obtained from the population of all bottles filled over a certain short time period. The amount of cola in each bottle is determined. The measurements are saved in a table with 6 rows (the sample size) and 200 columns (the number of random samples), occupying the columns Sample1 to Sample200 in the lab2-Q3.txt file.

3. Obtain the mean amount of cola for each sample of 6 bottles. Make sure that all 200 columns are included in the panel of the “Numerical Summaries” dialog box.
(a) Obtain a frequency histogram of the 200 means with bins starting at 8.08, ending at 8.15, and with a width of 0.005. (Hint: By default, R includes the right endpoint of each interval; your histogram should include the left endpoints instead.) Paste the histogram into your report. Format the histogram the same way as the histogram in the Lab 1 Instructions (axis labels, title).
(b) Refer to the histogram obtained in part (a). Do the sample means appear to be normally distributed? Compare the distribution of the means with the distribution of the individual observations studied in Question 2 in terms of skewness and spread.
(c) Obtain a Q-Q plot and a boxplot for the 200 means. Add a title to each plot and paste the plots into your report. Are there any outliers? Do the plots collectively confirm your findings in part (b)? Compare the plots with those in Question 2, part (c).
(d) Obtain the sample size, mean, and standard deviation of the 200 means and paste the summaries into your report. Compare these values with the mean and standard deviation of the sampling distribution of the sample mean predicted by the theory of sampling distributions. What does the standard deviation represent here?

Now suppose 200 boxes are randomly selected, each consisting of 30 bottles of cola obtained from the population of all bottles filled over the same short time period.
The amount of cola in each bottle is determined. The measurements are saved in a table with 30 rows (the sample size) and 200 columns (the number of random samples), occupying the columns Sample1 to Sample200 in the lab2-Q4.txt file.

4. Obtain the mean amount of cola for each sample of 30 bottles. Make sure that all 200 columns are included in the panel of the “Numerical Summaries” dialog box.
(a) Obtain a frequency histogram of the 200 means with bins starting at 8.09, ending at 8.13, and with a width of 0.003. Paste the histogram into your report. (Hint: By default, R includes the right endpoint of each interval; your histogram should include the left endpoints instead.) Format the histogram the same way as the histogram in the Lab 1 Instructions (axis labels, title).
(b) Describe the shape of the histogram in part (a). Do the sample means appear to be approximately normally distributed? Compare the histogram with those obtained in Question 2, part (a) and Question 3, part (a). In particular, comment on the differences in skewness and spread between each pair of graphs.
(c) Obtain a Q-Q plot and a boxplot for the 200 means. Add a title to each plot and paste the plots into your report. Are there any outliers? Does it appear that the sample means come from a normal distribution? Explain. Do the plots collectively confirm your findings in part (b)? Compare the plots with those obtained in part (c) of Questions 2 and 3. What do you conclude?
(d) Use the Summary Statistics (Columns) feature to obtain the sample size, mean, and standard deviation of the 200 means, and paste the summaries into your report. Compare the standard deviation of the sample mean for n = 30 with the standard deviation of the sample mean in Question 3, part (d) (for n = 6).
Compare these values with the mean and standard deviation of the sampling distribution of the sample mean predicted by the theory of sampling distributions. Which sample mean (n = 6 or n = 30) tends to be the more accurate estimate of the population mean?
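Questions 3 and 4 follow the same recipe. First average each column of the bottles-by-samples table. Then compare the mean and standard deviation of the 200 sample means with the theoretical values μ and σ/√n. Below is a minimal R sketch, assuming each data file has a header row naming the columns Sample1 to Sample200 and one row per bottle. The function names are hypothetical. The lab's large-sample assumption in Question 2 suggests estimating mu and sigma by the mean and standard deviation of the 400 observations there.

```r
# Sketch for Questions 3 and 4. Assumption: `tab` is a data frame with one
# row per bottle (6 or 30 rows) and one column per sample (Sample1, ..., Sample200).

# Mean of each sample = mean of each SampleX column
sample_means <- function(tab) {
  colMeans(tab[, grep("^Sample", names(tab)), drop = FALSE])
}

# Observed mean and sd of the sample means next to the theoretical values
# E(xbar) = mu and SD(xbar) = sigma / sqrt(n)
compare_with_theory <- function(means, n, mu, sigma) {
  rbind(observed  = c(mean = mean(means), sd = sd(means)),
        predicted = c(mean = mu,          sd = sigma / sqrt(n)))
}

# Histogram of the sample means with left-closed bins, as the hints require
plot_means_hist <- function(means, from, to, width, n) {
  hist(means, breaks = seq(from, to, by = width), right = FALSE,
       main = paste0("Histogram of 200 Sample Means (n = ", n, ")"),
       xlab = "Sample mean volume (oz)", ylab = "Frequency")
}

# Normal Q-Q plot and boxplot of the sample means
plot_means_shape <- function(means, n) {
  qqnorm(means, main = paste0("Normal Q-Q Plot of Sample Means (n = ", n, ")"))
  qqline(means)
  boxplot(means, main = paste0("Boxplot of Sample Means (n = ", n, ")"),
          ylab = "Sample mean volume (oz)")
}
```

For example, with the Question 2 vector `volume` loaded, `m6 <- sample_means(read.table("lab2-Q3.txt", header = TRUE))` followed by `compare_with_theory(m6, n = 6, mu = mean(volume), sigma = sd(volume))` should put the observed and predicted values side by side. For Question 4, (8.13 - 8.09)/0.003 is not a whole number, so seq(8.09, 8.13, by = 0.003) ends at 8.129. If any sample mean exceeds 8.129, hist() stops with an error, and the upper limit of the breaks would need adjusting.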
LAB 2 ASSIGNMENT MARKING SCHEME

Proper Title Page (Using Lab Assignment Template on eClass): 5 points
Appearance: 5 points (1 bonus point for each question submitted properly on eClass)
Note: Lab assignments must be typed and submitted on eClass. A handwritten assignment is not acceptable and will receive a mark of zero for the whole assignment.

Question 1 (20)
(a) Percentage of underfilled bottles when the standard deviation decreases or increases: 2 points
    How the magnitude of the standard deviation affects the filling process: 2 points
(b) Percentage of underfilled bottles when μ = 8.1 and σ = 0.1 oz: 2 points
    Percentage of underfilled bottles when μ = 8.1 and σ = 0.05 oz: 2 points
    Percentage of underfilled bottles when μ = 8.1 and σ = 0.04 oz: 2 points
    Effect of decreasing σ on the percentage of underfilled bottles: 2 points
(c) Percentage of underfilled bottles when μ = 8 and σ = 0.05 oz: 2 points
    Percentage of underfilled bottles when μ = 8.05 and σ = 0.05 oz: 2 points
    Percentage of underfilled bottles when μ = 8.1 and σ = 0.05 oz: 2 points
    Effect of increasing μ on the percentage of underfilled bottles: 2 points

Question 2 (33)
(a) Properly formatted histogram of the 400 observations: 4 points
(b) Shape of the histogram in part (a): 2 points
    Conclusion about whether the histogram supports the company's claim: 2 points
(c) Q-Q plot with a title: 4 points
    Boxplot with a title: 4 points
    Outliers: 4 points (2 points for each plot)
    Normal distribution?: 1 point
    Conclusion about the shape of the distribution: 2 points
    Consistency with the conclusions in part (b): 2 points
(d) Summary statistics output: 2 points
    Relationship between mean and median: 2 points
    Relationship among the three quartiles: 2 points
    Consistency with the conclusions in part (b): 2 points

Question 3 (35)
(a) Properly formatted histogram of the 200 sample means (n = 6): 4 points
(b) Shape of the histogram in part (a), normality: 2 points
    Comparison with the parent distribution (degree of skewness, spread): 4 points (2 points for each feature)
(c) Q-Q plot with a title: 4 points
    Boxplot with a title: 4 points
    Outliers: 4 points (2 points for each plot)
    Comparison with conclusions in part (b): 2 points
    Comparison with plots in Question 2: 2 points
(d) Summary statistics output: 3 points
    Comparison with the values predicted by theory: 4 points (2 points for mean and 2 points for sd)
    Interpretation of the standard deviation: 2 points
Question 4 (45)
(a) Properly formatted histogram of the 200 sample means (n = 30): 4 points
(b) Shape of the histogram in part (a), normality: 2 points
    Comparison with graph from Question 2 (skewness, spread): 4 points (2 points for each feature)
    Comparison with graph from Question 3 (skewness, spread): 4 points (2 points for each feature)
(c) Q-Q plot with a title: 4 points
    Boxplot with a title: 4 points
    Outliers: 4 points (2 points for each plot)
    Normality: 2 points
    Comparison with conclusions in part (b): 2 points
    Comparison with plots in Question 2 and conclusion: 2 points
    Comparison with plots in Question 3 and conclusion: 2 points
(d) Summary statistics output: 3 points
    Comparison of the standard deviations: 2 points
    Comparison with the values predicted by theory: 4 points (2 points for mean and 2 points for sd)
    Sample mean that is the more accurate estimate of the population mean: 2 points

TOTAL = 143