dis09_solutions

Data 8 Fall 2023
The Bootstrap
Lab 07 October 2023

1. Mid-semester Check In

a. What has been your favorite topic/assignment/lecture/anything so far with the first half of the class done?

If you have any concerns about your performance in the class so far, feel free to bring them up with your lab TA.

2. Facts About the Bootstrap

Suppose we are trying to estimate a population parameter. Whenever we take a random sample and calculate a statistic to estimate the parameter, we know that the statistic could have come out differently if the sample had come out differently by random chance. We want to understand the variability of the statistic in order to better estimate the parameter. However, we don't have the resources to collect multiple random samples. To solve this problem, we use a technique called bootstrapping.

a. When we conduct a bootstrap resample, what size resample should we draw from our sample? Why?

The resample should be the same size as our original sample. Our original estimate of the parameter is based on that sample size, and the variability of a statistic depends on the sample size. If we changed the resample size, the distribution of the estimate would change.

b. Why do we need to resample from our sample with replacement?

If we drew a resample of the same size without replacement, we would get back exactly the original sample every time (possibly in a different order), so the statistic would never vary.

c. When we conduct a bootstrap resample, what is the underlying assumption/reasoning for resampling from our sample? Why does it work?

The underlying assumption is that our sample looks similar to our population; that is, the sample is representative of the population. The validity of the bootstrap rests on this assumption: if the sample is unrepresentative of the population, the resamples do not give us a good picture of the range of values our estimate could take on.
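The answer to part b can be checked with a small experiment. The following is a minimal sketch in plain Python rather than the course's `datascience` library; the `sample` values are made up for illustration, and the standard-library `random` module stands in for `Table.sample`.

```python
import random

random.seed(8)  # fixed seed so the sketch is repeatable

# Hypothetical sample of 8 measurements (made up for illustration).
sample = [2, 3, 1, 4, 2, 5, 3, 1]

def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)

# Same size, WITHOUT replacement: this can only reorder the original sample.
shuffled = random.sample(sample, k=len(sample))

# Same size, WITH replacement: some values may repeat and others may be left out.
resample = random.choices(sample, k=len(sample))

print("original mean:           ", mean(sample))
print("without replacement mean:", mean(shuffled))
print("with replacement mean:   ", mean(resample))
```

The draw without replacement always contains the same values as the original sample, so its mean cannot differ. Only the draw with replacement can produce a different statistic from one resample to the next.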

3. Thirsty

Warmup: What is the difference between a parameter and a statistic? Which of the two is random?

A parameter is a property of the population, so it is fixed and doesn't change. A statistic is calculated from a sample, and the sample is usually drawn at random. Typically, we use statistics to estimate population parameters. Therefore, a statistic is random and a parameter is not.

You are interested in investigating the liters of water consumed every day by UC Berkeley students. In particular, you want to study the proportion of students drinking less than 3 liters of water per day. You contact 150 random students from the directory and obtain the amount of water each one of them drinks, storing the amounts in the table water. The table has 1 column, amount, which stores the number of liters of water drunk by each student.

a. What is the parameter and what is the statistic in this scenario?

Population parameter: The proportion of UC Berkeley students who drink less than 3 liters of water per day.
Statistic: The proportion of students in your sample who drink less than 3 liters of water per day.

b. Write a line of code to calculate the proportion of students in your sample who drank less than 3 liters of water per day.

    np.mean(water.column('amount') < 3)

c. Write a line of code to perform a single bootstrap resample of the data stored in the water table.

    water.sample(water.num_rows, with_replacement=True)

d. Fill in the following blanks to conduct 10000 bootstrap resamples of your data, calculating the proportion of students in each resample that drink less than 3 liters of water per day, then plotting the distribution of those proportions using an appropriate visualization.

    proportions = ______
    for i in np.arange(10000):
        resampled_table = ______
        resampled_statistic = ______
        proportions = ______
    proportions_table = Table().with_column('Resampled proportion', proportions)
    proportions_table.______

    proportions = make_array()
    for i in np.arange(10000):
        resampled_table = water.sample(water.num_rows, with_replacement=True)
        resampled_statistic = np.mean(resampled_table.column('amount') < 3)
        proportions = np.append(proportions, resampled_statistic)
    proportions_table = Table().with_column('Resampled proportion', proportions)
    proportions_table.hist('Resampled proportion')
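For readers without the `datascience` library at hand, the same resampling loop can be sketched with only the Python standard library. This is an illustrative stand-in, not the worksheet's solution: the function name `bootstrap_proportions`, its parameters, and the 150 amounts are all hypothetical, and the amounts are randomly generated rather than real survey data.

```python
import random

def bootstrap_proportions(amounts, threshold=3, reps=10000, seed=None):
    """Return `reps` bootstrap proportions of values below `threshold`."""
    rng = random.Random(seed)
    n = len(amounts)
    proportions = []
    for _ in range(reps):
        # One bootstrap resample: same size as the sample, with replacement.
        resample = rng.choices(amounts, k=n)
        proportions.append(sum(a < threshold for a in resample) / n)
    return proportions

# Hypothetical data: 150 made-up daily water amounts in liters.
data_rng = random.Random(0)
amounts = [round(data_rng.uniform(0.5, 5.0), 1) for _ in range(150)]

observed = sum(a < 3 for a in amounts) / len(amounts)
props = bootstrap_proportions(amounts, reps=1000, seed=1)

print("observed proportion:", observed)
print("smallest and largest resampled proportion:", min(props), max(props))
```

The resampled proportions scatter around the observed proportion, which is the variability that the histogram in part d is meant to display.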

4. Tennis Time

Ciara is interested in the heights of female tennis players. She's collected a sample of 100 heights of professional women's tennis players. She wants to use this sample to estimate the true interquartile range (IQR) of all heights of professional women's tennis players.

Hint: We defined the interquartile range (IQR) to be: 75th percentile - 25th percentile

a. In order to construct a 99% confidence interval for the IQR, what should our upper and lower percentile endpoints be?

Our lower endpoint should be the 0.5th percentile and our upper endpoint should be the 99.5th percentile. This leaves 0.5% in each tail, so the middle 99% of the bootstrapped IQRs lies between the two endpoints.

b. Define a function ci_iqr that constructs a 99% confidence interval for the IQR as follows. The function takes the following arguments:

• tbl: A one-column table consisting of a random sample from the population; you can assume this sample is large
• reps: The number of bootstrap repetitions

Hint: To find the 25th and 75th percentile of an array, you can use the percentile function

    def ci_iqr(tbl, reps):
        stats = ______
        for ______:
            resample_col = ______
            new_iqr = ______
            stats = ______
        left_end = ______
        right_end = ______
        return make_array(left_end, right_end)

    def ci_iqr(tbl, reps):
        stats = make_array()
        for i in np.arange(reps):
            resample_col = tbl.sample().column(0)
            new_iqr = percentile(75, resample_col) - percentile(25, resample_col)
            stats = np.append(stats, new_iqr)
        left_end = percentile(0.5, stats)
        right_end = percentile(99.5, stats)
        return make_array(left_end, right_end)
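The percentile method above can also be sketched without the `datascience` library. In the version below, the `percentile` helper is an assumption: it is written to follow the course's definition (the smallest value that is at least as large as p% of the values), and it is not the library's own implementation. The `seed` argument and the heights are hypothetical, with the heights generated at random for illustration.

```python
import math
import random

def percentile(p, values):
    """Smallest value that is at least as large as p% of the values."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered) / 100), 1)
    return ordered[k - 1]

def iqr(values):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return percentile(75, values) - percentile(25, values)

def ci_iqr(heights, reps, seed=None):
    """Bootstrap 99% confidence interval for the IQR of `heights`."""
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        # Resample the whole sample, same size, with replacement.
        resample = rng.choices(heights, k=len(heights))
        stats.append(iqr(resample))
    # Middle 99% of the bootstrapped IQRs.
    return percentile(0.5, stats), percentile(99.5, stats)

# Hypothetical sample of 100 heights in centimeters (made up).
data_rng = random.Random(0)
heights = [round(data_rng.gauss(173, 7), 1) for _ in range(100)]

left_end, right_end = ci_iqr(heights, reps=2000, seed=1)
print("approximate 99% CI for the IQR:", left_end, right_end)
```

Unlike the worksheet's version, this sketch takes a plain list instead of a one-column table, so it can be run in any Python environment.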


c. Say Ciara recruited 500 of her friends to perform the same bootstrapping process she did. In other words, each of her friends drew a large, random sample of 100 heights from the population of professional women's tennis players and constructed their own 99% confidence interval. Approximately how many of these CIs do we expect to contain the actual IQR for the heights of professional women's tennis athletes?

We interpret a 99% confidence interval to mean that we are 99% confident in the process used to construct the interval: about 99% of the time we use this process, the interval it produces contains the true population parameter. With 500 intervals, each at the 99% confidence level, we expect about 500 * 0.99 = 495 of them to contain the actual IQR of heights.
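The arithmetic in part c, together with a quick simulation of how the count might vary, can be sketched as follows. The simulation rests on a simplifying assumption that is not stated in the worksheet: each friend's interval is treated as containing the true IQR independently with probability exactly 0.99.

```python
import random

num_intervals = 500
confidence = 0.99

# Expected number of intervals that contain the true parameter.
expected_hits = round(num_intervals * confidence)

# Simulate one group of 500 friends under the assumption above.
rng = random.Random(8)
hits = sum(rng.random() < confidence for _ in range(num_intervals))

print("expected:", expected_hits)
print("one simulated group:", hits)
```

The simulated count will usually be close to 495 but need not equal it, which is why the answer says "approximately".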
