Lab 1 - Simulation Study - Instructions-1

School: University of Illinois, Urbana Champaign
Course: MISC
Subject: Statistics
Date: Apr 3, 2024


Lab 1 – Simulation Study with Diamonds
Jack Voreis – jvoreis2

Formatting Instructions
- Please include all requested responses in a document, then save it as a pdf when done.
  o You may use this instructions document, or you may create a new document.
  o All responses should be numbered (leaving the original question text is optional!)
- Upload your pdf to Gradescope and please match pages with the question number when prompted to.
- If working with one or two partners, be sure to do both of these things:
  o Please put all names and netIDs at the top of your document (like shown above).
  o Have one person upload the pdf and then ensure group members are added in your submission to Gradescope (click view/edit group on the top right of the page once shown your final submission after matching pages).

Assignment Overview
- For this lab, we will explore the diamonds dataset that loads with the tidyverse package (it ships in ggplot2). This dataset has nearly 54,000 diamonds catalogued (53,940 rows), and we will treat this dataset like it's a population.
- Let's see how much variation we see from sample to sample and how reliable our sample statistics are in different situations!

Step 0 – Come to Lab day, or ask for help if you get stuck somewhere in Step 0!
- Pre-lab work
  o Complete the pre-lab tutorials for Lab 1: https://stat212-learnr.stat.illinois.edu/
  o Watch videos 1 (or 2), 3, 4, and 5 in this playlist: https://www.youtube.com/playlist?list=PLTE0IJCCTM9ILfW8OaLqZd37G7X4WDtl-
- Open RStudio (or RStudio Cloud) to get started
  o Be careful not to open up R (this icon with just R and a swirly thing on the left).
  o Open up RStudio (this icon with the blue circle on the right!).
- Open the starter script linked in the assignment description.
  o I don't recommend coding directly into the console (command line). Coding in your script is much easier for editing your code, saving your code, and making comments for what each code does (video 3!)
- Install and library tidyverse
  o Write and run the following code: install.packages("tidyverse")
  o This will take a minute or two! Wait until the little stop sign disappears to proceed.
  o Next, you will want to run the following code: library(tidyverse)
- Open the Data
  o We will be using the diamonds data frame stored in the tidyverse package.
  o After loading tidyverse with library(), run the code: View(diamonds)

  o Each row represents one diamond from a collection of nearly 54,000 (53,940 rows). We will treat this as our "population."

Question 1 (3pts): Create a histogram of the price variable. Set "breaks = 20" to keep a consistent number of bins. This is your population distribution!

Include the image of your histogram in your report. You may either save it to your computer and upload it, or include a properly cropped screenshot.

Would you describe this distribution as symmetric or skewed?
- Right skewed
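A minimal sketch of the Question 1 histogram, assuming the diamonds data frame is available through ggplot2 (which library(tidyverse) also attaches). The plot title, axis label, and the variable names pop_hist, price_mean, and price_median are illustrative choices, not part of the assignment.

```r
library(ggplot2)  # provides the diamonds data frame; library(tidyverse) attaches it too

# Population distribution of price, using the bin setting requested in the lab
pop_hist <- hist(diamonds$price, breaks = 20,
                 main = "Population distribution of diamond prices",
                 xlab = "Price (USD)")

# A mean well above the median is one numeric sign of right skew
price_mean <- mean(diamonds$price)
price_median <- median(diamonds$price)
price_mean > price_median
```

Note that hist() treats breaks = 20 as a suggestion and may pick a nearby number of bins to get round cut points.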

Question 2 (3pts): Calculate the mean and standard deviation of the price variable. This is your population mean and standard deviation.

Include the population mean and standard deviation values in your report
> mean
[1] 3932.8
> standard deviation
[1] 3989.44
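The two population values can be computed as sketched below, again assuming diamonds comes from ggplot2. The names pop_mean and pop_sd are hypothetical; the rounded results should agree with the 3932.8 and 3989.44 reported above.

```r
library(ggplot2)  # provides the diamonds data frame

pop_mean <- mean(diamonds$price)  # population mean of price
# sd() divides by n - 1; with about 54,000 rows the difference
# from dividing by n is negligible
pop_sd <- sd(diamonds$price)

round(pop_mean, 2)
round(pop_sd, 2)
```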

Question 3 (5pts): Take a random sample of 50 diamond prices from this dataset.
• Name this vector fifty_diam (If saved properly, you will see this in your global environment!)
• Sample without replacement (this will be the default option)

Create a histogram of the fifty_diam variable. Set "breaks = 20" to keep a consistent number of bins. This is a sample data distribution!

Create a histogram of your sample data and include this image in your report

If you were to take a much larger sample, the shape of your sample data distribution would look more and more like…what? If you're not sure what we mean by this, check Chapter 3 again!
> It would look more and more like the population distribution, which is right skewed. The CLT describes the distribution of sample means, not the sample data itself, so a larger sample does not make the data look like a bell curve.
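A sketch of the sampling step, assuming diamonds is loaded from ggplot2. The seed value and the larger comparison sample (big_sample, size 5000) are illustrative assumptions added here so the shapes can be compared; the lab itself only asks for fifty_diam.

```r
library(ggplot2)  # provides the diamonds data frame

set.seed(212)  # hypothetical seed, only so this sketch is reproducible

# sample() draws without replacement by default (replace = FALSE)
fifty_diam <- sample(diamonds$price, size = 50)
hist(fifty_diam, breaks = 20,
     main = "Sample of 50 diamond prices", xlab = "Price (USD)")

# A much larger sample, to compare its shape with the population histogram
big_sample <- sample(diamonds$price, size = 5000)
hist(big_sample, breaks = 20,
     main = "Sample of 5000 diamond prices", xlab = "Price (USD)")
```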

Question 4 (5pts): Calculate the mean and standard deviation of your sample data. Note that these values will change if you take a new sample, and that's ok! Just report the values you get for one particular sample.

Include your sample mean and standard deviation values in your report
> mean(fifty_diam)
[1] 3974.38
> sd(fifty_diam)
[1] 4254.187

What is the absolute error of your sample mean as an estimate of the population mean? If you're not sure what we mean by this, check Chapter 3 again!
|3932.8 - 3974.38| = 41.58
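The absolute error is the distance between the sample mean and the population mean. A sketch under the same assumptions as before (diamonds from ggplot2, a hypothetical seed); the names sample_mean, sample_sd, abs_error, and reported_error are illustrative. A fresh sample will not reproduce the 3974.38 reported above, so the last lines redo the arithmetic from the reported values instead.

```r
library(ggplot2)  # provides the diamonds data frame

set.seed(212)  # hypothetical seed; a different seed gives a different sample
fifty_diam <- sample(diamonds$price, size = 50)

sample_mean <- mean(fifty_diam)
sample_sd <- sd(fifty_diam)

# Absolute error of the sample mean as an estimate of the population mean
abs_error <- abs(sample_mean - mean(diamonds$price))
abs_error

# The same calculation using the values reported in this write-up
reported_error <- abs(3974.38 - 3932.8)
reported_error
```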

Question 5 (5pts): Next, set up a for loop to simulate taking a sample of size 50 at least 10,000 times. Inside your loop, calculate the mean price and save it to a vector called means_fifty. Please reference the entire "For Loops: Returning a Vector" section of the "Sampling and Simulation" tutorial for assistance on this part.

After successfully running your simulation, create a histogram of your means_fifty vector and set "breaks = 20" to keep a consistent number of bins.

Include the image of your histogram in your report

Include the R code you used to generate this loop
means_fifty = NULL
for (i in 1:10000) {
  means_fifty[i] = mean(sample(x = diamonds$price, size = 50, replace = FALSE))
}
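A variant of the loop above that preallocates the result vector with numeric() instead of growing it from NULL, then draws the requested histogram. This is a sketch, not the tutorial's version; n_sims, the seed, and the plot labels are assumptions, and diamonds is taken from ggplot2.

```r
library(ggplot2)  # provides the diamonds data frame

set.seed(212)      # hypothetical seed for reproducibility
n_sims <- 10000    # number of simulated samples

# Preallocate the vector of sample means
means_fifty <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  means_fifty[i] <- mean(sample(x = diamonds$price, size = 50, replace = FALSE))
}

# Simulated sampling distribution of the mean for n = 50
hist(means_fifty, breaks = 20,
     main = "Sample means, n = 50", xlab = "Sample mean price (USD)")
```

Growing a vector from NULL works for 10,000 iterations, but preallocating avoids repeated copying and makes the intended length explicit.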

Question 6 (5pts): As you should notice from your histogram, our sample means will vary with each sample we take. Calculate the standard deviation of the simulated sample means (means_fifty vector) you created.

Include this standard deviation value in your report
565.5762

If you run the loop again and recalculate the standard deviation, you'll likely find that the number changed a little bit! What is the standard deviation of the simulated means approximating? Report the name of this measure and calculate the true value for this measure using the formula we learned. Check pages 3 and 9 of Chapter 3 if you're not sure!
> It is approximating the standard error of the sample mean (x bar): the typical distance between a sample mean and the population mean for samples of size 50.
> True value: population standard deviation / square root(n) = 3989.44 / square root(50) = 564.19, which is close to the simulated 565.5762.
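The comparison between the simulated spread and the formula value can be sketched as follows. It uses replicate() as a compact stand-in for the for loop (a swapped-in technique, not the one the tutorial teaches), assumes diamonds from ggplot2, and ignores the finite population correction, which is negligible for samples of 50 out of 53,940. The names sigma, n, se_theory, and se_sim are hypothetical.

```r
library(ggplot2)  # provides the diamonds data frame

set.seed(212)  # hypothetical seed for reproducibility

sigma <- sd(diamonds$price)  # population standard deviation
n <- 50                      # sample size

# Standard error of the sample mean from the formula sigma / sqrt(n)
se_theory <- sigma / sqrt(n)

# Simulated standard error: the standard deviation of many sample means
means_fifty <- replicate(10000, mean(sample(diamonds$price, size = n)))
se_sim <- sd(means_fifty)

c(theory = se_theory, simulated = se_sim)
```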

Question 7 (5pts): Repeat question 5, but with a sample size of 10 instead of 50. Call your vector of sample means means_ten.

After successfully running your simulation, create a histogram of your means_ten vector. Again, set "breaks = 20" to keep a consistent number of bins.

Include the image of your histogram in your report

Include the R code you used to generate this loop
means_ten = NULL
for (i in 1:10000) {
  means_ten[i] = mean(sample(x = diamonds$price, size = 10, replace = FALSE))
}

Question 8 (4pts): Let's compare the distribution of sample means when we took samples of size 50 versus when we took samples of size 10.

Is there any difference in the shapes of these distributions?
Yes! The distribution of sample means from samples of size 10 is more right skewed than the distribution from samples of size 50.

What is the Central Limit Theorem, and how does this relate to what you found in your previous answer?
The CLT states that the distribution of sample means approaches a normal distribution as the sample size grows, whatever the shape of the population. Since samples of size 50 are larger, their sample means are closer to normally distributed than those from samples of size 10, which is consistent with the theorem.
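One way to back up the shape comparison with numbers is to plot the two simulated distributions side by side and compute a skewness value for each. The sketch below assumes diamonds from ggplot2; skewness() and simulate_means() are hypothetical helpers defined here (base R has no built-in skewness function), and the seed is arbitrary.

```r
library(ggplot2)  # provides the diamonds data frame

set.seed(212)  # hypothetical seed for reproducibility

# Hypothetical helper: skewness as the standardized third moment
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Hypothetical helper: simulate n_sims sample means for samples of size n
simulate_means <- function(n, n_sims = 10000) {
  means <- numeric(n_sims)
  for (i in seq_len(n_sims)) {
    means[i] <- mean(sample(diamonds$price, size = n, replace = FALSE))
  }
  means
}

means_ten <- simulate_means(10)
means_fifty <- simulate_means(50)

# Side-by-side histograms of the two simulated sampling distributions
par(mfrow = c(1, 2))
hist(means_ten, breaks = 20, main = "n = 10", xlab = "Sample mean price")
hist(means_fifty, breaks = 20, main = "n = 50", xlab = "Sample mean price")
par(mfrow = c(1, 1))

# Values closer to 0 indicate a more symmetric distribution
c(n10 = skewness(means_ten), n50 = skewness(means_fifty))
```

A positive skewness that shrinks as the sample size goes from 10 to 50 is the pattern the CLT predicts for a right-skewed population.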
