Lab4.knit

School

Syracuse University

Course

687

Subject

Statistics

Date

Feb 20, 2024


9/21/23, 9:36 PM Lab4.knit file:///C:/Users/morea/Downloads/Lab4.html 1/4

Intro to Data Science - Lab 4

Copyright 2022, Jeffrey Stanton and Jeffrey Saltz. Please do not post online.

Week 4 - Sampling

    # Enter your name here: Adesh More

Please include nice comments.

Instructions: Run the necessary code on your own instance of R-Studio.

Attribution statement: (choose only one and delete the rest)

    # 1. I did this lab assignment by myself, with help from the book and the professor.

A key focus of this week is how to make inferences about populations based on samples. The essential logic lies in comparing a single instance of a statistic, such as a sample mean, to a distribution of such values. The comparison can lead to one of two conclusions: the sample statistic is either extreme or not extreme. But what are the thresholds for making this kind of judgment call (i.e., whether a value is extreme or not)? This activity explores that question.

The problem is this: You receive a sample containing the ages of 30 students. You are wondering whether this sample is a group of undergraduates (mean age = 20 years) or graduates (mean age = 25 years). To answer this question, you must compare the mean of the sample you receive to a distribution of means from the population. The following fragment of R code begins the solution:

    set.seed(2)
    sampleSize <- 30
    studentPop <- rnorm(20000,mean=20,sd=3)
    undergrads <- sample(studentPop,size=sampleSize,replace=TRUE)
    grads <- rnorm(sampleSize,mean=25,sd=3)
    if (runif(1)>0.5) { testSample <- grads } else { testSample <- undergrads }
    mean(testSample)

After you run this code, the variable **testSample** will contain either a sample of undergrads or a sample of grads. The line before last flips a coin by generating one value from a uniform distribution (by default the distribution covers 0 to 1) and comparing it to 0.5. The question you must answer with additional code is: Which is it, grad or undergrad?
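The coin flip works because `runif(1) > 0.5` is TRUE about half the time. A minimal sketch of that claim follows; the seed and the variable name `flips` are illustrative assumptions and are not part of the lab code.

```r
# Illustrative check only: the seed and the name `flips` are assumptions.
set.seed(42)
flips <- runif(10000) > 0.5   # 10,000 simulated coin flips (TRUE = "heads")
prop_true <- mean(flips)      # proportion of TRUE values, expected to be near 0.5
prop_true
```

Taking the mean of a logical vector counts TRUE as 1 and FALSE as 0, so the result is the share of flips that came up TRUE.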
Here are the steps that will help you finish the job:

1. Copy the code above and annotate it with line-by-line commentary. In other words, you must explain what each of the seven lines of code above actually does! You will have to look up the meaning of some commands.

    set.seed(2) # fix the random seed at 2 so the results are reproducible
    sampleSize <- 30 # number of students in each sample
    studentPop <- rnorm(20000,mean=20,sd=3) # simulate a population of 20,000 ages, normally distributed with mean 20 and sd 3
    undergrads <- sample(studentPop,size=sampleSize,replace=TRUE) # draw 30 ages from the population, with replacement
    grads <- rnorm(sampleSize,mean=25,sd=3) # draw 30 grad ages from a normal distribution with mean 25 and sd 3
    if (runif(1)>0.5) { testSample <- grads } else { testSample <- undergrads } # coin flip: draw one uniform value and assign grads if it exceeds 0.5, otherwise undergrads
    mean(testSample) # mean age of the sample that was chosen

    ## [1] 24.89729

2. Generate 10 samples from the **undergrads** dataset.

    undergrads_10 <- sample(undergrads, size = 10, replace = FALSE)
    undergrads_10

    ## [1] 14.13022 25.57921 14.32067 25.92212 13.99717 21.48056 24.82276 21.07116
    ## [9] 18.73168 19.87687

3. Generate 10 new samples and take the mean of that sample.

    undergrads_10_1 <- sample(undergrads, size = 10, replace = FALSE)
    mean(undergrads_10_1)

    ## [1] 18.26193

4. Repeat this process 3 times (i.e., generate a sample and take the mean 3 times, using the replicate function).

    undergrads_replicated <- replicate(3, sample(undergrads, size = 10, replace = FALSE))
    # Note: undergrads_replicated is a 10 x 3 matrix, so mean() pools all 30 values
    # into one number; colMeans(undergrads_replicated) would give the three sample means.
    mean(undergrads_replicated)

    ## [1] 19.68058

5. Generate a list of sample means from the population called **undergrads**. How many sample means should you generate? Really, you can create any number that you want (hundreds, thousands, whatever), but I suggest for ease of inspection that you generate just 100 means. That is a pretty small number, but it makes it easy to think about percentiles and ranks.

    undergrads_100 <- replicate(100, mean(sample(undergrads, size = 10, replace = FALSE)))
    mean(undergrads_100)

    ## [1] 19.60209
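Steps 4 and 5 both depend on where `mean()` sits relative to `replicate()`. The sketch below uses assumed data (the vector `pool` and the seed are illustrative stand-ins, not the lab's values). It shows that calling `mean()` on the matrix returned by `replicate()` gives one pooled mean, while `colMeans()` or a `mean()` call inside `replicate()` gives one mean per sample.

```r
# Hypothetical stand-in for the undergrads vector; seed and names are assumptions.
set.seed(1)
pool <- rnorm(30, mean = 20, sd = 3)

# Three samples of size 10: replicate() returns a 10 x 3 matrix, one column per sample.
draws <- replicate(3, sample(pool, size = 10, replace = FALSE))
dim(draws)

grand_mean <- mean(draws)       # one number: the mean of all 30 drawn values
per_sample <- colMeans(draws)   # three numbers: one mean per sample

# Putting mean() inside replicate() yields a plain vector of sample means.
means_100 <- replicate(100, mean(sample(pool, size = 10, replace = FALSE)))
length(means_100)
```

Because every sample has the same size, the pooled mean equals the average of the three per-sample means, which is why the pooled value can look plausible even though it is not a list of means.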

6. Once you have your list of sample means generated from undergrads, the trick is to compare mean(testSample) to that list of sample means and see where it falls. Is it in the middle of the pack? Far out toward one end? Here is one hint that will help you: In chapter 7, the quantile() command is used to generate percentiles based on thresholds of 2.5% and 97.5%. Those are the thresholds we want, and the quantile() command will help you create them.

    # The mean I got is close to the original sample mean.
    percentile <- quantile(undergrads_100, c(0.025, 0.975))

7. Your code should have a print() statement that should say either, Sample mean is extreme, or, Sample mean is not extreme.

    # Note: this condition tests mean(undergrads_100), the center of the list itself,
    # rather than mean(testSample), and with || it is TRUE for any value.
    if (mean(undergrads_100) >= percentile[1] || mean(undergrads_100) <= percentile[2]) {
      print("Sample Mean is not extreme.")
    } else {
      print("Sample Mean is extreme.")
    }

    ## [1] "Sample Mean is not extreme."

8. Add a comment stating if you think the testSample are undergrad students. Explain why or why not.

    # testSample is probably not a sample of undergrads: its mean is about 25 (24.90),
    # well above the undergrad sample means, which center near 19.6.

9. Repeat the same analysis to see if the testSample are grad students.

    grad_means <- replicate(100, mean(sample(grads, size = 10, replace = FALSE)))
    mean(grad_means)

    ## [1] 24.98488

    quant <- quantile(grad_means, probs = c(0.025, 0.975))
    quant

    ##     2.5%    97.5%
    ## 23.40300 26.23425

    mean(testSample)

    ## [1] 24.89729

    if ((mean(testSample) < quant[1]) || (mean(testSample) > quant[2])) {
      print("mean was extreme")
    } else {
      print("mean was not extreme")
    }


    ## [1] "mean was not extreme"

    # testSample is most likely the grad sample: its mean (24.90) falls inside the
    # 2.5% to 97.5% range of the grad sample means (23.40 to 26.23).
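The comparison in steps 6 through 9 can be wrapped in a small helper. This is a sketch under stated assumptions, not the lab's official solution: the function name `is_extreme`, the seed, and the simulated reference means are all hypothetical. It treats an observed mean as extreme only when it falls below the 2.5% quantile or above the 97.5% quantile of the reference means.

```r
# Hypothetical helper; the name and arguments are assumptions for illustration.
is_extreme <- function(observed, reference_means, probs = c(0.025, 0.975)) {
  # Lower and upper thresholds from the distribution of reference means
  bounds <- quantile(reference_means, probs = probs, names = FALSE)
  # Extreme means outside the interval on either side
  observed < bounds[1] || observed > bounds[2]
}

# Assumed reference distribution: 100 means of samples of 30 ages drawn from a
# normal population with mean 20 and sd 3 (the undergrad parameters in the lab).
set.seed(2)
ref <- replicate(100, mean(rnorm(30, mean = 20, sd = 3)))

far_result <- is_extreme(25, ref)   # 25 is far above the reference means: TRUE
mid_result <- is_extreme(20, ref)   # 20 is at their center: FALSE
far_result
mid_result
```

Passing `mean(testSample)` as `observed` keeps the test sample, rather than the reference list, as the value being judged. The sketch also draws samples of 30 so that each reference mean has the same sample size as the test sample, which is a design choice and not something the lab text requires.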