worksheet_01

School

University of California, Los Angeles *

Course

301

Subject

Statistics

Date

Feb 20, 2024

Uploaded by DeaconEnergyCat30

Worksheet 1: Introduction to Statistical Modelling and A/B Testing

Welcome to STAT 301: Statistical Modelling for Data Science

Each week you will complete a lecture assignment like this one. Before we get started, let's talk about some administrative details.

Hands-on practice is very useful when you learn technical subjects! Weekly lecture worksheets and tutorials are an essential part of the course.

Collaborating on lecture worksheets and tutorial assignments is more than okay; it is encouraged! You should rarely be stuck for more than a few minutes on questions in lecture or tutorial. Ask a neighbour, a TA, or an instructor for help (explaining things is beneficial, too: the best way to solidify your knowledge of a subject is to explain it). Please do not just share answers, though; work cooperatively! Everyone must submit a copy of their own work. You can read more about course policies on the course website.

Learning Objectives

After completing this week's worksheet and tutorial work, you will be able to:

1. Describe the goals of hypothesis testing, in particular difference-in-means tests related to A/B testing.
2. Give an example of a problem that requires A/B testing.
3. List methods used to test the difference in means between two populations.
4. Interpret the results of hypothesis tests.
5. Explain the relationship between type I and type II errors, power, and sample size in two-sample hypothesis testing.
6. Write a computer script to perform difference-in-means hypothesis tests and compute errors, power, and p-values.

Loading packages

In [3]:
# Run this cell before continuing.
library(tidyverse)
library(infer)
library(broom)
source("tests_worksheet_01.R")
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
dplyr     1.1.4     readr     2.1.4
forcats   1.0.0     stringr   1.5.1
ggplot2   3.4.4     tibble    3.2.1
lubridate 1.9.3     tidyr     1.3.0
purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
dplyr::filter() masks stats::filter()
dplyr::lag()    masks stats::lag()
Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: ‘testthat’

The following object is masked from ‘package:dplyr’:
    matches
The following object is masked from ‘package:purrr’:
    is_null
The following objects are masked from ‘package:readr’:
    edition_get, local_edition
The following object is masked from ‘package:tidyr’:
    matches

1. Warm Up Questions

Question 1.0 {points: 1}

In DSCI 100, you learned about 6 different types of data analysis questions you can ask and answer. Moreover, in STAT 201, you reviewed what an inferential question is. Now it is time for a more comprehensive exercise: identifying which class of data analysis a given real-life question involves. The table below lists various data analysis questions in the left column:

| Question | Type |
|---|---|
| Is wearing sunscreen associated with a decreased probability of developing skin cancer in Canada? | answer1.0.0 |
| How does alcohol consumption relate to socioeconomic status in the 2018 City of Vancouver survey dataset? | answer1.0.1 |
| Does a more concise Google ad lead to an increased number of visits to the advertised company's website? | answer1.0.2 |
| How do changes in human behaviour lead to a reduction in the number of COVID-19 confirmed cases? | answer1.0.3 |
| Does a reduced caloric intake cause weight loss? | answer1.0.4 |
| Do tweets with GIFs get more impressions on average than tweets that do not? | answer1.0.5 |
| Does including a GIF in tweets lead to more profile visits than tweets that do not include a GIF? | answer1.0.6 |
| How many mentions will my next tweet get? | answer1.0.7 |
| How many accounts are there on Twitter today? | answer1.0.8 |
| Does increasing the contrast in images lead to better visual discrimination of visually impaired image content? | answer1.0.9 |

The right column names the object that should store the type of statistical question being asked, chosen from the following:

A. Descriptive.
B. Exploratory.
C. Inferential.
D. Predictive.
E. Causal.
F. Mechanistic.

Assign your answers to the objects answer1.0.0, answer1.0.1, answer1.0.2, answer1.0.3, answer1.0.4, answer1.0.5, answer1.0.6, answer1.0.7, answer1.0.8, and answer1.0.9. Each answer should be a single character ("A", "B", "C", "D", "E", or "F") surrounded by quotes.

In [6]:
answer1.0.0 <- "C"
answer1.0.1 <- "B"
answer1.0.2 <- "E"
answer1.0.3 <- "F"
answer1.0.4 <- "E"
answer1.0.5 <- "C"
answer1.0.6 <- "E"
answer1.0.7 <- "D"
answer1.0.8 <- "A"
answer1.0.9 <- "E"

In [7]:
test_1.0()

Test passed 😸 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 😸 Test passed 😀 Test passed 😸 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 [1] "Success!"

Question 1.1 {points: 1}

We need a shared vocabulary to discuss concepts related to experimentation and causal inference, such as A/B testing. It takes time and practice to commit these terms and their definitions to memory and use them fluently. Let us get some more practice by matching each term to its definition. Read the table below and assign the correct term in the right column.
| Definition | Term |
|---|---|
| Technique to investigate effects of several variables in one study; experimental units are assigned to all possible combinations of factors. | answer1.1.0 |
| Explanatory variable manipulated by the experimenter. | answer1.1.1 |
| The entity/object in the sample that is assigned to a treatment and for which information is collected. | answer1.1.2 |
| Repetition of an experimental treatment. | answer1.1.3 |
| Equal number of experimental units for each treatment group. | answer1.1.4 |
| Process of randomly assigning explanatory variable(s) of interest to experimental units. | answer1.1.5 |
| A combination of factor levels. | answer1.1.6 |
| Statistically comparing a key performance indicator (conversion rate, dwell time, etc.) between two versions of a webpage/app/ad to assess which one performs better. | answer1.1.7 |

Assign your answers to the objects answer1.1.0, answer1.1.1, answer1.1.2, answer1.1.3, answer1.1.4, answer1.1.5, answer1.1.6, and answer1.1.7. Each answer should be a single string ("randomization", "A/B testing", "treatment", "factor", "experimental unit", "replicate", "balanced design", or "factorial design") surrounded by quotes.

In [12]:
answer1.1.0 <- "factorial design"
answer1.1.1 <- "factor"
answer1.1.2 <- "experimental unit"
answer1.1.3 <- "replicate"
answer1.1.4 <- "balanced design"
answer1.1.5 <- "randomization"
answer1.1.6 <- "treatment"
answer1.1.7 <- "A/B testing"

In [13]:
test_1.1()

Test passed 😸 Test passed 😀 Test passed 🎉 Test passed 🎉 Test passed 😸 Test passed 🎉 Test passed 🎉 Test passed 🎉
Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 😀 Test passed 🎉 Test passed 🎉 [1] "Success!"

2. Review of hypothesis testing

The first topic you will learn in the course is A/B testing optimization. An A/B test is a statistical hypothesis test that compares parameters of two populations (or groups), namely Group A and Group B (hence A/B). Many web applications have emerged in the last decade to help businesses optimize their product offerings by comparing different variations. Stakeholders can collect data and use these platforms to analyze it and see whether one variation (say A) improves their services over another one (say B). More generally, multiple variations can be compared and analyzed. While A/B testing is closely related to the hypothesis testing topics covered in STAT 201, you will soon recognize some important limitations of the methods you learned before.

A/B testing

We are trying to compare the parameters of two populations. The parameters being compared can vary depending on the problem. For example, you could be interested in:

- the proportion of website visitors who register for the newsletter;
- the average amount of money spent by each visitor;
- the efficacy of a new drug.

Naturally, the statistical analysis will change depending on the parameters being tested (remember the different formulae for CLT-based hypothesis tests you learned in STAT 201?).

Let's review hypothesis testing using the following example:

Suppose a company's marketing team has developed a new video for their TikTok ad. They want to know if this new ad will increase ad engagement (which they will measure via ad dwell time in seconds, i.e., a continuous response) compared to the ad they are currently running.

Question 2.0 {points: 1}

The null hypothesis, $H_0$, generally refers to the status quo, i.e., there is no change in ad engagement.
Let $\mu_{\text{new}}$ and $\mu_{\text{current}}$ be the mean dwell times of the new and current ads, respectively. What is the null hypothesis we are testing?

A. $H_0: \mu_{\text{new}} > \mu_{\text{current}}$
B. $H_0: \mu_{\text{new}} < \mu_{\text{current}}$
C. $H_0: \mu_{\text{new}} = \mu_{\text{current}}$
D. $H_0: \mu_{\text{new}} \neq \mu_{\text{current}}$

Assign your answer to an object called answer2.0. Your answer should be one of "A", "B", "C", or "D" surrounded by quotes.

In [16]:
answer2.0 <- "C"

In [17]:
test_2.0()

Test passed 😸 Test passed 🎉 Test passed 🎉 [1] "Success!"

Question 2.1 {points: 1}

The alternative hypothesis, $H_1$, generally refers to the researcher's hypothesis of interest, i.e., the new ad increases ad engagement.

Let $\mu_{\text{new}}$ and $\mu_{\text{current}}$ be the mean dwell times of the new and current ads, respectively. What is the alternative hypothesis we are testing?

A. $H_1: \mu_{\text{new}} > \mu_{\text{current}}$
B. $H_1: \mu_{\text{new}} < \mu_{\text{current}}$
C. $H_1: \mu_{\text{new}} = \mu_{\text{current}}$
D. $H_1: \mu_{\text{new}} \neq \mu_{\text{current}}$

Assign your answer to an object called answer2.1. Your answer should be one of "A", "B", "C", or "D" surrounded by quotes.

In [18]:
answer2.1 <- "A"

In [19]:
test_2.1()

Test passed 🎉 Test passed 🎉 Test passed 🎉
[1] "Success!"

Question 2.2 {points: 1}

The company would like to run an experiment on TikTok users in the age demographic that most of their customers fall into (between 16 and 24 years old) to compare the mean dwell times of both populations (current vs new ads). The company plans to collect a representative sample of $n = 2000$ TikTok users and randomize them so that each user views one of the two ads. The sample will be split in half, i.e., $n_{\text{current}} = n_{\text{new}} = 1000$.

Once the data is collected, we will conduct a hypothesis test. This analysis will depend on the nature of our response and on the approach we want to use (e.g., bootstrapping, Central Limit Theorem). If we opt for using the CLT to conduct the analysis, what is the specific test we need to perform?

A. One-sample $z$-test.
B. One-sample $t$-test.
C. Two-sample $z$-test.
D. Two-sample $t$-test.
E. Two-way ANOVA.

Assign your answer to an object called answer2.2. Your answer should be one of "A", "B", "C", "D", or "E" surrounded by quotes.

In [24]:
answer2.2 <- "D"

In [25]:
test_2.2()

Test passed 🎉 Test passed 🎉 Test passed 🎉 [1] "Success!"

Simulation Study

- In practice, we would run an A/B test by drawing a sample of $n$ experimental units (i.e., subjects) from the populations being studied
  - in our example: $n$ TikTok users
- Then, we split the subjects in the sample so that some of them receive one of the treatments and the remaining subjects receive the other treatment
  - in our example: some will watch the current ad and others the new one
- However, in practice we never know whether a difference truly exists between the two populations examined, or the magnitude of this difference (the effect size)
- Thus, to explore the behavior of different inference methods, in this exercise we are going to use simulated data so that we have full control (and knowledge) of all the parameters used to generate the data

Simulated data

Suppose we have two populations of one million TikTok users each, all between 16 and 24 years old. Users in one population have watched the current ad, while users in the other population have watched the new ad. Assume that these are the entire populations. The object tiktok_pop stores the dwell time (in seconds) of each user for the current ad (dwell_time_current_ad) and for the new ad (dwell_time_new_ad).

In [26]:
# run this cell before continuing
tiktok_pop <- read_csv("data/tiktok_pop.csv")
head(tiktok_pop)

Rows: 1000000 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): user
dbl (2): dwell_time_current_ad, dwell_time_new_ad

Use `spec()` to retrieve the full column specification for this data.
Specify the column types or set `show_col_types = FALSE` to quiet this message.

A tibble: 6 × 3

| user | dwell_time_current_ad | dwell_time_new_ad |
|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` |
| User 1 | 13.68792 | 29.18848 |
| User 2 | 15.25864 | 18.64276 |
| User 3 | 22.55960 | 22.96510 |
| User 4 | 23.51949 | 18.97294 |
| User 5 | 15.99553 | 22.66494 |
| User 6 | 29.47273 | 17.44211 |

Question 2.3 {points: 1}

Calculate the (true) population means and the (true) population standard deviations of dwell time for both ads. Save the result in a tibble called tiktok_true_params. The tibble should have four columns: mean_current_ad, sd_current_ad, mean_new_ad, and sd_new_ad.

Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.

In [31]:
# tiktok_pop holds the entire (simulated) populations, so these are population parameters
tiktok_true_params <- tiktok_pop %>%
    summarize(mean_current_ad = mean(dwell_time_current_ad),
              sd_current_ad = sd(dwell_time_current_ad),
              mean_new_ad = mean(dwell_time_new_ad),
              sd_new_ad = sd(dwell_time_new_ad))

# Print the result
tiktok_true_params

In [32]:
test_2.3()

Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 [1] "Success!"

Question 2.4 {points: 1}

Although in the previous exercise we assumed that we had access to both populations and the (true) population parameters, this is not the case in practice! Let's see how things actually work in practice. Here's what you need to do:

1. Take one sample of 200 users.
2. Assume that the first 100 users in our sample will watch the current ad and the remaining 100 will watch the new ad (note: since the data have been generated, the dwell times for each ad are already available in the second and third columns, respectively).

Save the sample in a tibble called tiktok_sample. The tibble should have three columns: user, ad_watched, and dwell_time.

Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.

In [33]:
set.seed(432121) # do not change this!
tiktok_sample <- tiktok_pop %>%
    rep_sample_n(size = 200) %>%
    ungroup() %>%
    # the first 100 sampled users see the current ad, the other 100 see the new ad
    mutate(row = row_number(),
           ad_watched = if_else(row <= 100, "current", "new"),
           dwell_time = if_else(row <= 100, dwell_time_current_ad, dwell_time_new_ad)) %>%
    select(user, ad_watched, dwell_time)
head(tiktok_sample)

A tibble: 6 × 3

| user | ad_watched | dwell_time |
|---|---|---|
| `<chr>` | `<chr>` | `<dbl>` |
| User 644538 | current | 31.72216 |
| User 929727 | current | 16.01492 |
| User 178752 | current | 21.25067 |
| User 685992 | current | 16.27671 |
| User 306869 | current | 11.89419 |
| User 278642 | current | 23.30618 |

In [34]:
test_2.4()

Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉
Test passed 🎉 [1] "Success!"

Question 2.5 {points: 1}

Once we have collected our experimental samples for both treatments, it is time to conduct a statistical analysis. However, before testing the hypotheses stated in Questions 2.0 and 2.1, it is always good to graphically compare the distributions of both samples of dwell times.

Make side-by-side boxplots of the two sample distributions, current and new, identified by the ad_watched column. Since boxplots do not show the sample means, let's add a point on top of each boxplot to represent the sample mean dwell time. The function stat_summary() can help with that.

Store the plot in an object named dwell_time_boxplots.

Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.

In [35]:
options(repr.plot.width = 15, repr.plot.height = 9)

dwell_time_boxplots <- tiktok_sample %>%
    ggplot(aes(x = ad_watched, y = dwell_time, fill = ad_watched)) +
    geom_boxplot() +
    theme(
        text = element_text(size = 22),
        plot.title = element_text(face = "bold"),
        axis.title = element_text(face = "bold")
    ) +
    ggtitle("Boxplots of Dwell Time for Current and New Ads") +
    xlab("Ad Watched") +
    ylab("Dwell Time") +
    guides(fill = "none") +   # hide the redundant fill legend ("none" replaces FALSE since ggplot2 3.3.4)
    stat_summary(
        aes(x = ad_watched, y = dwell_time, fill = ad_watched),
        fun = mean,           # draw each group's sample mean
        colour = "yellow",
        geom = "point",
        shape = 18,
        size = 5
    )

dwell_time_boxplots

In [36]:
test_2.5()

Test passed 😀 Test passed 🎉 Test passed 😸 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 😸 Test passed 🎉 Test passed 😸 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 Test passed 🎉 [1] "Success!"

Question 2.6 {points: 1}

Based on your findings in Question 2.5, what can you conclude about the dwell times of TikTok users?

A. The current ad's sample median and sample mean are higher than those of the new ad. Moreover, both data spreads are quite similar.
B. The new ad's sample median and sample mean are statistically significantly higher than those of the current ad.
C. The new ad's population median and population mean seem to be higher than those of the current ad. However, the observed difference may be due to sampling variability. Both data spreads are quite similar.
D. The new ad's population median and population mean are higher than those of the current ad. Moreover, the data spreads are quite different by treatment.

Assign your answer to an object called answer2.6. Your answer should be one of "A", "B", "C", or "D" surrounded by quotes.

In [37]:
answer2.6 <- "C"

In [38]:
test_2.6()

Test passed 🎉 Test passed 🎉 Test passed 😸
[1] "Success!"

Question 2.7 {points: 1}

The previous plot shows that the new ad's sample mean dwell time is higher than that of the current ad. Nonetheless, given the variation within each treatment's dwell times, how likely would it be to see a sample difference in means at least as extreme as the observed one if there were no difference in population means? In other words, is the observed difference statistically significant?

Recall the two-sample $t$-test that you learned in STAT 201 to test the hypotheses stated in Questions 2.0 and 2.1. The test statistic for this test (with unequal population variances) is defined as:

$$
T = \frac{\bar{x}_{\text{new}} - \bar{x}_{\text{current}}}{\sqrt{\frac{s^2_{\text{new}}}{n_{\text{new}}}+\frac{s^2_{\text{current}}}{n_{\text{current}}}}}
$$

where $\bar{x}_{\text{new}}$ and $\bar{x}_{\text{current}}$ are the sample means of the dwell times for the new and current ads, respectively; $s^2_{\text{new}}$ and $s^2_{\text{current}}$ are the sample variances of the dwell times for the new and current ads, respectively; and $n_{\text{new}}$ and $n_{\text{current}}$ are the sample sizes for the new and current ads, respectively.

Furthermore, without making further distributional assumptions and using results of the CLT, under the null hypothesis $H_0$ the $T$ statistic approximately follows a $t$-distribution with approximately

$$
\nu = \frac{\left(\frac{s_{\text{new}}^2}{n_{\text{new}}}+\frac{s_{\text{current}}^2}{n_{\text{current}}}\right)^2}{\frac{s_{\text{new}}^4}{n_{\text{new}}^2(n_{\text{new}}-1)}+\frac{s_{\text{current}}^4}{n_{\text{current}}^2(n_{\text{current}}-1)}}
$$

degrees of freedom.

Note: if we assume equal variances, the pooled standard deviation $s_p$ is used in the denominator of $T$, which becomes $s_p\sqrt{1/n_{\text{new}} + 1/n_{\text{current}}}$, and $\text{df} = n_{\text{new}} + n_{\text{current}} - 2$.

Use the corresponding R function to calculate all these values (i.e., run a two-sample $t$-test). Make sure to use broom::tidy() to get a more organized result.
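The formulas for $T$ and $\nu$ can be checked numerically against R's built-in Welch test. The sketch below is only an illustration under stated assumptions: it uses simulated Normal data with made-up means and standard deviations (not tiktok_sample), and welch_by_hand() is a hypothetical helper written for this example, not part of the worksheet's code.

```r
# Sketch: Welch's T and Welch-Satterthwaite degrees of freedom computed by hand,
# compared with stats::t.test(). The data are simulated (hypothetical values).
set.seed(301)
new     <- rnorm(100, mean = 21, sd = 7)  # hypothetical dwell times, new ad
current <- rnorm(100, mean = 16, sd = 8)  # hypothetical dwell times, current ad

welch_by_hand <- function(x_new, x_current) {
  n_new <- length(x_new)
  n_cur <- length(x_current)
  v_new <- var(x_new) / n_new      # s^2_new / n_new
  v_cur <- var(x_current) / n_cur  # s^2_current / n_current

  t_stat <- (mean(x_new) - mean(x_current)) / sqrt(v_new + v_cur)

  # (s^2/n)^2 / (n - 1) equals s^4 / (n^2 (n - 1)) for BOTH groups
  nu <- (v_new + v_cur)^2 /
    (v_new^2 / (n_new - 1) + v_cur^2 / (n_cur - 1))

  # One-sided p-value for H1: mu_new > mu_current
  p_value <- pt(t_stat, df = nu, lower.tail = FALSE)

  c(statistic = t_stat, df = nu, p.value = p_value)
}

by_hand  <- welch_by_hand(new, current)
built_in <- t.test(new, current, alternative = "greater", var.equal = FALSE)

by_hand
c(built_in$statistic, built_in$parameter, p.value = built_in$p.value)
```

If the formulas are implemented correctly, the two printed vectors agree up to floating-point error; t.test() reports the degrees of freedom as `parameter`, which is the same column that broom::tidy() returns.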
Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it. Assign your answer to an object called answer2.7.

In [61]:
answer2.7 <- tidy(t.test(
    x = tiktok_sample$dwell_time[tiktok_sample$ad_watched == "new"],
    y = tiktok_sample$dwell_time[tiktok_sample$ad_watched == "current"],
    alternative = "greater",  # H1: mu_new > mu_current
    var.equal = FALSE         # Welch test: unequal population variances
))
answer2.7
A tibble: 1 × 10

| estimate | estimate1 | estimate2 | statistic | p.value | parameter | conf.low | conf.high | method | alternative |
|---|---|---|---|---|---|---|---|---|---|
| `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<chr>` | `<chr>` |
| 5.467909 | 21.23937 | 15.77146 | 5.030197 | 5.484754e-07 | 197.6653 | 3.671506 | Inf | Welch Two Sample t-test | greater |

In [62]:
test_2.7()

Test passed 🎉 Test passed 🎉 Test passed 🎉 [1] "Success!"

Question 2.8 {points: 1}

What is your decision at the 5% significance level?

A. Since the p-value is less than 0.05, we reject $H_0$. Therefore, we have statistical evidence to state that the current ad's mean dwell time is larger than the new ad's.
B. Since the p-value is less than 0.05, we fail to reject $H_0$. Therefore, we have statistical evidence to state that the new ad's mean dwell time is equal to the current ad's.
C. Since the p-value is less than 0.05, we reject $H_0$. Therefore, we have statistical evidence to state that the new ad's mean dwell time is larger than the current ad's.
D. Since the p-value is less than 0.05, we fail to reject $H_0$. Therefore, we have statistical evidence to state that the new ad's mean dwell time is larger than the current ad's.

Assign your answer to an object called answer2.8. Your answer should be one of "A", "B", "C", or "D" surrounded by quotes.

In [63]:
answer2.8 <- "C"

In [64]:
test_2.8()

Test passed 😀 Test passed 😀 Test passed 😀
[1] "Success!"

Question 2.9 {points: 1}

Instead of using an asymptotic approximation of the sampling distribution, one could use a permutation test to test the equality of the parameters of two populations. Using 1000 replications, test the hypothesis $H_0: \mu_{\text{current}} = \mu_{\text{new}}$ vs $H_1: \mu_{\text{current}} < \mu_{\text{new}}$. Store the result in an object named tiktok_permute_results. Your answer should be a tibble with the observed test statistic and the permutation p-value estimate.

Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.

In [67]:
set.seed(10) # Don't change this
tiktok_permute_results <- tibble(
    # observed difference in sample means (new - current)
    obs_test_stat = tiktok_sample %>%
        specify(formula = dwell_time ~ ad_watched) %>%
        calculate(stat = "diff in means", order = c("new", "current")) %>%
        pull(),
    # permutation p-value; tibble() lets this column use obs_test_stat defined above
    pvalue = tiktok_sample %>%
        specify(formula = dwell_time ~ ad_watched) %>%
        hypothesize(null = "independence") %>%
        generate(reps = 1000, type = "permute") %>%
        calculate(stat = "diff in means", order = c("new", "current")) %>%
        get_p_value(obs_stat = obs_test_stat, direction = "greater") %>%
        pull()
)
tiktok_permute_results

Warning message:
“Please be cautious in reporting a p-value of 0. This result is an approximation based on the number of `reps` chosen in the `generate()` step. See `?get_p_value()` for more information.”

A tibble: 1 × 2

| obs_test_stat | pvalue |
|---|---|
| `<dbl>` | `<dbl>` |
| 5.467909 | 0 |

In [68]:
test_2.9()
Test passed 🎉 Test passed 😀 Test passed 🎉 Test passed 🎉 Test passed 😀 [1] "Success!"

3. Review of concepts related to A/B Testing

Let us reconsider the TikTok ad example, but this time assume that the company uses two samples of 10 users each, and that the dwell times in each sample are drawn from Gaussian distributions with known variances.

Note: this is not a very realistic assumption. However, for large sample sizes the $t$-distribution is very similar to the Normal distribution.

Assume further that the following plot, test_plot, summarises the information of a test at the $5\%$ significance level for the hypotheses stated in Questions 2.0 and 2.1.

In [69]:
options(repr.plot.width = 10, repr.plot.height = 9) # Adjust these numbers so the plot looks good on your screen.

norm_x_axis <- seq(-5, 10, 0.1)
norm_critical <- 1.64   # approx. one-sided 5% critical value, qnorm(0.95)
z_stat <- 2.2           # observed test statistic
norm_dens_data <- data.frame(
    x = norm_x_axis,
    y1 = dnorm(norm_x_axis),        # sampling distribution of the statistic under H0
    y2 = dnorm(norm_x_axis, 2, 1)   # sampling distribution under the plotted H1 (mean 2)
)

test_plot <- ggplot(norm_dens_data, aes(x = norm_x_axis)) +
    geom_line(aes(y = y1, colour = 'H0 is true'), size = 1.2) +
    geom_line(aes(y = y2, colour = 'H1 is true'), size = 1.2) +
    geom_area(aes(y = y1, x = ifelse(x > norm_critical, norm_x_axis, NA)), fill = 'black') +
    geom_area(aes(y = y2, x = ifelse(x > norm_critical, norm_x_axis, NA)), fill = 'blue', alpha = 0.3) +
    theme(legend.title = element_blank()) +
    labs(x = '', y = '') +
    geom_point(y = 0, x = z_stat, size = 3, shape = 19, aes(color = "z score")) +
    geom_point(y = 0, x = norm_critical, size = 3, shape = 19, aes(color = "critical val")) +
    scale_colour_manual(
        breaks = c("H0 is true", "H1 is true", "z score", "critical val"),
        values = c("black", "blue", "#f94f21", "#33CC00"),
        guide = guide_legend(override.aes = list(
            linetype = c(rep("solid", 2), rep("blank", 2)),
            shape = c(NA, NA, rep(16, 2)))))

test_plot

Warning message:
“Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
Please use `linewidth` instead.”
Warning message:
“Removed 67 rows containing non-finite values (`stat_align()`).”
Warning message:
“Removed 67 rows containing non-finite values (`stat_align()`).”

Question 3.0 {points: 1}

Based on the results illustrated in the plot above, the company found the difference to be statistically significant. Therefore, they rejected $H_0$ and started using the new ad.

True or false: The problem with this scenario is that 10 is a fairly small sample size, which considerably hinders the sensitivity of the test to detect if there's a difference. In addition, for a sample of size 10, the probability of Type I Error is very high. Therefore, the company should not rely on this result and should expand the experiment.

Assign your answer to an object called answer3.0. Your answer should be either "true" or "false", surrounded by quotes.

In [72]:
answer3.0 <- "false"

In [73]:
test_3.0()

Test passed 🎉 Test passed 😀 Test passed 🎉 [1] "Success!"

Question 3.1 {points: 1}

The Normal curves of test_plot represent:

A. The population distribution corresponding to the difference of dwell times for the new vs the current ad
B. The sampling distribution of the statistic used to test $H_0$ against $H_1$
C. The sample distribution corresponding to the difference of dwell times for the new vs the current ad

Assign your answer to an object called answer3.1. Your answer should be one of "A", "B", or "C" surrounded by quotes.

In [74]:
answer3.1 <- "B"

In [75]:
test_3.1()

Test passed 😸 Test passed 😸 Test passed 🎉 [1] "Success!"

Question 3.2 {points: 1}

Select the right option to complete the sentence below:

In test_plot, the probability of Type I error is ...

A. illustrated by the light blue area
B. not illustrated in the plot
C. illustrated by the red dot
D. the blank area of the black curve
E. illustrated by the black area

Assign your answer to an object called answer3.2. Your answer should be one of "A", "B", "C", "D", or "E" surrounded by quotes.

In [84]:
answer3.2 <- "E"

In [85]:
test_3.2()

Test passed 🎉 Test passed 🎉 Test passed 😀 [1] "Success!"

Question 3.3 {points: 1}

Select the right option to complete the sentence below:

In test_plot, the power of the test is ...
A. illustrated by the light blue area
B. not illustrated in the plot
C. illustrated by the red dot
D. the blank area of the black curve
E. illustrated by the black area

Assign your answer to an object called answer3.3. Your answer should be one of "A", "B", "C", "D", or "E" surrounded by quotes.

In [86]:
answer3.3 <- "A"

In [87]:
test_3.3()

Test passed 🎉 Test passed 🎉 Test passed 🎉 [1] "Success!"

Question 3.4 {points: 1}

In test_plot, the p-value is represented by the portion of the black area to the right of the red dot. True or false?

Assign your answer to an object called answer3.4. Your answer should be either "true" or "false", surrounded by quotes.

In [90]:
answer3.4 <- "true"

In [91]:
test_3.4()

Test passed 😀 Test passed 🎉 Test passed 🎉 [1] "Success! You are done with the first worksheet of STAT 301!!"
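The areas that Questions 3.2 to 3.4 refer to can also be computed directly. A minimal base-R sketch, assuming (as the test_plot cell does) that the test statistic follows N(0, 1) under $H_0$ and N(2, 1) under the plotted alternative, with critical value 1.64 and observed statistic 2.2; h1_mean is a name introduced here for the centre of the blue curve.

```r
# Values taken from the test_plot cell: H0 curve N(0, 1), H1 curve N(2, 1)
norm_critical <- 1.64  # rejection threshold (green dot)
z_stat <- 2.2          # observed test statistic (red dot)
h1_mean <- 2           # centre of the H1 curve in the plot

# Black area: P(statistic > critical value | H0) = probability of a type I error
type_I <- pnorm(norm_critical, mean = 0, sd = 1, lower.tail = FALSE)

# Light blue area: P(statistic > critical value | H1) = power of the test
power <- pnorm(norm_critical, mean = h1_mean, sd = 1, lower.tail = FALSE)

# Part of the H1 curve left of the critical value = probability of a type II error
type_II <- 1 - power

# Part of the black area to the right of the red dot = one-sided p-value
p_value <- pnorm(z_stat, mean = 0, sd = 1, lower.tail = FALSE)

round(c(type_I = type_I, power = power, type_II = type_II, p_value = p_value), 4)

# Rejecting because z_stat > norm_critical is equivalent to p_value < type_I
z_stat > norm_critical
```

With these numbers the black area is about 0.05 (1.64 is a rounded qnorm(0.95)), the power is only about 0.64, and the p-value is about 0.014. A small sample shows up as low power (a large light-blue-complement), not as a larger type I error, which stays at the level fixed by the critical value.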
Summary and Review

Some basic concepts to recall:

Sampling Distribution

- The sampling distribution is the distribution of a statistic (e.g., sample mean, sample proportion, $t$-statistic, $z$-score).
  - The sampling distribution is different from the sample distribution.
  - The sampling distribution is different from the population distribution.
- We need a sampling distribution to make probabilistic statements about our statistic. For example: if the population mean is actually 0 (usually this is what we want to test, since we don't know it), what is the probability that the sample mean would be greater than 1?
- The problem is that the sampling distribution is usually unknown, mainly because the population distribution is unknown.
  - You may be able to derive the sampling distribution mathematically if you know the population distribution (which is rare in practice). For example, if your sample comes from a Normal distribution, then the sample mean is Normal as well.
  - In certain cases, you can use results of the CLT if your sample size is large and additional assumptions are met. For example, for a sample of independent and identically distributed random variables, if the sample size is large, the sampling distribution of the mean is approximately Normal.
  - You can use bootstrapping (which also requires certain conditions) to approximate the sampling distribution.

Errors in Hypothesis Tests

There are 2 types of errors in a hypothesis testing problem:

- Type I error: rejecting $H_0$ when $H_0$ is true
- Type II error: failing to reject $H_0$ when $H_0$ is false

The probability of a type I error is usually called the significance level (aka $\alpha$), and it is set by the analyst when designing a test.

Another important measure used to design a test is the power:

- Power: the probability of rejecting $H_0$ when $H_0$ is false (i.e., power $= 1 - P(\text{type II error})$)

$p$-value

The $p$-value can be used to assess the significance of the observed results by comparing its value to the specified significance level:
Is $p < \alpha$?

But what is a $p$-value? It has certainly been greatly misused!

- $p$-value: the probability, under the model specified in $H_0$, that a statistic would be at least as extreme as its observed value

Note that the $p$-value is NOT:

- the probability that $H_0$ is true
- the probability that $H_0$ is false
- the probability that the statistic observed was produced by random chance alone
- a measure of the importance of the observed effect
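The relationship between type I error, power, and sample size (learning objective 5) can be made concrete with a small Monte Carlo experiment. Every setting below is an assumption chosen for illustration, not taken from the worksheet: Normal dwell times with standard deviation 8, a true difference of 4 seconds under $H_1$, one-sided Welch $t$-tests at $\alpha = 0.05$, and 2000 simulated experiments per setting. rejection_rate() is a hypothetical helper.

```r
# Monte Carlo sketch (assumed settings): how often does a one-sided Welch
# t-test reject H0: mu_new = mu_current at alpha = 0.05?
set.seed(2024)

rejection_rate <- function(n, true_diff, alpha = 0.05, reps = 2000, sigma = 8) {
  rejections <- replicate(reps, {
    current <- rnorm(n, mean = 15, sd = sigma)             # hypothetical current-ad dwell times
    new     <- rnorm(n, mean = 15 + true_diff, sd = sigma) # hypothetical new-ad dwell times
    t.test(new, current, alternative = "greater")$p.value < alpha
  })
  mean(rejections)  # proportion of simulated experiments that reject H0
}

sizes <- c(10, 50, 200)  # users per group
results <- data.frame(
  n_per_group  = sizes,
  type_I_error = sapply(sizes, rejection_rate, true_diff = 0),  # H0 true
  power        = sapply(sizes, rejection_rate, true_diff = 4)   # H1 true
)
results
```

The estimated type I error rates should hover near 0.05 for every sample size, because $\alpha$ is fixed by the analyst, while the estimated power grows with the number of users per group. Increasing reps reduces the Monte Carlo noise at the cost of run time.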