Worksheet 1: Introduction to Statistical Modelling and A/B Testing
Welcome to STAT 301: Statistical Modelling for Data Science
Each week you will complete a lecture assignment like this one. Before we get started, let's talk about some administrative details.
Hands-on practice can be very useful when you learn technical subjects!
• Weekly lecture worksheets and tutorials are an essential part of the course!
• Collaborating on lecture worksheets and tutorial assignments is more than okay: it is encouraged!
• You should rarely be stuck for more than a few minutes on questions in lecture or tutorial.
• Ask a neighbour, TA or an instructor for help (explaining things is beneficial, too: the best way to solidify your knowledge of a subject is to explain it).
• Please do not just share answers, though; work cooperatively!
Everyone must submit a copy of their own work.
You can read more about course policies on the course website.
Learning Objectives
After completing this week's worksheet and tutorial work, you will be able to:
1. Describe the goals of hypothesis testing, in particular difference in means tests related to A/B testing.
2. Give an example of a problem that requires A/B testing.
3. List methods used to test difference in means between two populations.
4. Interpret the results of hypothesis tests.
5. Explain the relation between type I and type II errors, power and sample size in 2-sample hypothesis testing.
6. Write a computer script to perform difference in means hypothesis testing and compute errors, power and p-values.
Loading packages
In [3]:
# Run this cell before continuing.
library(tidyverse)
library(infer)
library(broom)
source("tests_worksheet_01.R")
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Attaching package: ‘testthat’
The following object is masked from ‘package:dplyr’:
matches
The following object is masked from ‘package:purrr’:
is_null
The following objects are masked from ‘package:readr’:
edition_get, local_edition
The following object is masked from ‘package:tidyr’:
matches
1. Warm Up Questions
Question 1.0
{points: 1}
In DSCI 100, you learned about 6 different types of data analysis questions you can ask and answer. Moreover, in STAT 201, you reviewed what an inferential question is. Now, it is time to do a more comprehensive exercise to identify which class of data analysis a given real-life question belongs to.
The table below lists various data analysis questions in the left column:
| Question | Type |
|---|---|
| Is wearing sunscreen associated with a decreased probability of developing skin cancer in Canada? | answer1.0.0 |
| How does alcohol consumption relate to socioeconomic status in the 2018 City of Vancouver survey dataset? | answer1.0.1 |
| Does a more concise Google ad lead to an increased number of visits to the advertised company's website? | answer1.0.2 |
| How do changes in human behaviour lead to a reduction in the number of COVID-19 confirmed cases? | answer1.0.3 |
| Does a reduced caloric intake cause weight-loss? | answer1.0.4 |
| Do tweets with GIFs get on average more impressions than tweets that do not? | answer1.0.5 |
| Does including a GIF in tweets lead to more profile visits than tweets that do not include a GIF? | answer1.0.6 |
| How many mentions will my next tweet get? | answer1.0.7 |
| How many accounts are there on Twitter today? | answer1.0.8 |
| Does increasing the contrast in images lead to better visual discrimination of visually impaired image content? | answer1.0.9 |
The right column of the table is empty but should give the type of statistical question being asked, chosen from the following:
A. Descriptive.
B. Exploratory.
C. Inferential.
D. Predictive.
E. Causal.
F. Mechanistic.
Assign your answers to the objects answer1.0.0, answer1.0.1, answer1.0.2, answer1.0.3, answer1.0.4, answer1.0.5, answer1.0.6, answer1.0.7, answer1.0.8, and answer1.0.9. Each answer should be a single character ("A", "B", "C", "D", "E", or "F") surrounded by quotes.
In [6]:
answer1.0.0 <- "C"
answer1.0.1 <- "B"
answer1.0.2 <- "E"
answer1.0.3 <- "F"
answer1.0.4 <- "E"
answer1.0.5 <- "C"
answer1.0.6 <- "E"
answer1.0.7 <- "D"
answer1.0.8 <- "A"
answer1.0.9 <- "E"
In [7]:
test_1.0()
Test passed 😸
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 😸
Test passed 😀
Test passed 😸
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
[1] "Success!"
Question 1.1
{points: 1}
We must have language/terminology that we can use to discuss concepts related to experimentation and causal inference, as in A/B testing. It takes time and practice to commit these terms and corresponding definitions to our memory to use them fluidly in practice. Let us get some more training by matching language/terminology with their definitions.
Read the table below and assign the correct term in the right column.

| Definition | Term |
|---|---|
| Technique to investigate effects of several variables in one study; experimental units are assigned to all possible combinations of factors. | answer1.1.0 |
| Explanatory variable manipulated by the experimenter. | answer1.1.1 |
| The entity/object in the sample that is assigned to a treatment and for which information is collected. | answer1.1.2 |
| Repetition of an experimental treatment. | answer1.1.3 |
| Equal number of experimental units for each treatment group. | answer1.1.4 |
| Process of randomly assigning explanatory variable(s) of interest to experimental units. | answer1.1.5 |
| A combination of factor levels. | answer1.1.6 |
| Statistically comparing a key performance indicator (conversion rate, dwell time, etc.) between two versions of a webpage/app/ad to assess which one performs better. | answer1.1.7 |
Assign your answers to the objects answer1.1.0, answer1.1.1, answer1.1.2, answer1.1.3, answer1.1.4, answer1.1.5, answer1.1.6, and answer1.1.7. Each answer should be a single string ("randomization", "A/B testing", "treatment", "factor", "experimental unit", "replicate", "balanced design", or "factorial design") surrounded by quotes.
In [12]:
answer1.1.0 <- "factorial design"
answer1.1.1 <- "factor"
answer1.1.2 <- "experimental unit"
answer1.1.3 <- "replicate"
answer1.1.4 <- "balanced design"
answer1.1.5 <- "randomization"
answer1.1.6 <- "treatment"
answer1.1.7 <- "A/B testing"
In [13]:
test_1.1()
Test passed 😸
Test passed 😀
Test passed 🎉
Test passed 🎉
Test passed 😸
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 😀
Test passed 🎉
Test passed 🎉
[1] "Success!"
2. Review of hypothesis testing
The first topic you will learn in the course is A/B testing optimization. An A/B test refers to a statistical hypothesis test to compare parameters of two populations (or groups), namely Group A and Group B (hence A/B).
Many web applications have emerged in the last decade to help industries optimize their product offerings by comparing different variations. Stakeholders can collect data and use these platforms to analyze it and see if one variation (say A) improves their services over another one (say B). More generally, multiple variations can be compared and analyzed.
While A/B testing is closely related to the hypothesis testing topics covered in STAT 201, you will soon recognize some important limitations of the methods you have learned before.
A/B testing
We are trying to compare the parameters of two populations. The parameters being compared can vary depending on the problem. For example, you could be interested in
• the proportion of website visitors who register for the newsletter
• the average amount of money spent by each visitor
• the efficacy of a new drug
Naturally, the statistical analysis will change depending on the parameters being tested (remember the different formulae for hypothesis testing using CLT you learned in STAT 201?).
Let's review hypothesis testing using the following example:
Suppose a company's marketing team has developed a new video for their TikTok ad. They want to know if this new ad will increase the ad engagement (which they will measure via ad dwell time in seconds, i.e., a continuous response) compared to the current ad they are running.
Question 2.0
{points: 1}
The null hypothesis, $H_0$, generally refers to the status quo, i.e., there is no change in ad engagement. Let $\mu_{\text{new}}$ and $\mu_{\text{current}}$ be the mean dwell times of the new and current ads, respectively. What is the null hypothesis we are testing?
A. $H_0: \mu_{\text{new}} > \mu_{\text{current}}$
B. $H_0: \mu_{\text{new}} < \mu_{\text{current}}$
C. $H_0: \mu_{\text{new}} = \mu_{\text{current}}$
D. $H_0: \mu_{\text{new}} \neq \mu_{\text{current}}$
Assign your answer to an object called answer2.0. Your answer should be one of "A", "B", "C", or "D" surrounded by quotes.
In [16]:
answer2.0 <- "C"
In [17]:
test_2.0()
Test passed 😸
Test passed 🎉
Test passed 🎉
[1] "Success!"
Question 2.1
{points: 1}
The alternative hypothesis, $H_1$, generally refers to the researcher's hypothesis of interest, i.e., the new ad increases the ad engagement. Let $\mu_{\text{new}}$ and $\mu_{\text{current}}$ be the mean dwell times of the new and current ads, respectively. What is the alternative hypothesis we are testing?
A. $H_1: \mu_{\text{new}} > \mu_{\text{current}}$
B. $H_1: \mu_{\text{new}} < \mu_{\text{current}}$
C. $H_1: \mu_{\text{new}} = \mu_{\text{current}}$
D. $H_1: \mu_{\text{new}} \neq \mu_{\text{current}}$
Assign your answer to an object called answer2.1. Your answer should be one of "A", "B", "C", or "D" surrounded by quotes.
In [18]:
answer2.1 <- "A"
In [19]:
test_2.1()
Test passed 🎉
Test passed 🎉
Test passed 🎉
[1] "Success!"
Question 2.2
{points: 1}
The company would like to run an experiment on TikTok users in the age group that most of their customers fall into (between 16 and 24 years old) to compare the mean dwell times of both populations (current vs new ads).
The company plans to collect a representative sample of $n = 2000$ TikTok users and randomize them so that each user views one of the two ads. The sample will be split in half, i.e., $n_{\text{current}} = n_{\text{new}} = 1000$.
Once the data is collected, we will conduct a hypothesis test. This analysis will depend on the nature of our response and on the approach we want to use (e.g., Bootstrapping, Central Limit Theorem). If we opt for using the CLT to conduct the analysis, what is the specific test we need to perform?
A. One-sample $z$-test.
B. One-sample $t$-test.
C. Two-sample $z$-test.
D. Two-sample $t$-test.
E. Two-way ANOVA.
Assign your answer to an object called answer2.2. Your answer should be one of "A", "B", "C", "D", or "E" surrounded by quotes.
In [24]:
answer2.2 <- "D"
In [25]:
test_2.2()
Test passed 🎉
Test passed 🎉
Test passed 🎉
[1] "Success!"
Simulation Study
In practice, we would run an A/B test by drawing a sample of $n$ experimental units (i.e., subjects) from the populations being studied
• in our example: $n$ TikTok users
Then, we split the subjects in the sample so that some of them receive one of the treatments and the remaining subjects receive the other treatment
• in our example: some will watch the current ad and others the new one
However, in practice we never know whether a difference truly exists between the two populations examined, or the magnitude of this difference (effect size).
Thus, to explore the behavior of different inference methods, in this exercise we are going to use simulated data to have full control (and knowledge) of all the parameters used to generate the data.
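To make the idea of "full control of the parameters" concrete, here is a minimal sketch of building a synthetic population with a known effect size. The population size, the Gamma parameters, and the names `toy_pop` and `true_effect` are illustrative assumptions; they are not the values used to generate the worksheet's data.

```r
# Hypothetical stand-in population: every parameter below is an assumption
# chosen for illustration, not the one behind data/tiktok_pop.csv.
set.seed(301)
n_pop <- 100000
true_effect <- 2  # known difference in mean dwell time (seconds)

toy_pop <- data.frame(
  user = paste("User", seq_len(n_pop)),
  dwell_time_current_ad = rgamma(n_pop, shape = 16, rate = 1),
  dwell_time_new_ad = rgamma(n_pop, shape = 16, rate = 1) + true_effect
)

# Because we generated the data ourselves, the true difference is known.
pop_diff <- mean(toy_pop$dwell_time_new_ad) - mean(toy_pop$dwell_time_current_ad)
pop_diff
```

With a population like this, any inference method can be checked against the known `true_effect`, which is impossible with real experimental data.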
Simulated data
Suppose we have two populations of one million TikTok users each who are between 16 and 24 years old. Users in one population have watched the current ad, while users in the other population have watched the new ad. Assume that these are the entire populations.
The object tiktok_pop stores the dwell time (in seconds) of each user for the current ad (dwell_time_current_ad) and of each user for the new ad (dwell_time_new_ad).
In [26]:
# run this cell before continuing
tiktok_pop <-
read_csv("data/tiktok_pop.csv")
head(tiktok_pop)
Rows: 1000000 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): user
dbl (2): dwell_time_current_ad, dwell_time_new_ad
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
A tibble: 6 × 3
user    dwell_time_current_ad  dwell_time_new_ad
<chr>   <dbl>                  <dbl>
User 1  13.68792               29.18848
User 2  15.25864               18.64276
User 3  22.55960               22.96510
User 4  23.51949               18.97294
User 5  15.99553               22.66494
User 6  29.47273               17.44211
Question 2.3
{points: 1}
Calculate the (true) population means and the (true) population standard deviations of dwell time for both ads.
Save the result in a tibble called tiktok_true_params. The tibble should have four columns: mean_current_ad, sd_current_ad, mean_new_ad, and sd_new_ad.
Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.
In [31]:
# Compute the true population mean and standard deviation of dwell time for each ad
tiktok_true_params <- tiktok_pop %>%
  summarize(
    mean_current_ad = mean(dwell_time_current_ad),
    sd_current_ad = sd(dwell_time_current_ad),
    mean_new_ad = mean(dwell_time_new_ad),
    sd_new_ad = sd(dwell_time_new_ad)
  )
In [32]:
test_2.3()
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
[1] "Success!"
Question 2.4
{points: 1}
Although in the previous exercise we assumed that we have access to both populations and their (true) parameters, this is not the case in practice!
Let's see how things actually work in practice.
Here's what you need to do:
1. Take one sample of 200 users.
2. Assume that the first 100 users in our sample will watch the current ad, and the remaining will watch the new ad (note: since the data have been generated, the dwell times of each ad are already available in the second and third columns, respectively).
Save the sample in a tibble called tiktok_sample. The tibble should have three columns: user, ad_watched, and dwell_time.
Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.
In [33]:
set.seed(432121) # do not change this!
tiktok_sample <- tiktok_pop %>%
  rep_sample_n(size = 200) %>%
  ungroup() %>%
  mutate(
    row = row_number(),
    ad_watched = if_else(row <= 100, "current", "new"),
    dwell_time = if_else(row <= 100, dwell_time_current_ad, dwell_time_new_ad)
  ) %>%
  select(user, ad_watched, dwell_time)
head(tiktok_sample)
A tibble: 6 × 3
user         ad_watched  dwell_time
<chr>        <chr>       <dbl>
User 644538  current     31.72216
User 929727  current     16.01492
User 178752  current     21.25067
User 685992  current     16.27671
User 306869  current     11.89419
User 278642  current     23.30618
In [34]:
test_2.4()
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
[1] "Success!"
Question 2.5
{points: 1}
Once we have collected our experimental samples for both treatments, it is time to conduct a statistical analysis.
However, before testing the hypotheses stated in Questions 2.0 and 2.1, it is always good to graphically compare the distributions of both samples of dwell times.
Make side-by-side boxplots of each sample distribution, current and new, as identified in the ad_watched column. Since boxplots do not show the sample means, let's add a point on top of each boxplot to represent the sample mean dwell time. The function stat_summary() can help with that.
Store the plot in an object named dwell_time_boxplots.
Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.
In [35]:
options(repr.plot.width = 15, repr.plot.height = 9)
dwell_time_boxplots <- tiktok_sample %>%
  ggplot(aes(x = ad_watched, y = dwell_time, fill = ad_watched)) +
  geom_boxplot() +
  theme(
    text = element_text(size = 22),
    plot.title = element_text(face = "bold"),
    axis.title = element_text(face = "bold")
  ) +
  ggtitle("Boxplots of Dwell Time for Current and New Ads") +
  xlab("Ad Watched") +
  ylab("Dwell Time") +
  # "none" replaces the deprecated FALSE (ggplot2 >= 3.3.4) to hide the fill legend
  guides(fill = "none") +
  # Overlay the sample mean of each group as a yellow diamond
  stat_summary(
    aes(x = ad_watched, y = dwell_time, fill = ad_watched),
    fun = mean, colour = "yellow", geom = "point",
    shape = 18, size = 5
  )
dwell_time_boxplots
In [36]:
test_2.5()
Test passed 😀
Test passed 🎉
Test passed 😸
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 😸
Test passed 🎉
Test passed 😸
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
Test passed 🎉
[1] "Success!"
Question 2.6
{points: 1}
Based on your findings in Question 2.5, what can you conclude about dwell times of TikTok users?
A. The current ad's sample median and sample mean are higher than those of the new ad. Moreover, both data spreads are quite similar.
B. The new ad's sample median and sample mean are statistically significantly higher than those of the current ad.
C. The new ad's population median and population mean seem to be higher than those of the current ad. However, there's the possibility that the observed difference is due to sampling variability. Both data spreads are quite similar.
D. The new ad's population median and population mean are higher than those of the current ad. Moreover, the data spreads are quite different by treatment.
Assign your answer to an object called answer2.6. Your answer should be one of "A", "B", "C", or "D" surrounded by quotes.
In [37]:
answer2.6 <- "C"
In [38]:
test_2.6()
Test passed 🎉
Test passed 🎉
Test passed 😸
[1] "Success!"
Question 2.7
{points: 1}
The previous plot shows that the new ad's sample mean dwell time is higher than that of the current ad. Nonetheless, given the variations found in each treatment's dwell times, how likely would it be for us to see a sample difference in means at least as extreme as the observed one if there were no difference in population means?
In other words, is the observed difference statistically significant?
Recall the 2-sample $t$-test that you've learned in STAT 201 to test the hypotheses stated in Questions 2.0 and 2.1. The test statistic to conduct this test (with unequal population variances) is defined as:
$$ T = \frac{\bar{x}_{\text{new}} - \bar{x}_{\text{current}}}{\sqrt{\frac{s^2_{\text{new}}}{n_{\text{new}}}+\frac{s^2_{\text{current}}}{n_{\text{current}}}}} $$
where $\bar{x}_{\text{new}}$ and $\bar{x}_{\text{current}}$ are the sample means of the dwell times for the new and current ads, respectively; $s^2_{\text{new}}$ and $s^2_{\text{current}}$ are the sample variances of the dwell times for the new and current ads, respectively; and $n_{\text{new}}$ and $n_{\text{current}}$ are the sample sizes for the new and current ads, respectively.
Furthermore, without making further distributional assumptions and using results of the CLT, under the null hypothesis $H_0$, the $T$ statistic approximately follows a $t$-distribution with approximately
$$ \nu = \frac{ \left(\frac{s_{\text{new}}^2}{n_{\text{new}}}+\frac{s_{\text{current}}^2}{n_{\text{current}}}\right)^2 } { \frac{s_{\text{new}}^4}{n_{\text{new}}^2(n_{\text{new}}-1)}+\frac{s_{\text{current}}^4}{n_{\text{current}}^2(n_{\text{current}}-1)} } $$
degrees of freedom.
• Note: if we assume equal variances, the pooled standard deviation is used in the denominator of $T$ and $\text{df} = n_{\text{new}} + n_{\text{current}} - 2$.
Use the corresponding R function to calculate all these values (i.e., compute the 2-sample $t$-test). Make sure to use broom::tidy() to get a more organized result.
Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it. Assign your answer to an object called answer2.7.
In [61]:
answer2.7 <- tidy(t.test(
  x = tiktok_sample$dwell_time[tiktok_sample$ad_watched == "new"],
  y = tiktok_sample$dwell_time[tiktok_sample$ad_watched == "current"],
  alternative = "greater", # H1: mean of x (new) is greater than mean of y (current)
  var.equal = FALSE # Welch test: do not assume equal population variances
))
answer2.7
A tibble: 1 × 10
estimate  estimate1  estimate2  statistic  p.value       parameter  conf.low  conf.high  method                   alternative
<dbl>     <dbl>      <dbl>      <dbl>      <dbl>         <dbl>      <dbl>     <dbl>      <chr>                    <chr>
5.467909  21.23937   15.77146   5.030197   5.484754e-07  197.6653   3.671506  Inf        Welch Two Sample t-test  greater
In [62]:
test_2.7()
Test passed 🎉
Test passed 🎉
Test passed 🎉
[1] "Success!"
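The Welch statistic $T$ and the degrees of freedom $\nu$ defined in Question 2.7 can also be computed by hand and compared with `t.test()`. The sketch below uses hypothetical Normal samples (`x_new`, `x_current`, with assumed means and standard deviations) rather than the worksheet's `tiktok_sample`.

```r
# Hypothetical samples standing in for the dwell times of each group.
set.seed(1)
x_new <- rnorm(100, mean = 21, sd = 8)
x_current <- rnorm(100, mean = 16, sd = 7)

# Squared standard error of each sample mean: s^2 / n
se2_new <- var(x_new) / length(x_new)
se2_current <- var(x_current) / length(x_current)

# Welch test statistic and approximate degrees of freedom
t_manual <- (mean(x_new) - mean(x_current)) / sqrt(se2_new + se2_current)
df_manual <- (se2_new + se2_current)^2 /
  (se2_new^2 / (length(x_new) - 1) + se2_current^2 / (length(x_current) - 1))

# One-sided p-value for H1: mu_new > mu_current
p_manual <- pt(t_manual, df = df_manual, lower.tail = FALSE)

# Built-in equivalent for comparison
welch <- t.test(x_new, x_current, alternative = "greater", var.equal = FALSE)
c(t_manual = t_manual, df_manual = df_manual, p_manual = p_manual)
```

Note that `se2_new^2 / (n - 1)` equals $s^4 / (n^2 (n - 1))$, so both terms in the denominator of $\nu$ carry the fourth power of the sample standard deviation.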
Question 2.8
{points: 1}
What is your decision at the 5% significance level?
A. Since the p-value is less than 0.05, we reject $H_0$. Therefore, we have statistical evidence to state that the current ad's mean dwell time is larger than the new ad's one.
B. Since the p-value is less than 0.05, we fail to reject $H_0$. Therefore, we have statistical evidence to state that the new ad's mean dwell time is equal to the current ad's one.
C. Since the p-value is less than 0.05, we reject $H_0$. Therefore, we have statistical evidence to state that the new ad's mean dwell time is larger than the current ad's one.
D. Since the p-value is less than 0.05, we fail to reject $H_0$. Therefore, we have statistical evidence to state that the new ad's mean dwell time is larger than the current ad's one.
Assign your answer to an object called answer2.8. Your answer should be one of "A", "B", "C", or "D" surrounded by quotes.
In [63]:
answer2.8 <- "C"
In [64]:
test_2.8()
Test passed 😀
Test passed 😀
Test passed 😀
[1] "Success!"
Question 2.9
{points: 1}
Instead of using an asymptotic approximation of the sampling distribution, one could use a permutation test to test the equality of the parameters of two populations. Using 1000 replications, test the hypothesis $H_0: \mu_{\text{current}}=\mu_{\text{new}}$ vs $H_1: \mu_{\text{current}}<\mu_{\text{new}}$.
Store the result in an object named tiktok_permute_results. Your answer should be a tibble with the observed test statistic and the permutation p-value estimate.
Fill out those parts indicated with ..., uncomment the corresponding code in the cell below, and run it.
In [67]:
set.seed(10) # Don't change this
tiktok_permute_results <- tibble(
  # Observed difference in sample means (new minus current)
  obs_test_stat = tiktok_sample %>%
    specify(formula = dwell_time ~ ad_watched) %>%
    calculate(stat = "diff in means", order = c("new", "current")) %>%
    pull(),
  # Permutation p-value: share of shuffled differences at least as large as the observed one
  pvalue = tiktok_sample %>%
    specify(formula = dwell_time ~ ad_watched) %>%
    hypothesize(null = "independence") %>%
    generate(reps = 1000, type = "permute") %>%
    calculate(stat = "diff in means", order = c("new", "current")) %>%
    get_p_value(obs_stat = obs_test_stat, direction = "greater") %>%
    pull()
)
tiktok_permute_results
Warning message:
“Please be cautious in reporting a p-value of 0. This result is an approximation based on the number of `reps` chosen in the `generate()` step. See `?get_p_value()` for more information.”
A tibble: 1 × 2
obs_test_stat  pvalue
<dbl>          <dbl>
5.467909       0
In [68]:
test_2.9()
Test passed 🎉
Test passed 😀
Test passed 🎉
Test passed 🎉
Test passed 😀
[1] "Success!"
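The warning about reporting a p-value of 0 arises because the estimate is the fraction of permuted statistics that reach the observed one, and with a finite number of permutations that fraction can be exactly zero. The sketch below reimplements the permutation test in base R on hypothetical data (the group means and standard deviations are assumptions) and also shows a common "+1" adjustment that counts the observed statistic as one of the permutations. This adjustment is a swapped-in convention, not what `get_p_value()` reports.

```r
# Hypothetical samples; the group means and SDs are assumptions.
set.seed(10)
dwell_time <- c(rnorm(100, mean = 16, sd = 7), rnorm(100, mean = 21, sd = 8))
ad_watched <- rep(c("current", "new"), each = 100)

# Test statistic: difference in sample means (new minus current)
diff_in_means <- function(y, g) mean(y[g == "new"]) - mean(y[g == "current"])
obs_stat <- diff_in_means(dwell_time, ad_watched)

# Shuffle the labels to simulate the statistic's distribution under H0.
reps <- 1000
null_stats <- replicate(reps, diff_in_means(dwell_time, sample(ad_watched)))

# Raw proportion (can be exactly 0) and the "+1" adjusted estimate.
p_raw <- mean(null_stats >= obs_stat)
p_adjusted <- (sum(null_stats >= obs_stat) + 1) / (reps + 1)
c(obs_stat = obs_stat, p_raw = p_raw, p_adjusted = p_adjusted)
```

With the adjustment, the smallest reportable p-value is $1/(\text{reps} + 1)$, which makes explicit that the resolution of the estimate depends on the number of permutations.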
3. Review of concepts related to A/B Testing
Let us reconsider the TikTok ad example but this time assume that the company uses two samples of 10 users each, and that the dwell times for each sample are drawn from Gaussian distributions with known variances.
• Note: this is not a very realistic assumption. However, for large sample sizes the t-distribution is very similar to the Normal distribution.
Assume further that the following plot, test_plot, summarises the information of a test at $5\%$ significance level for the hypotheses stated in Questions 2.0 and 2.1.
In [69]:
options(repr.plot.width = 10, repr.plot.height = 9) # Adjust these numbers so the plot looks good on your screen.
norm_x_axis <- seq(-5, 10, 0.1)
norm_critical <- 1.64
z_stat <- 2.2
# Densities of the test statistic under H0 (mean 0) and under H1 (mean 2)
norm_dens_data <- data.frame(x = norm_x_axis, y1 = dnorm(norm_x_axis), y2 = dnorm(norm_x_axis, 2, 1))
test_plot <- ggplot(norm_dens_data, aes(x = x)) +
  geom_line(aes(y = y1, colour = "H0 is true"), linewidth = 1.2) +
  geom_line(aes(y = y2, colour = "H1 is true"), linewidth = 1.2) +
  # Shade the area to the right of the critical value under each curve
  geom_area(aes(y = y1, x = ifelse(x > norm_critical, x, NA)), fill = "black") +
  geom_area(aes(y = y2, x = ifelse(x > norm_critical, x, NA)), fill = "blue", alpha = 0.3) +
  theme(legend.title = element_blank()) +
  labs(x = "", y = "") +
  geom_point(y = 0, x = z_stat, size = 3, shape = 19, aes(color = "z score")) +
  geom_point(y = 0, x = norm_critical, size = 3, shape = 19, aes(color = "critical val")) +
  scale_colour_manual(
    breaks = c("H0 is true", "H1 is true", "z score", "critical val"),
    values = c("black", "blue", "#f94f21", "#33CC00"),
    guide = guide_legend(override.aes = list(
      linetype = c(rep("solid", 2), rep("blank", 2)),
      shape = c(NA, NA, rep(16, 2))
    ))
  )
test_plot
Warning message:
“Removed 67 rows containing non-finite values (`stat_align()`).”
Warning message:
“Removed 67 rows containing non-finite values (`stat_align()`).”
Question 3.0
{points: 1}
Based on the results illustrated in the plot above, the company found the difference to be statistically significant.
Therefore they rejected $H_0$ and started using the new ad. True or false:
The problem with this scenario is that 10 is a fairly small sample size, which considerably hinders the sensitivity of the test to detect if there's a difference. In addition, for a sample of size 10, the probability of Type I Error is very high. Therefore, the company should not rely on this result and expand the experiment.
Assign your answer to an object called answer3.0. Your answer should be either "true" or "false", surrounded by quotes.
In [72]:
answer3.0 <- "false"
In [73]:
test_3.0()
Test passed 🎉
Test passed 😀
Test passed 🎉
[1] "Success!"
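The statement in Question 3.0 mixes two ideas: sample size affects the power of a test, while the probability of a type I error is fixed by the chosen significance level. A minimal sketch with `power.t.test()` from the `stats` package illustrates this. The effect size `delta` and standard deviation `sigma` are assumed values, and `power.t.test()` uses the equal-variance two-sample $t$-test rather than the known-variance setting of this section.

```r
# Assumed effect size and SD (illustrative values only).
delta <- 5      # true difference in means
sigma <- 8      # common standard deviation
alpha <- 0.05   # type I error rate, fixed by the analyst

# Power of a one-sided two-sample t-test for several per-group sample sizes
n_per_group <- c(10, 50, 100, 500)
power_by_n <- sapply(n_per_group, function(n) {
  power.t.test(
    n = n, delta = delta, sd = sigma, sig.level = alpha,
    type = "two.sample", alternative = "one.sided"
  )$power
})

data.frame(n_per_group, alpha, power = round(power_by_n, 3))
```

The `alpha` column stays constant across rows while the power increases with `n_per_group`: a small sample makes the test less sensitive, but it does not inflate the type I error.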
Question 3.1
{points: 1}
The Normal curves of test_plot represent:
A. The population distribution corresponding to the difference of dwell times for the new vs the current ad
B. The sampling distribution of the statistic used to test $H_0$ against $H_1$
C. The sample distribution corresponding to the difference of dwell times for the new vs the current ad
Assign your answer to an object called answer3.1. Your answer should be one of "A", "B", or "C" surrounded by quotes.
In [74]:
answer3.1 <- "B"
In [75]:
test_3.1()
Test passed 😸
Test passed 😸
Test passed 🎉
[1] "Success!"
Question 3.2
{points: 1}
Select the right option to complete the sentence below:
In test_plot, the probability of Type I error is ...
A. illustrated by the light blue area
B. not illustrated in the plot
C. illustrated by the red dot
D. the blank area of the black curve
E. illustrated by the black area
Assign your answer to an object called answer3.2. Your answer should be one of "A", "B", "C", "D", or "E" surrounded by quotes.
In [84]:
answer3.2 <- "E"
In [85]:
test_3.2()
Test passed 🎉
Test passed 🎉
Test passed 😀
[1] "Success!"
Question 3.3
{points: 1}
Select the right option to complete the sentence below:
In test_plot, the power of the test is ...
A. illustrated by the light blue area
B. not illustrated in the plot
C. illustrated by the red dot
D. the blank area of the black curve
E. illustrated by the black area
Assign your answer to an object called answer3.3. Your answer should be one of "A", "B", "C", "D", or "E" surrounded by quotes.
In [86]:
answer3.3 <- "A"
In [87]:
test_3.3()
Test passed 🎉
Test passed 🎉
Test passed 🎉
[1] "Success!"
Question 3.4
{points: 1}
In test_plot, the p-value is represented by the portion of the black area to the right of the red dot. True or false?
Assign your answer to an object called answer3.4. Your answer should be either "true" or "false", surrounded by quotes.
In [90]:
answer3.4 <- "true"
In [91]:
test_3.4()
Test passed 😀
Test passed 🎉
Test passed 🎉
[1] "Success! You are done with the first worksheet of STAT 301!!"
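The areas discussed in Questions 3.2 to 3.4 can be computed directly with `pnorm()`. The sketch below reuses the critical value (1.64), the observed z score (2.2), and the mean of the "H1 is true" curve (2) from the code that builds `test_plot`; the name `mu_h1` is introduced here for illustration.

```r
norm_critical <- 1.64  # critical value used in test_plot
z_stat <- 2.2          # observed z score in test_plot
mu_h1 <- 2             # mean of the "H1 is true" curve in test_plot

# Type I error: area under the H0 curve to the right of the critical value
type_I_error <- pnorm(norm_critical, lower.tail = FALSE)

# Power: area under the H1 curve to the right of the critical value
power <- pnorm(norm_critical, mean = mu_h1, sd = 1, lower.tail = FALSE)
type_II_error <- 1 - power

# p-value: area under the H0 curve to the right of the observed z score
p_value <- pnorm(z_stat, lower.tail = FALSE)

round(c(type_I_error = type_I_error, power = power,
        type_II_error = type_II_error, p_value = p_value), 4)
```

Because the observed z score lies to the right of the critical value, the p-value is smaller than the type I error probability, which is why $H_0$ is rejected in this scenario.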
Summary and Review
Some basic concepts to recall:
Sampling Distribution
• The sampling distribution is the distribution of a statistic (e.g., sample mean, sample proportion, t-statistic, z-score).
• The sampling distribution is different from the sample distribution.
• The sampling distribution is different from the population distribution.
• We need a sampling distribution to make probabilistic statements about our statistic.
  • For example: if the population mean is actually 0 (we usually want to test this, you don't know it), what is the probability that the sample mean would be greater than 1?
• The problem is that the sampling distribution is usually unknown, mainly because the population distribution is unknown.
  • You may be able to derive mathematically the sampling distribution if you know the population distribution (rarely in practice).
    • For example, if your sample comes from a Normal distribution, then the sample mean is Normal as well.
  • In certain cases, you can use results of the CLT if your sample size is large and additional assumptions are met.
    • For example, for a sample of independent and identically distributed random variables, if the sample size is large, the sampling distribution of the mean is approximately Normal.
  • You can use bootstrapping (although conditions exist as well) to approximate the sampling distribution.
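When the population is known, as in a simulation, the sampling distribution of the mean can be approximated by drawing many samples and recording each sample mean. The sketch below assumes an Exponential(1) population (right-skewed, with mean 1 and standard deviation 1), a sample size of 50, and 5000 repetitions; all of these are illustrative choices.

```r
set.seed(2024)
n <- 50        # sample size (assumed)
reps <- 5000   # number of repeated samples (assumed)

# Draw `reps` samples from a skewed population and keep each sample mean.
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# CLT prediction: approximately Normal with mean 1 and SD 1 / sqrt(n)
c(
  mean_of_means = mean(sample_means),
  sd_of_means = sd(sample_means),
  clt_sd = 1 / sqrt(n)
)
```

A histogram of `sample_means` (for example with `hist(sample_means)`) should look roughly bell-shaped even though the population itself is strongly skewed.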
Errors in Hypothesis Tests
There are 2 types of errors in a hypothesis testing problem:
• Type I error: rejecting $H_0$ when $H_0$ is true
• Type II error: failing to reject $H_0$ when $H_0$ is false
The probability of a type I error is usually called the significance level (aka $\alpha$) and it is set by the analyst when designing a test.
Another important measure used to design a test is the power:
• Power: the probability of rejecting $H_0$ when $H_0$ is false (i.e., power = $1 - P(\text{type II error})$)
$p$-value
The $p$-value can be used to assess the significance of the observed results by comparing its value to the specified significance level:
• Is $p < \alpha$?
But what is a $p$-value? It has been greatly misused for sure!
• $p$-value: the probability, under the model specified in $H_0$, that a statistic would be at least as extreme as its observed value
Note that the $p$-value is NOT:
• the probability that $H_0$ is true
• the probability that $H_0$ is false
• the probability that the statistic observed was produced by random chance alone
• a measure of the importance of the observed effect
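One way to see what the rule "reject when $p < \alpha$" does is to simulate many experiments in which $H_0$ is true by construction and count how often it is rejected. The sketch below assumes Normal dwell times with the same mean and standard deviation in both groups, 30 users per group, and 2000 repetitions; these values are illustrative.

```r
set.seed(42)
alpha <- 0.05
reps <- 2000

# Both groups share the same distribution, so H0 is true by construction.
p_values <- replicate(reps, {
  x <- rnorm(30, mean = 16, sd = 7)
  y <- rnorm(30, mean = 16, sd = 7)
  t.test(x, y, alternative = "greater", var.equal = FALSE)$p.value
})

# Fraction of false rejections (estimated type I error rate)
type_I_rate <- mean(p_values < alpha)
type_I_rate
```

The estimated rate should be close to `alpha`: the significance level controls how often a true $H_0$ is rejected, and the $p$-value of any single experiment says nothing about the probability that $H_0$ itself is true.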