worksheet_01

html

School

University of California, Los Angeles

Course

301

Subject

Statistics

Date

Feb 20, 2024

Type

html

Uploaded by DeaconEnergyCat30

Worksheet 1: Introduction to Statistical Modelling and A/B Testing

Welcome to STAT 301: Statistical Modelling for Data Science

Each week you will complete a lecture assignment like this one. Before we get started, let's talk about some administrative details.

Hands-on practice can be very useful when you learn technical subjects!

• Weekly lecture worksheets and tutorials are an essential part of the course!
• Collaborating on lecture worksheets and tutorial assignments is more than okay -- it is encouraged!
• You should rarely be stuck for more than a few minutes on questions in lecture or tutorial.
• Ask a neighbour, a TA or an instructor for help (explaining things is beneficial, too -- the best way to solidify your knowledge of a subject is to explain it).
• Please do not just share answers, though; work cooperatively!

Everyone must submit a copy of their own work. You can read more about course policies on the course website.

Learning Objectives

After completing this week's worksheet and tutorial work, you will be able to:

1. Describe the goals of hypothesis testing, in particular difference in means tests related to A/B testing.
2. Give an example of a problem that requires A/B testing.
3. List methods used to test difference in means between two populations.
4. Interpret the results of hypothesis tests.
5. Explain the relation between type I and type II errors, power and sample size in 2-sample hypothesis testing.
6. Write a computer script to perform difference in means hypothesis testing and compute errors, power and p-values.

Loading packages

In [3]:
# Run this cell before continuing.
library(tidyverse)
library(infer)
library(broom)
source("tests_worksheet_01.R")

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: ‘testthat’

The following object is masked from ‘package:dplyr’:

    matches

The following object is masked from ‘package:purrr’:

    is_null

The following objects are masked from ‘package:readr’:

    edition_get, local_edition

The following object is masked from ‘package:tidyr’:

    matches

1. Warm Up Questions

Question 1.0
{points: 1}

In DSCI 100, you learned about 6 different types of data analysis questions you can ask and answer. Moreover, in STAT 201, you reviewed what an inferential question is. Now, it is time to do a more comprehensive exercise to identify which class of data analysis a given real-life question belongs to.

The table below lists various data analysis questions in the left column:

| Question | Type |
|---|---|
| Is wearing sunscreen associated with a decreased probability of developing skin cancer in Canada? | answer1.0.0 |
| How does alcohol consumption relate to socioeconomic status in the 2018 City of Vancouver survey dataset? | answer1.0.1 |
| Does a more concise Google ad lead to an increased number of visits to the advertised company's website? | answer1.0.2 |
| How do changes in human behaviour lead to a reduction in the number of COVID-19 confirmed cases? | answer1.0.3 |
| Does a reduced caloric intake cause weight-loss? | answer1.0.4 |
| Do tweets with GIFs get on average more impressions than tweets that do not? | answer1.0.5 |
| Does including a GIF in tweets lead to more profile visits than tweets that do not include a GIF? | answer1.0.6 |
| How many mentions will my next tweet get? | answer1.0.7 |
| How many accounts are there on Twitter today? | answer1.0.8 |
| Does increasing the contrast in images lead to better visual discrimination of visually impaired image content? | answer1.0.9 |

The right column of the table is empty but should state which of the following types of statistical question is being asked:

A. Descriptive.
B. Exploratory.
C. Inferential.
D. Predictive.
E. Causal.
F. Mechanistic.

Assign your answers to the objects answer1.0.0, answer1.0.1, answer1.0.2, answer1.0.3, answer1.0.4, answer1.0.5, answer1.0.6, answer1.0.7, answer1.0.8, and answer1.0.9. Each answer should be a single character ("A", "B", "C", "D", "E", or "F") surrounded by quotes.

In [6]:
answer1.0.0 <- "C"
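
Several of the questions in the table, such as whether tweets with GIFs get more impressions on average, compare the means of two groups. This is the kind of difference in means test named in the learning objectives. The following is a minimal sketch of such a test in base R, not the worksheet's own solution. The data are simulated, and every name and number in it (`impressions_gif`, `impressions_no_gif`, the group means, the standard deviation, the sample size) is a made-up assumption for illustration.

```r
# Simulated data only: all values below are hypothetical, not from the worksheet.
set.seed(301)
n <- 200                                              # assumed tweets per group
impressions_gif    <- rnorm(n, mean = 520, sd = 100)  # hypothetical group with GIFs
impressions_no_gif <- rnorm(n, mean = 500, sd = 100)  # hypothetical group without GIFs

# Welch two-sample t-test for a difference in means.
# H0: mu_gif - mu_no_gif = 0   vs   H1: mu_gif - mu_no_gif > 0
test_result <- t.test(impressions_gif, impressions_no_gif,
                      alternative = "greater", var.equal = FALSE)

observed_diff <- mean(impressions_gif) - mean(impressions_no_gif)
p_value <- test_result$p.value

# Approximate power to detect the assumed true difference of 20 impressions.
# power.t.test() is based on the classic equal-variance t-test, so this is
# only an approximation for the Welch test used above.
power_result <- power.t.test(n = n, delta = 20, sd = 100, sig.level = 0.05,
                             type = "two.sample", alternative = "one.sided")

cat("Observed difference in means:", round(observed_diff, 2), "\n")
cat("p-value:", signif(p_value, 3), "\n")
cat("Approximate power:", round(power_result$power, 3), "\n")
```

A small p-value would suggest rejecting H0 at the chosen significance level (0.05 here, which is also the type I error rate). One minus the power is the type II error rate under the assumed effect size. Increasing `n` in the sketch raises the power, which illustrates the link between sample size and errors.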
