Lecture 22

pdf

School

McGill University *

*We aren’t endorsed by this school

Course

MISC

Subject

Industrial Engineering

Date

Oct 30, 2023

Type

pdf

Pages

46

Uploaded by AAT1234567

Report
Lecture 22: Multi arm bandits (Continued) + AB Testing D ATA - DRIVEN M ODELS F OR O PERATIONS A NALYTICS 1
A NNOUNCEMENTS HW6 (Individual) is posted and is due on April 8th (Multi Arm Bandits) I will count the best 3 individual HWs out of 4 HW 5 grades will be posted this weekend Next lecture will be Project Work Day 2
P ROJECT P RESENTATION Please submit by April 5th your presentation slides (10 min-2min questions) Discuss the background, the data, the approach and methodology, and results (if any) We will together select top 3 presentation teams. These groups will get some awards J You will be graded for submission of the slides. The numerical grade will be for the report. Example of slides are posted on myCourses 3
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
P ROJECT REPORT Please submit 4 page-report + appendices (Max 10 pages) + Code, by April 11 th Examples of reports on myCourses 4
F INAL EXAM Cumulative, Duration 3h Covers: Regression, Times Series, Network Analysis, Linear and integer programming, Capacity Allocation and Opportunity costs (Bid price policies), Logical Constraints, Multi Arm bandits Will post practice questions this weekend Prepared for you a cheat sheet to help you summarize the course learnings ! 5
O UTLINE Multi-armed Bandit Application - A/B Testing at Vungle 6
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
A PPROXIMATE S OLUTIONS Semi-random strategies Perform a greedy action by default Occasionally, take a random action Probability matching strategies
T HOMPSON S AMPLING We begin with prior belief about the arms’ reward distributions Idea: At each trial, we choose an arm randomly , with the choice probability equal to our current belief that the chosen arm is indeed the best arm Then we observe realized reward and update our belief (by Bayes’ rule)
T HOMPSON S AMPLING Widely used in digital applications (where sampling is easy) A good way to incorporate prior knowledge Need to have relatively reliable priors or “long” enough horizon (could just be minutes in digital apps)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
A B USINESS E XAMPLE A store wants to test and learn about the profitability of 5 possible assortments over one year of experimentation Each week, decide on an assortment to offer 52 weeks in total How should they perform the experiment?
S IMPLIFIED A SSUMPTION Assume each assortment ࠵? ’s revenue follows a normal distribution ࠵? ࠵? ࠵? , ࠵? ࠵? But we don’t know what the ࠵? ࠵? ’s are So, we need to learn them as we gather more information
O BSERVATIONS Epsilon-greedy tends to do better eventually with smaller ࠵? (rule of thumb: ࠵? = ࠵?. ࠵? ) With no prior information, epsilon-decreasing does well with larger ࠵? to be able to explore initially Thompson sampling performs well if you have priors and can sample from beliefs – and performs better as your estimation of priors improves
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
M ULTI - ARMED B ANDIT W RAP -U P Exploration vs. Exploitation Valuable approach if you need to learn while you earn: “online/real-time learning” Very broad applications Next week: mobile ads, recommendation systems Epsilon-greedy/decreasing, Thompson sampling
C ASE : A/B T ESTING AT V UNGLE
C ASE B ACKGROUND How does Vungle make money? Video-ad platform that embeds ads in apps and games Who are the key players in the mobile video ad market? Users, publishers, advertisers, platform (Vungle) What challenges does Vungle face? Mobile in-app advertising funnel (98% fill rate, 88% completion rate, 5% click-through rate, 0.5% conversion rate); advertisers pay upon installs How to improve conversion rate?
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
I S A LGORITHM A OR B B ETTER ? Hypothesis testing Null hypothesis: eRPM of A = eRPM of B Alternative hypothesis: eRPM of B > eRPM of A Paired t-test (paired by day) – Mean(eRPM of B – eRPM of A) = 0.11 (this is called an "effect", how much is the right effect?) Standard error = 0.034 p = 0.003 • eRPM of B is significantly higher than eRPM of A
H YPOTHESIS T ESTING Distribution under null hypothesis ࠵? = ࠵? ࠵? #࠵? p value p value : the probability of observing data at least as extreme as the current ones, assuming the null is true
H YPOTHESIS T ESTING Distribution under null hypothesis ࠵? = ࠵? ࠵? #࠵? p value p value : the probability of observing data at least as extreme as the current ones, assuming the null is true REJECT at 5% level ( p < 0.05)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
S TATISTICAL P OWER OF THE T EST Power = Prob (reject null | alternative is true) Ability of a test to detect an effect if it exists The higher the better Given a significance level (e.g., 5%) and an effect size (smaller effect size requires more data) , need more data to achieve higher power Power analysis: What is the minimum sample size to detect an effect?
P OWER A NALYSIS E X .: T - TEST • ࠵? ࠵? : eRPM of A on day ࠵? • ࠵? ࠵? : eRPM of B on day ࠵? • Null: . ࠵? − . ࠵? = ࠵? ; Alternative: . ࠵? − . ࠵? > ࠵? Desired significance level = 5% Effect size ࠵? : . ࠵? − . ࠵? = ࠵?. ࠵?࠵? With 30 days of experiments, what is the power of our test? Prob(reject . ࠵? − . ࠵? = ࠵? | . ࠵? − . ࠵? = ࠵?. ࠵?࠵? )
t statistic with ࠵? days of experiments ࠵? ࠵? = & ࠵? ࠵? #࠵? ࠵? = ࠵? ࠵? ࠵?$࠵? ࠵? ࠵? ࠵? − ࠵? ࠵? #࠵? ࠵? ࠵? : sample size; #࠵? : sample standard deviation To reject null hypothesis at 5% level, need ࠵? ࠵? > ࠵?. ࠵?࠵? (use normal approx.) Power = Prob ࠵? ࠵? > ࠵?. ࠵?࠵?|࠵? = ࠵?. ࠵?࠵? = Prob & ࠵? ࠵? − ࠵? #࠵? ࠵? > ࠵?. ࠵?࠵? − ࠵? #࠵? ࠵? |࠵? = ࠵?. ࠵?࠵? ≈ ࠵? − ࠵? ࠵?. ࠵?࠵? − ࠵?. ࠵?࠵? #࠵? ࠵? ≈ ࠵?. ࠵?࠵? P OWER A NALYSIS E X .: T - TEST (TECHNICAL SLIDE)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
P OWER A NALYSIS E X .: T - TEST (TECHNICAL SLIDE) What is the min # of days of experiments to achieve a power of at least 0.95? Power = ࠵? − ࠵? ࠵?. ࠵?࠵? − ࠵?.࠵?࠵? ’࠵? ࠵? > ࠵?. ࠵?࠵? ⇒ ࠵? ≥ ࠵?࠵? R function: power.t.test(n = , d = , sig.level = , power = , type = c("two.sample", "one.sample", "paired")) n is the sample size, d is the effect size (theta) , and type indicates a two-sample, one-sample, or paired t- test Input three of the four (n, d, sig.level, power) and get the fourth as output
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
P OWER A NALYSIS E X .: T - TEST Power = ࠵? − ࠵? ࠵? − ࠵? B࠵? ࠵? How does power relate to sample size? A larger sample size à higher power How does power relate to desired significance level? A stronger sig. level à lower power (harder to reject) à need more data How does power relate to effect size? A larger effect size à higher power (easier to detect) à need less data
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
C HALLENGE OF A/B T ESTING If the effect size is small… If we need to test many different variations… Need many experiments May not be able to distinguish “good” from “best” Likely sacrifice revenue/profit! Experiment on inferior options à do not incorporate information feedback • Solution? Learn while you earn! à Multi-armed bandit algorithms
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
A/B T ESTING VS . M ULTI - ARMED B ANDIT A/B testing is essentially first explore then exploit Epsilon-first strategy in multi-armed bandit Pure exploration (the first ࠵?࠵? rounds) followed by pure exploitation (the remaining ࠵? − ࠵? ࠵? rounds) Arms are uniformly randomly selected in exploration Best arm identified in exploration is always selected in exploitation
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
S TATE - OF - THE -A RT O NLINE E XPERIMENTS Google Analytics Content Experiments A platform for publishers to test ways to improve their websites Use Thompson sampling to direct traffic among different variations Update posterior beliefs on conversion rates twice daily Variations with cumulative better performance get more traffic in the future Benefits vs. A/B testing?
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
A S IMULATION E XAMPLE Two versions of a website A: original version, 4% conversion rate (CvR) B: new version, predicted 5% CvR If we do A/B testing on which version is better Null hypothesis? A’s CvR = B’s CvR Alternative hypothesis? A’s CvR ≠ B’s CvR How many observations do we need?
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
P OWER A NALYSIS Significance level: 5% Effect size: 5% – 4 % = 1% Power: 95% Two-sample proportion test R function: power.prop.test(n = NULL, p1 = 0.04, p2 = 0.05, sig.level = 0.05, power = 0.95, alternative = “two.sided”) Required sample size ~ 22,300 Assume 100 obs./day, need 223 days
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING A PPROACH Recall from last lecture Thompson sampling 1. For each arm, sample parameters from prior beliefs 2. Choose the arm with the highest expected reward given the sampled parameters 3. Pull that arm and observe actual reward 4. Update belief for the pulled arm only (using Bayes’ rule) Use Thompson sampling to redirect traffic Probability matching: % of traffic to A = Prob(A CvR > B CvR)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING – B ERNOULLI B ANDIT Conversion can be viewed as a Bernoulli event Convert with prob. p à p is CvR Online content experiment wants to learn the ࠵? ࠵? ’s for each variation ࠵? For Bernoulli bandits, we usually use the Beta distribution as the prior • Why? Mathematically, can compute posterior easily Practically, Beta distribution is flexible
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
B ETA D ISTRIBUTION Beta distribution is flexible to model a large range of actual shapes
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
P OSTERIOR FOR B ERNOULLI B ANDIT WITH B ETA P RIOR If prior belief for an arm is Beta(1, 1) (the uniform distribution; no information) Then posterior belief for that arm after getting ࠵? − ࠵? successes and ࠵? − ࠵? failures is Beta( ࠵?, ࠵? ) To update, just need to keep track of number of successes (conversions) and failures (non- conversions)!
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING FOR L ARGE O NLINE E XPERIMENTS Don’t update after every observation, but periodically Google Analytics: update twice daily After each update, need to determine % of traffic going to each arm Probability matching: % of traffic to A = Prob(A CvR > B CvR)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING TO D IRECT E XPERIMENT T RAFFIC Need to estimate Prob(A CvR > B CvR) given current prior (Beta distributions) Directly compute this prob. from the distribution is formidable Simulate a large matrix Each row is a random draw (i.e., a sampled CvR) Each column is one variation (A or B) Estimated Prob(A CvR > B CvR) = Empirical fraction of rows where column A sample > column B sample
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING – S IMULATION For one simulation Black = original Red = new Graph shows the probability of either being optimal (sums to 1) Stop when either hits 95% Finish in 66 days (saved 157 days) and correctly identify the optimum
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING 500 S IMULATIONS On average, saved 175 days and increased 98 conversions
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
A NOTHER E XAMPLE 6 versions of the website Original: 4% conversion rate Alternatives: 5%, 4.5%, 3.5%, 3%, 2% conversion rates If we do A/B testing… Need 91,842 obs. à 2.5 years! If we use Thompson sampling…
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING S IMULATION By 100 days: inferior arms are all eliminated Less wasteful (earn more while learning) Learn more (more observations for better arms)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING 500 S IMULATIONS On average, saved 831 days and increased 1,173 conversions
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
M ULTI - ARMED B ANDIT A PPROACH IN G OOGLE A NALYTICS • Benefits Often produces answers more quickly than A/B testing Efficient: Move toward winning arms without having to wait for the end of the experiment Faster natural selection: Samples are redistributed from obviously inferior arms to better arms More samples on better arms help separate “good” arms from “best arms” more quickly
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING VS . A/B T ESTING FOR V UNGLE Simulate performance of Thompson sampling Update posterior beliefs once a day Assume the eRPM observed each day remains the same as in the A/B testing data Given the large numbers of impressions, this is reasonable
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING FOR V UNGLE Assume that the eRPM for A and B is normally distributed with known variance and unknown mean Take normal distribution as our prior Can calculate explicitly Prob(mean A eRPM > mean B eRPM | prior beliefs)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
D ETERMINE T RAFFIC S PLIT • Suppose ࠵?~࠵? ࠵? ࠵? , ࠵? ࠵? ࠵? , ࠵?~࠵? ࠵? ࠵? , ࠵? ࠵? ࠵? • Prior: ࠵? ࠵? ~࠵? B࠵? ࠵?,࠵? , ࠵? ࠵? ࠵? /(࠵? + ࠵?) ࠵? ࠵? ~࠵? B࠵? ࠵?,࠵? , ࠵? ࠵? ࠵? /(࠵? + ࠵?) So Prob ࠵? ࠵? > ࠵? ࠵? |࠵?~࠵? ࠵? ࠵? , ࠵? ࠵? ࠵? , ࠵?~࠵? ࠵? ࠵? , ࠵? ࠵? ࠵? = P rob ࠵? > ࠵? where ࠵?~࠵? B࠵? ࠵?,࠵? − B࠵? ࠵?,࠵? , (࠵? ࠵? ࠵? + ࠵? ࠵? ࠵? )/(࠵? + ࠵?)
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
T HOMPSON S AMPLING FOR V UNGLE Begin with 15/16 of traffic going to A on day 1 To match the probability for A/B testing Give A a small starting advantage If B really beats A, then we expect the sample mean of A to fall below that of B with more data Then traffic will be gradually redirected to B At the end of each day, update B࠵? ࠵?,࠵?/࠵? and B࠵? ࠵?,࠵?/࠵? to new sample means
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
R ESULTS 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% 1 2 3 4 5 6 7 8 9 101112131415161718192021222324252627282930 Fraction of TrafficDirectedto A or B by Day A fraction B fraction 2.5% higher revenue with Thompson sampling vs. A/B testing
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help
V UNGLE C ASE W RAP -U P Multi-armed bandit (Thompson sampling) is state-of-the-art for online experimentation Learn while you earn! = Efficiency Google Analytics: https://analytics.googleblog.com/2013/01/multi- armed-bandit- experiments.html#:~:text=Google%20Analytics%20 uses%20a%20multi,updated%20as%20the%20exper iment%20progresses
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
  • Access to all documents
  • Unlimited textbook solutions
  • 24/7 expert homework help

Browse Popular Homework Q&A

Q: Procedure: Review the summary of a published scientific study below: Patients with multiple…
Q: The percent by mass of phenol (MM = 94.11 g/mol) in an aqueous solution is 10.9%. What is the…
Q: How many asymmetric centers are in the molecule on the right?
Q: The following transactions are July activities of Bennett's Bowling, Inc., which operates several…
Q: 13 of 36 > Consider the pair of reactions. Draw the organic products, then predict the type of…
Q: Suppose that W is a discrete random variable and its cumulative distribution function is given in…
Q: on ni aliuzen en B. Gene A contains a base substitution that results in a nonsense mutation. HTW…
Q: Write a class called Pokemon that has a name (e.g. “Pikachu”) and a number (e.g. 25). Implement a…
Q: If the second of three consecutive integers is subtracted from 108, the results is the sum of the…
Q: The projections of the LGN to the visual cortex are easy to tract because: layers 1 and 2 are…
Q: 2) Graph the following conic. -4x² + 8x + y² - 4y + 4 = 0
Q: == What was the order of the pigments on the chromatography paper from the bottom up? Postlab 6.13…
Q: b. quantity of inputs used and quantity of output generated.
Q: What are some affordances of visual mediums?
Q: In term of the sugar identity , what is the difference between DNA and RNA. 2 What nucleobases are…
Q: If m 42 = 112°, then m 47 = 2 + 36 78 B C
Q: A = -3 0 17 1] 506 803
Q: Journal entry worksheet < 1 2 3 4 5 6 As of December 31, Lyn Addie has not been paid for four days…
Q: An electric field of strength 1.84 kv/m and a magnetic field of 0.499 T act on a moving electron to…
Q: A charge of -20 nC is placed in a region with constant electric field of 4.5 x 10^5 N/C directed in…
Q: On January 1, the long-term liability section of Cracker & Company Balance Sheet showed a balance of…
Q: 1 People who are severely obese (BMI ≥40) are at most risk for serious health problems, which are…