Figure 5: Compare predicted HTE to observed HTE
- Metric: Profit
Figure 6: Compare predicted HTE to observed HTE
- Metric: Orders
shows these estimates, with the first level of each factor as
the omitted baseline.
We assess the importance of each factor, which is the maximum
impact the factor can have on the outcome metric.
Specifically, for a factor $f$, we calculate
$\max_{l} \{ \hat{\beta}_{fl} - \min_{l} \hat{\beta}_{fl} \}$.
By this measure, Discount is the most important factor,
followed by Promo Spread, Messaging, and Trigger Timing.
Within the Discount factor, Level 3, which represents the
discount as a “%off” promo with a limited redemption count,
has the largest and most statistically significant coefficient.
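The importance measure can be reproduced directly from the coefficients in Table 1. The following is a minimal sketch, not the authors' code: it assumes the omitted baseline level of each factor has coefficient 0, and the baseline labels ("Level1", "Ongoing", "Generic", and the placeholder "Baseline" for Promo Spread, whose baseline name is not given in this section) are used for illustration only.

```python
# Coefficients copied from Table 1. The omitted baseline level of each
# factor is assigned 0.0 (assumption: standard dummy coding).
# "Baseline" for Promo Spread is a placeholder label, not a name from the paper.
COEFS = {
    "Promo Spread": {"Baseline": 0.0, "Upfront": 0.12905},
    "Discount": {"Level1": 0.0, "Level2": 0.40757, "Level3": 0.41950},
    "Trigger Timing": {"Ongoing": 0.0, "Weekday": -0.01350},
    "Messaging": {"Generic": 0.0, "Merchant Recs": -0.02732},
}


def factor_importance(levels):
    """Largest minus smallest estimated coefficient across a factor's levels."""
    return max(levels.values()) - min(levels.values())


importance = {factor: factor_importance(levels) for factor, levels in COEFS.items()}

# Factors sorted from most to least important.
ranking = sorted(importance, key=importance.get, reverse=True)

for factor in ranking:
    print(f"{factor}: {importance[factor]:.5f}")
```

Under these assumptions the ranking matches the order stated in the text: Discount, Promo Spread, Messaging, Trigger Timing.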
Overall, these estimates tell us that the way the discount
is communicated in a promotion is the most important fac-
tor among those considered here, and, specifically, a unified
“%off” with a limited max redemption count representation
of the discount has the greatest impact on profit for the tar-
get audience of the program we optimize for.
Next, we use the approach described in Section 2.6 to predict
a policy based on the combination of the best level $f_{l^{*}}$ of
each factor $f$. Based on this calculation, our optimal policy
is: Upfront Promo Spread; Discount conveyed as a “%off”
with a limited max redemption count; Ongoing Triggering;
Generic Messaging. This policy happens to be out of sample,
that is, it was not included in the eight-arm experiment. Its
predicted profit is greater than the highest among the eight
experimental arms by 1% and higher than the control group
by 5%. Table 5 in the Appendix shows our predicted profits
from each of the 24 possible policies.

Variable Name                 Coef       Std err    t        P>|t|
Intercept                     7.35231    23.08625   61.032   0.000
Promo Spread [Upfront]        0.12905    0.40522    1.414    0.157
Discount [Level2]             0.40757    1.27977    3.648    0.000
Discount [Level3]             0.41950    1.31723    3.253    0.001
Trigger Timing [Weekday]      -0.01350   -0.04239   -0.147   0.883
Messaging [Merchant Recs]     -0.02732   -0.08578   -0.298   0.766
Notes: Variables are represented as “Factor[Level]”.
We use data on the eight experimental arms for this estimation.
Table 1: Regression of average profits per user on factor levels.
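To illustrate how a best-level-per-factor policy can be predicted, the sketch below enumerates all 24 level combinations using the Table 1 coefficients. It assumes a purely additive main-effects model, which may differ in detail from the procedure in Section 2.6. The baseline labels (including the placeholder "Baseline" for Promo Spread) are illustrative assumptions.

```python
from itertools import product

# Intercept and coefficients copied from Table 1; baselines are 0.0 by assumption.
INTERCEPT = 7.35231
COEFS = {
    "Promo Spread": {"Baseline": 0.0, "Upfront": 0.12905},  # "Baseline" is a placeholder label
    "Discount": {"Level1": 0.0, "Level2": 0.40757, "Level3": 0.41950},
    "Trigger Timing": {"Ongoing": 0.0, "Weekday": -0.01350},
    "Messaging": {"Generic": 0.0, "Merchant Recs": -0.02732},
}


def predicted_profit(policy):
    """Additive prediction: intercept plus the coefficient of each chosen level."""
    return INTERCEPT + sum(COEFS[factor][level] for factor, level in policy.items())


factors = list(COEFS)
# Every combination of one level per factor (2 * 3 * 2 * 2 = 24 policies).
policies = [
    dict(zip(factors, combo))
    for combo in product(*(COEFS[factor] for factor in factors))
]

best = max(policies, key=predicted_profit)
print(len(policies), best, round(predicted_profit(best), 5))
```

With no interaction terms, the best policy is simply the best level of each factor taken independently, which is why a combination outside the eight experimental arms can still be scored.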
6.3 Heterogeneous Treatment Effects
When we conduct a joint test (7) of interaction between the
factors and all user characteristics, we are unable to reject
the null hypothesis (p-value = 0.2). More detailed results
are provided in the Appendix, where most interaction terms
are not statistically significant. This analysis indicates
that a blanket approach to detecting heterogeneity may not
be suitable for our application.
Based on findings from previous campaigns, there might still
exist some user-level heterogeneity that can impact business
outcomes.
To investigate this possibility more closely, we pick the one
feature that has the highest historical correlation with our
outcome metric: avg-order-spend, which is the average amount
of money in dollars a user spent on previous orders. We regress
our outcome metric on factor levels, interacting them with
avg-order-spend. Table 2 shows the results from this regression.
Notice that several interaction terms, such as those with
trigger timing and discount, are statistically significant
(joint test p-value = 0.01).
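One way to write the interacted regression is sketched below. The notation is an assumption for illustration and not necessarily the paper's own specification: $y_i$ is the outcome for user $i$, $D_{ifl}$ indicates that user $i$ received level $l$ of factor $f$ (with the first level omitted as baseline), $s_i$ is avg-order-spend, and $\gamma$ and $\delta_{fl}$ are symbols introduced here.

```latex
y_i = \alpha
    + \sum_{f} \sum_{l \neq 1} \beta_{fl} \, D_{ifl}
    + \gamma \, s_i
    + \sum_{f} \sum_{l \neq 1} \delta_{fl} \, D_{ifl} \, s_i
    + \varepsilon_i
```

Under this form, the effect of level $l$ of factor $f$ relative to its baseline, for a user with avg-order-spend $s$, is $\beta_{fl} + \delta_{fl} s$. A nonzero $\delta_{fl}$ therefore means the preferred level of a factor can change with $s$.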
This heterogeneity implies different optimal policies
across users with different avg-order-spend. For example,
controlling for Discount, Promo Spread, and Messaging, the
treatment effect of Trigger Timing [Weekday] relative to
Trigger Timing [Ongoing] is 0.3636 − 0.0147 × avg-order-spend,
which means that when avg-order-spend is less than about $25,
Trigger Timing [Weekday] is more profitable; the opposite is
true when avg-order-spend is greater than $25. This differs
from the recommendation of launching a blanket ongoing
trigger timing made by the model without heterogeneity.
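The $25 threshold follows from setting the Weekday-versus-Ongoing effect to zero. The short sketch below works through that arithmetic using the two coefficients quoted in the text. The function and variable names are hypothetical, and it considers Trigger Timing alone, holding the other factors fixed.

```python
# Coefficients quoted in the text for Trigger Timing [Weekday] vs [Ongoing].
BASE_EFFECT = 0.3636   # effect when avg-order-spend is 0
SLOPE = -0.0147        # change in the effect per dollar of avg-order-spend


def weekday_effect(avg_order_spend):
    """Predicted effect of Weekday relative to Ongoing at a given spend level."""
    return BASE_EFFECT + SLOPE * avg_order_spend


def preferred_trigger(avg_order_spend):
    """Pick the trigger timing with the higher predicted profit."""
    return "Weekday" if weekday_effect(avg_order_spend) > 0 else "Ongoing"


# Spend level at which the two trigger timings are predicted to be equally profitable.
break_even = -BASE_EFFECT / SLOPE

print(f"break-even avg-order-spend: {break_even:.2f}")
for spend in (10, 25, 40):
    print(spend, preferred_trigger(spend))
```

The break-even value is 0.3636 / 0.0147, or roughly 24.7, which the text rounds to about $25.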
We present the optimal policy results in Table 3, where we
discretize avg-order-spend into integer values (0, 1, 2, etc.),
recommend a policy for each range, and report the predicted
profit for that range. The results also show that the optimal
arm selected in Table 5 is optimal in Table 3 only when
avg-order-spend is between 25 and 26.
Using our dataset to compare the predicted benefits of the two
approaches, we find that the HTE model can generate 2% more
profit than the model without heterogeneity.
7. CONCLUSION