Part Three - Dynamic Environments
Part Four - Contextual Bandits
Part Five - A/B Testing to Contextual Bandits
We have four experiences and are operating in all provinces and territories. Some variants may work better in certain regions, but if we test within a single region the sample sizes are small and it's hard to know how long to run the experiments.
Contextual bandits allow us to segment an audience and run an experiment per context. A context is a description of a situation, made up of one or more variables that describe that situation. For example, the format {province}-{region} takes on values such as AB-rural. In the national case all contexts are described as canada, which means all pulls and rewards are accumulated in a single context.
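A minimal sketch of how such a context key could be built; the function name and the province/region codes are illustrative assumptions, not part of the original system:

```python
# Hypothetical helper: build the context under which pulls and rewards
# are accumulated. In the national case everything rolls up into "canada".
def context_key(province: str, region: str, national: bool = False) -> str:
    if national:
        return "canada"  # single national context
    return f"{province}-{region}"  # e.g. "AB-rural"

print(context_key("AB", "rural"))        # AB-rural
print(context_key("AB", "rural", True))  # canada
```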
In our example we will be testing four user experiences across Canada. Our data set is going to be a set of conversion rates per province broken down by rural and urban. Generally we would run an experiment across all of Canada while ignoring that there is variance in conversion between provinces and rural vs urban users.
The national view represents the conversion rates if we didn't take the context into account when selecting variants. Notice how homogeneous the conversion rates are; it is not clear which one is best. This would be the result of a typical A/B test.
During that A/B test we collected the data and were able to segment the conversions by the geographical location of the user. This view segments the audience into 20 categories. The variant that looks best overall is often not the best within a given context. For example, in QC-rural, v3 has a conversion rate of 0.53 whereas v2 converts at 0.88, which is much higher.
The data set is a 20x4 array of conversion rates: one row per province-and-region context and one column per variant. Conversion rates range from 0.5 to 0.9, and each conversion rate is generated independently.
We are going to have a map keyed by a context name composed of a province name and a region type. This gives us a unique set of variant conversions from the conversions table. Pulls and rewards are stored in the values of the map for each variant, which means that each context gets its own experiment.
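The per-context state described above could look like the following sketch; the field names and the use of a `defaultdict` are assumptions for illustration:

```python
from collections import defaultdict

NUM_VARIANTS = 4  # v0..v3, as in the example


def new_experiment():
    # Each context tracks pulls and rewards independently, per variant.
    return {"pulls": [0] * NUM_VARIANTS, "rewards": [0] * NUM_VARIANTS}


# Map keyed by context name; each value is that context's own experiment.
experiments = defaultdict(new_experiment)

# Recording a pull of variant 2 in QC-rural that converted:
exp = experiments["QC-rural"]
exp["pulls"][2] += 1
exp["rewards"][2] += 1
```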
This is how the variant conversions evolve over time. Both algorithms are good at approximating the variant conversion rates.
Noisy at the beginning. Converges over time.
Approximately half the amount of time to converge on conversion rates.
We can visualize the accumulation of regret. In this case regret is calculated as the difference in conversion rate of the best variant in a context and the conversion rate of the variant selected.
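The per-step regret defined above is straightforward to compute; this sketch assumes the conversion rates for a context are available as a list indexed by variant:

```python
def step_regret(conversions, chosen):
    """Regret for one pull: best conversion rate in the context minus
    the conversion rate of the variant actually selected."""
    return max(conversions) - conversions[chosen]


# Illustrative numbers only: choosing v3 (0.53) when v2 (0.88) is best
# in the context costs 0.35 of regret for that step.
r = step_regret([0.60, 0.70, 0.88, 0.53], chosen=3)
print(round(r, 2))  # 0.35
```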
In the national case, even when the system picks the nationally best variant, that variant is often not the best variant in a given context.
Using the contextual approach we see smaller regret accumulation than with the national approach. This means we are using the best-converting variant more often because of the contextual information.
In the short term contextual e-greedy looks really good. In the long run contextual UCB starts to outperform contextual e-greedy. In this example this happens around half a million iterations.
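The two selection rules being compared can be sketched as follows. The exact epsilon value and the UCB1 bonus term are assumptions, since the text doesn't specify which parameterizations were used:

```python
import math
import random


def egreedy_select(pulls, rewards, epsilon=0.1):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.randrange(len(pulls))  # explore: random variant
    rates = [r / p if p else 0.0 for p, r in zip(pulls, rewards)]
    return max(range(len(rates)), key=rates.__getitem__)  # exploit


def ucb1_select(pulls, rewards):
    """UCB1: pick the variant with the highest optimistic estimate."""
    for i, p in enumerate(pulls):
        if p == 0:
            return i  # pull every variant at least once first
    total = sum(pulls)
    scores = [r / p + math.sqrt(2 * math.log(total) / p)
              for p, r in zip(pulls, rewards)]
    return max(range(len(scores)), key=scores.__getitem__)
```

In the contextual case, each context's own pulls and rewards are passed in, so exploration happens independently per context.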
We can track when we receive rewards. Ultimately the goal of this system is to maximize rewards.
Contextual e-greedy collects the most reward in the short term.
If we had this dataset when we were making decisions, each time we had to choose a variant for a context we would find the best-converting variant and use it. If we did that, we could calculate the average of the best conversions across all of our contexts. This gives 0.79, which is the best we can do.
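That oracle benchmark is just the mean of the per-context maxima; a sketch, using a made-up two-context table rather than the real 20x4 data:

```python
def best_possible(conversions_by_context):
    """Average conversion if we always picked each context's best variant."""
    return (sum(max(rates) for rates in conversions_by_context.values())
            / len(conversions_by_context))


# Illustrative numbers only, not the real dataset:
demo = {
    "QC-rural": [0.60, 0.70, 0.88, 0.53],
    "AB-urban": [0.50, 0.90, 0.60, 0.70],
}
print(round(best_possible(demo), 2))  # 0.89
```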
If we relied on traditional A/B testing we would productize V3 with a conversion rate of 0.70. This represents the baseline of a single variant without any optimization.
The plot shows how the conversion changes by algorithm and context description.
The national conversions sit below the baseline because some exploration is required, and exploration costs some conversion. From the national perspective we can't do better than the best-converting variant, which is 0.70. In the contextual case, e-greedy outperforms UCB in the short term; in the long run UCB will approach the optimal line while e-greedy will run parallel to it.
For variant iteration we need to identify which variants should be removed. For each context we will have one of two situations: the variant we are considering is the current best, or it is not. If we remove the best variant in a context, the second-best variant would be used instead. For contexts where the variant is not the winner, removing it wouldn't affect the experiment. In the best-variant scenario, the difference in conversion between the best and second-best variants can be summed across contexts to find each variant's relative contribution.
For each variant we take the contextual conversion rates, set that variant's conversion to zero, and then find the best possible average conversion rate across contexts.
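The removal analysis can be sketched as follows; excluding the removed variants from the per-context maximum is equivalent to zeroing them out, since all conversion rates here are positive. The example numbers are made up:

```python
def removal_impact(conversions_by_context, removed):
    """Best achievable average conversion with the given variant
    indices removed from every context."""
    total = 0.0
    for rates in conversions_by_context.values():
        kept = [r for i, r in enumerate(rates) if i not in removed]
        total += max(kept)  # best remaining variant in this context
    return total / len(conversions_by_context)


# Illustrative numbers only:
demo = {
    "QC-rural": [0.60, 0.70, 0.88, 0.53],
    "AB-urban": [0.50, 0.90, 0.60, 0.70],
}
# Removing variant 2 drops QC-rural's best from 0.88 to 0.70.
print(round(removal_impact(demo, {2}), 2))  # 0.8
```

The same function handles the two-variant case by passing a larger `removed` set.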
The results show that removing any one of the variants results in a loss in the achievable conversion rate. V1 is the most important: our maximum achievable conversion drops the most when it is removed.
| Variant removed | Best achievable conversion |
| --- | --- |
| V0 | 0.781 |
| V1 | 0.778 |
| V2 | 0.780 |
| V3 | 0.780 |
The results show that removing more than one variant also results in a loss in the achievable conversion rate, and the loss is greater with more variants removed. Only two-variant removal is explored here: removing three variants leaves a single variant, whose result is just that column's average (the national view), and removing all four doesn't make sense.
| Variants removed | Best achievable conversion |
| --- | --- |
| V0 V1 | 0.756 |
| V0 V2 | 0.758 |
| V0 V3 | 0.742 |
| V1 V2 | 0.748 |
| V1 V3 | 0.745 |
| V2 V3 | 0.747 |
V1 finds the most conversion across the contexts.
| Variant | Conversion found |
| --- | --- |
| V0 | 0.38 |
| V1 | 0.47 |
| V2 | 0.40 |
| V3 | 0.41 |
Our example looked at a problem at two different levels of granularity. By segmenting the audience we found that different experiences convert differently in different regions. By running an experiment per context using contextual multi-armed bandits, we were able to gain conversion and come close to the optimally achievable conversion rate. The longer the horizon, the better the decisions this system will make.
Generally the best-performing algorithm, contextual e-greedy (at least in the short term), reaches a conversion rate of 0.75 to 0.77, which is a 7 to 10 percent uplift over the 0.70 baseline.
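The uplift figures follow directly from the 0.70 single-variant baseline; a quick check:

```python
baseline = 0.70  # conversion rate of the A/B test winner

# Uplift of the contextual system's short- and long-term conversion rates.
for achieved in (0.75, 0.77):
    uplift_pct = (achieved / baseline - 1) * 100
    print(f"{achieved:.2f} -> {uplift_pct:.0f}% uplift")  # 7% and 10%
```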