Economists often find instruments in policies. Josh Angrist and Alan Krueger instrument for years of schooling with the Vietnam Draft lottery; Steve Levitt instruments for prison populations with prison overcrowding litigation. Although good instruments in the real world can generate incredible insights, they are notoriously hard to come by.

The good news is that instruments are everywhere in tech. As long as your company has an active AB testing culture, you almost certainly have a plethora of instruments at your fingertips. In fact, any AB test that drives a specific behavior is a contender for instrumenting ex post for the effect of that behavior on an outcome you care about.

Suppose we are interested in learning the causal effect of referring a friend on churn. We see that users who refer friends are less likely to churn, and hypothesize that getting users to refer more friends will increase their likelihood of sticking around. (One reason we might think this is true is what psychologists call the Ikea Effect: users care more about products that they have invested time contributing to.)

Looking at the correlation of churn with referrals will of course not give us the causal effect. Users who refer their friends are de facto more committed to our product.

But if our company has a strong referral program, it’s likely been running lots of AB tests pushing users to refer more — email tests, onsite banner ad tests, incentives tests, you name it. The IV strategy is to focus on a successful AB test — one that increased referrals — and use that experiment’s bucketing as an instrument for referring. (If IV sounds a little like RDD, that’s because it is! In fact, IV is sometimes referred to as “Fuzzy RDD”.)

IV results are internally valid provided the strong first stage and exclusion restriction assumptions (above) are satisfied:

  • We’ll likely have a strong first stage as long as the experiment we chose was “successful” at driving referrals. (This is important because if Z is not a strong predictor of X, the resulting second stage estimate will be biased.) The F-statistic lets us check the strength of our first stage directly - a good rule of thumb is that the F-statistic from the first stage should be least 11.
  • What about the exclusion restriction? It’s important that the instrument, Z, affect the outcome, Y, only through its effect on the endogenous regressor, X. Suppose we are instrumenting with an AB email test pushing referrals. If the control group received no email, this assumption isn’t valid: the act of getting an email could in and of itself drive retention. But if the control group received an otherwise-similar email, just without any mention of the referral program, then the exclusion restriction likely holds.

Want to estimate the causal effects of X on Y but don’t have any good historical AB tests on hand to leverage? AB tests can also be implemented specifically to facilitate IV estimation! These AB tests even come with a sexy name — Randomized Encouragement Trials.

More here, including sample R code on GitHub.

View 1 other answer to this question
About · Careers · Privacy · Terms · Contact · Languages · Your Ad Choices · Press ·
© Quora, Inc. 2025