"Correlation does not equal causation"

I don't like this phrase. Not because it's not accurate (of course it's accurate). I don't like how it's used to disarm analysts. This one simple phrase can bring any analysis to a screeching halt as soon as the words are uttered.

When a stakeholder says, "Yes... but correlation does not equal causation," that's code for "Your insights aren't good enough for me to act on," aka a showstopper.

Source: https://xkcd.com/552/

This popular phrase has led decision-makers to believe that they need a causal insight in order to make a decision with data. Yes, in a perfect world, we'd only act on causal insights. But in practice, this requirement isn't reasonable. More often than not, when stakeholders require "causality" to make a decision, the analysis takes so long that they lose patience and end up making the decision without any data at all.

Consider A/B testing, for example. This is the most common way teams are tackling the requirement of causality today. But an A/B test is surprisingly difficult to execute correctly - as shown by the countless statisticians waving their hands trying to get us to acknowledge this fact (like this and this). The sad reality is that A/B tests require a lot of data, flawless engineering implementation, and a high level of statistical rigor to do right... so we end up releasing new features without valid results.

This happens all the time! But data teams are not doing themselves any favors by going through the motions of a half-baked attempt at proving causality, just to make a gut-based decision at the end of the day. We need to change the approach.

The downside to causality

The reality is that causality is very difficult to prove. Not only does it require a higher level of statistical rigor, it also requires A LOT of carefully collected data. Meaning you will have to wait a long time before you can make any causal claim. This is true for other causal inference methods too, not just A/B testing.
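To put a rough number on "A LOT of data", here is a minimal sketch of the standard sample-size approximation for comparing two conversion rates with a two-sided z-test. The function name `samples_per_group` and the 10% vs 11% rates are made up for illustration; the 5% significance level and 80% power are common conventions, not requirements from this post.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed in EACH group to detect the difference
    between two conversion rates with a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # critical value for the desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical scenario: baseline converts at 10%, we hope the change lifts it to 11%.
n = samples_per_group(0.10, 0.11)
print(f"Users needed per group: {n}")
```

Under these assumptions, detecting a one-point lift takes on the order of 15,000 users in each group. For a feature that only a few hundred customers touch per week, that is months of waiting.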

Ultimately, causality is an impractical requirement when making decisions with data. So let's stop trying and find another way. Let's go back to using correlations.

I'm not suggesting a free-for-all. We don't want to end up with ridiculous insights that are "technically correlated" but have no reasonable explanation, like these. I'm talking about using correlations in a business context to maximize our chances of making the "best" decision. And doing it in such a way where we can trust that the insights give us a reasonable expectation about how any given decision will impact the things we care about. After all, this is the goal of data.

The solution: A heuristic approach for using correlations to inform decisions

To get trustworthy insights from correlations, we want to maximize the chances that the correlated relationship we're seeing in the data is actually a causal relationship. We can do that by following the four best practices below.

1. Be intentional when testing correlations

Don't correlate random things. Search long enough and you're bound to find a really "surprising" correlation, like this. Most likely that relationship is due to chance and now you've wasted your time and everyone else's. Statisticians refer to this approach as "p-hacking".

Source: https://www.tylervigen.com/spurious-correlations

Instead, focus on correlating things that are already connected. A great way to do this is by focusing on the customer's action. If you try to correlate data centered around a customer's actions, you're guaranteed to only explore behaviors that are actually linked in some way. It sounds simple, but it's easy to overlook. For example, we shouldn't look at how "the time of day that we launched a class" influences a customer's likelihood to sign up, we should look at how the "time of day that the customer saw the class" influences their likelihood to sign up. Because we're looking at all behavior from the perspective of the customer and their actions, we can guarantee the actions are linked, increasing the chances that the correlated relationship we see is also causal.
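To see why correlating random things is risky, here is a small simulation sketch. It generates one random "metric" and 1,000 unrelated random series, then reports the strongest correlation found by searching through all of them. Everything here is simulated noise; the series length of 12 (think monthly data points) and the count of 1,000 candidates are arbitrary choices for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_points = 12           # e.g. 12 monthly observations

# One metric we care about, plus 1,000 series that have nothing to do with it.
target = [rng.gauss(0, 1) for _ in range(n_points)]
candidates = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(1000)]

first = abs(pearson(target, candidates[0]))
best = max(abs(pearson(target, c)) for c in candidates)

print(f"|r| for the first candidate we happened to check: {first:.2f}")
print(f"strongest |r| after searching all 1,000:          {best:.2f}")
```

Every candidate is pure noise, yet the best match will look impressively strong simply because we looked at so many. Restricting the search to behaviors that are already linked through the customer's actions shrinks the number of things we test, and with it the room for chance findings.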

2. Correlate conversion rates, not totals

In most analyses we're trying to gauge how a particular decision or change will influence a customer's behavior. And customer behaviors are best represented with conversion rates, not totals. If I tell you that 10 people converted vs 30 people converted, that doesn't tell you much about the customer's behavioral mindset. If I tell you that 10% of people on the website converted vs 30% of people, then you have some understanding of the customer's willingness to convert.

Now, imagine trying to correlate the total phone inquiries with total conversions. We might see that an increase in the total phone inquiries is highly correlated with an increase in total conversions. Obviously yes, because we have a higher volume of people in our funnel, but we intuitively know that more phone inquiries don't cause more conversions. They're just correlated because they're both influenced by another factor (volume). By looking at conversion rates, we can more easily tease out insights that can be used to influence customer behavior.
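The phone inquiry example can be sketched with simulated data. In the sketch below, every visitor independently phones in with 5% probability and converts with 10% probability, so by construction inquiries have no effect on conversions. The daily traffic range, the probabilities, and all variable names are assumptions made for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


rng = random.Random(42)

# 200 simulated days with very different traffic volumes.
visitors = [rng.randint(500, 5000) for _ in range(200)]

# Inquiries and conversions are generated independently of each other;
# the only thing they share is the day's volume.
inquiries = [sum(rng.random() < 0.05 for _ in range(v)) for v in visitors]
conversions = [sum(rng.random() < 0.10 for _ in range(v)) for v in visitors]

# Correlating totals picks up the shared volume.
r_totals = pearson(inquiries, conversions)

# Correlating rates divides the volume out first.
inquiry_rate = [i / v for i, v in zip(inquiries, visitors)]
conversion_rate = [c / v for c, v in zip(conversions, visitors)]
r_rates = pearson(inquiry_rate, conversion_rate)

print(f"correlation of totals: {r_totals:.2f}")
print(f"correlation of rates:  {r_rates:.2f}")
```

The totals come out almost perfectly correlated even though neither behavior influences the other, while the rates show essentially no relationship. The rate-based view is the one that reflects how customers actually behave.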

You may also want to correlate other normalized metrics (like average order value), but if possible this should be avoided. These metrics can have more variance, meaning you will either need a lot more data to find a reliable insight OR you will need to do dimensionality reduction on the metric (effectively turning it into a conversion rate) to reduce the variance before analyzing it. I won't go into too much detail on this now, but we'll cover it in a future blog post. In the meantime, conversion rates are always a safe bet.

3. Look at correlations over time

Your business and customers are always changing, so it's important to be aware of that. Oftentimes we'll look at correlations in an aggregated way, stripping time from our analysis. But everything changes over time, so a correlation that existed in the past may have disappeared today, and you'll never know it if you don't analyze the data over time.

Since historical behavior is the best predictor of future behavior, we need to look at how the data has been changing over time, especially as the feature we're analyzing has been changing too. If people who have X have consistently had a higher metric Y, then we can more easily trust that when we get more people to do X, we'll have a better shot at also increasing Y. As long as the impact is consistent over time, we can be more confident that these trends will be reliable in the future too.
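One simple way to check consistency is to compute the conversion rate for customers with and without X in every period, then ask in what share of periods the "with X" group actually came out ahead. The sketch below does that; the `PeriodCounts` structure, the helper names, and the monthly counts are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeriodCounts:
    period: str
    with_x_total: int          # customers who did X in this period
    with_x_converted: int      # ...and converted
    without_x_total: int       # customers who did not do X
    without_x_converted: int   # ...and converted


def lift(p):
    """Difference in conversion rate (with X minus without X) for one period."""
    rate_with = p.with_x_converted / p.with_x_total
    rate_without = p.without_x_converted / p.without_x_total
    return rate_with - rate_without


def consistency(periods):
    """Share of periods in which the 'with X' group converted better."""
    lifts = [lift(p) for p in periods]
    return sum(l > 0 for l in lifts) / len(lifts)


# Made-up monthly counts, for illustration only.
periods = [
    PeriodCounts("month 1", 400, 60, 1600, 160),
    PeriodCounts("month 2", 420, 59, 1500, 150),
    PeriodCounts("month 3", 450, 63, 1700, 187),
    PeriodCounts("month 4", 500, 50, 1650, 182),
    PeriodCounts("month 5", 480, 72, 1600, 168),
    PeriodCounts("month 6", 510, 74, 1700, 170),
]

for p in periods:
    print(f"{p.period}: lift = {lift(p):+.3f}")
print(f"share of months with positive lift: {consistency(periods):.2f}")
```

A relationship that holds in five out of six months is easier to trust than one driven by a single standout month, even if both produce the same number when the data is aggregated.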

A good exercise to demonstrate this point is to look back at previous A/B tests you've run. When we've done this, we've found that some of them were statistically significant during a short blip in time, and not consistent as time went on. This is a common risk in A/B testing and as a result, many experts recommend running evergreen tests. This comes with lots of engineering complexity though and most teams don't end up setting it up.

An A/B test can show significance, but have no impact over the long run

4. Always monitor the results

The downside of using a correlation is that we could be wrong. That's less likely when we follow the best practices above, but it's still a risk. But if we can act quickly on correlated findings and vigilantly monitor the results, we can significantly reduce the risk of any wrong decision becoming a catastrophe. This is true for causal insights too, by the way. There is always a risk of things not panning out the way you expected (like in this cautionary tale).

When monitoring the results, it's important to track how the change you're making is influencing the outcome you're trying to achieve. Suppose you found a positive correlation between customers who used a discount code and average order value. Once you find a correlated trend, you'll want to slowly shift volume of customers into the group with the better performance. Then, we can monitor the results using the plot below.

If we can track this data while we're making changes, we can always tell if we made the right decision and fail fast if we didn't.
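A "fail fast" check can be as simple as comparing the tracked metric to its pre-change baseline every week. The sketch below flags a rollback when the metric sits more than a small tolerance below baseline for a couple of weeks in a row. The function name `should_roll_back`, the 2% tolerance, the two-week patience, and the order values are all hypothetical choices, not a prescribed rule.

```python
def should_roll_back(baseline, weekly_metric, tolerance=0.02, patience=2):
    """Return True if the metric falls more than `tolerance` (relative)
    below `baseline` for `patience` consecutive weeks."""
    streak = 0
    for value in weekly_metric:
        if value < baseline * (1 - tolerance):
            streak += 1
            if streak >= patience:
                return True
        else:
            streak = 0  # a healthy week resets the count
    return False


# Hypothetical average order values after we start pushing the discount code.
baseline_aov = 50.0
noisy_but_fine = [51.0, 48.0, 52.0, 48.5, 50.0]  # dips never last two weeks
going_wrong = [51.0, 48.0, 47.0]                 # two bad weeks in a row

print(should_roll_back(baseline_aov, noisy_but_fine))
print(should_roll_back(baseline_aov, going_wrong))
```

Requiring consecutive bad weeks keeps one noisy week from triggering a reversal, while still catching a sustained drop within a couple of periods of the change.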

A reusable, reliable approach

We can automate this approach and the analysis will look something like the analysis below, allowing decision-makers to make more data-driven decisions, faster.

Example auto-generated analysis created with Narrator's Analyze Button

Conclusion

In practice, we need accurate insights and we need to act fast. Waiting two months for an analysis that claims causality or four weeks for an A/B test to run its course is not going to cut it. But if we can act quickly on correlations, especially when they've been rigorously evaluated using the techniques above, we'll be able to make better decisions, faster. So, let's retire the phrase "correlation does not equal causation" and put a little more faith into correlations themselves.