Avoiding Bias in Your Data Analysis

Collecting your business data is one thing, but how do you ensure you analyze it effectively?

Bias is a major problem when it comes to any sort of data analysis, and it’s often not obvious to the business that they’re falling into a trap. You might have perfectly good data, but it’s in the analysis of that data that incorrect conclusions are drawn.

How does bias occur and how can you avoid it in your business? Let’s take a look:

Confirmation Bias

Virtually everyone is guilty of this at some point. In business, in politics, in life: wherever you look, confirmation bias can rear its head.

Confirmation bias occurs when you have an idea, expectation, or opinion about something before you get any data on the topic, and that prior view shapes how you read the data. When the data comes in, your analysis is heavily weighted by what you expect to find: you actively look for examples that prove the results you expected.

Usually when people are struck by this bias, they emphasize data that supports their opinion, while downplaying or ignoring data that either moderates their opinion, or shows the opposite to be true. In some cases, people cling quite stubbornly to the examples that support their beliefs, willfully ignoring data that doesn’t.

With that being said, confirmation bias is usually not a conscious choice. The person is simply too steeped in what they believe to notice it happening. Almost anyone you meet can fall into this trap, their judgment clouded by past experiences or beliefs.

“(...) if you notice a rise in reports about shark attacks on the news, you start to believe sharks are out of control, when the only thing you know for sure is the news is delivering more stories about sharks than usual.” - David McRaney

In your business, this can be dangerous. Many projects, business strategies, and products have gone ahead and then failed disastrously because of confirmation bias. At best, you waste time and effort for a while; at worst, acting on confirmation bias can put the company in a perilous position.

Confirmation bias occurs when you only count on the data you expect to find

Avoiding confirmation bias in your data analysis

As a business leader, how can you put measures in place to ensure that your data analysis isn’t corrupted by prior opinion or expectation?

It’s fairly difficult to squash our own opinions, so a key piece of advice is to take time to consider the alternatives before acting on the data. What would be the likely outcome if you made the opposite choice? As HBR puts it:

“Gather the data you would need to defend this opposite view, and compare it with the data used to support your original decision. Reevaluate your decision in light of the bigger data set. Your perspective may still be incomplete, but it will be much more balanced.”

Highly successful investor Warren Buffett recognizes that his own decisions may be swayed by opinions, so he follows a strategy similar to what HBR outlined. As a Forbes article reveals, Buffett has gone as far as to invite one of his known critics, Doug Kass, to participate in Berkshire Hathaway’s annual meeting.

This might seem like an unusual step - Kass is a hedge fund manager who was shorting stock in Berkshire Hathaway - but Buffett saw it as an opportunity to spice the meeting up and get a prime view of an opposing opinion.

Being aware that confirmation bias exists is a good first step to avoiding it, followed by doing what you can to seek out and understand information that opposes your current belief.

If we were to borrow the shark attack analogy from the earlier quote, your next step would probably be to acknowledge that your opinion, “the sharks are getting out of control”, may not be correct. You’d seek out data from as neutral a perspective as possible, such as records of shark attacks over the years. That data might show a genuine pattern of increasing attacks, or it might show no significant increase at all, in which case your opinion was formed only because the news is reporting attacks more often.
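
As a rough illustration, that check could be sketched in a few lines of Python. The yearly figures below are made up purely for illustration (they are not real shark attack statistics), and `percent_change` is a hypothetical helper of our own, not a library function. The sketch compares how much the latest year's recorded incidents changed against how much news coverage changed.

```python
# Hypothetical yearly figures, invented for illustration only.
recorded_incidents = {2019: 64, 2020: 60, 2021: 66, 2022: 62, 2023: 65}
news_stories = {2019: 110, 2020: 105, 2021: 120, 2022: 115, 2023: 240}


def percent_change(series, year):
    """Percent change for `year` relative to the average of all earlier years."""
    earlier = [value for y, value in series.items() if y < year]
    baseline = sum(earlier) / len(earlier)
    return (series[year] - baseline) / baseline * 100


incident_change = percent_change(recorded_incidents, 2023)
coverage_change = percent_change(news_stories, 2023)

print(f"Recorded incidents vs. earlier years: {incident_change:+.1f}%")
print(f"News stories vs. earlier years:       {coverage_change:+.1f}%")
```

With these invented numbers, coverage roughly doubles while recorded incidents barely move, which is exactly the situation the quote describes. With real records, the comparison could just as easily go the other way.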

Selection Bias

Selection bias occurs when the data you use has been selected in a way that excludes data that is important to consider. (There are times when excluding data is appropriate, which we’ll get to in a moment.) The conclusions you draw are then skewed as a result.

This particular bias is sometimes introduced on purpose, particularly if someone wants to make something look better or worse than reality. Let’s say you needed to report on the number of sign-ups your company is getting. If you were to look at data for the whole year, but select only the best-performing quarter for your report, that would be selection bias.

You can probably tell from this description, but confirmation bias can also lead to selection bias, particularly where you ignore data that doesn’t support your pre-formed ideas.

Sometimes, selection bias is not deliberate at all, but the result of a faulty methodology for selecting data. This is why it’s important to be aware of this bias.

How to avoid selection bias in data analysis

Some popular ways to avoid selection bias include:

  • Ensuring that data is selected truly at random, for example when you are drawing subgroups from a larger population.
  • Ensuring that the data you do select is representative of the characteristics of the population as a whole.
  • Seeking larger data samples and averaging across a greater time period. The sign-up report example in the previous section would be more accurate if the data used was across an entire year. This is especially important to consider if factors such as seasonality or your marketing activity may affect results.
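
To see how much a cherry-picked period can distort a report, here is a minimal Python sketch of the sign-up example. The monthly figures are hypothetical, and the helper names `quarter_means` and `selection_gap` are our own for illustration. It compares the average of the best quarter with the average for the full year.

```python
from statistics import mean

# Hypothetical monthly sign-up counts for one year (illustrative numbers only).
monthly_signups = [120, 135, 128, 140, 150, 310, 330, 325, 160, 145, 138, 130]


def quarter_means(values):
    """Return the mean of each consecutive three-month quarter."""
    return [mean(values[i:i + 3]) for i in range(0, len(values), 3)]


def selection_gap(values):
    """Return (best quarter mean, full-year mean, ratio between the two)."""
    best = max(quarter_means(values))
    overall = mean(values)
    return best, overall, best / overall


best, overall, ratio = selection_gap(monthly_signups)
print(f"Best quarter average: {best:.1f} sign-ups per month")
print(f"Full-year average:    {overall:.1f} sign-ups per month")
print(f"Reporting only the best quarter overstates the year by {ratio:.2f}x")
```

The wider the gap between the two averages, the more a single-quarter report misrepresents the year, which is a useful prompt to ask whether seasonality or a marketing push explains the spike.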

Outlying Data

Remember we mentioned that there are cases for data to be excluded? Outlying data can be one of those.

Outlying data is any data that is significantly different from what is normal for you. Outliers lie an abnormal distance from the other values in a sample. For example, going back to data on shark attacks, you might find that there was an unusually large number in one week, far above the weekly figures for the rest of the year. If the numbers for the following weeks returned to normal, you’d know that week was an outlier.

Outliers can skew your overall results, for example by pulling averages too far in one direction.

What to do about outliers

There are a couple of different approaches to ensuring that outlying data doesn’t skew your overall results. First, you could exclude the outliers altogether. This would especially be the case if the outlier appeared due to bad data (e.g., if something wasn’t set up properly in the data collection). You would want to be fairly certain that this was the case, though.

Second, you could run two calculations: one with the outlying data and one without. This gives you a range, depending on what data you’re actually looking at. A point to remember is that, while extreme outliers can shift the mean of a set of numbers substantially, they have far less effect on the median, so which one you report depends on what you need the data to tell you.
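
The “with and without” approach is easy to try for yourself. Below is a minimal Python sketch using the standard library’s `statistics` module. The weekly counts are hypothetical, and `summarize` is just an illustrative helper name.

```python
from statistics import mean, median

# Hypothetical weekly counts; the final value is an unusually large week.
weekly_counts = [1, 2, 3, 4, 5, 6, 40]


def summarize(values):
    """Return the mean and median of a list of numbers."""
    return {"mean": mean(values), "median": median(values)}


with_outlier = summarize(weekly_counts)
without_outlier = summarize(weekly_counts[:-1])  # drop the final, extreme week

print("With outlier:   ", with_outlier)
print("Without outlier:", without_outlier)
```

In this made-up data set, the single extreme week more than doubles the mean, while the median moves only slightly. Reporting both versions side by side lets the reader see how much the conclusion depends on one unusual value.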

Availability Bias

Availability bias occurs when we assume that the most immediately available data is more representative than it really is. It stems from a cognitive shortcut known as the availability heuristic, whereby people make judgments about the likelihood of an event based on how easily an example, instance, or case comes to mind.

For example, what if you were thinking of founding a startup, and several of your friends were very successful startup founders? You might think that it is therefore likely you will be successful because that is the data you have immediately available. Of course, this would be ignoring deeper data that shows large numbers of startups fail...

The danger of availability bias is that you’re making decisions based on a small piece of the picture. There may be significant data points that you are missing. For example, what about things that you don’t already know about? No one can claim to know everything…

Avoiding availability bias

Availability bias can lead to people rushing to make a decision or solve a problem, especially if they have little time.

The key to avoiding it is to make sure that you are properly digging into the data. Do you have the full story, or are key parts missing?

Final Thoughts

Bias in data analysis is a real pain for businesses that want to use data to make better decisions. Many types of bias can creep into your analysis, and we have touched on some of the most common here.

You probably won’t entirely avoid all bias (when was the last time you passionately argued for something you believed in, despite credible data to the contrary?). However, being aware of bias is a great start. Always look for the bigger picture and ensure that data is read from as neutral a perspective as possible.