What is statistical significance and why should I care (or not!)?

Data Literacy

Researchers use statistical significance to flag potentially interesting findings that aren’t easily explained by chance. However, this measure is only part of the story – it does not tell us how much a finding matters in the real world (practical or clinical significance). Read on for a closer look at what statistical significance can and cannot tell us.

Statistical significance is important to consider because humans see lots of patterns in things that could be due to chance. If you flip a coin ten times and get seven heads, is something fishy with the coin, or was this random chance? If someone rolls a die and gets three sixes in a row, are they cheating, or is it just a fluke? Beyond coin tosses and dice rolls, statistical tests can also help researchers pressure test the results of clinical trials.

To better understand statistical significance, we need to revisit science class, where we learned about hypothesis testing. When scientists design an experiment, they define a “null hypothesis” (nothing to see here!) and an “alternative hypothesis” (something interesting is happening!).

Let’s take the example of coin flips. Here the “null” hypothesis is that the coin is fair (equal chances of landing heads or tails). The “alternative” hypothesis is that the coin is biased (unequal chances of landing heads or tails). We evaluate statistical significance by calculating how likely it is that our results would occur if the coin were fair. If the results are easily explained by chance, they are not “statistically significant”, and we don’t have enough evidence to say that the coin is rigged. On the other hand, if our heads and tails are so imbalanced that they would be very unlikely to happen with a fair coin (our null hypothesis), the results would be considered “statistically significant,” and we could reject the hypothesis that the coin is fair.

Here’s the tricky part. Does a fair coin always land exactly 50-50? Actually, if you flip a fair coin 10 times, the chance of getting exactly 5 heads and 5 tails is only about 1 in 4 (25%) – most of the time it will be a little off (more heads than tails or vice versa, just by chance). Now, imagine you flipped a coin 10 times and got 7 heads and 3 tails. Can you conclude that the coin is biased? Using statistics, we can see that this result is not that surprising even with a fair coin. In fact, we would expect to see 7 or more heads about 1 in 6 times (17%). This means the number of heads is not statistically different from what we would expect from a fair coin – nothing to see here! However, suppose you flipped a coin 10 times and got 9 heads and 1 tail. Statistically, this result (or an even more lopsided one) is highly unlikely with a fair coin (~1% probability) – so maybe there IS something to see here! A “statistically significant” finding is a vote of confidence for the alternative hypothesis – it supports (but doesn’t prove) the hypothesis that the coin is biased.
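
If you want to check these numbers yourself, here’s a small Python sketch (not part of the original post) that computes the exact binomial probabilities for 10 flips of a fair coin:

```python
from math import comb

def prob_heads(k, n=10, p=0.5):
    """Probability of exactly k heads in n flips of a coin that lands heads with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly 5 heads in 10 flips: ~0.25 (about 1 in 4)
print(prob_heads(5))

# 7 or more heads: ~0.17 (about 1 in 6) -- not surprising for a fair coin
print(sum(prob_heads(k) for k in range(7, 11)))

# 9 or more heads: ~0.01 (about 1%) -- very unlikely for a fair coin
print(sum(prob_heads(k) for k in range(9, 11)))
```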

If your head is spinning, fear not. Statistical significance can be hard to wrap your head around, and even scientists misinterpret this term. The key thing to bear in mind is that you’re not directly testing your exciting alternative hypothesis (something is going on!). Rather, you’re seeing whether or not your results can easily be explained by your null hypothesis (nothing to see here!).

In a drug trial, for example, the null hypothesis would be that the new drug has no effect on the disease (say, reducing heart attacks). At the end of the trial, we compare the number of heart attacks in the treatment (drug) and control (placebo) groups. If the drug has no effect, we would expect roughly the same number of cases in both groups (like 5 heads and 5 tails). But we know that a slight difference between the two groups could be due to chance (like the 7 heads and 3 tails). Before we give new drugs to people, we want to be very confident that they work. If statistics tell us that the difference we observed could easily arise by chance even when the drug does nothing, that’s not good enough.
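
As a rough illustration (with made-up numbers, not from any real trial), here is how such a comparison might look in Python, using Fisher’s exact test from SciPy to ask whether the difference in heart attacks between the two groups is easily explained by chance:

```python
# Hypothetical trial: 1,000 patients per arm; all numbers below are invented for illustration.
from scipy.stats import fisher_exact

events_drug, events_placebo = 40, 60    # heart attacks in each group
n_per_arm = 1000

table = [[events_drug, n_per_arm - events_drug],
         [events_placebo, n_per_arm - events_placebo]]

odds_ratio, p_value = fisher_exact(table)
print(f"p-value = {p_value:.3f}")
# A small p-value (conventionally < 0.05) means a difference this large would be
# unlikely if the drug truly had no effect.
```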

Nerd Note: Researchers use the term “p-value” to refer to the probability of seeing the observed data (or something even more extreme) if the null hypothesis is true. Typically, researchers use 5% as a cutoff for statistical significance (written as p<0.05). In the above coin toss example with 9 heads out of 10, the p-value was about 1% (statistically significant) – meaning that the probability of getting 9 or more heads in 10 flips of a fair coin is roughly 1%. Of note, the threshold of a 5% p-value for statistical significance is arbitrary and controversial – a result with a 4% p-value isn’t dramatically stronger evidence than one with a 6% p-value.
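
For the curious, the coin-flip p-value above can also be computed with an off-the-shelf binomial test (a minimal sketch assuming SciPy is installed; this isn’t code from the post):

```python
from scipy.stats import binomtest

# p-value for seeing 9 or more heads in 10 flips of a fair coin
result = binomtest(k=9, n=10, p=0.5, alternative="greater")
print(result.pvalue)   # ~0.011, i.e. about 1% -- below the usual 0.05 cutoff
```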

Testing for statistical significance can help us identify “real” results, but it’s NOT the only thing to consider when deciding how excited we should be about a finding. A result can be “statistically significant” but not worth caring about because the effect size is tiny or has no practical application. We call this the difference between “statistical” and “substantive” significance (or “clinical” significance). For example, a large clinical trial may find a statistically significant difference in pain between those who take the drug and those who don’t, but the drug only reduces pain by 6%. In that case, the real-world value of the drug – how much it actually helps patients – is almost nil. The statistical significance of a result also doesn’t tell you how well a study was designed, or how likely it is that the results will apply to you.

Nerd Note: It’s also possible for a “real” phenomenon to fail the statistical significance test. For example, a drug may have a real – but small – benefit that is not statistically significant in a given study because the study population was not large enough to detect the effect. This is just one more way that statistical significance doesn’t tell the whole story!
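
To make this concrete, here is a quick simulation (invented numbers, purely illustrative) of a drug that truly halves the rate of some bad event: with only 100 patients per arm, most simulated trials fail to reach statistical significance – the study is “underpowered”:

```python
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(0)
n_per_arm = 100                         # a small trial
rate_drug, rate_placebo = 0.04, 0.08    # the drug really does halve the event rate

n_sims, significant = 2000, 0
for _ in range(n_sims):
    events_drug = rng.binomial(n_per_arm, rate_drug)
    events_placebo = rng.binomial(n_per_arm, rate_placebo)
    table = [[events_drug, n_per_arm - events_drug],
             [events_placebo, n_per_arm - events_placebo]]
    _, p = fisher_exact(table)
    if p < 0.05:
        significant += 1

# Fraction of simulated trials that reach p < 0.05 -- well under half,
# even though the drug's effect is real.
print(significant / n_sims)
```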

The Bottom Line:
Statistical significance is a vote of confidence for a finding but doesn’t tell the full story – just like those little green check marks on social media! All it tells us is how confidently we can reject the null hypothesis (“nothing to see here”). Before getting too excited about a “significant” finding, dig more deeply to understand the effect size, study limitations, and generalizability of the results.

Resources

Statistical Significance (StatPearls)

Statistical significance explainer (Investopedia)

Coin flip probability calculator and explainer

Podcast episode on statistical significance from The Studies Show (very entertaining!)

Link to Original Substack Post