Dr. Manouchehr Hessabi
8 min read · epidemiology · methods · causation

Why correlation is not causation: confounding and how epidemiologists actually decide

What confounding means in epidemiology, why correlation is not causation, and how researchers use study design and Bradford Hill viewpoints to weigh evidence.

By Manouchehr Hessabi, MD, MPH

Almost every week a headline announces that one thing is "linked to" another. Coffee linked to longer life. A common medication linked to a disease. A diet linked to better outcomes. Readers hear a cause. What the study usually reports is an association, which is a far more modest claim.

The gap between those two ideas is where most public confusion about health research lives. Closing it is not a matter of skepticism for its own sake. It is a matter of knowing the specific reasons an association can be real and still not be causal, and knowing the tools researchers use to tell the difference. The single most important of those reasons has a name: confounding.

What "linked to" really means

Two terms make the rest of this clearer. An exposure is whatever a study is examining as a possible influence, such as a food, a pollutant, a behavior, or a treatment. An outcome is the result being tracked, such as a diagnosis, a recovery, or a death. When researchers say an exposure is associated with an outcome, they mean the two tend to occur together more often than chance would predict. That is a pattern, not a mechanism.

A pattern can arise for several reasons. The exposure might genuinely cause the outcome. The outcome might influence the exposure (reverse causation). The link might be a fluke of a small or biased sample. Or a third factor, lurking behind both, might be generating the appearance of a relationship that is not really there. That last possibility is confounding, and it is the one that most often fools careful people.

What a confounder is, in plain terms

A confounder is a third factor that is associated with both the exposure and the outcome and that distorts the apparent relationship between them. It is not part of the causal chain from exposure to outcome. It sits off to the side, connected to both, quietly manufacturing a correlation.

The classic teaching example makes it concrete. Imagine a study finds that people with yellowed fingers are more likely to develop lung cancer. Taken at face value, the data might suggest the staining is dangerous. It is not. Smoking causes both the yellowed fingers and the lung cancer. Smoking is the confounder, and once it is accounted for, the association between stained fingers and cancer disappears. The relationship was real in the data and spurious in reality (see the overview of confounding and causal principles in the NIH StatPearls reference on the principles of causation).
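The yellowed-fingers example can be reproduced with a small simulation. What follows is a minimal sketch, not real data: the probabilities, the function names `simulate` and `risk_ratio`, and the population size are all invented for illustration. In this toy world, cancer depends only on smoking and never on staining, yet the crude comparison still shows stained fingers "raising" cancer risk.

```python
import random

def simulate(n=200_000, seed=0):
    """Build a hypothetical population in which smoking drives everything."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        smoker = rng.random() < 0.25
        # Staining depends only on smoking (made-up probabilities).
        stained = rng.random() < (0.60 if smoker else 0.10)
        # Cancer depends only on smoking, never on staining.
        cancer = rng.random() < (0.15 if smoker else 0.02)
        people.append((smoker, stained, cancer))
    return people

def risk_ratio(people):
    """Cancer risk among the stained divided by risk among the unstained."""
    stained = [cancer for _, s, cancer in people if s]
    unstained = [cancer for _, s, cancer in people if not s]
    return (sum(stained) / len(stained)) / (sum(unstained) / len(unstained))

people = simulate()
crude = risk_ratio(people)
among_smokers = risk_ratio([p for p in people if p[0]])
among_nonsmokers = risk_ratio([p for p in people if not p[0]])

print(f"crude risk ratio:            {crude:.2f}")
print(f"risk ratio among smokers:    {among_smokers:.2f}")
print(f"risk ratio among nonsmokers: {among_nonsmokers:.2f}")
```

The crude risk ratio comes out well above 1, while the ratios within smokers and within nonsmokers hover near 1. Comparing like with like is all it takes to make the apparent effect of staining vanish.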

Age is another constant troublemaker. A region with more retirees will show higher rates of many diseases simply because its population is older, not because the place itself is harmful. Unless age is handled, it confounds almost any comparison across populations.
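One common way to handle age is direct standardization: apply each region's age-specific rates to the same reference population and compare the results. The sketch below uses invented numbers throughout. The two regions, their populations, and their rates are hypothetical, and the rates are deliberately identical within every age band so that any crude difference is due to age structure alone.

```python
AGE_BANDS = ("under_40", "40_to_64", "65_plus")

# Hypothetical regions: same disease rate within each age band,
# very different age structures.
regions = {
    "younger_region": {
        "population": {"under_40": 60_000, "40_to_64": 30_000, "65_plus": 10_000},
        "rate": {"under_40": 0.001, "40_to_64": 0.005, "65_plus": 0.020},
    },
    "retiree_region": {
        "population": {"under_40": 20_000, "40_to_64": 30_000, "65_plus": 50_000},
        "rate": {"under_40": 0.001, "40_to_64": 0.005, "65_plus": 0.020},
    },
}

# Assumption: use the two regions combined as the reference population.
standard_population = {
    band: sum(r["population"][band] for r in regions.values())
    for band in AGE_BANDS
}

def crude_rate(region):
    """Total expected cases divided by total population."""
    cases = sum(region["population"][b] * region["rate"][b] for b in AGE_BANDS)
    return cases / sum(region["population"].values())

def age_standardized_rate(region, standard):
    """Weight the region's age-specific rates by the reference age structure."""
    total = sum(standard.values())
    return sum(region["rate"][b] * standard[b] / total for b in AGE_BANDS)

for name, region in regions.items():
    print(name,
          f"crude={crude_rate(region):.4f}",
          f"standardized={age_standardized_rate(region, standard_population):.4f}")
```

The crude rate in the retiree region is several times higher, yet the standardized rates match exactly, because within every age band the two places carry the same risk.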

Tool one: study design that levels the playing field

The cleanest defense against confounding is built into how a study is run.

In a randomized study, participants are assigned to groups by chance. Done well, randomization tends to balance both known and unknown confounders across the groups, so the groups differ mainly in the exposure being tested. That is why randomized trials sit near the top of the evidence hierarchy. The catch is that randomizing is often impossible or unethical. No one can assign children to drink contaminated water, so many of the most important environmental and public-health questions can never be answered by a trial.
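The balancing effect of randomization is easy to see in a toy simulation. The sketch below is illustrative only: the age range, the selection rule, and the helper name `age_gap` are assumptions, not features of any real study. It compares how far apart the exposed and unexposed groups sit on age when people choose their own exposure versus when a coin flip chooses for them.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical cohort: ages spread evenly between 20 and 80.
ages = [rng.uniform(20, 80) for _ in range(20_000)]

# Observational world: the older someone is, the more likely they are exposed
# (probability rises from 0.1 at age 20 to 0.9 at age 80).
self_selected = [rng.random() < 0.1 + 0.8 * (age - 20) / 60 for age in ages]

# Randomized world: a fair coin decides, ignoring age entirely.
randomized = [rng.random() < 0.5 for _ in ages]

def age_gap(ages, exposed_flags):
    """Mean age of the exposed minus mean age of the unexposed."""
    exposed = [a for a, e in zip(ages, exposed_flags) if e]
    unexposed = [a for a, e in zip(ages, exposed_flags) if not e]
    return mean(exposed) - mean(unexposed)

gap_observational = age_gap(ages, self_selected)
gap_randomized = age_gap(ages, randomized)

print(f"age gap with self-selection: {gap_observational:.1f} years")
print(f"age gap with randomization:  {gap_randomized:.1f} years")
```

With self-selection the exposed group is many years older on average, so any age-related outcome would be confounded. With random assignment the gap shrinks to roughly zero, and the same would hold for any other characteristic, including ones nobody measured.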

When randomization is off the table, researchers turn to observational study designs and to statistical adjustment. Several tools help:

  • Restriction limits the study to a single category of the confounder, for example studying only nonsmokers, so the confounder cannot vary.
  • Stratification analyzes the exposure-outcome relationship separately within levels of the confounder, such as within each age band, then combines the results.
  • Regression adjustment uses a statistical model to estimate the exposure's relationship with the outcome while holding measured confounders constant.
  • Propensity score methods summarize many background characteristics into a single score that captures how likely each person was to receive the exposure, then compare people with similar scores.

Each of these is a real and useful instrument, and methodologists have written extensively about their proper use and their limits (see the discussion of confounding in analytical epidemiologic studies on the NIH PMC archive). The honest limitation shared by all of them is the same: they can only adjust for confounders that were actually measured. A confounder that no one thought to record, or that could not be measured, still biases the result. This residual, unmeasured confounding is the permanent humility built into observational research.
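To make stratification concrete, here is a minimal sketch that uses the Mantel-Haenszel pooled odds ratio, one standard way of combining stratum-level results. The counts are invented, with smoking standing in as the measured confounder. Each table lists exposed cases, exposed non-cases, unexposed cases, and unexposed non-cases, in that order.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table: (a*d) / (b*c)."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Pooled odds ratio across strata, each term weighted by 1 / stratum size."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# Hypothetical counts: (exposed cases, exposed non-cases,
#                       unexposed cases, unexposed non-cases)
strata = {
    "nonsmokers": (10, 90, 100, 900),
    "smokers": (300, 600, 50, 100),
}

# Collapsing the strata cell by cell gives the crude (unadjusted) table.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))

crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata.values())

for name, table in strata.items():
    print(f"{name}: OR = {odds_ratio(*table):.2f}")
print(f"crude OR = {crude_or:.2f}")
print(f"Mantel-Haenszel adjusted OR = {adjusted_or:.2f}")
```

In these made-up numbers the exposure has no association with the outcome inside either stratum, and the pooled estimate reflects that, while the crude table suggests roughly a threefold increase in odds. The adjustment works here only because smoking was recorded; an unrecorded confounder would leave the crude-style distortion in place.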

Tool two: the Bradford Hill viewpoints

Even with good design, deciding that an association is causal is a judgment, not a calculation. In 1965, the British statistician Austin Bradford Hill offered a now-famous set of considerations to guide that judgment. They are usually listed as nine: strength of the association, consistency across studies and settings, specificity, temporality, biological gradient (a dose-response pattern), plausibility, coherence with what is already known, experiment, and analogy.

One of the nine carries special weight. Temporality, the requirement that the cause precede the effect, is close to a true necessity. If the outcome came first, the exposure cannot be its cause. The rest are weights on a scale rather than boxes to tick.

It is worth being precise about what Hill intended, because the list is so often misused. Hill himself called these "viewpoints," not criteria, and warned that none could be required as an absolute condition and none could deliver proof on its own. A 2020 review in the European Journal of Epidemiology revisited the framework in light of modern causal thinking and reinforced this caution. Notably, it observed that even apparently strong evidence, such as a strong association or a dose-response gradient, can itself arise from confounding rather than causation.

Shimonovich M, Pearce A, Thomson H, Keyes K, Katikireddi SV (2020). Assessing causality in epidemiology: revisiting Bradford Hill to incorporate developments in causal thinking. European Journal of Epidemiology. DOI: 10.1007/s10654-020-00703-7

The practical lesson is that the Bradford Hill viewpoints are a structured way of thinking, not a scoring sheet. A study can satisfy several of them and still be confounded. The framework organizes the argument; it does not settle it.

Why this matters most in environmental health

Confounding is a particular challenge in environmental epidemiology, the study of how exposures in air, water, food, and surroundings relate to health. The reason is that environmental exposures rarely travel alone. Diet, income, neighborhood, occupation, and other exposures tend to cluster together. A community exposed to a contaminant may also differ in nutrition, healthcare access, and dozens of other factors, any of which could be the real driver of an outcome.

That is why work connecting the environment to child development, including the long line of research into autism and the environment, spends so much effort measuring and adjusting for the factors that surround an exposure. The goal is not to prove a single contaminant guilty. It is to ask a narrower, answerable question: after accounting for everything else that plausibly differs, does the exposure still track with the outcome?

This piece is educational and is not a substitute for personal medical advice. Its aim is to explain how evidence is weighed, not to recommend any action about any specific exposure.

How to read the next "X causes Y" headline

The next time a study makes the news, four quick questions separate a strong finding from a fragile one.

  • Was it randomized or observational? Randomized evidence is harder to confound, though it is not always possible.
  • What did the researchers adjust for? More importantly, what plausible confounder might they have missed?
  • Does the temporal order hold? Did the exposure clearly come before the outcome?
  • Has it replicated? A result that appears consistently across different populations and methods is far more trustworthy than a single striking study.

None of these questions requires a statistics degree. They require remembering that an association is the beginning of an investigation, not its conclusion. That habit of mind, more than any single number, is what separates careful reading of science from headline-chasing.

For readers who want to see how these methods are applied in practice, the peer-reviewed publications page collects studies where exactly these questions of design, measurement, and confounding had to be worked through.

About the author. Dr. Manouchehr Hessabi is a physician-epidemiologist and Senior Research Scientist at the BERD core of UTHealth Houston's Center for Clinical and Translational Sciences. See his peer-reviewed publications or research programs.