How to think like an epidemiologist
By Paula Span
There is a statistician’s rejoinder — sometimes offered as wry criticism, sometimes as honest advice — that could hardly be a better motto for our times: “Update your priors!”
In stats lingo, “priors” are your prior knowledge and beliefs, inevitably fuzzy and uncertain, before seeing evidence. Evidence prompts an update, more evidence prompts a further update, and so on. This iterative process builds greater certainty and a coherent accumulation of knowledge.
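The update loop can be made concrete with a minimal sketch. It assumes the unknown is a single proportion, say the share of exposed contacts who become infected, and it uses the textbook Beta-binomial conjugate update. The counts are invented for illustration, and `BetaBelief` is a hypothetical helper rather than part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Belief about an unknown proportion, encoded as a Beta(alpha, beta) distribution."""
    alpha: float
    beta: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: a Beta prior plus binomial evidence gives a Beta posterior.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def sd(self) -> float:
        a, b = self.alpha, self.beta
        return (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5


# Hypothetical data: how many exposed contacts became infected, arriving in two batches.
prior = BetaBelief(1, 1)                     # vague prior: all proportions equally plausible
after_first = prior.update(3, 17)            # batch 1: 3 of 20 infected
after_second = after_first.update(12, 88)    # batch 2: yesterday's posterior is today's prior
pooled = prior.update(3 + 12, 17 + 88)       # all the evidence at once, for comparison

for name, belief in [("prior", prior), ("after batch 1", after_first), ("after batch 2", after_second)]:
    print(f"{name}: mean {belief.mean:.3f}, sd {belief.sd:.3f}")
print("sequential == pooled:", after_second == pooled)
```

The shrinking standard deviation reflects growing certainty. The fact that two small updates land exactly where one pooled update does is what makes the accumulation coherent: the order in which evidence arrives does not change the conclusion.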
In the early pandemic era, for instance, airborne transmission of COVID-19 was not considered likely, but in early July the World Health Organization, with mounting scientific evidence, conceded that it is a factor, especially indoors. The WHO updated its priors, and changed its advice.
This is the heart of Bayesian analysis, named after Thomas Bayes, an 18th-century Presbyterian minister who did math on the side. It captures uncertainty in terms of probability: Bayes’ theorem, or rule, is a device for rationally updating your prior beliefs and uncertainties based on observed evidence.
Bayes set out his ideas, including the theorem that now bears his name, in “An Essay Towards Solving a Problem in the Doctrine of Chances,” which the preacher and mathematician Richard Price refined and published in 1763, after Bayes’ death. A couple of centuries later, Bayesian frameworks and methods, powered by computation, are at the heart of various models in epidemiology and other scientific fields.
As Marc Lipsitch, an infectious disease epidemiologist at Harvard University, noted on Twitter, Bayesian reasoning comes awfully close to his working definition of rationality. “As we learn more, our beliefs should change,” Lipsitch said in an interview. “One extreme is to decide what you think and be impervious to new information. Another extreme is to over-privilege the last thing you learned. In rough terms, Bayesian reasoning is a principled way to integrate what you previously thought with what you have learned and come to a conclusion that incorporates them both, giving them appropriate weights.”
With a new illness like COVID-19 and all the uncertainties it brings, there is intense interest in nailing down the parameters for models: What is the basic reproduction number, the average number of people each infected person goes on to infect in a fully susceptible population? How deadly is the virus? What is the infection fatality rate, the proportion of infected people who die?
But there is little point in trying to establish fixed numbers, said Natalie Dean, an assistant professor of biostatistics at the University of Florida.
“We should be less focused on finding the single ‘truth’ and more focused on establishing a reasonable range, recognizing that the true value may vary across populations,” Dean said. “Bayesian analyses allow us to include this variability in a clear way, and then propagate this uncertainty through the model.”
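“Propagating uncertainty” can be illustrated with a minimal Monte Carlo sketch under stated assumptions. The toy model is the herd-immunity threshold 1 - 1/R0 from a simple homogeneous-mixing epidemic model. R0 is treated as uniformly uncertain between 1.5 and 3.5, and those bounds are hypothetical, not estimates from Dean or the article. Rather than plugging a single “true” R0 into the model, the sketch pushes many plausible values through it and reports a range for the output. A full Bayesian analysis would draw from a posterior instead of a uniform range.

```python
import random
import statistics


def herd_immunity_threshold(r0: float) -> float:
    # Classic result for a simple, homogeneously mixing population: 1 - 1/R0.
    return 1.0 - 1.0 / r0


def propagate(n_draws: int = 10_000, low: float = 1.5, high: float = 3.5, seed: int = 1) -> list[float]:
    """Push uncertainty about R0 through the model by sampling many plausible values."""
    rng = random.Random(seed)
    # Assumption: R0 is only known to lie somewhere in [low, high].
    draws = [rng.uniform(low, high) for _ in range(n_draws)]
    return [herd_immunity_threshold(r0) for r0 in draws]


thresholds = propagate()
cuts = statistics.quantiles(thresholds, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentiles
print(f"5th pct {cuts[0]:.2f}, median {statistics.median(thresholds):.2f}, 95th pct {cuts[-1]:.2f}")
```

The result is a reasonable range for the threshold rather than one number, which matches the spirit of Dean’s advice.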
The Logic of Uncertainty
Joseph Blitzstein, a statistician at Harvard, delves into the utility of Bayesian analysis in his popular course “Statistics 110: Probability.” In the first lecture, by way of a primer, he says: “Math is the logic of certainty, and statistics is the logic of uncertainty. Everyone has uncertainty. If you have 100% certainty about everything, there is something wrong with you.”
By the end of lecture four, he arrives at Bayes’ theorem — his favorite theorem because it is mathematically simple yet conceptually powerful.
“Literally, the proof is just one line of algebra,” Blitzstein said. The theorem essentially reduces to a fraction: it expresses P(A | B), the probability of some event A given that another event B has occurred, in terms of the reverse probability P(B | A) and the separate probabilities of A and B.
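In standard notation the fraction is shown below. P(A) plays the role of the prior and P(A | B) the updated, or posterior, belief. In the usual textbook derivation, the “one line of algebra” consists of writing the joint probability of A and B in two ways and dividing by P(B):

```latex
P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad P(B) > 0.
```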
“Naively, you would think, ‘How much could you get from that?’” Blitzstein said. “It turns out to have incredibly deep consequences and to be applicable to just about every field of inquiry” — from finance and genetics to political science and historical studies. The Bayesian approach is applied in analyzing racial disparities in policing (in the assessment of officer decisions to search drivers during a traffic stop) and search-and-rescue operations (the search area narrows as new data is added). Cognitive scientists ask, “Is the brain Bayesian?” Philosophers of science posit that science as a whole is a Bayesian process — as is common sense.
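Take diagnostic testing. In this scenario, the setup of Bayes’ theorem might use events labeled “T” for a positive test result and “C” for the presence of COVID-19 antibodies. What you want to know is P(C | T), the probability of having antibodies given a positive result, and Bayes’ theorem gives it as P(T | C) × P(C) / P(T). The denominator P(T), the overall chance of testing positive, adds up true positives and false positives: P(T | C) × P(C) + P(T | not C) × P(not C).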
Now suppose the prevalence of antibodies is 10% (as it was in New York City in the spring), and you have a positive result from a test with 87.5% sensitivity and 97.5% specificity. Running the numbers through the Bayesian gears, the probability that you do indeed have antibodies, given the positive result, is 79.5%. Decent odds, all things considered. If you want more certainty, get a second opinion. And continue to be cautious.
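The arithmetic behind the 79.5% figure is short enough to write out. The sketch below uses the numbers from the paragraph above: 10% prevalence, 87.5% sensitivity and 97.5% specificity. It also reads “get a second opinion” as a second positive test. That step assumes the two results are independent given antibody status, which real repeat testing may not satisfy. `posterior_positive` is a hypothetical helper name.

```python
def posterior_positive(prior: float, sensitivity: float, specificity: float) -> float:
    """P(antibodies | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prior                 # P(T | C) * P(C)
    false_pos = (1 - specificity) * (1 - prior)    # P(T | not C) * P(not C)
    return true_pos / (true_pos + false_pos)       # divide by P(T)


SENS, SPEC = 0.875, 0.975

first = posterior_positive(0.10, SENS, SPEC)
print(f"after one positive test: {first:.1%}")     # after one positive test: 79.5%

# A second positive result: the first posterior becomes the new prior.
# Assumption: the two tests err independently given antibody status.
second = posterior_positive(first, SENS, SPEC)
print(f"after two positive tests: {second:.1%}")   # after two positive tests: 99.3%
```

The posterior from one test serves directly as the prior for the next. This is the “update your priors” loop applied twice.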
An international collaboration of researchers, doctors and developers created another Bayesian strategy, pairing the test result with a questionnaire to produce a better estimate of whether the result might be a false negative or a false positive. The tool, which has won two hackathons, collects contextual information: Did you go to work during lockdown? What did you do to avoid catching COVID-19? Has anyone in your household had COVID-19?
“It’s a little akin to having two ‘medical experts,’” said Claire Donnat, who recently finished her Ph.D. in statistics at Stanford and was part of the team. One expert has access to the patient’s symptoms and background, the other to the test; the two diagnoses are combined to produce a more precise score, and more reliable immunity estimates. The priors are updated with an aggregation of information.
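The team’s actual model is not described here, so what follows is only a minimal sketch of the general idea, not their method. It assumes the “questionnaire expert” supplies a personalized pre-test probability and the “test expert” supplies a likelihood ratio, and that the two are combined by adding them on the log-odds scale. The pre-test values of 2% and 40% are hypothetical. The test accuracy reuses the 87.5% sensitivity and 97.5% specificity figures from the antibody example.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def combine(pretest: float, positive: bool, sensitivity: float = 0.875, specificity: float = 0.975) -> float:
    """Merge a questionnaire-based pre-test probability with one test result, in log-odds."""
    if positive:
        lr = sensitivity / (1 - specificity)   # likelihood ratio of a positive result
    else:
        lr = (1 - sensitivity) / specificity   # likelihood ratio of a negative result
    return inv_logit(logit(pretest) + math.log(lr))


# Hypothetical pre-test probabilities from two very different questionnaires.
for label, pretest in [("stayed home during lockdown", 0.02), ("household member had COVID-19", 0.40)]:
    pos = combine(pretest, positive=True)
    neg = combine(pretest, positive=False)
    # For a negative result, P(antibodies | negative) is the chance it is a false negative.
    print(f"{label}: P(C | +) = {pos:.2f}, P(C | -) = {neg:.3f}")
```

The same test result means different things for different people. A positive is far more trustworthy for the high-exposure respondent, and a negative is far more likely to be a false negative.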
“As new information comes in, we update our priors all the time,” said Susan Holmes, a Stanford statistician, via unstable internet from rural Portugal, where she unexpectedly pandemicked for 105 days, while visiting her mother.
That was the base from which Holmes refined a preprint paper, co-authored with Donnat, that provides another example of Bayesian analysis, broadly speaking. Observing early research in March about how the pandemic might evolve, they noticed that classic epidemiological models tend to use fixed parameters, or constants, for the reproduction number — for instance, with an R0 of 2.0.
But in reality, the reproduction number depends on random, uncertain factors: viral loads and susceptibility, behavior and social networks, culture and socioeconomic class, weather, air conditioning and unknowns.
With a Bayesian perspective, the uncertainty is encoded into randomness. The researchers began by supposing that the reproduction number had various distributions (the priors). Then they modeled the uncertainty using a random variable that fluctuates, taking on a range of values as small as 0.6 and as large as 2.2 or 3.5. In something of a nesting process, the random variable itself has parameters that fluctuate randomly; and those parameters, too, have random parameters (hyperparameters), etc. The effects accumulate into a “Bayesian hierarchy” — “turtles all the way down,” Holmes said.
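A hierarchy like this can be sampled from the top down. The sketch below is not Holmes and Donnat’s model. It assumes R follows a gamma distribution whose mean and shape are themselves random, and the ranges that play the role of hyperparameters are hypothetical. It stops at two levels, whereas a deeper hierarchy would randomize those ranges as well.

```python
import random
import statistics


def sample_r(rng: random.Random) -> float:
    """Draw one reproduction number from a small, hypothetical Bayesian hierarchy."""
    # Level 2: the parameters of R's distribution are themselves random.
    # Their ranges (1.0 to 2.5 and 1.0 to 5.0) act as fixed hyperparameters here.
    mean_r = rng.uniform(1.0, 2.5)   # average R in this setting
    shape = rng.uniform(1.0, 5.0)    # gamma shape; smaller means more dispersed
    # Level 1: R itself, from a gamma distribution with that mean and shape.
    scale = mean_r / shape           # gamma mean = shape * scale
    return rng.gammavariate(shape, scale)


rng = random.Random(42)
draws = [sample_r(rng) for _ in range(50_000)]
cuts = statistics.quantiles(draws, n=100)  # 99 cut points: 1st through 99th percentiles
print(f"1st pct {cuts[0]:.2f}, median {statistics.median(draws):.2f}, 99th pct {cuts[-1]:.2f}")
```

Each draw of R carries the uncertainty of every level above it. The spread of the draws is therefore wider than any single gamma distribution in the hierarchy would produce on its own.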
The effects of all these up-and-down random fluctuations multiply, like compound interest. As a result, the study found that using random variables for reproduction numbers more realistically predicts the risky tail events, the rarer but more significant superspreader events.
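A toy calculation shows why multiplying fluctuations fattens the tail. It is an illustration, not the study’s model. The sketch compares a fixed R of 1.5 with an R that is redrawn every generation from a hypothetical uniform range of 0.6 to 2.4, which has the same mean of 1.5. It then tracks the expected size of the fifth generation of infections seeded by one case.

```python
import random
import statistics
from typing import Optional

GENERATIONS = 5
R_FIXED = 1.5


def generation_size(rng: Optional[random.Random]) -> float:
    """Expected infections in generation GENERATIONS, seeded by a single case.

    With rng=None, R stays at R_FIXED every generation. Otherwise R is redrawn each
    generation from a hypothetical uniform(0.6, 2.4) range, whose mean is also 1.5.
    """
    size = 1.0
    for _ in range(GENERATIONS):
        r = R_FIXED if rng is None else rng.uniform(0.6, 2.4)
        size *= r  # fluctuations multiply from one generation to the next
    return size


fixed = generation_size(None)  # 1.5 ** 5, about 7.6
rng = random.Random(7)
runs = [generation_size(rng) for _ in range(100_000)]
cuts = statistics.quantiles(runs, n=100)
print(f"fixed R : {fixed:.1f}")
print(f"random R: mean {statistics.mean(runs):.1f}, median {statistics.median(runs):.1f}, "
      f"99th pct {cuts[-1]:.1f}")
```

The average outcome matches the fixed-R answer, and the typical (median) outcome is smaller. The rare bad outcomes, however, are several times larger than the fixed-R value, and a model with a constant R0 cannot show this kind of tail at all.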
Humans on their own, however, without a Bayesian model for a compass, are notoriously bad at fathoming individual risk.
“People, including very young children, can and do use Bayesian inference unconsciously,” said Alison Gopnik, a psychologist at the University of California, Berkeley. “But they need direct evidence about the frequency of events to do so.”
Much of the information that guides our behavior in the context of COVID-19 is probabilistic. For example, by some estimates, if you get infected with the coronavirus, there is a 1% chance you will die; but in reality an individual’s odds can vary by a thousandfold or more, depending on age and other factors. “For something like an illness, most of the evidence is usually indirect, and people are very bad at dealing with explicit probabilistic information,” Gopnik said.
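Gopnik’s point about frequencies suggests a common way to make a probabilistic result easier to grasp: restate it as counts of people. The sketch below reuses the antibody-test numbers from earlier (10% prevalence, 87.5% sensitivity, 97.5% specificity) and applies them to a hypothetical group of 10,000 people. `natural_frequencies` is an illustrative helper name.

```python
def natural_frequencies(population: int, prevalence: float,
                        sensitivity: float, specificity: float) -> dict[str, int]:
    """Restate a test's performance as counts of people instead of probabilities."""
    with_antibodies = round(population * prevalence)
    without_antibodies = population - with_antibodies
    return {
        "with_antibodies": with_antibodies,
        "without_antibodies": without_antibodies,
        # Sensitivity: the share of people WITH antibodies who test positive.
        "true_positives": round(with_antibodies * sensitivity),
        # 1 - specificity: the share of people WITHOUT antibodies who still test positive.
        "false_positives": round(without_antibodies * (1 - specificity)),
    }


counts = natural_frequencies(10_000, 0.10, 0.875, 0.975)
positives = counts["true_positives"] + counts["false_positives"]
print(f"Of 10,000 people, {counts['with_antibodies']:,} have antibodies and "
      f"{counts['true_positives']:,} of them test positive.")
print(f"Of the {counts['without_antibodies']:,} without antibodies, "
      f"{counts['false_positives']:,} test positive anyway.")
print(f"So {counts['true_positives']:,} of {positives:,} positive results are real "
      f"({counts['true_positives'] / positives:.1%}).")
```

Framed this way, the 79.5% becomes a statement anyone can check by counting: 875 of the 1,100 people who test positive actually have antibodies.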