Which animal viruses could infect people? Computers are racing to find out.
By Carl Zimmer
Colin Carlson, a biologist at Georgetown University, has started to worry about mousepox.
The virus, discovered in 1930, spreads among mice, killing them with ruthless efficiency. But scientists have never considered it a potential threat to humans. Now Carlson, his colleagues and their computers aren’t so sure.
Using a technique known as machine learning, the researchers have spent the past few years programming computers to teach themselves about viruses that can infect human cells. The computers have combed through vast amounts of information about the biology and ecology of the animal hosts of those viruses, as well as the genomes and other features of the viruses themselves. Over time, the computers came to recognize certain factors that would predict whether a virus has the potential to spill over into humans.
Once the computers proved their mettle on viruses that scientists had already studied intensely, Carlson and his colleagues deployed them on the unknown, ultimately producing a short list of animal viruses with the potential to jump the species barrier and cause human outbreaks.
In the latest runs, the algorithms unexpectedly put the mousepox virus in the top ranks of risky pathogens.
“Every time we run this model, it comes up super high,” Carlson said.
Puzzled, Carlson and his colleagues rooted around in the scientific literature. They came across documentation of a long-forgotten outbreak in 1987 in rural China. Schoolchildren came down with an infection that caused sore throats and inflammation in their hands and feet.
Years later, a team of scientists ran tests on throat swabs that had been collected during the outbreak and put into storage. These samples, as the group reported in 2012, contained mousepox DNA. But their study garnered little notice, and a decade later mousepox is still not considered a threat to humans.
If the computer programmed by Carlson and his colleagues is right, the virus deserves a new look.
“It’s just crazy that this was lost in the vast pile of stuff that public health has to sift through,” he said. “This actually changes the way that we think about this virus.”
Scientists have identified about 250 human diseases that arose when an animal virus jumped the species barrier. HIV jumped from chimpanzees, for example, and the new coronavirus originated in bats.
Ideally, scientists would like to recognize the next spillover virus before it has started infecting people. But there are far too many animal viruses for virus experts to study. Scientists have identified more than 1,000 viruses in mammals, but that is most likely a tiny fraction of the true number. Some researchers suspect that mammals carry tens of thousands of viruses, while others put the number in the hundreds of thousands.
To identify potential new spillovers, researchers like Carlson are using computers to spot hidden patterns in scientific data. The machines can zero in on viruses that may be particularly likely to give rise to a human disease, for example, and can also predict which animals are most likely to harbor dangerous viruses we don’t know about yet.
“It feels like you have a new set of eyes,” said Barbara Han, a disease ecologist at the Cary Institute of Ecosystem Studies in Millbrook, New York, who collaborates with Carlson. “You just can’t see in as many dimensions as the model can.”
In March, Carlson and his colleagues unveiled an open-access database called VIRION, which has amassed half a million pieces of information about 9,521 viruses and their 3,692 animal hosts — and is still growing.
Databases such as VIRION are now making it possible to ask more focused questions about new pandemics. When the COVID pandemic struck, it soon became clear that it was caused by a new virus called SARS-CoV-2. Carlson, Han and their colleagues created programs to identify the animals most likely to harbor relatives of the new coronavirus.
SARS-CoV-2 belongs to a group of species called betacoronaviruses, which also includes the viruses that caused the SARS (severe acute respiratory syndrome) and MERS (Middle East respiratory syndrome) epidemics among humans. For the most part, betacoronaviruses infect bats. When SARS-CoV-2 was discovered in January 2020, 79 species of bats were known to carry them.
But scientists have not systematically searched all 1,447 species of bats for betacoronaviruses, and such a project would take years to complete.
By feeding biological data about the various types of bats — their diet, the length of their wings, and so on — into their computer, Carlson, Han and their colleagues created a model that could offer predictions about the bats most likely to harbor betacoronaviruses. They found more than 300 species that fit the bill.
Since that prediction in 2020, researchers have indeed found betacoronaviruses in 47 species of bats — all of which were on the prediction lists produced by some of the computer models they had created for their study.
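For readers curious how traits like diet and wing length can become a ranked list of candidate hosts, here is a minimal sketch. It is not the researchers' model: it swaps in a simple nearest-neighbor score, and every species name and number in it is invented toy data. The function names (`rank_unsampled`, `host_score`) are hypothetical.

```python
import math

# Synthetic toy data, NOT real measurements.
# (species, wing length in mm, eats insects 1/0, known betacoronavirus host 1/0)
SAMPLED = [
    ("bat_a", 40.0, 1, 1),
    ("bat_b", 42.0, 1, 1),
    ("bat_c", 45.0, 1, 1),
    ("bat_d", 80.0, 0, 0),
    ("bat_e", 85.0, 0, 0),
    ("bat_f", 90.0, 0, 0),
]
# Species that nobody has tested yet: traits only, no label.
UNSAMPLED = [("bat_x", 43.0, 1), ("bat_y", 88.0, 0)]


def trait_stats(vectors):
    """Per-trait mean and standard deviation, used to put traits on one scale."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n) or 1.0
        for d in range(dims)
    ]
    return means, stds


def scale(vector, means, stds):
    """Standardize one trait vector."""
    return [(x - m) / s for x, m, s in zip(vector, means, stds)]


def host_score(traits, labeled, k=3):
    """Fraction of the k most similar tested species that are known hosts."""
    nearest = sorted(labeled, key=lambda item: math.dist(traits, item[0]))[:k]
    return sum(label for _, label in nearest) / k


def rank_unsampled(sampled, unsampled, k=3):
    """Rank untested species from most to least likely host."""
    all_traits = [list(s[1:3]) for s in sampled] + [list(u[1:3]) for u in unsampled]
    means, stds = trait_stats(all_traits)
    labeled = [(scale(s[1:3], means, stds), s[3]) for s in sampled]
    scores = {
        u[0]: host_score(scale(u[1:3], means, stds), labeled, k) for u in unsampled
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


print(rank_unsampled(SAMPLED, UNSAMPLED))
```

In this toy setup, the untested species that resembles the known hosts lands at the top of the list. Real models draw on many more traits and far more species, and their predictions still have to be checked in the field, as the 47 confirmed bat species were.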
Nardus Mollentze, a computational virus expert at the University of Glasgow, and his colleagues have pioneered a method that could markedly increase the accuracy of the models. Rather than looking at a virus’ hosts, their models look at its genes. A computer can be taught to recognize subtle features in the genes of viruses that can infect humans.
In their first report on this technique, Mollentze and his colleagues developed a model that could correctly recognize human-infecting viruses more than 70% of the time. Mollentze can’t yet say why his gene-based model worked, but he has some ideas. Our cells can recognize foreign genes and send out an alarm to the immune system. Viruses that can infect our cells may have the ability to mimic our own DNA as a kind of viral camouflage.
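The article does not say which genetic features the model relies on. As one assumed illustration of what a "subtle feature" could look like, the sketch below measures how often each pair of adjacent DNA letters appears in a sequence, then compares two sequences by that composition. The function names are hypothetical, and this is a sketch of the general idea, not a description of Mollentze's method.

```python
from collections import Counter
from itertools import product

# All 16 possible pairs of adjacent DNA letters: AA, AC, ..., TT.
PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(seq):
    """Share of each adjacent letter pair in a DNA sequence.

    Pairs containing anything other than A, C, G or T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[p] for p in PAIRS)
    return {p: (counts[p] / total if total else 0.0) for p in PAIRS}


def composition_distance(freq_a, freq_b):
    """Sum of absolute differences between two composition profiles.

    A value of 0 means identical composition; larger values mean the two
    sequences use letter pairs in more different proportions.
    """
    return sum(abs(freq_a[p] - freq_b[p]) for p in PAIRS)


# Made-up sequences for illustration only.
virus_like = dinucleotide_frequencies("ATGCGTATTACGATAT")
host_like = dinucleotide_frequencies("ATGCATATTAAGATAT")
print(composition_distance(virus_like, host_like))
```

A machine learning model could take numbers like these as inputs and learn which patterns tend to appear in viruses that infect people. If the camouflage idea is right, a small distance between a virus and human genes might be one such pattern, though that remains a hypothesis.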
When they applied the model to animal viruses, they came up with a list of 272 species at high risk of spilling over. That is too many for virus experts to study in any depth.
“You can only work on so many viruses,” said Emmie de Wit, a virus expert at Rocky Mountain Laboratories in Hamilton, Montana, who oversees research on the new coronavirus, influenza and other viruses. “On our end, we would really need to narrow it down.”
Mollentze acknowledged that he and his colleagues need to find a way to pinpoint the worst of the worst among animal viruses.
“This is only a start,” he said.
To follow up on his initial study, Mollentze is working with Carlson and his colleagues to merge data about the genes of viruses with data related to the biology and ecology of their hosts. The researchers are getting some promising results from this approach, including the tantalizing mousepox lead.
De Wit said that machine learning models could someday guide virus experts like herself to study certain animal viruses.
“There’s definitely a great benefit that’s going to come from this,” she said.
But she noted that the models so far have focused mainly on a pathogen’s potential for infecting human cells. Before causing a new human disease, a virus also has to spread from one person to another and cause serious symptoms along the way. She is waiting for a new generation of machine learning models that can make those predictions, too.
“What we really want to know is not necessarily which viruses can infect humans, but which viruses can cause an outbreak,” she said. “So that’s really the next step that we need to figure out.”