The San Juan Daily Star
- Jan 23

Before the coronavirus pandemic, overlooked clues from Chinese scientists

The documents reinforced questions circulating since early 2020 about when China learned of the virus that was causing its unexplained outbreak — and also drew attention to gaps in the U.S. system of monitoring for dangerous new viruses.

By Benjamin Mueller

In late December 2019, eight pages of genetic code were sent to computers at the National Institutes of Health in Bethesda, Maryland.

Unbeknown to U.S. officials at the time, the genetic map that had landed on their doorstep contained critical clues about the virus that would soon touch off a pandemic.

The genetic code, submitted by Chinese scientists to a vast public repository of sequencing data run by the U.S. government, described a mysterious new virus that had infected a 65-year-old man weeks earlier in Wuhan, China. At the time the code was sent, Chinese officials had not yet warned of the unexplained pneumonia sickening patients in the central city of Wuhan.

But the U.S. repository, which was designed to help scientists share run-of-the-mill research data, never added the submission it received on Dec. 28, 2019, to its database. Instead, it asked the Chinese scientists three days later to resubmit the genetic sequence with certain additional technical details. That request went unanswered.

It took almost another two weeks for a separate pair of virologists, one Australian and the other Chinese, to work together to post the genetic code of the new coronavirus online, setting off a frantic global effort to save lives by building tests and vaccines.

The initial attempt by Chinese scientists to publicize the crucial code was revealed for the first time in documents released last week by House Republicans investigating COVID’s origins. The documents reinforced questions circulating since early 2020 about when China learned of the virus that was causing its unexplained outbreak — and also drew attention to gaps in the U.S. system of monitoring for dangerous new viruses.

The Chinese government has said it promptly shared the virus’s genetic code with global health officials. House Republicans said the new documents suggested that was untrue. News accounts and Chinese social media posts have long reported that the virus was first sequenced in late December 2019.

But lawmakers and independent scientists said that the documents did offer tantalizing new details about when and how scientists first tried to share those sequences globally, illustrating the difficulty the United States has with picking worrisome viruses out of the thousands of humdrum genetic sequences that are submitted to its repository every day.

“You’d never have an ambulance sitting in normal 3 p.m. traffic,” said Jeremy Kamil, a virus expert at Louisiana State University Health Sciences Center Shreveport. Referring to the coronavirus code from 2019, he said, “Why would you allow this sequence to sit there under the same process as a sequence I just got from a new snail species I found in a ravine?”

A spokesperson for the Department of Health and Human Services, which includes the NIH, said in a statement Wednesday that the genetic code was not published because it “was unable to be verified, despite follow-ups by NIH to the Chinese scientist for more information and a response.”

In an earlier letter to House Republicans, Melanie Anne Egorin, a senior Health Department official, said that the sequence had initially been subjected to a “technical, but not scientific or public health,” review, as was customary. After not hearing back from the Chinese scientists about its requested corrections, the database, known as GenBank, automatically deleted the submission from its queue of unpublished sequences on Jan. 16, 2020.

It is not clear why the Chinese scientists did not respond. One of the submitters, Lili Ren, who worked at a virus institute within the state-affiliated Chinese Academy of Medical Sciences in Beijing, did not respond to a request for comment. The Chinese embassy said China’s response was “science-based, effective and consistent with China’s national realities.”

But the same sequence that Ren’s group sent to GenBank was made public on a different online database, known as GISAID, on Jan. 12, 2020, shortly after other scientists had posted the first coronavirus code. Ren’s group also resubmitted a corrected version of the code to GenBank in early February and published a paper describing its work.

The two-week gap between the code first being sent to the U.S. database and China sharing the sequence with global health officials “underscores why we cannot trust any of the so-called ‘facts’ or data” from the Chinese government, the Republican leaders of the House Energy and Commerce Committee said.

Jesse Bloom, a virus expert at the Fred Hutchinson Cancer Center in Seattle, said that the genetic sequence would have strongly suggested to anyone reviewing it in late December 2019 that a new coronavirus was causing the mysterious pneumonia cases in Wuhan. Instead, official Chinese timelines indicate the government did not make that diagnosis until early January.

“If this sequence had been made available, probably the prototype vaccines could’ve been started right away, and that was two weeks earlier than they were started,” Bloom said.

The documents, first reported by The Wall Street Journal, do not provide insight into the origins of the virus, Bloom and other scientists said, given that the sequence did not contain special clues about the virus’s evolution and was later made public anyway.

But they do offer new details about the pace at which Ren’s team worked to sequence the virus. The swab containing the virus they analyzed was taken from the 65-year-old patient, a vendor at the large market where the illness was first seen spreading, on Dec. 24, 2019. Within four days, scientists sent that virus’s genetic data to GenBank.

“That’s incredibly fast,” said Kristian Andersen, a virus expert at the Scripps Research Institute.

At the time, finding a new coronavirus in the patient’s sample would not have proved that it was that virus, and not a different virus or bacterium, causing his illness, Andersen said, though it would have been a reasonable hypothesis.

That consideration appeared to weigh on the Chinese scientists studying samples from early patients. One researcher at a Chinese commercial laboratory that worked with Ren wrote on a blog in late January 2020 that while she had identified a new virus in hospital samples, that alone did not demonstrate that the virus was causing pneumonia cases, slowing down an official announcement.

In early 2020, the Chinese government also issued directives discouraging certain lines of scientific research and restricted the release of data about the virus.

Even once the virus’s genetic code was sent to the U.S. repository, it would have been difficult for U.S. officials staffing the research-oriented database to take notice. The repository holds hundreds of millions of genetic sequences. Much of the process for screening them is automated.

And at least until Chinese officials started sounding an alarm at the very end of December 2019, almost no one would have known to look for a new coronavirus within the heaps of submissions.

Still, some scientists believe that U.S. and global health officials have been slow to retrofit databases such as GenBank to allow them to seize on sequences that could have critical public health implications.
