Thirteen genetic sequences — isolated from people with COVID-19 infections in the early days of the pandemic in China — were mysteriously deleted from an online database last year, but have now been recovered.
Jesse Bloom, a computational biologist and viral evolution specialist at the Fred Hutchinson Cancer Research Center, found that the sequences had been removed from an online database at the request of scientists in Wuhan, China. But with a little internet sleuthing, he was able to retrieve copies of the data stored on Google Cloud.
The sequences do not fundamentally change scientists’ understanding of the origins of COVID-19, including the loaded question of whether the coronavirus spread naturally from animals to people or escaped in a laboratory accident. But their removal adds to concerns that the Chinese government’s secrecy has obstructed international efforts to understand how COVID-19 emerged.
Bloom’s results were published Tuesday in a preprint paper that has not yet been reviewed by other scientists. “I think it’s certainly consistent with an attempt to hide the sequences,” he told BuzzFeed News.
Bloom learned of the deleted data after reading a paper by a team led by Carlos Farkas at the University of Manitoba in Canada about some of the first genetic sequences of SARS-CoV-2. Farkas’s paper described sequences from hospital outpatient samples in a project conducted by Wuhan researchers who were developing diagnostic tests for the virus. But when Bloom tried to download the sequences from the Sequence Read Archive (SRA), an online database managed by the U.S. National Institutes of Health, he received error messages showing that they had been removed.
Bloom realized that copies of SRA data are also kept on Google-managed servers, and he was able to work out the URLs where the missing sequences could be found in the cloud. In this way, he recovered 13 genetic sequences that could help answer questions about how the coronavirus evolved and where it originated.
Bloom found that the deleted sequences, like others collected at later dates outside the city, were more similar to bat coronaviruses (which are presumed to be the ultimate ancestors of the virus that causes COVID-19) than sequences linked to the Huanan seafood market in Wuhan. This adds to previous suggestions that the seafood market may have been an early victim of COVID-19, rather than the place where the coronavirus first jumped from animals to humans.
“This is a very interesting study done by Dr. Bloom and, in my opinion, the analysis is totally correct,” Farkas told BuzzFeed News by email. Scott Gottlieb, former head of the Food and Drug Administration, also praised the findings on Twitter.
But some scientists were less impressed. “It really doesn’t add anything to the debate about origins,” Robert Garry of Tulane University in New Orleans told BuzzFeed News via email. Garry argued that the Huanan market or other Wuhan markets could still be the source of COVID-19.
Bloom is one of 18 scientists who in May published a letter criticizing the joint WHO-China study on the origins of SARS-CoV-2. The scientists argued that the WHO-China report did not give “balanced consideration” to the competing ideas that the coronavirus spread naturally from animals to people or escaped from a laboratory, a theory that the report considered “extremely unlikely.” Following the publication of the WHO-China report, the US and 13 other governments complained that the study had no access to “original and complete data and samples.”
The deleted virus sequences were first uploaded to the SRA in early March 2020, around the time researchers led by Yan Li and Tiangang Liu of Wuhan University published a preprint describing their work using genetic sequencing to diagnose COVID-19. A few days earlier, China’s State Council had ordered that all papers related to COVID-19 be approved centrally.
The sequences were withdrawn from the SRA in June, around the time the final version of the paper appeared in a scientific journal. According to the NIH, the authors asked for the sequences to be deleted. “The requester indicated that the sequence information had been updated, that it was being sent to another database and wanted the data to be removed from SRA to avoid version control issues,” Amanda Fine, an NIH spokeswoman, told BuzzFeed News by email.
However, it is unclear whether the sequences have been posted online in another database.
“There is no plausible scientific reason for the deletion,” Bloom wrote in his preprint, arguing that the sequences were probably “deleted to obscure their existence.” This suggested, he wrote, “a less than wholehearted effort to trace early spread of the epidemic.”
Although the sequences were deleted, Garry noted that the key genetic mutations they contained were still published in a table in the Wuhan team’s final paper. “Jesse Bloom found exactly nothing new that is not already part of the scientific literature,” Garry told BuzzFeed News, accusing Bloom of writing his preprint in an “unscientific and unnecessarily inflammatory way.”
Bloom wrote to the Wuhan researchers asking why the sequences had been deleted but received no response. Li and Liu also did not immediately respond to a query from BuzzFeed News.
This is not the first time scientists have raised concerns about the deletion of data that may help answer questions about the origins of COVID-19. The main database of coronavirus sequence information maintained by the Wuhan Institute of Virology, which is the focus of speculation about a possible “laboratory leak” of the virus, was taken offline in September 2019. When members of the WHO-China team who studied the origins of the pandemic visited the institute in February, they were told that the database, which included data on 22,000 coronavirus samples and sequence records, was taken down after repeated hacking attempts.