Exclusive: Chinese researcher, at the center of COVID data withdrawal controversy, speaks up
And he made a sensational story much more boring
This newsletter is a joint effort by your host Yang Liu and Zichen Wang, of the Pekingnology newsletter.
This newsletter will address one particular allegation against Chinese scientists over the origin tracing of SARS-CoV-2.
Around one month ago, in late June 2021, media outlets including The New York Times, The Wall Street Journal, the Financial Times, Bloomberg, and Nature reported that Wuhan University researchers had “mysteriously” withdrawn COVID-19 sequencing data from a U.S. database, quoting experts saying “the incident demonstrated further evidence of how Chinese researchers and officials have not been fully transparent in how they dealt with data related to the pandemic’s origins.” (FT)
The allegation accuses a group of scientists from China’s Wuhan University of withdrawing a set of data that they had uploaded to an online database called the Sequence Read Archive, managed by the U.S. government’s National Library of Medicine. It insinuates a coverup aimed at blocking research into how the COVID-19 pandemic started.
Starting with the sleuth work of a U.S. researcher and ending up in the headlines of multiple international media, the allegation escalated, in the words of a Wall Street Journal report, “concerns that scientists studying the origin of the pandemic may lack access to key pieces of information.”
Your host and Zichen were able to secure an exclusive interview with a Wuhan University researcher at the center of this controversy, to hear the other side of the story. The researcher agreed to talk after your host, who graduated from Wuhan University in 2011, repeatedly played the alumni card, and has asked to remain anonymous for the time being.
Nobody in Beijing directed or aided this interview - it's purely a personal effort.
The researcher’s account made it apparent that the whole affair has a boring technical explanation, a far cry from the sensationalized news reports that came to dominate the public discourse in English.
The following is a play-by-play account of what actually happened.
I. What was the research in question actually about?
The research, as shown in the published paper, was an attempt to find a sequencing method that could improve the accuracy of diagnosing SARS-CoV-2 infections. At the time of the research in 2020, medical professionals in China, and especially in Wuhan, lacked an efficient and effective method to diagnose infections.
Or, as the paper’s published abstract puts it:
The ongoing global novel coronavirus pneumonia COVID-19 outbreak has engendered numerous cases of infection and death. COVID-19 diagnosis relies upon nucleic acid detection; however, currently recommended methods exhibit high false-negative rates and are unable to identify other respiratory virus infections, thereby resulting in patient misdiagnosis and impeding epidemic containment.
Combining the advantages of targeted amplification and long-read, real-time nanopore sequencing, herein, nanopore targeted sequencing (NTS) is developed to detect SARS-CoV-2 and other respiratory viruses simultaneously within 6–10 h, with a limit of detection of ten standard plasmid copies per reaction...NTS is thus suitable for COVID-19 diagnosis; moreover, this platform can be further extended for diagnosing other viruses and pathogens.
II. What was the COVID-19 data for?
According to the researcher, the data was originally uploaded to the U.S. National Library of Medicine database because it could be of value to reviewers during the journal’s publication process, so that they could assess the results of the researchers’ sequencing method.
Because the reviewers would need proof of the authenticity of the data used in the research - was the data (in the paper) real or fabricated? What does your data look like? So we submitted our data to the National Library of Medicine database.
According to the researcher, the primary purpose of the paper was to establish a new diagnosis method; the sequence of the SARS-CoV-2 virus in the samples was somewhat irrelevant to the main point of the research.
The (sequencing) data set had no direct scientific value to the paper. The new diagnosis method that we developed was like a Chinese stir-fried dish, and the exact ingredients for this dish didn’t matter that much, because we had already fully presented the sequences of the virus in the paper’s text.
All the information has been presented in the body of the text in the form of tables and figures. Actually, these sequences are like a bunch of pencils and then we have described it in a very understandable way, some pencils are red, some are blue. Even if these pencils are not before your eyes, we have described the facts.
The researcher further stated that, due to the nature of the data collected from the samples, it is not suitable for use in any origin tracing efforts.
The net we cast could only capture one-third of the sequence of the virus, leaving most of it outside our coverage. So from this perspective, the data does not meet the standard of origin tracing requirements. The WHO has laid out the criteria, none of which our data meets. So no one would take our data for origin tracing work. This is like identifying a person by his ID number: if we only have a fraction of that ID number, how can we know the complete ID number? Furthermore, the data harvested from our samples are accurate enough for diagnosis, but not for origin tracing. This is normal; no one would take our data to do origin tracing work.
III. When was the sampling done that produced the data?
According to the researcher, a total of two batches of samples were taken. In the first batch, a total of 45 samples were taken randomly from patients that sought treatment in Wuhan on Jan. 30th, 2020. The second batch of samples was taken from a group of patients in mid-February, 2020.
According to media reports, on January 30th, 2020, China reported nearly 10,000 COVID-19 cases in total.
In the opinion of a Chinese vice-minister, that means the data had little value in COVID-19 origin tracing - it’s simply not “early” data when the case count gets to nearly 10,000.
Your host notes that the first reported Covid case took place on Dec. 8, 2019.
IV. Why did Wuhan University researchers withdraw the data from the U.S. database?
The researcher said that their submission to the journal included a paragraph giving the Internet link to the data in the U.S. database.
However, the paragraph was deleted during the copy editing phase of the publication.
The researcher accepted the change because, in his opinion, the information was non-essential.
When we saw that the journal had deleted the paragraph, we believed that then the paragraph was unnecessary. The journal itself focused on methodology. And the paper was not intended to be a paper publishing virus sequences, which foreign media and scholars paid a lot of attention to. We were not in the same field (of science).
Since the language pointing to the data on the database was deleted, the researcher thought there was no reason for the data to be kept on the database, “because no one will know why it existed there”.
Because the paper no longer included this descriptive paragraph (of the link to the database), the data that was stored in the database was like a headless fly. Nobody would know the data’s association with us, maybe after some time, even we wouldn’t be able to find the data, since there was no link. So we asked for the data to be deleted. This took place in June 2020.
V. How did Dr. Jesse Bloom come into the picture?
Fast forward one year to 2021: the researcher said he received an email from Dr. Jesse Bloom on Monday, June 7, but didn’t reply because his first thought was that he should re-upload the data to another public location.
I didn't know this person (Dr. Jesse Bloom) at all, and he and I had no intersection in my research field. I just regarded him as an ordinary researcher. If an ordinary researcher writes me an email and asks me for data, if this data needs to be published, I will share it publicly, not exclusively with the ordinary researcher.
I have never heard of this person (Dr. Jesse Bloom) before. When I saw his email to me, my first reaction was that we should re-upload this data to another place.
The researcher said the next time he heard of Dr. Jesse Bloom was on June 23, 2021.
He (Dr. Jesse Bloom) posted an article on June 22nd that was very offensive and slanderous, directly saying that we were hiding something, otherwise why did we pull back the data.
After he published the article, many media in the United States, including the New York Times and many others, bombarded us with emails asking me what happened. I absolutely didn't know what was happening. And then on the same day the mainstream media in the United States “upgraded” the story on the basis of that person's article.
There were all kinds of reports, such as that Wuhan researchers had mysteriously deleted the data uploaded to the NIH database. I counted about four dozen media reports, including the mainstream (academic) journals Science and Nature, which also wrote to me asking me about the real situation of the incident, but they published articles describing the incident before I could respond or react. Then we responded with what we had been working on, including re-uploading the data to a database in China, just like before, all open-access.
VI. Was there any hiding of Wuhan COVID-19 data?
The researcher said that there was no basis for asserting that the Chinese researchers were hiding anything because, first and foremost, the published paper didn’t include any description of the Internet link to the data in the U.S. database.
The published text did not have that paragraph, this is very simple, you can go to that journal, and the article is open-access, you can download it. In the officially published version, there is no description of the link to the data’s storage in the database. If there were such a paragraph in the text describing where the data was accessible, and then I privately deleted it, the journal would not spare me.
VII. Why not reply at that time?
If we were to respond to him (publicly) on June 23, when he was writing all over the place (on Twitter, in pre-print), we still had several pieces of evidence that were not yet complete. First, I needed to make this data public. I can't respond when I still haven’t made the data public - in that case, even if I responded, the data was still not public, what’s the point?
The researcher said the entire data set has since been re-uploaded.
We re-uploaded the data to the GSA database built by the China National Center for Bioinformation. What we had uploaded to the NCBI (National Center for Biotechnology Information) of the United States, we re-uploaded in full to the Chinese database. It is totally public.
Responding to suspicions that the Chinese government was behind the withdrawal of the data, the researcher said he couldn't recall the exact date when he was contacted by the Chinese authorities, but it was sometime after June 22, 2021 (when Dr. Jesse Bloom published his preprint and Twitter thread) and before early July, a full year after the data was withdrawn.
The Chinese vice-minister hinted at this during Thursday’s press conference, saying:
After this matter was reported, we immediately investigated and understood this matter.
In hindsight, the researcher said:
I am also reflecting on this matter/process: when I received the email about it, could I have responded better? Would things turn out to be better if I had said something better? But I ask for your understanding - at the time, I didn’t know what to do, and there was nobody telling me what to do.
In summary, by putting the two sides of the story together, your host believes that the following conclusions can be drawn:
The research in question was not related to origin tracing, nor was particular attention paid to this aspect by the researchers.
The samples for this study were gathered almost two months after the first Covid case had been reported, and therefore shouldn’t be considered “early evidence” that could have boosted origin tracing efforts in any significant manner.
There was nothing nefarious behind the decision to withdraw the data: the researcher explained that the reason was purely technical, and he has since re-uploaded the deleted data.
The Chinese government played no role in the withdrawal, and suggestions of a coverup are unsubstantiated.
(Turn to Zichen’s Pekingnology for further observation.)
Two outstanding interns, Chenjie Liao, a student at Sun Yat-sen University, and Qi Cui, a graduate student at China Foreign Affairs University, have just joined Beijing Channel and contributed to this newsletter.