Exclusive: Chinese researcher, at the center of COVID data withdrawal controversy, speaks up

And he made a sensational story much more boring

Jul 24, 2021

This newsletter is a joint effort by your host Yang Liu and Zichen Wang, of the Pekingnology newsletter.

This newsletter will address one particular allegation against Chinese scientists over the origin tracing of SARS-CoV-2.

Around one month ago, in late June 2021, media outlets ranging from The New York Times, The Wall Street Journal, the Financial Times, and Bloomberg to Nature reported that Wuhan University researchers had “mysteriously” withdrawn COVID-19 sequencing data from a U.S. database, quoting experts saying “the incident demonstrated further evidence of how Chinese researchers and officials have not been fully transparent in how they dealt with data related to the pandemic’s origins.” (FT)

The allegation accused a group of scientists from China’s Wuhan University of withdrawing a set of data that they had uploaded to an online database called the Sequence Read Archive, managed by the U.S. government’s National Library of Medicine. The allegation insinuates a coverup aimed at blocking research into how the COVID-19 pandemic started.

Starting with the sleuth work of a U.S. researcher and ending up in the headlines of multiple international media, the allegation escalated, in the words of a Wall Street Journal report, “concerns that scientists studying the origin of the pandemic may lack access to key pieces of information.”

Your host and Zichen were able to secure an exclusive interview with a Wuhan University researcher at the center of this controversy, to hear the other side of the story. The researcher, who asked to remain anonymous for the time being, agreed after your host repeatedly played the alumni card: your host graduated from Wuhan University in 2011.

Nobody in Beijing directed or aided this interview - it's purely a personal effort.

Upon hearing the researcher’s account, it became apparent that the whole affair actually has a boring technical explanation, a far cry from the over-sensationalized news reports that came to dominate the public discourse in English.

The following is a play-by-play account of what actually happened.

I. What was the research in question actually about?

The research, as shown in the published paper, was an attempt to find a sequencing method that could improve the diagnostic accuracy of SARS-CoV-2. At the time of the research in 2020, medical professionals in China, and especially Wuhan, lacked an efficient and effective method to diagnose infections.

Or, as described in the published abstract of the paper:

The ongoing global novel coronavirus pneumonia COVID-19 outbreak has engendered numerous cases of infection and death. COVID-19 diagnosis relies upon nucleic acid detection; however, currently recommended methods exhibit high false-negative rates and are unable to identify other respiratory virus infections, thereby resulting in patient misdiagnosis and impeding epidemic containment.
Combining the advantages of targeted amplification and long-read, real-time nanopore sequencing, herein, nanopore targeted sequencing (NTS) is developed to detect SARS-CoV-2 and other respiratory viruses simultaneously within 6–10 h, with a limit of detection of ten standard plasmid copies per reaction...NTS is thus suitable for COVID-19 diagnosis; moreover, this platform can be further extended for diagnosing other viruses and pathogens.

II. What was the COVID-19 data for?

According to the researcher, the data was originally uploaded to the U.S. National Library of Medicine database because it could be of value to reviewers in the journal’s publication process, so that they could assess the results of the sequencing method.

Because reviewers will ask you to show whether the data are fabricated or real. Once the work has actually been done, you have to show everyone, and show the reviewers, what the data you measured look like. So, following that requirement, we submitted the data to the database of the U.S. National Institutes of Health.

According to the researcher, the primary purpose of the paper was to establish a new diagnosis method; the sequence of the SARS-CoV-2 that was in the samples was somewhat irrelevant to the main point of the research.

First of all, the (sequencing) data had no direct scientific value to the paper. The detection method we developed is like stir-frying a dish: the raw ingredients used in the dish can be listed or left out, because we had already presented the virus sequences in full in the body of the paper.

All the information is presented in the body of the text in the form of figures and tables. These sequences are really like a bunch of pencils, and we have already described them in a very easy-to-understand way: some pencils are red, some are blue. Even if the pencils themselves are not laid out in front of you, we have already described the facts.

The researcher further stated that, due to the nature of the data collected from the samples, the data is not suitable for use in any origin tracing efforts.

Our detection casts a net over a limited range, and at most it can capture one-third of the virus's sequence; the great majority is beyond our coverage. So in terms of coverage, the quality of this data does not meet the requirements for origin tracing data. The WHO report describes this very clearly. It lists several criteria up front, and our data meets none of them, so nobody would use our data to talk about origin tracing. First, the coverage is insufficient. It is like verifying a person's identity from an ID card: if we have only the last small part of the ID number, how can we prove what the complete ID number is? Second, there is the accuracy of our deep sequencing. The method is accurate enough for diagnosis, but not for the precise judgments that origin tracing requires. So this is quite normal; nobody would use our data for origin tracing analysis.

III. When was the sampling done that produced the data?

According to the researcher, a total of two batches of samples were taken. In the first batch, a total of 45 samples were taken randomly from patients that sought treatment in Wuhan on Jan. 30th, 2020. The second batch of samples was taken from a group of patients in mid-February, 2020.

According to media reports, on January 30th, 2020, China reported nearly 10,000 COVID-19 cases in total.

In the opinion of a Chinese vice-minister, that means the data had little value in COVID-19 origin tracing - it’s simply not “early” data when the case count has reached nearly 10,000.

Your host notes that the first reported COVID-19 case dates to Dec. 8, 2019.

IV. Why did Wuhan University researchers withdraw the data from the U.S. database?

The researcher said that their submission to the journal included a paragraph giving the Internet link to the data in the U.S. database.

However, the paragraph was deleted during the copy editing phase of the publication.

The researcher accepted the change because, in his opinion, the paragraph was non-essential information.

Given that, when we saw that the journal had deleted this paragraph, we felt it was indeed unnecessary, because the journal itself is a methodology journal. This was not the kind of virus-sequence paper that foreign media or scholars pay attention to or publish. We were simply not in the same field (of science).

Since the language pointing to the data on the database was deleted, the researcher thought there was no reason for the data to be kept on the database, “because no one will know why it existed there”.

Because the body of the paper no longer contained this description (of the link to the database), the data stored in the database was like a headless fly. Nobody would know that the data had anything to do with us. After some time, even we might not be able to find it, since there was no link. So we had the data deleted. This took place in June of last year (2020).

V. How did Dr. Jesse Bloom come into the picture?

Fast forward one year to 2021: the researcher said he received an email from Dr. Jesse Bloom on Monday, June 7, but did not reply because his first thought was that he should re-upload the data somewhere public.

I didn't know this person (Dr. Jesse Bloom) at all, and our research fields do not overlap. I simply regarded him as an ordinary researcher. If an ordinary researcher writes to me asking for data, and the data has indeed been published, I will share the data publicly. But I could not share the data with him alone.

I had never heard of this person (Dr. Jesse Bloom) before, so when I saw that he had written to me, my first reaction was that we should upload the data somewhere again.

The researcher said the next time he heard of Dr. Jesse Bloom was on June 23, 2021.

He (Dr. Jesse Bloom) posted an article on June 22 that was highly aggressive and slanderous, saying outright that we were concealing something; otherwise, why would we have withdrawn the data?
After he posted the article, many U.S. media outlets, including The New York Times, bombarded us in turn, writing to ask me what was going on. I had no idea what was going on. Then, on that same day, the mainstream media in the United States “upgraded” the story on the basis of that person's article, producing a series of derivative reports.
The reports came in all kinds, for example that Wuhan researchers had mysteriously deleted the data they had uploaded to the NIH database. By my count, forty-some media outlets covered it. The mainstream (academic) journals Science and Nature also wrote to ask me what had really happened, but they published articles describing the incident before I could respond or even take it in. We then replied, one after another, explaining what we had done. First, we re-uploaded the data to the national bioinformation database (in China), in the same way we had uploaded it originally, and it is fully public.

VI. Was there any hiding of Wuhan COVID-19 data?

Bloom Lab @jbloom_lab

There are also broader implications. First, fact this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared. We already know many labs in China ordered to destroy early samples: scmp.com/news/china/soc… (16/n)

The researcher said that there was no basis for asserting that the Chinese researchers were hiding anything because, first and foremost, the published paper never included any description of the Internet link to the data in the U.S. database.

The published text does not contain that paragraph. This is very simple: you can go to the journal, the article is open access, and anyone can download it. The officially published article contains no description of the data at all. If I had described in the text where the data was stored, and then quietly deleted it on my own, the journal would not have let me off.

VII. Why not reply at that time?

If we had responded to him (publicly) on June 23, when he was writing about this everywhere (on Twitter and in his preprint), several pieces of our evidence were not yet ready. First, I needed to make the data public. I could not respond to him before I had made the data public. After responding, the data would still not have been public, so what would have been the point?

The researcher said the entire dataset had been re-uploaded.

We re-uploaded the data to the GSA database built by the China National Center for Bioinformation. What we had uploaded to NCBI (the U.S. National Center for Biotechnology Information), we re-uploaded in full to the Chinese database. It is totally public.

Responding to suspicions that the Chinese government was behind the withdrawal of the data, the researcher said he couldn't recall the exact date when he was contacted by the Chinese authorities, but it was sometime after June 22, 2021 (when Dr. Jesse Bloom published his preprint and Twitter thread) and before early July, a full year after the data was withdrawn.

The Chinese vice-minister hinted at this during Thursday’s press conference, saying:

After this matter was reported, we immediately investigated it and looked into what had happened.

In hindsight, the researcher said:

I am also reflecting on this matter and how it unfolded. The last time someone wrote to me asking about this, how could I have replied better? But I ask for your understanding - at the time, I didn’t know what to do, and nobody told me what to do.

In summary, by putting the two sides of the story together, your host believes that the following conclusions can be drawn:

The research in question was not related to origin tracing, nor did the researchers pay particular attention to that aspect.
The samples for this study were collected almost two months after the first COVID-19 case had been reported, and therefore shouldn’t be considered “early evidence” that could have boosted origin tracing efforts in any significant manner.
By the researcher's account, there was nothing nefarious behind the decision to withdraw the data: the reason was purely technical, and he has since re-uploaded the deleted data.
By the same account, the Chinese government played no role in the withdrawal, and suggestions of a coverup are unsubstantiated.

(Turn to Zichen’s Pekingnology for further observation.)

Two outstanding interns, Chenjie Liao, a student at Sun Yat-sen University, and Qi Cui, a graduate student at China Foreign Affairs University, have just joined Beijing Channel and contributed to this newsletter.
