Today’s Nerdy Guest post is from Dr. Michael Levy, associate professor of epidemiology at the Perelman School of Medicine at the University of Pennsylvania. Dr. Levy studies disease ecology and control of vector-borne infectious disease. He likes bugs.
Dr. Levy’s post addresses the CDC’s reporting of COVID-19 test results. He is quoted in this story.
Q: What’s up with the CDC mixing together different kinds of tests in their reporting of test results?
A: There are two main classes of tests for COVID-19: 1) Tests that detect the RNA of the virus, and 2) serological tests that detect antibodies we make against the virus. Several states had been reporting those two types of test results together as “number of tests completed”, and those combined results fed into the data the CDC reports.
What is the effect of mixing the test results together? More noise. More confusion. In order to make sense of any datapoint, you need a good sense of how it got into the dataset and why. For testing, that’s been challenging. A typical dataset is a hodgepodge of passive testing of symptomatic individuals, surveillance in high-risk settings, maybe even some contact tracing. Worse still, the rules and definitions for all of these are changing.
Why did test results get mixed up? It’s not entirely clear. Some suggest that states may be trying to downplay the epidemic. If that’s your motive, combining RNA and serological test results in one dataset would be a weird way to do it. In fact, including serological results with RNA results might make things look worse: people generally test positive on serology tests once they are recovering or recovered from infection. So some old, recovered cases might look like new cases.
Or, it is possible that including the serological tests is an attempt to show increased testing effort. Serology testing is generally cheaper, and many people are keen to get serology tests done. Adding these test counts to the total would be an easy way to inflate testing numbers and meet targets.
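To make the last two points concrete, here is a minimal Python sketch. The numbers are made up purely for illustration (they are not from any state or from the CDC), and the names `Tally` and `combine` are hypothetical helpers, not anyone's actual reporting pipeline. It shows how pooling the two test types changes the total count, the number of positives, and the percent positive all at once.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Counts for one type of test (hypothetical structure for illustration)."""
    completed: int
    positive: int

    @property
    def positivity(self) -> float:
        # Fraction of completed tests that came back positive.
        return self.positive / self.completed


def combine(a: Tally, b: Tally) -> Tally:
    """Pool two tallies the way a single 'tests completed' line would."""
    return Tally(a.completed + b.completed, a.positive + b.positive)


# Made-up example numbers, NOT real data.
rna = Tally(completed=1000, positive=100)      # current infections
serology = Tally(completed=500, positive=25)   # past infections (antibodies)

mixed = combine(rna, serology)

print(f"RNA only: {rna.completed} tests, {rna.positive} positive, "
      f"{rna.positivity:.1%} positive")
print(f"Mixed:    {mixed.completed} tests, {mixed.positive} positive, "
      f"{mixed.positivity:.1%} positive")
```

With these invented numbers, the mixed line shows 50% more tests and 25 extra "positives" that are really past infections, while the percent positive drops. Different numbers could push the percent positive the other way, which is the point: without knowing what went into the total, you can't tell which story you're looking at.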
While mixing data from the two types of testing is certainly a mess-up, it’s not surprising. Even in the best of times it’s easy to mess up data, or to mix things together. Those in charge of the datasets aren’t always the same people who are using the data for decisions.
This brings me to the biggest issue: we (the public, epidemiologists, anyone trying to make sense of the data) aren’t seeing the whole picture. I’m not suggesting things are being hidden, but we simply don’t have access to the context around the numbers. Without context we’re left guessing a lot, which means some pretty complicated math, which in turn means that if we get something wrong (like thinking we’re looking at RNA tests when we’re actually looking at a mix of RNA and serology) we might draw misleading conclusions.
Open data is critical to tracking the epidemic. But data without context are just numbers.