Gathering statistics about communicable diseases is difficult. Covid-19 is no exception. The best available measures are all trailing measures, often with long lag times. I continually see statistics that are poorly or inappropriately presented.
This article was written in the first few months of the Covid-19 pandemic. The issues discussed are still occurring, and they apply to many other outbreaks as well.
The most important statistic is the number of newly infected people. There is currently no way to directly measure this. Contact tracing can be used to notify some people who may have been infected by an infected person. However, it is rare that people know (or remember) all the people they have been in contact with while they were infectious.
These are some issues with the data being presented. At best they lead to misunderstanding the situation, sometimes by world leaders.
The most commonly used statistic is confirmed cases. This is believed to represent less than ten percent of the actual cases. Unfortunately, this is the best available measure. This statistic is extremely sensitive to testing availability and timeliness. Not testing does not reduce the number of cases; it only reduces the count. As it takes days after infection to obtain a positive test, this indicator shows infections that occurred at least a few days earlier.
Another commonly reported statistic is the number of infected people in hospitals. This is the subset of people most severely impacted by the infection and is believed to represent five to ten percent of total infections. This statistic is important in gauging the capability of the health care system to handle severe cases. This statistic significantly trails new infections; firstly, because it is often a week or two after infection that hospitalization is required; and secondly, because people may be hospitalized for many weeks. This statistic is also limited due to available capacity, and some overloaded hospitals send infected people home to die.
The number of deaths is also commonly reported. Deaths are believed to represent one percent or less of total infections. In areas where the spread is poorly controlled, they may represent a larger percentage. This statistic trails initial infection by weeks, and the time lag may vary significantly depending on medical care availability.
Frequently the wrong statistics are presented with news items. This may be in part because inappropriate statistics are easier to find. Some examples:
- A graphic of the total number of cases by state presented to show which states currently have an increasing number of cases. States with few new cases are color-coded the same as states with large increases in the number of new cases.
- Graphics that color-code the raw number of cases by state without adjusting for population. 100,000 cases represents about seventeen percent of the population of Wyoming, but less than one percent of the population of North Carolina and eight other states. As I write this, a dozen states have over 100,000 confirmed cases of Covid-19.
- Aggregating data from states in different stages of control. Most states have yet to control the viral spread, while some have. Combining data from both may misrepresent the current situation. One organization graphed the data for New York, New Jersey, and Connecticut in one graph and the other states in another, giving a clearer representation of the situation in the other states as well as in the tri-state area.
- Reporting total cases is of limited use, especially early in the pandemic. This can be used to estimate the percent of the population that has been exposed. Unfortunately, this statistic is significantly undercounted. Testing a random sample for antibodies can provide a much more reliable measure.
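The population point in the examples above can be made concrete with a short calculation. This is a minimal sketch: the population figures are rounded approximations that I am assuming for illustration, and the function names are hypothetical.

```python
# Rounded, approximate state populations (assumed values for illustration only).
POPULATION = {
    "Wyoming": 580_000,
    "North Carolina": 10_500_000,
}


def percent_of_population(cases: int, population: int) -> float:
    """Return cases as a percentage of the population."""
    return cases / population * 100


def cases_per_100k(cases: int, population: int) -> float:
    """Return cases per 100,000 residents, a common per-capita measure."""
    return cases / population * 100_000


# The same raw count of 100,000 cases means very different things per state.
for state, population in POPULATION.items():
    pct = percent_of_population(100_000, population)
    rate = cases_per_100k(100_000, population)
    print(f"{state}: {pct:.1f}% of population, {rate:,.0f} cases per 100k")
```

Coloring a map by a per-capita rate like this, rather than by raw counts, keeps small and large states comparable.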
Problems Gathering Data
Gathering data about an infection is difficult. For most infections, it is unlikely that all cases will be identified. Testing and identifying infected persons may be difficult and may produce erroneous results. Getting timely results may be difficult because of the infection’s development cycle and delays in the testing process.
Delays in Data Availability
There is always a delay between exposure to the infection and when it becomes identifiable. It takes time for the infection to take hold and become measurable. Once the infection becomes measurable, data won’t be available until after the infected person is tested. Unless the person is significantly symptomatic, they may be unlikely to get tested. Once the test has been done, there may be a delay, possibly days, until the infection is confirmed.
People are rarely hospitalized immediately after they become symptomatic. Except for the most deadly infections, few infected people are hospitalized. Usually, hospitalization only occurs after the infection causes severe symptoms. This may be many days after symptoms were first noticed.
Deaths are particularly problematic. They often occur after a relatively long hospitalization. People who die without having received medical care may not be counted. Additionally, people may delay seeking care for other conditions and die as a result; it is not clear if these deaths should be counted.
Unless everyone can be tested with absolutely accurate tests, the counts will be inaccurate. Repeated tests will be required to count people whose infection was not sufficiently advanced to be found in a prior test, including those infected during the testing period.
Tests are rarely absolutely accurate. Some tests are generally unreliable, but other failure reasons exist. Failures fall into two categories:
- False Positives are cases where an infection is identified but does not exist. These may result from the test detecting something other than the target infection. (Cowpox works as a smallpox vaccine because the cowpox immunity cells misidentify smallpox as cowpox and attack it.)
- False Negatives are cases where the infection exists but is not detected. These may be due to the infection not being severe enough at the location the sample was taken.
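To see how the two failure categories distort a count, the expected results of a testing campaign can be worked out from the test's error rates. The sketch below is illustrative only: the prevalence, sensitivity, and specificity values are hypothetical numbers I chose for the example, not measurements of any real test, and the function name is my own.

```python
def expected_test_results(tested: int, prevalence: float,
                          sensitivity: float, specificity: float) -> dict:
    """Expected outcome counts for a group of tested people.

    prevalence:  fraction of tested people who are actually infected
    sensitivity: probability an infected person tests positive
    specificity: probability an uninfected person tests negative
    """
    infected = tested * prevalence
    uninfected = tested - infected
    return {
        "true_positives": infected * sensitivity,
        "false_negatives": infected * (1 - sensitivity),
        "false_positives": uninfected * (1 - specificity),
        "true_negatives": uninfected * specificity,
    }


# Hypothetical scenario: 10,000 people tested, 2% actually infected,
# a test that catches 80% of infections and clears 98% of the uninfected.
results = expected_test_results(10_000, 0.02, 0.80, 0.98)
reported = results["true_positives"] + results["false_positives"]
print(f"Actually infected: {10_000 * 0.02:.0f}")
print(f"Reported positive: {reported:.0f}")
print(f"Missed infections: {results['false_negatives']:.0f}")
print(f"False alarms:      {results['false_positives']:.0f}")
```

In this made-up scenario more than half of the reported positives are false alarms while a fifth of the real infections are missed, so the reported count can be wrong in both directions at once.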
I have already mentioned the issue of people not being tested, and therefore not being counted. This contributes to undercounting and may contribute to the viral spread.
There are reports that a significant number of infected people, mostly young, do not display symptoms. These cases are often called asymptomatic. However, the virus may be causing damage which is not readily identified. As we gain knowledge about the virus, the list of organs known to be at risk of damage keeps growing.
The available statistics have significant issues. However, they provide important information. Carefully consider the issues before relying on any statistics you see.
Note to Reporters: Carefully consider the statistics you use in your reporting. The most useful statistics are daily new case counts. Five-day or seven-day rolling averages help reduce day-to-day variability, but they also delay the point at which a change in trend becomes visible. Remember these are trailing indicators, and research how long the delay is.
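For readers who want to compute a rolling average themselves, here is a minimal sketch in plain Python. The daily counts are made-up numbers chosen to show the smoothing trade-off, and the function name is hypothetical.

```python
def rolling_average(daily_counts, window=7):
    """Trailing rolling mean.

    Each output value is the average of the most recent `window` days,
    so the first value is available only after `window` days of data.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    for end in range(window, len(daily_counts) + 1):
        recent = daily_counts[end - window:end]
        averages.append(sum(recent) / window)
    return averages


# Made-up data: new cases double abruptly from 100 to 200 per day on day 8.
daily_new_cases = [100] * 7 + [200] * 7
smoothed = rolling_average(daily_new_cases, window=7)

for day, value in enumerate(smoothed, start=7):
    print(f"day {day}: 7-day average = {value:.1f}")
```

With this data the average climbs gradually and only reaches 200 a full week after the daily count doubled, which is the delay to keep in mind on top of the lag already built into the underlying statistic.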