Understanding Covid-19 Statistics

Gathering statistics about communicable diseases is difficult. Covid-19 is no exception. The best available measures are all trailing measures, often with long lag times. I continually see statistics poorly or inappropriately presented.

The most important statistic is the number of newly infected people. There is currently no way to directly measure this. Contact tracing can be used to notify some people who may have been infected by an infected person. However, it is rare that people know (or remember) all the people they have been in contact with while they were infectious.

Statistical Issues

These are some issues with the data being presented. At best, they lead people, sometimes including world leaders, to misunderstand the situation.

Undercounting

The most commonly used statistic is confirmed cases. This is believed to represent less than ten percent of the actual cases. Unfortunately, this is the best available measure. This statistic is extremely sensitive to testing availability and timeliness. NOT testing does not reduce the number of cases; it only reduces the count.

Another commonly reported statistic is the number of infected people in hospitals. This is the subset of people most severely impacted by the infection and is believed to represent five percent or less of total infections. This statistic is important in gauging the capability of the health care system to handle severe cases. This statistic significantly trails new infections; firstly, because it is often a week or two after infection that hospitalization is required; and secondly, because people may be hospitalized for many weeks. This statistic is also limited due to available capacity, and some overloaded hospitals send infected people home to die.

The number of deaths is also commonly reported. This statistic is believed to represent one percent or less of total infections. In areas where the spread is poorly controlled, it may represent a larger percentage. It trails initial infection by weeks, and the time lag may vary significantly depending on medical care availability.

Inappropriate statistics

Frequently the wrong statistics are presented with news items. This may be in part because inappropriate statistics are easier to find. Some examples:

  • A graphic of the total number of cases by state presented to show which states currently have an increasing number of cases. States with few new cases are color-coded the same as states with large increases in the number of new cases.
  • Graphics that color-code the raw number of cases by state without adjusting for population. 100,000 represents about seventeen percent of the population of Wyoming, but less than one percent of the population of North Carolina and eight other states. As I am writing this, a dozen states have over 100,000 confirmed cases of Covid-19.
  • Aggregating data from states in different stages of control. Most states have yet to control the viral spread, while some have. Combining data from both may misrepresent the current situation. One organization graphed the data for New York, New Jersey, and Connecticut in one graph, and the other states in another, giving a clearer representation of the situation in the other states as well as the tri-state area.
  • Reporting total cases is of limited use, especially this early in the pandemic. This can be used to estimate the percent of the population that has been exposed. Unfortunately, this statistic is significantly undercounted. Antibody tests can be used to provide a much more reliable measure.
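The population point in the list above can be made concrete with a per-capita calculation. This is only a minimal sketch: the function name is my own, and the population figures are rough, illustrative values chosen to match the percentages quoted above, not official census data.

```python
# Rough, illustrative populations (assumptions); substitute current census figures.
POPULATION = {
    "Wyoming": 580_000,
    "North Carolina": 10_500_000,
}


def cases_per_100k(cases, population):
    """Return confirmed cases per 100,000 residents."""
    return cases / population * 100_000


# The same raw count of 100,000 confirmed cases in each state.
for state, population in POPULATION.items():
    rate = cases_per_100k(100_000, population)
    print(f"{state}: {rate:,.0f} cases per 100,000 residents")
```

With these assumed populations, the same raw count works out to roughly 17,000 per 100,000 residents in Wyoming and under 1,000 per 100,000 in North Carolina, which is why a map coded by raw counts can mislead.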

Problems Gathering Data

Gathering data about an infection is difficult. For most infections, it is unlikely that all cases will be identified. Testing and identifying infected persons may be difficult and may produce erroneous results. Getting timely results may be difficult because of the infection's development cycle and delays in the testing process.

Delays in Data Availability

There is always a delay between exposure to the infection and when it becomes identifiable. It takes time for the infection to take hold and become measurable. Once the infection becomes measurable, data won’t be available until after the infected person is tested. Unless the person is significantly symptomatic, they may be unlikely to get tested. Once the test has been done, there may be a delay, possibly days, until the infection is confirmed.

People are rarely hospitalized immediately after they become symptomatic. Except for the most deadly infections, few infected people are hospitalized. Usually, hospitalization only occurs after the infection causes severe symptoms. This may be many days after symptoms were first noticed.

Deaths are particularly problematic. They often occur after a relatively long hospitalization. People who die without having received medical care may not be counted. Additionally, people may delay seeking care for other conditions and die as a result; it is not clear if these deaths should be counted.

Inaccurate Counts

Unless everyone can be tested with absolutely accurate tests, the counts will be inaccurate. Repeated tests will be required to count people whose infection was not sufficiently advanced to be found in a prior test, including those infected during the testing period.

Tests are rarely absolutely accurate. Some tests are generally unreliable, but other failure reasons exist. Failures fall into two categories:

  • False Positives are cases where an infection is identified but does not exist. These may be a result of detecting something else besides the desired infection. (Cowpox works as a smallpox vaccine because the cowpox immunity cells misidentify smallpox as cowpox and attack it.)
  • False Negatives are cases where the infection exists but is not detected. These may be due to the infection not being severe enough at the location the sample was taken.
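The effect of these two failure types on a count can be sketched with a little arithmetic. The function below is a hypothetical illustration, and the detection and error rates in the example are made-up values, not measured figures for any real Covid-19 test.

```python
def expected_test_results(tested, infected, sensitivity, specificity):
    """Return expected (true positives, false positives, false negatives).

    sensitivity: fraction of infected people the test correctly flags.
    specificity: fraction of uninfected people the test correctly clears.
    """
    true_positives = infected * sensitivity
    false_negatives = infected - true_positives
    false_positives = (tested - infected) * (1 - specificity)
    return true_positives, false_positives, false_negatives


# Hypothetical example: 10,000 people tested, 500 of them infected,
# a test that finds 80% of infections and wrongly flags 2% of the uninfected.
tp, fp, fn = expected_test_results(10_000, 500, sensitivity=0.80, specificity=0.98)
print(f"reported positives: {tp + fp:.0f} (of which {fp:.0f} are false)")
print(f"missed infections:  {fn:.0f}")
```

Under these assumed rates, the reported count is off in both directions at once: some real infections are missed while a number of uninfected people are counted.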

Unidentified Cases

I have already mentioned the issue of people not being tested, and therefore not being counted. This contributes to undercounting, and may contribute to viral spread.

There are reports that a significant number of people, mostly young, do not display symptoms. These are often called asymptomatic. However, the virus may be causing damage which is not readily identified. As we gain knowledge about the virus, the number of organs which can be damaged is increasing.

Summary

The available statistics have significant issues. However, they provide important information. Carefully consider the issues before relying on any statistics you see.

Note to Reporters: Carefully consider the statistics you use in your reporting. The most useful statistics are daily new case counts. Five-day or seven-day rolling averages help reduce variability, but they also dampen and delay indications of a change in trend. Remember these are trailing indicators, and research how long the delay is.
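For anyone wanting to compute a rolling average themselves, here is a minimal sketch. The function name and the daily counts are made up for illustration; it uses a trailing window, so each average only appears once a full window of days is available.

```python
from collections import deque


def rolling_average(daily_counts, window=7):
    """Return trailing averages, one per day once a full window is available."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` values
    averages = []
    for count in daily_counts:
        recent.append(count)
        if len(recent) == window:
            averages.append(sum(recent) / window)
    return averages


# Made-up daily new case counts with a weekend reporting dip.
daily_new_cases = [120, 135, 150, 90, 60, 140, 160,
                   170, 180, 110, 80, 190, 200, 210]
print(rolling_average(daily_new_cases))
```

A seven-day window covers one full week, so it smooths out weekday and weekend reporting differences. The cost is that a real change in trend takes several days to show up clearly in the averaged line.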

WordPress SSH2 configuration

Instead of the packaged WordPress I run the version provided by WordPress.  It is installed using a different userid from the userid the web server runs as.  To enable updates from the Admin Dashboard, I enabled sftp (ssh). This is how I did it.

Using the sftp option requires the php ssh module.  This command installs the php ssh module.

apt install php-ssh2

The FTP functionality includes the sftp (ssh2) option for connectivity.  To enable this, the /etc/wordpress/config.php file must be updated to include the following lines.  (Use the appropriate directories for your installation.)

// This value should be ssh2 not ssh
define('FS_METHOD', 'ssh2');
define('FTP_BASE', '/var/www/');
define('FTP_CONTENT_DIR', '/var/www/wp-content/');
define('FTP_PLUGIN_DIR', '/var/www/wp-content/plugins/');
define('FTP_PUBKEY', '/etc/wordpress/.ssh/id_rsa.pub');
define('FTP_PRIKEY', '/etc/wordpress/.ssh/id_rsa');
// user that owns wordpress install - should not be root
define('FTP_USER', 'wordpress');
// password for FTP_USER username - may be empty
define('FTP_PASS', 'changeme');
// hostname:port combo for your SSH/FTP server
define('FTP_HOST', 'localhost');

The following script creates and populates the directories required for ssh to work. An ssh key is generated and granted restricted access to the user owning the distribution. The last command verifies the setup.

# Make the directories
sudo mkdir -p -m 0755 ~www-data/.ssh /etc/wordpress/.ssh
sudo chown www-data /etc/wordpress/.ssh
# Create the known hosts file (the redirect must run inside the root shell)
sudo sh -c "ssh-keyscan localhost > ~www-data/.ssh/known_hosts"
sudo chmod 444 ~www-data/.ssh/known_hosts
# Generate the key file 
sudo -u www-data ssh-keygen -t rsa -b 4096 -f /etc/wordpress/.ssh/id_rsa -N changeme
# Secure the directories
sudo chown root:www-data /etc/wordpress/.ssh ~www-data/.ssh
# Authorize the key - with restricted access
echo -n 'from="127.0.0.1,::1",restrict,pty ' >> ~/.ssh/authorized_keys
sudo cat /etc/wordpress/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Test the configuration - should be prompted for the key's password.
sudo -u www-data ssh -i /etc/wordpress/.ssh/id_rsa $(logname)@localhost

I hope this is useful for you. As always, please change the password used above.

My original installation used a key without a password. At the time sftp access was not stable. I have not yet done an upgrade with a password on the key.

init.d for Non-root Processes

When installing third-party applications, they often default to running as root. The server applications for TeamSite/LiveSite are among those. I have applied a simple modification to the init.d scripts that starts them as a non-root user. It also allows the scripts to be run by members of an administration group via sudo. This approach is applicable to other applications. Continue reading “init.d for Non-root Processes”

Geo blocking with tcpwrappers

I recently had an issue with frequent login attempts against one of my services. These were almost all from countries that should not be accessing my service. To resolve the issue I implemented geo blocking with TCP Wrappers. This is how I went about geo blocking connections. Continue reading “Geo blocking with tcpwrappers”

Tuning Java Garbage Collection

I recently completed a garbage collection exercise on a variety of applications. In all, twenty WebLogic application clusters were tuned. A dozen of these are large busy application clusters. These provide a mix of Web Applications and Web Services.

Tuning garbage collection is a matter of trade-offs. Large heaps take longer to garbage collect. Small heaps need to be collected frequently using more CPU time. Continue reading “Tuning Java Garbage Collection”

Securing TLS

A StackExchange question on using HAProxy’s capture feature to pass data from TCP mode to HTTP mode prompted me to update my SSL configuration. This was intended to get an A+ rating from SSL Labs by sending non-SNI capable clients to a server with weaker ciphers. This was to enable clients on WinXP/IE8, Java 6, and an old Android version to connect. I found a solution without having to have two sets of ciphers and handling traffic in both the TCP mode and HTTP mode. I then optimized my settings to a minimal list of cipher specifications.
Continue reading “Securing TLS”

WordPress Tuning

I’ve done a little tuning to my WordPress setup. In order to keep up to date, I’ve switched from the Ubuntu installation to a downloaded installation under /opt/wordpress. This is owned by my user and served by apache running as www-data. Updates are done using the sftp add-on.

Securing /opt/wordpress

I added myself to the www-data group. This allows apache to read any files with group access, but prevents writing if the web server is compromised.

I set the setgid (group sticky) bit on all the directories. If required, setting it on the wp-content/upgrade directory should be sufficient.

SSH Key

I generated my key outside the home directory for www-data which is /var/www. The directory I chose is not one I would publish. However, ssh requires a .ssh/known_hosts file in its home directory. This was created and the appropriate security added. The key is password protected.

Outstanding Issues

There are some outstanding issues. I’ll look into these as time permits.

Native ssh

The WordPress ssh2 module does not work on my server. I’ve found a couple of issues.

  • Passwords on the key don’t work. This is a known issue with a work-around. The initial connection appears to fail, but a second call should resolve the issue.
  • The is_dir function does not work. Returning true for paths that end in a slash (/) is a workaround. This got me as far as trying to install. This may be a result of how the path is constructed and there is a published workaround.
  • The is_file function appears to fail, as WordPress reports the download contains no files. This is likely the same issue as for the is_dir function.

Theme upgrades

My modifications to the theme are getting a little old. The theme works reasonably well on mobile devices, but I would like to update to a more streamlined theme. The site statistics I have indicate a surprisingly high percentage of viewers use a mobile device.

Command Line Arguments in Python

When I need a new tool, I often code it in Python.  Often, command line options are useful. Sometimes it is possible to have a fixed set of parameters, but this is not very flexible. Fortunately, Python has standard libraries to handle parsing command lines.  There are three libraries providing varying capabilities.  Some of the systems I run on have older versions like Jython 2.1 or Python 2.6.  This limits which libraries I can use without backporting libraries.

This document provides examples for four command line processing options.  The examples are for a program that processes files and has an optional argument to report the execution time. Continue reading “Command Line Arguments in Python”
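As a taste of what such a program can look like, here is a minimal sketch using argparse. It is not the code from the full post: the option names and the build_parser and main functions are my own illustration. argparse joined the standard library in Python 2.7 and 3.2, so on older interpreters such as the ones mentioned above, getopt or optparse would be needed instead.

```python
import argparse
import time


def build_parser():
    """Build a parser for a tool that takes files and an optional timing flag."""
    parser = argparse.ArgumentParser(description="Process one or more files.")
    parser.add_argument("files", nargs="+", help="files to process")
    parser.add_argument("-t", "--time", dest="report_time", action="store_true",
                        help="report the execution time")
    return parser


def main(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and process each named file."""
    args = build_parser().parse_args(argv)
    start = time.time()
    for name in args.files:
        print("processing %s" % name)  # placeholder for the real work
    if args.report_time:
        print("elapsed: %.3f seconds" % (time.time() - start))
    return 0
```

Calling main(["-t", "a.txt", "b.txt"]) processes both names and then prints the elapsed time. Passing an explicit list instead of reading sys.argv directly makes the parsing easy to exercise from a test or an interactive session.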