Peace for the World

First democratic leader of Justice the Godfather of the Sri Lankan Tamil Struggle: Honourable Samuel James Veluppillai Chelvanayakam

Monday, April 27, 2020

The Coronavirus Pandemic & The Statistical Wizardry



Tharosa Missaka Rajaratne
The Coronavirus pandemic can be seen as a milestone in information transparency, or more precisely scientific transparency, in the research community. Although transparency lapsed briefly at the very beginning of the outbreak in China, access to information on the Coronavirus outbreak has since undergone a renaissance worldwide. Colossal datasets comprising essential raw data on various aspects of the disease and the outbreak are released on the internet daily, for public access, by credible organizations such as the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). Because the data are now more open than usual, the scientific community at large has been granted unprecedented liberty in data analysis.
Essentially, basic parameters such as the number of confirmed cases, recoveries, and deaths are the fundamental raw data from which disease statistics such as indices, rates, ratios, and ranges are derived. Only a few of these statistics are comprehensible to the general public; the others require a good grasp of mathematics and statistics to understand the underlying concepts and methods. Thus, statistics that employ only one or two evident disease parameters are often used for public broadcasting. The caveat is that almost all disease statistics are limited in their ability to give a complete analysis of a situation: they are constructed on predefined conditions and definitions, and they fail to give an exact picture if those conditions are not met. Consequently, graphical methods that employ such statistics could distort the real picture. This article analyzes the statistical fallacies that arise when disease statistics are used and presented.
The Three Basic Parameters
The three basic parameters, viz. confirmed cases, recoveries, and deaths, can be regarded as absolute figures, by convention or by nature, as each represents one distinct status. By combining a single disease parameter with a time parameter, basic statistics such as total counts and daily counts are derived. The basic time interval is widely accepted to be a day. These measurements are straightforward and return either a segregated statistic (for non-cumulative counts) or an aggregated statistic (for cumulative counts) along the timeline. Cumulative measurements are seen as more comprehensive in a graphical representation, as they trace a path that shows the overall pattern of the disease up to a given date. Daily measurements alone return isolated results that lack an aggregated view, so they represent the total scenario only vaguely. The dissimilar results produced by these two methods are presented in Figure 1 and Figure 2.

Figure 1 – COVID-19: Daily New Confirmed Cases, Location: Sri Lanka (Data: JHU CSSE, 22nd April 2020)

Figure 2 – COVID-19: Cumulative Confirmed Cases, Location: Sri Lanka. (Data: JHU CSSE, 22nd April 2020)
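The relation between the two kinds of counts can be illustrated with a short Python sketch. The numbers below are hypothetical and are not taken from the JHU CSSE dataset; the variable names are chosen only for this illustration.

```python
from itertools import accumulate

# Hypothetical daily new confirmed cases (illustrative values, not real data).
daily_new = [2, 0, 5, 3, 0, 10, 4]

# The cumulative count on each day is the running total of the daily counts.
cumulative = list(accumulate(daily_new))

# The daily counts can be recovered by differencing consecutive cumulative values.
daily_from_cumulative = [cumulative[0]] + [
    later - earlier for earlier, later in zip(cumulative, cumulative[1:])
]

print(cumulative)             # [2, 2, 7, 10, 10, 20, 24]
print(daily_from_cumulative)  # [2, 0, 5, 3, 0, 10, 4]
```

The cumulative series never decreases and so traces a single path, whereas the daily series jumps from one isolated value to the next. This is why the two figures look so different although they carry the same information.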
The diagnostic basis of the term ‘confirmed case’ has been a point of debate throughout the pandemic. Different countries have adopted varying approaches to declaring a suspected individual a ‘confirmed case’ of COVID-19. Currently, a confirmed case is widely considered to be an individual who has tested positive for COVID-19 by a laboratory method. The tests mainly include the more accurate but less convenient PCR test and the less accurate but more convenient antigen/antibody test. In the infancy of the pandemic, in China, various other clinical and laboratory diagnostic methods with varying success rates were also tried until an effective method was identified. The changes in diagnostic methods produced a steep and unexpected spike in confirmed cases in China on 12th February 2020. Unless the data are completely revised, or a clear log of the changes is kept, such a situation is likely to generate an erroneous statistical result. The term ‘recovered case’ is by convention understood to apply when the patient is discharged from the hospital. Death, however, is absolute and is not subject to multiple definitions.
Active Cases
Apart from the three basic parameters, a few other parameters are derived by combining two or more basic parameters. The most well-known derived parameters are Active Cases, Case Fatality Rate, and Doubling Time. The term ‘Active Cases’ is defined as,
Active Cases = Confirmed Cases - Recoveries - Deaths (1)
This parameter can be identified as the resultant of the three basic parameters. Casually, Active Cases relate to “the number of people who are still in the hospital at the end of the day”. One of the several advantages of using Active Cases is that its cumulative measurement reaches a maximum when the outbreak is at its peak, and it eventually converges to zero provided the disease is not terminal. This yields a graphical representation that is more effective for observing the direction in which the disease is progressing. Studying the number of confirmed cases alone would be counterproductive, as it would not show the aftermath of the disease. Figure 3 is a graphical representation of the epidemic scenario in South Korea using the Cumulative Active Case count method. It is apparent that South Korea has shown a steady decline in active cases since mid-March. Another advantage is that the Active Case count is more intuitive to compare against the carrying capacity (for instance, ICU beds) than the Confirmed Case count. Recovered cases and Deaths are often combined to form the parameter ‘Removed’, especially in the fields of epidemiology and epidemic simulation (e.g. the Susceptible – Infected – Removed Model). When equations (1) and (2) are rearranged to form equation (3), it becomes clear that Active Cases amount to the difference between the accumulation and the removal of cases.
Therefore,
Removed = Recoveries + Deaths (2)
Active Cases = Confirmed Cases - Removed (3)

Figure 3 – COVID-19: Cumulative Active Cases, Location: South Korea. (Data: JHU CSSE, 22nd April 2020)
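The Active Case calculation can be sketched in a few lines of Python. The sketch assumes the definitions used in this section: removed cases are recoveries plus deaths, and active cases are confirmed cases minus removed cases. The three series are hypothetical cumulative counts invented for illustration and do not represent South Korea or any other country.

```python
# Hypothetical cumulative counts per day (illustrative values, not real data).
confirmed = [10, 40, 90, 120, 130, 135, 136]
recoveries = [0, 2, 10, 40, 80, 110, 125]
deaths = [0, 1, 2, 4, 5, 6, 6]

# Removed = Recoveries + Deaths
removed = [r + d for r, d in zip(recoveries, deaths)]

# Active Cases = Confirmed Cases - Removed
active = [c - x for c, x in zip(confirmed, removed)]

# Index of the day on which the active case count is highest.
peak_day = max(range(len(active)), key=active.__getitem__)

print(active)    # [10, 37, 78, 76, 45, 19, 5]
print(peak_day)  # 2
```

In this made-up series the confirmed count keeps rising to the last day, while the active count peaks on the third day and then falls towards zero, which is the behaviour the Active Case curve is meant to expose.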
Although the Active Case method can be a useful measurement because it aggregates three parameters, its very definition can conceal the true nature of the scenario if the preferred conditions are not met. Once aggregated, the parameters tend to lose their individual identities. According to equation (2), a case can be removed in either of two ways: through recovery or through death. Therefore, when a decline in Active Cases is observed, it does not reveal how the cases are being removed and in what proportions. It is thus wise to revisit the casual definition and amend it to “the number of people who could not make it back home at the end of the day”, which instinctively raises the question of why an individual taken ill could not return home. For instance, in Sweden, although the number of Active Cases is still rising, more deaths than recoveries have occurred. Thus, the removal of a case necessitates a discriminator that indicates the mode of removal, by recovery or by death, which can be defined as,
Discriminator = Recoveries - Deaths (4)
This equation yields the arithmetic difference between Recoveries and Deaths, together with a mathematical sign that shows which of the two figures is larger (i.e. a + sign indicates more Recoveries than Deaths, and a - sign the reverse). It should be noted that whenever the two parameters are equal, including when both are zero, the equation yields zero, indicating no difference. Nonetheless, this equation is sufficient to show the drastic differences in how cases are being removed in various countries. Apart from using the arithmetic difference, a proportional method can also be used using the following equation.
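Returning to the arithmetic difference of equation (4), its sign behaviour, and the zero-difference caveat, can be seen in a small Python sketch. The function name removal_difference and the four labelled cases are hypothetical, introduced only for this illustration; the figures are not real country data.

```python
def removal_difference(recoveries: int, deaths: int) -> int:
    """Arithmetic difference between recoveries and deaths (recoveries - deaths)."""
    return recoveries - deaths


# Hypothetical (recoveries, deaths) pairs; values are illustrative only.
examples = {
    "A": (500, 40),   # more recoveries than deaths
    "B": (30, 120),   # more deaths than recoveries
    "C": (0, 0),      # no removals at all
    "D": (75, 75),    # equal, non-zero removals
}

for label, (recovered, died) in examples.items():
    diff = removal_difference(recovered, died)
    if diff > 0:
        mode = "mostly by recovery"
    elif diff < 0:
        mode = "mostly by death"
    else:
        mode = "no difference"
    print(label, diff, mode)
```

Cases C and D both give zero although their situations differ: in C nothing has been removed, while in D recoveries and deaths balance. The arithmetic difference alone cannot tell these two apart.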
