Ameerah Ali1, Shirin Haque1
Department of Physics, The University of the West Indies, St. Augustine, Trinidad, W.I.
Corresponding Authors:
Ameerah Ali
Email: [email protected]
Shirin Haque
Email: [email protected]
DOI:
DOAJ: 5c2f3dd500fd4b8ab8bd43805443da87
Copyright: This is an open-access article under the terms of the Creative Commons Attribution License which permits use, distribution, and reproduction in any medium, provided the original work is properly cited.
©2022 The Authors. Caribbean Medical Journal published by Trinidad & Tobago Medical Association.
ABSTRACT
Objective
To use the mathematical tool of Benford’s law to determine the efficacy of preventative measures against COVID-19 taken by Trinidad, Jamaica, Barbados, the USA and New Zealand. Benford’s law may also indicate the likelihood of fraudulent or manipulated COVID-19 data in these target countries.
Method
Aggregate information on the number of contracted cases of COVID-19 per day, as reported by the governments and health authorities of each target country, was collected via the Johns Hopkins Coronavirus Resource Center. This was used to tabulate the leading digit of each data point recorded and the frequency of appearance of each digit from 1 to 9. A bar graph was then generated for each dataset alongside the expected Benford distribution for direct comparison. Finally, a Chi-Square test, suitable for such an investigation, was carried out to determine statistically how close the observed distribution was to the expected Benford distribution.
Results
Of the five countries, the USA had the distribution that followed Benford’s law most closely, owing to the exponential spread of the virus, with a χ² value of 12.81 (versus a critical value of 15.51). The χ² values of New Zealand, Barbados, Jamaica and Trinidad were 161.45, 110.99, 54.26 and 52.92 respectively. These four values indicated that the corresponding datasets strayed from Benford’s law.
Conclusion
The control interventions taken by the USA to mitigate the spread of COVID-19 were insufficient to control its proliferation, while those taken by the remaining countries during the study period (from the first COVID-19 case in each country to late October 2020) ranged from fairly sufficient to excellent, as supported by the cumulative curves. Although four of the five datasets did not conform to Benford’s law, it is unlikely that there has been any manipulation of data or fraudulent reporting among the target countries.
Introduction
In December of 2019, a new virus originated in Wuhan, Hubei province, China and was originally termed the 2019 novel coronavirus (2019-nCoV) or the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1]. Since then, the disease caused by the virus has come to be known as coronavirus disease 2019 (COVID-19) and has resulted in a worldwide health crisis [1], reaching the level of a global pandemic. Symptoms of the disease are mild in most cases and typically include fever, cough, sore throat, breathlessness, and fatigue. However, COVID-19 can progress to pneumonia, acute respiratory distress syndrome and multi-organ dysfunction in severe instances, which are most common among the elderly or those with comorbidities [1].
In the absence of significant interventions by the relevant authorities, the spread of COVID-19 can increase rapidly and move towards exponential growth in the number of contracted cases recorded as well as the number of deaths recorded. Thus, it is important to be able to determine whether the measures being put in place to protect the global population from the virus are functioning as intended [7]. When numerical data is exponential in nature (such as uninhibited virus proliferation), it is expected to follow Benford’s law, also called the law of anomalous numbers, which describes the phenomenon in which the leading significant digits of numbers are not distributed evenly but instead the smaller digits are favoured [2].
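As an illustration of this expectation, the following minimal Python sketch generates a hypothetical, uninhibited exponential series and compares its leading-digit frequencies with the Benford frequencies, log10(1 + 1/k). The starting value, the 7% daily growth rate and the 400-day length are illustrative assumptions only and are not data from this study.

```python
import math
from collections import Counter

def leading_digit(x):
    """Return the first significant digit of a positive number."""
    # Scientific notation always starts with the leading digit, e.g. '1.070000e+00'.
    return int(f"{x:e}"[0])

# Hypothetical series: 7% growth per day over 400 days (assumed values).
series = [1.0 * 1.07 ** day for day in range(400)]

counts = Counter(leading_digit(v) for v in series)
observed = {k: counts[k] / len(series) for k in range(1, 10)}
expected = {k: math.log10(1 + 1 / k) for k in range(1, 10)}

for k in range(1, 10):
    print(f"{k}: observed {observed[k]:.3f}, Benford {expected[k]:.3f}")
```

Because the series spans many orders of magnitude, the observed frequencies should sit close to the Benford values; a series that is truncated early or grows slowly would not be expected to agree as well.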
Benford’s law can be presented as follows, where the probability of the first significant digit (FSD), or leading digit, taking the value k is

$$P(k) = \log_{10}\left(1 + \frac{1}{k}\right), \quad k = 1, 2, \ldots, 9.$$
Benford’s law has been shown to present in datasets containing 50 to 100 numbers; however, due to the law of large numbers, it is recommended to use larger datasets overall [8]. For this reason, it was ensured that the dataset for each country contained more than 200 data points. It should be noted that the number of contracted cases per day added to the cumulative count in this study refers only to the confirmed cases of COVID-19 reported by each country, as there may be some number of unreported or unconfirmed COVID-19 cases in each country.
Data analysis
Once the data had been collected, a graph of contracted cases per day was generated for each country in order to visualize the growth rate and proliferation of the virus in each target country. Additionally, a graph of all five countries based on the cumulative count of contracted cases per day per capita was generated to compare the target countries efficiently. The leading digit of each data point was recorded and the frequency of appearance of each digit from 1 to 9 was tabulated per country. Following this, a series of bar graphs was generated, one for each dataset, alongside the expected Benford distribution to provide direct comparison.
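The tabulation step described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' own procedure: the function names and the short cumulative series are hypothetical and are not taken from the study data.

```python
import math
from collections import Counter

def first_significant_digit(n):
    """Leading digit of a positive integer, e.g. 5231 -> 5."""
    return int(str(n)[0])

def tabulate_fsd(cumulative_counts):
    """Count how often each digit 1-9 leads the positive entries."""
    digits = Counter(
        first_significant_digit(n) for n in cumulative_counts if n > 0
    )
    return [digits[k] for k in range(1, 10)]

def benford_expected(n_points):
    """Expected count per leading digit under Benford's law."""
    return [n_points * math.log10(1 + 1 / k) for k in range(1, 10)]

# Hypothetical cumulative case counts (illustrative only, not study data).
cumulative = [1, 2, 2, 5, 9, 14, 18, 27, 49, 90, 115, 140, 212]
observed = tabulate_fsd(cumulative)
expected = benford_expected(sum(observed))

for k, (obs, exp) in enumerate(zip(observed, expected), start=1):
    print(f"FSD {k}: observed {obs}, expected {exp:.2f}")
```

The two lists produced here correspond to the paired bars (actual versus Benford) plotted for each country.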
Since there exists an expected theoretical distribution, that is, the Benford distribution, a Chi-Square goodness-of-fit test was applied to each country’s dataset to determine how close the results garnered were to Benford’s law. In the mathematical and physical sciences, the Chi-Square test is a powerful tool used to quantify the discrepancy between expected and observed distributions and was determined to be most suitable for this analysis. The Chi-Square test was found to be the most widely implemented technique when examining distributions in comparison to Benford’s law and is particularly employed for smaller datasets (<5000 observations) [6] [9]. The Chi-Square test was thus conducted in compliance with the method detailed by Gingrich through the implementation of the critical value approach [10][11]. The null hypothesis, H0, is that the observed distribution follows Benford’s law. The alternative hypothesis, H1, is that the observed distribution does not follow Benford’s law. The degrees of freedom present in these datasets were k-1, where k here denotes the number of categories (the number of possible leading digits, 9), which gives 8 degrees of freedom. A standard level of α = 0.05 was utilised for the analysis. By consulting the table of the Chi-Square distribution, it was found that for 8 degrees of freedom and an α value of 0.05, the critical value (CV) is 15.51 [10].
The Chi-Square test is such that if χ² ≥ CV, then the null hypothesis is rejected and if χ² < CV, the researcher fails to reject the null hypothesis.
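The critical value comparison can be sketched in Python as below. This is a minimal sketch under stated assumptions: the function names and both sets of leading-digit counts are hypothetical, and only the critical value of 15.51 (8 degrees of freedom, α = 0.05) is taken from the text.

```python
import math

CRITICAL_VALUE = 15.51  # 8 degrees of freedom, alpha = 0.05 (from the text)

def chi_square_vs_benford(observed_counts):
    """Chi-Square statistic of nine leading-digit counts against Benford's law."""
    n = sum(observed_counts)
    stat = 0.0
    for k, obs in enumerate(observed_counts, start=1):
        exp = n * math.log10(1 + 1 / k)  # expected count for digit k
        stat += (obs - exp) ** 2 / exp
    return stat

def exceeds_critical_value(stat, cv=CRITICAL_VALUE):
    """True when the statistic reaches the critical value (deviation from Benford)."""
    return stat >= cv

# Hypothetical counts for digits 1..9, each summing to 200 data points.
near_benford = [60, 35, 25, 20, 16, 13, 12, 10, 9]
skewed = [150, 10, 5, 5, 5, 5, 5, 5, 10]

for name, counts in [("near_benford", near_benford), ("skewed", skewed)]:
    stat = chi_square_vs_benford(counts)
    print(name, round(stat, 2), "deviates" if exceeds_critical_value(stat) else "conforms")
```

A statistic below the critical value indicates a leading-digit distribution that cannot be distinguished from Benford's law at the 5% level, while a statistic at or above it indicates a significant deviation.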
If the Chi-Square value is found to be below the critical value, then it can be stated that the distribution is very close to Benford’s law (there is no evidence at the 5% level that the distribution of leading digits differs from Benford’s law), and this further leads to the notion that the distribution is not interrupted in a significant way by control measures. Therefore, the lower the Chi-Square value is relative to the critical value, the more rapidly the virus is proliferating in any given country. As the Chi-Square value is a measure of how far the observed distribution is from the expected distribution, these values can be compared between countries, utilizing the critical value as a convenient benchmark. The further the Chi-Square value lies above the critical value, the less rapidly the virus is proliferating in any given country, which suggests that the control interventions taken in that country have been successful to varying degrees in reducing the spread of COVID-19. If a departure from Benford’s law cannot be accounted for by such interventions, then it can be hypothesized that the discrepancy in the reporting could be due to manipulation of data.
Results
Normalized data for target countries
Figure 2 facilitates qualitative comparison between the target countries, as plotting the cumulative number of contracted cases per capita allows the data to be visualized without having to account for the differing sizes of the respective populations. From this figure, it can be noted that the USA had the highest cumulative number of cases per capita at every point in the period of study and also demonstrated the steepest curve. New Zealand and Barbados demonstrated very flat cumulative curves per capita at the lowest end of the graph. Finally, Trinidad’s and Jamaica’s curves of the cumulative number of cases per capita fell in the middle-to-lower range of the graph.
Application of Benford’s model to data
Figure 3 allows for direct visual comparison between the expected Benford distribution and the actual distribution for each of the five target countries. For Trinidad, the FSDs 4 and 5 follow the expected frequency while all other FSDs are notably lower, with the exception of 1, which is much higher than the expected frequency. Jamaica displayed anomalies such that FSDs 1, 2, 3, 4 and 9 are all of a lower frequency than expected while 5, 6, 7 and 8 are of a much higher frequency than expected. For Barbados, FSDs 1 and 9 are notably high (with 9 being favoured) while the rest are notably low. The USA displayed slight differences in the FSDs of 5, 6 and 7 but, otherwise, the distribution is similar to the Benford distribution. New Zealand’s distribution exhibited a drastic difference when compared to the Benford distribution, such that the observed distribution overwhelmingly favours the FSD of 1 while the other digits are notably low in frequency.
Table 2 summarizes the results obtained when the Chi-Square test was applied to each dataset utilizing an α value of 0.05 and, with 8 degrees of freedom, a CV of 15.51. The USA was the only target country whose χ² value was below the critical value, which means that this distribution can be deemed as following Benford’s law. The other four countries exhibit χ² values significantly greater than the critical value and thus do not follow Benford’s law. By examining the χ² values relative to one another, it can be stated that the χ² values of Barbados and New Zealand are much further from the critical value than those of Trinidad and Jamaica, and this can be understood as the latter two countries’ distributions being closer to the expected distribution than the former two.
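As a supplementary check that is not part of the original analysis, the critical value comparison can be restated as a p-value. The sketch below assumes the χ² values reported in the abstract and uses the closed-form upper tail of the Chi-Square distribution that holds for an even number of degrees of freedom; the function name is hypothetical.

```python
import math

def chi2_sf_even_df(x, df=8):
    """Upper-tail probability P(X >= x) for a Chi-Square variable with even df.

    For df = 2m the tail equals exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms

# Chi-Square values as reported in the abstract of this paper.
reported = {
    "USA": 12.81,
    "Trinidad": 52.92,
    "Jamaica": 54.26,
    "Barbados": 110.99,
    "New Zealand": 161.45,
}

for country, stat in reported.items():
    print(f"{country}: chi-square = {stat}, p = {chi2_sf_even_df(stat):.3g}")
```

A p-value above 0.05 corresponds to a χ² value below the critical value of 15.51, so the two approaches lead to the same decision for each country.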
Discussion
Trinidad
The most probable explanation for Trinidad not having followed the Benford distribution, as described in the results section, is the fact that the final date of data collection for this country was October 28th and, at this point in time, the number of contracted cases was in the 5000 range and had not yet reached the 6000, 7000, …, 9000 ranges. Since Benford’s law requires data to completely span several orders of magnitude, it may be that, if the data had continued to be collected until the count reached 10,000 contracted cases, the distribution would have been closer to the expected frequencies [12].
Another possible reason for Trinidad not having followed the Benford distribution is the fairly strict quarantining, restricted entry into the country and lockdown procedures implemented by the government, as well as the mandate for wearing masks, social distancing, limiting the number of businesses open and limiting the grouping of persons [13] [14]. This is reflected in the plot of cumulative contracted cases per capita, which indicated that new cases were contracted each day at a slower rate than would be the case under exponential growth. Furthermore, when an individual tests positive for the virus, they are subject to strict quarantining in government-provided facilities and thus the likelihood of person-to-person transmission via a quarantined individual is greatly reduced. Based on these findings, the authors hypothesize that it is unlikely that Trinidad authorities have been manipulating, fabricating or otherwise reporting fraudulent COVID-19 data, as there exist salient explanations for the non-compliance with Benford’s law.
Jamaica
Through examination of the plot of the cumulative count of contracted cases per day per capita in Jamaica, it was seen that there was a fairly low rate of proliferation from the beginning of the study to, approximately, August 17th, after which the growth rate increased steadily. During the time in which the growth rate was low, this would have resulted in a section of the entire distribution that does not fit Benford’s law particularly well which may have impacted the overall distribution generated for the study period. When the growth rate spiked, the number of contracted cases per day increased rapidly such that some ranges were skipped altogether (for example, if there were 30 cases contracted in one day that was added to a 70-case total, the new total would be 100, effectively skipping over the FSDs of 8 and 9 for this order of magnitude) and may have further contributed to the non-compliance to Benford’s law.
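The “skipping” effect described in the example above can be made concrete with a short Python sketch. The 70-to-100 jump is the example given in the text; the slower series and the function name are hypothetical and included only for contrast.

```python
def leading_digits_seen(cumulative_counts):
    """Set of first significant digits that appear in a cumulative series."""
    return {int(str(n)[0]) for n in cumulative_counts if n > 0}

# Hypothetical slow growth: the total passes through the 80s and 90s.
slow_growth = [70, 75, 82, 88, 93, 97, 100]

# Example from the text: 30 cases in one day added to a 70-case total.
rapid_jump = [70, 70 + 30]

print(sorted(leading_digits_seen(slow_growth)))
print(sorted(leading_digits_seen(rapid_jump)))
```

In the rapid series the FSDs 8 and 9 never appear for this order of magnitude, which lowers their observed frequencies relative to the Benford expectation.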
Since the distribution was not found to follow Benford’s law, there must be an explanation for this, and one such explanation is that the preventative and control measures were somewhat effective in reducing the spread of the virus. Although the control measures taken were not enough to maintain the country’s low proliferation rate after approximately August 17th, it should be noted that the measures taken, specifically travel restrictions into the country, were sufficient to control the spread for a long enough time that the entire distribution was not Benford-compliant [15]. Similar to Trinidad, Jamaica implemented a mandate on masks in public as well as a work-from-home order and travel restrictions [15]. Furthermore, the government announced various curfews at key points in the period of active COVID-19 cases, ensuring that persons stayed socially isolated when possible. In some instances, lockdown measures were stricter within the parishes where the virus was most present, such as Kingston and St. Andrew [16]. Moreover, when persons tested positive for the virus, they were isolated in hospitals or quarantined at home [17]. These control mechanisms, as well as many others such as the strategic closing of places of worship and schools, are the likely cause of the lack of a clear Benford distribution for this country.
When considering the anomalously high frequencies of FSDs 5, 6, 7 and 8, the explanation may lie in the initial slow growth followed by rapid growth in later months. Furthermore, similar to Trinidad, the data may not have spanned enough orders of magnitude, as data collection was halted on October 18th for this country and thus the number of contracted cases had not yet grown to reach the 9000 and 10,000 ranges to complete this order of magnitude. From these findings, the authors hypothesize that it is unlikely that Jamaican authorities have been manipulating, fabricating or otherwise reporting fraudulent COVID-19 data, as there exist salient explanations for the non-compliance with Benford’s law.
Barbados
During the period of data collection, Barbados had only 233 contracted cases out of a population of 287,000 [18]. This corresponds to a relatively low spread of the virus, which is consistent with the distribution obtained not being Benford-compliant. The anomalously high frequency of the FSD 9 is likely due to the fact that the cumulative count of contracted cases per day remained in the 90s for all of June, some of May and some of July. This extremely slow spread of the virus during this period is likely due to the rapid and detailed response of Barbados to the virus, including travel restrictions into the country and the implementation of the “alphabet system” for movement of citizens within the country [19]. The low contraction rate of COVID-19 in Barbados is therefore attributed to the strict level of control interventions put in place for the citizens by the government.
These interventions began with a detailed guide of preparations, responses and management of COVID-19 divided into four distinct stages by the relevant authorities such that each subsequent stage corresponds to an increased amount of control intervention methods in order to keep the virus from spreading rapidly [20]. Methods include social distancing, wearing cloth masks in public, limited number of individuals permitted to gather in one area, curfews and closure of selected businesses [21]. As Barbados has a significant control over the spread of the virus, it can be said that the control methods are sufficient at managing COVID-19 and thus the distribution for this country acts as a control dataset for the other Caribbean countries. As in the cases of Trinidad and Jamaica, it is proposed that there is no strong indication of fraudulent data being reported from Barbados.
The United States of America
As the USA was found to have a Benford-compliant distribution, this suggests that any control interventions that were attempted have fallen short in preventing, reducing and controlling the spread of COVID-19. These findings were in accordance with the examined literature of Koch and Okamura as well as of Wei and Vellwock [22] [23]. This strong correlation between contracted cases and the Benford distribution validates the concept that this law is an applicable method of statistical analysis for this type of data. When compared with the target Caribbean countries it is evident that the proliferation of COVID-19 in the USA has been much more rapid.
The results garnered by this study reflect the state of the pandemic in the USA, such that this country, at the time of data collection, had 8.3 million contracted cases of COVID-19 and made up approximately a quarter of all contracted cases and deaths globally [24]. There have been attempts to control the virus throughout the pandemic but, due to conflicting control measures such as mask mandates, political interference and a relatively large section of the population who refused to comply, these attempts have been falling short [24]. During the early months of the pandemic, the USA instated a lockdown such that only essential workers were allowed to continue work outside of their homes. As the pandemic progressed, however, various states and businesses were reopened, individuals attended mass gatherings such as Fourth of July celebrations, and there was rampant misinformation based on non-scientific sources and pseudo-science as well as political interference at each stage of the process [24]. It can thus be stated that the USA did not take control interventions as intense as those of the target Caribbean countries, and this is reflected in both the Benford-compliant nature of the distribution and the cumulative curve per capita in Figure 2. Consequently, there has been little control of the virus and thus there is rapid proliferation that results in large daily increases in both the death toll and the number of contracted cases. The exponential nature of the data, which fits Benford’s law well, supports the notion that it is unlikely that there has been any manipulation of data or fraudulent reporting occurring in the USA.
New Zealand
When the plot of the cumulative count of contracted cases per day for New Zealand was examined, the anomalously high frequency of the FSD 1 was determined to be due to the pattern of growth in the number of contracted cases in this country: the brief period of rapid increase at the beginning of the pandemic resulted in “skipping” of many FSDs. Following the rapid increase, there was a drastic decrease in the rate of contraction, such that the number of contracted cases held an FSD of 1 (in the range of 1,000 cases) for an extended period of time as, at the time of data collection, the number of contracted cases had not yet entered the 2,000 range. This drastic decrease in contraction rate was due largely to the exemplary control interventions taken by New Zealand, such that this country has been hailed as having had “the best response in the world to COVID-19” due to its efficient control of the virus [25].
New Zealand’s Director-General of Health, Dr. Ashley Bloomfield, reported that the key to controlling the virus was rapid responses in testing, contact tracing, isolation where necessary and public adherence to health guidelines put in place during the pandemic [25]. New Zealand implemented intense combative measures very early in the pandemic in order to take significant precautions and, on March 26th, implemented a country-wide self-quarantine for non-essential workers [26]. Furthermore, there was strict border control of those entering as well as leaving the country in order to limit the number of cases imported into the country. Unlike the other countries in this study, it was also noted that there was “high public confidence”, which led to relatively good adherence to control interventions such as social distancing, limiting large groupings and wearing cloth masks [27]. It is thus proposed that there is very little chance of purposefully fraudulent data reporting or manipulation in this country, and its response can thus be used as a worldwide model to reduce the spread of the virus on a global level.
Limitations
This method of statistical analysis does not take into account many outside factors that may affect the proliferation of the virus such as the economic state of the country, the willingness of the population to comply with control interventions, the political and social mechanisms of decisions made during the pandemic, the average age of the population, the susceptibility of the population to contract COVID-19 and many other such factors. Furthermore, it does not consider the size of the population for each country or the physical size of the country which both play a part in the spread of the virus. Benford’s law is thus limited to only stating broad implications from the data collected, that is, if the control interventions are sufficient or not, or if there is a hypothetical possibility that the data is fraudulent.
A further limitation of this study is that the dataset is relatively small, as the pandemic had spanned fewer than 300 days in the countries considered (at the time of data collection) and thus fewer than 300 data points were available at the time of the study. Additionally, since the pandemic is still ongoing, there are no complete datasets available for consideration, but it will be useful to repeat this study with larger datasets in the future. Finally, since Benford’s law is most suited to datasets spanning many orders of magnitude, there exists a limitation when drawing conclusions, as most of the countries’ datasets in this study only spanned a few orders of magnitude.
Conclusion
This study applied the first digit version of Benford’s law on the cumulative number of contracted cases per day of COVID-19 in 5 countries in order to preliminarily assess the sufficiency of the preventative measures taken as well as hypothesizing the possibility of fraudulent data reporting. It was found that the USA closely followed Benford’s law while the other 4 countries did not. This leads to the plausible conclusion that the control interventions taken by the USA are insufficient, as supported by the cumulative curve per capita for the study period, and additional measures must be taken to contain the spread of COVID-19. Additionally, it follows that, comparatively, Trinidad, Jamaica, New Zealand and Barbados have taken moderately sufficient to excellent precautions to control the virus to varying degrees as supported by the respective cumulative curves per capita for the study period. Finally, it is hypothesized that it is unlikely that there has been any manipulation of data or fraudulent reporting occurring among the target countries.
Conflict of Interest: Nothing to declare.
Ethical Approval: Not Applicable.
Informed Consent: Not Applicable.
Funding Statement: No funding.
Author Contributions: SH developed the initial concept, both authors designed the study, AA collected and analysed data, both authors interpreted data, AA prepared the manuscript and SH revised the manuscript.
References
1. Singhal, T. A Review of Coronavirus Disease-2019 (COVID-19). The Indian Journal of Pediatrics 2020; 87: 281–286; doi: 10.1007/s12098-020-03263-6.
2. Alexopoulos, T, Leontsinis, S. Benford’s Law in Astronomy. Journal of Astrophysics and Astronomy 2014; 35: 639–648; doi: 10.1007/s12036-014-9303-z.
3. Li, F, Han, S, Zhang, H, Ding, J, Zhang, J, Wu, J. Application of Benford’s Law in Data Analysis. Journal of Physics: Conference Series 2019; 1168: 032133; doi: 10.1088/1742-6596/1168/3/032133.
4. Benford, F. The Law of Anomalous Numbers. Proceedings of the American Philosophical Society 1938; 78: 551–572.
5. Tammaru, M, Alver, L. Application of Benford’s Law for Fraud Detection in Financial Statements: Theoretical Review. Proceedings of the 5th International Conference on Accounting, Auditing, and Taxation (ICAAT 2016) 2016; doi: 10.2991/icaat-16.2016.46.
6. Sambridge, M, Jackson, A. National COVID Numbers – Benford’s Law Looks for Errors. Nature 2020; 581: 384; doi: 10.1038/d41586-020-01565-5.
7. Lee, KB, Han, S, Jeong, Y. COVID-19, Flattening the Curve, and Benford’s Law. Physica A: Statistical Mechanics and its Applications 2020; doi: 10.1016/j.physa.2020.125090.
8. Collins, C. Using Excel and Benford’s Law to Detect Fraud. 2017. https://www.journalofaccountancy.com/issues/2017/apr/excel-and-benfords-law-to-detect-fraud.html (Accessed 01/05/2021).
9. Farhadi, N. Can We Rely On COVID-19 Data? An Assessment of Data from over 200 Countries Worldwide. Science Progress 2021; doi: 10.1177/00368504211021232.
10. Gingrich, P. Introductory Statistics for the Social Sciences. Regina: Dept. of Sociology and Social Sciences, University of Regina; 1992.
11. NIST/SEMATECH e-Handbook of Statistical Methods. https://www.itl.nist.gov/div898/handbook/prc/section1/prc131.htm (Accessed 03/10/2021).
12. Fewster, RM. A Simple Explanation of Benford’s Law. The American Statistician 2009; 63: 26–32; doi: 10.1198/tast.2009.0005.
13. Parsanslal, N. Watch: Senate Passes Bill for Mandatory Mask-Wearing, 2020. https://www.looptt.com/content/watch-senate-passes-bill-mandatory-mask-wearing (Accessed 01/05/2021).
14. Christopher, P. Stay-at-Home Order Extended, PM Says All Restaurants Will Be Closed, 2020. https://guardian.co.tt/news/stayathome-order-extended-pm-says-all-restaurants-will-be-closed-6.2.1093520.0305392a80 (Accessed 01/05/2021).
15. Mundle, T. Employers Urged to Facilitate Work-from-Home Where Possible, 2020. https://jis.gov.jm/employers-urged-to-facilitate-work-from-home-where-possible/ (Accessed 01/05/2021).
16. MOHWJ. 18 KSA Communities at High Risk of COVID-19 Transmission, 2020. https://www.moh.gov.jm/18-ksa-communities-at-high-risk-of-covid-19-transmission/ (Accessed 01/05/2021).
17. PAHO/WHO. Situation Report COVID-19 Jamaica, 2020. https://www.paho.org/en/situation-report-covid-19-jamaica (Accessed 01/05/2021).
18. UN. World Population Prospects, 2019. https://population.un.org/wpp/Publications/Files/WPP2019_DataBooklet.pdf (Accessed 01/05/2021).
19. Government of Barbados. COVID-19 Order, 2020 Directive 4 Phase 2 (May 4-17), 2020. https://www.barbadoschamberofcommerce.com/download/covid-19-order-2020-directive-4-phase-2-may-4-17/ (Accessed 08/08/2021).
20. Gooding, K. COVID-19: What Does Stage 0, 1, 2, 3 Mean, 2020. https://www.loopnewsbarbados.com/content/covid-19-management-and-response-plans-stage (Accessed 01/05/2021).
21. King, K. Barbados under Curfew from March 28, 2020. http://www.loopnewsbarbados.com/content/barbadians-enters-stage-3-curfew-march-28 (Accessed 01/05/2021).
22. Koch, C, Okamura, K. Benford’s Law and COVID-19 Reporting. Economics Letters 2020; 196: 109573; doi: 10.1016/j.econlet.2020.109573.
23. Wei, A, Vellwock, A. Is COVID-19 Data Reliable? A Statistical Analysis with Benford’s Law. ResearchGate 2020. https://www.researchgate.net/publication/344164702_Is_COVID19_data_reliable_A_statistical_analysis_with_Benford’s_Law (Accessed 08/08/2021).
24. Yong, E. How the Pandemic Defeated America, 2020. https://www.theatlantic.com/magazine/archive/2020/09/coronavirus-american-failure/614191/ (Accessed 01/05/2021).
25. Farrer, M. New Zealand’s Covid-19 Response the Best in the World, Say Global Business Leaders, 2020. https://www.theguardian.com/world/2020/oct/08/new-zealands-covid-19-response-the-best-in-the-world-say-global-business-leaders (Accessed 01/05/2021).
26. WHO. New Zealand Takes Early and Hard Action to Tackle COVID-19, 2020. https://www.who.int/westernpacific/news/feature-stories/detail/new-zealand-takes-early-and-hard-action-to-tackle-covid-19 (Accessed 01/05/2021).
27. Kunzmann, K. How Did New Zealand Control COVID-19?, 2020. https://www.contagionlive.com/view/how-did-new-zealand-control-covid19 (Accessed 01/05/2021).
Figure 1: Benford’s Law Percentage Distribution of Leading Digits
Figure 2: Cumulative Number of Contracted Cases per day per Capita for each Target Country
Figure 3: Comparison of the Benford Frequency with the Respective Actual Frequencies for Each Country Examined between January and October of 2020