Summary box
The COVID-19 pandemic exacted a heavy mortality toll across the world during 2020 and 2021.
The pandemic toll in India was estimated by several statistical analyses that used a range of locally available mortality records for 2020–2021 from different parts of the country.
The validity of both the estimation methods and their results has been a matter of intense debate among scientists, policy analysts and civil society.
This article scrutinises the characteristics and quality of local data that were used for estimation, and assesses the potential for input data biases that could affect the validity of modelled estimates.
The official Civil Registration System (CRS) report for 2021 is scheduled for release during the second half of 2023, and would serve as an invaluable resource for pandemic mortality estimation.
We propose specific standards for demographic and event-related variables in the 2023 CRS data release, as well as alternative methods to evaluate and correct for data biases.
The proposed analyses would make death registration data more useful for estimating pandemic mortality in India, and our recommendations aim to strengthen mortality data systems at national and subnational levels.
Background
Timely and accurate mortality statistics are essential for health policy, monitoring and evaluation of health programmes, and epidemiological research.1 The COVID-19 pandemic brought into sharp focus the critical need for reliable mortality data to monitor its impact across time, place and person. Across the world, initial data compilations followed pandemic mortality surveillance guidelines issued by the WHO.2 However, it soon became clear that health sector reporting of COVID-19 deaths faced several challenges: gaps in the availability of confirmatory testing; variations in practices for ascribing deaths to COVID-19; and diagnostic difficulties for deaths that occurred at home, outside the purview of the health sector. The surveillance community therefore turned to evaluating pandemic impact through ‘excess’ mortality, defined as the increase in observed deaths in a defined time period compared with the deaths expected on the basis of prepandemic trends. Such assessments require reliable population-wide mortality data from Civil Registration and Vital Statistics systems, but these systems do not function efficiently in many countries. For such populations, mathematical models were developed to derive both prepandemic mortality patterns and the excess mortality attributable to the COVID-19 pandemic.3 The inherent limitations of these models are the assumptions on which their calculations are based and potential inaccuracies in the primary data used as model inputs, which together create considerable uncertainty in the estimated mortality patterns.4
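As a minimal sketch of the excess mortality definition above, the Python snippet below projects expected deaths from a straight-line fit to prepandemic annual counts and subtracts them from observed deaths. The linear trend, the function names and all counts are illustrative assumptions; published models use considerably richer baselines (for example, seasonal or age-specific terms).

```python
from typing import Sequence


def linear_baseline(years: Sequence[int], deaths: Sequence[float], target_year: int) -> float:
    """Project expected deaths for target_year from an ordinary least-squares
    straight line fitted to prepandemic annual death counts."""
    n = len(years)
    if n < 2 or n != len(deaths):
        raise ValueError("need at least two matching (year, deaths) pairs")
    mean_x = sum(years) / n
    mean_y = sum(deaths) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, deaths))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * target_year


def excess_deaths(observed: float, expected: float) -> float:
    """Excess mortality: observed deaths minus deaths expected from prepandemic trends."""
    return observed - expected


# Hypothetical counts for illustration only (not data for any Indian state).
pre_years = [2015, 2016, 2017, 2018, 2019]
pre_deaths = [100_000, 102_000, 104_000, 106_000, 108_000]

expected_2020 = linear_baseline(pre_years, pre_deaths, 2020)
print(expected_2020)                           # 110000.0
print(excess_deaths(125_000, expected_2020))   # 15000.0
```

Every input to such a calculation, both the prepandemic series and the pandemic-period count, inherits whatever biases exist in the underlying records, which is the concern examined in the rest of this article.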
In India too, the national government implemented mortality reporting according to the WHO pandemic surveillance guidelines, but these data were inadequate for the same reasons.5 In response to public interest in tracking the pandemic, independent investigators reported various compilations of pandemic-period mortality data in national newspapers.6 This article evaluates the characteristics of pandemic-period mortality data available from different sources for India, and the potential biases in them that could affect the accuracy of mortality models and excess mortality estimates. It also suggests alternative statistical methods to estimate pandemic mortality in India, and makes recommendations to strengthen death registration.
Pandemic mortality data compilations
Table 1 describes the various data sources that reported mortality compilations for India. The Civil Registration System (CRS) is the most comprehensive mortality data source; it was established in 1969 under the Office of the Registrar General of India (ORGI), with registrars at state and local levels.6 State registrars compile data and submit them to the national ORGI once 1 year has elapsed after the reference period, and aggregated deaths by age and sex for all 37 states and union territories are subsequently published in an official CRS report. The most recent report, for 2020, was released in June 2022.7 Although data completeness at national level has improved significantly as the CRS has evolved over the past five decades, system performance has remained patchy across the country.8
The severity of the second wave of the COVID-19 pandemic in India during March to June 2021 prompted an urgent need to estimate its mortality impact across the country. Several independent investigators compiled local mortality data from different locations.9 For 14 states, data were obtained either from state CRS office websites or through ‘Right to Information’ requests to the state registrar. The data comprised monthly numbers of reported deaths for varying periods between 2018 and 2021 across the states. Investigators also compiled mortality data for several cities from local municipal corporations, hospitals and crematoriums. These state and city data were efficiently processed and disseminated by the investigators through a series of media reports, which served as timely initial evidence of the mortality impact of the pandemic.9 These data were also used in several mortality estimation studies.10–15 Although these state and city compilations are sourced from the same CRS death reporting processes, we refer to them as ‘local mortality records’, since they could differ from the final data in the official annual CRS report, which includes delayed registrations up to 12 months after the reference period. The characteristics and potential limitations of these ‘local mortality records’ are discussed in the following section.
The Health Management Information System (HMIS) compiles reports of deaths occurring in health facilities across the country. Annual HMIS reports for all states are published, and the data are also available from the Ministry of Health and Family Welfare website.16 One research team used HMIS data for 2018–2021 in its excess mortality estimation methodology.11 Before the pandemic, HMIS reporting was largely confined to public sector institutions, which limits the use of these data to predict population-based expected mortality in 2020 and 2021. Moreover, not all of the observed increase in HMIS-reported deaths in 2020–2021 can be attributed to the pandemic, since greater attention to data reporting for pandemic surveillance may have improved reporting compliance by private health facilities.
The C Voter survey is a nationally representative telephone survey of sample households that regularly polls electoral preferences. The survey conducted during July to September 2021 included a substudy of 57 000 people in 13 500 households, who were asked to report all COVID-19 and non-COVID-19 deaths that had occurred among immediate family members during January 2019 to June 2021. A total of 2107 deaths were reported in the sample population during the reference period, and these were used for analysis of excess mortality.11 Essentially, the C Voter survey mortality data are based on self-reported events without any official verification of records. In addition, the small sample size limits the generalisability of these data for modelling pandemic mortality estimates for India.
The Consumer Pyramids Household Survey is a routine quarterly socioeconomic survey covering 0.87 million individuals from 170 000 households. During 2020–2021, each quarterly survey collected reports of household deaths that had occurred within the previous 3 months. A total of 8694 deaths were recorded in the sample population, and these data were used for mortality estimation by Anand et al.12 Since it was designed as an economic survey, its sample was not powered to measure rare events such as deaths. In addition, the survey report lacks information on the specific variables collected for each death, which limits the reliability of these data for modelling excess mortality.
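The point about statistical power can be made concrete with a rough calculation. Treating a death count D as Poisson-distributed, its relative standard error is about 1/√D. The sketch below applies this to the 8694 deaths reported in the survey, and then to the average count per cell if those deaths were hypothetically spread evenly over 37 states and union territories and 8 quarters. The even split, the Poisson assumption and the neglect of the survey's design effects are all simplifying assumptions, so the figures indicate orders of magnitude only.

```python
import math


def poisson_rse(deaths: float) -> float:
    """Approximate relative standard error of a Poisson-distributed death count."""
    if deaths <= 0:
        raise ValueError("deaths must be positive")
    return 1 / math.sqrt(deaths)


def poisson_ci95(deaths: float) -> tuple[float, float]:
    """Normal-approximation 95% interval for a Poisson count (crude when counts are small)."""
    half_width = 1.96 * math.sqrt(deaths)
    return deaths - half_width, deaths + half_width


total_deaths = 8694      # deaths recorded in the survey sample (from the text)
cells = 37 * 8           # hypothetical: 37 states/UTs x 8 quarters of 2020-2021
per_cell = total_deaths / cells

print(f"all deaths: RSE {poisson_rse(total_deaths):.1%}")
print(f"per cell:   RSE {poisson_rse(per_cell):.1%}, 95% CI {poisson_ci95(per_cell)}")
```

Nationally the relative error is about 1%, but with roughly 29 deaths per state-quarter cell it rises to about 18%, before any allowance for clustering in the survey design.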
Another primary data compilation used to estimate the pandemic mortality toll was the Indian Railways database, which recorded 1952 deaths between March 2020 and May 2021 among a cohort of 1.3 million railway employees.17 Although the sample is largely pan-Indian (with the exception of mountainous areas), its relatively small size, the socioeconomic characteristics of railway employees, and their access to health services and medical facilities limit the generalisability of this cohort's mortality experience to the national population.
Limitations of local mortality records
From the above description, it is evident that CRS primary data are the most appropriate source for evaluating population-level mortality patterns, given the legally mandated coverage of the total population, which ensures an adequate sample size for subnational mortality estimation. A recent review of analyses that reported COVID-19 mortality estimates for India found that most of the studies used these ‘local mortality records’ as inputs for their statistical models.18 We compared these ‘local mortality records’ with the official CRS data for the same 14 states in the annual CRS reports for 2018, 2019 and 2020 (see table 2).
As can be seen, the ‘local mortality records’ are lower than the annual CRS reports for all states, indicating that these data were possibly derived from ‘provisional’ databases maintained at local registration departments, rather than from the final tallies of verified events at the end of the official 12-month reporting period that are submitted to the national ORGI for dissemination in annual CRS reports. Hence, forward projections of these under-reported prepandemic data would underestimate expected deaths during the pandemic, and these lower baselines would tend to exaggerate the magnitude of estimated excess mortality.
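A short numerical sketch shows why an under-reported baseline inflates excess mortality. It assumes, hypothetically, that provisional local records capture 90% of the deaths that eventually appear in the final CRS tallies, and that the pandemic-period count being compared is on the final-tally footing (for example, after a completeness adjustment). All numbers are illustrative.

```python
def excess_deaths(observed: float, expected: float) -> float:
    """Excess mortality: observed deaths minus expected deaths."""
    return observed - expected


# Hypothetical illustration; none of these figures come from CRS reports.
expected_final = 110_000     # baseline projected from final annual CRS tallies
provisional_share = 0.90     # assumed share of final registrations present in provisional records
expected_provisional = expected_final * provisional_share  # baseline projected from provisional records
observed = 125_000           # pandemic-period deaths, on the same footing as final tallies

excess_correct = excess_deaths(observed, expected_final)
excess_inflated = excess_deaths(observed, expected_provisional)
print(round(excess_correct), round(excess_inflated))   # 15000 26000
```

The gap between the two results, 11 000 deaths, equals the shortfall in the baseline, (1 − 0.9) × 110 000; it grows with the size of the baseline and the degree of under-reporting in the provisional data.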
From another perspective, even the data from annual CRS reports are biased by incomplete death registration. The ORGI estimates completeness using crude death rates from the Sample Registration System (SRS), combined across both sexes, as the standard for comparison.19 Table 2 shows the ORGI estimate of completeness for each state for 2019, along with an alternative set of sex-specific completeness estimates derived from an age-stratified weighted analysis that used age-specific death rates from the SRS and the National Family Health Survey (NFHS) for comparison.20 As can be seen, the age-stratified weighted analysis indicates lower CRS completeness than the aggregated ORGI analysis. Another analysis, by Saikia et al, which used self-reported responses on the registration of deaths reported in the NFHS household sample, found that only 70% of the reported deaths had been registered.21
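The two completeness approaches can be contrasted in a minimal sketch. It assumes that the aggregate method divides registered deaths by the deaths expected from a single crude death rate applied to the total population, while the age-stratified method applies age-specific death rates to each age group's population before summing; the cited analyses may differ in detail (for instance, by being sex-specific). The function names and all numbers are hypothetical.

```python
from typing import Sequence


def completeness_crude(registered_total: float, crude_death_rate: float,
                       population_total: float) -> float:
    """Completeness = registered deaths / (crude death rate x total population)."""
    return registered_total / (crude_death_rate * population_total)


def completeness_age_stratified(registered: Sequence[float], rates: Sequence[float],
                                population: Sequence[float]) -> float:
    """Completeness = sum of registered deaths / sum of expected deaths by age group.

    Equivalent to averaging age-specific completeness weighted by expected deaths.
    """
    if not (len(registered) == len(rates) == len(population)):
        raise ValueError("age-group inputs must have equal length")
    expected = [m * p for m, p in zip(rates, population)]
    return sum(registered) / sum(expected)


# Hypothetical three-age-group population (young, middle, old); illustrative only.
population = [50_000, 30_000, 20_000]
rates = [0.002, 0.005, 0.040]    # assumed age-specific death rates per person-year
registered = [80, 120, 600]      # assumed registered deaths by age group
crude_rate = 0.0095              # assumed crude rate from a comparator with a different age structure

crude = completeness_crude(sum(registered), crude_rate, sum(population))
stratified = completeness_age_stratified(registered, rates, population)
print(f"crude {crude:.1%}, age-stratified {stratified:.1%}")
```

Here the crude rate implies fewer expected deaths than the age-specific rates do (950 versus 1050), so the aggregate method reports higher completeness; this matches the direction of the difference described above, although the actual mechanism and magnitude depend on the real data.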
On the other hand, CRS mortality data have shown a steady increase in completeness over the past decade.19 Hence, any increase in reported deaths during 2020–2021 cannot be attributed entirely to the pandemic, since part of it could reflect increased reporting compliance, in keeping with improving CRS performance trends. Improved reporting compliance is also plausible because the national government announced financial compensation for pandemic victims, which required death registration documents as proof; this led to a general enforcement of death registration protocols in all states.22
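To illustrate the consequence of improved reporting compliance, the sketch below assumes, hypothetically, that completeness rose from 80% before the pandemic to 90% during it. Dividing pandemic-period registrations by the outdated prepandemic completeness then overstates total deaths by 0.90/0.80 − 1 = 12.5%.

```python
def adjust_for_completeness(registered: float, completeness: float) -> float:
    """Estimate total deaths by dividing registered deaths by registration completeness."""
    if not 0 < completeness <= 1:
        raise ValueError("completeness must lie in (0, 1]")
    return registered / completeness


# Hypothetical values for illustration only.
true_deaths = 1_000_000            # deaths that actually occurred in the pandemic year
completeness_prepandemic = 0.80    # completeness estimated before the pandemic
completeness_pandemic = 0.90       # assumed (unobserved) completeness during the pandemic
registered = true_deaths * completeness_pandemic

stale = adjust_for_completeness(registered, completeness_prepandemic)
updated = adjust_for_completeness(registered, completeness_pandemic)
print(round(stale), round(updated))   # 1125000 1000000
```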
Among the estimates from different studies, the highest was 4.7 million excess deaths in India during 2020–2021, from the WHO study.15 However, the review of all pandemic mortality studies for India noted that the methods did not satisfactorily account for various sources of bias.18 These include the use of prepandemic completeness estimates to adjust ‘local mortality records’ from the pandemic period, without considering the likelihood of improvements in reporting compliance during the pandemic. Other areas of uncertainty noted in the review included the limited generalisability of extrapolating data from the selected reporting states to the whole country, and the limited plausibility of projecting data from the first few months of 2021 across the entire calendar year.18 All these sources of bias raise the likelihood that the magnitude of pandemic mortality in India has been overestimated, and underline the need for a more nuanced approach to understanding its actual impact.
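The concern about projecting the first few months of 2021 across the whole year can be seen with a naive annualisation. The monthly series below is entirely hypothetical, shaped only to mimic a surge between March and June such as the second wave described earlier; scaling the January to May total to 12 months then carries the peak into months that, in this example, returned to baseline.

```python
from typing import Sequence


def annualise(partial_months: Sequence[float]) -> float:
    """Naively scale a partial-year death total to a 12-month total."""
    if not partial_months:
        raise ValueError("need at least one month of data")
    return sum(partial_months) * 12 / len(partial_months)


# Hypothetical monthly deaths for one year: a wave from March to June, then baseline.
monthly = [100, 100, 130, 250, 300, 160, 100, 100, 100, 100, 100, 100]

naive_total = annualise(monthly[:5])   # uses January to May only
full_year_total = sum(monthly)
print(naive_total, full_year_total)    # 2112.0 1640
```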
In search of robust pandemic mortality measures
The characteristics of the data sources displayed in tables 1 and 2 emphasise the importance of CRS data for pandemic mortality measurement in India, despite the challenges of quantifying and adjusting for varying levels of data completeness by sex, age and location. As per standard practice, the official CRS report for 2021 is expected to be released during the second half of 2023, and will include all delayed registrations that may have resulted from pandemic-related disruptions. As mentioned earlier, the government monetary support scheme for COVID-19-related deaths could have incentivised delayed death registration for both 2020 and 2021, improving the prospects for more reliable data to measure pandemic mortality.22 A detailed analysis of the completeness of the 2020 and 2021 CRS data, with adjustments as required, will be necessary to improve the accuracy of excess mortality estimates.
Despite these prospects, however, some features of the CRS data in official reports constrain detailed mortality analysis, as follows:
CRS forms record both the ‘place of usual residence’ of the deceased and the ‘place of occurrence’ of the death, but the CRS report provides data only by place of occurrence.19 Data should be made available on both bases, since statistics by place of usual residence eliminate distortions arising from temporary migration and help in understanding disease exposure patterns, while analysis by place of occurrence could guide the planning of critical care services.
CRS provides adult mortality data largely in 10-year age groups (15–24, 25–34, … 55–64, 65–69), with a terminal category of 70 years and above. Given the known associations of older age with both COVID-19 mortality and non-communicable diseases, it is desirable that CRS data be made available in standard demographic age categories (0–1, 1–4, 5–9, 10–14, … 80–84, 85+) to enable in-depth understanding of mortality patterns among the elderly.
Annual CRS reports include a summary table of delayed registrations for each state, aggregated across all previous years and for both sexes and all ages combined. However, given the time sensitivity of pandemic mortality metrics, it is desirable that data on delayed registrations be made available for each state by sex, age and year of death occurrence from 2019 onwards. This would allow them to be integrated appropriately with previously released annual data, and enable improved time-specific analysis.
Owing to irregular data submission from some states, the set of states included in the reports varies from year to year, which hinders subnational mortality trend analysis.
Finally, although CRS performance has improved, there remains potential for under-reporting during the pandemic period. Evaluating such gaps in completeness would ideally require detailed data from the SRS for 2021 as a comparator.19 In addition to the completeness estimation methods discussed earlier, record linkage between individual SRS and CRS records for the SRS sample sites could be used in a ‘capture-recapture’ analysis to derive the Chandra Sekar-Deming completeness factor, as direct evidence of the magnitude of such bias.23 Such record linkage analysis could also be attempted with individual records from other local mortality sources, including surveys, HMIS and other administrative databases, provided their geographic and time references match the scope of the CRS records.
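The Chandra Sekar-Deming approach can be sketched for two linked sources. If SRS records n1 deaths in the sample areas, CRS registers n2 deaths for the same areas and period, and record linkage finds m deaths in both, the estimated total is N = n1 × n2 / m, so CRS completeness is n2 / N = m / n1. The sketch assumes the standard conditions for this estimator (the two sources capture deaths independently, cover the same population and period, and are matched accurately); the class name and all counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedCounts:
    """Counts from linking individual SRS and CRS death records for the same areas and period."""
    n_srs: int     # deaths recorded by SRS
    n_crs: int     # deaths registered in CRS
    matched: int   # deaths found in both sources after record linkage

    def __post_init__(self) -> None:
        if not 0 < self.matched <= min(self.n_srs, self.n_crs):
            raise ValueError("matched must be positive and no larger than either source")

    def estimated_total(self) -> float:
        """Chandra Sekar-Deming (two-source capture-recapture) estimate of all deaths."""
        return self.n_srs * self.n_crs / self.matched

    def crs_completeness(self) -> float:
        """Share of estimated deaths registered in CRS: n_crs / N = matched / n_srs."""
        return self.matched / self.n_srs

    def srs_completeness(self) -> float:
        """Share of estimated deaths recorded by SRS: n_srs / N = matched / n_crs."""
        return self.matched / self.n_crs

    def combined_coverage(self) -> float:
        """Share of estimated deaths captured by at least one source."""
        return (self.n_srs + self.n_crs - self.matched) / self.estimated_total()


# Hypothetical counts for illustration only.
counts = LinkedCounts(n_srs=500, n_crs=400, matched=360)
print(f"estimated deaths {counts.estimated_total():.0f}")    # 556
print(f"CRS completeness {counts.crs_completeness():.0%}")   # 72%
print(f"SRS completeness {counts.srs_completeness():.0%}")   # 90%
```

If the two sources are positively dependent (for example, if deaths missed by one are also more likely to be missed by the other), this estimator understates the true total and so overstates completeness; this is one reason the text pairs it with other completeness methods.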
Conclusions and the way forward
Reliable measurement of the mortality impact of the COVID-19 pandemic in India is essential to understand the epidemiological profile of the disease, the health status of the population and future priorities for health system strengthening.24 The forthcoming annual CRS report for 2021 holds much promise as the best available primary evidence for such mortality measurement, and would reduce the uncertainty in current estimates. We propose that the national public health leadership convene a Task Force of national experts in this field, who could be given access to all microdata from the CRS, SRS and other relevant data sources as necessary. The Task Force would then be equipped to conduct a thorough analysis, using standard statistical methods applied to empirical data, to provide the required mortality measures at national, state and district levels. The activities and outputs of such a detailed analysis would have both immediate and long-term influence on the performance of the death registration system, strengthening evidence-based health monitoring and policy action for India.
Data availability statement
Data are available in a public, open access repository.
Ethics statements
Patient consent for publication
Ethics approval
Not applicable.
Footnotes
Handling editor Seye Abimbola
Twitter @DrMamtaGupta7, @ChalapatiRao13, @Munitajat05, @nanditajnu
Contributors MG and CR conceptualised the manuscript, and MG prepared the first draft. All authors collaborated in sourcing relevant information, interpretation of findings and revisions to prepare the final version of the manuscript.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.