The binomial distribution of meta-analysis was preferred to model within-study variability
Introduction
In this paper we consider meta-analysis of proportions. Common examples of proportions being meta-analyzed are the sensitivity or specificity of a diagnostic test. This article is therefore written from a diagnostic research perspective, though the results apply to meta-analysis of proportions in general, such as prevalences or incidences.
Meta-analytic methods for a diagnostic test depend on the type of data that are available from the different studies. In most medical articles, the commonly reported measures of diagnostic test accuracy are sensitivity and/or specificity. Alternatively, other measures such as diagnostic odds ratio (OR), predictive values, and area under the receiver operating characteristic (ROC) curve are reported.
Statistical methods to pool diagnostic test measures from different studies rest on different assumptions. For example, it might be assumed that the observed differences between individual study results are due only to sampling variation, leading to what is called a fixed effect analysis. When each study reports an estimate of the sensitivity or specificity, the simplest way to obtain a summary measure is to calculate the average sensitivity and/or specificity, possibly with weights depending on the within-study sample sizes or standard errors (SEs). However, this approach is usually inappropriate, because it is likely that variability beyond chance can be attributed to between-study differences [1], [2]. Some of the between-study variability could be accounted for by explanatory variables in a regression analysis. Usually, however, not all heterogeneity can be explained, and a random effects model that allows for between-study heterogeneity is used in the statistical analysis [3], [4].
In the last decade, many random effects methods have been developed to relax the fixed effect assumptions in meta-analysis [5], [6], [7], [8] of diagnostic tests [9], [10]. Some of these methods allow sensitivity and specificity to be analyzed jointly. However, numerous meta-analyses published in the medical literature are interested in meta-analyzing only sensitivity or specificity, and in this paper we concentrate on that situation. The standard analysis is then the DerSimonian and Laird [6] random effects model. It is not well known that this method can be heavily biased when applied to proportions such as sensitivities or specificities, though some authors have mentioned this [5], [11], [12]. Chang et al. [11] proposed a method that repairs the bias, but that article has been cited only once since 2001, showing that the method is not used in practice. This may be because the method is difficult to carry out in standard statistical packages. The reason the standard method is biased is that the binomial within-study likelihood of the sensitivity or specificity is approximated by a normal likelihood. It is well known that this approximation can be poor if the proportion is close to zero or one and/or the sample size is small, so bias can be expected when this is the case in a meta-analysis. Moreover, even when the normal approximation is adequate for ordinary applications, bias can be introduced because its use in meta-analysis ignores the correlation between the estimated proportion and its estimated variance. We come back to this point in the next section. Nowadays, standard statistical packages allow generalized linear mixed models (GLMMs) to be fitted, which makes it easy to use the exact binomial within-study distribution of the estimated sensitivity or specificity instead of a normal approximation.
In this article, we call the normal-approximation approach the approximate method and the binomial-likelihood approach the exact method.
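The exact method amounts to maximizing the marginal likelihood of a binomial-normal model: y_i ~ Binomial(n_i, p_i) with logit(p_i) ~ N(η, τ²). The authors fitted this with SAS; purely as an illustration (not their code), a minimal Python sketch can integrate out the random effect by Gauss-Hermite quadrature. The function names and starting values here are our own choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

def exact_loglik(params, y, n, nodes, weights):
    """Marginal log-likelihood of the binomial-normal model:
    y_i ~ Binomial(n_i, p_i), logit(p_i) ~ N(eta, tau^2),
    integrating over the random effect by Gauss-Hermite quadrature."""
    eta, log_tau = params
    tau = np.exp(log_tau)                        # keeps tau positive
    theta = eta + np.sqrt(2.0) * tau * nodes     # quadrature points on the logit scale
    p = np.clip(expit(theta), 1e-12, 1 - 1e-12)
    # binomial log-pmf: studies in rows, quadrature nodes in columns
    logpmf = (gammaln(n[:, None] + 1) - gammaln(y[:, None] + 1)
              - gammaln((n - y)[:, None] + 1)
              + y[:, None] * np.log(p) + (n - y)[:, None] * np.log1p(-p))
    # integrate per study: sum_k (w_k / sqrt(pi)) * pmf_ik, then log and sum
    return np.sum(np.log(np.exp(logpmf) @ (weights / np.sqrt(np.pi))))

def fit_exact(y, n, n_nodes=30):
    """Maximize the exact likelihood over (eta, log tau)."""
    y = np.asarray(y, float)
    n = np.asarray(n, float)
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    res = minimize(lambda par: -exact_loglik(par, y, n, nodes, weights),
                   x0=np.array([0.0, np.log(0.5)]), method="Nelder-Mead")
    eta_hat, log_tau_hat = res.x
    return eta_hat, np.exp(log_tau_hat)
```

Because the binomial pmf is used directly, studies with zero or all "successes" contribute to the likelihood without any continuity correction, which is precisely what the normal approximation cannot do.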
The purpose of this article is to compare the performance of the two modeling approaches, approximate and exact, through a simulation study. In Section 2, both methods are discussed. In Section 3 we describe the design of the simulation study, and in Section 4 we present the results. In Section 5, we apply the methods to real meta-analysis data. We end with a discussion in Section 6. We used SAS software (version 9; SAS Institute, Cary, NC) to simulate the data and to estimate the parameters of the models discussed in Section 2.
Random effects model
In a situation where the interest is to meta-analyze sensitivities or specificities separately, the commonly used method is the DerSimonian and Laird [6] random effects model. In the remainder of this paper we will talk about meta-analyzing sensitivities, but all the results apply to specificities as well. In fact, the results apply to any meta-analysis where the target parameter is a proportion or probability and each study contributes a sample size and a number of “successes.” Unlike a fixed
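The DerSimonian and Laird approach named here is the approximate method: each study's observed logit sensitivity is treated as normal, with the delta-method variance 1/y_i + 1/(n_i - y_i), and τ² is estimated by the method of moments. As an illustrative sketch (in Python rather than the authors' SAS, with our own function name):

```python
import numpy as np

def approx_dl_logit(y, n):
    """Approximate method: DerSimonian-Laird pooling of logit-transformed
    proportions, within-study variance 1/y_i + 1/(n_i - y_i).
    Note: studies with y = 0 or y = n would need a 0.5 continuity
    correction, one source of the bias discussed in the text."""
    y = np.asarray(y, float)
    n = np.asarray(n, float)
    theta = np.log(y / (n - y))                # observed logit sensitivity
    v = 1.0 / y + 1.0 / (n - y)                # delta-method variance of theta
    w = 1.0 / v
    theta_fe = np.sum(w * theta) / np.sum(w)   # fixed-effect pooled estimate
    Q = np.sum(w * (theta - theta_fe) ** 2)    # Cochran's Q statistic
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (Q - (len(y) - 1)) / c)    # moment estimator of tau^2
    w_re = 1.0 / (v + tau2)                    # random-effects weights
    eta_hat = np.sum(w_re * theta) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return eta_hat, tau2, se
```

The estimated variance v depends on the observed proportion itself, which is the proportion-variance correlation mentioned in the Introduction as a source of bias.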
Simulation study
A simulation study was carried out to compare the performance of the two methods, approximate and exact, discussed in Section 2. We investigated the effect of the number of studies included in the meta-analysis, the mean within-study sample size, the between-study variability, and the true median sensitivity. The data were simulated in two steps. First, the true logit sensitivity, ηi, was simulated from a normal distribution with a given mean logit sensitivity η and between-studies variance τ2.
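The two-step generation scheme described above can be sketched as follows; this is an illustrative Python version under assumed parameter values (the paper's exact scheme for drawing study sizes may differ, and the function name is ours):

```python
import numpy as np

def simulate_meta(k, eta, tau, n_sizes, rng):
    """Two-step simulation of one meta-analysis data set:
    (1) draw true logit sensitivities eta_i ~ N(eta, tau^2);
    (2) draw true-positive counts y_i ~ Binomial(n_i, expit(eta_i))."""
    eta_i = rng.normal(eta, tau, size=k)
    p_i = 1.0 / (1.0 + np.exp(-eta_i))     # inverse logit: true sensitivities
    y = rng.binomial(n_sizes, p_i)
    return y, p_i

rng = np.random.default_rng(7)
n_sizes = np.full(20, 50)                  # 20 studies of size 50 (illustrative)
y, p_true = simulate_meta(k=20, eta=2.2, tau=0.5, n_sizes=n_sizes, rng=rng)
```

Repeating this over a grid of k, mean study size, τ², and median sensitivity, and fitting both methods to each replicate, yields the bias and MSE comparisons reported next.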
Simulation results
The results from the simulations are presented in Fig. 1, Fig. 2, and Table 2. Fig. 1 shows the biases and MSEs for the mean logit sensitivity η. It can be seen from Fig. 1a that the exact likelihood approach yields estimates of η that are nearly unbiased across the different scenarios; that is, the expected value of the estimated η using the exact method is almost equal to the true value, and always closer to the true value than with the approximate likelihood method. The bias in the
Data example
To illustrate the methods discussed in this article, we reanalyzed the data of a published meta-analysis [33]. Patwardhan et al. [33] present data from 15 studies assessing the operating characteristics of positron emission tomography (PET) using fluorine 18 fluorodeoxyglucose (FDG). They searched the MEDLINE, CINAHL, and HealthSTAR databases for articles published between 1989 and 2003. Articles were selected if FDG PET was performed with a dedicated scanner and the resolution was
Discussion
In numerous medical articles, sensitivities or specificities, or more generally proportions, are meta-analyzed, nowadays almost invariably with the DerSimonian and Laird [6] random effects model. This model uses a normal distribution for the logit-transformed true probabilities. Alternatively, one could assume a beta distribution for the true probabilities; the model can then be fitted in a statistical package such as EGRET. However, this model is not used in practice, perhaps because many
References (40)
- et al. Meta-analysis in clinical trials. Control Clin Trials (1986)
- et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol (2005)
- et al. Guidelines for meta-analyses evaluating diagnostic tests. Ann Intern Med (1994)
- et al. Exploring sources of heterogeneity in systematic reviews of diagnostic tests. Stat Med (2002)
- et al. How should meta-regression analyses be undertaken and interpreted? Stat Med (2002)
- et al. Detecting and describing heterogeneity in meta-analysis. Stat Med (1998)
- et al. A random-effect regression model for meta-analysis. Stat Med (1995)
- et al. Combining multiple outcome measures in meta-analysis: an application. Stat Med (2003)
- et al. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med (2002)
- et al. A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med (2001)
- Meta-analysis of binary data: which study variance estimate to use? Stat Med
- Generalized linear mixed models for meta-analysis. Stat Med
- The analysis of binary data
- Combining independent studies of a diagnostic test into a summary ROC curve: data-analytic approaches and some additional considerations. Stat Med
- A multilevel model framework for meta-analysis of clinical trials with binary outcomes. Stat Med
- Assessing the amount of heterogeneity in random-effects meta-analysis. Biom J
- SAS/STAT 9.1 user's guide
- A user's guide to MLwiN
- The accuracy of single serum progesterone measurement in the diagnosis of ectopic pregnancy: a meta-analysis. Hum Reprod
- Systematic review of the accuracy of the ParaSight™-F test in the diagnosis of Plasmodium falciparum malaria. Med Sci Monit