Abstract
Studies based on databases, medical records and registers are used extensively in epidemiological research today. Despite this increasing use, no developed methodological literature on the use and evaluation of population-based registers is available, even though register-based studies differ from studies with researcher-collected data in several ways: the data are not collected by the researcher, all persons in a population are available, and traditional statistical analyses that treat sampling error as the main source of uncertainty may not be relevant. We present the main strengths and limitations of register-based studies, the biases that are especially important in such studies, and methods for evaluating the completeness and validity of registers. The main strengths are that the data already exist and valuable time has already passed; that study populations are complete, which minimizes selection bias; and that the data are collected independently. The main limitations are that necessary information may be unavailable, that data collection is not done by the researcher, that confounder information is lacking, that information on data quality is missing, that truncation at the start of follow-up makes it difficult to differentiate between prevalent and incident cases, and that there is a risk of data dredging. We conclude that epidemiological studies that include all persons in a population, followed for decades and available relatively quickly, are important data sources for modern epidemiology, but it is important to acknowledge the limitations of the data.
Cite this article
Thygesen, L.C., Ersbøll, A.K. When the entire population is the sample: strengths and limitations in register-based epidemiology. Eur J Epidemiol 29, 551–558 (2014). https://doi.org/10.1007/s10654-013-9873-0
DOI: https://doi.org/10.1007/s10654-013-9873-0