Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model

Psychometrika

Abstract

A variety of statistical methods have been suggested for detecting differential item functioning (DIF) in the Rasch model. Most of these methods are designed for the comparison of pre-specified focal and reference groups, such as males and females. Latent class approaches, on the other hand, allow the detection of previously unknown groups exhibiting DIF. However, this approach provides no straightforward interpretation of the groups with respect to person characteristics. Here, we propose a new method for DIF detection based on model-based recursive partitioning that can be considered as a compromise between those two extremes. With this approach it is possible to detect groups of subjects exhibiting DIF, which are not pre-specified, but result from combinations of observed covariates. These groups are directly interpretable and can thus help generate hypotheses about the psychological sources of DIF. The statistical background and construction of the new method are introduced by means of an instructive example, and extensive simulation studies are presented to support and illustrate the statistical properties of the method, which is then applied to empirical data from a general knowledge quiz. A software implementation of the method is freely available in the R system for statistical computing.
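To make the notion of DIF in the abstract concrete, the following minimal sketch evaluates the standard Rasch response probability for two persons of equal ability who fall into different covariate-defined groups. It is not the authors' Rasch tree algorithm, which finds such groups through model-based recursive partitioning. The function names `rasch_prob` and `item_difficulty`, the covariates `age` and `gender`, the cutpoint of 30, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def rasch_prob(theta: float, beta: float) -> float:
    """Probability of a correct response under the Rasch model.

    theta is the person ability, beta is the item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def item_difficulty(age: int, gender: str) -> float:
    """Hypothetical DIF pattern (an assumption, not taken from the article).

    The item is harder only for the group defined by a combination of
    two covariates: women older than 30.
    """
    if gender == "female" and age > 30:
        return 1.0
    return 0.0


# Two persons with the same ability but different covariate values.
theta = 0.5
p_reference = rasch_prob(theta, item_difficulty(age=25, gender="female"))
p_focal = rasch_prob(theta, item_difficulty(age=45, gender="female"))

# Under DIF, equal ability does not imply equal success probability.
print(f"reference group: {p_reference:.3f}, focal group: {p_focal:.3f}")
```

In this toy setting the group is fixed in advance. The method described in the abstract instead searches the observed covariates for groups whose item parameters differ.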

Notes

  1. At the time the quiz was conducted, Croatia was not yet a member of the EU.

References

  • Andersen, E. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.

  • Andrews, D.W.K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica, 61, 821–856.

  • Ben-Shakhar, G., & Sinai, Y. (1991). Gender differences in multiple-choice tests: the role of differential guessing tendencies. Journal of Educational Measurement, 28(1), 23–35.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. Lord & M. Novick (Eds.), Statistical theories of mental test scores. Reading: Addison-Wesley.

  • Boulesteix, A.L. (2006). Maximally selected chi-square statistics and binary splits of nominal variables. Biometrical Journal, 48(5), 838–848.

  • Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. New York: Chapman and Hall.

  • Cohen, A., & Bolt, D. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42(3), 133–148.

  • Dobra, A., & Gehrke, J. (2001). Bias correction in classification tree construction. In C.E. Brodley & A.P. Danyluk (Eds.), Proceedings of the seventeenth international conference on machine learning (ICML 2001), Williams College, Williamstown, MA, USA (pp. 90–97). San Mateo: Morgan Kaufmann.

  • Fischer, G., & Molenaar, I. (Eds.) (1995). Rasch models: foundations, recent developments and applications. New York: Springer.

  • Fraley, C., & Raftery, A. (2002). Model-based clustering, discriminant analysis and density estimation. Journal of the American Statistical Association, 97(458), 611–631.

  • Fraley, C., & Raftery, A. (2012). mclust: Model-based clustering/Normal mixture modeling. R package version 3.4.11. http://CRAN.R-project.org/package=mclust.

  • Gelin, M., Carleton, B., Smith, M., & Zumbo, B. (2004). The dimensionality and gender differential item functioning of the mini asthma quality of life questionnaire (MiniAQLQ). Social Indicators Research, 68, 91–105.

  • Gustafsson, J. (1980). Testing and obtaining fit of data in the Rasch model. British Journal of Mathematical & Statistical Psychology, 33(2), 205–233.

  • Hancock, G., & Samuelsen, K. (Eds.) (2007). Advances in latent variable mixture models. Charlotte: Information Age.

  • Hochberg, Y., & Tamhane, A. (1987). Multiple comparison procedures. New York: Wiley.

  • Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: a conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.

  • Hothorn, T., & Lausen, B. (2003). On the exact distribution of maximally selected rank statistics. Computational Statistics & Data Analysis, 43(2), 121–137.

  • Hothorn, T., & Zeileis, A. (2008). Generalized maximally selected statistics. Biometrics, 64(4), 1263–1269.

  • Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.

  • Kelderman, H., & MacReady, G. (1990). The use of loglinear models for assessing differential item functioning across manifest and latent examinee groups. Journal of Educational Measurement, 27(4), 307–327.

  • Koziol, J. (1991). On maximally selected chi-square statistics. Biometrics, 47(4), 1557–1561.

  • Liou, M. (1994). More on the computation of higher-order derivatives on the elementary symmetric functions in the Rasch model. Applied Psychological Measurement, 18(1), 53–62.

  • Maij-de Meij, A., Kelderman, H., & Van der Flier, H. (2008). Fitting a mixture item response theory model to personality questionnaire data: characterizing latent classes and investigating possibilities for improving prediction. Applied Psychological Measurement, 32(8), 611–631.

  • Mair, P., & Hatzinger, R. (2007). Extended Rasch modeling: the eRm package for the application of IRT models in R. Journal of Statistical Software, 20, 9. http://www.jstatsoft.org/v20/i09/.

  • Mair, P., Hatzinger, R., & Maier, M. (2012). eRm: extended Rasch modeling. R package version 0.15-0. http://CRAN.R-project.org/package=eRm.

  • Marcus, R., Peritz, E., & Gabriel, K. (1976). Closed testing procedures with special reference to ordered analysis of variance. Biometrika, 63(3), 655–660.

  • Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.

  • Merkle, E.C., Fan, J., & Zeileis, A. (2013). Testing for measurement invariance with respect to an ordinal variable. Psychometrika, forthcoming.

  • Merkle, E.C., & Zeileis, A. (2013). Tests of measurement invariance without subgroups: a generalization of classical methods. Psychometrika, 78(1), 59–82.

  • Miller, R., & Siegmund, D. (1982). Maximally selected chi square statistics. Biometrics, 38(4), 1011–1016.

  • Milligan, G., & Cooper, M. (1986). A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behavioral Research, 21(4), 441–458.

  • Mislevy, R., & Verhelst, N. (1990). Modeling item responses when different subjects employ different solution strategies. Psychometrika, 55(2), 195–215.

  • Pedraza, O., Graff-Radford, N., Smith, G., Ivnik, R., Willis, F., Petersen, R., & Lucas, J. (2009). Differential item functioning of the Boston Naming Test in cognitively normal African American and Caucasian older adults. Journal of the International Neuropsychological Society, 15(5), 758–768.

  • Penfield, D. (2007). Assessing differential step functioning in polytomous items using a common odds ratio estimator. Journal of Educational Measurement, 44(3), 187–210.

  • Penfield, D., Alvarez, K., & Lee, O. (2009). Using a taxonomy of differential step functioning to improve the interpretation of DIF in polytomous items: an illustration. Applied Measurement in Education, 22(1), 61–78.

  • Perkins, A., Stump, T., Monahan, P., & McHorney, C. (2006). Assessment of differential item functioning for demographic comparisons in the MOS SF-36 health survey. Quality of Life Research, 15, 331–348.

  • R Development Core Team (2012). R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org/.

  • Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8(2), 185–205.

  • Rizopoulos, D. (2006). ltm: an R package for latent variable modeling and item response analysis. Journal of Statistical Software, 17, 5. http://www.jstatsoft.org/v17/i05/.

  • Rizopoulos, D. (2012). ltm: latent trait models under IRT. R package version 0.9-9. http://CRAN.R-project.org/package=ltm.

  • Rost, J. (1990). Rasch models in latent classes: an integration of two approaches to item analysis. Applied Psychological Measurement, 14(3), 271–282.

  • Shih, Y.S. (2004). A note on split selection bias in classification trees. Computational Statistics & Data Analysis, 45(3), 457–466.

  • Smit, J., Kelderman, H., & Van der Flier, H. (2000). The mixed Birnbaum model: estimation using collateral information. Methods of Psychological Research Online, 5, 1–13.

  • Strobl, C., Boulesteix, A.L., & Augustin, T. (2007). Unbiased split selection for classification trees based on the Gini index. Computational Statistics & Data Analysis, 52(1), 483–501.

  • Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: rationale, application and characteristics of classification and regression trees, bagging and random forests. Psychological Methods, 14(4), 323–348.

  • Strobl, C., Wickelmaier, F., & Zeileis, A. (2011). Accounting for individual differences in Bradley–Terry models by means of recursive partitioning. Journal of Educational and Behavioral Statistics, 36(2), 135–153.

  • Trepte, S., & Verbeet, M. (Eds.) (2010). Allgemeinbildung in Deutschland—Erkenntnisse aus dem SPIEGEL Studentenpisa-Test. Wiesbaden: VS Verlag.

  • Van den Noortgate, W., & De Boeck, P. (2005). Assessing and explaining differential item functioning using logistic mixed models. Journal of Educational and Behavioral Statistics, 30(4), 443–464.

  • Westers, P., & Kelderman, H. (1992). Examining differential item functioning due to item difficulty and alternative attractiveness. Psychometrika, 57(1), 107–118.

  • Woods, C., Oltmanns, T., & Turkheimer, E. (2009). Illustration of MIMIC-model DIF testing with the schedule for nonadaptive and adaptive personality. Journal of Psychopathology and Behavioral Assessment, 31, 320–330.

  • Zeileis, A., & Hornik, K. (2007). Generalized m-fluctuation tests for parameter instability. Statistica Neerlandica, 61(4), 488–508.

  • Zeileis, A., Hothorn, T., & Hornik, K. (2008). Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17(2), 492–514.

  • Zeileis, A., Strobl, C., Wickelmaier, F., & Kopf, J. (2012). psychotree: recursive partitioning based on psychometric models. R package version 0.12-2. http://CRAN.R-project.org/package=psychotree.

Acknowledgements

Julia Kopf is supported by the German Federal Ministry of Education and Research (BMBF) within the project “Heterogeneity in IRT-Models” (grant ID 01JG1060).

The authors would like to thank three anonymous reviewers for their very helpful and constructive feedback.

Special thanks go to Reinhold Hatzinger (1953–2012), who stimulated important insights for this and other projects through many conversations and his extensive work on the R package eRm (Mair & Hatzinger, 2007; Mair et al., 2012). We miss him as a researcher and friend.

Author information

Corresponding author

Correspondence to Carolin Strobl.

About this article

Cite this article

Strobl, C., Kopf, J. & Zeileis, A. Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model. Psychometrika 80, 289–316 (2015). https://doi.org/10.1007/s11336-013-9388-3

  • DOI: https://doi.org/10.1007/s11336-013-9388-3
