Published Online: https://doi.org/10.1027/1614-2241.4.2.73

The Likert-type format is one of the most widely used response formats for scales in the social sciences. Nevertheless, there is no definitive agreement on the number of response categories that optimizes the psychometric properties of a scale. The aim of the present work is to determine systematically the number of response alternatives that maximizes the two fundamental psychometric properties of a scale: reliability and validity. The study is carried out with data simulated using the Monte Carlo method. We simulate responses to 30 items with inter-item correlations ranging from 0.2 to 0.9, and we also manipulate sample size, analyzing four conditions: 50, 100, 200, and 500 cases. The number of response options employed ranges from two to nine. The results show that both reliability and validity improve as the number of response alternatives increases. The optimum number of alternatives is between four and seven: with fewer than four alternatives reliability and validity decrease, and beyond seven alternatives the psychometric properties of the scale scarcely improve further. Some applied implications of the results are discussed.
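The simulation design described above can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' actual code: it assumes a one-factor model to induce a target inter-item correlation, discretizes the continuous item scores into k Likert categories via equally spaced quantile cuts, and estimates reliability with Cronbach's alpha. The function names (`cronbach_alpha`, `simulate_alpha`) and the specific discretization rule are assumptions for illustration.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def simulate_alpha(n_cases, n_items, rho, n_categories, rng):
    """Simulate one sample and return alpha for k-category Likert scoring."""
    # One-factor model: a loading of sqrt(rho) on every item yields an
    # inter-item correlation of rho for the continuous scores.
    factor = rng.standard_normal((n_cases, 1))
    noise = rng.standard_normal((n_cases, n_items)) * np.sqrt(1 - rho)
    continuous = np.sqrt(rho) * factor + noise
    # Discretize into n_categories ordered categories (scored 1..k)
    # using quantile-based thresholds.
    cuts = np.quantile(continuous, np.linspace(0, 1, n_categories + 1)[1:-1])
    discrete = np.digitize(continuous, cuts) + 1
    return cronbach_alpha(discrete)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Mimic one cell of the design: 200 cases, 30 items, rho = 0.3,
    # varying the number of response options from 2 to 9.
    for k in range(2, 10):
        alphas = [simulate_alpha(200, 30, 0.3, k, rng) for _ in range(50)]
        print(k, round(float(np.mean(alphas)), 3))
```

Under these assumptions the mean alpha rises steeply from two to about four categories and then flattens, which is the qualitative pattern the abstract reports.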
