Detecting Heterogeneity in Complex IRT Models for Measuring Latent Traits

SNF project

Principal investigators   Prof. Dr. Carolin Strobl, Dr. Matthew Zeigenfuse and Dr. Rudolf Debelak
Staff   Thorben Huelmann
Duration of the project   February 1, 2016, to January 31, 2019
Funded by   Swiss National Science Foundation (SNF)

Psychological properties, unlike physical properties of a person, cannot be measured directly. While a person’s height, for instance, can easily be measured with a tape measure, measuring skills or personality traits requires the construction of psychological tests or questionnaires. Such properties are therefore referred to as latent (i.e. not directly observable). A latent property can be reliably inferred from a person’s responses, but only if the test or questionnaire meets certain quality standards. Psychometrics, a science at the intersection of psychology and statistics, is concerned with the mathematical description and examination of these quality standards.

The goal of the project is to develop new quality-management methods for a class of particularly flexible statistical models used in the validation of psychological tests and questionnaires. Models from so-called item response theory make it possible to compare persons fairly, provided that the assumptions underlying the models are met – which is not always the case in practice. For instance, in a test constructed to measure mathematical ability, a math word problem may be more difficult to solve for a student who learned German as a second language than for a German native speaker, even though both students have the same mathematical ability. Such an item exhibits differential item functioning: it distorts test results and prevents a fair comparison between students. This project therefore aims to develop statistical methods that identify items with differential item functioning, as well as the affected groups of people, in flexible models of item response theory. These methods are based on modern approaches from parametric statistics and machine learning. With the additional information about the affected groups, problematic items can then be excluded from the test or modified.
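To illustrate what a differential-item-functioning screen does in practice, here is a minimal sketch of one classical baseline method, the Mantel-Haenszel procedure, applied to simulated Rasch-model responses. This is not the project's own method (which draws on parametric statistics and machine learning); all names, sample sizes, and parameter values below are illustrative assumptions. One item is made one logit harder for a "focal" group at equal ability; the procedure stratifies examinees by their rest score (total score excluding the studied item) and estimates a common odds ratio, which should deviate clearly from 1 for the DIF item and stay near 1 for a clean item.

```python
import numpy as np

def simulate_responses(rng, n, difficulties, dif_shift):
    # Rasch-type model: P(correct) = logistic(theta - b - shift),
    # with person abilities theta drawn from a standard normal.
    theta = rng.normal(0.0, 1.0, size=n)
    logits = theta[:, None] - difficulties[None, :] - dif_shift[None, :]
    prob = 1.0 / (1.0 + np.exp(-logits))
    return (rng.random((n, len(difficulties))) < prob).astype(int)

def mantel_haenszel_or(ref, foc, item):
    # Common odds ratio across strata defined by the rest score
    # (total score excluding the studied item).
    k = ref.shape[1]
    rest_ref = ref.sum(axis=1) - ref[:, item]
    rest_foc = foc.sum(axis=1) - foc[:, item]
    num = den = 0.0
    for s in range(k):  # rest score ranges from 0 to k-1
        r = ref[rest_ref == s, item]
        f = foc[rest_foc == s, item]
        a, b = r.sum(), len(r) - r.sum()  # reference: correct / incorrect
        c, d = f.sum(), len(f) - f.sum()  # focal: correct / incorrect
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den

rng = np.random.default_rng(42)
b = np.array([0.0, -0.5, 0.0, 0.5, 1.0])   # illustrative item difficulties
no_dif = np.zeros(5)
dif = np.array([1.0, 0, 0, 0, 0])          # item 0 is one logit harder for the focal group
ref = simulate_responses(rng, 2000, b, no_dif)
foc = simulate_responses(rng, 2000, b, dif)

or_dif = mantel_haenszel_or(ref, foc, 0)    # clearly above 1: item flagged
or_clean = mantel_haenszel_or(ref, foc, 2)  # close to 1: no DIF evidence
```

A limitation this sketch shares with most classical procedures, and which motivates the project, is that the groups (here, reference vs. focal) must be specified in advance; the tree- and machine-learning-based methods developed in the project instead search for affected groups in the data.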

Thanks to a freely available software implementation, the statistical instruments developed in this project can be used directly to validate existing or new psychological tests and questionnaires and to detect unfair items. They thereby allow for more reliable conclusions about the properties of individuals and for fair comparisons between groups of people, e.g. in the field of empirical educational research.