Application of Artificial Intelligence for Test Generation and Test Evaluation in Educational Assessments

Principal investigator

PD Dr. Rudolf Debelak

The assessment of reading and writing skills is essential in school, educational and work settings. Unfortunately, traditional methods, such as the creation of suitable task material or the grading of essays by humans, are time-consuming and expensive. In this project, we are working with the Institut für Bildungsevaluation Zürich AG, Prof. Dr. Martin Tomasik and other university and industry partners to investigate ways of using artificial intelligence to support professionals in the assessment of language skills. The aim is to reduce the associated costs and workload without compromising the quality of the language tests.

To achieve this goal, technologies from the field of language processing, such as large language models (LLMs), are combined with modern psychometric methods to develop new models and software at the interface of machine learning, psychometrics and performance diagnostics. The methods developed in this way are to be used for several tasks, including:

Item Generation: Generating new items to assess reading skills is time-consuming and cost-intensive. This is particularly true because these items are not only based on content specifications, but must also meet high psychometric standards, especially in terms of measurement accuracy, fairness and validity. A central goal of this project is the development of methods to quickly generate large quantities of items of high psychometric quality.
Computer-assisted grading of essays: Essays provide evidence of language mastery and aspects of verbal intelligence, but they also require critical thinking and creative skills. The models developed in this project aim to allow the computer-assisted grading of essays in different languages. This grading can be used by teachers and other professionals to support their work and save valuable time.
Computer-assisted feedback: The results of language skill assessments can be summarized using artificial intelligence in order to provide individualized feedback on strengths and development potential, supporting teachers in their work. In this way, learners who need special support can also be identified. In a later step, such methods should also enable forms of personalized computer-assisted learning that benefit students.

Our work aims to innovate in creating teaching materials and assessments, in personalized language learning and in supporting accessibility and inclusion for students. The methods developed in this project also have a wide range of potential applications beyond educational research, for example in the fields of psychological diagnostics, human-computer interaction and interpretable machine learning.
