Testlet Effects on Standardized Log-likelihood Person Fit Index to Detect Aberrant Responses for the IRT Testlet Model
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] The accuracy of test scores is a major concern in standardized testing for establishing test reliability and validity. Research evidence has suggested that aberrant response patterns on a test have serious implications for the accuracy of item calibration and ability estimation. Aberrant responses are test responses that do not reflect examinees' abilities, as estimated by a specified item response theory (IRT) model, but are instead driven by factors such as carelessness, cheating, guessing, or other sluggish behaviors on a test. In such cases, examinees respond to test questions in ways that violate the assumptions of the IRT measurement models used to produce their test scores. Consequently, students' abilities are inaccurately measured. Existing research on aberrant response detection has focused on regular IRT models such as the Rasch model, the two-parameter logistic (2-PL) model, and the three-parameter logistic (3-PL) model. The purpose of this study is to investigate aberrant response detection using a testlet IRT model, which is commonly used in applied psychological measurement situations, and to explore how testlet effects influence the Type I errors of the standardized log-likelihood person fit index when it is used to detect aberrant responses. Testlet effects are explored in two ways: 1) varied numbers of testlet items on tests; 2) varied magnitudes of testlet variances on tests. Three aspects are explored in each simulation study: 1) Type I errors based on true abilities, ability estimates (the posterior mode in this study), and Wainer's corrected ability estimates; 2) detection rates based on the three above-mentioned methods; 3) model misspecification, that is, testlet items misspecified as regular IRT items.
Several important findings can be identified from the simulation studies. First, consistent with previous studies, inflated Type I errors were observed for all simulated conditions when simulated true abilities were used to calculate the standardized person fit index. Also consistent with previous studies, Type I errors based on estimated abilities were conservative for most simulated conditions under the testlet model in the current research. Second, with regard to the influence of the number of testlet items on aberrant response detection, results indicated that the fewer the testlet items, the better the detection rates under all simulated conditions, and that results can be improved by using the corrected ability estimates for the testlet items. As for the influence of testlet variances on aberrant response detection, results showed that the smaller the testlet variances on a test, the better the detection rates under all simulated conditions. Corrected ability estimates performed well for aberrant response detection when large testlet variances were present on a test. Third, when testlet items were misspecified as regular IRT items and testlet effects were ignored, detection rates were higher in both simulation studies. However, detection rates based on corrected ability estimates were not affected when the testlet items were misspecified as regular IRT items. It can be concluded that ability estimation errors influence the distribution of the standardized log-likelihood person fit statistic for the testlet model and, consequently, the aberrant response detection rates. Wainer's ability correction method improved the performance of the standardized log-likelihood person fit index when testlet effects were present on a test. This study presented illustrative cases showing how testlet effects, including the number of testlet items and the magnitude of testlet variances, affect aberrant response detection.
The results indicated that testlet effects were not beneficial for ability estimation or aberrant response detection. Including more testlet items or larger testlet variances on a test may introduce more bias into ability estimation and, consequently, into aberrant response detection. The results of this study can provide insights into person misfit detection and test design in practical situations. Applications of the results are discussed, and future directions for aberrant response detection under the testlet model are emphasized for applied psychologists seeking to improve research quality.
Access is limited to the campuses of the University of Missouri.