Using response times in CAT
Kern, Justin L
Description
- Title
- Using response times in CAT
- Author(s)
- Kern, Justin L
- Issue Date
- 2017-05-30
- Director of Research (if dissertation) or Advisor (if thesis)
- Chang, Hua-Hua
- Doctoral Committee Chair(s)
- Chang, Hua-Hua
- Committee Member(s)
- Anderson, Carolyn J.
- Culpepper, Steven A.
- Douglas, Jeffrey A.
- Zhang, Jinming
- Department of Study
- Psychology
- Discipline
- Psychology
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- Computer adaptive testing
- Response times
- Item response theory
- Item selection criteria
- Bayesian estimation
- Abstract
- "Many areas of psychology and education place a high premium on measurement, using psychometric theory to measure constructs, such as cognitive ability, personality, and attitudes. Some of the more well-known measurement theories used are classical test theory (CTT), structural equation modeling (SEM), and item response theory (IRT). For the practical test construction needs of psychology and education, IRT is the most heavily used, and has been ever since Lord and Novick (1968) published their book, Statistical Theories of Mental Test Scores. One of the biggest advances in IRT has been the advent of computerized adaptive testing (CAT). First introduced as tailored tests by Lord (1980), CATs have increasingly gained in popularity as the cost of computation has gone down. As suggested by the term ""tailored tests,"" every person who takes an adaptive test takes a test form unique to the person. The test is constructed item-by-item by matching items' difficulty levels to the ability level of that particular person. The promise of CAT is that by constructing a test in this way items that do not contribute much to the overall effectiveness of the measurement are left out, which can shorten the test substantially while still maintaining a high level of measurement accuracy. The efficiency of CAT has not gone unnoticed. The Armed Services Vocational Aptitude Test Battery (ASVAB), which is used for measuring vocationally relevant abilities was originally introduced as a paper-and-pencil test in 1968, and became operational as a CAT in 1996; the ASVAB was the first large-scale, high-stakes operational CAT. Numerous adaptive tests have gone into operational use for use in selection including the Graduate Management Admission Test (GMAT), the Adjustable Competence Evaluation (ACE), the Business Language Testing Service (BULATS) Computer Test, the IBM Selection Tests, among others. Additionally, many licensure exams currently in use—including the Uniform CPA Examination (for certified personal accountants), and the National Council Licensure Examinations (for nurses)—are adaptive. Furthermore, the recently signed Every Student Succeeds Act has recommended a greater use of adaptive testing in the American educational system, allowing states to develop and administer CATs. Computer-based tests, such as CATs, allow for easy collection of response times. With the abundance of essentially-free data, methods and applications for using response time data have become en vogue, though they are still in their infant stage. As such, no large-scale assessments are currently using response times as an active part of the test. Because the data is essentially-free, it is reasonable to believe that their use is simply the next step in the evolution of computer-based tests. Indeed, it only seems natural that CATs be modified to take advantage of response time information, especially since it is well-known that response accuracy and response time are related (Sternberg, 1999). Some applications include cheating detection (van der Linden, 2009a), shortening the time needed to take a test (Choe & Kern, 2014; Fan, Wang, Chang, & Douglas, 2012), and item selection (van der Linden, 2008). The goal of this dissertation is to introduce CAT and some of the current issues surrounding its use, to introduce response times in measurement, and several new methods for using response times in adaptive testing. 
In Chapter 1, I give a quick review of IRT, including its historical roots, its assumptions, and some examples of commonly used IRT models. This is followed by a brief overview of the basic components of CAT, including item selection, ability (or trait) estimation, and item constraints. I then discuss response times in measurement and their current role in CAT. In Chapter 2, I describe a completed study on jointly estimating person ability and speededness. In Chapter 3, I investigate the efficacy of using the MAP estimator developed in Chapter 2 when selecting items with a generalized time-weighted maximum information criterion (GMICT). In Chapter 4, I introduce a new item selection technique, based on the ideas of Bayesian item selection, that incorporates the response time model directly; a modified version of this criterion using the ideas from the GMICT is also investigated. In Chapter 5, I introduce a time-weighted Kullback-Leibler information technique and investigate its effectiveness. Finally, I conclude in Chapter 6 with some remarks about how these techniques fit into the current literature on response times, scoring, and adaptive testing. (An illustrative sketch of the adaptive item-selection loop described above appears after the description list below.)
- Graduation Semester
- 2017-08
- Type of Resource
- text
- Permalink
- http://hdl.handle.net/2142/98159
- Copyright and License Information
- Copyright 2017 Justin L. Kern
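The abstract describes CAT as building a test item by item, matching item difficulty to the examinee's current ability estimate, with later chapters weighting selection by response time. The sketch below is a minimal illustration of that loop, not the dissertation's actual methods: it assumes a 2PL response model, van der Linden-style lognormal response times (log T ~ N(beta - tau, 1)), grid-based EAP ability estimation, and a simple information-per-expected-second weighting that only echoes the spirit of the GMICT-style criteria named above. All item parameters and function names are hypothetical.
```python
# Minimal illustrative CAT loop; assumptions noted in the text above.
import numpy as np

rng = np.random.default_rng(0)

def p_correct(theta, a, b):
    # 2PL probability of a correct response
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    # Fisher information of a 2PL item at ability theta
    p = p_correct(theta, a, b)
    return a ** 2 * p * (1.0 - p)

def expected_time(beta, tau):
    # Mean RT under an assumed lognormal model: log T ~ N(beta - tau, 1)
    return np.exp(beta - tau + 0.5)

def select_item(theta, tau, items, used, time_weighted):
    # Choose the unused item maximizing information,
    # optionally information per expected second
    best_j, best_val = None, -np.inf
    for j, (a, b, beta) in enumerate(items):
        if j in used:
            continue
        val = fisher_info(theta, a, b)
        if time_weighted:
            val /= expected_time(beta, tau)
        if val > best_val:
            best_j, best_val = j, val
    return best_j

def eap_theta(answers):
    # Grid-based EAP estimate of theta with a N(0, 1) prior
    grid = np.linspace(-4.0, 4.0, 81)
    log_post = -0.5 * grid ** 2
    for (a, b), x in answers:
        p = p_correct(grid, a, b)
        log_post += np.log(p) if x == 1 else np.log(1.0 - p)
    w = np.exp(log_post - log_post.max())
    return float(np.sum(grid * w) / np.sum(w))

# Simulated 200-item bank: (discrimination a, difficulty b, time intensity beta)
items = [(rng.uniform(0.8, 2.0), rng.normal(), rng.normal(4.0, 0.3))
         for _ in range(200)]

true_theta, true_tau = 0.5, 0.0    # simulated examinee
theta, used, answers = 0.0, set(), []
for step in range(20):             # fixed 20-item test length
    j = select_item(theta, true_tau, items, used, time_weighted=True)
    a, b, beta = items[j]
    x = int(rng.random() < p_correct(true_theta, a, b))  # simulate response
    used.add(j)
    answers.append(((a, b), x))
    theta = eap_theta(answers)     # re-estimate ability after each item

print(f"final theta estimate: {theta:.2f} (true value {true_theta})")
```
The time-weighted flag divides each item's Fisher information by its expected response time, so the loop prefers items that yield the most information per second of testing time; the criteria actually developed in the dissertation (the GMICT, the Bayesian criterion, and the time-weighted Kullback-Leibler technique) are more sophisticated than this weighting.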
Owning Collections
Dissertations and Theses - Psychology: Dissertations and Theses from the Dept. of Psychology
Graduate Dissertations and Theses at Illinois (PRIMARY): Graduate Theses and Dissertations at Illinois