Multiple-choice questions (MCQs) are commonly used in higher education assessment because they can be scored easily and accurately while covering instructional content well in a short time. However, studies evaluating the quality of MCQs used in higher education assessments have found many flawed items, which yield misleading insights about student performance and compromise important decisions. MCQs therefore need to be evaluated statistically to ensure that high-quality items serve as the basis of inference. This study evaluated the quality of 100 instructor-written MCQs used in an undergraduate midterm test (50 items) and final exam (50 items), together worth 50% of the course grade, using the responses of 372 students enrolled in one first-year undergraduate general education course. Item difficulty, discrimination, and chance (guessing) properties were estimated using classical test theory and item response theory (IRT) statistical item analysis models. The two-parameter logistic (2PL) model consistently fit the data best. Comparing overall course grades under the original raw-score model and the IRT 2PL model, 70% of students would receive the same letter grade (i.e., D to A), but only one-third would receive the same grade on the standard augmented grade scale (i.e., A+ to D−). These analyses show that higher education institutions need to ensure MCQs are evaluated before student grading decisions are made.
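For readers less familiar with IRT, the 2PL model referenced above has a standard form: it expresses the probability of a correct response to item j as a function of examinee ability. The notation below is the conventional one from the IRT literature, not drawn from the paper itself:

P_j(\theta) = \Pr(X_j = 1 \mid \theta) = \frac{1}{1 + e^{-a_j(\theta - b_j)}}

where \theta is the examinee's latent ability, b_j is the item's difficulty, and a_j is its discrimination (how sharply the item separates examinees above and below b_j). Unlike the three-parameter logistic (3PL) model, the 2PL has no lower-asymptote parameter for guessing, which is why the chance properties of items are examined separately when this model is preferred.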