Improving Scoring Consistency of Flight Performance through Inter-Rater Reliability Analyses

Matthew V. Smith
Mary C. Niemczyk
William K. McCurry

Abstract

Students and the other stakeholders of flight schools must be confident that flight performance scores are a meaningful indicator of the student’s performance rather than an
arbitrary reflection of the instructor’s perception. Scores should therefore be reasonably consistent from one instructor to another. Apparent inconsistency in scoring between instructors can be
examined by conducting inter-rater reliability (IRR) analyses. Inter-rater reliability measures the extent of agreement between two or more individual raters; it is used to assess the consistency of a scoring or
rating system and of those who use it. This foundational investigation was designed to assess inter-rater reliability among instructor pilots observing 10 sample flights performed by student pilots. Results
of the study indicated that inter-rater reliability was low. Suggestions for improving the consistency of flight instructor scoring are discussed, along with recommendations for future research.
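
As an illustration of what "extent of agreement between raters" can mean in practice, the sketch below computes two common measures for a pair of raters: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The abstract does not state which statistic the study used, so the choice of Cohen's kappa here is an assumption, and the instructor scores are hypothetical values invented purely for the example (10 flights scored on an assumed 1 to 4 scale); they are not data from the study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of items on which the two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical scores.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal score frequencies.
    """
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical category for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores (NOT study data): two instructors rating the same
# 10 sample flights on an assumed 1-4 scale.
instructor_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
instructor_b = [3, 3, 4, 2, 1, 2, 2, 4, 3, 3]

print(f"Percent agreement: {percent_agreement(instructor_a, instructor_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(instructor_a, instructor_b):.2f}")
```

With these made-up scores the instructors agree on 6 of 10 flights, yet kappa is noticeably lower than the raw agreement because some of that agreement would be expected by chance. Designs with more than two raters or with ordinal or continuous scores usually call for other statistics, such as Fleiss' kappa or an intraclass correlation coefficient.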

Article Details

Section
Peer-Reviewed Articles