Raters’ Perception and Expertise in Evaluating Second Language Compositions

Document Type: Research Paper


Zanjan Branch, Islamic Azad University, Zanjan, Iran


Rater training is central to the construct validation of a writing test, because it is through training that raters learn to assess compositions on the basis of students’ writing ability rather than their own criteria (Charney, 1984). Although training has been discussed in the writing assessment literature, there is little research on raters’ perceptions and understanding of the training program. A few studies have examined the differences between trained and untrained raters in writing assessment (Cumming, 1990; Huot, 1990), but few have used a pre- and post-training design. The purpose of this study is to investigate the effectiveness of a training program for experienced and inexperienced raters using a pre- and post-training design. Twelve EFL raters scored 45 benchmark essays that had been pre-rated by an authorized IELTS trainer. The essays were scored before, during, and after the training program. The comparison across raters showed that inexperienced raters displayed a wider range of inconsistency before training but became more consistent than experienced raters after training.


Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.

Breland, H. M., & Jones, R. J. (1984). Perceptions of writing skills. Written Communication, 1(1), 101-119.

Charney, D. (1984). The validity of using holistic scoring to evaluate writing: A critical overview. Research in the Teaching of English, 18, 65-81.

Cumming, A. (1990). Expertise in evaluating second language compositions. Language Testing, 7, 31-51.

Diederich, P. B., French, J. E., & Carlton, S. T. (1998). Factors in judgments of writing ability. Princeton, NJ: Educational Testing Service.

Elder, C., Barkhuizen, G., Knoch, U., & Randow, J. (2007). Evaluating rater responses to an online training program for L2 writing assessment. Language Testing, 24(1), 37-64.

Hamilton, J., Reddel, S., & Spratt, M. (2001). Teachers’ perception of online rater training and monitoring. System, 29, 505-520.

Huot, B. (1990). Reliability, validity, and holistic scoring: What we know and what we need to know. College Composition and Communication, 41, 201-213.

Knoch, U., Read, J., & Randow, J. V. (2007). Re-training writing raters online: How does it compare with face-to-face training? Assessing Writing, 12, 26-43.

Linacre, J. M. (1989). Many-faceted Rasch measurement. Chicago, IL: MESA Press.

Reed, D. J., & Cohen, A. D. (2001). Revising raters and ratings in oral language assessment. In C. Elder, A. Brown, E. Grove, K. Hill, N. Iwashita, T. Lumley, T. McNamara, & K. O’Loughlin (Eds.), Experimenting with uncertainty: Essays in honor of Allan Davies. Cambridge: Cambridge University Press.

Ruth, L., & Murphy, S. (1988). Designing writing tasks for the assessment of writing. Norwood, NJ: Ablex Publishing Corp.

Shohamy, E., Gordon, C. M., & Kramer, R. (1992). The effects of raters’ backgrounds and training on the reliability of direct writing tests. Modern Language Journal, 76(1), 27-33.

Weigle, S. C. (1994a). Effects of training on raters of ESL compositions. Language Testing, 11, 197-223.

Weigle, S. C. (1994b). Effects of training on raters of English as a second language compositions: Qualitative and quantitative approaches. Unpublished Ph.D. dissertation, University of California, Los Angeles.

Wigglesworth, G. (1993). Exploring bias analysis as a tool for improving rater consistency in assessing oral interaction. Language Testing, 10, 305-23.