Alderson, C. (2000). Assessing reading. Cambridge: Cambridge University Press.
Bachman, L. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Born, M., & Lynn, R. (1994). Sex differences on the Dutch WISC-R: A comparison with the USA and Scotland. Educational Psychology, 14(2), 249–255.
Breland, H. M., Bridgeman, B., & Fowles, M. E. (1999). Writing assessments in admission to higher education: Review and framework (Report No. 99-3). New York, NY: College Entrance Examination Board.
Buck, G. (2001). Assessing listening. Cambridge: Cambridge University Press.
Carleton University. (2009). CAEL test score and users’ guide. Ottawa, Canada: Author. Retrieved from http://www.cael.ca/edu/testuserguide.shtml
Carlton, S. T., & Harris, A. M. (1992). Characteristics associated with differential item functioning on the Scholastic Aptitude Test: gender and majority/minority group comparisons. ETS Research Report, 92–64. Princeton, NJ: ETS.
Chen, Z., & Henning, G. (1985). Linguistic and cultural bias in language proficiency tests. Language Testing, 2(2), 155-163.
Cheng, H.-F. (2004). A comparison of multiple-choice and open-ended response formats for the assessment of listening proficiency in English. Foreign Language Annals, 37(4), 544-553.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Cole, N. S. (1997). The ETS gender study: how females and males perform in educational settings. Princeton, NJ: Educational Testing Service.
Dávid, G. (2007). Investigating the performance of alternative types of grammar items. Language Testing, 24(1), 65-97.
Educational Testing Service. (2007). Test and score data summary for TOEFL Internet-based test: September 2005-December 2006 test data. Princeton, NJ. Retrieved from www.ets.org/toefl.
Farhady, H. (1982). Measures of language proficiency from the learner’s perspective. TESOL Quarterly, 16(1), 43-59.
In'nami, Y., & Koizumi, R. (2009). A meta-analysis of test format effects on reading and listening test performance: Focus on multiple-choice and open-ended formats. Language Testing, 26(2), 219-244.
James, C. L. (2010). Do language proficiency test scores differ by gender? TESOL Quarterly, 44(2), 387-398.
Johnson, J. S., & Song, T. (2008). MELAB 2007 descriptive statistics and reliability estimates. Ann Arbor, MI: English Language Institute, University of Michigan.
Kobayashi, M. (2002). Method effects on reading comprehension test performance: text organization and response format. Language Testing, 19(2), 193-220.
Kunnan, A. J. (1990). DIF in native language and gender groups in an ESL placement test. TESOL Quarterly, 24(4), 741-746.
Lawrence, I. M., & Curley, W. E. (1989). Differential item functioning for males and females on SAT-Verbal Reading subscore items: follow-up study. Princeton, NJ: Educational Testing Service.
Lawrence, I. M., Curley, W. E., & Hale, F. J. M. (1988). Differential item functioning for males and females on SAT verbal reading subscore items (Report No. 88–4). New York, NY: College Entrance Examination Board.
Lumley, T., & O’Sullivan, B. (2005). The effect of test-taker gender, audience and topic on task performance in tape-mediated assessment of speaking. Language Testing, 22(4), 415-437.
Lynn, R., & Dai, X. Y. (1993). Sex differences on the Chinese standardization sample of the WAIS-R. Journal of Genetic Psychology, 154(4), 459–464.
Pae, T.-I. (2004). DIF for examinees with different academic backgrounds. Language Testing, 21(1), 53-73.
Rodriguez, M. C. (2003). Construct equivalence of multiple-choice and constructed-response items: a random effects synthesis of correlations. Journal of Educational Measurement, 40(2), 163-184.
Shohamy, E. (1984). Does the testing method make a difference? The case of reading comprehension. Language Testing, 1(2), 147-170.
Takala, S., & Kaftandjieva, F. (2000). Test fairness: a DIF analysis of an L2 vocabulary test. Language Testing, 17(3), 323-340.
Tsagari, C. (1994). Method effects on testing reading comprehension: How far can we go? Unpublished MA thesis, University of Lancaster, UK.
Wolf, D. F. (1993). A comparison of assessment tasks used to measure FL reading comprehension. The Modern Language Journal, 77(4), 473-489.