Assessment in EAP: a continuously improved means to carelessly examined ends

I am really pleased that Jane Pearson (a PhD student at Nottingham and EAP Lecturer at King’s College) has provided a stimulating post on assessment. Comments very welcome!

I was struck recently by a quotation used in an article by George Madaus (1993) which seemed to me to sum up the current state of affairs in EAP assessment. Assessment is described in the article as “a continuously improved means to carelessly examined ends” (Merton, 1964, p. vi, in Madaus, 1993), and although this refers to the state of mainstream education in the USA, it seems especially pertinent to our politically constrained and time-restricted context. In my experience, EAP course test writers tend to fall into three groups: those with experience of writing tests for large-scale exam boards; those with an interest in testing but little experience beyond administering large-scale tests and teaching preparation courses; and those who are interested in assessment innovations and keen to emphasise authentic testing over statistical reliability, but who have no framework within which to work given the dearth of examples or research evidence. The assessment culture of an EAP department tends to lean towards one of these three points on the triangle.
Is this problematic? According to Bachman and Palmer (2010), two common testing misconceptions are that ‘experts’ should be the ones to write tests and that there is one standard method or framework, a ‘one size fits all’ approach to assessment. This means that the first and second groups are likely to impose the most familiar method of assessment on their EAP courses, regardless of the pedagogical aims and outcomes of the course. The third group may have the best of intentions but, without a rigorous design specification, test construction cycle and validation procedure in place, can actually end up doing more harm than good, producing confusing tests that change on a regular basis and meet the criteria for neither appropriate summative nor effective formative assessment.
However, all of these people make invaluable contributions to EAP assessment. Therefore, it may be important to take a step back from the design of tests to allow open dialogue about exactly how we are defining the constructs we are testing. Some questions which could be put on the table are:
1.    Are we testing achievement or proficiency? Paran’s (2010) book Testing the Untestable points to a narrowing of the assessment agenda to include only that which is measurable, but where do critical thinking, autonomy and intercultural competence fit into this? If we are teaching and emphasising these skills, is it appropriate or logical not to test them? If we are not teaching these, but only language proficiency, on which it has been indicated that pre-sessional EAP has little effect (Green, 2005), then what are the benefits for students of our courses over taking (and retaking and retaking) the IELTS or TOEFL exams for direct university entry?

2.    Are we testing four skills or integrated academic literacies and discourses? As we know, a significant body of research (Lea and Street, 1998; Zamel, 1998; Lillis, 2003) suggests that deficiencies in the latter are the main barrier to success for international students. Yet the shift to a means of testing which acknowledges the complexity of skills and literacies in academia, while still providing a score in the ‘four skills’, can lead to poorly thought-out assessments which may have face validity but little else. How many of us test speaking using a presentation, mainly because it represents an authentic means of assessment in HE, without considering whether it is a fair assessment of the linguistic construct of ‘speaking’? Students may spend weeks researching, planning and practising a critical presentation, only to receive a low score due to poor grammar, lexis or pronunciation, which is weighted more heavily because, while paying lip service to authentic assessment, we are obliged to assess mainly language proficiency. On the flip side, students with a high level of proficiency may receive a lower grade than expected due to poor planning or weak evaluative skills. Again, if we are testing achievement of academic skills learned and applied, this is fair; if we are testing language proficiency, it may not be. Perhaps the answer is to remind ourselves of Spolsky’s (1997) warning that the search for a ‘fair test’ may lead us down a dead end, and that we need instead to make it transparent to all stakeholders what our assessments are trying to do. This may include an explicit definition of our constructs and how these link to pedagogy, along with the acknowledgement that they represent a theory of language and academic discourse particular to our context and imposed by us, as those in control of the process, rather than objective truth.

3.    Are we bound by the need to test in ways which are most familiar to us? And if we try to test in an alternative way, do we leave ourselves open to criticisms of lack of robustness? Alternative assessments do not often lend themselves to statistical validation procedures; are they therefore considered unreliable or invalid? What kinds of evidence would we need in order to claim that our alternative, integrated, process-oriented tests are meeting all stakeholders’ needs? Do we need to redefine our paradigms of assessment validation to include a more interpretivist approach (Moss, 1996; McNamara, 2001)? What is preventing research from being conducted on alternative assessments in EAP contexts in the same way as in mainstream education?
Obviously, I have raised questions here without giving answers, and I would be fascinated to hear other practitioners’ views on these matters. As assessment affects us all, it would seem that there are ‘ends’ that need to be examined before we can begin to focus on the ‘means’ with which to assess our students.
Lea, M. & Street, B. V. (1998). Student Writing and Staff Feedback in Higher Education: An Academic Literacies Approach. Studies in Higher Education 23(2): 157-72.
Lillis, T. (2003). Student writing as ‘academic literacies’: Drawing on Bakhtin to move from critique to design. Language and Education 17(3): 192-207.
Madaus, G. (1993). A national testing system: manna from above? A historical/technical perspective. Educational Assessment 1(1): 9-26.
Moss, P. A. (1996). Enlarging the Dialogue in Educational Measurement: Voices From Interpretive Research Traditions. Educational Researcher 25(1): 20-28.
McNamara, T. (2001). Language assessment as social practice: challenges for research. Language Testing 18(4): 333-349.
Spolsky, B. (1997). The ethics of gatekeeping tests: what have we learned in a hundred years? Language Testing 14(3): 242-247.
Zamel, V. (1998). Strangers in academia: the experiences of faculty and ESL students across the curriculum. In Spack, R. & Zamel, V. (eds.), Negotiating Academic Literacies: Teaching and Learning Across Languages and Cultures, pp. 249-264. New Jersey: Lawrence Erlbaum Associates.