"Reliability" and "validity" in constructing and designing English proficiency tests: points for lecturers to note

Testing is an indispensable part of foreign language programs in general and of English programs in particular. For that reason, concern for the “reliability” and “validity” of an English proficiency test is genuinely important, because in practice most English teachers today have had almost no training in testing and assessment and rely mostly on intuition, experience and course books when constructing and designing an English test. For these reasons, this article raises and discusses several issues related to constructing and designing a proficiency test in an English program.

Test-Retest Method
In this method, the same test is administered twice to the same group of students. The second administration takes place no later than two weeks after the first. Students are neither told their results on the first test nor given any feedback on their performance. They are also not warned about the second test and therefore do not prepare for it during this period. After the second test, individual results are arranged in two columns for comparison. If there is no significant difference between them, the test can be said to meet the reliability requirement. Although, as Brown (1996) states, this approach might sound strange and may upset students who are asked to take the same test twice, it can be a useful way of estimating the reliability of a test.
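The two-column comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not name a statistic, so the sketch uses the Pearson correlation between the two administrations (a common estimate of test-retest reliability) together with the mean change in score. The function names and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary within each administration")
    return cov / sqrt(var_a * var_b)


def mean_difference(first, second):
    """Average change in score from the first to the second administration."""
    return sum(b - a for a, b in zip(first, second)) / len(first)


# Hypothetical scores: one row per student, same order in both lists.
first_sitting = [62, 75, 48, 90, 81, 55]
second_sitting = [65, 73, 50, 88, 84, 57]

r = pearson_r(first_sitting, second_sitting)
change = mean_difference(first_sitting, second_sitting)
print(f"test-retest r = {r:.2f}, mean change = {change:+.2f}")
```

A coefficient close to 1 with a small mean change would suggest that students were ranked consistently across the two sittings. What counts as "close enough" is a judgment the source leaves to the teacher.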
Parallel Test Method
In this method, two tests of equivalent difficulty are administered to the same group of students. The same procedures as in the test-retest method are applied. Although the parallel test method sounds more natural than the test-retest method, it is more challenging because two versions of a test must be designed with strict equivalence in difficulty. The level of difficulty must therefore be defined first, and the test items then developed to match it, which demands a great deal of effort from teachers and test designers.
3.2. Test Validity
As Hughes (1992) states, a test is valid only when it measures the language skills or structures it is intended to measure. For example, when testing students’ knowledge of vocabulary they have just covered, students should be tested on what has already been presented to them. If the test includes vocabulary items that students have not yet been taught, it loses validity, since it fails to measure what it was designed to measure.
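The vocabulary example above amounts to a coverage check that can be run on a draft test before it is used. The following Python sketch is one way to do it, assuming the teacher keeps a simple list of the words taught. The function name and the word lists are hypothetical and are not taken from the source.

```python
def untaught_items(test_items, taught_items):
    """Return test items that do not appear in the taught material.

    Comparison ignores letter case and surrounding whitespace.
    """
    taught = {word.strip().lower() for word in taught_items}
    tested = {word.strip().lower() for word in test_items}
    return sorted(tested - taught)


# Hypothetical course vocabulary and draft test items.
taught_vocabulary = ["harvest", "drought", "irrigate", "fertile"]
draft_test_words = ["Harvest", "fertile", "famine"]

print(untaught_items(draft_test_words, taught_vocabulary))  # ['famine']
```

An empty result means every tested word was covered in class. Any word listed is a candidate for removal or replacement before the test is given.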
It would be a mistake to discuss language test validity without clarifying construct validity. According to Bachman (1996), “the so called construct validity is subordinate to the sense and rationality of interpretation of the language test scores, which means this interpretation is the assessment of language skills of the subject” (Bachman and Palmer, 1996, pp. 254-271). Bachman holds that by interpreting test scores we can not only assess the language ability of the test taker but also judge the appropriateness of the language used in the test. For example, when the aim of the test is to evaluate students’ ability to use the passive voice, it is important that the test be designed to deal directly with this grammatical structure, so that the scores help us to assess our students’ proficiency in it. If the test items include other structures, such as conditionals, the test will lack validity.
From the ideas above, construct validity can be said to concern the interpretation of scores, from which both the language proficiency of students and the suitability of the test tasks can be estimated.
3.2.1. Factors that Affect Test Validity
A number of factors that have negative effects on validity have been identified. Henning (1987), for example, has listed some of them. The first factor that affects test validity is a mismatch between a test and the construct it is meant to measure. Bachman also proposes that an invalid adaptation of tests is another detrimental factor. If, for instance, a test designed to assess the lexical level of first-year students is used with high school students, it is surely invalid. The problem is clarified further by McNamara (2000), who proposes two major factors: “irrelevant variance of validity” and “underrepresentation of validity”.
Irrelevant Variance of Validity
A test is classified as showing “irrelevant variance” if it is too broad, consisting of a number of variables which are irrelevant to the validity being interpreted. McNamara argues that this occurs when the knowledge or skill is tested in a setting which is either outside students’ experience or irrelevant to the content being tested. For example, in an oral test, candidates may be asked to discuss an abstract topic; if that topic does not interest them or is one they know little about, they are less likely to perform competently than when they are asked to speak on a more familiar topic at the same level of abstraction. In this case, the quality being tested, the ability to discuss an abstract topic in English, is confounded with the irrelevant requirement of having particular knowledge of a certain topic.
Underrepresentation of Validity
“Underrepresentation of validity” is the opposite of “irrelevant variance of validity”; that is to say, the testing is insufficient: the test either is too narrow in terms of knowledge or fails to include important aspects of validity. In other words, as Fulcher (2010) states, the extent to which a test fails to measure the relevant knowledge is the degree to which it under-represents the validity that is supposed to be tested.
3.2.2. Methods of Improving Language Proficiency Test Validity
When discussing how to determine test validity, Henning (1987) indicates that there are two main approaches. One is the experimental method, in which collected data and statistical formulas are used to calculate validity. The other is the non-experimental method, which involves inspection, intuition and common sense. Since experimental methods require special training in statistics and the use of specialized computer programs to carry out complex calculations, this paper focuses on non-experimental methods.
Although, as many worry, a lack of experimental evidence may lead to a lack of objectivity, teachers can take a number of practical steps to improve the chances of upholding the validity of their tests. For example, if a teacher wants to evaluate his or her students’ knowledge of grammar at the end of an elementary course, he or she needs to know what knowledge of grammar at the elementary level consists of. He or she should then adopt test items that match what students have been exposed to during the course.
This paper has provided a basic account of English proficiency tests, covering their definition and the qualities they need. Of these qualities, “reliability” and “validity” were chosen for discussion, together with the factors that affect them and the methods used to improve them.
The paper is written in the hope of providing what is fundamental in designing and developing English proficiency tests. Without this knowledge, students face a considerable challenge in learning English, because teachers are unable to give them objective feedback on their progress. This lack of knowledge also has a bad effect on teachers themselves: they cannot identify their students’ weaknesses or see how to build on their strengths.
For these reasons, it is important that teachers train themselves in matters of assessment and testing. Our educational institutions should also start offering courses in test design and development alongside other courses in English language teaching methodology.
Abstract: Testing is an indispensable component of foreign language programs in general and of English programs in particular. In this context, concerns about reliability and validity are important. In practice, teachers with practically no training in test development often depend mostly on their own intuition, their previous experience and textbooks. For these reasons, this article raises and discusses problems of test design and development in English programs.
Keywords: English proficiency test, English program, reliability, validity
