The teacher normally prepares a draft of the test. This draft is subjected to item analysis and validation to ensure that the final version of the test will be useful and functional. First, the teacher tries out the draft test on a group of students whose characteristics are similar to those of the intended test takers (try-out phase). Using the results from the try-out group, each item is analyzed in terms of its ability to discriminate between those who know and those who do not know the material, and also in terms of its level of difficulty (item analysis phase). The item analysis provides information that allows the teacher to decide whether to revise or replace an item (item revision phase). Finally, the final draft of the test is subjected to validation if the intent is to use the test as a standard test for the particular unit or grading period.
There are two important characteristics of an item that will be of interest to the teacher. These are: (a) item difficulty, and (b) discrimination index. We shall learn how to measure these characteristics and apply our knowledge in making a decision about the item in question.
The difficulty of an item, or item difficulty, is defined as the number of students who are able to answer the item correctly divided by the total number of students. Thus:
Item difficulty = number of students with correct answer / total number of students
The item difficulty is usually expressed as a percentage.
Example: What is the item difficulty index of an item if 25 students are unable to answer it correctly while 75 answered it correctly?
Here, the total number of students is 100; hence, the item difficulty index is 75/100 or 75%.
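The computation above can be sketched in a few lines of Python. The function name `item_difficulty` is our own choice for illustration and does not come from any particular test-scoring package.

```python
def item_difficulty(num_correct: int, num_students: int) -> float:
    """Proportion of students who answered the item correctly (0.0 to 1.0)."""
    if num_students <= 0:
        raise ValueError("num_students must be positive")
    if not 0 <= num_correct <= num_students:
        raise ValueError("num_correct must be between 0 and num_students")
    return num_correct / num_students


# The example above: 75 students answered correctly, 25 did not.
p = item_difficulty(75, 75 + 25)
print(f"{p:.0%}")  # 75%
```

Note that under this definition a larger value means an easier item.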
One problem with this type of difficulty index is that it may not actually indicate that the item is difficult (or easy). A student who does not know the subject matter will naturally be unable to answer the item correctly even if the question is easy. How do we decide on the basis of this index whether the item is too difficult or too easy? The following arbitrary rule is often used in the literature:
Difficult items tend to discriminate between those who know and those who do not know the answer. Conversely, easy items cannot discriminate between these two groups of students. We are therefore interested in deriving a measure that will tell us whether an item can discriminate between these two groups of students. Such a measure is called an index of discrimination.
An easy way to derive such a measure is to measure how difficult an item is with respect to those in the upper 25% of the class and how difficult it is with respect to those in the lower 25% of the class. If the upper 25% of the class found the item easy yet the lower 25% found it difficult, then the item can discriminate properly between these two groups. Thus:
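Index of discrimination = DU - DL
where DU is the difficulty index of the item for the upper 25% of the class and DL is the difficulty index of the item for the lower 25% of the class.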
Example: Obtain the index of discrimination of an item if the upper 25% of the class had a difficulty index of 0.60 (i.e. 60% of the upper 25% got the correct answer) while the lower 25% of the class had a difficulty index of 0.20. Here, DU = 0.60 while DL = 0.20, thus index of discrimination = .60 – .20 = .40.
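As a rough sketch, the index can also be computed directly from student records. The function below, the record layout (total test score paired with whether the item was answered correctly), and the sample data are all assumptions made for illustration. Ties at the group boundary are broken by input order, which a real analysis may wish to handle differently.

```python
def discrimination_index(records, fraction=0.25):
    """Return DU - DL for one item.

    records: list of (total_score, answered_correctly) pairs, one per student.
    fraction: share of the class placed in each of the upper and lower groups.
    """
    if not records:
        raise ValueError("records must not be empty")
    # Rank students from highest to lowest total score.
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    du = sum(1 for _, ok in upper if ok) / k  # difficulty index, upper group
    dl = sum(1 for _, ok in lower if ok) / k  # difficulty index, lower group
    return du - dl


# The example above, using the two difficulty indices directly:
print(round(0.60 - 0.20, 2))  # 0.4

# Hypothetical class of eight students (upper and lower groups of two each):
records = [(95, True), (90, True), (80, True), (70, False),
           (65, True), (60, False), (50, False), (40, True)]
print(discrimination_index(records))  # 0.5
```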
Theoretically, the index of discrimination can range from -1.0 (when DU = 0 and DL = 1) to 1.0 (when DU = 1 and DL = 0). When the index of discrimination is equal to -1, this means that all of the lower 25% of the students got the correct answer while all of the upper 25% got the wrong answer. In a sense, such an index discriminates correctly between the two groups, but the item itself is highly questionable. Why should the bright ones get the wrong answer and the poor ones get the right answer? On the other hand, if the index of discrimination is 1.0, this means that all of the lower 25% failed to get the correct answer while all of the upper 25% got the correct answer. This is a perfectly discriminating item and is the ideal item that should be included in the test. From these discussions, let us agree to discard or revise all items that have a negative discrimination index, for although they discriminate correctly between the upper and lower 25% of the class, the content of the item itself may be highly dubious. As in the case of the index of difficulty, we have the following rule of thumb:
Example: Consider a multiple-choice test item for which the following data were obtained:
The correct response is B. Let us compute the difficulty index and index of discrimination:
Difficulty Index = no. of students getting correct response/total
= 40/100 = 40%, within range of a "good item"
The discrimination index can similarly be computed:
DU = no. of students in upper 25% with correct response / no. of students in the upper 25%
= 15/20 = .75 or 75%
DL = no. of students in lower 25% with correct response / no. of students in the lower 25%
= 5/20 = .25 or 25%
Discrimination Index = DU - DL = .75 - .25 = .50 or 50%.
Thus, the item also has a “good discriminating power”.
It is also instructive to note that the distracter A is not an effective distracter since this was never selected by the students. Distracters C and D appear to have good appeal as distracters.
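A simple tally of how often each option was chosen is enough to spot a distracter that attracts no one. The sketch below uses made-up responses, not the data of the example above, and the function names are hypothetical.

```python
from collections import Counter


def option_counts(responses, options="ABCD"):
    """Count how many students chose each option, including options never chosen."""
    counts = Counter(responses)
    return {opt: counts.get(opt, 0) for opt in options}


def unused_distractors(responses, key, options="ABCD"):
    """List the incorrect options that no student selected."""
    counts = option_counts(responses, options)
    return [opt for opt in options if opt != key and counts[opt] == 0]


# Hypothetical responses of ten students to an item keyed B.
responses = list("BBCDBCDBBC")
print(option_counts(responses))                # {'A': 0, 'B': 5, 'C': 3, 'D': 2}
print(unused_distractors(responses, key="B"))  # ['A']
```

An option flagged this way is a candidate for rewriting, since it does nothing to separate students who know the material from those who are guessing.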
Basic Item Analysis Statistics
The Michigan State University Measurement and Evaluation Department reports a number of item statistics which aid in evaluating the effectiveness of an item. The first of these is the index of difficulty which MSU defines as the proportion of the total group who got the item wrong. “Thus a high index indicates a difficult item and a low index indicates an easy item. Some item analysts prefer an index of difficulty which is the proportion of the total group who got an item right. This index may be obtained by marking the PROPORTION RIGHT option on the item analysis header sheet. Whichever index is selected is shown as the INDEX OF DIFFICULTY on the item analysis print-out. For classroom achievement tests, most test constructors desire items with indices of difficulty no lower than 20 nor higher than 80, with an average index of difficulty from 30 or 40 to a maximum of 60.
The INDEX OF DISCRIMINATION is the difference between the proportion of the upper group who got an item right and the proportion of the lower group who got the item right. This index is dependent upon the difficulty of an item. It may reach a maximum value of 100 for an item with an index of difficulty of 50, that is, when 100% of the upper group and none of the lower group answer the item correctly. For items of less than or greater than 50 difficulty, the index of discrimination has a maximum value of less than 100. Interpreting the Index of Discrimination document contains a more detailed discussion of the index of discrimination.”
More Sophisticated Discrimination Index
Item discrimination refers to the ability of an item to differentiate among students on the basis of how well they know the material being tested. Various hand calculation procedures have traditionally been used to compare item responses to total test scores using high and low scoring groups of students. Computerized analyses provide more accurate assessment of the discrimination power of items because they take into account responses of all students rather than just high and low scoring groups.
The item discrimination index provided by ScorePak® is a Pearson Product Moment correlation between student responses to a particular item and total scores on all other items on the test. This index is the equivalent of a point-biserial coefficient in this application. It provides an estimate of the degree to which an individual item is measuring the same thing as the rest of the items.
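To make the idea concrete, here is a minimal sketch of such an item-rest correlation in plain Python. It is not ScorePak's implementation. The function names, the 0/1 score matrix layout, and the sample data are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return sxy / sqrt(sxx * syy)


def item_rest_correlation(score_matrix, item):
    """Correlate one item (scored 0/1) with the total score on all other items.

    score_matrix: one row per student, one column per item.
    """
    item_scores = [row[item] for row in score_matrix]
    rest_scores = [sum(row) - row[item] for row in score_matrix]
    return pearson(item_scores, rest_scores)


# Hypothetical results for five students on a four-item test.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(item_rest_correlation(scores, item=0), 2))
```

Excluding the item itself from the total keeps the item from being correlated with its own score, which would inflate the index.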
Because the discrimination index reflects the degree to which an item and the test as a whole are measuring a unitary ability or attribute, values of the coefficient will tend to be lower for tests measuring a wide range of content areas than for more homogeneous tests. Item discrimination indices must always be interpreted in the context of the type of test which is being analyzed. Items with low discrimination indices are often ambiguously worded and should be examined. Items with negative indices should be examined to determine why a negative value was obtained. For example, a negative value may indicate that the item was miskeyed, so that students who knew the material tended to choose an unkeyed, but correct, response option.
Tests with high internal consistency consist of items with mostly positive relationships with total test score. In practice, values of the discrimination index will seldom exceed .50 because of the differing shapes of item and total score distributions. ScorePak® classifies item discrimination as “good” if the index is above .30; “fair” if it is between .10 and .30; and “poor” if it is below .10.
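The classification just described can be written as a small helper. The function name is hypothetical, and the treatment of values exactly equal to .10 or .30 (both counted as "fair" here) is our own reading of the rule, since the boundaries are not spelled out.

```python
def classify_discrimination(index: float) -> str:
    """Label a discrimination index as 'good', 'fair', or 'poor'."""
    if index > 0.30:
        return "good"
    if index >= 0.10:  # assumption: boundary values .10 and .30 count as fair
        return "fair"
    return "poor"      # includes negative indices, which deserve a closer look


for value in (0.45, 0.30, 0.10, 0.05, -0.20):
    print(value, classify_discrimination(value))
```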
A good item is one that has good discriminating ability and a suitable level of difficulty (neither too difficult nor too easy). In the two tables presented for the levels of difficulty and discrimination, there is a small area of intersection where the two indices coincide (between 0.56 and 0.67), which represents the good items in a test.
At the end of the Item Analysis report, test items are listed according to their degrees of difficulty (easy, medium, hard) and discrimination (good, fair, poor). These distributions provide a quick overview of the test, and can be used to identify items which are not performing well and which can perhaps be improved or discarded.
The Item-Analysis Procedure for Norm-Referenced Tests Provides the Following Information
- The difficulty of the item
- The discriminating power of the item
- The effectiveness of each alternative
Benefits derived from Item Analysis
- It provides useful information for class discussion of the test.
- It provides data which helps students improve their learning.
- It provides insights and skills that lead to the preparation of better tests in the future.
Index of Difficulty
P = (Ru + RL) ÷ T x 100
- Ru — The number in the upper group who answered the item correctly.
- RL — The number in the lower group who answered the item correctly.
- T — The total number who tried the item.
Index of item Discriminating Power
D = (Ru - RL) ÷ (1/2 T)
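When only the total number of correct answers is known, the index of difficulty can be written as P = R ÷ T x 100, where:
- P – percentage who answered the item correctly (index of difficulty)
- R – number who answered the item correctly
- T – total number who tried the item.
For example, P = 8/20 x 100 = 40%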
The smaller the percentage figure, the more difficult the item.
Estimate the item discriminating power using the formula below:
D = (Ru - RL) ÷ (1/2 T)
= (6 - 2) / 10 = 0.40
The discriminating power of an item is reported as a decimal fraction; maximum discriminating power is indicated by an index of 1.00.
Maximum discrimination is usually found at the 50 percent level of difficulty.
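The two upper-group and lower-group formulas can be checked with a short Python sketch. The function names are ours, and the figures (Ru = 6, RL = 2, T = 20) are those of the worked example above.

```python
def difficulty_percent(ru: int, rl: int, t: int) -> float:
    """P = (Ru + RL) / T x 100, the percentage who answered the item correctly."""
    if t <= 0:
        raise ValueError("t must be positive")
    return 100 * (ru + rl) / t


def discriminating_power(ru: int, rl: int, t: int) -> float:
    """D = (Ru - RL) / (T / 2), reported as a decimal fraction."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (ru - rl) / (t / 2)


# Ru = 6, RL = 2, T = 20 (ten students in each group)
print(difficulty_percent(6, 2, 20))    # 40.0
print(discriminating_power(6, 2, 20))  # 0.4
```

With equal-sized upper and lower groups, D reaches 1.00 only when every student in the upper group and no student in the lower group answers correctly, in which case P is exactly 50.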
0.00 – 0.20 = Very difficult
0.21 – 0.80 = Moderately difficult
0.81 – 1.00 = Very easy
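The scale above translates directly into a lookup. In this sketch the function name is hypothetical, the index is taken as a proportion correct between 0 and 1, and rounding to two decimal places before comparing is our own assumption for handling values that fall between the printed bands (such as 0.205).

```python
def classify_difficulty(p: float) -> str:
    """Label a difficulty index (proportion correct, 0.0 to 1.0) using the scale above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    p = round(p, 2)  # assumption: the bands are stated to two decimal places
    if p <= 0.20:
        return "Very difficult"
    if p <= 0.80:
        return "Moderately difficult"
    return "Very easy"


print(classify_difficulty(0.40))  # Moderately difficult
print(classify_difficulty(0.15))  # Very difficult
print(classify_difficulty(0.90))  # Very easy
```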