Development of Varied Assessment Tools: Knowledge and Reasoning

Types of Objective Tests

In this chapter we are concerned with developing objective tests for assessing the attainment of educational objectives based on Bloom’s taxonomy. For this purpose, we restrict our attention to the following types of tests: (a) true-false items, (b) multiple-choice items, (c) matching items, (d) enumeration and fill-in-the-blank items, and (e) essays. The first four types are used to test the first four to five levels of the hierarchy of educational objectives, while the last (essay) is used for testing higher-order thinking skills.

The development of objective tests requires careful planning and expertise in actual test construction. The more seasoned teachers can produce true-false items that test even higher-order thinking skills and not just rote memory learning. Essays are easier to construct than the other types of tests, but the difficulty of deriving objective grades from essay examinations often discourages teachers from using this form of examination in actual practice.

Planning a Test and Construction of Table of Specifications (TOS)

The important steps in planning for a test are:

  • Identifying test objectives
  • Deciding on the type of objective test to be prepared
  • Preparing a Table of Specifications (TOS)
  • Constructing the draft test items
  • Try-out and validation

Identifying Test Objectives

An objective test, if it is to be comprehensive, must cover the various levels of Bloom’s taxonomy. Each objective consists of a statement of what is to be achieved and, preferably, by what percentage of the students.

Example. We want to construct a test on the topic: “Subject-Verb Agreement in English” for a Grade V class. The following are typical objectives:

Knowledge. The students must be able to identify the subject and the verb in a given sentence.

Comprehension. The students must be able to determine the appropriate form of a verb to be used given the subject of a sentence.

Application. The students must be able to write sentences observing rules on subject-verb agreement.

Analysis. The students must be able to break down a given sentence into its subject and predicate.

Synthesis/Evaluation. The students must be able to formulate rules to be followed regarding the subject-verb agreement.

Deciding on the type of objective test

The test objectives guide the kind of objective tests that will be designed and constructed by the teacher. For instance, for the first four (4) levels, we may want to construct a multiple-choice type of test, while for the higher levels, which call for application and judgment, we may opt to give an essay test or a modified essay test.

Preparing a Table of Specifications (TOS)

A table of specifications, or TOS, is a test map that guides the teacher in constructing a test. The TOS ensures that there is a balance between items that test lower-level thinking skills and those that test higher-order thinking skills (or, alternatively, a balance between easy and difficult items) in the test. The simplest TOS consists of four (4) columns: (a) level of objective to be tested, (b) statement of objective, (c) item numbers where such an objective is being tested, and (d) number of items and percentage out of the total for that particular objective. A prototype table is shown below:

Table of Specifications Prototype

Level | Objective | Item Numbers | No. | %
Knowledge | Identify subject-verb | 1,3,5,7,9 | 5 | 16.67%
Comprehension | Forming appropriate verb forms | 2,4,6,8,10 | 5 | 16.67%
Application | Determining subject and predicate | 11,13,15,17,19 | 5 | 16.67%
Analysis | Formulating rules on agreement | 12,14,16,18,20 | 5 | 16.67%
Synthesis/Evaluation | Writing of sentences observing rules on subject-verb agreement | Part II | 10 pts | 33.32%
TOTAL | | | 30 | 100%

In the table of specifications we see that there are five items that deal with knowledge, namely items 1, 3, 5, 7, and 9. Similarly, from the same table we see that five items represent analysis, namely items 12, 14, 16, 18, and 20. The first four levels of Bloom’s taxonomy are equally represented in the test, while synthesis/evaluation (tested through the essay in Part II) is weighted at ten (10) points, or double the weight given to any of the first four levels. The table of specifications guides the teacher in formulating the test. As we can see, the TOS also ensures that each of the objectives in the hierarchy of educational objectives is well represented in the test. As such, the resulting test will be more or less comprehensive. Without the table of specifications, the test maker tends to focus too much on facts and concepts at the knowledge level.
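The arithmetic behind the last column of the TOS can be sketched in a few lines of Python. This is only an illustration: the class name `TosRow` and the function `with_percentages` are hypothetical, and the rows simply restate the prototype table. Note that plain rounding gives 33.33% for the essay row; the prototype shows 33.32%, apparently so that the rounded column sums to exactly 100%.

```python
from dataclasses import dataclass


@dataclass
class TosRow:
    level: str      # level in Bloom's taxonomy
    objective: str  # statement of the objective
    items: str      # item numbers (or test part) as written in the table
    points: int     # number of items, or points for the essay part


# Rows restated from the prototype table.
rows = [
    TosRow("Knowledge", "Identify subject-verb", "1,3,5,7,9", 5),
    TosRow("Comprehension", "Forming appropriate verb forms", "2,4,6,8,10", 5),
    TosRow("Application", "Determining subject and predicate", "11,13,15,17,19", 5),
    TosRow("Analysis", "Formulating rules on agreement", "12,14,16,18,20", 5),
    TosRow("Synthesis/Evaluation", "Writing sentences observing the rules", "Part II", 10),
]


def with_percentages(rows):
    """Return (level, points, percent of total) per row, plus the total."""
    total = sum(r.points for r in rows)
    table = [(r.level, r.points, round(100 * r.points / total, 2)) for r in rows]
    return table, total


table, total = with_percentages(rows)
for level, points, pct in table:
    print(f"{level:<22}{points:>4}{pct:>8.2f}%")
print(f"{'TOTAL':<22}{total:>4}")
```

A check like this makes it easy to see at a glance whether any level is over- or under-represented before the items are written.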

Constructing the test items

The actual construction of the test items follows the TOS. As a general rule, the number of items constructed in the draft should be double the desired number of items. For instance, if there are five (5) knowledge-level items to be included in the final test form, then at least ten (10) knowledge-level items should be included in the draft. The subsequent test try-out and item analysis will most likely eliminate many of the items in the draft (because they are too difficult, too easy, or non-discriminating); hence, it is necessary to construct more items than will actually be included in the final test form.

Item analysis and try-out

The test draft is tried out on a group of pupils or students. The purpose of this try-out is to determine: (a) the item characteristics, through item analysis, and (b) the characteristics of the test itself: validity, reliability, and practicality.
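The text does not spell out how item analysis is carried out. One common convention, assumed here rather than taken from this chapter, is to compute for each item a difficulty index (the proportion of students who answered correctly) and a discrimination index (the proportion correct in the highest-scoring group minus the proportion correct in the lowest-scoring group). The sketch below uses that convention with a hypothetical function name, a hypothetical 27% group size, and made-up try-out data.

```python
def item_analysis(responses, group_fraction=0.27):
    """Compute (difficulty, discrimination) for each item.

    responses: one list per student of 0/1 item scores (1 = correct).
    group_fraction: share of students in the upper and lower groups
                    (0.27 is a common convention, assumed here).
    """
    n_students = len(responses)
    n_items = len(responses[0])
    # Rank students by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(n_students * group_fraction))
    upper, lower = ranked[:k], ranked[-k:]

    results = []
    for i in range(n_items):
        difficulty = sum(r[i] for r in responses) / n_students
        p_upper = sum(r[i] for r in upper) / k
        p_lower = sum(r[i] for r in lower) / k
        results.append((difficulty, p_upper - p_lower))
    return results


# Made-up try-out data: 10 students, 3 items.
responses = [
    [1, 1, 1], [1, 1, 1], [1, 1, 1],
    [1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 0],
    [1, 0, 0], [1, 0, 0], [0, 0, 0],
]
results = item_analysis(responses)
for number, (difficulty, discrimination) in enumerate(results, start=1):
    print(f"Item {number}: difficulty={difficulty:.2f}, discrimination={discrimination:.2f}")
```

Under this convention, an item with difficulty near 1.0 is very easy, one near 0.0 is very hard, and one with discrimination near zero (or negative) fails to separate stronger from weaker students; such items are candidates for revision or removal from the draft.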

Constructing a True-False Test

Binomial-choice tests are tests that have only two (2) options such as true or false, right or wrong, good or better, and so on. A student who knows nothing of the content of the examination would have a 50% chance of getting the correct answer by sheer guesswork. Although correction-for-guessing formulas exist, it is best that the teacher ensure that a true-false item is able to discriminate properly between those who know and those who are just guessing. A modified true-false test can offset the effect of guessing by requiring students to explain their answers and by disregarding a correct answer if the explanation is incorrect. Here are some rules of thumb in constructing true-false items.

Rule 1: Do not give a hint (inadvertently) in the body of the question.

Example: The Philippines gained its independence in 1898 and therefore celebrated its centennial year in 2000. ______

Obviously, the answer is FALSE because 100 years from 1898 is not 2000 but 1998.

Rule 2: Avoid using the words “always”, “never”, “often” and other adverbs that tend to be either always true or always false.

Example: Christmas always falls on a Sunday because it is a Sabbath day.

Statements that use the word “always” are almost always false. A test-wise student can easily guess his way through tests like these and get high scores even if he does not know anything about the subject matter.

Rule 3: Avoid long sentences as these tend to be “true”. Keep sentences short.

Example: Tests need to be valid, reliable and useful, although, it would require a great amount of time and effort to ensure that tests possess these test characteristics. _______

Notice that the statement is true. However, we are also not sure which part of the sentence is deemed true by the student. It is just fortunate that in this case, all parts of the sentence are true and hence, the entire sentence is true. The following example illustrates what can go wrong in long sentences:

Example: Tests need to be valid, reliable and useful since it takes very little amount of time, money and effort to construct tests with these characteristics.

The first part of the sentence is true but the second part is debatable and may, in fact, be false. Thus, a “true” response is correct and also, a “false” response is correct.

Rule 4: Avoid trick statements with some minor misleading word or spelling anomaly, misplaced phrases, etc. A wise student who does not know the subject matter may detect this strategy and thus get the correct answer.

Example: True or False. The Principle of our school is Mr. Albert P. Panadero.

The Principal’s name may actually be correct but since the word is misspelled and the entire sentence takes a different meaning, the answer would be false! This is an example of a tricky but utterly useless item.

Rule 5: Avoid quoting verbatim from reference materials or textbooks. This practice sends the wrong signal to the students that it is necessary to memorize the textbook word for word and thus, acquisition of higher level thinking skills is not given due importance.

Rule 6: Avoid specific determiners or give-away qualifiers. Students quickly learn that strongly worded statements are more likely to be false than true, for example, statements with “never,” “no,” “all,” or “always.” Moderately worded statements are more likely to be true than false. Statements with “many,” “often,” “sometimes,” “generally,” “frequently,” or “some” should be avoided.

Rule 7: With true or false questions, avoid a grossly disproportionate number of either true or false statements or even patterns in the occurrence of true and false statements.
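The correction-for-guessing formulas mentioned at the start of this section are not given in the text. A widely used version, assumed here, subtracts from the number of right answers the number of wrong answers divided by one less than the number of options per item; omitted items are not penalized. For true-false items (two options) this reduces to "rights minus wrongs". The function name below is hypothetical.

```python
def corrected_score(right, wrong, options=2):
    """Correction for guessing: right - wrong / (options - 1).

    right:   number of items answered correctly
    wrong:   number of items answered incorrectly (omits excluded)
    options: number of choices per item (2 for true-false)
    """
    if options < 2:
        raise ValueError("an item needs at least two options")
    return right - wrong / (options - 1)


# A student guessing blindly on 50 true-false items expects about
# 25 right and 25 wrong, which corrects to zero.
print(corrected_score(25, 25))             # 0.0
# A student with 40 right and 10 wrong on the same test.
print(corrected_score(40, 10))             # 30.0
# A four-option multiple-choice test: 30 right, 12 wrong.
print(corrected_score(30, 12, options=4))  # 26.0
```

The formula only adjusts scores after the fact; it does not replace well-constructed items that discriminate between students who know the material and those who are guessing.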

Constructing Multiple Choice Tests

A generalization of the true-false test, the multiple-choice type of test offers the student more than two (2) options per item to choose from. Each item in a multiple-choice test consists of two parts: (a) the stem, and (b) the options. In the set of options, there is a “correct” or “best” option while all the others are considered “distracters”. The distracters are chosen in such a way that they are attractive to those who do not know the answer or are guessing but, at the same time, have no appeal to those who actually know the answer. It is this feature of multiple-choice tests that allows the teacher to test higher-order thinking skills even if the options are clearly stated. As in true-false items, there are certain rules of thumb to be followed in constructing multiple-choice tests.

Guidelines in constructing Multiple Choice Items

Rule 1: Do not use unfamiliar words, terms and phrases. The ability of the item to discriminate or its level of difficulty should stem from the subject matter rather than from the wording of the question.

Example: What would be the system reliability of a computer system whose slave and peripherals are connected in parallel circuits and each one has a known time to failure probability of 0.05?

A student completely unfamiliar with the terms “slave” and “peripherals” may not be able to answer correctly even if he knows the subject matter of reliability.

Rule 2: Do not use modifiers that are vague and whose meanings can differ from one person to the next such as: much, often, usually, etc.

Example: Much of the process of photosynthesis takes place in the:
a. bark
b. leaf
c. stem

The qualifier “much” is vague and could have been replaced by a more specific qualifier such as “90% of the photosynthetic process” or some similar phrase that would be more precise.

Rule 3: Avoid complex or awkward word arrangements. Also, avoid use of negatives in the stem as this may add unnecessary comprehension difficulties.


(Poor) As President of the Republic of the Philippines, Corazon Cojuangco Aquino would stand next to which President of the Philippine Republic subsequent to the 1986 EDSA Revolution?

(Better) Who was the President of the Philippines after Corazon C. Aquino?

Rule 4: Do not use negatives or double negatives as such statements tend to be confusing. It is best to use simpler sentences rather than sentences that would require expertise in grammatical construction.


(Poor) Which of the following will not cause inflation in the Philippine economy?

(Better) Which of the following will cause inflation in the Philippine economy?

(Poor) What does the statement “Development patterns acquired during the formative years are NOT unchangeable” imply?

(Better) What does the statement “Development patterns acquired during the formative years are changeable” imply?

Rule 5: Each item stem should be as short as possible; otherwise you risk testing more for reading and comprehension skills.

Rule 6: Distracters should be equally plausible and attractive.

Example: The short story “May Day Eve” was written by which Filipino author?
a. Jose Garcia Villa
b. Nick Joaquin
c. Genoveva Edrosa Matute
d. Robert Frost
e. Edgar Allan Poe

If distracters had all been Filipino authors, the value of the item would be greatly increased. In this particular instance, only the first three carry the burden of the entire item since the last two can be essentially disregarded by the students.

Rule 7: All multiple choice options should be grammatically consistent with the stem.

Rule 8: The length, explicitness, or degree of technicality of alternatives should not be the determinants of the correctness of the answer. The following is an example of this rule:

Example: If the three angles of two triangles are congruent, then the triangles are:
a. congruent whenever one of the sides of the triangles is congruent
b. similar
c. equiangular and therefore must also be congruent
d. equilateral if they are equiangular

The correct choice, “b,” may be obvious from its length and explicitness alone. The other choices are long and tend to explain why they must be the correct choices forcing the students to think that they are, in fact, not the correct answers!

Rule 9: Avoid stems that reveal the answer to another item.

Rule 10: Avoid alternatives that are synonymous with others or those that include or overlap others.

Example: What causes ice to transform from solid state to liquid state?
a. Change in temperature
b. Change in pressure
c. Change in the chemical composition
d. Change in heat levels

The options a and d are essentially the same. Thus, a student who spots these identical choices would right away narrow down the field of choices to a, b, and c. The last distracter would play no significant role in increasing the value of the item.

Rule 11: Avoid presenting sequenced items in the same order as in the text.

Rule 12: Avoid use of assumed qualifiers that many examinees may not be aware of.

Rule 13: Avoid use of unnecessary words or phrases that are not relevant to the problem at hand (unless such discriminating ability is the primary intent of the evaluation). The item’s value is particularly damaged if the unnecessary material is designed to distract or mislead. Such items test the student’s reading comprehension rather than knowledge of the subject matter.

Example: The side opposite the thirty-degree angle in a right triangle is equal to half the length of the hypotenuse. If the sine of a 30-degree angle is 0.5 and the hypotenuse is 5, what is the length of the side opposite the 30-degree angle?
a. 2.5
b. 3.5
c. 5.5
d. 1.5

The sine of a 30-degree angle is really quite unnecessary since the first sentence already gives the method for finding the length of the side opposite the thirty-degree angle. This is a case of a teacher who wants to make sure that no student in his class gets the wrong answer!

Rule 14: Avoid use of non-relevant sources of difficulty such as requiring a complex calculation when only knowledge of a principle is being tested.

Note in the previous example, knowledge of the sine of the 30-degree angle would have led some students to use the sine formula for calculation even if a simpler approach would have sufficed.

Rule 15: Avoid extreme specificity requirements in responses.

Rule 16: Include as much of the item as possible in the stem. This allows for less repetition and shorter choice options.

Rule 17: Use the “None of the above” option only when the keyed answer is totally correct. When choice of the “best” response is intended, “none of the above” is not appropriate, since the implication has already been made that the correct response may be partially inaccurate.

Rule 18: Note that the use of “all of the above” may allow credit for partial knowledge. In a multiple-option item that allows only one choice, a student who knew only that two (2) of the options were correct could then deduce the correctness of “all of the above”.

Rule 19: Having compound response choices may purposefully increase difficulty of an item.

Rule 20: The difficulty of a multiple choice item may be controlled by varying the homogeneity or degree of similarity of responses. The more homogeneous, the more difficult the item.


(Less Homogeneous) Thailand is located in:
a. Southeast Asia
b. Eastern Europe
c. South America
d. East Africa
e. Central America 

(More Homogeneous) Thailand is located next to:
a. Laos and Kampuchea
b. India and China
c. China and Malaya
d. Laos and China
e. India and Malaya

Constructing Matching Type and Supply Type Items

The matching type items may be considered as modified multiple-choice type items where the choices progressively reduce as one successfully matches the items on the left with the items on the right.

Example: Match the items in column A with the items in column B.


Column A

_________1. Magellan
_________2. Mabini
_________3. Rizal
_________4. Lapu-Lapu
_________5. Aguinaldo

Column B

a. First President of the Republic
b. National Hero
c. Discovered the Philippines
d. Brain of the Katipunan
e. The great painter
f. Defended Limasawa island

Normally, column B will contain more items than column A to discourage guessing on the part of the students. Matching type items, unfortunately, often test lower-order thinking skills (knowledge level) and are unable to test higher-order thinking skills such as application and judgment skills.

A variant of the matching type items is the data sufficiency and comparison type of test illustrated below:

Example: Write G if the item on the left is greater than the item on the right; L if the item on the left is less than the item on the right; E if the item on the left equals the item on the right and D if the relationship cannot be determined.


  1. Square root of 9 ______ a. -3
  2. Square root of 25 ______ b. 615
  3. 36 inches ______ c. 3 meters
  4. 4 feet ______ d. 48 inches
  5. 1 kilogram ______ e. 1 pound

The data sufficiency test above can, if properly constructed, test higher-order thinking skills. Each item goes beyond simple recall of facts and, in fact, requires the students to make decisions.

Another useful device for testing lower-order thinking skills is the supply type of test. Like the multiple-choice test, each item in this kind of test has a stem, but in place of options there is a blank where the students write the correct answer.

Example: The study of life and living organisms is called ____________.

Supply type tests depend heavily on the way that the stems are constructed. These tests allow for one and only one answer and, hence, often test only the students’ knowledge. It is, however, possible to construct supply type tests that test higher-order thinking, as the following example shows:

Example: Write an appropriate synonym for each of the following. Each blank corresponds to a letter:

Metamorphose: _ _ _ _ _ _
Flourish: _ _ _ _

The appropriate synonym for the first is CHANGE, with six (6) letters, while the appropriate synonym for the second is GROW, with four (4) letters. Notice that these questions require not mere recall of words but also an understanding of these words.

Constructing Essay Tests

Essays, classified as non-objective tests, allow for the assessment of higher-order thinking skills. Such tests require students to organize their thoughts on a subject matter in coherent sentences in order to inform an audience. In essay tests, students are required to write one or more paragraphs on a specific topic.

Essay questions can be used to measure the attainment of a variety of objectives. Stecklein (1955) has listed 14 types of abilities that can be measured by essay items:

  1. Comparisons between two or more things
  2. The development and defense of an opinion
  3. Questions of cause and effect
  4. Explanations of meanings
  5. Summarizing of information in a designated area
  6. Analysis
  7. Knowledge of relationships
  8. Illustrations of rules, principles, procedures, and applications
  9. Applications of rules, laws, and principles to new situations
  10. Criticisms of the adequacy, relevance, or correctness of a concept, idea, or information
  11. Formulation of new questions and problems
  12. Reorganization of facts
  13. Discriminations between objects, concepts, or events
  14. Inferential thinking

Note that all these involve the higher-level skills mentioned in Bloom’s Taxonomy.

The following are rules of thumb which facilitate the scoring of essays:

Rule 1: Phrase the direction in such a way that students are guided on the key concepts to be included.

Example: Write an essay on the topic: “Plant Photosynthesis” using the following keywords and phrases: chlorophyll, sunlight, water, carbon dioxide, oxygen, by-product, stomata.

Note that the students are properly guided in terms of the keywords that the teacher is looking for in this essay examination. An essay such as the one given below will get a score of zero (0). Why?

Plant Photosynthesis

Nature has its own way of ensuring the balance between food producers and consumers. Plants are considered producers of food for animals. Plants produce food for animals through a process called photosynthesis. It is a complex process that combines various natural elements on earth into the final product which animals can consume in order to survive. Naturally, we all need to protect plants so that we will continue to have food on our table. We should discourage the burning of grasses, cutting trees, and illegal logging. If the leaves of plants are destroyed, they cannot perform photosynthesis and animals will also perish.
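One plausible answer to the "Why?" above is that the essay uses none of the required keywords. A quick check of keyword coverage can be sketched as follows. The function name is hypothetical, the first sample is a two-sentence excerpt of the essay above, and the second sample is a made-up sentence written to contain every keyword. Keyword presence is only one criterion; a real grade would also weigh accuracy and coherence.

```python
import re

# Keywords listed in the essay direction above.
KEYWORDS = [
    "chlorophyll", "sunlight", "water", "carbon dioxide",
    "oxygen", "by-product", "stomata",
]


def keywords_used(essay, keywords=KEYWORDS):
    """Return the required keywords that appear in the essay as whole words."""
    text = essay.lower()
    return [
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", text)
    ]


# Excerpt from the sample essay above.
sample_essay = (
    "Plants produce food for animals through a process called "
    "photosynthesis. If the leaves of plants are destroyed, they cannot "
    "perform photosynthesis and animals will also perish."
)

# Made-up sentence that follows the direction.
guided_essay = (
    "Chlorophyll in the leaves captures sunlight; water and carbon dioxide "
    "enter through the stomata, and oxygen is released as a by-product."
)

print(keywords_used(sample_essay))
print(keywords_used(guided_essay))
```

The excerpt mentions photosynthesis by name but never touches the concepts the teacher asked for, which is what the keyword list is meant to expose.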

Rule 2: Inform the students on the criteria to be used for grading their essays. This rule allows the students to focus on relevant and substantive materials rather than on peripheral and unnecessary facts and bits of information.

Example: Write an essay on the topic: “Plant Photosynthesis” using the keywords indicated. You will be graded according to the following criteria: (a) coherence, (b) accuracy of statements, (c) use of keywords, (d) clarity and (e) extra points for innovative presentation of ideas.

Rule 3: Put a time limit on the essay test.

Rule 4: Decide on your essay grading system prior to getting the essays of your students.

Rule 5: Evaluate all of the students’ answers to one question before proceeding to the next question. Scoring or grading essay tests question by question, rather than student by student, makes it possible to maintain a more uniform standard for judging the answers to each question. This procedure also helps offset the halo effect in grading. When all of the answers on one paper are read together, the grader’s impression of the paper as a whole is apt to influence the grades he assigns to the individual answers. Grading question by question, of course, prevents the formation of this overall impression of a student’s paper. Each answer is more apt to be judged on its own merits when it is read and compared with other answers to the same question than when it is read and compared with other answers by the same student.

Rule 6: Evaluate answers to essay questions without knowing the identity of the writer. This is another attempt to control personal bias during scoring. Answers to essay questions should be evaluated in terms of what is written, not in terms of what is known about the writers from other contacts with them. The best way to prevent our prior knowledge from influencing our judgment is to evaluate each answer without knowing the identity of the writer. This can be done by having the students write their names on the back of the paper or by using code numbers in place of names.

Rule 7: Whenever possible, have two or more persons grade each answer. The best way to check on the reliability of the scoring of essay answers is to obtain two or more independent judgments. Although this may not be a feasible practice for routine classroom testing, it might be done periodically with a fellow teacher (one who is equally competent in the area). Obtaining two or more independent ratings becomes especially vital where the results are to be used for important and irreversible decisions, such as in the selection of students for further training or for special awards. Here the pooled ratings of several competent persons may be needed to attain a level of reliability that is commensurate with the significance of the decision being made.
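Pooling independent ratings can be as simple as averaging them per paper and flagging papers on which the raters disagree widely. The sketch below is one possible approach, not a prescribed procedure: the function name, the rater labels, the code numbers (used in place of names, as Rule 6 suggests), and the 10-point disagreement threshold are all hypothetical.

```python
from statistics import mean


def pool_ratings(ratings_by_rater, max_spread=10):
    """Average independent ratings per paper and flag large disagreements.

    ratings_by_rater: {rater: {code_number: score}}
    max_spread: largest acceptable gap between the highest and lowest
                rating for one paper (an assumed threshold).
    """
    # Only pool papers that every rater has scored.
    codes = set.intersection(*(set(r) for r in ratings_by_rater.values()))
    pooled = {}
    for code in sorted(codes):
        scores = [r[code] for r in ratings_by_rater.values()]
        spread = max(scores) - min(scores)
        pooled[code] = {
            "pooled": mean(scores),
            "spread": spread,
            "review": spread > max_spread,  # raters should discuss this paper
        }
    return pooled


# Made-up ratings from two teachers, keyed by student code number.
ratings = {
    "rater_a": {"S01": 85, "S02": 70, "S03": 90},
    "rater_b": {"S01": 80, "S02": 88, "S03": 92},
}
pooled = pool_ratings(ratings)
for code, result in pooled.items():
    print(code, result)
```

A paper flagged for review is one where the two judgments differ enough that a simple average may hide a real disagreement about the quality of the answer.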

Some teachers use cumulative criteria, i.e., adding the weights given to each criterion, as the basis for grading, while others use the reverse. In the latter method, each student begins with a score of 100. Points are then deducted every time the teacher encounters a mistake or when a criterion is missed by the student in his essay.
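The two grading approaches can be compared in a short sketch. The criterion weights, the function names, and the sample ratings are hypothetical; the floor of zero in the deductive method is also an assumption, since the text does not say what happens when deductions exceed 100.

```python
# Hypothetical weights for four grading criteria (they sum to 100).
CRITERIA_WEIGHTS = {
    "coherence": 25,
    "accuracy": 30,
    "keywords": 25,
    "clarity": 20,
}


def cumulative_score(ratings, weights=CRITERIA_WEIGHTS):
    """Cumulative method: add the weight earned on each criterion.

    ratings: {criterion: fraction of the weight earned, from 0.0 to 1.0}
    """
    return sum(weights[c] * ratings.get(c, 0.0) for c in weights)


def deductive_score(deductions, start=100):
    """Deductive method: start at 100 and subtract points per mistake.

    deductions: list of points lost for each mistake or missed criterion.
    The score is floored at zero (an assumption).
    """
    return max(0, start - sum(deductions))


# One essay graded both ways.
ratings = {"coherence": 1.0, "accuracy": 0.5, "keywords": 1.0, "clarity": 0.5}
print(cumulative_score(ratings))      # 75.0
print(deductive_score([5, 10, 10]))   # 75
```

Whichever method is used, deciding the weights or deduction amounts before reading any papers keeps the grading consistent with Rule 4.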