The Quality of Measurements: Validity and Reliability

Measurement is a matter of describing concrete observations in terms of conceptual systems.  Thus, we observe a person and describe them as male or female, labeling them in terms of the conceptual variable, gender.  We say we have measured that person's gender.  This may seem an unnecessarily complex way of describing a simple, straightforward process, but the complexity becomes more obvious with other variables.

Suppose we want to "measure" a person's political orientation in the sense of being liberal or conservative.  Notice, to begin with, that this variable is a concept, a product of our minds.  Political orientations do not exist in nature.  Whereas you may feel that people are "really" male or female, political orientations are hardly that solid.  Not only do people seem to have different political orientations, they also have different opinions about what makes someone a liberal or a conservative.  Thus, people would differ on how to "measure" political orientations.

Social researchers use two standards for evaluating the quality of measurements: validity and reliability.

Validity refers to the extent to which we are actually measuring what we intended to measure.  If you measure political orientations by asking people which party they usually vote for, you would probably be relying on the observation that Democrats tend to be more liberal and Republicans more conservative.  However, the relationship between party and political orientation is hardly perfect, and people choose a party for other reasons as well, such as region or family tradition.  For these reasons, we could question the validity of measuring political orientations by the party people vote for.  Notice, though, that party voting is probably a more valid measure of political orientations than, say, income, even though wealthy people tend to be more conservative than working-class people.  Validity, in other words, is a matter of degree.  In this case, by the way, it would be best to ask people to identify themselves as liberal or conservative, even though they will not all mean the same thing by those terms.

The matter of validity is further complicated when a given variable has several dimensions.  In the present case, for example, a person might be socially liberal and economically conservative, and their orientation on international affairs might be liberal or conservative independently of either.  In all cases, validity depends on specifying what we mean by the terms of the variable.

Reliability is a different standard of quality.  It is a matter of "repeatability" or "dependability."  If we could make the same measurement repeatedly, would we get the same answer each time?  Let's say we set out to measure political orientations by asking people (Caution: do not do this at home or anywhere else), "How many times have you voted for the most liberal candidate in a race?" and "How many times have you voted for the most conservative candidate in a race?"  Our intent might be to calculate the ratio of the two counts.  You can probably see that this would be a terrible procedure, since most subjects could not recall every vote they had cast, let alone which candidate in each race was the most liberal or the most conservative.  At best, they would give some kind of rough estimate.

Now suppose the subjects promptly forgot the answers they gave to these questions, and we asked the same questions again.  Chances are we would get different answers the second time, since they would be giving us rough estimates each time.  This would call the reliability of the measurement technique into question: it did not produce the same response each time.

The issue of reliability is most easily seen in the case of the bathroom scale, which, unlike our hypothetical subjects, genuinely forgets your weight between measurements.  If you step on the scale repeatedly and it gives you the same weight each time, it is reliable.  However, if it consistently overstates your weight by 10 pounds (as often seems the case), it is not a valid measure.
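
The scale example separates two kinds of error.  In the sketch below, systematic error (a fixed bias added to every reading) undermines validity, while random error (noise that changes from one reading to the next) undermines reliability.  The 160-pound true weight, the 10-pound bias, the noise levels, and the function name are arbitrary assumptions chosen only for illustration.

```python
import random
import statistics


def weigh_repeatedly(true_weight, bias, noise_sd, trials=1000, seed=0):
    """Simulate stepping on a bathroom scale `trials` times (hypothetical model).

    bias:     systematic error added to every reading (a validity problem)
    noise_sd: random error that varies between readings (a reliability problem)
    Returns (spread of the readings, average error relative to the true weight).
    """
    rng = random.Random(seed)
    readings = [true_weight + bias + rng.gauss(0, noise_sd) for _ in range(trials)]
    spread = statistics.stdev(readings)              # small spread: reliable
    error = statistics.mean(readings) - true_weight  # near zero: valid on average
    return spread, error


# Hypothetical scale 1: reads 10 pounds heavy every time, with almost no wobble.
steady_spread, steady_error = weigh_repeatedly(160, bias=10, noise_sd=0.2)
# Hypothetical scale 2: no built-in bias, but its readings jump around a lot.
jumpy_spread, jumpy_error = weigh_repeatedly(160, bias=0, noise_sd=8)

print(f"steady scale: spread {steady_spread:.1f} lb, average error {steady_error:+.1f} lb")
print(f"jumpy scale:  spread {jumpy_spread:.1f} lb, average error {jumpy_error:+.1f} lb")
```

The first scale gives nearly identical readings but is about 10 pounds off on average: reliable, not valid.  The second is close to right on average, but any single reading could be several pounds off, so it is unreliable.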

The textbook discusses these concepts at greater length and examines some of the specific techniques social researchers use to enhance validity and reliability.  You will discover, however, that the two qualities are often somewhat in conflict.  Frequently, the most reliable measurement techniques have relatively low validity, and vice versa.  Your task, then, is to find the best balance between them.