Problems of Internal Invalidity

In 1963, Donald Campbell and Julian Stanley published what was to become a classic work in social research: Experimental and Quasi-Experimental Designs for Research (Chicago: Rand McNally, 1963).  Among other things, they discussed several problems of "invalidity."  Suppose you do an experiment to test the impact of a stimulus, and it seems to have the expected impact on the dependent variable.  Internal invalidity refers to the possibility that something other than the stimulus actually produced the observed change in the dependent variable.  As we discuss examples of this, you'll see how the experimental model is designed to avoid or detect such problems.

For purposes of illustration, let's imagine that a prison warden wanted to improve morale among his prisoners by introducing a new program of conjugal visits (letting prisoners have sex with their wives).  Morale was measured before the program began and again afterward.  Morale was seen to improve.  But was it the conjugal-visit program that made the difference?

Here are twelve problems of internal invalidity, building on those introduced by Campbell and Stanley, as discussed in The Practice of Social Research.

1. History.  Perhaps something else happened during the course of the experiment, and that something else improved morale.

Prison:  Perhaps the prison got a new chef, who threw out the mystery-meat recipes and began offering trout almondine, coq au vin, and other delicacies.
Solution:  The prisoners who did not participate in the conjugal-visit program (e.g., those without wives) could serve as a control group.  Since they would also benefit from the new chef, their morale would presumably improve too; if it improved as much as it did for those in the program, we could conclude that the program made no difference.

2. Maturation.  Subjects change simply with the passage of time, and those changes may affect the dependent variable.
Prison:  The longer they spend in prison, the more prisoners may become reconciled to their situations and may even learn how to work the system.  Thus, their morale improves, but it has nothing to do with the conjugal visits.

Solution:  As in #1 above, this factor would operate equally among those in the program and those not in it, so the presence of a control group will alert us to this problem if it exists.

3. Testing.  The act of measurement may itself have an effect on the dependent variable, perhaps by tipping off the subjects to the purpose of the experiment.
Prison:  Let's say you measure morale by way of a questionnaire that asks prisoners questions relating to morale.  The second time they are asked such questions (in the post-test) they may figure out what the experiment is about and may tailor their answers to what they think you want to hear and/or what they think will best serve their interests.

Solution: Once again, the presence of a control group will let us know if this is happening.  As you'll see in the discussion of the Solomon Four-group Design in the textbook, this problem can sometimes be avoided by omitting the pre-test.

4. Instrumentation: Perhaps the measurements of the dependent variable in the pre-test and post-test are not comparable.
Prison:  Continuing with the idea of asking prisoners questions, and worrying that the prisoners might catch on to our purpose (see #3 above), we might ask a different set of questions in the pre-test and post-test.  Unfortunately, one of the measurements might set a higher standard than the other.  So, if the percentage of prisoners judged to have "high morale" increased from 40% to 70% among those in the program, that might only reflect a lower standard for "high morale" in the post-test as compared to the pre-test.

Solution: If the two measurements use different standards, as mentioned above, then the control group should also seem to increase in morale.  However, if the experimental group's morale increased more than that of the control group, we'd have a basis for concluding that the program of conjugal visits made a difference.
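The logic of these control-group solutions boils down to simple arithmetic: subtract the control group's change from the experimental group's change.  Anything that shifts both groups equally, such as a new chef, maturation, testing effects, or a more lenient post-test instrument, drops out of that difference.  The short Python sketch below assumes morale is summarized as the percentage of prisoners judged to have "high morale"; the function name `program_effect` and the control-group figures are hypothetical, used only for illustration.

```python
def program_effect(exp_pre, exp_post, ctl_pre, ctl_post):
    """Change in the experimental group minus change in the control group.

    Influences that move both groups equally (history, maturation,
    testing, a more lenient post-test instrument) cancel out.
    """
    return (exp_post - exp_pre) - (ctl_post - ctl_pre)


# Percent judged to have "high morale". The experimental figures (40% -> 70%)
# follow the example above; the control-group figures are hypothetical.
print(program_effect(40, 70, 40, 70))  # 0: the control group rose just as much
print(program_effect(40, 70, 40, 55))  # 15: points beyond the control group's gain
```

In the first case the experimental group's 30-point gain is matched by the control group, so there is no basis for crediting the program; in the second, the program group improved 15 points more than the control group did.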

5. Statistical regression: If the pre-test shows a very high or very low degree of the dependent variable, then any change is likely to be away from that extreme, toward the middle.
Prison:  Let's say our prison pre-test indicates that 0% of the prisoners have high morale.  That means that any subsequent change has to be in the direction of higher morale, since it can't get any lower.  If the experimental group went from 0% to 40% with "high morale," we couldn't be sure that the stimulus was responsible for the change.

Solution: As in #1 through #4 above, we want to see whether the control group's morale improved as much as that of the experimental group.
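A small simulation can make regression toward the middle concrete.  The Python sketch below assumes, hypothetically, that each prisoner has a stable underlying level of morale and that every measurement adds some random error (a bad day, a misread question).  No program is applied at all.  If we single out the prisoners who scored lowest on the pre-test, their post-test average still rises, because part of what made their pre-test scores so low was bad luck in measurement that does not repeat.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical model: a stable "true" morale level per prisoner, plus
# independent random measurement error on each occasion.
true_morale = [random.gauss(50, 10) for _ in range(1000)]
pre = [t + random.gauss(0, 10) for t in true_morale]
post = [t + random.gauss(0, 10) for t in true_morale]  # no treatment given

# Select the 100 prisoners with the lowest pre-test scores.
lowest = sorted(range(len(pre)), key=lambda i: pre[i])[:100]

pre_mean = statistics.mean(pre[i] for i in lowest)
post_mean = statistics.mean(post[i] for i in lowest)
print(f"pre-test mean of lowest scorers:  {pre_mean:.1f}")
print(f"post-test mean of lowest scorers: {post_mean:.1f}")  # higher, with no program
```

A control group drawn from the same low scorers would show the same rise, which is why comparing the two groups, rather than looking at the experimental group alone, exposes the problem.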

6. Selection biases: Perhaps the experimental and control groups were not comparable, differing in a way that made the experimental group members more likely to improve in morale.
Prison:  Let's say the prison warden uses the conjugal-visit program as a reward to prisoners who have shown recent improvement in their behavior.  Their morale, therefore, may already be improving for a variety of idiosyncratic reasons.  Improvements in morale during the term of the experiment might merely reflect a continuing process rather than the impact of the conjugal visits.

Solution: Don't do that.  It is important that the experimental and control groups be comparable to one another.  This can be accomplished by random assignment or by careful matching.  Barring that, the pre-test should measure not only the dependent variable but other relevant variables.  If you asked about recent behavioral changes for all the prisoners, you could detect that those with recent improvements in behavior showed increased morale, whether they were in the program or not.
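The difference between the warden's approach and random assignment can be sketched in a few lines of Python.  The roster below is hypothetical: 200 prisoners, 60 of whom recently improved their behavior.  Handing the program to the recently improved prisoners concentrates that trait in the experimental group, while shuffling the roster and splitting it in half tends to spread the trait roughly evenly between the groups.

```python
import random

random.seed(7)  # fixed seed so the run is repeatable

# Hypothetical roster: prisoners 0-59 have recently improved their behavior.
prisoners = [{"id": i, "recent_improvement": i < 60} for i in range(200)]


def share_improved(group):
    """Fraction of a group whose behavior recently improved."""
    return sum(p["recent_improvement"] for p in group) / len(group)


# Warden's approach: the 60 improved prisoners plus 40 others get the program.
warden_program, warden_control = prisoners[:100], prisoners[100:]

# Random assignment: shuffle a copy of the roster and split it in half.
shuffled = prisoners[:]
random.shuffle(shuffled)
random_program, random_control = shuffled[:100], shuffled[100:]

print(share_improved(warden_program), share_improved(warden_control))  # 0.6 0.0
print(share_improved(random_program), share_improved(random_control))  # roughly equal
```

Random assignment does not guarantee identical groups, but it removes any systematic link between the trait and who receives the stimulus; measuring the trait in the pre-test, as suggested above, lets you check how well the groups actually balanced.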

7. Experimental mortality: Some subjects may fail to complete the experiment, and they may differ importantly from those who stay.
Prison:  Let's say the prisoners with the worst morale go over the wall or commit suicide.  They won't be around to drag down the average morale scores in the post-test measurement.  Thus, overall morale levels will appear to have improved.

Solution: By matching each prisoner's pre-test and post-test data, you would be able to omit the drop-outs from the pre-test measurement as well, so that both measurements describe the same people.
In other situations, however, this problem can be more complex.  Suppose we are giving pills to medical patients, all hoping to experience an improvement in their health.  Patients who experience no benefits may become demoralized (see #12 below) and drop out of the experiment.  The post-test among those left in the experiment will show more improvement than was really the case.  One solution to this problem is to perform several post-tests over the course of the experiment.  Checking the progress of the medical patients every day would reveal that those making no progress early on were more likely to drop out later.
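Here is a minimal Python sketch of how drop-outs distort a simple before-and-after comparison, and how matching the pre-test and post-test data corrects for it.  The five prisoners and their morale scores (on a hypothetical 0 to 100 scale) are invented for illustration; the two with the lowest morale leave before the post-test.

```python
import statistics

# Hypothetical morale scores, keyed by prisoner ID.
pre_test = {"A": 20, "B": 30, "C": 60, "D": 70, "E": 80}
# Prisoners A and B, who had the lowest morale, are gone by the post-test.
post_test = {"C": 62, "D": 71, "E": 82}

# Naive comparison: everyone at the pre-test vs. whoever remains afterward.
naive_change = statistics.mean(post_test.values()) - statistics.mean(pre_test.values())

# Matched comparison: only prisoners measured both times.
stayers = pre_test.keys() & post_test.keys()
matched_change = statistics.mean(post_test[p] - pre_test[p] for p in stayers)

print(f"naive change:   {naive_change:.1f}")    # 19.7
print(f"matched change: {matched_change:.1f}")  # 1.7
```

Almost all of the apparent 19.7-point improvement comes from losing the two lowest scorers; among the prisoners who stayed, morale rose by less than two points on average.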

8. Causal time-order: Sometimes the apparent order of cause and effect is actually reversed: what looks like the effect may have come first.
Prison:  This problem was included in the discussion of #6 above.  The improvement in morale occurred before participation in the conjugal-visit program, so the program could not have caused it.

Solution: Carefully and consciously structuring the experiment, including the timing of the pre-test and post-test, should avoid this problem.

9. Diffusion or imitation of treatments: The stimulus reserved for the experimental group subjects may somehow spread to the control group members.
Prison:  Control group prisoners bribe the guards to provide them with surreptitious conjugal visits from loved ones or prostitutes.  Thus, their morale improves as much as that of the experimental group, suggesting the program had no impact, when, in fact, it did.

Solution: What can I say?  Don't do that.  As an experimenter, you need to be eternally vigilant to prevent diffusion and imitation of treatments.  Sometimes careful measurements can turn up such problems.

10.  Compensation:  Similar to #9 above, the control group may get some sort of alternative to the experimental treatment.
Prison:  When control group subjects get unruly in their complaints about being denied the conjugal visits, the warden supplies beer and pizza as a compensation.  Morale improves as much among the control group as among the experimental group.

Solution: See #9 above.

11. Compensatory rivalry: Control group members may feel unfairly treated and organize themselves to show they are just as worthy as the experimental group.
Prison:  Prisoners denied the conjugal visits renounce sex as weak and unworthy, turning to a regimen of exhausting exercise and cold showers.  They lose weight, become buff musclemen, gain in self-esteem, and exhibit improved morale.  Once again, the conjugal-visit program seems to do no good.

Solution: Pre-tests and post-tests should often go beyond simply measuring the dependent variable to include other information that could point to changes like those discussed above.  Whereas laboratory experiments can often control the situation sufficiently to avoid such problems, experiments in the real world must be monitored in their real-world context.

12. Demoralization: This is the opposite of #11 above.  Control group subjects or others receiving no benefits may become unhappy, dejected, and demoralized.
Prison:  Control group prisoners, denied the conjugal visits, experience decreased morale for precisely that reason.  The experimental group will have higher morale than the control group at the end of the experiment, not because the experimental group's morale improved but because the control group's morale decreased.

Solution: Comparable pre- and post-tests of the dependent variable should highlight this problem, pointing out that the experimental group's morale had not actually improved.

As you can see, then, the classical experimental model and variations on it (discussed in the textbook) help us avoid or at least detect problems of internal invalidity.

Campbell and Stanley also discussed external invalidity.  This means that although the experimental stimulus really did produce the changes observed in the experimental situation, it would not have the same effect outside the experiment, in real life.  This is discussed in more detail in Chapter 8.