Andy Field
How to Design and Report Experiments
Figure 3.5 The ‘post-test only/control group’ design
This design is frequently used in psychology, but it does suffer from one weakness. Participants are randomly assigned to the two conditions, to ensure that the groups are equivalent before the experimental manipulation takes place. However, if randomization fails to produce equivalence, you have no way of knowing that it has failed. You cannot be certain that the two groups were comparable before you administered your treatment.
As with the previous design, the pre-test/post-test control group design (Figure 3.6) uses two groups, a control group and an experimental group. Each participant is randomly assigned to one or other of the groups. Behaviour is measured before the intervention, when it should be roughly comparable between the two conditions. Behaviour is measured again after the intervention: any difference between the two conditions at this point is presumed to be due to the treatment administered to the experimental group.
The advantage of this design over the previous one is that, because we pre-test, we can check that our two groups were equivalent on the measure of interest before our experimental manipulation took place (or discover that they were not, if for some reason the random allocation of participants to the different conditions has failed).
One problem with the pre-test/post-test control group design is that pre-testing the participants might affect their subsequent performance: the Solomon four-group design (Figure 3.7) controls for that possibility. (This is a design for the seriously paranoid researcher!)
Here, we have four conditions, two control and two experimental. Groups A and B show the effects of presence and absence of the experimental manipulation; groups C and D allow an assessment to be made of the effects of the pre-testing experienced by groups A and B. By making the appropriate comparisons between these four conditions, you can assess the effects of pre-testing and thus gain some idea of the generalizability of the findings (and eliminate the possibility that the cause of the changes in group A was our experimental manipulation plus the pre-testing).
Figure 3.6 The ‘pre-test/post-test control group’ design
Figure 3.7 The ‘Solomon four-group’ design
This is an excellent design, but it suffers from the disadvantages of being expensive in terms of time and number of participants: it basically doubles the cost of running a study and, as a result, is rarely used in practice.
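The comparisons between the four groups come down to simple arithmetic on their post-test means. The Python sketch below assumes a labelling that the text does not spell out (A: pre-tested and treated; B: pre-tested, untreated; C: treated but not pre-tested; D: neither), and its means are invented purely for illustration. Read as a 2 × 2 arrangement (treatment × pre-testing), A minus B estimates the treatment effect in pre-tested participants, C minus D estimates it in participants who were never pre-tested, and the gap between those two differences suggests whether pre-testing changed how people responded to the treatment.

```python
# Hypothetical post-test means for a Solomon four-group design (numbers invented).
# Assumed labelling: A = pre-test + treatment, B = pre-test only,
#                    C = treatment only,       D = neither.
post_means = {"A": 24.0, "B": 15.0, "C": 21.0, "D": 14.0}


def solomon_effects(m):
    """Simple contrasts between the four post-test means."""
    effect_with_pretest = m["A"] - m["B"]      # treatment effect among pre-tested groups
    effect_without_pretest = m["C"] - m["D"]   # treatment effect with no pre-test at all
    treatment_main = ((m["A"] + m["C"]) - (m["B"] + m["D"])) / 2  # average treatment effect
    pretest_main = ((m["A"] + m["B"]) - (m["C"] + m["D"])) / 2    # average effect of being pre-tested
    # If pre-testing alters the response to the treatment, the two
    # treatment effects will differ (a treatment x pre-testing interaction).
    interaction = effect_with_pretest - effect_without_pretest
    return {
        "treatment_with_pretest": effect_with_pretest,
        "treatment_without_pretest": effect_without_pretest,
        "treatment_main": treatment_main,
        "pretest_main": pretest_main,
        "interaction": interaction,
    }


for name, value in solomon_effects(post_means).items():
    print(f"{name}: {value:+.1f}")
```

In a real study these contrasts would normally be tested with a 2 × 2 between-groups analysis of variance on the post-test scores, rather than simply read off the means.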
Repeated Measures Designs
Advantages of repeated measures designs
If possible, it is often highly desirable to use the same participants in every condition – so that a given participant produces one result for every condition of the experiment. There are two reasons for this:
Economy: As we’ve already mentioned, within-subjects designs are economical to run in terms of time and effort, because you’re using each participant several times.
Sensitivity: As mentioned earlier, what we are interested in are the differences in our results which have been produced by our experimental interventions. However, these differences have to be distinguished from all the random ‘noise’ produced in our data by the fact that participants differ from each other in lots of weird and wonderful ways. It is in our interests to reduce this random variability as much as possible, because the smaller it is, the easier it will be to detect our experimentally-induced variation. One way to reduce participant-induced variation would be to carefully match participants in different groups on factors which might affect our results. So, we could make sure that participants in group A were the same age, sex, height, IQ, etc., etc., as those in group B. The problem with this approach is that we might not match participants on the appropriate characteristics: if, unknown to us, height was an important influence on our dependent variable but we had been matching participants on weight instead, then the matching would not have been much use. In any case, matching is time-consuming and tedious to do. A simpler and more effective technique is to match participants perfectly between the different conditions, in all respects, by using the same people in all conditions. By doing this, we can dramatically reduce the variation in scores between conditions which is due merely to non-experimental factors, that is, to random differences between the different participants. Since each participant takes part in all conditions, the only systematic difference between a participant’s scores for the different conditions should be that produced by our experimental manipulations. In other words, instead of participants in different groups having different ages, interests, sexes etc., all of these factors are held completely constant across all conditions of the experiment.
All things being equal, a repeated-measures design will always be more sensitive (i.e. be more likely to reveal the effects of the experimenter’s manipulations) than an independent-measures design, because there are fewer sources of random variation to obscure the effects of your manipulations of the independent variable. In the between-groups design, differences between your experimental conditions include differences due to what you did to the participants (your manipulations of the independent variable that you’re interested in); (hopefully) random differences between individuals within a group; and (hopefully) random differences between individuals in one group and individuals in another. With a within-subjects design, you eliminate the last factor, and all you have to contend with is individual variation in participants’ responses to your experimental manipulations.
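A small simulation can make the sensitivity argument concrete. This is a sketch under invented assumptions (scores around 50, a stable person-to-person spread of 10 points, within-person noise of 2 points, a true effect of 2 points, 30 participants per condition), not a model of any real experiment. Both designs estimate the same effect; what differs is the standard error of the estimated difference between conditions, which in the between-groups case includes the differences between people and in the repeated-measures case largely does not.

```python
import random
import statistics

# All numbers below are invented for illustration; nothing here comes from a real study.
random.seed(1)

N = 30              # participants per condition (assumed)
TRUE_EFFECT = 2.0   # assumed effect of the manipulation on the score
PERSON_SD = 10.0    # assumed spread of stable differences between people
NOISE_SD = 2.0      # assumed trial-to-trial variation within one person


def person_baseline():
    """A participant's stable 'typical' score."""
    return random.gauss(50, PERSON_SD)


def score(baseline, treated):
    """One observed score: baseline + treatment effect (if any) + random noise."""
    return baseline + (TRUE_EFFECT if treated else 0.0) + random.gauss(0, NOISE_SD)


# Between-groups design: different people in each condition.
control = [score(person_baseline(), False) for _ in range(N)]
treatment = [score(person_baseline(), True) for _ in range(N)]
se_between = (statistics.variance(control) / N + statistics.variance(treatment) / N) ** 0.5

# Repeated-measures design: each person is tested in both conditions,
# so their baseline cancels out of their difference score.
baselines = [person_baseline() for _ in range(N)]
diffs = [score(b, True) - score(b, False) for b in baselines]
se_within = statistics.stdev(diffs) / N ** 0.5

print(f"SE of the difference, between-groups:    {se_between:.2f}")
print(f"SE of the difference, repeated measures: {se_within:.2f}")
```

The sketch deliberately leaves out carry-over effects such as practice and fatigue; those are the price paid for this extra sensitivity.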
Disadvantages of repeated-measures designs
Repeated-measures designs therefore seem a good idea in theory. In practice, there are a couple of problems with them, which mean that it’s not possible to use them in all situations. (If you’ve read the section on independent-measures designs, you should be able to work out what these problems are.)
‘Carry-over’ effects from one condition to another: Even if our experimental manipulations had no effect on a participant’s behaviour at all, he or she would probably still respond slightly differently in our different experimental conditions. Participants are not robots: their performance will spontaneously vary slightly from trial to trial, and from condition to condition. This would not be much of a problem if it were merely random fluctuation in performance, because such fluctuation should cancel out across conditions. However, systematic variations in performance pose a much more serious problem. Participants become fatigued, bored, better practised at doing the set tasks, and so on. These systematic or ‘confounding’ effects may be confounded with our manipulations of the independent variable of interest, rendering our results uninterpretable. For example, imagine that we were looking at the effects of stress on word-list recall, and we had three stress levels: low, medium and high. If every participant did the three conditions in the same order and we found an effect of condition on recall, we would be unable to tell whether the effect was due to stress, to practice, to fatigue, or to some combination of these factors. ‘Carry-over’ effects such as these are not necessarily an insuperable problem, but avoiding their effects does complicate matters somewhat, especially if you want to manipulate several independent variables in the same experiment. To avoid the influence of systematic confounding variables such as practice or fatigue, we can do one of two things. We can randomize the order of presentation of the different conditions (so that one participant gets the order ‘high, medium, low’ stress, another gets ‘medium, high, low’, yet another gets ‘medium, low, high’, and so on). Or we can counterbalance the order, so that if you have two conditions, half the participants get the conditions in the order A then B and the other half get the order B then A.
Counterbalancing has the advantage that you can then include the order of presentation of conditions in the analysis as a variable in itself. This means that you can look directly for effects of the order in which conditions were presented to participants, rather than merely keeping your fingers crossed that you have controlled for order effects. This is useful in situations where you suspect that different orders of conditions are likely to have different effects. Suppose, for example, that you have three conditions, A, B and C: doing A has a large effect on people’s subsequent performance, whereas doing C has a comparatively minor effect. This means that performance in B will be affected markedly when the sequence of conditions is ABC (where A comes just before it), but will be affected little when the sequence is CBA (where C comes just before it). If you randomize the order of conditions, it’s going to be hard to detect these effects; however, if you use counterbalancing and include ‘order of conditions’ in the statistical analysis as an independent variable in its own right, the order effect will be easier to detect. Whichever of these techniques is chosen, it is vital that each participant is randomly allocated to one sequence of conditions or another.
The need for conditions to be reversible: The final catch with repeated-measures designs is that they can only be used if being in one condition does not have irreversible effects that prevent the participant from being used in another condition. So a repeated-measures design would be no good for an experiment investigating the effects of temperature on the speed with which lemmings throw themselves off a cliff, for example!
Figure 3.8 below shows a basic two-condition repeated-measures design. Half of the participants do the treatment condition first and the control condition second; the other half do these two conditions in the opposite order. It’s important that participants are randomly allocated to these two presentation orders.
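The two ways of controlling order described above, randomizing the sequence for each participant or counterbalancing it, can be sketched in Python as follows. The function names, participant labels and the use of Python's `random` module are illustrative assumptions. In both functions, which participant ends up with which sequence is decided at random.

```python
import itertools
import random


def randomized_orders(participants, conditions, rng):
    """Randomization: each participant gets an independently shuffled order."""
    return {p: rng.sample(conditions, len(conditions)) for p in participants}


def counterbalanced_orders(participants, conditions, rng):
    """Full counterbalancing: every possible order is used equally often,
    and participants are randomly allocated to the orders."""
    orders = [list(o) for o in itertools.permutations(conditions)]
    if len(participants) % len(orders):
        raise ValueError("number of participants must be a multiple of the number of orders")
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random allocation of people to sequences
    return {p: orders[i % len(orders)] for i, p in enumerate(shuffled)}


rng = random.Random(42)
people = [f"P{i:02d}" for i in range(1, 13)]   # 12 hypothetical participants
stress = ["low", "medium", "high"]

by_randomization = randomized_orders(people, stress, rng)
by_counterbalancing = counterbalanced_orders(people, stress, rng)
for person in people[:4]:
    print(person, by_randomization[person], by_counterbalancing[person])
```

With two conditions, `counterbalanced_orders(people, ["A", "B"], rng)` gives the arrangement described for Figure 3.8: half the participants do A then B, and the other half do B then A.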
More Sophisticated Designs
So far, we have considered simple two-condition designs, where there is an experimental condition (in which participants receive some treatment) and a control condition (in which participants don’t). In the jargon, we have one independent variable with two levels (in this case, presence versus absence of whatever the independent variable involves). However, many experiments involve more sophisticated designs than this.
Experiments with multiple levels of the independent variable
Figure 3.8 A basic two-condition repeated-measures design
First, you might want to examine the effects of more levels of the independent variable. Suppose you were interested in the effects of a new tranquillizer, ‘Tohellwivmiwurriz’, on levels of anxiety in first-time parachute jumpers. Instead of just having two conditions (tranquillizer versus no tranquillizer), you could have several conditions, each of which corresponded to a particular dosage of tranquillizer. Thus you might have one group taking no tranquillizer before jumping from the plane; another group taking 5 mg before jumping; another group taking 10 mg; and so on. (We would have to use a between-groups design in this particular instance: because they are first-time parachutists, their level of anxiety is unlikely to be the same on subsequent jumps, and so we can use each person only once, especially if the parachute doesn’t open.)
This experiment would obviously be much more informative than a simple two-condition design, because it would give us details about how the amount of tranquillizer affected anxiety levels, rather than merely examining the effects of its presence or absence. This increases the generality of our findings, by showing that they do not apply merely to the single level of the independent variable chosen in a simple two-condition experiment. Paradoxically, such an experiment may also show the limits of the generality of the experiment’s findings, by revealing the boundaries of the effects in question. For example, high levels of ‘Tohellwivmiwurriz’ might have the same effect as no drug whatsoever: a study in which we used varying dosages of the drug would show this, and reveal the limits under which the drug operates, limits which would be unknown to us had we conducted only a simple ‘presence versus absence’ study.
Figure 3.9 An experiment with multiple levels of one independent variable
This is simply one possible variant of the ‘pre-test/post-test control group’ design that I discussed earlier, extending that design to look at several levels of the independent variable of interest. Here (Figure 3.9) we have one control group (group D), and three experimental groups (groups A, B and C), each receiving one level of the same independent variable. We could compare each of the three experimental groups to the control group, to assess the size of the effects of different manipulations of the independent variable. We could also look for trends in the data. For example, we could see if the reduction in anxiety was systematically related to the amount of tranquillizer taken. (Is it the case that the more tranquillizer that’s taken, the more relaxed the parachutist becomes? Or is there some other kind of relationship? It might be that the effects of the tranquillizer increase with increasing dosage, but only up to a certain point, after which there is no further effect of the drug.)
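The comparisons and trend check described above can be sketched in a few lines of Python. Everything here is hypothetical: the doses (5, 10 and 15 mg for groups A, B and C) are assumed, and the anxiety means are invented, chosen so that the effect levels off between 10 and 15 mg. Each group's change from pre-test to post-test is compared with the change in the control group D.

```python
# Invented pre- and post-test anxiety means (higher = more anxious).
# Assumed doses: A = 5 mg, B = 10 mg, C = 15 mg; D is the no-drug control.
groups = {
    "D": {"dose_mg": 0, "pre": 62.0, "post": 60.0},
    "A": {"dose_mg": 5, "pre": 61.0, "post": 52.0},
    "B": {"dose_mg": 10, "pre": 63.0, "post": 47.0},
    "C": {"dose_mg": 15, "pre": 62.0, "post": 46.0},
}


def change(g):
    """Post-test minus pre-test: negative values mean anxiety fell."""
    return g["post"] - g["pre"]


control_change = change(groups["D"])
by_dose = sorted(groups.values(), key=lambda g: g["dose_mg"])

for g in by_dose:
    vs_control = change(g) - control_change
    print(f"{g['dose_mg']:>2} mg: change {change(g):+.1f}, relative to control {vs_control:+.1f}")

# Crude descriptive trend check: does each higher dose give at least as large a reduction?
changes = [change(g) for g in by_dose]
monotonic = all(later <= earlier for earlier, later in zip(changes, changes[1:]))
print("reduction never shrinks as dose increases:", monotonic)
```

With these invented numbers the reduction grows from 5 mg to 10 mg and then stays flat at 15 mg, the "up to a certain point" pattern. A formal analysis would normally test such a pattern with trend (polynomial) contrasts in an analysis of variance; the check here is only a descriptive first look.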
We can use more than two conditions in within-subjects designs as well, as long as we are careful about controlling for order effects (and as long as the effects of the different conditions are not irreversible). When we had only two conditions, we merely needed half the participants to do them in one order (A then B) and the other half to do them in the opposite order (B then A). However, with three conditions, there are 3 × 2 × 1 = 6 possible orders of conditions: ABC; CBA; ACB; BCA; CAB; BAC. If you have four conditions, there are 4 × 3 × 2 × 1 = 24 possible orders. Here lies the path to insanity . . .
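The growth in the number of possible orders is easy to check. This short Python sketch lists every order of three conditions with `itertools.permutations` and prints how many orders there are for two to six conditions (n factorial, written n!).

```python
import itertools
import math


def all_orders(conditions):
    """Every possible presentation order of the given conditions."""
    return ["".join(p) for p in itertools.permutations(conditions)]


print(all_orders("ABC"))  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']
for k in range(2, 7):
    print(k, "conditions:", math.factorial(k), "possible orders")
```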
Latin Squares designs
One way to deal with the problem of order effects in within-subjects designs is to use a ‘Latin Squares’ design. In this design, the order in which the various conditions of your experiment occur is counterbalanced so that each condition occurs exactly once at each position in the sequence, without having to use every possible order. (It gets its name from an ancient mathematical puzzle: the problem of how to arrange a set of different things in a square so that each one appears only once in each column and once in each row.)
Here’s a simple example of a situation in which we might use a Latin Squares design. Suppose we have three conditions, A, B and C, and we want to ensure that we don’t have any systematic order effects in our experiment (due to all participants doing the conditions in the order A first, B second and C third, for example). We could randomize the order in which each participant does the three conditions, as we discussed earlier. Alternatively, we could systematically counterbalance the order in which the conditions are presented. As mentioned a moment ago, there are three conditions, so there are six possible orders in which these conditions can occur. That’s a lot of orders, and hence a lot of groups (and participants) to run in our study. However, by using a Latin Squares design, we can cut down on the number of groups we need to run, and yet still avoid systematic order effects. In a Latin Squares design for a study with three conditions, we need to use only the orders ABC, BCA and CAB to present each condition first, second and third equally often (Table 3.1). These three orders are enough to ensure that, for each condition, we have a sample of participants at each of the possible positions in the sequence. (In the subsequent statistical analysis, we could go on to do one of two things. We could combine the data for all of the ‘A’ conditions, all of the ‘B’ conditions and so on, on the assumption that this procedure has balanced out any order effects (e.g. effects of practice, fatigue, boredom, etc.). Or, we could explicitly include ‘order of presentation’ as an independent variable in our analysis, and actually check to see if there were any effects of doing the conditions in a particular order.) Note that the number of participants should be the same for each of the presentation orders used: so, if we had 60 participants, 20 would experience order ABC, 20 BCA and 20 CAB.
Table 3.1
Order 1: A B C
Order 2: B C A
Order 3: C A B
If you have a design with just one independent variable, it’s comparatively easy to work out the relevant Latin Squares design. Table 3.2 shows a Latin Squares design for four levels of an independent variable (conditions A, B, C and D):
Table 3.2
Once again, each condition occurs at each point in the sequence: as many participants experience condition A first, as experience it second, third or last. The same is true for conditions B, C and D.
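One simple way to build a Latin Square for any number of conditions is to rotate the list of conditions by one place for each new row; for three conditions this gives exactly the orders ABC, BCA and CAB. The sketch below (the function names are illustrative, and the four-condition square it builds is not necessarily the one in Table 3.2) also checks that every condition appears once at every position, and randomly splits 60 hypothetical participants into equal groups of 20, one per order.

```python
import random


def cyclic_latin_square(conditions):
    """Row i is the list of conditions rotated left by i places (e.g. ABC, BCA, CAB)."""
    n = len(conditions)
    return [[conditions[(i + j) % n] for j in range(n)] for i in range(n)]


def each_condition_once_per_position(square):
    """True if every condition appears exactly once at every position (column)."""
    n = len(square)
    return all(len({row[pos] for row in square}) == n for pos in range(n))


def allocate(participants, square, rng):
    """Randomly split participants into equal-sized groups, one group per order (row)."""
    if len(participants) % len(square):
        raise ValueError("participants must divide evenly among the orders")
    shuffled = list(participants)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(square)
    return {"".join(row): shuffled[i * size:(i + 1) * size] for i, row in enumerate(square)}


three = cyclic_latin_square("ABC")
four = cyclic_latin_square("ABCD")
print(["".join(r) for r in three])   # ['ABC', 'BCA', 'CAB']
print(["".join(r) for r in four])    # ['ABCD', 'BCDA', 'CDAB', 'DABC']
print(each_condition_once_per_position(three), each_condition_once_per_position(four))

allocation = allocate(range(1, 61), three, random.Random(0))  # 60 hypothetical participant IDs
print({order: len(ids) for order, ids in allocation.items()})  # 20 per order
```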
There is one potential problem with the Latin Squares we’ve looked at so far: they don’t completely eliminate order effects. In the 3 × 3 design above, A precedes B twice, but B precedes A only once. (Similarly, C precedes A twice, but A precedes C only once.) Suppose there were a practice effect from doing A then B: two groups of participants (those experiencing orders ABC and CAB) would get a practice effect, and the other group (the one experiencing order BCA) wouldn’t. To get round this problem, you can use a ‘balanced’ Latin Squares design.
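The imbalance in the 3 × 3 square can be confirmed by counting, for every pair of conditions, how many orders present one somewhere before the other. The helper below is a hypothetical illustration; `itertools.combinations` preserves the positional order of its input, so each pair it yields is in the order the conditions were presented.

```python
from itertools import combinations


def precedence_counts(orders):
    """Count, for each ordered pair (X, Y), how many sequences present X somewhere before Y."""
    counts = {}
    for order in orders:
        for earlier, later in combinations(order, 2):  # pairs keep their sequence order
            counts[(earlier, later)] = counts.get((earlier, later), 0) + 1
    return counts


counts = precedence_counts(["ABC", "BCA", "CAB"])
print("A before B:", counts.get(("A", "B"), 0), " B before A:", counts.get(("B", "A"), 0))
print("C before A:", counts.get(("C", "A"), 0), " A before C:", counts.get(("A", "C"), 0))
```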
Table 3.3
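In Table 3.3, AB occurs as often as BA, BD occurs as often as DB, and so on. A single balanced Latin Square of this kind can be constructed for any even number of conditions (4 × 4, 6 × 6 and so on). With an odd number of conditions (3 × 3, 5 × 5 and so on) there is in general no single balanced square, and two squares have to be combined instead, the second being the first with each order reversed.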
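A common way of building a balanced Latin Square is the construction usually credited to E. J. Williams: the first order takes the conditions in the pattern first, second, last, third, second-to-last, and so on, and each subsequent order shifts every condition one step along (A becomes B, B becomes C, and so on, wrapping round at the end). The Python sketch below implements that construction under illustrative names; for an odd number of conditions it appends the mirror image of each order, giving two squares. It balances immediate sequences (how often X comes directly before Y), which is the sense in which AB and BA occur equally often. The four-condition square it produces may differ from Table 3.3 while having the same property.

```python
def williams_design(conditions):
    """Balanced Latin square sketch (Williams-style construction).

    First row uses positions 0, 1, n-1, 2, n-2, ...; each later row adds 1 (mod n).
    For an odd number of conditions, the reversed rows are appended as a second square.
    """
    n = len(conditions)
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    rows = [[conditions[(x + shift) % n] for x in first] for shift in range(n)]
    if n % 2 == 1:
        rows += [list(reversed(r)) for r in rows]
    return ["".join(r) for r in rows]


def immediate_pair_counts(orders):
    """How often each condition comes directly before each other condition."""
    counts = {}
    for order in orders:
        for a, b in zip(order, order[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


square = williams_design("ABCD")
print(square)  # ['ABDC', 'BCAD', 'CDBA', 'DACB']
pairs = immediate_pair_counts(square)
print(pairs[("A", "B")], pairs[("B", "A")], pairs[("B", "D")], pairs[("D", "B")])  # 1 1 1 1
```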
That takes care of order effects when you have only one independent variable with several different levels. However, what do you do if you have two or more independent variables in the same study (see the section on ‘multi-factorial designs’ below)? It’s still possible to use a variation on the Latin Squares design, but now it gets a bit more time-consuming to work out the presentation orders! The simplest way to do it (unless you’re someone who rattles off the Times Crossword in five minutes and is still stuck for something to do) is to use James Brown’s EDGAR program (http://www.jic.bbsrc.ac.uk/services/statistics/edgar.htm). Amongst other things, this excellent program automatically works out Latin Square designs for various numbers of independent variables with various numbers of levels. All you then have to do is run the participants . . .