Multi-factorial designs
All of the designs so far have looked at manipulations of just one independent variable. However, you can have two or three different independent variables in the same study. You can in theory include as many independent variables as you like, which is fine in correlational research, but once you get beyond three in an experimental design, the data usually become horrendously complicated to interpret – don’t go there . . .
The advantage of including more than one independent variable is that it enables you to extend the generality of your findings at the cost of relatively little extra effort. If you have more than one independent variable in a study, you can look at how they interact with each other, in a way that you can’t if you don’t include the independent variables in the same experiment. (There will be more on this topic in Chapter 6: see in particular page 191 and Box 6.1. See the example below, as well.) This is often much more informative than looking at one variable at a time.
The main problem with including extra independent variables in a wholly between-groups study is that it increases the number of participants you will need. Suppose we added the independent variable of ‘sex’ to the parachuting study: this would double the number of participants required. If we looked at ‘age’, and used ‘young’, ‘middle aged’ and ‘elderly’ parachutists at each of the drug dosages, we would be faced with running nine groups of participants! Sometimes you can get round this problem by designing the study so that you get repeated measures data from one or both of the independent variables.
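To see how quickly the numbers grow, here is a minimal Python sketch. The cell size of 20 and the use of three drug dosages are purely illustrative assumptions:

```python
# Hypothetical participant counts for wholly between-groups factorial designs.
from math import prod

def groups_needed(levels_per_iv):
    """In a wholly between-groups factorial design, the number of groups
    is the product of the numbers of levels of the independent variables."""
    return prod(levels_per_iv)

n_per_group = 20  # assumed cell size, purely illustrative

# Dosage alone (3 levels); dosage x sex (3 x 2); dosage x age (3 x 3).
for levels in ([3], [3, 2], [3, 3]):
    g = groups_needed(levels)
    print(f"{levels}: {g} groups, {g * n_per_group} participants")
```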
Suppose we were interested in factors affecting the performance of air-traffic controllers. For reasons that I need hardly go into, we would like them to stay alert while they are at work, so two independent variables we might want to look at are shift-pattern and room temperature. Both of these might cause tiredness on their own, and the interactions between them might be interesting: we might expect tiredness to be greatest when the air-traffic controllers are working late at night in a cosy, warm control tower. (Should Horlicks be a proscribed drug as far as air-traffic controllers are concerned?) Alternatively, it might be possible to counteract the effects of being tired while on a night-shift by making the room quite cold.
In the study shown in Figure 3.10, we have two independent variables: time of shift (with three levels: early, midday or late) and room temperature (with two levels: cold or warm). This gives us six different groups, each group representing a unique combination of the two independent variables (shift and temperature). The dependent variable might be performance on some measure such as ‘number of undetected potential collisions between planes’. (It would probably be best to run this experiment using some kind of simulation, so that we had control over the number of potential plane collisions that took place within a shift.)
With this design, we can assess the effects on performance of shift and temperature (the two ‘main effects’ in this study), and also look at the interaction between shift and temperature. There are all sorts of possibilities for the outcome of multi-factorial experiments like this. In the present example, they include:
(a) Shift might have an effect on performance, regardless of temperature (a main effect of shift).
(b) Temperature might have an effect on performance, regardless of shift (a main effect of temperature).
(c) There might be some interaction between the effects of shift and temperature. We have already mentioned one such possibility – that the effects of temperature moderate those of shift, so that air-traffic controllers are more alert if the room is cold. Another possibility is that temperature might affect performance differently according to the time of the shift – so that it makes a difference whether the room is warm or cold for early and late shifts (when the controllers are tired anyway) but not for midday shifts (when they are more awake).
These kinds of interactions between variables would be much harder to identify if we looked at the influence of each variable in isolation, in a separate experiment.
Figure 3.10 An experiment with two independent variables (temperature and time of day)
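To make these possibilities concrete, here is a minimal simulation sketch in Python. The cell means and noise level are invented purely for illustration, with a warm room assumed to hurt performance mainly on the late shift – an interaction of the kind described in (c):

```python
import numpy as np

rng = np.random.default_rng(42)

shifts = ["early", "midday", "late"]
temps = ["cold", "warm"]
n_per_cell = 30  # hypothetical group size

# Assumed "true" mean error counts (undetected potential collisions):
# a warm room hurts performance mainly on the late shift -- an interaction.
true_means = {
    ("early", "cold"): 4.0, ("early", "warm"): 5.0,
    ("midday", "cold"): 4.0, ("midday", "warm"): 4.0,
    ("late", "cold"): 5.0, ("late", "warm"): 9.0,
}

# Simulate each group's scores as noise around its true mean.
cell_means = {cell: rng.normal(mu, 2.0, n_per_cell).mean()
              for cell, mu in true_means.items()}

# Main effect of temperature: average over shifts within each temperature.
for t in temps:
    m = np.mean([cell_means[(s, t)] for s in shifts])
    print(f"mean errors in {t} rooms: {m:.2f}")

# The interaction shows up as a warm-minus-cold difference that is not
# constant across shifts.
for s in shifts:
    diff = cell_means[(s, "warm")] - cell_means[(s, "cold")]
    print(f"{s} shift: warm - cold = {diff:+.2f}")
```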
Multi-factorial designs with repeated measures
You can of course use multi-factorial designs with repeated measures. You can have either wholly repeated measures (so that the same group of participants take part in all conditions, and the conditions represent all combinations of the independent variables concerned) or use a ‘mixed’ design (where some of the independent variables have repeated measures and some are between-groups). The same advantages and disadvantages apply as for the simpler designs. The air-traffic controller study just described could be recast as a wholly within-subjects design, as long as you were careful to give each participant enough time to recover from the effects of one condition before subjecting them to the next.
Single-Subject Experimental Designs
If you are an undergraduate, most of your study of research methods will be based on experimental designs similar to those described on p. 70 – designs that involve selectively exposing groups of people to one level of an independent variable or another, and then using statistics to measure the average performance in the different groups and decide whether it differs between conditions. Probably for most students this technique is synonymous with the idea of investigating behaviour scientifically – psychology could almost be defined as the process of comparing average performance in one condition with that in another. However, there are alternative methods to the traditional experiment. Rather than measure the average behaviour of groups of people, these techniques measure the behaviour of one or just a few people.
Single-subject designs have a long and venerable history that dates back to the beginnings of scientific psychology in the 19th century, when the first psychology laboratory was set up in Leipzig by Wilhelm Wundt. His primary interest was ‘psychophysics’, the study of the relationship between perceptual experiences and physical stimuli. Psychophysics is alive and well today, and still largely based on experiments involving just one or a very few participants. Much of what we know about learning comes from studies on classical and operant conditioning, which are also frequently based on the systematic study of individual behaviour rather than averages of groups of participants.
These single-subject techniques may not be as commonly used as the ‘traditional’ experimental method; however, when used appropriately, they are potentially no less ‘scientific’ than the latter. In some circumstances, they may be preferable to conventional experimental methods. They are especially useful in situations where averaging together the performance of individuals might produce a misleading picture of the phenomenon under investigation. A good example of this is what happens when you average data in one-trial learning experiments. Each participant shows an abrupt change from no learning to 100% performance, but at an idiosyncratic point in the experiment. However, if you average these data, you obtain a smooth ‘learning curve’ (see Box 3.2). This is purely a statistical artefact, which describes the data of none of the participants and thus gives a false picture of how learning is actually taking place (Sidman, 1960).
If behaviourism and psychophysics can get away with using only one or a handful of people, then why do most psychologists have to go to the trouble of finding zillions of participants and having to grapple with statistics? The answer lies in the issue of unwanted variation in behaviour, and the extent to which we can eradicate it. It’s all about control.
Box 3.2: An example of how the process of averaging data can sometimes produce misleading impressions
Graphs of individual performances show an abrupt transition from poor to perfect performance. Averaging these produces a different – and false – impression that performance improves steadily over trials.
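A few lines of Python reproduce the artefact. Everything here is invented for illustration: each simulated participant switches abruptly from 0% to 100% performance at a randomly chosen trial, yet the group average climbs smoothly:

```python
import numpy as np

rng = np.random.default_rng(1)
n_participants, n_trials = 20, 30

# Each simulated participant jumps from 0% to 100% performance at an
# idiosyncratic, randomly chosen trial -- one-trial learning.
switch_points = rng.integers(5, 25, n_participants)
trials = np.arange(n_trials)
performance = (trials[None, :] >= switch_points[:, None]).astype(float)

print("participant 0:", performance[0])   # an abrupt step
print("participant 1:", performance[1])   # a step at a different trial
print("group average:", performance.mean(axis=0).round(2))
# The average rises gradually -- a smooth 'learning curve' that describes
# the behaviour of none of the individual participants.
```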
The Logical Basis of Experimental Design
Group-averaging techniques versus single-subject methods
All of the conventional experimental designs described earlier represent strategies for dealing with unwanted variation in performance – unwanted in the sense of not occurring as a direct result of our manipulations of the independent variable(s) in which we are interested. The enemy of establishing causal relationships between variables is uncontrolled variation in performance. In an ideal world, participants would all produce identical scores, and would all change in the same way in response to our manipulations. In real life, of course, this does not happen. People differ in their behaviour, both within themselves (e.g., due to the passage of time, changes due to learning, etc.) and between individuals. We want to ignore variations in our scores due to factors such as these, and focus on the variations in scores directly attributable to what we did.
Recall that on page 55, I said that any obtained score can be thought of as consisting of several different components:
(a) a ‘true score’ for the thing we hope we are measuring;
(b) a ‘score for other things’ that we are measuring inadvertently;
(c) systematic bias;
(d) random error.
When you obtain an individual score in an experiment, you ideally would like it to consist of (a) and nothing else. Taken together, (b), (c) and (d) constitute unwanted ‘noise’ that may obscure our detection of (a), the effect of our manipulations. In other words, you can conceive of an individual score as consisting of a ‘signal’ (a) and ‘noise’ (the combined effects of (b), (c) and (d)). We want to detect the signal, despite the noise. For a given strength of signal, the greater the noise, the harder the signal is to detect. For a given level of noise, the bigger the signal, the easier it will be to detect.
It’s a bit like being at a party, and trying to hear what someone is saying despite the loud music. What they are saying (‘do you still want that tenner I owe you?’) is the signal that you are trying to detect; the ‘noise’ is the monotonous thumping and discordant jangling of some modern popular beat combo, inflicted upon you by a moronic disc jockey whose personal mission seems to be to prevent intelligent conversation at all costs. If the person talking to you has a quiet voice and the music is very loud, the signal-to-noise ratio is low, and you have a poor chance of retrieving your tenner: their question is drowned out by the noise. If your friend has a voice that is louder than the music, then there is a high signal-to-noise ratio, and the chances are you will be ten pounds richer. (Note that it’s the ratio between signal and noise that is important, so you are also likely to hear your friend if they have a quiet voice but the music is even quieter.)
In short, if we have a weak effect, it is likely to be swamped by unwanted variation in performance which is unrelated to our manipulations of the independent variable. Since in practice most of our manipulations are unlikely to have massive effects on participants’ performance (i.e. we are trying to detect relatively weak signals), what we need are techniques which minimize the noise as much as possible.
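The point can be illustrated with a small simulation. The sample size and the ‘two standard errors’ detection rule below are assumptions made for the sketch, not a prescribed analysis:

```python
import numpy as np

rng = np.random.default_rng(7)

def detection_rate(signal, noise_sd, n=50, reps=2000):
    """Proportion of simulated experiments in which a fixed effect (the
    'signal') stands out above the noise, here taken to mean that the
    sample mean exceeds zero by more than two standard errors."""
    samples = rng.normal(signal, noise_sd, (reps, n))
    means = samples.mean(axis=1)
    ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
    return (means > 2 * ses).mean()

# Same signal, increasing noise: detection gets harder.
for sd in (1.0, 2.0, 4.0):
    print(f"signal=1.0, noise sd={sd}: detected {detection_rate(1.0, sd):.0%}")

# Same noise, increasing signal: detection gets easier.
for sig in (0.5, 1.0, 2.0):
    print(f"signal={sig}, noise sd=2.0: detected {detection_rate(sig, 2.0):.0%}")
```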
The logical basis of traditional experiments
Traditional experiments take each participant’s data and try to separate the signal from the noise by statistical methods. These exploit the fact that the two types of variation in an experiment have different properties. Variation due to our experimental manipulation is systematic, while variation due to uncontrolled factors is random. Random variation, in the long run, is as likely to increase a score as it is to decrease it. The task is therefore to eliminate all sources of systematic variation, other than that produced by our experimental manipulation; we can then use statistical techniques to filter out our signal from the random noise produced by all the uncontrolled factors.
Think of a simple between-groups experiment with two conditions, A and B. We test a group of participants for condition A and another group for condition B. Within group A, everyone has received the same treatment from the experimenter, a different treatment to that received by everyone in group B. If the experiment was designed properly, this should be the only factor that can produce any systematic difference between the people in group A and the people in group B. Within each group, there may be lots of random differences between individuals, but these should pretty much cancel each other out: they shouldn’t give rise to any systematic differences between the groups.
In a conventional experiment, the task is to assess whether the difference between our groups is larger than the differences which would be produced by the operation of random factors alone. This is why it’s so important to randomize allocation of participants to different conditions (in a between-groups design) or to randomize the order of participation in conditions (in a within-subjects design). If participants are randomly allocated to conditions, these uncontrolled variations will hopefully cancel out to a greater or lesser extent. The problem is that, although the effects of these uncontrolled variables may now be rendered unsystematic, they may still produce a lot of unwanted variability in the scores, which has to be filtered out statistically.
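Here is a minimal sketch of that logic, with invented numbers: participants with widely varying baseline scores are randomly allocated to two conditions, so their differences cancel out across groups, and the treatment effect is then judged against the variability that remains:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40  # participants per group, hypothetical

# Every participant arrives with a stable individual-difference score.
baselines = rng.normal(100, 15, 2 * n)
rng.shuffle(baselines)  # random allocation to conditions A and B

group_a = baselines[:n] + 5.0  # condition A: an assumed treatment effect
group_b = baselines[n:]        # condition B: no treatment

diff = group_a.mean() - group_b.mean()

# Is the group difference larger than random variation alone would
# produce? A hand-rolled independent-samples t statistic:
se = np.sqrt(group_a.var(ddof=1) / n + group_b.var(ddof=1) / n)
print(f"mean difference: {diff:.2f}, t = {diff / se:.2f}")
```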
The logical basis of single-subject experiments
Single-subject techniques use a different strategy. Instead of dealing with the signal-and-noise problem by ensuring that the noise is spread randomly across conditions, single-subject designs try to eliminate the noise as far as possible. The argument is that, if there is uncontrolled variation in scores, one should attempt to find out where it comes from and control those variables in the experiment. Single-subject techniques have been most successful where it is possible to control unwanted sources of variation effectively. In most areas of psychology, this simply isn’t possible; however, behaviourists and psychophysicists can exert powerful control over the conditions under which the individual is performing. In animal learning studies, the environment used is very impoverished (for example a Skinner box or a maze), and the participants’ previous learning history is well known. In psychophysical studies, the conditions are also very circumscribed. The stimuli are generally quite impoverished, so that there is minimal scope for participants’ responses to differ as a consequence of their previous history. It is reasonable to assume that individual responses to the task of deciding whether one grey patch is lighter or darker than another will be fairly consistent. Participants are also asked to make the minimum of decisions (typically either detection of a stimulus versus non-detection, or simple judgements such as ‘wider’ versus ‘thinner’). Finally, elaborate techniques are used to maximize the reliability of performance measurement (i.e. various sophisticated methods for calculating an individual’s threshold for detecting a stimulus or reliably discriminating between stimuli). All these factors minimize the intrusion of individual differences in performance, and hence minimize the contribution of ‘noise’ to participants’ overall performance.
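To give a flavour of such threshold methods, here is a sketch of one simple example, a one-up/one-down ‘staircase’. The simulated observer and all the numbers are assumptions made for illustration; real psychophysical procedures are considerably more sophisticated:

```python
import numpy as np

rng = np.random.default_rng(11)

def p_detect(intensity, threshold=0.5, slope=10.0):
    """A made-up psychometric function for a simulated observer."""
    return 1.0 / (1.0 + np.exp(-slope * (intensity - threshold)))

# One-up/one-down staircase: lower the stimulus after each detection,
# raise it after each miss. The intensity settles around the point
# where detection succeeds about 50% of the time.
intensity, step = 1.0, 0.05
track = []
for _ in range(200):
    track.append(intensity)
    detected = rng.random() < p_detect(intensity)
    intensity += -step if detected else step

# Discard the early trials while the staircase is still descending.
print(f"estimated threshold: {np.mean(track[50:]):.3f} (true value: 0.5)")
```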
Single-subject techniques are often known as ‘steady-state methodologies’, because they compare the effects of an experimental manipulation to some baseline ‘steady state’ in the same individual. The aim in designing such a study is to produce changes in the behaviour of a single individual, and to demonstrate conclusively that they are produced by the experimenter’s manipulations of the independent variable concerned, rather than by other factors. It is this latter point that makes these true experimental designs, as opposed to quasi-experimental designs such as the single pre-test/post-test design and the interrupted time-series design mentioned earlier. We’ll discuss just two versions of this method, roughly equivalent to conventional within-subjects and between-groups designs respectively; Sidman’s (1960) book still provides an excellent starting-point if you want further information on this topic.
Examples of single-subject designs
In the ABA design (Figure 3.11), baseline behaviour is carefully measured: this is state ‘A’. Some treatment is applied, and behaviour is measured while it is in force (state ‘B’). Then the treatment is removed, and the baseline behaviour recorded once more (a return to state ‘A’). If the treatment produces an effect (as opposed to the effect being produced by time threats to validity such as maturation), the effect should decrease or disappear when the treatment is withdrawn, and appear again when the treatment is re-established.
Figure 3.11 The ABA design
Here’s a hypothetical example that might make this clearer. Suppose we want to try to reduce nose-picking behaviour in a child, ‘Little Wilbert’. We could measure nose-picking for some days, to get a baseline state ‘A’ for this behaviour. Then, we could introduce some experimental manipulation: suppose, for example, that every time Wilbert’s finger went up his nose, we poked him with an electric cattle prod. (I was going to suggest that we played him a ‘Steps’ record, but that would be too unethical!) So this is now state ‘B’. We measure nose-picking for a few days while this treatment is in force. Finally, we return to state ‘A’, and let Wilbert scrape his hooter to his heart’s content. If our manipulation has controlled Wilbert’s behaviour, the frequency of nose-picking should be clearly lower on the days when state ‘B’ was in force, compared to the periods of state ‘A’ before and afterwards.
By administering the treatment twice, at irregular intervals (i.e. ABAB), we can guard against the possibility that the results obtained are simply due to coincidence (i.e., the fortuitous simultaneous operation of our treatment and other uncontrolled variables). It is this that distinguishes the ABA design from the superficially similar interrupted time-series design: in the latter, the experimenter has no control over when the manipulation of the independent variable occurs, and so has no way of knowing for sure whether changes in behaviour are due to their manipulation or because of the influence of extraneous factors (i.e. the action of various time threats). In the ABA design, because the experimenter has complete control over when the treatment is applied or removed from the participant, they are able to pinpoint cause and effect: they can be much more sure that it is their actions which are affecting the participant.
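As a closing sketch, here is what ABAB data for Wilbert might look like. The daily counts are simulated from invented rates, purely to show the expected pattern:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical daily nose-picking counts across an ABAB schedule:
# baseline (A), treatment (B), withdrawal (A), treatment again (B).
phases = [("A", 12), ("B", 4), ("A", 12), ("B", 4)]  # (phase, assumed rate)
days_per_phase = 7

for phase, rate in phases:
    counts = rng.poisson(rate, days_per_phase)
    print(f"phase {phase}: daily counts {counts}, mean {counts.mean():.1f}")
# If the treatment controls the behaviour, counts should drop in each B
# phase and recover in each A phase -- and because the experimenter chose
# when the phases changed, coincidence is a poor rival explanation.
```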