How to Design and Report Experiments



  The method of agreement: an effect should be present when the cause is present.

  The method of difference: when the cause is absent, the effect should be absent also.

  The method of concomitant variation: when the two previous relationships are observed, causal inference will be made stronger because most other interpretations of the cause-effect relationship will have been ruled out.

  Figure 1.3 illustrates some of these points. Here we have a situation in which we have two events (me talking about causality or my cat talking about being fed – in cat language of course) that precede two outcomes (boredom or happiness). According to Mill’s criteria, to infer that talking about causality causes boredom: (1) talking about causality must precede boredom (in all situations in the diagram this is true); (2) the correlation between boredom and my talking about causality must be strong (in the diagram, on 4 out of 4 occasions when I talk about causality boredom is observed, so the correlation is perfect in this case); (3) whenever boredom occurs I have previously (and recently) been talking about causality (again, in the diagram every instance of boredom is preceded by my talking about causality); (4) when the cause is absent the effect is also absent (when I don’t talk about causality in the diagram – and instead my cat talks about food – no boredom is observed, so boredom is not elicited by anything other than me); and (5) the manipulation of the cause leads to an associated change in the effect (so if we manipulate whether I’m talking about causality or my cat is mewing, the effect elicited changes according to the manipulation – again, in the diagram a change in the cause is always associated with a change in the effect). This final manipulation serves to rule out external variables that might affect the cause-effect relationship.

  Figure 1.3 Me talking about causality appears to correlate with boredom!

  It would be fair to sum up Mill’s ideas by saying that the only way to infer causality is through comparison of two controlled situations: one in which the cause is present and one in which the cause is absent. These situations should be identical in all senses except the presence of cause (for an applied example of these ideas read Field & Davey, 1998).
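  If you like to think computationally, the methods of agreement and difference can be sketched as a toy check over records of whether the supposed cause and the effect were present. The data here are invented purely for illustration, mirroring the Figure 1.3 scenario:

```python
# Toy check of Mill's criteria over hypothetical (cause_present, effect_present)
# observations. The data are illustrative only, not from a real study.
observations = (
    [(True, True)] * 4      # cause present -> effect present (4 occasions)
    + [(False, False)] * 4  # cause absent  -> effect absent  (4 occasions)
)

# Method of agreement: the effect is present whenever the cause is present.
agreement = all(effect for cause, effect in observations if cause)

# Method of difference: the effect is absent whenever the cause is absent.
difference = all(not effect for cause, effect in observations if not cause)

print(agreement, difference)  # True True: both criteria hold in this toy data
```

  The method of concomitant variation then amounts to checking that changing which cause is present is always accompanied by a change in which effect follows.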

  The Evolution of Theories

  In psychology, we typically generate theories from which we develop hypotheses that we test (either with experiments or correlational methods). If we want to be certain about the causal relationships between variables then we typically design experiments. If the results confirm our hypotheses then they are published in academic journals (provided we haven’t tested a theory of little importance, or tested a theory of great importance in a ridiculously flawed way, or tested a theory that disproves the long-espoused beliefs of the editor of the journal!). Once published, the theory is (hopefully) accepted as a valid contribution to our understanding of psychology. This approach is inductive in that we collect evidence to corroborate theories (you can see now just what an impact Mill’s ideas had).

  Nevertheless, I’m sure you’ll be pleased to learn that the inductive approach that underpins all of scientific knowledge has a logical flaw! The flaw is really an extension of Hume’s critique of induction, which can be summarized as ‘just because we observe that night always followed day in the past does not prove that night will follow day again in the future’. Although if we were really bored we could argue about whether night follows day, or day follows night, this simple example also illustrates why correlation is no indication of causality: although night and day are perfectly correlated, neither causes the other (both are caused by an external factor – the spinning of the Earth in relation to its sun).

  Duhem and the conventionalists

  Karl Popper also believed in the inherent ambiguity of confirmation. In his books, The logic of scientific discovery (1959) and The poverty of historicism (1957), Popper developed Hume’s critique to devise a system of scientific discovery. Popper was hugely influenced by the conventionalist philosophers such as Pierre Duhem, Francis Bacon and, possibly more importantly, Henri Poincaré. We’ve seen that the prediction of some outcome involves sets of laws and hypotheses about predictive events, or causes. It logically follows that if these causes are present but the outcome is not, then one of the causes must be false (i.e. it is not actually a cause). The main contribution of the conventionalists was to contradict this logic; they argued that all that would be disconfirmed in this situation is the conjunction (combination) of the causal events. In fact, Duhem in particular went on to suggest that when the observed evidence contradicts the causal relationships that a scientist has proposed, that scientist is free to alter one or more of their hypotheses in such a way as to explain the observed data. An example might be that I predict that when he is hungry, and when I am in the house, my cat, Fuzzy, will meow. The outcome is my cat meowing and the predictive events are hunger and my presence at home. If one day I get home and my cat doesn’t meow, then this observation contradicts my hypothesis. There are three obvious explanations: (1) Fuzzy isn’t hungry; (2) I am not at home (despite the fact I think I am!); or (3) one of these variables is not really a cause. However, Duhem would’ve argued that all we can conclude is that the conjunction of Fuzzy being hungry and me being at home has been disconfirmed – not that either hypothesis alone is false.
Furthermore, as a scientist I would be entitled to explain away the disconfirming evidence by suggesting other explanations: perhaps some other causal variable is necessary; perhaps Fuzzy wasn’t hungry; perhaps I only thought I was at home when actually I was somewhere else (being a philosopher Duhem would love to question my belief in my own perception); maybe Fuzzy did meow but I’d suddenly gone deaf; or maybe Fuzzy had been replaced by a non-meowing doppelganger cat by the evil Dr. Catthief who spends his life deliberately stuffing up everyone else’s experiments. Duhem believed that the scientist should be left to exercise his (or her) expert and objective judgment (yeah, right!) in deciding which part, or parts, of the original hypothesis to change. Although it might sound as if Duhem was endorsing doggedly sticking to preconceived scientific ideas even in the face of contradictory evidence, he was actually astute in recognizing the role of outside influences on data and experiments (the tertium quid on page 12). He also, along with Francis Bacon, was one of the first to acknowledge the idea of a critical experiment that might distinguish between two competing hypotheses. This idea was key in Popper’s later thinking regarding how theories should develop.

  Popper’s logic of scientific discovery

  Popper distinguished between scientific and non-scientific statements. Scientific statements are ones that can be verified with reference to empirical evidence, whereas non-scientific statements are ones that cannot be empirically tested. So, statements such as ‘Korn are a great band’, ‘cats are better than dogs’, ‘Gonzo is the funniest Muppet’ and ‘beating children is morally wrong’ are all non-scientific; they cannot be proved or disproved and although we might all agree that beating children is morally wrong, the morality of the issue cannot be tested empirically (it is a subjective value). Scientific statements can be confirmed or disconfirmed empirically. ‘Eating blue Smarties makes you hyperactive’, ‘drinking wine and beer gives you a worse hangover than drinking beer alone’ and ‘depression increases the risk of suicide’ are all things that can be tested empirically (provided you can quantify and measure the variables concerned). Non-scientific statements can sometimes be altered to become scientific statements, so, ‘cats are better than dogs’ is non-scientific but by changing the statement to ‘cats are better than dogs at climbing trees’ it becomes testable (we can collect data about the success of cats and dogs when climbing trees and prove or disprove the statement). Popper believed that non-scientific statements were nonsense, and had no place in science (Box 1.2).

  However, although a statement or hypothesis might be testable (unless you’re Freud!), Popper was aware that verification through corroborative evidence was insufficient evidence of the truth of a statement. He went as far as to say that such statements could be assigned only the status of ‘yet to be disconfirmed’. So, even when there was a wealth of corroborative evidence in favour of a theory, Popper would argue that this theory was merely waiting to be disproved (falsification). Popper believed that it is more powerful to disprove a theory than to corroborate it; therefore, scientists should seek disconfirming evidence. To take an example always close to the hearts of young females, suppose we had a theory that all men are unreliable (that wasn’t the word I was going to use, but I need to get this past the editor) based on an observation that all of the men we had previous experience of were indeed unreliable. The more unreliable men we encounter, the more this theory is corroborated. However, we have to encounter only one man who is reliable to disconfirm this theory. Hence, one instance that disconfirms a hypothesis is more powerful than many instances that confirm it.
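  This asymmetry between corroboration and falsification can be sketched in a few lines; the ‘data’ here are invented purely for illustration of the logic:

```python
# A universal claim -- 'all men are unreliable' -- sketched as a list of
# observations, where False means 'unreliable'. Invented data, for
# illustration only.
observed = [False, False, False, False]   # every man met so far was unreliable
print(not any(observed))                  # True: the claim is corroborated, but not proved

observed.append(True)                     # one reliable man is encountered
print(not any(observed))                  # False: a single counterexample falsifies the claim
```

  No number of corroborating observations could ever make the first `print` a proof, but one disconfirming observation settles the matter for good.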

  Box 1.2: Scientific statements?

  Popper believed that non-scientific statements had no place in science. Have a look at the following theories and ideas and think about whether these statements, or theories, are scientific (in Popper’s view):

  Answers to Box 1.2: (1) Freud’s theory of reaction formation has come under fire for being non-scientific because it doesn’t predict when someone will behave oppositely to their true feelings and when they will not (as such it cannot be tested); (2) dimensions of personality can (arguably) be measured and counted, so this statement can be scientifically tested; (3) Darwin’s ideas rely on speculation about past events; in the absence of a time machine it is impossible to test his speculations, so in a sense it is a non-scientific statement (however, we can collect corroborative evidence through fossils and so on); (4) it is possible to see whether a dog learns that a noise predicts food by measuring salivation to both stimuli before and after learning (see Davey & Field, in press), so this statement is scientific; and (5) it is also possible to measure thoughts after social events in anxious and non-anxious individuals and compare them, so this statement is scientific too.

  It’s interesting that in Popper’s terms two key theories (evolution, and Freud’s theory of personality) centre on largely unscientific statements and yet have still been hugely influential!

  The idea of falsification can be further demonstrated by reference to a familiar psychological task: Wason’s (1966) four-card task. In this task, participants are presented with four cards known to have a letter on one side and a number on the other side (see Figure 1.4) and are told the rule that if a card has a vowel on one side then it must have an even number on the other side. The task is to decide which card or cards should be turned over to determine whether this rule is true.

  Figure 1.4 One example of Wason’s card task

  Wason found that most people (about 79%) responded by turning over the card labelled E, and sometimes the card labelled 4 also. In a sense, this is a sensible thing to do because it could confirm the theory: if the E (a vowel) has an even number on the back of it then the rule is supported, and if the 4 has a vowel on the back then the rule is also supported. However, these actions alone do not disconfirm the rule provided the E does indeed have an even number on the back. Turning over the 4 is irrelevant because the rule does not say that even numbers cannot occur on the back of consonants. To fully test the rule, the 7 must be turned over (as well as the E) because if this card has a vowel on the back then the rule is disconfirmed. Therefore, the card labelled 7 has power to determine the truth of the rule whereas the card labelled 4 does not. So, people have a natural bias to try to confirm hypotheses and yet science is based on the opposite – we try to disconfirm hypotheses.
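  The falsification logic of the card task can be made concrete in a short sketch (the function name and representation are mine, not Wason’s):

```python
def can_falsify(visible_face: str) -> bool:
    """Can turning this card over possibly disconfirm the rule
    'if a vowel is on one side, an even number is on the other'?"""
    if visible_face in "AEIOU":
        return True   # the hidden side might be an odd number
    if visible_face.isdigit() and int(visible_face) % 2 == 1:
        return True   # the hidden side might be a vowel
    return False      # consonants and even numbers can never break the rule

# Only E and 7 are worth turning over:
print([face for face in ["E", "K", "4", "7"] if can_falsify(face)])  # ['E', '7']
```

  Turning over the 4 can only ever corroborate the rule, which is exactly why, on Popper’s logic, it tells us nothing.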

  The kind of falsification just described was Popper’s fundamental basis for his theory of scientific discovery. He took this idea and set up principles on which to base scientific investigation in which he proposed that when generating hypotheses, scientists should attempt to disconfirm past hypotheses. As such, any new scientific study should do two things:

  Disconfirm an old hypothesis

  Confirm (or corroborate) a new hypothesis.

  He went on to say that

  . . .if we test two such systems which differ in one hypothesis only, and if we can design experiments which refute the first system while leaving the second very well corroborated, then we may be on reasonably safe ground if we attribute the failure of the first system to that hypothesis in which it differs from the other (Popper, 1957: p. 132).

  This idea is similar to Mill’s thinking on causality. Mill believed that by comparing two situations that differ only in the presence of the causal variable, causality could be isolated. In a similar vein, Popper was suggesting that the truth of a scientific statement or theory could be tested only by comparing two hypotheses that differ in a single respect. Both Mill and Popper saw the fundamental importance of controlling all factors other than the one that is of interest to the scientist.

  Combining Popper’s beliefs about scientific statements and falsification we might sum up his work by suggesting that all theories should be open to test, or falsification, and that any such test must involve direct comparison with a competing theory that is similar in all but one respect. Over time, the theory will therefore evolve by gradual falsification of incorrect elements of the initial proposal. Ultimately all theories are awaiting falsification and can only be assigned the status of a truth when all attempts to falsify the theory have failed (i.e. all alternative models have been tested), and, of course, this situation will never be reached.

  Putting Theories to the Test

  We have already learnt two important research principles: (1) to isolate causal variables we must rule out all other explanations of the effect we observe (we must compare two situations that are identical in every respect except the supposed causal variable); and (2) we must attempt to falsify theories by designing experiments that compare one theory with another. The next obvious thing to learn is how we actually put these principles into practice. There are two issues here, the first is how we rule out other explanations of the supposed cause, and the second is how we gain confidence that one theory is correct and another is not.

  Isolating cause: control conditions

  Mill proposed that causal factors could be isolated only by comparing two conditions: one in which supposed cause is present and one in which supposed cause is absent. For example, if we wanted to see whether using mobile phones causes brain tumours, we are proposing a cause (mobile phones) and an effect (tumours). To verify that mobile phones do cause tumours we would need to have one condition in which mobile phones are present and one in which mobile phones are absent. The condition in which cause is absent is known as a control condition and it acts as a baseline against which to compare behaviour when a proposed cause is present.

  Earlier on, I explained that the difference between experiments and correlational research was that in experiments we manipulate one variable to observe its effect on another. The variable that we typically manipulate is the one we have proposed as a cause, and in the simplest situation we manipulate it by changing whether the cause is present or absent. These manipulations are known as levels of the variable. The most basic experiment manipulates the causal variable so that it has two levels (supposed cause present versus supposed cause absent), but we can go on to have more levels. For example, we could manipulate not only whether mobile phones are used or not, but also how much they are used. So, we could look at when a mobile phone is not used, when it is used for up to 7 hours a week, up to 14 hours a week and up to 21 hours a week. The mobile phone variable now has 4 levels (0, up to 7, up to 14 and up to 21). The variable that is manipulated is called the independent variable (because its value is independent of the other variables in the experiment; it depends instead on the experimenter) whereas the outcome variable, the one that is not manipulated by the experimenter, is called the dependent variable (because its value depends on the other variables in the experiment). Students often confuse these terms, but if you think in terms of cause and effect, the outcome variable should depend on the causal variable: anything that is an outcome of your experiment is a dependent variable, and anything that you, as an experimenter, propose as a cause of the outcome you’re measuring is an independent variable.
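  As a minimal sketch of this terminology, the hypothetical mobile-phone design described above might be laid out like this (all names are illustrative, not from a real study):

```python
# The independent variable (phone use) is manipulated by the experimenter
# and here has four levels; the dependent variable is only measured.
# The design and names are hypothetical, for illustration only.
phone_use_levels = ["none", "up to 7 h/week", "up to 14 h/week", "up to 21 h/week"]

# Each level of the independent variable gets its own group, in which the
# dependent variable (some measure of tumour incidence) would be recorded.
outcomes_by_level = {level: [] for level in phone_use_levels}

print(len(phone_use_levels))  # 4 levels, including the control ('none') condition
```

  The ‘none’ level is the control condition: the baseline against which the other levels are compared.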

  Box 1.3 shows a real-life research example of the principles of control conditions and the dramatic effects they can have. The example is from my own research on the learning of likes and dislikes. It illustrates that when control conditions are not used, the conclusions that researchers reach can sometimes overlook other factors that they might never have conceived would explain their findings. We’ll talk about some different types of control conditions in Chapter 3.

  Killing the tertium quid I: controlling other factors

  The second stage in ruling out other explanations of causal relations is to minimize the risk of random factors influencing your experiment. Mill showed that he was aware of these factors when he suggested comparing conditions that were identical in all respects except the proposed cause. What he meant by this is that all random factors should be held constant. To do this we have to ensure that all aspects of our experimental conditions are the same and you can take this idea to varying degrees of extreme. In our mobile phone example it would, for instance, be sensible to ensure that everyone used the same phone (to ensure everyone was exposed to the same level of microwaves), and had similar degrees of brain health to begin with. In other experiments, you would hope to test everyone in the same room and use identical instructions to ensure that no variable has a systematic influence on the outcome.