Scientific method


The scientific method refers to a sequence or collection of procedures that are considered characteristic of scientific investigation and the acquisition of new scientific knowledge. This method is believed to distinguish science from other intellectual traditions, such as painting, philosophy or theology.

Strong differences of opinion exist among philosophers, historians, sociologists and even among scientists about what processes truly characterize science. Some believe that no characteristic method exists, and some doubt even the possibility of specifying more effective or more reliable approaches. Nevertheless, observers of science have spent a lot of effort over many centuries to understand how science works. This article attempts to summarize the most influential theories and ideas.

Table of contents

1 Introduction
2 The idealized scientific method

2.1 Observation
2.2 Hypothesis
2.3 Prediction
2.4 Verification
2.5 Evaluation

3 Other aspects of the scientific method

3.1 Creativity
3.2 Aesthetics

4 Philosophical Issues

4.1 Verification
4.2 Foundationalism
4.3 Demarcation
4.4 Science as a communal activity

5 Annotated list of related issues
6 See Also

6.1 Collateral topics
6.2 External links

Introduction

In his enunciation of a 'scientific method' in the thirteenth century, Roger Bacon was inspired by the writings of Arab alchemists, who had preserved and built upon Aristotle's portrait of induction. Bacon described a repeating cycle of observation, hypothesis, experimentation, and the need for independent verification. In the 17th century Francis Bacon described a rational procedure for establishing causation between phenomena. Argument by analogy, which was popular in the ecclesiastical scholarly tradition, became much less acceptable in science (or "natural philosophy," as it was still called).

It is common to speak as though a single approach such as Roger Bacon's is how scientists operate all the time. Many historians, philosophers and sociologists regard this perspective as naïve, and see the actual operation of science as more complicated and haphazard. How scientists actually operate--that is, design experiments, come up with theories, and choose among them--evidently differs from one scientific discipline to another, from one scientist to another, and even from one scientific investigation to another made by the same scientist. Yet a common element of scientific research involves the vetting of theories in a way that seems more formal and rigorous than the practices of other disciplines and traditions.

The question of how science operates has importance well beyond scientific circles or the academic community. In the judicial system and in public policy controversies, for example, a study's deviation from accepted scientific practice is grounds for rejecting it as "junk science." Whether formularizable or not, scientific method represents a standard of proficiency and reliability.

The idealized scientific method

The essential elements of the scientific method are traditionally described as follows:

Observe: Observe or read about a phenomenon.
Hypothesize: Wonder about your observations, and invent a hypothesis (sometimes initially nothing more than a "guess") that could explain the phenomenon or set of facts that you have observed.
Test a hypothesis
- Predict: Use the logical consequences of your hypothesis to predict results (e.g., measurable experimental values) that must be found if the hypothesis is to be judged correct -- whether it is 'complete' or not.
- Experiment: Perform experiments to test those predictions. (Note that great precision regarding a negative result might not be required to falsify a hypothesis.)
Conclude: Failure to see the predicted results from a well-designed and well-implemented experiment is a clear indication that the hypothesis is defective. Try again. Seeing the predicted results is an indication that the hypothesis is acceptable, though not 'confirmation' or 'proof' of its correctness.
- Evaluate: Search for other possible explanations of the result until you can propose no better account of your data.
Formulate a new hypothesis which may better explain the experimental data and the original observation.
Repeat
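
As a rough illustration only, the predict-and-test part of the cycle above can be caricatured in a few lines of Python. The observations, the three candidate hypotheses and the tolerance are all invented for this sketch; real hypothesis testing involves far more judgement than a single numerical threshold.

```python
# Toy illustration only: the hypotheses, data and tolerance are invented
# for this sketch and do not come from any real experiment.

def survives(predict, observations, tolerance):
    """Return True if every prediction lies within `tolerance` of the measured value."""
    return all(abs(predict(x) - measured) <= tolerance
               for x, measured in observations)

# Observe: (input, measured output) pairs from a hypothetical experiment.
observations = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# Hypothesize: candidate explanations, each able to make predictions.
hypotheses = {
    "output equals input": lambda x: x,
    "output is twice the input": lambda x: 2.0 * x,
    "output is the input squared": lambda x: x ** 2,
}

# Predict, experiment, conclude: discard hypotheses whose predictions fail.
not_yet_falsified = [name for name, predict in hypotheses.items()
                     if survives(predict, observations, tolerance=0.3)]
print(not_yet_falsified)
```

A hypothesis that remains in the list has not been shown to be correct; it has merely not been contradicted by these particular observations.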

These activities do not describe all that scientists do. The simplified method described above is often used in teaching, and describes how many scientists believe that they work.

This idealised process is often misinterpreted as applying to scientists individually rather than to the scientific enterprise as a whole. Science is a social activity, and one scientist's theory or proposal cannot become accepted unless it has become known to others (usually via publication, ideally peer reviewed publication), criticised, and finally accepted by the scientific community.

Observation

The scientific method begins with observation. Observation often demands careful measurement. It also requires the establishment of operational definitions of measurements and other relevant concepts. Definitions are not scientific hypotheses and are not "falsifiable"; they are simply a way to ensure that everyone is talking about, and experimentally testing, the same thing. Definitions condense a number of ideas into a single word or phrase. An observer's definition could differ significantly from the commonly understood meaning of a term and still be correct. Such a definition would, however, as 'private speech' always does, carry a greater risk of being misunderstood. These definitions are operational (they involve a clear statement of the procedures involved) and may differ depending upon the context determined by the terms of a hypothesis. As such, they may be refined when the hypothesis is refined or when better methods are found for performing the pertinent experiments.

For example, the term "day" is useful in ordinary life, and its meaning may vary with the context in which it is used. (Do we mean a 24-hour period, or do we mean the time between sunrise and sunset?) We usually do not have to define it with great quantitative precision to make use of it. In many contexts, it is precisely 86,400 atomic seconds, but in daily life being a few seconds or even minutes off will not even be noticed. In studying the motion of the Earth, we may use two distinct operational definitions: a solar day is determined by making two successive observations of the sun at the same position in the sky; a sidereal day is the time between two successive observations of a specific star at the same position in the sky. Both kinds of observations are made relative to a single observation position on the ground. The lengths of these two kinds of day differ by about four minutes; the difference is due to the motion of the Earth along its orbit around the Sun during a 'day'.
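
The four-minute figure can be checked with a short calculation. The sketch below assumes a year of about 365.2422 mean solar days (a value not given in this article) and ignores small variations in the Earth's rotation and orbit.

```python
# Rough sketch: why the sidereal day is about four minutes shorter.
# Assumption (not from the text): a year of about 365.2422 mean solar days.
SOLAR_DAY_S = 86_400.0          # mean solar day in seconds
SOLAR_DAYS_PER_YEAR = 365.2422  # approximate length of the tropical year

# Over one year the Earth turns once more relative to the stars than
# relative to the Sun, because it also completes one orbit.
rotations_per_year = SOLAR_DAYS_PER_YEAR + 1

sidereal_day_s = SOLAR_DAY_S * SOLAR_DAYS_PER_YEAR / rotations_per_year
difference_s = SOLAR_DAY_S - sidereal_day_s

minutes, seconds = divmod(difference_s, 60)
print(f"sidereal day: {sidereal_day_s:.1f} s")
print(f"difference:   {int(minutes)} min {seconds:.0f} s")
```

Under these assumptions the difference comes out at a little under four minutes, in line with the figure quoted above.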

Slight differences between operational definitions are often important, as they are needed to make experiments precise enough to distinguish subtle underlying phenomena. Another example of the need for carefully constructed operational definitions is found in the task of choosing the appropriate segmentation in the statistical analysis of data.

Distinctions in operational definitions can also reflect important conceptual differences: for example, mass and weight are quite different concepts in science, but the distinction is often ignored in everyday life. Weight is a measure of the attraction between two or more masses. The operations performed when one wants to weigh something are well known. The operations to be performed when one wants to determine the mass of something are not well known, so it is easy for people to equate mass with weight. Then, when watching an astronaut moving a massive article in space, it is hard to understand how the object could be weightless and yet might crush the astronaut if it pinned him against the space station. A more sophisticated observer would know that if one stands at the point where two locomotives will collide, it will be the mass of the trains and not their weight that will do the damage.
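
The distinction can be made concrete with a small numerical sketch. It treats weight operationally, as what a scale would read, and uses an assumed 1,000 kg object drifting at an assumed 2 m/s; neither figure comes from this article.

```python
# Illustrative values only; the 1,000 kg object and 2 m/s speed are assumptions.
STANDARD_GRAVITY = 9.80665  # m/s^2, conventional value at the Earth's surface

def weight_newtons(mass_kg, local_g):
    """Force a scale would read where the effective gravity is local_g."""
    return mass_kg * local_g

def momentum(mass_kg, speed_m_s):
    """Momentum depends on mass and speed, not on weight."""
    return mass_kg * speed_m_s

mass_kg = 1000.0
on_earth = weight_newtons(mass_kg, STANDARD_GRAVITY)  # scale reading on the ground
in_orbit = weight_newtons(mass_kg, 0.0)               # a scale in free fall reads zero

# The same object drifting at 2 m/s carries the same momentum in both places.
drift_momentum = momentum(mass_kg, 2.0)

print(f"weight on Earth: {on_earth:.0f} N")
print(f"weight in orbit: {in_orbit:.0f} N")
print(f"momentum at 2 m/s: {drift_momentum:.0f} kg*m/s")
```

The scale reading drops to zero in orbit while the momentum is unchanged, which is why a "weightless" object can still do damage.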

Hypothesis

To explain the observation, scientists use whatever they can to come up with possible explanations for the phenomenon under study: their own creativity (currently not well understood), ideas from other fields, systematic guessing, or any other method available.

In the twentieth century Karl Popper introduced the idea that a hypothesis must be falsifiable; that is, it must be capable of being demonstrated wrong. This was similar to C. S. Peirce's position, fallibilism, which Popper credited after he became aware of Peirce's work. Paul Feyerabend argued against this position, providing examples of falsified scientific theories that nevertheless had a vital role in the progress of scientific understanding.

Of course, it is impossible for a scientist to be absolutely impartial, or to consider all known evidence. There is too much evidence in many cases, and like everyone else, scientists so far have been human and can miss things for assorted reasons. But by comparing their results and work with that of others (e.g., by submission for 'peer review' before publishing), scientists can at least make it more likely that the hypotheses formed will not be obviously in error, and will perhaps even be relevant and useful.

In those cases where no better grounds for discriminating between alternative hypotheses can be found, the bias scientists almost always follow is provided by the principle called Occam's Razor (there are several spellings): One chooses the simplest explanation for all the available evidence, in whatever sense "simple" is appropriate in the context. The usual meaning is 'conceptually simple'. An example is an ancient theory of how the Earth is supported. It was believed that it rested on the back of a giant turtle. But, the moment one asks what supports the turtle, the issue of infinite regress arises ("Why, sonny, it's turtles all the way down!" is a famous retort to such an awkward observation), and any explanation not involving an infinite turtle stack becomes 'simpler'.

In other cases, it is 'mathematical simplicity'. For example, about 1860 James Clerk Maxwell developed a mathematically quite elegant, and structurally simple, account of electromagnetic radiation (light, X-rays, radio, radar, ...) which involved dual magnetic and electric fields and their interactions. It was surely less simple in many ways than the earlier account of vibrations of the ether 'carrying' electromagnetic radiation. Unfortunately for the simpler luminiferous ether theory, the Michelson-Morley experiment of 1887 made any account including both the ether and their experimental results more complicated still. Their work indicated that the experience of a ship moving through the ether and measuring the speed of vibrations in the ether would not work in the way observed for a ship moving through water and measuring the speed of waves in the water. Both experimenters continued for some decades to ponder the problem in an attempt to reconcile the result with the existence of the ether; neither they nor anyone else has succeeded. Ernst Mach may have been the first to explicitly abandon the ether as a consequence of their experimental result. Maxwell's hypothesis became, rather quickly, the simplest available account. With modifications required by Einstein's Relativity, it still is.

Currently, a theory (or several theories in a family of them) is developing which will require 10 dimensions (or 11 -- which is down from quite a few more) to account for the phenomena being described (elementary particles and their interactions, plus gravity). The necessity appears, to outsiders, to be both mathematical and excessively complicated. But it may be, when the dust settles, that superstrings with 11 (or however many) dimensions may be the simplest available theory which satisfies the Razor. Or something else altogether (turtles redux, perhaps) may eventually hold sway.

Prediction

A hypothesis must make specific predictions; these predictions must be testable, typically with concrete measurements. If results contradictory to the predictions are found, the hypothesis under test is wrong (requiring either revision or abandonment). If results consistent with the hypothesis are found, the hypothesis might be correct, but is always subject to further tests. In Popper's view, any hypothesis that does not make testable predictions is simply not science. It may be something else useful and valuable (or perhaps not), but it is not science.

For instance, Albert Einstein's General Relativity makes several specific predictions about the observable structure of space-time, such as the prediction that light bends in a gravitational field, and that the amount of bending depends in a precise way on the strength of the gravitational field. Observations made during a 1919 solar eclipse supported General Relativity over hypotheses that predicted different results, such as Newtonian gravitation. (Later, similar observations improved the experimental fit between the measurements and Einstein's hypothesis; better equipment helps in many cases. Other experiments have provided other forms of verification of General Relativity. So far, no observation has contradicted a General Relativity prediction; but note that several alternative theories have been developed, none of which has, to date, done better.)
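
To show what "specific prediction" means here, the sketch below computes the deflection of starlight grazing the edge of the Sun under two hypotheses. The formulas (4GM/c²R for General Relativity, and half that for a Newtonian treatment of light as fast-moving particles) and the constants are standard textbook values rather than anything stated in this article, and the calculation ignores all observational detail.

```python
import math

# Constants are standard reference values, not taken from the text.
GM_SUN = 1.32712440018e20   # m^3/s^2, gravitational parameter of the Sun
C = 299_792_458.0           # m/s, speed of light
R_SUN = 6.957e8             # m, nominal solar radius

RAD_TO_ARCSEC = math.degrees(1.0) * 3600.0

def deflection_general_relativity(impact_parameter_m):
    """Deflection angle in radians for light passing at the given distance."""
    return 4.0 * GM_SUN / (C**2 * impact_parameter_m)

def deflection_newtonian(impact_parameter_m):
    """Half the relativistic value, from treating light as fast particles."""
    return 2.0 * GM_SUN / (C**2 * impact_parameter_m)

gr_arcsec = deflection_general_relativity(R_SUN) * RAD_TO_ARCSEC
newton_arcsec = deflection_newtonian(R_SUN) * RAD_TO_ARCSEC
print(f"General Relativity: {gr_arcsec:.2f} arcsec")
print(f"Newtonian estimate: {newton_arcsec:.2f} arcsec")
```

The two hypotheses differ by a factor of two, so a sufficiently precise measurement can favour one over the other.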

Going back to the foundations of science and scientific explanations for things, no one has succeeded in finding any of those giant turtles. By now, observation has established that they must be invisible, allow undetectable penetration by spacecraft, and probably have giant rollers in their backs to allow for the Earth's rotation, else they would have been noticed by now, one way or another. Any turtle theory suggesting otherwise has been demonstrated to be wrong. The turtle theory (as modified to account for new observations) is getting more and more complicated. Occam would suggest its rejection.

Deductive reasoning is the way in which predictions are developed with which to test a hypothesis.

Verification

Probably the most important aspect of scientific reasoning is the demand for empirical verification: One's experimental observations must be verifiable by other researchers. Verification is the process of determining whether the hypothesis is in accord with empirical evidence, both newly acquired and already existing. It is the necessary complement to predictions.

Ideally, the experiments performed should be fully described so that anyone can reproduce them, and many scientists should independently verify every hypothesis. Results that can be obtained from experiments performed by many are termed reproducible and are given much greater weight in evaluating hypotheses than is given to non-reproducible results.

Scientists must design their experiments carefully. For example, if the measurements are difficult to make, or subject to observer bias, one must be careful to avoid distorting the results because of influences that arise from the experimenter's wishes. When experimenting on complex systems, one must be careful to isolate the effect being tested from other possible causes of the intended effect (this results in a controlled experiment).

In testing a drug, for example, it is important to carefully test that the supposed effect of the drug is produced only by the drug itself, and not by the placebo effect or by random chance. Doctors do this with what is called a double-blind study: two groups of patients are compared, one of which receives the drug and one of which receives a placebo. No patient in either group knows whether or not they are getting the real drug. Even the doctors or other personnel who interact with the patients do not know which patients are getting the drug under test and which are getting a fake drug (often sugar pills), so their knowledge cannot influence the patients either.
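
The allocation step of such a study might be sketched as follows. This is a minimal illustration, not a description of how any real trial is run: the function name, the "A"/"B" codes and the seed are all hypothetical, and real trials use more elaborate randomization and record-keeping.

```python
import random

def allocate_double_blind(patient_ids, seed=None):
    """Randomly split patients into two arms of (near-)equal size under coded labels.

    Returns (blinded, key): `blinded` maps each patient to an opaque code
    that staff and patients may see; `key` maps codes to treatments and
    would be held by a third party until the trial ends.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2

    # Randomly decide which opaque code stands for the real drug.
    codes = ["A", "B"]
    rng.shuffle(codes)
    key = {codes[0]: "drug", codes[1]: "placebo"}

    blinded = {pid: codes[0] for pid in shuffled[:half]}
    blinded.update({pid: codes[1] for pid in shuffled[half:]})
    return blinded, key

# Hypothetical usage: twenty patients, numbered 1 to 20.
blinded, key = allocate_double_blind(range(1, 21), seed=2024)
```

Only the `blinded` mapping would be shared with patients and clinical staff; the `key` is consulted when the results are analysed.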

Evaluation

Falsificationism requires that any hypothesis, no matter how respected or time-honoured, be discarded once it is contradicted by reliable evidence, evidence that usually would come from new experiments. This is something of an oversimplification, since individual scientists will often hold on to their pet theory long after contrary evidence has been found. Max Planck is said to have suggested that new scientific theories are adopted when today's scientists finally die. This is not always a bad thing (delayed adoption, that is, not scientist mortality). Any theory can be made to correspond to the facts simply by making a few adjustments, called "auxiliary hypotheses", so as to bring it into correspondence with the accepted observations. Additions to the turtle theory (invisibility, non-corporeality, ...) are examples of this kind of thing. When to reject one theory and accept another ('better' one) is dependent on the judgement of individual scientists, rather than on some law or authority. As for the turtles, the members of the Flat Earth Society apparently still regard it as a tenable hypothesis, despite any observations made in the last few thousand years that might contradict it.

All scientific knowledge is thus always in a state of flux, for at any time new evidence could be presented, discovered or developed that contradicts a long-held hypothesis. A particularly luminous example is the theory of light. Light had long been supposed to be made of particles. Isaac Newton was convinced it was so, but his light-is-particles account was overturned by evidence in favor of a wave theory of light, suggested most notably in the early 1800s by Thomas Young, an English physician. Light waves neatly explained the observed diffraction and interference of light, which the explanation of light as a particle did not. The wave interpretation of light was widely held to be unassailably correct for most of the 19th century. Later, however, observations were made that a wave theory of light could not explain. This new set of observations could be accounted for by Max Planck's quantum theory (including Albert Einstein's account of the photoelectric effect), but not by a wave theory of light.

The failure of one hypothesis often does not lead smoothly to a new and successful hypothesis. In the case of light, the result of the ferment created by the two seemingly contrary sets of very well verified observations was that the nature of light now appears to us to depend very much on how we observe it. We speak of light in terms of other phenomena with which we are very familiar from our daily lives. When we say "Light is a wave," or "Light is a particle," we are actually indicating that it appears to us to behave like a wave or a particle. But we actually see neither "light waves" nor "light particles". Instead, we see results that drive us to use these analogies. The current theory of light (the best available account of the observed phenomena), dating from the first decades of the 20th century, holds that light shares both wave and particle characteristics. Both "behaviors" of light have been verified so many times, in laboratories that research the properties of light and in commercial enterprises that use the theoretical results to guide the design of precision equipment, that people have become extremely confident of the predictions that can be made on these theoretical grounds. No future theory proposed to explain light could be accepted if it could not consistently explain these highly consistent observations. Those experimental observations show that both Newton (particles) and Young (waves) were wrong. Yet if viewed as simplifications to be used only to predict certain kinds of light phenomena, both a simple quantum account and a simple wave account serve quite well for practical applications. Richard Feynman, among others, has tried to combine the wave interpretation and the particle interpretation in one consistent theory, but the task of expressing results in language better suited to explaining the world of our everyday experience has led to discussion of whimsical and chimerical creatures like the "wavicle."

Experiments that force rejection of a hypothesis, whether widely accepted or not, should be performed by many different scientists so as to guard against bias, error, misunderstanding, fraud, etc. Scientific journals use a process of peer review, in which scientists' papers describing experimental results and their conclusions are submitted to a panel of fellow scientists (who may or may not know the identity of the writer) for evaluation.

Scientists are rightly suspicious of results that do not go through this process. For example, the cold fusion experiments of Fleischmann and Pons were not peer reviewed before they were made public: they were announced directly to the press before any other scientists were able to evaluate their efforts or reproduce their results. Their results have not been reproduced elsewhere in the decades since; the press announcement was regarded at the time, by most nuclear physicists, as very likely wrong. Peer review may well have turned up problems and led to a closer examination of the experimental evidence Fleischmann, Pons, et al. believed they had found. Paul Kammerer's experiments on acquired physical traits in amphibians (described in Arthur Koestler's The Midwife Toad) seem to have been deliberately faked, while the confusion in the 60s and 70s about 'polywater' seems to have been the result of micro-contamination (and maybe some Cold War political one-upmanship). Much embarrassment, and wasted effort, might have been avoided by proper peer review in many such cases.

On the other hand, peer review of new discoveries is sometimes not very open-minded. The discovery of prions caused much scoffing and even hostility to be directed against Stanley Prusiner, yet in 2004 his name is back in the news as having discovered an enzyme that may eliminate the threat caused by "mad cow disease."

Other aspects of the scientific method

Creativity

There are no definitive guidelines for the production of new hypotheses. The history of science is filled with stories of scientists claiming a "flash of inspiration", or a hunch, which then motivated them to look for evidence to support or refute their idea. Michael Polanyi made such creativity the centrepiece of his discussion of methodology. The story about an apple falling on Isaac Newton's head and inspiring his theory of gravity is a popular example of this; there is no evidence that an apple actually fell on his head. All Newton said was that his ideas were inspired "by the fall of an apple." In contrast, Kekulé's account of the inspiration (in the mid 19th century) for his hypothesis of the structure of the benzene ring (daydreaming of snakes biting their own tails while he was dozing in an omnibus) is better attested, in his own words. Though primarily an engineer and not a scientist, Thomas Edison was famously quoted in the 20th century as saying that "genius is one percent inspiration and ninety-nine percent perspiration", but he sought to capture the creative insights that may occur during the twilight between wakefulness and sleep. He made a frequent practice of holding a weight in his hand as he drifted off to sleep in his chair, so that as soon as he entered sleep he would be awakened by the sound of the dropping weight. He would then be able to remember what he had envisioned during his most recent twilight state. Hypotheses come from many sources, and there is no method known which always, or even mostly, generates "good" ones.

Aesthetics

Scientists tend to look for theories that are "elegant" or "beautiful". In contrast to the usual English use of these terms, scientists have more specific meanings in mind. "Elegance" (or "beauty") refers to the ability of a theory to neatly explain all known facts as simply as possible, or at least in a manner consistent with Occam's Razor while at the same time being aesthetically pleasing. This seems to be primarily a psychological bias, however often it has been useful in predicting correctly among competing theories. After all, 'more complex' (and so less psychologically satisfying) theories have often been required 'to account for the phenomena'. Superstring theory (even with all those dimensions) may turn out to be a theory which is both beautiful and yet as lean as it possibly could be. Turtlian world support theory has not been widely praised for either its predictive successes or its aesthetic qualities.

Philosophical Issues

What has been called idealised scientific method in this article is one of many theories describing the way in which science works or should be conducted. These include hypothetico-deductive method, falsification, the research programs of Imre Lakatos, and the scientific revolutions of Thomas Samuel Kuhn. Whilst the idealised method is a description often presented to novice scientists, and as such carries great sway, it seems reasonable to ask how accurate it is in portraying the actual procedures followed by working scientists.

The material presented below is a brief introduction to some concepts in the Philosophy of science, and is intended to show that some of the issues surrounding the scientific method are neither straightforward nor simple.

Verification

The idealised scientific method relies on observation in that observation is crucial to verification. There are two difficulties with using observation in this way. Firstly, it can be argued that observation is embedded in theory, and so cannot act as a neutral arbiter. Secondly, any theory can be made compatible with any observation by suitable modification.

Observation involves perception, and so is a cognitive process. That is, one does not make an observation passively, but is actively involved in distinguishing the thing being observed from the other sensory data. Observations, therefore, depend on an underlying understanding of the way in which the world functions.

This is amplified if we use technology to assist our observations. The theory of optics was controversial at the time of the introduction of telescopes, and much discussion occurred before observations through them were accepted. Consider how much more reliance on theory is involved in accepting photos returned from Mars.

So observations are embedded in theory. The idealised scientific method uses empirical observation to determine the acceptability of hypotheses during the verification phase. Observation can only do this neutrally if the theory on which the observation depends and the theory being verified are independent.

Kuhn denied that this was ever possible, arguing that observations always rely on a specific paradigm, and that it is not possible to evaluate competing paradigms independently. For Kuhn, the choice of paradigm was sustained by, but not ultimately determined by, rational processes. Instead, Kuhn saw this choice as determined by the community of scientists. Verification is, he claimed, more a social than a rational process.

The Quine-Duhem thesis claims that any theory can be made compatible with any empirical observation by the addition of suitable ad hoc hypotheses. This thesis was accepted by Karl Popper, leading him to reject naïve falsification in favour of 'survival of the fittest', or most falsifiable, of scientific theories.

Ontological relativity, a theory supported by W. V. Quine, argues that if empirical data is not sufficient to make a judgment between theories, then the choice between conflicting theories is arbitrary in the sense that there can be no final, rational ruling on which theory should be accepted.

Foundationalism

The idealised method adopts a foundationalist epistemology, implicitly claiming that observations do not require justification and that observation is needed to get the scientific process underway. That observation is embedded in theory undermines its ability to act as the unjustified base of a foundationalist epistemology. It appears to be reasonable, when someone claims to have made an observation, to ask them to justify their claim.

This is not the same as arguing that observations are irrelevant to science. Scientific understanding derives from observation, but the truth of scientific statements is as dependent on the related theoretical background as it is on observation. Coherentism and scepticism offer alternative ways of dealing with this difficulty.

Demarcation

The scientific method is often touted as determining which disciplines are scientific and which are not. Those which follow the scientific method might be considered sciences; those that do not are not. That is, method might be used as the criterion for demarcation between science and non-science.

If observation cannot act as a theory-independent foundation for the scientific enterprise, science becomes a cycle of hypothesising and verification embedded in a theoretical framework and tied to the 'real world' by the agreement of the scientific community. Popper's claim that only falsifiable statements are scientific does not help here (see The Criterion of Demarcation). The Quine-Duhem thesis argues that it is not possible to prove that a statement is falsified; rather, falsification occurs when the scientific community agrees that a statement is falsified.

Assuming this to be true, it is not obvious how scientific debate differs in any logical way from the debates of, for example, historians. Both work within a cycle of hypothesising and verification, historians by reference to historical documents, scientists by reference to the experiments they construct. This argument does not claim that experiments can be conducted to test historical hypotheses; history has already happened and cannot be rerun. Historians test their hypotheses by comparing them to historical sources and to other theories, whilst scientific theories are tested by comparing them to experimental results. What appears to differ is not the method, but the content, with historians taking documents as their verification criterion, while scientists use documentation from experiments.

One might argue that science occupies a special place because its experiments can be repeated, but using repetition as a demarcation criterion would disenfranchise areas that are at present considered to be science, such as palaeontology and cosmology.

Alternatively, Kuhn claims that the explanatory success of science is explained by the way in which scientists are restricted to working within a particular paradigm.

Paul Feyerabend takes these arguments to their limit, arguing that science does not occupy a special place in terms of either its logic or method, and so that any claim to special authority made by scientists cannot be upheld. This leads to a particularly democratic and anarchist approach to knowledge formation.

Science as a communal activity

The idealised scientific method makes reference to the scientific community in the verification and evaluation of a scientific theory. Some consideration will lead to the conclusion that the role of the scientific community extends further than this.

In his book The Structure of Scientific Revolutions, Kuhn argues that the processes of observation and evaluation take place within a paradigm. 'A paradigm is what the members of a community of scientists share, and, conversely, a scientific community consists of men who share a paradigm' (postscript, part 1). On this account, science can be done only as a part of a community, and is inherently a communal activity.

For Kuhn the fundamental difference between science and other disciplines is in the way in which the communities function. Others, especially Feyerabend and some post-modernist thinkers, have argued that there is insufficient difference between social practices in science and other disciplines to maintain this distinction. It is apparent that social factors play an important and direct role in scientific method, but that they do not serve to differentiate science from other disciplines.

This is not an area of study in which it is possible to give a definitive account, because it is undergoing considerable change. It appears that positivist, empiricist and falsificationist theories are unable to satisfy their aim of giving a definitive account of the logic of science; it may also be that the sociology of science is incapable of accounting for the success of the scientific enterprise. In any case, it should be clear that the idealised scientific method is a source of ongoing debate and contention.

Annotated list of related issues

Empirical methods

Paradigm change

Paradigm, perhaps the most abused word in English.
Thomas Kuhn wrote influentially on the sociology of scientific revolutions in The Structure of Scientific Revolutions.
Paradigm shift is a Kuhnian term referring to the change between one pervasively accepted theory (e.g., Aristotelian motion) and another (e.g., Newtonian gravitation). Kuhn himself came to prefer other terminology.

The problem of induction questions the logical ground for induction as a basis for science.

Inductive reasoning has appeared to some (most famously, to Sir Francis Bacon) to be at the core of scientific method; it also appears to be logically invalid.
David Hume was the person who most famously and influentially pointed out the inadequacy of induction in generating true statements, scientific or not.
Karl Popper offered one resolution, falsifiability.

Scientific creativity

When Method goes wrong

Critique of 'standard' Scientific Method

Paul Feyerabend argued that the search for a definitive scientific method was misplaced, and even counterproductive.
Imre Lakatos attempted to bridge the apparent gap between Popper's account of the philosophy of science and Kuhn's account of its sociology of change. Others do not see any need to worry about this difference, feeling that the two accounts are not mutually exclusive.
Scientism