
Chapter 1 Experimental design

OBJECTIVES

As explained in the foreword, this Bulletin is not directed to pure research and the pursuit of knowledge, but to finding practical answers to practical questions. So the starting point of any programme must be to define precisely and accurately what it is that one wants to know, and how the knowledge will be used. A very experienced soil scientist, Sir Charles Pereira, suggests:

"One needs to distinguish clearly between two types of experimental measurements.

- a genuine estimate of the mean of a highly variable quantity, such as rainfall, which is to be used in a quantitative balance, and

- an illustration of the range and order of magnitude of a variable for which no useful estimate of the mean can be made from the number of measurements which are logistically practical, such as soil loss per hectare per year, since this compounds the variabilities of soil type, drainage, vegetation cover, mechanical disturbance, slope, aspect, and exposure to flows from upslope.

There is much danger of the unscientific use of the second type, expressed as tonnes/km², as if it had the logical and statistical basis of the first type."

It is a sensible precaution to check whether the information which is sought is already available. The project worker in the field is unlikely to have either time or facilities to do a literature search, but the technical backup staff at headquarters may be able to make use of the efficient data storage and retrieval systems which are now available. One problem is that simple practical field experiments of the kind proposed here are usually either not adequately written up, or are reported only in hard-to-find project documents.

In tropical conditions the damage caused by rarely occurring extreme events can be very important - more so than in less aggressive climates. This is particularly so for soil loss from cropland and the expansion of gullies. Thus a choice has to be made between installing a flume able to measure a 20-year extreme flood which is unlikely to occur during the experiment, and installing a flume which can measure floods up to a 5-year frequency, taking a chance on its being submerged by larger floods.
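This trade-off can be made concrete. The probability that a T-year event is equalled or exceeded at least once during an n-year experiment is 1 - (1 - 1/T)^n. A minimal sketch in Python (the function name and the durations chosen are illustrative):

    def exceedance_risk(return_period_years, duration_years):
        # Chance that a flood of the given return period is equalled or
        # exceeded at least once while the experiment is running.
        annual_prob = 1.0 / return_period_years
        return 1.0 - (1.0 - annual_prob) ** duration_years

    for T in (5, 20):
        for n in (3, 5):
            print("%d-year flood over %d years: %.0f%% risk"
                  % (T, n, 100 * exceedance_risk(T, n)))

Over a 3-year experiment the risk of meeting a 20-year flood is only about 14%, while a 5-year flume has roughly even odds of being submerged at least once, so the choice becomes an explicit gamble rather than a guess.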

PRACTICALITIES

A small amount of reliable information is more useful than a large amount of information which cannot be used because it is unreliable. Therefore field assessments should be kept as simple as possible and directed to limited attainable objectives. There is a great temptation to try for too much by including too many variables or too many levels of each variable. When planning a programme of experiments bear in mind the limited resources which will be available, considering the initial and running costs, and the staff required to operate the experiments.

Avoid the idea that any information is better than none. This is sometimes used to justify 'quick and dirty' trials, or as an excuse for not doing them properly. However, it is completely unscientific. Using inaccurate or unreliable data is much more likely to cause problems than to improve matters.

How long will the experiments continue? Because of the variability of runoff from year to year, the duration should be as long as possible but there are practical constraints. How long is the project going to run and what happens at the end? What is the chance that it might continue after the first phase? Can the project continue if the staff who set up the experiments are transferred or replaced? The practical solution is to attempt only what can be completed within the assured time and known resources. It may sometimes be possible to plan an additional component which can be put into operation if the project is extended, but this is only sensible if it is an optional extra, not if its absence will reduce the value of the basic plan.

It may be possible to extrapolate short-term results if longer records of rainfall are available. For example, annual soil loss might be measured for two years, one with below-average rainfall and the other with above-average rainfall. Long-term rainfall records may give some indication of how often such losses can be expected to recur, but only if there is a direct relationship between what was measured (annual soil loss) and the long-term record (total annual rainfall). There will be an association, of course, because the more it rains, the more erosion will take place, but the linkage is so crude that it has little value because it ignores important factors like how much of the rain fell on bare soil and how much on mature crops. A sound principle in all science is to avoid extrapolating beyond the range of measured results.

In experimental work, it is important to relate cause and effect. If some change or difference is measured, one needs to know what caused it, and usually it does not help to know that it must have been one or more of a number of possible causes. In research station trials, this problem is avoided by changing the factors one at a time, called isolating the variables, so that if a change is measured it is clear what caused it. This is not always practical or necessary in field trials. It may help to know that one package of farm practices gives a better result than another package, but it is better to know which of the components of the package are most important in causing the improvement.

On-farm research nearly always suffers from the problem that the variables cannot be separated and controlled. An example is a set of studies made in Kenya to check whether the conservation package programme could be shown to increase the yield of maize on fields where the programme had been applied. Data were collected from farmers' fields and the results did suggest that there was an increase. But the package consisted of building earth terraces, using improved seed, using more fertilizer and generally improving cultivation practices, and the effect of each of these factors could not be separately evaluated, so the information had very limited application. It could be that the improved seed would have given the increase without the cost of additional fertilizer, or that the increased use of fertilizer could have given the improved yield if applied to the traditional maize varieties; and if the effect of the terracing was to improve soil moisture availability, this might be achieved by simpler methods.

AVOID 'BEFORE AND AFTER' EXPERIMENTS

A method which is sometimes used to assess the effect of some change or treatment is to take a series of measurements before the treatment and compare these with a similar series after the treatment. This is simple but fundamentally unsound because there is no way of telling whether the difference is directly caused by the treatment or arises from some other cause of change between the two sets of measurements. Figure 1 shows an example of this. An experiment is set up to estimate the effect of changing the intensity of grazing on runoff for a small catchment. Measurements of total weekly runoff are plotted against weekly rainfall, and at the end of the season about 20 or 30 points seem to indicate a fairly close relationship. In the following season the treatment is applied, say, by doubling the number of animals grazing, and at the end of the season the plotted points suggest that the relationship is significantly different. The problem lies in the fact that one cannot be sure that the difference is caused by the treatment and is not reflecting the fact that rainfall during the second season was different. If the rainfall was much heavier, this alone could account for higher rates of runoff. There are no statistical contrivances which can overcome this difficulty. An uncontrollable and unmeasurable variable has been allowed to come in and muddy the waters and there is no way of removing its effect.

The technique used to avoid this problem is that of paired plots. Two plots are chosen to be as similar as possible, for no two plots are ever identical, and the two plots are calibrated, i.e., the difference between them is measured by plotting values of some suitable parameter of one plot against the other, as in Figure 2. After a time the relationship between the two plots becomes apparent; how long this takes depends upon the frequency of the observations, and plotting daily values would establish the relationship sooner than plotting weekly or monthly data. If the two plots were truly identical the points would fall on a straight line at 45° through the origin. The treatment is then applied to one plot and the series of measurements repeated. Now if there is any variation in the rainfall or any other aspect of climate it will affect both plots equally, so a new relationship can only be due to the treatment, because that is the only variable.
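The arithmetic of the paired-plot comparison is simple enough to sketch. The figures and variable names below are purely illustrative; the point is that the treatment effect is read as the departure of the treated plot from what the calibration line predicts:

    import numpy as np

    # Calibration period: weekly runoff (mm) from the two untreated plots.
    control_cal = np.array([2.1, 5.3, 0.8, 7.9, 3.4, 6.2])
    treated_cal = np.array([2.4, 5.0, 1.1, 8.3, 3.1, 6.6])

    # Fit the calibration line: treated = slope * control + intercept.
    slope, intercept = np.polyfit(control_cal, treated_cal, 1)

    # Treatment period: the same measurements after the treatment has
    # been applied to one plot.
    control_after = np.array([3.0, 6.5, 1.2, 4.8])
    treated_after = np.array([5.9, 11.8, 2.6, 9.0])

    # Climate affects both plots alike, so any systematic departure from
    # the calibration prediction is attributed to the treatment.
    predicted = slope * control_after + intercept
    print("Mean treatment effect: %.1f mm/week"
          % (treated_after - predicted).mean())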

The conclusion is to avoid before and after trials of processes which can be affected by climate, particularly rainfall.

STATISTICS

The details of experimental design and the statistical analysis of results are beyond the scope of this Bulletin, but it may be worth a quick look at some important issues. If one is looking for cause and effect relationships, the key to experimental design is to isolate and measure the effect of the variables. The problem is that in all biological processes there are huge numbers of variables which can affect the process, each with a wide range of values, and separating them all is difficult.

[Figure file: t0848e01.jpg]

Sampling

The method of sampling and the size of the sample are important. For a sample to be representative of the whole population, it must be large enough to reflect the variation within that population. To assess the yield of a field of maize, the yield of one plant could be measured and multiplied by the number of plants in the field. But plants differ greatly, so a more accurate estimate is obtained by measuring, say, 20 plants, and the only fully believable result comes from measuring what comes off the whole field.
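A sketch of why the sample size matters, using hypothetical per-plant yields; the standard error of the estimated mean shrinks roughly with the square root of the number of plants measured:

    import statistics

    # Hypothetical per-plant maize yields in kg; real plants vary widely.
    yields = [0.9, 1.4, 0.7, 1.8, 1.1, 1.5, 0.8, 1.6, 1.2, 1.0,
              1.7, 0.6, 1.3, 1.9, 1.1, 0.9, 1.4, 1.2, 1.6, 1.0]

    for n in (2, 5, 20):
        sample = yields[:n]
        mean = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        print("n=%2d plants: mean %.2f kg, standard error %.2f kg"
              % (n, mean, se))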

Differences

Usually the objective is to estimate differences rather than absolute values. For example, trials show that one variety yields 1.5 t/ha and another variety 1.6 t/ha. It may be possible to calculate whether the difference is statistically significant at various levels of probability, but that is irrelevant since no farmer is going to change on the basis of such a small difference in yield when there are other factors to consider such as taste, keeping quality, disease resistance, and so on.

Beware of expressing differences as percentages, which can be misleading or irrelevant. For example, the summary of a research report says that the annual soil loss from the plot with treatment A was 80% less than from the plot with treatment B. This sounds impressive until one reads that the figures were 100 kg/ha and 20 kg/ha, so the real conclusion should be that soil loss was negligible in both cases.

Avoid also announcing a winner without giving the scores. A comparison was made between five different parameters for estimating the erosivity of rainfall, and the abstract says No 4 was 'the best estimator'. The full report discloses that the correlation coefficient was above 0.9 for all five, with the 'winner' at 0.955. A more sensible conclusion would have been that all five were extremely effective, and the one to recommend should be whichever is simplest and easiest to use.

Replications

Going back to the variety trial yield figure of 1.5 t/ha, one should know how accurate it is. Does it mean that the true yield is fairly sure to be between 1.0 and 2.0, i.e. 1.5 ± 0.5, or between 1.4 and 1.6, i.e. 1.5 ± 0.1? This can only be answered from replications; with a single observation there is no way of knowing.
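Which of the two intervals applies can be estimated from the spread of the replicate values, for example with a t-based confidence interval. A minimal sketch with four invented replicate yields:

    import statistics

    # Invented replicate yields (t/ha) for one variety on four plots.
    reps = [1.3, 1.6, 1.7, 1.4]

    # Two-sided 95% t values for small numbers of degrees of freedom.
    T95 = {1: 12.71, 2: 4.30, 3: 3.18, 4: 2.78, 5: 2.57}

    n = len(reps)
    mean = statistics.mean(reps)
    se = statistics.stdev(reps) / n ** 0.5
    print("mean %.2f t/ha, 95%% interval +/- %.2f t/ha"
          % (mean, T95[n - 1] * se))

Here four replications pin the mean down to about 1.5 ± 0.3 t/ha; a single observation of 1.5 t/ha gives no interval at all.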

Many years ago when the application of statistics to agricultural research was in its infancy and a huge research programme in Africa was being set up, the Director said "We are only looking for results that farmers will understand and apply. I am not interested in figures that need statistics to explain them." As a result, all the early experiments had many treatments and no replications - a foolish error which was much regretted later.

There is always going to be some natural variation in what is being measured, and also some error in the measurement. Replications - that is, theoretically identical situations - are used to give an assessment which is acceptably reliable and accurate. The size of the experiment is always limited by cost, labour and area, so there has to be a compromise between the number of treatments and the number of replications. If one can afford 12 plots in a field experiment, they could be allocated as follows:

Treatments    Replications of each treatment
    12                     1
     6                     2
     4                     3
     3                     4
     2                     6
     1                    12

Clearly the two extremes are impractical, but how should one choose between the other four alternatives? For an important long-term experiment it might be worthwhile carrying out a pilot trial: take measurements from 10 or 12 replications, study the variation, and from it calculate the smallest number of replications which will give acceptable accuracy. Since this is inappropriate for simple field experiments, one must resort to some simple rules. If the soil loss from two replicates is 2 kg and 8 kg, then a mean of 5 kg is rather crude. If three replications give 2 kg, 8 kg and 7 kg, one might have a little more confidence in a mean of 5.7 kg - but there is still doubt about the 2 kg result. Is that a faulty measurement, or is it the only good one with mistakes in the other two?
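The pilot-trial calculation runs along these lines: estimate the coefficient of variation from the pilot measurements, then find the smallest number of replications whose standard error falls within the accuracy that is acceptable. The figures and the 20% target below are illustrative only:

    import statistics

    # Pilot trial: soil loss (kg) from ten theoretically identical plots.
    pilot = [2.0, 8.0, 7.0, 5.5, 3.2, 6.1, 4.8, 7.4, 2.9, 5.0]

    cv = statistics.stdev(pilot) / statistics.mean(pilot)

    # Smallest n for which the standard error of the mean is within 20%
    # of the mean (adding a t-multiplier would make this stricter).
    target = 0.20
    n = 2
    while cv / n ** 0.5 > target:
        n += 1
    print("CV = %.0f%%; at least %d replications needed" % (100 * cv, n))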

In general the greater the difference between measurements, the greater is the number of replications required, but the catch is that when the experiment is started, it is not known what the difference is likely to be, so some other arbitrary criteria are required.

- Always do a calibration run on the replicated plots before applying the treatments.

- Too many replications may limit the number of treatments which can be applied, but the results are reliable and therefore useful. This is better than too few replications giving a result which cannot be relied upon.

- The greater the range of what is being measured, the more replications are required. Runoff plots are notorious for wild results because things go wrong (discussed in more detail in Chapter 3), so a large number of replications dilutes the effect of a mechanical fault like a tap accidentally left open.

- Results of the same experiment in successive years are not replications, because of annual variability.

As an arbitrary judgement based on experience, it is suggested that for plot work there should always be a minimum of three replications, and that four or five are better. Studies of variance and sampling techniques are listed in the section Further reading.

ANOMALOUS RESULTS AND EXTRAPOLATION

Here is another cautionary tale, which illustrates two points - the danger of discarding anomalous results, and the danger of extrapolating, that is predicting what might happen beyond the range of the measurements.

Suppose the measurements from four replications are 1.6, 1.8, 1.7 and 4.5. Should it be assumed that the 4.5 is an error and discarded, giving a tidy mean of 1.7? No! There is a 25% chance that that is the only correct figure, and that there are faults in the other three. With only three replications an apparently anomalous result has a 33% chance of being right, and with only two replications and two different results each has a 50% chance of being right or wrong. Another reason for having as many replications as possible is that with a large number of replications one freak figure will only have a small effect on the mean.
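The dilution effect is easy to demonstrate. In the hypothetical runs below, one freak reading of 4.5 among otherwise steady readings of about 1.7 pulls the mean less and less as the number of replications grows:

    # One freak value among (n - 1) steady readings of 1.7.
    for n in (4, 8, 16):
        values = [1.7] * (n - 1) + [4.5]
        print("n=%2d: mean = %.2f" % (n, sum(values) / n))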

There are few anomalies in science. When a reading looks suspicious, either there is a reason for it which you have not yet latched on to - so think hard - or it is a false reading due to an unknown experimental error - so do it again until you locate the error.

Here are two examples of anomalies, one of which was resolved and one not.

A graduate student studied the relationship between density of cover and soil splash, and the results plotted as a smooth curve with a strange blip, as shown in Figure 3. The effect was real, because it recurred when the experiment was independently repeated, but there was no apparent reason for it. It had to be an unknown fault in the technique, and after much effort it was tracked down. Plastic meshes were used to give different levels of cover, and the density of each mesh was measured on photographs. But the mesh at one point in the series was thicker than the others, so there was more splashing about within the mesh and less energy available to cause splash at the soil surface. It had not been realized that light passing through a mesh is not the same as raindrops passing through.

The second anomaly was the opposite situation - apparently spurious results were written off as experimental error, although later research showed them to be genuine. Measuring the size of raindrops used to be done by catching drops in pans of flour and weighing the pellet formed in the flour by each drop. A previous laboratory calibration gave the relationship between the weight of the pellet and the weight of the water drop, expressed as a mass ratio. When this method was used by Laws and Parsons in 1943, the results obtained were as shown in Figure 4. Laws and Parsons assumed that a flour pellet could not weigh less than the drop which formed it, and so ignored point A, although it is now known that this can and does happen. The other points fell close to a straight line, except the largest drop, so they also ignored point B. It is now known that this too was a genuine result, and that a smooth curve through all the results is better than the straight-line relationship they assumed.

[Figure file: t0848e02.jpg]

The error in the calibration of Laws and Parsons did not alter the fact that this was a brilliant and original piece of research. Unfortunately, good original ideas are sometimes so successful that their use is stretched farther than is justifiable. Most of Laws and Parsons' samples were at intensities of less than 50 mm/h, with a very small number up to 100 mm/h, and gave a curve of the type shown in Figure 5 which can be represented by an equation of the form:

D = aI^b

where D is drop diameter, I is intensity, and a and b are constants.

But this equation suggests that the median drop size will continue to increase as intensity increases. While this is true within the range of Laws and Parsons' measurements, later research has shown that it does not hold at the high intensities which occur in tropical rainstorms. There is a straightforward physical explanation: there is a limit to the size of falling raindrops because above it they become unstable and break up into smaller drops, so the median drop diameter rises to a maximum and then decreases, as in Figure 6. When calculations of rainfall energy were required for the erosivity factor of the Universal Soil Loss Equation, the Laws and Parsons drop size/intensity relationship was extrapolated up to intensities of 250 mm/h, with the result that the kinetic energy of tropical rainfall was greatly exaggerated. It was not until 1987 that the USDA Agricultural Handbook was revised and upper limits of rainfall energy were introduced; in the intervening period many studies of tropical rainfall were confused or unsound as a result of this basic error.
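The danger is easy to reproduce. The constants below are invented, not Laws and Parsons' fitted values, but any power law of this form keeps rising indefinitely, while real median drop size peaks and then falls as drops become unstable:

    # Invented constants for D = a * I**b, fitted (hypothetically) to
    # data below 50 mm/h; I is intensity in mm/h, D is diameter in mm.
    a, b = 1.24, 0.18

    for intensity in (25, 50, 100, 250):
        print("I = %3d mm/h -> extrapolated D = %.2f mm"
              % (intensity, a * intensity ** b))

    # Beyond the measured range the equation keeps growing, whereas
    # falling drops break up above a limiting size, so the extrapolation
    # overstates drop size and hence kinetic energy in tropical storms.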

The moral is clear - be very cautious about accepting or rejecting suspicious results, and equally cautious about extrapolating beyond the range of the measured data. A useful book on the handling of experimental data is Pentz and Shott (1988).

CONCLUSION

The message must be: always keep the experimental design as simple as possible, and take advice from a biometrician before starting experiments. What happens all too often is that the experimenter goes to the statistician with a box full of data files and asks for help in analysing the results, and is then disappointed to be told that the design was all wrong and the results are of little value. The most likely faults are, in the case of catchment studies, rushing into the treatments before adequate calibration, and, in the case of small plot work, lack of adequate replication or allowing uncontrolled variables to creep in.

[Figure file: t0848e03.jpg]
