

C 67

- record sheets for measurements and analyses of variance


Page 210 is a blank assessment sheet for measuring trees.
Page 211 is the same sheet, with a simple worked example.
Page 212 is a blank analysis of variance sheet.
Page 213 is the same sheet, with a worked analysis of the figures on page 211.

(Page 210 - blank sheet)

ASSESSMENT OF: ......................................

(Page 211 - worked example)

ASSESSMENT OF: Gain in height growth
(Height in cm, measured from the top of a peg in the soil to the estimated position of the main shoot tip.)

Treatment               Tree   Initial height   Height   Gain    Notes
1 (small pots)            1         20            24       4
                        2-4        ...           ...      ...
                        Mean                              6.0
2 (medium-sized pots)     5         19            25       6
                          6         17            26       9
                          7        ...           ...      ...
                          8        (22)           --      --     attacked by aphids
                        Mean                              7.7
3 (large pots)            9         18            32      14
                         10        ...           ...      ...
                         11        (17)           --      --     broken off
                         12        (21)           --      --     died back
                        Mean                             12.5
                        MEAN                              8.0

(Page 212 - blank analysis of variance sheet)

Species: ..............                       Experiment number: ......

Effect of: ..............
Assessment of: ..............                 Units: ......
Date of treatment: ......    Date of assessment: ......    Assessment number: ......

Treatment number →       ...    ...    ...    OVERALL TOTALS
Total (∑x)               ...    ...    ...    TOTAL (∑X)   ...
Number (n)               ...    ...    ...    NUMBER (N)   ...
Mean (x̄)                ...    ...    ...    MEAN (x̄)    ...
Differences:             ...                  C.F. = (∑X)²/N = ...
∑(x²)                    ...    ...    ...    (A)  ...
(∑x)²/n                  ...    ...    ...    (B)  ...

Source of variation          Sums of squares   d.f.   Variance estimate   Variance   F (tables)
                                                      (mean square)       ratio      (to exceed)
                                                                                     (at level)
Treatment (B - C.F.)
Error (Total - Treatment)
Total (A - C.F.)

Coefficient of variation: ......%
Standard error of the mean: ±...... for ... replicates
Least significant differences: at the 5% level = ......; at the 1% level = ......


(Page 213 - worked analysis of variance)

Species: Ceiba pentandra                      Experiment number:

Effect of: Pot size
Assessment of: gain in height                 Units: cm
Date of treatment: 15/12/97    Date of assessment: 5/1/98    Assessment number: 1

Treatment number →         1       2       3      OVERALL TOTALS
Total (∑x)                24      23      25      TOTAL (∑X)   72
Number (n)                 4       3       2      NUMBER (N)    9
Mean (x̄)                6.0     7.7    12.5      MEAN (x̄)   8.0
Differences:        1.7 and 4.8; Treatment 3 = 2.1 × Treatment 1    C.F. = 72²/9 = 576
∑(x²)                    184     181     317      (A)  682
(∑x)²/n               144.00  176.33  312.50      (B)  632.83

Source of variation          Sums of squares   d.f.   Variance estimate   Variance    F (tables)
                                                      (mean square)       ratio       (to exceed)
                                                                                      (at level)
Treatment (B - C.F.)              56.83          2         28.42          3.47 n.s.   5.14 (5%)
Error (Total - Treatment)         49.17          6          8.194
Total (A - C.F.)                 106.00          8

Coefficient of variation: 35.8%
Standard error of the mean:   for 4 replicates = ±1.4;  for 3 replicates = ±1.7;  for 2 replicates = ±2.0
Least significant differences: at the 5% level = 6.1;  at the 1% level = 9.2

CONCLUSIONS: Overall treatment effect not significant. Growth in large pots more than twice that in small pots. This probably significant difference needs further study with many more trees in each treatment.
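
The sheet's arithmetic can be checked with a short Python sketch. This is a hand-worked one-way ANOVA using only the per-treatment totals, numbers and sums of squares given above; the variable names are mine, not part of the record sheet:

```python
# Per-treatment figures from the worked sheet:
# (total of gains Sx, number of trees n, sum of squared gains Sx2)
treatments = {
    1: (24, 4, 184),   # small pots
    2: (23, 3, 181),   # medium-sized pots
    3: (25, 2, 317),   # large pots
}

grand_total = sum(sx for sx, n, sx2 in treatments.values())   # 72
N = sum(n for sx, n, sx2 in treatments.values())              # 9 trees in all
CF = grand_total ** 2 / N                                     # correction factor = 576

A = sum(sx2 for sx, n, sx2 in treatments.values())            # (A) = 682
B = sum(sx ** 2 / n for sx, n, sx2 in treatments.values())    # (B) = 632.83

ss_treatment = B - CF                  # 56.83, with 3 - 1 = 2 d.f.
ss_total = A - CF                      # 106.00, with 9 - 1 = 8 d.f.
ss_error = ss_total - ss_treatment     # 49.17, with 8 - 2 = 6 d.f.

ms_treatment = ss_treatment / 2        # 28.42
ms_error = ss_error / 6                # 8.194
F = ms_treatment / ms_error            # variance ratio = 3.47; below 5.14, so n.s. at 5%
```

With more replicates the error mean square would be estimated more reliably, and the same difference between means might well reach significance.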


C 68

- assessment by scoring

(A) Need for scoring methods:

Scoring is a valuable way of getting a rapid, general view of a situation in biology. It can take one further than the recording of an observation (C 55), without embarking on long and detailed measurements that may or may not be appropriate and productive.

Scoring is especially useful when the features to be assessed are difficult or impossible to record by measurement or counting; for example when differences are:

  1. primarily qualitative, such as the rooting of cuttings, the germination of seeds, or the stopping or re-starting of shoot elongation (C 12, C 55);
  2. rather subjective, like the branching habit of a young tree, or the colour of its leaves; or
  3. needing to be estimated without sacrificing sample leaves (for example to measure leaf areas) or whole plants (to obtain dry weights).

Scoring can also be helpful later on, when the trees are too big for easy measurement, or there are too many items to count.

(B) How to score:

  1. Decide on the feature of the trees which you want to assess;
  2. Choose a set of recognisable categories or stages that cover at least the range of variation shown, and which can be seen without disturbing their growth;
  3. Label an example of each category or stage, and number them in sequence;
  4. Try giving a score to a few plants, and modify the categories if needed; and then
  5. Score all the trees in the experiment (see the blank record sheet in C 67).

Suggestions on scoring leaf colour are given in sheet C 55.

(C) Some weaknesses of scoring methods:

  1. It can be difficult to standardise the categories, and the intervals between them are not necessarily equal;
  2. Bias (C 15) is harder to avoid than when measuring;
  3. After some time, one's brain may refuse to carry on scoring without a rest;
  4. Not all statistical tests (C 67, C 69-B2) can be done on the results, and extra care is needed not to mislead oneself.

(D) Hints on scoring:

The main aims when choosing a scoring method are to minimise these weaknesses, and to achieve a valid, useful assessment simply and promptly. Some hints are:

  1. Look through the young trees first in order to discover whether they have yet reached a suitable stage of development for scoring, and to gauge the range of variation to be expected in the feature(s) to be scored;
  2. Choose between 5 and 10 convenient categories or stages. For example, when assessing:
    1. categories of branching habits, 1 might be used for young trees with unbranched main stems, and 5 for very bushy trees; and
    2. stages in outgrowth of new shoots, 1 might stand for “buds still unopened”, and 10 for “first new leaves fully expanded”;
  3. Do the scoring with at least one other person - difficult features may need three or four observers. Discuss the categories or stages together, but then score independently;
  4. Aiming for consistency of scoring is more important than whether you tend to score higher or lower than other people;
  5. Don't try and score too many different features at the same time; and
  6. To reduce bias in experiments, arrange to do the scoring without knowing the treatment or genetic origin of the trees. Expectations can influence results!

(E) Analysis of scored data:

(1) Chi-square (χ²) tests are especially appropriate for comparing categories.
The 2×2 χ² test is quick to calculate, and gives a simple estimate of the significance (see C 69-E, G, H, I) of qualitative differences such as the presence or absence of something. For example:

Comparing the number of cuttings that rooted in two different rooting media:

Treatment        Number rooted   Number not rooted   Total number   Percentage rooted
sand                19 (a)            46 (b)             65 (g)          29 %
sawdust/sand        12 (c)             3 (d)             15 (h)          80 %
both                31 (e)            49 (f)             80 (N)

This is the sum for calculating the chi-square:

    χ² = N(ad - bc)² / (e × f × g × h)


Note: when the numbers are small (some totals less than 30), three points apply:

  1. the test is only valid when the ‘Expected Frequency’ in each of the 4 main boxes a-d is at least 5 (for example the expected frequency in box d = fh/N = 9.2 and so is valid);
  2. ‘Yates's Correction’ should be applied before doing the test (take ½ from each value that is higher than expected, and add ½ to each that is lower); and
  3. the chi-square will only be significant if the difference between the percentages is quite large (for example, a difference of 30 percentage points will not be significant until the total in each sample exceeds 25).

Result of the chi-square analysis (using Yates's Correction): χ² = 11.18 ***.

With one degree of freedom (see C 69-I) between the two treatments, the values of chi-square to be exceeded are 3.84 (5% level); 6.64 (1%); and 10.83 (0.1%). In this example, the difference in rooting percent is highly significant (C 69-H), indicated by ***.
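
The same figures can be run through a few lines of Python. This is a sketch of the standard 2×2 chi-square with Yates's Correction built into the formula, using the letters a-h and N from the table above:

```python
# Cell counts from the rooting-media table
a, b = 19, 46        # sand: rooted, not rooted
c, d = 12, 3         # sawdust/sand: rooted, not rooted

e, f = a + c, b + d  # column totals (31, 49)
g, h = a + b, c + d  # row totals (65, 15)
N = a + b + c + d    # grand total (80)

# Validity check: the smallest expected frequency (box d) should be at least 5
expected_d = f * h / N   # 9.19, so the test is valid

# Chi-square with Yates's continuity correction applied to |ad - bc|
chi2 = N * (abs(a * d - b * c) - N / 2) ** 2 / (e * f * g * h)   # 11.18
```

Since 11.18 exceeds the tabulated 10.83 for one degree of freedom, the difference in rooting percent is highly significant (***).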

When features have been scored into several categories, larger tables can be constructed, and the combined chi-square calculated. If such a test would be invalid because of low ‘expected frequencies’ in some boxes in the table, then categories can be amalgamated and a simpler table prepared. (For instance, categories 1+2 and 3–5 might be put together and the larger groups compared in a 2×2 chi-square test.)

(2) Analysis of variance can also be applied to scored data, and the variation between independent observers included in the analysis (see C 69-F), provided that:

  1. the intervals between categories are reasonably even;
  2. an appropriate transformation (see C 69-O) is used because the numbers are expressed in a small number of discrete categories, and may not be ‘normally distributed’ (C 69-B,2,g). Transformations may also be relevant when the categories are non-linear (for example, with numbers of branches in categories of 0, 1, 2–5, 6–14, 15+).

If there are many zero values, you could compare the presence or absence of the feature by a chi-square test, and confine the analysis of variance to the cases where the feature is present.

(F) Summary:

Used with judgement, scoring methods can provide a rapid and useful complement to more precise and fully quantitative measurements. They are especially valuable when time is short and the features do not lend themselves to easy measurement.

Although the data obtained are only semi-quantitative, it may be possible to carry out valid statistical tests of significance.


C 69

- analysing the results of experiments

(A) Why experimental results generally need analysing:

Looking carefully at what happened in your experiment can clarify:

  1. whether there were differences between any treatments or genetic origins;
  2. when they started to occur, and how big they became; and
  3. possible linkages between observations and measurements, or between different assessments (C 55).

Statistical analyses are particularly important, helping one to avoid being misled when drawing conclusions about the results. They indicate how likely it is that any differences between the growth of various groups of young trees in your experiment are due just to chance, rather than to the conditions being studied (C 62-F).

(B) Two questions before starting a statistical analysis:

(1) Is it unnecessary? A formal analysis may not be needed for example when:

  1. numerous treated plants are thriving, while all the controls remain stunted;
  2. the trees of one genetic origin are growing, and those of the other are all dead; or
  3. you were just doing a preliminary ‘look-see’ trial with a few plants, to try something out before a full experiment.

(2) Would the analysis be valid? It may not be, if, for instance:

  1. there were no controls or other standards to compare with the treated trees;
  2. the treatments were not applied randomly, or otherwise without bias;
  3. one treatment influenced another (for example if fertiliser in treated containers could have washed out and been taken up by control trees);
  4. a suitable layout of the young trees wasn't used during the experimental period;
  5. some of the experimental trees were subjected to severe stress (C 41) before the time of the assessment;
  6. the figures were arithmetically invalid (for instance, ratios of percentages); or
  7. the pattern of variation between the individual plants didn't approximate to a normal distribution (but see section O).

(C) Steps in analysing the results:

  1. Calculating the average values for all treatments, Blocks (D 55 in Manual 4), genetic origins, and other factors; from the measurements you have done, at each assessment;
  2. Finding out what patterns of variation occur - from selected samples of the individual values - and looking at how one set of figures might be linked to another;
  3. Preparing the main results as a graph, histogram, pie-chart, diagram or table, so that differences between sets of trees can be more easily seen and appreciated;
  4. Doing some tests of significance on the most relevant figures; and
  5. Drawing conclusions about what the experiment has shown.

(D) Which figures to analyse?

This depends on the circumstances, but it may often be best to start with:

  1. assessments made at the end of the experimental period;
  2. stages when sizeable differences had recently appeared between various sets of experimental trees; or
  3. differences that deal with the main hypotheses of the experiment.

A decision can then be taken about which other sets of figures might be worth analysing.

(E) Tests of significance for ‘Yes/No’ situations:

If the difference between two sets of experimental trees is qualitative - that is, a simple choice between damaged/undamaged, alive/dead, leafy/leafless, or terminal bud sprouting/not sprouting - then the chi-square (χ²) test is a straightforward one to use
(see the worked example of a 2×2 χ² test in sheet C 68-E).
Chi-square tests can also be performed with more than two samples.

(F) Tests of significance for ‘More/Less’ situations:

Where the difference between various sets of experimental trees is quantitative, several kinds of tests of significance are available. One of the most adaptable and widely used is the Analysis of Variance (ANOVA) - see the blank sheet and worked example in C 67. What this does is to estimate:

  1. how much variation exists between all the individual values being analysed;
  2. how much of this variation can be assigned to the treatments applied, to overall differences between Blocks, to various genetic origins, or to other factors; and
  3. how much then remains unassigned (the ‘Error’ or ‘Residual’ estimate).

A simpler version is the t-test, but this can only handle a single comparison at a time, so is generally less informative and useful. (See also J - standard error of the mean.)

(G) Significant and non-significant effects.

The starting assumption (‘null hypothesis’) on which tests of significance are based is that there are not any real differences between the various groups of plants in the experiment - they just show chance variation around the overall mean. However, if it turns out that considerably more of the variation is assigned to treatment than to error, the assumption is found to be false and the treatment is said to have had a significant effect. If, on the other hand, roughly similar variation is assigned to treatment and to error, the original assumption stands, and the overall treatment difference is said to be ‘not significant’ (n.s.). The same applies to Blocks, genetic origins, and other factors.

(H) Levels of significance.

If a test of significance gives a number that is larger than the value given in the relevant table for the 5% level of probability (p = 0.05), this means that variation like this might happen anyway by chance in one out of more than 20 such trials. We say that such a difference is “probably significant” (and it is usually given one *). If the test gives a number that is bigger than the value in the table for the 1% level (p = 0.01), such a difference would only be likely to happen by chance once in more than 100 trials, and it is called “significant” (**). If the value for the 0.1% level (p = 0.001) is exceeded, the difference would probably only occur by chance once in more than 1000 trials, and is called “highly significant” (***).

(I) Degrees of freedom and significance in statistical tables.

Between two treatments, there is only one comparison to be made; between ten seed-lots only nine independent comparisons. The number of degrees of freedom (d.f.) is the number of trees, replicates, Blocks, treatments, seed lots, clones, and so on that are involved; minus one. So when looking up tables for:

  1. Chi-square tests: With one d.f. between ‘yes’ or ‘no’, the values of chi-square to be exceeded are 3.84 (5% level); 6.64 (1%); and 10.83 (0.1%).

  2. ANOVA (See worked example in C 67):

    1. Divide each of the sums of squares by the appropriate d.f. to get the mean squares (estimates of variation);
    2. Divide the mean square for treatment by the error mean square, to get the calculated value of F (variance ratio);
    3. Look up a Table of F values, using the d.f. for treatment along the top, and the error d.f. down the side. If your calculated F is larger than the one in the Table, this means that there is a significant overall effect of treatment (see J and L for individual pairs of treatments);
    4. Similarly, the mean squares for Blocks, clones, ‘triplets’ (C 15) or other groupings are divided by the error mean square to find out whether any of them show significance.

(J) Calculating the Standard Errors of the Means:

The standard error of the mean (S.E.) is the simplest estimate of how reliable an average is. It can be calculated for any set of figures by dividing the standard deviation (s) by the square root of the number of trees (n):

    S.E. = s / √n

After an ANOVA, a more accurate S.E. is calculated by dividing the error mean square (residual variance estimate - see I-2) by the number of values that have been averaged in a particular treatment, and then taking the square root.

The average (mean) is then written for example as 5.6±1.2, and on a graph or histogram the S.E. is usually shown to scale, as a vertical bar above and below the average value.

If their ‘error bars’ do not overlap, this is commonly taken as an indication that two means are probably significantly different from each other. If they do overlap, any differences may just be due to chance.
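
As an illustration, the post-ANOVA standard errors quoted on the worked sheet in C 67 (error mean square 8.194) can be recovered in a line or two of Python:

```python
from math import sqrt

error_ms = 8.194   # error (residual) mean square from the worked ANOVA in C 67

# S.E. of a treatment mean = square root of (error mean square / number of replicates)
se = {n: sqrt(error_ms / n) for n in (4, 3, 2)}
# roughly 1.4 for 4 replicates, 1.7 for 3, and 2.0 for 2, as on the sheet
```

Note how the means of the smaller treatments are less reliable: halving the number of replicates increases the S.E. by a factor of √2.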

(K) Interactions.

Consider a 2 × 2 trial with a control, mulch only, fertiliser only, and mulch plus fertiliser (D 6 and D 55 in Manual 4). If, for example the effects of fertiliser were dependent on whether the plants were mulched or not, then an interaction is occurring (Manual 5). The two factors, mulch and fertiliser, are not acting independently from each other. Interactions are important in understanding more about growth, because they suggest that the two factors are acting upon the same process. On the other hand, if there is no interaction, the separate effects of the two factors will just be added together, or the one subtracted from the other. Interactions:

  1. can only be detected in experiments examining more than one factor. These might for instance be two different types of treatment, or one kind of treatment and a difference of genetic origin;
  2. need to be considered before looking at the effects of the main factors on their own;
  3. may, if significant, mean that a ‘breakdown analysis’ (usually, re-analysing the experiment in two parts) is needed to determine whether the individual effects are also significant.

(L) Examining the significance of differences between individual pairs:

  1. Find the difference between the average values in the pair to be compared (for example between treatments 1 and 3 in the worked example in C 67);
  2. Calculate a value called the least significant difference (L.S.D.):

    L.S.D. = t × the standard error of the difference, where

    standard error of the difference = √[error mean square × (1/n₁ + 1/n₂)]

    The value ‘t’ is taken from tables at the 5%, 1% and 0.1% levels of probability, using the error degrees of freedom; n₁ and n₂ are the numbers of plants in the two groups being compared (for example treatment 1 and the control);

  3. If the difference between the two averages is larger than the L.S.D., the difference is significant at the appropriate level.

    This is the simplest method. Although various authors suggest alternatives (C 62-F), these are more complicated to calculate. The L.S.D. can be a useful guide, provided the following points are remembered:

    1. if 20 genetic origins were tested in an experiment, you would expect the LSD at the 5% level to be exceeded once, just by chance, not because there was a real difference;
    2. similarly, if the treatments in an experiment involved 3 factors at 3 different levels, at least one of the 26 possible comparisons would be expected to be significant at the 5% level, without meaning that a real difference had been found.
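
Steps 1-3 can be sketched in Python for treatments 1 and 3 of the worked example in C 67 (the t value of 2.447 is the tabulated two-tailed 5% value for 6 error degrees of freedom):

```python
from math import sqrt

# Figures from the worked example in C 67
error_ms = 8.194          # error mean square, with 6 d.f.
n1, n3 = 4, 2             # numbers of plants in treatments 1 and 3
mean1, mean3 = 6.0, 12.5  # treatment means (gain in height, cm)

t_5 = 2.447               # t from tables at the 5% level, 6 d.f.

sed = sqrt(error_ms * (1 / n1 + 1 / n3))  # standard error of the difference
lsd = t_5 * sed                           # least significant difference = 6.1

difference = mean3 - mean1                # 6.5
probably_significant = difference > lsd   # True: 6.5 just exceeds 6.1
```

This matches the sheet's conclusion that growth in the large pots was probably significantly greater than in the small pots, even though the overall F test was not significant.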

(M) Various reasons for lack of significance:

If a test does not show significance, this merely means that the null hypothesis stands (see G). It does not prove that the treatment is ineffective.

Significant effects might not have been found because:

  1. there was too much variation in the experiment for them to show up;
  2. there weren't enough replicates, as in the worked example in C 67;
  3. the experimental plants were growing slowly, and errors in measurement were too large for an effect of treatment to be found;
  4. the effect of another factor was masking that of the treatment in question; or
  5. the particular treatment would only show significant effects in a different environment.

(N) Reducing variability:

  1. Re-analysing data from the same experiment:
    1. You could recalculate the data as the gain since the start of the experiment (C 55). This removes the variation due to the trees starting off at different sizes, and is in any case desirable when studying the periodicity or rates at which shoots are growing;
    2. For relative growth rates, you could do an analysis of covariance, based on the values for individual trees at the beginning of the experiment, and after a given period, or a regression (see Q);
    3. Transformations (see O) may have the effect of reducing variation, because they often give less weight to occasional very high values;
    4. Some computer programmes (see R-2) can be set to ignore data points that are further from the mean than a set distance. This is risky when dealing with variable species and environments, as these points may well be true values.

  2. Repeating the experiment. For instance, you might do this using:
    1. more replicates;
    2. young trees that had been grown beforehand under more uniform conditions (C 7);
    3. a different experimental area that provided similar light levels to all the plants. If necessary, you could have 3–5 Blocks running from the sunnier to the shadier parts;
    4. a ‘surround’ of similar trees that were not part of the experiment, to reduce ‘edge’ effects;
    5. treatments that were more contrasting than before; and
    6. more careful handling and watering (C 42, C 48)

Only after several experiments would you conclude that the treatment probably has little or no effect on those aspects of growth of that tree species.

(O) Transformations.

These are sometimes needed in order to put the figures into a form where a valid analysis can be done (see B-2). Here are some examples:

  1. Chi-square tests with small numbers - apply Yates's Correction (see C 68-E).

  2. ANOVA with a non-normal distribution - if the mean value lies well towards the low end of the distribution, transforming all the original data (‘x’) may make the distribution reasonably normal. If so, do the ANOVA on the transformed figures (‘z’). Some common transformations include:

    1. square root transformation: z = √(x + 0.375);
    2. log transformation: z = log₁₀(x + 0.375), or z = ln(x + 0.375).
    (0.375 is added to each number to avoid problems with values of zero and one.)

  3. ANOVA of percentages - use the arcsin transformation:

    z = arcsin √(p/100), where p is the percentage.

Note: if you want to de-transform the results before presenting them, the standard error of the mean (see J) and the least significant difference (see L) require care. Because transformations (2) and (3) above are not linear ones, the S.E. bars will be of unequal length above and below a mean.
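
The three transformations can be written as small Python helpers (the arcsin result is given in degrees, as in most published tables; the function names are mine):

```python
from math import sqrt, log10, asin, degrees

def sqrt_transform(x):
    # z = sqrt(x + 0.375); the 0.375 avoids problems with zeros and ones
    return sqrt(x + 0.375)

def log_transform(x):
    # z = log10(x + 0.375)
    return log10(x + 0.375)

def arcsin_transform(p):
    # Angular transformation of a percentage p, returned in degrees:
    # z = arcsin(sqrt(p / 100))
    return degrees(asin(sqrt(p / 100)))
```

For example, the rooting percentages in C 68 (29 % and 80 %) transform to about 32.6 and 63.4 degrees, pulling in the extreme values so that the variances are more nearly equal.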

(P) Missing plants or readings.

It is still possible to do ANOVAs when there are different numbers of readings in the various treatments or genetic origins, for example because:

  1. there was a shortage in some groups of plants;
  2. only a few trees could be treated, but more controls were available;
  3. some trees were accidentally damaged during the experiment;
  4. some dieback of shoot tips occurred, or death of plants (C 55).

If the ANOVA has only one factor (see K), then calculate as in C 67. If it has more than one factor, you could analyse them separately, though without being able to look at any interactions. Alternatively see statistical textbooks (C 62-F) for how to estimate missing values, noting that for each of them one d.f. is deducted before calculating the error mean square.

(Q) Correlation and regression.

These are ways of examining how closely two sets of readings may be connected. For example, height and diameter growth in a set of young trees might often (though not always) be closely linked, with the shorter trees thinner, and the taller ones thicker. Moreover, you might expect that the growth of the trees could be linked with soil depth, moisture or fertility, or with an aspect of the weather.

When correlations or regressions show a close relationship, significance values are often given to them. But here it is particularly important not to be misled, because:

  1. the significance has not come from the experimental testing of a hypothesis;
  2. the apparent link may simply depend on connections to a third factor; and
  3. one in twenty comparisons amongst the many factors affecting the young trees may be expected to show a ‘probably significant’ relationship just by chance.
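
One common measure of how closely two sets of readings are linked is Pearson's correlation coefficient, r (+1 for a perfect positive link, 0 for no link, -1 for a perfect negative one). A minimal sketch, using invented height and diameter readings purely for illustration:

```python
from math import sqrt

def pearson_r(xs, ys):
    # Pearson's correlation coefficient between two equal-length sets of readings
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical readings from six young trees (not real data)
heights = [24, 25, 26, 32, 35, 40]           # cm
diameters = [3.1, 3.3, 3.2, 4.0, 4.4, 5.1]   # mm

r = pearson_r(heights, diameters)   # close to +1: the taller trees are thicker
```

A high r here would still be subject to the three cautions above: it has not been produced by testing a hypothesis experimentally.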

(R) Aids to calculation.

(1) Calculators: These have the advantages of being small, portable, robust and relatively cheap, and of working reliably from long-lasting batteries or solar energy. They are invaluable for transformations (see O), to obtain and check totals and averages, and for other simple calculations (C 63).

Some types contain programmes that automatically calculate the standard deviation and standard error of the mean when a set of figures is totalled. Others will perform more detailed analyses, or allow you to write a programme yourself.

(2) Computers: These offer opportunities for storing large amounts of data, doing complex calculations and analyses, and almost limitless possibilities for displaying the results. They can also be programmed to accept electronically recorded information about the environment. However, computers are relatively expensive, require a steady and reliable electricity supply, and need to be kept free of dust and high humidity. Some types can operate from rechargeable batteries, but they are too delicate to be really portable.

(S) A final hint:

Check at each stage for errors in recording numbers, and in calculations. If you do not, sooner or later you will find yourself having to start back at the beginning, re-analysing and re-drawing graphs (and perhaps even changing slides and the proofs of publications). Just because a computer has done the analysis does not mean that there cannot be errors.
