7.1 Introduction
7.2 Calibration graphs
7.3 Blanks and Detection limit
7.4 Types of sample material
7.5 Validation of own procedures
7.6 Drafting an analytical procedure
7.7 Research plan
SOPs
This chapter deals with the actual execution of the jobs for which the laboratory is intended. The most important part of this work is of course the analytical procedures, meticulously performed according to the corresponding SOPs. Relevant aspects include calibration, use of blanks, performance characteristics of the procedure, and reporting of results. An aspect of utmost importance to quality management, the quality control by inspection of the results, is discussed separately in Chapter 8.
All activities associated with these aspects are aimed at one target: the production of reliable data with a minimum of errors. In addition, it must be ensured that reliable data are produced consistently. To achieve this an appropriate programme of quality control (QC) must be implemented. Quality control is the term used to describe the practical steps undertaken to ensure that errors in the analytical data are of a magnitude appropriate for the use to which the data will be put. This implies that the errors (which are unavoidably made) have to be quantified to enable a decision whether they are of an acceptable magnitude, and that unacceptable errors are discovered so that corrective action can be taken. Clearly, quality control must detect both random and systematic errors. The procedures for QC primarily monitor the accuracy of the work by checking the bias of data with the help of (certified) reference samples and control samples and the precision by means of replicate analyses of test samples as well as of reference and/or control samples.
7.2.1 Principle
7.2.2 Construction and use
7.2.3 Error due to the regression line
7.2.4 Independent standards
7.2.5 Measuring a batch
Here, the construction and use of calibration graphs or curves in the daily practice of a laboratory will be discussed. Calibration of instruments (including adjustment) in the present context is also referred to as standardization. The confusion about these terms is mainly semantic and the terms calibration curve and standard curve are generally used interchangeably. The term "curve" implies that the line is not straight. However, the best (parts of) calibration lines are linear and, therefore, the general term "graph" is preferred.
For many measuring techniques calibration graphs have to be constructed. The technique is simple and consists of plotting the instrument response against a series of samples with known concentrations of the analyte (standards). In practice, these standards are usually pure chemicals dispersed in a matrix corresponding with that of the test samples (the "unknowns"). By convention, the calibration graph is always plotted with the concentration of the standards on the x-axis and the reading of the instrument response on the y-axis. The unknowns are determined by interpolation, not by extrapolation, so that a suitable working range for the standards must be selected. In addition, in the present discussion it is assumed that the working range is limited to the linear range of the calibration graphs, that the standard deviation does not change over the range (neither of which is always the case*), and that the data are normally distributed. Non-linear graphs can sometimes be linearized in a simple way, e.g. by using a log scale (in potentiometry), but usually imply statistical problems (polynomial regression) for which the reader is referred to the relevant literature. It should be mentioned, however, that in modern instruments which make and use calibration graphs automatically these aspects sometimes go unnoticed.
* This is the so-called "unweighted" regression line. Because normally the standard deviation is not constant over the concentration range (it is usually least in the middle range), this difference in error should be taken into account. This would then yield a "weighted" regression line. The calculation of this is more complicated and information about the standard deviation of the y-readings has to be obtained. The gain in precision is usually very limited, but sometimes the extra information about the error may be useful.
Some common practices to obtain calibration graphs are:
1. The standards are made in a solution with the same composition as the extractant used for the samples (with the same dilution factor) so that all measurements are done in the same matrix. This technique is often practised when analyzing many batches where the same standards are used for some time. In this way an incorrectly prepared extractant or matrix may be detected (in blank or control sample).

2. The standards are made in the blank extract. A disadvantage of this technique is that for each batch the standards have to be pipetted. Therefore, this type of calibration is sometimes favoured when only one or a few batches are analyzed or when the extractant is unstable. A seeming advantage is that the blank can be forced to zero. However, an incorrect extractant would then more easily go undetected. The disadvantage of pipetting does not apply in case of automatic dispensing of reagents when equal volumes of different concentration are added (e.g. with flow-injection).
3. Less common, but useful in special cases, is the so-called standard additions technique. This can be practised when a matrix mismatch between samples and standards needs to be avoided: the standards are prepared from actual samples. The general procedure is to take a number of aliquots of sample or extract, add different quantities of the analyte to each aliquot (spiking) and dilute to the final volume. One aliquot is used without the addition of the analyte (blank). Thus, a standard series is obtained.
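The arithmetic of the standard additions technique can be sketched as follows; the data and the linear-response assumption are hypothetical:

```python
# Standard-additions sketch (hypothetical data): equal aliquots are spiked
# with increasing amounts of analyte; the unspiked aliquot serves as blank.
# With a linear response y = b*(c0 + added), the original concentration c0
# is recovered as the intercept/slope ratio of the fitted line.

def fit_line(xs, ys):
    """Unweighted least-squares fit; returns slope b and intercept a."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return b, my - b * mx

added = [0.0, 1.0, 2.0, 3.0]        # analyte added to each aliquot (mg/L)
response = [1.0, 1.5, 2.0, 2.5]     # instrument readings (hypothetical)

b, a = fit_line(added, response)
c0 = a / b                          # concentration in the unspiked aliquot
print(round(c0, 3))                 # 2.0 mg/L
```

Note that the result is obtained by extrapolating the fitted line to zero response, so the linearity of the response over the spiked range is essential.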
If calibration is involved in an analytical procedure, the SOP for this should include a description of the calibration subprocedure, if applicable including an optimization procedure (usually given in the instruction manual).
In several laboratories calibration graphs for some analyses are still adequately plotted manually and the straight line (or sometimes a curved line) is drawn with a visual "best fit", e.g. for flame atomic emission spectrometry, or colorimetry. However, this practice is only legitimate when the random errors in the measurements of the standards are small: when the scattering is appreciable the line-fitting becomes subjective and unreliable. Therefore, if a calibration graph is not made automatically by a microprocessor of the instrument, the following more objective and also quantitatively more informative procedure is generally favoured.
The proper way of constructing the graph is essentially the performance of a regression analysis i.e., the statistical establishment of a linear relationship between concentration of the analyte and the instrument response using at least six points. This regression analysis (of reading y on concentration x) yields a correlation coefficient r as a measure for the fit of the points to a straight line (by means of Least Squares).
Warning. Some instruments can be calibrated with only one or two standards. Linearity is then implied but may not necessarily be true. It is useful to check this with more standards.
Regression analysis was introduced in Section 6.4.4 and the construction of a calibration graph was given as an example. The same example is taken up here (and repeated in part) but focused somewhat more on the application.
We saw that a linear calibration graph takes the general form:
y = bx + a 
(6.18; 7.1) 
where:
a = intercept of the line with the yaxis
b = slope (tangent)
Ideally, the intercept a is zero: when the analyte is absent, no response of the instrument is to be expected. However, because of interactions, interferences, noise, contamination and other sources of bias, this is seldom the case. Therefore, a can be considered as the signal of the blank of the standard series.
The slope b is a measure of the sensitivity of the procedure; the steeper the slope, the more sensitive the procedure, or: the stronger the instrument response y_i to a concentration change x (see also Section 7.5.3).
The correlation coefficient r can be calculated by:
r = Σ(x_i − x̄)(y_i − ȳ) / √[Σ(x_i − x̄)² · Σ(y_i − ȳ)²]
(6.19;7.2) 
where
x_i = concentrations of standards
x̄ = mean of concentrations of standards
y_i = instrument responses to standards
ȳ = mean of instrument responses to standards
The line parameters b and a are calculated with the following equations:
b = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)²
(6.20;7.3) 
and
a = ȳ − b·x̄
(6.21;7.4) 
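Equations (7.2)-(7.4) can be sketched in a few lines of code; the standard series below is hypothetical and noise-free, so the fit is exact:

```python
# Sketch of Eqs. (7.2)-(7.4): slope b, intercept a and correlation
# coefficient r of an unweighted least-squares calibration line.
import math

def calibration(xs, ys):
    """Return slope b, intercept a and correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx                       # Eq. (7.3)
    a = my - b * mx                     # Eq. (7.4)
    r = sxy / math.sqrt(sxx * syy)      # Eq. (7.2)
    return b, a, r

conc = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]   # hypothetical standards (mg/L)
resp = [0.04 + 0.6 * x for x in conc]   # noise-free readings
b, a, r = calibration(conc, resp)
print(round(b, 3), round(a, 3), round(r, 3))   # 0.6 0.04 1.0
```

With real calibration data the points scatter slightly around the line and r falls below 1, as in the example that follows.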
Example of calibration graph
As an example, we take the same calibration graph as discussed in Section 6.4.4.1 (Fig. 6-4): a standard series of P (0-1.0 mg/L) for the spectrophotometric determination of phosphate in a Bray-I extract ("available P"), reading in absorbance units. The data and calculated terms needed to determine the parameters of the calibration graph were given in Table 6-5. The calculations can be done on a (programmed) calculator or more conveniently on a PC using a home-made program or, even more conveniently, using an available regression program. The calculations yield the equation of the calibration line (plotted in Fig. 7-1):
y = 0.626x + 0.037 
(6.22; 7.5) 
with a correlation coefficient r = 0.997. As stated previously (6.4.3.1), such high values are common for calibration graphs. When the value is not close to 1 (say, below 0.98) this must be taken as a warning and it might then be advisable to repeat or review the procedure. Errors may have been made (e.g. in pipetting) or the range of the graph used may not be linear. Therefore, to make sure, the calibration graph should always be plotted, either on paper or on a computer monitor.
Fig. 7-1. Calibration graph plotted from data of Table 6-5.
If linearity is in doubt the following test may be applied. Determine for two or three of the highest calibration points the relative deviation of the measured y-value from the calculated line:
deviation (%) = (y_i − ŷ_i) / ŷ_i × 100

where ŷ_i is the y-value of point i calculated from the regression line.
(7.6) 
 If the deviations are < 5% the curve can be accepted as linear.
 If a deviation > 5% then the range is decreased by dropping the highest concentration.
 Recalculate the calibration line by linear regression.
 Repeat this test procedure until the deviations < 5%.
When, as an exercise, this test is applied to the calibration curve of Fig. 7-1 (data in Table 6-3) it appears that the deviations of the three highest points are < 5%, hence the line is sufficiently linear.
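The iterative linearity test above can be sketched as follows, with hypothetical data in which the response flattens at the highest standard:

```python
# Sketch of the linearity test: drop the highest standard while its
# relative deviation from the fitted line exceeds 5% (Eq. 7.6).

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

def linear_range(xs, ys, tol=0.05):
    """xs ascending; returns the points retained plus the final fit."""
    xs, ys = list(xs), list(ys)
    while len(xs) > 2:
        b, a = fit_line(xs, ys)
        y_hat = b * xs[-1] + a
        if abs((ys[-1] - y_hat) / y_hat) <= tol:
            return xs, ys, b, a
        xs.pop()                      # drop highest standard and refit
        ys.pop()
    return xs, ys, *fit_line(xs, ys)

conc = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]    # hypothetical standards (mg/L)
resp = [0.0, 0.2, 0.4, 0.6, 0.8, 0.85]   # response flattens at the top
xs, ys, b, a = linear_range(conc, resp)
print(xs[-1])                            # 0.8: linear up to 0.8 mg/L
```

Here the top standard deviates by about 8% from the six-point fit and is dropped; the remaining five points pass the test, so the working range ends at 0.8 mg/L.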
During calculation of the line, the maximum number of decimals is used, rounding off to the last significant figure is done at the end (see instruction for rounding off in Section 8.2).
Once the calibration graph is established, its use is simple: for each y value measured for a test sample (the "unknown") the corresponding concentration x can be determined either by reading from the graph or by calculation using Equation (7.1), or x is automatically produced by the instrument.
The "fitting" of the calibration graph is necessary because the actual response points y_i composing the line usually do not fall exactly on the line. Hence, random errors are implied. This is expressed by an uncertainty about the slope b and intercept a defining the graph. This uncertainty was discussed in Section 6.4.4, where it was explained that the error is expressed by s_y, the "standard error of the y-estimate" (see Eq. 6.23), a parameter automatically calculated by most regression computer programs.
This uncertainty about the ŷ-values (the fitted y-values) is transferred to the corresponding concentrations of the unknowns on the x-axis by the calculation using Eq. (7.1) and can be expressed by the standard deviation of the obtained x-value. The exact calculation is rather complex but a workable approximation can be calculated with:
s_x = s_y / b
(7.7) 
Example
For each value of the standards x the corresponding y is calculated with Eq. (7.5):
ŷ_i = 0.626·x_i + 0.037
(7.8) 
Then, s_y is calculated using Eq. (6.23) or by computer:

s_y ≈ 0.018

Then, using Eq. (7.7):

s_x = s_y / b ≈ 0.018 / 0.626 ≈ 0.029
Now, the confidence limits of the found results x_{f} can be calculated with Eq. (6.9):
x_f ± t·s_x
(7.9) 
For a two-sided interval and 95% confidence: t_tab = 2.78 (see Appendix 1, df = n − 2 = 4). Hence all results in this example can be expressed as:
x_f ± 0.08 mg/L
Thus, for instance, the result of a reading y = 0.22 and using Eq. (7.5) to calculate x_{f} = 0.29, can be reported as 0.29 ± 0.08 mg/L. (See also Note 2 below.)
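The conversion of a reading to a reported result can be sketched as below; s_x ≈ 0.029 mg/L is an assumed value, chosen to be consistent with the ± 0.08 mg/L interval quoted for t = 2.78:

```python
# Converting a reading to a reported result with its confidence interval.
# b and a are the parameters of Eq. (7.5); s_x is an assumed value
# consistent with the +/- 0.08 mg/L interval quoted in the text.
b, a = 0.626, 0.037
t, s_x = 2.78, 0.029

y_reading = 0.22
x_f = (y_reading - a) / b          # Eq. (7.1) solved for x
half_width = t * s_x               # Eq. (7.9)
print(f"{x_f:.2f} +/- {half_width:.2f} mg/L")   # 0.29 +/- 0.08 mg/L
```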
The s_x value used can only be approximate as it is taken as constant here whereas in reality this is usually not the case. Yet, in practice, such an approximate estimation of the error may suffice. The general rule is that the measured signal is most precise (least standard deviation) near the centroid of the calibration graph (see Fig. 6-4). The confidence limits can be narrowed by increasing the number of calibration points. Therefore, the reverse is also true: with fewer calibration points the confidence limits of the measurements become wider. Sometimes only two or three points are used. This then usually concerns the checking and restoring of previously established calibration graphs, including those in the microprocessor or computer of instruments. In such cases it is advisable to check the graph regularly with more standards. Make a record of this in the file or journal of the method.
Note 1. Where the determination of the analyte is part of a procedure with several steps, the error in precision due to this reading is added to the errors of the other steps and as such included in the total precision error of the whole procedure. The latter is the most useful practical estimate of confidence when reporting results. As discussed in Section 6.3.4 a convenient way to do this is by using Equations (6.8) or (6.9) with the mean and standard deviation obtained from several replicate determinations (n > 10) carried out on control samples or, if available, taken from the control charts (see 8.3.2: Control Chart of the Mean). Most generally, the 95% confidence for single values x of test samples is expressed by Equation (6.10):
x±2s 
(6.10; 7.10) 
where s is the standard deviation of the mentioned large number of replicate determinations.

Note 2. The confidence interval of ± 0.08 mg/L in the present example is clearly not satisfactory and calls for inspection of the procedure. Particularly the blank seems to be (much) too high. This illustrates the usefulness of plotting the graph and calculating the parameters. Other traps to catch this error are the Control Chart of the Blank and, of course, the technician's experience.
It cannot be overemphasized that for QC a calibration should always include measurement of an independent standard or calibration verification standard at about the middle of the calibration range. If the result of this measurement deviates alarmingly from the correct or expected value (say > 5%), then inspection is indicated.
Such an independent standard can be obtained in several ways. Most usually it is prepared from pure chemicals by another person than the one who prepared the actual standards. Obviously, it should never be derived from the same stock or source as the actual standards. If necessary, a bottle from another laboratory could be borrowed.
In addition, when new standards are prepared, the remainder of the old ones always have to be measured as a mutual check (include this in the SOP for the preparation of standards!).
After calibration of the instrument for the analyte, a batch of test samples is measured. Ideally, the response of the instrument should not change during measurement (drift or shift). In practice this is usually the case for only a limited period of time or number of measurements and regular recalibration is necessary. The frequency of recalibration during measurement varies widely depending on technique, instrument, analyte, solvent, temperature and humidity. In general, emission and atomizing techniques (AAS, ICP) are more sensitive to drift (or even sudden shift: by clogging) than colorimetric techniques. Also, the techniques of recalibration and possible subsequent action vary widely. The following two types are commonly practised.
1. Stepwise correction or interval correction
After calibration, at fixed places or intervals (after every 10, 15, 20, or more, test samples) a standard is measured. For this, often a standard near the middle of the working range is used (continuing calibration standard). When the drift is within acceptable limits, the measurement is continued. If the drift is unacceptable, the instrument is recalibrated ("resloped") and the previous interval of samples remeasured before continuing with the next interval. The extent of the "acceptable" drift depends on the kind of analysis but in soil and plant analysis usually does not exceed 5%. This procedure is very suitable for manual operation of measurements. When automatic sample changers are used, various options for recalibration and repeating intervals or whole batches are possible.
2. Linear correction or correction by interpolation
Here, too, standards are measured at intervals, usually together with a blank ("drift and wash") and possible changes are processed by the computer software which converts the past readings of the batch to the original calibration. Only in case of serious mishap are batches or intervals repeated. A disadvantage of this procedure is that drift is taken to be linear whereas this may not be so. Autoanalyzers, ICP and AAS with automatic sample changers often employ variants of this type of procedure.
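A minimal sketch of the linear (interpolation) correction, with hypothetical readings and a 10% upward drift:

```python
# Sketch of linear drift correction (hypothetical readings): a calibration
# standard is measured before and after a run of samples, and each sample
# reading is scaled back to the original calibration assuming the drift
# grew linearly with position in the run.

def drift_correct(readings, positions, std_start, std_end):
    """positions: fraction (0-1) of the way between the two standard
    measurements at which each sample was read."""
    corrected = []
    for r, p in zip(readings, positions):
        drift = 1.0 + (std_end / std_start - 1.0) * p   # linear in p
        corrected.append(r / drift)
    return corrected

# Standard reads 1.000 at the start and 1.100 at the end (10% upward
# drift); the observed sample readings are inflated accordingly.
observed = [0.205, 0.420, 0.645]
positions = [0.25, 0.50, 0.75]
corrected = drift_correct(observed, positions, 1.000, 1.100)
print([round(c, 3) for c in corrected])   # [0.2, 0.4, 0.6]
```

As the text notes, the weakness of this scheme is the assumption of linear drift; a sudden shift (e.g. by clogging) between the two standard measurements is corrected wrongly.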
At present, the development of instrument software is mushrooming. Many fancy new features with respect to resloping, correction of carry-over, post-batch dilution and repeating are being introduced by manufacturers. Running ahead of this, many laboratories have developed their own interface software programs meeting their individual demands.
A blank or blank determination is an analysis of a sample without the analyte or attribute, or an analysis without a sample, i.e. going through all steps of the procedure with the reagents only. The latter type is the most common as samples without the analyte or attribute are often not available or do not exist.
Another type of blank is the one used for calibration of instruments as discussed in the previous sections. Thus, we may have two types of blank within one analytical method or system:
 a blank for the whole method or system and
 a blank for analytical subprocedures (measurements) as part of the whole procedure or system.
For instance, in the cation exchange capacity (CEC) determination of soils with the percolation method, two method or system blanks are included in each batch: two percolation tubes with cotton wool or filter pulp and sand or celite, but without sample. For the determination of the index cation (NH_{4} by colorimetry or Na by flame emission spectroscopy) a blank is included in the determination of the calibration graph. If NH_{4} is determined by distillation and subsequent titration, a blank titration is carried out for correction of test sample readings.
The proper analysis of blanks is very important because:
1. In many analyses sample results are calculated by subtracting blank readings from sample readings.

2. Blank readings can be excellent monitors in quality control of reagents, analytical processes, and proficiency.
3. They can be used to estimate several types of method detection limits.
For blanks the same rule applies as for replicate analyses: the larger the number, the greater the confidence in the mean. The widely accepted rule in routine analysis is that each batch should include at least two blanks. For special studies where individual results are critical, more blanks per batch may be required (up to eight).
For quality control, Control Charts are made of blank readings identically to those of control samples. The between-batch variability of the blank is expressed by the standard deviation calculated from the Control Chart of the Mean of Blanks; the precision can be estimated from the Control Chart of the Range of Duplicates of Blanks. The construction and use of control charts are discussed in detail in 8.3. One of the main control rules of the control charts, for instance, prescribes that a blank value beyond the mean blank value plus 3× the standard deviation of this mean (i.e. beyond the Action Limit) must be rejected and the batch be repeated, possibly with fresh reagents.
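The action-limit rule for blanks can be sketched as follows; the charted blank values are hypothetical:

```python
# Sketch of the action-limit rule: a blank beyond the mean of the charted
# blanks plus 3 standard deviations triggers rejection of the batch.
# The charted history below is hypothetical.
import statistics

def blank_exceeds_action_limit(blank, charted_blanks):
    mean = statistics.mean(charted_blanks)
    s = statistics.stdev(charted_blanks)   # sample standard deviation
    return blank > mean + 3 * s

history = [0.020, 0.018, 0.022, 0.019, 0.021, 0.020, 0.018, 0.022]
print(blank_exceeds_action_limit(0.021, history))   # False: accept
print(blank_exceeds_action_limit(0.035, history))   # True: repeat batch
```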
In many laboratories, no control charts are made for blanks. Sometimes, analysts argue that 'there is never a problem with my blank, the reading is always close to zero'. Admittedly, some analyses are more prone to blank errors than others. This, however, is not a valid argument for not keeping control charts. They are made to monitor procedures and to alarm when these are out of control (shift) or tend to become out of control (drift). This can happen in any procedure in any laboratory at any time.
From the foregoing discussion it will be clear that signals of blank analyses generally are not zero. In fact, blanks may be found to be negative. This may point to an error in the procedure: e.g. for the zeroing of the instrument an incorrect or a contaminated solution was used or the calibration graph was not linear. It may also be due to the matrix of the solution (e.g. extractant), and is then often unavoidable. For convenience, some analysts practice "forcing the blank to zero" by adjusting the instrument. Some instruments even invite or compel analysts to do so. This is equivalent to subtracting the blank value from the values of the standards before plotting the calibration graph. From the standpoint of Quality Control this practice must be discouraged. If zeroing of the instrument is necessary, the use of pure water for this is preferred. However, such general considerations may be overruled by specific instrument or method instructions. This is becoming more and more common practice with modern sophisticated hi-tech instruments. Whatever the case, a decision on how to deal with blanks must be made for each procedure and laid down in the SOP concerned.
In environmental analysis and in the analysis of trace elements there is a tendency to measure ever lower contents of analytes accurately. Modern equipment offers excellent possibilities for this. For proper judgement (validation) and selection of a procedure or instrument it is important to have information about the lower limits at which analytes can be detected or determined with sufficient confidence. Several concepts and terms are used, e.g. detection limit, lower limit of detection (LLD), method detection limit (MDL). The latter applies to a whole method or system, whereas the two former apply to measurements as part of a method.
Note: In analytical chemistry, "lower limit of detection" is often confused with "sensitivity" (see 7.5.3).
Although various definitions can be found, the most widely accepted definition of the detection limit seems to be: 'the concentration of the analyte giving a signal equal to the blank plus 3× the standard deviation of the blank'. Because in the calculation of analytical results the value of the blank is subtracted (or the blank is forced to zero) the detection limit can be written as:
LLD, MDL = 3 × s_{bl} 
(7.11) 
At this limit it is 93% certain that the signal is not due to the blank but that the method has detected the presence of the analyte (this does not mean that below this limit the analyte is absent!).
Obviously, although generally accepted, this is an arbitrary limit and in some cases the 7% uncertainty may be too high (for 5% uncertainty the LLD =3.3 × s_{bl}). Moreover, the precision in that concentration range is often relatively low and the LLD must be regarded as a qualitative limit. For some purposes, therefore, a more elevated "limit of determination" or "limit of quantification" (LLQ) is defined as
LLQ = 2 × LLD = 6 × s_{bl} 
(7.12) 
or sometimes as
LLQ = 10 × s_{bl} 
(7.13) 
Thus, if one needs to know or report these limits of the analysis as quality characteristics, the mean of the blanks and the corresponding standard deviation must be determined (validation). The s_{bl} can be obtained by running a statistically sufficient number of blank determinations (usually a minimum of 10, and not excluding outliers). In fact, this is an assessment of the "noise" of a determination.
Note: Noise is defined as the 'difference between the maximum and minimum values of the signal in the absence of the analyte measured during two minutes' (or otherwise according to instrument instruction). The noise of several instrumental measurements can be displayed by using a recorder (e.g. FES, AAS, ICP, IR, GC, HPLC, XRFS). Although this is not often used to actually determine the detection limit, it is used to determine the signal-to-noise ratio (a validation parameter not discussed here) and is particularly useful to monitor noise in case of trouble shooting (e.g. suspected power fluctuations).
If the analysis concerns a one-batch exercise, 4 to 8 blanks are run in this batch. If it concerns an MDL as a validation characteristic of a test procedure used for multiple batches in the laboratory such as a routine analysis, the blank data are collected from different batches, e.g. the means of duplicates from the control charts.
For the determination of the LLD of measurements where a calibration graph is used, such replicate blank determinations are not necessary since the value of the blank as well as the standard deviation result directly from the regression analysis (see Section 7.2.3 and Example 2 below).
Examples
1. Determination of the Method Detection Limit (MDL) of a Kjeldahl-N determination in soils
Table 7-1 gives the data obtained for the blanks (means of duplicates) in 15 successive batches of a micro-Kjeldahl N determination in soil samples. Reported are the millilitres 0.01 M HCl necessary to titrate the ammonia distillate and the conversion to results in mg N by: reading × 0.01 × 14.
Table 7-1. Blank data of 15 batches of a Kjeldahl-N determination in soils for the calculation of the Method Detection Limit.
ml HCl    mg N
0.12      0.0161
0.16      0.0217
0.11      0.0154
0.15      0.0203
0.09      0.0126
0.14      0.0189
0.12      0.0161
0.17      0.0238
0.14      0.0189
0.20      0.0273
0.16      0.0217
0.22      0.0308
0.14      0.0189
0.11      0.0154
0.15      0.0203

Mean blank:  0.0199
s_bl:        0.0048
MDL = 3 × s_bl = 0.014 mg N
The MDL reported in this way is an absolute value. Results are usually reported as relative figures such as % or mg/kg (ppm). In the present case, if 1 g of sample is routinely used, then the MDL would be 0.014 mg/g or 14 mg/kg or 0.0014%.
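As a check, the statistics of Table 7-1 can be recomputed directly from the blank values in mg N:

```python
# Recomputing the MDL of the Kjeldahl-N example from the blank values
# (mg N) of Table 7-1.
import statistics

blanks_mg_n = [0.0161, 0.0217, 0.0154, 0.0203, 0.0126, 0.0189, 0.0161,
               0.0238, 0.0189, 0.0273, 0.0217, 0.0308, 0.0189, 0.0154,
               0.0203]

mean_blank = statistics.mean(blanks_mg_n)
s_bl = statistics.stdev(blanks_mg_n)   # sample standard deviation
mdl = 3 * s_bl                         # Eq. (7.11), absolute figure in mg N
mdl_mg_per_kg = mdl / 1.0 * 1000       # relative figure for a 1 g sample

print(round(mean_blank, 4), round(s_bl, 4), round(mdl, 3))
# 0.0199 0.0048 0.014
```

Halving the sample mass to 0.5 g doubles the divisor-based relative MDL, exactly as noted in the text.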
Note that if one would use only 0.5 g of sample (e.g. because of a high N content) the MDL as a relative figure is doubled!
When results are obtained below the MDL of this example they must be reported as: '< 14 mg/kg' or '< 0.0014%'. Reporting '0%' or '0.0%' may be acceptable for practical purposes, but may be interpreted as the element being absent, which is not justified.
Note 1. There are no strict rules for reporting figures below the LLD or LLQ. Most important is that data can be correctly interpreted and used. For this reason uncertainties (confidence limits) and detection limits should be known and reported to clients or users (if only upon request).

The advantage of using the "<" sign for values below the LLD or LLQ is that the value 0 (zero) and negative values can be avoided as they are usually either impossible or improbable. A disadvantage of the "<" sign is that it is a non-numerical character and not suitable in spreadsheet programs for further calculation and manipulation. In such cases the actually found value will be required, but then the inherent confidence restrictions should be known to the user.
Note 2. Because a normal distribution of data is assumed it can statistically be expected that zero and negative values for analytical results occur when blank values are subtracted from test values equal to or lower than the blank. Clearly, only in a few cases are negative values possible (e.g. for adsorption) but for concentrations such values should normally not be reported. Exceptions to this rule are studies involving surveys of attributes or effects. Then it might be necessary to report the actually obtained low results as otherwise the mean of the survey would be biased.
2. Lower Limit of Detection derived from a calibration graph
We use the calibration graph of Figure 7-1. Then, noting that s_bl = s_x ≈ 0.029 (see Section 7.2.3) and using Equation (7.11) we obtain: LLD = 3 × 0.029 ≈ 0.09 mg/L.
It is noteworthy that "forcing the blank to zero" does not affect the Lower Limit of Detection. Although a (= y_bl, see Fig. 7-1) may become zero, the uncertainty s_y of the calibration graph, and thus s_x and s_bl, is not changed by this: the only change is that the "forced" calibration line has shifted and now runs through the origin (parallel to the "original" line).
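This can be verified numerically with a small sketch (hypothetical data): subtracting the intercept a from all standard readings leaves the slope and the standard error of the y-estimate unchanged:

```python
# Numerical check that "forcing the blank to zero" (subtracting a from all
# standard readings) leaves slope and standard error untouched. Data are
# hypothetical.
import math

def fit_with_error(xs, ys):
    """Return slope b, intercept a and standard error of the y-estimate."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    s_y = math.sqrt(sum((y - (b * x + a)) ** 2
                        for x, y in zip(xs, ys)) / (n - 2))
    return b, a, s_y

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.10, 1.00, 2.10, 2.90]
b1, a1, s1 = fit_with_error(xs, ys)
b2, a2, s2 = fit_with_error(xs, [y - a1 for y in ys])   # forced blank
print(abs(b1 - b2) < 1e-9, abs(a2) < 1e-9, abs(s1 - s2) < 1e-9)
# True True True
```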
7.4.1 Certified reference material (CRM)
7.4.2 Reference material (RM)
7.4.3 Control sample
7.4.4 Test sample
7.4.5 Spiked sample
7.4.6 Blind sample
7.4.7 Sequence-control sample
Although several terms for different sample types have already freely been used in the previous sections, it seems appropriate to define the various types before the major Quality Control operations are discussed.
A primary reference material or substance, accompanied by a certificate, one or more of whose property values are accurately determined by a number of selected laboratories (with a stated method), and for which each certified value is accompanied by an uncertainty at a stated level of confidence.
These are usually very expensive materials and, particularly for soils, hard to come by or not available. For availability, a computerized databank containing information on about 10,000 reference materials can be consulted (COMAR, see Appendix 4).
A secondary reference material or substance, one or more of whose property values are accurately determined by a number of laboratories (with a stated method), and which values are accompanied by an uncertainty at a stated level of confidence. The origin of the material and the data should be traceable.
In soil and plant analysis RMs are very important since for many analytes and attributes certified reference materials (CRMs) are not (yet) available. For certain properties a "true" value cannot even be established as the result is always method-dependent, e.g. CEC, and particle-size distribution of soil material. A very useful source of RMs is interlaboratory (round-robin) sample and data exchange programmes. The material sent around is analyzed by a number of laboratories and the resulting data offer an excellent reference base, particularly if somehow there is a link with a primary reference material. Since this is often not the case, the data must be handled with care: it may well be that the mean or median value of 50 or more laboratories is "wrong" (e.g. because most use a method with an inadequate digestion step).
In some cases different levels of analyte may be imitated by spiking a sample with the analyte (see 7.4.5). However, this is certainly not always possible (e.g. CEC, exchangeable cations, pH, particlesize distribution).
An inhouse reference sample for which one or more property values have been established by the user laboratory, possibly in collaboration with other laboratories.
This is the material a laboratory needs to prepare for second-line (internal) control in each batch and the obtained results of which are plotted on Control Charts. The sample should be sufficiently stable and homogeneous for the properties concerned. The preparation of control samples is discussed in Chapter 8.
The material to be analyzed, the "unknown".
A test material with a known addition of analyte.
The sample is analyzed with and without the spike to test recovery (see 7.5.6). It should be a realistic surrogate with respect to matrix and concentration. The mixture should be well homogenized.
The requirement "realistic surrogate" is the main problem with spikes. Often the analyte cannot be integrated in the sample in the same manner as the original analyte, and then treatments such as digestion or extraction may not necessarily reflect the behaviour of real samples.
A sample with known content of the analyte. This sample is inserted by the Head of Laboratory or the Quality Officer in batches at places and times unknown to the analyst. The frequency may vary; as an indication, one blind sample in every 10 batches may be used.
Various types of sample material may serve as blind samples, such as control samples or sufficiently large leftovers of test samples (analyzed several times). In the case of water analysis, a solution of the pure analyte, or a combination of analytes, may do. It is essential that the analyst be aware of the possible presence of a blind sample but not recognize the material as such.
Insertion of blind samples requires some attention to administration and camouflage. The protocol will depend on the organization of the sample and data stream in the laboratory.
A sample with an extreme content of the analyte (but falling within the working range of the method). It is inserted at random in a batch to verify the correct order of samples. This is particularly useful for long batches in automated analyses. Very effective is the combination of two such samples: one with a high and one with a low analyte content.
7.5.1 Trueness (accuracy), bias
7.5.2 Precision
7.5.3 Sensitivity
7.5.4 Working range
7.5.5 Selectivity and specificity
7.5.6 Recovery
7.5.7 Ruggedness, robustness
7.5.8 Interferences
7.5.9 Practicability
7.5.10 Validation report
Validation is the process of determining the performance characteristics of a method/procedure or process. It is a prerequisite for judgement of the suitability of produced analytical data for the intended use. This implies that a method may be valid in one situation and invalid in another. Consequently, the requirements for data may, or rather must, decide which method is to be used. When this is ill-considered, the analysis can be unnecessarily accurate (and expensive), inadequate if the method is less accurate than required, or useless if the accuracy is unknown.
Two main types of validation may be distinguished:
1. Validation of standard procedures. The validation of new or existing methods or procedures intended to be used in many laboratories, including procedures (to be) accepted by national or international standardization organizations.
2. Validation of own procedures. The in-house validation of methods or procedures by individual user laboratories.
The first involves an interlaboratory programme of testing the method by a number (≥ 8) of selected renowned laboratories according to a protocol issued to all participants. The second involves an in-house testing of a procedure to establish its performance characteristics or, more specifically, its suitability for a purpose. Since the former is a specialist task, usually (but not exclusively) performed by standardization organizations, the present discussion will be restricted to the second type of validation, which concerns every laboratory.
Validation is not only relevant when non-standard procedures are used but just as well when validated standard procedures are used (to what extent does the laboratory meet the standard validation?) and even more so when variants of standard procedures are introduced. Many laboratories use their own versions of well-established methods or change a procedure for reasons of efficiency or convenience.
Fundamentally, any change in a procedure (e.g. sample size, liquid:solid ratio in extractions, shaking time) may affect the performance characteristics and should be validated. For instance, in Section 7.3.2 we noticed that halving the sample size results in doubling the Lower Limit of Detection.
Thus, inherent in generating quality analytical data is supporting them with a quantification of the parameters of confidence. As such, this is part of quality control.
To specify the performance characteristics of a procedure, a selection (so not necessarily all) of the following basic parameters is determined:
- Trueness (accuracy), bias
- Precision
- Recovery
- Sensitivity
- Specificity and selectivity
- Working range (including MDL)
- Interferences
- Ruggedness or robustness
- Practicability
Before validation can be carried out it is essential that the detailed procedure is available as a SOP.
One of the first characteristics one would like to know about a method is whether the results reflect the "true" value for the analyte or property. And, if not, can the (un)trueness or bias be quantified and possibly corrected for?
There are several ways to find this out but essentially they are all based on the same principle which is the use of an outside reference, directly or indirectly.
The direct method is by carrying out replicate analyses (n ≥ 10) with the method on a (certified) reference sample with a known content of the analyte.
The indirect method is by comparing the results of the method with those of a reference method (or otherwise generally accepted method) both applied to the same sample(s). Another indirect way to verify bias is by having (some) samples analyzed by another laboratory and by participation in interlaboratory exchange programmes. This will be discussed in Chapter 9.
It should be noted that the trueness of an analytical result may be sensitive to varying conditions (level of analyte, matrix, extract, temperature, etc.). If a method is applied to a wide range of materials, for proper validation different samples at different levels of analyte should be used.
Statistical comparison of results can be done in several ways some of which were described in Section 6.4.
Numerically, the trueness (often less appropriately referred to as accuracy) can be expressed using the equation:

trueness (%) = (x̄ / μ) × 100%
(7.14)

where
x̄ = mean of test results obtained for reference sample
μ = "true" value given for reference sample
Thus, the best trueness we can get is 100%.
Bias, more commonly used than trueness, can be expressed as an absolute value by:
bias = x̄ − μ
(7.15)

or as a relative value by:

bias (%) = ((x̄ − μ) / μ) × 100%
(7.16)
Thus, the best bias we can get is 0 (in units of the analyte) or 0 % respectively.
Example
The Cu content of a reference sample is 34.0 ± 2.7 mg/kg (2.7 = s, n=12). The results of 15 replicates with the laboratory's own method are the following: 38.0; 34.6; 29.1; 27.8; 40.4; 33.1; 40.9; 28.5; 36.1; 26.8; 30.6; 24.3; 31.6; 22.3; 29.9 mg/kg.
With Equation (6.1) we calculate x̄ = 31.6. Using Equation (7.14), the trueness is (31.6/34.0) × 100% = 93%. Using Equation (7.16), the bias is (31.6 − 34.0) × 100% / 34.0 = −7%.
These calculations suggest a systematic error. To see if this error is statistically significant, a t-test can be done. For this, with Equation (6.2) we first calculate s = 5.6. The F-test (see 6.4.2 and 7.5.2) indicates a significant difference in standard deviation and we have to use the Cochran variant of the t-test (see 6.4.3). Using Equation (6.16) we find t_{cal} = 1.46, and with Eq. (6.17) the critical value t_{tab}^{*} = 2.16, indicating that the results obtained by the laboratory are not significantly different from the reference value (with 95% confidence).
Although a laboratory could be satisfied with this result, the fact remains that the mean of the test results is not equal to the "true" value but somewhat lower. As discussed in Sections 6.4.1 and 6.4.3, the one-sided t-test can be used to test if this result is statistically on one side (lower or higher) of the reference value. In the present case the one-sided critical value is 1.77 (see Appendix 1), which also exceeds the calculated value of 1.46, indicating that the laboratory mean is not systematically lower than the reference value (with 95% confidence).
At first sight a bias of −7% does not seem insignificant. In this case, however, it is the wide spread of the laboratory's own data that causes the uncertainty about this. If the standard deviation of the results had been the same as that of the reference sample then, using Equations (6.13) and (6.14), t_{cal} would be 2.58; with t_{tab} = 2.06 (App. 1) the difference would have been significant according to the two-sided t-test, and with t_{tab} = 1.71 significantly lower according to the one-sided t-test (at 95% confidence).
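The calculations in this example can be checked with a short script; the replicate data and reference values are those given in the text above (a minimal sketch using only the Python standard library):

```python
import math
from statistics import mean, stdev

# Replicate Cu results (mg/kg) from the example above
results = [38.0, 34.6, 29.1, 27.8, 40.4, 33.1, 40.9, 28.5,
           36.1, 26.8, 30.6, 24.3, 31.6, 22.3, 29.9]
mu, s_ref, n_ref = 34.0, 2.7, 12      # reference sample: "true" value, its s and n

x_bar = mean(results)                 # 31.6
s = stdev(results)                    # about 5.6

trueness = x_bar / mu * 100           # Eq. 7.14: about 93%
bias_pct = (x_bar - mu) / mu * 100    # Eq. 7.16: about -7%

# t-statistic for two means with unequal variances (cf. Eq. 6.16)
t_cal = abs(x_bar - mu) / math.sqrt(s**2 / len(results) + s_ref**2 / n_ref)
# t_cal is below the critical value 2.16, so no significant bias is shown
```

The small difference from the text's t_{cal} = 1.46 arises only because the text rounds s to 5.6 before use.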
7.5.2.1 Reproducibility
7.5.2.2 Repeatability
7.5.2.3 Within-laboratory reproducibility
Replicate analyses performed on a reference sample, yielding a mean to determine trueness or bias as described above, also yield a standard deviation of the mean as a measure of precision. However, for precision alone, control samples and even test samples can also be used. The statistical test for comparison is done with the F-test, which compares the obtained standard deviation with the standard deviation given for the reference sample (in fact, the variances are compared: Eq. 6.11).
Numerically, precision is either expressed by the absolute value of the standard deviation or, more universally, by the relative standard deviation (RSD) or coefficient of variation (CV) (see Equations 6.5 and 6.6):

CV or RSD (%) = (s / x̄) × 100%
(7.17)

where
x̄ = mean of test results obtained for reference sample
s = standard deviation of the test results
If the attained precision is worse than that given for the reference sample, it can still be decided that the performance is acceptable for the purpose (which has to be reported as such); otherwise it has to be investigated how the performance can be improved.
Like the bias, precision will not necessarily be the same at different concentrations of the analyte or in different kinds of materials. Comparison of precision at different levels of analyte can be done with the F-test: if the variances at a few different levels are similar, then precision is assumed to be constant over the range.
Example
The same example as above for bias is used. The standard deviation of the laboratory is 5.6 mg/kg which, according to Eq. (7.17), corresponds with a precision of (5.6/31.6)×100% = 18%. (The precision of the reference sample can similarly be calculated as about 8%).
According to Equation (6.11) the calculated F-value is:

F = 5.6² / 2.7² = 4.3

The critical value is 2.47 (App. 2, two-sided, df_{1} = 14, df_{2} = 11); hence the null hypothesis that the two standard deviations belong to the same population is rejected: there is a significant difference in precision (at the 95% confidence level).
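The same F-test can be reproduced in a few lines; the critical value 2.47 is taken from the text (App. 2), not computed here:

```python
s_lab, s_ref = 5.6, 2.7          # laboratory and reference standard deviations
F = (s_lab / s_ref) ** 2         # Eq. 6.11 (larger variance on top): about 4.3
F_crit = 2.47                    # two-sided, df1 = 14, df2 = 11 (from App. 2)
precision_differs = F > F_crit   # True: reject the null hypothesis
```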
Types of precision
The above description of precision leaves some uncertainty about the actual execution of its determination. Because precision in particular is sensitive to the way it is determined, some specific types of precision are distinguished; it should therefore always be reported which type is involved.
The measure of agreement between results obtained with the same method on identical test or reference material under different conditions (execution by different persons, in different laboratories, with different equipment and at different times). The measure of reproducibility R is the standard deviation of these results s_{R}, and for a not too small number of data (n ≥ 8) R is defined by (with 95% confidence):
R = 2.8 × s_{R} 
(7.18) 
(where 2.8 = 2√2 and is derived from the normal or Gaussian distribution; ISO 5725).
Thus, reproducibility is a measure of the spread of results when a sample is analyzed by different laboratories. If a method is sensitive to different ways of execution or conditions (low robustness, see 7.5.7), then the reproducibility will reflect this.
This parameter can obviously not be verified in daily practice. For that purpose the next two parameters are used (repeatability and withinlaboratory reproducibility).
The measure of agreement between results obtained with the same method on identical test or reference material under the same conditions (job done by one person, in the same laboratory, with the same equipment, at the same time or with only a short time interval). Thus, this is the best precision a laboratory can obtain: the withinbatch precision.
The measure for the repeatability r is the standard deviation of these results s_{r}, and for a not too small number of data (n ≥ 10) r is defined by (with 95% confidence):
r = 2.8 × s_{r} 
(7.19) 
The measure of agreement between results obtained with the same method on identical test material under different conditions (execution by different persons, with the same or different equipment, in the same laboratory, at different times). This is a more realistic type of precision for a method over a longer span of time when conditions are more variable than defined for repeatability.
The measure is the standard deviation of these results s_{L} (also called betweenbatch precision). The withinlaboratory reproducibility R_{L} is calculated by:
R_{L} = 2.8 × s_{L} 
(7.20) 
The betweenbatch precision can be estimated in three different ways:
1. As the standard deviation of a large number (n ≥ 50) of duplicate determinations carried out by two analysts:

s_{L} = √(Σs_{i}² / k) = √(Σd_{i}² / 2k)
(7.21)

where
s_{i} = standard deviation of each pair of duplicates (= d_{i}/√2)
k = number of pairs of duplicates
d_{i} = difference between duplicates within each pair
2. Empirically as 1.6 × s_{r}. Then:
R_{L} = 2.8 × 1.6 × s_{r}
or:
R_{L} = 1.6 × r
(7.22) 
where r is the repeatability as defined above.
3. The most practical and realistic expression of the withinlaboratory reproducibility is the one based on the standard deviation obtained for control samples during routine work. The advantage is that no extra work is involved: control samples are analyzed in each batch, and the withinlaboratory standard deviation is calculated each time a control chart is completed (or sooner if desired, say after 10 batches). The calculation is here:
R_{L} = 2.8 × s_{cc} 
(7.23) 
where s_{cc} is the standard deviation obtained from a Control Chart (see 8.3.2).
Clearly, the above three R_{L} values are not identical and thus, whenever the withinlaboratory reproducibility is reported, the way by which it is obtained should always be stated.
Note: Naturally, instead of reporting the derived validation parameters for precision R, r, or R_{L}, one may prefer to report their primary measure: the standard deviation concerned.
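As an illustration of the first estimation method (Eq. 7.21), the duplicate-pair calculation can be sketched as follows; the duplicate pairs shown are hypothetical:

```python
import math

def within_lab_reproducibility(pairs):
    """Estimate s_L from k duplicate pairs (Eq. 7.21) and return
    (s_L, R_L), with R_L = 2.8 * s_L (Eq. 7.20)."""
    k = len(pairs)
    s_L = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / (2 * k))
    return s_L, 2.8 * s_L

# hypothetical duplicate determinations carried out by two analysts
pairs = [(10.0, 10.4), (9.6, 10.0), (10.2, 9.8), (9.9, 10.3)]
s_L, R_L = within_lab_reproducibility(pairs)
```

In practice k should be large (n ≥ 50 pairs); four pairs are used here only to keep the example short.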
This is a measure of the response y of the instrument, or of a whole method, to the concentration C of the analyte or property, e.g. the slope of the analytical calibration graph (see Section 7.2.2). It is the value required to quantify the analyte on the basis of the analytical signal. The sensitivity for the analyte in the final sample extract is not necessarily equal to the sensitivity for the analyte in a simple standard solution: matrix effects may cause improper calibration of the measuring step of the analytical method. As observed earlier for calibration graphs, the sensitivity may not be constant over a long range. It usually decreases at higher concentrations through saturation of the signal, which limits the working range (see Section 7.5.4). Some of the most typical situations are exemplified in Figure 7-2.
Fig. 7-2. Examples of some typical response graphs: 1. constant sensitivity; 2. sensitivity constant over the lower range, then decreasing; 3. sensitivity decreasing over the whole range. (See also 7.5.4.)
In general, at every point of the response graph the sensitivity can be expressed by:

S = dy / dC
(7.24)
The dimension of S depends on the dimensions of y and C. In atomic absorption, for example, y is expressed in absorbance units and C in mg/L. For pH and ion-selective electrodes the response of the electrode is expressed in mV and the concentration in mg/L or moles (plotted on a log scale). Often, for convenience, the signal is converted and amplified to a direct reading in arbitrary units, e.g. concentration. However, for proper expression of the sensitivity, this derived response should be converted back to the direct response. In practice, for instance, this is simply done by making a calibration graph in the absorbance mode of the instrument, as exemplified in Figure 7-1, where slope b is the sensitivity of the P measurement on the spectrophotometer. If measured in the absorption (or transmission) mode, plotting should be done with a logarithmic y-axis.
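Where the response is linear, the sensitivity is simply the slope of the calibration graph, which can be estimated by least squares; the absorbance readings below are hypothetical values for illustration:

```python
def slope(C, y):
    # least-squares slope: S = sum((C-Cbar)(y-ybar)) / sum((C-Cbar)^2)
    n = len(C)
    c_bar, y_bar = sum(C) / n, sum(y) / n
    num = sum((c - c_bar) * (v - y_bar) for c, v in zip(C, y))
    den = sum((c - c_bar) ** 2 for c in C)
    return num / den

# hypothetical absorbance readings for P standards (mg/L)
C = [0.0, 0.2, 0.4, 0.6, 0.8]
y = [0.000, 0.101, 0.198, 0.302, 0.399]
S = slope(C, y)   # sensitivity in absorbance units per mg/L, about 0.50
```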
For most analytical methods the working range is known from previous experience. When introducing a new method or measuring technique this range may have to be determined. This can be done during validation by attempting to span a (too) wide range, for instance by using several sample sizes, liquid:sample ratios, or by spiking samples (see 7.5.6, Recovery). This practice is particularly important for determining the upper limit of the working range (the lower limit of a working range corresponds with the Method Detection Limit and was discussed in Section 7.3.2). The upper limit is often determined by such factors as saturation of the extract (e.g. the "free" iron or gypsum determinations) or by depletion of a solution in the case of adsorption procedures (e.g. phosphate adsorption; cobaltihexamine or silver thiourea adsorption in single-extraction CEC methods). In such cases the liquid:sample ratio has to be adapted.
To determine the measuring range of solutions the following procedure can be applied:
- Prepare a standard solution of the analyte in the relevant matrix (e.g. extractant) at a concentration beyond the highest expected concentration. Measure this solution and determine the instrument response.
- Dilute this standard solution 10× with the matrix solution and measure again.
- Repeat dilution and measuring until the instrument gives no response.
- Plot the response vs. the concentration.
- Estimate the useful part of the response graph.
(If the dilution steps are too large to obtain a reliable graph, they need to be reduced, e.g. 5×).
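The dilution procedure above lends itself to a simple numerical check: compare each incremental slope with the initial sensitivity and stop where it falls off. The 80% threshold and the data points below are arbitrary choices for illustration only:

```python
def upper_limit_of_linearity(C, y, tol=0.8):
    """Highest concentration up to which the incremental sensitivity
    stays within `tol` of the initial slope (a rough sketch)."""
    s0 = (y[1] - y[0]) / (C[1] - C[0])   # initial sensitivity
    limit = C[1]
    for i in range(1, len(C) - 1):
        s = (y[i + 1] - y[i]) / (C[i + 1] - C[i])
        if s < tol * s0:
            break
        limit = C[i + 1]
    return limit

# hypothetical response that flattens above concentration 8 (cf. graph 2)
C = [0, 2, 4, 6, 8, 10, 12]
y = [0.0, 0.20, 0.40, 0.60, 0.78, 0.88, 0.93]
```

Here the estimated useful range ends at concentration 8, mirroring graph 2 of the figure.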
In Figure 7-2 the useful parts of graphs 1 and 2 are obviously the linear parts (and for graph 2 perhaps up to concentration 8 if necessary). Sometimes a built-in curve corrector for the linearization of curved calibration plots can extend the range of application (e.g. in AAS). Graph 3 has no linear part but must, and can, still be used. Logarithmic plotting may be considered, and in some cases an equation may be calculated by non-linear (polynomial) regression. It has to be decided on practical grounds what concentration can be accepted before the decreasing sensitivity renders the method inappropriate (with the knowledge that flat or even downward-bending ranges are useless in any case).
The measurement of an analyte may be disturbed by the presence of other components. The measurement is then nonspecific for the analyte under investigation. An analytical method is "fully specific" when it gives an analytical signal exclusively for one particular component, but is "dead" for all other components in the sample, e.g. when a reagent forms a coloured complex with only one analyte. A method is "fully selective" when it produces correct analytical results for various components of a mixture without any mutual interaction of the components, e.g. when a reagent forms several coloured complexes with components in the matrix but with a different colour for each component. A selective method is composed of a series of specific measurements.
Mutual influences are common in analytical techniques but can often easily be overcome. An example is ionization interference reducing the specificity in flame spectrometric techniques (FES, AAS). The selectivity is no problem as the useful spectral lines can be selected exactly with a monochromator or filters. The mutual interference can be suppressed by adding an excess of an easily ionizable element, such as cesium, which maintains the electron concentration in the flame constant. In chromatographic techniques (GC, HPLC) specificity is sometimes a problem in the analysis of complex compounds.
In the validation report, selectivity and specificity are usually described rather than quantitatively expressed.
To determine the effectiveness of a method (and also of the working range), recovery experiments can be carried out. Recovery can be defined as the 'fraction of the analyte determined after addition of a known amount of the analyte to a sample'. In practice, control samples are most commonly used for spiking. The sample as well as the spikes are analyzed at least 10 times, the results averaged and the relative standard deviation (RSD) calculated. For in-house validation the repeatability (replicates in one batch, see 7.5.2.2) is determined, whereas for quality control the within-laboratory reproducibility (replicates in different batches, see 7.5.2.3) is determined and the data recorded on Control Charts. The concentration level of the spikes depends on the purpose: for routine control work the level(s) will largely correspond with those of the test samples (recoveries at different levels may differ); a concentration midway along the working range is a convenient choice. For the determination of a working range a wide range may be necessary, at least to start with (see 7.5.4). An example is the addition of ammonium sulphate in the Kjeldahl nitrogen determination. Recovery tests may reveal a significant bias in the method used and may prompt a correction factor to be applied to the analytical results.
The recovery is calculated with:
recovery (%) = ((x̄_{s} − x̄) / x_{add}) × 100%
(7.25)

where
x̄_{s} = mean result of spiked samples
x̄ = mean result of unspiked samples
x_{add} = amount of added analyte
If a blank (sample) is used for spiking then the mean result of the unspiked sample will generally be close to zero. In fact, such replicate analyses could be used to determine or verify the method detection limit (MDL, see 7.3.2).
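Equation (7.25) in code form; the means used below are hypothetical:

```python
def recovery_percent(x_spiked, x_unspiked, x_added):
    # Eq. 7.25: recovery (%) = (mean spiked - mean unspiked) / amount added * 100
    return (x_spiked - x_unspiked) / x_added * 100

# hypothetical means: spiked 14.7, unspiked 10.2, spike of 5.0 (same units)
r = recovery_percent(14.7, 10.2, 5.0)   # about 90%
```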
As has been mentioned before (Section 7.4.5) the recovery obtained with a spike may not be the same as that obtained with real samples since the analyte may not be integrated in the spiked sample in the same manner as in real samples. Also, the form of the analyte with which the spike is made may present a problem as different compounds and grain sizes representing the analyte may behave differently in an analysis.
An analytical method is rugged or robust if results are not (very) sensitive to variations in the experimental conditions. Such conditions can be temperature, extraction or shaking time, shaking technique, pH, purity of reagents, moisture content of the sample, sample size, etc. Usually, when a new method is proposed, the ruggedness is first tested by the initiating laboratory and subsequently in an interlaboratory trial. The ruggedness test is conveniently done with the so-called "Youden and Steiner partial factorial design", in which seven factors can be varied and analyzed in only eight replicate analyses. This efficient technique can also be used for within-laboratory validation. As an example the ammonium acetate CEC determination of soil will be taken. The seven factors could for instance be:
A: With (+) and without (-) addition of 125 mg CaCO_{3} to the sample (corresponding with 5% CaCO_{3} content)
B: Concentration of saturating solution: 1 M (+) and 0.5 M (-) NH_{4}OAc
C: Extraction time: 4 hours (-) and 8 hours (+)
D: Admixture of sea-sand (or celite): with (+) and without (-) 1 teaspoon of sand
E: Washing procedure: 2× (-) or 3× (+) with ethanol 80%
F: Concentration of washing ethanol: 70% (-) or 80% (+)
G: Purity of NH_{4}OAc: technical grade (-) and analytical grade (+)
The matrix of the design is shown in Table 7-2. The eight subsamples are analyzed basically according to the SOP of the method. The variations in the SOP are indicated by the + and - signs denoting the high or low level, presence or absence of a factor, or otherwise stated conditions to be investigated. The eight obtained analytical results are Y_{i}. Thus, sample (experiment) no. 1 receives all treatments A to G indicated with (+), sample no. 2 receives treatments A, B and D indicated by (+) and C, E, F and G indicated by (-), etc.
Table 7-2. The partial factorial design (seven factors) for testing ruggedness of an analytical method

Experiment   A   B   C   D   E   F   G   Results
     1       +   +   +   +   +   +   +   Y_{1}
     2       +   +   -   +   -   -   -   Y_{2}
     3       +   -   +   -   +   -   -   Y_{3}
     4       +   -   -   -   -   +   +   Y_{4}
     5       -   +   +   -   -   +   -   Y_{5}
     6       -   +   -   -   +   -   +   Y_{6}
     7       -   -   +   +   -   -   +   Y_{7}
     8       -   -   -   +   +   +   -   Y_{8}
The absolute effect (bias) of each factor A to G can be calculated as follows:
Effect = (ΣY_{A+} − ΣY_{A−}) / 4
(7.26)

where
ΣY_{A+} = sum of results Y_{i} where factor A has + sign (i.e. Y_{1} + Y_{2} + Y_{3} + Y_{4}; n = 4)
ΣY_{A−} = sum of results Y_{i} where factor A has − sign (i.e. Y_{5} + Y_{6} + Y_{7} + Y_{8}; n = 4)
The test for significance of the effect can be done in two ways:
1. With a t-test (6.4.3), using in principle the table with "two-sided" critical t-values (App. 1, n = 4). When an effect in one direction is clearly to be expected, the one-sided test is applicable.
2. By checking if the effect exceeds the precision of the original procedure (i.e. if the effect exceeds the noise of the procedure). Most realistic and practical in this case would be to use s_{cc}, the within-laboratory standard deviation taken from a control chart (see Sections 7.5.2.3 and 8.3.2). The standard deviation of the mean of four measurements can be taken as s_{cc}/√4 = s_{cc}/2 (see 6.3.4), and the standard deviation of the difference between two such means (i.e. of the effect calculated with Eq. 7.26) as √(s_{cc}²/4 + s_{cc}²/4) = s_{cc}/√2 ≈ 0.71 × s_{cc}. The effect of a factor can be considered significant if it exceeds 2× this standard deviation, i.e. 2 × 0.71 × s_{cc} ≈ 1.4 × s_{cc}.
Therefore, the effect is significant when:
Effect > 1.4 × s_{cc}
(7.27) 
where s_{cc} is the standard deviation of the original procedure taken from the last complete control chart.
Note. Obviously, when this standard deviation is not available, such as in the case of a new method, another type of precision has to be used, preferably the within-laboratory reproducibility (see 7.5.2).
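The effect calculation of Eq. (7.26) and the significance check of Eq. (7.27) can be sketched for the design of Table 7-2; the CEC results and the control-chart standard deviation below are invented for illustration:

```python
# Youden & Steiner partial factorial of Table 7-2: one string of +/- levels
# per experiment, columns in factor order A..G
DESIGN = [
    "+++++++",  # experiment 1
    "++-+---",  # experiment 2
    "+-+-+--",  # experiment 3
    "+----++",  # experiment 4
    "-++--+-",  # experiment 5
    "-+--+-+",  # experiment 6
    "--++--+",  # experiment 7
    "---+++-",  # experiment 8
]

def factor_effects(Y):
    """Absolute effect of each factor (Eq. 7.26):
    (sum of Y where the factor is '+') minus (sum where '-'), divided by 4."""
    effects = {}
    for j, factor in enumerate("ABCDEFG"):
        plus = sum(y for row, y in zip(DESIGN, Y) if row[j] == "+")
        minus = sum(y for row, y in zip(DESIGN, Y) if row[j] == "-")
        effects[factor] = (plus - minus) / 4
    return effects

# hypothetical CEC results (cmol/kg) for the eight experiments
Y = [24.1, 23.8, 24.0, 23.9, 24.2, 23.7, 24.1, 23.8]
s_cc = 0.3   # assumed control-chart standard deviation
effects = factor_effects(Y)
significant = {f: abs(e) > 1.4 * s_cc for f, e in effects.items()}  # Eq. 7.27
```

With these invented data the largest effect (factor C, 0.3 cmol/kg) stays below 1.4 × s_{cc} = 0.42, so no factor would be flagged.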
It is not always possible or desirable to vary seven factors. However, the discussed partial factorial design does not allow a reduction of factors. At most, one (imaginary) factor can be considered in advance to have a zero effect (e.g. the position of the moon). In that case, the design is the same as given in Table 72 but omitting factor G.
For studying only three factors a design is also available. This is given in Table 73.
Table 7-3. The partial factorial design (three factors) for testing ruggedness of an analytical method

Experiment   A   B   C   Results
     1       +   +   +   Y_{1}
     2       -   +   -   Y_{2}
     3       +   -   +   Y_{3}
     4       -   -   -   Y_{4}
The absolute effect of the factors A, B, and C can be calculated as follows:
Effect = (ΣY_{A+} − ΣY_{A−}) / 2
(7.28)

where
ΣY_{A+} = sum of results Y_{i} where factor A has + sign (i.e. Y_{1} + Y_{3}; n = 2)
ΣY_{A−} = sum of results Y_{i} where factor A has − sign (i.e. Y_{2} + Y_{4}; n = 2)
The test for significance of the effect can be done similarly as described above for the sevenfactor design, with the difference that here n = 2.
If the relative effect has to be calculated (for instance for use as a correction factor) this must be done relative to the result of the original factor. Thus, in the above example of the CEC determination, if one is interested in the effect of reducing the concentration of the saturating solution (Factor B), the "reference" values are those obtained with the 1 M solution (denoted with + in column B) and the relative effect can be calculated with:
Effect (%) = ((ΣY_{B+} − ΣY_{B−}) / ΣY_{B+}) × 100%
(7.29)
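The relative effect of Eq. (7.29) in code; the result lists below are hypothetical:

```python
def relative_effect_percent(y_plus, y_minus):
    # Eq. 7.29: effect relative to the reference ('+') level, in percent
    return (sum(y_plus) - sum(y_minus)) / sum(y_plus) * 100

# hypothetical CEC results at 1 M (+) and 0.5 M (-) saturating solution
rel = relative_effect_percent([24.0, 24.2, 23.8, 24.0],
                              [23.0, 23.2, 22.8, 23.0])   # about 4.2%
```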
The confidence of the results of partial factorial experiments can be increased by running duplicates or triplicates as discussed in Section 6.3.4. This is particularly useful here since possible outliers may erroneously be interpreted as a "strong effect".
Often a laboratory wants to check the influence of one factor only. Temperature is a factor which is particularly difficult to control in some laboratories, or sometimes needlessly controlled at high cost simply because it is prescribed in the original method (but perhaps never properly validated). The very recently published standard procedure for determining the particle-size distribution (ISO 11277) has not been validated in an interlaboratory trial. The procedure prescribes the use of an end-over-end shaker for dispersion. If up to now a reciprocating shaker has been used and the laboratory decides to adopt the end-over-end shaker, then in-house validation is indicated and a comparison between the two shaking techniques must be made and documented. If it is decided, after all, to continue with the reciprocating shaking technique (e.g. for practical reasons), then the laboratory must be able to show the influence of this step to users of the data. Such validation must include all soil types to which the method is applied.
The effect of a single factor can simply be determined by conducting a number of replicate analyses (n ≥ 10) with and without the factor, or at two levels of the factor, and comparing the results with the F-test and t-test (see 6.4). Such a single effect may thus be expressed in terms of bias and precision.
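For a single factor this comparison amounts to an F-test on the variances and a t-test on the means; a pooled-variance sketch of the approach of Section 6.4 (the replicate data are hypothetical, and fewer than the recommended n ≥ 10 to keep the example short):

```python
import math
from statistics import mean, stdev

def compare_single_factor(a, b):
    """F-statistic on the variances and pooled two-sample t-statistic
    on the means of replicates with (a) and without (b) the factor."""
    sa, sb = stdev(a), stdev(b)
    F = max(sa, sb) ** 2 / min(sa, sb) ** 2        # larger variance on top
    na, nb = len(a), len(b)
    sp = math.sqrt(((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2))
    t = abs(mean(a) - mean(b)) / (sp * math.sqrt(1 / na + 1 / nb))
    return F, t

# hypothetical replicates at two levels of a factor
a = [10.0, 10.2, 9.8, 10.1, 9.9]
b = [10.6, 10.8, 10.4, 10.7, 10.5]
F, t = compare_single_factor(a, b)
```

The F- and t-values obtained are then compared with the tabulated critical values (App. 1 and 2) as usual.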
Many analytical methods are to a greater or lesser extent susceptible to interferences of various kinds. Proper validation should include documentation of such influences. Most prominent are matrix effects, which may either reduce or enhance analytical results (and are thus a form of reduced selectivity). Ideally, such interferences are quantified as bias and corrected for, but often this is a tedious affair or even impossible. Matrix effects can be quantified by conducting replicate analyses at various levels and with various compositions of (spiked) samples, or they can be nullified by imitating the test sample matrix in the standards, e.g. in X-ray fluorescence spectroscopy. However, the matrix of test samples is often unknown beforehand. A practical qualitative check in such a case is to measure the analyte at two levels of dilution: usually the signal of the analyte and of the interference are not proportional.
Well-known other interferences are, for example, the dark colour of extracts in the colorimetric determination of phosphate, and the presence of salts, lime, or gypsum in the CEC determination. A colour interference may be avoided by measuring at another wavelength (in the case of phosphate: try 880 nm). Sometimes the only way to avoid interference is to use another method of analysis.
If it is thought that an interference can be singled out and determined, it can be quantified as indicated for ruggedness in the previous section.
When a new method is proposed or when there is a choice of methods for a determination, it may be useful if an indication or description of the ease or tediousness of the application is available. Usually the practicability can be derived from the detailed description of the procedure. The problems are in most cases related to the availability and maintenance of certain equipment and the required staff or skills. Also, the supply of required parts and reagents is not always assured, nor the uninterrupted supply of stable power. In some countries, for instance, high purity grades cannot always be obtained, some chemicals cannot be kept (e.g. sodium pyrophosphate in a hot climate) and even the supply of a seemingly common reagent such as ethanol can be a problem. If such limitations are known, it is useful if they are mentioned in the relevant SOPs or validation report.
The results of validation tests should be recorded in a validation report from which the suitability of a method for a certain purpose can be deduced. If (legal) requirements for specific analyses are known (e.g. in the case of toxic compounds) then such information may be included.
Since validation is a kind of research project the report should have a comparable format. A plan is usually initiated by the head of laboratory, drafted by the technician involved and verified by the head. The general layout of the report should include:
- Parameters to be validated
- Description of the procedures (with reference to relevant SOPs)
- Results
A model for a validation SOP is given (VAL 092).
For drafting an analytical procedure the general instructions for drafting SOPs as given in Chapter 2 apply. An example of an analytical procedure as it can be written in the form of a SOP is METH 006. A laboratory manual of procedures, the "cookery book", can be made by simply collecting the SOPs for all procedures in a ring binder. Because analytical procedures, more than any other type of SOP, directly determine the product of a laboratory, some specific aspects relating to them are discussed here.
As was outlined in Chapter 2, instructions in SOPs should be written in such a way that no misunderstanding or ambiguity exists as to the execution of the procedure. Thus, much of the responsibility (not all) lies with the author of the procedure. Even if the author and user are one and the same person, which should normally be the case (see 2.2), such misunderstanding may be propagated since the author usually draws on the literature or documents written by someone else. Therefore, although instructions should be as brief as possible, they should at the same time be as extensive as necessary.
As an example we take the weighing of a sample, a common instruction in many analytical procedures. Such an instruction could read:
1. Weigh 5.0 g of sample into a 250 ml bottle.
2. Add 100 ml of extracting solution and close bottle.
3. Shake overnight.
4. Etc., etc.
Comment 1
According to general analytical practice the amount of 5.0 g means "an amount between and including 4.95 g and 5.05 g" (4.95 ≤ weight ≤ 5.05) since less than 4.95 would round to 4.9 and more than 5.05 would round to 5.1 (note that 5.05 rounds to 5.0 and not to 5.1).
Some analysts, particularly students and trainees, take the amount of 5.0 g too literally and set out on a lengthy process of adding and subtracting sample material until the balance reads "5.0" or perhaps even "5.00". Not only is this procedure tedious, but the sample may also become biased, as particles of different size tend to segregate during this process. To prevent such an interpretation, often the prefixes "approximately", "approx." or "ca." (circa) are used, e.g. "approx. 5.0 g". As this, in turn, introduces a seeming contradiction between "5.0" (with a decimal, so quite accurate) and "approx." ('it doesn't matter all that much'), the desired accuracy must be stated: "weigh approx. 5.0 g (accuracy 0.01 g) into a 250 ml bottle".
The notation 5.0 g can be replaced by 5 g when the sample size is less critical (in the present case, for instance, if the sample:liquid ratio is not very critical). Sometimes it may even be possible to use "weigh 3 - 5 g of sample (accuracy 0.1 g)". The accuracy needs to be stated when the actual sample weight is used in the calculation of the final result; otherwise it may be omitted.
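The tolerance interpretation above amounts to a simple acceptance check rather than a grain-by-grain adjustment of the balance reading. A minimal sketch in Python (the function name and default values are illustrative, not part of any SOP):

```python
def weight_acceptable(weight_g, target_g=5.0, tol_g=0.05):
    """True if a weighed amount lies within target +/- tol, i.e. would
    round to the target: here "5.0 g" means 4.95 g <= weight <= 5.05 g."""
    return (target_g - tol_g) <= weight_g <= (target_g + tol_g)

# A first rough weighing is simply checked, not adjusted particle by particle:
print(weight_acceptable(4.97))  # inside the 4.95 - 5.05 g interval
print(weight_acceptable(5.06))  # outside it
```

When the instruction reads "weigh approx. 5.0 g (accuracy 0.01 g)", only the recorded weight needs the 0.01 g accuracy; the acceptance interval stays as wide as the tolerance allows.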
Comment 2
The "sample" needs to be specified. A convenient and correct way is to make reference to a SOP where the preparation of the sample material is described. This is the more formal version of the common practice in many laboratories, where it is simply implied that the sample whose preparation is described elsewhere in the laboratory manual of analytical procedures is to be used. In any case, there should be no doubt about the sample material to be used. When other material than the usual "laboratory sample" or "test sample" is used, the preparation must be described and the nature indicated, e.g. "field-moist fine earth", "fraction > 2 mm" or "nodules".
When drafting a new procedure or the laboratory's own version of a standard procedure, it must be considered whether the moisture content of the sample used is relevant for the final result. If so, a moisture correction factor should be part of the calculation step. In certain cases where the sample contains a considerable amount of water (moist highly humic samples; andic material) this water will influence the soil:liquid ratio in certain extraction or equilibration procedures. Validation of such procedures is then indicated.
Comment 3
The "250 ml bottle" needs to be specified also. This is usually done in the section "Apparatus and glassware" of the SOP. If, in general, materials are not specified, then it is implied that the type is unimportant for the procedure. However, in shaking procedures, the kind, size and shape of bottles may have a significant influence on the results. In addition the kind (composition) of glass is sometimes critical e.g., for the boron determination.
Comment 4
To the instruction "Add 100 ml of extracting solution" apply the same considerations as discussed for the sample weighing. The accuracy needs to be specified, particularly when automatic dispensers are used. The accuracy may be implicit if the equipment to be used is stated e.g., "add 100 ml solution by graduated pipette" or "volumetric pipette" or "with a 100 ml measuring cylinder". If another means of adding the solution is preferred its accuracy should equal or exceed that of the stated equipment.
Comment 5
The instruction "shake overnight" is ambiguous. It must be known that "overnight" is equivalent to "approximately 16 hrs", namely from 5 p.m. till 9 a.m. the next morning. It is implied that this time span is not critical, but generally the deviation should not be more than, say, two hours. In case of doubt, this should be validated with a ruggedness test. More critical in many cases is the term "shake", as this can be done in many different ways. In the section "Apparatus" of the SOP the type of shaking machine is stated, e.g. reciprocating shaker or end-over-end shaker. For the reciprocating shaker the instruction should include the shaking frequency (in strokes per minute), the amplitude (in mm or cm) and the position of the bottles (standing up, lying lengthwise or perpendicular to the shaking direction). For an end-over-end shaker usually only the frequency or speed (in rpm) is relevant.
All laboratories, including those destined for routine work, carry out research in some form. For many laboratories it constitutes the main activity. Research may range from a simple test of an instrument or a change in procedure, to large projects involving many aspects, several departments of an institute, much staff and money, often carried out by commission of third parties (contract research, sponsors).
For any project of appreciable size, according to GLP the management of the institute must appoint a study director before the study is initiated. This person is responsible for the planning and execution of the job. He/she is responsible to a higher Inspecting Authority (IA) which may be the institute's management, the Quality Assurance Unit, the Head of Research or the like as established by the management.
A study project can be subdivided into four phases: preparation, execution, reporting, filing/archiving.
1. Preparation
In this phase the purpose and plan are formulated and approved by the IA. Any subsequent changes are documented and communicated to the IA. The plan must include:
- Descriptive title, purpose, and identification details
- Study director and further personnel
- Sponsor or client
- Work plan with starting date and duration
- Materials and methods to be used
- Study protocol and SOPs (including statistical treatments of data)
- Protocols for interim reporting and inspection
- Way of reporting and filing of results
- Authorization by the management (i.e. signature)
A work plan or subroutines can often be clarified by means of a flow diagram. Some of the most used symbols in flow diagrams for procedures in general, including analytical procedures, are given in Figure 7-3. An example of a flow sheet for a research plan is given in Fig. 7-4.
Fig. 7-3. Some common symbols for flow diagrams.
2. Execution of the work
The work must be carried out according to the plan, protocols and SOPs. All observations must be recorded including errors and irregularities. Changes of plan have to be reported to the IA and if there are budgetary implications also to the management. The study leader must have control of and be informed about the progress of the work and, particularly in larger projects, be prepared for inspection by the IA.
Fig. 7-4. Design of flow diagram for study project.
3. Reporting
As soon as possible after completion of the experimental work and verification of the quality control data the results are calculated. Together with a verification statement of the IA, possibly after corrections have been made, the results can be reported. The copyright and authorship of a possible publication should have been arranged in the plan.
The report should contain all information relevant for the correct interpretation of the results. To keep a report digestible, used procedures may be given in abbreviated form with reference to the original protocols or SOPs. Sometimes, relevant information turns up afterwards (e.g. calculation errors). Naturally, this should be reported, even if the results have already been used.
It is useful and often rewarding if after completion of a study project an evaluation is carried out by the study team. In this way a next job may be performed better.
VAL 092 - Validation of CEC determination with NH4OAc
METH 006 - Determination of nitrogen in soil with micro-Kjeldahl
LOGO 
STANDARD OPERATING PROCEDURE 
Page: 1 of 2

No.: VAL 092 
Version: 1 
Date: 96-09-19

Title: Validation of CEC determination with NH_{4}OAc (pH 7) 
File: 
1 PURPOSE
To determine the performance characteristics of the CEC determination with ammonium acetate (pH 7) using the mechanical extractor.
The following parameters have been considered: Bias, precision, working range, ruggedness, interferences, practicability.
2 REQUIREMENTS
See SOP METH 092 (Cation Exchange Capacity and Exchangeable Bases with ammonium acetate and mechanical extractor).
3 PROCEDURES
3.1 Analytical procedure
The basic procedure followed is described in SOP METH 092 with variations and number of replicates as indicated below. Two Control Samples have been used: LABEX 6, a Nitisol (clay ≈ 65%, CEC ≈ 20 cmol_{c}/kg) and LABEX 2, an Acrisol (clay ≈ 25%; CEC ≈ 7 cmol_{c}/kg); further details of these control samples are given in SOP RF 031 (List of Control Samples).
3.2 Bias
The CEC was determined 10× on both control samples. Reference is the mean value for the CEC obtained on these samples by 19 laboratories in an interlaboratory study.
3.3 Precision
Obtained from the replicates of 3.2.
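The bias check of 3.2 and the precision estimate of 3.3 can be sketched in code. In this Python sketch, the relative-bias and r = 2.8 × s formulas are the common conventions and are only assumed to correspond to the equations in the Guidelines; the replicate CEC values are hypothetical:

```python
from statistics import mean, stdev

def relative_bias(replicates, reference):
    """Relative bias (%) of the mean of replicate results vs a reference value."""
    return (mean(replicates) - reference) / reference * 100.0

def repeatability(replicates):
    """Repeatability r = 2.8 x s: the largest difference expected between
    duplicate results in about 95% of cases (common convention)."""
    return 2.8 * stdev(replicates)

# Ten hypothetical CEC results (cmolc/kg) for a control sample whose
# interlaboratory reference value is 20.0 cmolc/kg:
cec = [19.6, 20.1, 19.8, 20.3, 19.9, 20.0, 19.7, 20.2, 19.8, 20.1]
print(round(relative_bias(cec, 20.0), 2))
print(round(repeatability(cec), 2))
```

A bias that is small relative to the repeatability would support the conclusion that the procedure is unbiased for this sample type.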
3.4 Working range
The Method Detection Limit (MDL) was calculated from 10 blank determinations. Determination of the Upper Limit is not relevant (percolates beyond calibration range are rare and can be brought within range by dilution).
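The MDL calculation from replicate blanks can be sketched as follows. The MDL = 3 × s convention is assumed here and should be checked against Section 7.3.2 of the Guidelines; the blank readings are hypothetical:

```python
from statistics import stdev

def method_detection_limit(blanks, factor=3.0):
    """MDL as factor x standard deviation of replicate blank determinations
    (3 x s is a widely used convention)."""
    return factor * stdev(blanks)

# Ten hypothetical blank determinations (cmolc/kg):
blanks = [0.02, 0.03, 0.01, 0.02, 0.04, 0.02, 0.03, 0.02, 0.01, 0.03]
mdl = method_detection_limit(blanks)
# Results below the MDL are reported as "< MDL", not as zero.
```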
3.5 Ruggedness
A partial factorial design with seven factors was used. The experiments were carried out in duplicate and the factors varied are as follows:
A: With (+) and without (-) addition of 125 mg CaCO_{3} (corresponding with 5% CaCO_{3} content)
B: Concentration of saturating solution: 1 M (+) and 0.5 M (-) NH_{4}OAc
C: Extraction time: 4 hours (-) and 8 hours (+)
D: Admixture of sea-sand (or celite): with (+) and without (-) 1 teaspoon of sand
E: Washing procedure: 2× (-) or 3× (+) with ethanol 80%
F: Concentration of ethanol for washing free of salt: 70% (-) or 80% (+)
G: Purity of NH_{4}OAc: technical grade (-) and analytical grade (+)
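In such a seven-factor partial factorial (the scheme commonly attributed to Youden and Steiner), eight runs are arranged so that each factor is at its (+) level in four runs and at its (-) level in the other four; a factor's effect is the difference between the two means. A Python sketch, where the design matrix is one valid 8-run arrangement (not necessarily the exact table of the Guidelines) and the run results are hypothetical:

```python
# One valid 8-run design matrix for 7 factors (+1 / -1 levels), columns A..G.
DESIGN = [
    [+1, +1, +1, +1, +1, +1, +1],
    [+1, +1, -1, +1, -1, -1, -1],
    [+1, -1, +1, -1, +1, -1, -1],
    [+1, -1, -1, -1, -1, +1, +1],
    [-1, +1, +1, -1, -1, +1, -1],
    [-1, +1, -1, -1, +1, -1, +1],
    [-1, -1, +1, +1, -1, -1, +1],
    [-1, -1, -1, +1, +1, +1, -1],
]

def factor_effects(results):
    """Effect of each factor: mean of its four (+) runs minus mean of its
    four (-) runs. An |effect| large compared with the repeatability of
    the method indicates the procedure is not rugged for that factor."""
    effects = []
    for col in range(7):
        plus = sum(r for r, row in zip(results, DESIGN) if row[col] > 0)
        minus = sum(r for r, row in zip(results, DESIGN) if row[col] < 0)
        effects.append((plus - minus) / 4.0)
    return effects
```

With the mean duplicate result of each of the eight runs as input, only factors whose effect exceeds the repeatability of the method need follow-up.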
3.6 Interferences
Two factors particularly interfere in this determination: 1. high clay content (problems with efficiency of percolation) and 2. presence of CaCO_{3} (competing with the index cation of the saturating solution). The first was addressed by the difference in clay content of the two samples as well as by Factor D in the ruggedness test, the second by Factor A of the ruggedness test.
3.7 Practicability
The method is famous for its wide application and ill-famed for its limitations. Some of the most prominent aspects in this respect are considered.
4 RESULTS
As results may have to be produced as a document accompanying analytical results (e.g. on request of clients) they are presented here in a model format suiting this purpose.
In the present example where two different samples have been used the results for both samples may be given on one form, or for each sample on a separate form.
For practical reasons, abbreviated reports may be released omitting irrelevant information. (The full report should always be kept!)
LOGO 
METHOD VALIDATION FORM 
Page: 1 of 1

No.: VAL RES 092 
Version: 1 
Date: 96-11-23

Title: Validation data CECNH_{4}OAc (METH 092) 
File: 
1 TITLE or DESCRIPTION
Validation of the cation exchange capacity determination with the NH_{4}OAc (pH 7) method as described in VAL 092 dated 96-09-19.
2 RESULTS
2.1 Bias (Accuracy): 
Result of calculation with Eq. (7.14) or (7.16) of Guidelines.  
2.2 Precision 
 

Repeatability: 
Result of calculation with Eq. (7.17) or (7.19). 

Withinlab reproducibility: 
Result of calculation with Eq. (7.23) (if Control Charts are available). 
2.3 Working range: 
Result of calculation as exemplified by Table 7-1 in Section 7.3.2 of Guidelines.
2.4 Ruggedness: 
Results of calculations with Eq. (7.26) or (7.29).
2.5 Interferences: 
In this case mainly drawn from the ruggedness test.
2.6 Practicability: 
Special equipment necessary: mechanical extractor. Substantial amounts of ethanol required. Washing procedures not always complete, particularly in high-clay samples, requiring thorough checks.
2.7 General observations: 

Author: 
Sign.: 
QA Officer (sign.): 
Date of Expiry: 
LOGO 
METHOD VALIDATION FORM 
Page: 1 of 1

No.: METH 006 
Version: 2 
Date: 96-03-01

Title: Determination of nitrogen in soil with micro-Kjeldahl
File: 
1. SCOPE
This procedure describes the determination of nitrogen with the micro-Kjeldahl technique. It is supposed to include all soil nitrogen (including adsorbed NH_{4}^{+}) except that in nitrates.
2. RELATED DOCUMENTS
2.1 Normative references
The following standards contain provisions referred to in the text.
ISO 3696 Water for analytical laboratory use. Specification and test methods.
ISO 11464 Soil quality - Pretreatment of samples for physico-chemical analysis.
2.2 Related SOPs
F 001 
Administration of SOPs 
APP 066 
Operation of Kjeltec 1009 digester 
APP 067 
Operation of ammonia distillation unit 
APP 072 
Operation of Autoburette ABU 13 and Titrator TTT 60 (facultative) 
RF 008 
Reagent Book 
METH 002 
Moisture content determination 
3. PRINCIPLE
The micro-Kjeldahl procedure is followed. The sample is digested in sulphuric acid and hydrogen peroxide with selenium as catalyst, whereby organic nitrogen is converted to ammonium sulphate. The solution is then made alkaline and ammonia is distilled. The evolved ammonia is trapped in boric acid and titrated with standard acid.
4. APPARATUS AND GLASSWARE
4.1 Digester (Kjeldahl digestion tubes in heating block)
4.2 Steam-distillation unit (fitted to accept digestion tubes)
4.3 Burette 25 ml
5. REAGENTS
Use only reagents of analytical grade and deionized or distilled water (ISO 3696).
5.1 Sulphuric acid - selenium digestion mixture. Dissolve 3.5 g selenium powder in 1 L concentrated (96%, density 1.84 g/ml) sulphuric acid by mixing and heating at approx. 350°C on a hot plate. The dark colour of the suspension turns to a clear light-yellow. When this is reached, continue heating for 2 hours.
5.2 Hydrogen peroxide, 30%.
5.3 Sodium hydroxide solution, 38%. Dissolve 1.90 kg NaOH pellets in 2 L water in a heavy-walled 5 L flask. Cool the solution with the flask stoppered to prevent absorption of atmospheric CO_{2}. Make up the volume to 5 L with freshly boiled and cooled deionized water. Mix well.
5.4 Mixed indicator solution. Dissolve 0.13 g methyl red and 0.20 g bromocresol green in 200 ml ethanol.
5.5 Boric acidindicator solution, 1%. Dissolve 10 g H_{3}BO_{3} in 900 ml hot water, cool and add 20 ml mixed indicator solution. Make to 1 L with water and mix thoroughly.
5.6 Hydrochloric acid, 0.010 M standard. Dilute standard analytical concentrate ampoule according to instruction.
Author: 
Sign.: 
QA Officer (sign.): 
Date of Expiry: 
6. SAMPLE
Air-dry fine earth (<2 mm) obtained according to ISO 11464 (or refer to own procedure). Mill approx. 15 g of this material to pass a 0.25 mm sieve. Use part of this material for a moisture determination according to ISO 11465 and PROC 002.
7. PROCEDURE
7.1 Digestion
1. Weigh 1 g of sample (accuracy 0.01 g) into a digestion tube. Of soils rich in organic matter (>10%), 0.5 g is weighed in (see Remark 1). In each batch, include two blanks and a control sample.
2. Add 2.5 ml digestion mixture.
3. Add successively 3 aliquots of 1 ml hydrogen peroxide. The next aliquot can be added when frothing has subsided. If frothing is excessive, cool the tube in water.
Note: in Steps 2 and 3 use a measuring pipette with balloon or a dispensing pipette.
4. Place the tubes on the heater and heat for about 1 hour at moderate temperature (200°C).
5. Turn up the temperature to approx. 330°C (just below boiling temp.) and continue heating until mixture is transparent (this should take about two hours).
6. Remove tubes from heater, allow to cool and add approx. 10 ml water with a wash bottle while swirling.
7.2 Distillation
1. Add 20 ml boric acid-indicator solution with measuring cylinder to a 250 ml beaker and place the beaker on the stand beneath the condenser tip.
2. Add 20 ml NaOH 38% with measuring cylinder to the digestion tube and distil for about 7 minutes, during which approx. 75 ml distillate is produced.
Note: the distillation time and amount of distillate may need to be increased for complete distillation (see Remark 2).
3. Remove beaker from distiller, rinse condenser tip, and titrate distillate with 0.01 M HCl until colour changes from green to pink.
Note: When using automatic titrator: set endpoint pH at 4.60.
Remarks
1. The described procedure is suitable for soil samples with a nitrogen content of up to 10 mg N. This corresponds with a carbon content of roughly 10% C. Of soils with higher contents, less sample material is weighed in. Sample sizes of less than 250 mg should not be used because of sample bias.
2. The capacity of the procedure with respect to the amount of N that can be determined depends to a large extent on the efficiency of the distillation assembly. This efficiency can be checked, for instance, with a series of increasing amounts of (NH_{4})_{2}SO_{4} or NH_{4}Cl containing 0 - 50 mg N.
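The efficiency check of Remark 2 amounts to a recovery calculation for each spike level. A Python sketch, where the spike amounts and readings are hypothetical:

```python
def recovery_percent(added_mg_n, found_mg_n):
    """Recovery (%) of a known amount of N put through the distillation."""
    return found_mg_n / added_mg_n * 100.0

# Hypothetical spike series of (NH4)2SO4 standards: (mg N added, mg N found)
series = [(10.0, 9.9), (25.0, 24.6), (50.0, 48.7)]
recoveries = [recovery_percent(a, f) for a, f in series]
# A fall-off in recovery at the higher levels would indicate that the
# capacity of the distillation assembly is being exceeded.
```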
8. CALCULATION
%N = (a - b) × M × 1.4 × mcf / s
where
a = ml HCl required for titration of sample
b = ml HCl required for titration of blank
s = airdry sample weight in gram
M = molarity of HCl
1.4 = 14 × 10^{-3} × 100% (14 = atomic weight of nitrogen)
mcf = moisture correction factor
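The calculation step can be sketched as a small function that mirrors the formula and symbol definitions above (the function name is illustrative):

```python
def percent_nitrogen(a, b, m, s, mcf):
    """%N = (a - b) x M x 1.4 / s x mcf
    a, b: ml HCl used for titration of sample and blank
    m:    molarity of the HCl
    s:    air-dry sample weight in gram
    mcf:  moisture correction factor"""
    return (a - b) * m * 1.4 / s * mcf
```

For example, a = 10.0 ml, b = 0.2 ml, M = 0.01, s = 1.000 g and mcf = 1.02 give about 0.14% N.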
9. VALIDATION PARAMETERS
9.1 Bias: 
3.1% rel. (sample ISE 921, x̄ = 2.80 g/kg N, n = 5) 
9.2 Withinlab reproducibility: 
R_{L} = 2.8 × s_{cc} = 2.5% rel. (sample LABEX 38, x̄ = 2.59 g/kg N, n = 30) 
9.3 Method Detection Limit: 
0.014 mg N or 0.0014% N 
10. TEST REPORT
The report of analytical results shall contain the following information:
 the result(s) of the determination with identification of the corresponding sample(s);
 a reference to this SOP (if requested a brief outline such as given under clause 3: Principle);
 possible peculiarities observed during the test;
 all operations not mentioned in the SOP that can have affected the results.
11. REFERENCES
Hesse, P.R. (1971) A textbook of soil chemical analysis. John Murray, London.
Bremner, J.M. and Mulvaney, C.S. (1982) Nitrogen - Total. In: Page, A.L., Miller, R.H. and Keeney, D.R. (eds.) Methods of soil analysis. Part 2. Chemical and microbiological properties, 2nd ed. Agronomy Series 9. ASA, SSSA, Madison.
ISO 11261 Soil quality - Determination of total nitrogen - Modified Kjeldahl method.