APPENDIX 1 CONCEPTUAL AND PRACTICAL CONSIDERATIONS FOR THE QUANTIFICATION OF OVERALL UNCERTAINTY IN STOCK ASSESSMENTS

The quantification of the overall uncertainty of an estimate conceptually involves a four-step process:

1. For each potential source of error, a set of hypotheses must be developed. (A set of hypotheses need not be a group of discrete alternatives; it can be represented by a continuous distribution.)
2. For each hypothesis in Step 1, a relative weight or probability must be determined.
3. For all combinations of hypotheses in Step 1, the likelihood of the resulting estimate (e.g. the fit to the data) must be determined.
4. The results from Steps 2 and 3 must be integrated to provide an overall assessment of the uncertainty or risk.

Step 1: Develop hypotheses for each source of error

The possible range of alternatives that could be reasonably considered is often very large. A balanced set of hypotheses representing the overall uncertainty must be considered. It is not appropriate to consider only "optimistic" or "pessimistic" alternatives. Consideration must be given to both their relative plausibilities and their relative likely impacts on the results.

In general, relatively implausible hypotheses should not be considered. However, the extent to which "low probability" hypotheses should be included depends in part on the risk criteria that the managers are using, combined with the possible consequences if those hypotheses were true. For example, a decision about whether to consider stock-recruitment functions with depensation will often be a judgement involving these two factors. Similarly, highly plausible alternatives can be collapsed into a single alternative if they result in similar estimates or consequences. For example, small uncertainties associated with weight and length measurements can generally be ignored.

The problem in using the "relative likely impact" as a criterion for deciding whether to include an alternative hypothesis is that the relative impact often cannot be determined until the calculations in Step 3 are completed. Some preliminary calculations, plus common sense, can assist in this process. In general, when reasons exist to suspect that an alternative hypothesis may have a large effect, it should be included. Nevertheless, considerable judgement is required.

Step 2: Determine a relative weight for each hypothesis

This process requires considerable scientific judgement. In Bayesian approaches, this is referred to as specification of the priors, and is an explicit part of the process. It is often not realized that other approaches for assessing uncertainties also involve, either implicitly or explicitly, specification of prior weights for alternative hypotheses. Thus, excluding some alternatives is equivalent to assigning them zero weight. Similarly, Monte Carlo simulation methods require specification of the relative probability or weight to be given to each realization in the simulation, through the assignment of probability distributions to the input parameters. Even the interpretation of sensitivity analyses requires implicit weightings, either by the scientists or by those using the results, of the likely probabilities of the different alternatives presented.
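
The equivalence between excluding an alternative and giving it zero weight can be illustrated with a minimal Python sketch. The hypothesis labels, the biomass estimates and the function names below are hypothetical and chosen only for illustration; they do not come from any particular assessment.

```python
def normalise(weights):
    """Scale non-negative relative weights so that they sum to one."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {h: w / total for h, w in weights.items()}


def weighted_estimate(estimates, weights):
    """Average the estimates over the hypotheses that carry a weight."""
    probs = normalise(weights)
    return sum(probs[h] * estimates[h] for h in probs)


# Hypothetical biomass estimates (tonnes) under three natural-mortality hypotheses.
estimates = {"M=0.1": 1200.0, "M=0.2": 900.0, "M=0.3": 700.0}

# Giving the third hypothesis an explicit weight of zero ...
explicit_zero = weighted_estimate(
    estimates, {"M=0.1": 1.0, "M=0.2": 1.0, "M=0.3": 0.0}
)
# ... yields the same result as leaving it out of the analysis altogether.
excluded = weighted_estimate(estimates, {"M=0.1": 1.0, "M=0.2": 1.0})
```

Both calls return the same weighted estimate, which is why a decision to omit an alternative should be regarded as a statement about its prior weight.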

The following hierarchy (Sainsbury, K., D. Butterworth, C. Francis, N. Klaer, T. Polachek, A. Punt, and T. Smith, 1995, Incorporating uncertainty into stock projections-Report of the Scientific Meeting 3-7 April 1995, CSIRO Marine Laboratories, Hobart, Tasmania, Australia: 51 pp) is suggested in selecting the range of alternative hypotheses and assigning relative weights:

a) How strong is the evidence for the alternative in the existing information for the species under consideration?
b) How strong is the evidence for the alternative in the existing information for similar species?
c) How strong is the evidence for the alternative in the existing information from any species?
d) How strong is the theoretical basis for the alternative?
Step 3: Determine the likelihoods for all hypotheses

Determining the relative probability (or output weight) of a resulting estimate from a model, given the input data for a particular combination of hypotheses, also requires scientific judgement. The most common approach for deciding on the output weights is to construct an overall likelihood function for the parameters of the model, given the data. The results, in some cases, can be sensitive to how this likelihood function is specified, particularly when there are several data sources.
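
The sensitivity to the likelihood specification can be sketched in Python as follows. The example assumes independent normal errors, two hypothetical index series (named `survey_obs` and `cpue_obs` here) that are already on the same scale, and made-up predictions for two hypotheses; real assessments typically use more elaborate error models. The only point illustrated is that the assumed error standard deviation of each data source acts as a weight in the overall likelihood.

```python
import math


def normal_loglik(observed, predicted, sigma):
    """Log-likelihood of the observations under independent normal errors."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(
        const - (o - p) ** 2 / (2.0 * sigma ** 2)
        for o, p in zip(observed, predicted)
    )


def total_loglik(sources):
    """Overall log-likelihood: the sum over (observed, predicted, sigma) sources."""
    return sum(normal_loglik(obs, pred, sigma) for obs, pred, sigma in sources)


# Hypothetical relative abundance indices from two data sources.
survey_obs = [1.0, 0.8, 0.6]
cpue_obs = [1.0, 0.9, 0.85]

# Hypothetical model predictions under two alternative hypotheses.
pred_a = [1.0, 0.8, 0.6]    # steep decline, matches the survey
pred_b = [1.0, 0.9, 0.85]   # mild decline, matches the catch rate


def support_for_a(sigma_survey, sigma_cpue):
    """Log-likelihood of hypothesis A minus that of hypothesis B."""
    ll_a = total_loglik(
        [(survey_obs, pred_a, sigma_survey), (cpue_obs, pred_a, sigma_cpue)]
    )
    ll_b = total_loglik(
        [(survey_obs, pred_b, sigma_survey), (cpue_obs, pred_b, sigma_cpue)]
    )
    return ll_a - ll_b


# Positive values favour A, negative values favour B.
trust_survey = support_for_a(sigma_survey=0.1, sigma_cpue=0.3)
trust_cpue = support_for_a(sigma_survey=0.3, sigma_cpue=0.1)
```

With the same data and the same two hypotheses, the preferred hypothesis switches when the assumed precision of the two sources is swapped.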

Difficulties arise when not all of the alternative hypotheses for the model structure use the same data sets. Two alternatives for dealing with this problem are: (1) to further develop the alternative model structures so that they all use the same data or (2) to conduct separate analyses for each model structure and compare the end results. In the long term, the first alternative is preferable, and attempts should be made to have the estimation models incorporate all of the relevant data.

Another aspect of Step 3 is consideration of model mis-specification, which may be guided by examination of the results of a particular realization (i.e. pertaining to a particular combination of alternative hypotheses). Such examinations may involve analyses of the residuals, retrospective analyses, and "reality" checking relative to predictions derived from the model. Analyses of the residuals can be conducted with standard statistical diagnostic tests. However, there are complications in applying standard diagnostic tests to complex estimation models because the procedures for detecting model mis-specification are not well developed for complex models. Most commonly, examination of the residuals is used to provide a zero to one weighting to particular realizations or classes of model. Retrospective analyses can also provide an indication as to whether model mis-specification is occurring. Generally, the results of such analyses have been used in a qualitative sense, e.g. to suggest that caution should be used in the interpretation of the results. There is a need for additional research on approaches for incorporating the results from both residual and retrospective analyses into the evaluation of overall uncertainty.

Over-parameterization is another complication in Step 3. Problems with lack of fit in residuals can always be solved by increasing the number of estimated parameters. Increased parameterization, however, eventually leads to model over-specification and to inappropriate model predictions. Likelihood-ratio tests, or their Bayesian analogues, can sometimes be used to provide guidance here. There are issues related to plausibility relative to Step 1, however, particularly in relation to the power of such tests to distinguish among competing hypotheses. This issue is particularly important when the model results are used to extrapolate beyond the range of observed values (e.g. the ability to distinguish between linear and non-linear relationships with small amounts of data).
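
A likelihood-ratio comparison of two nested models can be sketched in Python as below. The sketch is restricted to the case in which the more complex model has exactly one additional estimated parameter, so that the test statistic can be referred to a chi-square distribution with one degree of freedom; the log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_simple, loglik_complex):
    """Return (statistic, p_value) for nested models differing by ONE parameter."""
    # The complex model cannot fit worse than the nested one; clamp small
    # negative differences that arise from numerical optimisation error.
    statistic = max(0.0, 2.0 * (loglik_complex - loglik_simple))
    # Survival function of a chi-square variable with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical maximised log-likelihoods, e.g. a linear relationship (simple)
# against a non-linear one with one extra parameter (complex).
statistic, p_value = likelihood_ratio_test(
    loglik_simple=-42.7, loglik_complex=-41.9
)
```

In this hypothetical case the extra parameter is not supported at the 5% level. With few data points, however, such a result says little about whether the simpler model is correct, which is the issue of power raised above.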

Step 4: Integrate the relative weights and likelihoods

Integrating the results of Steps 2 and 3 is straightforward in principle, although it can be computationally intensive. Conceptually, this step entails multiplying the results of Steps 2 and 3 together to provide a probability distribution across the full set of alternative hypotheses. This probability distribution then provides a quantitative measure of the uncertainty associated with the quantity being estimated. In practice, the full set may be very large (or infinite), and a sampling-estimation procedure is required.
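
For a small, discrete set of hypotheses, the multiplication of prior weights by likelihoods can be sketched in Python as follows. The sketch assumes that the sources of error are independent, so that the prior weight of a combination is the product of the weights of its components; this is a simplifying assumption, and all labels and numbers are hypothetical.

```python
import math
from itertools import product


def joint_prior(sources):
    """Prior weight for every combination of hypotheses (independence assumed)."""
    names = list(sources)
    joint = {}
    for combo in product(*(sources[name].items() for name in names)):
        labels = tuple(label for label, _ in combo)
        weight = 1.0
        for _, w in combo:
            weight *= w
        joint[labels] = weight
    return joint


def posterior(prior, loglik):
    """Multiply prior weights by likelihoods and normalise to sum to one."""
    log_post = {h: math.log(p) + loglik[h] for h, p in prior.items() if p > 0}
    peak = max(log_post.values())  # subtract the maximum to avoid underflow
    unnorm = {h: math.exp(v - peak) for h, v in log_post.items()}
    total = sum(unnorm.values())
    return {h: u / total for h, u in unnorm.items()}


# Hypothetical prior weights for two sources of error (Step 2).
priors = {
    "natural_mortality": {"M=0.1": 0.5, "M=0.2": 0.5},
    "recruitment": {"no depensation": 0.9, "depensation": 0.1},
}
# Hypothetical log-likelihoods of the fit for each combination (Step 3).
loglik = {
    ("M=0.1", "no depensation"): -20.0,
    ("M=0.1", "depensation"): -20.5,
    ("M=0.2", "no depensation"): -19.0,
    ("M=0.2", "depensation"): -19.5,
}
# Hypothetical biomass estimate (tonnes) produced by each combination.
biomass = {
    ("M=0.1", "no depensation"): 1200.0,
    ("M=0.1", "depensation"): 1000.0,
    ("M=0.2", "no depensation"): 900.0,
    ("M=0.2", "depensation"): 750.0,
}

# Step 4: probability of each combination and the resulting expected biomass.
post = posterior(joint_prior(priors), loglik)
expected_biomass = sum(post[h] * biomass[h] for h in post)
```

The dictionary `post` is the probability distribution across combinations described above. When the set of combinations is too large to enumerate, the same weights would instead be approximated by sampling.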

The extent and completeness with which each of these steps can be successfully completed in any given application will depend on the availability of data, on the basic knowledge about the underlying system, and on the availability of scientists and time. When the existing data are not very informative, the resulting probability distribution from this process will reflect primarily the prior distributions assigned in Step 2. In this case, the uncertainty will be simply a reflection of the alternative hypotheses selected and the weights assigned to them in Steps 1 and 2. Unless there is a consensus that the prior information is highly informative, the resulting probability distribution for the quantity of interest will be very broad. Moreover, experts' judgements about the appropriate sets of alternative hypotheses and about their relative weights will differ. For this reason, it is important to attempt to harmonize and integrate different judgements into a single prior. If this cannot be achieved, quite different results can occur, as they have in some cases. There is no easy or quick solution in such situations. Essentially, the only solution is to collect more data.
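
The dependence of the result on the priors when the data are uninformative can be illustrated with a short Python sketch. The two sets of expert weights and the two sets of log-likelihoods are hypothetical, and three discrete hypotheses are assumed purely for simplicity.

```python
import math


def posterior(prior, loglik):
    """Normalised product of prior weights and likelihoods."""
    peak = max(loglik)  # subtract the maximum to avoid underflow
    unnorm = [p * math.exp(l - peak) for p, l in zip(prior, loglik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def largest_gap(a, b):
    """Largest absolute difference between two probability vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical prior weights from two experts over the same three hypotheses.
expert_1 = [0.7, 0.2, 0.1]
expert_2 = [0.1, 0.2, 0.7]

# Hypothetical log-likelihoods: the first data set fits every hypothesis
# equally well; the second strongly favours the middle hypothesis.
uninformative = [-5.0, -5.0, -5.0]
informative = [-50.0, -5.0, -50.0]

gap_weak = largest_gap(
    posterior(expert_1, uninformative), posterior(expert_2, uninformative)
)
gap_strong = largest_gap(
    posterior(expert_1, informative), posterior(expert_2, informative)
)
```

With the uninformative data, each posterior simply reproduces the corresponding expert's prior, so the disagreement between the experts persists. With the informative data, the two posteriors become practically identical.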