8. STOCHASTIC MODELS FOR PREDICTION OF THE CATCH DYNAMICS

The modelled time series are represented by catch variations of the 12 main commercial species. The sampling interval of the time series is 1 year, and all series finish in 1996 and cover the following periods:

1900 (1 series - Pacific herring, 97 observations),
1920 (3 series - Californian and Japanese sardine, and salmon, 77 observations),
1930 (2 series - Atlantic cod and herring, 67 samples),
1950 (4 series - Peruvian anchovy and sardine, Alaska pollock, South African sardine, 47 observations),
1957 (1 series - European sardine, 40 observations),
1969 (1 series - Chilean jack mackerel, 29 observations)

All considered time series undergo well-manifested cyclic variation. The time series can be represented as the following sum:

(1)

where x(t) is the catch volume in the year t, t₀ is the starting year of the series, F(t) is deterministic function (trend), and x(t) is a stationary component (including random noise) of the time series. The stationary component has no trends in mean or variation. The deterministic part of the model (F(t)) is taken as a multiple cyclic trend with m periods T_i, i=1, ..., m (Anderson 1971; Kashyap and Rao 1976):

(2)

where B_i, D_i are unknown amplitudes, and G is an unknown static shift constant. Equation (2) can also be expressed as:

(3)

The stochastic part, xs(t), may be represented as an autoregressive Box-Jenkins model (Box and Jenkins 1970; Anderson 1971; Kashyap and Rao 1976) of the given order p (or AR(p)-model):

(4)

where p is the number of autoregressive parameters, a_k, and e(t) is a residual signal, which is considered to be Gaussian^[3] white noise with zero mean and unknown variance s². Combining formulae (1) and (4), the general model is presented as follows:

(5)

Thus, unknown parameters of model (5) to be estimated from the data, are: p autoregressive parameters a_k; 2m harmonic amplitudes B_i and D_i; a static shift constant G; and variance s² of the residual signal e(t). These parameters are estimated using the maximum likelihood method (Kashyap and Rao 1976), which in case of Gaussian noise e(t) coincides with the least squares method. Let us define Y(t) as the following (p+2m+1)-dimensional column vector:

(6)

where the index "T" indicates a transposed (i.e. column) vector, and the parameters colum vector c of the same dimensionality is:

(7)

Then the model (5) in matrix form could be described as follows:

(8)

where c^T = row vector of parameters.

The following gives a detailed account of the method used to fit the parameters. It is not necessary to understand the fitting method to understand the results of the model. But it is useful to follow the principles used here for robust regression to understand how the particular results are derived. The parameters, c, can be estimated by minimising the sum:

(9)

A solution of (9) can be obtained as the solution to the Normal equations (i.e. standard least-squares regression):

(10)

and the unbiased maximum likelihood estimate of the variance of e(t) is^:

(11)

Estimates defined by equations (10) and (11) are the least squares estimates and maximum likelihood where the errors are genuinely normally distributed. The least squares estimates can be very sensitive to outliers, so that only a few outliers can lead to parameter estimates, which are considerably different from the "true" values (Huber 1981). The outliers may occur in fishery data for various reasons, which cause a temporary violation of the assumptions of the model: increases in illegal activities or the influences of rare but powerful events (war, economic crisis, etc.). One way to make estimates more resistant to outliers is to apply maximum likelihood estimates, but with the assumption that the distribution of the residual signal e(t) is Gaussian for small values ofe and Laplacian for large values of e. This kind of distribution has the following density (Huber 1981):

(12a)

(12b)

where a is a robustness parameter, which usually takes values 1-3 (we use a=2), s is a scale parameter (analogous to standard deviation in Gaussian law), andb is normalising constant:

(12c)

Maximum log-likelihood method for the model (12) leads to the following problem (with accuracy to an additive constant, which depends on a only):

(13)

Solution of equation (13) can be obtained using an iterative method, which is a combination of a generalised Newton-Raphson method for vector c and a simple iteration method for s:

(14)

where j=0, 1, ... is the iteration index. Matrix A(c, s) and vector R(c, s) are computed as follows:

(15)

where

(16)

Function c(c, s) is defined as:

(17)

where

(18)

Derivation of equation (17) is the following. If we consider the dependence of value (13) on parameter s, it could be noticed that the major part of this dependence is due to multipliers 1/(2s²) and a/s^{. If}d^s is a small variation of s, then variation of the sums:

(19)

multiplied by 1/(2s²) and a/s, are much smaller than the variation of 1/(2s²) and a/s alone. Hence, for a small compute variation of J(c, s) with respect to ds, values (19) can be considered approximately constant, so:

(20)

Applying the condition for the maximum point, dJ(c, s)=0, results in one non-trivial solution for s in the form of a quadratic equation: ar2 + gr - 1 = 0, where r=1/s. This has a unique positive root:

However, the values of a andg are dependent on s, so the last equation should be used in an iterative procedure to refine the value of s (the 2^nd equation in (14)). It should be noted that if the value of the robustness parameter a is large, then g=0 and, according to (20), , which is the same as (11).The iterative procedure starts from an initial approximation, which is given by the least method (10) and (11). It was found that values converged reasonably rapidly within 5-10 iterations.

The order of the autoregression, p, and number m of harmonics in cyclic trend and their periods T_i need to be set before estimating the parameters of the model (5). The autoregression order (p) was assumed to be equal to 2 for all cases as the minimum value for describing a wide range of types of stochastic oscillations (Box and Jenkins 1970). As for choice of values of m and periods T_i, two possible approaches were used.

8.1 THE FIRST APPROACH - NUMBER OF HARMONICS AND THEIR PERIODS ARE DEFINED FROM FISHERY TIME SERIES THEMSELVES

Let us define the time series of more than 64 samples as "long" and others as "short" time series data sets.

For "long" time series, the number of harmonics m and their periods T_i are defined from power spectra estimates of the appropriate series. The value of m varies from 1 to 6 and is usually equal to 3-4. Values of periods T_i were taken as periods corresponded to the essential peak values of power spectra estimates. Although the time series are labelled as "long", their lengths (from 67 up to 97 samples) has rather few observations for applying common procedures of spectra estimates based on the Fourier transform. For such a short series, parametric methods are more advantageous in terms of frequency resolution (Marple 1987). We use the autoregression approximation for the series, which is the procedure estimating parameters of the following model:

(21)

where a_j, j=1, ..., q are autoregressive parameters, h(t) is the residual signal, which is assumed to be Gaussian white noise with zero mean and variance s². Equation (21) applies the designations forparameters of the AR(q)-model different from those for the AR(p)-model in equations (4) and (5), to stress that although these models are the same, they are used for distinctly different purposes. The AR-members in formulae (4) and (5) are introduced to describe essential features of stochastic oscillations of the signal close to the deterministic cyclic trend. That is why we assumed a small order p=2. The AR(q)-model (21) is estimated to describe the spectral structure of the signal and for this purpose needs many more parameters (i.e. larger q). The larger the value of q, the more sensitive is the power spectra estimate. At the same time, increase in q results in the corresponding increase in the statistical fluctuations of the estimate. Thus, the choice of q is a compromise between sensitivity and stability. Usually, for short series q=N/5-N/3, where N is the length of time series. We used q=20. As soon as parameters of the model (21) are estimated, the power spectra estimate could be computed using the following equation:

(22)

where w is a cyclic frequency: w=2p/T, T = period, measured in units of sampling time interval (1year in our case), i = imaginary unit. AR-methods of power spectra estimates differ from each other by procedures of estimating parameters of the model (21). We use the most reliable and providing the best frequency resolution for short time series (Marple 1987).

For the "short" series, we used the model (5) with p=2 and single harmonic cyclic trend: m=1. Because of lack of samples in the series, we could not define the period value T₁ from power spectra estimates. That is why it was defined by minimization of s² after estimating the autoregression parameters a₁ and a₂^,amplitudes B₁, D₁ and static shift constant G for different trial values of the period: s²(T₁)®min. The last task of one-dimensional minimization is solved by a Golden Sectionmethod. This method will be defined as fitting an AR(2)-model with single cyclic trend with unknown period.

After the model had been fitted, it was used to forecast the interval 1997-2056 (the near 60-year cycle). The procedure is described below. For some species, the forecasted time series took negative values during some future time intervals. In such cases, we interpret these as a negligible forecasted catch and substituted all negative values with zero.

8.2 THE SECOND APPROACH - AR(2)-MODEL WITH A SINGLE CYCLIC TREND WITH THE PERIOD LENGTH TAKEN FROM THE CLIMATIC TIME SERIES

Earlier we used those periods in the equation (1), which follow naturally from the power spectra of the fishery time series, or those periods for which the cyclic trend is the best approximation to the data. However, such values of periods may be too sensitive to patterns in the data, which are not useful for forecasting. A long-term prediction based on the short-term behaviour of a signal is risky. The catch depends not only on natural processes in the ocean, but on economic politics of fishing-dependent countries, availability of fuel for fleets, power resources for fish processing, military and economic collisions in various regions of World ocean, etc. It is hardly possible to forecast all these (and other) factors for 10-20 years ahead. It is therefore prudent to confine the model to patterns, which can be established through independent means.

From all the above, it may be concluded that the period lengths derived from the fishery time series and then used in model (5), are estimated and therefore depend on realization and length of the sample. At the same time, there are time series for global climatic processes which, although believed to be somewhat affected by human activities or changes in local measurement environments, are much less "human-dependent" than fish stocks are.

These are:

mean global temperature - dT time series, 138 annual samples for 1861-1998;
atmospheric circulation index - ACI time series, 109 annual samples for 1891-1999;
length of day (i. e. Earth rotation velocity) - LOD time series , 151 annual samples for 1850-2000.

In addition, the following long-term series were analyzed:

Reconstructed air surface temperature for the period of 552-1975 AD by Greenland Ice Cores - 1420 annual samples.
Reconstructed summer air surface temperature for the period of 500-1990 AD by Tree Rings of Scots pine -140 samples aggregated by a 10-year moving average.
Reconstructed sardine and anchovy population by analysis of varved sediment cores located in Californian upwelling for the period of 500-1950 AD - 145 samples aggregated by 10-years averaging.

Climatic processes definitely affect the productivity of major commercial marine species, although in an undefined way. Thus, we applied model (5) to make forecasts, but the periods were derived not only from fishery time series, but also from using the most significant climate cycles. Applying such cyclic trends to model (5) results in more reliable values for the modelled (and forecasted) period lengths.

The values of the period lengths derived from the climatic and geophysical (LOD) time series are the following:

- for dT - 55, 9, 5 years (evaluation from power spectrum estimate), 64 years (evaluation from single cyclic trend fitting with unknown period);

- for ACI - 50, 19 years (evaluation from power spectrum estimate), 58. 5 years (evaluation from single cyclic trend fitting with unknown period);

- for LOD - 64, 23 years (evaluation from power spectrum estimate).

Thus, we examined the AR(2)-model with a single cyclic trend with the following modelled periods close to those derived form the above-mentioned climatic (geophysical) time series: 55, 60 and 65 years.

8.3 FORECASTING

To make the forecast, we applied a "bootstrap" method (Efron and Tibshirani 1986) to generate a probability distribution for the forecasts. The method consisted of generating a large number, M (we take M=1000), of independent stochastic artificial trajectories of the model (5), starting from the last sample of the series into the future time interval of the given length. These trajectories differ from each other only by independent realizations of the white noise component e(t). All other parameters (including variance s²) are fixed as the estimates. Each trajectory presents a scenario for a future time series following model (5) drawn from a probability distribution based on the variation in data. Thus we have a "bundle" of M samples of generated trajectories which fill some "strip" on the plane of ( t, x). For each value of t in the future we can compute the average value among M values, corresponding to the different realisations and a variance (a width of the projected "strip"). These mean values constitute the forecasting curve, and the variance values reflect standard deviations bars for each future t value (for more explanation see Chapter 9)

Besides forecasting, we also applied the same bootstrap technique for "forecasting the past" or "hindcasting" (see figures of Chapter 9, dashed line). In this case, M bootstrap trajectories were generated starting from the third rather than first data point of the series because the AR(2) used the first two values to initiate the autoregression. This procedure can be regarded as a test of the model by visual comparison between the mean values (i.e. expected from the model) and observed time series. It should be noticed, however, that such mean trajectory depends on the first two samples taken as initial values. Other methods can also be used to test model (5), such as the usual statistical criteria to test whether the residual signal e(t) is Gaussian white noise or more sophisticated tools such as the AIC (Akaike Informational Criterion) to decide upon which terms to include (Kashyap and Rao 1976).

^[3] The Gaussian probability distribution, also known as the Normal distribution, is often appropriate when errors show constant variation.