Previous Page Table of Contents Next Page


CHAPTER 7 - ESTIMATION OF PARAMETERS


In the previous chapters, several models used in stock assessment were analysed, the respective parameters having been defined. In the corresponding exercises, it was not necessary to estimate the values of the parameters because they were given. In this chapter, several methods of estimating parameters will be analysed. In order to estimate the parameters, it is necessary to know the sampling theory and statistical inference.

This manual will use one of the general methods most commonly used in the estimation of parameters - the least squares method. In many cases this method uses iterative processes, which require the adoption of initial values. Therefore, particular methods will also be presented, which obtain estimates close to the real values of the parameters. In many situations, these initial estimates also have a practical interest. These methods will be illustrated with the estimation of the growth parameters and the S-R stock-recruitment relation.

The least squares method is presented under the forms of Simple linear Regression, multiple linear model and non linear models (method of Gauss-Newton).

Subjects like residual analysis, sampling distribution of the estimators (asymptotic or empiric Bookstrap and jacknife), confidence limits and intervals, etc., are important. However, these matters would need a more extensive course.

7.1 SIMPLE LINEAR REGRESSION - LEAST SQUARES METHOD

Model

Consider the following variables and parameters:

Response or dependent variable

= Y

Auxiliary or independent variable

= X

Parameters

= A, B

The response variable is linear with the parameters

Y = A+BX

Objective

The objective of the method is to estimate the parameters of the model, based on the observed pairs of values and applying a certain criterium function (the observed pairs of values are constituted by selected values of the auxiliary variable and by the corresponding observed values of the response variable), that is:

Observed values

xi and yi for each pair i, where i=1,2,...,i,...n

Values to be estimated

A and B and (Y1,Y2,...,Yi,...,Yn) for the n observed pairs of values

Object function (or criterium function)

Estimation method

In the least squares method the estimators are the values of A and B which minimize the object function. Thus, one has to calculate the derivatives ∂Φ/∂A e ∂Φ/∂B, equate them to zero and solve the system of equations in A and B.

The solution of the system can be presented as:

b = Sxy/Sxx

Notice that the observed values y, for the same set of selected values of X, depend on the collected sample. For this reason, the problem of the simple linear regression is usually presented in the form:

y = A + BX + ε

where ε is a random variable with expected value equal to zero and variance equal to σ2.

So, the expected value of y will be Y or A+BX and the variance of y will be equal to the variance of ε.

The terms deviation and residual will be used in the following ways:

Deviation is the difference between yobserved and ymean () i.e., deviation = (y-)

while

Residual is the difference between yobserved and Yestimated (), i.e., residual =.

To analyse the adjustment of the model to the observed data, it is necessary to consider the following characteristics:

Sum of squares of the residuals:

This quantity indicates the residual variation of the observed values in relation to the estimated values of the response variable of the model, which can be considered as the variation of the observed values that is not explained by the model.

Sum of squares of the deviations of the estimated values of the response variable of the model:

This quantity indicates the variation of the estimated values of the response variable of the model in relation to its mean, that is the variation of the response variable explained by the model.

Total sum of squares of the deviations of the observed values equal to:

This quantity indicates the total variation of the observed values in relation to the mean

It is easy to verify the following relation:

SQtotal = SQmodel + SQresidual

or

or

1 = r2 + (1 - r2)

where

r2 (coefficient of determination) is the percentage of the total variation that is explained by the model and

1-r2 is the percentage of the total variation that is not explained by the model.

7.2 MULTIPLE LINEAR REGRESSION - LEAST SQUARES METHOD

Model

Consider the following variables and parameters:

Response or dependent variable

= Y

Auxiliary or independent variables

= X1, X2,..., Xj,..., Xk

Parameters

= B1, B2,..., Bj,..., Bk

The response variable is linear with the parameters

Y = B1X1+B2X2+... + BkXk = Σ BjXj

Objective

The objective of the method is to estimate the parameters of the model, based on the observed n sets of values and by applying a certain criterium function (the observed sets of values are constituted by selected values of the auxiliary variable and by the corresponding observed values of the response variable), that is:

Observed values x1,i x2,i,..., xj,i,.., xk,i and yi for each set i, where i=1,2,...,i,...n

Values to be estimated B1,B2,...,Bj,...,Bk et (Y1,Y2,..., Yi,..., Yn)

The estimated values can be represented by:

(ou b1,b2,...,bj,...,bk) et

Object function (or criterium function)

Estimation method

In the least squares method the estimators are the values of Bj which minimize the object function.

As with the simple linear model, the procedure of minimization requires equating the partial derivatives of Φ to zero in order to each parameter, Bj, where j=1, 2,..., k. The system is preferably solved using matrix calculus.

Matrix version

Matrix X(n,k) = Matrix of the n observed values of each of the k auxiliary variables
Vector y(n,1) = Vector of the n observed values of the response variable
Vector Y(n,1) = Vector of the values of the response variable given by the model (unknown)
Vector B(k,1) = Vector of the parameters
Vector or b(k,1) = Vector of the estimators of the parameters

Model

Y(n,1) = X(n,k). B(k,1) ou Y=X.B+ε

Object function

Φ(1,1) = (y-Y)T.(y-Y) ou Φ(1,1) = (y-X.B)T.(y-X.B)

To calculate the least squares estimators it will suffice to put the derivative dΦ/dB of Φ in order to vector B, equal to zero. dΦ/dB is a vector with components ∂Φ/∂B1, ∂Φ/∂B2,..., ∂Φ/∂Bk. Thus:

dΦ/dB(k,1) = -2.XT.(y-X.B) = 0

or XTy - (XT.X). B = 0

and b = = (XT.X)-1. XTy

The results can be written as:

b(k,1) = (XT.X)-1.XTy

= X.b or = X (XT.X)-1.XT y

residuals(n,1) = (y-)

Comments

In statistical analysis it is convenient to write the estimators and the sums of the squares using idempotent matrices. Then the idempotent matrices L, (I - L) and (I - M) with L(n,n) = X (XT. X)-1. XT, I = unity matrix and M(n,n) = mean(n,1) matrix = 1/n [1] where [1] is a matrix with all its elements equal to one, are used.

It is also important to consider the sampling distributions of the estimators assuming that the variables εi are independent and have a normal distribution.

A summary of the main properties of the expected value and variance of the estimators is presented:


E[c1+c2.u] = c1+c2.E[u]

V[c1+c2.u] = c2.V[u].c2T

1 -

Random variable, ε

εn. (independent)


Expected value of ε

E[ε] = 0.


Variance of ε

V[ε](n.n) = E[ε.εT]=I. σ2

2 -

Observed response variable y

y = Y+ε


Expected value of y

E[y] = Y = X.B.


Variance of y

V[y](n.n) = V[ε](n.n) = I. σ2

3 -

Estimator of B

= (XT.X)-1.XT.y


Expected value of

E[] = B


Variance of

V[](k.k) = (XT.X)-1. σ2

4 -

Estimator of Y of the model

= X. = L.y


Expected value of

E[] = Y.


Variance of

V[] = L. σ2

5 -

Residual e

e = y-= (I-L).y


Expected value of e

E[e] = 0


Variance of e

V[e] = (I-L). σ2

6 - Sum of squares

6.1 - Residual Sum of squares = SQ residual(1.1) = (y-)T(y-) = yT (I-L)y

This quantity indicates the residual variation of the observed values in relation to the estimated values of the model, that is, the variation not explained by the model.

6.2 - Sum of squares of the deviation of the model = SQ model(1.1) = (-)T(-) = yT (L-M)y

This quantity indicates the variation of the estimated response values of the model in relation to the mean, that is, the variation explained by the model.

6.3 - Total Sum of the squares of the deviations = SQ total(1.1) = (y-)T(y-) = yT (I-M) y

This quantity indicates the total variation of the observed values in relation to the mean.

It is easy to verify the following relation:

SQtotal = SQmodel + SQresidual or

or 1 = R2 + (1 - R2)

where:

R2 is the percentage of the total variation that is explained by the model. In matrix terms it will be:

R2 = [yT(L - M)y].[ (yT(I - M)y]-1

1-R2 is the percentage of the total variation that is not explained by the model.

The ranks of the matrices (I-L), (I-M) and (L-M) respectively equal to (n-k), (n-1) and (k-1), are the degrees of freedom associated with the respective sums of squares.

7.3 NON-LINEAR MODEL - METHOD OF GAUSS-NEWTON - LEAST SQUARES METHOD

Model

Consider the following variables and parameters:

Response or dependent variable

= Y

Auxiliary or independent variable

= X

Parameters

= B1,B2,...,Bj,...,Bk

The response variable is non-linear with the parameters

Y = f(X;B) where B is a vector with the components B1,B2,...,Bj,...,Bk

Objective

The objective of the method is to estimate the parameters of the model, based on the n observed pairs of values and by applying a certain criterium function (the observed sets of values are constituted by selected values of the auxiliary variable and by the corresponding observed values of the response variable), that is:

Observed values xi and yi for each pair i, where i=1,2,...,i,...n

Values to be estimated B1,B2,...,Bj,..,Bk and (Y1,Y2,...,Yi,...,Yn) form the n pairs of observed values.

(Estimates = or b1,b2,...,bj,...,bk and )

Object function or criterium function

Estimation criterium

The estimators will be the values of Bj for which the object function is minimum.

(This criterium is called the least squares method).

Matrix version

It is convenient to present the problem using matrices.

So:

Vector X(n,1) = Vector of the observed values of the auxiliary variable
Vector y(n,1) = Vector of the observed values of the response variable
Vector Y(n,1) = Vector of the values of the response variable given by the model
Vector B(k,1) = Vector of the parameters
Vector b(k,1) = Vector of the estimators of the parameters

Model

Y(n,1) = f(X; B)

Object function

Φ(1,1) = (y-Y)T.(y-Y)

In the case of the non linear model, it is not easy to solve the system of equations resulting from equating the derivative of the function Φ in order to the vector B, to zero. Estimation by the least squares method can, based on the Taylor series expansion of function Y, use iterative methods.

Revision of the Taylor series expansion of a function

Here is an example of the expansion of a function in the Taylor series in the case of a function with one variable.

The approximation of Taylor means to expand a function Y = f(x) around a selected point, x0, in a power series of x:

Y = f(x) = f(x0) +(x-x0).f’(x0)/1! + (x-x0)2f’’(x0)/2! +... + (x- x0)i f(i)(x0)/i!+...

where

f(i)(x0) = ith derivatives of f(x) in order to x, at the point x0.

The expansion can be approximated to the desired power of x. When the expansion is approximated to the power 1 it is called a linear approximation, that is,

Y ≅ f(x0) + (x-x0).f’(x0)

The Taylor expansion can be applied to functions with more than one variable. For example, for a function Y = f(x1,x2) of two variables, the linear expansion would be:

which may be written, in matrix notation, as

Y = Y(0)+A(0).(x-x(0))

where Y(0) is the value of the function at the point x(0),with components x1(0) and x2(0),and A(0) is the matrix of derivatives whose elements are equal to the partial derivatives of f(x1,x2) in order to x1,x2 at the point (x1(0), x2(0)).

To estimate the parameters, the Taylor series expansion of function Y is made in order to the parameters B and not to the vector X.

For example, the linear expansion of Y = f(x,B) in B1, B2,..., Bk, would be:

Y = f(x;B) = f(x; B(0)) + (B1-B1(0)) f /B1 (x;B(0)) +..... +
(B2-B2(0))f /B2 (x;B(0)) +...... +..........+ (Bk-Bk(0)) f /Bk (x;B(0))

or, in matrix notation, it would be:

Y(n,1) = Y(0) (n,1) + A(0) (n,k). ΔB(0) (k,1)

where

A = matrix of order (n,k) of the partial derivatives of the matrix f(x;B) in order to the vector B at the point B(0) and

ΔB(0) = vector (B - B(0)).

Then, the object function will be:

Φ = (y-Y)T.(y-Y) = (y-Y(0) - A(0). ΔB(0))T(y-Y(0) - A(0). ΔB(0))

To obtain the minimum of this function it is more convenient to differentiate Φ in order to the vector ΔB than in relation to vector B and put it equal to zero. Thus:

or

Therefore:

If ΔB(0) is "equal to zero" then the estimate of B is equal to B(0).

(In practice, when we say "equal to zero" in this process, we really mean smaller than the approximation vector one has to define beforehand).

If ΔB(0) is not "equal to zero" then the vector B(0) will be replaced by:

B(1) =B(0) + ΔB(0)

And the process will be repeated, that is, there will be another iteration with B(0) replaced by B(1) (and A(0) replaced by A(1)). The iterative process will go on until the convergence at the desired level of approximation is reached.

Comments

1. It is not guaranteed that the process always converges. Sometimes it does not, some other times it is too slow (even for computers!) and some other times it converges to another limit!!

2. The above described method is the Gauss-Newton method which is the basis of many other methods. Some of those methods introduce modifications in order to obtain a faster convergence like the Marquardt method (1963), which is frequently used in fisheries research. Other methods use the second order Taylor expansion (Newton-Raphson method), looking for a better approximation. Some others, combine the two modifications.

3. These methods need the calculation of the derivatives of the functions. Some computer programs require the introduction of the mathematical expressions of the derivatives, while others use sub-routines with numerical approximations of the derivatives.

4. In fisheries research, there are methods to calculate the initial values of the parameters, for example in growth, mortality, selectivity or maturity analyses.

5. It is important to point out that the convergence of the iterative methods is faster and more likely to approach the true limit when the initial value of the vector B(0) is close to the real value.

7.4 ESTIMATION OF GROWTH PARAMETERS

The least squares method (non-linear regression) allows the estimation of the parameters K, L and to of the individual growth equations.

The starting values of K, Land t0 for the iterative process of estimation can be obtained by simple linear regression using the following methods:

Ford-Walford (1933-1946) and Gulland and Holt (1959) Methods

The Ford-Walford and Gulland and Holt expressions, which were presented in Section 3.4, are already in their linear form, allowing the estimation of K and Lwith methods of simple linear regression on observed Li and Ti. The Gulland and Holt expression allows the estimation of K and Leven when the intervals of time Ti are not constant. In this case, it is convenient to re-write the expression as:

Stamatopoulos and Caddy Method (1989)

These authors also present a method to estimate K, L and to (or Lo) using the simple linear regression. In this case the von Bertalanffy equation should be expressed as a linear relation of Lt against e-Kt.

Consider n pairs of values ti, Li where ti is the age and Li the length of the individual i where i=1,2,...., n.

The von Bertalanffy equation, in its general form is (as previously seen):

L- Lt = (L- La). e-K(t-ta)

It can be written as:

Lt = L - (L- La). e+Kta. e-Kt

The equation has the simple linear form, y = a + bx, where:

y = Lt

a = L

b = - (L- La). e+Kta

x = e-Kt



If one takes La = 0, then ta=to, but, if one considers ta = 0, then La = Lo.

The parameters to estimate from a and b will be L, to or Lo.

The authors propose adopting an initial value K(0), of K, and estimating a(0), b(0) and r2(0) by simple linear regression between y (= Lt) and x(=ek(0)). The procedure may be repeated for several values of K, that is, K(1) K(2),.... One can then adopt the regression that results in the larger value of r2, to which Kmax, amax and bmax correspond. From the values of amax, bmax and Kmax one can obtain the values of the remaining parameters.

One practical process towards finding Kmax can be:

(i). To select two extreme values of K which include the required value, for example K= 0 and K=2 (for practical difficulties, use K = 0.00001 instead of K = 0).

(ii). Calculate the 10 regressions for equally-spaced values of K between those two values in regular intervals.

(iii). The corresponding 10 values of r2 will allow one to select two new values of K which determine another interval, smaller than the one in (i), containing another maximum value of r2.

(iv). The steps (ii) and (iii) can be repeated until an interval of values of K with the desired approximation is obtained. Generally, the steps do not need many repetitions.

7.5 ESTIMATION OF M - NATURAL MORTALITY COEFFICIENT

Several methods were proposed to estimate M, and they are based on the association of M with other biological parameters of the resource. These methods can produce approximate results.

7.5.1 RELATION OF M WITH THE LONGEVITY,

Longevity: Maximum mean age tλ of the individuals in a non-exploited population.

Duration of the exploitable life: tλ - tr = λ (Figure 7.1)

Figure 7.1 Duration of the exploitable life

Tanaka (1960) proposes "NATURAL" Survival Curves (Figure 7.2) to obtain the values of M from longevity.

A cohort practically vanishes when only a fraction, p, of the recruited individuals survives. In that case, Nλ = R · e-M·λ, and it can be written:

and so

M = -(1/λ).ln p

Different values of the survival fraction produce different survival curves of M in function of λ.

Figure 7.2 Survival curves by Tanaka

Any value of p can be chosen, for instance, p = 5%, (i.e. one in each twenty recruits survives until the age tλ) as variable value of the survival curves.

7.5.2 RELATION BETWEEN M AND GROWTH

Beverton and Holt Method (1959)

Gulland (1969) mentions that Beverton and Holt verified that species with a larger mortality rate M also presented larger values of K. Looking for a simple relation between these two parameters, they concluded approximately that:

for small pelagic fishes

for demersal fishes

Pauly Method (1980)

Based on the following considerations:

1. Resources with a high mortality rate cannot have a very big maximum size;
2. In warmer waters, the metabolism is accelerated, so the individuals can grow up to a larger size and reach the maximum size faster than in colder waters.

Based on data of 175 species, Pauly adjusted multiple linear regressions of transformed values of M against the corresponding transformed values of K, L and temperature, T, and selected one that was considered to have a better adjustment, that is, the following empirical relation:

with the parameters expressed in the following units:

M = year-1
L = cm of total length
K = year-1
T° = surface temperature of the waters in °C

Pauly highlights the application of this expression to small pelagic fishes and crustaceans. The Pauly relation uses decimal logarithms to present the first coefficient different from the value -0.0152 which was given in the previous expression, written with natural logarithms.

7.5.3 RELATION BETWEEN M AND REPRODUCTION

Rikhter and Efanov Method (1976)

These authors analysed the dependency between M and the age of first (or 50 percent) maturity. They used data from short, mean and long life species, and suggested the following relation of M with the, tmat, age of 1st maturity:


(Units)

Gundersson Method (1980)

Based on the assumption that the natural mortality rate should be related to the investment of the fish in reproduction, beyond the influence of other factors, Gundersson established several relations between M and those factors.

He proposed, however, the following simple empirical relation, using the Gonadosomatic Index (GSI) (estimated for mature females in the spawning period) in order to calculate M:

M = 4.64xGSI - 0.37

7.5.4 KNOWING THE STOCK AGE STRUCTURE, AT BEGINNING AND END OF YEAR, AND CATCHES IN NUMBER, BY AGE, DURING THE YEAR

The natural mortality coefficients Mi, at age i can be calculated from the catch, Ci, in numbers, and the survival numbers, Ni and Ni+1 at the beginning and end of a year, by following the steps:

calculate
calculate
calculate

The several values of M obtained in each age could be combined to calculate a constant value, M, for all ages.

Paloheimo Method (1961)

Let us consider the supposition that Fi is proportional to fi for several years i, that is

for Ti = 1 year, Fi = q · fi,

then:

Zi = q · fi + M

So, the linear regression between Zi and fi has a slope b = q and an intercept a = M.

7.6 ESTIMATION OF Z - TOTAL MORTALITY COEFFICIENT

There are several methods of estimating the total mortality coefficient, Z, assumed to be constant during a certain interval of ages or years.

It is convenient to group the methods, according to the basic data, into those using ages or those using lengths.

7.6.1 METHODS USING AGE DATA

The different methods are based on the general expression of the number of survivors of a cohort, at the instant t, submitted to the total mortality, Z, during an interval of time, that is:

Z is supposed to be constant in the interval of time (ta,tb).

Taking logarithms and re-arranging the terms, the expression will be:

lnNt = Cte - Z.t

where Cte is a constant (= ln Na+Zta).

This expression shows that the logarithm of the number of survivors is linear with the age, being the slope equal to -Z.

Any constant expression which does not affect the determination of Z will be referred to as Cte.

1. If Z can be considered constant inside the interval (ta,tb) and, having available abundance data, Ni, or indices of abundance in number, Ui in several ages, i, then, the application of the simple linear regression allows one to estimate the total mortality coefficient Z.

In fact

so . Constant

and, as

then, by substitution:

(Ti = const = 1 year)

and also

The simple linear regression between and ti allows the estimation of Z (notice that the constant, Cte is different from the previous one. In this case only the slope matters to estimate Z).

2. If ages are not at constant intervals, the expression could be approximated and expressed in terms of the tcentrali. For Ti variable, it will be:


Ni ≈ Ni. e-ZTi/2

and, as

Ni = Na. e -Z.(ti-ta)

it will be

Ni ≈ Cte. e-Ztcentrali

and finally:

ln Ni ≈ Cte - Z. tcentrali

3. When using indices Ui, the situation is similar because Ui = q. Ni, with q constant, and then, also:

The simple linear regression between and ti allows one to estimate Z.

4. If the intervals are not constant, the expression should be modified to:

Simple linear regression can be applied to obtain Z, from catches, Ci, and ages, ti, supposing that Fi is constant.

and so, when Ti is constant. So:

lnCi = Cte - Z. ti

5. If the intervals are not constant, the expression should be modified to:

lnCi/Ti ≈ Cte - Z. tcentrali

6. Let Vi be the cumulative catch from ti until the end of the life, then:

Vi = Σ Ck = Σ Fk Nkcum,

Where the sum goes from the last age until age i,

As Fk and Zk are supposed to be constant ΣNkcum = Ni/Z and so:

Vi = FN/Z

and

lnVi = Cte + lnNi

Therefore:

ln Vi = Cte - Z. ti

7. Following Beverton and Holt (1956), Z can be expressed as:

Then, it is possible to estimate Z from the mean age t

This expression was derived, considering the interval (ta, tb) as (ta, ∞).

7.6.2 METHODS USING LENGTH DATA

When one has available data by length classes instead of by age, the methods previously referred to can still be applied. For that purpose, it is convenient to define the relative age.

Using the von Bertalanffy equation one can obtain the age t in function of the length, as:

(the expression is written in the general form in relation to ta and not to t0)

t = ta - (1/K).ln[(L- Lt)/(L- La)]

or

(This equation is referred to by some authors as the inverse von Bertalanffy equation).

The difference t-ta is called relative age, t*,.

So: t* =-(1/K).ln[(L- Lt)/(L- La)] or t* =-(1/K)ln[1-(Lt-La)/ (L- La)]

For ta = to, La = 0 and:

t* is called a relative age because the absolute ages, t, are related to a constant age, ta.

In this way, the duration of the interval Ti can either be calculated by the difference of the absolute ages or by the difference of the relative ages at the extremes of the interval:

Ti = ti+1 -ti = t*i +1 - t*i

Also:

t*centrali = tcentrali + Cte

So, the previous expressions still hold when the absolute ages are replaced by the relative ages:

ln Ni = Cte - Z. t*centrali
ln Ui = Cte - Z. t*centrali
ln Vi = Cte - Z. t*i
ln Ci/Ti = Cte - Z. t*centrali

Finally, the expression would also be:

Beverton and Holt (1957) proved that:

must be calculated as the mean of the lengths weighted with abundances (or their indices) or with the catches in numbers.

Comments

1. The application of any of these methods must be preceeded by the graphical representation of the corresponding data, in order to verify if the assumptions of the methods are acceptable or not and also to determine the adequate interval, (ta, tb).

2. These formulas are proved with the indications that were presented, but it is a good exercise to develop the demonstrations as they clarify the methods.

3. It is useful to estimate a constant Z, even when it is not acceptable, because it gives a general orientation about the size of the values one can expect.

4. The methods are sometimes referred to by the names of the authors. For example, the expression ln Vi = Cte - Z.t*i is called the Jones and van Zalinge method (1981).

5. The mean age as well as the mean length in the catch can be calculated from the following expressions:

with Ci = catch in number in the age class i

where Ci = catch in number in the length class i

with Ci = catch in number in the age class.

The relative age should be t* = - (1/K).ln[(L- Lt)/(L- La)]

Summary of the Methods to Estimate the Total Mortality Coefficient, Z

Assumption: Z is constant in the interval of ages, (ta, tb)

T Constant



ln Ci = Cte - Z · ti


ln Vi = Cte - Z · ti

Ti variable



ln Vi = Cte - Z · ti


(tb = ∞) (Beverton and Holt equation of Z)

Supposition: Z is constant in the interval of lengths, (La, Lb)

Relative age


(Gulland and Holt equation)

(Jones and van Zalinge equation)

(Beverton and Holt equation of Z)

7.7 ESTIMATION OF THE PARAMETERS OF THE STOCK-RECRUITMENT (S-R) RELATION

The least squares method (non-linear model) can be used to estimate the parameters, α and k, of any of the S-R models.

The initial values of the Beverton and Holt model (1957) can be obtained by re-writing the equation as:

and estimating the simple linear regression between y (= S/R) and x (=S) which will give the estimations of 1/α and 1/(αk). From these values, it will then be possible to estimate the parameters α and k. These values can be considered as the initial values in the application of the non-linear model.

In the Ricker model (1954) the parameters can be obtained by re-writing the equation as:

and applying the simple linear regression between y (= ln R/S) and x (=S) to estimate ln α and (-1/k). From these values, it will be possible to estimate the parameters (α and k) of the model, which can be considered as the initial values in the application of the non-linear model.

It is useful to represent the graph of y against x in order to verify if the marked points are adjustable to a straight line before applying the linear regression in any of these models.

In the models with the flexible parameter, c, like for example, the Deriso model (1980), the equation can be re-written as:

For a given value of c the linear regression between y (= (R/S)c) and x (=S) allows the estimation of the parameters α and k.

One can try several values of c to verify which one will have a better adjustment with the line y against x; for example, values of c between -1 and 1.

The values thus obtained for α, k and c, can be considered as initial values in the application of the iterative method, to estimate the parameters α, k and c of the non-linear Deriso model.

7.8 ESTIMATION OF THE MATRIX [F] AND OF THE MATRIX [N] -COHORT ANALYSIS - AC AND LCA

7.8.1 COHORT ANALYSIS BY AGE- (AC)

The cohort analysis is a method to estimate the fishing mortality coefficients, Fi, and the number of survivors, Ni, at the beginning of each age, from the annual structures of the stock catches, in number, over a period of years.

More specifically, consider a stock where the following is known:

Data

age, i, where i = 1,2,...,k
year, j, where j = 1,2,...,n
Matrix of catches [C] with
Ci,j = Annual catch, in number, of the individuals with the age i and during the year j
Matrix of natural mortality [M] with
Mi,j = natural mortality coefficient, at the age i and in the year j.
Vector [T] where
Ti = Size of the age interval i (in general, Ti=T=1 year)

Objective

To estimate

matrix [F]

and

matrix [N].

In the resolution of this problem, it is convenient to consider these estimations separately; one interval of age i (part 1); all the ages during the life of a cohort (part 2); and finally, all the ages and years (part 3).

PART 1 (INTERVAL TI)

Consider that the following characteristics of a cohort, in an interval Ti are known:

Ci = Catch in number
Mi = Natural mortality coefficient
Ti = Size of the interval

Adopting a value of Fi, it is then possible to estimate the number of survivors at the beginning, Ni, and at the end, Ni+1, of the interval.

In fact, from the expression:

one can calculate Ni which is the only unknown variable in the expression.

To calculate Ni+1 one can use the expression where the values Ni, Fi and Mi were previously obtained.

PART 2 (DURING THE LIFE)

Suppose now that the catches Ci of each age i, of a cohort during its life, the values of Mi and the sizes of the interval Ti are known.

Adopting a certain value, Ffinal, for the Fishing Mortality Coefficient in the last class of ages, it is possible, as mentioned in part 1, to estimate all the parameters (related to numbers) in that last age group. In this way, one will know the number of survivors at the beginning and end of the last age.

The number at the beginning of that last class of ages, is also the number Nlast at the end of the previous class, that is, Nfinal is the initial number of survivors of the class before last.

Using the Ci expression, resulting from the combination of the two expressions above:

one can estimate Fi in the previous class, which is the only unknown variable in the expression. The estimation may require iterative methods or trial and error methods.

Finally, to estimate the number Ni of survivors at the beginning of the class i, the following expression can be used:

Repeating this process for all previous classes, one will successively obtain the parameters in all ages, until the first age.

In the case of a completely caught cohort, the number at the end of the last class is zero and the catch C has to be expressed as:

Pope Method

Pope (1972) presented a simple method to estimate the number of survivors at the beginning of each age of the cohort life, starting from the last age.

It is enough to apply successively in a backward way, the expression:

Ni ≈ (Ni+1 e MT/2 + Ci).e MT/2

Pope indicates that the approximation is good when MT ≤ 0.6

Pope’s expression is obtained, supposing that the catch is made exactly at the central point of the interval Ti (Figure 7.3).

Figure 7.3 Number of survivors during the interval Ti = ti+1 - ti with the catch extracted at the central point of the interval

Proceeding from the end to the beginning one calculates successively:

N" = Ni+1e+MTi/2
N’ = N" + Ci
Ni = N’.e+MT/2

substituting N’ by N"+Ci, the expression will be:

Ni = (N" + Ci).e MT/2

Finally, substituting N" by Ni+1.e +MTi/2, it will be:

Part 3 (period of years)

Let us suppose now that the Catch matrix [C], the natural mortality [M] matrix and the vector size of the intervals [T], are known for a period of years.

Let us also assume that the values of F in the last age of all the years represented in the matrices and the values of F of all the ages of the last year were adopted. These values will be designated by Fterminal (Figure 7.4)

Figure 7.4 Matrix of catch, [C], with Fterminal in the last line and in the last column of the matrix C. The shadowed zones exemplify the catches of a cohort


Years


Ages

2000

2001

2002

2003


1

C

C

C

C

Fterminal

2

C

C

C

C

Fterminal

3

C

C

C

C

Fterminal


Fterminal

Fterminal

Fterminal

Fterminal


Notice that in this matrix the elements of the diagonal correspond to values of the same cohort, because one element of a certain age and a certain year will be followed, in the diagonal, by the element that is a year older.

From parts 1 and 2 it will then be possible to estimate successively Fs and Ns for all the cohorts present in the catch matrix.

Comments

1. The values of Mi,j are considered constant and equal to M, when there is no information to adopt other values.

2. When data is referred to ages, the values Ti will be equal to 1 year.

3. The last age group of each year is, sometimes grouped ages(+). The corresponding catches are composed of individuals caught during those years, with several ages. So, the cumulative values do not belong to the same cohorts, but are survivors of several previous cohorts with different recruitments and submitted to different fishing patterns. It would not be appropriate to use the catch of a group (+) and to apply cohort analysis. Despite this fact, the group (+) is important in order to calculate the annual totals of the catches in weight, Y, of total biomasses, B, and the spawning stock biomass. So, it is usual to start with the cohort analysis on the age immediately before the group (+) and use the group (+) only to calculate the annuals Y, B and (SP). The value of F in that group (+) in each year, can be estimated as being the same fishing mortality coefficient as the previous age or, in some cases, as being a reasonable value in relation to the values of Fi in the year that is being considered.

4. A difficulty in the technical application appears when the number of ages is small or when the years are few. In fact, in those cases, the cohorts have few age classes represented in the Matrix [C] and the estimations will be very dependent on the adopted values of Fterminals.

5. The cohort analysis (CA) has also been designated as: VPA (Virtual Population Analysis), Derzhavin method, Murphy method, Gulland method, Pope method, Sequential Analysis, etc. Sometimes, CA is referred to when the Pope formula and the VPA are used in other cases. Megrey (1989) presents a very complete revision about the cohort analyses.

6. It is also possible to estimate the remaining parameters in an age i, related to numbers, that is, Ncumi, Ni, Di, Zi and Ei. When the information on initial individual or mean weights matrices [w] or [w] are available, one can also calculate the matrices of annual catch in weight [Y], of biomasses at the beginning of the years, [B], and of mean biomasses during the years [B]. If one has information on maturity ogives in each year, for example at the beginning of the year, spawning biomasses [SP] can also be calculated. Usually, only the total catches Y, the stock biomasses (total and spawning) at the beginning and the mean biomasses of the stock (total and spawning) in each year are estimated.

7. The elements on the first line of the matrix [N] can be considered estimates of the recruitment to the fishery in each year.

8. The fact that the Fterminals are adopted and that these values have influence on the resulting matrix [F] and matrix [N], forces the selection of values of Fterminals to be near the real ones. The agreement between the estimations of the parameters mentioned in the points 6. and 7. and other independent data or indices (for example, estimations by acoustic methods of recruitment or biomasses, estimations of abundance indices or cpue´s, of fishing efforts, etc) must be analysed.

9. The hypothesis that the exploitation pattern is constant from year to year, means that the fishing level and the exploitation pattern can be separated, or Fsepi = Fj x si. This hypothesis can be tested based on the matrix [ F ] obtained from the cohort analysis.

It is usual to call this separation VPA-Separable (SVPA).

We have

and

and

Then, if Fij = Fj.si one can prove that .

If the estimated values of Fij are the same as the previous Fsepij = Fj.si then the hypothesis is verified. This comparison can be carried out in two different ways, the simplest is to calculate the quotients Fsepij /Fij. If the hypothesis is true this quotient is equal to one. If the hypothesis is not verified it is always possible to consider other hypotheses with the annual vector [s] constant in some years only, mainly the last years.

10. It is usual to consider an interval of ages, where it can be assumed that the individuals caught are "completely recruited". In that case, the interval of ages corresponds to exploitation pattern constant (for the remaining ages, not completely recruited, the exploitation pattern should be smaller). For that interval of ages, the means of the values of Fi,j in each year are then calculated. Those means, Fj, are considered as fishing levels in the respective years. The exploitation pattern in each cell, would then be the ratio Fi,j / Fj. The si, for the period of years considered, can be taken as the mean of the relative pattern of exploitation calculated before. Alternatively, they can also be taken as referring to si of an age chosen for reference.

7.8.2 LENGTH COHORT ANALYSIS - (LCA)

The technique of the cohort ansalysis, applied to the structure of the catches of a cohort during its life, can be made with non constant intervals of time, Ti,. This means that the length classes structure of the catches of a cohort during its life, can also be analysed.

The methods of analysis of the cohort in those cases is called the LCA (Length Cohort Analysis). The same techniques; Pope method, iterative method, etc, of the CA for the ages, can be applied to the LCA analysis (the intervals Ti´s can be calculated from the relative ages).

One way to apply the LCA to the length annual catch compositions, will be: to group the catches of length classes belonging to the same age interval in each year. The technique CA can then be applied directly to the resulting age composition of the catches by age of the matrix [C]. This technique is known as "slicing" the length compositions. To "slice", one usually inverts the von Bertalanffy length growth equation and estimates the age ti for each length Li (sometimes using the relative ages t*i) (Figure 7.5). It is possible that when grouping the length classes of the respective age interval, there are length classes composed by elements that belong to two consecutive age groups. In these cases, it will be necessary to "break" the catch of these extreme classes into two parts and distribute them to each of those ages. In the example of Figure 7.5, the catches of the length class (24-26] belong to age 0 and to age 1. So, it is necessary to distribute that catch to the two ages. One simple method is to attribute to age 0 the fraction (1.00 - 0.98)/(1.06 - 0.98) = 0.25 of the annual catch of that length class and to age 1 the fraction (1.06 - 1.00)/(1.06 - 0.98) = 0.75. The method may not be the most appropriate one, because it is based on the assumption that, in the length classes, the distribution of the individuals by length is uniform. So, it is necessary to use the smallest possible interval of length classes, when applying this distribution technique.

Another way to do the length cohort analysis is to use the catches in the length classes of the same age group. It is possible to follow the cohorts in the matrix [C], through the length classes belonging to a same age, in a certain year, with the length classes of the next age, in the following year, etc. In this way, the different cohorts existing in the matrix will be separated and the evolution of each one of them will be by length classes, not by age (see Figure 7.5).

Figure 7.5 Example of a matrix [C] with the catches of the cohort shadowed, written in bold, recruited at year 2000, "sliced" by length classes,

Group



Years

Age

Relative age

Length
Classes

2000

2001

2002

2003

0

1.03

20-

41

30

17

49

1.54

22-

400

292

166

472

1.98

24-

952

699

400

1127

1

2.06

26-

1766

1317

757

2108

2.30

28-

2222

1702

985

2688

2.74

30-

2357

1872

1093

2902

2.88

32-

2175

1091

1067

2739

2

3.00

34-

1817

948

1416

1445

3.42

36-

1529

812

1270

1250

3.64

38-

1251

684

980

1053

3.83

40-

1003

560

702

710

3.96

42-

787

290

310

558

3

4.01

44-

595

226

179

834

4.25

46-

168

70

71

112







Cohort of the year 2000

The LCA R. Jones method (1961), of analysing a length composition during the life of a cohort can then be applied. The different values of Ti are calculated as Ti = ti+1*-ti*, where ti* e ti+1* are the relative ages corresponding to the extremes of the length interval i. The vector [N] can also be obtained as the number of initial survivors in each length class of the cohort, and in each age class.

Comments on cohort analyses

1. Certain models, called integrated models, combine all the available information (catches, data collected on research cruises, effort and cpue data, etc) with the matrix [C], and integrate in a unique model, in order to optimize the previously defined criterium function. A model integrating CA and the hypothesis of constant exploitation pattern was developed and called SVPA, separable VPA, because the Fishing level and Exploitation pattern are "separable".

2. Fry (1949) considered the cumulative catches of a cohort by age during its life, from the end to the beginning, as an image of the number of survivors at the beginning of each age (which the author designated as "virtual population"):

In the fishery that Fry studied, M was practically equal to zero.

If M is different from zero it can also be said that the number Ni of survivors at the beginning of the interval i will be

where Dk represents the number of total deaths at the interval k.

Adopting the initial values, Ek(0), for the exploitation rates, E, in all the classes, one can calculate the total deaths:

Dk(0) = Ck/Ek(0).

N i(0) can be calculated as the cumulative total deaths from the last class up to the ith class, that is:

Then the expression will be:

Zi(1).Ti = ln(Ni+1(0)/Ni(0))

and:

Fi(1).Ti = Ei(0).Zi(1).Ti

Comparing Ei(1) with Ei(0), the new values of E will be:

Ei(1) = Fi(1).Ti/ Fi(1).Ti + Mi.Ti

One can then estimate the values of E with the desired approximation by an iterative method, repeating the five calculations (of Di, Ni, ZiTi, FiTi and Ei,) using Ei(1) instead of Ei(0).

In the last class, the number, Nlast, can be taken as equal to the number of deaths, Dlast, and in this case, Nlast will be calculated as:

Nlast = Dlast = Clast / Elast

3. Finally, the results of CA and of LCA give a perspective view of the stock in the previous years. That information is useful for the short and long-term projections. Usually, data concerning the catches is not available for the year in which the assessment is done and so it is necessary to project the catches and the biomasses to the beginning of the present year before calculating the short-term projection.

4. When the relative ages are calculated, it is usual to adopt zero as the age ta corresponding to the value of La, taken as the lower limit of the first length class represented in the catches.


Previous Page Top of Page Next Page