4. DATA COLLECTION STRATEGY: SAMPLING

There is no single prescription for the optimum design of a data collection system, as the underlying conditions for the design vary from place to place. Obviously, the design of a data collection programme for a small island state is different from that of a large country. The nature of the fishing industry, for example whether the fishery is primarily industrial or artisanal, also plays an important role in the design. Although an attempt has been made to keep this chapter general, it concentrates on the case of a large country with a dominant artisanal sector, based on experience drawn from Viet Nam.

4.1 SAMPLE-BASED ESTIMATION VERSUS COMPLETE ENUMERATION

The concepts of “total enumeration” (sometimes referred to as “census”) and “sample-based estimation”, can be explained by a hypothetical example, where a “population” of six vessels is landing at a particular location.

Notice that the word “population” is used for the set of units from which data are collected. Thus, “population” is a general term which may refer to “a set of landing places”, “a set of fishing trips”, “a set of landings” etc. In the present example, it is assumed that the population consists of six “similar vessels”, both with respect to vessel dimensions and fishing techniques. Thus, the average landings are the same for all six vessels.

The landings are in units of “boxes”, and the task is to determine the total number of boxes landed on a particular day in a particular landing place.

“Total enumeration” means that all 6 vessels are inspected and all the boxes recorded. Alternatively, all the skippers may have filled in logbooks recording the number of boxes, which are subsequently made available to the data collectors. The result of total enumeration will be a total landing of 19 boxes (see Figure 4.1.1 and Table 4.1.1).

In the sample-based estimation, only a subset of the vessels is inspected (due to limited resources for data collection). In this case, only vessels No. 3 and 4 are inspected and the total number of boxes counted for the two vessels is six boxes.

Table 4.1.1 Illustration of the concepts of “Total enumeration” and “Sample-based estimation”

Vessel No.	Boxes landed	Sampled Boxes
1	3
2	4
3	3	3
4	3	3
5	4
6	2
Total	19	6
Raising factor = 6/2 =		3.0
Sample-based estimate of total 3×6=		18 boxes
Complete enumeration:		19 boxes

Figure 4.1.1 Illustration of the concepts of “Total enumeration” and “Sample-based estimation”. Sampling results in only vessels 3 and 4 being inspected, whereas total enumeration means all vessels are inspected.

The six observed boxes are then “raised” to a total by application of the “raising factor”:

Raising factor = (Total number of vessels)/(Number of sampled vessels) = 6/2 = 3.0

The estimated total number of boxes then becomes 3 × 6 = 18. In this hypothetical case, total enumeration gave the correct number of boxes, whereas the sample-based approach underestimated the total. In reality, complete enumeration may also give inaccurate results. For example, if some vessels transfer boxes to other vessels at sea so they are not reported, “complete enumeration” would underestimate the total landings. However, unlike sampling, complete enumeration is unlikely to overestimate the catch.
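The raising procedure can be sketched in a few lines of Python. This is only an illustration using the box counts of Table 4.1.1; the function and variable names are chosen here for the example and are not part of any standard tool.

```python
# Boxes landed per vessel, taken from Table 4.1.1.
boxes_landed = {1: 3, 2: 4, 3: 3, 4: 3, 5: 4, 6: 2}

# Vessels inspected in the sample-based approach.
sampled_vessels = [3, 4]


def raised_estimate(landings, sampled):
    """Raise the boxes counted on the sampled vessels to the whole population."""
    raising_factor = len(landings) / len(sampled)
    sampled_total = sum(landings[vessel] for vessel in sampled)
    return raising_factor * sampled_total


census_total = sum(boxes_landed.values())                  # total enumeration
estimate = raised_estimate(boxes_landed, sampled_vessels)  # sample-based estimate

print("Total enumeration:", census_total)
print("Sample-based estimate:", estimate)
```

Had vessels 2 and 5 been drawn instead, the same calculation would give 3 × 8 = 24 boxes, which shows that a sample-based estimate can overestimate as well as underestimate the total.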

The above example does not account for the duration of the fishing operation that produced each vessel's landing. Perhaps vessel 5 landed twice as much as vessel 6 (see Table 4.1.1) because vessel 5 was fishing for a longer period (see Section 3.1.2).

4.2 GEOGRAPHICAL AND SEASONAL DIVISION OF POPULATIONS

The populations, from which fisheries data are collected, are naturally grouped in several dimensions, such as vessel and landings categories. Some of the divisions can be defined by the designer of the sampling programme. The process of defining divisions and allocation of sampling intensity to divisions of the population is called “stratification”. The divisions are called “strata” (see Section 4.3). Other divisions of populations are forced upon the designer of the sampling programme, and that applies to the major divisions in space and time, such as seasons and administrative provinces.

The Government will often request the data to be grouped by the administrative units, primarily by the administrative divisions within a country, for example, by the “provinces” (sometimes called “states”), districts, cities, towns, villages etc.

As living resources, and often fishers, do not keep within provincial or even national borders, the administrative divisions are often in conflict with natural divisions matching the distribution of living resources and fishing fleets. Stocks will disregard administrative borders unless they happen to lie along some natural environmental feature (e.g. a river outflow). In many cases, fishing vessels operate far from the “home port” (the place of vessel registration), and therefore use another port as their base for fishing (“base port”). However, the administrative units in a country (e.g. province or town) often register fishing vessels, and fishing rights are sometimes linked to the place of registration, which may reduce this problem.

Whereas the administration will naturally group fishing vessels according to their place of registration (home port), the biologist will want to group vessel landings (and thereby also group the vessels) relative to the fishing grounds. Eventually, the biologist will want to group catch data according to which stocks the catch originates from. Stocks are related to fishing grounds, which in turn are more related to landing places and base ports than home ports. The economists and the sociologists, on the other hand, may be less interested in the fish stocks and where the fish were caught, and for many economic, social and technical data, the natural place for data collection is the home port.

A sampling programme with the objective of estimating total catch has to use the landing places and the base ports as the division of the populations of landings and fishing vessels. In the following we shall assume that the primary objective is to estimate total landings and discards.

The fishing vessels often change fishing techniques and/or fishing grounds during the year. In the tropics the fishing seasons are usually linked to the monsoon seasons, so it is often convenient to define the start of fishing seasons as different from that of the calendar year. Whether you use already existing geographical and seasonal divisions depends on the objective of the sampling programme. For the estimation of total landings, it will often be preferable not to use the existing divisions:

Rather than using home port, landing places and base ports may be preferable.
Rather than using provinces, other divisions (such as groups of provinces or sub-divisions of provinces) may be preferable.
Rather than using months, fishing seasons may be preferable.

However, there are also some advantages to staying with the existing divisions, as many other data (demographic, economic etc.) will be structured according to the administrative divisions. As a rule of thumb, you should only change the traditional divisions when there is a good reason for it. For example, it may not make any sense to deal with the living resources of a province where the fish stocks have a larger distribution than the waters of the province in question. Fish stocks may even be distributed over the Exclusive Economic Zones (EEZs) of several countries. Landing statistics can be presented both by fishing grounds (often the same as “by stock”) and by administrative unit, say “province”. Presenting the statistics by province will usually not be a major problem, as the sampling programme has to structure the data collection according to the landing places anyway.

The coastal waters of a province (or a country) will often be of special interest to politicians, managers and sociologists, as these waters often support traditional (artisanal) fishers and their families. The management instruments for coastal waters may be quite different from those for industrial fisheries. Artisanal fishers are often only able to fish in the coastal waters of their home province, whereas the industrial fishery is much more mobile.

The division of the surface of the earth into degrees of latitude and longitude divides the sea into “rectangles”. Fishing grounds are given as a named area (which can be any area) and/or are defined by a grid reference. The basic spatial concept is the statistical rectangle, defined by a latitude and longitude grid. For example, the statistical rectangles could be half-degree squares (30' latitude by 30' longitude), that is, a total area of 30×30 = 900 square nautical miles (nm²). Strictly, this is correct only for squares at the equator, as the distance between lines of longitude gets smaller the further they are from the equator.
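The effect of latitude on the area of a statistical rectangle can be illustrated with a small calculation. The sketch below assumes a spherical earth, on which one minute of latitude is about 1 nm and one minute of longitude is about cos(latitude) nm; the function name and the use of the mid-latitude of the rectangle are choices made for this illustration.

```python
import math


def rectangle_area_nm2(mid_latitude_deg, size_minutes=30.0):
    """Approximate area (nm²) of a statistical rectangle of the given size.

    Assumes a spherical earth: a minute of latitude is about 1 nm,
    a minute of longitude about cos(latitude) nm.
    """
    height_nm = size_minutes
    width_nm = size_minutes * math.cos(math.radians(mid_latitude_deg))
    return height_nm * width_nm


for latitude in (0, 10, 20, 40, 60):
    print(latitude, round(rectangle_area_nm2(latitude)))
```

Under this approximation a half-degree rectangle covers 900 nm² at the equator but only about half of that at 60° latitude.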

Figure 4.2.1 summarises the main geographical elements of fisheries data collection by a hypothetical country with its sea areas. The statistical rectangles are usually given names either in a national context or an international context, as illustrated by the four rectangles in the upper right corner of the figure.

Figure 4.2.1 Illustration of the principal geographical elements of fisheries data collection (hypothetical example). The “Landings places” may also be “Home ports” and/or “Base ports”. The grid reference is based on latitude and longitude 30×30 nm squares. These may be further divided into nine 10×10 nm squares (right).

The names of the statistical rectangles, which here are the same as the code, are composed of a letter and a number (Figure 4.2.1). The statistical rectangles may be further subdivided into 9 divisions each of dimension 10×10 nm, with the numbering, 1,2…9, as indicated on the figure. The number “0” indicates the entire 30×30 nm. No real fishing ground fits exactly to the statistical rectangles, but for a number of practical reasons, it is advantageous for the fisheries data collector to adopt the standard system.

The designer of the data collection programme must achieve all the objectives of the sampling programme by covering all the major geographical and seasonal factors. This is not an easy task, as the objectives are often conflicting and funding and manpower are limited. As the optimum solution always involves the particulars of the country in question, it is not possible to give a universal method. Instead, a representative example is given, which it is hoped will lead readers in the right direction as far as their special situation is concerned.

4.3 STRATIFICATION

Resources (manpower and funds) for sampling are almost always limited. Therefore it may not be possible to cover all major landing places in any division (province or group of provinces) of a country. Instead, the landing places are usually categorised, and representatives for each category of landing place selected. Statisticians call this process “stratification”.

As an example, consider categories of landing sites defined according to the total landings per year: below 1000 tonnes per year, 1000–2000 tonnes per year etc. To make a first estimate of the total landings of each landing place, we may use a frame survey, and compute the potential catch from the number of vessels in each vessel category. As an alternative, the categories of landing places may also be based on the number of vessels fishing from each place. This will be the only option, if total catches are not known. However, here we assume that some estimate of total landings is available.

The process of defining groups of landing places is the first part of the stratification. The stratification is illustrated in Figure 4.3.1. The left-hand side of the figure indicates a coastline with 42 landing places. Four have been categorised as “large landing places”, ten as “medium landing places” and 28 as “small landing places”. The stratification also involves the selection of landing places to be covered by the sampling programme. The right hand side of Figure 4.3.1 illustrates the sampled population, taken at random from within the groups or strata. In this case, it was decided to sample randomly 3 “large”, 4 “medium” and 4 “small” landing places.

Estimates of landings from the selected landing places are then raised firstly to the total division of the country by raising factors from the frame survey (or vessel register). If data on total landings are available from the landing places not sampled, we may also use this information in the raising procedure.
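The raising of sampled landing places to a division total can be sketched as follows. The stratum sizes and sample sizes are those of the example in Figure 4.3.1; the landings figures are invented purely for illustration, and the raising factor used here (number of landing places in the stratum divided by number sampled) is the simplest possible choice, assumed in place of factors derived from a real frame survey.

```python
# Stratum sizes and numbers of sampled landing places (example of Figure 4.3.1).
strata = {
    "large": {"total_places": 4, "sampled_places": 3},
    "medium": {"total_places": 10, "sampled_places": 4},
    "small": {"total_places": 28, "sampled_places": 4},
}

# HYPOTHETICAL annual landings (tonnes) estimated at each sampled landing place.
sampled_landings = {
    "large": [12000.0, 15000.0, 9000.0],
    "medium": [3000.0, 2500.0, 4000.0, 3500.0],
    "small": [400.0, 250.0, 600.0, 350.0],
}


def raise_by_stratum(strata, sampled_landings):
    """Raise the sampled landings of each stratum to a stratum total."""
    totals = {}
    for name, info in strata.items():
        values = sampled_landings[name]
        if len(values) != info["sampled_places"]:
            raise ValueError("sample size does not match the design for " + name)
        raising_factor = info["total_places"] / info["sampled_places"]
        totals[name] = raising_factor * sum(values)
    return totals


stratum_totals = raise_by_stratum(strata, sampled_landings)
division_total = sum(stratum_totals.values())

for name, total in stratum_totals.items():
    print(name, round(total))
print("Division total:", round(division_total))
```

Each stratum is raised with its own factor (4/3, 10/4 and 28/4 here), so the few large landing places are not swamped by the many small ones.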

4.3.1 Criteria for Strata Selection

Sampling programmes are usually “stratified”. That means that the “populations” to be sampled are grouped into subsets or strata. This is done simply to reduce sampling costs. Strata, and the number of samples taken from each, should be defined according to “Neyman's Criteria”. These attempt to optimise the allocation of limited resources (manpower and funds) between the strata. That is, the task is to decide how many samples should be obtained from each stratum, taking into account the cost of obtaining the samples.

As another example, consider the population of a group of fish landed. Let us assume that the task of the sampling programme is to estimate the mean body weight of the fish landed.

Neyman's first criterion is that large samples should be taken from large strata (Figure 4.3.2). In this connection we may think of three parts of the landings; the largest sample should then be taken from the largest part. If a stratum is large, it represents a larger proportion of the population, and its mean value is more representative of the population as a whole than that of a smaller stratum. This works in the same way as if there were no stratification.

Figure 4.3.1 Illustration of the concepts of “strata” and “stratification” (for further explanation, see text). The left side represents a coast with three categories of landing site to be sampled: large, medium and small.

Figure 4.3.2 Neyman's first criterion: large stratum - large sample. The largest sample should be taken from Stratum 1 and the smallest from stratum 3.

Figure 4.3.3 Neyman's second criterion: large variation - large sample. A larger sample should be taken from Stratum 1 as there is much greater variation in fish sizes. All the fish in Stratum 2 are of the same size, so it would be enough to weigh one single fish, because its weight will equal the mean weight. Stratum 1, on the other hand shows large individual differences, and many individuals must be weighed to achieve a precise estimate of the mean weight.

The second criterion says that more samples should be taken from the stratum containing greater variation (Figure 4.3.3). Large variation within a stratum indicates greater uncertainty about the mean value for this stratum, and consequently it should be examined more closely than a stratum for which there is little uncertainty.

The third Neyman criterion states that large samples should be taken from strata where the sampling is cheap.
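The three criteria can be summarised in one expression. The form below is the standard textbook result for optimum allocation and is given here only as a compact reminder; the symbols n_h (number of samples allocated to stratum h), n (total number of samples), N_h (stratum size), s_h (standard deviation within the stratum) and c_h (cost of one sample in the stratum) are notation introduced for this illustration.

```latex
n_h \;=\; n \,\frac{N_h \, s_h / \sqrt{c_h}}{\sum_{k} N_k \, s_k / \sqrt{c_k}}
```

When the cost per sample is the same in all strata, the cost terms cancel and the allocation is simply proportional to the product of stratum size and standard deviation.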

The second and third criteria are also used to define strata. Optimum strata minimise variation within them (i.e. maximise variation between strata), and make the most expensive strata as small and as homogeneous as possible. In the fish size example, the most obvious strata are the commercial groups. In other cases, it may not be so simple and some initial random sampling may be required to identify an appropriate stratification.

In defining strata, the definition should be as closely related to the variable of interest as possible. There is no point, for instance, in defining strata based on the first letter of the name of landing places if the sample was to estimate total landings. It would be better to pick the required sample at random from all landing places together. However, there is considerable advantage in stratifying landing sites on fleet composition and nearness to fishing grounds, as these will affect the individual vessel landings at those sites.

In the following example, we consider “the population of landing places” described above. These landing places may be divided according to the total quantity landed per year. Table 4.3.1 shows the number of landing places in the country. Column A gives the definition of landing place category, (or the “strata”), column B gives the number of landing places in each stratum and column C gives the total landings to all places in each stratum. Column D gives the average landings of a landing place. The remaining three columns will be explained below.

Table 4.3.1 “Neyman's allocation” for landing place, based on total catches.

A	B	C	D	E	F	G
					Allocation of samples
Landing place by size of landings (stratum)	N=Number	Sum of total landings	Average landings	s=Standard Deviation	N×s	% of samples
Less than 1000 t/yr	378	121250	321	102	38459	14
1000 to 2000 t/yr	46	68350	1486	297	13660	5
2000 to 5000 t/yr	66	196436	2976	951	62794	24
5000 to 10000 t/yr	19	138400	7284	2294	43582	16
More than 10000 t/yr	15	249000	16600	7209	108137	41
Total	524	773436			266632	100

Let us say that we have the resources to collect 50 samples. Now the question is how do we allocate these 50 samples to the 5 strata? Neyman's criteria tell us how to solve this problem. But before we can use Neyman's criteria we must compute the “standard deviations” for each stratum. The standard deviation is a measure for how much variation there is between the total landings of each landing place within a stratum. If all the landing places have exactly the same total annual landings, the standard deviation would be zero. If they were nearly the same the standard deviation would be small. The larger the variation between total landings the larger the standard deviation becomes. A large standard deviation means that it is difficult to get a true picture about the stratum from a few samples. Calculation of the standard deviation is a simple exercise.

Table 4.3.2 shows three samples (of total catch in a landing place) and illustrates the corresponding standard deviations. Notice that in this example, the standard deviation becomes large when the observations are large (they are approximately proportional to the mean value). This is a feature often observed in fisheries data.

Table 4.3.2 Three examples of standard deviations of ten observations, where the standard deviation is (approximately) proportional to the mean values.

	Sample	Sample	Sample
Observation No.	1	2	3
1	5	10	51
2	6	14	72
3	4	9	43
4	5	10	52
5	6	13	66
6	3	7	31
7	4	9	43
8	5	11	56
9	3	6	34
10	4	9	44
Mean value	4.50	10.11	48.63
Standard deviation	1.08	2.35	13.03
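The calculation behind Table 4.3.2 can be reproduced with the Python standard library. This is a minimal sketch: the observations are copied from the table, and the sample standard deviation (denominator n−1) is assumed, which reproduces the tabulated value for Sample 1. For Samples 2 and 3 the recomputed figures differ slightly from the printed ones, presumably because the tabulated observations are rounded.

```python
import statistics

# Observations copied from Table 4.3.2.
samples = {
    1: [5, 6, 4, 5, 6, 3, 4, 5, 3, 4],
    2: [10, 14, 9, 10, 13, 7, 9, 11, 6, 9],
    3: [51, 72, 43, 52, 66, 31, 43, 56, 34, 44],
}


def summarise(observations):
    """Return mean, sample standard deviation (n-1) and their ratio."""
    mean = statistics.mean(observations)
    sd = statistics.stdev(observations)  # uses the n-1 denominator
    return mean, sd, sd / mean


summary = {number: summarise(obs) for number, obs in samples.items()}

for number, (mean, sd, relative_sd) in summary.items():
    print(number, round(mean, 2), round(sd, 2), round(100 * relative_sd), "%")
```

The ratio of standard deviation to mean stays at roughly one quarter in all three samples, which is the approximate proportionality referred to in the text.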

Column E in Table 4.3.1 shows the estimated standard deviation of each group of landing places (each stratum). The two first Neyman criteria state that the sample should be large if:

The stratum has many members (i.e. values in column B, Table 4.3.1 are large)
The stratum has a large standard deviation (i.e. values in column E, Table 4.3.1 are large).

The two criteria are combined by computing the product of “Stratum size” and “Standard deviation”, N×s (column F in the table). The samples are then distributed in the same proportions as the N×s values. The percentage of the total number of samples to be allocated to each stratum is shown in column G of the table. This is equal to 100 times the N×s of the stratum divided by the sum of N×s over all strata.
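The allocation of Table 4.3.1 can be reproduced with a short script. The sketch below uses the stratum sizes and standard deviations from columns B and E of the table and the 50 samples mentioned in the text; the function name is chosen for this example, and simple rounding is used for the sample counts, which happens to sum to 50 here but will not necessarily do so for other figures.

```python
# (stratum, N = number of landing places, s = standard deviation) from Table 4.3.1.
strata = [
    ("Less than 1000 t/yr", 378, 102),
    ("1000 to 2000 t/yr", 46, 297),
    ("2000 to 5000 t/yr", 66, 951),
    ("5000 to 10000 t/yr", 19, 2294),
    ("More than 10000 t/yr", 15, 7209),
]


def neyman_allocation(strata, total_samples):
    """Allocate samples in proportion to N x s (first two Neyman criteria)."""
    products = [number * sd for _, number, sd in strata]
    total_product = sum(products)
    allocation = []
    for (name, _, _), product in zip(strata, products):
        share = product / total_product
        allocation.append((name, round(100 * share), round(total_samples * share)))
    return allocation


allocation = neyman_allocation(strata, total_samples=50)

for name, percent, n_samples in allocation:
    print(name, percent, "%", n_samples, "samples")
```

With these figures the percentages agree with column G of the table, and the 50 samples are split 7, 3, 12, 8 and 20 between the five strata.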

The results of Table 4.3.1 are depicted in Figure 4.3.4. There are three bars for each stratum. The left most bar indicates the stratum size, the number of landing places. The right most bar indicates the standard deviation of each stratum divided by 100. The middle bar indicates the product of stratum size and standard deviation. Notice that the larger the landings, the larger is the standard deviation. One reason for this is that the difference between any two landing places in the first two strata is limited to 1000 tonnes, whereas for the third stratum the limit is 2000 tonnes, and so on.

The picture observed here is a common result: there are many units in the first stratum, but their standard deviation is small. There are few units in the last stratum, but their standard deviation is large. Although the large landing places are few in number, the standard deviation is so big that most samples are allocated to this stratum (41%). On the other hand, the small landing places are so many that they cannot be ignored, and according to these criteria, we should allocate 14% of the samples to this stratum.

Figure 4.3.4 Illustration of “Neyman's Criteria”.

The strata are sometimes forced upon the designer of the data collection programme, and sometimes it is up to the designer to create the strata. Administrative divisions like the provinces (or states) of a country are often forced upon the data collection programme, as the government will require the data to be presented by the administrative units. Furthermore, the fisheries institutions involved in the data collection are usually structured by the administrative units. The problem with the administrative units is often that they are not defined to match the distribution of natural resources and the distribution of fishing fleets.

The definition of fleets, on the other hand, is a division where the designer may have a certain freedom to create strata that match the fisheries. Definitions of fleets are dictated both by the needs of the sampling programme and by the needs of the fisheries managers, who are often interested in the performance of the fleets.

4.3.2 Designing Strata

Stratification has two main purposes:

To optimise the sampling programme.
To address management and development issues (To be able to answer “what-if-questions”, see Section 3.2.3).

The two objectives may be in conflict, but more often, they lead to the same division of the populations.

The solution for the second objective would be to have a very large number of “small” divisions, which could be merged in different ways to meet the request from different “what-if-questions”. For example, in one context, you may be interested in the “shrimp trawlers with engines from 25 to 50 HP”, in another context you are interested in “deep sea vessels”. With a very detailed split-up of the vessels into fleets, you will always be in a position to separate the desired group of vessels.

However, having many divisions conflicts with the first objective, as all strata must be covered adequately by the sampling programme, and introducing a large number of strata may exhaust the manpower and funding available for data collection.

Stratification in theory is a mathematical exercise along the lines of the Neyman allocation criteria. Stratification in practice combines the theoretical approach with logistical, political and administrative factors. The allocation of limited resources of personnel and funding can only be partly solved by the theoretical approach.

Seven types of stratification are considered:

Provinces of a country and districts of a province;
Landing places;
Fleets/gear;
Fishing grounds;
Commercial groups;
Seasons;
Fish species.

However, the seven stratification types are not independent. Once stratification on, for example, fleets, is made, it will have implications on the other types of stratification. A purse seine fleet will catch mainly pelagic species, so the stratification on commercial groups, fishing grounds and species will reflect the fact that the catch was made by a purse seine.

One of the most important questions when setting up a sampling programme is “How many samples should be collected?” The more samples collected, the more likely it is that the estimated mean value of the variable is close to the “true value”. The “true value” for, say, “the average number of kg landed per day of species X” is the value you would get if all landings were inspected. Thus, the true value is the result of a complete enumeration, without any bias or erroneous reporting.

Figure 4.3.5 shows an example of the confidence limits as a function of the number of samples. The “confidence limits” are given here as the percentage deviation of the estimate from the true value that is not exceeded with a probability of 95%. For example, with 4 samples, there is a 5% chance that the mean value will deviate by more than 12% from the true value.

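The relative standard deviation, 100 × s/(Mean value), is assumed to be 10% in the example of Figure 4.3.5, where the standard deviation, s, is defined as:

$$ s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2} $$

where n = number of samples, x_i = i-th observation and \bar{x} = mean value of the n observations.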

The confidence interval is here computed as: (Mean value) ± t_{n-1} × s, where t_{n-1} is the inverse of the “Student's t-distribution” for n-1 degrees of freedom corresponding to the probability 0.05. (For further details, see textbooks on statistical theory and Sparre and Venema 1998).

As can be seen, the confidence limit gets smaller the larger the number of samples, but there are diminishing returns as the sample size gets larger. Thus, little is achieved in terms of increasing confidence after a certain number of samples.

The number of samples which produces a pre-specified confidence interval can thus (in theory) be estimated. As already noted in Chapter 1, this manual does not aim to introduce the mathematical aspects of sampling theory. A reader who wants to go further into these topics is referred to one of the many textbooks on sampling theory (see References in Chapter 8).

Figure 4.3.5 Hypothetical example of relative 95% confidence limits as a function of number of samples, when relative standard deviation is 10%.

4.3.2.1 Provinces of a Country and Districts of a Province

If the country is divided into administrative units, say provinces, the data collection programme cannot ignore this stratification, although it may not match the distribution of fish stocks and fishing fleets as fish and vessels may cross the province borders.

The resources for data collection may not allow for a complete coverage of all coastal provinces, and in that case, the selected provinces should be representative for the entire coastline. Naturally, following Neyman's criteria, the provinces with the largest production should be given the best cover by the sampling programme, but that is not the only criterion. The leading fisheries provinces may be located at one end of the country, which may deviate in its marine resources and fishery from the other areas of the country. Unless the part of the country with the secondary fishery is very insignificant, it should not be ignored. Logistics, collaboration with local authorities, collaboration with industry, location of fishing companies and processing plants may also make the designers deviate from the Neyman allocation.

The stratification by provinces should be based on the province fisheries profiles (see Section 6.9), and an overall evaluation of the suitability of a province for data collection should be the basis for the allocation of resources. All coastal provinces (or provinces with marine fisheries) must be covered by the sampling programme in one way or another. As a minimum, the vessel registration and/or the frame survey should cover all provinces, to allow for raising of samples to a total (see Section 4.4).

Provinces may be too small to fit to the stocks, the fishing fleets and the limited resources available for data collection. It may therefore be desirable to use groups of provinces rather than the individual provinces as the strata. Provinces are “too small” if the fishing vessels within the group of provinces frequently cross the province borders or even have their main fishery outside the home province (the province of registration).

The grouping of provinces should be chosen so that the migration of fishing vessels is mainly within the groups, and the migration between groups is minimised. Furthermore, the waters of a province group should not show too much variation in species composition between provinces. The exploited shallow and deep water environments should preferably remain approximately the same within a province stratum. With such a grouping of similar provinces, the samples from vessels within a group can be raised collectively.

Using groups of provinces rather than individual provinces may reduce the cost of data collection. Some provinces are considered well represented by other provinces, and only the landing places of the representative provinces are covered by the interview sampling programme. If the vessels move between the landing places irrespective of the provincial borders, vessels from all provinces will be sampled, even if some home provinces are not sampled.

Table 4.3.3 illustrates the concept of “vessel migration”. In this case, it is assumed that the programme has covered all provinces and that all landings during a certain period have been recorded. In this case there are 10 provinces, all of which can be both “home province” (province of registration) and “base province”. As can be seen the vessels do not move away from the homeport in a random manner. A vessel from Province 6 is more likely to use Provinces 4 to 8 (group B) than Provinces 1–3 (group A) or Provinces 9–10 (group C). With the grouping of the 10 provinces into three Divisions A, B and C there is still some migration between groups, but the major migration is within the groups.

Table 4.3.3 Number of landings by base port and homeport. (*) Indicates provinces selected to represent group A, B and C respectively (for further explanation, see text).

Division of country		A			B					C
		PROVINCE USED AS BASE FOR FISHING
	HOME PROVINCE	Prov.1	Prov.2 (*)	Prov.3	Prov.4	Prov.5 (*)	Prov.6	Prov.7	Prov.8 (*)	Prov.9 (*)	Prov.10
A	Province 1	92	80	40	8	3	2	3	2	0	1
	Province 2	76	90	76	7	8	4	2	0	1	0
	Province 3	30	76	99	16	6	8	3	3	2	1
B	Province 4	8	6	16	46	40	17	20	11	3	2
	Province 5	3	8	6	37	48	45	14	20	3	1
	Province 6	1	3	8	17	41	48	45	16	7	4
	Province 7	2	1	4	20	18	37	47	37	6	8
	Province 8	1	0	2	10	19	14	40	49	16	7
C	Province 9	2	0	3	2	5	8	6	16	96	83
C	Province 10	1	0	0	0	2	2	7	6	79	92

Table 4.3.4 Migration of vessels. Summary of Table 4.3.3.

Total in %	Division A	Division B	Division C
Division A	32.5	3.7	0.3
Division B	3.4	37.0	2.8
Division C	0.3	2.6	17.3

Table 4.3.4 summarises the migration patterns for the 3 groups of provinces. The table shows the percentage of all landings by home division (rows) and base division (columns). If the 10 provinces have to be divided into three groups, then the grouping shown in Table 4.3.4 is best. What has been said about the provinces of a country applies equally to the districts of a province, except that the migration between districts is likely to be more pronounced.
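The summary by division can be recomputed from the counts of Table 4.3.3. The sketch below copies the counts from that table and assumes the grouping A = Provinces 1–3, B = Provinces 4–8 and C = Provinces 9–10 described in the text; the function name is chosen for this example. The recomputed percentages agree with the printed Table 4.3.4 to within a few tenths of a percentage point.

```python
# Number of landings from Table 4.3.3: rows = home province, columns = base province.
landings = [
    [92, 80, 40, 8, 3, 2, 3, 2, 0, 1],
    [76, 90, 76, 7, 8, 4, 2, 0, 1, 0],
    [30, 76, 99, 16, 6, 8, 3, 3, 2, 1],
    [8, 6, 16, 46, 40, 17, 20, 11, 3, 2],
    [3, 8, 6, 37, 48, 45, 14, 20, 3, 1],
    [1, 3, 8, 17, 41, 48, 45, 16, 7, 4],
    [2, 1, 4, 20, 18, 37, 47, 37, 6, 8],
    [1, 0, 2, 10, 19, 14, 40, 49, 16, 7],
    [2, 0, 3, 2, 5, 8, 6, 16, 96, 83],
    [1, 0, 0, 0, 2, 2, 7, 6, 79, 92],
]

# Division of each province (Provinces 1 to 10, in order).
division_of = ["A", "A", "A", "B", "B", "B", "B", "B", "C", "C"]


def summarise_migration(landings, division_of):
    """Percentage of all landings by home division (outer key) and base division."""
    grand_total = sum(sum(row) for row in landings)
    divisions = sorted(set(division_of))
    table = {home: {base: 0.0 for base in divisions} for home in divisions}
    for home_index, row in enumerate(landings):
        for base_index, count in enumerate(row):
            home = division_of[home_index]
            base = division_of[base_index]
            table[home][base] += 100.0 * count / grand_total
    return table


summary = summarise_migration(landings, division_of)
within_divisions = sum(summary[d][d] for d in summary)

for home, row in summary.items():
    print(home, {base: round(value, 1) for base, value in row.items()})
print("Landings within the home division (%):", round(within_divisions, 1))
```

Trying a different assignment in `division_of` and comparing the share of landings that stay within the home division is one simple way to compare alternative groupings of provinces.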

4.3.2.2 Landing Place

Usually, it will not be possible to cover all landing places within a province or a district and representative landing places must be selected for sampling. The Neyman criteria cannot usually be applied directly, but must be combined with a number of features of the landing place:

Type of fishery (fleets) of the landing place;
Type of landings to the landing place (for example, only fish, only pelagic fish, only cephalopods, all categories of landings, etc.);
Number of vessels by fleet (size of stratum);
Total landings by commercial group (size of stratum);
Seasonality of fishing (when should many samples and when should few samples be taken?);
Buyers system (is it easy to access the files of buyers?);
Type of landing place (quay, jetty, beach, cold store);
Distance from local office of fisheries department and cost of transport;
Collaboration with local authority (local fisheries department, coast guard etc.);
Collaboration with local fishers and (if applicable) their association;
Availability of local enumerator(s);
Practical conditions for sampling (problems in getting access to the vessels and the fishers).

The list is not complete. The final selection of landing places to cover, however, should primarily be made so that the principles of Neyman are adhered to as far as possible.

4.3.2.3 Fleets/Gear

The stratification of fleets is important not only for the data collection, but also for the subsequent use of the database. The fisheries managers and developers will often ask questions related to fleets, but these will be limited by the fleet stratification. Questions asked on categories within strata may not be answerable.

From the point of view of flexibility of the database, it is desirable to have as many fleets as possible, but from the point of view of funding and manpower, the number should be as small as possible. Thus, the choice of fleet stratification will be a compromise between these two conflicting objectives. The following features of fleets should be considered:

Number of vessels in the fleet;
Total production by value and weight by the fleet relative to the total fishery;
The existing artisanal and industrial vessels of the country;
Type of vessel;
Horsepower of engine;
Dimensions (length, depth, width);
Gear(s);
Electronic equipment;
Primary fishing grounds of the fleet;
Home port(s) of fleets;
Base port(s) of the fleet;
Target species of fleet;
Fishing seasons of fleet;
Ownership of vessels;
Legal basis (license agreement);
Value of vessel;
Crew size;
Nationality of vessel (national/joint venture);
Prospects for further investment in fleet;
Requests from decision makers (politicians, managers, developers);
Special scientific requests;
Practical problems in recognising that a vessel belongs to the fleet.

In practice, the most obvious definition of fleets uses the three features:

Horsepower of engine;
Gear(s);
Type of vessel.

If a vessel register is available and sampling is random, it may be possible to make a “post-stratification” of the fleets. In that case, the sampled vessels can be divided in every way that the data in the vessel register allow. That may leave some strata with very few samples. In general, a vessel register makes it easier to redefine fleets.
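
A minimal sketch of post-stratification is given below. It assumes that each sample carries a vessel identifier that can be looked up in the register. The register contents, the horsepower class limits, the sample data and the threshold for "very few samples" are all invented for illustration.

```python
from collections import Counter

# Hypothetical vessel register: vessel id -> attributes (all values invented).
REGISTER = {
    "V001": {"gear": "trawl", "hp": 45},
    "V002": {"gear": "trawl", "hp": 120},
    "V003": {"gear": "gillnet", "hp": 20},
    "V004": {"gear": "gillnet", "hp": 33},
    "V005": {"gear": "purse seine", "hp": 150},
}

# Hypothetical random sample of landings: (vessel id, landed weight in kg).
SAMPLES = [("V001", 310.0), ("V003", 85.0), ("V004", 120.0),
           ("V001", 275.0), ("V005", 900.0), ("V003", 60.0)]

def hp_class(hp):
    """Assumed horsepower classes; real class limits depend on the fishery."""
    if hp < 33:
        return "<33 hp"
    if hp < 90:
        return "33-89 hp"
    return ">=90 hp"

def post_stratify(samples, register, key):
    """Count samples per stratum; `key` maps register attributes to a stratum."""
    return Counter(key(register[vessel_id]) for vessel_id, _ in samples)

# The same samples divided in two different ways after collection.
by_gear = post_stratify(SAMPLES, REGISTER, lambda v: v["gear"])
by_gear_hp = post_stratify(
    SAMPLES, REGISTER, lambda v: (v["gear"], hp_class(v["hp"]))
)

MIN_SAMPLES = 2  # assumed threshold for flagging thinly sampled strata
thin = [stratum for stratum, n in by_gear_hp.items() if n < MIN_SAMPLES]
print(by_gear)
print(by_gear_hp)
print("strata with few samples:", thin)
```

The finer the division, the more strata end up below the threshold, which is the practical limit on post-stratification mentioned above.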

4.3.2.4 Fishing Grounds

Fishing grounds are to a large degree determined by the fleet stratification. There may, however, be cases where a fleet exploits two or more different fishing grounds, and if the difference is significant, stratification could be made on fishing grounds according to Neyman's criteria.

If catches are also given by position (Latitude, Longitude) or statistical rectangles (see Section 4.2), the flexibility for post-stratification is improved. If recording by statistical rectangles or positions is common, it is advisable to define the fishing grounds by the statistical rectangles (or divisions) they cover.
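
The sketch below shows one way in which catches recorded by position could be assigned to rectangles and summed. It assumes square cells of 0.5 degrees, aligned on the equator and the Greenwich meridian and identified by their south-west corner. This is an assumption made for the example only; the grid actually used is the one defined in Section 4.2. The function names and the records are hypothetical.

```python
import math

def rectangle_id(lat, lon, size_deg=0.5):
    """Return the south-west corner (lat, lon) of the cell containing a position.

    Assumption: square cells of `size_deg` degrees aligned on the equator
    and the Greenwich meridian.
    """
    sw_lat = math.floor(lat / size_deg) * size_deg
    sw_lon = math.floor(lon / size_deg) * size_deg
    return (sw_lat, sw_lon)

def catch_by_rectangle(records, size_deg=0.5):
    """Sum catch (kg) per rectangle from (lat, lon, catch_kg) records."""
    totals = {}
    for lat, lon, catch_kg in records:
        cell = rectangle_id(lat, lon, size_deg)
        totals[cell] = totals.get(cell, 0.0) + catch_kg
    return totals

# Invented positions and catches, for illustration only.
records = [(10.75, 106.2, 120.0), (10.6, 106.4, 80.0), (11.1, 106.9, 50.0)]
totals = catch_by_rectangle(records)
print(totals)
```

Because the raw positions are kept, the same records can later be summed over any other grid or over fishing grounds defined as sets of rectangles.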

Figure 4.3.6 Abundance of living resources depicted as catch per day by statistical rectangle.

An important feature of a fishing ground is the water depth. Stratification of landings by depth zone is often adequate, as the distribution of resources is strongly related to water depth. Within a species, it is often observed that the size distribution depends on depth, so that the deeper the water, the larger the specimens observed. In addition, the average size of fishing vessels is often related to depth.

The habitat or the bottom type may be important. For example, coral reefs are different from, say, trawling grounds in many respects. Species compositions as well as fishing techniques are specific for coral reefs.

Political issues may also influence the geographical stratification. For example, an EEZ (Exclusive Economic Zone) boundary going across a fishing ground will naturally lead to a geographical stratification, which is not related to the living resources. Nevertheless, this division must be established, unless there is an agreement between countries about equal rights to fish in each other's waters. Often the distance from the coastline is used to define fishing rights, and a stratification of, say, 0–10 nm, 10–20 nm, 20–50 nm and 50–200 nm from the coastline may be required to address certain management issues.
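
A stratification by distance from the coastline can be applied mechanically once the distance is recorded. The sketch below reads the example bands as contiguous zones and, as an assumption made here, places a distance that falls exactly on a limit in the outer zone. The names are hypothetical.

```python
from bisect import bisect_right

# Upper limits (nautical miles) of the example zones, read as contiguous bands.
ZONE_LIMITS_NM = [10, 20, 50, 200]
ZONE_LABELS = ["0-10 nm", "10-20 nm", "20-50 nm", "50-200 nm", "beyond 200 nm"]

def distance_zone(distance_nm):
    """Return the zone label for a distance from the coastline.

    Assumption: a distance exactly on a limit belongs to the outer zone.
    """
    if distance_nm < 0:
        raise ValueError("distance from the coastline cannot be negative")
    return ZONE_LABELS[bisect_right(ZONE_LIMITS_NM, distance_nm)]

for d in (5, 10, 35, 180, 250):
    print(d, "->", distance_zone(d))
```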

Perhaps the question most commonly raised by managers is that of “boxes”. Boxes are selected areas where a restriction on fishing is imposed, sometimes by species. Examples are a complete ban on fishing, a seasonal ban on fishing, a ban on fishing with certain gears, etc. In order to provide advice on where to place such boxes, or on the effect of an existing box, the catch by area is required, such as catch by statistical rectangle.

Catch by position or statistical rectangle will allow for the use of GIS (Geographical Information Systems), for example, to produce maps showing the distribution of resources and fishing fleets (Figure 4.3.6).

4.3.2.5 Commercial Groups

The division of landings into commercial groups is in some areas (even regions) standardised, and the commercial groups remain more or less constant from fleet to fleet and from landing place to landing place. The commercial groups will also remain fixed during the year. In these cases, the commercial groups are usually fixed by legislation or international agreements and are compulsory. Under these circumstances, the designers of the data collection programme have no option but to adopt the prevailing standards. This is usually a great advantage for the sampling programme, as the fishers will be forced to sort the landings in a unique way, which will facilitate the processing of data.

However, this ideal situation usually applies to temperate waters, where the species diversity is small relative to tropical waters. In tropical countries, the commercial groups are often highly variable, and may depend on the fleet, the buyer, the season and the landing place. Often catches are divided into “Small”, “Medium” and “Large”, but the definition of the three groups depends on the individual fisher who sorts the catch. Therefore, the groups may change during the year. Shrimp grades and some cephalopod grades for export are determined by the world market and are therefore standardised.

The commercial groups may relate not only to species and size, but also to the treatment of the landings. High quality (well preserved) fish landed shortly after catching may fetch a much higher price on the export market than low quality fish sold on the domestic market. Even if they are of the same species and size, allocating them to the same commercial group may compromise the bio-economic analysis.

In general, apart from the aspects of limited resources for data collection, the commercial groups should be selected so that they reflect:

Commercial importance
Ecological importance
Selected species for fish stock assessment

The commercial groups selected by designers of the data collection programme are either the commercial groups used by fishers and merchants or they are formed by merging these groups. There is usually no option to choose alternative groups by the time the catch is landed.

All weights by commercial group should be in units of whole wet body weight. For example, the weight of dried squid should be converted into whole wet weight. The data entered in the database should be the dried weight, together with an indication that this is “dried weight”. The conversion is then done by the database system. As a matter of principle, the data entered in the computer should always be the “raw” data, not processed data.
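
The principle of storing the raw weight and converting on demand can be sketched as follows. The conversion factors are placeholders, not measured values, and the record layout and function name are hypothetical.

```python
# Hypothetical conversion factors from recorded weight to whole wet body weight.
# The numbers are placeholders, not measured values.
CONVERSION_TO_WET = {
    "whole wet": 1.0,
    "dried": 5.0,
    "gutted": 1.2,
}

def whole_wet_weight(raw_weight_kg, state):
    """Convert a stored raw weight to whole wet weight using its recorded state."""
    try:
        factor = CONVERSION_TO_WET[state]
    except KeyError:
        raise ValueError(f"no conversion factor for state {state!r}") from None
    return raw_weight_kg * factor

# The stored record keeps the raw observation and its state,
# never the converted value.
record = {"commercial_group": "squid", "weight_kg": 12.0, "state": "dried"}
wet_kg = whole_wet_weight(record["weight_kg"], record["state"])
print(wet_kg)
```

Keeping the factor outside the record means that a revised conversion factor can be applied to all earlier data without re-entering them.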

4.3.2.6 Seasons

The stratification of seasons is to a high degree determined by the fleet stratification and vice versa. The fleet definition usually involves accounting for seasonality.

The tropical monsoon will in most countries provide the natural stratification over seasons. Often, two or three seasons during the year will be sufficient to guarantee approximately homogeneous fishing within a province. One season is most often that of rough weather. The species composition of catches, the fishing grounds and the gears used may change between seasons. Data will often be structured by month, and if the coverage by samples is complete for each month (or week), the raising of samples can be made for periods shorter than the fishing seasons.
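
Raising by month can be sketched as below, assuming the simplest form of raising, in which the mean landing per sampled unit is multiplied by the total number of units in the period. The function name and all figures are invented for illustration.

```python
def raise_sample(sampled_landings, total_units):
    """Raise sampled landings to a total: mean per sampled unit times number of units."""
    if not sampled_landings:
        raise ValueError("a period without samples cannot be raised on its own")
    mean_per_unit = sum(sampled_landings) / len(sampled_landings)
    return mean_per_unit * total_units

# Invented figures: sampled landings (kg per landing) and the total number
# of landings known to have taken place in each month.
monthly = {
    "Jan": {"samples": [200.0, 240.0, 220.0], "total_landings": 50},
    "Feb": {"samples": [150.0, 170.0], "total_landings": 40},
}

estimates = {
    month: raise_sample(data["samples"], data["total_landings"])
    for month, data in monthly.items()
}
print(estimates)
```

A month without any samples raises an error in this sketch, which corresponds to the condition in the text that coverage must be complete for each month before raising is done by month.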

4.3.2.7 Species (Stock)

Biological data for individual species are usually intended for stock assessment. In practice, “management units” form the basis for sampling (see Sections 2.3 and 3.2).

Traditional stock assessment methods, like cohort analysis and VPA, use length distribution or age distribution of the entire catch from the stock as the primary input (see Section 3.2). Some of the most commonly collected biological (stock specific) data are:

Length frequency data;
Age frequency data;
Length/weight data;
Sex distribution;
Maturity stage;
Condition factor (flesh fat content);
Stomach content data;
Data for stock identification (e.g. meristic characters);
Special measurements of economic interest (swim bladder, shark fins etc.).

In addition to cohort analysis, the traditional stock related analyses are: (a) estimation of growth parameters; (b) estimation of spawning seasons; (c) maturity ogive (percentage mature as a function of age); (d) estimation of natural mortality. Combined with spatial data, the above data may also be used for estimation of migration routes, spawning grounds, nursery grounds, distribution by depth zone, etc.

Table 4.3.5 An example of the selection of species for VPA. The example is from the sampling programme in Viet Nam 1996–97. In this case, the resources available for sampling limited the programme to only 23 sampling units (19 species, with the 4 shrimp species sampled separately by sex). Data on some of the species were collected only in certain parts of the country.

	Species group	Number of sampling units
1	Large pelagic fish species	2
2	Small pelagic fish species	3
3	Large demersal fish species	2
4	Small demersal species	3
5	Shrimp species (by sex)	8
6	Cephalopod species	3
7	Other invertebrates (e.g. crabs, lobsters, gastropods and/or bivalves)	2
	Total number of units to sample	23

Figure 4.3.7 Example of the selection of species for biological sampling (from Viet Nam, 1996–97).

It is usually not possible to collect data for fish stock assessment from all species of commercial interest in the waters of a tropical country, due to limited personnel and funds. A limited number of species has to be selected to represent the entire living resources. The selection of representative species must account for both the ecological and the economic importance of the species; that is, large stock size (potential yield) and high price per kg should be the criteria for biological sampling.
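
One simple way to combine the two criteria is to rank candidate species by potential yield multiplied by price and to keep as many as the programme can afford. This is only one possible scoring rule, chosen here for illustration; the species names and figures are placeholders.

```python
# Placeholder candidates: (name, potential yield in tonnes/year, price per kg).
# All names and figures are invented for illustration.
CANDIDATES = [
    ("species A", 5000.0, 1.0),
    ("species B", 800.0, 12.0),
    ("species C", 300.0, 2.0),
    ("species D", 2000.0, 3.5),
]

def potential_value(yield_tonnes, price_per_kg):
    """Potential annual value in currency units (1 tonne = 1000 kg)."""
    return yield_tonnes * 1000.0 * price_per_kg

def select_species(candidates, n_units):
    """Return the names of the `n_units` candidates with the highest potential value."""
    ranked = sorted(
        candidates,
        key=lambda c: potential_value(c[1], c[2]),
        reverse=True,
    )
    return [name for name, _, _ in ranked[:n_units]]

selected = select_species(CANDIDATES, 3)
print(selected)
```

In practice the ranking would be adjusted by hand, for example to keep at least one representative of each species group in Table 4.3.5.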

The selected species should also be reasonably abundant to secure the availability of continuous samples of reasonable size. It is the author's experience that regular sampling of biological data from more than 20 species was rare in the tropical East Asian countries. In particular, it is costly to buy samples of shrimps, even if the samples are resold after the measurement. If possible, the data for shrimps should be obtained from processing plants (see Section 5.4).

The sample size needed for modal progression analysis (leading to estimation of growth parameters and cohort analysis, see Section 3.2) is a question often raised. Unfortunately, the author is not aware of an objective method to determine the optimum sample size. The sample size depends on the number of age groups (cohorts) to identify (see Section 3.2.2). The more age groups you try to identify, the larger the sample size needed. For shrimps or squid, for example, with a short life span (one year), there will be only one or two cohorts represented in a sample, and the sample size need not be large, say 200–500 specimens per month. For a long-lived species like grouper (life span of 10 or more years) you may need 2000–5000 specimens per sample. These numbers are based on the experience of the author trying to separate length distributions into cohorts.

The separation into stocks is often very problematic. Even for stocks in non-tropical waters with relatively few species, stock separation is often difficult. Tropical stocks may in theory be separated by the same methods as used in cold waters, such as comparison of meristic characters (for example, size and position of fins and other body parts), number of vertebrae, blood type, parasites, etc. However, these kinds of data collection may well exceed the capacity of the resources of a developing tropical country. The collection of stomach content data may also exceed the sampling capacity; such data are used, for example, in multi-species VPA, where predation mortality is estimated from the stomach content of predators. Maturity data, spawning grounds and migration routes, however, are often within the reach of the budget.