5. DATA COLLECTION STRATEGY

5.1 INTRODUCTION
5.2 INFORMATION REQUIREMENTS FOR SYSTEM DESIGN
5.3 CO-MANAGEMENT AND SYSTEM DESIGN
5.4 COMPLETE ENUMERATION AND SAMPLING
5.5 COMPLETE ENUMERATION APPROACHES
5.6 SAMPLE-BASED APPROACHES
5.7 OPERATIONAL CONSIDERATIONS

Before looking at the details of the data collection methods, an overall strategy is required. The way in which different data variables are collected needs to be tailored to the structure of the fishery. A key element in design is the degree to which fishers and others co-operate, an issue which is most effectively addressed by using a co-management approach. Designers must choose which variables need to be collected through complete enumeration and which should be sampled. Complete enumeration is expensive for many variables, but must be carried out for some if totals (e.g. total catch) are to be estimated for the fishery. Sampling is more cost effective, but care is required in designing the distribution of sampling effort in time and space. Finally, the strategy will be strongly influenced by the budget and personnel available.

5.1 INTRODUCTION

Strategies for the design of data collection programmes will vary between fisheries. Within a state or region, there almost always will be a mixture of industrial, small scale commercial, artisanal, subsistence and recreational fisheries. Each will have its own characteristics, its own relative importance and its own potential for the supply of data. In addition, some information must be obtained from external sources, such as international market data, or catch data from foreign fishing vessels that never visit state ports.

Each fishery will require its own strategy with elements of complete enumeration and sampling. Over time some aspects of a data collection strategy may move from complete enumeration to sampling (or vice versa), particularly as knowledge is developed and requirements or resources change. Sampling strategies are often punctuated by complete enumeration from time to time in order to re-evaluate baseline data.

It is not feasible to construct a perfect strategy for any one fishery or subsector that will meet all requirements for all time. Flexibility and the adoption of alternative approaches must form a key component of any strategy, whether it is designed for assessment of fish stocks, the evaluation of markets or the assessment of community dependence on fisheries.

In general, however, any strategy will require the following steps:

· evaluate existing data sets in relation to the objectives of the programme, including accessibility of the data (e.g. computerised or on paper);
· describe the operating characteristics of the sector or subsector (e.g. fishery, market, fleet, community, institutional environment), also known as the census or frame survey;
· decide on the approach to be taken: complete enumeration or sampling, including cost-benefit and cost effectiveness analysis and an evaluation of operational considerations (institutional, financial and human resources);
· design methods according to the approach adopted, including the form of stratification to be used in sampling;
· implement a test phase to validate the method, including participation by other stakeholders;
· establish a continuing feedback mechanism between data sources and data users to ensure that data types, quantity, quality and origin are consistent with the requirements for determination of the performance indicator in question.

5.2 INFORMATION REQUIREMENTS FOR SYSTEM DESIGN

Infrastructure information is essential for constructing frames for a data collection programme. The first step is to define the water bodies and areas that will be included, and prepare a description of the fishing industry operating within them (ports and landing places, fishing fleets, fishers, markets and transportation routes etc.). Such information serves to provide a detailed classification and description of the structure of the primary fishery sector, and is essential for establishing a proper collection scheme for all fishery data. Many of these institutional data are also required for socio-cultural analyses.

Essential infrastructure and personnel information required for this purpose include:

· existing ports and landing places, their locations, patterns of distribution and accessibility;
· numbers of fishing units and information on their composition such as fishing gears, fishers, fishing craft, and their geographical distributions in relation to home ports and landing places;
· fishing activity and landing patterns including their geographical, seasonal and diurnal distributions, and some information on the extent to which different units and vessels switch between fisheries. In order to do this, some working definition of a fishery needs to be adopted (see below);
· supply centres for capital goods, essential material and services (e.g. fishing gears and their components, fuel oil, engine parts, vessel repairs, navigation equipment, ice);
· fish distribution routes, fish utilisation, fish processing and marketing practices, fish trade, local consumption, number of processors and marketing units.

The description of the fisheries infrastructure and personnel in terms of its main units is sometimes called a frame survey. Where possible, the survey should draw upon information available from scattered sources including vessel registers, harbour radio logs, ports, market sales, transport and other administrative records, fishing population censuses, maps, fishing charts and other information.

A routine data collection programme should be preceded by a pilot programme. This is limited in time and space, and its main purpose is to familiarise the designers with fishery conditions. It can be used to test alternative procedures and different sources for data collection, although collection of data is not in itself the purpose of a pilot programme. Generally, a pilot survey is of much wider scope than the final frame survey and can contain a large variety of data types related to other important indicators of the fishing industry. For example, as well as infrastructure and fleet characteristics, surveys might record normal vessel activity data by season or fishers' opinions on the critical factors in the fishery. Some of these data can be very useful for survey planning purposes. They can also indicate which of the alternative collection schemes is more suitable from both methodological and operational standpoints.

5.3 CO-MANAGEMENT AND SYSTEM DESIGN

Management measures are more likely to be compatible with community values and to create a greater commitment to the system if the users and managers are both involved in the formulation of policy and fishing regulations. This, in turn, should result in greater compliance and lower enforcement costs. Social science studies of local common property management systems, for example, have shown that local fishing communities are more willing to engage in self and mutual monitoring where they have helped to formulate and support regulations. This reduces the need for expensive government supervision. This type of participatory management is often referred to as "co-management".

There are many types of co-management, which may integrate data collection. Fishers and scientists may conduct joint experiments, or meet in joint councils where information can be used to co-operatively plan management actions. Public meetings may be held to inform local community members of proposed management measures and solicit input and opinions. The possible arrangements are endless; the exact format must depend on the particular situation, including the current political organisation.

5.4 COMPLETE ENUMERATION AND SAMPLING

5.4.1 Definitions
5.4.2 Deciding between complete enumeration and sampling

5.4.1 Definitions

Data collection is the recording of one or more data variables (length, duration, etc.) from members of a population of "data-units" (the population of fishing vessels, fishing trips, etc.). Two basic data collection approaches are possible:

· by complete enumeration, where all members of the whole population are measured;
· by sampling, where only a proportion of members of the whole population are measured.

Fisheries data usually collected by complete enumeration include vessel registers and infrastructure data. Data sometimes collected by complete enumeration and sometimes by sample surveys include catch per unit effort, price per kilogram and costs and earnings of fishing units. Data usually collected by the sample-based approach include species composition and biological data (e.g. size frequency data). A complete enumeration may well refer to a sub-set, for example, one may make a complete enumeration of all vessels longer than 10 metres.

5.4.2 Deciding between complete enumeration and sampling

Both complete enumeration and sample-based approaches have as their objective the collection of data for a specified period, often over a calendar month, to determine some statistic of interest. For example, a total enumeration approach could be used to calculate the total catch where all landings were monitored. An example sample-based approach to estimate total catch would use the mean catch per fishing day from a landings sample and the mean number of fishing days per vessel from a vessel sample, which multiplied together give the mean catch per vessel. The total catch can then be obtained by multiplying this by the total number of vessels (a raising factor) obtained from a frame survey or vessel register. The applicability of either sampling or complete enumeration is determined by various criteria related to the type of data and to existing financial and human constraints.
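The sample-based calculation described above can be sketched as follows. This is a minimal illustration only: the function name, variable names and all figures are hypothetical, and it assumes that both samples are representative of the fleet for the period concerned.

```python
from statistics import mean

def estimate_total_catch(catch_per_day_sample, fishing_days_sample, total_vessels):
    """Raise two sample means to a fishery total.

    catch_per_day_sample: catch (kg) per fishing day, from a landings sample
    fishing_days_sample:  fishing days in the period, from a vessel sample
    total_vessels:        raising factor from a frame survey or vessel register
    """
    mean_catch_per_day = mean(catch_per_day_sample)
    mean_days_per_vessel = mean(fishing_days_sample)
    mean_catch_per_vessel = mean_catch_per_day * mean_days_per_vessel
    return mean_catch_per_vessel * total_vessels

# Hypothetical figures for one calendar month
landings_kg_per_day = [40.0, 55.0, 35.0, 50.0]  # mean 45 kg per fishing day
days_fished = [18, 22, 20]                       # mean 20 days per vessel
total = estimate_total_catch(landings_kg_per_day, days_fished, total_vessels=150)
print(total)  # 135000.0
```

Note that the estimate is only as good as the raising factor: an outdated vessel count scales any error directly into the total.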

Most data collection methods can be utilised under either complete enumeration or sampling approaches. For instance, logbook catch and effort information may be monitored by means of a complete reporting of landings. Catch and effort data from small-scale and subsistence fisheries generally are sampled. Biological and socio-cultural data are usually collected through a sample-based system, though demographics are collected by complete enumeration. Very large populations, like fish stocks, can only be sampled.

A complete enumeration-based survey is often preferred for certain types of data, solely because it is expected that it will provide complete statistical coverage over space and time. However, a well-designed sample-based survey can often provide good estimates of important parameters at a fraction of the cost. Complete enumeration of some variables (e.g. through a frame survey) is always needed to obtain raising factors when totals of variables like catch or effort are required. Which approach is used will depend on local circumstances.

Complete enumeration sometimes may be seen as desirable, but not attainable for operational reasons. An existing sampling programme can be progressively expanded to provide more reliable and robust estimates, if human and logistics resources allow such expansion in a sustainable manner. Usually such progressive expansion is done in distinct phases. For estimating total catch and effort this would involve:

· Phase 1: Use of frame surveys to obtain a raising factor, while sampling in space and in time for effort and CPUE. This is the most common scenario, whereby one sample survey is used for the CPUE, and three surveys for estimating total fishing effort (frame survey, vessel/gear activity survey and survey for active fishing days).
· Phase 2: Sampling in space and time for CPUE, sampling in time, but complete enumeration in space for effort (no need for frame surveys). This improves significantly the reliability of effort estimates since it does not involve frame survey data, which are usually the weakest component in estimating total effort (they are static and therefore often outdated).
· Phase 3: Sampling in space and time for CPUE, but complete enumeration in both space and time for effort (no need for frame surveys or surveys on active fishing days). This is the most accurate of the three sampling scenarios because it involves only one sample survey related to CPUE and species composition.

The passage from one sampling scenario to another of higher accuracy requires large increases in operational and logistical support, and is not always feasible or desirable.
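As an illustration of the Phase 1 scenario, the sketch below combines the three effort-related surveys with a CPUE sample. It assumes a commonly used formulation that is not spelled out in this text: total effort = number of boats in the frame × proportion of boats active on a sampled day (the Boat Activity Coefficient) × number of active fishing days, and total catch = CPUE × effort. All names and figures are hypothetical.

```python
def boat_activity_coefficient(active_flags):
    """Share of sampled boat-days on which the boat was found to be fishing."""
    return sum(active_flags) / len(active_flags)

def phase1_estimates(boats_in_frame, active_flags, active_days, cpue_sample):
    """Return (total effort in boat-days, total catch in kg) for one period.

    boats_in_frame: raising factor from the frame survey
    active_flags:   1/0 records from the vessel/gear activity survey
    active_days:    days in the period on which fishing took place
    cpue_sample:    catch (kg) per boat-day from sampled landings
    """
    bac = boat_activity_coefficient(active_flags)
    effort = boats_in_frame * bac * active_days
    cpue = sum(cpue_sample) / len(cpue_sample)
    return effort, cpue * effort

# Hypothetical monthly figures for one boat/gear type
effort, catch = phase1_estimates(
    boats_in_frame=200,
    active_flags=[1, 1, 0, 1, 0, 1, 1, 1, 0, 1],  # 7 of 10 boats active
    active_days=25,
    cpue_sample=[30.0, 50.0, 40.0],
)
print(round(effort), round(catch))
```

Under Phase 2 the frame count and activity coefficient would be replaced by a direct count of active boats on sampled days, and under Phase 3 the effort term would be a complete count, leaving CPUE as the only sampled quantity.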

5.5 COMPLETE ENUMERATION APPROACHES

Frame surveys and fishery censuses are a common category of data collection for which the complete enumeration approach is required. These surveys are designed to collect data necessary to describe the basic structure of the capture fishery production sector and activities directly dependent upon it, including infrastructure, employment, and community dependence. Such information is a pre-requisite for conducting on-going collection schemes using either complete enumeration or a sample survey approach.

Complete enumeration may be preferred in cases where data sources can be legally obliged to report, thus reducing the cost of this approach. Complete enumeration may be required as a statutory obligation, often for regulatory purposes. Examples include fishing vessel registers, exports (for custom tariff purposes), variables related to catch quota management (e.g. using fishing logbooks) and variables related to fishing effort limitations (e.g. days at sea).

Complete enumeration may also be preferred in cases where little effort is saved by sampling, such as if the data population is small or the variable to be measured cannot be time-sampled realistically. This might occur with small fishing fleets, where the CPUE is very erratic.

An important consideration concerning the complete enumeration approach is the risk of negative bias due to incomplete coverage. In practice, there is always a proportion of the population that is not captured by a data collection scheme intended to have complete coverage. The reasons for these information gaps are most commonly associated with operational difficulties. When the proportion of missing data is known to be relatively small, the results can be adjusted to reflect the actual situation. However, there are cases where a proportion of the population is never captured by the system and the level of under-reporting is unknown, so the census results contain a systematic negative bias that is very difficult to correct.
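Where the coverage is known, one simple form of adjustment is to divide the recorded total by the proportion of the population covered. The sketch below is illustrative only; the function name and figures are hypothetical, and it assumes that unrecorded units behave, on average, like recorded ones, which may not hold if the missing units differ systematically.

```python
def adjust_for_coverage(recorded_total, coverage):
    """Scale a census total up to the full population.

    recorded_total: total actually captured by the enumeration
    coverage:       known proportion of the population captured (0 < coverage <= 1)
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be greater than 0 and at most 1")
    return recorded_total / coverage

# Hypothetical example: 9 500 t recorded, 95% of landings known to be covered
adjusted = adjust_for_coverage(recorded_total=9500.0, coverage=0.95)
print(round(adjusted))
```

No such correction is possible when the level of under-reporting is unknown.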

Another common source of bias occurs when the data collected are used for control of fisheries regulations (for example catch quotas). In this case, deliberate misreporting may occur to cover illegal fishing.

Developments in data collection technologies, such as vessel monitoring systems, electronic logbooks and automatic logging of market information, are providing an opportunity for complete enumeration in situations that were previously ignored or could only be covered by sampling.

5.6 SAMPLE-BASED APPROACHES

5.6.1 Stratification in data collection
5.6.2 The effect of stratification

Sample surveys operate on selected subsets of the target population and, using a number of assumptions regarding the distribution of the population, provide estimates of the parameters under study. As well as the sampling error, sample-based surveys involve uncertainties as to the correctness of the various assumptions used. However, a well-designed sampling survey can often produce accurate and reliable estimates at a cost much lower than that of complete enumeration.

The nature of many variables (for example, fish size frequencies) dictates the application of the sample-based approach. It is necessary to consider carefully how individuals are selected for measurement, whether it is selecting fish from a catch, vessels landing their catch from all those landing at a particular port, or fishers for interview. Therefore, to draw out the relationship between the whole population and the sample, the sampling methodology needs to be based on sound statistical methods and fully documented.

One of the main issues is to reduce sample bias in estimates. Bias, in this case, is the tendency for estimates to centre on a value different from the true value as data accumulate. This can occur if, for example, data collectors tend to choose larger fish or vessels when sampling. The simplest theoretical way to avoid bias is to use random sampling. Under this scheme it is ensured that all individuals (fish, vessels etc.) within a stratum have an equal chance of being selected. In practice, this is often difficult to achieve, and a systematic sampling scheme (every third vessel, or tenth box of fish etc.) is used, which guards against the worst forms of bias. However, it should be borne in mind that most analytical methods assume random sampling, and therefore the possible effects of other sampling methodologies need to be considered in interpreting results.
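The two selection schemes can be compared with a short sketch. The function names and the vessel list are hypothetical; the systematic scheme shown here uses a random starting point, which is one common way of limiting selection bias while keeping the procedure simple for data collectors.

```python
import random

def simple_random_sample(units, n, seed=None):
    """Every unit has the same chance of selection (sampling without replacement)."""
    rng = random.Random(seed)
    return rng.sample(units, n)

def systematic_sample(units, step, seed=None):
    """Take every `step`-th unit, starting from a randomly chosen position."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return units[start::step]

# Hypothetical list of 30 vessels landing at a port on one day
vessels = [f"V{i:02d}" for i in range(1, 31)]
random_pick = simple_random_sample(vessels, 5, seed=1)
every_third = systematic_sample(vessels, 3, seed=1)
print(random_pick)
print(every_third)
```

Systematic selection can still give misleading results if the order of units follows a pattern with the same period as the sampling step, for example if every third vessel to land belongs to the same gear type.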

5.6.1 Stratification in data collection

Stratification reduces the error in sample estimates by systematically removing as much as possible of the data variability through the sampling design. This is achieved by dividing the sample population into groups or strata, where as much as possible of the variability in the population is represented in differences between the groups. For instance, industrial vessels would probably be treated as a separate stratum to artisanal vessels, since across the fleet, this division marks a clear divide in many variables. There may also be clear logistical criteria supporting the choice of strata.

There are two major types of stratification in a data collection programme.

· Subdivisions based on administrative, geographical or temporal criteria, that are imposed on the data collection programme for reporting purposes, and are therefore not under the control of the survey designer. Conventionally, in this document, this type of subdivision is referred to as a major stratum. Major strata are for example: provinces of a country, the months of the year, fishing seasons, subdivisions based on specific research needs etc. Major strata may be based on any combination of such criteria, for instance administrative, regional and seasonal.
· Within a major stratum there are usually subdivisions based on criteria that are chosen by the designer for the sole purpose of increasing the accuracy of the derived estimates. These subdivisions are chosen in such a way as to partition the population into homogeneous subsets. In this document they are conventionally called minor strata. Examples of minor strata include fishing grounds, lunar versus dark periods, and small scale versus semi-industrial fisheries.

Estimates of population parameters are always calculated at minor stratum level. Totals at major stratum level are simply aggregations of estimates and counts from the minor strata involved. Table 5.1 gives further examples of major and minor strata.
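The aggregation from minor to major strata can be sketched as follows. The stratum names, sample values and unit counts are hypothetical, and each minor stratum total is estimated simply as the sample mean multiplied by the number of units in that stratum.

```python
def stratum_total(sample_values, n_units):
    """Estimate a minor stratum total: sample mean raised by the stratum size."""
    return n_units * sum(sample_values) / len(sample_values)

# Hypothetical minor strata within one major stratum (e.g. one province, one month):
# (sampled monthly catch per boat in kg, number of boats in the stratum)
minor_strata = {
    "fishing ground A": ([120.0, 80.0, 100.0], 40),
    "fishing ground B": ([30.0, 50.0], 100),
}

minor_totals = {name: stratum_total(s, n) for name, (s, n) in minor_strata.items()}
major_total = sum(minor_totals.values())

# For comparison: ignoring the strata and pooling all samples
pooled = [v for s, _ in minor_strata.values() for v in s]
pooled_total = stratum_total(pooled, sum(n for _, n in minor_strata.values()))

print(minor_totals)
print(major_total)    # 8000.0
print(pooled_total)   # 10640.0
```

The pooled figure differs because the two fishing grounds were sampled at different rates; estimating at minor stratum level prevents the more heavily sampled ground from dominating the result.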

Table 5.1 Some examples of stratifications for fisheries data collection

Strata group	Stratification
Spatial	Province of country or major city; Districts (islands, villages); Home port (place of registration); Base port of fishing; Community of residence; Landing place; Fishing grounds
Time	Fishing season; Basic time period (week, month, year); Day/night
Enterprises	Companies/co-operatives; Processing plants; Type of support industry
Trade	Markets/auctions; Intermediaries/companies; Exporters/importers
Vessel/gear group	Fishing fleet; Gear; Vessel group (small scale, semi-industrial, industrial, joint venture, foreign); Fishery (métier) (defined by fleet/target species/gear)
Experimental fishery or research vessels	Geographical areas/depth zones/bottom types/habitats; Time period/day-night; Gear/fishing operation
Landings	Commercial species group (catch/effort, value); Commercial size/treatment group (catch/effort, value); Ecological species groups; Landings agent
People or households	Demographic sub-groups; Fishing community; Vessel group; Economic sector (harvest, post-harvest, market, support industry); Status (captain, crew, vessel owner)
Environment	Habitats (floodplain, lake, mudflats, mangroves, upwelling areas); Season; Physical oceanographic/limnological criteria

5.6.2 The effect of stratification

Stratification sometimes may be complicated by the need to reconcile two conflicting objectives:

· To select strata with the maximum degree of homogeneity;
· To minimise the number of strata (usually in view of operational constraints).

However, by systematically varying the stratification in a pilot phase, the appropriate balance can often be found using a variety of methods, as illustrated in Box 5.1.

Box 5.1 Examples of the use of stratification
Combining two gears into one
A boat/gear classification contains two different gear types (e.g. tangle nets of different mesh size), but repeated tests for species composition, average size of fish and CPUE have revealed that there are no statistically significant differences between the two types. It would thus seem reasonable to combine the two gears into one, thereby simplifying data collection operations in frame and catch/effort surveys.
Reduction of sampling effort
Fishing effort for line fishing is collected 16 times a month and the variation for the Boat Activity Coefficient (BAC) is only 3% (an indication of high homogeneity in the level of fishing activity). Using the collected samples and simulating a reduction in the sampling days using a computer, it has been found that the new variation is 6% and the resulting estimates are close to the old ones. This suggests that data collection of fishing effort for this gear may be reduced from 16 to 8 days without seriously degrading the accuracy of estimates.
Stratification in time
For all boat/gear types of an inshore fishery there have been consistent and highly significant differences in catch rates and species composition during the lunar and dark periods. This indicates that the reference period (a calendar month) should be further stratified in time (lunar period and dark period).
Stratification in space
A large homeport contains most of the boats for a trap fishery and they all use a fishing ground that is inaccessible to other boats with traps. Catch rates are significantly different from the rest of the fleet that operate from other sites. This indicates that this particular homeport should become a minor stratum.
Stratification on size of landing places
Stratification of landing places and/or homeports with respect to size is a simple arithmetic process involving an initial list of sites (rows) to which a variable number of indicators (columns) is associated. These indicators may refer to number of fishing units and/or gears by boat/gear type and may be supplemented with other quantitative criteria. The arithmetic process involves normalisation of each column (for instance by converting all values into dimensionless values between 0 and 1), and formulation of totals by site using the normalised values. The stratification criteria are then based on ranking the sites (rows) by their individual percentage of the grand total in descending order and formulating cumulative percentages that will range between the maximum percentage (first row) and 100% (last row). This percentage list can be used to determine partitioning schemes where sampling effort is proportional to the stratum size.
For instance, out of 600 sites the first 10 with up to 50% cumulative percentage will be "primary" and the remaining 590 "secondary". This, in turn, means that 50% of data collection effort must be allocated to only 10 primary sites and the other 50% to all 590 secondary sites. In this manner, data collection effort is allocated proportionally to the size/importance of sampling sites.
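The arithmetic for stratifying landing places by size, as described in Box 5.1, can be sketched as below. The site names and indicator values are hypothetical. Two choices are assumptions made for illustration: each column is normalised by dividing by its maximum, and a site is treated as "primary" if its cumulative percentage does not exceed the threshold.

```python
def rank_sites(indicators):
    """Rank sites by their share of the grand total of normalised indicators.

    indicators: dict mapping site name -> list of indicator values
                (same column order for every site, column maxima > 0)
    Returns a list of (site, percentage, cumulative percentage), largest first.
    """
    sites = list(indicators)
    n_cols = len(indicators[sites[0]])
    # Normalise each column to dimensionless values between 0 and 1
    col_max = [max(indicators[s][j] for s in sites) for j in range(n_cols)]
    scores = {s: sum(v / m for v, m in zip(indicators[s], col_max)) for s in sites}
    grand_total = sum(scores.values())
    ranked = sorted(sites, key=lambda s: scores[s], reverse=True)
    table, cumulative = [], 0.0
    for s in ranked:
        pct = 100.0 * scores[s] / grand_total
        cumulative += pct
        table.append((s, pct, cumulative))
    return table

def primary_sites(table, threshold=50.0):
    """Sites whose cumulative percentage stays within the threshold."""
    return [site for site, _, cum in table if cum <= threshold]

# Hypothetical indicators per site: [number of boats, number of gears]
sites = {
    "Site A": [100, 50], "Site B": [80, 40], "Site C": [50, 25],
    "Site D": [50, 25], "Site E": [40, 20], "Site F": [40, 20],
    "Site G": [20, 10],
}
table = rank_sites(sites)
for site, pct, cum in table:
    print(f"{site}: {pct:5.1f}%  cumulative {cum:5.1f}%")
print(primary_sites(table))  # ['Site A', 'Site B']
```

With a 50% threshold, half of the data collection effort would go to the primary sites and the other half would be shared among the remaining sites.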

Once a stratification scheme has been decided upon, the next problem is the allocation of sampling effort to the strata. The three basic rules for optimisation (known as "Neyman allocation") are to allocate larger sampling effort to strata with i) greater size, ii) greater variation and iii) lower sampling cost.
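These three rules are captured by the textbook optimum allocation for stratified sampling, in which the sample size for a stratum is proportional to N × S / √c, where N is the stratum size, S the standard deviation of the variable within the stratum and c the cost of sampling one unit. The sketch below is illustrative: the strata and their figures are hypothetical, and in practice S and c would have to be estimated, for example from a pilot phase.

```python
from math import sqrt

def optimum_allocation(strata, total_samples):
    """Share a fixed number of samples among strata in proportion to N * S / sqrt(c).

    strata: dict mapping stratum name -> dict with keys
            "size" (N), "sd" (S) and "cost" (c, cost per sampled unit)
    Returns fractional sample sizes; rounding is left to the user.
    """
    weights = {
        name: s["size"] * s["sd"] / sqrt(s["cost"]) for name, s in strata.items()
    }
    total_weight = sum(weights.values())
    return {name: total_samples * w / total_weight for name, w in weights.items()}

# Hypothetical strata: number of vessels, SD of daily catch (kg), relative cost
strata = {
    "industrial":  {"size": 50,   "sd": 400.0, "cost": 4.0},
    "artisanal":   {"size": 500,  "sd": 40.0,  "cost": 1.0},
    "subsistence": {"size": 1000, "sd": 10.0,  "cost": 1.0},
}
allocation = optimum_allocation(strata, total_samples=80)
print(allocation)
```

When the cost per unit is the same in every stratum, the allocation depends only on stratum size and variability.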

The cost of a sample-based approach is mainly a function of its statistical coverage or sample size. When the sample size increases the survey cost also increases and the expected accuracy is higher. However, the increase in accuracy is not proportional to the sample size, but suffers from diminishing returns. For example, to obtain a sufficiently accurate size frequency distribution requires only a relatively small random sample of all the fish landed.
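The diminishing returns can be seen from the standard error of a sample mean, which falls with the square root of the sample size. The sketch below assumes simple random sampling from a large population; the standard deviation used is a hypothetical value.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of a sample mean under simple random sampling."""
    return sd / sqrt(n)

# Hypothetical standard deviation of fish length: 10 cm
errors = {n: standard_error(10.0, n) for n in (25, 100, 400)}
for n, se in errors.items():
    print(n, se)
# 25 2.0
# 100 1.0
# 400 0.5
```

Each fourfold increase in sample size, and hence roughly in cost, only halves the standard error.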

5.7 OPERATIONAL CONSIDERATIONS

Regularly conducted data collection programmes require consideration of operational criteria as well as statistical needs and cost-effectiveness. Methodological approaches, such as the selection of strata, will often be constrained by operational considerations including institutional, financial and human resources. Sampling effort for routine data collection has to be set for the long term. For example, the frequency of visits to landing sites has to be set realistically to a level which takes account of the number of data collectors and their other responsibilities. It is all too easy to set up an ambitious data collection programme that proves unsustainable as enumerators find themselves unable to complete all tasks assigned to them.

Careful planning under operational criteria (see Table 5.2) will be needed to ensure that available resources support the choice of stratification and the required sampling intensity in space and time. Budgeting for institutional development and training will, therefore, need to be considered well in advance of implementation.

Table 5.2 Examples of operational parameters used in planning data collection programmes.

Parameter type	Purpose
Travel time and costs between sites using alternative transportation means.	To determine total time required for visiting sites using existing transportation. To examine if investing in increased mobility will be cost-effective.
The times when boats usually land their catch, by gear type.	To determine the most convenient time intervals for sampling landings.
Whether boats land their catch at a different site than their homeports.	To determine whether it is possible to combine landings, effort and other sampling into one visit.
Whether boats migrate to other places (if so, where and when).	To include seasonal migration information into the frame survey data.
Usual duration of a fishing trip, by fleet.	To simulate fishing trips.
Catch per day by fleet.	To simulate CPUE and catch per trip.
Number of gears (traps etc.) or gear operations (e.g. net hauls) normally used, by fleet.	To simulate fishing operations and derive catch-per-gear/operation.
Maximum time required for a full recording of landings and species composition from one boat.	To determine how many boats can be sampled per hour.
Maximum time required for inquiring whether a fishing unit was active or not.	To determine the total time required (per day and per homeport) for recording boat/fishing unit activities.
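Some of these parameters feed directly into simple workload calculations. The sketch below is a rough planning aid under stated assumptions, not a prescribed method: the function names and all figures are hypothetical, one data collector works alone, and recording times are taken at their maximum so that the plan errs on the cautious side.

```python
from math import floor

def boats_per_visit(landing_window_hours, minutes_per_boat):
    """Boats one collector can fully record while catches are being landed."""
    return floor(landing_window_hours * 60 / minutes_per_boat)

def sites_per_day(working_hours, travel_hours_per_site, hours_on_site):
    """Sites one collector can cover in a day, including travel."""
    return floor(working_hours / (travel_hours_per_site + hours_on_site))

# Hypothetical figures: boats land over 3 hours, a full recording takes
# at most 20 minutes, and travel to each site takes 1 hour
print(boats_per_visit(landing_window_hours=3, minutes_per_boat=20))                 # 9
print(sites_per_day(working_hours=8, travel_hours_per_site=1.0, hours_on_site=3.0))  # 2
```

Comparing such figures with the sampling intensity required by the design shows early on whether the planned programme is sustainable with the staff available.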

General budget items will include:

· Human resources - costs of salaries, training, contracts etc.
· Institutional resources - costs of establishing and maintaining committees, working groups, focus groups between government institutions and private sector (companies, producer organisations, communities) etc.
· Capital expenditure - costs of transport (including vessels), computers, offices, equipment etc.
· Recurrent expenditure - costs of communications, travel, office, publications, utilities etc.

In developing countries, it may be possible to establish projects supported by other development partners. In the conduct of such projects, careful attention should be paid to sustainability after the project's end. This involves:

· Training
· Establishing appropriate enumeration or sampling methods in the expectation of declines in resources
· Development of analytical methods and tools (models, software)
· Seeking the establishment of positions within government and the finance for their management
· Preparing for alternative sources of funding (industry or local government; additional taxes or levies; etc.)
