

6. DATABASES AND DATA MANAGEMENT (Contd.)

6.7 DATA VALIDATION

We shall use the common term “data errors” to indicate any type of deviation from the correct value.

In this context we deal only with the “raw data”, that is, the direct observations from the field, not the processed data. The computer does not make mistakes: processing is always “correct” in the sense that it is executed exactly according to the specifications (e.g. the SQL commands, see Section 6.6).

The entry of data in the database should not be the final treatment of the “raw data”. The raw data should be checked and validated in as many different ways as possible. For example, the inspection of processed data may reveal errors in the raw data.

Errors may occur for different reasons:

  1. The enumerators may not understand the meaning of some fields of the interview form.

  2. The interviewed person may not understand the question of the enumerators.

  3. The enumerators may misinterpret the answer from the interviewed person, because of problems with the verbal communication.

  4. The encoder may by mistake write a wrong value in a field (for example, put one zero too many in a weight field).

  5. The enumerators may make a wrong identification of a commercial group.

  6. The encoder may make a wrong species identification.

  7. The interviewed person may not remember correctly what happened during the trip.

  8. The interviewed person may deliberately give wrong data.

  9. The enumerator may deliberately give false data (e.g. made-up data, to meet the expected frequency of samples, without working).

  10. Wrong recording of landings due to poor organisation on the part of the interviewed person (for example, if the landings are sold to many buyers or at several landing places).

  11. The enumerator's lack of knowledge of the fishing operations, in particular when groups of vessels fish and land collectively.

  12. The enumerator's lack of knowledge of transfers of catches at sea.

  13. The interviewed person may not possess the correct information about the vessel particulars, such as engine HP, gear specification etc.

  14. The interviewed person may not collect detailed data on landings, but leave it to other persons who are not available to the enumerator.

  15. The encoder may misinterpret the writing on the forms.

  16. The encoder may misunderstand the form.

  17. The encoder may make a keypunch error (hit the wrong key).

  18. …etc.

Some data can be validated automatically at the time they are entered in the database. For example, the order of magnitude of a value can be checked, and the value rejected if it appears to have been given in kilograms where it should have been tonnes. In addition, dates can be validated. For example, lower and upper limits for acceptable dates can be specified, so that the database gives a warning when a date falls outside the limits. These are examples of the easy-to-detect errors.
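A simple entry-time check of this kind can be sketched as follows. The limits below are illustrative assumptions, not values from this manual; a real programme would set them per fishery.

```python
from datetime import date

# Illustrative limits only; a real programme would set these per fishery.
MAX_LANDING_KG = 50_000          # above this, tonnes were probably entered as kilograms
DATE_MIN = date(1995, 1, 1)      # earliest acceptable sampling date
DATE_MAX = date(1999, 12, 31)    # latest acceptable sampling date

def validate_entry(weight_kg, sampling_date):
    """Return a list of warnings for one raw-data record at entry time."""
    warnings = []
    if not 0 < weight_kg <= MAX_LANDING_KG:
        warnings.append(f"weight {weight_kg} kg outside plausible range")
    if not DATE_MIN <= sampling_date <= DATE_MAX:
        warnings.append(f"date {sampling_date} outside accepted limits")
    return warnings
```

A record passing both checks returns an empty list; any warning can be shown to the encoder before the record is accepted.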

Data selected from look-up tables will contain only values from the look-up table, so the only possible error is that the wrong value is selected. Erroneous data from look-up tables may be detected by comparing interviews that are expected to give approximately the same results, for example, interviews from one fleet fishing at a given fishing ground during a given period. If one trip gives a completely different composition of commercial groups, the cause may be an erroneous vessel registration, for example that a purse seiner was given the registration number of a shrimp trawler.

If a commercial group appears in a sample, but not in other similar samples, it may be an error. Some errors are easier to detect than others, for example, a typical pelagic species appearing in the catch of a typical demersal trawl. If a species appears in an area where it has not been observed earlier, this might indicate an error. However, if there is enough data in the database, rare events are to be expected and should not be removed simply because they are rare. The validation system should verify suspect records rather than automatically delete records or fields.
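The "verify, do not delete" principle can be illustrated with a small sketch. The record fields and the species-by-area lists are hypothetical, not part of any database design given in this manual:

```python
def flag_unusual_records(records, known_species_by_area):
    """Flag records whose species has not previously been observed in the
    recorded area.  Flagged records are kept for manual verification;
    nothing is deleted automatically."""
    out = []
    for rec in records:
        known = known_species_by_area.get(rec["area"], set())
        flagged = rec["species"] not in known
        out.append(dict(rec, needs_verification=flagged))
    return out
```

Every input record reappears in the output; the flag only routes the unusual ones to a human checker.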

Where a field is defined as a unique key (e.g. a vessel registration number), a relational database system will not accept a value that already exists. Thus, the system automatically checks that all vessel registration codes are different, and whenever duplicates are detected, action can be taken.
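How a relational system rejects a duplicate key can be demonstrated with SQLite through Python's standard library. The table layout and registration numbers are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vessel (reg_no TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO vessel VALUES ('VR-001', 'Stella')")

try:
    # A second vessel with the same registration number is refused.
    conn.execute("INSERT INTO vessel VALUES ('VR-001', 'Maris')")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False
```

The second insert raises an error, so only one vessel with that registration number can exist in the table.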

Samples of individual enumerators may be compared to check whether there are differences between enumerators where none should be expected. In addition, the performance of encoders can be compared. Different encoders could enter the same interview forms, and possible encoder-specific errors may be detected. The encoder should preferably have some knowledge about the fishing sector, for example, from participation in training courses and work in the field.
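Double entry by two encoders can be checked field by field. A minimal sketch, with field names assumed for illustration:

```python
def double_entry_differences(entry_a, entry_b):
    """Compare two encoders' versions of the same interview form.
    Returns the fields on which they disagree, for manual resolution."""
    fields = set(entry_a) | set(entry_b)
    return {f: (entry_a.get(f), entry_b.get(f))
            for f in fields
            if entry_a.get(f) != entry_b.get(f)}
```

An empty result means the two entries agree; any disagreeing field is resolved by going back to the paper form.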

When both a vessel register is maintained and vessel details are collected during an interview, we have a situation with two independent sources for the same data. If the interview gives the same value as the vessel register, this is an indication of high-quality data. If it does not, the vessel register should be checked and possibly updated, as the interview is likely to be the more recent collection of data. Alternatively, the interview may be incorrect, which can be checked by a new interview for the vessel in question.

In general, whenever possible, data from independent sources should be compared. If, for example, the buyer fills in a sales slip and the skipper fills in a logbook with the same information, the data can be compared. If auctions are used for sale of landings, the auction authority may keep records on the sale, which can be compared with the interview data.

Dates of leaving and returning to harbour for different trips of the same vessel can be compared to check for consistency. Activity data (for example, fishing days per month) may be compared with the effort recorded for the trips. Harbour authorities (police, coast guard) may record the activities of fishing vessels, which may be used to check the effort reported by the skippers.
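One such consistency check is that no two trips of the same vessel may overlap in time. A sketch, assuming trips are recorded as (departure, return) date pairs:

```python
from datetime import date

def overlapping_trips(trips):
    """trips: list of (departure, return) date pairs for one vessel.
    Returns pairs of consecutive trips that overlap in time,
    which a single vessel cannot physically do."""
    trips = sorted(trips)
    return [(a, b) for a, b in zip(trips, trips[1:]) if a[1] > b[0]]
```

Any pair returned indicates that at least one of the two trip records is wrong and should be verified against the original forms.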

Data from the commercial fishery may be compared with data from an experimental fishery, which for example, may reveal erroneous species identification.

Supervision and on-the-job training of enumerators is a kind of double sampling, as the supervisor will check data collected by the enumerator, and together they will (hopefully) sort out any discrepancies. Supervision and discussion between programme staff is probably the most efficient tool for data validation.

The processed data should be evaluated by comparison with general knowledge and common sense. All computed total catches should be evaluated for reasonableness (i.e. for not being too far from the expected value).
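Such a reasonableness check can be as simple as a relative-deviation test. The 30 % tolerance below is an arbitrary illustration, not a value recommended by this manual:

```python
def is_reasonable(computed_total, expected_total, tolerance=0.30):
    """True if the computed total lies within `tolerance` (as a fraction)
    of the expected value; otherwise the figure should be inspected."""
    return abs(computed_total - expected_total) <= tolerance * expected_total
```

A total failing the test is not necessarily wrong; it is simply flagged for closer inspection by the programme staff.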

6.8 DESIGN OF FISHERIES REPORTS

The reports produced by the database and subsequent processing of the data (resource evaluation, CPUE analysis, stock assessment, bio-economics etc.) are the last step in the process described in this manual. Some of the staff of the data collection programme, notably the scientists, may also be involved in these final steps. As the data collection programme is a routine activity repeated every year, the majority of the reports should be annual or quarterly.

Examples of such regular reports have been given earlier in the manual. Regular fisheries reports produced from the regular fisheries data collection programme comprise:

  1. Documentation of sampling programme (see Section 6.3);

  2. Documentation of database (see Section 6.3);

  3. Yearbook of Fisheries Statistics;

  4. Fisheries sector profiles (see Section 6.9);

  5. Summary vessel registration (Year book) or frame survey;

  6. Detailed landings statistics (in weight and value) by fleet, commercial category, geographical area, species;

  7. Detailed effort and CPUE statistics;

  8. Detailed vessel registration;

  9. In addition to the standard reports, the database can produce ad hoc reports, but this usually requires knowledge of SQL (see Section 6.6);

  10. Resource evaluation (e.g. fish stock assessment, CPUE analyses, etc.);

  11. Bio-economic report.

It is useful if the annual reports do not change too much between years, in order to maintain comparability between years. Readers will want to compare the figures of the current year with those of preceding years. The design of regular reports should be changed only in cases where there is an important development in the fishing sector that cannot be covered by the present design. Therefore, the design of the regular reports is important, as it will affect the presentation of the fisheries statistics perhaps for many years.

Each report should contain tables and graphs, which illustrate the historical development of the fisheries sector. Thus, every year a new year-column or year-row is added to tables, and a new point or bar is added to graphs.

The designers of regular reports should seek inspiration from the regular fisheries reports of countries with a long tradition of producing such reports. The most useful guidance comes from countries with a similar fisheries sector and similar resources. However, ideas from countries with different fisheries may also prove useful, provided due regard is given to the modifications required before the design is transferred to your country.

As there are so many examples of regular fisheries statistics and regular fisheries reports, this manual shall not go further into the design of reports.

