7.1 NEED FOR DATA MANAGEMENT
7.2 DATABASE DESIGN
7.3 DATA MANAGEMENT OPERATIONS AND MAINTENANCE
7.4 DATA ACCESS AND DISSEMINATION
Fishery data must be stored securely, but made easily
available for analysis. The design of a data management system should follow the
basic data processing principles. The database should store the original raw
data. The data management system should be integrated with the data collection
system as far as possible. Database design and software development can vary in
approach from adapting an existing system to designing a new system from
scratch. In all cases, the system should be well documented. The human-computer
interface needs to guide the user in getting the best out of the system,
including help and local language facilities. Data entry should integrate import
functions and validation controls, processing should use embedded functions for
common procedures, and reporting should be flexible and include an export
facility. The responsible authority must commit adequate financial and personnel
resources for maintenance, make regular archives to protect the data, and
periodically re-evaluate the design to be sure the system is meeting its
objectives. Access should be controlled to ensure database integrity and
confidentiality, but interfere as little as possible with legitimate access.
Decision making for fisheries policy-making, planning and management relies largely on processed information, not raw data. Data have to be interpreted before they can be utilised. The volume of raw primary data is often very large, and so can only be used effectively if held in a Data Base Management System (DBMS). The functions of a DBMS are:
· to ensure data conform to standard classifications;
· to ensure validity of the data;
· to ensure data integrity and internal consistency;
· to secure and maintain primary data;
· to allow easy access to primary data;
· to process the data efficiently as required;
· to allow different data sets to be integrated, thereby increasing their overall utility.
A fundamental principle is to hold all data as they were collected, in their primary form. This allows flexibility in the way data can be processed (e.g. filtered, aggregated, transformed), and ensures all calculations are reproduced from source data incorporating all revisions. Considering the considerable investment in data collection and the low cost of storage and processing, there is little reason for not holding complete data in their primary form.
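As a minimal sketch of the principle of holding data in their primary form, a relational table can store each landing record exactly as collected, with aggregations always derived from the raw rows. SQLite is used here for illustration; the table and column names (and the FAO 3-alpha species codes) are assumptions, not from any particular national system.

```python
import sqlite3

# Minimal sketch: hold primary (raw) landing records exactly as collected.
# Table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE landing_raw (
        record_id    INTEGER PRIMARY KEY,
        landing_date TEXT NOT NULL,          -- as written on the form
        site_code    TEXT NOT NULL,          -- standard classification code
        species_code TEXT NOT NULL,          -- e.g. FAO 3-alpha code
        weight_kg    REAL NOT NULL CHECK (weight_kg >= 0)
    )
""")
conn.execute(
    "INSERT INTO landing_raw (landing_date, site_code, species_code, weight_kg) "
    "VALUES (?, ?, ?, ?)",
    ("2024-03-15", "ST01", "SKJ", 125.5),
)
# Aggregations are derived from the raw rows, never stored in their place.
total = conn.execute(
    "SELECT SUM(weight_kg) FROM landing_raw WHERE species_code = 'SKJ'"
).fetchone()[0]
print(total)  # 125.5
```

Because the raw rows are retained, any later revision to a record automatically flows through to every derived total.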
7.2.2 Human-computer interface
7.2.3 Computerised documentation
7.2.4 Data entry
7.2.5 Data processing
7.2.6 Data reporting
7.2.7 Geographic Information Systems (GIS)
Information technology is diverse and changing rapidly, so it is important to seek the most up-to-date advice before selecting a system or developing an application.
Ideally, database developers should be involved not just in data management, but also in the design of the sampling system. Although fisheries experts may be aware of computer technology, they should not be responsible for the actual implementation of the database system; likewise, computer professionals should not be responsible for developing the fishery sampling system. However, when the two activities occur at the same time, each can complement the other to mutual benefit, increasing the probability of a project's success.
A decentralised database design should be considered to make database management and data validation easier. In a distributed system, data are entered and validated locally, but linked with other databases for analysis. Data can be made accessible for analysis through a centralised database, preferably housed at a national institution.
When considering the approach to take for creating a new data collection system, there are various options available. These include:
· Taking commercially available software and adapting it to the new requirements;
· Piecing together a system from different software components;
· Creating a custom system from scratch.
The advantages and disadvantages vary for each approach and should be weighed carefully before committing resources.
Customised database systems rely on the presence and continuing involvement of the system developers. Contingency plans should be established to minimise the risk of system failure should these developers become unavailable. In all cases, the system should be fully documented. Even so, a custom system is often still better than on-site adaptation of a commercially available system, as significant modifications to an existing system can cripple its intended function. Although adapting a system has lower initial costs, it can prove more costly in the end because of increased maintenance requirements.
An important benefit of custom development is that it can be configured to match closely the data sampling methodology, so the system will be more efficient and easily accepted. Another possible benefit is that the database design can also be used as a tool to assist the development of the data collection programme. If the two development phases occur simultaneously, the use of common terminology (e.g. species identification, sampling techniques) and tools (e.g. data flow diagrams, task analysis) can be mutually beneficial to the two systems.
Depending on the quantity of data and the availability of resources, commercial desktop applications for database development can have long-term limitations. For larger fisheries, they should only be used for initiating data collection programmes and prototyping (e.g. scenarios, storyboards). The limits of these tools for large-scale sampling should be recognised: the data and applications will eventually require migration into a more formal and robust system. Benefits from prototyping may include better identification of data flows and system components, which can assist integration of the data collection methodology and data storage design.
An established software development life cycle should be used when designing and developing a database system (Fig. 7.1). Failing to follow standard software development methodology is a major contributing factor to system failure or severe cost and schedule overruns.
Figure 7.1 Examples of established software development life cycles: the 'Waterfall' methodology (left) and the 'Star Life Cycle' (right), which is a more contemporary approach to software engineering.
Important to the overall acceptability of a DBMS is the Human-Computer Interface (HCI). Users of the DBMS (e.g. data encoders, scientists, decision-makers and policy planners) should be involved in the development of the HCI. The following are some basic principles that may be employed to develop an effective interface:
· Automated procedures to guide users on how to proceed when using the system;
· Use of graphical structures such as command-buttons in the HCI, preferably with commonly applied icons, to facilitate access to frequently used functions;
· Use of menus to list commands;
· Readily accessible "Help" keys or a command-button to access on-line help messages.
Whenever possible, efforts should be made to provide the interface in the local language. This makes the system easier to understand for local users, increasing operator learning rates and overall data quality.
On-line help, documentation, tutorials and training are contributing factors to the sustainability of a database. Special consideration should be placed on the development of these components within the system. Preferably, the development of these components should proceed in parallel with the development of the software/user-interfaces. However, this does not eliminate the need for hard copies of the documentation.
When creating or modifying a data entry system, it is often necessary to incorporate historical data that have been stored on non-computer media. In such cases, all possible methods of bulk data conversion (scanning, inexpensive local labour, etc.) should be considered for conversion to computer-compatible form. This allows for data integration, which is necessary for proper analysis.
Additionally, an 'Import' function should be available to incorporate data commonly held in alternative formats (e.g. word processor or spreadsheet). This function should ensure data integrity and quality are maintained.
When applicable, special structures or software links should be developed to facilitate retrieval of data from other computer sources such as electronic logbooks. Again, care should be taken that data integrity is maintained and data are properly validated.
Data validation can be implemented at various levels including data collection, compilation, data entry to a DBMS, data processing and analysis. Data entry user-interfaces should be structured to enforce sets of rules applied to validate inputs.
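The entry-level rule checking described above can be sketched as a small validation routine applied to each record before it is accepted. The valid code sets and the plausibility range for weights are assumptions for illustration; a real system would draw them from its standard classifications.

```python
# Sketch of entry-level validation rules; the code lists and the weight
# range are illustrative assumptions, not real classifications.
VALID_SITES = {"ST01", "ST02"}
VALID_SPECIES = {"SKJ", "YFT", "BET"}

def validate_record(rec: dict) -> list:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    if rec.get("site_code") not in VALID_SITES:
        errors.append("unknown site code")
    if rec.get("species_code") not in VALID_SPECIES:
        errors.append("unknown species code")
    weight = rec.get("weight_kg")
    if not isinstance(weight, (int, float)) or not (0 < weight < 10_000):
        errors.append("weight outside plausible range")
    return errors

ok = validate_record({"site_code": "ST01", "species_code": "SKJ", "weight_kg": 125.5})
print(ok)  # []
```

Returning the full list of violations, rather than rejecting at the first error, lets the data entry interface show the operator everything that needs correcting in one pass.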
A feature of DBMS technology that should be exploited when developing or modifying a data collection system is the capability of embedding control and processing within the database using stored procedures and queries. This approach has the advantages of:
· reducing the amount of exterior processing necessary;
· providing more immediate data validation;
· increasing flexibility for future system modifications.
An important consideration when processing data is the need for maintaining an audit trail of all actions performed on data to allow subsequent review of information quality.
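Both ideas, embedding control in the database itself and keeping an audit trail, can be sketched in one example. Here a CHECK constraint validates input immediately on insertion, and a trigger records every update in an audit table. SQLite syntax is used for illustration and the table names are hypothetical; a larger DBMS would typically use stored procedures for the same purpose.

```python
import sqlite3

# Sketch: validation embedded in the database (CHECK constraint) plus an
# audit trail maintained by a trigger. Table names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE landing (
        record_id INTEGER PRIMARY KEY,
        weight_kg REAL NOT NULL CHECK (weight_kg >= 0)
    );
    CREATE TABLE audit_log (
        record_id  INTEGER,
        old_weight REAL,
        new_weight REAL,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TRIGGER landing_audit AFTER UPDATE ON landing
    BEGIN
        INSERT INTO audit_log (record_id, old_weight, new_weight)
        VALUES (OLD.record_id, OLD.weight_kg, NEW.weight_kg);
    END;
""")
conn.execute("INSERT INTO landing (record_id, weight_kg) VALUES (1, 100.0)")
conn.execute("UPDATE landing SET weight_kg = 120.0 WHERE record_id = 1")
rows = conn.execute("SELECT old_weight, new_weight FROM audit_log").fetchall()
print(rows)  # [(100.0, 120.0)]
```

Because the trigger lives inside the database, every application that touches the data, not just one entry form, leaves the same audit trail.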
Whenever possible, parameters should be used to make the system more flexible. Parameters are easily changed values that alter the structure and function of the system. Often, requirements change over the life of a system, and allowing expansion and modification without major configuration changes can preserve the viability of the data collection system.
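A minimal sketch of parameter-driven behaviour, under the assumption that plausibility limits are the kind of value most likely to change: the rule is read from a parameter store at run time, so tightening it later requires no change to the code.

```python
# Sketch: behaviour driven by easily changed parameters rather than
# hard-coded values. The parameter names and limits are assumptions.
PARAMETERS = {
    "max_weight_kg": 10_000,   # upper plausibility bound for one landing
}

def weight_is_plausible(weight_kg: float) -> bool:
    # The rule consults the parameter store instead of a literal constant.
    return 0 < weight_kg < PARAMETERS["max_weight_kg"]

# Tightening the rule later means changing the parameter, not the function:
PARAMETERS["max_weight_kg"] = 5_000
print(weight_is_plausible(7_500))  # False after the change
```

In a real DBMS the parameter store would itself be a database table, so administrators can adjust system behaviour without redeploying software.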
Flexibility when producing reports from data is important. Often, the potential uses of data are not fully recognised before a system is operational. Allowing easy retrieval and reporting helps prevent unnecessary secondary modifications to a system.
To facilitate report flexibility, a general-purpose 'export' function should be provided. Features that this function should have are:
· identifying name-tags for all exported data attributes;
· a summary of data types and formats;
· variable length records with user-selected field delimiters (e.g. ASCII files with commas or tabs).
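The three export features listed above can be sketched with the standard csv module: the header row carries the identifying name-tags, the records are variable length, and the delimiter is chosen by the user. The field names are illustrative.

```python
import csv
import io

# Sketch of a general-purpose export function: name-tags in a header row
# and a user-selected delimiter. Field names are illustrative only.
def export_records(records, delimiter=","):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), delimiter=delimiter)
    writer.writeheader()      # identifying name-tags for each exported attribute
    writer.writerows(records)
    return buf.getvalue()

data = [{"site_code": "ST01", "species_code": "SKJ", "weight_kg": 125.5}]
tsv = export_records(data, delimiter="\t")
print(tsv)
```

A summary of data types and formats would accompany the exported file as separate documentation, since a plain delimited file cannot carry it itself.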
It is useful to present spatial data in graphical form. Presenting geo-referenced data graphically allows the data to be viewed relative to other geographical features, such as the positions of rivers, mangroves and reefs, that are known to affect fisheries production. Commercially available GIS should be able to access geo-referenced data within the DBMS; however, data management remains the responsibility of the DBMS.
7.3.3 Design re-evaluation
In order to sustain the use of the database, there is the need for a long-term commitment to support the data management application. Adequate personnel should be available not only for routine operation, but also to modify the system as the need arises. Failure to provide such support is very likely to result in a gradual loss of system capabilities and ultimately may contribute to a collapse of the system.
The database should be backed up regularly. The system should always be prepared for major hardware or software failures and data loss. Procedures should be made as simple as possible to ensure that backups are regularly made.
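As a sketch of how simple a regular backup procedure can be, SQLite's online backup facility copies the whole database in one call; in a production system the destination would be a dated file on separate media rather than the in-memory target used here for illustration.

```python
import sqlite3

# Sketch: routine backup via SQLite's online backup API. In practice the
# destination would be a dated file on separate media, e.g. "backup-2024-03-15.db".
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE landing (record_id INTEGER PRIMARY KEY, weight_kg REAL)")
src.execute("INSERT INTO landing VALUES (1, 125.5)")

dest = sqlite3.connect(":memory:")   # stands in for the backup file
src.backup(dest)                     # copies the whole database atomically

count = dest.execute("SELECT COUNT(*) FROM landing").fetchone()[0]
print(count)  # 1
```

Wrapping such a call in a scheduled job keeps the procedure simple enough that backups actually get made, which is the point of the recommendation above.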
As the database evolves with time and changes in information technology occur, data archiving is essential to allow retrieval of historical data stored in former structure or design. Archiving of data should be done using a non-volatile media (e.g. CD-ROM) and system independent data format.
Periodic evaluations, drawing on established feedback mechanisms, should be undertaken to ensure that the data management system is meeting its objectives (i.e. complying with the needs of its clients). Representatives of those using the system should take part.
A continuing programme of design evaluation is recommended to ensure that the system takes advantage of recent developments in information technology. Special attention should be given to establish procedures for upgrading archived data so that data in the old format will continue to be accessible.
7.4.1 Data ownership and control
7.4.2 Communication networks
7.4.3 Computerised publication
The state or agency where the data originated is the principal owner of the data. Recognising that data are a resource and hence have value, economic or otherwise, the Government should exercise its right to maintain, secure and control access to them.
Control is the limit placed on the ability of an individual, a group of individuals, organisations or another state to have partial or full access to the data contained in a database. Partial access means the user is prevented from doing one or more of the following: (i) view all of the data entered and stored by the system, (ii) append data, (iii) edit data, (iv) copy data, or (v) distribute/share the data by any means. Controls should be used to limit access in a manner consistent with any confidentiality requirements and to protect the data from unauthorised changes. Of greatest importance is the protection of primary data from accidental corruption. The master copy of data must always be 'write protected'. However, although control and security are important, they should not hinder legitimate access. In particular, the security and control features of the DBMS should never hinder state-recognised scientific institutions from accessing data for resource management research.
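The requirement that the master copy be 'write protected' while remaining readable can be sketched, assuming SQLite, by opening the master database read-only through a URI; the file and table names are illustrative.

```python
import os
import sqlite3
import tempfile

# Sketch: the master copy is opened read-only through a URI, so analysis
# code can query it but never modify it. Names are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "master.db")
master = sqlite3.connect(path)
master.execute("CREATE TABLE landing (record_id INTEGER, weight_kg REAL)")
master.execute("INSERT INTO landing VALUES (1, 125.5)")
master.commit()
master.close()

ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
value = ro.execute("SELECT weight_kg FROM landing").fetchone()[0]  # reads succeed

write_blocked = False
try:
    ro.execute("INSERT INTO landing VALUES (2, 50.0)")
except sqlite3.OperationalError:
    write_blocked = True  # writes against the master copy are rejected
print(value, write_blocked)
```

A multi-user DBMS would achieve the same separation with per-user privileges (e.g. read-only accounts for researchers), but the principle is identical: legitimate reading is unhindered while the primary data cannot be corrupted.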
Special provisions should be made in the DBMS to facilitate sharing of the data with other states and regional organisations as appropriate. The UN Fish Stocks Agreement requires states to exchange information for managing straddling and highly migratory fish stocks. Data exchange is facilitated if national standards and classifications share a common regional or inter-regional set of statistical standards, especially at higher levels of aggregation.
Developments in communications technology open a new arena of possibilities with regard to the distribution of data. Whenever possible and appropriate, the DBMS design should consider structures that will facilitate distribution, or allow direct access of the data from remote locations.
Development of software for tutorials, demonstrations and related documents (e.g. on-line help text, computer-based user guide) is essential to long-term viability of the database. These documents may reside locally or, preferably, nationally in a form allowing network access.
The use of digital media should also be considered for disseminating statistics. For example, the Internet offers an inexpensive method to share information, allowing secure access to data and analytical results.