Although the individual hardware and software components may appear reasonably priced, a considerable development cost remains. Most of the cost of developing the data bank for genetic resources lies in defining the internal format of the data and in defining reports. The number of traits to be defined is very large. For each trait, a description must be entered in three languages, together with its format, range checks and length. Not all expected report formats need to be fixed immediately, since doing so would forfeit one of the advantages of a database system. The data definition must, however, be flexible enough to meet basic requirements. It is the bulk of the application, not its complexity, that will extend development time. The cost of this time can be reduced drastically in one of two ways. The first is to choose a software supplier who provides user training and development guidance as part of the purchase price. The second is to assign local staff to become fully proficient in the package and then take responsibility for development. Either way keeps consulting costs down. The first option is of limited practical value, because no supplier is going to devote a month or two of professional time to sell a US$ 5 000 package. The second option is more tenable. It has the added advantage that you pay salaried rates rather than consultant rates, and you pay them in local currency; salaries in developing countries are much lower than in developed ones. In any case, local expertise will ultimately be required.
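Most of the development effort goes into the trait definitions, so it may help to picture what one data-dictionary entry holds. The minimal Python sketch below is an illustration, not the structure used by any particular package. The class `TraitDefinition`, its field names, the function `missing_translations`, the language codes `en`, `fr` and `es`, and the example trait and its limits are all hypothetical; the report does not name its three languages. The sketch checks the part of this bulk work that is easiest to leave incomplete: a description in every working language.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: the three working languages are English, French and Spanish.
REQUIRED_LANGUAGES = ("en", "fr", "es")


@dataclass
class TraitDefinition:
    """One hypothetical data-dictionary entry for a trait."""
    item_no: int                       # number in the trait description list
    code: str                          # short field name shown on screens
    fmt: str                           # "numeric" or "text"
    max_length: int                    # maximum characters accepted on input
    min_value: Optional[float] = None  # lower range check (numeric traits only)
    max_value: Optional[float] = None  # upper range check (numeric traits only)
    descriptions: Dict[str, str] = field(default_factory=dict)  # language -> text


def missing_translations(defs: List[TraitDefinition]) -> Dict[str, List[str]]:
    """Map each incomplete trait's code to the languages lacking a description."""
    gaps: Dict[str, List[str]] = {}
    for d in defs:
        missing = [lang for lang in REQUIRED_LANGUAGES
                   if not d.descriptions.get(lang, "").strip()]
        if missing:
            gaps[d.code] = missing
    return gaps


# Hypothetical example: the Spanish description has not yet been entered.
weight = TraitDefinition(
    item_no=3, code="ADULT_WT", fmt="numeric", max_length=4,
    min_value=200, max_value=1200,
    descriptions={"en": "Adult body weight (kg)", "fr": "Poids adulte (kg)"},
)
print(missing_translations([weight]))
```

Running such a check over the whole dictionary before data entry starts shows how much of the three-language description work remains.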
The consultant therefore recommends the following approach:
- finalize the data format as fully as is practical.
- call on the help of a consultant to make the final selection of a software package. This could take two to three weeks of professional time.
- purchase the cheapest hardware that will run this software. It must allow a growth path to 1 MB of memory and at least 20 MB of disc capacity (for a regional site), preferably 80 MB (for a global site), and it must support magnetic tape backup. It is essential to gain experience in using the database without committing large sums of money.
- based on the experience gained, decide whether to establish a single global site or several regional sites.
- do not attempt to introduce all species at once. Become thoroughly familiar with the system by starting with the simplest species definition, but make sure that it contains enough data to test the system adequately.
If local staff can be employed within existing salaries to do the bulk of the development, an initial system at one site covering all species could be developed for approximately US$ 15 000.
This is made up of:
|Item|Cost|
|---|---|
|Two weeks' consultant fees to select software|US$ 3 000|
|Software|US$ 5 000|
|Hardware|US$ 7 000|
|Total|US$ 15 000|
Data must be extracted from the relevant source documents and transcribed onto well-designed input forms. These forms must closely match the input screens, so that a data entry operator does not have to search through a form to find the information required for the next field on the screen. Considerable thought should therefore be given to the design of both the screen and the input form formats. For very many fields, no data will be available. It is pointless to work through a form containing over 2 000 items in order to fill in, say, 10 fields of useful information. Data must be grouped logically so that, where information is missing, whole sections can be omitted, both when filling in the input forms and when prompting on a terminal screen. Screens and forms should be clear enough that no instructions need to be looked up to fill them in, except in rare, esoteric cases. Menus should be developed to allow entry of specific types of information, rather than forcing the operator through the same set of prompts for every record. For example, the buffalo “slave” record descriptions could be divided logically as follows:
|Form|Category|Items|
|---|---|---|
|a)|Basic information|1 – 8*|
|b)|Environmental|9 – 12|
|c)|Management|13 – 17|
|d)|Nutrition|18 – 19|
|e)|Diseases|20 – 21|
|h)|Genetic parameters|24|
* Item numbers refer to the number series used in the final description lists received by the consultant.
Categorizing the information in this way provides a convenient checklist of the traits that may appear in a source document, without the list becoming overwhelming. For example, document 1 may use only input forms a), b) and c), while document 2 may use forms a), b), d) and h). Only the relevant screens need to be called up for data entry.
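Section-by-section entry can be sketched as a lookup from form letter to item numbers, so that only the screens a document needs are presented. This minimal Python sketch assumes the item ranges given in the buffalo table. The names `SECTIONS` and `screens_for_document` are hypothetical. Sections f) and g), which the table does not list, are omitted rather than guessed.

```python
from typing import List, Tuple

# Hypothetical mapping taken from the buffalo record table: form letter ->
# (category, item numbers). Sections f) and g) are not listed there, so they
# are left out rather than guessed.
SECTIONS = {
    "a": ("Basic information", range(1, 9)),
    "b": ("Environmental", range(9, 13)),
    "c": ("Management", range(13, 18)),
    "d": ("Nutrition", range(18, 20)),
    "e": ("Diseases", range(20, 22)),
    "h": ("Genetic parameters", range(24, 25)),
}


def screens_for_document(section_codes: List[str]) -> List[Tuple[str, List[int]]]:
    """Return (category, item numbers) only for the sections a document uses."""
    screens = []
    for code in section_codes:
        if code not in SECTIONS:
            raise ValueError(f"unknown section: {code!r}")
        title, items = SECTIONS[code]
        screens.append((title, list(items)))
    return screens


# Document 2 in the text uses forms a), b), d) and h): 15 items are prompted.
for title, items in screens_for_document(["a", "b", "d", "h"]):
    print(f"{title}: items {items}")
```

Raising an error for an unknown letter, rather than silently skipping it, makes a mis-keyed menu choice visible to the operator.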
It is essential that only relevant information gets into the database. Guaranteeing this requires stringent screening of the data by a genetics expert, followed by tight editing checks on input to the computer. Of the two, the expert screening is more important. A database should store only useful information. A typing error is simple to correct, but misleading information is much harder to detect once it has been absorbed into summary statistics.
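A small input check shows why editing checks cannot replace expert screening. In this minimal Python sketch, `edit_check` is a hypothetical function, and its limits (four characters, a value between 200 and 1 200) are invented for illustration. The check catches a keying error that pushes a value out of range. It accepts a plausible but wrong value, which only prior screening of the source data would stop.

```python
import math
from typing import List, Optional


def edit_check(raw: str, fmt: str, max_length: int,
               min_value: Optional[float] = None,
               max_value: Optional[float] = None) -> List[str]:
    """Apply format, length and range checks to one keyed field; return errors."""
    errors: List[str] = []
    value = raw.strip()
    if len(value) > max_length:
        errors.append(f"longer than {max_length} characters")
    if fmt == "numeric":
        try:
            number = float(value)
        except ValueError:
            errors.append("not a number")
            return errors
        if not math.isfinite(number):
            errors.append("not a finite number")
            return errors
        if min_value is not None and number < min_value:
            errors.append(f"below minimum {min_value}")
        if max_value is not None and number > max_value:
            errors.append(f"above maximum {max_value}")
    return errors


# Hypothetical limits: at most 4 characters, value between 200 and 1200.
# An extra keystroke (4500 for 450) is rejected by the range check ...
print(edit_check("4500", "numeric", 4, 200, 1200))
# ... but a plausible wrong value copied from a misleading source is accepted.
print(edit_check("540", "numeric", 4, 200, 1200))
```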