APPENDIX D: DATA ENVELOPMENT ANALYSIS (DEA)

Data envelopment analysis (DEA) is a linear programming technique developed in the work of Charnes, Cooper and Rhodes (1978). It is a non-parametric technique used in the estimation of production functions and has been used extensively to estimate measures of technical efficiency in a range of industries (Cooper, Seiford and Tone, 2000). Like stochastic production frontiers (SPF), DEA estimates the maximum potential output for a given set of inputs, and it has primarily been used in the estimation of efficiency. However, again like the SPF approach, DEA can also be used to estimate capacity utilization (Färe, Grosskopf and Lovell, 1994). The Färe, Grosskopf and Lovell approach seeks to determine capacity output conditional on the fixed inputs constraining (binding) production. This is the weak concept of capacity output offered by Coelli, Grifell-Tatje and Perelman (2001). The strong concept includes the weak concept, while the weak concept does not include the strong concept of capacity output. In addition, the weak concept avoids problems caused by particular functional forms and decreasing returns to scale (e.g. the Cobb-Douglas production function, which does not have an absolute mathematical maximum).

Seiford and Thrall (1990) describe DEA in terms of floating a piece-wise linear surface to rest on top of the observations (i.e. envelop the data). More specifically, the key constructs of a DEA model are the envelopment surface and the efficient projection path to the envelopment surface (Charnes et al., 1995). The projection path to the envelope surface is determined by whether the model is output-oriented or input-oriented. The choice of input- or output-oriented models depends upon the production process characterizing the firm (i.e. minimize the use of inputs to produce a given level of output or maximize the level of output given levels of the inputs). For the purpose of estimating capacity in fisheries, only the output-oriented DEA measures have been empirically estimated.

A key advantage of DEA over other approaches previously examined is that it more easily accommodates both multiple inputs and multiple outputs. As a result, it is particularly useful for analysis of multispecies fisheries, because prior aggregation of the outputs is not necessary. Further, as will be outlined below, a specific functional form for the production process does not need to be imposed on the model (as is required in the use of the SPF approach).

In fisheries, the technique has been applied to the Malaysian purse seine fishery (Kirkley et al., 2003), the United States Northwest Atlantic sea scallop fishery (Kirkley et al., 2001), the Atlantic inshore groundfish fishery (Hsu, 2003), the Pacific salmon fishery (Hsu, 2003), the Danish gillnet fleet (Vestergaard, Squires and Kirkley, 2003), English Channel multispecies multigear fisheries (Pascoe, Coglan and Mardle, 2000; Tingley, Pascoe and Mardle, 2003), the Scottish fleet (Tingley and Pascoe, 2003) and total world capture fisheries (Hsu, 2003).

CRS and VRS frontiers

The envelopment surface will differ depending on the scale assumptions that underpin the model. Two scale assumptions are generally employed: constant returns to scale (CRS) and variable returns to scale (VRS). CRS assumes that output changes by the same proportion as inputs (e.g. a doubling of all inputs will double output); VRS allows the production technology to exhibit increasing, constant or decreasing returns to scale over different ranges of output. As demonstrated in Section 2.6, input- and output-based capacity measures are only equivalent under the assumption of constant returns to scale. However, there are generally a priori reasons to assume that fishing is subject to variable returns and, in particular, decreasing returns to scale (see Section 2.6). Cooper, Seiford and Tone (2000) provide a discussion of methods for determining returns to scale. In essence, the researcher estimates technical efficiency under different returns-to-scale assumptions and determines whether or not the observed levels lie on the frontier corresponding to a particular returns to scale.

The effect of the scale assumption on the measure of capacity utilization is demonstrated in Figure D.1. Four data points (A, B, C and D) are used to estimate the efficient frontier and the level of capacity utilization under both scale assumptions. Note that only fixed inputs are considered in Figure D.1. The frontier defines the full capacity output given the level of fixed inputs. With constant returns to scale, the frontier is a ray from the origin through point C, and all other points fall below it (indicating capacity underutilization). With variable returns to scale, the frontier is defined by points A, C and D, and only point B lies below the frontier, i.e. exhibits capacity underutilization. The capacity output corresponding to variable returns to scale is therefore lower than the capacity output corresponding to constant returns to scale, except at point C where the two frontiers meet.

Figure D.1 - CRS and VRS frontiers

As with the SPF analysis, the measure of capacity utilization is estimated as the ratio of the actual output to the frontier level of output. With the exception of point C (which has a capacity utilization of 100 percent under both assumptions), the measure of capacity utilization is lower (i.e. more underutilization) for each point when assuming constant returns to scale than when assuming variable returns to scale. Even for point B, O₁/O₃ < O₁/O₂.

Hence, assuming a CRS frontier is likely to result in a greater estimate of capacity output and a lower estimate of capacity utilization than assuming a VRS frontier. As there are a priori reasons for assuming variable returns to scale in fisheries, it is recommended that the VRS frontier be used, and the results treated as lower bounds for capacity output and upper bounds for capacity utilization.
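The contrast between the two frontiers can be sketched for the one-input, one-output case of Figure D.1. The Python below is a minimal illustration under stated assumptions, not the estimation code behind any results in this appendix: the four boats and their values are hypothetical, and the function name `capacity_output` is ours. Under CRS the frontier is the ray through the boat with the highest output per unit of fixed input; under VRS the output-oriented linear programme has only two constraints (input and $\sum_j z_j = 1$), so an optimal vertex mixes at most two boats and it is enough to check single boats and pairs.

```python
from fractions import Fraction
from itertools import combinations


def capacity_output(x_o, xs, us, vrs=True):
    """Frontier (capacity) output at fixed-input level x_o.

    Hypothetical helper: output-oriented DEA with one fixed input and one
    output; xs and us are integer inputs and outputs of the observed boats.
    """
    if not vrs:
        # CRS: ray from the origin through the highest output/input ratio.
        return max(Fraction(u, x) for x, u in zip(xs, us)) * x_o
    # VRS: a single boat may use less input than x_o ...
    best = Fraction(max(u for x, u in zip(xs, us) if x <= x_o))
    # ... while a two-boat mix at a vertex uses exactly x_o.
    for (x1, u1), (x2, u2) in combinations(zip(xs, us), 2):
        if x1 != x2 and min(x1, x2) <= x_o <= max(x1, x2):
            w = Fraction(x_o - x2, x1 - x2)  # weight on the first boat
            best = max(best, w * u1 + (1 - w) * u2)
    return best


# Hypothetical boats A, B, C, D in the spirit of Figure D.1
xs = [1, 3, 2, 4]
us = [2, 4, 5, 6]
for name, x, u in zip("ABCD", xs, us):
    cu_crs = u / capacity_output(x, xs, us, vrs=False)
    cu_vrs = u / capacity_output(x, xs, us)
    print(name, float(cu_crs), float(cu_vrs))
```

With these numbers boat B has a capacity utilization of 8/15 (about 0.53) under CRS and 8/11 (about 0.73) under VRS, while A and D are at full capacity only under VRS, mirroring the pattern described for Figure D.1. Exact `Fraction` arithmetic is used so that points on the frontier come out at exactly 1.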

Input and output orientations

A range of DEA models have been developed that measure efficiency and capacity in different ways. These largely fall into the categories of being either input-oriented or output-oriented models.

With input-oriented DEA, the linear programming model is configured to determine by how much a firm's input use could be contracted, if inputs were used efficiently, while still achieving the same output level. For the measurement of capacity, the only variables used in the analysis are the fixed factors of production. As these cannot be reduced, the input-oriented DEA approach is less relevant to the estimation of capacity utilization. The traditional input-oriented DEA model could, however, be modified to determine the reduction in the levels of the variable inputs conditional on the fixed inputs and a desired output level.

In contrast, with output-oriented DEA, the linear programme is configured to determine a firm's potential output given its inputs, if it operated as efficiently as firms along the best-practice frontier. This is more analogous to the SPF approach, which estimates the potential output for a given set of inputs and measures capacity utilization as the ratio of actual to potential output, and it is consistent with the illustration of the method in Figure D.1. Output-oriented models are “...very much in the spirit of neo-classical production functions defined as the maximum achievable output given input quantities” (Färe, Grosskopf and Lovell, 1994, p. 95).

Mathematical specification of the DEA approach

Technically speaking, DEA is an approach rather than a model. Unlike the SPF model, where the parameter estimates represent the production elasticities, the resultant weights (which are attached to the observed firms rather than to the input variables) have no economic interpretation.^[51] They simply define the relative contribution of reference points on the frontier to the estimation of efficient or capacity output for the point under examination. As a result, DEA is a method for estimating efficiency and capacity utilization, but it does not impart any useful information on the production processes involved in the fishery. Models can be developed, however, to assess allocative and scale efficiencies, congestion and overall economic efficiency (Färe, Grosskopf and Kirkley, 2000). Linear programming (LP) models are developed to undertake the DEA, and for simplicity these can be referred to as DEA LP models.

An output-oriented approach is generally more appropriate for the estimation of capacity and capacity utilization. Following Färe, Grosskopf and Kokkelenberg (1989) and Färe, Grosskopf and Lovell (1994), the output-oriented DEA LP model of capacity output given current use of inputs is:

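$$
\begin{aligned}
\max_{\theta_1,\, z,\, \lambda}\quad & \theta_1 \\
\text{subject to}\quad & \theta_1 u_{om} \le \sum_{j=1}^{J} z_j u_{jm}, && m = 1, \dots, M \\
& \sum_{j=1}^{J} z_j x_{jn} \le x_{on}, && n \in F_x \\
& \sum_{j=1}^{J} z_j x_{jn} = \lambda_{on} x_{on}, && n \in V_x \\
& \sum_{j=1}^{J} z_j = 1, \\
& z_j \ge 0, \quad \lambda_{on} \ge 0, && j = 1, \dots, J;\ n \in V_x
\end{aligned}
\tag{1}
$$

where $\theta_1$ is a scalar showing by how much boat $o$ (the boat being evaluated) can increase the production of each output, $u_{jm}$ is the amount of output $m$ produced by boat $j$, $x_{jn}$ is the amount of input $n$ used by boat $j$, and $z_j$ are weighting factors attached to each of the $J$ boats. Inputs are divided into fixed factors, defined by the set $F_x$, and variable factors, defined by the set $V_x$. To calculate the measure of capacity output, the bounds on the sub-vector of variable inputs, $x_{on}$, $n \in V_x$, need to be relaxed. This is achieved by allowing these inputs to be unconstrained through introducing a measure of the input utilization rate ($\lambda_{on}$), itself estimated in the model for each boat and each variable input $n$ (Färe, Grosskopf and Lovell, 1994). The restriction $\sum_j z_j = 1$ allows for variable returns to scale.^[52]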

Capacity output based on observed outputs ($u^*$) is defined as the optimal value $\theta_1^*$ from (1) multiplied by observed output ($u$), i.e. $u^* = \theta_1^* u$. Implicit in this value is the assumption that all inputs are used efficiently as well as at their optimal capacity. From this, technically efficient capacity utilization (TECU) based on observed output is:

$$
\text{TECU} = \frac{u}{u^*} = \frac{u}{\theta_1^* u} = \frac{1}{\theta_1^*}
\tag{2}
$$

The measure of TECU ranges from zero to 1, with 1 being full capacity utilization (i.e. 100 percent of capacity). Values less than 1 indicate that the firm is operating at less than full capacity given the set of fixed inputs.

Implicit in the above is a downward bias, because observed outputs are not necessarily being produced efficiently (Färe, Grosskopf and Lovell, 1994). As with the SPF measure of capacity utilization, an unbiased measure of capacity utilization is calculated as the ratio of technically efficient output to capacity output.

The technically efficient level of output requires an estimate of the technical efficiency of each boat, and requires both variable and fixed inputs to be considered. The output-oriented DEA model for the technically efficient level of output is:

$$
\begin{aligned}
\max_{\theta_2,\, z}\quad & \theta_2 \\
\text{subject to}\quad & \theta_2 u_{om} \le \sum_{j=1}^{J} z_j u_{jm}, && m = 1, \dots, M \\
& \sum_{j=1}^{J} z_j x_{jn} \le x_{on}, && n \in F_x \cup V_x \\
& \sum_{j=1}^{J} z_j = 1, \\
& z_j \ge 0, && j = 1, \dots, J
\end{aligned}
\tag{3}
$$

where $\theta_2$ is a scalar showing how much the production of each boat can increase by using inputs (both fixed and variable) in a technically efficient configuration. In this case, both variable and fixed inputs are constrained to their current levels (i.e. the equality constraint on the variable inputs in the output-oriented model of capacity, (1), has been replaced by an upper bound at their observed levels). Again, the restriction $\sum_j z_j = 1$ is imposed to allow for variable returns to scale.

In this case, $\theta_2$ represents the extent to which output can increase through using all inputs efficiently. From this, technical efficiency is estimated as:

$$
\text{TE} = \frac{u}{\theta_2^* u} = \frac{1}{\theta_2^*}
\tag{4}
$$

The optimal value $\theta_2^*$ ranges from one to infinity; $\theta_2^* - 1$ is the proportion by which outputs may be expanded. Some existing software and articles report technical efficiency in the reciprocal form used in (4), $1/\theta_2^*$, which ranges from zero to one (see, for example, Coelli, Rao and Battese, 1998). Values of TE less than 1 indicate that current output is less than the output that could be produced if all current inputs (both variable and fixed) were used efficiently. That is, output could increase through efficiency gains, without changing the levels of the inputs.

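The unbiased estimate of capacity utilization is consequently estimated by:

$$
\text{CU} = \frac{\text{TECU}}{\text{TE}} = \frac{1/\theta_1^*}{1/\theta_2^*} = \frac{\theta_2^*}{\theta_1^*}
\tag{5}
$$

As $\text{TE} \le 1$, the estimate of CU is greater than or equal to TECU. Dividing the level of output by the corrected measure of capacity utilization, $u/\text{CU} = u\,\theta_1^*/\theta_2^*$, produces lower but unbiased estimates of capacity output.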
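Equations (1) to (5) can be illustrated with a small exact solver. The sketch below is an assumption-laden illustration rather than the DEAP or GAMS implementations mentioned later in this appendix: it assumes a single aggregated output, variable returns to scale and a handful of hypothetical boats, and it solves each linear programme by enumerating basic feasible solutions, which is exact but only practical for tiny data sets (a real application would use an LP solver). Following (1), the variable inputs are simply dropped when computing $\theta_1$, because the free utilization rate $\lambda_{on}$ lets them take any level; for (3) all inputs are bounded by their observed levels. The names `output_score` and `capacity_measures` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def _solve(a, b):
    """Solve the square system a z = b exactly (Gauss-Jordan); None if singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def output_score(o, inputs, outputs):
    """VRS output-oriented score theta for boat o with a single output.

    max theta s.t. theta*u_o <= sum z_j u_j, sum z_j x_jn <= x_on for every
    input column n, sum z_j = 1, z >= 0.
    """
    x = [[Fraction(v) for v in row] for row in inputs]
    u = [Fraction(v) for v in outputs]
    n_in = len(x[0])
    best = Fraction(0)
    # A basic solution has k boats in the basis and k - 1 binding input
    # constraints (plus sum z_j = 1), for k = 1 .. n_in + 1.
    for k in range(1, n_in + 2):
        for boats in combinations(range(len(x)), k):
            for binding in combinations(range(n_in), k - 1):
                a = [[x[j][n] for j in boats] for n in binding]
                a.append([Fraction(1)] * k)
                b = [x[o][n] for n in binding] + [Fraction(1)]
                z = _solve(a, b)
                if z is None or min(z) < 0:
                    continue
                used = [sum(w * x[j][n] for w, j in zip(z, boats)) for n in range(n_in)]
                if all(q <= x[o][n] for n, q in enumerate(used)):
                    best = max(best, sum(w * u[j] for w, j in zip(z, boats)))
    return best / u[o]


def capacity_measures(o, fixed, variable, outputs):
    theta1 = output_score(o, fixed, outputs)  # Eq. (1): variable inputs free
    all_inputs = [f + v for f, v in zip(fixed, variable)]
    theta2 = output_score(o, all_inputs, outputs)  # Eq. (3): all inputs bounded
    return {
        "TECU": 1 / theta1,  # Eq. (2)
        "TE": 1 / theta2,  # Eq. (4)
        "CU": theta2 / theta1,  # Eq. (5)
        "capacity_output": outputs[o] * theta1 / theta2,  # u / CU
    }


# Hypothetical boats: one fixed input, one variable input (days), one output
fixed = [[2], [2], [4], [2]]
variable = [[10], [5], [10], [10]]
outputs = [20, 10, 30, 15]
for o in range(len(outputs)):
    res = capacity_measures(o, fixed, variable, outputs)
    print(o, {key: float(val) for key, val in res.items()})
```

In this made-up data, boat 1 has the same fixed input as boat 0 but fishes half the days, so it is technically efficient (TE = 1) yet at only half of capacity (TECU = CU = 0.5). Boat 3 has exactly the same inputs as boat 0 but lower output: its TECU is 0.75, but the whole shortfall is inefficiency, so the unbiased CU is 1. This is the downward bias of TECU that (5) corrects.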

Categorical variables

A key factor affecting the level of fishery production is the size of the stock. This is effectively an exogenous variable (also known as a non-discretionary variable), as fishers cannot modify their use of the stock input other than by exploiting it harder, for example by spending more days fishing. Where information on the stock is available, such as an index of stock abundance, it can be incorporated directly into the analysis and treated in the same way as other fixed inputs.

A difficulty arises, however, when stock information is not available. In such a case, the analyst has two options. The first option is to ignore stock changes between time periods, as was the case in the Nigerian example using the peak-to-peak and SPF approaches. The measures of capacity and capacity utilization may then be distorted, as actual output may be low because of low stock abundance rather than because of under-utilization of capacity. Where only a time series of aggregate data is available, this may be the only option.

Where a time series of cross-sectional data is available (i.e. data on individual vessels with several observations per vessel over time, also known as panel data), the second and more appropriate option is to treat stock (and time) as a categorical variable. Boats operating in the same time period are subject to the same stock conditions, so a direct comparison of these boats is possible. Boats cannot, however, be compared across periods, as their output is also affected by different stock conditions; if they were, capacity in periods of low abundance would be over-estimated. In treating stock (and time) as a categorical variable, only boats that operate under the same conditions are compared. This requires undertaking several analyses, one for each time period (i.e. each period is treated as a separate category). The resulting measures of capacity output are more reliable because they are conditional on the implicit stock abundance in each period. These measures can be consistently aggregated over time if necessary, e.g. monthly estimates of capacity can be aggregated to provide annual estimates of capacity.
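Treating time as a categorical variable amounts to grouping the observations by period, running the DEA separately within each group and then summing the results. The sketch below assumes, purely for brevity, a single fixed input, a single output and constant returns to scale within each period, so that each period's frontier is just its highest output-to-input ratio; the records and the function names `crs_capacity` and `capacity_by_period` are hypothetical, and the VRS models given earlier could be substituted for the per-period scoring step.

```python
from collections import defaultdict
from fractions import Fraction


def crs_capacity(x_o, peers):
    """CRS capacity output at fixed input x_o (one input, one output),
    using only `peers`, a list of (input, output) pairs from one period."""
    return max(Fraction(u, x) for x, u in peers) * x_o


def capacity_by_period(records):
    """records: (period, boat_id, fixed_input, output) tuples.

    Each period is a separate category, so boats are only compared with
    boats facing the same (unobserved) stock conditions.
    Returns {period: total capacity output of the boats in that period}.
    """
    groups = defaultdict(list)
    for period, _boat, x, u in records:
        groups[period].append((x, u))
    return {
        period: sum(crs_capacity(x, peers) for x, _ in peers)
        for period, peers in groups.items()
    }


records = [
    ("Jan", "boat1", 2, 4), ("Jan", "boat2", 4, 6),
    ("Feb", "boat1", 2, 1), ("Feb", "boat2", 4, 3),  # low-abundance month
]
by_period = capacity_by_period(records)
print(by_period, sum(by_period.values()))
```

Here February is a low-abundance month. Compared only with each other, the February boats have a combined capacity of 4.5 and the annual total is 16.5. Pooling both months would instead put every boat on the January frontier (output-to-input ratio of 2) and give February a capacity of 12, which is the over-estimate described above.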

Effects of random variations on estimates of capacity and capacity utilization

A shortcoming of the DEA approach is that the results may be unduly influenced by random events. Fisheries are often considered to be highly stochastic (i.e. subject to random fluctuations) because of their susceptibility to environmental fluctuations. Further, as the fishery resource is unseen and must effectively be found before it can be harvested, some fishers may be ‘lucky’ and find a large school of fish while others may be ‘unlucky’ and find few if any fish. The SPF approach filters out these random fluctuations to a large extent when estimating capacity utilization; at the same time, as with all regression procedures, outliers and the central tendency of the data influence the SPF parameter estimates. Without proper examination of the data prior to estimation, estimates derived from the SPF or any regression procedure may be driven by a limited number of observations (e.g. the SPF is found to be the appropriate specification with all the data, but is rejected when a single outlying observation is omitted).

The effects of these random fluctuations on the estimates of capacity and capacity utilization using the DEA approach are illustrated in Figure D.2. The frontier depicted in Figure D.2 is essentially the same as that illustrated in Figure D.1, except that an additional point E has been added. Under normal operating conditions, the firm at point E would produce an output at point E*. However, owing to some random event, it managed to produce at point E. The effect of this is to shift the frontier to a higher level, changing the estimated capacity and capacity utilization measures for those points not on the frontier. For example, the capacity output for firm B on the ‘new’ frontier is greater than on the ‘true’ frontier, and its level of capacity utilization would therefore be lower. Similarly, firm D, which is on the original frontier (i.e. 100 percent capacity utilization), is now considered to be under-utilizing its capacity.

Figure D.2 - Effects of random variation on capacity utilization

Empirical testing of the DEA methodology using artificial data sets has shown that the distortion in capacity estimates is proportional to the degree of random variation in the data. However, the unbiased estimate of capacity utilization (given in (5) earlier) is not greatly affected by the amount of variation, as the estimated values of $\theta_1$ and $\theta_2$ are both affected almost equally, and thus the ratio of the two measures is not substantially distorted (Holland and Lee, 2002). Hence, a reliable estimate of capacity output can be derived by using the actual catch data and the unbiased estimate of capacity utilization.

Software

DEA can be undertaken using any linear programming package, including fairly basic packages such as the optimization facilities generally found in spreadsheet packages. However, as the analysis has to be repeated for each observation, using simple linear programming algorithms may be time consuming, particularly if a large number of data points are available. Mathematical programming packages such as GAMS (General Algebraic Modelling System; Brooke, Kendrick and Meeraus, 1992) have the advantage that loops can be written into the model to repeat the analysis for every observation. However, this requires an understanding of the modelling language.

A range of software has been developed specifically to undertake DEA. These are generally user-friendly packages that make estimating efficiency and capacity utilization relatively straightforward. They include DEA-Solver (Cooper, Seiford and Tone, 2000), which is an add-on to Microsoft Excel, On-Front (Färe and Grosskopf, 1999) and DEAP (Coelli, 1996b). The latter package is freely available over the Internet (http://www.une.edu.au/econometrics/cepa.htm) from the Centre for Efficiency and Productivity Analysis, University of New England, Australia. User guides and examples are also provided when downloading the software. Information on On-Front is also available over the Internet (http://www.emq.com) from Economic Measurement and Quality.

Example of use: Nigerian artisanal fishery

The Nigerian data used in the peak-to-peak and SPF analysis were also used to estimate capacity utilization and capacity output using the DEA approach. As with the SPF analysis, both the number of canoes and average crew per canoe were used as inputs in the analysis, with one aggregated output measure.

The analysis was undertaken using the DEAP programme (Coelli, 1996b). For the purpose of the estimation of capacity utilization, each observation was assumed to occur in the same time period. As time is a categorical variable, a separate analysis of one observation in each time period would result in every observation being at full capacity (as there are no other observations against which the output can be compared). This differs from the SPF analysis, where the time element could be directly factored into the analysis.^[53] As with the other two analyses (i.e. SPF and peak-to-peak), the absence of stock information results in the implicit assumption that stock has not changed, and that any change in catch rate is attributable to changes in capacity utilization.

Estimates of capacity utilization were obtained assuming both constant and variable returns to scale (Table D.1). As would be expected, the CRS analysis resulted in lower estimates of capacity utilization and greater estimates of capacity output than the VRS analysis. Further, when variable returns to scale were assumed, most years were found to reflect “operation at full capacity”.

The results of the three analyses are compared in Figure D.3. Only the VRS results are presented for the DEA analysis. From this, it can be seen that the SPF and DEA results are identical for all but the last 6 years of the data. The DEA estimates of capacity were also generally the most conservative over the period examined. In contrast, the peak-to-peak estimates of capacity were substantially greater than the other two methods over the period 1977 to 1987. This is largely an artefact of the peak in 1982, which resulted in a relatively high apparent rate of technological progress being imposed on the estimates over the period 1977 to 1982. While subsequent “technological change” was negative (most likely reflecting a decline in the stocks) the capacity catch rate did not converge with the actual catch rate until the next peak in 1988. Hence, unusually high catch rates can have longer lasting effects when using peak-to-peak analysis than with the other two techniques.

Figure D.3 - Comparison of capacity measures: peak-to-peak, SPF and DEA

Table D.1 - Capacity utilization and output, Nigerian artisanal fishery

Year	Production	Capacity utilization (CRS)	Capacity utilization (VRS)	Capacity output (CRS)	Capacity output (VRS)
1976	327 561	1	1	327 561	327 561
1977	331 280	0.999	0.999	331 612	331 612
1978	336 138	1	1	336 138	336 138
1979	356 888	0.998	1	357 603	356 888
1980	274 158	0.974	0.978	281 476	280 325
1981	323 916	0.987	0.989	328 182	327 519
1982	377 683	1	1	377 683	377 683
1983	376 984	0.994	1	379 260	376 984
1984	246 784	0.991	1	249 025	246 784
1985	140 873	0.948	1	148 600	140 873
1986	160 169	0.959	1	167 017	160 169
1987	145 755	0.952	1	153 104	145 755
1988	185 181	0.971	1	190 712	185 181
1989	171 332	0.964	0.993	177 730	172 540
1990	170 459	0.964	1	176 825	170 459
1991	168 211	0.963	0.992	174 674	169 568
1992	184 407	0.97	1	190 110	184 407
1993	106 276	0.926	0.957	114 769	111 951
1994	124 117	0.939	0.967	132 180	128 353
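As a check on the arithmetic, each capacity output in Table D.1 is production divided by the corresponding capacity utilization. For 1980, for example:

$$
\frac{274\,158}{0.974} \approx 281\,476 \quad \text{(CRS)}, \qquad \frac{274\,158}{0.978} \approx 280\,325 \quad \text{(VRS)}
$$

Because the utilization rates are shown rounded to three decimal places, recomputing other rows in this way will not always reproduce the tabulated capacity outputs exactly.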

Example of use: multispecies fisheries in the English Channel

This example is drawn from Pascoe, Coglan and Mardle (2001). The study examined two fleet segments in the English Channel - an otter trawl fleet and a static gear fleet that used a combination of both gillnets and long lines. Both fleet segments targeted the same set of species, but their catch composition varied as a result of the different gear types. The example illustrates how capacity of two different fleet segments can be estimated and the results combined to produce an overall estimate of capacity output in a heterogeneous fishery.

A multi-output DEA analysis was undertaken with the catch of the main target species (cod, cuttlefish, hake, ling, monk, plaice, sole and whiting) included as separate outputs in the analysis. In addition, all other species were aggregated into an “other” category. While the target species formed the minority of the catch by weight, they generally formed a significant part of the value of the total catch. Further, most of the target species are subject to quota control and are of main interest to fisheries managers (e.g. cod, hake, monk, plaice, sole and whiting).

The key inputs used in the analysis were days fished, length and breadth of boat and engine power (kW) (Table D.2). Fixed inputs included length, breadth and engine power, and were assumed to represent the capital input into the fishery. Variable inputs only included days fished. While data on labour employed were available, these were only available on an annual basis. Hence, they would have effectively formed part of the fixed factors of production. They were excluded from the analysis as, in practice, labour is a variable input. Data on catch and days fished were available on a monthly basis over a 12 month period (1995).

Inputs were relatively similar between otter trawlers and netter-liners. Netter-liners fished, on average, approximately two days less a month than otter trawlers. Otter trawlers tended to have, on average, physically bigger boats (in terms of length times breadth), although netter-liners had, on average, larger engines. There is no a priori reason why this would be the case.

Table D.2 - Key inputs for otter trawlers and netter-liners

	Days fished (variable)	Length (fixed)	Width (fixed)	Engine power, kW (fixed)
Otter Trawlers
· Average	14.0	13.27	4.66	157.7
· Maximum	34	23.16	6.34	373
· Minimum	1	10.33	3.62	28
Netter-liners
· Average	11.9	12.29	4.34	171.2
· Maximum	31	23.82	5.79	442
· Minimum	1	10.4	3.5	55

Catch composition changes over the year due to different patterns of seasonal abundance. However, information on the stock conditions in each month was not available, so a stock variable could not be included in the analysis. To allow for variations in availability, the DEA model was run categorically. That is, the model was run separately for each month, so that only boats that fished in the same month would be compared. It was assumed that stock abundance was relatively constant over the month so that the timing of fishing did not affect the catch composition. Spatial variations in catch composition were also not considered. The analysis is limited to one area of the Channel (the western half) and it is assumed that species abundance did not vary substantially across this area.

The model was also run separately for the two fleet segments such that otter trawlers were not directly compared to netter-liners. A combined analysis would have required the assumption of a common production process, which clearly is not realistic. Capacity output was estimated at the individual vessel level (based on the observed catch and the estimated capacity utilization and technical efficiency measures) and aggregated to the fleet level. From this, aggregate measures of capacity utilization can be derived from the aggregated actual and capacity output estimates.

From the model output, capacity utilization (CU) varied considerably by species and between the two fleet segments examined (Table D.3). For most species, the otter trawlers were operating at less than 90 percent capacity (e.g. cod, hake and ling) and for some species less than 80 percent capacity (e.g. cuttlefish, plaice and whiting). However, much of this underutilization of capacity arose out of using the inputs inefficiently rather than not using enough variable inputs. If the inputs had been used efficiently, then the unbiased capacity utilization for the target species would have been greater than 90 percent. In contrast, the netter-liner fleet segment was generally operating at above 90 percent capacity, and if inputs were used efficiently, would be operating at almost 100 percent capacity for most of the target species.

Table D.3 - Estimated capacity output (tonnes) and capacity utilization by species

	Observed output	TE capacity output	TE output	CU (1/θ₁)	Unbiased CU (θ₂/θ₁)	Unbiased capacity output
	a	b	c	a/b	c/b	a/(c/b)
Otter trawlers
Cod	89.5	108.3	98.2	0.83	0.91	98.4
Cuttlefish	472.2	649.6	596.4	0.73	0.92	513.3
Hake	15.1	17.5	16.0	0.86	0.91	16.6
Ling	33.8	38.1	35.7	0.89	0.94	36.0
Monk	218.4	260.1	237.3	0.84	0.91	240.0
Plaice	121.7	158.0	144.6	0.77	0.91	133.7
Sole	15.2	18.6	17.1	0.82	0.92	16.5
Whiting	650.6	822.0	757.0	0.79	0.92	707.2
Other	2 449.4	3 550.4	3 038.5	0.70	0.86	2 906.3
Netter-liners
Cod	38.0	41.2	40.4	0.92	0.98	38.8
Cuttlefish	25.3	26.7	25.7	0.95	0.96	26.4
Hake	3.8	3.9	3.8	0.98	0.99	3.8
Ling	84.6	88.2	86.9	0.96	0.99	85.5
Monk	57.5	59.3	58.8	0.97	0.99	58.1
Plaice	8.5	9.2	8.8	0.92	0.96	8.9
Sole	3.4	3.5	3.4	0.98	0.99	3.4
Whiting	59.3	66.1	63.0	0.90	0.95	62.4
Other	786.7	894.2	856.4	0.88	0.96	819.5

While the fleets are significantly different in their operations, the capacity output from the two fleets can be aggregated at the species level (as each species is a homogeneous output). The combined capacity output and derived capacity utilization for the two fleet segments are presented in Table D.4. From this, overall unbiased capacity utilization is 88 percent for the ‘other’ species and between 92 and 97 percent for the key species examined.
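Because each species is a homogeneous output, combining fleet segments only requires summing the observed output (a), capacity output (b) and technically efficient output (c) across segments and then recomputing the ratios a/b and c/b. The short sketch below does this for cod using the values reported in Table D.3; the function name `combine` is ours, and the recomputed ratios match the cod entries of Table D.4 to two decimal places.

```python
# Cod, Table D.3: (observed a, TE capacity output b, TE output c) by segment
cod_by_fleet = {
    "otter trawlers": (89.5, 108.3, 98.2),
    "netter-liners": (38.0, 41.2, 40.4),
}


def combine(rows):
    """Sum a, b and c across fleet segments, then recompute the ratios."""
    a, b, c = (sum(col) for col in zip(*rows))
    return {"observed": a, "capacity": b, "CU": a / b, "unbiased_CU": c / b}


cod = combine(cod_by_fleet.values())
print(round(cod["CU"], 2), round(cod["unbiased_CU"], 2))  # 0.85 0.93
```

The ratios are computed from the summed quantities rather than by averaging the segment-level ratios, so the larger otter trawl catch carries more weight in the combined figure.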

The purpose of this example was to demonstrate how aggregate measures of capacity can be derived for individual species that are exploited by more than one fleet segment and in different combinations with other species. Further, the capacity measures can be aggregated over several fisheries provided each fishery is estimated separately. The resultant set of information can be compared to overall target capacity measures for the species. In addition, the fleet-level information provides guidance as to which fleet segments exploiting the fishery may be in most need of capacity management measures.

Table D.4 - Combined capacity output (tonnes) and capacity utilization by species

	Observed output	TE capacity output	TE output	CU (1/θ₁)	Unbiased CU (θ₂/θ₁)	Unbiased capacity output
Cod	127.5	149.5	138.6	0.85	0.93	137.1
Cuttlefish	497.5	676.3	622.1	0.74	0.92	539.6
Hake	18.9	21.4	19.8	0.88	0.93	20.4
Ling	118.4	126.3	122.6	0.94	0.97	121.4
Monk	275.9	319.4	296.1	0.86	0.93	298.1
Plaice	130.2	167.2	153.4	0.78	0.92	142.6
Sole	18.6	22.1	20.5	0.84	0.93	20.0
Whiting	709.9	888.1	820	0.80	0.92	769.6
Other	3 286.1	4 444.6	3 894.9	0.74	0.88	3 725.8

^[51] Specific functional forms, however, can be estimated via DEA. For example, it is possible to specify a Cobb-Douglas or even a second-order translog specification and estimate the parameters by DEA (see, for example, Färe et al. (1993) and Charnes et al. (1994)).
^[52] In contrast, excluding this constraint implicitly imposes constant returns to scale, while $\sum_j z_j \le 1$ imposes non-increasing returns to scale (Färe, Grosskopf and Kokkelenberg, 1989).
^[53] Using a windows technique, which is based on the use of moving averages, different time periods can be included in the analysis (Charnes et al., 1995).