THEORETICAL JUSTIFICATION OF THE NEGATIVE BINOMIAL DISTRIBUTION
We have derived the Poisson Distribution from the Binomial Distribution, and the necessary condition for the Binomial Distribution to hold is that the probability, p, of an event E shall remain constant for all occurrences of its context-events. Thus, this condition must also hold for the Poisson Distribution.
If, however, it is known that p is not constant in its context-events, another distribution known as the Negative Binomial Distribution (N.B.D.) may provide an even closer fit.
Suppose we have a Binomial Distribution for which the variance $V(x) = s^2 = npq$ is greater than the mean $m = np$.
In such a case the following relations hold:
(i) $npq > np$, so that $q > 1$; and
(ii) since $p + q = 1$, $p$ must be negative, i.e. $p = 1 - q < 0$.
But $np$ being positive, $n$ must be negative also (writing $n = -k$).
The trouble about this type of distribution lies in the interpretation, for we have defined probability in such a way that its measure must always be a number lying between 0 and 1 and so, essentially positive. Again, since n(= -k) is the number of context-events how can it possibly be negative?
It is often found that observed frequency distributions are represented by Negative Binomial Distributions. This is theoretically justified when in frequency distributions the variance is greater than the mean.
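This often arises when the probability of an event E does not remain constant for all occurrences of its context-events (see footnote 1).
1 The concentration of units varies between different parts of the population (the units are not randomly distributed throughout the whole population).
From (ii) above, writing $n = -k$ and taking $p$ as a positive quantity, we have
$$q = 1 + p \quad \text{and} \quad m = kp, \quad \text{where} \quad p = \frac{m}{k};$$
substituting, we get
$$q = 1 + \frac{m}{k}.$$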
The parameters of the distribution are the arithmetic mean (m) and the exponent k.
Since the variance of the population is
$$s^2 = kpq,$$
substituting $p = m/k$ and $q = 1 + m/k$ we get
$$s^2 = m + \frac{m^2}{k}. \qquad \text{(iii)}$$
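The probability series of the N.B.D. is given by the expansion of
$$(q - p)^{-k}, \qquad \text{with } p = \frac{m}{k} \text{ and } q = 1 + p.$$
The individual terms of $(q - p)^{-k}$ are given by
$$P(x) = \frac{\Gamma(k + x)}{x!\,\Gamma(k)} \cdot \frac{p^x}{q^{k + x}}, \qquad x = 0, 1, 2, \ldots$$
By using the recurrence formula, the individual terms of the series are
$$P(0) = q^{-k}$$
and
$$P(x) = \frac{k + x - 1}{x} \cdot \frac{p}{q} \cdot P(x - 1), \qquad x = 1, 2, \ldots$$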
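The recurrence for the individual terms lends itself to a short computation. The following is a minimal Python sketch, assuming the parameterisation p = m/k and q = 1 + p; the function name `nbd_probabilities` is illustrative and not part of the original text.

```python
def nbd_probabilities(m, k, max_x):
    """Return [P(0), ..., P(max_x)] for an N.B.D. with mean m and exponent k.

    Assumes p = m/k and q = 1 + p (an assumed parameterisation).
    """
    p = m / k
    q = 1.0 + p
    probs = [q ** (-k)]  # P(0) = q^(-k)
    for x in range(1, max_x + 1):
        # P(x) = ((k + x - 1) / x) * (p / q) * P(x - 1)
        probs.append(probs[-1] * (k + x - 1) / x * (p / q))
    return probs


# Illustrative call with m = 0.68 and k = 3.58
for x, px in enumerate(nbd_probabilities(0.68, 3.58, 5)):
    print(f"P(x={x}) = {px:.4f}")
```

Because k enters only through arithmetic on floating-point values, the same loop works for non-integer k.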
Note that k is no longer the maximum possible number of individuals a sampling unit could contain, but is related to the spatial distribution of the surveyed population (k is a measure of the heterogeneity of the distribution). Unlike the positive Binomial, k is not necessarily an integer in the Negative Binomial Distribution.
From (iii) above we have
$$\frac{1}{k} = \frac{s^2 - m}{m^2}.$$
The above formula indicates that the reciprocal of the exponent $k$, i.e. $1/k$, is a measure of the excess of variance or clumping of the individuals in the population. Specifically, as $1/k$ approaches zero ($k \to \infty$), the distribution converges to the Poisson series ($s^2 \to m$). Conversely, if clumping increases, $1/k$ approaches infinity ($k \to 0$) and the distribution converges to the Logarithmic Series.
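The Poisson limit can be checked numerically. The sketch below assumes the relations $s^2 = m + m^2/k$ and $P(0) = (1 + m/k)^{-k}$, and compares them with the Poisson values $s^2 = m$ and $P(0) = e^{-m}$ as k grows. The helper names and the chosen values of m and k are illustrative only.

```python
import math


def nbd_variance(m, k):
    # Variance of the N.B.D. under the assumed relation s^2 = m + m^2/k
    return m + m * m / k


def nbd_p0(m, k):
    # Zero term P(0) = q^(-k) with q = 1 + m/k
    return (1.0 + m / k) ** (-k)


m = 0.68  # illustrative mean
for k in (0.5, 2.0, 10.0, 100.0, 10000.0):
    print(f"k = {k:>8}: s2 = {nbd_variance(m, k):.4f}, P(0) = {nbd_p0(m, k):.4f}")
print(f"Poisson   : s2 = {m:.4f}, P(0) = {math.exp(-m):.4f}")
```

The printed variance shrinks towards m and P(0) towards $e^{-m}$ as k increases; the Logarithmic Series limit for small k is not illustrated here.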
Example:
The Table below gives the number of aquatic invertebrates on the bottom in 400 square units. Fit a Negative Binomial Distribution to the empirical data.
Number of aquatic invertebrates (x) | 0 | 1 | 2 | 3 | 4 | 5 | Total |
Frequency (f) | 213 | 128 | 37 | 18 | 3 | 1 | 400 |
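Estimated mean:
$$m = \frac{\sum f x}{N} = \frac{273}{400} = 0.68$$
Estimated variance:
$$s^2 = \frac{\sum f x^2 - (\sum f x)^2 / N}{N - 1} = \frac{511 - 273^2/400}{399} = 0.81$$
Calculated q:
$$s^2 = kpq = mq,$$
or
0.81 = 0.68q, and q = 1.19.
Calculated p:
$$p = q - 1 = 0.19$$
and
Estimated k:
$$k = \frac{m}{p} = \frac{0.68}{0.19} = 3.58$$
Estimated probabilities: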
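Recurrence formula:
$$P(x=0) = q^{-k}, \qquad P(x) = \frac{k + x - 1}{x} \cdot \frac{p}{q} \cdot P(x - 1)$$
Therefore, $P(x=0) = (1.19)^{-3.58} = 0.5365$, and the recurrence gives P(x=1) = 0.3065, P(x=2) = 0.1120, P(x=3) = 0.0332, P(x=4) = 0.0087 and P(x=5) = 0.0022.
Estimated theoretical frequencies (N.B.D.):
N(x=0) = 400 × P(x=0) = 400 × 0.5365 ≈ 214
N(x=1) = 400 × P(x=1) = 400 × 0.3065 ≈ 123
N(x=2) = 400 × P(x=2) = 400 × 0.1120 ≈ 45
N(x=3) = 400 × P(x=3) = 400 × 0.0332 ≈ 13
N(x=4) = 400 × P(x=4) = 400 × 0.0087 ≈ 4
N(x=5) = 400 × P(x=5) = 400 × 0.0022 ≈ 1
Testing goodness of fit: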
A problem that arises frequently in statistical work is the testing of comparability of a set of observed (empirical) and theoretical (N.B.D.) frequencies.
To test the hypothesis of goodness of fit of the N.B.D. to the empirical frequency distribution we calculate the value of
$$X^2 = \sum_i \frac{(f_i - q_i)^2}{q_i}$$
where
$f_i$ = empirical frequencies
$q_i$ = theoretical frequencies
The estimated $X^2$ value is compared (see footnote 2) with the tabulated $\chi^2$ value. The hypothesis is valid if $X^2 < \chi^2$; the hypothesis is discredited if $X^2 > \chi^2$.
2 It should be noted that, since the $\chi^2$ curve is an approximation to the discrete $X^2$ frequency function, care must be exercised that the $\chi^2$ test is used only when the approximation is good. Experience and theoretical investigations indicate that the approximation is usually satisfactory provided that the frequencies of the class intervals are ≥ 5 and that the number of classes in the frequency distribution is ≥ 5.
The following Table gives the empirical and theoretical frequencies of the previous example and the estimated $X^2$ value.
Table. X² test of goodness of fit of the N.B.D. to the spatial distribution of aquatic invertebrates
Number of aquatic invert. (x) | Empirical frequencies (fi), number of squares | Theoretical frequencies (qi), number of squares | (fi - qi) | (fi - qi)²/qi | Remarks |
0 | 213 | 214 | -1 | 0.0047 | |
1 | 128 | 123 | +5 | 0.2033 | |
2 | 37 | 45 | -8 | 1.4222 | |
3 | 18 | 13 | +5 | 1.9231 | |
4 | 4 | 5 | -1 | 0.2000 | classes 4 and 5 combined |
 | | | | X² = 3.7533 | |
(ν = degrees of freedom; ν = 5 classes − (2 estimated parameters + 1) = 2)
Since the tabulated value is $\chi^2_{0.05,\,2} = 5.991$ and 3.7533 < 5.991, the hypothesis of goodness of fit is valid.
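The goodness-of-fit arithmetic can be reproduced in a few lines. The sketch below is plain Python using the frequencies from the table, with classes 4 and 5 combined; the critical value 5.991 is taken from the text rather than computed, and the variable and function names are illustrative.

```python
# Empirical and theoretical frequencies for classes 0, 1, 2, 3 and 4-5 combined
observed = [213, 128, 37, 18, 4]
expected = [214, 123, 45, 13, 5]


def chi_square(obs, exp):
    # X^2 = sum over classes of (f_i - q_i)^2 / q_i
    return sum((f - q) ** 2 / q for f, q in zip(obs, exp))


x2 = chi_square(observed, expected)
# Degrees of freedom: classes - (estimated parameters + 1), with 2 parameters (m, k)
df = len(observed) - (2 + 1)
critical = 5.991  # tabulated chi-square value quoted in the text for 2 d.f.

print(f"X2 = {x2:.4f}, d.f. = {df}, fit accepted: {x2 < critical}")
```

Summing unrounded terms gives a value that differs from 3.7533 only in the fourth decimal place, since the table adds terms already rounded to four decimals.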
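Note: A second estimate of k
From (iii) above (see also Appendix II) we have
$$s^2 = m + \frac{m^2}{k},$$
and a second estimate of k is given by
$$k = \frac{m^2}{s^2 - m}.$$
In the above example,
$$k = \frac{(0.68)^2}{0.81 - 0.68} = \frac{0.4624}{0.13} = 3.56$$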
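The moment estimates used in the worked example can be recomputed directly from the frequency table. The sketch below is a hedged illustration: it assumes the sample variance uses the divisor N − 1 and that $k = m/p = m^2/(s^2 - m)$; the variable names are illustrative.

```python
counts = [0, 1, 2, 3, 4, 5]          # number of invertebrates per square (x)
freqs = [213, 128, 37, 18, 3, 1]     # number of squares with that count (f)

N = sum(freqs)
sum_x = sum(f * x for x, f in zip(counts, freqs))
sum_x2 = sum(f * x * x for x, f in zip(counts, freqs))

m = sum_x / N                              # arithmetic mean
s2 = (sum_x2 - sum_x ** 2 / N) / (N - 1)   # sample variance (divisor N - 1 assumed)
q = s2 / m                                 # from s^2 = m * q
p = q - 1.0
k = m / p                                  # algebraically equal to m^2 / (s^2 - m)

print(f"N = {N}, m = {m:.4f}, s2 = {s2:.4f}")
print(f"q = {q:.4f}, p = {p:.4f}, k = {k:.4f}")
```

Carrying full precision gives k of about 3.55, slightly below the hand-calculated figure, which rests on m and s² rounded to two decimals. The estimate is sensitive to that rounding because the denominator s² − m is small.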
Transformations
Analysis of variance, correlation analysis, hypothesis testing and other statistical methods that assume a normal distribution are performed on the transformed counts (see Table below).
Transformations
Original distribution | Special conditions | Estimated parameters | Transformation |
1. Poisson | 1. No counts less than 10 | | Replace x by |
 | 2. Some counts less than 10 | | Replace x by |
2. Negative Binomial | 1. k greater than 5 | - | Replace x by |
 | 2. k between 2 and 5 | - | Replace x by y = log(x + k/2) |
 | 3. No zero counts | | Replace x by y = log x |
 | 4. Some zero counts | | Replace x by y = log(x + 1) |
As the derived mean (the mean of the transformed counts converted back to the original scale) is smaller than the arithmetic mean of the original counts, it is not comparable with an arithmetic mean obtained by direct averaging. Small adjustments therefore have to be made to the derived means (see section 10.4.4).
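The logarithmic transformations listed in the table, and the way a derived mean falls below the arithmetic mean, can be sketched in Python. Base-10 logarithms are an assumption here (the text writes only "log"), the function names are illustrative, and the sample counts are made up for demonstration. The adjustment described in section 10.4.4 is not implemented.

```python
import math


def log_half_k(counts, k):
    # y = log(x + k/2), listed for a Negative Binomial with k between 2 and 5
    return [math.log10(x + k / 2.0) for x in counts]


def log_counts(counts):
    # y = log(x + 1) if some counts are zero, otherwise y = log x
    if any(x == 0 for x in counts):
        return [math.log10(x + 1) for x in counts]
    return [math.log10(x) for x in counts]


def derived_mean(counts):
    # Mean of the transformed counts, converted back to the original scale
    y = log_counts(counts)
    mean_y = sum(y) / len(y)
    shift = 1 if any(x == 0 for x in counts) else 0
    return 10 ** mean_y - shift


sample = [0, 1, 3, 9]  # made-up counts, for illustration only
print("transformed:", [round(v, 4) for v in log_counts(sample)])
print("arithmetic mean:", sum(sample) / len(sample))
print("derived mean:", round(derived_mean(sample), 4))
```

For these made-up counts the derived mean comes out below the arithmetic mean, which is the discrepancy the adjustment is meant to address.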