Tom McMahon wrote:
>How does one calculate an error term for multiple population estimates? As
>an example, you want to test the hypothesis that fish abundance in two
>different habitat types was statistically different. You picked five
>samples of each habitat type and performed a depletion/removal estimate in
>each habitat. For each habitat sampled, you obtain a point population
>estimate with an associated variance. To compare population abundance
>among the two habitat types, it is straightforward to take the mean of the
>five population estimates, but my question is, how do you calculate the
>error term used in the statistical comparison? Using the variance of the
>five means would seemingly highly underestimate the true variance around
>each point estimate of abundance, but I'm unsure of the correct way to
>essentially take the 'mean' variance of the five individual estimates.
It is OK to use the variance among the five population estimates as the
error term. Call this the total variance. This total variance is actually
the sum of two variance components:
Var(total) = Var(process) + Var(enumeration).
The first component is process variance, which is the true variation
among the sampled habitat units (i.e., each habitat unit, in truth,
contained a different number of animals). The second component is
enumeration variance, which arises because you did not count every animal
in the sampled habitat units; you made an estimate (i.e., you are not sure
what the true population is in even one sampled habitat unit).
Think of it this way. If the habitat units you sampled all, in truth, had
exactly the same number of animals (process variation = 0), your five
estimates would probably still be different because each is an estimate with
associated enumeration variation. Thus, the total variance observed among
the five units would be due entirely to enumeration variation. On the other
hand, if the habitat units sampled did differ in the number of animals, but
you were able to do a complete count in each unit (enumeration variation =
0), then the total variance would be due entirely to process variation. In
truth, the situation is somewhere in between: you have both process
variation and enumeration variation included in the total variance you
observe among the five estimates.
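
If you want to convince yourself of the decomposition, a small simulation
helps. The sketch below is only an illustration: the function name, the mean
abundance of 100, and the standard deviations of 20 (process) and 10
(enumeration) are made-up values, and both sources of variation are assumed
to be normal and independent. It uses many units rather than five so the
pattern is easy to see.

```python
import random
import statistics


def simulate_estimates(n_units, mean_abundance, process_sd, enumeration_sd, seed=1):
    """Simulate abundance estimates for n_units habitat units (hypothetical model)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_units):
        # Process variation: the true abundance differs from unit to unit.
        true_abundance = rng.gauss(mean_abundance, process_sd)
        # Enumeration variation: the estimate scatters around the true abundance.
        estimates.append(rng.gauss(true_abundance, enumeration_sd))
    return estimates


# Assumed values: process variance = 20**2 = 400, enumeration variance = 10**2 = 100.
both = statistics.variance(simulate_estimates(20000, 100, 20, 10))
only_enumeration = statistics.variance(simulate_estimates(20000, 100, 0, 10))
only_process = statistics.variance(simulate_estimates(20000, 100, 20, 0))

print("both sources:    ", round(both, 1))              # close to 400 + 100
print("enumeration only:", round(only_enumeration, 1))  # close to 100
print("process only:    ", round(only_process, 1))      # close to 400
```

With either source switched off, the variance among the estimates falls back
to the other component alone, which is the thought experiment above.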
A conservative statistical test (i.e., one that would tend not to reject the
null hypothesis of no difference between habitat-types) would use the total
variance as the error term. If you do this test and it rejects, you can
make a good case that the two habitat types really do support different
numbers of fish. Try a t-test.
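
Here is a minimal sketch of that test, assuming an ordinary pooled-variance
two-sample t-test. The ten estimates and the names habitat_a and habitat_b
are invented for illustration; the critical value 2.306 is the two-sided 5%
point of the t distribution with 8 degrees of freedom. Note that the within-
unit variances from the removal model are not used at all: the variance
among the five estimates already contains both components.

```python
import math
import statistics


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Variance among the estimates in each habitat type (the "total variance").
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t, df


# Hypothetical removal estimates, one per sampled habitat unit.
habitat_a = [120, 95, 140, 110, 135]
habitat_b = [60, 85, 70, 55, 80]

t_stat, df = pooled_t_statistic(habitat_a, habitat_b)
T_CRITICAL = 2.306  # two-sided, alpha = 0.05, 8 degrees of freedom
reject = abs(t_stat) > T_CRITICAL
print("t =", round(t_stat, 3), "df =", df, "reject:", reject)
```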
If the test doesn't reject, it may be due to low power, and the low power
may be due to high enumeration variation. In this case you would like to
subtract out the enumeration variation because it is inflating your error
term. You could do this because you have estimates of enumeration variation
(the variances of each of the 5 population estimates; these variances are
based on the statistical model [the removal model] that you used to generate
the estimates). Details can be found in Skalski and Robson (1992.
Techniques for wildlife investigations - design and analysis of capture
data. Academic Press). However, in almost all cases, it turns out that
most of the total variance is due to process variation, not enumeration
variation, so you don't get much more power in your test anyway. If your
simple t-test using the total variance doesn't reject, the problem is
probably that your sample size of habitat units is too small, not that you
did a poor job estimating abundance within each sampled unit.
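
To see how much of your total variance is enumeration variation, a rough
check is to subtract the average of the five model-based variances from the
variance among the five estimates. The sketch below does only that; it is a
simple moment-style calculation under the additive model above, not
necessarily the exact estimator given by Skalski and Robson, and all numbers
and names in it are made up.

```python
import statistics


def variance_components(estimates, enumeration_variances):
    """Split the variance among estimates into enumeration and process parts."""
    total_var = statistics.variance(estimates)
    # Assumption: the average model-based variance stands in for Var(enumeration).
    enumeration_var = statistics.mean(enumeration_variances)
    # Subtraction can go negative with small samples, so truncate at zero.
    process_var = max(0.0, total_var - enumeration_var)
    return total_var, enumeration_var, process_var


# Hypothetical estimates and their removal-model variances for one habitat type.
estimates = [120, 95, 140, 110, 135]
model_variances = [30.0, 25.0, 45.0, 20.0, 40.0]

total_var, enumeration_var, process_var = variance_components(estimates, model_variances)
enumeration_share = enumeration_var / total_var
print("total:", total_var, "enumeration:", enumeration_var, "process:", process_var)
print("enumeration share of total:", round(enumeration_share, 3))
```

In this invented example enumeration variation is under a tenth of the total,
so removing it would barely change the error term.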
In truth, what you have is a multi-stage sampling design. At the first
stage, you *randomly* selected habitat units from some larger population of
habitat units. At the second stage, you made an estimate of abundance in
each sampled habitat unit. Two stages, two sources of variation. Hankin
(1984. Multistage sampling designs in fisheries research: applications in
small streams. CJFAS 41:1575-1591) does a nice job of explaining all this.
These ideas are most important when designing a sampling program. The basic
question is, given limited time and money, should you put your effort into
sampling more habitat units with less efficiency in each (i.e., reducing
the contribution of process variance at the expense of increased enumeration
variation), or should you put your effort into sampling fewer habitat units
with greater efficiency in each (i.e., a larger contribution from process
variance but with reduced enumeration variation)? Almost always, the best
choice is more units with less effort in each (see Hankin and Reeves 1988,
CJFAS 45:834-844).
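
A back-of-the-envelope version of that trade-off, as a sketch only: it
assumes units are drawn at random from a large population (no finite
population correction), that every unit has the same enumeration variance,
and that the variance of the mean is simply the total variance divided by
the number of units. The variance values and the two allocations are
hypothetical, not taken from Hankin and Reeves.

```python
def variance_of_mean(n_units, process_var, enumeration_var):
    """Approximate variance of the mean of n_units abundance estimates."""
    return (process_var + enumeration_var) / n_units


PROCESS_VAR = 400.0  # assumed true unit-to-unit variance in abundance

# Option 1: more units, less effort in each (larger enumeration variance).
many_rough = variance_of_mean(10, PROCESS_VAR, 100.0)
# Option 2: fewer units, more effort in each (smaller enumeration variance).
few_precise = variance_of_mean(5, PROCESS_VAR, 25.0)

print("10 units, rough estimates:  ", many_rough)
print("5 units, precise estimates: ", few_precise)
```

Because process variance dominates, doubling the number of units helps more
than quartering the enumeration variance in this example.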
So, take your two sets of five estimates and do a t-test. If it rejects,
rejoice. If not, you probably will need to sample more habitat units.
Department of Biology and Environmental Studies Program
Ashland, VA 23005