Confessions of a Statistical Paleontologist
Allow me to introduce myself as Hofstra's resident paleontologist. Like all paleontologists, I study the natural history of the Earth through the data provided by fossils. Most people assume, as do the song lyrics to the right, that this entails digging up the bones of prehistoric beasts. I will never forget the look of abject disappointment on my mother's face the day I informed her, some number of years into my Ph.D. research, that I did NOT dig up dinosaurs. "Well then," she asked, "what do you study?" "I count fossils," I told her. I could see that this was not a very satisfying answer. Traditionally, paleontologists have a specialization, a taxonomic group or a geologic time period that they are expert in; I do not.
My interests lie in the fossil record itself. Fossils and their entombing sediments are the remains of the habitats, communities, and ecosystems of the biological past. It is a source of wonder to paleontologists that the strata of the Earth should preserve such a record at all, but at the same time the fossil record is frustrating because it can be very difficult to accurately interpret. The study of all that happens to a living community of organisms as they are transformed into fossils is called taphonomy (literally, "the law of burial"). Taphonomic processes selectively remove some organisms via decay, disarticulate and disperse others, average together cohorts of individuals through time, and generally disturb, destroy, and scramble the intricate details and relationships of the living world. To read the fossil record at face value is naïve, particularly if you are trying to reconstruct the complexities of biological communities and ecological interactions. This was the challenge that caught my attention starting out as a young scientist – how can we extract and more reliably interpret the ecological data preserved in a paleoecological (i.e., fossil) assemblage?
Measuring Ecological Stability in the Fossil Record
Early in my graduate school career, I took an excellent course in statistics offered through the animal science program at Virginia Tech. Up to that point I had never considered myself much inclined toward math, but I had something of an epiphany studying statistics. Beyond knowing how to apply particular statistical techniques, if you understand the basic principles of statistical analysis, you gain powerful insight into what kinds of data are needed to answer research questions and how to look at data with a skeptical eye. It was not long before I began applying my insights from statistical analysis to the fossil record. For my Ph.D. research, I decided to look for evidence of ecological stability over geological spans of time. Several very well-respected paleontologists had observed that the same assemblages of marine shelly fossils seemed to recur in the fossil record over hundreds of thousands to millions of years, even after significant disruptions of their habitats. A few had even suggested that ecological interactions were stabilizing species assemblages, causing the same combinations of species to reassemble in similar abundances whenever the appropriate habitats were re-established. To test this, I headed out to the coal fields of Kentucky and southwest Virginia where thin intervals of shale with marine fossils deposited during times of exceptionally high global sea level were separated by thousands of feet of strata deposited by rivers during the intervening times of low sea level. Were the fossil assemblages the same from marine interval to marine interval, demonstrating ecological stasis, or did they change with each new drowning of the basin by marine waters and return of marine organisms? From my training in statistics, I realized that to assess how much fossil assemblages changed through time I needed to determine how variable they were at any one time from place to place. 
If the temporal variability was equal to or less than the spatial variability, then I could conclude that ecological communities had persisted relatively unchanged through time. As it turned out, I was able to show that in almost all of the marine habitats I tested, the fossil communities showed statistically significant change through time relative to their variability at any one time (Figure 1), with species dropping in and out of habitats and changing their numerical abundance in the fossil assemblages. There was little evidence for ecological interactions controlling community structure. I was already teaching at Hofstra when the results of this work began appearing in print (Bambach and Bennington, 1996; Bennington and Bambach, 1996).
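The logic of comparing change through time against variability at one time can be sketched as a simple permutation test. This is only an illustration and not necessarily the test used in the published work: the Bray-Curtis dissimilarity, the function names, and the species counts below are all assumptions made for the example. If samples from different marine intervals are no more different from one another than replicate samples from the same interval, then shuffling the interval labels should make little difference to the result.

```python
import random
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den if den else 0.0

def dissimilarity_gap(samples, labels):
    """Mean between-interval dissimilarity minus mean within-interval dissimilarity."""
    within, between = [], []
    for i, j in combinations(range(len(samples)), 2):
        d = bray_curtis(samples[i], samples[j])
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)

def permutation_test(samples, labels, n_perm=999, seed=0):
    """Return the observed gap and the share of label shuffles that match or beat it."""
    rng = random.Random(seed)
    observed = dissimilarity_gap(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if dissimilarity_gap(samples, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical counts of three species in four replicate samples
# from each of two marine intervals (A is older, B is younger).
samples = [
    [40, 30, 30], [45, 25, 30], [38, 32, 30], [42, 28, 30],  # interval A
    [10, 60, 30], [12, 55, 33], [8, 62, 30], [11, 58, 31],   # interval B
]
labels = ["A"] * 4 + ["B"] * 4

gap, p_value = permutation_test(samples, labels)
print(f"between-minus-within gap = {gap:.3f}, permutation p = {p_value:.3f}")
```

A small p-value here would indicate that the assemblages changed through time by more than their place-to-place variability can explain, which is the pattern described above.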
An Experimental Design Approach to the Fossil Record
It is a basic tenet of experimental design that observations must be replicated. A single measurement is not very informative because we don't know how representative it is of the quantity we are trying to measure. We make repeat observations to quantify the amount of error or bias in our measurements. Some error, called sampling error, comes from random chance – the luck of the draw. Other sources of error are systematic – for example, a difference in conditions between the two times an experiment is run. There is also a question of how much replication is necessary to make reliable comparisons between experimental treatments. With too few replicates, your results lack statistical confidence and may be biased; with too many, you are wasting time, energy, and money. As I contemplated the results of my analysis of stability through time in fossil assemblages, I realized that these issues applied to any attempt to measure attributes of the fossil record. Most paleontologists were fixated on random sampling error and assumed that the solution to accurately quantifying the fossil record was to collect lots of fossils – the more the better. At the same time, there was an underlying assumption that fossiliferous layers were essentially homogeneous across large areas and that one large sample was all you needed to describe a fossil assemblage. From my experience sampling fossils in the field, I was convinced that fossil assemblages were "patchy," meaning that they were spatially variable even at small scales (I had seen repeatedly how two samples from the same fossil horizon just a few feet apart could contain quite different species abundances). My friend Scott Rutherford, who was then at the University of Rhode Island Graduate School of Oceanography, and I wrote computer programs to simulate sampling from virtual fossil assemblages. 
By changing the number and size of our samples and changing our simulated fossil assemblages from homogeneous to spatially patchy, we were able to demonstrate that, for patchy fossil assemblages, replicate samples were critical for obtaining unbiased estimates of species abundances. Furthermore, we were able to show that the number of replicate samples was much more important than the total number of fossils collected for quantifying fossil assemblages and that most paleontologists were collecting too many fossils from too few places. Since the real work of paleontology comes in extracting and identifying individual fossils from a bulk sample of rock or sediment (collecting bulk samples is as easy as digging a hole) the results of our computer simulations provided valuable guidance to our colleagues studying fossil assemblages. To make our conclusions clear, we published this work with the attention-grabbing subtitle "How to get more statistical bang for your sampling buck" (Bennington and Rutherford, 1999).
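The kind of simulation described above can be sketched in a few lines. This is a toy reconstruction, not the original programs: the patch model (each patch has its own proportion of one focal species), the sample sizes, and every function name are assumptions chosen for illustration. It holds the total number of fossils fixed and compares one large sample from a single spot against ten small samples from ten different spots.

```python
import random
import statistics

def make_assemblage(n_patches, patchy, rng):
    """Return the proportion of a focal species in each equal-sized patch."""
    if patchy:
        return [rng.uniform(0.1, 0.9) for _ in range(n_patches)]
    return [0.5] * n_patches  # homogeneous: every patch looks the same

def estimate_proportion(patch_props, n_replicates, total_fossils, rng):
    """Pool n_replicates samples, each drawn from a different random patch."""
    per_sample = total_fossils // n_replicates
    patches = rng.sample(patch_props, n_replicates)
    focal = sum(
        sum(rng.random() < p for _ in range(per_sample))
        for p in patches
    )
    return focal / (per_sample * n_replicates)

def rms_error(patch_props, n_replicates, total_fossils, rng, trials=300):
    """Root-mean-square error of the estimate against the assemblage-wide proportion."""
    truth = statistics.fmean(patch_props)
    squared = [
        (estimate_proportion(patch_props, n_replicates, total_fossils, rng) - truth) ** 2
        for _ in range(trials)
    ]
    return statistics.fmean(squared) ** 0.5

rng = random.Random(42)
assemblages = {
    "patchy": make_assemblage(50, True, rng),
    "homogeneous": make_assemblage(50, False, rng),
}
for name, props in assemblages.items():
    one_big = rms_error(props, 1, 400, rng)
    ten_small = rms_error(props, 10, 400, rng)
    print(f"{name}: 1 x 400 fossils -> {one_big:.3f}, 10 x 40 fossils -> {ten_small:.3f}")
```

In this toy model, the two strategies perform about equally well on the homogeneous assemblage, while on the patchy one the single large sample is dominated by whichever patch it happened to land in, no matter how many fossils it contains.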
Testing Sampling Theory Against the Fossil Record
Computer simulations are useful, but inherently less convincing than examples drawn from real data. So my next project was to follow my own advice and apply what I had learned about sampling to actual fossil assemblages. Fortunately, for several years I had been leading my undergraduate students on field trips to a pair of well-known fossil-collecting sites in central New Jersey, along the banks of Big Brook and Poricy Brook. Here, muddy sands of the Navesink Formation were deposited during the Late Cretaceous Period on the seafloor of the inner continental shelf (now exposed above sea level as the Atlantic Coastal Plain) and include a laterally extensive shell bed composed of extinct oyster, brachiopod, clam, and squid fossils. This layer is densely fossiliferous, and the condition of many of the shells shows that they accumulated over a long period of exposure on the seafloor. This is exactly the kind of fossil assemblage that paleontologists would expect to have been time averaged (laterally homogenized by the accumulation of shells over time) and was thus an ideal fossil layer in which to test my ideas about patchiness and sampling. Wading along the streambanks with a team of Hofstra students, we collected bulk samples of the fossil horizon at a hierarchy of spatial scales, from one meter to several kilometers apart. It took a couple of years to extract, identify, and count all the fossils from all the samples, but finally I had the numbers to analyze and graph. Sure enough, the Navesink shell bed was patchy, despite being time averaged, even at the meter scale (Figure 2). Traditional bulk sampling would have failed to quantify the range of patches, leading to biased estimates of the overall species abundance distribution in the fossil assemblage at each of the two localities (Bennington, 2003). 
This field-based analysis of the structure of fossil assemblages has been widely cited by other paleontologists since it was published, particularly in the research reports of graduate students and newly minted Ph.D.s. I am also gratified to report that the authors of the third edition of the widely used textbook Principles of Paleontology saw fit to devote several pages to a discussion of this work in their chapter on paleoecology. There is no greater satisfaction for a scholar than knowing your work has been of use to others in answering their own research questions.
In fall 2007 and again in 2008, I was invited to the Smithsonian National Museum of Natural History to participate in a pair of workshops designed to bridge the gap between paleoecology and ecology and encourage more research combining data from the paleontological and neontological records. Because biological data usually span less than a decade, while paleontological data may incorporate hundreds to hundreds of thousands of years of time, it is important to understand how these different time scales relate. I was included in these deliberations because of my expertise in quantifying spatial and temporal scales in the fossil record. From these discussions emerged a short manifesto of the critical issues of scale in paleoecology that was published as a guide for future research (Bennington et al., 2009).
Although the intricacies of counting fossils have been the focus of most of my published research over the last two decades, other projects have also benefited from a statistical approach to fossil data. For example, I can now proudly tell my mother that I do study dinosaurs, thanks to the research interests of my former undergraduate student Christa Abatemarco. Taking advantage of an opportunity to examine the remarkable collection of dinosaur tracks preserved at Dinosaur State Park in Connecticut, we decided to measure the best-preserved trackways at the site and apply basic statistics to determine how variable measured parameters such as trackway orientation, footprint length, stride, and digit angle are from trackway to trackway. What we found was that there is a lot of variation from footprint to footprint within a trackway, making it difficult to statistically distinguish many of the trackways we measured (Figure 3). In other words, the 12 trackways we measured could have been made by as few as six individual dinosaurs! We also used the variation within trackway parameters to quantify uncertainties around estimates of speed derived from dinosaur trackways (Abatemarco and Bennington, 2009).
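One way to see how footprint-to-footprint variation feeds into uncertainty about speed is a bootstrap, sketched below. Everything specific here is an assumption rather than the method of the study: the speed relation is Alexander's widely used trackway formula, hip height is taken as four times footprint length (a common rule of thumb), and the measurements are made-up values for a single hypothetical trackway.

```python
import random
import statistics

G = 9.81  # gravitational acceleration, m/s^2

def alexander_speed(stride_m, hip_height_m):
    """Speed in m/s from stride length and hip height (Alexander's formula, assumed here)."""
    return 0.25 * G ** 0.5 * stride_m ** 1.67 * hip_height_m ** -1.17

def bootstrap_speed_interval(strides, foot_lengths, n_boot=2000, seed=0, hip_ratio=4.0):
    """Approximate 95% interval for speed, resampling the measurements with replacement."""
    rng = random.Random(seed)
    speeds = []
    for _ in range(n_boot):
        stride = statistics.fmean(rng.choices(strides, k=len(strides)))
        foot = statistics.fmean(rng.choices(foot_lengths, k=len(foot_lengths)))
        speeds.append(alexander_speed(stride, hip_ratio * foot))
    speeds.sort()
    return speeds[int(0.025 * n_boot)], speeds[int(0.975 * n_boot) - 1]

# Hypothetical measurements (meters) from one trackway.
strides = [2.1, 2.3, 1.9, 2.2, 2.0, 2.4]
foot_lengths = [0.34, 0.36, 0.33, 0.37, 0.35, 0.35]

point = alexander_speed(statistics.fmean(strides), 4.0 * statistics.fmean(foot_lengths))
low, high = bootstrap_speed_interval(strides, foot_lengths)
print(f"speed estimate {point:.2f} m/s, bootstrap interval {low:.2f} to {high:.2f} m/s")
```

The wider the scatter among footprints within a trackway, the wider this interval becomes, which is why a single speed figure quoted without an uncertainty can be misleading.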
More recently, I have been collaborating with Dr. Sylvia Silberger in Hofstra's Department of Mathematics and some high school research students, including Anna Chung, to use fossil bivalves to estimate the size of the sampling domain (the total population of individuals from which the fossil shells in a sample were drawn). The basic idea here is that your odds of collecting both halves of an individual clam are greater if the sampling domain is small. If you can determine that you have, say, 10 matching clam shells out of a sample of 100 shells, then you can estimate the total number of clams from which your 100 shells were drawn. Although somewhat esoteric, this work has potential uses for quantifying how much spatial and temporal mixing a fossil shell bed has experienced (Figure 4). We recently presented some preliminary results from this study (Bennington, Silberger, and Chung, 2010), and the feedback we received from other paleontologists has provided some interesting avenues of inquiry for two new high school research interns.
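The arithmetic behind this idea can be sketched under some strong simplifying assumptions, none of which are claimed to match the method in the cited abstract: every individual contributes exactly two valves, all valves in the sampling domain are equally likely to be collected, and matching valves are always recognized. If N individuals supply 2N valves and n valves are collected, the expected number of matched pairs is n(n - 1) / (2(2N - 1)), and solving for N gives the estimator below. Reading "10 matching shells out of 100" as 5 matched pairs is also an assumption.

```python
import random

def estimate_domain_size(n_valves, n_pairs):
    """Estimate the number of individuals N from matched pairs in a sample of valves.

    Inverts E[pairs] = n(n - 1) / (2(2N - 1)), assuming random sampling of valves.
    """
    if n_pairs <= 0:
        raise ValueError("need at least one matched pair to estimate domain size")
    return (n_valves * (n_valves - 1) / (2 * n_pairs) + 1) / 2

def simulate_pairs(n_individuals, n_valves, rng):
    """Draw n_valves at random from 2 * n_individuals valves and count matched pairs."""
    # Valves numbered 2k and 2k + 1 belong to individual k.
    drawn = rng.sample(range(2 * n_individuals), n_valves)
    owners_seen = set()
    pairs = 0
    for valve in drawn:
        owner = valve // 2
        if owner in owners_seen:
            pairs += 1
        owners_seen.add(owner)
    return pairs

# 100 shells containing 5 matched pairs (10 matching shells).
print(estimate_domain_size(100, 5))  # → 495.5

# Sanity check by simulation: with 500 individuals, samples of 100 valves
# should average close to 5 matched pairs.
rng = random.Random(7)
mean_pairs = sum(simulate_pairs(500, 100, rng) for _ in range(2000)) / 2000
print(f"mean matched pairs in simulation: {mean_pairs:.2f}")
```

Fewer matched pairs in a sample of the same size imply a larger sampling domain, which is the sense in which pair counts could indicate how much mixing a shell bed has experienced.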
Finally, I should also mention that replicate sampling is an important consideration in a new research project being conducted with Dr. E. Christa Farmer of Hofstra's Geology Department under the auspices of the Hofstra University Center for Climate Studies (HUCCS). I am assisting Dr. Farmer in collecting sediment cores from locations across the Great South Bay along the inner barrier island shores of southern Long Island. Our aim is to develop a sedimentary record of the history of deposition in the bay for the purposes of detecting prehistoric hurricane events (powerful hurricanes wash sand from the ocean beaches, over the barrier island, into the bay marshes) and describing how the bay environment has changed over time. During our first field season, we collected four cores from the marsh behind Gilgo Beach in Babylon, New York. We found a significant amount of spatial variability in the sedimentary layering from core to core, demonstrating the potential hazards of trying to adequately describe any location with a single core. Again, replication is proving to be the key to sampling the record of the past.
References
Abatemarco, Christa, and Bennington, J Bret. 2009. A reanalysis of footprints and trackways at the Dinosaur State Park megatracksite using basic statistical methods. Geological Society of America, Abstracts with Programs, 41 (7): 264.
Bambach, R. K., and Bennington, J Bret. 1996. Do communities evolve? A major question in evolutionary paleoecology. In Evolutionary Paleobiology. D. Jablonski, D. H. Erwin, and J. H. Lipps, eds., University of Chicago Press, pp. 123-160.
Bennington, J Bret. 2003. Transcending patchiness in the comparative analysis of paleocommunities: A test case from the Upper Cretaceous of New Jersey. PALAIOS 18: 22-33.
Bennington, J Bret, and Bambach, R. K. 1996. Statistical testing for paleocommunity recurrence: Are similar fossil assemblages ever the same? Palaeogeography, Palaeoclimatology, Palaeoecology 127: 107-134.
Bennington, J Bret, DiMichele, W.A., Badgley, C., Bambach, R.K., Barrett, P.M., Behrensmeyer, A.K., Bobe, R., Burnham, R.J., Daeschler, E.B., Van Dam, J., Eronen, J.T., Erwin, D.H., Finnegan, S., Holland, S.M., Hunt, G., Jablonski, D., Jackson, S.T., Jacobs, B.F., Kidwell, S.M., Koch, P.L., Kowalewski, M.J., Labandeira, C.C., Looy, C.V., Lyons, K., Novack-Gottshall, P.M., Potts, R., Roopnarine, P.D., Stromberg, C.A.E., Sues, H., Wagner, P.J., Wilf, P., and Wing, S.L. 2009. Spotlight: Critical issues of scale in paleoecology. PALAIOS 24: 1-4.
Bennington, J Bret, and Rutherford, S. D. 1999. Precision and reliability in paleocommunity comparisons based on cluster-confidence intervals: How to get more statistical bang for your sampling buck. PALAIOS 14: 506-515.
Bennington, J Bret, Silberger, Sylvia, and Chung, Anna. 2010. Estimating the size of the sampling domain from the number of unique bivalved individuals in a paleontological sample. Geological Society of America, Abstracts with Programs, 42 (5): 139.
Foote, Michael, and Miller, Arnold. 2007. Principles of Paleontology, 3rd Ed., W.H. Freeman and Company, 354 p.