
Removing sampling signal from diversity curves


To remove the sampling signal from a diversity curve.

Data required

Time series of diversity (taxonomic richness) and of a sampling proxy (e.g. number of formations, map area), together with a date for each time bin.


Although reconstructing estimates of past diversity is a central goal of palaeobiology, raw data can be misleading because of sampling and other biases. Raup (1972) made an early tabulation of such biases and set out two major ways in which they might be corrected. The first was subsampling, which has been explored thoroughly by John Alroy (Alroy et al. 2001, 2008; Alroy 2010a,b,c). The second was modelling, which has received comparatively little attention. A key paper, then, was that of Smith and McGowan (2007), who introduced a modelling technique that assumes true diversity is constant, so that observed diversity is driven purely by sampling. When this modelled estimate is subtracted from the observed diversity, the residuals can be thought of as a sampling-free diversity curve. Barrett et al. (2009) added a rudimentary significance test to the Smith and McGowan (2007) approach by treating the standard deviation of the residuals as a kind of confidence interval. However, this is flawed because it does not take into account how well or poorly the model fits.

Most recently, Lloyd (2012) overhauled the Smith and McGowan method by allowing non-linear relationships between the sampling proxy and diversity, implementing a more informative significance test and fitting a hinged regression to the resulting time series to elucidate medium-term diversity trends. This refined method has been applied to dinosaurs (Lloyd 2012) and coccolithophores (Lloyd et al. 2011). Here, a function that automates the Lloyd (2012) approach has been produced for use in the freely available statistical programming language R.
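In symbols, the modelling idea can be summarised as below. The notation is introduced here purely for illustration: D_i is observed richness and P_i the sampling proxy in time bin i. The simplest choice of f is a straight line, f(P) = a + bP; Lloyd (2012) also allows non-linear forms.

```latex
\begin{aligned}
D_i &= f(P_i) + \varepsilon_i && \text{(true diversity constant: sampling alone drives } D_i\text{)}\\
\hat{D}_i &= \hat{f}(P_i) && \text{(model-predicted richness)}\\
r_i &= D_i - \hat{D}_i && \text{(residual, read as the sampling-free signal)}
\end{aligned}
```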


1. Install and run R (it's free!).

2. Install the earth, nlme, paleoTS and plotrix packages from within R, making sure to include any dependencies (e.g. by setting dependencies = TRUE in install.packages()).

3. Copy and paste the following line into R and hit enter:


4. Enter your diversity data in order and store it as a variable named "div" (here, as an example, I use the sauropodomorph data of Barrett et al. 2009):


5. Enter your sampling proxy data in the same way and store it in a variable named "proxy" (here again, I use the dinosaur-bearing formation data of Barrett et al. 2009):


6. Finally, enter a time value (in Ma) for each bin and store it in a variable named "time" (here again, I use the time data of Barrett et al. 2009):
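The original example values are not reproduced here. As a hedged stand-in, the three inputs can be entered as plain numeric vectors with c(). The numbers below are made up for illustration (they are not the Barrett et al. 2009 data), and the bins are assumed to run from oldest to youngest.

```r
# Made-up illustrative values, NOT the Barrett et al. (2009) data.
# Bins are assumed to run from oldest to youngest.
div   <- c(5, 9, 7, 12, 15, 11, 8, 14, 10, 6)                   # taxonomic richness per bin
proxy <- c(4, 8, 6, 10, 14, 11, 7, 12, 9, 5)                    # sampling proxy, e.g. formations
time  <- c(225, 215, 205, 195, 185, 175, 165, 155, 145, 135)    # bin dates (Ma)

# All three series must describe the same bins in the same order
stopifnot(length(div) == length(proxy), length(proxy) == length(time))
# Oldest-first ordering means the dates should strictly decrease
stopifnot(all(diff(time) < 0))
```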


7. We can now run the function, storing the results in a new variable called "results":


8. The function returns five time series that can be accessed using '$' followed by the series name. For example, to get the predicted values we can write:


This is the model-predicted value, based on the assumption that true diversity is constant and that observed diversity is driven purely by the sampling proxy. The remaining four series are the upper and lower bounds of 95% confidence intervals based on the standard error:


and standard deviation (recommended):
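The function's own output lines are not reproduced here. As a rough, hypothetical illustration of the kind of output described in steps 7 and 8, the base-R sketch below fits only the simplest case, an ordinary least-squares line of diversity on the proxy, and collects five series in a list that is read with '$'. The element names (predicted, se.lower and so on) are invented for this sketch, and treating the SD-based band as a regression prediction interval is an assumption; the original function's model choice and interval calculations may differ.

```r
# Made-up illustrative inputs (not the Barrett et al. 2009 data)
div   <- c(5, 9, 7, 12, 15, 11, 8, 14, 10, 6)
proxy <- c(4, 8, 6, 10, 14, 11, 7, 12, 9, 5)

# Constant-diversity model: richness explained by the sampling proxy alone.
# Only a straight-line fit is sketched here (an assumption).
fit <- lm(div ~ proxy)
nd  <- data.frame(proxy = proxy)

# 95% interval for the fitted mean, built from the standard error of the fit
se.band <- predict(fit, newdata = nd, interval = "confidence", level = 0.95)
# 95% prediction interval, which also includes the residual standard
# deviation; used here as an assumed stand-in for the SD-based band
sd.band <- predict(fit, newdata = nd, interval = "prediction", level = 0.95)

# Hypothetical element names, read with '$' as in step 8
results <- list(
  predicted = unname(se.band[, "fit"]),
  se.lower  = unname(se.band[, "lwr"]),
  se.upper  = unname(se.band[, "upr"]),
  sd.lower  = unname(sd.band[, "lwr"]),
  sd.upper  = unname(sd.band[, "upr"])
)

results$predicted   # model-predicted richness for each bin
results$sd.upper    # upper bound of the SD-based band
```

Because the prediction interval adds residual variance to the standard error of the fit, it is always wider than the SE-based band, which is one reason a wider band is the more cautious yardstick.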


There is also information on the favoured model (y being diversity and x the sampling/rock proxy):
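The original function's model-selection rules are not given here. One common way to pick a "favoured" model, sketched below as an assumption, is to fit several candidate forms of the diversity-proxy relationship and keep the one with the lowest Akaike Information Criterion (AIC). The log and quadratic candidates are illustrative stand-ins for non-linear fits; the earth package listed in step 2 implements multivariate adaptive regression splines (MARS), which this base-R sketch does not use.

```r
# Made-up illustrative inputs (not the Barrett et al. 2009 data)
div   <- c(5, 9, 7, 12, 15, 11, 8, 14, 10, 6)
proxy <- c(4, 8, 6, 10, 14, 11, 7, 12, 9, 5)

# Candidate forms for diversity (y) as a function of the proxy (x).
# These three are illustrative assumptions, not the original candidate set.
candidates <- list(
  Linear    = lm(div ~ proxy),
  Log       = lm(div ~ log(proxy)),
  Quadratic = lm(div ~ proxy + I(proxy^2))
)

# AIC trades goodness of fit against the number of parameters; lower is better
aic <- sapply(candidates, AIC)
best.model <- names(which.min(aic))

print(round(aic, 2))
print(best.model)
```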


9. It is, however, more informative to plot the data. This can be done by copying and pasting the following:


This creates a graph that shows the modelled estimate of diversity in grey (first line), adds a dashed grey line for the standard error 95% confidence interval (second and third lines), adds a dash-dot grey line for the standard deviation 95% confidence interval (fourth and fifth lines) and plots the actual data in black (last line).
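For readers who want to see what such a plot involves, the sketch below draws the same layers in base R, using made-up data and a simple linear fit as stand-ins (neither is the original example or the function's output). It writes to a PDF in a temporary directory so it also runs without a screen. Reversing the time axis so that older bins sit on the left is a presentational choice made here, not something stated above.

```r
# Made-up illustrative inputs (not the Barrett et al. 2009 data)
div   <- c(5, 9, 7, 12, 15, 11, 8, 14, 10, 6)
proxy <- c(4, 8, 6, 10, 14, 11, 7, 12, 9, 5)
time  <- c(225, 215, 205, 195, 185, 175, 165, 155, 145, 135)

# Linear stand-in for the modelled estimate and its two 95% bands
fit <- lm(div ~ proxy)
nd  <- data.frame(proxy = proxy)
se.band <- predict(fit, newdata = nd, interval = "confidence")
sd.band <- predict(fit, newdata = nd, interval = "prediction")

# Write to a PDF in a temporary directory so the sketch runs without a screen
outfile <- file.path(tempdir(), "diversity_model.pdf")
pdf(outfile, width = 8, height = 5)

# Modelled estimate in grey; xlim is reversed so older bins plot on the left
plot(time, se.band[, "fit"], type = "l", col = "grey",
     xlim = rev(range(time)), ylim = range(sd.band, div),
     xlab = "Time (Ma)", ylab = "Taxonomic richness")
lines(time, se.band[, "lwr"], col = "grey", lty = 2)  # SE-based band, dashed
lines(time, se.band[, "upr"], col = "grey", lty = 2)
lines(time, sd.band[, "lwr"], col = "grey", lty = 4)  # SD-based band, dash-dot
lines(time, sd.band[, "upr"], col = "grey", lty = 4)
lines(time, div, col = "black")                       # observed richness
invisible(dev.off())
```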

10. A better way to visualise things, however, is to look at the model-detrended richness. This can be done by copying and pasting the following:


This plots the model-detrended diversity as a grey polygon (first to fourth lines), adds a dashed line for the standard error 95% confidence interval (fifth and sixth lines), adds a dash-dot line for the standard deviation 95% confidence interval (seventh and eighth lines) and, if it is favoured, fits a hinged regression to the model-detrended time series (ninth to thirteenth lines).
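A comparable base-R sketch of the detrended plot follows, again with made-up data and a linear stand-in model. The hinged regression here is a hypothetical substitute for the original fit: a continuous two-segment line in time whose hinge is chosen by least squares over interior bin dates, kept only if its AIC (plus 2 for the estimated hinge position, an assumed penalty) beats that of a straight line. The helper name hinge.fit is invented for this example.

```r
# Made-up illustrative inputs (not the Barrett et al. 2009 data)
div   <- c(5, 9, 7, 12, 15, 11, 8, 14, 10, 6)
proxy <- c(4, 8, 6, 10, 14, 11, 7, 12, 9, 5)
time  <- c(225, 215, 205, 195, 185, 175, 165, 155, 145, 135)

# Linear stand-in model; detrended richness = observed minus predicted
fit <- lm(div ~ proxy)
nd  <- data.frame(proxy = proxy)
detrended <- div - unname(fitted(fit))

# Centre both 95% bands on zero so they sit on the detrended scale
se.band <- predict(fit, newdata = nd, interval = "confidence")
sd.band <- predict(fit, newdata = nd, interval = "prediction")
se.off <- se.band[, c("lwr", "upr")] - se.band[, "fit"]
sd.off <- sd.band[, c("lwr", "upr")] - sd.band[, "fit"]

# Hypothetical helper: continuous two-segment line whose hinge is picked by
# least squares from interior dates, leaving at least two bins on either
# side (needs five or more bins)
hinge.fit <- function(y, x) {
  hinges <- sort(x)[3:(length(x) - 2)]
  fits <- lapply(hinges, function(h) lm(y ~ x + pmax(x - h, 0)))
  rss  <- sapply(fits, deviance)  # residual sum of squares of each fit
  best <- which.min(rss)
  list(hinge = hinges[best], model = fits[[best]])
}

lin <- lm(detrended ~ time)
hng <- hinge.fit(detrended, time)
# Keep the hinge only if it beats a straight line on AIC, adding 2 for the
# estimated hinge position (an assumed penalty)
use.hinge <- AIC(hng$model) + 2 < AIC(lin)

outfile <- file.path(tempdir(), "detrended.pdf")
pdf(outfile, width = 8, height = 5)
plot(time, detrended, type = "n", xlim = rev(range(time)),
     ylim = range(detrended, sd.off),
     xlab = "Time (Ma)", ylab = "Model Detrended Taxonomic Richness")
# Grey polygon between the detrended series and zero
polygon(c(time, rev(time)), c(detrended, rep(0, length(time))),
        col = "grey", border = NA)
matlines(time, se.off, col = "black", lty = 2)  # SE-based band, dashed
matlines(time, sd.off, col = "black", lty = 4)  # SD-based band, dash-dot
if (use.hinge) lines(time, fitted(hng$model), lwd = 2)  # hinged trend, if favoured
invisible(dev.off())
```

A grid search over observed dates keeps the sketch short and dependency-free; a finer search or a dedicated piecewise-regression tool would locate the hinge more precisely.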

References cited

Alroy, J. 2010a. The shifting balance of diversity among major marine animal groups. Science, 329, 1191-1194.

Alroy, J. 2010b. Fair sampling of taxonomic richness and unbiased estimation of origination and extinction rates. In: Alroy, J. and Hunt, G. (eds.) Quantitative Methods in Paleobiology. The Paleontological Society, New Haven, USA, 55-80.

Alroy, J. 2010c. Geographical, environmental and intrinsic biotic controls on Phanerozoic marine diversification. Palaeontology, 53, 1211-1235.

Alroy, J., Marshall, C. R., Bambach, R. K., Bezusko, K., Foote, M., Fursich, F. T., Hansen, T. A., Holland, S. M., Ivany, L. C., Jablonski, D., Jacobs, D. K., Jones, D. C., Kosnik, M. A., Lidgard, S., Low, S., Miller, A. I., Novack-Gottshall, P. M., Olszewski, T. D., Patzkowsky, M. E., Raup, D. M., Roy, K., Sepkoski, J. J., Sommers, M. G., Wagner, P. J. and Webber, A. 2001. Effects of sampling standardization on estimates of Phanerozoic marine diversification. Proceedings of the National Academy of Sciences of the United States of America, 98, 6261-6266.

Alroy, J., Aberhan, M., Bottjer, D. J., Foote, M., Fursich, F. T., Harries, P. J., Hendy, A. J. W., Holland, S. M., Ivany, L. C., Kiessling, W., Kosnik, M. A., Marshall, C. R., McGowan, A. J., Miller, A. I., Olszewski, T. D., Patzkowsky, M. E., Peters, S. E., Villier, L., Wagner, P. J., Bonuso, N., Borkow, P. S., Brenneis, B., Clapham, M. E., Fall, L. M., Ferguson, C. A., Hanson, V. L., Krug, A. Z., Layou, K. M., Leckey, E. H., Nurnberg, S., Powers, C. M., Sessa, J. A., Simpson, C., Tomasovych, A. and Visaggi, C. C. 2008. Phanerozoic trends in the global diversity of marine invertebrates. Science, 321, 97-100.

Barrett, P. M., McGowan, A. J. and Page, V. 2009. Dinosaur diversity and the rock record. Proceedings of the Royal Society, London B, 276, 2667-2674.

Lloyd, G. T. 2012. A refined modelling approach to assess the influence of sampling on palaeobiodiversity curves: new support for declining Cretaceous dinosaur richness. Biology Letters, 8, 123-126.

Lloyd, G. T., Smith, A. B. and Young, J. R. 2011. Quantifying the deep-sea rock and fossil record bias using coccolithophores. In: McGowan, A. J. and Smith, A. B. (eds.) Comparing the Rock and Fossil Records: Implications for Biodiversity. Geological Society Special Publication 358, 167-177.

Raup, D. M. 1972. Taxonomic diversity during the Phanerozoic. Science, 177, 1065-1071.

Smith, A. B. and McGowan, A. J. 2007. The shape of the Phanerozoic diversity curve. How much can be predicted from the sedimentary rock record of Western Europe? Palaeontology, 50, 765-777.

Last updated 7th November 2008.