Graeme T. Lloyd - Contributions, Data, Methods & Library

Generalized differencing of time series

Purpose

To difference time series data prior to correlation.

Data required

Technically only a single time series (e.g. species richness, number of geologic formations etc.) plus the date (in millions of years) for each value, although in practice you will want at least two time series so that they can be correlated after differencing.

Description

Much palaeontological data can ultimately be expressed or plotted as a time series, i.e. some measure on the y-axis and time on the x-axis. We may further be interested in knowing whether any two time series co-vary, or more specifically whether they are correlated. However, it isn't safe to simply correlate the raw time series, as there is a strong danger of false positives (type I errors). This is because time series often show some form of overall trend (rising or falling), and so simple correlation will suggest a positive or negative relationship between two series purely because both trend through time. This can lead to some spurious conclusions, my favourite example of which is the trends in global temperature and number of pirates.

Instead, workers have employed some form of differencing - i.e. instead of looking at values within time bins, they use the differences between time bins. The simplest form of this is first differencing, which takes the absolute change from bin to bin as a new time series and uses this for correlation instead (use the 'diff' function in R to do this). However, first differencing can go too far the other way, as any signal of a real trend in the data can also be removed. A compromise between these two extremes is the generalized differencing approach of McKinney (1990). This approach explicitly incorporates both any trend in the data (using a linear model) and the differences from it. It has already found favour in the literature and is even used by John Alroy (see his ten commandments!).
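To make the false-positive problem concrete, here is a small simulated illustration. It is only a sketch: the two series are random numbers invented for the purpose (not real data), and the names `bin`, `a`, `b`, `raw_r` and `diff_r` are arbitrary. The two series are generated independently and share nothing except an upward trend, yet the raw correlation comes out strong, whereas the correlation of their first differences (via `diff`) should be much weaker.

```r
set.seed(1)                        # fixed seed so the run is repeatable
n   <- 100
bin <- seq_len(n)                  # stand-in for successive time bins

# Two independently generated series that both rise through time
a <- 0.5 * bin + rnorm(n)
b <- 0.3 * bin + rnorm(n)

raw_r  <- cor(a, b)                # dominated by the shared trend
diff_r <- cor(diff(a), diff(b))    # bin-to-bin changes only

print(c(raw = raw_r, first_diff = diff_r))
```

Note that `diff` returns one value fewer than it is given, because it works on the changes between successive bins.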

Implementation

1. Install and run R (it's free!).

2. Copy and paste the following line into R and hit enter (this will load the necessary function):

    source("http://www.graemetlloyd.com/pubdata/functions_2.r")

3. Enter your time series (here I re-use the diversity and proxy dinosaur data from Barrett et al. 2009):

    div<-c(0,0,0,0,0,1,1,1,2,9,
    9,9,5,5,5,11,11,11,10,10,
    10,12,12,12,8,9,8,5,5,5,
    7,7,8,12,12,15,15,15,15,9,
    9,9,37,37,38,31,30,30,3,3,
    3,5,5,5,4,4,5,7,6,6,
    7,9,10,14,12,13,8,6,7,5,
    2,2,3,3,3,4,4,4,9,11,
    10,13,13,14)

    proxy<-c(1,1,1,2,2,3,11,9,10,15,
    13,15,7,7,7,44,39,40,36,37,
    39,27,27,27,23,23,22,15,14,15,
    19,19,22,32,32,32,26,26,26,46,
    54,61,60,54,66,65,61,70,85,83,
    86,88,85,85,93,93,98,90,82,91,
    103,98,113,119,113,120,113,101,102,80,
    80,80,85,84,87,85,81,91,134,115,
    150,159,133,175)

4. Enter the time value (in Ma) for each bin and store it in a variable named "time" (here again, I use the time data of Barrett et al. 2009):

    time<-c(243.67,241.00,238.33,235.50,232.50,229.50,226.08,222.25,218.42,214.35,
    210.05,205.75,202.93,201.60,200.27,199.08,198.05,197.02,195.35,193.05,
    190.75,188.50,186.30,184.10,181.77,179.30,176.83,174.93,173.60,172.27,
    170.95,169.65,168.35,167.20,166.20,165.20,164.12,162.95,161.78,160.28,
    158.45,156.62,154.88,153.25,151.62,149.92,148.15,146.38,144.62,142.85,
    141.08,139.57,138.30,137.03,135.33,133.20,131.07,129.17,127.50,125.83,
    122.83,118.50,114.17,109.93,105.80,101.67,98.58,96.55,94.52,92.80,
    91.40,90.00,88.72,87.55,86.38,85.42,84.65,83.88,81.35,77.05,
    72.75,69.75,68.05,66.35)

5. We can now run the function, storing the results in two new variables called "gd.div" and "gd.proxy":

    gd.div<-gen.diff(div,time)

    gd.proxy<-gen.diff(proxy,time)

6. These can now be correlated in the normal way, for example:

    cor.test(gd.div,gd.proxy,method="spearman")

7. Compare this with the same correlation of the raw data:

    cor.test(div,proxy,method="spearman")
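For readers curious about what a generalized differencing calculation might look like, the following is a rough, independent sketch based only on the description above: remove a linear trend, then subtract from each detrended value a fraction of the previous one, with that fraction estimated by regressing each detrended value on its predecessor. It is not the code of the "gen.diff" function loaded in step 2, which may differ in its details. The name `gen_diff_sketch` and the toy data are hypothetical and used purely for illustration.

```r
# Hypothetical helper: NOT the gen.diff() loaded from functions_2.r
gen_diff_sketch <- function(x, time) {
  stopifnot(length(x) == length(time), length(x) >= 3)

  # Step 1: remove the overall linear trend through time
  detrended <- residuals(lm(x ~ time))

  # Step 2: pair each detrended value with the one before it
  n    <- length(detrended)
  prev <- detrended[-n]
  curr <- detrended[-1]

  # Step 3: slope of current on previous value (lag-1 dependence)
  rho <- unname(coef(lm(curr ~ prev))[2])

  # Step 4: subtract the part of each value predicted by its predecessor
  unname(curr - rho * prev)
}

# Made-up toy data (not from Barrett et al. 2009), oldest bin first
toy_time <- c(100, 90, 80, 70, 60, 50, 40, 30)
toy_x    <- c(2, 3, 3, 6, 5, 9, 8, 12)

gd <- gen_diff_sketch(toy_x, toy_time)
length(gd)   # 7: one value fewer than the input
```

If the estimated slope were exactly 1, step 4 would reduce to first differencing of the detrended series; if it were 0, the result would simply be the detrended values. That is the sense in which the approach sits between the two extremes.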

References cited

Barrett, P. M., McGowan, A. J. and Page, V. 2009. Dinosaur diversity and the rock record. Proceedings of the Royal Society, London B, 276, 2667-2674.

McKinney, M. L. 1990. Classifying and analyzing evolutionary trends. In McNamara, K. J. (ed.) Evolutionary Trends. Belhaven, London, pp. 28-58.


Last updated 7th November 2008.