An Introduction to the RecordTest Package

Introduction

The RecordTest package (Inference Tools in Time Series Based on Record Statistics) contains functions to visualize the behaviour of the record occurrence, functions to calculate a wide variety of distribution-free tests for trend in location, variation and non-stationarity, for change-point detection, and tools to prepare a time series in order to study its records.

Install RecordTest using

install.packages("RecordTest")

The introductory theory and summary for the package is at

help("RecordTest-package")

Here, the main purpose of the package is developed as well as an outline of the functions available in the package.

RecordTest has several functions that attempt to test the classical record model which assumes randomness in its variables, that is, they are continuous independent and identically distributed, by means of hypothesis tests and graphical tools.

Records in the 200-meter Olympic race

Sports Data

To begin with, RecordTest has a dataset Olympic_records_200m containing the record times time and record values value of the Olympic 200-meter, from 1900 to 2020. In this case, only the lower records are available.

Data preparation

The RecordTest functions need a complete series of observations to calculate its records. In order to apply these tools to the series of Olympic records, the function series_record is applied, which generates a series with the same records.

library(RecordTest)
library(ggpubr) # To join plots
data(Olympic_records_200m, package = "RecordTest")

or200m <- series_record(L_lower = Olympic_records_200m$time, 
                        R_lower = Olympic_records_200m$value,
                        Trows = 27)

Some tests and graphical tools

As a preview, the Olympic records series is drawn highlighting its lower records.

records(or200m, type = "points", alpha = c(1,0,1,0)) + ggplot2::ylab("seconds")

The graph below shows the number of accumulated lower records together with its confidence intervals under the null hypothesis, from which we see that the observed sample departs significantly since time t = 13, corresponding to the 1960 Olympics.

N.plot(or200m, record = c(0,1,0,0))

Due to the sample size is not very large, an exact one-sided test can be implemented based on the Poisson binomial distribution. The result is highly significant. The number of observed records is 12 while the expected under the null hypothesis is close to 4.

N.test(or200m, record = "lower", distribution = "poisson-binomial")
#> 
#>  Test on the number of lower records
#> 
#> data:  or200m
#> Poisson-binomial = 12, p-value = 1.5e-05
#> alternative hypothesis: true 'Poisson-binomial' is greater than 3.8915

Records in temperatures and global warming

Temperature Data

RecordTest has a benchmark temperature dataset TX_Zaragoza containing the time series TX of daily maximum temperature at Zaragoza (Spain), from 01/01/1951 to 31/12/2020 measured in tenths of a degree Celsius. In this case, the whole series is available.

Data preparation

As a preview, the temperature series TX_Zaragoza$TX is drawn highlighting its upper and lower records.

data(TX_Zaragoza, package = "RecordTest")
records(TX_Zaragoza$TX, alpha = c(1, 1, 1, 0.1))

A large number of upper records are observed in the first observations and very few lower records. The initial behaviour in the records appear due to the seasonal increment of temperature between January and July, because this series has a strong seasonal component and serial correlation.

To take these characteristics into consideration, splitting the observed series into M uncorrelated subseries is especially useful in the presence of serial correlation and seasonality. series_split splits TX_Zaragoza$TX into 365 subseries, each corresponding to each day of the year. series_uncor selects the larger number of columns or subseries that are not correlated with their adjacent columns.

TxZ365 <- series_split(TX_Zaragoza$TX, Mcols = 365)
TxZ <- series_uncor(TxZ365)
dim(TxZ)
#> [1] 70 76

Since the observed series is measured and rounded to tenths of a degree Celsius, ties can occur, which may cause a lower number of records to be identified than actually occurred. In particular, we observe that around 4% of the records are weak records.

series_ties(TxZ365)
#> $number
#>              Total             Strong               Weak Expected under IID 
#>               1969               1889                 80               1764 
#> 
#> $percentage
#> [1] 4.063
#> 
#> $percentage.position
#>       1       2       3       4       5       6       7       8       9      10 
#>  0.0000  2.4631  2.4590  0.0000  4.2105  6.4516  3.0769  1.6393  0.0000  3.4483 
#>      11      12      13      14      15      16      17      18      19      20 
#> 10.2564  3.7037  6.2500  2.4390 13.3333  0.0000  6.2500  9.0909 11.1111  0.0000 
#>      21      22      23      24      25      26      27      28      29      30 
#> 42.8571 22.2222  0.0000  0.0000  0.0000 16.6667  0.0000  5.5556 11.1111  8.3333 
#>      31      32      33      34      35      36      37      38      39      40 
#>  4.7619  7.1429 20.0000 18.1818 12.5000 14.2857  0.0000 10.0000  7.1429 10.5263 
#>      41      42      43      44      45      46      47      48      49      50 
#> 12.5000  6.2500  0.0000  8.3333  9.5238  0.0000  0.0000  0.0000  0.0000 25.0000 
#>      51      52      53      54      55      56      57      58      59      60 
#> 14.2857  0.0000  6.6667 16.6667  0.0000  0.0000  0.0000 20.0000  0.0000  0.0000 
#>      61      62      63      64      65      66      67      68      69      70 
#> 25.0000  3.8462  0.0000  5.5556  8.3333  0.0000  7.6923  0.0000  0.0000  0.0000

We could untie the possible records by adding a random value from a Uniform and independent distribution for each observation.

set.seed(23)
TxZ <- series_untie(TxZ)

Tests to detect change-points

Signs of climate change have not been present since the beginning of the series. We can implement a test to detect change-points in the series on a daily scale as follows.

change.point(ZaragozaSeries)
#> 
#>  Records test for single changepoint detection
#> 
#> data:  ZaragozaSeries
#> Kolmogorov = 2.08, p-value = 0.00035
#> alternative hypothesis: two.sided
#> sample estimates:
#> probable changepoint time 
#>                        36
change.point(ZaragozaSeries, weights = function(t) sqrt(t), 
  record = "d", simulate.p.value = TRUE, B = 10000)
#> 
#>  Records test for single changepoint detection with weights = sqrt(t)
#>  with simulated p-value (based on 10000 replicates)
#> 
#> data:  ZaragozaSeries
#> Kolmogorov = 2.35, p-value = 1e-04
#> alternative hypothesis: two.sided
#> sample estimates:
#> probable changepoint time 
#>                        36

The change-point is found at time 36 (1986). We can see analogous results for the annual mean temperature series, a change-point is significantly detected with time estimate 38 (1988).

test.result <- change.point(rowMeans(TxZ365, na.rm = TRUE))
test.result
#> 
#>  Records test for single changepoint detection
#> 
#> data:  rowMeans(TxZ365, na.rm = TRUE)
#> Kolmogorov = 3.74, p-value = 1.4e-12
#> alternative hypothesis: two.sided
#> sample estimates:
#> probable changepoint time 
#>                        38
records(rowMeans(TxZ365, na.rm = TRUE)) + 
  ggplot2::geom_vline(xintercept = test.result$estimate, colour = "red")

There are still more tools! Try them yourself.