LjungBox

class hyppo.time_series.LjungBox(max_lag=0)

Ljung-Box for Cross Correlation (CorrX) test statistic and p-value.

Parameters

max_lag (int, default: 0) -- The maximum number of lags in the past to check dependence between x and the shifted y. If None, then max_lag=np.ceil(np.log(n)). This is the \(M\) hyperparameter in the Notes below.
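The None fallback described above can be sketched as follows. This is only an illustration of the stated rule, not hyppo's internal code; the helper name default_max_lag is hypothetical, and it uses the standard library's math module in place of NumPy.

```python
import math


def default_max_lag(n):
    """Hypothetical helper: ceil(log(n)) with the natural logarithm,
    mirroring the documented fallback max_lag=np.ceil(np.log(n))."""
    return math.ceil(math.log(n))


# For a series of 100 observations, log(100) is about 4.6, so the lag is 5.
print(default_max_lag(100))
```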

Notes

The statistic can be derived as follows [1]:

Let \(x\) and \(y\) be \((n, 1)\) and \((n, 1)\) series respectively, each containing \(n\) observations of the series \((X_t)\) and \((Y_t)\). Similarly, let \(x[j:n]\) be the \((n-j, 1)\) last \(n-j\) observations of \(x\), and let \(y[0:(n-j)]\) be the \((n-j, 1)\) first \(n-j\) observations of \(y\). Let \(M\) be the maximum lag hyperparameter. The Ljung-Box statistic is

\[\mathrm{Ljung\text{-}Box}_n (x, y) = n(n+2)\sum_{j=1}^M \frac{ \rho^2(x[j:n], y[0:(n-j)])}{n-j}\]

where \(\rho\) is the Pearson correlation coefficient. The p-value returned is calculated either via a chi-squared distribution or using a permutation test.
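As a minimal sketch of the formula above, the statistic can be computed directly from its definition. This is not hyppo's implementation: the function names pearson and ljung_box_statistic are hypothetical, inputs are assumed to be plain one-dimensional sequences of floats, and the code uses only the standard library to stay self-contained.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is 0).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def ljung_box_statistic(x, y, max_lag):
    """n(n+2) * sum_{j=1}^{M} rho^2(x[j:n], y[0:n-j]) / (n-j)."""
    n = len(x)
    total = 0.0
    for j in range(1, max_lag + 1):
        # Last n-j observations of x against the first n-j observations of y.
        rho = pearson(x[j:], y[: n - j])
        total += rho ** 2 / (n - j)
    return n * (n + 2) * total


# With x = y = 0..4 and M = 1, the lagged pair is perfectly correlated,
# so the statistic is 5 * 7 * (1 / 4) = 8.75.
print(ljung_box_statistic([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], max_lag=1))
```

Each lag contributes a squared correlation weighted by \(1/(n-j)\), so larger lags, which are estimated from fewer pairs, receive slightly more weight per unit of squared correlation.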

References

[1] Ronak Mehta, Jaewon Chung, Cencheng Shen, Ting Xu, and Joshua T. Vogelstein. Independence Testing for Multivariate Time Series. arXiv:1908.06486 [cs, stat], May 2020.

Methods Summary

LjungBox.statistic(x, y)

Helper function that calculates the Ljung-Box cross correlation test statistic.

LjungBox.test(x, y[, reps, workers, auto, ...])

Calculates the time-series test statistic and p-value.


LjungBox.statistic(x, y)

Helper function that calculates the Ljung-Box cross correlation test statistic.

Parameters

x,y (ndarray of float) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, 1) and (n, 1) where n is the number of samples.

Returns

  • stat (float) -- The computed Ljung-Box statistic.

  • opt_lag (int) -- The computed optimal lag.

LjungBox.test(x, y, reps=1000, workers=1, auto=True, random_state=None)

Calculates the time-series test statistic and p-value.

Parameters
  • x,y (ndarray of float) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).

  • reps (int, default: 1000) -- The number of replications used to estimate the null distribution when the p-value is calculated with the permutation test.

  • workers (int, default: 1) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.

  • is_distsim (bool, default: False) -- Whether or not x and y are input matrices.

Returns

  • stat (float) -- The computed time-series independence test statistic.

  • pvalue (float) -- The computed time-series independence p-value.

  • null_dist (list) -- The null distribution of the test statistic.
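To illustrate how reps, the statistic, the p-value, and the null distribution relate, here is a rough permutation-test sketch. It is an assumption-laden illustration rather than hyppo's procedure: it uses a plain shuffle of y, which ignores serial dependence, and hyppo's actual resampling scheme may differ. The names ljung_box_statistic and permutation_pvalue are hypothetical, and the p-value uses the common (1 + exceedances) / (1 + reps) convention.

```python
import math
import random


def ljung_box_statistic(x, y, max_lag):
    """Statistic from the Notes, for plain 1-D sequences (hypothetical helper)."""
    n = len(x)
    total = 0.0
    for j in range(1, max_lag + 1):
        a, b = x[j:], y[: n - j]
        mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
        cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
        var_a = sum((u - mean_a) ** 2 for u in a)
        var_b = sum((v - mean_b) ** 2 for v in b)
        total += (cov * cov / (var_a * var_b)) / (n - j)
    return n * (n + 2) * total


def permutation_pvalue(x, y, max_lag, reps=1000, seed=None):
    """Return (stat, pvalue, null_dist) using a naive shuffle of y."""
    rng = random.Random(seed)
    stat = ljung_box_statistic(x, y, max_lag)
    y_perm = list(y)
    null_dist = []
    for _ in range(reps):
        rng.shuffle(y_perm)  # assumption: simple shuffle, not a block scheme
        null_dist.append(ljung_box_statistic(x, y_perm, max_lag))
    exceed = sum(s >= stat for s in null_dist)
    pvalue = (1 + exceed) / (1 + reps)
    return stat, pvalue, null_dist


# Two smooth, phase-shifted series as toy input.
x = [math.sin(0.3 * t) for t in range(30)]
y = [math.cos(0.3 * t) for t in range(30)]
stat, pvalue, null_dist = permutation_pvalue(x, y, max_lag=2, reps=200, seed=0)
print(stat, pvalue, len(null_dist))
```

The returned list has one entry per replication, which is why a larger reps gives a finer-grained p-value at a proportionally higher cost.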