DcorrX
class hyppo.time_series.DcorrX(compute_distance='euclidean', max_lag=0, **kwargs)
Cross Distance Correlation (DcorrX) test statistic and p-value.
DcorrX is an independence test between two (paired) time series of not necessarily equal dimensions. The population parameter is 0 if and only if the time series are independent. It is based upon energy distance between distributions.
- Parameters
compute_distance (str, callable, or None, default: "euclidean") -- A function that computes the distance among the samples within each data matrix. Valid strings for compute_distance are, as defined in sklearn.metrics.pairwise_distances:
From scikit-learn: ["euclidean", "cityblock", "cosine", "l1", "l2", "manhattan"]
From scipy.spatial.distance: ["braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard", "kulsinski", "mahalanobis", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule"]
See the documentation for scipy.spatial.distance for details on these metrics.
Set to None or "precomputed" if x and y are already distance matrices. To call a custom function, either create the distance matrix beforehand or create a function of the form metric(x, **kwargs), where x is the data matrix for which pairwise distances are calculated and **kwargs are extra arguments to send to your custom function.
max_lag (int, default: 0) -- The maximum number of lags in the past to check dependence between x and the shifted y. If None, then max_lag=np.ceil(np.log(n)). This is also the M hyperparameter below.
**kwargs -- Arbitrary keyword arguments for compute_distance.
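As an illustration of the metric(x, **kwargs) form described above, here is a minimal sketch of a custom distance function. The name chebyshev_metric is hypothetical and chosen only for this example; the code uses NumPy alone and does not call hyppo.

```python
import numpy as np


def chebyshev_metric(x, **kwargs):
    """Hypothetical custom metric: pairwise Chebyshev (max-coordinate) distances.

    Takes an (n, p) data matrix and returns an (n, n) distance matrix.
    Extra keyword arguments are accepted but unused in this sketch.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D series as n samples of dimension 1
    diff = np.abs(x[:, None, :] - x[None, :, :])  # shape (n, n, p)
    return diff.max(axis=-1)


x = np.array([[0.0, 0.0], [1.0, 3.0], [4.0, 1.0]])
distx = chebyshev_metric(x)

# The default used when max_lag is None, as stated in the parameter description.
default_max_lag = int(np.ceil(np.log(x.shape[0])))
```

Following the class signature, such a function would presumably be passed as DcorrX(compute_distance=chebyshev_metric); alternatively, the matrix it returns could be supplied directly with compute_distance=None.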
Notes
The statistic can be derived as follows [1]:
Let \(x\) and \(y\) be \((n, p)\) and \((n, q)\) series respectively, which each contain \(n\) observations of the series \((X_t)\) and \((Y_t)\). Let \(x[j:n]\) be the \((n-j, p)\) matrix of the last \(n-j\) observations of \(x\), and let \(y[0:(n-j)]\) be the \((n-j, q)\) matrix of the first \(n-j\) observations of \(y\). Let \(M\) be the maximum lag hyperparameter. The cross distance correlation is
\[\mathrm{DcorrX}_n (x, y) = \sum_{j=0}^M \frac{n-j}{n} \mathrm{Dcorr}_n (x[j:n], y[0:(n-j)])\]
The p-value returned is calculated using a permutation test.
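The weighted sum over lags can be sketched in plain NumPy. This is an illustrative sketch, not hyppo's implementation: the helper names are hypothetical, Euclidean distance is assumed, and \(\mathrm{Dcorr}_n\) is stood in for by the biased (V-statistic) squared sample distance correlation, whereas the library may apply a bias correction.

```python
import numpy as np


def _pairwise_euclidean(a):
    """Return the (n, n) Euclidean distance matrix of an (n,) or (n, p) array."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    diff = a[:, None, :] - a[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def _double_center(d):
    """Subtract row and column means and add back the grand mean."""
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcorr_biased(x, y):
    """Biased squared sample distance correlation (assumed stand-in for Dcorr_n)."""
    a = _double_center(_pairwise_euclidean(x))
    b = _double_center(_pairwise_euclidean(y))
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else (a * b).mean() / denom


def dcorrx_sketch(x, y, max_lag=0):
    """Sum over lags j = 0..M of (n - j) / n * Dcorr(x[j:n], y[0:n-j])."""
    n = len(x)
    total = 0.0
    for j in range(max_lag + 1):
        total += (n - j) / n * dcorr_biased(x[j:n], y[0:n - j])
    return total


x = np.arange(7)
stat_lag0 = dcorrx_sketch(x, x, max_lag=0)
stat_lag1 = dcorrx_sketch(x, x, max_lag=1)
```

Note that with max_lag greater than 0 the sum can exceed 1, since each lag contributes its own weighted term.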
References
[1] Cencheng Shen, Jaewon Chung, Ronak Mehta, Ting Xu, and Joshua T. Vogelstein. Independence testing for temporal data. Transactions on Machine Learning Research, 2024.
Methods Summary
| DcorrX.statistic(x, y) | Helper function that calculates the DcorrX test statistic. |
| DcorrX.test(x, y, reps=1000, workers=1, random_state=None) | Calculates the DcorrX test statistic and p-value. |
DcorrX.statistic(x, y)
Helper function that calculates the DcorrX test statistic.
- Parameters
x, y (ndarray of float) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).
- Returns
DcorrX.test(x, y, reps=1000, workers=1, random_state=None)
Calculates the DcorrX test statistic and p-value.
- Parameters
x, y (ndarray of float) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q), where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).
reps (int, default: 1000) -- The number of replications used to estimate the null distribution in the permutation test that computes the p-value.
workers (int, default: 1) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
- Returns
Examples
>>> import numpy as np
>>> from hyppo.time_series import DcorrX
>>> np.random.seed(456)
>>> x = np.arange(7)
>>> y = x
>>> stat, pvalue, dcorrx_dict = DcorrX().test(x, y, reps=100)
>>> '%.1f, %.2f, %d' % (stat, pvalue, dcorrx_dict['opt_lag'])
'1.0, 0.05, 0'
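To illustrate how reps replications can estimate a null distribution, the following is a rough sketch of a permutation p-value. Several things here are assumptions rather than documented behavior: the function names are hypothetical, whole blocks of y are shuffled (a common choice for time series, used here only for illustration), the block size defaults to ceil(sqrt(n)), the p-value uses the (1 + count) / (1 + reps) convention, and absolute Pearson correlation stands in for the DcorrX statistic.

```python
import numpy as np


def block_permute_indices(n, block_size, rng):
    """Return indices 0..n-1 with contiguous blocks shuffled as whole units."""
    starts = np.arange(0, n, block_size)
    order = rng.permutation(len(starts))
    return np.concatenate(
        [np.arange(s, min(s + block_size, n)) for s in starts[order]]
    )


def permutation_pvalue(stat_fn, x, y, reps=1000, block_size=None, seed=None):
    """Estimate a p-value by recomputing stat_fn on block-permuted copies of y."""
    rng = np.random.default_rng(seed)
    n = len(x)
    if block_size is None:
        block_size = int(np.ceil(np.sqrt(n)))  # assumed default, not from hyppo
    observed = stat_fn(x, y)
    null = np.empty(reps)
    for r in range(reps):
        idx = block_permute_indices(n, block_size, rng)
        null[r] = stat_fn(x, y[idx])
    # Fraction of null statistics at least as large as the observed one.
    return (1 + np.sum(null >= observed)) / (1 + reps)


def abs_corr(x, y):
    """Stand-in statistic for this sketch: absolute Pearson correlation."""
    return abs(np.corrcoef(x, y)[0, 1])


x = np.arange(20.0)
y = x.copy()
pvalue = permutation_pvalue(abs_corr, x, y, reps=100, seed=0)
idx = block_permute_indices(20, 5, np.random.default_rng(1))
```

A larger reps gives a finer-grained p-value, since the smallest attainable value under this convention is 1 / (1 + reps).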