DcorrX

class hyppo.time_series.DcorrX(compute_distance='euclidean', max_lag=0, **kwargs)
Cross Distance Correlation (DcorrX) test statistic and p-value.
DcorrX is an independence test between two (paired) time series of not necessarily equal dimensions. The population parameter is 0 if and only if the time series are independent. It is based upon energy distance between distributions.
The statistic can be derived as follows [1]:
Let \(x\) and \(y\) be \((n, p)\) and \((n, q)\) series respectively, which each contain \(n\) observations of the series \((X_t)\) and \((Y_t)\). Similarly, let \(x[j:n]\) be the \((n-j, p)\) last \(n-j\) observations of \(x\). Let \(y[0:(n-j)]\) be the \((n-j, q)\) first \(n-j\) observations of \(y\). Let \(M\) be the maximum lag hyperparameter. The cross distance correlation is,
\[\mathrm{DcorrX}_n (x, y) = \sum_{j=0}^M \frac{n-j}{n} \mathrm{Dcorr}_n (x[j:n], y[0:(n-j)])\]
The p-value returned is calculated using a permutation test.
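As a rough illustration of the sum above, the following sketch computes the lag-weighted statistic with NumPy. It is not the library's implementation: `dcorr_biased` is a hypothetical stand-in for \(\mathrm{Dcorr}_n\) that assumes the simple biased sample distance correlation (squared form) with Euclidean distances, and `cross_dcorr` is an illustrative name.

```python
import numpy as np


def _centered_distances(z):
    """Double-centered Euclidean distance matrix of the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D series as (n, 1)
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    row = d.mean(axis=1, keepdims=True)
    col = d.mean(axis=0, keepdims=True)
    return d - row - col + d.mean()


def dcorr_biased(x, y):
    """Assumed stand-in for Dcorr_n: biased, squared-form distance correlation."""
    a, b = _centered_distances(x), _centered_distances(y)
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else (a * b).mean() / denom


def cross_dcorr(x, y, max_lag=0):
    """Sum over lags j = 0..max_lag of (n - j) / n * Dcorr(x[j:n], y[0:n-j])."""
    n = len(x)
    total = 0.0
    for j in range(max_lag + 1):
        total += (n - j) / n * dcorr_biased(x[j:n], y[0:n - j])
    return total


x = np.arange(7.0)
stat_lag0 = cross_dcorr(x, x, max_lag=0)
stat_lag1 = cross_dcorr(x, x, max_lag=1)
```

For two identical linear series every lagged term equals 1, so `stat_lag0` is 1 and `stat_lag1` is 1 + 6/7; the weights \((n-j)/n\) shrink the contribution of higher lags, which are computed from fewer observations.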
Parameters
compute_distance (str, callable, or None, default: "euclidean") – A function that computes the distance among the samples within each data matrix. Valid strings for compute_distance are, as defined in sklearn.metrics.pairwise_distances,
From scikit-learn: ["euclidean", "cityblock", "cosine", "l1", "l2", "manhattan"] See the documentation for scipy.spatial.distance for details on these metrics.
From scipy.spatial.distance: ["braycurtis", "canberra", "chebyshev", "correlation", "dice", "hamming", "jaccard", "kulsinski", "mahalanobis", "minkowski", "rogerstanimoto", "russellrao", "seuclidean", "sokalmichener", "sokalsneath", "sqeuclidean", "yule"] See the documentation for scipy.spatial.distance for details on these metrics.
Set to None or "precomputed" if x and y are already distance matrices. To call a custom function, either create the distance matrix beforehand or create a function of the form metric(x, **kwargs) where x is the data matrix for which pairwise distances are calculated and **kwargs are extra arguments to send to your custom function.
max_lag (int, default: 0) – The maximum number of lags in the past to check dependence between x and the shifted y. This is the \(M\) hyperparameter in the definition of the statistic.
**kwargs – Arbitrary keyword arguments for compute_distance.
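A custom distance of the form metric(x, **kwargs) could look like the sketch below. The function name `minkowski_metric` and its keyword `p` are hypothetical, chosen only for illustration; the one requirement taken from the description above is that the function accepts a data matrix and returns its matrix of pairwise distances.

```python
import numpy as np


def minkowski_metric(x, p=2.0):
    """Pairwise Minkowski distances between the rows of x (hypothetical example)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D series as (n, 1)
    diffs = np.abs(x[:, None, :] - x[None, :, :])  # shape (n, n, p_dims)
    return (diffs ** p).sum(axis=-1) ** (1.0 / p)  # shape (n, n)


points = np.array([[0.0, 0.0], [3.0, 4.0]])
dist_l2 = minkowski_metric(points, p=2.0)
dist_l1 = minkowski_metric(points, p=1.0)
```

Under the parameter descriptions above, such a function would presumably be supplied as `DcorrX(compute_distance=minkowski_metric, p=1.0)`, with `p` forwarded through `**kwargs`. Alternatively, the `(n, n)` matrices can be computed beforehand and passed as `x` and `y` with `compute_distance=None`.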
Methods Summary
DcorrX.statistic(x, y)  Helper function that calculates the DcorrX test statistic.
DcorrX.test(x, y[, reps, workers])  Calculates the DcorrX test statistic and p-value.

DcorrX.statistic(x, y)
Helper function that calculates the DcorrX test statistic.
Parameters
x,y (ndarray) – Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).
Returns

DcorrX.test(x, y, reps=1000, workers=1)
Calculates the DcorrX test statistic and p-value.
Parameters
x,y (ndarray) – Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).
reps (int, default: 1000) – The number of replications used to estimate the null distribution for the permutation test that produces the p-value.
workers (int, default: 1) – The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the Process.
Returns
Examples
>>> import numpy as np
>>> from hyppo.time_series import DcorrX
>>> np.random.seed(456)
>>> x = np.arange(7)
>>> y = x
>>> stat, pvalue, dcorrx_dict = DcorrX().test(x, y, reps=100)
>>> '%.1f, %.2f, %d' % (stat, pvalue, dcorrx_dict['opt_lag'])
'1.0, 0.01, 0'
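The role of reps in a permutation test can be sketched generically as follows. This is an assumption-laden illustration, not the library's code: `permutation_pvalue` and `abs_pearson` are hypothetical names, the statistic is a simple absolute Pearson correlation rather than DcorrX, the add-one correction is assumed, and observations are shuffled freely, whereas a time-series test may need to permute in blocks to preserve autocorrelation.

```python
import numpy as np


def abs_pearson(x, y):
    """Toy test statistic (assumed for illustration): |Pearson correlation|."""
    return abs(np.corrcoef(x, y)[0, 1])


def permutation_pvalue(stat_fn, x, y, reps=1000, seed=0):
    """Estimate a p-value by comparing the observed statistic to a permuted null."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(x, y)
    # Each replication breaks the pairing between x and y by shuffling y.
    null = np.array([stat_fn(x, rng.permutation(y)) for _ in range(reps)])
    # Add-one correction (an assumption here) keeps the estimate strictly positive.
    pvalue = (1 + np.sum(null >= observed)) / (1 + reps)
    return observed, pvalue


x = np.arange(20.0)
stat, pvalue = permutation_pvalue(abs_pearson, x, x, reps=99, seed=0)
```

With this correction the smallest attainable p-value is 1 / (1 + reps), so a larger reps gives a finer-grained estimate at the cost of more computation.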