CCA

class hyppo.independence.CCA
Canonical Correlation Analysis (CCA) test statistic and pvalue.
This test can be thought of as inferring information from cross-covariance matrices [1]. It has been argued that virtually all parametric tests of significance can be treated as special cases of CCA [2]. The method was first introduced by Harold Hotelling in 1936 [3].
The statistic can be derived as follows [4]:
Let \(x\) and \(y\) be \((n, p)\) and \((n, q)\) samples of random variables \(X\) and \(Y\) (for this class \(q = p\), since both inputs must have the same number of dimensions). After centering \(x\) and \(y\), calculate the sample covariance matrix \(\hat{\Sigma}_{xy} = x^T y\); the variance matrices \(\hat{\Sigma}_{xx}\) and \(\hat{\Sigma}_{yy}\) are defined similarly. The CCA test statistic is then found by calculating the vectors \(a \in \mathbb{R}^p\) and \(b \in \mathbb{R}^q\) that maximize
\[\mathrm{CCA}_n (x, y) = \max_{a \in \mathbb{R}^p, b \in \mathbb{R}^q} \frac{a^T \hat{\Sigma}_{xy} b} {\sqrt{a^T \hat{\Sigma}_{xx} a} \sqrt{b^T \hat{\Sigma}_{yy} b}}\]

The pvalue returned is calculated with a permutation test using hyppo.tools.perm_test.
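The maximization above can be sketched in NumPy. This is only an illustration of the formula as written, not hyppo's internal code, and its values may differ from what `CCA.statistic` returns. The function name `cca_statistic_sketch` is hypothetical, and the sketch assumes that both centered inputs have full column rank so that the Cholesky factorizations exist. It relies on the fact that, after whitening with the Cholesky factors of \(\hat{\Sigma}_{xx}\) and \(\hat{\Sigma}_{yy}\), the maximum of the ratio is the largest singular value of the whitened cross-covariance matrix.

```python
import numpy as np


def cca_statistic_sketch(x, y):
    """Largest canonical correlation between x (n, p) and y (n, q).

    Hypothetical helper for illustration; assumes the centered inputs
    have full column rank so the Cholesky factorizations succeed.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D inputs as a single column.
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]

    # Center each column.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)

    # Unnormalized covariance matrices; the 1/n factors cancel in the ratio.
    sxx = xc.T @ xc
    syy = yc.T @ yc
    sxy = xc.T @ yc

    # Whitening: with sxx = lx lx^T and syy = ly ly^T, the objective becomes
    # u^T M v / (|u| |v|) for M = lx^{-1} sxy ly^{-T}.
    lx = np.linalg.cholesky(sxx)
    ly = np.linalg.cholesky(syy)
    m = np.linalg.solve(lx, sxy)      # lx^{-1} sxy
    m = np.linalg.solve(ly, m.T).T    # (lx^{-1} sxy) ly^{-T}

    # The maximum over unit vectors u, v is the largest singular value of M.
    return np.linalg.svd(m, compute_uv=False)[0]


x = np.arange(7)
print(cca_statistic_sketch(x, x))  # close to 1.0 for perfectly correlated inputs
```

For one-dimensional inputs this quantity reduces to the absolute value of the Pearson correlation coefficient.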
Methods Summary

CCA.statistic(x, y)  Helper function that calculates the CCA test statistic.

CCA.test(x, y, reps=1000, workers=1)  Calculates the CCA test statistic and pvalue.

CCA.statistic(x, y)
Helper function that calculates the CCA test statistic.

CCA.test(x, y, reps=1000, workers=1)
Calculates the CCA test statistic and pvalue.

Parameters
x, y (ndarray): Input data matrices. x and y must have the same number of samples and dimensions. That is, the shapes must be (n, p), where n is the number of samples and p is the number of dimensions.
reps (int, default: 1000): The number of replications used to estimate the null distribution in the permutation test that computes the pvalue.
workers (int, default: 1): The number of cores to parallelize the pvalue computation over. Supply -1 to use all cores available to the Process.
Returns
stat (float): The computed CCA statistic.
pvalue (float): The computed CCA pvalue.
Examples
>>> import numpy as np
>>> from hyppo.independence import CCA
>>> x = np.arange(7)
>>> y = x
>>> stat, pvalue = CCA().test(x, y)
>>> '%.1f, %.2f' % (stat, pvalue)
'1.0, 0.00'
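To illustrate how `reps` shapes the permutation-based pvalue, here is a minimal sketch of a permutation test. It is not the implementation of hyppo.tools.perm_test. The names `abs_corr` and `perm_pvalue_sketch` are hypothetical, the stand-in statistic is the absolute Pearson correlation (which matches the CCA objective only for one-dimensional inputs), and the add-one correction in the pvalue is a common convention assumed here, not something confirmed by this page.

```python
import numpy as np


def abs_corr(x, y):
    """Stand-in statistic: |Pearson r| for 1-D inputs (hypothetical helper)."""
    return abs(np.corrcoef(x, y)[0, 1])


def perm_pvalue_sketch(x, y, stat_fn=abs_corr, reps=1000, seed=0):
    """Estimate a pvalue by permuting y to break any dependence on x."""
    rng = np.random.default_rng(seed)
    observed = stat_fn(x, y)

    # Each replication shuffles the samples of y and recomputes the
    # statistic, giving one draw from the null distribution.
    null = np.array([stat_fn(x, rng.permutation(y)) for _ in range(reps)])

    # Fraction of null draws at least as large as the observed statistic.
    # The add-one correction (an assumption) keeps the pvalue above zero.
    pvalue = (1 + np.sum(null >= observed)) / (1 + reps)
    return observed, pvalue


x = np.arange(7.0)
stat, pvalue = perm_pvalue_sketch(x, x, reps=1000)
print(stat, pvalue)
```

With this convention the smallest attainable pvalue is 1 / (reps + 1), so a larger `reps` gives a finer-grained estimate at the cost of more computation.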