CCA
- class hyppo.independence.CCA
Canonical Correlation Analysis (CCA) test statistic and p-value.
This test can be thought of as inferring information from cross-covariance matrices [1]. It has been argued that virtually all parametric tests of significance can be treated as special cases of CCA [2]. The method was first introduced by Hotelling [3].
Notes
The statistic can be derived as follows [4]:
Let \(x\) and \(y\) be \((n, p)\) and \((n, q)\) samples of random variables \(X\) and \(Y\), respectively. After centering \(x\) and \(y\), the sample cross-covariance matrix is \(\hat{\Sigma}_{xy} = x^T y\), and the sample covariance matrices \(\hat{\Sigma}_{xx}\) and \(\hat{\Sigma}_{yy}\) are defined similarly. The CCA test statistic is then the largest correlation attainable between the projections \(x a\) and \(y b\) over vectors \(a \in \mathbb{R}^p\) and \(b \in \mathbb{R}^q\):
\[\mathrm{CCA}_n (x, y) = \max_{a \in \mathbb{R}^p, b \in \mathbb{R}^q} \frac{a^T \hat{\Sigma}_{xy} b} {\sqrt{a^T \hat{\Sigma}_{xx} a} \sqrt{b^T \hat{\Sigma}_{yy} b}}\]
The p-value returned is calculated with a permutation test, using hyppo.tools.perm_test.
References
- [1] Wolfgang Karl Härdle and Léopold Simar. Canonical Correlation Analysis. In Wolfgang Karl Härdle and Léopold Simar, editors, Applied Multivariate Statistical Analysis, pages 443–454. Springer, Berlin, Heidelberg, 2015. doi:10.1007/978-3-662-45171-7_16.
- [2] Thomas R. Knapp. Canonical correlation analysis: A general parametric significance-testing system. Psychological Bulletin, 85(2):410–416, 1978. doi:10.1037/0033-2909.85.2.410.
- [3] Harold Hotelling. Relations Between Two Sets of Variates, pages 162–190. Springer New York, New York, NY, 1992. doi:10.1007/978-1-4612-4380-9_14.
- [4] David R. Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical Correlation Analysis: An Overview with Application to Learning Methods. Neural Computation, 16(12):2639–2664, December 2004. doi:10.1162/0899766042321814.
Methods Summary
| CCA.statistic(x, y) | Helper function that calculates the CCA test statistic. |
| CCA.test(x, y, reps=1000, workers=1, random_state=None) | Calculates the CCA test statistic and p-value. |
- CCA.statistic(x, y)
Helper function that calculates the CCA test statistic.
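As an illustration of the maximization given in the Notes, the following is a minimal NumPy sketch, not hyppo's implementation. It relies on the standard result that the maximum equals the largest singular value of the whitened cross-covariance matrix, and it assumes both centered inputs have full column rank (otherwise the Cholesky factorization fails). The function name `first_canonical_correlation` is hypothetical.

```python
import numpy as np


def first_canonical_correlation(x, y):
    """Largest canonical correlation between the columns of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D inputs as a single variable observed n times.
    if x.ndim == 1:
        x = x[:, np.newaxis]
    if y.ndim == 1:
        y = y[:, np.newaxis]

    # Center each column.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)

    # Unnormalized covariance matrices; any common scaling cancels in the ratio.
    sxx = xc.T @ xc
    syy = yc.T @ yc
    sxy = xc.T @ yc

    # With sxx = Lx Lx^T and syy = Ly Ly^T, the singular values of
    # Lx^{-1} sxy Ly^{-T} are the canonical correlations.
    lx = np.linalg.cholesky(sxx)
    ly = np.linalg.cholesky(syy)
    m = np.linalg.solve(lx, sxy)    # Lx^{-1} sxy
    m = np.linalg.solve(ly, m.T).T  # Lx^{-1} sxy Ly^{-T}

    # Singular values are returned in descending order.
    return np.linalg.svd(m, compute_uv=False)[0]
```

For one-dimensional inputs this quantity reduces to the absolute value of the Pearson correlation.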
- CCA.test(x, y, reps=1000, workers=1, random_state=None)
Calculates the CCA test statistic and p-value.
- Parameters
  - x, y (ndarray of float) -- Input data matrices. x and y must have the same number of samples and dimensions. That is, the shapes must be (n, p), where n is the number of samples and p is the number of dimensions.
  - reps (int, default: 1000) -- The number of replications used to estimate the null distribution in the permutation test that computes the p-value.
  - workers (int, default: 1) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
- Returns
  - stat (float) -- The computed CCA test statistic.
  - pvalue (float) -- The computed CCA p-value.
Examples
>>> import numpy as np
>>> from hyppo.independence import CCA
>>> x = np.arange(7)
>>> y = x
>>> stat, pvalue = CCA().test(x, y)
>>> '%.1f, %.2f' % (stat, pvalue)
'1.0, 0.00'
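The permutation procedure behind the p-value can also be sketched without hyppo. This is a rough illustration under stated assumptions and does not reproduce hyppo.tools.perm_test: the helper names `abs_pearson` and `permutation_pvalue` are hypothetical, the add-one correction in the p-value is a choice made for this sketch, and `abs_pearson` stands in for the CCA statistic only in the one-dimensional case.

```python
import numpy as np


def abs_pearson(x, y):
    """Absolute Pearson correlation; matches the CCA statistic when p = q = 1."""
    return abs(np.corrcoef(x, y)[0, 1])


def permutation_pvalue(x, y, statistic, reps=1000, seed=None):
    """Estimate a p-value by comparing the observed statistic to permuted ones."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, y)

    null = np.empty(reps)
    for i in range(reps):
        # Shuffling the samples of y breaks any dependence on x,
        # so each replicate is a draw from the null distribution.
        null[i] = statistic(x, rng.permutation(y))

    # Add-one correction (an assumption of this sketch) keeps the
    # estimate strictly positive.
    pvalue = (1 + np.sum(null >= observed)) / (1 + reps)
    return observed, pvalue


x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0
stat, pvalue = permutation_pvalue(x, y, abs_pearson, reps=200, seed=0)
```

Increasing `reps` gives a finer-grained null distribution at a proportional cost in run time, which is the trade-off that the reps and workers parameters of CCA.test control.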