# KCI

class hyppo.conditional.KCI(**kwargs)

Kernel Conditional Independence Test Statistic and P-Value.

This is a conditional independence test that uses a radial basis function (RBF) kernel to compute kernel matrices for two datasets. The test statistic is the trace of the product of the normalized kernel matrices. A gamma distribution approximation is then used to calculate the p-value from the statistic and the approximate mean and variance of the trace values of the independent kernel matrices. This test is consistent against similar tests.
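The kernel-and-trace computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not hyppo's implementation: the helper names `rbf_kernel`, `center`, and `trace_statistic` are hypothetical, the bandwidth `width=1.0` is an arbitrary assumption, and "normalized" is interpreted here as double-centering the kernel matrices.

```python
import numpy as np


def rbf_kernel(a, width=1.0):
    """Gaussian (RBF) kernel matrix between the rows of `a`.

    `width` is an assumed bandwidth, not a value taken from hyppo.
    """
    sq_norms = np.sum(a**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * a @ a.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * width**2))


def center(k):
    """Double-center a kernel matrix: H K H with H = I - (1/n) 1 1^T."""
    n = k.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ k @ h


def trace_statistic(x, y, width=1.0):
    """Trace of the product of the two centered kernel matrices."""
    kx = center(rbf_kernel(x, width))
    ky = center(rbf_kernel(y, width))
    return np.trace(kx @ ky)
```

Because both centered kernel matrices are positive semidefinite, the trace of their product is non-negative, and larger values indicate stronger dependence between the two samples.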

Notes

Let $$x$$ be a combined $$(n, p)$$ sample of the random variables $$X$$ and let $$y$$ be an $$(n, 1)$$ vector of class labels for $$Y$$. We can then generate the kernel matrices $$K_x$$ and $$K_y$$ for the respective samples. Normalizing these kernel matrices, multiplying them, and taking the trace of the product gives the test statistic. The p-value and null distribution for the corrected statistic are calculated using a gamma distribution approximation.
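A gamma approximation of this kind is usually obtained by moment matching: a gamma distribution with shape $$k$$ and scale $$\theta$$ has mean $$k\theta$$ and variance $$k\theta^2$$, so $$k = \text{mean}^2 / \text{variance}$$ and $$\theta = \text{variance} / \text{mean}$$. The sketch below assumes the approximate null mean and variance are already available; `gamma_pvalue` is a hypothetical helper for illustration, not a hyppo function.

```python
from scipy.stats import gamma


def gamma_pvalue(stat, mean_appr, var_appr):
    """Upper-tail p-value from a gamma distribution matched to a mean and variance.

    `mean_appr` and `var_appr` are the approximate null mean and variance;
    how they are estimated is outside the scope of this sketch.
    """
    if mean_appr <= 0 or var_appr <= 0:
        raise ValueError("mean_appr and var_appr must be positive")
    shape = mean_appr**2 / var_appr  # k = mean^2 / variance
    scale = var_appr / mean_appr     # theta = variance / mean
    # Survival function: P(T >= stat) under the fitted gamma distribution.
    return gamma.sf(stat, a=shape, scale=scale)


# With mean 1 and variance 1 the fit is an exponential distribution,
# so the p-value at 2.0 equals exp(-2), roughly 0.135.
print(gamma_pvalue(2.0, 1.0, 1.0))
```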

Methods Summary

KCI.statistic(x, y) -- Calculates the conditional independence test statistic.

KCI.test(x, y) -- Calculates the Kernel Conditional Independence test statistic and p-value.

KCI.compute_kern(x, y)
KCI.statistic(x, y)

Calculates the conditional independence test statistic.

Parameters

x, y (ndarray of float) -- Input data matrices.

Returns

stat (float) -- The computed conditional independence test statistic.

KCI.test(x, y)

Calculates the Kernel Conditional Independence test statistic and p-value.

Parameters

x, y (ndarray of float) -- Input data matrices. x and y must have the same number of rows (samples). That is, the shapes must be (n, p) and (n, 1), where n is the number of samples and p is the number of dimensions.
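Labels are often stored as a flat sequence of length n rather than as an (n, 1) array. A small helper such as the one below can bring them into the expected shape; `as_column` is a hypothetical name used only for this illustration and is not part of hyppo.

```python
import numpy as np


def as_column(labels):
    """Return a 1-D sequence of n labels as an (n, 1) float array."""
    arr = np.asarray(labels, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(-1, 1)  # (n,) -> (n, 1)
    return arr


y = as_column([0, 1, 1, 0])
print(y.shape)  # (4, 1)
```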

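Returns

stat (float) -- The computed conditional independence test statistic.

pvalue (float) -- The computed conditional independence test p-value.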

Example

>>> import numpy as np
>>> from hyppo.conditional import KCI
>>> from hyppo.tools.indep_sim import linear
>>> np.random.seed(123456789)
>>> x, y = linear(100, 1)
>>> stat, pvalue = KCI().test(x, y)
>>> '%.1f, %.2f' % (stat, pvalue)
'544.7, 0.00'