FCIT¶
- class hyppo.conditional.FCIT(model=DecisionTreeRegressor(), cv_grid={'min_samples_split': [2, 8, 64, 512, 0.01, 0.2, 0.4]}, num_perm=8, prop_test=0.1, discrete=(False, False))¶
Fast Conditional Independence test statistic and p-value
The Fast Conditional Independence Test is a non-parametric conditional independence test [1].
- Parameters
model (Sklearn regressor) -- Regressor used to predict the input data \(Y\).
cv_grid (dict) -- Dictionary of parameters to cross-validate over when training the regressor.
num_perm (int) -- Number of data permutations used to estimate the p-value from marginal statistics.
prop_test (float) -- Proportion of data on which the test statistic is evaluated.
discrete (tuple of bool) -- Whether \(X\) or \(Y\) is discrete.
Notes
Note
This algorithm is currently a pre-print on arXiv.
The motivation for the test rests on the assumption that if \(X \not\!\perp\!\!\!\perp Y \mid Z\), then \(Y\) should be predicted more accurately by using both \(X\) and \(Z\) as covariates than by using \(Z\) alone. Likewise, if \(X \perp \!\!\! \perp Y \mid Z\), then \(Y\) should be predicted just as accurately from \(Z\) alone as from \(X\) and \(Z\) together [1]. Thus, the test uses a regressor (by default a decision tree) to predict \(Y\) twice: once from both \(X\) and \(Z\), and once from \(Z\) only [1]. The accuracy of each prediction is then measured via mean squared error (MSE); \(X \perp \!\!\! \perp Y \mid Z\) if and only if the MSE of the regressor trained on both \(X\) and \(Z\) is not smaller than the MSE of the regressor trained on \(Z\) alone [1].
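The comparison the test performs can be sketched directly with scikit-learn primitives. The following is a minimal illustration of the idea under a conditionally independent data-generating process, not hyppo's implementation: fit one regressor on the concatenation of \(X\) and \(Z\), another on \(Z\) alone, and compare held-out MSEs.

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, dim = 2000, 2
z = rng.normal(size=(n, dim))
# x and y each depend on z but not on each other: x is independent of y given z
x = z @ rng.normal(size=(dim, dim)) + rng.normal(size=(n, dim))
y = z @ rng.normal(size=(dim, dim)) + rng.normal(size=(n, dim))

# Predict y from (x, z) jointly and from z alone on a held-out split.
xz = np.hstack([x, z])
xz_tr, xz_te, z_tr, z_te, y_tr, y_te = train_test_split(
    xz, z, y, test_size=0.1, random_state=0
)
mse_xz = mean_squared_error(
    y_te, DecisionTreeRegressor(random_state=0).fit(xz_tr, y_tr).predict(xz_te)
)
mse_z = mean_squared_error(
    y_te, DecisionTreeRegressor(random_state=0).fit(z_tr, y_tr).predict(z_te)
)
# Under conditional independence, mse_xz should not be smaller than mse_z:
# adding x as a covariate brings no extra predictive information about y.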
References
[1] Krzysztof Chalupka, Pietro Perona, and Frederick Eberhardt. "Fast Conditional Independence Test for Vector Variables with Large Sample Sizes." arXiv preprint arXiv:1804.02747 (2018).
Methods Summary
- FCIT.statistic(x, y, z=None)¶
Calculates the FCIT test statistic.
- FCIT.test(x, y, z=None)¶
Calculates the FCIT test statistic and p-value.
- Parameters
x, y, z (ndarray of float) -- Input data matrices.
- Returns
stat (float) -- The computed FCIT statistic.
pvalue (float) -- The computed FCIT p-value.
Examples
>>> import numpy as np
>>> from hyppo.conditional import FCIT
>>> from sklearn.tree import DecisionTreeRegressor
>>> np.random.seed(1234)
>>> dim = 2
>>> n = 100000
>>> z1 = np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n))
>>> A1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> B1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> x1 = (A1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> y1 = (B1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> model = DecisionTreeRegressor()
>>> cv_grid = {"min_samples_split": [2, 8, 64, 512, 1e-2, 0.2, 0.4]}
>>> stat, pvalue = FCIT(model=model, cv_grid=cv_grid).test(x1.T, y1.T, z1)
>>> '%.1f, %.2f' % (stat, pvalue)
'-3.6, 1.00'
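For contrast, a sketch of a conditionally dependent case, where \(Y\) depends on \(X\) directly rather than only through \(Z\), so the test should tend to reject. The data-generating process and seed here are illustrative; the exact statistic and p-value vary with the data, so no output is asserted.

>>> import numpy as np
>>> from hyppo.conditional import FCIT
>>> rng = np.random.default_rng(5)
>>> n, dim = 10000, 2
>>> z2 = rng.normal(size=(n, dim))
>>> x2 = z2 + rng.normal(size=(n, dim))
>>> y2 = x2 + z2 + rng.normal(size=(n, dim))
>>> stat, pvalue = FCIT().test(x2, y2, z2)

Here y2 is a function of x2 even after conditioning on z2, so a small p-value is expected.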