FCIT

class hyppo.conditional.FCIT(model=DecisionTreeRegressor(), cv_grid={'min_samples_split': [2, 8, 64, 512, 0.01, 0.2, 0.4]}, num_perm=8, prop_test=0.1, discrete=(False, False))

Fast Conditional Independence Test (FCIT) statistic and p-value.

The Fast Conditional Independence Test is a non-parametric conditional independence test 1.

Parameters
  • model (Sklearn regressor) -- Regressor used to predict input data \(Y\).

  • cv_grid (dict) -- Dictionary of parameters to cross-validate over when training regressor.

  • num_perm (int) -- Number of data permutations to estimate the p-value from marginal stats.

  • prop_test (float) -- Proportion of data to evaluate test stat on.

  • discrete (tuple of bool) -- Whether \(X\) or \(Y\) is discrete.

Notes

Note

This algorithm is currently a preprint on arXiv.

The motivation for the test rests on the assumption that if \(X \not\!\perp\!\!\!\perp Y \mid Z\), then \(Y\) should be predicted more accurately using both \(X\) and \(Z\) as covariates than using \(Z\) alone. Likewise, if \(X \perp \!\!\! \perp Y \mid Z\), then \(Y\) should be predicted just as accurately from \(Z\) alone as from both \(X\) and \(Z\) 1. Thus, the test uses a regressor (a decision tree by default) to predict \(Y\) twice: once from both \(X\) and \(Z\), and once from \(Z\) only 1. The accuracy of both predictions is then measured via mean-squared error (MSE). \(X \perp \!\!\! \perp Y \mid Z\) if and only if the MSE of the regressor trained on both \(X\) and \(Z\) is not smaller than the MSE of the regressor trained on \(Z\) alone 1.
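The comparison above can be sketched directly with scikit-learn. This is an illustrative simplification, not hyppo's implementation: hyppo additionally cross-validates hyperparameters over cv_grid and permutes the data num_perm times to calibrate the p-value.

```python
# Illustrative sketch: compare held-out MSE when predicting Y from (X, Z)
# versus from Z alone. Here Y depends only on Z, so X is conditionally
# independent of Y given Z and adding X should not reduce the MSE much.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 2))
x = z @ rng.normal(size=(2, 2)) + rng.normal(size=(n, 2))  # X depends on Z
y = z @ rng.normal(size=(2, 1)) + rng.normal(size=(n, 1))  # Y depends only on Z

def held_out_mse(features, target):
    """Fit a regressor on a train split and score MSE on a held-out split."""
    f_tr, f_te, t_tr, t_te = train_test_split(
        features, target, test_size=0.1, random_state=0
    )
    model = DecisionTreeRegressor(min_samples_split=64)
    model.fit(f_tr, t_tr.ravel())
    return mean_squared_error(t_te, model.predict(f_te))

mse_xz = held_out_mse(np.hstack([x, z]), y)  # predict Y from (X, Z)
mse_z = held_out_mse(z, y)                   # predict Y from Z alone
# FCIT's statistic is built from the difference of such per-point errors;
# under conditional independence, mse_xz is not appreciably smaller than mse_z.
```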

References

1

Krzysztof Chalupka, Pietro Perona, and Frederick Eberhardt. Fast conditional independence test for vector variables with large sample sizes. arXiv:1804.02747 [math, stat], 2018.

Methods Summary


FCIT.statistic(x, y, z=None)

Calculates the FCIT test statistic.

Parameters

x,y,z (ndarray of float) -- Input data matrices.

Returns

  • stat (float) -- The computed FCIT test statistic.

  • two_sided (float) -- Two-sided p-value associated with the test statistic.

FCIT.test(x, y, z=None)

Calculates the FCIT test statistic and p-value.

Parameters

x,y,z (ndarray of float) -- Input data matrices.

Returns

  • stat (float) -- The computed FCIT statistic.

  • pvalue (float) -- The computed FCIT p-value.

Examples

>>> import numpy as np
>>> from hyppo.conditional import FCIT
>>> from sklearn.tree import DecisionTreeRegressor
>>> np.random.seed(1234)
>>> dim = 2
>>> n = 100000
>>> z1 = np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n))
>>> A1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> B1 = np.random.normal(loc=0, scale=1, size=dim * dim).reshape(dim, dim)
>>> x1 = (A1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> y1 = (B1 @ z1.T + np.random.multivariate_normal(mean=np.zeros(dim), cov=np.eye(dim), size=(n)).T)
>>> model = DecisionTreeRegressor()
>>> cv_grid = {"min_samples_split": [2, 8, 64, 512, 1e-2, 0.2, 0.4]}
>>> stat, pvalue = FCIT(model=model, cv_grid=cv_grid).test(x1.T, y1.T, z1)
>>> '%.1f, %.2f' % (stat, pvalue)
'-3.6, 1.00'
