SmoothCFTest

class hyppo.ksample.SmoothCFTest(num_randfreq=5)

Smooth Characteristic Function test statistic and p-value

The Smooth Characteristic Function test is a two-sample test that compares the smoothed (analytic) characteristic functions of two data distributions to determine whether the distributions differ [1].

Parameters

num_randfreq (int) -- Number of random frequencies, q, at which the test is performed. It is used to construct a random array of shape (p, q), where p is the number of dimensions of the data. The columns of this array are the random test points at which the test is evaluated (see Notes).
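
To make the (p, q) shape concrete, here is a minimal sketch of drawing such an array of test points with NumPy. The helper name `draw_test_frequencies` is hypothetical and not part of hyppo, and the choice of standard normal draws is an assumption, since this page does not state which distribution the test points come from.

```python
import numpy as np

def draw_test_frequencies(p, num_randfreq=5, random_state=None):
    """Draw a (p, num_randfreq) array of random test points, one column per frequency."""
    rng = np.random.default_rng(random_state)
    # Assumption: standard normal draws; the distribution is not stated on this page.
    return rng.standard_normal((p, num_randfreq))
```

For 10-dimensional data and the default `num_randfreq=5`, this returns an array of shape (10, 5), and passing the same `random_state` reproduces the same test points.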

Notes

The test statistic takes on the following form:

$S_n = nW_n^T\Sigma_n^{-1}W_n$

As the formula shows, this test statistic has the same form as the Hotelling $$T^2$$ statistic, but its components are defined differently. Given data sets X and Y, define the vector of differences $$Z_i$$ as:

$Z_i = (k(X_i, T_1) - k(Y_i, T_1), \ldots, k(X_i, T_J) - k(Y_i, T_J)) \in \mathbb{R}^J$

Here k is a kernel, and each component of $$Z_i$$ is the difference between kernel evaluations at a test point $$T_j$$. The Mean Embedding Test uses the same formulation. $$W_n$$ is then the sample mean of the $$Z_i$$:

$W_n = \frac{1}{n} \sum_{i = 1}^n Z_i$

This leaves $$\Sigma_n$$, the covariance matrix, where Z is the matrix whose columns are the $$Z_i$$:

$\Sigma_n = \frac{1}{n}ZZ^T$
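
The three formulas above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not hyppo's implementation: the function name `hotelling_type_statistic` is hypothetical, the $$Z_i$$ are stored as rows of an (n, d) array (so $\frac{1}{n}ZZ^T$ becomes `z.T @ z / n`), and $$\Sigma_n$$ is taken exactly as written above, without centering.

```python
import numpy as np

def hotelling_type_statistic(z):
    """Compute n * W_n^T Sigma_n^{-1} W_n from an (n, d) array of difference vectors."""
    z = np.asarray(z, dtype=float)
    n = z.shape[0]
    w_n = z.mean(axis=0)       # W_n = (1/n) sum_i Z_i
    sigma_n = z.T @ z / n      # Sigma_n = (1/n) Z Z^T, with the Z_i stored as rows
    # Solve Sigma_n v = W_n instead of forming the inverse explicitly.
    return float(n * w_n @ np.linalg.solve(sigma_n, w_n))
```

Using `np.linalg.solve` avoids an explicit matrix inverse, which tends to be more stable numerically when $$\Sigma_n$$ is close to singular.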

In the specific case of the Smooth Characteristic Function test, the vector of differences is defined as follows:

$Z_i = (f(X_i)\sin(X_iT_1) - f(Y_i)\sin(Y_iT_1), f(X_i)\cos(X_iT_1) - f(Y_i)\cos(Y_iT_1),\cdots) \in \mathbb{R}^{2J}$
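
As a hedged sketch of how these difference vectors could be assembled, the snippet below builds the (n, 2J) array of $$Z_i$$ with NumPy. Several things here are assumptions rather than statements about hyppo: the helper name `smooth_cf_differences` is hypothetical, the smoothing function f is taken to be a Gaussian envelope $$\exp(-\|x\|^2/2)$$ (this page does not define f), the product $$X_iT_j$$ is read as a dot product, and x and y are assumed to have the same number of rows so that they can be paired.

```python
import numpy as np

def smooth_cf_differences(x, y, freqs):
    """Return the (n, 2J) array of smooth-CF difference vectors Z_i.

    x, y  : (n, p) arrays with the same number of rows (an assumption of this sketch).
    freqs : (p, J) array whose columns are the test frequencies T_1, ..., T_J.
    """
    def features(data):
        # Assumed smoothing function f: a Gaussian envelope exp(-||x||^2 / 2).
        f = np.exp(-0.5 * np.sum(data ** 2, axis=1, keepdims=True))  # (n, 1)
        proj = data @ freqs                                          # (n, J), entries X_i . T_j
        # Interleave columns as sin(T_1), cos(T_1), sin(T_2), cos(T_2), ...
        feats = np.stack([np.sin(proj), np.cos(proj)], axis=2)       # (n, J, 2)
        return f * feats.reshape(len(data), -1)                      # (n, 2J)

    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return features(x) - features(y)
```

Each row of the result has length 2J because every test frequency contributes one sine and one cosine component.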

Once the statistic $$S_n$$ is calculated, a threshold $$r_{\alpha}$$ is chosen as the $$1 - \alpha$$ quantile of a chi-squared distribution whose degrees of freedom equal the length of $$Z_i$$: J in the general formulation and 2J for the Smooth CF vector above. The null hypothesis is rejected if $$S_n$$ exceeds this threshold.
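
The decision rule can be sketched with `scipy.stats.chi2`. The helper name `chi2_decision` is hypothetical and not part of hyppo; the degrees of freedom are supplied by the caller, and the default significance level of 0.05 is an assumption made for illustration.

```python
from scipy.stats import chi2

def chi2_decision(stat, dof, alpha=0.05):
    """Return (threshold, pvalue, reject) for a chi-squared reference distribution."""
    threshold = chi2.ppf(1 - alpha, dof)  # r_alpha: the (1 - alpha) quantile
    pvalue = chi2.sf(stat, dof)           # upper-tail probability P(chi2_dof > stat)
    return float(threshold), float(pvalue), bool(stat > threshold)
```

Calling `chi2_decision(4.70, 10)` gives a p-value of about 0.91 and does not reject the null hypothesis, which agrees with the values printed in the Examples section for the default `num_randfreq=5`.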

References

[1] Kacper Chwialkowski, Aaditya Ramdas, Dino Sejdinovic, and Arthur Gretton. Fast two-sample testing with analytic representations of probability measures. arXiv:1506.04725 [math, stat], 2015.

Methods Summary

SmoothCFTest.statistic(x, y, random_state) -- Calculates the smooth CF test statistic.

SmoothCFTest.test(x, y[, random_state]) -- Calculates the smooth CF test statistic and p-value.

SmoothCFTest.statistic(x, y, random_state)

Calculates the smooth CF test statistic.

Parameters
• x,y (ndarray of float) -- Input data matrices. x and y must have the same number of dimensions. That is, the shapes must be (n, p) and (m, p), where n and m are the numbers of samples and p is the number of dimensions.

• random_state (int) -- Random seed used to generate the test points.

Returns

stat (float) -- The computed Smooth CF statistic.

SmoothCFTest.test(x, y, random_state=None)

Calculates the smooth CF test statistic and p-value.

Parameters
• x,y (ndarray of float) -- Input data matrices. x and y must have the same number of dimensions. That is, the shapes must be (n, p) and (m, p), where n and m are the numbers of samples and p is the number of dimensions.

• random_state (int) -- Random seed used to generate the test points.

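Returns

• stat (float) -- The computed Smooth CF statistic.

• pvalue (float) -- The computed Smooth CF p-value.
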
Examples

>>> import numpy as np
>>> from hyppo.ksample import SmoothCFTest
>>> np.random.seed(1234)
>>> x = np.random.randn(500, 10)
>>> y = np.random.randn(500, 10)
>>> stat, pvalue = SmoothCFTest().test(x, y, random_state=1234)
>>> '%.2f, %.3f' % (stat, pvalue)
'4.70, 0.910'