# DiscrimTwoSample

class hyppo.discrim.DiscrimTwoSample(is_dist=False, remove_isolates=True)

A class that compares the discriminability of two datasets. The two-sample test measures whether the discriminability of one dataset differs from that of another. More details can be found in [1].

Let $$\hat D_{x_1}$$ denote the sample discriminability of one approach, and $$\hat D_{x_2}$$ denote the sample discriminability of another approach. Then the hypotheses are:

$$\begin{aligned}
H_0: D_{x_1} &= D_{x_2} \\
H_A: D_{x_1} &> D_{x_2}
\end{aligned}$$


Alternatively, the test can be run against $$H_A: D_{x_1} < D_{x_2}$$ or $$H_A: D_{x_1} \neq D_{x_2}$$ (see the alt parameter of test).
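To make the quantity being compared concrete, the following is a minimal pure-Python sketch of a sample discriminability estimate. It is not hyppo's implementation: the function name `sample_discriminability`, the use of Euclidean distance, the half-credit for ties, and the pooling of all comparisons into a single average are assumptions made for illustration. The idea is to count how often two measurements with the same label are closer to each other than to a measurement with a different label.

```python
from math import dist  # Euclidean distance, Python 3.8+


def sample_discriminability(x, y):
    """Illustrative estimate: fraction of comparisons in which a
    same-label distance is smaller than a different-label distance.
    Ties count as one half (an assumption of this sketch)."""
    n = len(x)
    total = 0.0
    count = 0
    for i in range(n):
        for j in range(n):
            # Only pairs of distinct measurements that share a label.
            if i == j or y[i] != y[j]:
                continue
            d_within = dist(x[i], x[j])
            for k in range(n):
                # Compare against every measurement with another label.
                if y[k] == y[i]:
                    continue
                d_between = dist(x[i], x[k])
                if d_within < d_between:
                    total += 1.0
                elif d_within == d_between:
                    total += 0.5
                count += 1
    if count == 0:
        raise ValueError("need repeated labels and at least two distinct labels")
    return total / count


# Hypothetical toy data: x1 carries no label information, x2 separates labels.
x1 = [[1.0, 1.0]] * 6
x2 = [[0.0, 0.0]] * 3 + [[1.0, 1.0]] * 3
y = [0, 0, 0, 1, 1, 1]
print(sample_discriminability(x1, y), sample_discriminability(x2, y))  # 0.5 1.0
```

A dataset whose measurements are all identical scores 0.5 under this sketch because every comparison is a tie, while a dataset that separates the labels perfectly scores 1.0.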

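Parameters
• is_dist (bool, default: False) -- Whether the inputs x1 and x2 are distance matrices.

• remove_isolates (bool, default: True) -- Whether to remove measurements whose label occurs only once.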

Methods Summary

DiscrimTwoSample.statistic(x, y) -- Helper function that calculates the discriminability test statistic.

DiscrimTwoSample.test(x1, x2, y[, reps, ...]) -- Calculates the test statistic and p-value for a two sample test for discriminability.

DiscrimTwoSample.statistic(x, y)

Helper function that calculates the discriminability test statistic.

Parameters

x, y (ndarray) -- Input data matrices. x and y must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x and y can be distance matrices, where the shapes must both be (n, n).

Returns

stat (float) -- The computed two sample discriminability statistic.

DiscrimTwoSample.test(x1, x2, y, reps=1000, alt='neq', workers=-1)

Calculates the test statistic and p-value for a two sample test for discriminability.

Parameters
• x1, x2 (ndarray) -- Input data matrices. x1 and x2 must have the same number of samples. That is, the shapes must be (n, p) and (n, q) where n is the number of samples and p and q are the number of dimensions. Alternatively, x1 and x2 can be distance matrices, in which case both shapes must be (n, n) and is_dist must be set to True.

• y (ndarray) -- A vector containing the sample ids for the n samples. It must be aligned with the inputs so that y[i] is the label for both x1[i, :] and x2[i, :].

• reps (int, optional (default: 1000)) -- The number of replications used to estimate the null distribution for the permutation test from which the p-value is calculated.

• alt ({"greater", "less", "neq"} (default: "neq")) -- The alternative hypothesis for the test. Tests whether the first dataset is more discriminable (alt = "greater"), less discriminable (alt = "less"), or unequally discriminable (alt = "neq") compared with the second.

• workers (int, optional (default: -1)) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the Process.

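Returns
• discrim1, discrim2 (float) -- The sample discriminability of x1 and of x2, respectively.

• pvalue (float) -- The computed p-value of the two sample test.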

Examples

>>> import numpy as np
>>> from hyppo.discrim import DiscrimTwoSample
>>> x1 = np.ones((100,2), dtype=float)
>>> x2 = np.concatenate([np.zeros((50, 2)), np.ones((50, 2))], axis=0)
>>> y = np.concatenate([np.zeros(50), np.ones(50)], axis=0)
>>> discrim1, discrim2, pvalue = DiscrimTwoSample().test(x1, x2, y)
>>> '%.1f, %.1f, %.2f' % (discrim1, discrim2, pvalue)
'0.5, 1.0, 0.00'
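The three alt options of test correspond to one-sided and two-sided comparisons of an observed difference against a null distribution. The sketch below shows one common convention for turning null replicates into a p-value. It is a generic illustration, not hyppo's internal procedure: the function name `permutation_pvalue`, the "+1" correction, and the absolute-value rule for "neq" are assumptions, and the null replicates are made-up numbers.

```python
def permutation_pvalue(observed_diff, null_diffs, alt="neq"):
    """Illustrative p-value from null replicates of D_x1 - D_x2.

    observed_diff: observed difference in sample discriminability.
    null_diffs: differences obtained under the null (hypothetical here).
    """
    if alt == "greater":
        # Evidence that the first dataset is more discriminable.
        extreme = sum(d >= observed_diff for d in null_diffs)
    elif alt == "less":
        # Evidence that the first dataset is less discriminable.
        extreme = sum(d <= observed_diff for d in null_diffs)
    elif alt == "neq":
        # Two-sided: compare magnitudes.
        extreme = sum(abs(d) >= abs(observed_diff) for d in null_diffs)
    else:
        raise ValueError("alt must be 'greater', 'less', or 'neq'")
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (len(null_diffs) + 1)


# Made-up null replicates, for illustration only.
null_diffs = [-0.2, -0.1, 0.0, 0.1, 0.2]
for alt in ("greater", "less", "neq"):
    print(alt, permutation_pvalue(0.2, null_diffs, alt=alt))
```

Under this convention the smallest attainable p-value is 1 / (reps + 1), so a larger reps allows finer resolution at the cost of more computation.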