FriedmanRafsky

class hyppo.independence.FriedmanRafsky(**kwargs)

Friedman Rafsky (FR) test statistic and p-value. This is a multivariate extension of the Wald-Wolfowitz runs test for randomness. The usual notion of a 'run' is replaced by a minimum spanning tree (MST) computed over the points of the two data sets, with edge weights defined as the Euclidean distance between points. Once the MST has been determined, every edge whose two nodes belong to different classes is severed, and the number of resulting disjoint subtrees is counted. This test is consistent against similar tests.
Notes
The statistic can be derived as follows [1]:
Let \(x\) be the combined sample formed by stacking an \((n, p)\) sample and an \((m, p)\) sample of the random variable \(X\), and let \(y\) be an \((n+m, 1)\) array of labels \(Y\) indicating which sample each row belongs to. Build a graph in which each point in \(x\) is connected to every other point in \(x\) by an edge weighted by the Euclidean distance between those points. The minimum spanning tree of this graph is then calculated, and every edge whose endpoints carry different labels in \(y\) is removed. The number of disjoint subtrees that remain is the uncorrected statistic for the test.
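The derivation above can be sketched in plain Python. This is an illustrative sketch only, not hyppo's implementation: the helper names `mst_edges` and `count_runs` are hypothetical, Prim's algorithm is used here as one convenient way to build the MST, and only the uncorrected count is computed (no correction or standardization is applied).

```python
import math


def mst_edges(points):
    """Return the MST edges (i, j) of the complete Euclidean graph on `points`.

    Uses Prim's algorithm in O(n^2) time; `points` is a sequence of
    equal-length coordinate tuples.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known connection to the tree
    best_from = [-1] * n        # tree vertex providing that connection
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Attach the vertex outside the tree that is closest to the tree.
        k = min((j for j in range(n) if not in_tree[j]),
                key=lambda j: best_dist[j])
        edges.append((best_from[k], k))
        in_tree[k] = True
        # Relax distances of the remaining vertices through the new vertex.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[k], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = k
    return edges


def count_runs(points, labels):
    """Count the subtrees left after cutting MST edges that join two classes."""
    cut = sum(1 for i, j in mst_edges(points) if labels[i] != labels[j])
    # Removing `cut` edges from a tree leaves `cut + 1` connected pieces.
    return cut + 1


# Two well-separated clusters on a line: only one MST edge crosses classes.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
separated = count_runs(points, [0, 0, 0, 1, 1, 1])    # 2
interleaved = count_runs(points, [0, 1, 0, 1, 0, 1])  # 6
```

Few subtrees indicate that the two samples occupy different regions, while many subtrees indicate that the samples are well mixed.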
The p-value and null distribution for the corrected statistic are calculated via a permutation test using hyppo.tools.perm_test.
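For readers unfamiliar with permutation testing, the general idea can be sketched as follows. This is a generic illustration under stated assumptions and does not reproduce the internals of hyppo.tools.perm_test: the function `permutation_pvalue`, the toy statistic `mean_gap`, and the choice to count permuted statistics at least as large as the observed one are all assumptions made for this example.

```python
import random


def permutation_pvalue(stat_fn, x, y, reps=1000, seed=None):
    """Estimate a p-value by recomputing `stat_fn` on shuffled labels.

    Assumes larger statistic values are more extreme. The "+1" terms count
    the observed labelling as one of the permutations, so the estimate is
    never exactly zero.
    """
    rng = random.Random(seed)
    observed = stat_fn(x, y)
    labels = list(y)  # copy so the caller's labels are left untouched
    exceed = 0
    for _ in range(reps):
        rng.shuffle(labels)
        if stat_fn(x, labels) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + reps)


def mean_gap(x, y):
    """Toy statistic (hypothetical): absolute difference of the class means."""
    a = [v for v, lab in zip(x, y) if lab == 0]
    b = [v for v, lab in zip(x, y) if lab == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


x = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
y = [0, 0, 0, 0, 1, 1, 1, 1]
stat, pvalue = permutation_pvalue(mean_gap, x, y, reps=200, seed=0)
```

Fixing `seed` plays the same role here as fixing `random_state` in the test: the same shuffles, and therefore the same p-value, are produced on every run.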
Methods Summary

statistic(x, y)  Helper function that calculates the Friedman Rafsky test statistic.

test(x, y[, reps, workers, random_state])  Calculates the Friedman Rafsky test statistic and p-value.

FriedmanRafsky.statistic(x, y)

Helper function that calculates the Friedman Rafsky test statistic.

 Parameters
x, y (ndarray of float)  Input data matrices. x and y must have the same number of rows. That is, the shapes must be (n, p) and (n, 1), where n is the number of combined samples and p is the number of dimensions. y is the array of labels identifying which of the two samples each row of x belongs to.

 Returns
stat (float)  The computed Friedman Rafsky statistic.

FriedmanRafsky.test(x, y, reps=1000, workers=1, random_state=None)

Calculates the Friedman Rafsky test statistic and p-value.

 Parameters
x, y (ndarray of float)  Input data matrices. x and y must have the same number of rows. That is, the shapes must be (n, p) and (n, 1), where n is the number of combined samples and p is the number of dimensions. y is the array of labels identifying which of the two samples each row of x belongs to.
reps (int, default: 1000)  The number of replications used to estimate the null distribution in the permutation test that computes the p-value.
workers (int, default: 1)  The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the Process.
random_state (int, default: None)  The random state used for permutation testing; set it to a fixed value for reproducibility.

 Returns