hyppo.ksample.k_sample_transform(inputs, test_type='normal')

Computes a k-sample transform of the inputs.

For \(k\) groups, this creates two matrices. The first vertically stacks the inputs; to use this function, all inputs must have the same number of dimensions \(p\), but they can have different numbers of samples \(n\). The second is a label matrix that one-hot encodes the groups. The outputs thus have shapes (N, p) and (N, k), where N is the total number of samples. For random-forest-based tests, the label matrix is instead (N, 1), where each entry is a value from 1 to \(k\) identifying the sample's group, repeated according to the number of samples in that group.

  • inputs (list of ndarray) -- A list of the inputs. Each input must have shape (n, p), where n is the number of samples and p is the number of dimensions. n can vary between inputs, but p must be the same for all of them.

  • test_type ({"normal", "rf"}, default: "normal") -- Whether to one-hot encode the group labels ("normal") or use a one-dimensional categorical encoding ("rf").


  • u (ndarray) -- The matrix of concatenated inputs of shape (N, p).

  • v (ndarray) -- The label matrix of shape (N, k) ("normal") or (N, 1) ("rf").
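
The shapes described above can be illustrated with a minimal NumPy sketch. This is not hyppo's implementation: the function name k_sample_transform_sketch and the example arrays are hypothetical, and the sketch follows only what this page states (stacked inputs, one-hot labels for "normal", labels 1 to \(k\) in a single column for "rf"). The library itself may differ in details such as input validation or label numbering.

```python
import numpy as np


def k_sample_transform_sketch(inputs, test_type="normal"):
    """Illustrative sketch of the documented behaviour (hypothetical, not hyppo's code)."""
    k = len(inputs)
    # Stack all groups vertically: shape (N, p)
    u = np.vstack(inputs)
    # Number of samples in each group
    sizes = [x.shape[0] for x in inputs]
    # Group index (0 .. k-1) for every row of u
    groups = np.repeat(np.arange(k), sizes)
    if test_type == "normal":
        # One-hot label matrix: shape (N, k)
        v = np.eye(k)[groups]
    elif test_type == "rf":
        # Single categorical column: shape (N, 1), labels 1 .. k as described above
        v = (groups + 1).reshape(-1, 1)
    else:
        raise ValueError("test_type must be 'normal' or 'rf'")
    return u, v


# Three groups with 5, 7 and 4 samples, all with p = 3 dimensions
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
y = rng.normal(size=(7, 3))
z = rng.normal(size=(4, 3))

u, v = k_sample_transform_sketch([x, y, z])
u_rf, v_rf = k_sample_transform_sketch([x, y, z], test_type="rf")
print(u.shape, v.shape, v_rf.shape)  # (16, 3) (16, 3) (16, 1)
```

Here N = 5 + 7 + 4 = 16 and k = 3, so the one-hot label matrix is (16, 3) and the "rf" label matrix is (16, 1). With the real function, the equivalent call would be hyppo.ksample.k_sample_transform([x, y, z], test_type="rf").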