mgcpy.independence_tests.utils package

Submodules

mgcpy.independence_tests.utils.compute_distance_matrix module

Common Distance Calculation Matrix

mgcpy.independence_tests.utils.compute_distance_matrix.compute_distance(matrix_X, matrix_Y, _compute_distance)[source]

Computes the distance matrices of the two data matrices passed to an independence test

Parameters
  • matrix_X (2D numpy.array) – is interpreted as a [n*p] data matrix, a matrix with n samples in p dimensions

  • matrix_Y (2D numpy.array) – is interpreted as a [n*q] data matrix, a matrix with n samples in q dimensions

  • _compute_distance (FunctionType or callable()) – the function used to calculate a distance matrix with the specified metric

Returns

a list of two items:

  • matrix_X

    the calculated distance matrix for matrix_X

  • matrix_Y

    the calculated distance matrix for matrix_Y

Return type

list
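As a rough illustration of the contract described above, the sketch below applies one metric callable to both data matrices and returns the two results as a list. The names compute_distance_sketch and euclidean_distance_matrix are hypothetical, and the assumption that the callable takes a single data matrix and returns an [n*n] matrix is not confirmed by this page.

```python
import numpy as np

def euclidean_distance_matrix(data):
    """Hypothetical metric callable: pairwise Euclidean distances, [n*n]."""
    diff = data[:, np.newaxis, :] - data[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def compute_distance_sketch(matrix_X, matrix_Y, metric):
    """Apply the same metric callable to both data matrices (assumed contract)."""
    return [metric(matrix_X), metric(matrix_Y)]

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # n=3 samples, p=2 dimensions
Y = np.array([[1.0], [2.0], [4.0]])                 # n=3 samples, q=1 dimension
dist_X, dist_Y = compute_distance_sketch(X, Y, euclidean_distance_matrix)
```

Both outputs are [n*n] regardless of p and q, which is what lets the later transforms treat the two matrices uniformly.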

mgcpy.independence_tests.utils.distance_transform module

MGC’s Distance Transform Module

mgcpy.independence_tests.utils.distance_transform.center_distance_matrix(ndarray distance_matrix, str base_global_correlation='mgc', is_ranked=True)

Transforms a distance matrix by centering it, according to the specified global correlation to build upon

Parameters
  • distance_matrix (2D numpy.array) – a symmetric distance matrix

  • base_global_correlation (string) – specifies which global correlation to build upon: ‘mgc’, ‘unbiased’, ‘biased’, ‘mantel’, or ‘rank’. Defaults to ‘mgc’.

  • is_ranked (boolean) – specifies whether ranking within a column is computed or not. Defaults to True.

Returns

A dict with the following keys:

  • centered_distance_matrix

    a [n*n] centered distance matrix

  • ranked_distance_matrix

    a [n*n] column-ranked distance matrix

Return type

dictionary
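To give a feel for what centering a distance matrix means, here is a minimal sketch of classical double centering (subtract row and column means, add back the grand mean). This is only one well-known centering scheme; which formula each base_global_correlation option uses in mgcpy is not stated on this page, and double_center is a hypothetical name.

```python
import numpy as np

def double_center(distance_matrix):
    """Classical double centering; an illustration, not mgcpy's own formula."""
    row_means = distance_matrix.mean(axis=1, keepdims=True)
    col_means = distance_matrix.mean(axis=0, keepdims=True)
    grand_mean = distance_matrix.mean()
    return distance_matrix - row_means - col_means + grand_mean

D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 3.0],
              [2.0, 3.0, 0.0]])
centered = double_center(D)  # every row and every column now sums to zero
```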

mgcpy.independence_tests.utils.distance_transform.dense_rank_data(ndarray data_matrix)

Equivalent to scipy.stats.rankdata(x, “dense”), but faster!

Parameters

data_matrix – any data matrix.

Returns

dense ranked data_matrix

Return type

2D numpy.array
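Dense ranking can be sketched with NumPy alone: the inverse indices returned by numpy.unique number the distinct values in ascending order. The helper dense_rank_sketch is hypothetical, and it assumes that ranks are taken over all entries of the matrix and that the input shape is preserved.

```python
import numpy as np

def dense_rank_sketch(data_matrix):
    """Dense ranks (1, 2, 3, ... with no gaps) over all entries of the input."""
    data_matrix = np.asarray(data_matrix)
    # inverse[i] is the position of entry i among the sorted distinct values
    _, inverse = np.unique(data_matrix.ravel(), return_inverse=True)
    return (inverse + 1).reshape(data_matrix.shape)

ranks = dense_rank_sketch(np.array([[10, 30], [20, 10], [30, 30]]))
# ranks is [[1, 3], [2, 1], [3, 3]]
```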

mgcpy.independence_tests.utils.distance_transform.rank_distance_matrix(ndarray distance_matrix)

Ranks the entries within each column in ascending order

For ties, the tied entries share the same rank and no ranks are skipped, e.g. if there are repeating distance entries, the ranks look like 1,2,2,3,3,4,…

Parameters

distance_matrix (2D numpy.array) – a symmetric distance matrix.

Returns

column-wise ranked matrix of distance_matrix

Return type

2D numpy.array
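The column-wise ranking described above can be sketched as follows, ranking each column independently with ties sharing a rank. The name rank_columns_sketch is hypothetical, and starting ranks at 1 is an assumption taken from the 1,2,2,3,3,4 example.

```python
import numpy as np

def rank_columns_sketch(distance_matrix):
    """Rank each column separately; ties share a rank and no rank is skipped."""
    distance_matrix = np.asarray(distance_matrix)
    ranked = np.empty(distance_matrix.shape, dtype=int)
    for j in range(distance_matrix.shape[1]):
        _, inverse = np.unique(distance_matrix[:, j], return_inverse=True)
        ranked[:, j] = inverse + 1
    return ranked

D = np.array([[0.0, 2.0, 2.0],
              [2.0, 0.0, 5.0],
              [2.0, 5.0, 0.0]])
ranked = rank_columns_sketch(D)
# ranked is [[1, 2, 2], [2, 1, 3], [2, 3, 1]]
```

Note that the ranked matrix is generally not symmetric even when the distance matrix is, because each column is ranked on its own.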

mgcpy.independence_tests.utils.distance_transform.transform_distance_matrix(ndarray distance_matrix_A, ndarray distance_matrix_B, str base_global_correlation='mgc', is_ranked=True)

Transforms the distance matrices appropriately, with column-wise ranking if needed.

Parameters
  • distance_matrix_A – first symmetric distance matrix, [n*n]

  • distance_matrix_B – second symmetric distance matrix, [n*n]

  • base_global_correlation (string) – specifies which global correlation to build upon: ‘mgc’, ‘unbiased’, ‘biased’, ‘mantel’, or ‘rank’. Defaults to ‘mgc’.

  • is_ranked (boolean) – specifies whether ranking within a column is computed or not. If base_global_correlation = “rank”, ranking is performed regardless of the value of is_ranked. Defaults to True.

Returns

A dict with the following keys:

  • centered_distance_matrix_A

    a [n*n] centered distance matrix of A

  • centered_distance_matrix_B

    a [n*n] centered distance matrix of B

  • ranked_distance_matrix_A

    a [n*n] column-ranked distance matrix of A

  • ranked_distance_matrix_B

    a [n*n] column-ranked distance matrix of B

Return type

dictionary

Example:

>>> import numpy as np
>>> from scipy.spatial import distance_matrix
>>> from mgcpy.independence_tests.utils.distance_transform import transform_distance_matrix
>>>
>>> X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
>>> Y = np.array([[30, 20, 10], [5, 10, 20], [8, 16, 32]])
>>> X_distance_matrix = distance_matrix(X, X)
>>> Y_distance_matrix = distance_matrix(Y, Y)
>>> transformed_distance_matrix_X_Y = transform_distance_matrix(X_distance_matrix, Y_distance_matrix)

mgcpy.independence_tests.utils.fast_functions module

Common Functions used in Fast Dcorr and Fast MGC

mgcpy.independence_tests.utils.mdmr_functions module

mgcpy.independence_tests.utils.mdmr_functions.check_rank(X)[source]

This function checks if X is rank deficient.

Parameters

X (2D numpy.array) –

is interpreted as:

  • a [n*d] data matrix, a matrix with n samples in d dimensions

Return type

None

Raises

Raises Exception if X matrix is rank deficient.
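A rank check of this kind can be sketched with numpy.linalg.matrix_rank. The helper check_rank_sketch is hypothetical; the exact criterion and error message used by mgcpy are not documented here, so comparing the rank against the smaller matrix dimension is an assumption.

```python
import numpy as np

def check_rank_sketch(X):
    """Raise if X has lower rank than its smaller dimension allows."""
    rank = np.linalg.matrix_rank(X)
    if rank < min(X.shape):
        raise Exception("matrix is rank deficient (rank %d, shape %s)" % (rank, X.shape))

full = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
deficient = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # second column = 2 * first

check_rank_sketch(full)  # returns None without raising
try:
    check_rank_sketch(deficient)
    raised = False
except Exception:
    raised = True
```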

mgcpy.independence_tests.utils.mdmr_functions.hatify(X)[source]

Calculates the “hat” matrix.

Parameters

X (2D numpy.array) –

is interpreted as:

  • a [n*d] data matrix, a matrix with n samples in d dimensions

Returns

returns the hat matrix of the data matrix input.

Return type

2D numpy.array
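For reference, the hat matrix of a design matrix X is H = X (XᵀX)⁻¹ Xᵀ, the projection onto the column space of X. The sketch below computes it through the pseudo-inverse, which equals (XᵀX)⁻¹ Xᵀ when X has full column rank; whether mgcpy uses a plain inverse or a pseudo-inverse is not stated, and hat_matrix_sketch is a hypothetical name.

```python
import numpy as np

def hat_matrix_sketch(X):
    """Projection onto the column space of X: H = X pinv(X)."""
    return X @ np.linalg.pinv(X)

# 4 samples: an intercept column plus one variable
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
H = hat_matrix_sketch(X)
```

A projection matrix is symmetric and idempotent, and its trace equals the rank of X, which gives quick sanity checks on any implementation.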

mgcpy.independence_tests.utils.mdmr_functions.gower_center(Y)[source]

Computes Gower’s centered similarity matrix.

Parameters

Y (2D numpy.array) –

is interpreted as:

  • a [n*n] distance matrix

Returns

returns the Gower-centered similarity matrix of the input matrix.

Return type

2D numpy.array
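Gower's centered matrix is conventionally defined as G = C A C, with A = -D²/2 (element-wise square) and C = I - 11ᵀ/n. The sketch below follows that textbook definition; whether mgcpy expects distances that are already squared is not stated here, so treat the squaring step as an assumption. The name gower_center_sketch is hypothetical.

```python
import numpy as np

def gower_center_sketch(Y):
    """Textbook Gower centering of an [n*n] distance matrix."""
    n = Y.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    A = -0.5 * Y ** 2  # element-wise square of the distances
    return centering @ A @ centering

points = np.array([[0.0], [1.0], [3.0]])
D = np.abs(points - points.T)  # Euclidean distances between 1-D points
G = gower_center_sketch(D)
```

For Euclidean distances, G equals the Gram matrix of the mean-centered points, which is why it can stand in for an outcome cross-product in a regression-style test.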

mgcpy.independence_tests.utils.mdmr_functions.gower_center_many(Ys)[source]

Gower centers each matrix in the input.

Parameters

Ys (2D numpy.array; in practice this function is currently run on a single matrix, so Ys is just a 1D numpy.array) –

is interpreted as:

  • an array of [n^2*1] distance matrices

Returns

returns the Gower-centered similarity matrices of all input matrices.

Return type

2D numpy.array

mgcpy.independence_tests.utils.mdmr_functions.gen_H2_perms(X, predictors, permutation_indexes)[source]

Return H2 for each permutation of X indices, where H2 is the hat matrix minus the hat matrix of the untested columns.

Parameters
  • X (2D numpy.array) –

    is interpreted as:

    • a [n*(d+1)] data matrix: n samples in d dimensions, with a column of ones placed before the data columns

  • predictors (1D numpy.array) –

    is interpreted as:

    • a [1*d] array with the number of each variable in X used as a predictor

  • permutation_indexes (2D numpy.array) –

    is interpreted as:

    • a [(p+1)*n] matrix where p is the number of permutations given in the main code.

    This matrix has p permutations of indexes of the X data.

Returns

a [(p+1)*n^2] array of the flattened H2 matrices for each permutation

Return type

2D numpy.array
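The following sketch shows one way the H2 matrices could be built: for each row of permutation_indexes, permute the rows of the tested columns, then subtract the hat matrix of the untested columns from the hat matrix of the full design. It assumes that predictors holds the column indexes being tested and that only those columns are permuted; neither detail is confirmed by this page, and h2_perms_sketch is a hypothetical name.

```python
import numpy as np

def hat(X):
    """Projection onto the column space of X."""
    return X @ np.linalg.pinv(X)

def h2_perms_sketch(X, predictors, permutation_indexes):
    """One flattened H2 = H_full - H_untested per row of permutation_indexes."""
    n = X.shape[0]
    untested = np.delete(np.arange(X.shape[1]), predictors)
    out = np.empty((permutation_indexes.shape[0], n * n))
    for i, perm in enumerate(permutation_indexes):
        X_perm = X.copy()
        X_perm[:, predictors] = X[perm][:, predictors]  # permute tested columns only
        out[i] = (hat(X_perm) - hat(X_perm[:, untested])).ravel()
    return out

# intercept column, then two variables; column 1 is the tested predictor
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 2.0],
              [1.0, 3.0, 1.0],
              [1.0, 4.0, 5.0]])
predictors = np.array([1])
permutation_indexes = np.array([[0, 1, 2, 3, 4],   # identity: the observed data
                                [4, 3, 2, 1, 0]])  # one permutation
H2s = h2_perms_sketch(X, predictors, permutation_indexes)
```

Each H2 is itself a projection, with trace equal to the number of tested columns when the design has full column rank.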

mgcpy.independence_tests.utils.mdmr_functions.gen_IH_perms(X, predictors, permutation_indexes)[source]

Return I-H where H is the hat matrix and I is the identity matrix.

The function calculates this correctly for multiple predictor tests.

Parameters
  • X (2D numpy.array) –

    is interpreted as:

    • a [n*(d+1)] data matrix: n samples in d dimensions, with a column of ones placed before the data columns

  • predictors (1D numpy.array) –

    is interpreted as:

    • a [1*d] array with the number of each variable in X used as a predictor

  • permutation_indexes (2D numpy.array) –

    is interpreted as:

    • a [(p+1)*n] matrix where p is the number of permutations given in the main code.

    This matrix has p permutations of indexes of the X data.

Returns

a [(p+1)*n^2] array of the flattened I-H matrices for each permutation

Return type

2D numpy.array

mgcpy.independence_tests.utils.mdmr_functions.calc_ftest(Hs, IHs, Gs, m2, nm)[source]

This function calculates the pseudo-F statistic.

Parameters
  • Hs (2D numpy.array) –

    is interpreted as:

    • a [(p+1)*n^2] array with the flattened H2 matrix for each permutation

  • IHs (2D numpy.array) –

    is interpreted as:

    • a [(p+1)*n^2] array with the flattened IH matrix for each permutation

  • Gs (2D numpy.array) –

    is interpreted as:

    • a [n^2*a] array with the flattened Gower-centered distance matrices, where a (the number of matrices) is in practice 1

  • m2 (float) –

    is interpreted as:

    • a float equal to the number of predictors minus the number of tests (which will be 1)

  • nm (float) –

    is interpreted as:

    • a float equal to the number of subjects minus the number of predictors

Returns

a [(p+1)*1] array of F statistics for each permutation

Return type

1D numpy.array
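With flattened inputs, the pseudo-F statistic F = (tr(H2 G) / m2) / (tr((I - H) G) / nm) reduces to two matrix products, because the dot product of two flattened symmetric matrices equals the trace of their product. The sketch below assumes that form; pseudo_f_sketch is a hypothetical name, and the degrees of freedom in the example (m2 = 1, nm = n - 2) are chosen for a one-predictor regression rather than taken from mgcpy.

```python
import numpy as np

def pseudo_f_sketch(Hs, IHs, Gs, m2, nm):
    """Pseudo-F per permutation from flattened H2, I-H and G matrices."""
    explained = Hs @ Gs   # row i holds tr(H2_i G)
    residual = IHs @ Gs   # row i holds tr((I - H_i) G)
    return ((explained / m2) / (residual / nm)).ravel()

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 2.0, 4.0])
n = x.size
X = np.column_stack([np.ones(n), x])     # intercept plus one predictor
H = X @ np.linalg.pinv(X)                # hat matrix of the full design
H2 = H - np.ones((n, n)) / n             # minus hat matrix of the intercept alone
IH = np.eye(n) - H
y_centered = (y - y.mean()).reshape(-1, 1)
G = y_centered @ y_centered.T            # Gower-centered matrix for a 1-D outcome

F = pseudo_f_sketch(H2.reshape(1, -1), IH.reshape(1, -1), G.reshape(-1, 1),
                    m2=1.0, nm=n - 2.0)
```

In this univariate case the result coincides with the ordinary regression F statistic, which is a convenient check that the traces are wired up correctly.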

mgcpy.independence_tests.utils.mdmr_functions.fperms_to_pvals(F_perms)[source]

This function calculates the permutation p-value from the test statistics of all permutations.

Parameters

F_perms (1D numpy.array) –

is interpreted as:

  • a [(p+1)*1] array of F statistics for each permutation

Returns

a float which is the permutation p-value of the F-statistic

Return type

float
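A permutation p-value is typically the fraction of permuted statistics at least as large as the observed one. The sketch below assumes the first entry of F_perms is the statistic for the unpermuted data and that the observed value is not counted among the permutations; mgcpy's exact convention is not given on this page, and permutation_p_value_sketch is a hypothetical name.

```python
import numpy as np

def permutation_p_value_sketch(F_perms):
    """Share of permuted F statistics that reach or exceed the observed one."""
    observed = F_perms[0]   # assumed: index 0 is the unpermuted statistic
    permuted = F_perms[1:]
    return float(np.mean(permuted >= observed))

F_perms = np.array([5.0, 1.0, 6.0, 2.0, 5.0])
p_value = permutation_p_value_sketch(F_perms)
# two of the four permuted values (6.0 and 5.0) are >= 5.0, so p_value is 0.5
```

Some implementations instead use (count + 1) / (p + 1) so that the p-value can never be exactly zero; the choice matters most when p is small.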

Module contents