mgcpy.independence_tests.mgc_utils package

Submodules

mgcpy.independence_tests.mgc_utils.local_correlation module

MGC’s Local Correlation Module

mgcpy.independence_tests.mgc_utils.local_correlation.local_correlations(ndarray matrix_A, ndarray matrix_B, distance_metric='euclidean', base_global_correlation='mgc')

Computes all the local correlation coefficients in O(n^2 log n) time.

Parameters
  • matrix_A (2D numpy.array) –

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*d] data matrix, a matrix with n samples in d dimensions

  • matrix_B (2D numpy.array) –

    is interpreted as either:

    • a [n*n] distance matrix, a square matrix with zeros on diagonal for n samples OR

    • a [n*d] data matrix, a matrix with n samples in d dimensions

  • distance_metric (string) – the metric used to compute the distance matrix from a data matrix; defaults to ‘euclidean’.

  • base_global_correlation (string) – the global correlation to build upon: one of ‘mgc’, ‘dcor’, ‘mantel’, or ‘rank’. Defaults to ‘mgc’.

Returns

A dict with the following keys:

  • local_correlation_matrix

    a 2D matrix of all local correlations within [-1,1]

  • local_variance_A

    all local variances of A

  • local_variance_B

    all local variances of B

Return type

dictionary

Example:

>>> import numpy as np
>>> from mgcpy.independence_tests.mgc_utils.local_correlation import local_correlations
>>>
>>> X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])
>>> Y = np.array([[30, 20, 10], [5, 10, 20], [8, 16, 32]])
>>> result = local_correlations(X, Y)
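
The two accepted input forms, a [n*d] data matrix and a [n*n] distance matrix, can be related with a short sketch. It assumes that the ‘euclidean’ metric means ordinary pairwise Euclidean distance and uses scipy.spatial.distance_matrix purely for illustration; how local_correlations builds its distance matrices internally is not confirmed here.

```python
import numpy as np
from scipy.spatial import distance_matrix

# Data matrix: n = 3 samples in d = 3 dimensions.
X = np.array([[2, 1, 100], [4, 2, 10], [8, 3, 10]])

# Pairwise Euclidean distances between the rows of X, shape [n*n].
D = distance_matrix(X, X)

print(D.shape)                     # (3, 3)
print(np.allclose(np.diag(D), 0))  # True: zeros on the diagonal
print(np.allclose(D, D.T))         # True: symmetric
```

A matrix like D matches the "square matrix with zeros on diagonal" form described under Parameters.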

mgcpy.independence_tests.mgc_utils.local_correlation.local_covariance(ndarray distance_matrix_A, ndarray distance_matrix_B, ndarray ranked_distance_matrix_A, ndarray ranked_distance_matrix_B)

Computes all local covariances simultaneously in O(n^2).

Parameters
  • distance_matrix_A (2D numpy.array) – first distance matrix (centered or appropriately transformed), [n*n]

  • distance_matrix_B (2D numpy.array) – second distance matrix (centered or appropriately transformed), [n*n]

  • ranked_distance_matrix_A (2D numpy.array) – column-wise ranked matrix of A, [n*n]

  • ranked_distance_matrix_B (2D numpy.array) – column-wise ranked matrix of B, [n*n]

Returns

matrix of all local covariances, [n*n]

Return type

2D numpy.array
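
One way to see how all [n*n] rank-restricted sums can be obtained in O(n^2) is the sketch below. It is not the library's implementation: it omits any centering or normalization, assumes 0-based integer ranks, and the helper names columnwise_ranks and cumulative_rank_sums are hypothetical. Each product A[i, j] * B[i, j] is added to the bucket of its own rank pair, and a 2D cumulative sum then yields the totals over all pairs with ranks at most (k, l).

```python
import numpy as np

def columnwise_ranks(D):
    """0-based rank of each entry within its column (ties broken by position)."""
    return D.argsort(axis=0, kind="stable").argsort(axis=0)

def cumulative_rank_sums(A, B, RA, RB):
    """S[k, l] = sum of A[i, j] * B[i, j] over pairs with RA[i, j] <= k and RB[i, j] <= l."""
    n = A.shape[0]
    buckets = np.zeros((n, n))
    # Accumulate each product into the bucket of its own rank pair: O(n^2).
    np.add.at(buckets, (RA.ravel(), RB.ravel()), (A * B).ravel())
    # A 2D prefix sum turns "rank == (k, l)" totals into "rank <= (k, l)" totals: O(n^2).
    return buckets.cumsum(axis=0).cumsum(axis=1)

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
B = rng.normal(size=(5, 5))
RA, RB = columnwise_ranks(A), columnwise_ranks(B)
S = cumulative_rank_sums(A, B, RA, RB)

# Brute-force check of a single entry.
k, l = 2, 3
mask = (RA <= k) & (RB <= l)
print(np.isclose(S[k, l], (A * B)[mask].sum()))  # True
```

Computing each of the n^2 entries by brute force would cost O(n^2) per entry; the bucket-and-prefix-sum approach touches every pair only once.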

mgcpy.independence_tests.mgc_utils.threshold_smooth module

MGC’s Sample Statistic Module

mgcpy.independence_tests.mgc_utils.threshold_smooth.threshold_local_correlations(local_correlation_matrix, sample_size)

Finds a connected region of significance in the local correlation map by thresholding.

Parameters
  • local_correlation_matrix (2D numpy.array) – all local correlations within [-1,1], [m*n]

  • sample_size (integer) – the sample size of the original data (which may differ from m or n when the data contain repeated values).

Returns

a binary matrix of size [m*n], with 1’s indicating the significant region.

Return type

2D numpy.array
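
The general idea of thresholding followed by a connected-region search can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper largest_region_above is hypothetical, and it takes the cutoff directly as an argument rather than deriving it from sample_size, since the derivation is not described here. Connectivity is scipy.ndimage.label's default (horizontal and vertical neighbours).

```python
import numpy as np
from scipy import ndimage

def largest_region_above(local_corr, cutoff):
    """Binary mask of the largest connected set of entries exceeding cutoff."""
    above = local_corr > cutoff
    labels, num = ndimage.label(above)       # label 0 is the background
    if num == 0:
        return np.zeros(local_corr.shape, dtype=int)
    sizes = np.bincount(labels.ravel())[1:]  # region sizes, background skipped
    biggest = 1 + int(np.argmax(sizes))
    return (labels == biggest).astype(int)

C = np.array([[0.9, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.6, 0.7],
              [0.1, 0.1, 0.8, 0.1]])

R = largest_region_above(C, cutoff=0.5)
print(R)
# [[0 0 0 0]
#  [0 0 1 1]
#  [0 0 1 0]]
```

The isolated 0.9 in the corner exceeds the cutoff but is discarded because it does not belong to the largest connected region.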

mgcpy.independence_tests.mgc_utils.threshold_smooth.smooth_significant_local_correlations(significant_connected_region, local_correlation_matrix)

Finds the smoothed maximum within the significant region R:

  • If the area of R is too small, returns the last local correlation.

  • Otherwise, returns the maximum within significant_connected_region.

Parameters
  • significant_connected_region (2D numpy.array) – a binary matrix of size [m*n], with 1’s indicating the significant region.

  • local_correlation_matrix (2D numpy.array) – all local correlations within [-1,1], [m*n]

Returns

A dict with the following keys:

  • mgc_statistic

    the sample MGC statistic within [-1, 1]

  • optimal_scale

    the estimated optimal scale as an [x, y] pair.

Return type

dictionary
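
The two-branch rule described above can be sketched as below. The helper smooth_max and its min_area parameter are hypothetical: the documentation does not say how small "too small" is, so the cutoff is passed in explicitly. Reporting the scale as a 1-based [x, y] pair is likewise an assumption of this sketch.

```python
import numpy as np

def smooth_max(region, local_corr, min_area):
    """Maximum of local_corr inside region, or the last entry if region is too small."""
    m, n = local_corr.shape
    stat, scale = local_corr[m - 1, n - 1], [m, n]  # fallback: last local correlation
    if region.sum() >= min_area:
        # Hide entries outside the region so argmax only sees the region.
        masked = np.where(region == 1, local_corr, -np.inf)
        k, l = np.unravel_index(np.argmax(masked), masked.shape)
        stat, scale = masked[k, l], [int(k) + 1, int(l) + 1]
    return {"mgc_statistic": float(stat), "optimal_scale": scale}

C = np.array([[0.9, 0.1, 0.1, 0.1],
              [0.1, 0.1, 0.6, 0.7],
              [0.1, 0.1, 0.8, 0.1]])
R = np.array([[0, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]])

print(smooth_max(R, C, min_area=2))   # {'mgc_statistic': 0.8, 'optimal_scale': [3, 3]}
print(smooth_max(R, C, min_area=10))  # {'mgc_statistic': 0.1, 'optimal_scale': [3, 4]}
```

With min_area=2 the region of three entries is large enough, so its maximum is returned; with min_area=10 the sketch falls back to the last entry of the matrix.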

Module contents