mgcpy.benchmarks package

Submodules

mgcpy.benchmarks.power module

mgcpy.benchmarks.power.power(independence_test, sample_generator, num_samples=100, num_dimensions=1, noise=0.0, repeats=1000, alpha=0.05, simulation_type='')[source]

Estimate the power of an independence test, given a simulator to sample from.

Parameters
  • independence_test (Object(Independence_Test)) – an object whose class inherits from the Independence_Test abstract class

  • sample_generator (FunctionType or callable()) – a function, such as those in mgcpy.benchmarks.simulations, used to generate the simulated data; it is called with the following arguments: num_samples (default 100), num_dimensions (default 1), and noise (default 0)

  • num_samples (int) – the number of samples generated by the simulation (default to 100)

  • num_dimensions (int) – the number of dimensions of the samples generated by the simulation (default to 1)

  • noise (float) – the noise used in simulation (default to 0)

  • repeats (int) – the number of times we generate new samples to estimate the null/alternative distribution (default to 1000)

  • alpha (float) – the type I error level (default to 0.05)

  • simulation_type (string) – specify simulation when necessary (default to empty string)

Return empirical_power

the estimated power

Return type

numpy.float

Example

>>> from mgcpy.benchmarks.power import power
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>> from mgcpy.benchmarks.simulations import circle_sim
>>> mgc = MGC()
>>> mgc_power = power(mgc, circle_sim, num_samples=100, num_dimensions=2, simulation_type='ellipse')
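
As a rough illustration of what an empirical power estimate involves, the sketch below runs a Monte Carlo loop using only the standard library. It is not mgcpy's implementation: `abs_pearson`, `toy_linear_sampler`, and `estimate_power` are hypothetical names, and both the permutation-based null distribution and the quantile cutoff are assumptions made for the example.

```python
import random


def abs_pearson(x, y):
    """Absolute Pearson correlation, used here as a stand-in test statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def toy_linear_sampler(num_samples, noise, rng):
    """Hypothetical sampler: x uniform on [-1, 1], y = x + noise * gaussian."""
    x = [rng.uniform(-1, 1) for _ in range(num_samples)]
    y = [a + noise * rng.gauss(0, 1) for a in x]
    return x, y


def estimate_power(statistic, sampler, num_samples=100, noise=0.0,
                   repeats=1000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    alt_stats, null_stats = [], []
    for _ in range(repeats):
        x, y = sampler(num_samples, noise, rng)
        alt_stats.append(statistic(x, y))
        # Shuffling y breaks any dependence on x, giving a draw from the null.
        y_perm = y[:]
        rng.shuffle(y_perm)
        null_stats.append(statistic(x, y_perm))
    # Critical value: empirical (1 - alpha) quantile of the null statistics.
    null_stats.sort()
    cutoff = null_stats[min(int((1 - alpha) * repeats), repeats - 1)]
    # Power: fraction of alternative statistics beyond the critical value.
    return sum(s > cutoff for s in alt_stats) / repeats


power_estimate = estimate_power(abs_pearson, toy_linear_sampler,
                                num_samples=50, noise=0.1, repeats=200)
```

Larger values of `repeats` reduce the Monte Carlo error of the estimate at the cost of run time, which is the trade-off behind the default of 1000.
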

mgcpy.benchmarks.power.power_given_data(independence_test, simulation_type, data_type='dimension', num_samples=100, num_dimensions=1, repeats=1000, alpha=0.05, additional_params={})[source]

Estimate the power of an independence test given pre-generated data from the MGC-paper repository. Mostly for internal testing purposes.

Parameters
  • independence_test (Object(Independence_Test)) – an object whose class inherits from the Independence_Test abstract class

  • simulation_type (int within [1, 20]) – specify which simulation is used

  • data_type (string, either 'dimension' or 'sample_size') – the pre-generated data is either increasing in dimensions or increasing in sample sizes

  • num_samples (int) – the number of samples generated by the simulation (default to 100)

  • num_dimensions (int) – the number of dimensions of the samples generated by the simulation (default to 1)

  • noise (float) – the noise used in simulation (default to 0)

  • repeats (int) – the number of times we generate new samples to estimate the null/alternative distribution (default to 1000)

  • alpha (float) – the type I error level (default to 0.05)

Return empirical_power

the estimated power

Return type

numpy.float

Example

>>> from mgcpy.benchmarks.power import power_given_data
>>> from mgcpy.independence_tests.mgc.mgc import MGC
>>> mgc = MGC()
>>> mgc_power = power_given_data(mgc, simulation_type=4, num_samples=100, num_dimensions=2)

mgcpy.benchmarks.simulations module

mgcpy.benchmarks.simulations.gen_coeffs(num_dim)[source]

Helper function for generating a linear simulation.

Parameters

num_dim – number of dimensions for the simulation

Returns

a vector of coefficients

mgcpy.benchmarks.simulations.gen_x_unif(num_samp, num_dim, low=-1, high=1)[source]

Helper function for generating n samples of a d-dimensional uniform random vector.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

Returns

uniformly distributed simulated data matrix
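
A minimal standard-library sketch of this kind of helper follows. `uniform_matrix` and its `seed` argument are hypothetical, and the nested-list return type is an assumption made to keep the example dependency-free; the library itself presumably returns an array.

```python
import random


def uniform_matrix(num_samp, num_dim, low=-1, high=1, seed=None):
    """Return num_samp rows, each holding num_dim draws from Uniform(low, high)."""
    rng = random.Random(seed)
    return [[rng.uniform(low, high) for _ in range(num_dim)]
            for _ in range(num_samp)]


# Five samples of a two-dimensional vector, reproducible via the seed.
x = uniform_matrix(num_samp=5, num_dim=2, seed=0)
```
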

mgcpy.benchmarks.simulations.linear_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)[source]

Function for generating a linear simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 1

  • indep – whether to sample x and y independently, defaults to False

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

Returns

the data matrix and a response array
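
To show how the `indep` flag changes the relationship between the data matrix and the response, here is a hedged sketch of a linear simulation. The names `decaying_coeffs` and `toy_linear_sim`, the `1 / (i + 1)` coefficient weighting, and the Gaussian noise term are all assumptions for illustration, not the library's confirmed formula.

```python
import random


def decaying_coeffs(num_dim):
    # Assumed weighting: the i-th dimension gets weight 1 / (i + 1).
    return [1.0 / (i + 1) for i in range(num_dim)]


def toy_linear_sim(num_samp, num_dim, noise=1, indep=False,
                   low=-1, high=1, seed=None):
    rng = random.Random(seed)
    coeffs = decaying_coeffs(num_dim)

    def draw_x():
        return [[rng.uniform(low, high) for _ in range(num_dim)]
                for _ in range(num_samp)]

    x = draw_x()
    # With indep=True the response is built from a fresh, unrelated draw,
    # so x and y carry no dependence.
    x_for_y = draw_x() if indep else x
    y = [sum(c * v for c, v in zip(coeffs, row)) + noise * rng.gauss(0, 1)
         for row in x_for_y]
    return x, y


x, y = toy_linear_sim(num_samp=10, num_dim=3, noise=0, seed=0)
```

With `noise=0` and `indep=False`, each response is exactly the weighted sum of its row, which makes the dependence easy to verify by hand.
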

mgcpy.benchmarks.simulations.exp_sim(num_samp, num_dim, noise=10, indep=False, low=0, high=3)[source]

Function for generating an exponential simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 10

  • indep – whether to sample x and y independently, defaults to False

  • low – the lower limit of the data matrix, defaults to 0

  • high – the upper limit of the data matrix, defaults to 3

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.cub_sim(num_samp, num_dim, noise=80, indep=False, low=-1, high=1, cub_coeff=array([-12, 48, 128]), scale=0.3333333333333333)[source]

Function for generating a cubic simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 80

  • indep – whether to sample x and y independently, defaults to False

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

  • cub_coeff – coefficients of the cubic function, where each value is the coefficient of the corresponding order, defaults to [-12, 48, 128]

  • scale – scaling center of the cubic, defaults to 1/3

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.joint_sim(num_samp, num_dim, noise=0.5)[source]

Function for generating a joint-normal simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 0.5

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.step_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)[source]

Function for generating a step simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 1

  • indep – whether to sample x and y independently, defaults to False

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.quad_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)[source]

Function for generating a quadratic simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 1

  • indep – whether to sample x and y independently, defaults to False

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.w_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1)[source]

Function for generating a w-shaped simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 1

  • indep – whether to sample x and y independently, defaults to False

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.spiral_sim(num_samp, num_dim, noise=0.4, low=0, high=5)[source]

Function for generating a spiral simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 0.4

  • low – the lower limit of the data matrix, defaults to 0

  • high – the upper limit of the data matrix, defaults to 5

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.ubern_sim(num_samp, num_dim, noise=0.5, bern_prob=0.5)[source]

Function for generating an uncorrelated Bernoulli simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 0.5

  • bern_prob – the Bernoulli probability, defaults to 0.5

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.log_sim(num_samp, num_dim, noise=3, indep=False, base=2)[source]

Function for generating a logarithmic simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 3

  • indep – whether to sample x and y independently, defaults to False

  • base – the base of the log, defaults to 2

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.root_sim(num_samp, num_dim, noise=0.25, indep=False, low=-1, high=1, n_root=4)[source]

Function for generating an nth root simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 0.25

  • indep – whether to sample x and y independently, defaults to False

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

  • n_root – the root of the simulation, defaults to 4

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.sin_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=12.566370614359172)[source]

Function for generating a sinusoid simulation.

Note: For producing 4*pi and 16*pi simulations, change the period to the respective value.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 1

  • indep – whether to sample x and y independently, defaults to False

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

  • period – the period of the sine wave, defaults to 4*pi

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.square_sim(num_samp, num_dim, noise=1, indep=False, low=-1, high=1, period=-0.39269908169872414)[source]

Function for generating a square or diamond simulation.

Note: For producing square or diamond simulations, change the period to -pi/8 or -pi/4, respectively.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 1

  • indep – whether to sample x and y independently, defaults to False

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

  • period – the period of the sine and cosine square equation, defaults to -pi/8

Returns

the data matrix and a response array
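
One way to picture the effect of the `period` argument is as a rotation angle applied to a uniformly sampled square. The sketch below is noise-free and one-dimensional; `toy_square_sim` and its `theta` parameter are hypothetical names, and the rotation formula is an assumption about the shape being generated rather than the library's confirmed code.

```python
import math
import random


def toy_square_sim(num_samp, theta=-math.pi / 8, low=-1, high=1, seed=None):
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(num_samp):
        # Draw a point uniformly in the square, then rotate it by theta.
        u, v = rng.uniform(low, high), rng.uniform(low, high)
        x.append(u * math.cos(theta) + v * math.sin(theta))
        y.append(-u * math.sin(theta) + v * math.cos(theta))
    return x, y


# theta = -pi/4 turns the square onto its corner, giving a diamond.
diamond_x, diamond_y = toy_square_sim(200, theta=-math.pi / 4, seed=0)
```

Under this reading, x and y are uncorrelated for either angle, yet the bounded shape makes them dependent, which is what makes the case useful for benchmarking independence tests.
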

mgcpy.benchmarks.simulations.two_parab_sim(num_samp, num_dim, noise=2, low=-1, high=1, prob=0.5)[source]

Function for generating a two parabolas simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 2

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

  • prob – the binomial probability, defaults to 0.5

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.circle_sim(num_samp, num_dim, noise=0.4, low=-1, high=1, radius=1)[source]

Function for generating a circle or ellipse simulation.

Note: For producing circle or ellipse simulations, change the radius to 1 or 5, respectively.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • noise – noise level of the simulation, defaults to 0.4

  • low – the lower limit of the data matrix, defaults to -1

  • high – the upper limit of the data matrix, defaults to 1

  • radius – the radius of the circle or ellipse, defaults to 1

Returns

the data matrix and a response array
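
The role of `radius` can be sketched as a stretch of the horizontal axis: a value of 1 traces a circle, while a larger value traces an ellipse. The example below is one-dimensional, and `toy_circle_sim`, the angle parameterization, and the placement of the noise term are all assumptions for illustration.

```python
import math
import random


def toy_circle_sim(num_samp, noise=0.0, low=-1, high=1, radius=1, seed=None):
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(num_samp):
        # u in [low, high] maps to an angle pi * u around the curve.
        u = rng.uniform(low, high)
        x.append(radius * math.cos(math.pi * u) + noise * rng.gauss(0, 1))
        y.append(math.sin(math.pi * u))
    return x, y


# radius=5 stretches the circle into an ellipse along the x axis.
ellipse_x, ellipse_y = toy_circle_sim(100, noise=0.0, radius=5, seed=0)
```
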

mgcpy.benchmarks.simulations.multi_noise_sim(num_samp, num_dim)[source]

Function for generating a multiplicative noise simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

Returns

the data matrix and a response array

mgcpy.benchmarks.simulations.multi_indep_sim(num_samp, num_dim, prob=0.5, sep1=3, sep2=2)[source]

Function for generating a multimodal independence simulation.

Parameters
  • num_samp – number of samples for the simulation

  • num_dim – number of dimensions for the simulation

  • prob – the binomial probability, defaults to 0.5

  • sep1 – determines the size and separation of clusters, defaults to 3

  • sep2 – determines the size and separation of clusters, defaults to 2

Returns

the data matrix and a response array

Module contents