{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Power" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we explore\n", "\n", "- The concept of power in hypothesis testing\n", "- Power computation in mgcpy\n", "- Comparison of methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Theory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consider,\n", "\n", "$$Z_1, ..., Z_n \\sim F_Z$$ \n", "\n", "We wish to test:\n", "\n", "\\begin{align*}\n", " H_0: F_Z &\\in \\mathcal{F}_0\\\\\n", " H_A: F_Z &\\in \\mathcal{F}_1\n", "\\end{align*}\n", "\n", "For a testing procedure $T$, we define $\\alpha_n$ to be the probability of Type I error. That is,\n", "\n", "$$\\alpha_n(T) = Pr(T \\text{ rejects under } H_0)$$\n", "\n", "Similarly, we define $\\beta_n$ to be the probability of Type II error.\n", "\n", "$$\\beta_n(T) = Pr(T \\text{ fails to reject under } H_A)$$\n", "\n", "Finally, the **power** is defined as:\n", "\n", "$$\\text{Power}(T) = 1 - \\beta_n(T)$$\n", "\n", "or the probability of correctly rejecting the null when the alternative is true. A common desideratum for a testing procedure is to have as high of a power as possible, subject to $\\alpha_n(T) \\leq \\alpha$, where $\\alpha$ is some specified \"significance level\". When many alternatives are possible, power is a property of not only the test, but the particular distribution of the alternative. Implicitly, it depends on the sample size as well." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Power in mgcpy" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from mgcpy.independence_tests.dcorr import DCorr\n", "from mgcpy.independence_tests.rv_corr import RVCorr\n", "from mgcpy.benchmarks.power import power\n", "from mgcpy.benchmarks.simulations import linear_sim" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "mgcpy comes in built with 20 simulation functions that model various types of dependencies that random variables can have (linear, spiral, sinusoidal, etc.) The power function takes an Independence_Test object and function (that takes arguments num_samples, num_dimensions, and noise) to simulate data. Using these, estimates the power of the test under the alternative posed by the simulation. \n", "\n", "We first estimate the power of DCorr and Pearson on linearly related data. Without any noise, we expect this relationship to be perfectly discernable, i.e. a power of 1. For the following simulations we have sample size n = 100 and number of dimensions d = 1." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The power of Pearson's correlation against a linear alternative is: 1.000000\n", "The power of DCorr against a linear alternative is: 1.000000\n" ] } ], "source": [ "dcorr = DCorr()\n", "pearson = RVCorr(which_test = 'pearson')\n", "\n", "p = power(pearson, linear_sim)\n", "q = power(dcorr, linear_sim)\n", "print(\"The power of Pearson's correlation against a linear alternative is: %f\" % p)\n", "print(\"The power of DCorr against a linear alternative is: %f\" % q)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By adding noise, we see a decrease in power of both tests." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The power of Pearson's correlation against a linear alternative is: 0.507000\n", "The power of DCorr against a linear alternative is: 0.439000\n" ] } ], "source": [ "p = power(pearson, linear_sim, noise = 3.0)\n", "q = power(dcorr, linear_sim, noise = 3.0)\n", "print(\"The power of Pearson's correlation against a linear alternative is: %f\" % p)\n", "print(\"The power of DCorr against a linear alternative is: %f\" % q)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we change the simulation to a highly nonlinearly related distribution, such as a spiral, Pearson's correlation is incomporable to DCorr. Similarly, MGC will have high power in this nonlinear setting than even DCorr." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The power of Pearson's correlation against a spiral alternative is: 0.130000\n", "The power of DCorr against a spiral alternative is: 0.304000\n", "The power of MGC against a spiral alternative is: 1.000000\n" ] } ], "source": [ "from mgcpy.independence_tests.mgc import MGC\n", "from mgcpy.benchmarks.simulations import spiral_sim\n", "\n", "mgc = MGC()\n", "\n", "p = power(pearson, spiral_sim)\n", "q = power(dcorr, spiral_sim)\n", "r = power(mgc, spiral_sim)\n", "print(\"The power of Pearson's correlation against a spiral alternative is: %f\" % p)\n", "print(\"The power of DCorr against a spiral alternative is: %f\" % q)\n", "print(\"The power of MGC against a spiral alternative is: %f\" % r)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we present a high-dimensional square shape at low sample size to show the effectiveness of MGC in such a setting." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The power of Pearson's correlation against a square alternative at n = 30 and d = 20 is: 0.056000\n", "The power of DCorr correlation against a square alternative at n = 30 and d = 20 is: 0.040000\n", "The power of MGC correlation against a square alternative at n = 30 and d = 20 is: 0.059000\n" ] } ], "source": [ "from mgcpy.benchmarks.simulations import square_sim\n", "\n", "d = 20\n", "n = 30\n", "\n", "p = power(pearson, square_sim, num_samples = n, noise = 1, num_dimensions = d)\n", "q = power(dcorr, square_sim, num_samples = n, noise = 1, num_dimensions = d)\n", "r = power(mgc, square_sim, num_samples = n, noise = 1, num_dimensions = d)\n", "print(\"The power of Pearson's correlation against a square alternative at n = %d and d = %d is: %f\" % (n, d, p))\n", "print(\"The power of DCorr correlation against a square alternative at n = %d and d = %d is: %f\" % (n, d, q))\n", "print(\"The power of MGC correlation against a square alternative at n = %d and d = %d is: %f\" % (n, d, r))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }