{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n\n# `K`-Sample Testing\n\nA common problem experienced in research is the `k`-sample testing problem.\nConceptually, it can be described as follows: consider `k` groups of data where each\ngroup received a different treatment. We can ask, are these groups similar to one\nanother or statistically different? More specifically, supposing that each group has\na distribution, are these distributions equivalent to one another, or does at least\none of them differ?\n\nIf you are interested in questions of this mold, this module of the package is for you!\nAll our tests can be found in :mod:`hyppo.ksample`, and are described in\ndetail below. But before that, let's look at the mathematical formulation:\n\nConsider random variables $U_1, U_2, \\ldots, U_k$ with distributions\n$F_{U_1}, F_{U_2}, \\ldots, F_{U_k}$.\nWhen performing `k`-sample testing, we are testing whether or not\nthese distributions are equivalent. That is, we are testing\n\n\\begin{align}H_0 &: F_{U_1} = F_{U_2} = \\cdots = F_{U_k} \\\\\n H_A &: \\exists \\, i \\neq j \\text{ s.t. } F_{U_i} \\neq F_{U_j}\\end{align}\n\nLike all the other tests within hyppo, each method has a :func:`statistic` and\n:func:`test` method. 
The :func:`test` method is the one that returns the test statistic\nand p-value, among other outputs, and is the one used most often in the\nexamples, tutorials, etc.\nThe p-value returned is computed with a permutation test via\n:meth:`hyppo.tools.perm_test` unless otherwise specified.\n\nSpecifics about how the test statistic is calculated for each test in\n:class:`hyppo.ksample` can be found in the docstring of the respective test.\nLet's look at the unique properties of some of these tests:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multivariate Analysis of Variance (MANOVA) and Hotelling\n\n**MANOVA** is the current standard for `k`-sample testing in the literature.\nMore details can be found in :class:`hyppo.ksample.MANOVA`.\n**Hotelling** is the 2-sample case of MANOVA.\nMore details can be found in :class:`hyppo.ksample.Hotelling`.\n\n
:Pros: - Very fast\n - Similar to tests found in scientific literature\n :Cons: - Less accurate than other tests in most situations\n - Assumes data is derived from a multivariate Gaussian\n - Assumes all groups share the same covariance matrix
If you want to use 2-sample MGC, we have added that functionality to SciPy!\n Please see :func:`scipy.stats.multiscale_graphcorr`.
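As a rough illustration of the SciPy route mentioned above, the sketch below runs 2-sample MGC on two synthetic Gaussian samples. It assumes a SciPy version that ships `multiscale_graphcorr` (the function was added in SciPy 1.4); the data, sample sizes, `reps` value, and seeds are made up for the example and are not recommendations.

```python
import numpy as np
from scipy.stats import multiscale_graphcorr

# Synthetic data (an assumption for illustration): two 2-D Gaussian
# samples whose means differ by 1 in every coordinate.
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, size=(30, 2))
y = rng.normal(loc=1.0, size=(30, 2))

# is_twosamp=True tells SciPy to treat x and y as two samples to compare,
# rather than as paired observations to test for independence.
# reps is kept small so the sketch runs quickly; SciPy warns below 1000.
res = multiscale_graphcorr(x, y, reps=200, is_twosamp=True, random_state=1)

pvalue = res.pvalue
print("2-sample MGC p-value:", pvalue)
```

A small p-value suggests the two samples come from different distributions. With so few permutations the estimate is coarse, so a larger `reps` is likely preferable in practice.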
:Pros: - Highly accurate\n - No additional computational complexity\n - Few assumptions about the data (it only needs to be i.i.d.)\n - Has fast implementations (for ``indep_test=\"Dcorr\"`` and\n ``indep_test=\"Hsic\"``)\n :Cons: - Can be a little slower than some of the other tests in the package
:Pros: - Highly accurate\n - Has similar test statistics to the literature\n - Has fast implementations\n :Cons: - Lower power than more computationally complex algorithms
:Pros: - Very fast computation time\n - Faster than current state-of-the-art quadratic-time kernel-based tests\n :Cons: - Heuristic method; checking more frequencies gives more power
:Pros: - Very fast computation time\n - Faster than current state-of-the-art quadratic-time kernel-based tests\n :Cons: - Heuristic method; checking more frequencies gives more power
:Pros: - Very fast computation time\n :Cons: - Lower power than more computationally complex algorithms\n - Inherits the assumptions of the univariate KS test
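To make the permutation-based p-values used throughout this module more concrete, here is a minimal standard-library sketch of a `k`-sample permutation test. It is not hyppo's `perm_test` implementation: the function names `between_group_stat` and `permutation_pvalue` are hypothetical, and the ANOVA-style statistic is an assumption chosen only to keep the example short.

```python
import random
from statistics import mean


def between_group_stat(groups):
    """ANOVA-style statistic: sum of n_g * (group mean - grand mean)^2."""
    pooled = [v for g in groups for v in g]
    grand = mean(pooled)
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups)


def permutation_pvalue(groups, reps=1000, seed=0):
    """Estimate a p-value by shuffling group labels (hypothetical helper)."""
    rng = random.Random(seed)
    observed = between_group_stat(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]

    exceed = 0
    for _ in range(reps):
        # Under H_0 the labels are exchangeable, so reshuffle the pooled
        # data and cut it back into groups of the original sizes.
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pooled[start:start + n])
            start += n
        if between_group_stat(shuffled) >= observed:
            exceed += 1

    # The +1 terms count the observed labelling as one permutation,
    # which keeps the estimated p-value strictly positive.
    return observed, (1 + exceed) / (1 + reps)


# Three univariate groups with clearly separated locations (made-up data).
groups = [
    [0.1 * i for i in range(8)],
    [5 + 0.1 * i for i in range(8)],
    [10 + 0.1 * i for i in range(8)],
]
stat, pvalue = permutation_pvalue(groups, reps=500, seed=42)
print(stat, pvalue)
```

The same recipe works with any statistic that is larger when the groups differ more; only `between_group_stat` would change.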