{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Simulations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we explore\n", "\n", "- The mathematical representations of the simulations\n", "- Plots showing each simulation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mathematical Equations" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simulations for the power curves were generated using the following equations:\n", "\n", "- Linear$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$:\n", "\n", "$$X \\sim {\\mathcal{U} \\left( -1, 1 \\right)}^p,$$\n", "\n", "$$Y = w ^T X + \\kappa \\epsilon.$$\n", "\n", "- Exponential$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$:\n", "\n", "$$X \\sim {\\mathcal{U} \\left( 0, 3 \\right)}^p,$$\n", "\n", "$$Y = \\exp \\left( w ^T X \\right) + 10 \\kappa \\epsilon.$$\n", "\n", "- Cubic$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$:\n", "\n", "$$X \\sim {\\mathcal{U} \\left( -1, 1 \\right)}^p,$$\n", "\n", "$$Y = 128 {\\left( w ^T X - \\frac{1}{3} \\right)}^3 + 48 {\\left( w ^T X - \\frac{1}{3} \\right)}^2 - 12 \\left( w ^T X - \\frac{1}{3} \\right) + 80 \\kappa \\epsilon.$$\n", "\n", "- Joint Normal$\\left ( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}^p$: Let $\\rho = 1/(2p)$, $I_p$ be the identity matrix of size $p \\times p$, $J_p$ be the matrix of ones of size $p \\times p$, and \n", "$\\Sigma =\n", "\\begin{bmatrix}\n", " I_p & \\rho J_p \\\\\n", " \\rho J_p & \\left(1 + 0.5 \\kappa \\right) I_p\\\\\n", "\\end{bmatrix}$.\n", "Then,\n", "\n", "$$\\left( X, Y \\right) \\sim \\mathcal{N} \\left( 0, \\Sigma \\right).$$\n", "\n", "- Step Function$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$:\n", "\n", "$$X \\sim {\\mathcal{U} \\left( -1, 1 \\right)}^p,$$\n", "\n", "$$Y = \\mathcal{I} \\left( w ^T X > 0 \\right) + \\epsilon,$$\n", "\n", "where $\\mathcal{I}$ is the indicator function, 
that is $\\mathcal{I} \\left( z \\right)$ is unity whenever $z$ is true, and $0$ otherwise.\n", "\n", "- Quadratic$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$:\n", "\n", "$$X \\sim {\\mathcal{U} \\left( -1, 1 \\right)}^p,$$\n", "\n", "$$Y = {\\left( w ^T X \\right)}^2 + 0.5 \\kappa \\epsilon.$$\n", "\n", "- W-Shape$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$: For $U \\sim {\\mathcal{U} \\left( -1, 1 \\right)}^p$,\n", "\n", "$$X \\sim {\\mathcal{U} \\left( -1, 1 \\right)}^p,$$\n", "\n", "$$Y = 4 \\left[ {\\left( {\\left( w ^T X \\right)}^2 - \\frac{1}{2} \\right)}^2 + \\frac{w ^T U}{500} \\right] + 0.5 \\kappa \\epsilon.$$\n", "\n", "- Spiral$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$: For $U \\sim \\mathcal{U} \\left( 0, 5 \\right)$, $\\epsilon \\sim \\mathcal{N} \\left( 0, 1 \\right)$,\n", "\n", "$$X_{\\left| d \\right|} = U \\sin \\left(\\pi U \\right) \\cos^d \\left(\\pi U \\right)\\ \\mathrm{for}\\ d = 1, ..., p - 1,$$\n", "\n", "$$X_{\\left| p \\right|} = U \\cos^p \\left(\\pi U \\right),$$\n", "\n", "$$Y = U \\sin \\left( \\pi U \\right) + 0.4 p \\epsilon.$$\n", "\n", "- Uncorrelated Bernoulli$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$: For $U \\sim \\mathcal{B} \\left( 0.5 \\right)$, $\\epsilon_1 \\sim \\mathcal{N} \\left( 0, I_p \\right)$, $\\epsilon_2 \\sim \\mathcal{N} \\left( 0, 1 \\right)$,\n", "\n", "$$X \\sim {\\mathcal{B} \\left( 0.5 \\right)}^p + 0.5 \\epsilon_1,$$\n", "\n", "$$Y = \\left( 2 U - 1 \\right) w ^T X + 0.5 \\epsilon_2.$$\n", "\n", "- Logarithmic$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}^p$: For $\\epsilon \\sim \\mathcal{N} \\left( 0, I_p \\right)$,\n", "\n", "$$X \\sim \\mathcal{N} \\left( 0, I_p \\right),$$\n", "\n", "$$Y_{\\left| d \\right|} = 2 \\log_2 \\left( \\left| X_{\\left| d \\right|} \\right| \\right) + 3 \\kappa \\epsilon_{\\left| d \\right|}\\ \\mathrm{for}\\ d = 1, ..., p.$$\n", "\n", "- Fourth Root$\\left( X, Y \\right) \\in 
\\mathbb{R}^p \\times \\mathbb{R}$:\n", "\n", "$$X \\sim {\\mathcal{U} \\left( -1, 1 \\right)}^p,$$\n", "\n", "$$Y = {\\left| w ^T X \\right|}^{1/4} + \\frac{\\kappa}{4} \\epsilon.$$\n", "\n", "- Sine Period 4$\\pi$ $\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}^p$: For $U \\sim \\mathcal{U} \\left( -1, 1 \\right)$, $V \\sim {\\mathcal{N} \\left( 0, 1 \\right)}^p$, $\\theta = 4 \\pi$,\n", "\n", "$$X_{\\left| d \\right|} = U + 0.02 p V_{\\left| d \\right|}\\ \\mathrm{for}\\ d = 1, ..., p,$$\n", "\n", "$$Y=\\sin(\\theta X)+\\kappa \\epsilon.$$\n", "\n", "- Sine Period 16$\\pi$ $\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}^p$: Same as above except $\\theta = 16 \\pi$ and the noise on $Y$ is changed to $0.5 \\kappa \\epsilon$.\n", "\n", "- Square$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}^p$: For $U \\sim \\mathcal{U} \\left( -1, 1 \\right)$, $V \\sim \\mathcal{U} \\left( -1, 1 \\right)$, $\\epsilon \\sim {\\mathcal{N} \\left( 0, 1 \\right)}^p$, $\\theta = -\\frac{\\pi}{8}$,\n", "\n", "$$X_{\\left| d \\right|} = U \\cos \\left( \\theta \\right) + V \\sin \\left( \\theta \\right) + 0.05 p \\epsilon_{\\left| d \\right|},$$\n", "\n", "$$Y_{\\left| d \\right|} = -U \\sin \\left( \\theta \\right) + V \\cos \\left( \\theta \\right).$$\n", "\n", "- Diamond$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}^p$: Same as above except $\\theta = -\\pi/4$.\n", "\n", "- Two Parabolas$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$: For $\\epsilon \\sim \\mathcal{U} \\left( 0, 1 \\right)$, $U \\sim \\mathcal{B} \\left( 0.5 \\right)$,\n", "\n", "$$X \\sim {\\mathcal{U} \\left( -1, 1 \\right)}^p,$$\n", "\n", "$$Y = \\left( {\\left( w ^T X \\right)}^2 + 2 \\kappa \\epsilon \\right) \\cdot \\left(U - \\frac{1}{2} \\right).$$\n", "\n", "- Circle$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$: For $U \\sim {\\mathcal{U} \\left( -1, 1 \\right)}^p$, $\\epsilon \\sim \\mathcal{N} \\left( 0, I_p 
\\right)$, $r = 1$,\n", "\n", "$$X_{\\left| d \\right|} = r \\left( \\sin \\left( \\pi U_{\\left| d + 1 \\right|} \\right) \\prod \\limits_{j = 1}^d \\cos \\left( \\pi U_{\\left| j \\right|} \\right) + 0.4 \\epsilon_{\\left| d \\right|} \\right)\\ \\mathrm{for}\\ d = 1, ..., p-1,$$\n", "\n", "$$X_{\\left| p \\right|} = r \\left( \\prod \\limits_{j = 1}^p \\cos \\left(\\pi U_{\\left| j \\right|} \\right) + 0.4 \\epsilon_{\\left| p \\right|} \\right),$$\n", "\n", "$$Y_{\\left| d \\right|} = \\sin \\left(\\pi U_{\\left| 1 \\right|} \\right).$$\n", "\n", "- Ellipse$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}^p$: Same as above except $r = 5$.\n", "\n", "- Multiplicative Noise$\\left( x, y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}^p$: For $u \\sim \\mathcal{N} \\left( 0, I_p \\right)$,\n", "\n", "$$x \\sim \\mathcal{N} \\left( 0, I_p \\right),$$\n", "\n", "$$y_{\\left| d \\right|} = u_{\\left| d \\right|} x_{\\left| d \\right|}\\ \\mathrm{for}\\ d = 1, ..., p.$$\n", "\n", "- Multimodal Independence$\\left( X, Y \\right) \\in \\mathbb{R}^p \\times \\mathbb{R}$: For $U \\sim \\mathcal{N} \\left( 0, I_p \\right)$, $V \\sim \\mathcal{N} \\left( 0, I_p \\right)$, $U' \\sim {\\mathcal{B} \\left( 0.5 \\right)}^p$, $V' \\sim {\\mathcal{B} \\left( 0.5 \\right)}^p$,\n", "\n", "$$X = U/3 + 2U' - 1,$$\n", "\n", "$$Y = V/3 + 2V' - 1.$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's import some useful packages and create a function that plots our simulated 1D data. To ensure consistency in these examples, we set the seed:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import matplotlib.pyplot as plt; plt.style.use('classic')\n", "import seaborn as sns; sns.set(style=\"white\")\n", "\n", "from mgcpy.benchmarks.simulations import *\n", "\n", "np.random.seed(12345678)" ] }, { 
"cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def plot_sims(sim_name, sim_func):\n", " \"\"\"\n", " Plots all of the simulations\n", " \"\"\"\n", " if sim_name == 'Sine (16$\\pi$)':\n", " x, y = sim_func(num_samp=1000, num_dim=1, noise=0, period=16*np.pi)\n", " elif sim_name == 'Ellipse':\n", " x, y = sim_func(num_samp=1000, num_dim=1, noise=0, radius=5)\n", " elif sim_name == 'Diamond':\n", " x, y = sim_func(num_samp=1000, num_dim=1, noise=0, period=-np.pi/4)\n", " elif sim_name == 'Multiplicative Noise' or sim_name == 'Multimodal Independence':\n", " x, y = sim_func(num_samp=1000, num_dim=1)\n", " else:\n", " x, y = sim_func(num_samp=1000, num_dim=1, noise=0)\n", " \n", " # Normalize\n", " x = x / np.max(x)\n", " y = y / np.max(y)\n", " \n", " fig = plt.figure(figsize=(8,8))\n", " fig.suptitle(sim_name + \" Simulation\", fontsize=17)\n", " ax = sns.scatterplot(x=x[:,0], y=y[:,0])\n", " ax.set_xlabel('Simulated X', fontsize=15)\n", " ax.set_ylabel('Simulated Y', fontsize=15)\n", " plt.axis('equal')\n", " plt.xticks(fontsize=15)\n", " plt.yticks(fontsize=15)\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Simultions are randomly generated with an $x$ which is $(n \\times d)$ and $y$ which is $(n \\times 1)$ that have 2 required parameters: num_samp or the number of samples, and num_dim or the number of dimensions. Optional parameters can be set based on the documentation. Visualizations of are shown below with and without the noise. Here are all the simulations:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "scrolled": false }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "