# Pearson’s Product-Moment Correlation

In this tutorial, we explore:

- The theory behind the Pearson test statistic and p-value
- The features of the implementation

## Theory

The following description is adapted from [1]:

Pearson’s product-moment correlation is a measure of the linear correlation between two univariate random variables [2]. Given sample data \(\mathbf{x}\) and \(\mathbf{y}\), the sample Pearson correlation is

\[\hat{r} \left( \mathbf{x}, \mathbf{y} \right) = \frac{\hat{\mathrm{cov}} \left( \mathbf{x}, \mathbf{y} \right)}{\hat{\sigma}_{\mathbf{x}} \hat{\sigma}_{\mathbf{y}}},\]

where \(\hat{\mathrm{cov}} \left( \mathbf{x}, \mathbf{y} \right)\) is the sample covariance, and \(\hat{\sigma}_{\mathbf{x}}\) and \(\hat{\sigma}_{\mathbf{y}}\) are the sample standard deviations of \(\mathbf{x}\) and \(\mathbf{y}\), respectively.

This implementation wraps `scipy.stats.pearsonr` [3] to conform to the `mgcpy` API.
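
To make the definition concrete, here is a minimal sketch (on toy data generated here, separate from the examples below) that computes the sample correlation directly from the covariance and the standard deviations, and checks it against `scipy.stats.pearsonr`:

```
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
x_toy = rng.normal(size=100)
y_toy = 0.5 * x_toy + rng.normal(scale=0.5, size=100)

# Sample Pearson correlation from the definition above:
# covariance divided by the product of the standard deviations.
r_manual = np.cov(x_toy, y_toy)[0, 1] / (
    np.std(x_toy, ddof=1) * np.std(y_toy, ddof=1)
)

# The same quantity via scipy.stats.pearsonr, which this class wraps.
r_scipy, _ = stats.pearsonr(x_toy, y_toy)

print(np.isclose(r_manual, r_scipy))  # True
```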

## Using Pearson’s

Before delving straight into function calls, let’s first import some useful functions. To ensure consistency in these examples, we set the random seed:

```
[1]:
```

```
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt; plt.style.use('classic')
import seaborn as sns; sns.set(style="white")
from mgcpy.independence_tests.rv_corr import RVCorr
from mgcpy.benchmarks import simulations as sims
np.random.seed(12345678)
```

To start, let’s simulate some linear data:

```
[2]:
```

```
x, y = sims.linear_sim(num_samp=100, num_dim=1, noise=0.1)
fig = plt.figure(figsize=(8,8))
fig.suptitle("Linear Simulation", fontsize=17)
ax = sns.scatterplot(x=x[:,0], y=y[:,0])
ax.set_xlabel('Simulated X', fontsize=15)
ax.set_ylabel('Simulated Y', fontsize=15)
plt.axis('equal')
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.show()
```

The test statistic and p-value can be computed by creating the `RVCorr` object and calling the corresponding test statistic and p-value methods. When creating the object, it is necessary to set the `which_test` parameter so that the correct test is run (Pearson in this case).

```
[3]:
```

```
pearson = RVCorr(which_test="pearson")
pearson_statistic, independence_test_metadata = pearson.test_statistic(x, y)
p_value, _ = pearson.p_value(x, y)
print("Pearson test statistic:", pearson_statistic)
print("P Value:", p_value)
```

```
Pearson test statistic: 0.9863824325214345
P Value: 1.2218301635032126e-78
```

Covariance is also returned in the metadata. **Note that Pearson only operates on univariate data.**
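
The exact keys of the metadata dictionary are an implementation detail of `mgcpy`, so rather than assume where the covariance lives, the simplest check is to print the whole dictionary:

```
# Inspect the metadata returned alongside the statistic; the exact
# keys (e.g. where the covariance is stored) are an mgcpy
# implementation detail, so we print the whole dictionary.
print(independence_test_metadata)
```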

# RV

In this tutorial, we explore:

- The theory behind the RV test statistic and p-value
- The features of the implementation

## Theory

The following description is adapted from [1]:

RV is a multivariate generalization of the squared Pearson’s coefficient [4, 5]. The derivation is as follows: assuming each column in \(\mathbf{x}\) and \(\mathbf{y}\) is pre-centered to zero mean in each dimension, the sample covariance matrix is \(\hat{\mathbf{\Sigma}}_{\mathbf{x} \mathbf{y}} = \mathbf{x}^T \mathbf{y}\), and the RV coefficient is

\[\mathrm{RV} \left( \mathbf{x}, \mathbf{y} \right) = \frac{\mathrm{tr} \left( \hat{\mathbf{\Sigma}}_{\mathbf{x} \mathbf{y}} \hat{\mathbf{\Sigma}}_{\mathbf{y} \mathbf{x}} \right)}{\sqrt{\mathrm{tr} \left( \hat{\mathbf{\Sigma}}_{\mathbf{x} \mathbf{x}}^2 \right) \mathrm{tr} \left( \hat{\mathbf{\Sigma}}_{\mathbf{y} \mathbf{y}}^2 \right)}}.\]
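
As an illustration of the trace formula (a minimal NumPy sketch on toy data, not mgcpy’s implementation):

```
import numpy as np

rng = np.random.RandomState(0)
x_toy = rng.normal(size=(100, 3))
y_toy = x_toy @ rng.normal(size=(3, 3)) + 0.1 * rng.normal(size=(100, 3))

# Pre-center each column, per the derivation above.
xc = x_toy - x_toy.mean(axis=0)
yc = y_toy - y_toy.mean(axis=0)

# Sample covariance matrices (up to a constant factor, which cancels).
sxy = xc.T @ yc
sxx = xc.T @ xc
syy = yc.T @ yc

# RV coefficient via the trace formula; note sxy @ sxy.T == sxy @ syx.
rv_coef = np.trace(sxy @ sxy.T) / np.sqrt(
    np.trace(sxx @ sxx) * np.trace(syy @ syy)
)
print(rv_coef)
```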

The p-value is then calculated using a standard permutation test.
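
For intuition, here is a generic sketch of such a permutation test (a simplified stand-in, not mgcpy’s internal code): shuffle the rows of one sample to break any dependence, recompute the statistic under each shuffle, and report the fraction of shuffled statistics at least as extreme as the observed one.

```
import numpy as np

def permutation_p_value(x, y, statistic, num_perms=1000, seed=0):
    """Generic permutation p-value; `statistic` maps (x, y) to a scalar."""
    rng = np.random.RandomState(seed)
    observed = statistic(x, y)
    null_stats = np.array([
        statistic(x, y[rng.permutation(len(y))])  # shuffle the rows of y
        for _ in range(num_perms)
    ])
    # Add-one smoothing keeps the estimated p-value strictly positive.
    return (1 + np.sum(null_stats >= observed)) / (1 + num_perms)
```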

## Using RV

Let’s use a multivariate simulation this time:

```
[4]:
```

```
x, y = sims.linear_sim(num_samp=100, num_dim=3, noise=0.1)
```

The test statistic and p-value can be computed by creating the `RVCorr` object and calling the corresponding test statistic and p-value methods. When creating the object, it is necessary to set the `which_test` parameter so that the correct test is run (RV in this case).

```
[5]:
```

```
rv = RVCorr(which_test="rv")
rv_statistic, independence_test_metadata = rv.test_statistic(x, y)
p_value, _ = rv.p_value(x, y)
print("Pearson test statistic:", rv_statistic)
print("P Value:", p_value)
```

```
RV test statistic: 0.5019467309442741
P Value: 0.001
```

Covariance is also returned in the metadata. The p-value is bounded below by the reciprocal of the number of permutations (in this case, 1/1000 = 0.001): because the null distribution is estimated via permutation, this is the smallest p-value the test can resolve. As in most of the other tests that approximate the p-value by permutation, the `replication_factor` parameter can be set to the desired number of permutations.
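
For example, to tighten that bound to 1/10000 (a sketch assuming `replication_factor` is accepted as a keyword argument of `p_value`; check the mgcpy documentation for the exact signature):

```
# Assumed usage: raising the number of permutations to 10000.
# Verify against the mgcpy docs; the exact entry point may differ.
p_value, _ = rv.p_value(x, y, replication_factor=10000)
print("P Value:", p_value)
```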

# Canonical Correlation Analysis (CCA)

In this tutorial, we explore:

- The theory behind the CCA test statistic and p-value
- The features of the implementation

## Theory

The following description is adapted from [1]:

CCA finds the linear combinations with respect to the dimensions of \(\mathbf{x}\) and \(\mathbf{y}\) that maximize their correlation (Hardoon et al., 2004). It seeks vectors \(\mathbf{a} \in {\mathbb{R}}^p\) and \(\mathbf{b} \in {\mathbb{R}}^q\) and computes the first correlation coefficient as

\[\mathrm{CCA} \left( \mathbf{x}, \mathbf{y} \right) = \max_{\mathbf{a} \in {\mathbb{R}}^p, \, \mathbf{b} \in {\mathbb{R}}^q} \frac{\mathbf{a}^T \hat{\mathbf{\Sigma}}_{\mathbf{x} \mathbf{y}} \mathbf{b}}{\sqrt{\mathbf{a}^T \hat{\mathbf{\Sigma}}_{\mathbf{x} \mathbf{x}} \mathbf{a}} \sqrt{\mathbf{b}^T \hat{\mathbf{\Sigma}}_{\mathbf{y} \mathbf{y}} \mathbf{b}}}.\]
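
To illustrate the definition (a sketch using scikit-learn’s `CCA` purely for demonstration on toy data; mgcpy does not necessarily compute it this way):

```
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.RandomState(0)
x_toy = rng.normal(size=(100, 3))
y_toy = x_toy @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(100, 2))

# Fit a one-component CCA and correlate the projected scores;
# this is the first canonical correlation from the definition above.
cca = CCA(n_components=1)
x_scores, y_scores = cca.fit_transform(x_toy, y_toy)
print(np.corrcoef(x_scores[:, 0], y_scores[:, 0])[0, 1])
```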

The p-value is then calculated using a standard permutation test.

## Using CCA

The test statistic and p-value can be computed by creating the `RVCorr` object and calling the corresponding test statistic and p-value methods. When creating the object, it is necessary to set the `which_test` parameter so that the correct test is run (CCA in this case). Using the same linear relationship as before:

```
[6]:
```

```
cca = RVCorr(which_test="cca")
cca_statistic, independence_test_metadata = cca.test_statistic(x, y)
p_value, _ = cca.p_value(x, y)
print("Pearson test statistic:", cca_statistic)
print("P Value:", p_value)
```

```
CCA test statistic: 0.5019467309442741
P Value: 0.001
```

Covariance is also returned in the metadata. As with RV, the p-value is bounded below by the reciprocal of the number of permutations (here 0.001 with 1000 permutations), and the `replication_factor` parameter can be set to the desired number of permutations, as shown above.