# Kendall’s $$\tau$$

In this tutorial, we explore

• The theory behind the Kendall test statistic and p-value

• The features of the implementation

## Theory

The following description is adapted from [1]:

To formulate Kendall [2], define the pair $$(x_i, y_i)$$ and $$(x_j, y_j)$$ as concordant if the ranks agree: either $$x_i > x_j$$ and $$y_i > y_j$$, or $$x_i < x_j$$ and $$y_i < y_j$$. The pair is discordant if the ranks disagree: either $$x_i > x_j$$ and $$y_i < y_j$$, or $$x_i < x_j$$ and $$y_i > y_j$$. If $$x_i = x_j$$ or $$y_i = y_j$$, the pair is tied and counts as neither concordant nor discordant. Let $$n_c$$ and $$n_d$$ be the numbers of concordant and discordant pairs, respectively, and let $$n_0 = n (n - 1) / 2$$ be the total number of pairs among $$n$$ samples. In the case of no ties, the test statistic is defined as

$\mathrm{Kendall}_n = \frac{n_c - n_d}{n_0}.$

To account for ties, further define

$n_1 = \sum_i \frac{t_i (t_i - 1)}{2},$
$n_2 = \sum_j \frac{u_j (u_j - 1)}{2},$

where $$t_i$$ is the number of tied values in the $$i$$th group of ties in the first quantity and $$u_j$$ is the number of tied values in the $$j$$th group of ties in the second quantity.

When ties are present, the statistic is calculated as in [3]:

$\mathrm{Kendall}_n = \frac{n_c - n_d}{\sqrt{(n_0 - n_1) (n_0 - n_2)}}.$

This implementation wraps scipy.stats.kendalltau [4] to conform to the mgcpy API.
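
To make the pair-counting definitions concrete, the following is a minimal sketch in plain Python that evaluates the tie-corrected formula directly. The function name `kendall_tau_b` is hypothetical and is not part of mgcpy or SciPy; the brute-force loop over all pairs is chosen for readability and is not how the wrapped SciPy routine is implemented.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Tie-corrected Kendall statistic by brute-force pair counting (illustrative only)."""
    n = len(x)
    n_c = n_d = 0
    for (x_i, y_i), (x_j, y_j) in combinations(zip(x, y), 2):
        sign = (x_i - x_j) * (y_i - y_j)
        if sign > 0:
            n_c += 1  # ranks agree: concordant pair
        elif sign < 0:
            n_d += 1  # ranks disagree: discordant pair
        # sign == 0 means a tie in x or y; the pair counts toward neither total

    n_0 = n * (n - 1) // 2
    # Each group of t tied values contributes t (t - 1) / 2 tied pairs
    n_1 = sum(t * (t - 1) // 2 for t in Counter(x).values())
    n_2 = sum(u * (u - 1) // 2 for u in Counter(y).values())
    return (n_c - n_d) / sqrt((n_0 - n_1) * (n_0 - n_2))


# No ties: 8 concordant and 2 discordant pairs out of 10
print(kendall_tau_b([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))  # 0.6
# One tied group in each variable: n_c = 4, n_d = 0, n_0 = 6, n_1 = n_2 = 1
print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))  # 0.8
```

With no ties, $$n_1 = n_2 = 0$$ and the expression reduces to $$(n_c - n_d) / n_0$$. The sketch does not guard against a constant input, for which the denominator is zero.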

## Using Kendall’s $$\tau$$

Before delving into function calls, let’s first import some useful functions. To ensure consistency in these examples, we also set the seed:

[1]:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt; plt.style.use('classic')
import seaborn as sns; sns.set(style="white")

from mgcpy.independence_tests.kendall_spearman import KendallSpearman
from mgcpy.benchmarks import simulations as sims

np.random.seed(12345678)


To start, let’s simulate some linear data:

[2]:

x, y = sims.linear_sim(num_samp=100, num_dim=1, noise=0.1)

fig = plt.figure(figsize=(8,8))
fig.suptitle("Linear Simulation", fontsize=17)
ax = sns.scatterplot(x=x[:,0], y=y[:,0])
ax.set_xlabel('Simulated X', fontsize=15)
ax.set_ylabel('Simulated Y', fontsize=15)
plt.axis('equal')
plt.xticks(fontsize=15)
plt.yticks(fontsize=15)
plt.show()


The test statistic and p-value can be computed by creating a KendallSpearman object and calling its test statistic and p-value methods. When creating the object, set the which_test parameter so that the correct test is run (Kendall in this case).

[3]:

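kendall = KendallSpearman(which_test="kendall")
# Each method returns a (value, metadata) pair; the metadata is not needed here
kendall_statistic, _ = kendall.test_statistic(x, y)
p_value, _ = kendall.p_value(x, y)

print("Kendall test statistic:", kendall_statistic)
print("P Value:", p_value)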

Kendall test statistic: 0.8905050505050507
P Value: 2.2897821932369628e-39


Note that Kendall only operates on univariate data.

# Spearman’s $$\rho$$

In this tutorial, we explore

• The theory behind the Spearman test statistic and p-value

• The features of the implementation

## Theory

Spearman and Kendall are closely related because they both operate on univariate ranked data. The following description is adapted from [1]:

Spearman is closely related to Pearson’s product-moment correlation [5]. Suppose that $$\mathrm{rg}_{x_i}$$ and $$\mathrm{rg}_{y_i}$$ are the respective ranks of $$n$$ raw scores $$x_i$$ and $$y_i$$. Let $$\rho$$ denote Pearson’s coefficient applied to the rank variables, let $$\hat{\mathrm{cov}} (\mathrm{rg}_{\mathbf{x}}, \mathrm{rg}_{\mathbf{y}})$$ denote the covariance of the rank variables, and let $$\hat{\sigma}_{\mathrm{rg}_{\mathbf{x}}}$$ and $$\hat{\sigma}_{\mathrm{rg}_{\mathbf{y}}}$$ denote the standard deviations of the rank variables. The statistic is

$\mathrm{Spearman}_s = \rho_{\mathrm{rg}_{\mathbf{x}}, \mathrm{rg}_{\mathbf{y}}} = \frac{\hat{\mathrm{cov}} (\mathrm{rg}_{\mathbf{x}}, \mathrm{rg}_{\mathbf{y}})}{\hat{\sigma}_{\mathrm{rg}_{\mathbf{x}}} \hat{\sigma}_{\mathrm{rg}_{\mathbf{y}}}}.$

This implementation wraps scipy.stats.spearmanr [4] to conform to the mgcpy API.
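
As an illustration of the definition above, here is a minimal sketch in plain Python that ranks both samples and then applies Pearson’s formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical and not part of mgcpy or SciPy, and assigning tied values the mean of their positions is an assumption of this sketch (it is the usual convention for rank correlation).

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their sorted positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # start..end are 0-based positions
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the rank variables (illustrative only)."""
    rg_x, rg_y = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rg_x) / n, sum(rg_y) / n
    # The 1/n (or 1/(n-1)) factors of covariance and variance cancel in the ratio
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rg_x, rg_y))
    var_x = sum((a - mean_x) ** 2 for a in rg_x)
    var_y = sum((b - mean_y) ** 2 for b in rg_y)
    return cov / sqrt(var_x * var_y)


print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]))  # 0.8
```

Because only the ranks enter the formula, any strictly increasing transformation of either variable leaves the statistic unchanged. The sketch does not handle a constant input, for which the rank variance is zero.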

## Using Spearman’s $$\rho$$

The test statistic and p-value can be computed by creating a KendallSpearman object and calling its test statistic and p-value methods. When creating the object, set the which_test parameter so that the correct test is run (Spearman in this case). Using the same linear relationship as before:

[4]:

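spearman = KendallSpearman(which_test="spearman")
spearman_statistic, _ = spearman.test_statistic(x, y)
print("Spearman test statistic:", spearman_statistic)

Spearman test statistic: 0.982214221422142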