## Pretty Joint Plots

In this tutorial, I’ll be using Python to create a neat, customizable joint plot–– inspired by the `jointplot`

graphics found in `Seaborn`

. Joint plots are great for plotting bivariate datasets, as they’re readily legible and provide high information content. Here, I’ll be creating a KDE joint plot with a modern color palette but a.simple design that resembles an old-school topographical map.

To follow along, you’ll need a working knowledge of statistics as well as `NumPy`

and `SciPy`

in order to generate a dataset; `matplotlib`

to create the figure itself; and `Seaborn`

to enhance matplotlib’s somewhat bland aesthetics. Note: I’m also using `mpld3`

to generate an interactive plot, but this is completely optional.

## Package Installation

Assuming you’re familiar with `pip`

, we can further simplify installation by installing `conda`

(Although I highly recommend using Anaconda’s python distribution, which comes with `conda`

, `NumPy`

, `SciPy`

and `matplotlib`

pre-installed):

`$ pip install conda`

Once `conda`

is installed, getting `NumPy`

, `SciPy`

, and `matplotlib`

is a easy as:

```
$ conda install numpy
$ conda install scipy
$ conda install matplotlib
```

`conda`

will automatically install the packages and their requirements using pre-compiled binaries, so installation times should be on the order of seconds instead of tens of minutes.

`Seaborn`

is only available through pip, however:

`$ pip install seaborn`

Once these packages have been installed, you can get started!

## Data Generation

If you don’t already have some data lying around to plot, you can easily create some using a bivariate normal distribution with `NumPy`

. First, we’ll need an easy way to generate pseudo-random positive-semidefinite matrices for our distribution’s covariances:

```
import numpy as np
def gen_cov(n):
A = np.random.rand(n, n)
return np.dot(A, A.T)
```

Then, we can define a pair of means and covariance matrices for our distribution:

```
n = 2
m = 2
mu = map(lambda x: np.random.normal(size = x)*np.random.randint(1,11) + np.random.normal(size = x), n*[m])
sigma = map(gen_cov, n*[m])
```

Here, we’ve generated a pair of 2-D means from , where is a random integer pulled from , and and are both random float arrays pulled from . If this all seems cryptic–– don’t worry! These parameters are completely arbitrary, and the point is just that we’re generating an interesting distribution to plot.

To sample 500 points from our bivariate distribution, we can use `numpy.random.multivariate_normal`

:

```
N = 500
data = np.vstack(tuple(map(lambda i: np.random.multivariate_normal(mu[i], sigma[i], N), range(n)))).T
```

So, now we have some data to plot!

## Scatter Plot

Before we go on to making a KDE of our data, let’s just make a joint plot with a scatter plot in the main grid and marginal histograms. First let’s import `Seaborn`

and set our plot style to `'dark'`

```
import seaborn as sns
sns.set_style('dark')
```

We can then set the x- and y-axes limits:

```
xmin, ymin = data.min(axis=1)
xmax, ymax = data.max(axis=1)
```

Although, for aesthetic purposes, we should probably adjust these values so that the all the points are clearly visible. I find that scaling up each axis by a factor of works fairly well:

```
xmax, xmin = tuple(np.array([xmax, xmin]) + 0.25*(xmax - xmin)*np.array([1, -1]))
ymax, ymin = tuple(np.array([ymax, ymin]) + 0.25*(ymax - ymin)*np.array([1, -1]))
```

These values will be useful for making the KDE joint plot as well.

Now, we can use the following code to generate the figure:

```
from matplotlib import pyplot as pp
from matplotlib import gridspec
#Define grid for subplots
gs = gridspec.GridSpec(2, 2, width_ratios=[3, 1], height_ratios = [1, 4])
#Create scatter plot
fig = pp.figure()
ax = pp.subplot(gs[1, 0])
cax = ax.scatter(data[0], data[1], color='darkred', alpha=.4)
#Turn off all axes
_=ax.axis('off')
#Create Y-marginal (right)
axr = pp.subplot(gs[1, 1], sharey=ax, xticks=[], yticks=[],frameon = False, xlim=(0, 1), ylim = (ymin, ymax))
axr.hist(data[1], color = '#5673E0', orientation = 'horizontal', normed = True)
#Create X-marginal (top)
axt = pp.subplot(gs[0,0], sharex=ax,frameon = False, xticks = [], yticks = [], xlim = (xmin, xmax), ylim=(0, 1))
axt.hist(data[0], color = '#5673E0', normed = True)
#Bring the marginals closer to the scatter plot
fig.tight_layout(pad = 1)
```

## Kernel Density Estimation

There are many ways to create a kernel density estimator in Python, but in this tutorial we’ll be using `scipy.stats.gaussian_kde`

:

```
from scipy.stats import gaussian_kde
#KDE for top marginal
kde_X = gaussian_kde(data[0])
#KDE for right marginal
kde_Y = gaussian_kde(data[1])
#KDE for contour plot
kde_XY = gaussian_kde(data)
```

To extract densities from the `gaussian_kde`

objects, we need to create some meshgrids using our max- and min-values from before:

```
# Create two 1D grid with 100 points in each dimension
x = linspace(xmin, xmax, 100)
y = linspace(ymin, ymax, 100)
# Create a regular 2D grid with 100 points in each dimension
xi, yi = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
coords = np.vstack([item.ravel() for item in [xi, yi]])
# Evaluate the KDEs on a 1D grid
dx = kde_X(x) # X-marginal density
dy = kde_Y(y) # Y-marginal density
# Evaluate the KDE on a 2D grid
density = kde_XY(coords).reshape(xi.shape) # Bivariate density
```

## Contour Plot

Now that we have KDEs for both the marginals and the bivariate distribution, we can make our final plot with the following code:

```
#Set style to white
sns.set_style('white')
#Define grid for subplots
gs = gridspec.GridSpec(2, 2, width_ratios=[3, 1], height_ratios=[1, 4])
#Create contour plot
fig = pp.figure()
ax = pp.subplot(gs[1,0])
cax = ax.contourf(density.T, origin = 'lower',
extent = (xmin, xmax, ymin, ymax), aspect = 'auto', cmap = cm.coolwarm) # Filled Contour
ax.contour(density.T, origin = 'lower', extent = (xmin, xmax, ymin, ymax), aspect = 'auto', cmap = cm.bone) # Contour Lines
#Turn off all axes
ax.axis('off')
#Create Y-marginal (right)
axr = subplot(gs[1,1], sharey = ax, xticks = [], yticks = [], frameon = False, xlim = (0, 1.4*dy.max()), ylim=(ymin, ymax))
axr.plot(dy, y, color = 'black')
axr.fill_betweenx(y, 0, dy, alpha = .75, color = '#5673E0')
#Create X-marginal (top)
axt = subplot(gs[0,0], sharex = ax, frameon = False, xticks=[], yticks=[], xlim = (xmin, xmax), ylim=(0, 1.4*dx.max()))
axt.plot(x, dx, color = 'black')
axt.fill_between(x, 0, dx, alpha=.75, color = '#5673E0')
#Bring the marginals closer to the contour plot
fig.tight_layout(pad = 1)
```

And there you have it! If you have any questions, please feel free to leave a comment below.