Violin plot algorithm
You can enhance a box plot with a kernel density estimate (KDE) to create a violin plot, showing the distribution of values for each category in more detail.
Gaussian kernel
The KDE is calculated using a Gaussian kernel estimation.
The kernel density estimate
for any point
x is calculated as follows:
where
- h is the bandwidth
- n is the number of bins
- xi is the ith bin.
Note: The standard derivation
is set to 1.
is set to 1.
Bandwidth - Silverman's rule of thumb
The bandwidth is calculated using Silverman's rule of thumb, also known as the normal distribution approximation or Gaussian approximation. It minimizes the mean integrated squared error.
The bandwidth h is calculated as follows:
where
is the standard deviation
- IQR is the interquartile range
- n is the sample size.
Parent topic: Creating a violin plot