Standard Deviation

karthik

$$\sigma=\sqrt{\frac{\sum_{i=1}^{N}\left(X_{i}-\mu\right)^{2}}{N}}$$

Standard deviation $\sigma$ is just the square root of variance $\sigma^2$. But what does it mean?

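In simpler words, standard deviation tells you how far the data points typically lie from the mean of the given data. Strictly speaking, it is not the plain average of the differences: it is the square root of the average squared difference between each data point and the mean. Standard deviation is how you measure how varied the data points are in the distribution. But wait, that’s the same thing as variance!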
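To make the formula concrete, here is a minimal sketch that follows it term by term on a small illustrative dataset and compares the result with `np.std`. The variable names (`points`, `mu`, `squared_diffs`, `mean_abs_diff`) are chosen for this example only. It also computes the plain average of the absolute differences, to show that this is a different number from the standard deviation.

```python
import numpy as np

# Small illustrative dataset (values chosen for this sketch)
points = np.array([-3, -2, -1, -1, 0, 1, 1, 2, 3])

# Step 1: the mean (mu in the formula)
mu = points.mean()

# Step 2: squared difference of every point from the mean
squared_diffs = (points - mu) ** 2

# Step 3: average the squared differences over all N points -> variance
variance = squared_diffs.sum() / len(points)

# Step 4: take the square root -> standard deviation
sigma = np.sqrt(variance)

# For comparison: the plain average of the absolute differences
mean_abs_diff = np.abs(points - mu).mean()

print("Variance:", variance)
print("Standard deviation (by hand):", sigma)
print("Standard deviation (np.std):", np.std(points))
print("Mean absolute difference:", mean_abs_diff)
```

For this data the squared differences sum to 30, so the variance is 30/9 and the standard deviation is its square root, roughly 1.83. The mean absolute difference is 14/9, roughly 1.56, which is close to the standard deviation but not equal to it.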

The variance of the dataset sierra, whose nine values range from -300 to 300, is about 33333. But our data points don’t vary that much, do they? The mismatch comes from the units: if our data points had a unit like metres, the variance would be in metres squared. So to make sense of the spread in metres, we use standard deviation.

Standard deviation represents the spread of the data in the same unit as the data itself. This way, we can express the measure of dispersion in a relatable way. Take a look at the output of the following program to see this.

import numpy as np

# Data
sierra_points = np.array([-300, -200, -100, -100, 0, 100, 100, 200, 300])
tango_points = np.array([-3, -2, -1, -1, 0, 1, 1, 2, 3])

# Mean
print("Mean of sierra is: {}.".format(np.mean(sierra_points)))
print("Mean of tango is: {}.".format(np.mean(tango_points)))

# Median
print("Median of sierra is: {}.".format(np.median(sierra_points)))
print("Median of tango is: {}.".format(np.median(tango_points)))

# Variance
print("Variance of sierra is: {}.".format(np.var(sierra_points)))
print("Variance of tango is: {}.".format(np.var(tango_points)))

# Standard deviation
print("Standard deviation of sierra is: {}.".format(np.std(sierra_points)))
print("Standard deviation of tango is: {}.".format(np.std(tango_points)))

The above program gives the following output.

Mean of sierra is: 0.0.
Mean of tango is: 0.0.

Median of sierra is: 0.0.
Median of tango is: 0.0.

Variance of sierra is: 33333.333333333336.
Variance of tango is: 3.3333333333333335.

Standard deviation of sierra is: 182.57418583505537.
Standard deviation of tango is: 1.8257418583505538.

So if sierra were measured in metres, its standard deviation would be about 182.57 metres, while its variance would be about 33333 square metres.
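The two datasets also show how each measure reacts to scale. Every value in sierra is exactly 100 times the matching value in tango, so the short sketch below (the names `std_ratio` and `var_ratio` are introduced only for this illustration) compares the ratios of the two measures.

```python
import numpy as np

# tango, and sierra built as 100 times tango (same values as in the program above)
tango_points = np.array([-3, -2, -1, -1, 0, 1, 1, 2, 3])
sierra_points = 100 * tango_points

# Ratio of the standard deviations and of the variances
std_ratio = np.std(sierra_points) / np.std(tango_points)
var_ratio = np.var(sierra_points) / np.var(tango_points)

print("Standard deviation ratio: {:.1f}".format(std_ratio))
print("Variance ratio: {:.1f}".format(var_ratio))
```

The standard deviation grows by the same factor as the data (about 100), while the variance grows by the square of that factor (about 10000). This is why variance looks so large next to the data it describes.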

Practical application of variance and standard deviation

If both variance and standard deviation measure the spread of the data, you may wonder why we calculate both. As mentioned above, variance is expressed in squared units, so it no longer refers directly to the unit used in the data. Standard deviation, on the other hand, expresses the spread in the data's own unit and so fits the context of the data. In general, variance is used in mathematical derivations, and standard deviation is used where you need a directly interpretable measure of the dispersion. In short, the dispersion is easier to interpret with standard deviation than with variance.
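As one possible illustration of that easier interpretation, the sketch below counts how many sierra values lie within one standard deviation of the mean. The comparison only makes sense because the standard deviation is in the same unit as the data. The names `within` and `count_within` are hypothetical and used only here.

```python
import numpy as np

sierra_points = np.array([-300, -200, -100, -100, 0, 100, 100, 200, 300])

mu = np.mean(sierra_points)
sigma = np.std(sierra_points)

# True for every point whose distance from the mean is at most one standard deviation
within = np.abs(sierra_points - mu) <= sigma
count_within = int(within.sum())

print("{} of {} points lie within {:.1f} units of the mean.".format(
    count_within, sierra_points.size, sigma))
```

Here 5 of the 9 points (-100, -100, 0, 100 and 100) fall inside that band. The same check against the variance, about 33333, would be meaningless, because it compares distances in metres with a quantity in metres squared.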