The three common measures of central tendency are as follows.
- Mean – The average of the given set of values.
- Median – The middle value when the given set of values is arranged in ascending order.
- Mode – The value that occurs most frequently in the given set.
Mean
Mean is quite easy to understand. It’s just the average (the sum of all values divided by the number of those values). Take the following example.
import numpy as np
numbers = np.array([20, 34, 21, 18, 22, 21, 45, 10, 14, 20])
mean = np.mean(numbers)
print(mean)
The output will be 22.5, which is the average of all the values in the array numbers. This is a one-dimensional array. For a two-dimensional array, you’d need to specify the axis along which to calculate the mean. I’ll cover this in a separate crash course soon.
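As a quick preview of the axis idea, here is a minimal sketch. The array grid and its values are made up for illustration; only np.mean and its axis parameter come from NumPy.

```python
import numpy as np

# Hypothetical 2x3 array: two rows of three values each
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

overall = np.mean(grid)              # mean of all six values
per_column = np.mean(grid, axis=0)   # averages down the rows: one mean per column
per_row = np.mean(grid, axis=1)      # averages across the columns: one mean per row

print(overall)      # 3.5
print(per_column)   # [2.5 3.5 4.5]
print(per_row)      # [2. 5.]
```

Leaving out axis averages every element, which is why the one-dimensional example above needed no extra argument.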
Median
Median, unlike mean, is obtained by sorting a list of values by size and picking the middle value.
import numpy as np
numbers = np.array([20, 34, 21, 18, 22, 21, 45, 10, 14, 20])
median = np.median(numbers)
print(median)
The result will be 20.5. This is the middle value you get when you arrange the numbers in ascending order. If the number of values in a list is odd, the middle value is its median. If the count is even, the two middle values are averaged, as in the above example, where the two middle values are 20 and 21.
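To make the odd and even cases concrete, here is a rough sketch that computes the median by hand. The helper manual_median is a hypothetical name used only for illustration; it is not part of NumPy.

```python
import numpy as np

def manual_median(values):
    """Hypothetical helper: sort the values and pick (or average) the middle."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

numbers = [20, 34, 21, 18, 22, 21, 45, 10, 14, 20]

print(manual_median(numbers))       # 20.5 (ten values, so 20 and 21 are averaged)
print(manual_median(numbers[:-1]))  # 21 (nine values, so the fifth one is the median)
print(np.median(numbers))           # 20.5, matching the hand-rolled version
```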
Mode
In a given list, mode is the value that occurs most often.
import numpy as np
from scipy import stats
numbers = np.array([20, 34, 21, 18, 22, 21, 45, 10, 14, 20])
mode = stats.mode(numbers)
print(mode)
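The result will be (array([ 20.]), array([ 2.])), which means 20 is the mode and 2 is the number of times it occurs. The exact printed form depends on your SciPy version; newer releases print a ModeResult object carrying the same mode and count.
Wait! 21 occurs twice too. Well, you’re right there, champ! It is a mode as well. However, stats.mode returns only one value when there is a tie, and that value is the smallest of the tied values.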
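If you want every mode rather than just one, one possible approach (a sketch, not the only way) is to count occurrences with np.unique and keep all values that share the highest count.

```python
import numpy as np

numbers = np.array([20, 34, 21, 18, 22, 21, 45, 10, 14, 20])

# np.unique returns the sorted distinct values and, on request, their counts
values, counts = np.unique(numbers, return_counts=True)

# Keep every value whose count equals the highest count
modes = values[counts == counts.max()]

print(modes)         # [20 21]
print(counts.max())  # 2
```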
While measures of central tendency focus on the middle of the given dataset, they’re not always enough. Many different datasets can have an identical mean, median, or mode. In that case, we’d use a measure of dispersion, which focuses on the span of the entire dataset and summarises how spread out the data is. The common measures of dispersion are variance and standard deviation, and we’ll explore these two in the upcoming lessons.
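To see why central tendency alone can mislead, consider this small sketch. The two arrays tight and wide are invented for illustration; they share the same mean and median but are spread very differently.

```python
import numpy as np

# Two made-up datasets with the same centre but different spread
tight = np.array([9, 10, 10, 10, 11])
wide = np.array([0, 5, 10, 15, 20])

print(np.mean(tight), np.mean(wide))      # 10.0 10.0
print(np.median(tight), np.median(wide))  # 10.0 10.0

# Measures of dispersion tell the two datasets apart
print(tight.max() - tight.min(), wide.max() - wide.min())  # range: 2 20
print(np.var(tight), np.var(wide))        # variance
print(np.std(tight), np.std(wide))        # standard deviation
```

The variance and standard deviation of wide come out much larger than those of tight, even though the mean and median cannot distinguish the two.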