Research Basic: Mean
What is the Mean?
The mean is the sum of all the data values divided by the number of values.
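Written as a formula, the definition looks like this. The symbols are assumptions introduced here for illustration: n is the number of data values, x_i is the i-th value, and x-bar is the mean.

```latex
\[
  \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}
\]
```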
How to find the mean
Suppose we have a sample of 10 individuals, and we've recorded their annual purchase quantities of product brand X as follows:
| Individual     | A | B | C | D | E | F | G | H | I | J |
| Units per year | 3 | 4 | 6 | 4 | 5 | 3 | 2 | 4 | 5 | 6 |
The average annual purchase quantity for product brand X can be calculated as follows:
Average = (3+4+6+4+5+3+2+4+5+6) / 10 = 42 / 10 = 4.2 units
In Excel, this can be calculated with the AVERAGE function: "=AVERAGE(range)", where range is the group of cells that holds the data.
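For readers working outside Excel, the same calculation can be sketched in Python. This is a minimal example using only the standard library; the variable names are chosen here for illustration.

```python
from statistics import mean

# Annual purchase quantities for individuals A through J (values from the table).
purchases = {"A": 3, "B": 4, "C": 6, "D": 4, "E": 5,
             "F": 3, "G": 2, "H": 4, "I": 5, "J": 6}

values = list(purchases.values())

# Manual calculation: sum of the values divided by the number of values.
manual_mean = sum(values) / len(values)

# The standard library gives the same result.
library_mean = mean(values)

print(manual_mean)   # 4.2
print(library_mean)  # 4.2
```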
Notes on the use of averages
While the mean is a useful tool for summarizing data, it is sensitive to extreme values, or outliers. For instance, in our previous example, suppose that individual J had purchased an unusually high quantity of 40 units in a year instead of 6.
| Individual     | A | B | C | D | E | F | G | H | I | J  |
| Units per year | 3 | 4 | 6 | 4 | 5 | 3 | 2 | 4 | 5 | 40 |
In this case, the average annual purchase quantity of product brand X would be calculated as follows:
Average = (3+4+6+4+5+3+2+4+5+40) / 10 = 76 / 10 = 7.6 units
However, if we look at the data again, we can see that only individual J purchased 7 or more units annually. The other 9 individuals had purchase quantities between 2 and 6 units, so 7.6 units is a poor description of what a typical individual in this group buys. In such cases, it might be more appropriate to use a different measure of central tendency, such as the median or mode, to summarize the data. Alternatively, if we suspect that there might be an error or anomaly in the data, we could consider using a trimmed mean, which is calculated by excluding a certain percentage of the highest and lowest values.
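To make the comparison concrete, here is a small Python sketch that computes the mean, the median, and a trimmed mean for the data containing the 40-unit purchase. The trimmed_mean helper and its trim_count parameter are hypothetical names written for this example; dropping one value from each end (10% per side for 10 values) is an assumed choice, not a rule.

```python
from statistics import mean, median


def trimmed_mean(values, trim_count=1):
    """Mean after dropping `trim_count` values from each end of the sorted data.

    Hypothetical helper for illustration; trim_count=1 is an assumed default.
    """
    ordered = sorted(values)
    if 2 * trim_count >= len(ordered):
        raise ValueError("trim_count would remove all of the data")
    kept = ordered[trim_count:len(ordered) - trim_count]
    return mean(kept)


# Purchase quantities where individual J bought 40 units.
with_outlier = [3, 4, 6, 4, 5, 3, 2, 4, 5, 40]

print(mean(with_outlier))          # 7.6
print(median(with_outlier))        # 4.0
print(trimmed_mean(with_outlier))  # 4.25
```

The median and the trimmed mean both stay close to 4, which matches what the nine ordinary purchasers actually bought.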
As shown in the figure below, when the data distribution is skewed, the mean can be pulled in one direction, so caution is necessary.
Additionally, when data exhibits a bimodal distribution, as shown in the figure below, it is difficult to summarize the data using the mean alone. A bimodal distribution is characterized by two distinct peaks.
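The bimodal case can be illustrated with a short Python sketch. The data set below is made up for this example (it is not from the purchase table) and is built to have one peak at 2 and another at 9.

```python
from statistics import mean, multimode

# Hypothetical data with two peaks: one around 2 and one around 9.
bimodal = [1, 2, 2, 2, 3, 8, 9, 9, 9, 10]

center = mean(bimodal)
peaks = multimode(bimodal)  # every value tied for the highest frequency

# Values that lie within 2 units of the mean.
near_mean = [x for x in bimodal if abs(x - center) <= 2]

print(center)     # 5.5
print(peaks)      # [2, 9]
print(near_mean)  # []
```

The mean of 5.5 falls in the gap between the two peaks, where there are no observations at all, so on its own it describes neither group.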
Representative values other than the mean
In addition to the mean, there are other measures of central tendency.
Median: When data is arranged in ascending or descending order, the median is the middle value. If there is an even number of data points, the median is the average of the two middle values.
Mode: The mode is the value that appears most frequently in a data set.
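As a quick check of these two definitions, the following Python sketch applies them to the original 10 purchase quantities (with J at 6 units). It uses only the standard library, and the variable names are chosen for this example.

```python
from statistics import median, mode

# Original purchase quantities for individuals A through J.
purchases = [3, 4, 6, 4, 5, 3, 2, 4, 5, 6]

ordered = sorted(purchases)
print(ordered)  # [2, 3, 3, 4, 4, 4, 5, 5, 6, 6]

# Even number of data points: average of the two middle values (4 and 4).
middle = median(purchases)
print(middle)  # 4.0

# 4 appears three times, more often than any other value.
most_common = mode(purchases)
print(most_common)  # 4
```

For this sample the mean (4.2), median (4), and mode (4) are close to each other, which is typical when the data has no extreme values.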
For the full report, refer to this link.