When you have a group of numbers that represents some data, statistics lets you describe that data in a useful way. One of the most common statistical applications is finding the mean of a set of numbers.
Let's look at an example. Suppose last week your class took a math test. 10 people showed up that day and received the following scores:
85, 97, 65, 87, 100, 76, 72, 89, 57, 92
What we want to do is find an average of those numbers. An average gives us a value which is an indication of the "normal score on the test". Computing the mean is the most common way to compute an average.
In order to compute the mean we have to perform two steps on the data. The first step is to add up all of the pieces of data:
85 + 97 + 65 + 87 + 100 + 76 + 72 + 89 + 57 + 92 = 820
The next step is to take that value and divide it by the total number of pieces of data (in this case 10, since we have 10 students' test scores):
820 / 10 = 82
We now have our mean, or average score, of 82.
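The two steps above (add everything up, then divide by the count) can be sketched in a few lines of Python. This is only an illustration: the function name mean and the variable names are chosen for this example, and the scores are the ones used in the sum above.

```python
def mean(values):
    """Return the sum of the values divided by how many values there are."""
    if not values:
        # Dividing by zero items has no meaning, so refuse an empty list.
        raise ValueError("cannot take the mean of an empty list")
    return sum(values) / len(values)


# The ten test scores that are added together in the example.
scores = [85, 97, 65, 87, 100, 76, 72, 89, 57, 92]

total = sum(scores)     # step 1: add up all of the pieces of data
average = mean(scores)  # step 2: divide the total by the number of pieces

print(total)    # 820
print(average)  # 82.0
```

The result prints as 82.0 rather than 82 because Python's / operator always produces a decimal number, even when the division comes out even.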
Understanding the Mean
In order to better understand the mean let's look at it in a different way. Imagine that you are in a science class and your job is to collect a bunch of rocks. You have a group of 5 people who all collect the rocks. Some of your teammates find more rocks than others. Once you are done collecting rocks, you need to make sure each of your team members has the same number of rocks.
Here is the breakdown of the number of rocks each member of your group collected:
You: 15 rocks
Connor: 24 rocks
Sarah: 13 rocks
Jason: 17 rocks
Jennifer: 21 rocks
Your group brings all of the rocks they collected to the table and combines them together in one big pile. Your group now has:
15 + 24 + 13 + 17 + 21 = 90 rocks
Now that your group has 90 rocks, you must make sure that everyone gets an equal number of rocks. In order to do that, you must divide those 90 rocks among the 5 of you.
90 ÷ 5 = 18
Each of you now has 18 rocks.
When you find the mean, you are finding what each amount would be if the total were distributed evenly, so that every data point had the same value.
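The rock example can also be written out as a short Python sketch. It pools the rocks, works out the equal share, and then shows how many rocks each person has to give up or receive to reach that share. The variable names are made up for this illustration.

```python
# Rocks collected by each team member, taken from the list above.
rocks = {"You": 15, "Connor": 24, "Sarah": 13, "Jason": 17, "Jennifer": 21}

pile = sum(rocks.values())    # everyone puts their rocks in one pile
share = pile // len(rocks)    # whole rocks per person after sharing evenly
leftover = pile % len(rocks)  # rocks that could not be shared evenly

# Positive numbers mean the person receives rocks, negative means they give some up.
changes = {name: share - count for name, count in rocks.items()}

print(pile, share, leftover)  # 90 18 0
print(changes)
# {'You': 3, 'Connor': -6, 'Sarah': 5, 'Jason': 1, 'Jennifer': -3}
```

Notice that the gains and losses add up to zero: every rock someone gives up is a rock someone else receives. That is what "evenly distributed" means for the mean.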
The Problem With the Mean
There is one problem with using the mean to judge data overall. Let's look at the example with test scores again. This time only 5 people were in the class. Here are the test scores:
85, 92, 91, 0, 90
As you can see just by looking at the numbers, 4 students did well, and 1 received a 0 because they slept in and missed the test. Let's see what the average score is:
85 + 92 + 91 + 0 + 90 = 358
358 / 5 = 71.6
Using the following grading scale: A = 90 and above, B = 80-89, C = 70-79, D = 60-69 and F = below 60, we would say the average grade in the class was a low C, yet there were 3 A's, 1 B and 1 F. Looking at just the scores, it seems that those who took the test knew the material very well. However, the one score of 0 threw the average off, making it look like the class was performing at a low C level.
A value like that 0 is called an outlier. An outlier is a data point which occurs because of some rare circumstance and doesn't follow the trend of the rest of the data. Outliers can greatly affect the mean, as they can skew the result in one direction or the other. If you need to compute a mean for data like this, it is best to remove the outliers first:
85 + 92 + 91 + 90 = 358
358 / 4 = 89.5
We now have an average of 89.5, which is a very high B; so high, in fact, that it's practically an A. This is more in line with what we would expect from looking at the performance of the class.
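To see the effect of the outlier side by side, here is a small Python sketch that computes the mean with and without the score of 0 and converts each result to a letter grade. It uses the mean function from Python's built-in statistics module. The letter_grade helper is made up for this example, and dropping every score of 0 is an assumption that only fits this data set, where 0 means the student missed the test; it is not a general rule for finding outliers.

```python
from statistics import mean


def letter_grade(score):
    """Convert a score to a letter using the grading scale from the text."""
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


scores = [85, 92, 91, 0, 90]

# Mean of all five scores, including the missed test.
with_outlier = mean(scores)

# Assumption for this example only: a 0 means the test was missed,
# so it is treated as the outlier and left out.
kept = [s for s in scores if s != 0]
without_outlier = mean(kept)

print(with_outlier, letter_grade(with_outlier))        # 71.6 C
print(without_outlier, letter_grade(without_outlier))  # 89.5 B
```

A single score moved the class average by almost 18 points and by a full letter grade, which is why it is worth checking for outliers before trusting a mean.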
*Note: When doing homework problems, do not remove outliers unless you are directed to do so. Outliers are removed when you are doing calculations of meaningful data. Many homework problems may contain outliers to show you how they can skew a mean. If you are unsure whether or not to remove outliers, contact your teacher or instructor.
In order to combat the outlier problem we can also use another method of finding an average:
Finding the median