What are Confidence Intervals?
Let’s start at the beginning: When a sample of data is taken from a process, and a statistic (e.g. an average, percentage or median) is calculated from that sample of data, we have to remember that the statistic doesn’t necessarily represent the true process (it’s just a sample!).
To allow for this, we place an interval around each sample statistic, and say that we are confident that the true process statistic falls within that interval. Those intervals are known as ‘Confidence Intervals’.
An example: The histogram below (taken from p152 of our book) summarises a sample of data, together with its 95% Confidence Intervals.
Figure: A Graphical Summary output from Minitab, with 95% Confidence Intervals
So, from the example above, while the average of the sample is 24.503, the ‘95% Confidence Interval’ indicates that, based upon this sample, we can be 95% confident that the true average of the process (from which the sample was taken) is between 23.990 and 25.016.
What does the ‘95%’ bit mean?
The 95% is the confidence level. It describes how reliable the method is: if we took many samples from the same process and calculated a confidence interval from each one, around 95% of those intervals would contain the true process average. So, around 5% of the time, an interval will miss the true process average. Think that’s too high? Well, Minitab can calculate a 99% Confidence Interval for you (or any other level of confidence for that matter), but if you want 99% confidence, the Confidence Interval will become wider, to cover more eventualities!
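To see what the confidence level means in practice, here is a minimal simulation sketch in Python. It is not how Minitab does the calculation: it assumes a made-up process (true average 25.0, standard deviation 2.0), hypothetical samples of 50 values, and a simple normal (z) approximation rather than the t-distribution that statistical packages normally use for an average. The function names are ours, for illustration only.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_interval(sample, confidence):
    """Approximate confidence interval for an average (normal approximation)."""
    # z multiplier for a two-sided interval at the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * stdev(sample) / sqrt(len(sample))
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(confidence, true_mean=25.0, true_sd=2.0, n=50, trials=2000, seed=1):
    """Return (fraction of intervals containing true_mean, average interval width).

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    total_width = 0.0
    for _ in range(trials):
        # Take a fresh sample from the simulated process
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        low, high = z_interval(sample, confidence)
        hits += low <= true_mean <= high
        total_width += high - low
    return hits / trials, total_width / trials


for level in (0.95, 0.99):
    frac, width = coverage(level)
    print(f"{level:.0%} interval: contained the true average in {frac:.1%} "
          f"of samples, average width {width:.2f}")
```

Running it should show the 99% intervals containing the true average more often than the 95% intervals, but only because they are wider.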
For general business use (in non-safety-critical decisions), 95% is a reasonable level of confidence.
How are Confidence Intervals calculated?
Putting aside the ‘95%’ element, the size of a confidence interval for an average relies upon two key factors:
1) The size of the sample: The larger the sample, the better it reflects the process from which it was taken, and so the smaller the confidence interval can be (i.e. we can predict more precisely where the process average is). In summary: a bigger sample will provide a smaller (more precise) confidence interval.
2) The variation within the sample: If a sample has high variation (standard deviation), it indicates that the process from which it was taken also has high variation. Taking this a step further, if the process has high variation, it’s more difficult to establish where the average is! So, the confidence interval will be larger to reflect this. In summary: higher variation will result in a larger (less precise) confidence interval, and lower variation in a smaller (more precise) one.
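The effect of these two factors can be sketched with a few lines of Python. This is a simplified illustration, not Minitab's calculation: it assumes the normal (z) approximation for the confidence interval of an average, and the sample sizes and standard deviations below are hypothetical values picked to show the pattern. The function name half_width is ours.

```python
from math import sqrt
from statistics import NormalDist


def half_width(sample_sd, n, confidence=0.95):
    """Approximate half-width of a confidence interval for an average.

    Uses the normal (z) approximation: z * standard deviation / sqrt(sample size).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sample_sd / sqrt(n)


# Factor 1: sample size (standard deviation held at a hypothetical 2.0)
for n in (25, 100, 400):
    print(f"n = {n:>3}, sd = 2.0 -> average +/- {half_width(2.0, n):.3f}")

# Factor 2: variation (sample size held at a hypothetical 100)
for sd in (1.0, 2.0, 4.0):
    print(f"n = 100, sd = {sd} -> average +/- {half_width(sd, 100):.3f}")
```

Under this approximation, doubling the standard deviation doubles the width of the interval, while the sample size has to be quadrupled to halve it.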
The mathematical equations for calculating confidence intervals (for various statistics) are quite complex, and are not dealt with in full here. More information on Confidence Intervals can be found on page 151 of our Lean Six Sigma and Minitab book.