What is ‘Analysis of Residuals’?
‘Analysis of Residuals’ is a statistical method for checking whether a regression model is a ‘good fit’ to the data.
Imagine that you have found a correlation between a process input and the process output, and have created a regression model in Minitab, as shown here:
Visually, this regression line (right) looks like a ‘good fit’: it appears to pass through the centre of the data points and to represent the general correlation. However, this kind of visual assessment is subjective. At what point do you decide that the model is ‘not a good fit’?
The ‘Analysis of Residuals’ provides a more objective way to decide whether a regression model is a good fit. It is particularly useful in Multiple Regression, where the model has several inputs and so cannot be checked visually on a single Scatter Plot.
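A residual is the actual output minus the value the model predicts, so there is one residual per observation no matter how many inputs the model has. The sketch below illustrates this for a hypothetical model with two inputs, in plain Python with no libraries. Minitab does this fitting for you; the function names `fit_multiple` and `multiple_residuals` and the data are made up for illustration. It fits the model by least squares and returns the fitted values and residuals, which can be plotted against each other even though no single Scatter Plot can show the model itself.

```python
def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    X = [[1.0, *row] for row in rows]  # leading 1 gives the intercept b0
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))]
         for a in range(p)]
    for col in range(p):
        # Partial pivoting: bring the row with the largest entry in this column up
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * c for v, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def multiple_residuals(rows, y):
    """Return (fitted values, residuals) for a multiple linear regression."""
    b = fit_multiple(rows, y)
    fitted = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in rows]
    # Residual = actual value minus fitted value
    return fitted, [yi - fi for yi, fi in zip(y, fitted)]


# Hypothetical process data: two inputs (x1, x2) and one output
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5), (6, 2)]
y = [2.1, 4.4, 5.1, 7.4, 8.6, 11.9]
fitted, residuals = multiple_residuals(rows, y)
for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.2f}  residual={r:+.3f}")
```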
Comparing the residuals of ‘good’ and ‘bad’ regression models:
Consider the two regression models, and their residuals plots, shown here:
The lower plots show the residuals for each model. A residual is the error between the regression line and an actual data point: the actual value minus the value the line predicts (the fitted value). It can be seen that:
1) The residuals for the ‘good’ regression model are approximately Normally distributed and randomly scattered around zero.
2) The residuals for the ‘bad’ regression model are non-Normal and show a distinct, non-random pattern.
This means that you can assess the validity of a regression model by looking at its residuals.
Using Minitab for the ‘Analysis of Residuals’:
When you run a regression analysis, Minitab can display four different Residual plots in a single graph. Each plot gives a different view of the residuals, to help you decide whether they are Normally distributed and random. Here are the Residual plots for the regression shown at the top of this article:
In this case, the residuals appear to be Normally distributed (shown by the Normal probability plot and the histogram on the left) and generally random (shown by the right-hand plots, which display the residuals against their fitted values and in observation order).
There is perhaps one data point (observation 49 on the lower right-hand plot) that does not fit the Normal distribution and may be worth further investigation.