Correlation is a technique that helps to identify whether there is a relationship between two sets of data (typically an input and an output of a process). NB: It’s important to remember that correlation helps to establish whether a mathematical relationship exists, but this doesn’t necessarily mean that a causal relationship exists – a subject for a future newsletter!
The first step of correlation is to use a graphical technique – the scatter plot – to investigate whether there appears to be a correlation. Consider the scatter plots shown on the right. Do you think they demonstrate a correlation?
Generally, we’re looking for two aspects when considering if a scatter plot demonstrates a correlation:
1) The direction of the slope (if any) – an upward slope indicates a direct relationship between the input and output (as one increases, so does the other), and a downward slope indicates an inverse relationship (as one increases, the other decreases).
2) The degree to which the points are tightly clustered around a straight line – the tighter the clustering of the points, the stronger the (mathematical) relationship between the input and output.
So, what is the Pearson Coefficient?
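The Pearson Coefficient (r) aims to quantify the straight-line (linear) relationship that you can see on a scatter plot. It ranges from -1.0 to +1.0, where:
-1.0 is a perfect inverse relationship (every point lies exactly on a downward-sloping straight line)
0 indicates no linear relationship
+1.0 is a perfect direct relationship (every point lies exactly on an upward-sloping straight line)
Values close to -1.0 or +1.0 indicate a very strong relationship, and values close to 0 indicate a weak one.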
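To make the scale concrete, here is a minimal sketch of how r can be calculated by hand, using the standard formula (the sum of the products of the deviations from the means, divided by the square root of the product of the two sums of squared deviations). The function name pearson_r and the temperature and cure-time figures are made up for illustration; they are not taken from the scatter plots in this article.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how the two variables move together around their means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: how much each variable varies on its own
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either sample is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical process data: oven temperature (input) vs. cure time (output)
temperature = [150, 160, 170, 180, 190, 200]
cure_time = [42.0, 39.5, 36.8, 35.1, 31.9, 30.2]

r = pearson_r(temperature, cure_time)
print(f"r = {r:.3f}")  # close to -1: a strong inverse relationship
```

Because r only measures straight-line relationships, a clearly curved pattern (for example, points that follow a U shape) can still give an r of 0. This is one reason to look at the scatter plot first. In Python 3.10 and later, the standard library function statistics.correlation performs the same calculation.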
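Minitab also calculates a p-value that indicates whether the Pearson Coefficient is statistically significant (that is, whether a value this far from 0 would be unlikely to arise by chance if there were really no correlation), which is discussed in a separate article.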
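For readers who want a feel for what such a p-value represents, the sketch below estimates one with a permutation test: it shuffles the output values many times and counts how often the shuffled data gives an r at least as far from 0 as the real data. This is an illustrative substitute that needs only the Python standard library; it is not Minitab's own calculation and will not reproduce Minitab's figure exactly. The function names, the data, the number of shuffles and the seed are all assumptions made for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient r for paired samples."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for r by shuffling the y values.

    Illustrative only: a permutation test, not Minitab's method.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical process data: oven temperature (input) vs. cure time (output)
temperature = [150, 160, 170, 180, 190, 200]
cure_time = [42.0, 39.5, 36.8, 35.1, 31.9, 30.2]

p = permutation_p_value(temperature, cure_time)
print(f"estimated p-value = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the correlation is unlikely to be a chance pattern in the sample; a large one means the data do not provide enough evidence of a relationship.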