In a separate article, we introduced correlation and the Pearson coefficient. This article looks in more detail at how to interpret the Pearson coefficient and, in particular, its p-value.
First, a reminder: the Pearson coefficient aims to quantify the relationship that might exist between two variables on a scatter plot. The coefficient ranges from -1.0 to +1.0, where:
-1.0 is a perfect inverse (negative) linear relationship
0 indicates no linear relationship
+1.0 is a perfect direct (positive) linear relationship
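To make the scale concrete, here is a minimal Python sketch of how the coefficient can be calculated by hand. The function name pearson_r and the data points are our own illustrative assumptions, not part of any particular statistics package.

```python
import math


def pearson_r(xs, ys):
    """Pearson coefficient of two equal-length sequences of numbers.

    Illustrative helper (hypothetical name). It does not handle the case
    where one variable is constant, which would cause a division by zero.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y rises (or falls) in a perfectly straight line with x
x = [1, 2, 3, 4, 5]
r_direct = pearson_r(x, [2, 4, 6, 8, 10])   # perfectly direct relationship
r_inverse = pearson_r(x, [10, 8, 6, 4, 2])  # perfectly inverse relationship
print(r_direct, r_inverse)
```

With these made-up points, the two results sit at the ends of the scale: +1.0 for the direct relationship and -1.0 for the inverse one.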
You might think that’s the end of the matter but, as with many things in Six Sigma, it’s actually a little more complicated! This is because you must also assess whether the correlation is statistically significant.
Consider the scatter plot on the right. It appears to show a strong, positive correlation. Accordingly, the Pearson coefficient is likely to be close to +1.0 (we’ve estimated 0.8).
However, you also need to consider whether the correlation is statistically significant before you go any further. Why? Because with small sample sizes (and we only have 5 data points in this example!) there is a real chance that your data points will fall in such a way that a correlation appears to exist, even when it doesn’t.
So, to assess the statistical significance of your correlation, you need to look at the p-value that is calculated alongside the Pearson coefficient, which can be interpreted as follows:
– If the p-value is low (generally less than 0.05), then your correlation is statistically significant, and you can use the calculated Pearson coefficient.
– If the p-value is not low (generally higher than 0.05), then your correlation is not statistically significant (it might have happened just by chance) and you should not rely upon your Pearson coefficient.
In our example above, the p-value is 0.3 (not statistically significant) which reflects the very small sample size (n=5). So, we should ignore the Pearson coefficient for now – it’s suggesting a correlation that might not even exist!
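One way to see what "happened just by chance" means is a permutation test, sketched below in Python. Please note the assumptions: this is not the method statistical software normally uses to report the p-value alongside the Pearson coefficient (that is usually based on a t-distribution), so the numbers will differ from package output. The function names and the five data points are hypothetical and are not the data behind the scatter plot above.

```python
import itertools
import math


def pearson_r(xs, ys):
    """Pearson coefficient of two equal-length sequences (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys):
    """Two-sided exact permutation p-value (hypothetical helper).

    Shuffles the y values into every possible order and returns the
    fraction of orderings whose correlation is at least as strong
    (in either direction) as the one actually observed.
    Only practical for very small samples: n points give n! orderings.
    """
    observed = abs(pearson_r(xs, ys))
    at_least_as_strong = 0
    total = 0
    for shuffled in itertools.permutations(ys):
        total += 1
        # Small tolerance so that ties with the observed value are counted
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_strong += 1
    return at_least_as_strong / total


# Hypothetical sample of 5 points with a Pearson coefficient of 0.8
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
p = permutation_p_value(x, y)
print(r, p)
```

For this made-up sample, 16 of the 120 possible orderings produce a correlation at least as strong as 0.8, giving a p-value of about 0.13. That is above 0.05, so even a coefficient of 0.8 is not statistically significant with only 5 data points.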
Confused? Here’s a summary:
- The Pearson coefficient helps to quantify the strength and direction of a correlation.
- The p-value helps to assess whether a correlation is likely to be real (statistically significant) rather than a product of chance.
- The Pearson coefficient and p-value should be interpreted together, not individually.