Last updated on 8 September 2025
Correlation coefficients measure the relationship between two variables, such as screen time and mental health. These coefficients are used in numerous fields such as finance, education, and health care. In this topic, you will learn about the correlation coefficient and its significance from a broader perspective.
The correlation coefficient is a statistical metric that measures how strongly two variables are linearly related. Its value ranges from -1 to 1. A coefficient of -1 indicates a perfect negative (inverse) linear relationship: as one variable increases, the other decreases. A coefficient of 1 indicates a perfect positive linear relationship: the variables increase together. A coefficient of zero indicates that there is no linear relationship between the variables, although a non-linear relationship may still exist.
Here are a few key takeaways to help you grasp the concept at a glance:
We can calculate the correlation coefficient easily by understanding each step listed below:
\( \rho_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y} \)
Here:
\( \rho_{xy} \) represents Pearson’s product-moment correlation coefficient
\( \mathrm{Cov}(x, y) \) is the covariance of variables x and y
\( \sigma_x \), \( \sigma_y \) are the standard deviations of variables x and y.
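The definition above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `pearson_from_covariance` is hypothetical, population formulas (dividing by n) are assumed for both the covariance and the standard deviations, and the data is the reading-time example worked through later in this article.

```python
import math

def pearson_from_covariance(xs, ys):
    """Pearson's r computed as Cov(x, y) / (sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance: average product of deviations from the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Population standard deviations of x and y.
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov_xy / (sigma_x * sigma_y)

# Reading time and exam scores from the second worked example.
reading_time = [3, 5, 7, 9]
scores = [60, 70, 82, 95]
r = pearson_from_covariance(reading_time, scores)
print(round(r, 3))  # 0.998
```

The function does not guard against a variable with zero spread, in which case the standard deviation is zero and the coefficient is undefined.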
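\( r = \frac{n\sum xy - (\sum x)(\sum y)} {\sqrt{\bigl[n\sum x^{2} - (\sum x)^{2}\bigr] \bigl[n\sum y^{2} - (\sum y)^{2}\bigr]}} \)
Where:
n is the number of paired observations
∑x and ∑y are the sums of the x values and the y values
∑xy is the sum of the products of each pair of x and y values
∑x² and ∑y² are the sums of the squared x values and the squared y values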
Correlation and regression are related but distinct concepts, and knowing how they differ makes both easier to understand. Let’s look at a few key differences between correlation and regression:
Correlation | Regression |
The correlation coefficient formula takes different forms depending on the type of coefficient. We will now look at each of them:
Pearson’s Correlation Coefficient Formula
\( r = \frac{n\sum xy - (\sum x)(\sum y)} {\sqrt{\bigl[n\sum x^{2} - (\sum x)^{2}\bigr] \bigl[n\sum y^{2} - (\sum y)^{2}\bigr]}} \)
Where n is the number of paired observations, and each sum runs over all n pairs of x and y values.
Linear Correlation Coefficient Formula
\( r_{xy} = \frac{n \sum_{i=1}^{n} x_i y_i - \left( \sum_{i=1}^{n} x_i \right)\left( \sum_{i=1}^{n} y_i \right)} {\sqrt{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2} \; \sqrt{n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2}} \)
Sample Correlation Coefficient Formula
\( r_{xy} = \frac{S_{xy}}{S_x S_y} \)
Where:
\( S_x \), \( S_y \) are the sample standard deviations
\( S_{xy} \) is the sample covariance
Population Correlation Coefficient Formula
\( \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \)
Where:
\( \sigma_x \), \( \sigma_y \) are the population standard deviations
\( \sigma_{xy} \) is the population covariance
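The sample and population formulas differ only in the divisor used for the covariance and the standard deviations (n − 1 versus n). Because the same divisor appears in the numerator and the denominator, it cancels, so both forms give the same coefficient for a given data set. The sketch below illustrates this with Python's standard `statistics` module; the helper name `pearson` and its `sample` flag are hypothetical, and the data is the exercise example worked through later in this article.

```python
import math
import statistics

def pearson(xs, ys, sample=True):
    """Pearson's r from covariance and standard deviations.

    sample=True divides by n - 1 (sample form);
    sample=False divides by n (population form).
    """
    n = len(xs)
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    divisor = n - 1 if sample else n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / divisor
    if sample:
        sd_x, sd_y = statistics.stdev(xs), statistics.stdev(ys)
    else:
        sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)
    return cov / (sd_x * sd_y)

# Exercise hours and weight loss from the third worked example.
exercise_hours = [3, 2, 4, 1, 5]
weight_loss = [4, 3, 5, 2, 10]

r_sample = pearson(exercise_hours, weight_loss, sample=True)
r_population = pearson(exercise_hours, weight_loss, sample=False)
print(math.isclose(r_sample, r_population))  # True
```

The two forms only agree when the covariance and both standard deviations use the same divisor; mixing a sample covariance with population standard deviations would give a different, incorrect value.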
Correlation coefficients are applied in various fields to determine the linear relationship between two different quantities. Let’s learn how they can be applied in various fields:
Students tend to make mistakes when calculating correlation coefficients. Such mistakes occur due to various reasons. Let’s explore such errors and a few tips to avoid them:
A café owner wants to analyze whether temperature affects cool drink sales. They collect data for 5 days:
The correlation coefficient is approximately 0.98, which shows that there is a strong positive correlation between the variables.
We organize the data provided:
Day | Temperature (°C) (X) | Cool Drinks Sales (Y) | XY | X² | Y² |
1 | 24 | 200 | 4800 | 576 | 40000 |
2 | 32 | 300 | 9600 | 1024 | 90000 |
3 | 25 | 250 | 6250 | 625 | 62500 |
4 | 33 | 350 | 11550 | 1089 | 122500 |
5 | 40 | 450 | 18000 | 1600 | 202500 |
Calculating the sums:
∑X = 24 + 32 + 25 + 33 + 40 = 154
∑Y = 200 + 300 + 250 + 350 + 450 = 1550
∑XY = 4800 + 9600 + 6250 + 11550 + 18000 = 50200
∑X² = 576 + 1024 + 625 + 1089 + 1600 = 4914
∑Y² = 40000 + 90000 + 62500 + 122500 + 202500 = 517500
Given that n = 5
Using Pearson’s Correlation Formula:
r = [n∑XY − (∑X)(∑Y)] / √{[n∑X² − (∑X)²][n∑Y² − (∑Y)²]}
Substituting values into the formula:
r = [(5 × 50200) − (154 × 1550)] / √{[(5 × 4914) − (154)²] × [(5 × 517500) − (1550)²]}
= (251000 − 238700) / √[(24570 − 23716) × (2587500 − 2402500)]
= 12300 / √(854 × 185000)
= 12300 / √157990000
= 12300 / 12569.41
r ≈ 0.98
Here, the resultant value shows that there is a strong positive correlation between the variables. This indicates that cool drink sales tend to rise as the temperature rises.
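The arithmetic in the café example can be reproduced with a short Python sketch of the sum-based Pearson formula. The function name `pearson_from_sums` is hypothetical, and returning the numerator and denominator alongside r is just a convenience for checking each step by hand.

```python
import math

def pearson_from_sums(xs, ys):
    """Pearson's r using the sum-based computational formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    # n*Sum(xy) - Sum(x)*Sum(y)
    numerator = n * sum_xy - sum_x * sum_y
    # sqrt([n*Sum(x^2) - Sum(x)^2] * [n*Sum(y^2) - Sum(y)^2])
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator, numerator, denominator

# Temperature (°C) and cool drink sales from the café example.
temperature = [24, 32, 25, 33, 40]
sales = [200, 300, 250, 350, 450]

r, numerator, denominator = pearson_from_sums(temperature, sales)
print(f"numerator = {numerator}")
print(f"denominator = {denominator:.2f}")
print(f"r = {r:.2f}")
```

Printing the intermediate values makes it easy to spot where a hand calculation has drifted, since slips in the square root step are a common source of error.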
A student wants to see whether reading time affects exam scores. Data for 4 students is collected:
The correlation coefficient is approximately 0.998, which shows that there is a strong positive correlation between the variables. This indicates that exam scores tend to increase as reading time increases.
Calculating the sums:
∑X = 3 + 5 + 7 + 9 = 24
∑Y = 60 + 70 + 82 + 95 = 307
∑XY = 180 + 350 + 574 + 855 = 1959
∑X² = 9 + 25 + 49 + 81 = 164
∑Y² = 3600 + 4900 + 6724 + 9025 = 24249
Given that n = 4
Using Pearson’s Correlation Formula:
r = [n∑XY − (∑X)(∑Y)] / √{[n∑X² − (∑X)²][n∑Y² − (∑Y)²]}
Substituting values into the formula:
r = [(4 × 1959) − (24 × 307)] / √{[(4 × 164) − (24)²] × [(4 × 24249) − (307)²]}
= (7836 − 7368) / √[(656 − 576) × (96996 − 94249)]
= 468 / √(80 × 2747)
= 468 / √219760
= 468 / 468.79
r ≈ 0.998
Here, the resultant value shows that there is a strong positive correlation between the variables. This indicates that exam scores tend to increase as reading time increases.
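For the reading-time example, the working columns (XY, X², Y²) and their totals can be generated rather than written out by hand. The sketch below assumes the four data pairs implied by the sums above; the "Student" column label and the variable names are illustrative choices, not part of the original problem statement.

```python
# Reading time (X) and exam scores (Y) for the four students.
reading_time = [3, 5, 7, 9]
scores = [60, 70, 82, 95]

# One row per student: X, Y, XY, X squared, Y squared.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(reading_time, scores)]

# Column totals: Sum(X), Sum(Y), Sum(XY), Sum(X^2), Sum(Y^2).
totals = tuple(sum(column) for column in zip(*rows))

print("Student | X | Y | XY | X² | Y² |")
for student, row in enumerate(rows, start=1):
    print(f"{student} | " + " | ".join(str(value) for value in row) + " |")
print("Total | " + " | ".join(str(value) for value in totals) + " |")
```

The totals row corresponds to the five sums that are substituted into Pearson's formula.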
Anita records the daily exercise hours and weight loss of 5 of her friends over a month:
The correlation coefficient is approximately 0.91, which indicates that there is a strong positive correlation between the variables.
We organize the data provided:
Friend | Exercise Hours (X) | Weight Loss (Y) | XY | X² | Y² |
1 | 3 | 4 | 12 | 9 | 16 |
2 | 2 | 3 | 6 | 4 | 9 |
3 | 4 | 5 | 20 | 16 | 25 |
4 | 1 | 2 | 2 | 1 | 4 |
5 | 5 | 10 | 50 | 25 | 100 |
Calculating the sums:
∑X = 3 + 2 + 4 + 1 + 5 = 15
∑Y = 4 + 3 + 5 + 2 + 10 = 24
∑XY = 12 + 6 + 20 + 2 + 50 = 90
∑X² = 9 + 4 + 16 + 1 + 25 = 55
∑Y² = 16 + 9 + 25 + 4 + 100 = 154
Given that n = 5
Using Pearson’s Correlation Formula:
r = [n∑XY − (∑X)(∑Y)] / √{[n∑X² − (∑X)²][n∑Y² − (∑Y)²]}
Substituting values into the formula:
r = [(5 × 90) − (15 × 24)] / √{[(5 × 55) − (15)²] × [(5 × 154) − (24)²]}
= (450 − 360) / √[(275 − 225) × (770 − 576)]
= 90 / √(50 × 194)
= 90 / √9700
= 90 / 98.49
r ≈ 0.91
Jaipreet Kour Wazir is a data wizard with over 5 years of expertise in simplifying complex data concepts. From crunching numbers to crafting insightful visualizations, she turns raw data into compelling stories. Her journey from analytics to education ref
She compares datasets to puzzle games: the more you play with them, the clearer the picture becomes!