Summarize this article:
1324 Learners | Last updated on November 26, 2025

Correlation coefficients measure the relationship between two variables, such as screen time and mental health. These coefficients are used in numerous fields such as finance, education, and health care. In this topic, you will learn about the correlation coefficient and its significance from a broader perspective.
The correlation coefficient is a statistical metric that measures how strongly two variables are linearly related. Its values range from -1 to 1. A value of -1 indicates a perfect negative (inverse) linear relationship: as one variable increases, the other decreases. A value of 1 indicates a perfect positive linear relationship: the variables increase together. A value of zero indicates that there is no linear relationship between the variables, although a nonlinear relationship may still exist.
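The three reference values (-1, 0, and 1) can be illustrated with a minimal Python sketch. The helper name `pearson_r` and the toy data sets are assumptions made for illustration; they are not taken from the article.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                      # hypothetical x values
r_pos = pearson_r(xs, [2, 4, 6, 8, 10])   # y rises steadily with x
r_neg = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls steadily as x rises
r_zero = pearson_r(xs, [4, 1, 0, 1, 4])   # y = (x - 3)^2, no linear trend

print(r_pos, r_neg, r_zero)
```

The third data set is fully determined by x, yet its coefficient is zero, which is why a zero value should be read as "no linear relationship" rather than "no relationship at all".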
Here are a few key takeaways to help you grasp the concept at a glance:
Correlation and regression are related but distinct concepts. Here are a few key differences between them:
| Correlation | Regression |
| Analyzes the strength and direction of the linear connection between two variables. | Measures the relationship between an independent variable and a dependent variable. |
| The correlation can be positive or negative, depending on the connection between the variables. | Establishes a functional dependence, where the changes in one variable directly affect the other. |
| Correlation is the same for both variables. | It is not the same for both variables. |
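The last row of the table can be checked numerically. The sketch below is an illustration under stated assumptions: the function names `correlation` and `slope` are hypothetical, and the data are the study-hours and marks figures from the article's worked example. Swapping the two variables leaves the correlation unchanged, while the least-squares regression slope changes.

```python
def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Pearson correlation; symmetric in its two arguments."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def slope(predictor, response):
    """Least-squares slope when `response` is regressed on `predictor`."""
    mp, mr = mean(predictor), mean(response)
    spr = sum((p - mp) * (r - mr) for p, r in zip(predictor, response))
    spp = sum((p - mp) ** 2 for p in predictor)
    return spr / spp

hours = [2, 3, 5, 6, 4]
marks = [50, 60, 80, 85, 70]

print(correlation(hours, marks), correlation(marks, hours))  # same value twice
print(slope(hours, marks), slope(marks, hours))              # two different slopes
```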
The correlation coefficient formula helps measure the strength and direction of the relationship between two variables. Several forms of the formula are used depending on the type of data. The main correlation coefficient formulas are given below.
Pearson’s Correlation Coefficient Formula
\( r = \frac{n\sum xy - (\sum x)(\sum y)} {\sqrt{\bigl[n\sum x^{2} - (\sum x)^{2}\bigr] \bigl[n\sum y^{2} - (\sum y)^{2}\bigr]}} \)
Where:
Sample Correlation Coefficient Formula
\(r_{xy} = \frac{s_{xy}}{s_x \cdot s_y} \)
Where: \(s_x, s_y\) are the sample standard deviations
\(s_{xy}\) is the sample covariance
Population Correlation Coefficient Formula
\(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y} \)
Where: \(\sigma_x, \sigma_y\) are the population standard deviations
\(\sigma_{xy}\) is the population covariance
Linear Correlation Coefficient Formula
\(r_{xy} = \frac{n \sum_{i=1}^n x_i y_i - \left( \sum_{i=1}^n x_i \right)\left( \sum_{i=1}^n y_i \right)} {\sqrt{n \sum_{i=1}^n x_i^2 - \left( \sum_{i=1}^n x_i \right)^2} \; \sqrt{n \sum_{i=1}^n y_i^2 - \left( \sum_{i=1}^n y_i \right)^2}} \)
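The raw-sum form and the covariance form listed above describe the same quantity. The following Python sketch compares them on a small data set; the function names and the data are hypothetical, chosen only to show that the two forms agree.

```python
import math

def r_from_sums(xs, ys):
    """Raw-sum (computational) form of Pearson's r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

def r_from_covariance(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)

xs = [1, 2, 4, 7]    # hypothetical data
ys = [3, 5, 4, 10]

print(r_from_sums(xs, ys), r_from_covariance(xs, ys))
```

The raw-sum form is convenient for hand calculation from a table of sums, while the covariance form makes the meaning of the coefficient easier to see.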


We can calculate the correlation coefficient easily by understanding each step listed below:
\(\rho_{xy} = \frac{\text{Cov}(x, y)}{\sigma_x \, \sigma_y} \)
Here:
\(\rho_{xy}\) represents Pearson’s product-moment correlation coefficient
Cov(x, y) is the covariance of variables x and y
\(\sigma_x, \sigma_y \) are the standard deviations for variables x and y.
Suppose a teacher wants to check whether students who study more hours tend to score higher marks. The data for 5 students is given below:
| Students | Study hour (x) | Mark (y) |
| 1 | 2 | 50 |
| 2 | 3 | 60 |
| 3 | 5 | 80 |
| 4 | 6 | 85 |
| 5 | 4 | 70 |
To calculate the correlation coefficient using the formula:
\(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y} \)
Find the mean of x and y:
\({\bar x} = {{2 + 3 + 5 + 6 + 4}\over{5 }}= 4 \)
\({\bar y} = {{50 + 60 + 80 + 85 + 70}\over {5}} = 69 \)
Finding the covariance:
\(\text{Cov}(x, y) = \frac{1}{n} \sum (x_i - \bar{x})(y_i - \bar{y}) \)
| x | y | x - 4 | y - 69 | \((x - 4)(y - 69)\) |
| 2 | 50 | -2 | -19 | 38 |
| 3 | 60 | -1 | -9 | 9 |
| 5 | 80 | 1 | 11 | 11 |
| 6 | 85 | 2 | 16 | 32 |
| 4 | 70 | 0 | 1 | 0 |
\(\sum (x_i - \bar{x})(y_i - \bar{y}) = 38 + 9 + 11 + 32 + 0 = 90\)
\(\text{Cov}(x, y) = \frac{90}{5} = 18\)
Calculating the standard deviations:
For x: \(\sigma_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}} \)
\(\sum (x_i - \bar{x})^2 = (-2)^2 + (-1)^2 + 1^2 + 2^2 + 0^2 \)
\(= 4 + 1 + 1 + 4 + 0 = 10\)
\(\sigma_x = \sqrt{\frac{10}{5}} \)
\(= \sqrt{2} = 1.414\)
For y:
\(\sum (y_i - \bar{y})^2 = 361 + 81 + 121 + 256 + 1 = 820 \)
\(\sigma_y = \sqrt{\frac{820}{5}} \)
\(= \sqrt{164}\)
\(= 12.806\)
Apply the formula: \(\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y} \)
\(\rho_{xy} = \frac{18}{(1.414)(12.806)}\)
\(= \frac{18}{18.11} = 0.994 \)
\(\rho_{xy} \approx 0.99\)
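The steps above can be reproduced with a short Python sketch. The variable names are assumptions made for illustration; the data are the five students' study hours and marks from the table.

```python
import math

hours = [2, 3, 5, 6, 4]
marks = [50, 60, 80, 85, 70]
n = len(hours)

# Step 1: means of x and y.
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Step 2: covariance (dividing by n, as in the formula used above).
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks)) / n

# Step 3: standard deviations (also dividing by n).
sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in hours) / n)
sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in marks) / n)

# Step 4: correlation coefficient.
rho = cov_xy / (sigma_x * sigma_y)

print(mean_x, mean_y, cov_xy)
print(round(sigma_x, 3), round(sigma_y, 3), round(rho, 3))
```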
The correlation coefficient indicates the strength of the relationship between two variables and whether it is positive or negative. Here are a few tips and tricks to master the correlation coefficient.
Students tend to make mistakes when calculating correlation coefficients. Such mistakes occur due to various reasons. Let’s explore such errors and a few tips to avoid them:
Correlation coefficients are applied in various fields to determine the linear relationship between two different quantities. Let’s learn how they can be applied in various fields:
A café owner wants to analyze whether temperature affects cool drink sales. They collect data for 5 days:
The resultant value (0.98) shows that there is a positive correlation between the variables.
We organize the data provided:
| Day | Temperature (°C) (X) | Cool Drinks Sales (Y) | XY | \(X^2\) | \(Y^2\) |
| 1 | 24 | 200 | 4800 | 576 | 40000 |
| 2 | 32 | 300 | 9600 | 1024 | 90000 |
| 3 | 25 | 250 | 6250 | 625 | 62500 |
| 4 | 33 | 350 | 11550 | 1089 | 122500 |
| 5 | 40 | 450 | 18000 | 1600 | 202500 |
Calculating the sums:
\(∑X = 24 + 32 + 25 +33 +40 = 154 \)
\(\sum Y = 200 + 300 + 250 + 350 + 450 = 1550 \)
\(∑XY= 4800 + 9600 +6250 +11550 +18000 = 50200 \)
\(\sum X^2 = 576 + 1024 + 625 + 1089 + 1600 = 4914 \)
\(∑Y^2 = 40000 + 90000 +62500 +122500 + 202500 = 517500\)
Given that \(n = 5\)
Using Pearson’s Correlation Formula,
\( r = \frac{n\sum xy - (\sum x)(\sum y)} {\sqrt{\bigl[n\sum x^{2} - (\sum x)^{2}\bigr] \bigl[n\sum y^{2} - (\sum y)^{2}\bigr]}} \)
Substituting values into the formula:
\(r = \frac{(5 \times 50200) - (154 \times 1550)}{\sqrt{[(5 \times 4914) - (154)^2][(5 \times 517500) - (1550)^2]}} \)
\(= \frac{251000 - 238700}{\sqrt{(24570 - 23716)(2587500 - 2402500)}} \)
\(= \frac{12300}{\sqrt{854 \times 185000}} \)
\(= \frac{12300}{\sqrt{157990000}} \)
\(= \frac{12300}{12569.41} \)
\( r ≈ 0.98 \)
Here, the resultant value shows that there is a strong positive correlation between the variables. This indicates that cool drink sales tend to increase as the temperature rises.
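The substitution can be checked with a brief Python sketch. The variable names are assumptions made for illustration; the temperatures and sales figures are the ones given for the five days.

```python
import math

temps = [24, 32, 25, 33, 40]        # temperature in °C (X)
sales = [200, 300, 250, 350, 450]   # cool drink sales (Y)
n = len(temps)

sum_x = sum(temps)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(temps, sales))
sum_x2 = sum(x * x for x in temps)
sum_y2 = sum(y * y for y in sales)

# The three pieces of Pearson's raw-sum formula.
numerator = n * sum_xy - sum_x * sum_y   # n*sum(xy) - sum(x)*sum(y)
x_term = n * sum_x2 - sum_x ** 2         # n*sum(x^2) - (sum x)^2
y_term = n * sum_y2 - sum_y ** 2         # n*sum(y^2) - (sum y)^2

r = numerator / math.sqrt(x_term * y_term)
print(numerator, x_term, y_term, round(r, 2))
```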
A student wants to see whether reading time affects exam scores. Data for 4 students is collected:
The resultant value is 0.998, which shows that there is a strong positive correlation between the variables. This indicates that test scores tend to increase as reading time increases.
| Student | Reading hours (X) | Test score (Y) | XY | \(X^2\) | \(Y^2\) |
| 1 | 3 | 60 | 180 | 9 | 3600 |
| 2 | 5 | 70 | 350 | 25 | 4900 |
| 3 | 7 | 82 | 574 | 49 | 6724 |
| 4 | 9 | 95 | 855 | 81 | 9025 |
Calculating the sum:
\(∑X = 3 + 5 +7 +9 = 24\)
\(∑Y = 60 + 70 + 82 + 95 = 307 \)
\(∑XY= 180 + 350 + 574 + 855 = 1959 \)
\( ∑X^2 = 9 + 25 + 49 + 81 = 164 \)
\( ∑Y^2 = 3600 + 4900 + 6724 + 9025 = 24249\)
Given that \(n = 4\)
Using Pearson’s Correlation Formula:
\( r = \frac{n\sum xy - (\sum x)(\sum y)} {\sqrt{\bigl[n\sum x^{2} - (\sum x)^{2}\bigr] \bigl[n\sum y^{2} - (\sum y)^{2}\bigr]}} \)
Substituting values into the formula:
\(r = \frac{(4 \times 1959) - (24 \times 307)} {\sqrt{\, [4 \times 164 - 24^2] \, [4 \times 24249 - 307^2]\, }} \)
\(= \frac{7836 - 7368}{\sqrt{(656 - 576)\,(96996 - 94249)}} \)
\(= \frac{468}{\sqrt{80 \times 2747}} \)
\( = \frac{468}{\sqrt{219760}} \)
\( = \frac{468}{468.79} \)
\(r \approx 0.998\)
Here, the resultant value shows that there is a strong positive correlation between the variables. This indicates that test scores tend to increase as reading time increases.
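As a cross-check, the table columns and their totals can be rebuilt directly from the raw data. This is a minimal sketch; the names `rows` and `totals` are hypothetical, and the reading hours and test scores are the ones given for the four students.

```python
import math

reading_hours = [3, 5, 7, 9]
test_scores = [60, 70, 82, 95]

# Each row holds X, Y, XY, X^2 and Y^2 for one student.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(reading_hours, test_scores)]
for row in rows:
    print(row)

# Column totals: sum X, sum Y, sum XY, sum X^2, sum Y^2.
totals = tuple(sum(column) for column in zip(*rows))
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = totals
n = len(rows)

r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(totals, round(r, 3))
```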
Anita records the daily exercise hours and weight loss of 5 of her friends over a month:
The correlation coefficient is 0.91, which indicates that there is a positive correlation between the variables.
We organize the data provided:
| Friends | Exercise Hours (X) | Weight Loss (Y) | XY | \(X^2\) | \(Y^2\) |
| 1 | 3 | 4 | 12 | 9 | 16 |
| 2 | 2 | 3 | 6 | 4 | 9 |
| 3 | 4 | 5 | 20 | 16 | 25 |
| 4 | 1 | 2 | 2 | 1 | 4 |
| 5 | 5 | 10 | 50 | 25 | 100 |
Calculating the sums:
\(∑X = 3 + 2 + 4 + 1 + 5 = 15\)
\(∑Y = 4 + 3 + 5 + 2 + 10 = 24\)
\(∑XY= 12 + 6 + 20 + 2 + 50 = 90 \)
\(∑X^2 = 9 + 4 + 16 + 1+ 25 = 55 \)
\(∑Y^2 = 16 + 9 + 25 + 4 + 100 = 154\)
Given that \(n = 5\)
Using Pearson’s Correlation Formula:
\(r = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)} {\sqrt{\,\left[n\Sigma x^{2} - (\Sigma x)^{2}\right]\left[n\Sigma y^{2} - (\Sigma y)^{2}\right]\,}} \)
Substituting values into the formula:
\(r = \frac{(5 \times 90) - (15 \times 24)} {\sqrt{\, [5 \times 55 - 15^{2}] \,[5 \times 154 - 24^{2}]\,}} \)
\(= \frac{450 - 360}{\sqrt{(275 - 225)\,(770 - 576)}} \)
\(= \frac{90}{\sqrt{50 \times 194}} \)
\(= \frac{90}{\sqrt{9700}} \)
\(= \frac{90}{98.49} \)
r ≈ 0.91
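The same result can be reached through the sample correlation formula \(r_{xy} = \frac{s_{xy}}{s_x \cdot s_y}\) listed earlier. The sketch below assumes Python's standard `statistics` module and uses illustrative variable names. Both the sample covariance and the sample standard deviations divide by n - 1, and those factors cancel in the ratio, so the value matches the raw-sum calculation.

```python
import statistics

exercise_hours = [3, 2, 4, 1, 5]
weight_loss = [4, 3, 5, 2, 10]
n = len(exercise_hours)

mean_x = statistics.mean(exercise_hours)
mean_y = statistics.mean(weight_loss)

# Sample covariance: divide by n - 1 rather than n.
s_xy = sum(
    (x - mean_x) * (y - mean_y)
    for x, y in zip(exercise_hours, weight_loss)
) / (n - 1)

# statistics.stdev returns the sample standard deviation (n - 1 divisor).
s_x = statistics.stdev(exercise_hours)
s_y = statistics.stdev(weight_loss)

r = s_xy / (s_x * s_y)
print(round(s_xy, 2), round(r, 2))
```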
Jaipreet Kour Wazir is a data wizard with over 5 years of expertise in simplifying complex data concepts. From crunching numbers to crafting insightful visualizations, she turns raw data into compelling stories. Her journey from analytics to education ref
: She compares datasets to puzzle games—the more you play with them, the clearer the picture becomes!






