Last updated on November 26, 2025

Correlation tells us how strongly two variables move together, on a scale from -1 to +1, while regression helps us predict one variable from the other. Together, they let us explore and forecast relationships, such as how a child’s study time might influence test scores. In this article, we will look at both in detail.
Correlation explains how two variables are related and whether they change in the same or opposite direction. This link is measured by the correlation coefficient from -1 to +1.
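Here is the meaning of the values: +1 indicates a perfect positive correlation, -1 a perfect negative correlation, and 0 no correlation.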
For example:
Imagine you are observing your child’s daily study hours and their marks in tests:
This helps us understand how two things are related and how they change together.
There are several types of correlation coefficients. The most commonly used are:
Pearson’s Correlation Coefficient: This coefficient measures the linear relationship between two continuous variables. Its value ranges from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear correlation. It is typically used when both variables are approximately normally distributed.
Spearman’s Rank Coefficient: This coefficient measures the strength and direction of the monotonic relationship between two variables. We use it for ranked data or when the assumption of normality is violated. It is based on ranks rather than the actual values.
Kendall’s Tau: This coefficient measures the rank association between two variables. We use it when the data has ties or the sample size is small. It compares the number of concordant and discordant pairs in the rankings.
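Let us compare the three coefficients on one small dataset. The sketch below is a minimal pure-Python illustration that assumes the data has no tied values. The `hours` and `marks` numbers and the function names are made up for this example and do not come from any library.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r: sum of deviation products over the root of the squared-deviation sums."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs; assumes no ties."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += 1 if s > 0 else -1
    return score / (n * (n - 1) / 2)

hours = [1, 2, 3, 4, 5]       # hypothetical daily study hours
marks = [52, 60, 58, 71, 80]  # hypothetical test marks

print(round(pearson(hours, marks), 3))
print(round(spearman(hours, marks), 3))
print(round(kendall_tau(hours, marks), 3))
```

All three values are positive here because marks mostly rise with study hours. The rank-based coefficients ignore how large each jump in marks is and look only at the ordering.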
Regression is a statistical tool that helps us understand how one variable changes as another changes. It also allows us to predict future values of a dependent variable using information from one or more independent variables.
Here is an example:
Imagine you want to estimate your child’s exam score based on their study hours:
If regression shows that more study hours usually lead to higher marks, you can use this pattern to predict your child’s score based on how much they study. Regression turns simple observations into meaningful, real-life predictions.


There are several types of regression. The main types are:
Linear Regression: We use this type of regression to model the relationship between one independent and one dependent variable using a straight line. It fits a linear equation to the two variables to see how much a change in one is associated with a change in the other.
Multiple Linear Regression: This type of regression looks at the effect of two or more independent variables on one dependent variable. It extends linear regression to multiple independent variables.
Polynomial Regression: This type of regression captures non-linear relationships by adding polynomial terms such as squared and cubed terms. It models the connection between the dependent and independent variables with a polynomial function.
Logistic Regression: We use this type of regression for binary classification, such as yes/no or 0/1 outcomes. It uses a logistic function instead of a straight line, and it applies when the dependent variable is qualitative (categorical).
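To see why logistic regression suits yes/no outcomes, it helps to look at the logistic function itself. The sketch below is only an illustration: the intercept `a = -4.0` and slope `b = 1.0` are hypothetical values that were not fitted to any data, and `pass_probability` is a made-up name.

```python
from math import exp

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def pass_probability(hours, a=-4.0, b=1.0):
    """Probability of a 'yes' outcome for a given X, using assumed coefficients a and b."""
    return logistic(a + b * hours)

for h in [0, 2, 4, 6, 8]:
    p = pass_probability(h)
    label = "yes" if p >= 0.5 else "no"  # classify with a 0.5 threshold
    print(h, round(p, 3), label)
```

Unlike a straight line, the output never leaves the range 0 to 1, so it can be read as a probability and turned into a yes/no decision with a threshold.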
The formulas for correlation and regression, with an explanation of each, are given below:
Correlation Formula: The formula most commonly used for correlation is Pearson’s correlation coefficient formula:
r = \(\frac{\sum (X - \bar{X})(Y - \bar{Y})} {\sqrt{\sum (X - \bar{X})^{2} \; \sum (Y - \bar{Y})^{2}}}\)
Where,
r = Pearson’s correlation coefficient (ranges from -1 to +1)
X, Y = Individual data points for variables X and Y
\(\bar{X}\), \(\bar{Y}\) = Mean (average) of X and Y
\(\sum\) = Summation symbol
The numerator is the sum of the products of the deviations, which is proportional to the covariance between X and Y.
The denominator is proportional to the product of the standard deviations of X and Y.
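One way to see the link to covariance and standard deviation, assuming the usual sample definitions with a divisor of \(n - 1\) (the symbols \(s_X\), \(s_Y\), and \(\operatorname{cov}\) are notation introduced here for illustration):

\(r = \frac{\operatorname{cov}(X, Y)}{s_X \, s_Y} = \frac{\frac{1}{n-1}\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\frac{1}{n-1}\sum (X - \bar{X})^{2}} \; \sqrt{\frac{1}{n-1}\sum (Y - \bar{Y})^{2}}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^{2} \; \sum (Y - \bar{Y})^{2}}}\)

The factors of \(\frac{1}{n-1}\) cancel, which is why the Pearson formula needs only the sums.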
Regression Formula: The most commonly used regression formula is the linear regression formula:
Y = a + bX
Where,
a = Intercept
b = Slope
Y = Dependent variable
X = Independent Variable.
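The intercept and slope are usually estimated from data with the least-squares method, which this article does not spell out. The sketch below assumes that method. `fit_line` is a hypothetical helper name, and the study-hours data is made up for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates of intercept a and slope b for Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: sum of deviation products divided by the sum of squared X deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    a = mean_y - b * mean_x
    return a, b

hours = [1, 2, 3, 4, 5]       # hypothetical daily study hours (X)
marks = [52, 60, 58, 71, 80]  # hypothetical test marks (Y)

a, b = fit_line(hours, marks)
print(round(a, 2), round(b, 2))  # intercept and slope
print(round(a + b * 6, 2))       # predicted marks for 6 study hours
```

Once a and b are known, a prediction is just a matter of plugging a new X into Y = a + bX.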
Correlation and regression differ in several ways. The table below summarizes the main differences:
| Correlation | Regression |
| It measures the strength and direction of the relationship between two variables. | It models the relationship between independent and dependent variables, which allows for predictions. |
| Its value lies between -1 and +1. | Its coefficients and predictions have no fixed range. |
| The purpose is to show how strongly two variables are related. | The purpose is to describe how the dependent variable changes with the independent variable and to predict one from the other. By itself, it does not prove cause and effect. |
| We use it in statistics, economics, finance, psychology, and research. | We use it for forecasting, data modelling, risk analysis, and machine learning. |
Correlation Analysis
Correlation analysis helps us to understand whether two variables are connected and how strong that connection is. We can use a correlation coefficient, such as Pearson’s correlation, to obtain a value between -1 and +1 that indicates both the direction and strength of the relationship. A scatter plot is often used to visually display how two variables (x and y) move together.
Regression Analysis
Regression analysis goes beyond correlation. It describes the form of the relationship between two variables with an equation and helps us predict the value of one variable from the other. In linear regression, we fit a straight line to the data points. When plotted on a graph, this line shows how x and y are related.
In statistics, the strength and direction of the linear relationship between two variables are measured using “r”, the correlation coefficient. This value shows how closely the variables move together. When the relationship is not linear, more advanced techniques are used to capture and represent the curved pattern between the variables.
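Here is a small illustration of why r only captures straight-line relationships. In the sketch below, the data is made up to follow Y = X² exactly, so Y is completely determined by X, yet Pearson’s r comes out as zero.

```python
from math import sqrt

def pearson(x, y):
    """Pearson's r from the deviation sums."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # a perfect curved (quadratic) relationship

r = pearson(x, y)
print(r)  # zero: the positive and negative deviation products cancel out
```

A value of r near zero therefore means "no linear relationship", not "no relationship". A scatter plot is a good way to spot curved patterns like this one.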
Correlation and regression are two complex mathematical concepts and to get a better understanding of them, some tips and tricks are mentioned below:
Students tend to make mistakes when they solve problems related to correlation and regression. Let us now see the common mistakes they make and the solutions to avoid them:
Correlation and regression have many applications. Here are some of their uses in different fields:
Economics and Finance: Correlation is used to analyze the relationship between economic indicators. Regression is used to predict future trends, for example using GDP growth to forecast employment rates.
Stock Market: Correlation is used to study how stock prices move in relation to one another. Regression is used to forecast stock prices.
Healthcare: Correlation is used to find the risk factors for diseases. Regression is used to make medical predictions.
Agriculture: Correlation is used to find the relationship between rainfall and crop yield. Regression is applied to predict future crop production based on soil fertility, fertilizer use, and weather conditions.
Marketing and Business: Companies use correlation to identify the relationship between advertising expenditure and sales revenue. Regression analysis helps businesses forecast future sales based on factors like price, promotions, and customer demand.
Compute the Pearson correlation coefficient for the paired data: X = [1, 2, 3, 4, 5] and Y = [2, 4, 5, 4, 5]
r ≈ 0.78
Compute the means:
\(\bar{X} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3\)
\(\bar{Y} = \frac{2 + 4 + 5 + 4 + 5}{5} = 4\)
Compute the deviations and their products:
| X | Y | \(X - \bar{X}\) | \(Y - \bar{Y}\) | \((X - \bar{X})(Y - \bar{Y})\) | \((X - \bar{X})^{2}\) | \((Y - \bar{Y})^{2}\) |
| 1 | 2 | -2 | -2 | 4 | 4 | 4 |
| 2 | 4 | -1 | 0 | 0 | 1 | 0 |
| 3 | 5 | 0 | 1 | 0 | 0 | 1 |
| 4 | 4 | 1 | 0 | 0 | 1 | 0 |
| 5 | 5 | 2 | 1 | 2 | 4 | 1 |
Sum the products and squares:
\(\sum (X - \bar{X})(Y - \bar{Y}) = 4 + 0 + 0 + 0 + 2 = 6\)
\(\sum (X - \bar{X})^{2} = 4 + 1 + 0 + 1 + 4 = 10\)
\(\sum (Y - \bar{Y})^{2} = 4 + 0 + 1 + 0 + 1 = 6\)
Apply the Pearson formula:
\(r = \frac{6}{\sqrt{10 \times 6}} = \frac{6}{\sqrt{60}} = \frac{6}{7.746} \approx 0.775\)
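The arithmetic in this example can be double-checked with a few lines of Python. This is only a verification sketch of the same steps, and the variable names are chosen here for readability.

```python
from math import sqrt

X = [1, 2, 3, 4, 5]
Y = [2, 4, 5, 4, 5]

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)

# The three sums from the deviation table
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sum_xx = sum((x - mean_x) ** 2 for x in X)
sum_yy = sum((y - mean_y) ** 2 for y in Y)

r = sum_xy / sqrt(sum_xx * sum_yy)
print(sum_xy, sum_xx, sum_yy)  # 6.0 10.0 6.0
print(round(r, 3))             # 0.775
```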
For the ranked data X = [1, 2, 3, 4, 5] and Y = [2, 3, 5, 4, 1], compute Spearman’s rank correlation coefficient.
ρ = -0.1
Assign ranks:
The X values are already in increasing order, so their ranks are 1 to 5. Rank the Y values from smallest (rank 1) to largest (rank 5).
Y ranks:
Y = 2 → Rank 2
Y = 3 → Rank 3
Y = 5 → Rank 5
Y = 4 → Rank 4
Y = 1 → Rank 1
Calculate the differences between the ranks of X and Y:
| Observation | Rank X | Rank Y | d = Rank X - Rank Y | \(d^{2}\) |
| 1 | 1 | 2 | -1 | 1 |
| 2 | 2 | 3 | -1 | 1 |
| 3 | 3 | 5 | -2 | 4 |
| 4 | 4 | 4 | 0 | 0 |
| 5 | 5 | 1 | 4 | 16 |
Sum the squared differences:
\(\sum d^{2} = 1 + 1 + 4 + 0 + 16 = 22\)
Applying Spearman’s formula:
\(\rho = 1 - \frac{6\sum d^{2}}{n(n^{2} - 1)}
= 1 - \frac{6 \times 22}{5(5^{2} - 1)}
= 1 - \frac{132}{5(24)}
= 1 - \frac{132}{120}
= 1 - 1.1
= -0.1\)
For the dataset X = [2, 4, 6, 8, 10] and Y = [20, 16, 12, 8, 4], find the Pearson correlation coefficient.
r = -1
Means:
\(\bar{X} = \frac{2 + 4 + 6 + 8 + 10}{5} = 6\)
\(\bar{Y} = \frac{20 + 16 + 12 + 8 + 4}{5} = 12\)
Deviations and products:
| X | Y | \(X - \bar{X}\) | \(Y - \bar{Y}\) | Product |
| 2 | 20 | -4 | 8 | -32 |
| 4 | 16 | -2 | 4 | -8 |
| 6 | 12 | 0 | 0 | 0 |
| 8 | 8 | 2 | -4 | -8 |
| 10 | 4 | 4 | -8 | -32 |
\(\sum (X - \bar{X})(Y - \bar{Y}) = -32 - 8 + 0 - 8 - 32 = -80\)
Sums of Squares:
\(\sum (X - \bar{X})^{2}
= (-4)^{2} + (-2)^{2} + 0^{2} + 2^{2} + 4^{2}
= 16 + 4 + 0 + 4 + 16
= 40\)
\(\sum (Y - \bar{Y})^{2}
= 8^{2} + 4^{2} + 0^{2} + (-4)^{2} + (-8)^{2}
= 64 + 16 + 0 + 16 + 64
= 160\)
Pearson’s r:
\(r = \frac{-80}{\sqrt{40 \times 160}} = \frac{-80}{\sqrt{6400}} = \frac{-80}{80} = -1\)
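A result of exactly -1 means every point lies on one downward-sloping straight line. For this dataset that line is Y = 24 - 2X, which is an observation added here and can be confirmed by substituting each X. The short sketch below checks both facts, with variable names chosen for illustration.

```python
from math import sqrt

X = [2, 4, 6, 8, 10]
Y = [20, 16, 12, 8, 4]

# Every point satisfies Y = 24 - 2X, a perfectly linear, decreasing pattern
on_line = all(y == 24 - 2 * x for x, y in zip(X, Y))

mean_x = sum(X) / len(X)
mean_y = sum(Y) / len(Y)
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
sum_xx = sum((x - mean_x) ** 2 for x in X)
sum_yy = sum((y - mean_y) ** 2 for y in Y)

r = sum_xy / sqrt(sum_xx * sum_yy)
print(on_line)  # True
print(r)        # -1.0
```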
Using the dataset X = [1, 2, 3, 4, 5] and Y = [2, 4, 6, 8, 10], determine the regression line and predict Y when X = 7.
Y = 2X and predicted Y(7) = 14
Find the Means:
\(\bar{X} = 3\) and \(\bar{Y} = 6\)
Compute the slope:
b = Change in Y/Change in X = \(\frac{10 - 2}{5 - 1} = \frac{8}{4} = 2\)
Intercept:
\(a = \bar{Y} - b\bar{X} = 6 - 2 \times 3 = 0\)
Prediction:
Regression equation: Y = 2X
For X = 7: \(Y = 2 \times 7 = 14\).
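The two-point slope used in this example works because the points lie exactly on a straight line. As a cross-check under the least-squares method (an assumption here, since the example does not name a fitting method), the deviations from the means give:

\(b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^{2}} = \frac{8 + 2 + 0 + 2 + 8}{4 + 1 + 0 + 1 + 4} = \frac{20}{10} = 2\)

This matches the slope found from the first and last points. For data that do not lie exactly on a line, the two approaches generally give different slopes, and the least-squares version is the one normally used.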
Given the regression equation, Y = 5 + 3X, interpret the slope and intercept.
Intercept = 5 and Slope = 3
Intercept:
When X = 0, the predicted Y is 5. This is the baseline value.
Slope:
For each unit increase in X, Y increases by 3 units.
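The interpretation can be checked by evaluating the equation at a few values of X. In this small sketch, `predict` is a hypothetical helper name.

```python
def predict(x, intercept=5, slope=3):
    """Evaluate the regression equation Y = 5 + 3X from the example."""
    return intercept + slope * x

values = [predict(x) for x in range(4)]              # X = 0, 1, 2, 3
steps = [b - a for a, b in zip(values, values[1:])]  # change in Y per unit of X

print(values)  # [5, 8, 11, 14]
print(steps)   # [3, 3, 3]
```

The first value is the intercept (Y at X = 0), and every step between neighbouring values equals the slope.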
Jaipreet Kour Wazir is a data wizard with over 5 years of expertise in simplifying complex data concepts. From crunching numbers to crafting insightful visualizations, she turns raw data into compelling stories. Her journey from analytics to education ref
She compares datasets to puzzle games—the more you play with them, the clearer the picture becomes!






