Last updated on August 27th, 2025
The covariance matrix, also known as the variance-covariance matrix, is a square, symmetric, positive semi-definite matrix. Each off-diagonal entry gives the covariance between a pair of elements of a random vector, and each diagonal entry gives the variance of an individual element. It is used in stochastic modelling and principal component analysis.
A covariance matrix is used in statistics to understand how different variables relate to one another. The matrix contains variances and covariances. Variance measures how far a variable spreads from its mean, and covariance tells us how two variables change with respect to each other.
Covariance can be positive, negative, or zero. A positive value suggests that both variables tend to increase together. A negative value means that when one variable increases, the other tends to decrease. A zero covariance suggests that the variables have no linear relationship.
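The three cases can be illustrated with a small sketch. The helper `sample_cov` and all of the data below are hypothetical, chosen only for illustration; the helper divides by n − 1, as in the sample covariance formula.

```python
def sample_cov(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases whenever x increases
falling = [10, 8, 6, 4, 2]   # decreases whenever x increases

x_sym = [-2, -1, 0, 1, 2]
squared = [v * v for v in x_sym]  # determined by x_sym, but not linearly

cov_pos = sample_cov(x, rising)        # positive covariance
cov_neg = sample_cov(x, falling)       # negative covariance
cov_zero = sample_cov(x_sym, squared)  # zero covariance

print(cov_pos, cov_neg, cov_zero)
```

The last pair is a reminder that a zero covariance only rules out a linear relationship: `squared` depends entirely on `x_sym`, yet their covariance is zero.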
Follow the given steps to calculate a covariance matrix when a dataset is provided:
Step 1: Organize the dataset into an n × m matrix, with each row representing an observation (data point) and each column representing a variable.
For example:
Step 2: Find the mean of each column.
Step 3: Subtract the mean of each column from every entry in that column. This results in a mean-centered matrix: $X_{\text{centered}} = X - \bar{X}$
Step 4: Apply the covariance matrix formula.
Covariance matrix $= \frac{1}{n-1} X_{\text{centered}}^{T} X_{\text{centered}}$
Here,
$X_{\text{centered}}^{T}$ is the transpose of the centered matrix
$\frac{1}{n-1}$ is the factor used to compute the sample covariance, where n is the number of observations
A centered matrix is a matrix in which the mean of each variable has been subtracted from its values, so that each column has a mean of zero.
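The four steps above can be sketched in NumPy. The dataset is hypothetical (4 observations of 2 variables), and the final comparison against `np.cov` is only a sanity check, not part of the procedure.

```python
import numpy as np

# Step 1: hypothetical dataset, 4 observations (rows) x 2 variables (columns).
data = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 5.0],
    [4.0, 9.0],
])
n = data.shape[0]  # number of observations

# Step 2: mean of each column.
column_means = data.mean(axis=0)

# Step 3: subtract each column's mean to get the mean-centered matrix.
centered = data - column_means

# Step 4: covariance matrix = centered^T @ centered / (n - 1).
cov_matrix = centered.T @ centered / (n - 1)

# Sanity check against NumPy's built-in; rowvar=False treats columns as variables.
matches_numpy = np.allclose(cov_matrix, np.cov(data, rowvar=False))
print(cov_matrix)
print(matches_numpy)
```

Because each row is an observation here, `rowvar=False` is needed; by default `np.cov` assumes each row is a variable.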
The properties of the covariance matrix are listed below:
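For a random vector with m variables, X = [x1, x2, . . ., xm], the covariance matrix is an m × m square matrix:
$$\begin{bmatrix} \mathrm{var}(x_1) & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_m) \\ \mathrm{cov}(x_2, x_1) & \mathrm{var}(x_2) & \cdots & \mathrm{cov}(x_2, x_m) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{cov}(x_m, x_1) & \mathrm{cov}(x_m, x_2) & \cdots & \mathrm{var}(x_m) \end{bmatrix}$$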
Where,
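Sample variance: $\mathrm{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Sample covariance: $\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Population variance: $\mathrm{var}(x) = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}$
Population covariance: $\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \mu_x)(y_i - \mu_y)}{n}$
Here,
$\mu$ represents the mean of the population ($\mu_x$ and $\mu_y$ are the population means of x and y).
$\bar{x}$ is the mean of the sample data.
n is the total number of observations in the dataset.
$x_i$ refers to individual data points in the dataset x.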
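The sample and population versions differ only in the divisor, which a short sketch makes concrete. The function names `variance` and `covariance`, the `population` flag, and the data are assumptions made for this illustration. When `population=True`, the mean of the supplied values is treated as the population mean.

```python
import numpy as np

def variance(x, population=False):
    """Variance of x: divide by n (population) or n - 1 (sample)."""
    x = np.asarray(x, dtype=float)
    divisor = len(x) if population else len(x) - 1
    return ((x - x.mean()) ** 2).sum() / divisor

def covariance(x, y, population=False):
    """Covariance of x and y: divide by n (population) or n - 1 (sample)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    divisor = len(x) if population else len(x) - 1
    return ((x - x.mean()) * (y - y.mean())).sum() / divisor

# Hypothetical data with n = 4 observations.
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]

print(variance(x), variance(x, population=True))
print(covariance(x, y), covariance(x, y, population=True))
```

For the same data, the population value always equals the sample value multiplied by (n − 1)/n, so the two converge as n grows.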
2 ⨯ 2 Covariance Matrix
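For two random variables x and y, the 2 × 2 covariance matrix is expressed as
$$\begin{bmatrix} \mathrm{var}(x) & \mathrm{cov}(x, y) \\ \mathrm{cov}(x, y) & \mathrm{var}(y) \end{bmatrix}$$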
3 ⨯ 3 Covariance Matrix
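For three random variables x, y, and z, the 3 × 3 covariance matrix is represented as:
$$\begin{bmatrix} \mathrm{var}(x) & \mathrm{cov}(x, y) & \mathrm{cov}(x, z) \\ \mathrm{cov}(x, y) & \mathrm{var}(y) & \mathrm{cov}(y, z) \\ \mathrm{cov}(x, z) & \mathrm{cov}(y, z) & \mathrm{var}(z) \end{bmatrix}$$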
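As a rough check of this structure, the sketch below builds a 3 × 3 covariance matrix from hypothetical observations of x, y, and z and tests the properties stated at the start of the article: square, symmetric, variances on the diagonal, and positive semi-definite. The tolerance `1e-12` is an arbitrary choice to absorb floating-point rounding.

```python
import numpy as np

# Hypothetical observations of three variables (one array per variable).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])
z = np.array([5.0, 3.0, 2.0, 1.0])

# np.cov treats each row as one variable, so stacking gives a 3 x 3 result.
cov = np.cov(np.vstack([x, y, z]))

is_square = cov.shape == (3, 3)
is_symmetric = np.allclose(cov, cov.T)
diag_is_variance = np.allclose(
    np.diag(cov), [x.var(ddof=1), y.var(ddof=1), z.var(ddof=1)]
)
# Positive semi-definite: no eigenvalue is meaningfully below zero.
is_psd = bool(np.all(np.linalg.eigvalsh(cov) >= -1e-12))

print(is_square, is_symmetric, diag_is_variance, is_psd)
```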
Covariance matrices help in understanding the relationships between variables and are used in real-life situations across many fields, including the following:
Risk analysis of stocks in finance
Investors use covariance matrices to understand how the returns of different stocks tend to rise or fall together. This is useful for diversifying an investment by spreading it across different assets.
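One common way to put this into numbers is the portfolio variance w^T Σ w, where Σ is the covariance matrix of the returns and w holds the fraction invested in each stock. The sketch below assumes hypothetical daily returns and an equal split between two stocks; it is an illustration, not investment methodology taken from this article.

```python
import numpy as np

# Hypothetical daily returns (rows = days, columns = stocks).
returns = np.array([
    [0.01, -0.01],
    [0.03,  0.00],
    [-0.02, 0.02],
    [0.02, -0.01],
])
cov = np.cov(returns, rowvar=False)  # 2 x 2 sample covariance matrix

weights = np.array([0.5, 0.5])       # assumed equal split between the stocks
portfolio_variance = weights @ cov @ weights

# The same quantity computed directly from the portfolio's daily returns.
direct_variance = np.var(returns @ weights, ddof=1)

print(cov)
print(portfolio_variance, direct_variance)
```

In this made-up data the two stocks have negative covariance, so the combined portfolio varies less than either stock on its own, which is the diversification effect described above.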
Principal component analysis in machine learning
PCA uses the covariance matrix to identify the main directions in which the data varies. Keeping only those directions simplifies the data, like compressing an image with little loss of quality.
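A minimal sketch of this idea, assuming hypothetical 2-D data that lies close to a diagonal line: the eigenvectors of the covariance matrix give the directions of variation, and the eigenvalues give the variance along each direction. This is a bare-bones illustration rather than a full PCA implementation.

```python
import numpy as np

# Hypothetical 2-D data that varies mostly along one diagonal direction.
data = np.array([
    [2.0, 1.9],
    [4.0, 4.2],
    [6.0, 5.8],
    [8.0, 8.1],
    [10.0, 10.0],
])
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]  # largest variance first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained_ratio = eigenvalues / eigenvalues.sum()
first_component = eigenvectors[:, 0]     # direction of greatest variance
projected = centered @ first_component   # data reduced to one dimension

print(explained_ratio)
print(first_component)
```

Here nearly all of the variance lies along the first component, so the one-dimensional `projected` values retain most of the information in the original two columns.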
Noise reduction in image processing
Neighboring pixels in an image are often similar. Covariance matrices help isolate and reduce random noise that may affect image quality, which is useful in MRI and CT scan processing.
Object tracking in engineering and robotics
In robotics and engineering, covariance matrices help in tracking the movement of an object. The Kalman filter, for example, uses them to represent uncertainty while predicting where a moving object is going, even when measurements are unclear.
Climate pattern detection in environmental sciences
Covariance matrices help study how temperature, rainfall, or pressure readings across different regions relate to each other. This is useful for identifying patterns like El Niño or predicting future climate changes.
While working with covariance matrices, students tend to make conceptual and calculation errors. The most commonly occurring errors are mentioned below for students to refer to and avoid.
Find the covariance matrix for X = [2, 4, 6] and Y = [1, 3, 5]
Calculate the mean of X and Y
Mean of X = 4, mean of Y = 3
Deviations of X
2 - 4 = -2
4 - 4 = 0
6 - 4 = 2
Deviations of Y
1 - 3 = -2
3 - 3 = 0
5 - 3 = 2
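Using the sample covariance formula (dividing by n − 1 = 2),
Var(X) = $\frac{(-2)^2 + 0^2 + 2^2}{2} = \frac{4 + 0 + 4}{2} = \frac{8}{2} = 4$
Var(Y) = $\frac{(-2)^2 + 0^2 + 2^2}{2} = \frac{4 + 0 + 4}{2} = \frac{8}{2} = 4$
Cov(X, Y) = $\frac{(-2)(-2) + (0)(0) + (2)(2)}{2} = \frac{4 + 0 + 4}{2} = \frac{8}{2} = 4$
So, the covariance matrix is
$$\begin{bmatrix} 4 & 4 \\ 4 & 4 \end{bmatrix}$$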
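This example can be cross-checked with NumPy. By default `np.cov` divides by n − 1, which matches the sample formula used here.

```python
import numpy as np

X = [2, 4, 6]
Y = [1, 3, 5]

# Each argument is treated as one variable; the default divisor is n - 1.
cov_matrix = np.cov(X, Y)
print(cov_matrix)
```

Every entry comes out as 4, in agreement with the hand calculation: the variances of X and Y and their covariance are all equal because Y is simply X shifted by a constant.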
If three datasets are A = [1, 2], B = [3, 4], C = [5, 6], what is the 3 × 3 covariance matrix?
Each dataset has two values, so
Mean of A = $\frac{1 + 2}{2} = 1.5$
Mean of B = $\frac{3 + 4}{2} = 3.5$
Mean of C = $\frac{5 + 6}{2} = 5.5$
Deviations of A:
Deviation 1 = 1 - 1.5 = -0.5
Deviation 2 = 2 - 1.5 = 0.5
Deviations of B:
Deviation 1 = 3 - 3.5 = -0.5
Deviation 2 = 4 - 3.5 = 0.5
Deviations of C:
Deviation 1 = 5 - 5.5 = -0.5
Deviation 2 = 6 - 5.5 = 0.5
Now we will apply the covariance formula to compute all covariances.
Cov(A, A) = $\frac{(-0.5)^2 + (0.5)^2}{1} = 0.25 + 0.25 = 0.5$
Cov(A, B) = $\frac{(-0.5)(-0.5) + (0.5)(0.5)}{1} = 0.25 + 0.25 = 0.5$
Cov(A, C) = $\frac{(-0.5)(-0.5) + (0.5)(0.5)}{1} = 0.5$
Cov(B, B) = 0.5
Cov(B, C) = 0.5
Cov(C, C) = 0.5
So, the covariance matrix is
$$\begin{bmatrix} 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.5 \\ 0.5 & 0.5 & 0.5 \end{bmatrix}$$
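The same result follows from the matrix form of the formula, with A, B, and C as columns and the two observations as rows. The rank check at the end is an extra observation added here, not part of the original problem.

```python
import numpy as np

# Columns are A, B, C; rows are the two observations.
data = np.array([
    [1.0, 3.0, 5.0],
    [2.0, 4.0, 6.0],
])
centered = data - data.mean(axis=0)

# Sample covariance: centered^T @ centered / (n - 1), with n = 2 here.
cov_matrix = centered.T @ centered / (data.shape[0] - 1)
rank = np.linalg.matrix_rank(cov_matrix)

print(cov_matrix)
print(rank)
```

Every entry is 0.5, and the matrix has rank 1. With only two observations, the centered data has a single independent row, so a sample covariance matrix built from it cannot have rank greater than 1, however many variables there are.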
A person gets daily returns for 2 stocks, Stock A = [0.01, 0.03, 0.02] and Stock B = [0.02, 0.06, 0.04]. What is the covariance matrix?
Find the average return of each stock
Mean of stock A = $\frac{0.01 + 0.03 + 0.02}{3} = 0.02$
Mean of stock B = $\frac{0.02 + 0.06 + 0.04}{3} = 0.04$
Deviation of A:
Day 1 = 0.01 - 0.02 = -0.01
Day 2 = 0.03 - 0.02 = 0.01
Day 3 = 0.02 - 0.02 = 0
Deviation of B:
Day 1 = 0.02 - 0.04 = -0.02
Day 2 = 0.06 - 0.04 = 0.02
Day 3 = 0.04 - 0.04 = 0
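Using the population formula (dividing by n = 3),
Variance of A = $\frac{(-0.01)^2 + (0.01)^2 + 0^2}{3} = \frac{0.0002}{3} \approx 0.000067$
Variance of B = $\frac{(-0.02)^2 + (0.02)^2 + 0^2}{3} = \frac{0.0008}{3} \approx 0.000267$
Covariance(A, B) = $\frac{(-0.01)(-0.02) + (0.01)(0.02) + (0)(0)}{3} = \frac{0.0002 + 0.0002}{3} = \frac{0.0004}{3} \approx 0.000133$
So, the covariance matrix is approximately
$$\begin{bmatrix} 0.000067 & 0.000133 \\ 0.000133 & 0.000267 \end{bmatrix}$$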
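This example divides by n = 3, which is the population formula, whereas the first two examples divide by n − 1. As a sketch of the difference, NumPy's `np.cov` can produce both: `bias=True` divides by n, and the default divides by n − 1.

```python
import numpy as np

stock_a = [0.01, 0.03, 0.02]
stock_b = [0.02, 0.06, 0.04]

# bias=True divides by n = 3, matching the calculation in this example.
population_cov = np.cov(stock_a, stock_b, bias=True)

# The default divides by n - 1 = 2 (sample covariance).
sample_cov = np.cov(stock_a, stock_b)

print(population_cov)
print(sample_cov)
```

With the sample formula, the variances would be 0.0001 and 0.0004 and the covariance 0.0002, so it is worth stating which divisor a reported covariance matrix uses.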
Let X = [1, 2, 3], Y = [4, 5, 6]. Find the covariance matrix.
Mean of X = $\frac{1 + 2 + 3}{3} = 2$, mean of Y = $\frac{4 + 5 + 6}{3} = 5$
Deviation of X
At 1st point = 1 - 2 = -1
At 2nd point = 2 - 2 = 0
At 3rd point = 3 - 2 = 1
Deviation of Y
At 1st point = 4 - 5 = -1
At 2nd point = 5 - 5 = 0
At 3rd point = 6 -5 = 1
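Dividing by n = 3 (the population formula),
Variance X = $\frac{1 + 0 + 1}{3} = \frac{2}{3}$
Variance Y = $\frac{1 + 0 + 1}{3} = \frac{2}{3}$
Covariance(X, Y) = $\frac{(-1)(-1) + (0)(0) + (1)(1)}{3} = \frac{2}{3}$
So, the covariance matrix is
$$\begin{bmatrix} \frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} \end{bmatrix}$$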
Let X = [3, 3, 3], Y = [2, 2, 2]. What is the covariance matrix?
Neither dataset has any variation, since all the values in X are the same and all the values in Y are the same.
Variance of X = 0
Variance of Y = 0
Covariance between X and Y = 0
Since neither variable deviates from its mean, every variance and covariance is zero, resulting in a zero matrix.