1602 Learners | Last updated on November 22, 2025

Outliers are extreme values and an essential part of a dataset. Outliers provide valuable insights into data and can significantly impact the results of analysis. Let us now learn more about an outlier.
Outliers are data points that stand out because they are much higher or lower than the rest of the data. Statistical measures such as the mean and standard deviation, as well as regression models, are sensitive to extreme values, so outliers can affect them disproportionately, skewing results and leading to misguided conclusions.
For example,
A teacher records the marks of 10 students in a math test: \(45, 48, 50, 47, 49, 46, 50, 48, 47, 95\).
Here, all the scores are around 45 – 50, but 95 is much higher than the rest.
So, 95 is an outlier because it is unusually high compared to the other data points.
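To see how strongly a single extreme value can pull a statistic, here is a minimal sketch using Python's standard `statistics` module on the marks above. The variable names are illustrative assumptions, not part of the original example.

```python
from statistics import mean, median

# Marks from the example above; 95 is the suspected outlier.
marks = [45, 48, 50, 47, 49, 46, 50, 48, 47, 95]
without_outlier = [m for m in marks if m != 95]

mean_all = mean(marks)                    # mean with the outlier included
mean_trimmed = mean(without_outlier)      # mean after removing 95
median_all = median(marks)                # median with the outlier included
median_trimmed = median(without_outlier)  # median after removing 95

print(f"mean:   {mean_all:.2f} with 95, {mean_trimmed:.2f} without")
print(f"median: {median_all} with 95, {median_trimmed} without")
```

The mean shifts by several marks when 95 is removed, while the median stays at 48, which is why the median is often described as more robust to outliers.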
Finding outliers is an integral part of data analysis because unusual values can heavily influence the results. Outliers can be identified using two main approaches: visualization techniques and statistical methods.
Using Visualization Methods
Visualization involves presenting data in graphical form, making it easier to spot patterns and detect unusual values. Two commonly used visual tools are:
1. Box Plot
A box plot displays the minimum, first quartile (Q₁), median, third quartile (Q₃), and maximum.
Any point that lies outside the “whiskers” (below \(Q_1 - 1.5 \times \mathrm{IQR}\) or above \(Q_3 + 1.5 \times \mathrm{IQR}\)) is considered an outlier.
Example:
Data: 12, 14, 15, 16, 17, 18, 40
In the box plot, 40 appears far outside the upper whisker, making it an outlier.
2. Scatter Plot
A scatter plot displays individual data points on a graph. Outliers appear as points that lie far away from the general cluster.
Example:
Suppose you plot students’ study hours against exam scores and most points form a cluster. A point that lies far away from that cluster, for example 10 hours of study but only 5 marks, is an outlier.
A box plot is a statistical chart that provides a visual summary of how data is distributed. Outliers in a box plot can be identified using these steps.
Step 1: Sort the data in ascending order and determine the median.
Step 2: Calculate the Interquartile Range (IQR), which represents the central 50% of the dataset.
Step 3: Determine the lower and upper bounds (also called fences) using the IQR.
Step 4: Any data point that lies below the lower bound or above the upper bound is considered an outlier.
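The four steps above can be sketched in Python as follows. This is only an illustration: the function name `iqr_outliers` is hypothetical, and it assumes the "middle value of each half" rule for quartiles (the median itself is left out of both halves when the number of values is odd). The data is the box plot example from earlier.

```python
from statistics import median

def iqr_outliers(data):
    """Return (q1, q3, lower, upper, outliers) using the median-of-halves rule.

    Assumes at least two data points.
    """
    values = sorted(data)                  # Step 1: sort in ascending order
    half = len(values) // 2
    lower_half = values[:half]             # values before the median position
    upper_half = values[-half:]            # values after the median position
    q1, q3 = median(lower_half), median(upper_half)
    iqr = q3 - q1                          # Step 2: interquartile range
    lower = q1 - 1.5 * iqr                 # Step 3: lower fence
    upper = q3 + 1.5 * iqr                 #         upper fence
    outliers = [v for v in values if v < lower or v > upper]  # Step 4
    return q1, q3, lower, upper, outliers

q1, q3, lower, upper, outliers = iqr_outliers([12, 14, 15, 16, 17, 18, 40])
print(q1, q3, lower, upper, outliers)
```

For this data the quartiles are 14 and 18, the fences are 8 and 24, and only 40 falls outside them.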
A scatter plot is used to visualize the relationship between two continuous variables, with each point represented as a dot. In this plot, any points far from the central cluster of data are considered outliers.
Using Statistical Methods
To detect the outliers numerically, statistical techniques are used. Some commonly used methods include Z-score, DBSCAN, and the Isolation Forest algorithm.
Identifying Outliers Using the Z-Score
The Z-score method measures how many standard deviations a data point is from the mean. It is calculated using the formula:
\(Z = \dfrac{X - \mu}{\sigma}\)
Where:
X = the data point
μ = mean of the dataset
σ = standard deviation of the dataset
A data point is considered an outlier if its Z-score is greater than +3 or less than –3.
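As a minimal sketch of this rule, the code below computes Z-scores with Python's standard `statistics` module. It assumes the population standard deviation (`pstdev`); the function names and the adjustable `threshold` parameter are illustrative assumptions. The data is the set of marks from the first example.

```python
from statistics import mean, pstdev

def z_scores(data):
    """Z-score of every point, using the population standard deviation."""
    mu = mean(data)
    sigma = pstdev(data)  # assumes the values are not all identical (sigma > 0)
    return [(x - mu) / sigma for x in data]

def z_score_outliers(data, threshold=3.0):
    """Points whose absolute Z-score exceeds the threshold."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > threshold]

marks = [45, 48, 50, 47, 49, 46, 50, 48, 47, 95]
z_95 = z_scores(marks)[-1]                        # Z-score of the mark 95
flagged_at_3 = z_score_outliers(marks)            # default cut-off of 3
flagged_at_2_5 = z_score_outliers(marks, threshold=2.5)
print(round(z_95, 2), flagged_at_3, flagged_at_2_5)
```

One caveat worth keeping in mind: in a small dataset the extreme value inflates the standard deviation itself. With the population standard deviation, no point among n values can have an absolute Z-score above \(\sqrt{n-1}\), so with 10 marks nothing can exceed 3, and 95 is only flagged here when a lower cut-off such as 2.5 is chosen. The ±3 rule is better suited to larger datasets.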
Identifying Outliers Using the Isolation Forest Algorithm
The Isolation Forest algorithm is an anomaly detection technique that uses decision trees to separate data points. It works by randomly partitioning the dataset. Points that are isolated in fewer steps are considered outliers because unusual values stand out and are easier to separate from the rest.
For example,
Consider the dataset representing daily sales (in units):
50, 52, 55, 53, 58, 60, 54, 300
Most values are close to each other, between 50 and 60, but 300 is far larger than the rest.
When the Isolation Forest algorithm creates random partitions, 300 is separated from the other values in fewer splits, so the algorithm marks it as an outlier.
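The idea of "fewer splits" can be illustrated with a simplified, one-dimensional sketch. This is not the full Isolation Forest algorithm (which builds many trees on random subsamples and converts path lengths into an anomaly score); it only counts how many random cuts are needed to isolate each value. The function names, the number of trials, and the seed are assumptions chosen for illustration.

```python
import random

def isolation_depth(values, target, rng):
    """Count the random splits needed before `target` sits alone in its partition."""
    current = list(values)
    depth = 0
    # Stop once only the target (or duplicates of it) remains.
    while min(current) < max(current):
        split = rng.uniform(min(current), max(current))
        # Keep only the side of the split that still contains the target.
        if target < split:
            current = [v for v in current if v < split]
        else:
            current = [v for v in current if v >= split]
        depth += 1
    return depth

def average_depths(values, trials=500, seed=0):
    """Average isolation depth of each value over many random trials."""
    rng = random.Random(seed)
    return {
        v: sum(isolation_depth(values, v, rng) for _ in range(trials)) / trials
        for v in values
    }

sales = [50, 52, 55, 53, 58, 60, 54, 300]
depths = average_depths(sales)
most_isolated = min(depths, key=depths.get)
print(most_isolated, depths)
```

Because the gap between 60 and 300 covers most of the data range, the first random cut almost always lands in it and isolates 300 immediately, while the values between 50 and 60 need further cuts.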
Outliers can be identified using different methods depending on the complexity of the data, the time available, and the level of accuracy needed. Here are some commonly used methods, along with their steps:
In the sorting method, the data is arranged in ascending order and then scanned visually for extreme values.
Step 1: Arrange the data in ascending order, that is, from smallest to largest.
Step 2: Any value that is much higher or much lower than the rest is considered an outlier.
Calculating Outlier by Statistical Outlier Detection (Z-score Method)
The z-score is calculated using the formula \(z = \dfrac{X - \mu}{\sigma}\).
Here, X is the data point,
μ is the mean of the dataset, and
σ (sigma) is the standard deviation.
If the z-score is greater than +3 or less than -3, the value is an outlier. That is, an outlier lies more than 3 standard deviations away from the mean.
The interquartile range (IQR) is the difference between the third quartile (Q3) and the first quartile (Q1); it spans the central 50% of the dataset. In this method, we find outliers by following these steps:
Step 1: Arrange the data in ascending order, that is, from low to high.
Step 2: Find Q1 and Q3. Q1 is the middle value of the lower half and Q3 is the middle value of the upper half.
Step 3: Calculate the IQR: \(IQR = Q3 - Q1\).
Step 4: Find the lower and upper bounds: lower bound = \(Q1 - 1.5 × IQR\) and upper bound = \(Q3 + 1.5 × IQR\).
Step 5: Any value below the lower bound or above the upper bound is an outlier.
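Textbooks and software do not all compute quartiles the same way, so Q1 and Q3 from a tool may differ slightly from the "middle of each half" values. The sketch below, using the daily sales data from the Isolation Forest example, compares the two conventions offered by Python's `statistics.quantiles` (Python 3.8 or later). The helper names `fences` and `flag` are hypothetical.

```python
from statistics import quantiles

def fences(data, method):
    """Return (q1, q3, lower, upper) for the chosen quartile convention."""
    q1, _, q3 = quantiles(data, n=4, method=method)
    iqr = q3 - q1
    return q1, q3, q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag(data, method):
    """Values falling outside the 1.5 x IQR fences."""
    _, _, lower, upper = fences(data, method)
    return [v for v in data if v < lower or v > upper]

sales = [50, 52, 55, 53, 58, 60, 54, 300]
for method in ("exclusive", "inclusive"):
    q1, q3, lower, upper = fences(sales, method)
    print(method, q1, q3, lower, upper, flag(sales, method))
```

The two conventions give slightly different quartiles and fences for this data, yet both flag only 300. When a value sits close to a fence, it is worth stating which quartile convention was used.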
To master the topic of outliers, some tips and tricks are mentioned below.
Now let’s learn a few common mistakes that students tend to repeat when working with outliers. By learning these, students can master outliers.
Outlier detection is used in different fields such as finance, environmental monitoring, cybersecurity, and so on. Let’s learn a few real-life applications of outliers.
A teacher records the ages of students in a class: 12, 13, 14, 15, 12, 13, 14, 12, 13, 27. Find the outlier in the dataset.
The outlier is 27.
Arranging the data: 12, 12, 12, 13, 13, 13, 14, 14, 15, 27
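The dataset has 10 numbers, so the lower half is 12, 12, 12, 13, 13 and the upper half is 13, 14, 14, 15, 27
Here, Q1 is the middle value of the lower half, which is 12
Q3 is the middle value of the upper half, which is 14
So, \(IQR = Q3 - Q1 = 14 - 12 = 2\)
Lower bound = \(Q1 - 1.5 × IQR = 12 - 1.5 × 2 = 9\)
Upper bound = \(Q3 + 1.5 × IQR = 14 + 1.5 × 2 = 17\)
Any value below 9 or above 17 is an outlier. Here the outlier is 27.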
A runner records his daily running distance (in miles) over 7 days: 3, 4, 3.5, 3.8, 4.2, 3.9, 10. Identify the outlier.
The outlier is 10.
Sorting the data: 3, 3.5, 3.8, 3.9, 4, 4.2, 10
Here the median is the 4th value: 3.9
The lower half is 3, 3.5, 3.8. So, \(Q1 = 3.5\)
The upper half is 4, 4.2, 10. So, \(Q3 = 4.2\)
So, \(IQR = Q3 - Q1 = 4.2 - 3.5 = 0.7\)
Finding the lower bound,
Lower bound = \(Q1 - 1.5 × IQR \)
= \(3.5 - 1.5 × 0.7 = 2.45\)
Finding the upper bound,
Upper bound = \(Q3 + 1.5 × IQR\)
= \(4.2 + 1.5 × 0.7 = 5.25 \)
Any value below 2.45 or above 5.25 is an outlier
Here the outlier is 10.
A bakery records daily cupcake sales: 25, 30, 28, 35, 27, 500, 32. Find the outlier.
The outlier is 500.
Sorting the data: 25, 27, 28, 30, 32, 35, 500
The 4th value is the median, so the median is 30
The lower half is 25, 27, 28. So, Q1 is 27
The upper half is 32, 35, 500. So, Q3 is 35
\(IQR = Q3 - Q1 \)
So, \(IQR = 35 - 27 = 8\)
Now let’s find the lower and upper bounds,
Lower bound = \(Q1 - 1.5 × IQR = 27 - 1.5 × 8 = 15\)
Upper bound = \(Q3 + 1.5 × IQR = 35 + 1.5 × 8 = 47\)
Any value below 15 or above 47 is an outlier, so the outlier is 500.
A group of friends records their heights in inches: 60, 61, 62, 63, 64, 65, 90. Identify the outlier.
The outlier here is 90.
Sorting the data in ascending order: 60, 61, 62, 63, 64, 65, 90
Here the median is the 4th value, which is 63
Therefore, the lower half is 60, 61, 62. So, Q1 is 61
The upper half is 64, 65, 90. So, Q3 is 65
\(IQR = Q3 - Q1 = 65 - 61 = 4\)
Lower bound = \(Q1 - 1.5 × IQR\)
\(= 61 - 1.5 × 4 = 61 - 6 = 55\)
Upper bound = \(Q3 + 1.5 × IQR\)
= \(65 + 1.5 × 4 = 65 + 6 = 71\)
Any value below 55 or above 71 is an outlier. As \(90 > 71\), it is the outlier.
A company records the number of employees working overtime each week: 5, 7, 6, 8, 6, 50, 7. Identify the outlier.
The outlier here is 50.
Sorting the data in ascending order: 5, 6, 6, 7, 7, 8, 50
Here the median is the 4th value, which is 7
Therefore, the lower half is 5, 6, 6. So Q1 is 6
The upper half is 7, 8, 50. So, Q3 is 8
\(IQR = Q3 - Q1 = 8 - 6 = 2\)
Lower bound = \(Q1 - 1.5 × IQR\)
\(= 6 - 1.5 × 2 = 3\)
Upper bound = \(Q3 + 1.5 × IQR\)
= \(8 + 1.5 × 2 = 11\)
Any value below 3 or above 11 is an outlier. As \(50 > 11\), it is the outlier.
Jaipreet Kour Wazir is a data wizard with over 5 years of expertise in simplifying complex data concepts. From crunching numbers to crafting insightful visualizations, she turns raw data into compelling stories. Her journey from analytics to education ref
: She compares datasets to puzzle games—the more you play with them, the clearer the picture becomes!






