Last updated on June 12th, 2025
Outliers are extreme values in a dataset, and handling them is an essential part of data analysis. They matter because they can reveal valuable insights about the data and can change the final result of an analysis. Let us now learn more about outliers.
Outliers are data points that stand out because they are much higher or lower than the rest of the data. They have a disproportionate influence on statistical measures, skewing overall results and leading to misguided conclusions. This is because measures such as the mean and standard deviation, and models such as regression, are sensitive to extreme values.
For instance, in a dataset of exam scores for a class, if the marks for one exam are much higher than the marks for the other exams, those higher marks are considered outliers.
An important step in data analysis is identifying an outlier. Visualization and statistical methods are the two primary methods to identify outliers. Let's discuss them in detail:
Using Visualization Methods
In visualization techniques, data is displayed graphically to help observe patterns and identify key insights. The common tools used in this technique are box plots and scatter plots.
A box plot is a statistical chart that summarizes the distribution of the data. Here, an outlier is identified by following these steps:
Step 1: The data is sorted and arranged to find the median
Step 2: Then the IQR is identified, which represents the middle 50% of the data
Step 3: The lower bound and the upper bound are calculated from the quartiles and the IQR
Step 4: Any data point that falls below the lower bound or above the upper bound is an outlier
A scatter plot visualizes the relationship between two continuous variables, with each observation drawn as a dot. In a scatter plot, the points that lie far from the main cluster are the outliers.
Using Statistical Methods
For quantitative outlier detection, we use statistical methods. The common methods used here are z-score, DBSCAN, and the isolation forest algorithm.
Identifying Outliers Using Z-Score
To find an outlier using the Z-score, we use the formula:
Z = (X - μ) / σ
Where X is the data point
μ is the mean of the dataset
σ is the standard deviation of the dataset
If the Z-score is greater than 3 or less than -3, then the value is treated as an outlier.
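The rule above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: the helper names `z_scores` and `z_outliers` and the sample `readings` are made up for illustration, and it assumes σ is the population standard deviation of the dataset (`statistics.pstdev`).

```python
from statistics import mean, pstdev

def z_scores(values):
    """Return the Z-score (X - mean) / standard deviation of every value."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        # All values are identical, so nothing can stand out.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]

def z_outliers(values, threshold=3.0):
    """Return the values whose Z-score is above +threshold or below -threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Illustrative, made-up sample: 19 ordinary readings and one extreme reading.
readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11,
            12, 10, 11, 13, 12, 11, 10, 12, 11, 95]
print(z_outliers(readings))  # [95]
```

The threshold is a parameter because 3 is a convention rather than a fixed rule; some analyses use 2 or 2.5 instead.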
The isolation forest algorithm is an anomaly detection method based on decision trees. It partitions the data at random, and the points that become isolated after only a few partitions are the outliers.
Outliers can be found using different methods, chosen according to the complexity of the data, the time available, and so on. Let us learn about these methods:
In the sorting method, the data is arranged in ascending order and then scanned visually for extreme values.
Step 1: Arrange the data in ascending order, that is, from small to big
Step 2: Scan both ends of the sorted list; a value that is much higher or much lower than the rest is considered an outlier
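The two steps above can be sketched in Python. The helper `largest_gap` is hypothetical, and looking for the biggest jump between neighbouring values is an assumption added here to stand in for the visual scan; it is only a rough aid, not a formal test. The data is the class-ages list from the worked examples in this article.

```python
def largest_gap(values):
    """Sort the values and find the biggest jump between two neighbours."""
    data = sorted(values)  # Step 1: ascending order
    gaps = [(data[i + 1] - data[i], data[i], data[i + 1])
            for i in range(len(data) - 1)]
    return data, max(gaps)  # the tuple with the largest gap comes out on top

ages = [12, 13, 14, 15, 12, 13, 14, 12, 13, 27]
ordered, (gap, left, right) = largest_gap(ages)
print(ordered)
print(f"largest jump: {left} -> {right} (gap of {gap})")
```

A large jump at either end of the sorted list points to a value worth checking with one of the statistical methods.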
Calculating Outliers by Statistical Outlier Detection (Z-score Method)
The z-score is calculated by using the formula, z = (X - μ) / σ,
Here, X is the data point
μ is the mean of the data set
σ is the standard deviation
If the z-score is greater than 3 or less than -3, then the value is an outlier. That is, an outlier lies more than 3 standard deviations away from the mean.
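As a quick worked check, the formula can be applied to the class-ages data used in the worked examples of this article. This is a sketch under one assumption: σ is taken as the population standard deviation (`statistics.pstdev`).

```python
from statistics import mean, pstdev

ages = [12, 13, 14, 15, 12, 13, 14, 12, 13, 27]

mu = mean(ages)         # mean of the data set
sigma = pstdev(ages)    # assumption: population standard deviation
z_27 = (27 - mu) / sigma

print(f"mean = {mu}, standard deviation = {sigma:.2f}, z-score of 27 = {z_27:.2f}")
```

With only ten values, the z-score of 27 comes out a little under 3, because the extreme value itself pulls the mean and the standard deviation upward. In small datasets the ±3 rule can therefore miss a value that looks clearly extreme, so it may be worth comparing the result with another method.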
The interquartile range (IQR) is the spread of the middle 50% of the data set, measured as the difference between the third quartile (Q3) and the first quartile (Q1). In this method, we find outliers by following these steps:
Step 1: Arrange the data in ascending order, that is, from low to high
Step 2: Find the values of Q1 and Q3. Q1 is the middle value of the lower half and Q3 is the middle value of the upper half
Step 3: Calculate the value of IQR. So, IQR = Q3 - Q1
Step 4: Find the lower bound and the upper bound. Here the lower bound = Q1 - 1.5 × IQR and the upper bound = Q3 + 1.5 × IQR
Step 5: Any value below the lower bound or above the upper bound is an outlier
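The steps above can be sketched in Python. This is a minimal sketch rather than a definitive implementation: the helper names `quartiles` and `iqr_outliers` are hypothetical, and it assumes Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the number of data points is odd. Other quartile conventions can give slightly different bounds.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values. For an odd count, the middle value
    belongs to neither half.
    """
    data = sorted(values)                    # Step 1: ascending order
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower_half), median(upper_half)   # Step 2

def iqr_outliers(values, k=1.5):
    """Return the outliers and the (lower bound, upper bound) pair."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1                            # Step 3
    lower, upper = q1 - k * iqr, q3 + k * iqr    # Step 4
    outliers = [x for x in values if x < lower or x > upper]
    return outliers, (lower, upper)

ages = [12, 13, 14, 15, 12, 13, 14, 12, 13, 27]
outliers, bounds = iqr_outliers(ages)
print(outliers, bounds)  # [27] (9.0, 17.0)
```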
Outlier detection is used in different fields such as finance, environmental monitoring, cybersecurity, and so on. Let's learn a few real-life applications of outliers.
Now let's learn a few common mistakes that students tend to repeat when working with outliers. By learning these, students can master outliers.
A teacher records the ages of students in a class: 12, 13, 14, 15, 12, 13, 14, 12, 13, 27. Find the outlier in the dataset.
The outlier is 27
Arranging the data: 12, 12, 12, 13, 13, 13, 14, 14, 15, 27
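The data set has 10 numbers, so each half has 5 numbers
The lower half is 12, 12, 12, 13, 13. So, Q1 is 12
The upper half is 13, 14, 14, 15, 27. So, Q3 is 14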
So, IQR = Q3 - Q1 = 14 - 12 = 2
Lower bound = Q1 - 1.5 × IQR = 12 - 1.5 × 2 = 9
Upper bound = Q3 + 1.5 × IQR = 14 + 1.5 × 2 = 17
Any value below 9 or above 17 is an outlier. Here the outlier is 27.
A runner records his daily running distance (in miles) over 7 days: 3, 4, 3.5, 3.8, 4.2, 3.9, 10. Identify the outlier.
The outlier is 10
Sorting the data: 3, 3.5, 3.8, 3.9, 4, 4.2, 10
Here the median is the 4th value: 3.9
The lower half is 3, 3.5, 3.8. So, Q1 = 3.5
The upper half is 4, 4.2, 10. So, Q3 = 4.2
So, IQR = Q3 - Q1 = 4.2 - 3.5 = 0.7
Finding the lower bound,
Lower bound = Q1 - 1.5 × IQR
= 3.5 - 1.5 × 0.7 = 2.45
Finding the upper bound,
Upper bound = Q3 + 1.5 × IQR
= 4.2 + 1.5 × 0.7 = 5.25
Any number below 2.45 or above 5.25 is an outlier
Here the outlier is 10
A bakery records daily cupcake sales: 25, 30, 28, 35, 27, 500, 32. Find the outlier.
The outlier is 500
Sorting the data: 25, 27, 28, 30, 32, 35, 500
The 4th value is the median, so the median is 30
The lower half is 25, 27, 28. So, Q1 is 27
The upper half is 32, 35, 500. So, Q3 is 35
IQR = Q3 - Q1
So, IQR = 35 - 27 = 8
Now let's find the lower and upper bounds,
Lower bound = Q1 - 1.5 × IQR = 27 - 1.5 × 8 = 15
Upper bound = Q3 + 1.5 × IQR = 35 + 1.5 × 8 = 47
Here, any value below 15 or above 47 is an outlier, so the outlier is 500
A group of friends records their heights in inches: 60, 62, 61, 63, 64, 65, 90. Identify the outlier.
The outlier here is 90
Sorting the data in ascending order: 60, 61, 62, 63, 64, 65, 90
Here the median is the 4th value, which is 63
Therefore, the lower half is 60, 61, 62. So, Q1 is 61
The upper half is 64, 65, 90. So, Q3 is 65
IQR = Q3 - Q1 = 65 - 61 = 4
Lower bound = Q1 - 1.5 × IQR
= 61 - 1.5 × 4 = 61 - 6 = 55
Upper bound = Q3 + 1.5 × IQR = 65 + 1.5 × 4
= 65 + 6 = 71
Any value below 55 or above 71 is an outlier. As 90 > 71, it is the outlier.
A company records the number of employees working overtime each week: 5, 7, 6, 8, 6, 50, 7. Identify the outlier.
The outlier here is 50
Sorting the data in ascending order: 5, 6, 6, 7, 7, 8, 50
Here the median is the 4th value, which is 7
Therefore, the lower half is 5, 6, 6. So Q1 is 6
The upper half is 7, 8, 50. So, Q3 is 8
IQR = Q3 - Q1 = 8 - 6 = 2
Lower bound = Q1 - 1.5 × IQR
= 6 - 1.5 × 2 = 3
Upper bound = Q3 + 1.5 × IQR
= 8 + 1.5 × 2 = 11
Any value below 3 or above 11 is an outlier. As 50 > 11, it is the outlier
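The five worked examples can be re-checked together with a short Python script. This is a sketch under the same assumption as the solutions (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the count is odd); the helper name `iqr_bounds` and the dictionary labels are made up for illustration.

```python
from statistics import median

def iqr_bounds(values):
    """Return (Q1, Q3, lower bound, upper bound) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    upper_half = data[half + 1:] if len(data) % 2 else data[half:]
    q1, q3 = median(lower_half), median(upper_half)
    spread = q3 - q1
    return q1, q3, q1 - 1.5 * spread, q3 + 1.5 * spread

examples = {
    "ages": [12, 13, 14, 15, 12, 13, 14, 12, 13, 27],
    "running": [3, 4, 3.5, 3.8, 4.2, 3.9, 10],
    "cupcakes": [25, 30, 28, 35, 27, 500, 32],
    "heights": [60, 62, 61, 63, 64, 65, 90],
    "overtime": [5, 7, 6, 8, 6, 50, 7],
}

results = {}
for name, values in examples.items():
    q1, q3, low, high = iqr_bounds(values)
    results[name] = [x for x in values if x < low or x > high]
    print(f"{name}: Q1 = {q1}, Q3 = {q3}, outliers = {results[name]}")
```

Each dataset should report exactly one outlier, matching the answers given in the examples.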
Jaipreet Kour Wazir is a data wizard with over 5 years of expertise in simplifying complex data concepts. From crunching numbers to crafting insightful visualizations, she turns raw data into compelling stories. Her journey from analytics to education ref
She compares datasets to puzzle games: the more you play with them, the clearer the picture becomes!