Data is a collection of information. In mathematics, data can be numbers, letters, or measurements used to solve mathematical problems.
Data mean information in various forms, such as numbers or observations. They are broadly classified into three types: structured, unstructured, and semi-structured data. We’ll soon learn about the different types of data.
Data were initially used to maintain simple records. From 3000 BCE to 1000 BCE, people in China, Egypt, and India used clay tablets to record data, usually to track population, growth, taxes, and crops. Today, the word ‘data’ refers to a far more complex subject, and data are used widely in schools, offices, and other organizations.
In the classical era (500 BCE to 100 BCE), the Romans used data for managing their empire, while Greek mathematicians like Pythagoras worked on abstract numerical concepts. Between the 8th and 13th centuries, Islamic scholars like Al-Khwarizmi (the father of algebra) and Al-Biruni advanced mathematical methods. Al-Biruni used trigonometry to calculate the Earth’s size and studied pharmaceutical applications. Around the 12th-13th centuries, Europeans started collecting data for various purposes, such as agriculture and trade.
In the 17th century, statistics emerged as a discipline. John Graunt started analyzing population data, and the mathematician Blaise Pascal helped develop probability theory.
In the 18th and 19th centuries, Carl Friedrich Gauss introduced the method of least squares and the normal distribution, making data an important part of economics, urban planning, and industry.
The concept of big data emerged during the 20th century. The invention of the computer transformed the way data was handled, enabling complex calculations and machine learning, while databases made large-scale data storage and retrieval possible.
Big data became even more powerful during the 21st century, as computers began processing huge amounts of data easily. AI and ML enhanced data analysis, while visualization tools simplified complex patterns.
With the evolution of human beings, data also became more complex and intricate. From ancient agricultural records to big data and machine learning, data has come a long way.
So far, we’ve learned about data and its history. Now, let’s learn about the types of data. This is important because it helps us choose the correct method to analyze and process the collected data. The following are the major branches of data:
I. Descriptive Data
Descriptive statistics describe and summarize the collected information, but they are not meant for making predictions. They are easy to understand because they often present the data in visual forms, such as graphs. Their main purpose is to give an overview of the collected data.
Example: A summary of student exam scores or monthly weather reports.
II. Inferential Data
Inferential statistics use sample data to make predictions or generalizations about a larger population. They enable us to draw conclusions based on sample analysis.
Example: Estimating the average height of a population from a sample.
III. Qualitative Data
Data that cannot be quantified numerically but describes characteristics is called qualitative data. It answers ‘what kind’ rather than ‘how much.’ It is useful for classifying and labeling information.
Example: Categories of fruits and vegetables, colors, etc.
IV. Quantitative Data
Quantitative data can be measured. It is usually used for statistical and mathematical calculations.
Example: The total number of employees in an organization or the total weight of all the students in a class.
V. Structured Data
Structured data is organized in columns and rows. Structured data is easy to manage, search, and analyze. It fits into relational databases or matrix-based calculations.
Example: An Excel spreadsheet.
VI. Unstructured Data
Unstructured data is the opposite of structured data. It lacks a predefined structure, making it difficult to store and process.
Example: Images, videos, and audio files.
VII. Semi-structured Data
Data that lies between structured and unstructured data is called semi-structured data. This type of data has some organization but does not fit strictly into rows and columns. It is useful for dynamic data types.
Example: Email has some structure (sender, receiver, subject, and date), but the body of the email is unstructured.
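To make the distinction concrete, here is a small Python sketch of the email example. The field names and values are hypothetical, invented for illustration. The point is only that some parts are fixed, labeled fields while the body is free text.

```python
# A hypothetical email, invented for this illustration.
email = {
    # Structured part: fixed, labeled fields
    "sender": "teacher@example.com",
    "receiver": "student@example.com",
    "subject": "Exam results",
    "date": "2024-05-01",
    # Unstructured part: free text with no predefined layout
    "body": "Hi! Your results are in. Well done on the maths paper.",
}

# Everything except the body can be treated like columns in a table.
structured_fields = {key: value for key, value in email.items() if key != "body"}

print(sorted(structured_fields))  # ['date', 'receiver', 'sender', 'subject']
print(len(email["body"].split()), "words of free text in the body")
```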
Data visualization presents data graphically to reveal patterns, trends, and relationships. The following are some of the types of data representation.
I. Bar Graphs
Bar graphs are a visual representation of data using horizontal or vertical bars. Here, the length of each bar corresponds to the data value. These are also known as bar charts.
Example: A bar graph showing the population trends over the last decade, categorized by gender.
II. Pie Charts
Here, data are shown as parts of a whole circle. It is called a pie chart because each segment resembles a slice of pie; each slice represents a proportion of the whole. It is best for displaying percentages.
Example: A pie chart showing the monthly expenses of different departments of a company.
III. Histogram
Much like a bar graph, a histogram uses bars to represent data. The difference is that it doesn’t have gaps between the bars, while a bar graph has gaps. A histogram is used to represent the frequency of continuous data.
Example: The students’ scores in intervals.
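As a rough sketch of how scores are grouped into intervals for a histogram, the Python snippet below counts how many values fall into each interval of width 10 and prints a simple text histogram. The scores and the interval width are hypothetical choices made for this illustration.

```python
from collections import Counter

# Hypothetical exam scores, invented for this illustration
scores = [55, 62, 68, 71, 74, 78, 85, 88, 90, 95]
bin_width = 10  # assumed interval width

# Map each score to the lower edge of its interval, e.g. 68 -> 60
counts = Counter((score // bin_width) * bin_width for score in scores)

# Print one row per interval; the number of # marks is the frequency
for lower in sorted(counts):
    upper = lower + bin_width - 1
    print(f"{lower}-{upper}: {'#' * counts[lower]}")
```

Each row stands for one bar of the histogram, and the number of marks is the frequency of scores in that interval.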
IV. Line Graphs
Data that has the potential to change over time can be represented as a line graph. It is formed by joining plotted points with straight lines to represent the given data. Line graphs are used to determine trends.
Example: Revenue of a company.
V. Scatter Plots
Scatter plots display the relationship between two variables. If the plotted points rise from left to right, the variables have a positive correlation. If the points fall from left to right, they have a negative correlation. If the points are scattered without any pattern, there is no correlation.
Example: Comparing the number of study hours spent by a student against the grades obtained to observe a correlation.
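One common way to put a number on the direction of a relationship is the Pearson correlation coefficient, which is not covered in the text above and is used here only as an illustration. It is positive when the points rise, negative when they fall, and close to zero when there is no linear pattern. The study hours and grades below are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

# Hypothetical data, invented for this illustration
study_hours = [1, 2, 3, 4, 5]
grades = [52, 58, 65, 70, 80]

r = pearson_r(study_hours, grades)
if r > 0:
    print("positive correlation")
elif r < 0:
    print("negative correlation")
else:
    print("no linear correlation")
```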
VI. Tables
Tables organize data in rows and columns. They are used to find a particular piece of data quickly. Tables are the most basic method for summarizing raw data.
Example: A table showing all the records of students in a class.
The different methods used to collect data are called data collection methods. They are used by researchers and analysts who use data to solve a specific problem or make informed decisions. Some of the commonly used methods are given below:
I. Surveys
A survey involves asking a set of predefined questions to a group of people. It is used to gather opinions or preferences from people. Types of surveys include questionnaires, online forms, and interacting with people.
Example: A survey of children to identify their favorite candy.
II. Experiments
An experiment involves collecting data under predefined conditions to observe their effects. Experiments either lead to a new theory or validate existing theories.
Example: Experimenting with fertilizers to find the best fertilizer for a particular type of crop.
III. Observations
Observation is gathering data by watching or recording events without interference. It is extremely useful for studying and recognizing patterns during data analysis.
Example: Observing the blossoming of a flower to determine the number of days it takes to bloom.
IV. Interviews
Interviews are a process of gathering data through face-to-face or virtual interactions with people. They help us gather data that reflects the diverse perspectives of people from different walks of life.
Example: Conducting interviews in the office to assess a person’s suitability for a role.
Data analysis techniques are used to organize and examine data to get relevant information about a given problem. These techniques help identify patterns and refine raw information given by the data. Here are some of the data analysis techniques that are used often.
The mean is used to find the average of a set of numbers. To calculate the mean, add all the values together and divide the total by the number of values. The formula to find the mean is:
Mean = Sum of all values/Number of values
Let’s understand this with an example and try to find the mean for the exam scores, 85, 90, 68, 95, and 55.
Use the formula to find the average score. Substituting the scores in the formula, we get:
Mean = (85 + 90 + 68 + 95 + 55)/5
= 393/5
= 78.6
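The same calculation can be checked in Python. This is a minimal sketch using the standard library's statistics module; the variable names are chosen here for the example.

```python
from statistics import mean

# Exam scores from the worked example above
scores = [85, 90, 68, 95, 55]

# Mean = sum of all values / number of values
manual_mean = sum(scores) / len(scores)

# statistics.mean performs the same calculation
library_mean = mean(scores)

print(manual_mean)   # 78.6
print(library_mean)  # 78.6
```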
The median is the middle value in the dataset. To find it, arrange the numbers in ascending or descending order and take the middle number. If the dataset has an even number of values, take the average of the two middle numbers.
Example:
(i) 1, 2, 3, 4, and 5. Here, the median value is 3.
(ii) 1, 2, 3, and 4. Here, we will find the median value using the following formula.
Median = (Sum of the two middle numbers)/2
Median = (2 + 3)/2 = 2.5
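Both cases can be sketched in Python. The helper name median_of is made up for this illustration; the standard library's statistics.median is shown alongside it for comparison.

```python
from statistics import median

def median_of(values):
    """Return the median: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return ordered[mid]
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([1, 2, 3, 4, 5]))  # 3
print(median_of([1, 2, 3, 4]))     # 2.5
print(median([1, 2, 3, 4]))        # 2.5
```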
The mode is the value that appears most frequently in the dataset. Unimodal, bimodal, and multimodal are some of the types of modes. If the dataset contains only one mode, it is called unimodal. If there are two modes, it’s called bimodal, and if there are more than two modes, it is called multimodal. A dataset with no repeated values has no mode.
Example: Find the mode in the dataset 2, 5, 7, 6, 5, 7, 8.
The modes in the given dataset are 5 and 7, because they appear twice in the dataset. So the dataset has two modes and hence it is bimodal.
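A minimal Python sketch of this counting process is shown below. The helper find_modes is a hypothetical name used for illustration; it returns an empty list when no value repeats, following the convention above that such a dataset has no mode.

```python
from collections import Counter

def find_modes(values):
    """Return all values with the highest count, or [] if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value appears once, so there is no mode
        return []
    return [value for value, count in counts.items() if count == highest]

modes = find_modes([2, 5, 7, 6, 5, 7, 8])
print(modes)       # [5, 7]
print(len(modes))  # 2, so the dataset is bimodal
print(find_modes([67, 89, 76, 79, 90]))  # []
```

Note that Python's built-in statistics.multimode follows a different convention: for a dataset with no repeated values, it returns every value rather than an empty list.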
The difference between the lowest value and the highest value in the dataset is the range. The range is calculated as: Range = Highest value - Lowest value.
Example: Find the range using the dataset 89, 76, 65, 78, 54, 23.
Here, Highest value = 89
Lowest value = 23
Range = Highest value - Lowest value
= 89 - 23
= 66
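In Python, the range of a dataset can be found with the built-in max and min functions. This short sketch repeats the example above; the variable names are chosen for illustration.

```python
values = [89, 76, 65, 78, 54, 23]

highest = max(values)
lowest = min(values)

# Range = highest value - lowest value
data_range = highest - lowest

print(highest, lowest, data_range)  # 89 23 66
```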
The amount of variation in the given dataset is known as the standard deviation. We can find the standard deviation using the formula: σ = √(Σ(xᵢ - μ)²/n).
Here xᵢ is each value in the dataset.
μ is the mean value in the dataset.
n is the given number of values.
Example: Find the standard deviation for 1, 2, 3, with the mean 2.
Standard Deviation = √(Σ(xᵢ - μ)²/n)
Standard Deviation = √(((1 - 2)² + (2 - 2)² + (3 - 2)²)/3)
= √((1 + 0 + 1)/3)
= √(2/3)
≈ 0.816
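The steps above can be reproduced in Python as a quick check. This sketch divides by n, as the formula here does, which matches statistics.pstdev (the population standard deviation). The related function statistics.stdev divides by n - 1 instead and would give a different result.

```python
from math import sqrt
from statistics import pstdev

values = [1, 2, 3]

# Step 1: the mean
mu = sum(values) / len(values)

# Step 2: squared difference of each value from the mean
squared_diffs = [(x - mu) ** 2 for x in values]

# Step 3: average the squared differences, then take the square root
sigma = sqrt(sum(squared_diffs) / len(values))

print(round(sigma, 3))           # 0.816
print(round(pstdev(values), 3))  # 0.816
```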
Variance measures the spread of data. It is the average of the squared differences between each value and the mean. The formula used to find the variance is as follows:
σ² = Σ(xᵢ - μ)²/n
Here xᵢ is each value in the dataset.
μ is the mean value in the dataset.
n is the number of values.
Example: Find the variance of 1, 2, 3 with the mean 2
Variance = ((1 - 2)² + (2 - 2)² + (3 - 2)²)/3
= (1 + 0 + 1)/3
= 2/3
≈ 0.667
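A short Python sketch of the same variance calculation follows. It also shows that taking the square root of the variance gives the standard deviation. The function statistics.pvariance divides by n, like the formula used here.

```python
from statistics import pvariance

values = [1, 2, 3]
mu = sum(values) / len(values)

# Variance = average of the squared differences from the mean
variance = sum((x - mu) ** 2 for x in values) / len(values)

print(round(variance, 3))           # 0.667
print(round(pvariance(values), 3))  # 0.667

# The square root of the variance is the standard deviation
print(round(variance ** 0.5, 3))    # 0.816
```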
Data is the collection of information or facts. The accuracy of any data can be improved by learning the rules and properties of data. Below are some rules and properties of data.
We apply the principles of data in our everyday lives. Therefore, it’s important to know some of the tips and tricks regarding data.
The concept of data is applied almost everywhere. Interestingly, data can be collected and applied in the same place, such as social media, shops, and educational websites.
For instance, data from faculty and students are used to improve educational institutions. While teachers can use data to improve their teaching, students can use it to track their progress and behavior.
In healthcare, data are stored to maintain patient records. For example, data on the treatments that patients have undergone can help doctors treat them better.
Social media platforms like Instagram and Facebook store data related to the users’ online activity. By storing this data, these platforms give us personalized content.
In the field of science and research, scientists use data from their previous research to solve new problems.
Data analysis is used to process raw data. Identifying mistakes and learning from them will help us get more accurate results.
Find the mean, median, and mode for the scores 67, 89, 76, 79, 90.
The mean of the scores is 80.2, the median is 79, and there is no mode.
(i) To find the mean, we use the formula, mean = Σx/n
= (67 + 89 + 76 + 79 + 90)/5
= 401/5
= 80.2
(ii) To find the median, arrange the given dataset in ascending order,
67, 76, 79, 89, 90
The median is 79, because it's the middle number
(iii) There is no mode in the given dataset. All the numbers in the dataset are unique.
A hospital has 50 patients, and 28 of them are admitted. Find the percentage of patients who are admitted.
The percentage of patients admitted is 56%.
To find the percentage, we use the formula:
Percentage = (Number of patients admitted/Total patients) × 100
= (28/50) × 100
= 0.56 × 100
= 56%
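The percentage calculation can be sketched in Python as follows. The variable names are chosen for this illustration.

```python
admitted = 28
total_patients = 50

# Percentage = (part / whole) × 100
# Multiplying before dividing avoids a tiny floating-point rounding error here.
percentage = admitted * 100 / total_patients

print(f"{percentage}%")  # 56.0%
```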
Identify the outliers in the given dataset: apple, orange, banana, strawberry, and potato.
The outlier is the potato
Outliers are like finding the odd one out. Here, potato is the outlier, because all others are fruits and potato is a vegetable.
If a bar graph shows the number of bananas sold in a week, find the total number of bananas sold. Monday: 20, Tuesday: 25, Wednesday: 15, Thursday: 8, Friday: 17, Saturday: 25.
The total number of bananas sold is 110
Add the number of bananas sold on each day.
20 + 25 + 15 + 8 + 17 + 25 = 110
Find the range of the given numbers 12, 15, 10, 18, 14.
The range is 8
To find the range, we use the formula: Range = The highest value - The lowest value
Here, the highest value = 18
Lowest value = 10
Range = 18 - 10 = 8
Jaipreet Kour Wazir is a data wizard with over 5 years of expertise in simplifying complex data concepts. From crunching numbers to crafting insightful visualizations, she turns raw data into compelling stories. Her journey from analytics to education ref
She compares datasets to puzzle games: the more you play with them, the clearer the picture becomes!