Descriptive statistics describe or summarize the features of a data set. They are commonly split into two main branches:
- Measures of central tendency: these include the mean, median, and mode
- Measures of variability: these include standard deviation, variance, minimum and maximum values, range, skewness, and kurtosis
What Are Descriptive Statistics?
Descriptive statistics summarize the features of a data set and explain what the data contain. For instance, in a population census, calculating the ratio of men to women in a country is a descriptive statistic.
Descriptive statistics are therefore meant to describe the characteristics of the data set under study, and they can help statisticians extract information that supports decision-making. For instance, suppose a statistician is tasked with evaluating the performance of baseball teams through descriptive statistical analysis.
The statistician might calculate the batting average of each player and the number of wins per division. The records of all the players on all the teams form a very large data set, and descriptive analysis condenses that data into useful information.
The main purpose of descriptive statistics is to provide information about a data set. In the example above, there are hundreds of baseball players who take part in thousands of games. Descriptive statistics condense that large amount of data into a few useful pieces of information.
Types of Descriptive Statistics
The sections below cover three types of descriptive statistics: measures of central tendency, measures of variability, and frequency distribution.
Measures of Central Tendency
In a measure of central tendency, the aim is to find the central or average value of the given data. The data set can be presented as a table, a graph, or a general description. The mean, median, and mode of such data sets can help reveal common patterns.
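As a minimal sketch of the three measures named above, the following uses Python's standard `statistics` module. The `scores` list is hypothetical data made up for illustration; it does not come from the article.

```python
import statistics

# Hypothetical test scores, invented purely for illustration.
scores = [70, 85, 85, 90, 100]

mean_score = statistics.mean(scores)      # arithmetic average of all values
median_score = statistics.median(scores)  # middle value once the data is sorted
mode_score = statistics.mode(scores)      # most frequently occurring value

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```

Here the mean is pulled slightly above the median by the single high score of 100, which is one reason the two measures are often reported together.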
Measures of Variability
In contrast to a measure of central tendency, a measure of variability shows how spread out the values in a data set are. A measure of variability is also known as a measure of dispersion.
Let’s take this data set as an example: 4, 6, 5, 7, 8
The mean of this data set is 6. Measures of variability reveal other features, such as the range, which is the difference between the highest and lowest values:
8 (highest value) – 4 (lowest value) = 4
The range shows how much the lowest and highest values in a data set differ.
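The range calculation above can be reproduced in a few lines of Python, along with the variance and standard deviation mentioned earlier as measures of variability. This is a sketch that treats the five values as a whole population (hence `pvariance` and `pstdev`); if they were a sample from a larger group, `statistics.variance` and `statistics.stdev` would be the usual choice.

```python
import statistics

# The data set from the example in the text.
data = [4, 6, 5, 7, 8]

# Range: difference between the highest and lowest values.
data_range = max(data) - min(data)

# Population variance: average squared distance from the mean.
pop_variance = statistics.pvariance(data)

# Population standard deviation: square root of the population variance.
pop_std_dev = statistics.pstdev(data)

print("range:", data_range)
print("population variance:", pop_variance)
print("population standard deviation:", round(pop_std_dev, 3))
```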
Frequency Distribution
A frequency distribution shows how many times each value occurs in a data set. For example, in a study of male and female students in an art class, the genders of the students are recorded in the following data set:
Male, female, male, not specified, male, not specified, female, male.
The frequency distribution is as follows:
Number of males: 4
Number of non-males: 4
Number of females: 2
Number of persons not specified as male or female: 2
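The counts above can be checked with a short sketch using `collections.Counter` from the Python standard library. The labels are written in lowercase here so that "Male" and "male" are counted as the same category; that normalization is an assumption of this example.

```python
from collections import Counter

# The art-class data set from the text, with labels normalized to lowercase.
responses = [
    "male", "female", "male", "not specified",
    "male", "not specified", "female", "male",
]

# Counter tallies how many times each distinct value occurs.
counts = Counter(responses)

# Everyone who was not recorded as male.
non_male = len(responses) - counts["male"]

for label, count in counts.most_common():
    print(label, count)
print("non-male", non_male)
```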
Univariate and Bivariate Analysis
As the ‘uni’ in its name suggests, univariate analysis is the analysis of one variable. Bivariate analysis, as the ‘bi’ (meaning two) suggests, is the analysis of two variables.
Univariate analysis identifies the characteristics of a single trait rather than analyzing relationships between traits. For instance, when studying the students in a class, we might focus on only one variable: their test scores. Calculating the average test score involves a single variable, the score assigned to each student, so this study is univariate.
Bivariate studies look for a relationship, such as a correlation, between two variables. In a bivariate study, two sets of data are collected separately and later analyzed together.
To explore how intelligence relates to test scores, we collect the test scores of the students in a class and the results of their IQ tests. Plotting the two variables, IQ points and test scores, against each other can reveal useful insights about the students.
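One common way to put a number on a bivariate relationship like this is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand. The function name `pearson_r` and both lists of values are hypothetical, invented for illustration; they are not data from any real class.

```python
import math

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of paired deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical IQ points and test scores for five students.
iq_points = [95, 100, 105, 110, 120]
test_scores = [60, 65, 70, 80, 90]

r = pearson_r(iq_points, test_scores)
print("Pearson r:", round(r, 3))
```

A value close to 1 would indicate that higher IQ points tend to go with higher test scores in this made-up sample. A strong correlation on its own does not show that one variable causes the other.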