Measures of dispersion in statistics are a set of metrics used to describe objectively how spread out a data set is.
Most data science studies begin with the fundamentals of statistics, and dispersion is an essential topic you should not overlook.
Understanding how data is distributed is the most critical aspect of dispersion measures. The more varied the data set, the larger the value of a measure of dispersion.
Ungrouped (raw) data, or several different data sets, can take time to interpret and analyze. Measures of dispersion help by summarizing the spread of the data in a single, easy-to-read number.
Read on to learn more about the different types of measures of dispersion, with relevant examples and related information.
Dispersion in Statistics
To disperse means to distribute or scatter. Statistical dispersion refers to the degree to which a set of values spreads out around a central value, such as the mean or median. In other words, dispersion helps us understand how data is distributed.
Dispersion shows whether the data is homogeneous or heterogeneous, that is, how broadly or narrowly it is spread.
Measures of dispersion are non-negative numbers that quantify how far data points lie from a central value. They help determine the spread of the data: how squeezed or stretched it is.
There are five standard methods for measuring dispersion: variance, range, quartile deviation, mean deviation, and standard deviation.
Objectives of measures of dispersion in statistics include:
- They help determine the degree of similarity or consistency between two or more data sets. A high degree of variation indicates low consistency, while a low degree of variation indicates greater consistency and uniformity.
- They help identify the causes of variation in a data set, which makes it easier to control that variation.
- They show how well an average represents the whole data set.
- They are used in many other statistical methods, including hypothesis tests and regression.
The two main types of measures of dispersion in statistics are:
- Absolute Measures of Dispersion
- Relative Measures of Dispersion
1. Absolute Measure of Dispersion
Absolute measures of dispersion quantify the variation among a set of numbers in the same units as the observations.
Absolute measures, such as the mean deviation and the standard deviation, express the typical variation in the data.
For instance, given daily temperature readings for an area in °C, absolute measures of dispersion express the variation in °C.
The absolute measures of dispersion are as follows:
- Range: The difference between the maximum and minimum values of a set. For example, for 1, 4, 7, 9, 11, range = 11 - 1 = 10.
- Variance: Subtract the mean from each value in the set, square each result, add the squares, and divide by the number of values.
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$, where μ is the mean and N is the number of values.
- Standard Deviation: Take the square root of the variance: $\sigma = \sqrt{\sigma^2}$.
- Quartiles and Quartile Deviation: Quartiles are the values that divide an ordered set of numbers into four equal parts. The quartile deviation is half the difference between the third and first quartiles.
- Mean and Mean Deviation: The mean is the average of the values, and the mean deviation is the mean of the absolute deviations from a central value.
- Range
The range is the most straightforward way to quantify variation and is easy to calculate: subtract the minimum value from the maximum value in the data set.
You have likely encountered the range many times because it gives the quickest estimate of variability. Although its simplicity is appealing, it may not give a reliable indication of variation.
Here's how to calculate the range for the numbers 8, 7, 4, 3, 5, 10, 6. The maximum is 10 and the minimum is 3, so the range is 10 - 3 = 7.
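As a minimal sketch in Python, the range calculation fits in a small helper. The function name `data_range` is hypothetical, chosen so that it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Return the range: the largest value minus the smallest value."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


print(data_range([8, 7, 4, 3, 5, 10, 6]))  # 7
print(data_range([1, 4, 7, 9, 11]))        # 10
```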
Pros
- It is straightforward to compute and understand.
- No technical formula is required to calculate the range.
- It takes very little time to compute.
- It gives a concise summary of the spread between the extreme values.
Cons
- It ignores every value except the minimum and maximum, so it is overly simplistic.
- The range cannot be determined for an open-ended series.
- It is highly sensitive to sampling fluctuations and can vary considerably from one sample to another.
- Variance
Variance measures how far a set of (random) numbers deviates from their mean: it is the average squared difference between the values and the mean.
Formally, it is the sum of the squared deviations of each score from the mean, divided by the number of scores in the set.
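The definition can be checked with a short Python sketch. It implements the population formula (divide by N) and, for comparison, the sample formula (divide by n - 1); the helper names are hypothetical, and the results are cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import statistics


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [1, 4, 7, 9, 11]
print(round(population_variance(data), 2))  # 12.64
print(round(sample_variance(data), 2))      # 15.8
# The standard library should agree with both hand-written versions.
print(statistics.pvariance(data), statistics.variance(data))
```

Dividing by n - 1 instead of N makes the sample variance slightly larger, which compensates for measuring deviations from the sample mean rather than the unknown population mean.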
- Standard Deviation
Calculate the standard deviation by taking the square root of the variance. Taking the square root expresses the result in the same units as the original data, which makes it easier to interpret.
Standard deviation is practical and easy to understand, which is why researchers usually report the mean and standard deviation together when summarizing a study. Variance and standard deviation are the most commonly used measures of dispersion.
The standard deviation of a set of numbers indicates how dispersed they are around the mean, that is, how far the values typically deviate from it.
A change in even one value shifts the standard deviation, because it depends on every element in the data set. It is unaffected by a change of origin (adding the same constant to every value) but is affected by a change of scale (multiplying every value by a constant).
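This Python sketch checks how the population standard deviation (`statistics.pstdev`) responds to changes in the data: changing a single value changes it, adding the same constant to every value (a change of origin) leaves it unchanged, and multiplying every value by a constant (a change of scale) multiplies it by that constant. The data set is an arbitrary example.

```python
import statistics

data = [1, 4, 7, 9, 11]
one_changed = [1, 4, 7, 9, 15]      # only the last value differs
shifted = [x + 100 for x in data]   # change of origin: add a constant
scaled = [x * 3 for x in data]      # change of scale: multiply by a constant

sd = statistics.pstdev(data)
print("original:   ", sd)
print("one changed:", statistics.pstdev(one_changed))  # differs from sd
print("shifted:    ", statistics.pstdev(shifted))      # same as sd
print("scaled:     ", statistics.pstdev(scaled))       # 3 * sd
```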
Because the standard deviation measures spread, its value is always zero or a positive number.
A low standard deviation indicates little variability in the data. A high standard deviation indicates that values lie farther from the mean, that is, greater dispersion.
The Greek letter σ (sigma) denotes the population standard deviation, while the lowercase letter s denotes the sample standard deviation.
To calculate the standard deviation, follow these steps.
- Find the mean. This is the first and most important step: add all the values and divide the sum by the number of values.
- Subtract the mean from each value in the set, then square each result.
- Calculate the mean of the squared values: add them up and divide by the number of values.
- The result is the variance, and its square root is the standard deviation. (Dividing by the number of values gives the population variance; for a sample, divide by n - 1 instead.)
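The four steps above translate directly into code. This is a minimal Python sketch with a hypothetical function name; like the steps, it divides by the number of values, so it returns the population standard deviation. The example data set is illustrative only.

```python
import math
import statistics


def standard_deviation(values):
    """Population standard deviation, computed step by step."""
    n = len(values)
    # Step 1: find the mean.
    mean = sum(values) / n
    # Step 2: subtract the mean from each value and square the result.
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Step 3: the mean of the squared deviations is the variance.
    variance = sum(squared_deviations) / n
    # Step 4: the square root of the variance is the standard deviation.
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))  # 2.0
print(statistics.pstdev(data))   # the standard library should agree
```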
- Quartiles and Quartile Deviation
Quartiles are values that divide an ordered data set into four equal parts, so that each part contains the same number of observations.
Three cut points are needed to make four parts, so there are three quartiles: Q1, Q2, and Q3. Q1 is the first or lower quartile: 25% of the data lies below it and 75% lies above it.
The second quartile, Q2, has 50% of the items below it and 50% above it. It is the median of the data.
Q3, the third or upper quartile, has 75% of the values below it and 25% above it.
In a nutshell, the middle half of the data lies between Q1 and Q3. The difference Q3 - Q1 is the interquartile range, and half of it is the quartile deviation.
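Here is a short Python sketch of the quartile deviation using the standard library's `statistics.quantiles` (Python 3.8 or later). Note the assumption: there are several conventions for computing quartiles, and this uses the function's default `"exclusive"` method, so other tools may report slightly different Q1 and Q3 values for the same data. The helper name `quartile_deviation` is hypothetical.

```python
import statistics


def quartile_deviation(values):
    """Half the interquartile range: (Q3 - Q1) / 2."""
    # With n=4, quantiles() returns the three cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(values, n=4)
    return (q3 - q1) / 2


data = [8, 7, 4, 3, 5, 10, 6]
print(statistics.quantiles(data, n=4))  # [4.0, 6.0, 8.0]
print(quartile_deviation(data))         # 2.0
```

Here Q2 = 6.0 is also the median of the sorted data 3, 4, 5, 6, 7, 8, 10.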
Pros
- It is also straightforward to compute and understand.
- It can be used for open-ended distributions.
- It is less affected by extreme values, which makes it superior to the range.
- It is useful for measuring the dispersion of the middle 50% of the data.
Cons
- It doesn't take all observations into account.
- It is not well suited to further mathematical or statistical treatment.
- It is considerably affected by sampling fluctuations.
- It ignores half of the data (the values below Q1 and above Q3), so it is less reliable than other measures of dispersion.
- Mean and Mean Deviation
Mean deviation is the average distance between the observed values and the mean of the distribution. Individual deviations from the mean can be positive or negative.
Because positive and negative deviations from the mean cancel out, simply summing them always gives zero and reveals nothing about the spread.
For example:
We have this set of data: -10, 5, 35
We get the mean = (-10 + 5 + 35)/3 = 10
The deviations of each value from the mean are:
- (-10 – 10) = -20
- (5 – 10) = -5
- (35 – 10) = 25
As shown, the deviations cancel each other out: -20 + (-5) + 25 = 0.
To resolve this, use the absolute values of the deviations. Here, the mean deviation is (20 + 5 + 25)/3 = 50/3 ≈ 16.67.
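The worked example above can be reproduced in Python. This sketch uses a hypothetical `mean_deviation` helper that averages absolute deviations about a central value. It defaults to the mean, but, as an assumption for illustration, it also accepts another center such as the median.

```python
import statistics


def mean_deviation(values, center=None):
    """Average absolute deviation from `center` (the mean if not given)."""
    if center is None:
        center = statistics.fmean(values)
    return sum(abs(x - center) for x in values) / len(values)


data = [-10, 5, 35]
mean = statistics.fmean(data)           # 10.0
deviations = [x - mean for x in data]   # [-20.0, -5.0, 25.0]
print(sum(deviations))                  # 0.0: signed deviations cancel out
print(mean_deviation(data))             # 50/3, about 16.67
print(mean_deviation(data, statistics.median(data)))  # 15.0, about the median 5
```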
Pros
- It takes all observations into account and does not depend only on limit values, as the range and quartile deviation do.
- It is straightforward to compute and understand.
- It is less affected by extreme values than the standard deviation, because the deviations are not squared.
- It can be calculated about any average: the mean, median, or mode.
Cons
- Ignoring the signs (+ or -) of the deviations is mathematically awkward.
- It is not well suited to further algebraic treatment.
- Calculations become tedious when the mean or median is a fraction.
- It may not work for open-ended series.
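Measure of Dispersion | Formula |
Range | $H - S$, where H = the largest value and S = the smallest value |
Variance | Population variance: $\sigma^2 = \sum (x_i - \mu)^2 / n$; sample variance: $s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$, where n = the number of observations, μ = the population mean, and $\bar{x}$ = the sample mean |
Standard Deviation | $\sigma = \sqrt{\sigma^2}$ (population); $s = \sqrt{s^2}$ (sample) |
Mean Deviation | $\sum \lvert x_i - a \rvert / n$, where n = the number of observations and a = the central value (mean, median, or mode) |
Quartile Deviation | $(Q_3 - Q_1)/2$, where Q3 = the third quartile and Q1 = the first quartile |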
Factors Affecting Variability
Before looking at the other type of measures of dispersion, let's go through several factors that can influence the variability of data.
i) Stability during sample collection: Samples drawn from the same population tend to produce similar results because they share a common origin. This is the source of their stability: one would expect each sample to be about as variable as the population from which it was drawn.
ii) Outliers: Extreme scores affect the range, standard deviation, and variance. A single outlier can substantially change the value you calculate.
iii) Sample size: Larger samples tend to have a larger range, because each additional observation is another chance to include an extreme value.
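To see the effect of an outlier, this Python sketch adds one hypothetical extreme value (60) to the range example used earlier and recomputes the range, the population standard deviation, and the median. The outlier inflates the range and standard deviation sharply, while the median barely moves. Because the new data set contains every value of the old one, its range also cannot be smaller.

```python
import statistics

data = [8, 7, 4, 3, 5, 10, 6]
with_outlier = data + [60]  # one hypothetical extreme value

for label, values in [("original", data), ("with outlier", with_outlier)]:
    spread = max(values) - min(values)
    sd = statistics.pstdev(values)  # population standard deviation
    median = statistics.median(values)
    print(f"{label:>12}: range={spread}, SD={sd:.2f}, median={median}")
```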
2. Relative Measures of Dispersion
Relative measures of dispersion remain unaffected by the measurement units used to denote the readings.
They are unitless numbers used to compare variation between two or more data sets, even when those data sets use different units of measurement. Having both absolute and relative measures of dispersion is handy for Six Sigma teams.
Relative measures of dispersion are necessary when comparing data sets that use different units. They are expressed as ratios or percentages and have no unit. The main relative measures of dispersion are:
- Coefficient of Range: The ratio of the difference between a set's maximum and minimum values to their sum.
- Coefficient of Variation: The ratio of the standard deviation to the mean, expressed as a percentage.
- Coefficient of Mean Deviation: The mean deviation divided by the central value (mean, median, or mode) about which it was calculated.
- Coefficient of Quartile Deviation: The difference between the third and first quartiles divided by their sum.
Relative Measures of Dispersion | Related Formulas |
Coefficient of Range | (H – S)/(H + S) |
Coefficient of Variation | (SD/Mean)×100 |
Coefficient of Mean Deviation | (Mean Deviation)/μ, where μ is the central value about which the mean deviation is calculated |
Coefficient of Quartile Deviation | (Q3 – Q1)/(Q3 + Q1) |
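The four relative measures in the table can be computed together. This Python sketch assumes the population standard deviation for the coefficient of variation, takes the mean as the central value for the mean deviation, and uses the default method of `statistics.quantiles` for the quartiles; other choices give somewhat different values. The function name `relative_measures` and the example data are illustrative.

```python
import statistics


def relative_measures(values):
    """Coefficients of range, variation (%), mean deviation, and quartile deviation."""
    h, s = max(values), min(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    mean_dev = sum(abs(x - mean) for x in values) / len(values)  # about the mean
    q1, _, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return {
        "coefficient of range": (h - s) / (h + s),
        "coefficient of variation (%)": sd / mean * 100,
        "coefficient of mean deviation": mean_dev / mean,
        "coefficient of quartile deviation": (q3 - q1) / (q3 + q1),
    }


for name, value in relative_measures([2, 4, 4, 4, 5, 5, 7, 9]).items():
    print(f"{name}: {value:.3f}")
```

Because every result is a ratio, the same function can compare, for example, a data set in °C with one in kilograms.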
Measures of Central Tendency
A measure of central tendency is a single value that describes a data set by identifying the position of its center. Such measures are sometimes called measures of central location.
Among these summary statistics, the mean (average) is arguably the most widely recognized, but the median and mode are also common.
You can describe the center of a data set using the mean, median, or mode. However, depending on the data, one of these measures may be more appropriate than the others.
- Mean
It is the average value of the set, calculated by adding all the values and dividing the sum by the number of values. Researchers commonly call it the arithmetic mean.
- Median
The median is the middle value of a data set when its items are arranged in ascending or descending order. If the number of items is even, the median is the mean of the two middle values.
- Mode
The mode of a data set is the value that occurs most frequently. A data set may have several modes, or none at all if no value occurs more often than the others.
Measures of dispersion and measures of central tendency both help us understand data. The following table shows how they differ.
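Python's standard `statistics` module provides all three measures. In this sketch, with an illustrative data set of six values, the median is the mean of the two middle values, as described above. `statistics.multimode` returns every most-frequent value, so for data in which no value repeats it returns all of the values, which is one way of seeing that such data has no distinct mode.

```python
import statistics

data = [3, 7, 7, 2, 9, 4]
print(statistics.mean(data))            # 32 / 6, about 5.33
print(statistics.median(data))          # 5.5, the mean of the middle values 4 and 7
print(statistics.multimode(data))       # [7]
print(statistics.multimode([1, 2, 3]))  # [1, 2, 3]: no value repeats
```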
Measures of Dispersion | Measures of Central Tendency |
Measures of dispersion quantify the variability in data. | Measures of central tendency quantify the data's average behavior. |
Examples include variance, mean deviation, standard deviation, quartile deviation, etc. | Examples include mean, median, and mode. |
Quick Notes on Dispersion
In statistics, dispersion is a concept of significant importance. It helps describe how data is distributed and how much it varies around a central value or trend.
Statistical dispersion also gives a more complete picture of how data is spread. For instance, two distinct groups can have the same mean, median, and range yet differ noticeably in how their values vary.
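This can be checked with two small, made-up groups in Python. Both have mean 5, median 5, and range 8, yet their population standard deviations differ (about 2.53 for group A and 3.58 for group B) because group B has more values at the extremes.

```python
import statistics

group_a = [1, 5, 5, 5, 9]
group_b = [1, 1, 5, 9, 9]

for name, group in [("A", group_a), ("B", group_b)]:
    print(
        name,
        "mean:", statistics.mean(group),
        "median:", statistics.median(group),
        "range:", max(group) - min(group),
        "SD:", round(statistics.pstdev(group), 2),
    )
```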
Here are quick notes to remember on dispersion:
- Q2 in the frequency distribution series is also the median of the values
- Measures of dispersion help determine data spread and are measured around a central value.
- Calculate the median the same way you calculate Q1 and Q3
- Measures of dispersion are grouped into two broad categories, namely relative and absolute measures of dispersion
- The values on the Lorenz curve's axes are cumulative percentages, ranging from 0 to 100
- Absolute measures of dispersion have the same units as the data, while relative measures are unitless
- A good measure of dispersion is easy to calculate and analyze
- A good measure of dispersion is not unduly affected by sampling fluctuations.
- Absolute measures of dispersion include the range, variance, standard deviation, quartile deviation, and mean deviation.
- Relative measures of dispersion are the coefficients of dispersion.
Conclusion
To grasp the concept of measures of dispersion in statistics, it is imperative first to understand what dispersion means.
Dispersion is a statistical term for the degree of spread of a set of data. Measuring it involves calculating, from the available data, how far the values of a variable spread around a central value.
Understanding what dispersion entails leads naturally to its two types of measures. Absolute measures express variability in the same units as the data, for example as deviations from the mean. Relative measures are unitless ratios, such as the coefficient of variation (the ratio of the standard deviation to the mean), which allow data sets with different units to be compared.