Measure Data Spread Using Dispersion Metrics
Business Scenario
Welcome!
Today is your ninth day as a Junior Data Analyst at a retail analytics company.
After calculating descriptive statistics such as Mean, Median, and Mode, businesses also need to understand how widely their data is spread. Two datasets may have the same average value but very different levels of variation.
Dispersion metrics help analysts measure consistency, identify fluctuations in sales performance, evaluate customer behavior, and detect unusual patterns in retail operations.
Understanding data spread enables better forecasting, inventory planning, and business decision-making.
Pre-Lab Preparation
Download the previous lab file: DM LAB 8
Git Pull
git pull origin branchName
Topic: Decoding Your Data
1) Measuring Data Spread: Dispersion Insights
Download the dataset: Retail_Dataset_Cleaned
Task 1: Understanding Variance and Standard Deviation
Retail organizations frequently analyze sales and revenue data to understand business performance. Knowing the average revenue alone is not enough.
Analysts must also determine how widely the values vary around that average.
Two important dispersion metrics are:
Variance
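Measures how far data points are spread from the mean, based on the squared deviation of each value from the mean.
Standard Deviation
The square root of the variance. It describes the typical distance of data points from the mean and is expressed in the same units as the data.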
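The two definitions are easiest to compare on small numbers. Below is a minimal sketch, assuming two hypothetical stores whose daily sales values are made up for illustration (they are not taken from Retail_Dataset_Cleaned). Both series have the same mean, but their variance and standard deviation differ sharply.

```python
import pandas as pd

# Hypothetical daily sales (illustrative values, not from the lab dataset)
store_a = pd.Series([100, 102, 98, 101, 99])   # values tightly clustered around 100
store_b = pd.Series([60, 140, 80, 120, 100])   # values widely spread around 100

for name, sales in [("Store A", store_a), ("Store B", store_b)]:
    # Same mean, very different variance and standard deviation
    print(name, "mean:", sales.mean(),
          "variance:", sales.var(),
          "std:", round(sales.std(), 2))
```

Both stores average 100, yet Store A has a variance of 2.5 while Store B has a variance of 1000.0, so the average alone hides how unevenly Store B performs.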
What is Dispersion?
Dispersion refers to the degree of spread or variability present in a dataset.
A small dispersion indicates that data values lie close to the average.
A large dispersion indicates that data values are widely spread.
1
Open Google Colab
2
Import Required Libraries
import pandas as pd
import numpy as np
3
Upload the Retail Dataset
4
Load Dataset Using Pandas
# Use the name of the file you uploaded (the dataset downloaded above is Retail_Dataset_Cleaned)
df = pd.read_csv("/content/Retail_Dataset_Modified.csv")
print("Dataset Loaded Successfully")
5
Display First Five Records
df.head()
6
Check Dataset Information
df.info()
7
Calculate Variance of Revenue
revenue_variance = df["Revenue"].var()
print("Revenue Variance:", revenue_variance)
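Formula (sample variance, which is what pandas .var() computes by default):
Variance: s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2
where x_i is each individual value, \bar{x} is the mean of the values, and n is the number of values.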
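pandas Series.var() divides by n - 1 by default (ddof=1), whereas NumPy's np.var() divides by n (ddof=0). The sketch below uses a hypothetical five-value revenue series (made-up numbers) to check the formula by hand against both libraries.

```python
import numpy as np
import pandas as pd

# Hypothetical revenue values (illustrative only)
revenue = pd.Series([200.0, 250.0, 300.0, 350.0, 400.0])

n = len(revenue)
mean = revenue.mean()

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var = ((revenue - mean) ** 2).sum() / (n - 1)

print("Manual sample variance:", manual_var)
print("pandas .var() (ddof=1):", revenue.var())
# NumPy divides by n by default, giving the population variance instead
print("NumPy np.var() (ddof=0):", np.var(revenue.to_numpy()))
```

The manual result and pandas agree (6250.0), while NumPy returns 5000.0. To get the population variance from pandas, pass ddof=0, for example df["Revenue"].var(ddof=0).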
8
Calculate Standard Deviation of Revenue
revenue_std = df["Revenue"].std()
print("Revenue Standard Deviation:", revenue_std)
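Formula (sample standard deviation, the square root of the sample variance; pandas .std() uses this by default):
Standard Deviation: s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}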
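Because standard deviation is the square root of variance, it stays in the data's own units, while variance is in squared units. A minimal sketch with a hypothetical revenue series (made-up values), converted from dollars to cents, shows the effect of a change of units on each metric.

```python
import math
import pandas as pd

# Hypothetical revenue in dollars (illustrative only)
revenue_dollars = pd.Series([200.0, 250.0, 300.0, 350.0, 400.0])
revenue_cents = revenue_dollars * 100  # same data expressed in a different unit

# Standard deviation is the square root of the variance
print(math.isclose(revenue_dollars.std(), math.sqrt(revenue_dollars.var())))  # True

# Scaling the data by 100 scales the std by 100 but the variance by 100**2
print("std ratio:", revenue_cents.std() / revenue_dollars.std())
print("variance ratio:", revenue_cents.var() / revenue_dollars.var())
```

The standard deviation ratio comes out at about 100 and the variance ratio at about 10,000, which is why the standard deviation is usually the easier number to read in business terms.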
9
Compare Revenue Variance and Standard Deviation
comparison = pd.DataFrame({
"Metric": ["Variance", "Standard Deviation"],
"Value": [revenue_variance, revenue_std]
})
comparison
10
Calculate Variance of Units Sold
units_variance = df["Units_Sold"].var()
print("Units Sold Variance:", units_variance)
11
Calculate Standard Deviation of Units Sold
units_std = df["Units_Sold"].std()
print("Units Sold Standard Deviation:", units_std)
12
Analyze Delivery Time Variability
delivery_std = df["Delivery_Time"].std()
print("Delivery Time Standard Deviation:", delivery_std)
13
Display Results
print("Revenue Variance:", revenue_variance)
print("Revenue Standard Deviation:", revenue_std)
print("Units Sold Variance:", units_variance)
print("Units Sold Standard Deviation:", units_std)
print("Delivery Time Standard Deviation:", delivery_std)
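The lab computes each metric separately; pandas can also produce the same numbers for several columns at once with DataFrame.agg. This is a sketch, assuming a small hypothetical DataFrame that reuses the lab's column names (Revenue, Units_Sold, Delivery_Time) with made-up values. In the notebook you would skip the DataFrame construction and run the remaining lines on the loaded df.

```python
import pandas as pd

# Hypothetical stand-in for the retail dataset: column names match the lab,
# but the values are invented for illustration
df = pd.DataFrame({
    "Revenue": [200.0, 250.0, 300.0, 350.0, 400.0],
    "Units_Sold": [10, 12, 9, 15, 14],
    "Delivery_Time": [3, 5, 4, 4, 4],
})

metrics = ["Revenue", "Units_Sold", "Delivery_Time"]

# One row per metric, with its sample variance and standard deviation side by side
dispersion_summary = df[metrics].agg(["var", "std"]).T
print(dispersion_summary)
```

Each metric has its own units, so compare spread within a metric (for example, across stores or months) rather than comparing the raw variance of Revenue with that of Delivery_Time.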
Great job!
You have successfully completed your lab on Measure Data Spread Using Dispersion Metrics.
Checkpoint
In this lab, you have calculated variance for business metrics, measured the standard deviation of sales data, analyzed revenue variability, evaluated inventory fluctuations, examined customer satisfaction consistency, compared dispersion across multiple retail metrics, and extracted meaningful insights from data spread.
You are now ready to move on to the next stage of your work as a Junior Data Analyst.
Git Push
git push origin branchName
Next-Lab Preparation
Topic: Decoding Your Data
1) Decoding Skewness: Understanding Data Distribution