Operations Analytics Portfolio Project
ETL Automation & Weekly Report
An automated, end-to-end data-cleaning pipeline for 3 operational data sources. It cuts 3 hours of manual work on the weekly report to an 18-second run.
3
Data Sources
18s
Runtime
156 hrs
Annual Savings
₹62.4K
Cost Savings
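The headline figures are consistent with a weekly cadence, which can be checked with a few lines of arithmetic. This is only a sketch: the 52 runs per year and the hourly rate it derives are inferences from the numbers above, not values stated by the project.

```python
# Figures taken from the summary cards above.
MANUAL_HOURS_PER_RUN = 3          # manual effort replaced per report
ANNUAL_COST_SAVINGS_INR = 62_400  # the 62.4K cost-savings card

# Assumption: one report per week, i.e. 52 runs per year.
RUNS_PER_YEAR = 52

annual_hours_saved = MANUAL_HOURS_PER_RUN * RUNS_PER_YEAR
# Implied hourly cost; inferred here, not stated in the source.
implied_hourly_rate_inr = ANNUAL_COST_SAVINGS_INR / annual_hours_saved

print(annual_hours_saved)       # 156
print(implied_hourly_rate_inr)  # 400.0
```

The 18-second runtime is ignored here because it is negligible next to 3 hours.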
Interactive Pipeline Demo
Watch the ETL pipeline extract messy data, transform it programmatically, and generate the weekly report. Click tabs to explore raw vs. cleaned data.
ETL Pipeline Demo
Live data cleaning simulation
Extract
Transform
Load
Raw Orders Data (Sample)
3 different date formats, duplicates, typos

| Order ID | Date | Product | Price | Qty | Status |
|---|---|---|---|---|---|
| ORD-001 | 01/15/2024 | Laptop | $1,299.00 | 2.0 | Compleeted |
| ORD-002 | 2024-01-16 | Phone | $899.00 | 1.0 | shiped |
| ORD-001 (duplicate) | 15-Jan-2024 | Laptop | $1,299.00 | 2.0 | CANCELLED |
| ORD-003 | NULL | Tablet | NULL | 1.0 | Pending |

Issues flagged: duplicates, NULL values, typos, mixed formats
Pipeline runtime: ~18 seconds
3 CSV files generated
What This Automates
Orders Data
- Duplicate order IDs
- 3 date formats (YYYY-MM-DD, MM/DD/YYYY, DD-Mon-YYYY)
- Price strings with $ prefix
- Float quantities
- Status typos (Compleeted, shiped)
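The order fixes listed above could be sketched in pandas roughly as follows. The rows are copied from the sample table. The column names, the status mapping, and the choice to keep the first row of each duplicated order ID are assumptions made for illustration, not the project's actual code.

```python
import pandas as pd

# Sample rows from the raw orders table; NULL is represented as None.
raw = pd.DataFrame({
    "order_id": ["ORD-001", "ORD-002", "ORD-001", "ORD-003"],
    "date": ["01/15/2024", "2024-01-16", "15-Jan-2024", None],
    "price": ["$1,299.00", "$899.00", "$1,299.00", None],
    "qty": [2.0, 1.0, 2.0, 1.0],
    "status": ["Compleeted", "shiped", "CANCELLED", "Pending"],
})

DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y"]
# Assumed typo mapping; extend with whatever variants the data contains.
STATUS_FIXES = {"compleeted": "completed", "shiped": "shipped"}

def parse_mixed_dates(col):
    """Try each known format in turn and keep the first successful parse."""
    parsed = pd.to_datetime(col, format=DATE_FORMATS[0], errors="coerce")
    for fmt in DATE_FORMATS[1:]:
        parsed = parsed.fillna(pd.to_datetime(col, format=fmt, errors="coerce"))
    return parsed

# Assumption: the first occurrence of a duplicated order ID is the one to keep.
clean = raw.drop_duplicates(subset="order_id", keep="first").reset_index(drop=True)
clean["date"] = parse_mixed_dates(clean["date"])
# Strip "$" and thousands separators, then convert; unparseable values become NaN.
clean["price"] = pd.to_numeric(
    clean["price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)
# Whole-number floats become nullable integers.
clean["qty"] = clean["qty"].astype("Int64")
clean["status"] = clean["status"].str.strip().str.lower().replace(STATUS_FIXES)

print(clean)
```

Parsing each format explicitly avoids relying on automatic format inference, which can misread day-first and month-first dates.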
Inventory Data
- Extra whitespace in names
- Negative stock levels
- Mixed timezones (IST/UTC)
- Comma-separated cost prices
- Reorder threshold violations
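A minimal sketch of the inventory fixes above, under several assumptions. The sample rows and column names are hypothetical. Negative stock is flagged and clipped to zero. A "reorder threshold violation" is taken to mean stock below the reorder level. Timestamps are assumed to carry a trailing `IST` or `UTC` label.

```python
import pandas as pd

# Hypothetical sample rows; column names and values are illustrative only.
raw = pd.DataFrame({
    "item_name": ["  Laptop ", "Phone  ", " Tablet"],
    "stock": [12, -4, 3],
    "reorder_level": [5, 5, 5],
    "cost_price": ["1,050.00", "700.00", "320.50"],
    "last_updated": [
        "2024-01-15 10:00:00 IST",
        "2024-01-15 04:30:00 UTC",
        "2024-01-16 09:15:00 IST",
    ],
})

# Map the labels seen in the data to IANA timezone names.
TZ_NAMES = {"IST": "Asia/Kolkata", "UTC": "UTC"}

def to_utc(value):
    """Split off the trailing timezone label and convert the timestamp to UTC."""
    stamp, tz_label = value.rsplit(" ", 1)
    return pd.Timestamp(stamp).tz_localize(TZ_NAMES[tz_label]).tz_convert("UTC")

clean = raw.copy()
clean["item_name"] = clean["item_name"].str.strip()
# Assumption: negative stock is a data-entry error, so flag it and clip to zero.
clean["stock_was_negative"] = clean["stock"] < 0
clean["stock"] = clean["stock"].clip(lower=0)
clean["cost_price"] = clean["cost_price"].str.replace(",", "", regex=False).astype(float)
clean["last_updated"] = pd.to_datetime(clean["last_updated"].map(to_utc), utc=True)
# Assumption: a violation means stock has fallen below the reorder level.
clean["below_reorder"] = clean["stock"] < clean["reorder_level"]

print(clean)
```

The timezone label is handled by hand because abbreviations such as IST are ambiguous and are not parsed reliably by general-purpose date parsers.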
Employee Data
- Mixed name casing (ALL CAPS, lowercase)
- Mixed overtime flags (0/1/Yes/No)
- Impossible hours (>24)
- Blank date rows
- Inconsistent boolean formats
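The employee fixes above might look like the following sketch. The sample rows, the column names, and the policies are all assumptions. Hours above 24 are flagged and blanked, and rows with a blank date are dropped.

```python
import pandas as pd

# Hypothetical sample rows; names and values are illustrative only.
raw = pd.DataFrame({
    "name": ["PRIYA SHARMA", "rahul verma", "Anita Rao", "Dev Patel"],
    "overtime": ["Yes", "0", "1", "no"],
    "hours": [8.0, 26.0, 9.5, 7.0],
    "work_date": ["2024-01-15", "2024-01-15", "", "2024-01-16"],
})

TRUE_FLAGS = {"1", "yes", "true", "y"}
FALSE_FLAGS = {"0", "no", "false", "n"}

def to_bool(value):
    """Normalize 0/1/Yes/No style flags; unknown values become missing."""
    text = str(value).strip().lower()
    if text in TRUE_FLAGS:
        return True
    if text in FALSE_FLAGS:
        return False
    return pd.NA

clean = raw.copy()
clean["name"] = clean["name"].str.strip().str.title()
clean["overtime"] = clean["overtime"].map(to_bool).astype("boolean")
# Assumption: more than 24 hours in a day is an entry error, so flag and blank it.
clean["hours_invalid"] = clean["hours"] > 24
clean.loc[clean["hours_invalid"], "hours"] = float("nan")
# Blank strings parse to NaT; assumption: rows without a date are dropped.
clean["work_date"] = pd.to_datetime(clean["work_date"], format="%Y-%m-%d", errors="coerce")
clean = clean.dropna(subset=["work_date"]).reset_index(drop=True)

print(clean)
```

Note that `str.title()` is a blunt tool for names: it would turn "McDonald" into "Mcdonald", so real data may need a more careful rule.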
How It Works
01
Generate Messy Data
Creates 3 CSV files with realistic data quality issues: duplicates, nulls, typos, and format inconsistencies.
python data/generate_messy_files.py
02
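As an illustration of how such a file could be produced, here is a small standard-library sketch that writes one messy orders CSV. It is not the contents of `generate_messy_files.py`: the function names, value ranges, error rates, and output path are all hypothetical.

```python
import csv
import random
import tempfile
from datetime import date, timedelta
from pathlib import Path

DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y"]
STATUSES = ["Completed", "Compleeted", "Shipped", "shiped", "Pending", "CANCELLED"]

def make_messy_orders(n_rows, seed=0):
    """Build order rows with mixed date formats, typos, NULLs, and duplicates."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = []
    for i in range(1, n_rows + 1):
        day = date(2024, 1, 1) + timedelta(days=rng.randrange(28))
        row = {
            "order_id": f"ORD-{i:03d}",
            "date": day.strftime(rng.choice(DATE_FORMATS)),
            "price": f"${rng.randrange(100, 2000):,}.00",
            "qty": float(rng.randint(1, 5)),
            "status": rng.choice(STATUSES),
        }
        if rng.random() < 0.1:  # occasionally blank out the date
            row["date"] = "NULL"
        rows.append(row)
    # Re-append a few existing rows so that some order IDs repeat.
    rows.extend(rng.sample(rows, k=max(1, n_rows // 10)))
    return rows

def write_csv(rows, path):
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

rows = make_messy_orders(20)
out_path = Path(tempfile.gettempdir()) / "orders_raw_example.csv"  # hypothetical path
write_csv(rows, out_path)
print(f"wrote {len(rows)} rows to {out_path}")
```

Seeding the generator keeps the messy input stable between runs, which makes the cleaning step easier to test.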
Run ETL Pipeline
Extracts, cleans, and transforms all 3 files. Logs every issue found and fixed.
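One way to log every issue found and fixed is to route each cleaning step through a small helper that both writes a log line and keeps a running tally. The sketch below uses the standard `logging` module. The logger name, the `record_issue` helper, and the issue labels are hypothetical, and the counts are read off the four-row orders sample shown earlier.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
log = logging.getLogger("etl.orders")  # hypothetical logger name

issue_counts = Counter()  # running tally, later usable in the weekly report

def record_issue(kind, found, fixed):
    """Log one category of issue and add it to the running tally."""
    issue_counts[kind] += found
    log.info("%s: found=%d fixed=%d", kind, found, fixed)

# Counts taken from the four-row orders sample.
record_issue("duplicate_order_id", found=1, fixed=1)
record_issue("status_typo", found=2, fixed=2)
# Assumption: a missing price cannot be repaired automatically, only reported.
record_issue("null_price", found=1, fixed=0)
```

Keeping the tally in a `Counter` means the same numbers that appear in the log can feed the report without re-scanning the data.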
python etl/clean_and_report.py
03
Review Report
Get a structured weekly report with executive summary, issue counts, and cost savings.
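A report with those three sections could be assembled as plain text along these lines. The layout, the function name, and the hourly rate are assumptions. The rate of 400 is only what the headline savings figures imply, and the actual format of `weekly_report.txt` is not shown in the source.

```python
def build_weekly_report(rows_processed, issue_counts, hours_saved, hourly_rate_inr):
    """Render an executive summary, issue counts, and cost savings as text."""
    lines = [
        "WEEKLY OPERATIONS REPORT",
        "",
        "Executive summary",
        f"  Rows processed: {sum(rows_processed.values())}",
        f"  Issues found:   {sum(issue_counts.values())}",
        "",
        "Issue counts",
    ]
    for kind, count in sorted(issue_counts.items()):
        lines.append(f"  {kind}: {count}")
    lines += [
        "",
        "Cost savings",
        f"  Hours saved: {hours_saved}",
        f"  Estimated value: INR {hours_saved * hourly_rate_inr:,.0f}",
    ]
    return "\n".join(lines)

report = build_weekly_report(
    rows_processed={"orders": 4},  # the four-row orders sample
    issue_counts={"duplicate_order_id": 1, "status_typo": 2, "null_price": 1},
    hours_saved=3,                 # manual hours replaced per run
    hourly_rate_inr=400,           # assumed rate, inferred from the headline figures
)
print(report)
```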
cat reports/weekly_report.txt
Tech Stack
Python, Pandas, ETL Pipeline, Data Quality, Logging, Automation, Operations Analytics