Batch Sales Analytics Pipeline
A draft portfolio project for a batch ETL pipeline that ingests transaction data, validates schemas, and produces daily revenue metrics.
Problem
Raw sales files need to be converted into clean, analytics-ready tables that support revenue reporting and trend analysis.
Pipeline
- Load raw files into a staging area
- Validate required fields and data types
- Transform transactions into fact and dimension tables
- Publish daily revenue and order metrics
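The validation step could look roughly like the sketch below. It is only an illustration: the column names (`order_id`, `customer_id`, `product_id`, `order_date`, `amount`) and the `validate_row` helper are assumptions, since the actual export layout is not specified here.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical required columns and the parser used to type-check each one.
REQUIRED_FIELDS = {
    "order_id": str,
    "customer_id": str,
    "product_id": str,
    "order_date": date.fromisoformat,  # expects YYYY-MM-DD
    "amount": Decimal,
}


def validate_row(row):
    """Return (clean_row, errors) for one raw row given as a dict of strings."""
    clean, errors = {}, []
    for field, parse in REQUIRED_FIELDS.items():
        raw = (row.get(field) or "").strip()
        if not raw:
            # Required field is absent or blank.
            errors.append(f"{field}: missing")
            continue
        try:
            clean[field] = parse(raw)
        except (ValueError, InvalidOperation):
            # Value is present but cannot be converted to the expected type.
            errors.append(f"{field}: invalid value {raw!r}")
    return clean, errors
```

Rows that come back with a non-empty error list could be routed to a reject file instead of the staging table, so one bad record does not fail the whole batch.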
Data Quality
- Not-null checks
- Duplicate order detection
- Referential integrity checks
- Daily row-count checks
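The four checks above can be sketched as small functions over rows held as dicts. The function names, the `order_id` key, and the 50% row-count tolerance are assumptions made for illustration, not settled project decisions.

```python
from collections import Counter


def null_violations(rows, required):
    """Indexes of rows where any required field is None or an empty string."""
    return [
        i for i, r in enumerate(rows)
        if any(r.get(f) in (None, "") for f in required)
    ]


def duplicate_order_ids(rows):
    """Order IDs that appear more than once (assumes an 'order_id' key)."""
    counts = Counter(r["order_id"] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphan_keys(rows, key, valid_keys):
    """Values of `key` that have no match in the reference set."""
    return sorted({r[key] for r in rows} - set(valid_keys))


def row_count_ok(today_count, previous_count, tolerance=0.5):
    """True when today's count is within +/- tolerance of the previous day's."""
    if previous_count == 0:
        return today_count == 0
    return abs(today_count - previous_count) / previous_count <= tolerance
```

Each function returns the offending items rather than raising, which leaves it to the caller to decide whether a failed check blocks the publish step or only logs a warning.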
Data Sources
- CSV transaction exports
- Product reference data
- Customer reference data
Tech Stack
Python, SQL, PostgreSQL, Docker
Data Model
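- stg_transactions: staging table for raw transaction rows
- dim_customer: customer dimension
- dim_product: product dimension
- fact_orders: order fact table
- mart_daily_revenue: daily revenue and order metrics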
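As a rough illustration of how mart_daily_revenue might be derived from fact_orders, the sketch below aggregates rows in plain Python. The column names (`order_id`, `order_date`, `amount`) and the `build_daily_revenue` helper are assumptions; in the actual pipeline this would more likely be a SQL `GROUP BY` in PostgreSQL.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal


def build_daily_revenue(fact_orders):
    """Aggregate fact rows into one row per order_date, sorted by date."""
    totals = defaultdict(lambda: {"revenue": Decimal("0"), "orders": set()})
    for row in fact_orders:
        day = totals[row["order_date"]]
        day["revenue"] += row["amount"]
        # Distinct order IDs, so multi-line orders are counted once.
        day["orders"].add(row["order_id"])
    return [
        {
            "order_date": d,
            "revenue": totals[d]["revenue"],
            "order_count": len(totals[d]["orders"]),
        }
        for d in sorted(totals)
    ]
```

Using `Decimal` rather than `float` avoids rounding drift when summing currency amounts.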