Case Studies

Pipeline and data modeling writeups

Case studies explain the engineering problem, pipeline architecture, data model, quality checks, trade-offs, and outcomes.

Planning

Fintech Sentinel

Planning-stage case study for a real-time transaction monitoring and fraud detection lakehouse using CDC, Kafka, Spark, Iceberg, dbt, Airflow, and Great Expectations.

Stack

PostgreSQL, Debezium, Kafka, PySpark, Apache Iceberg, MinIO, Nessie, dbt, Airflow, Great Expectations, Trino, Grafana, Docker

Read case study

In progress

Formula-1 Lakehouse Analytics

In-progress case study for a Formula 1 lakehouse analytics platform that unifies historical results, race sessions, telemetry, weather, and strategy data into Bronze, Silver, and Gold datasets.

Stack

Python, Apache Airflow, MinIO, PySpark, dbt, DuckDB, Metabase, Parquet, Docker

Read case study

In progress

Glowbal University Data Ingestion Pipeline

In-progress case study for an evidence-first university data ingestion and QA pipeline that turns official university source pages into auditable product review datasets.

Stack

Python, Supabase, PostgreSQL, Playwright, Serper, Gemini, CSV, pytest

Read case study

Ready

Ecommerce Market Batch ETL Pipeline

Ready case study for a daily Airflow ETL pipeline that collects e-commerce product data, validates quality, stores market snapshots in PostgreSQL, and powers Metabase dashboards.

Stack

Python, Apache Airflow, Pandas, Pandera, PostgreSQL, SQLAlchemy, Metabase, Docker, pytest

Read case study