SQL Server · SSMS · Azure Data Factory · Azure Databricks · PySpark

On-Premises to Cloud Data Migration

Built a full Medallion Architecture pipeline on Azure that migrated enterprise data from on-premises SQL Server into a scalable cloud data platform. Azure Data Factory handles ingestion into the Bronze layer, Azure Databricks + PySpark handles transformation into the Silver and Gold layers, and the final Gold tables power analytics dashboards and business reporting.

Role

Data Engineer

Duration

2 months

Code

🔒 Private Repo

The Problem

The organization had years of enterprise data locked inside on-premises SQL Server databases. Running analytics on it meant querying production systems directly, which slowed down live operations and limited what analysts could do. There was no scalable way to run historical analysis, no separation between operational and analytical workloads, and no cloud flexibility.

⚠️ Querying production SQL Server for analytics was impacting live application performance. The business needed a clean separation between operational and analytical workloads.

Architecture

Architecture Diagram

[Diagram: SQL Server (On-Prem) → Azure Data Factory → Bronze Layer → Azure Databricks → Silver Layer → Gold Layer → Dashboards, with SSMS managing the on-prem source]

Data Flow

🗄️ SQL Server (On-Prem) ── manage ──→ 🔧 SSMS
🗄️ SQL Server (On-Prem) ── extract ──→ 🔄 Azure Data Factory
🔄 Azure Data Factory ── raw load ──→ 🪣 Bronze Layer
🪣 Bronze Layer ── trigger ──→ Azure Databricks
Azure Databricks ── clean + validate ──→ 🧹 Silver Layer
🧹 Silver Layer ── aggregate + model ──→ 🏆 Gold Layer
🏆 Gold Layer ── serve ──→ 📊 Dashboards

Tech Stack

SQL Server · SSMS · Azure Data Factory · Azure Databricks · PySpark

Solution — Medallion Architecture

Bronze Layer — Raw Ingestion

Azure Data Factory pipelines extract data from every SQL Server table using a dynamic, parameterized approach — no hardcoded table names. The initial run pulls each table in full and lands it as-is into the Bronze layer on Azure Data Lake Storage; subsequent runs load incrementally using watermarks (see Key Challenges). This is the immutable raw copy: nothing is modified here.
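The dynamic pattern can be sketched outside ADF: a Lookup-style query lists the source tables, and each (schema, table) pair maps to a parameterized Bronze landing path. Everything below — the query, `bronze_path`, the sample table list — is an illustrative stand-in for what the real ADF Lookup and ForEach activities do, not code from the project.

```python
# Simplified illustration of ADF's dynamic ingestion pattern:
# a Lookup activity lists source tables, and a ForEach loop maps
# each one to a parameterized Bronze sink path. All names here
# are hypothetical -- the real pipeline does this with ADF activities.

# The kind of query an ADF Lookup activity would run against the source:
LOOKUP_QUERY = """
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
"""

def bronze_path(schema: str, table: str, container: str = "bronze") -> str:
    """Build the Data Lake sink path for one source table."""
    return f"{container}/{schema.lower()}/{table.lower()}/"

# Rows the Lookup might return (illustrative):
tables = [("dbo", "Customers"), ("dbo", "Orders"), ("sales", "Invoices")]

sink_paths = [bronze_path(s, t) for s, t in tables]
# -> ['bronze/dbo/customers/', 'bronze/dbo/orders/', 'bronze/sales/invoices/']
```

Because the table list is discovered at runtime, adding a new source table requires no pipeline change — it simply appears in the next Lookup result.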

Silver Layer — Cleaned and Validated

Azure Databricks notebooks process Bronze data using PySpark. At this layer: schema is enforced, data types are cast correctly, null values are handled, duplicates are removed, and basic business rules are applied. The output is a clean, trusted dataset ready for modeling.
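The Silver-layer rules can be shown on plain Python rows; the production notebooks express the same steps with PySpark operations (`withColumn`/`cast`, `dropna`, `dropDuplicates`). The target schema, column names, and the sample rows below are hypothetical.

```python
# Frame-free illustration of the Silver-layer rules: enforce schema,
# cast types, drop bad rows, deduplicate, apply a basic business rule.
# The real job does this in PySpark; names here are hypothetical.

TARGET_SCHEMA = {"order_id": int, "amount": float, "order_date": str}

def to_silver(rows):
    cleaned, seen = [], set()
    for row in rows:
        # Schema enforcement: keep only expected columns, cast types.
        try:
            rec = {col: typ(row[col]) for col, typ in TARGET_SCHEMA.items()}
        except (KeyError, TypeError, ValueError):
            continue                      # drop rows that violate the schema
        # Basic business rule: no non-positive amounts.
        if rec["amount"] <= 0:
            continue
        # Deduplicate on the primary key.
        if rec["order_id"] in seen:
            continue
        seen.add(rec["order_id"])
        cleaned.append(rec)
    return cleaned

bronze_rows = [
    {"order_id": "1", "amount": "19.99", "order_date": "2024-01-05"},
    {"order_id": "1", "amount": "19.99", "order_date": "2024-01-05"},  # duplicate
    {"order_id": "2", "amount": None,   "order_date": "2024-01-06"},   # bad value
    {"order_id": "3", "amount": "5.00", "order_date": "2024-01-07"},
]
silver_rows = to_silver(bronze_rows)   # 2 clean rows survive
```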

Gold Layer — Business Ready

Silver data is aggregated and modeled into business-oriented fact and dimension tables in the Gold layer. Views are created on top of these tables so analysts and BI tools always query the latest, correctly modeled data without touching any upstream layers.
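The Silver → Gold step is essentially a group-by aggregation into fact rows. A minimal sketch, assuming a hypothetical daily sales fact (in the project this would be a PySpark `groupBy`/`agg` writing a Delta table):

```python
# Sketch of Silver -> Gold modeling: roll clean order rows up into a
# daily sales fact table. Column and table names are illustrative.
from collections import defaultdict

silver_orders = [
    {"order_date": "2024-01-05", "amount": 19.99},
    {"order_date": "2024-01-05", "amount": 5.00},
    {"order_date": "2024-01-06", "amount": 12.50},
]

def build_fact_daily_sales(rows):
    totals = defaultdict(lambda: {"order_count": 0, "total_amount": 0.0})
    for r in rows:
        day = totals[r["order_date"]]
        day["order_count"] += 1
        day["total_amount"] = round(day["total_amount"] + r["amount"], 2)
    # One fact row per date key, ready to back a dashboard view.
    return [{"order_date": d, **m} for d, m in sorted(totals.items())]

fact_daily_sales = build_fact_daily_sales(silver_orders)
# 2024-01-05 -> 2 orders, 24.99 total; 2024-01-06 -> 1 order, 12.50 total
```

Serving dashboards from views over such fact tables is what lets the upstream Bronze and Silver layers stay untouched by BI traffic.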

Key Challenges

  • Dynamic pipeline design — hardcoding table names would make the pipeline brittle. Built ADF pipelines with parameterized linked services and lookup activities to discover and ingest all tables automatically
  • Schema inconsistencies — on-premises SQL Server tables had mixed data types and inconsistent naming conventions. Built a PySpark schema normalization step at the Bronze → Silver boundary
  • Incremental loads — full extracts of large tables were slow and expensive. Implemented watermark-based incremental loading in ADF using high-watermark columns on source tables
  • Data validation across layers — silent data loss between layers was a risk. Added row count reconciliation checks between Bronze and Silver after every Databricks job
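Two of the patterns above — watermark-based incremental loading and row-count reconciliation — can be sketched in miniature. The control "table" is an in-memory dict and all names are hypothetical; in the project, ADF persists the watermark and the check runs after each Databricks job:

```python
# Sketch of watermark-based incremental loading: each run extracts only
# rows modified after the stored high watermark, then advances it.
# The in-memory dict stands in for a real watermark control table.

watermarks = {"dbo.Orders": "2024-01-05T00:00:00"}   # last successful load

def incremental_extract(source_rows, table):
    """Return rows newer than the stored watermark, then advance it."""
    wm = watermarks[table]
    delta = [r for r in source_rows if r["modified_at"] > wm]
    if delta:
        watermarks[table] = max(r["modified_at"] for r in delta)
    return delta

source = [
    {"id": 1, "modified_at": "2024-01-04T09:00:00"},  # already loaded
    {"id": 2, "modified_at": "2024-01-05T10:30:00"},
    {"id": 3, "modified_at": "2024-01-06T08:15:00"},
]

batch = incremental_extract(source, "dbo.Orders")     # rows 2 and 3 only

# Row-count reconciliation between layers: fail fast on silent loss.
def reconcile(bronze_count, silver_count, expected_rejects=0):
    assert bronze_count - silver_count == expected_rejects, (
        f"row mismatch: bronze={bronze_count}, silver={silver_count}"
    )
```

ISO-8601 timestamps compare correctly as strings, which keeps the watermark filter simple; a real implementation would use the source column's native datetime type.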

Results

  • Data migrated: 100% (all on-prem tables moved)
  • Production DB queries from analytics: 0 (analytics fully decoupled)
  • Data layers built: 3 (Bronze, Silver, Gold)
  • Pipeline design: dynamic (no hardcoded table names)
  • Storage format: Delta (time travel + history enabled)
  • Reporting speed: fast (queries hit Gold, not the production DB)

✅ The Medallion Architecture gave the business something they never had before — a clear, layered data platform where raw data, clean data, and business data are always separated, always traceable, and never mixed.

Built by

Jeevan Gaire

Data Engineer · 2 months
