Banking Data Platform — L2/L3 Support & Data Engineering
Working as a Data Engineer embedded with a banking client, responsible for L2/L3 support across their Azure-based data platform. The role spans pipeline optimization, automation of manual processes, production issue resolution, root cause analysis for data mismatches, and building internal tooling to improve operational visibility across Databricks workspaces.
Role
Data Engineer — L2/L3 Support
Duration
Ongoing
Code
🔒 Private Repo
Project Overview
This is an active engagement with a financial sector client — a bank running a large-scale Azure data platform that handles critical business data across multiple systems. The platform spans Azure Data Factory for orchestration and ingestion, Azure Databricks for transformation and analytics, and SQL Server as the primary store for metadata and logs. My role sits at the intersection of support and development — keeping production stable while continuously improving the platform through automation, optimization, and better tooling.
L2/L3 support in a banking environment is significantly more demanding than a typical data engineering role. Every production issue has a business impact. Data mismatches affect financial reporting. Pipeline failures delay downstream processes that business teams depend on. The expectations around response time, root cause analysis, and permanent resolution are high — and rightly so.
Tech Stack
Architecture
Architecture Diagram
Key Contributions
Pipeline Automation — Eliminating Manual Effort
One of the most impactful areas of work has been identifying manual processes that were consuming significant time from the operations team and converting them into fully automated Azure Data Factory pipelines. So far, 4+ ADF pipelines have been automated — tasks that previously required manual triggering, file handling, or data movement are now fully scheduled and self-running. Each automation reduces the chance of human error, removes a dependency on specific team members being available, and frees up engineering time for higher-value work.
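As a concrete illustration of what "fully scheduled and self-running" means in ADF terms, the sketch below shows a minimal schedule-trigger definition of the kind that replaces a manual daily trigger. The trigger name, pipeline name, and recurrence values are hypothetical placeholders, not the client's actual configuration.

```json
{
  "name": "trg_daily_file_load",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T02:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "pl_daily_file_load",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

Once a trigger like this is published, the pipeline runs without anyone needing to remember to start it, which is where the reduction in human error and key-person dependency comes from.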
Databricks Workflow Monitoring Dashboard
The client had multiple Databricks workspaces running a large number of scheduled workflows. Monitoring them required manually logging into each workspace and checking job run status one by one — a time-consuming and error-prone process that could take hours daily. I built a centralized monitoring dashboard using the Databricks Jobs REST API, a Delta table, and a Databricks Dashboard. A scheduled notebook calls the Jobs API across all workspaces on a regular interval, upserts run data into a Delta table, and the dashboard visualizes everything in one place — total runs, success count, failure count, currently running jobs, and a clickable failed workflows table with direct links to each failed run. The team now has a single URL that gives complete visibility across all workspaces in seconds.
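The polling step described above can be sketched in Python. This is a simplified, hedged version, not the production notebook: it assumes the standard Databricks Jobs API 2.1 `runs/list` endpoint, and the workspace hosts, token handling, and field selection are illustrative. In production the flattened rows would be MERGEd into a Delta table that the dashboard reads.

```python
import json
import urllib.request

# Hypothetical workspace hosts — placeholders, not the client's real URLs.
WORKSPACES = {
    "analytics": "https://adb-1111111111111111.11.azuredatabricks.net",
    "ingestion": "https://adb-2222222222222222.22.azuredatabricks.net",
}

def list_runs(host: str, token: str, limit: int = 25) -> list[dict]:
    """Fetch recent job runs from one workspace via the Jobs 2.1 API."""
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/runs/list?limit={limit}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("runs", [])

def to_rows(workspace: str, runs: list[dict]) -> list[dict]:
    """Flatten API payloads into rows ready to upsert into the Delta table."""
    rows = []
    for r in runs:
        state = r.get("state", {})
        rows.append({
            "workspace": workspace,
            "run_id": r["run_id"],
            "job_name": r.get("run_name", ""),
            "life_cycle_state": state.get("life_cycle_state"),
            "result_state": state.get("result_state"),
            # run_page_url backs the clickable links in the failed-runs table
            "run_page_url": r.get("run_page_url"),
        })
    return rows
```

Keeping the flattening step (`to_rows`) separate from the HTTP call makes it easy to unit-test the row shape without hitting any workspace.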
Pipeline Optimization
Existing ADF pipelines and Databricks notebooks were reviewed and optimized for performance and reliability. This included identifying redundant transformation steps, improving SQL queries inside pipeline activities, optimizing PySpark logic in Databricks notebooks to reduce shuffle and improve partition efficiency, and restructuring pipeline dependency chains to enable better parallelism. Optimization work in a banking environment requires careful validation — every change to a production pipeline goes through testing and sign-off before deployment.
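The dependency-chain restructuring mentioned above can be sketched in plain Python: given each activity's upstream dependencies, group activities into stages where everything in a stage can run in parallel. The activity names are made up for illustration; this is the idea behind the restructuring, not the actual pipeline code.

```python
def parallel_stages(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group activities into stages; all activities in a stage can run in parallel.

    deps maps each activity to the set of activities it depends on.
    """
    remaining = {a: set(d) for a, d in deps.items()}
    stages = []
    while remaining:
        # Activities whose dependencies are all satisfied are ready to run.
        ready = {a for a, d in remaining.items() if not d}
        if not ready:
            raise ValueError("cycle detected in pipeline dependencies")
        stages.append(ready)
        remaining = {a: d - ready for a, d in remaining.items() if a not in ready}
    return stages

# Example: two independent copy activities that had been chained serially
# can run side by side once the dependency graph is made explicit.
deps = {
    "copy_customers": set(),
    "copy_accounts": set(),
    "transform_core": {"copy_customers", "copy_accounts"},
    "publish_reports": {"transform_core"},
}
stages = parallel_stages(deps)
assert stages[0] == {"copy_customers", "copy_accounts"}  # parallel stage
```

A serial four-activity chain becomes three stages here, with the two copies overlapping — the same wall-clock win the real restructuring work targets.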
Production Issue Resolution
As the L2/L3 support engineer, I handle production issues escalated after L1 triage. This covers the full range — Azure Data Factory pipeline failures, Databricks job failures, SQL Server query timeouts, data load failures, connector issues, and environment-level problems. Each issue is investigated, resolved, and documented with a permanent fix where possible so the same issue does not recur.
Root Cause Analysis — Data Mismatches
Data mismatch issues are among the most complex problems in a banking data platform — a number that does not match between source and destination can have downstream consequences in financial reporting. Multiple RCAs have been conducted to trace mismatches through the pipeline chain: identifying where in the ingestion, transformation, or loading process a discrepancy was introduced, determining whether the root cause was a pipeline logic error, a timing issue, a schema change, or a source system change, and delivering a documented resolution with steps to prevent recurrence.
Key Challenges
- Banking environment constraints — every change to production follows strict change management processes. Nothing gets deployed without documented testing, approval, and a rollback plan
- Legacy pipeline complexity — some existing ADF pipelines and SQL scripts had evolved over years without documentation. Understanding the original intent before making any changes required significant investigation
- Oracle V1 to V2 connector migration — upgrading Oracle Linked Services in ADF from V1 to V2 caused Parquet precision failures at runtime due to Oracle V2 inferring decimal(256,130) for unconstrained NUMBER columns. Resolved permanently by enabling supportV1DataTypes in the Linked Service configuration
- Multi-workspace monitoring gap — no centralized visibility across Databricks workspaces meant failures were often caught late. The monitoring dashboard directly addressed this gap
- Data mismatch RCA complexity — tracing a numeric discrepancy through a multi-layer pipeline (source → ADF → ADLS → Databricks → Destination) requires eliminating each stage systematically, which is time-intensive but critical in a financial context
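For reference, the Oracle connector fix lands in the Linked Service definition. The sketch below shows roughly where the flag sits; the server, credential, and secret names are hypothetical placeholders, and the exact property shape should be checked against the current ADF Oracle connector documentation.

```json
{
  "name": "ls_oracle_core_banking",
  "properties": {
    "type": "Oracle",
    "version": "2.0",
    "typeProperties": {
      "server": "oracle-host:1521/ORCLPDB1",
      "authenticationType": "Basic",
      "username": "svc_adf",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "ls_keyvault",
          "type": "LinkedServiceReference"
        },
        "secretName": "oracle-svc-password"
      },
      "supportV1DataTypes": true
    }
  }
}
```

With supportV1DataTypes enabled, unconstrained NUMBER columns map as they did under the V1 connector instead of being inferred as decimal(256,130), which is what broke the Parquet writes.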
Results
- 4+ Pipelines Automated (manual processes eliminated)
- 1 Monitoring Dashboard (all workspaces in one view)
- Multiple Workspaces Monitored (centralized via Jobs API)
- Multiple Production Issues Fixed (with permanent resolutions)
- Several RCAs Completed (data mismatch root causes found)
- Ongoing Engagement (active L2/L3 support)
🏦 Working in the financial sector means the bar for data accuracy and pipeline reliability is higher than almost any other domain. Every fix, every optimization, and every automation has a direct impact on the data that the bank's business teams rely on for decisions.
Built by
Jeevan Gaire
Data Engineer — L2/L3 Support · Ongoing