Banking Data Platform — L2/L3 Support & Data Engineering
Working as a Data Engineer embedded with a banking client, responsible for L2/L3 support across their Azure-based data platform. The role spans pipeline optimization, automation of manual processes, production issue resolution, root cause analysis for data mismatches, and building internal tooling to improve operational visibility across Databricks workspaces.
Role
Data Engineer — L2/L3 Support
Duration
Ongoing
Code
🔒 Private Repo
Project Overview
This is an active engagement with a financial sector client — a bank running a large-scale Azure data platform that handles critical business data across multiple systems. The platform spans Azure Data Factory for orchestration and ingestion, Azure Databricks for transformation and analytics, and SQL Server as the primary store for metadata and logs. My role sits at the intersection of support and development — keeping production stable while continuously improving the platform through automation, optimization, and better tooling.
L2/L3 support in a banking environment is significantly more demanding than a typical data engineering role. Every production issue has a business impact. Data mismatches affect financial reporting. Pipeline failures delay downstream processes that business teams depend on. The expectations around response time, root cause analysis, and permanent resolution are high — and rightly so.
Tech Stack
Architecture
Architecture Diagram
Key Contributions
Pipeline Automation — Eliminating Manual Effort
One of the most impactful areas of work has been identifying manual processes that were consuming significant time from the operations team and converting them into fully automated Azure Data Factory pipelines. So far, 4+ ADF pipelines have been automated — tasks that previously required manual triggering, file handling, or data movement are now fully scheduled and self-running. Each automation reduces the chance of human error, removes a dependency on specific team members being available, and frees up engineering time for higher-value work.
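As a concrete illustration of what "fully scheduled and self-running" means in ADF terms, the sketch below shows a minimal schedule-trigger definition of the kind that replaces a manual daily trigger. The trigger name, pipeline name, and recurrence values are hypothetical placeholders, not the client's actual configuration.

```json
{
  "name": "trg_daily_file_load",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T02:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "pl_daily_file_load",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```

Once a trigger like this is published, the pipeline runs without anyone needing to remember to start it, which is where the reduction in human error and key-person dependency comes from.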
Databricks Workflow Monitoring Dashboard
The client had multiple Databricks workspaces running a large number of scheduled workflows. Monitoring them required manually logging into each workspace and checking job run status one by one — a time-consuming and error-prone process that could take hours daily. I built a centralized monitoring dashboard using the Databricks Jobs REST API, a Delta table, and a Databricks Dashboard. A scheduled notebook calls the Jobs API across all workspaces on a regular interval, upserts run data into a Delta table, and the dashboard visualizes everything in one place — total runs, success count, failure count, currently running jobs, and a clickable failed workflows table with direct links to each failed run. The team now has a single URL that gives complete visibility across all workspaces in seconds.
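The polling step described above can be sketched in Python. This is a simplified, hedged version, not the production notebook: it assumes the standard Databricks Jobs API 2.1 `runs/list` endpoint, and the workspace hosts, token handling, and field selection are illustrative. In production the flattened rows would be MERGEd into a Delta table that the dashboard reads.

```python
import json
import urllib.request

# Hypothetical workspace hosts — placeholders, not the client's real URLs.
WORKSPACES = {
    "analytics": "https://adb-1111111111111111.11.azuredatabricks.net",
    "ingestion": "https://adb-2222222222222222.22.azuredatabricks.net",
}

def list_runs(host: str, token: str, limit: int = 25) -> list[dict]:
    """Fetch recent job runs from one workspace via the Jobs 2.1 API."""
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/runs/list?limit={limit}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("runs", [])

def to_rows(workspace: str, runs: list[dict]) -> list[dict]:
    """Flatten API payloads into rows ready to upsert into the Delta table."""
    rows = []
    for r in runs:
        state = r.get("state", {})
        rows.append({
            "workspace": workspace,
            "run_id": r["run_id"],
            "job_name": r.get("run_name", ""),
            "life_cycle_state": state.get("life_cycle_state"),
            "result_state": state.get("result_state"),
            # run_page_url backs the clickable links in the failed-runs table
            "run_page_url": r.get("run_page_url"),
        })
    return rows
```

Keeping the flattening step (`to_rows`) separate from the HTTP call makes it easy to unit-test the row shape without hitting any workspace.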
Pipeline Optimization
Existing ADF pipelines and Databricks notebooks were reviewed and optimized for performance and reliability. This included identifying redundant transformation steps, improving SQL queries inside pipeline activities, optimizing PySpark logic in Databricks notebooks to reduce shuffle and improve partition efficiency, and restructuring pipeline dependency chains to enable better parallelism. Optimization work in a banking environment requires careful validation — every change to a production pipeline goes through testing and sign-off before deployment.
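The dependency-chain restructuring mentioned above can be sketched in plain Python: given each activity's upstream dependencies, group activities into stages where everything in a stage can run in parallel. The activity names are made up for illustration; this is the idea behind the restructuring, not the actual pipeline code.

```python
def parallel_stages(deps: dict[str, set[str]]) -> list[set[str]]:
    """Group activities into stages; all activities in a stage can run in parallel.

    deps maps each activity to the set of activities it depends on.
    """
    remaining = {a: set(d) for a, d in deps.items()}
    stages = []
    while remaining:
        # Activities whose dependencies are all satisfied are ready to run.
        ready = {a for a, d in remaining.items() if not d}
        if not ready:
            raise ValueError("cycle detected in pipeline dependencies")
        stages.append(ready)
        remaining = {a: d - ready for a, d in remaining.items() if a not in ready}
    return stages

# Example: two independent copy activities that had been chained serially
# can run side by side once the dependency graph is made explicit.
deps = {
    "copy_customers": set(),
    "copy_accounts": set(),
    "transform_core": {"copy_customers", "copy_accounts"},
    "publish_reports": {"transform_core"},
}
stages = parallel_stages(deps)
assert stages[0] == {"copy_customers", "copy_accounts"}  # parallel stage
```

A serial four-activity chain becomes three stages here, with the two copies overlapping — the same wall-clock win the real restructuring work targets.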
Production Issue Resolution
As the L2/L3 support engineer, I handle production issues escalated after L1 triage. This covers the full range — Azure Data Factory pipeline failures, Databricks job failures, SQL Server query timeouts, data load failures, connector issues, and environment-level problems. Each issue is investigated, resolved, and documented with a permanent fix where possible so the same issue does not recur.
Root Cause Analysis — Data Mismatches
Data mismatch issues are among the most complex problems in a banking data platform — a number that does not match between source and destination can have downstream consequences in financial reporting. Multiple RCAs have been conducted to trace mismatches through the pipeline chain: identifying where in the ingestion, transformation, or loading process a discrepancy was introduced, determining whether the root cause was a pipeline logic error, a timing issue, a schema change, or a source system change, and delivering a documented resolution with steps to prevent recurrence.
Key Challenges
- Banking environment constraints — every change to production follows strict change management processes. Nothing gets deployed without documented testing, approval, and a rollback plan
- Legacy pipeline complexity — some existing ADF pipelines and SQL scripts had evolved over years without documentation. Understanding the original intent before making any changes required significant investigation
- Oracle V1 to V2 connector migration — upgrading Oracle Linked Services in ADF from V1 to V2 caused Parquet precision failures at runtime due to Oracle V2 inferring decimal(256,130) for unconstrained NUMBER columns. Resolved permanently by enabling supportV1DataTypes in the Linked Service configuration
- Multi-workspace monitoring gap — no centralized visibility across Databricks workspaces meant failures were often caught late. The monitoring dashboard directly addressed this gap
- Data mismatch RCA complexity — tracing a numeric discrepancy through a multi-layer pipeline (source → ADF → ADLS → Databricks → Destination) requires eliminating each stage systematically, which is time-intensive but critical in a financial context
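For reference, the Oracle connector fix lands in the Linked Service definition. The sketch below shows roughly where the flag sits; the server, credential, and secret names are hypothetical placeholders, and the exact property shape should be checked against the current ADF Oracle connector documentation.

```json
{
  "name": "ls_oracle_core_banking",
  "properties": {
    "type": "Oracle",
    "version": "2.0",
    "typeProperties": {
      "server": "oracle-host:1521/ORCLPDB1",
      "authenticationType": "Basic",
      "username": "svc_adf",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": {
          "referenceName": "ls_keyvault",
          "type": "LinkedServiceReference"
        },
        "secretName": "oracle-svc-password"
      },
      "supportV1DataTypes": true
    }
  }
}
```

With supportV1DataTypes enabled, unconstrained NUMBER columns map as they did under the V1 connector instead of being inferred as decimal(256,130), which is what broke the Parquet writes.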
Results
- 4+ Pipelines Automated (manual processes eliminated)
- 1 Monitoring Dashboard (all workspaces in one view)
- Multiple Workspaces Monitored (centralized via Jobs API)
- Multiple Production Issues Fixed (with permanent resolutions)
- Several RCAs Completed (data mismatch root causes found)
- Ongoing Engagement (active L2/L3 support)
🏦 Working in the financial sector means the bar for data accuracy and pipeline reliability is higher than almost any other domain. Every fix, every optimization, and every automation has a direct impact on the data that the bank's business teams rely on for decisions.
Built by
Jeevan Gaire
Data Engineer — L2/L3 Support · Ongoing