Engineering Lead · Data & AI Platforms

Aaditya Gupta.

I build the unglamorous machinery that lets Data & AI products actually ship — at scale, on budget, in regulated environments.

Based: India · NCR
Now: Senior Associate L2, Publicis Sapient
Focus: Data platforms, multi-tenant AI
Years: Seven, and counting.
i. Preface

A short introduction.

About

For seven years I've worked on the less-visible side of software — pipelines, schemas, warehouses, tenancy boundaries, and the architecture decisions that quietly determine whether a system survives its second year. That data engineering foundation is what I'm bringing to the current era of AI products: they need clean ingestion, sane evaluation, and predictable cost just as badly as any analytics platform ever did. Today I lead teams building these platforms for healthcare, pharma, and enterprise analytics.

That foundation came from years on sales analytics and payment-integrity programs measured in tens of millions of dollars: HIPAA-compliant pipelines, data quality systems, on-prem-to-cloud migrations. It is also what let me co-architect a multi-tenant competitive-intelligence platform of fifteen microservices and seven AI agents, with a multi-quarter scope shipped in eight weeks of focused delivery. I care about clean data, evaluation harnesses that tell the truth, cost-aware AI, and engineers who feel trusted enough to ship.

7 years data engineering · 6 engineers led · ~90% data quality lift, DQMS on Databricks
ii. Trajectory

Seven years, four chapters.

From building data pipelines to designing the platforms they live in — a chronological account of the work, the teams, and the wins worth remembering.

  1. May 2025 — Present

    Engineering Lead, Data & AI Platforms

    Publicis Sapient · Senior Associate, Data Engineering L2

    Leading data engineering and AI platform delivery across two enterprise client engagements in healthcare and pharma analytics — owning architecture decisions, sprint scoping, and cross-functional collaboration with product, compliance, and US-based client-partner teams.

    • Multi-source patient analytics consolidation — unified five disparate clinical, pharmacy and spend sources into a single Snowflake-native data product, collapsing analyst time-to-insight from days to seconds.
    • Streaming MDM crosswalk on Kafka + Databricks for insurance and member identity — established the publish-subscribe pattern as a reusable engagement asset.
    • Two production AI services shipped on a 15-microservice multi-tenant SaaS platform, plus platform-wide cost & quality patterns (see project below).
  2. June 2022 — March 2025

    Data Engineering Lead, Analytics Programs

    ZS Associates · Business Technology Solutions Consultant

    Led an engineering team of six across two flagship programs — Sales Analytics Growth Engine and Payment Integrity Analytics — driving roadmap, sprint planning, mentorship, and end-to-end delivery against multi-million-dollar business outcomes.

    • Architected scalable, HIPAA-compliant platforms on Azure / Databricks, including end-to-end cloud migration of legacy on-prem systems with 100% data consistency, using Control-M, Jenkins, and UrbanCode Deploy (UCD).
    • Data Quality Management System on Databricks improved data accuracy by ~90% across multiple datasets — became the standard layer for downstream business reporting.
    • Strategic alternative-architecture proposal aligned with the client's future roadmap, saving ~2 months of implementation time while preserving project scope and outcomes.
  3. Nov 2021 — June 2022

    Senior Data Engineer

    Mindtree Limited · Senior Software Engineer

    Built and optimized ETL pipelines and validation frameworks for production-grade ingestion across multiple sources — the kind of work that gets noticed only when it stops failing at 3am.

    • ~60% faster processing — rebuilt pipelines in PySpark + Pandas on Azure Databricks, with consistency holding across ingestion sources.
    • Automated scheduled loads via Python and Azure Data Factory, ensuring timely ingestion for downstream web applications.
    • Cross-source validation framework improving data accuracy and reliability across the stack.
  4. July 2019 — Oct 2021

    Data Engineer, Cloud & Search

    Tata Consultancy Services · System Engineer

    The first chapter — Azure data solutions on Cosmos DB, ADF and Cognitive Search, with a steady migration from Pandas to PySpark as workloads grew.

    • Pandas → PySpark migration — improved memory efficiency and scalability for large datasets.
    • Multi-source real-time pipelines reduced manual intervention by ~80% and improved reliability through automated monitoring and alerting.
iii. Selected Work

A few things worth describing.

Three programs shipped end-to-end — each measured against business outcomes, each with its own architectural argument.

2026
Pharma & CPG · Multi-tenant SaaS

Multi-Tenant AI / Competitive Intelligence Platform

Co-architected a multi-tenant competitive media intelligence platform on Azure — fifteen FastAPI microservices, seven AI agents, forty-four ADRs — shipping a multi-quarter scope in eight weeks of focused, production-grade delivery. Owned the data ingestion service (medallion architecture, scale-to-zero Container Apps Jobs, 4-layer dedup) and the multimodal AI decomposition service (Video Indexer + ffmpeg + GPT-5.1 Vision) that replaced a proprietary vendor — the kind of work that only holds together when the data engineering underneath is right.

Azure Container Apps · KEDA · Event Hubs · FastAPI · GPT-5.1 · Claude · Bicep · Terraform · Redis · OPA
  • $1.5–2k monthly LLM savings per tenant, via tenant-aware ModelRouter.
  • 60–80% LLM cost reduction from a Redis-cached Content Fragments layer.
  • 99.6% creative-asset extraction (lifted from 10%); <0.1% duplicates on 1.6M+ daily records via a 4-layer dedup.
  • 87% test coverage across 530+ tests, plus 16 LLM-as-judge eval scenarios.
2023 — 25
Healthcare benefits · Enterprise analytics

SAGE — Sales Analytics Growth Engine

Led a six-engineer team across an enterprise platform that contributed to multi-million-dollar account acquisitions and an estimated $30–50M in annual revenue growth for clients. Owned end-to-end cloud migration from on-prem to Azure with 100% data consistency, and built the dashboard surfacing Form 5500, Dun & Bradstreet, and proprietary datasets for targeted prospecting.

Databricks · ADLS · Control-M · Jenkins · UrbanCode Deploy · Power BI · Shell
  • $30–50M est. annual revenue growth for clients, attributed to SAGE-driven account acquisition.
  • ~90% data accuracy lift via the Data Quality Management System on Databricks.
  • 4h → 50m core pipeline runtime cut through Databricks Workflows + shell automation.
  • 100% data consistency through legacy on-prem → Azure migration.
2024 — 25
Healthcare payments · HIPAA-compliant

Payment Integrity Analytics Engine

Led development of a HIPAA-compliant payment integrity analytics platform on Azure + Databricks — a unified view of pre- and post-payment savings across vendor partners, projected to deliver $7M in vendor-fee and commission savings. Built a standardized ingestion framework consolidating multiple disparate payment sources into a single decision-grade dashboard for executive stakeholders.

Databricks · ADLS · Azure · Power BI · HIPAA controls
  • $7M projected vendor-fee & commission savings.
  • 1 unified executive view across previously disparate vendor sources.
  • PHI de-identification & re-identification controls integrated directly into pipeline architecture.
  • Audit-grade orchestration via Databricks Workflows for full HIPAA compliance.
iv. Toolkit

What I reach for.

A working list, in plain prose. Anything italicized is something I'd happily lead an architecture conversation about tomorrow morning.

Skills
Programming
Python · PySpark · SQL & T-SQL · Shell scripting
Data Platforms
Databricks · Snowflake · Azure Data Lake (ADLS) · Azure SQL · Cosmos DB · Teradata · Hive
Streaming & Orchestration
Kafka · Azure Data Factory · Databricks Workflows · Logic Apps · Control-M · Event Hubs
AI / LLM Engineering
Multi-agent systems · RAG patterns · LLM-as-judge evaluation · Tenant-aware model routing · Prompt engineering · Azure AI Video Indexer · Azure AI Speech · GPT-5.1 · Claude APIs · AI-augmented engineering workflows
Cloud & Infrastructure
Azure (Container Apps, KEDA, Synapse, ADF) · AWS (Redshift, Glue, S3) · Bicep / Terraform · OPA
DevOps & Quality
GitHub · Azure DevOps · CI / CD · Jenkins · UrbanCode Deploy · Automated testing · AI-assisted code review
Reporting
Power BI · Tableau
Leadership
Engineering leadership & mentorship · Multi-tenant SaaS patterns · Data modelling & pipeline design · HIPAA-compliant data engineering · Stakeholder & client engagement
Credentials
Education
B.Tech, Computer Science — Manav Rachna University (2015–2019)
Certifications & Publications
  • SQL Advanced Certification — TechGig
  • Databases & SQL for Data Science — IBM
  • Data Science in Python — Univ. of Michigan
  • Python Specialization — Univ. of Michigan
  • AI for Everyone — deeplearning.ai
  • Python & RDBMS Module — Infosys
  • “Ground Water Quality Monitoring Using Wireless Sensors and Machine Learning” — IEEE
  • “Role of Hybrid Neural Network in Bankruptcy Prediction” — IEEE, accepted
v. Coda

Let's talk.

Open to data engineering leadership roles, data & AI platform consulting, and the occasional deep-dive conversation about pipeline architecture, evaluation harnesses, or multi-tenancy boundaries.

 Set in Fraunces & Manrope. Hand-built, no frameworks.
© 2026 Aaditya Gupta