Senior Software Engineer - Data Engineer

Mumbai, India

Job Description — Lead Data Engineer

Role Overview

You will own the design, development, and optimization of large-scale data pipelines and platforms. This role needs someone who can combine strong engineering fundamentals with hands-on expertise in Python, distributed data processing (Spark), cloud architecture (AWS), and modern container/orchestration stacks (Docker, Kubernetes). You will guide the data engineering strategy, enforce best practices, and lead complex initiatives end-to-end.

Key Responsibilities

  • Architect, build, and maintain scalable batch and streaming data pipelines using Spark (PySpark preferred).
  • Develop robust, modular, production-grade Python services for ingestion, transformation, and ML-adjacent workflows.
  • Optimize Spark jobs for performance, cost efficiency, and reliability on large (TB-scale) datasets.
  • Lead the design of data lake, lakehouse, or warehouse architecture (S3 + Glue + Athena, EMR, Redshift, or similar).
  • Implement CI/CD pipelines for data services using tools like GitHub Actions / GitLab / Jenkins.
  • Manage containerized workloads using Docker and deploy/operate them on Kubernetes (EKS preferred).
  • Enforce data quality, lineage, governance, and schema management standards.
  • Collaborate with ML engineers and data scientists to productionize models, ensuring scalable feature pipelines and deployment workflows.
  • Perform code reviews, mentor team members, and uphold engineering best practices.
  • Drive architectural decisions, capacity planning, and platform scalability.

Required Skills & Experience

  • 6–10+ years in data engineering, software engineering, or distributed systems.
  • Strong Python fundamentals: modular design, testing, async patterns, performance profiling.
  • Advanced Spark/PySpark experience (RDDs, DataFrames, structured streaming, tuning, partitioning strategies).
  • Strong AWS knowledge:
    • S3, EMR, Glue, Lambda, Step Functions
    • IAM policies & security
    • EKS experience is a strong plus
  • Hands-on Docker and Kubernetes experience (K8s operators, manifests; Helm is a plus).
  • Solid understanding of distributed systems concepts (fault tolerance, parallelization, memory management).
  • Experience designing data models, ETL frameworks, and metadata management.
  • Working knowledge of ML workflows (feature engineering pipelines, model deployment, feature stores).
  • Experience with monitoring (Prometheus, Grafana), logging (ELK), and alerting systems.
  • Strong problem-solving skills and the ability to cut through noise.

Nice to Have

  • Experience with Delta Lake / Iceberg / Hudi.
  • Terraform or CloudFormation for IaC.
  • CI/CD for ML (SageMaker pipelines, MLflow, Kubeflow, or similar).
  • Real-time streaming platforms (Kafka, Kinesis).
  • Security and compliance experience (PII handling, encryption-at-rest/in-transit).

Soft Skills (the practical ones)

  • Drives clarity, not chaos.
  • Communicates architecture decisions clearly and defends them with logic, not buzzwords.
  • Can mentor mid-level engineers without hand-holding.
  • Can operate independently and handle ambiguity.
  • Pushes back when something is poorly defined or technically unsound.