Senior Software Engineer - Data Engineer
Mumbai, India
Job Description — Lead Data Engineer
Role Overview
You will own the design, development, and optimization of large-scale data pipelines and platforms. This role needs someone who can combine strong engineering fundamentals with hands-on expertise in Python, distributed data processing (Spark), cloud architecture (AWS), and modern container/orchestration stacks (Docker, Kubernetes). You will guide the data engineering strategy, enforce best practices, and lead complex initiatives end-to-end.
Key Responsibilities
- Architect, build, and maintain scalable batch and streaming data pipelines using Spark (PySpark preferred).
- Develop robust, modular, production-grade Python services for ingestion, transformation, and ML-adjacent workflows.
- Optimize Spark jobs for performance, cost efficiency, and reliability on large (TB-scale) datasets.
- Lead the design of data lake, lakehouse, or warehouse architectures (S3 + Glue + Athena, EMR, Redshift, or similar).
- Implement CI/CD pipelines for data services using tools such as GitHub Actions, GitLab CI, or Jenkins.
- Manage containerized workloads using Docker and deploy/operate them on Kubernetes (EKS preferred).
- Enforce data quality, lineage, governance, and schema management standards.
- Collaborate with ML engineers and data scientists to productionize models, ensuring scalable feature pipelines and deployment workflows.
- Perform code reviews, mentor team members, and uphold engineering best practices.
- Drive architectural decisions, capacity planning, and platform scalability.
Required Skills & Experience
- 6–10+ years in data engineering, software engineering, or distributed systems.
- Strong Python fundamentals: modular design, testing, async patterns, performance profiling.
- Advanced Spark/PySpark experience (RDDs, DataFrames, structured streaming, tuning, partitioning strategies).
- Strong AWS knowledge:
  - S3, EMR, Glue, Lambda, Step Functions
  - IAM policies & security
  - EKS experience is a strong plus
- Hands-on experience with Docker and Kubernetes (K8s operators and manifests; Helm is a plus).
- Solid understanding of distributed systems concepts (fault tolerance, parallelization, memory management).
- Experience designing data models, ETL frameworks, and metadata management.
- Working knowledge of ML workflows (feature engineering pipelines, model deployment, feature stores).
- Experience with monitoring (Prometheus, Grafana), logging (ELK), and alerting systems.
- Strong problem-solving skills and the ability to cut through noise.
Nice to Have
- Experience with Delta Lake / Iceberg / Hudi.
- Terraform or CloudFormation for IaC.
- CI/CD for ML (SageMaker pipelines, MLflow, Kubeflow, or similar).
- Real-time streaming platforms (Kafka, Kinesis).
- Security and compliance experience (PII handling, encryption-at-rest/in-transit).
Soft Skills (the practical ones)
- Drives clarity, not chaos.
- Communicates architecture decisions clearly and defends them with logic, not buzzwords.
- Can mentor mid-level engineers without hand-holding.
- Can operate independently and handle ambiguity.
- Pushes back when something is poorly defined or technically unsound.