Senior Software Engineer - Data Engineer
Mumbai, India
Job Description — Lead Data Engineer
Role Overview
You will own the design, development, and optimization of large-scale data pipelines and platforms. This role needs someone who can combine strong engineering fundamentals with hands-on expertise in Python, distributed data processing (Spark), cloud architecture (AWS), and modern container/orchestration stacks (Docker, Kubernetes). You will guide the data engineering strategy, enforce best practices, and lead complex initiatives end-to-end.
Key Responsibilities
- Architect, build, and maintain scalable batch and streaming data pipelines using Spark (PySpark preferred).
- Develop robust, modular, production-grade Python services for ingestion, transformation, and ML-adjacent workflows.
- Optimize Spark jobs for performance, cost efficiency, and reliability on large (TB-scale) datasets.
- Lead the design of data lake, lakehouse, or warehouse architectures (S3 + Glue + Athena, EMR, Redshift, or similar).
- Implement CI/CD pipelines for data services using tools such as GitHub Actions, GitLab CI, or Jenkins.
- Manage containerized workloads using Docker and deploy/operate them on Kubernetes (EKS preferred).
- Enforce data quality, lineage, governance, and schema management standards.
- Collaborate with ML engineers and data scientists to productionize models, ensuring scalable feature pipelines and deployment workflows.
- Perform code reviews, mentor team members, and uphold engineering best practices.
- Drive architectural decisions, capacity planning, and platform scalability.
Required Skills & Experience
- 6–10+ years in data engineering, software engineering, or distributed systems.
- Strong Python fundamentals: modular design, testing, async patterns, performance profiling.
- Advanced Spark/PySpark experience (RDDs, DataFrames, structured streaming, tuning, partitioning strategies).
- Strong AWS knowledge:
  - S3, EMR, Glue, Lambda, Step Functions
  - IAM policies & security
  - EKS experience is a strong plus
- Hands-on experience with Docker and Kubernetes (K8s operators and manifests; Helm is a plus).
- Solid understanding of distributed systems concepts (fault tolerance, parallelization, memory management).
- Experience designing data models, ETL frameworks, and metadata management.
- Working knowledge of ML workflows (feature engineering pipelines, model deployment, feature stores).
- Experience with monitoring (Prometheus, Grafana), logging (ELK), and alerting systems.
- Strong problem-solving skills and the ability to cut through noise.
Nice to Have
- Experience with Delta Lake / Iceberg / Hudi.
- Terraform or CloudFormation for IaC.
- CI/CD for ML (SageMaker pipelines, MLflow, Kubeflow, or similar).
- Real-time streaming platforms (Kafka, Kinesis).
- Security and compliance experience (PII handling, encryption-at-rest/in-transit).
Soft Skills (the practical ones)
- Drives clarity, not chaos.
- Communicates architecture decisions clearly and defends them with logic, not buzzwords.
- Can mentor mid-level engineers without hand-holding.
- Can operate independently and handle ambiguity.
- Pushes back when something is poorly defined or technically unsound.