Lead Site Reliability Engineer
Req. VR-123080
Luxoft is partnering with a next-generation digital bank, built from the ground up to deliver seamless, secure, and scalable financial services. Our platform is cloud-native, API-first, and focused on reliability, speed, and security. We are growing fast and looking for top-tier Site Reliability / Ops Engineers to join our core team and help run and scale our infrastructure.
As a Site Reliability Engineer, you will be responsible for maintaining and scaling our core infrastructure, ensuring our banking services remain available, secure, and performant. You will work closely with development, product, and security teams to automate operations, manage cloud infrastructure, and uphold high availability standards.
Ownership
Lead the design, operation, and continuous improvement of cloud infrastructure, Kubernetes platforms, and reliability practices across production environments.
Direct and develop a team of 3-5 engineers, combining mentoring with clear delivery ownership, coaching, and performance leadership.
Establish and drive standards for observability, deployment safety, incident management, self-service platform capabilities, and reusable golden-path engineering practices.
Build automation across infrastructure provisioning, CI/CD workflows, and operational processes to improve consistency, resilience, and delivery efficiency.
Collaboration
Partner with engineering, product, platform, and security teams to improve reliability, scalability, and secure-by-default operations.
Align stakeholders on platform standards, operational readiness, and adoption of engineering practices, using strong documentation and influence rather than relying only on formal authority.
Provide clarity and direction in complex environments by balancing delivery needs, team development, and cross-functional priorities.
Solutioning
Solve complex reliability and infrastructure problems by balancing availability, security, performance, cost, and delivery speed.
Guide technical decisions across AWS, multi-cluster Kubernetes, blue-green deployments, service mesh, and distributed production systems.
Define and operationalize SLOs, SLIs, error budgets, monitoring, alerting, and post-incident improvement practices.
Support resilient production systems through strong debugging, fault-tolerant design, and practical security and compliance controls.
Must have
12+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Cloud Infrastructure, or related production engineering roles.
2+ years operating at Staff Engineer, Lead Engineer, or equivalent senior technical level.
2+ years supporting production-grade microservices environments at scale.
Strong hands-on expertise with AWS, Kubernetes, multi-cluster operations, Terraform, Helm, kubectl, CI/CD, and tools such as Jenkins.
Strong experience with observability and incident management tooling such as Prometheus, Grafana, and OpenSearch.
Experience building self-service platform capabilities, reusable platform standards, and scalable operational practices.
Strong understanding of Zero Trust architecture, OAuth2, ZTNA, IAM, secrets management, certificates, and access controls.
Experience working in regulated or high-control environments with standards such as PCI DSS, ISO 27001, and MAS TRM.
Experience supporting distributed systems and data platforms, including microservices reliability, PostgreSQL, Kafka, Cassandra, and fault-tolerant architectures.
Strong leadership, decision-making, stakeholder influence, and technical documentation skills.
Success KPIs
Production platforms meet agreed reliability, availability, and recovery targets.
Deployment and operational workflows become more automated, repeatable, and low risk.
Platform standards and self-service practices are adopted across teams.
Recurring incidents and operational toil are reduced through better engineering design and automation.
Team capability, ownership, and execution quality improve through effective people leadership.
The role delivers visible business and organizational impact, not only technical delivery.
Nice to have
--
Languages
English: C2 Proficient
Seniority
Lead
Bengaluru, India
DevOps
BCM Industry
26/05/2026