Observability & Monitoring Lead
Req. VR-121454
Support clients in the operation, maintenance, and optimization of Oracle Cerner EHR environments. This role is designed for early-career professionals who are eager to grow their technical skills in healthcare IT while working under the mentorship of experienced consultants and technical leaders. You will gain hands-on exposure to Cerner infrastructure, system workflows, and healthcare technology best practices while contributing to meaningful client outcomes.
Trend Analysis & Problem Identification
Identify recurring incident patterns, anomalies, and signs of alert fatigue that may indicate deeper systemic issues.
Collaborate with L2/L3 teams to review telemetry data and recommend improvements to alert thresholds, rules, and policies.
Provide insights that support proactive issue prevention, noise reduction, and overall monitoring refinement.
Platform Management & Optimization
Develop, update, and maintain dashboards that reflect real‑time system health, performance metrics, and service behavior.
Support the ongoing adoption and optimization of Dynatrace, enhancing dashboarding and visualization capabilities for cloud and on‑prem observability.
Assist in routine platform checks, ensuring monitoring tools remain accurate, stable, and aligned with business and operational requirements.
Leadership & Collaboration
Organize the team's work, including planning, task breakdown, and clear prioritization.
Provide structured, timely updates to leadership on progress, risks, blockers, team capacity, and delivery timelines.
Work closely with application teams, SRE groups, and infrastructure operations during incident triage, investigations, and routine monitoring reviews.
Ensure clear, timely, and effective communication with stakeholders during service-impacting events, providing status updates and context as needed.
Ensure adherence to engineering best practices, drive operational excellence, and maintain accountability for team delivery outcomes.
Operational Excellence
Support platform stability and availability through adherence to lifecycle maintenance, patching schedules, and vulnerability management processes.
Contribute to the improvement of monitoring workflows, alert routing logic, runbook effectiveness, and incident management practices.
Innovation & AI Enablement
Assist in exploring and adopting AI-driven capabilities that improve observability, automate root‑cause identification, and reduce manual effort.
Contribute to internal knowledge sharing by documenting best practices, playbooks, AI reference materials, and usage guidelines (e.g., Copilot tips).
Collaboration & Leadership Support
Partner with cross-functional teams to align monitoring practices with evolving business needs and operational priorities.
Drive end-to-end delivery of monitoring initiatives: requirements gathering, planning, execution oversight, and delivery validation.
Coordinate cross‑team dependencies, ensure timelines are met, and proactively remove blockers for the team.
Provide subject‑matter support for ITSM processes including incident, problem, and change management discussions.
Must have
6+ years in Site Reliability Engineering or Observability/Monitoring engineering roles.
5+ years of hands-on experience with monitoring/observability tools: New Relic, SolarWinds, WUG.
4+ years of scripting experience (JavaScript, Java, PowerShell, or others).
2+ years with Azure (architecture fundamentals, observability in cloud-native and lift‑and‑shift contexts).
4+ years of scripting with Python and Bash or PowerShell for automation.
Experience troubleshooting complex distributed applications, leading/participating in war rooms, and performing code‑level impact analysis (read logs/stack traces, correlate with deploys and infra changes).
Solid understanding of observability best practices (metrics, logs, traces), ITSM processes, and alert hygiene.
Have an "automate any task" mindset.
Maintain associated documentation as it applies to our audit and certification requirements.
Ensure platform stability, availability, and compliance through proactive vulnerability management and lifecycle maintenance.
Drive process improvements for monitoring workflows and incident management.
Participate in troubleshooting, capacity planning, and performance analysis activities.
Research new monitoring requirements and, in many cases, write code to meet them.
Solid expertise in setting up monitoring policies, rules, and templates, and in writing scripts to meet monitoring requirements.
Excellent problem solving, communication, and cross‑team collaboration skills.
Nice to have
Certifications
Languages
English: C1 Advanced
Seniority
Regular
Pune, India
Integration Engineering
HLS & Consumer industry
04/03/2026