• To identify duplicate components and maintain an open library of shared components for teams to reuse
• To benchmark existing apps, identifying gaps and proposing mitigation plans
• Support documentation of standard processes and best practices (e.g., application resiliency guidelines)
• Enhancing operational and reliability best practices (e.g., capacity planning, SLOs, incident response) and working with teams to adopt those practices
• Reviewing teams' technical architecture, dependencies, and underlying infrastructure
• Helping build, or helping teams adopt, core shared internal components
• Improving monitoring and observability
• Work with teams to design and implement automation, tooling, and application code to improve reliability and reduce toil
• Mentor less senior SREs and grow the SRE community and practice at New Relic.
• Perform task-based operational work (toil) required to unblock teams with operational needs where automated or self-service solutions do not yet exist for those teams.
• Hands-on development and experimentation on new technologies and techniques
Must have
• Strong understanding of Linux
• Java, Spring, Spring Boot
• Distributed Systems
• OpenShift/Cloud Foundry
• Docker, Kubernetes
• DevOps (Jenkins/Maven/Git/SonarQube/Jira/Artifactory)
• AWS/Google Cloud/Azure
• Elasticsearch, Logstash, Kibana
• Dynatrace/AppDynamics/New Relic
• Prometheus, Grafana
Nice to have
• Agile, Scrum, Extreme Programming
• Test-Driven Development
• Relational databases such as Oracle, MySQL, MariaDB, and PostgreSQL
• BMC Remedy
- English: Upper-intermediate