Position

L1 Site Reliability Engineer,
Remote Romania

Location


Remote Romania

Project Description


We have developed a cloud services platform based on a microservices architecture that hosts SaaS applications for large enterprises and internet service providers. As we scale our offerings, we are expanding our support capabilities in Europe. This position involves providing the L1/L2 support our customers need.

Responsibilities


    The candidate will be part of the L1 SRE engineering team, whose mission is to monitor and manage cloud operations, monitor overall platform health, take proactive measures to exceed customer SLAs, plan capacity, and understand both the overall cloud services platform architecture and end-to-end customer solution needs.
    • Operate, maintain and support production systems/applications; ensure that the systems are accessible and available
    • Work in a 24x7 support environment, including night shifts
    • Deploy and maintain the production environment, improving its availability and performance and ensuring high quality through early detection of issues
    • Develop metrics and alarms to monitor the health and security of applications and microservices running in the cloud on AWS infrastructure
    • Ensure system availability in line with customer SLAs and plan capacity
    • Participate in the change management process, as appropriate
    • Escalate issues to engineering and the L3 SRE team, as appropriate, and participate in communication with customers until the issue is resolved
    • Work with customers to deliver high-quality monitoring, and continuously improve and optimize it
    • Correlate monitoring information from applications and infrastructure to help resolve problems
    • Apply outstanding problem-solving skills to the diagnosis and resolution of platform issues
    • Participate in the documentation creation process

Skills


Must have

    • Experience in monitoring production environments (e.g. AWS CloudWatch, Prometheus, Grafana)
    • Prior experience in developing metrics and alarms to monitor health of infrastructure and applications
    • Prior experience with the ELK stack (Elasticsearch, Logstash and Kibana)
    • Very good background using public cloud infrastructure (AWS)
    • Very good Linux troubleshooting skills
    • Understanding of networking fundamentals (TCP/IP, DHCP, DNS, IP routing, switching, SD-WAN, security and cloud networking services)
    • Programming experience in automation using Python
    • Hands-on experience with Docker, Kubernetes, Terraform, Salt
    • Knowledge of RDBMS and Cassandra databases (including query construction)
    • Very good customer communication skills (English)

Nice to have

    • Experience in managing SaaS application infrastructure with REST-based test automation
    • Experience with MariaDB, ArangoDB, ZooKeeper, RabbitMQ, etcd
    • Experience with and thorough understanding of microservice development architecture and the Agile development model
    • Knowledge of build pipelines and infrastructure such as Jenkins, GitHub and CI/CD would be an added advantage

Languages


English: B2 Upper Intermediate

Seniority


Regular

Relocation package


If needed, we can help you with the relocation process.

Work Type


Technical Support (SL1)

Ref Number


VR-58791
