L1 Site Reliability Engineer, Remote Romania


Project Description

We have developed a cloud services platform, based on a microservices architecture, that hosts SaaS applications for large enterprises and internet service providers. As we scale our offerings, we are expanding our support capabilities in Europe. This position involves providing the L1/L2 support our customers need.


    The candidate will be part of the L1 SRE engineering team, whose mission is to monitor and manage cloud operations, monitor overall platform health, take proactive measures to meet or exceed customer SLAs, plan capacity, and understand both the overall cloud services platform architecture and the end-to-end customer solution needs.
    • Operate, maintain and support production systems and applications; ensure that the systems are accessible and available
    • Work in a 24x7 support environment, including night shifts
    • Deploy, maintain and improve the availability and performance of the production environment, ensuring high quality through early detection of issues
    • Develop metrics and alarms to monitor the health and security of applications and microservices running in the cloud on AWS infrastructure
    • Ensure system availability adheres to customer SLAs, and plan capacity
    • Participate in the change management process, as appropriate
    • Escalate issues to engineering and the L3 SRE team as appropriate, and participate in communication with customers until the issue is resolved
    • Work with customers to deliver high-quality monitoring, and continuously improve and optimize it
    • Correlate monitoring information from applications and infrastructure to help resolve problems
    • Apply outstanding problem-solving skills to the diagnosis and resolution of platform issues
    • Participate in the documentation creation process


Must have

    • Experience in monitoring production environments (e.g. AWS CloudWatch, Prometheus, Grafana)
    • Prior experience in developing metrics and alarms to monitor health of infrastructure and applications
    • Prior experience with the ELK stack (Elasticsearch, Logstash and Kibana)
    • Very good background using public cloud infrastructure (AWS)
    • Very good Linux troubleshooting skills
    • Understanding of networking fundamentals (TCP/IP, DHCP, DNS, IP routing, switching, SD-WAN, security and cloud networking services)
    • Programming experience in automation using Python
    • Hands-on experience with Docker, Kubernetes, Terraform, Salt
    • Knowledge of RDBMS and Cassandra databases (including query construction)
    • Very good customer communication skills (English)

Nice to have

    • Experience in managing SaaS application infrastructure with REST-based test automation
    • Experience with MariaDB, ArangoDB, ZooKeeper, RabbitMQ, etcd
    • Experience with and thorough understanding of microservice development architecture and the Agile development model
    • Knowledge of build pipelines and infrastructure such as Jenkins, GitHub and CI/CD would be an added advantage


English: B2 Upper Intermediate



Relocation package

If needed, we can help you with the relocation process.

Work Type

Technical Support (SL1)
