Kochi
88 days ago
Senior Site Reliability Engineer

Site Reliability Engineer.

Responsibilities

·       The engineer will help clients navigate and adopt IT methodologies and operating models that drive business agility, using SRE and Agile frameworks.

·       As an SRE, you will work closely with our clients to define their operational and governance models.

·       Design and deploy scalable, reliable, and secure SRE solutions.

·       The ideal candidate will combine technical and business skills and a passion for working with clients to deliver excellence.

·       This position is responsible for collaborating with teams to build tools and strategies for problem detection, prevention, and chaos testing.

·       Troubleshoot production incidents in real time and conduct blameless post-mortems.

·       You will work with a diverse set of projects, clients, industries, and frameworks, and the position offers opportunities to expand your horizons and reach your personal development goals.

·       Mentors and coaches other members of the agile team. Leads a small team of DevOps engineers using agile methodology, with a focus on continuous delivery. 

·       Provides functional and technical expertise on monitoring, observability, and resilience.

·       Drives engagement with Security and Infrastructure teams to ensure secure deployment of applications.

·       Assists in production support and maintenance of applications as needed.

·       Develops and maintains the documentation.

Must have

·       Proven experience in SRE or similar role

·       Experience with DevOps in public cloud (AWS/Azure)

·       Expertise in defining SLAs, Error Budgets, and other key metrics for stable, scalable, robust application infrastructure.

·       Good understanding of chaos engineering and canary deployments

·       Good understanding of Cloud Infrastructure services and their limitations

·       Experience in configuring & monitoring different attributes and handling scale-up and scale-down scenarios for the application in a cloud environment.

·       Experience deploying and managing container orchestration, service mesh, serverless, API gateways, and observability stacks.

·       Experience building and deploying applications as containers on a cloud platform using an automated CI/CD pipeline.

·       Experience with network technologies and with system, security, and network monitoring tools

·       Experience using Terraform for IaC automation.

·       Deep knowledge of monitoring and observability tools (e.g. Prometheus, Grafana, ELK Stack, DataDog, AppDynamics, New Relic, or similar)

·       Deep knowledge of alerting tools (e.g. PagerDuty, Zenduty, or similar)

·       Practical scripting skills are a must.

·       At least 3 years of experience working in an Agile team.

Qualifications:

·       Bachelor’s degree or equivalent in Computer Science, Engineering, or a related field, or additional comparable experience

·       Proven experience in IT, application and infrastructure monitoring, and DevOps, including excellent knowledge of networking, compute, and storage.

·       Industry certifications in Monitoring, Observability, SRE, DevOps, and Cloud services will be a big plus.

·       Any SAFe certification is desirable.
