Pune, Maharashtra, India
119 days ago
Site Reliability Engineering Manager
Zycus is looking for a Program Manager, Site Reliability Engineering (SRE), with 8 to 15 years of experience. In this role, you will lead a team of SRE engineers and ensure the reliability and performance of our systems and infrastructure. You will work closely with cross-functional teams to implement best practices, drive operational excellence, and support the overall mission and goals of the organization. This role requires a strong technical background, exceptional leadership skills, and the ability to align SRE initiatives with business objectives.
Roles and Responsibilities:
- Lead and manage a team of SRE engineers, providing technical guidance, mentorship, and performance feedback.
- Develop and implement SRE strategies, standards, and best practices to improve system reliability, availability, and performance.
- Collaborate with cross-functional teams, including development, operations, and quality assurance, to integrate SRE principles throughout the software development lifecycle.
- Bring strong knowledge of cloud infrastructure such as AWS, Azure, or GCP, and proficiency in infrastructure as code (IaC) tools such as Terraform or CloudFormation.
- Balance feature development speed against reliability using well-defined service-level objectives.
- Define and monitor service-level objectives (SLOs) and key performance indicators (KPIs) to measure and optimize system performance and reliability.
- Identify and mitigate potential risks and vulnerabilities in the infrastructure, applications, and services, ensuring high levels of security and compliance.
- Drive incident response and resolution processes, including root cause analysis and post-incident reviews, to minimize downtime and improve system resiliency.
- Implement and maintain effective monitoring, alerting, and logging systems to identify and address system issues proactively.
- Implement specific initiatives as required (e.g. middleware support; experience with middleware support and ITIL v3 is nice to have).
- Collaborate with architects and engineers to design and deploy highly available, scalable infrastructure solutions.
- Stay up to date with industry trends, emerging technologies, and best practices in SRE, and evaluate their applicability to the organization's infrastructure and systems.
- Provide regular reports and updates to senior management on the status of SRE initiatives, system performance, and key metrics.