Bangalore
148 days ago
Sr. Site Reliability Engineer
About Zeta

Zeta is a Next-Gen Banking Tech company that empowers banks and fintechs to launch banking products for the future. It was co-founded by Ramki Gaddipati in 2015.

Our flagship processing platform, Zeta Tachyon, is the industry's first modern, cloud-native, and fully API-enabled stack that brings together issuance, processing, lending, core banking, fraud & risk, and many more capabilities as a single-vendor stack. 20M+ cards have been issued on our platform globally.

Zeta is actively working with the largest banks and fintechs in multiple global markets, transforming customer experience for multi-million card portfolios.

Zeta has more than 1,700 employees, with over 70% of roles in R&D, across locations in the US, EMEA, and Asia. We raised $280 million at a $1.5 billion valuation from Softbank, Mastercard, and other investors in 2021.

Responsibilities

System Reliability: Ensuring the reliability of software systems by designing, implementing, and maintaining scalable and reliable infrastructure.
Automation: Developing automation tools and scripts to streamline operational tasks, reduce manual intervention, and improve overall system efficiency.
Incident Response and Resolution: Monitoring system performance and responding to incidents promptly to minimize downtime and ensure high availability.
Capacity Planning: Analyzing system usage patterns and forecasting future capacity needs to ensure that the infrastructure can handle current and future demands.
Performance Optimization: Identifying and addressing performance bottlenecks in software systems through optimization and tuning.
Infrastructure as Code (IaC): Implementing infrastructure as code practices, using tools like Terraform or Ansible, to define and manage infrastructure in a version-controlled and automated manner.
Monitoring and Logging: Implementing and maintaining monitoring and logging solutions to gain insights into system behavior, troubleshoot issues, and proactively address potential problems.
On-Call Support: Participating in an on-call rotation to respond to incidents outside of regular working hours and ensure 24/7 system availability.
Security: Collaborating with security teams to implement and maintain security best practices in infrastructure and applications.
Disaster Recovery Planning: Developing and maintaining disaster recovery plans to ensure that systems can quickly recover from major outages or failures.
Continuous Improvement: Continuously analyzing system performance, reliability, and incidents to identify areas for improvement, and implementing changes to enhance overall system resilience.

Skills

Programming Languages: Proficiency in one or more programming or scripting languages, commonly Python, Go, or shell (Bash).
Automation and Scripting: Strong automation skills using tools like Ansible, Puppet, Chef, or custom scripts. Knowledge of Infrastructure as Code (IaC) tools like Terraform.
Containerization and Orchestration: Experience with containerization technologies like Docker and container orchestration platforms like Kubernetes.
Cloud Computing: Proficiency in any of the cloud platforms such as AWS, Azure, or Google Cloud Platform, and knowledge of managing infrastructure in the cloud.
Monitoring and Logging: Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and logging frameworks to track system performance and troubleshoot issues.
Networking: Understanding of networking concepts and protocols, and network troubleshooting skills.
Security: Knowledge of security best practices, including encryption, access controls, and vulnerability management.
Continuous Integration/Continuous Deployment (CI/CD): Understanding and implementation of CI/CD pipelines for automated testing and deployment.
Load Balancing: Experience in incident response, troubleshooting, and resolution.
Version Control: Proficient use of version control systems like Git.

Experience and Qualifications

Minimum of 5 years of experience in site reliability engineering.
Degree in computer science, information technology, or a related field.
Experience working for a product organization is a plus.
Certifications from cloud service providers, such as AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer, or Microsoft Certified, are a plus.