Kuala Lumpur, MY
117 days ago
P-9025914 Site Reliability Engineer, Principal-1

At AIA we’ve started an exciting movement to create a healthier, more sustainable future for everyone.

As pioneering innovators for over 100 years, we’re now transforming our organisation to be faster, simpler and more connected, because we want to be even better equipped to develop digital solutions and experiences that help more people live Healthier, Longer, Better Lives.

To get there, we need people with tech/digital/analytics expertise and passion to help develop positive, sustainable change through digitally enhanced experiences that will impact the lives of millions of people and create a healthier future for everyone.

If you believe in developing a better tomorrow, read on. 

About the Role

The Site Reliability Engineer (SRE) ensures that our cloud application systems are reliable and available to users. The SRE will monitor application systems, establish automated detection, perform root cause analysis, and formulate preventive actions. They will gather and analyze metrics from operating systems and applications to assist in performance tuning and fault finding, and will partner with development teams to improve services.

We are looking for a Site Reliability Engineer (SRE) to enhance the reliability, scalability, and performance of our systems. As an SRE, you will focus on building and maintaining automation systems, monitoring platforms, and cloud infrastructures to ensure uptime and reliability. You will work closely with development and operations teams to monitor critical services and ensure that our applications are performant and highly available.

The ideal candidate should have expertise in web and mobile development stacks, cloud technologies, and a strong understanding of how APIs work from a monitoring and reliability perspective. You should be capable of working independently with minimal supervision and providing critical support for resolving system issues.

Key Responsibilities:

Ensure the reliability, scalability, and efficiency of production systems, maintaining high availability of services.
Build and maintain infrastructure automation to streamline operations and enhance monitoring capabilities using Azure cloud services.
Collaborate with development teams to ensure the reliability of mobile and web applications built on React Native, React.js, Node.js, and Angular.
Monitor RESTful APIs for mobile and web to ensure consistent availability, performance, and reliability, identifying and resolving issues as they arise.
Implement and maintain robust monitoring and alerting systems using tools such as Dynatrace, Elastic Stack, or equivalent.
Participate in incident response for critical system issues, ensuring quick identification, resolution, and follow-up post-incident reviews.
Automate operational processes to reduce manual intervention, improve system efficiency, and enhance response times.
Ensure the reliability of applications through best practices, including disaster recovery planning and resource optimization.
Provide occasional on-call support for critical system issues, ensuring minimal downtime.
Document and maintain processes, monitoring setups, known issues, and operational toil.

Required Skills and Experience:

Proficiency in one or more of the following stacks: React Native, React.js, Node.js, Angular.
Experience in mobile development using Kotlin (Android) and Swift (iOS).
Strong understanding of RESTful APIs, with experience in monitoring and ensuring their availability, performance, and reliability.
Experience with cloud platforms such as AWS, Azure, or GCP, including managing and optimizing cloud infrastructure.
Familiarity with monitoring and logging tools such as Dynatrace, Elastic Stack, or similar platforms.
Proven experience in automating system operations and managing system reliability in large-scale environments.
Ability to troubleshoot and resolve complex system issues in a timely manner.
Strong communication and collaboration skills, working effectively across development and operations teams.

Nice to Have:

Experience in .NET or familiarity with the Microsoft development ecosystem.
Experience with CI/CD pipelines and tools.

Build a career with us as we help our customers and the community live Healthier, Longer, Better Lives.

You must provide all requested information, including Personal Data, to be considered for this career opportunity. Failure to provide such information may influence the processing and outcome of your application. You are responsible for ensuring that the information you submit is accurate and up-to-date.
