Tech Ops Engineer - Incident Management, Central Technical Operations Services (CTOS)
Amazon.com
Amazon is seeking an exceptional Systems Engineer to join our world-class Central Technical Operations Services (C-TOS) team as an Incident Manager. As the first line of defense for maintaining high availability on the Amazon Retail Website, our C-TOS group provides critical incident response and management for the entire Amazon ecosystem. When issues arise that could impact our hundreds of millions of customers worldwide, our skilled Incident Managers spring into action to make events shorter, less frequent, and less severe.
This is immensely important, high-stakes work. The Amazon Retail Website is where we directly engage and delight our global customer base - any disruption can have a real impact on real people. That's why our C-TOS Incident Managers are so vital: drawing on deep operational expertise and the latest incident management tools, they work quickly to mitigate customer-impacting events.
This is an excellent opportunity to join one of Amazon's world-class engineering teams, working alongside some of the best and brightest minds in technology. Our engineers are encouraged to build solutions that enhance our incident management practice, including tooling and processes, and to fix software problems, then share those innovations across the organization. You'll have access to mentoring programs, regular tech talks with technical leaders, and well-defined career paths for motivated engineers who want to contribute to our culture of operational excellence and customer-focused innovation. The C-TOS team is globally distributed, with groups in Austin, Dublin, and Sydney providing 24/7 coverage; each group works four 10-hour shifts per week.
#techjobsau
Key job responsibilities
- Serve as a technical evangelist, leveraging deep expertise to devise innovative solutions to complex business problems.
- Drive down mean time to resolution for incidents through proactive monitoring, rapid response, and continuous process improvement.
- Design, implement, and optimize world-class event detection, alerting, and incident management systems.
- Evolve operations management processes and technologies to accommodate Amazon's rapid growth.
- Create, review, and continuously improve documentation, procedures, and knowledge resources.
- Identify and resolve recurring platform issues by collaborating cross-functionally with service owners.
- Provide exceptional customer service by responding to and resolving requests within defined SLAs.
- Participate in a global "follow the sun" rotation, ensuring 24/7 coverage including weekends and holidays.
- Contribute to the interviewing and hiring process to build a world-class Incident Management team.