Seattle, WA, US
12 hours ago
Software Development Engineer, EC2, EC2 Dataplane LSE
Amazon Web Services is seeking a Software Development Engineer to support the growth of EC2's machine learning (ML) platforms. This position focuses on complex, ambiguous problem areas and innovative software initiatives within the EC2 Dataplane LSE team, which is part of the EC2 Infrastructure Services organization.

The role requires broad engineering competence and in-depth technical knowledge of software development, DevOps, infrastructure tools, and distributed systems. The candidate should have demonstrated experience in planning, organizing, and executing software development projects in a dynamic environment. They should be able to independently design, develop, test, and deploy software, as well as clarify requirements and assist with estimates.

The successful candidate will have the unique opportunity to work closely with Security Engineers, Senior Software Engineers, and Principal and Distinguished Engineers throughout AWS to define the technical roadmap and follow through with world-class execution.

The EC2 Dataplane LSE team owns and operates services and tools designed to detect and recover from host- and rack-level availability failures and to restore EC2 instances to availability. They architect, develop, and operate highly available and resilient services that are critical to ensuring the highest availability of the broader EC2 service. The team is also closely involved in providing health monitoring capabilities for EC2's newest and upcoming ML infrastructure.

If the challenge of building the next generation of compute platform excites you, come join us to shape the future of compute services!



Key job responsibilities
* Design, develop, operate and own large-scale services, architecting them to scale
* Provide technical leadership and mentor junior engineers in the team
* Write high-quality code to develop new systems, and conduct deep design and code reviews
* Solve problems at their root: step back to understand the broader context, and implement fixes that keep the issue from recurring
* Share in an on-call rotation with your team

A day in the life
You will work in a highly collaborative team environment that prioritizes developing and operating high-quality services. The team follows an Agile, Scrum-based process, which gives each team member the flexibility to demonstrate scope and impact. You will mentor junior engineers and interns. The team values operational excellence, so defining the right metrics and providing useful insights into those metrics will be essential.
You'll participate in an on-call rotation with the team to mitigate and resolve production issues, and bring back lessons from on-call to continuously improve the operational posture of the team's services. When building new features, you will own the Application Security process and collaborate with the rest of the team to ensure operational readiness.

About the team
The EC2 Dataplane LSE team owns and operates services and tools designed to detect and recover from host- and rack-level availability failures and to restore customer instances to availability. We architect, develop, and own tier-1 services that are highly available and resilient to failures; these services are themselves critical to ensuring the highest availability of the broader EC2 service. We play a key role in providing health monitoring capabilities for EC2’s newest and upcoming ML infrastructure. We are also the first-responder team for detecting and recovering from large-scale events that impact EC2 compute resources.