San Francisco, CA, US
Senior Site Reliability Engineer I (Python/Golang), Agent Ops

Cisco ThousandEyes is a Digital Experience Assurance platform that empowers organizations to deliver flawless digital experiences across every network – even the ones they don’t own. Powered by AI and an unmatched set of cloud, internet and enterprise network telemetry data, ThousandEyes enables IT teams to proactively detect, diagnose, and remediate issues – before they impact end-user experiences.

About The Role

The Application Window is expected to close on 1/16/25. However, the job posting may be removed earlier if the position is filled or if a sufficient number of applications are received.

Are you interested in Operational Excellence? The Agent Ops team maintains and grows our global infrastructure. Our mission is to optimize user experience from a geographic and service provider standpoint. As a Site Reliability Engineer, you'll ensure the reliability of our global monitoring infrastructure, focusing on availability, performance, and strategic growth. Join us and be part of a team passionate about innovation and reliability!

What You’ll Do

We are seeking driven and innovative engineers with a blend of software and operations expertise. Familiarity with networking protocols, Kubernetes, Infrastructure as Code (IaC), distributed computing, and software development is key.

• Collaborate with software engineers to quickly identify and resolve software issues, suggesting performance and architecture improvements.

• Design and maintain a custom infrastructure deployment model with bare metal servers and virtualized environments across major cloud providers.

• Automate processes to enable the fleet to scale efficiently.

• Analyze, debug, and resolve issues across our infrastructure and platform services.

• Strive to provide our customers with the best possible experience.

• Participate in an on-call rotation, enhancing our 24x7 incident response capabilities.

Required Qualifications

• Design and implement scalable, well-tested solutions, emphasizing Kubernetes and Terraform deployments.

• Write high-quality code in Python or Go.

• Use IaC tools like Terraform, Ansible, Puppet, and Kubernetes to build sophisticated yet streamlined systems.

• Have experience with cloud services, especially AWS.

• Possess proven knowledge of Unix/Linux systems, including the kernel, file systems, and general administration.

• Understand standard network protocols such as IPv4, IPv6, TCP, UDP, DNS, HTTP, and TLS.

• Communicate complex topics clearly to diverse audiences.

• Exhibit strong ownership, drive, and attention to detail.

• Contribute to excellence in both operations and development.

Cisco values the perspectives and skills that emerge from employees with diverse backgrounds. That's why Cisco is expanding the boundaries of discovering top talent by not only focusing on candidates with educational degrees and experience but also placing more emphasis on unlocking potential. We believe that everyone has something to offer and that diverse teams are better equipped to solve problems, innovate, and create a positive impact.

We encourage you to apply even if you do not believe you meet every single qualification. Not all strong candidates will meet every single qualification. Research shows that people from underrepresented groups are more prone to experiencing imposter syndrome and doubting the strength of their candidacy. We urge you not to prematurely exclude yourself and to apply if you're interested in this work.

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.
