Illumio, the pioneer and market leader of Zero Trust segmentation, prevents breaches from becoming cyber disasters. Illumio protects critical applications and valuable digital assets with proven segmentation technology purpose-built for the Zero Trust security model. Illumio ransomware mitigation and segmentation solutions see risk, isolate attacks, and secure data across cloud-native apps, hybrid and multi-clouds, data centers, and endpoints, enabling the world’s leading organizations to strengthen their cyber resiliency and reduce risk. Illuminate the future with Illumio and join a team that’s passionate about developing cutting-edge security solutions that protect the world's most critical assets.
Our Team's Vision:
Our Engineering team is driven by a culture of visionary leadership, autonomy, and ownership, creating a dynamic synergy that propels us forward in the ever-evolving landscape of cybersecurity.
When you join our team, you join the leader in Zero Trust Segmentation. You'll work with a cutting-edge technology stack that spans operating systems, distributed applications, and immersive UI/visualization tools.
We're shaping the future of cybersecurity. Together, we will continue to build world-class products, led by people with different perspectives, backgrounds, and a shared commitment to innovation, at a time when the world faces the greatest cybersecurity threats in its history.
Your Impact:
We are seeking a skilled and proactive Product SRE (Site Reliability Engineer) to join our team and take ownership of debugging, troubleshooting, and resolving production escalations in a complex SaaS environment. The ideal candidate will have a deep understanding of AWS and Azure cloud platforms, application performance, and operational excellence, with a passion for automation and continuous improvement.
Production Support:
Investigate and resolve production incidents and escalations to ensure minimal downtime and impact to customers.
Work closely with engineering and support teams to troubleshoot application and infrastructure issues.
Performance Monitoring and Optimization:
Proactively monitor application health, performance, and reliability using modern observability tools.
Analyze trends in system behavior and suggest performance improvements.
Automation and Tooling:
Develop and maintain automation scripts and tools to improve operational efficiency and incident resolution.
Create and enhance runbooks to streamline troubleshooting and reduce mean time to resolution (MTTR).
Root Cause Analysis (RCA):
Conduct thorough post-incident reviews to identify root causes and implement preventive measures.
Drive a culture of continuous improvement by documenting lessons learned and improving system designs.
Cross-Functional Collaboration:
Partner with software engineers, QA, and product teams to improve application stability and user experience.
Act as a bridge between development and operations, ensuring smooth and reliable service delivery.
Your Toolkit:
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience.
8+ years of relevant SRE experience.
Cloud Expertise: Strong hands-on experience with AWS and Azure. Familiarity with Kubernetes and containerized environments. Knowledge of networking concepts such as DNS, load balancing, and firewalls.
Troubleshooting Skills: Proficient in diagnosing and resolving complex issues in SaaS environments, including performance bottlenecks and application errors.
Programming and Scripting: Proficiency in at least one programming language (e.g., Python, Go, Java) and at least one scripting language (e.g., Bash, PowerShell).
Monitoring and Observability: Experience with tools such as Datadog, New Relic, Prometheus, Grafana, ELK, or Azure Monitor.
Automation and Configuration Management: Familiarity with tools such as Ansible, Terraform, or CloudFormation.
Database Experience: Knowledge of debugging and optimizing relational databases (e.g., PostgreSQL, MySQL) and caching systems (e.g., Redis, Memcached).
Incident Management: Experience with incident management tools and processes, including conducting RCAs and improving on-call processes.
Compensation:
$192,000 USD – $230,000 USD
The pay range for this job level is a general guideline only and not a guarantee of compensation or salary. Additional factors considered in extending an offer include the responsibilities of the job, education, location, experience, knowledge, skills, abilities, internal equity, alignment with market data, and applicable laws.
At Illumio, we offer a wide range of benefits to our eligible team members. Our benefit programs vary by location and can include Medical, Dental, and Vision Coverage – Health and Dependent Savings Accounts – Life and Disability Programs – Paid Parental Leave – Voluntary Benefit Programs – Company Sponsored Wellness Program – Wellness Reimbursement Program – Retirement Savings – Equity Opportunities – Paid Time Off and Paid Holidays – Employee Incentive Program.
#LI-KD1 #LI-ONSITE
Our Commitment:
Illumio believes that an environment of unique backgrounds, experiences, viewpoints, and individual contributions drives our success and makes us stronger together. We are dedicated to creating and maintaining a diverse culture and emphasizing inclusion and belonging.
All official job offers from our company are extended directly by our recruitment team and will be sent through an official DocuSign document for your review and signature. Please be aware that we do not ask for any personal information in the process of extending offers of employment, such as financial details or social security numbers. Upon acceptance of any offer, we will request such information as part of the onboarding process prior to or on your first day of employment, and only after completing a background check through an authorized third-party vendor. If you receive any communication asking for personal details outside of these processes, please contact us immediately to verify the authenticity of the request. Your security is important to us, and we are committed to a safe and transparent hiring experience.