Job Title:
Site Reliability Engineer
About Trellix:
Trellix, the trusted CISO ally, is redefining the future of cybersecurity and soulful work. Our comprehensive, GenAI-powered platform helps organizations confronted by today’s most advanced threats gain confidence in the protection and resilience of their operations. Along with an extensive partner ecosystem, we accelerate technology innovation through artificial intelligence, automation, and analytics to empower over 53,000 customers with responsibly architected security solutions.
We also recognize the importance of closing the 4-million-person cybersecurity talent gap. We aim to create a home for anyone seeking a meaningful future in cybersecurity and look for candidates across industries to join us in soulful work. More at https://www.trellix.com/.
Role Overview:
2+ years of hands-on working experience in AWS supporting production of large-scale cloud services.
Experience using modern Monitoring and Alerting tools (Prometheus, Grafana, PagerDuty, etc.)
Proven ability to work independently in deploying, testing, and troubleshooting systems.
Familiarity with Containerization and associated management tools (Docker, Kubernetes)
Understanding of Incident, Change, Problem and Vulnerability Management processes.
About Role:
Be part of a global 24x7x365 team providing operational coverage, including event response and recovery efforts for critical services.
Periodic deployment of features, patches and hotfixes to maintain the Security posture of our Cloud Services.
Ability to work in shifts on a rotational basis and participate in On-Call duties
Have ownership and responsibility for high availability of Production environments
Provide input into the monitoring of systems, applications, and supporting data
Report on system uptime and availability
Collaborate with other team members on best practices
Assist with creating and updating runbooks & SOPs
Build a strong relationship with the Cloud DevOps, Dev & QA teams and become a domain expert for the cloud services in your remit.
You will be provided with the support required for growth and development in this role.
About You:
2+ years of hands-on working experience in AWS supporting production of large-scale cloud services.
Strong production support background and experience of in-depth troubleshooting.
Experience working with solutions in both Linux and Windows environments.
Experience using modern Monitoring and Alerting tools (Prometheus, Grafana, PagerDuty, Opsgenie, CloudWatch, etc.)
Proven ability to work independently in deploying, testing, and troubleshooting systems.
Experience in a scripting language such as Python or shell scripting.
Familiarity with security tools & practices (Wiz, Tenable)
Familiarity with Containerization and associated management tools (Docker, Kubernetes)
Significant experience of developing and maintaining relationships with a wide range of customers at all levels.
Understanding of Incident, Change, Problem and Vulnerability Management processes.
Company Benefits and Perks:
We work hard to embrace diversity and inclusion and encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.
Retirement Plans
Medical, Dental and Vision Coverage
Paid Time Off
Paid Parental Leave
Support for Community Involvement
We're serious about our commitment to diversity which is why we prohibit discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation or any other legally protected status.