Job Title:
Site Reliability Engineer
About Trellix:
Trellix, the trusted CISO ally, is redefining the future of cybersecurity and soulful work. Our comprehensive, GenAI-powered platform helps organizations confronted by today’s most advanced threats gain confidence in the protection and resilience of their operations. Along with an extensive partner ecosystem, we accelerate technology innovation through artificial intelligence, automation, and analytics to empower over 53,000 customers with responsibly architected security solutions.
We also recognize the importance of closing the 4-million-person cybersecurity talent gap. We aim to create a home for anyone seeking a meaningful future in cybersecurity and look for candidates across industries to join us in soulful work. More at https://www.trellix.com/.
Role Overview:
The Site Reliability Engineer team is responsible for the design, implementation, and end-to-end ownership of the infrastructure platform and services that protect Trellix Security's consumers. These services provide continuous protection to our customers, with a strong focus on quality, and offer an extensible services platform to internal partners and product teams.
This role is a Site Reliability Engineer for commercial cloud-native solutions deployed and managed in public cloud environments such as AWS and GCP.
You will be part of a team responsible for the Trellix Cloud Services that continuously enable protection for our endpoint products.
Responsibilities of this role include supporting cloud service measurement, monitoring, reporting, deployments, and security. You will contribute to improving overall operational quality through common practices and by working with the Engineering, QA, and product DevOps teams.
You will also be responsible for supporting efforts that improve Operational Excellence and Availability of Trellix Production environments.
You will have access to the latest tools and technology, and an incredible career path with the world's cybersecurity leader. You will have the opportunity to immerse yourself in complex and demanding deployment architectures and see the "big picture", all while helping to drive continuous improvement in every aspect of a dynamic, high-performing engineering organization.
About the Role:
Be part of a global 24x7x365 team providing operational coverage, including event response and recovery for critical services.
Periodically deploy features, patches, and hotfixes to maintain the security posture of our cloud services.
Work in shifts on a rotational basis and participate in on-call duties.
Take ownership of and responsibility for the high availability of production environments.
Contribute to the monitoring of systems, applications, and supporting data.
Report on system uptime and availability
Collaborate with other team members on best practices
Assist with creating and updating runbooks & SOPs
Build strong relationships with the Cloud DevOps, Dev, and QA teams and become a domain expert for the cloud services in your remit.
You will be provided with the support required for growth and development in this role.
If you are passionate about running and continuously improving a world-class Site Reliability Engineering team, we are offering you a unique opportunity to build your career with us and gain experience working with high-performance cloud systems.
About you:
2 to 4 years of hands-on experience supporting large-scale cloud services in production.
Strong production support background and experience with in-depth troubleshooting.
Experience working with solutions in both Linux and Windows environments
Experience using modern Monitoring and Alerting tools (Prometheus, Grafana, PagerDuty, etc.)
Excellent written and verbal communication skills.
Experience with Python or other scripting languages
Proven ability to work independently in deploying, testing, and troubleshooting systems.
Experience supporting high availability systems and scalable solutions hosted on AWS or GCP.
Familiarity with security tools & practices (Wiz, Tenable)
Familiarity with Containerization and associated management tools (Docker, Kubernetes)
Significant experience developing and maintaining relationships with a wide range of customers at all levels.
Understanding of Incident, Change, Problem and Vulnerability Management processes.
Desired:
Awareness of ITIL best practices
AWS Certification and/or Kubernetes Certification
Experience with Snowflake
Automation and CI/CD experience (Jenkins, Ansible, GitHub Actions, Argo CD)
Company Benefits and Perks:
We work hard to embrace diversity and inclusion and encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.
Retirement Plans
Medical, Dental and Vision Coverage
Paid Time Off
Paid Parental Leave
Support for Community Involvement
We're serious about our commitment to diversity, which is why we prohibit discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.