As a Lead Site Reliability Engineer, you will play a vital role in implementing modern engineering and DevOps techniques to operate a large-scale distributed application portfolio across on-premises and cloud environments, with the goals of increasing efficiency, eliminating downtime, optimizing cost, and maintaining performance at scale. You will provide hands-on technical expertise to design, deploy, secure, and optimize cloud services and deliver the best customer experience. This role is also responsible for maintaining and reporting on the health of the core E-Commerce systems, page performance, and customer experience analytics, while working as an adviser to help identify, educate on, and foster best-in-class site reliability solutions.
Essential Duties and Responsibilities (Min 5%)
Leads end-to-end availability, security, and performance of mission-critical applications and services that are part of the E-Commerce ecosystem.
Drives changes and release activities related to site stability with other teams (internal and external), partnering with the Change Management group to ensure smooth and trouble-free rollout of releases and changes.
Partners with Information Security to manage application security, vulnerability remediation, and compliance activities with other teams (internal and external).
Partners with vendors to ensure all critical patches are tested and applied in both Non-Production and Production environments in time to avoid any business and customer impacts.
Partners with leads and architects across the organization to define the performance strategy and executes performance test activities with other teams (internal and external).
Partners with QA Performance Test Engineers to ensure all changes are tested in both Non-Production and Production to avoid any business and customer impacts.
Establishes application and synthetic monitoring, alerting, execution of failover capabilities, and automated self-healing and recovery.
Manages and maintains performance environments, ensuring that these environments are properly set up, configured, and highly available for each project as scheduled.
Communicates the state of reliability to prioritize technical debt and improvements on technology team roadmaps.
Supports day-to-day health, uptime, monitoring, and reliability of the website and related services.
Leads, models, and drives SRE culture and behaviors.
Shares a 24x7 on-call Production support rotation with the team and responds to service incidents.
May perform other duties as assigned.

Required Qualifications
Experience:
7+ years of experience in B2B or B2C customer-facing software design, development, and deployments.
7+ years of experience in performance engineering and application monitoring for an organization with large and complex information systems is preferred.
5+ years of experience with application security threat and vulnerability management and bot traffic management for large-scale B2C or B2B applications.
Education: Bachelor’s degree in Computer Science or related field is required. Any suitable combination of education and experience will be considered.
Working Conditions
Normal office working conditions
Must be able to work some nights and weekends
Occasional travel required

Physical Requirements
Sitting
Standing (not walking)
Walking
Kneeling/Stooping/Bending
Reaching overhead
Lifting up to 20 pounds

Disclaimer
This job description represents an overview of the responsibilities for the above-referenced position. It is not intended to represent a comprehensive list of responsibilities. A team member should perform all duties as assigned by his/her supervisor.