Southlake, TX, US
1 day ago
Sr Manager, Reliability Engineering and Operations
Job Locations: US-TX-Westlake | US-TX-Southlake | US-CO-Lone Tree
Requisition ID: 2024-103815
Posted Date: 11/12/2024 5:53 PM
Category: Engineering & Software Development
Salary Range: USD $150000.00 - $205000.00 / Year
Application Deadline: 11/19/2024
Position Type: Full time

Your Opportunity

At Schwab, you’re empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together.

 

The Sr Manager, Reliability Engineering and Operations is enthusiastic about leading technology teams responsible for delivering exceptional application and production support. You have a proven track record of critical thinking, with a laser focus on pragmatic problem solving, production support, and customer satisfaction. We require strong ethics and the ability to partner with and influence business partners, product teams, and technologists across the organization. The right candidate will have a strong background in leading and developing 24x7 support teams.

 

Essential Functions:

Leadership & Management:
- Leading and mentoring a Production Operations team for Schwab’s Workplace Financial Services Technology team, fostering a culture of continuous improvement and innovation
- Collaborating with cross-functional teams to ensure alignment on reliability and performance goals
- Serving as a hands-on technical leader who leads the team from the front and inspires thought leadership in the team
- Identifying tactical and strategic opportunities to improve service health, performance, reliability, and telemetry
- Driving a shift-left mindset and influencing architectural decisions to ensure resiliency and scale at the outset of the software development process
- Advocating automation so that teams follow patterns that ensure repeatability, consistency, and portability
- Identifying toil and technical debt, developing a comprehensive plan, and leading the team through its execution

Reliability & Performance:
- Conducting post-mortem reviews to identify areas for improvement and implementing solutions to enhance system reliability
- Implementing and promoting performance engineering practices to ensure optimal system performance
- Developing and executing strategies for destructive testing to identify potential points of failure and improve system resilience
- Working closely with the development team to define a sustainable operating model for mobile applications, focusing on platform scale, availability, fault tolerance, and performance
- Leading the team with a data-driven mindset, focusing on key performance metrics such as MTTD, MTTR, and availability, in close collaboration with development teams

Production Engineering & Operational Support:
- Overseeing production engineering efforts to ensure systems are designed for operational excellence and reliability
- Providing technical guidance as needed during incidents and daily work
- Providing leadership around incident management and root cause analysis to resolve production issues and prevent recurrence
- Establishing and maintaining operational support practices, including monitoring, alerting, and incident response
- Leading the team in their SRE maturity journey

Continuous Improvement:
- Driving continuous improvement initiatives in reliability, performance, automation, and operational support
- Staying current with industry trends and best practices to ensure our systems and processes remain in line with SRE tenets

What you have

Required Qualifications:

- 10+ years of experience running 24/7/365 application support teams responsible for enterprise applications, infrastructure, and systems
- 10+ years of experience in measuring, tracking, improving, and reporting on SLOs, SLAs, and KPIs
- 7+ years of experience supporting enterprise applications in production
- 5+ years of experience working with enterprise ITSM business processes
- ITIL experience with enterprise systems, including but not limited to:
  - Event and incident management
  - Release and deployment
  - Enterprise change management
- Available for after-hours calls/incident management
- Experience managing multi-shift-based teams
- Recent experience leading an operations organization that focuses on event and incident management
- Experience in monitoring tools with a focus on ITIL capabilities
- Experience with GitHub, Bamboo, Bitbucket, Splunk, ThousandEyes, and AppDynamics

Why work for us?

Own Your Tomorrow embodies everything we do! We are committed to helping our employees ignite their potential and achieve their dreams. Our employees get to play a central role in reinventing a multi-trillion-dollar industry, creating a better, more modern way to build and manage wealth.

 

Benefits: We offer a competitive and flexible package designed to help you make the most of your life at work and at home, today and in the future.
