Lead Application Support & Reliability Engineer
UBS
The Lead Application Site Reliability Engineer is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of the applications managed by the Application Support and Reliability team.
The role needs to adopt software engineering practices, viewing technology operations as engineering problems, and to enhance the resiliency of the application or product across the parameters listed above.
You are expected to automate TOIL for efficiency, enhance the customer experience in the delivery of Application Services, increase observability, and reduce time to recovery from outages.
apply a broad range of engineering practices with a focus on reliability, from instrumentation, performance analysis, and log analytics that identify hot spots to automated deployment and operational risk reduction
minimize the risk and impact of failures by engineering operational improvements, such as predictive monitoring, auto-scaling, or self-healing
collect and analyze operational data, define and monitor key metrics to identify and communicate areas for improvement
ensure that Ops professionals and product managers review incidents and document the findings to enable informed decision-making; based on post-incident reviews, optimize the overall process of delivery, monitoring, and controls to boost service reliability
ensure the quality, security, reliability, and compliance of our solutions by applying our digital principles and implementing both functional and non-functional requirements
learn new technologies and practices, reuse strategic platforms and standards, evaluate options, and make decisions with long-term sustainability in mind
work in an agile way as part of multi-disciplinary teams, participate in agile ceremonies, and collaborate with engineers and product managers
understand, represent, and advocate for client needs
share knowledge and expertise with colleagues, help with hiring, and contribute regularly to our engineering culture and internal communities
drive automation to eliminate TOIL and increase self-service options for Requests and Changes
implement Monitoring Maturity with SLIs, driving clear SLOs and SLAs
create baselines for the Error Budget and for the effort spent on addressing TOIL
lay down plans to mitigate issues in the shortest possible time to avoid impact on the Error Budget
adopt test methodologies like Chaos Engineering to enhance resiliency and the ability to withstand unexpected failures
carry out technical analysis, design, code, tests, documentation, and other engineering artifacts
drive continuous improvement through Problem Management