The TOS (Technical Operations and Support) Observability organization develops metrics aimed at stabilizing services in OCI, thereby improving the customer experience when using OCI. This group owns the charter to build services, tools, systems, and processes that enhance OCI operability through improved observability, metrics, and data. Our organization is working on exciting techniques to detect Large Scale Events (outages) in an automated and predictive fashion before they cause serious customer impact. The team will have diverse expertise in systems, networking, and software development to provide the stability, performance, and reliability our customers have come to expect. We work with multiple service development teams, identify cross-team issues with associated operational risk, and work with many teams across the organization to resolve underlying problems. The role combines engineering solutions, troubleshooting expertise, and general operational guidance, so it also requires strong communication and organizational skills. You are the interface between Engineering Operations and Service Teams. The work delivered is mission critical and directly contributes to our customers' success.
Career Level – IC4 or IC5 (Staff/Principal/Tech Lead or Senior Staff/Senior Principal)
We are looking for DevOps engineers with expertise and real-world experience in developing and architecting applications that are used at scale. You should be comfortable with API development as well as automated infrastructure development. These are exciting times in our space: we are growing fast, we are still at an early stage, and we are working on ambitious new initiatives.
About You:
You are an experienced cloud engineer with a proven track record of delivering high-scale, high-impact solutions.
You are extremely interested in having a front-row seat when it comes to learning about cloud services and working with cloud service teams.
You have a strong opinion on metrics and observability when it comes to understanding customer pain.
You have excellent communication skills. You can easily work with teams that are struggling to set up their metrics correctly and convince them of ways to set up their metrics that will serve them and the TOS organization.
You are a disciplined engineer who understands the importance of high standards, is never satisfied with mediocrity, and constantly strives for excellence.
You are comfortable with ambiguity in a chaotic and fluid environment.
You are passionate about technology and are not afraid to defend your opinions or position with peers and superiors.
Minimum Qualifications
8+ years of experience shipping scalable, cloud-native distributed systems.
Ability to work in a collaborative, cross-functional team environment.
Strong grasp of Computer Science concepts (data structures, algorithms, and programming paradigms).
Proficient in Java, Python, and Go.
Uses technologies like Kubernetes to operate highly available, high-performance distributed systems.
Writes correct, secure, maintainable, robust code and appropriate tests.
Collaborates on planning and on identifying and mitigating risks in their project.
Experienced at building highly available services, with knowledge of common service-oriented design patterns and service-to-service communication protocols.
Automates common tasks to enable continuous delivery and ensure continuous availability with minimum human overhead.
Able to effectively communicate technical ideas verbally and in writing (technical proposals, design specs, architecture diagrams, and presentations).
Has experience with mentoring and knows how to appropriately delegate tasks and teach concepts, so that the rest of the team can grow and become more effective.
Provides technical guidance and constructive feedback to leadership, team members, and other stakeholders.
Contributes to product roadmaps by identifying areas of need and engaging with stakeholders to scope work.
Preferred Qualifications
Familiarity with several of the following technologies: Infrastructure-as-a-Service (AWS/Azure/Google Cloud), CI/CD systems (TeamCity/Jenkins), Docker, Linux (Oracle Linux/RedHat), RESTful APIs, log analysis tools, debugging tools.
Drives operational readiness and excellence of their features and subsystems.
Hands-on experience building large-scale services through the entire software development lifecycle.
Excellent written and verbal communication skills with the ability to present complex information in a clear, concise manner to all audiences.
Results-driven; thrives in a development environment that is agile, collaborative, and in start-up mode, even when faced with ambiguity.
Has on-call experience and has handled operations on previous teams.
Develops new metrics and dashboards to improve situational awareness, and has opinions on what good metrics and dashboards are.
Designs, develops, troubleshoots, and debugs software programs for databases, applications, tools, networks, etc.
Commitment to capturing and maintaining institutional knowledge.
Career Level - IC5