United States
Senior Principal Site Reliability Developer

Oracle is looking for a Senior Principal Site Reliability Developer with world-class experience in developing and supporting large-scale cloud deployments across the world. The candidate should have expert-level knowledge and hands-on experience in managing complex microservices architectures using service mesh technologies like Linkerd, alongside API gateways on Kubernetes. Strong proficiency in monitoring and visualization using Grafana to identify and troubleshoot performance issues within distributed systems is required, along with experience in Oracle WebLogic application administration, automation, and running production systems at an operational level. The position is part of the SaaS Reliability Engineering organization and provides a unique opportunity to work at the cutting edge of cloud technologies, tools, products, and cloud services. The candidate must be a US citizen and should be willing to work beyond regular business hours and on weekends/holidays as needed.

Organization: SaaS Engineering

The Oracle Cloud is a suite of Oracle applications, middleware and database offerings delivered in a self-service, subscription-based, elastically scalable, reliable, highly available and secure manner.  The Oracle Cloud is an enterprise cloud for business. It is an integrated suite of services spanning Oracle's complete portfolio based on open Java and SQL standards offering flexible cloud deployment. The services offered in our cloud are based upon Oracle's complete portfolio of best-in-class solutions.

As part of SaaS Engineering, we consolidate and simplify IT operations and application instances across hosted services in Oracle Cloud. We partner with engineering teams to develop product modules to be offered as a service. We collaborate with the product quality assurance team to run test cases and to ensure high-quality service after the release of product versions. Our team works with best-in-class, next-generation Oracle Fusion technical-stack components such as Oracle Autonomous Database, Oracle WebLogic Server 12c/14c, Oracle Business Intelligence, Oracle Identity Management, Oracle Virtual Machine and Hybrid/Spectra Service. The team has excellent expertise in cutting-edge products and technologies like AI, ML, and Oracle Fusion applications, in hosting products as Software as a Service and Platform as a Service, and in supporting services in next-generation Oracle Cloud Infrastructure.

Key Responsibilities:

DevOps - Kubernetes administration, including installation, configuration, and troubleshooting.
Grafana and Prometheus Administration - Develop and implement custom dashboards for monitoring key metrics. Data analysis and visualization. Strong understanding of monitoring best practices, alerting, and data analysis.
Middleware Technology Expert - Work as part of the Oracle WebLogic Administration team to manage the server life cycle and monitor the application services. Troubleshoot key problems on various layers of the SaaS application and infrastructure. Provide internal analysis, and enhance and maintain the capacity and capabilities of existing environments.
Automation - A clear understanding of automation and orchestration principles is key. Automate operational tasks and deployments, develop scalable solutions, and contribute toward the transition to algorithmic IT operations. Develop solutions so that fleet-wide deployment, tracking, and updates can be done.
Ownership Scope - Good understanding of end-to-end configuration and technical dependencies. In partnership with Service Development and Operations partners, you will be responsible for ensuring that services are designed and delivered to be mission critical, with a focus on monitoring, telemetry, security, resiliency, scale, and performance. Engineer solutions so that services are compliant and meet or exceed the service level agreements. Collaborate with various teams during service outages, capacity expansion, and infrastructure maintenance, and ensure adherence to production deployment standards.

Required Skills:

5+ years of experience in Oracle WebLogic along with automation skills (WLST/shell scripting), with a BS in Computer Science or equivalent qualification. A Master's degree in Computer Science or Management is preferred.
Administration skills on any web server such as OHS (Oracle HTTP Server) or Apache.
Experience with cloud development tools and languages such as Kubernetes, Python, and Prometheus.
Experience working in Linux OS environments.
Experience in deploying and running large-scale online systems built on cloud platforms such as Oracle Cloud, AWS, Azure, Google Cloud Platform, and/or OpenStack.
Experience in designing and implementing solutions for platform and application layer telemetry, monitoring, scalability, performance, and reliability.
Knowledge of any parallel job execution framework tool such as Marionette Collective (MCollective).
Technical skills and knowledge that extend across application, server, storage, and network technologies to troubleshoot and provide system-level guidance and solutions.
Strong ability to solve operational problems, with the ability to identify and automate common routines.
Excellent written and verbal communication skills.
Willingness to learn new technologies.

Preferred Additional Skills:

Experience in AI and ML is preferred.
Experience in Java programming and an understanding of structured SQL statements will help.
Prior experience as a Service Reliability Engineer or DevOps Engineer.
Experience with automated service deployment tools.
A strong focus on business outcomes.
Comfortable with collaboration, open communication, and reaching across teams in a boundaryless manner.
Knowledge of incident, request, and change management processes is a plus.

Solve complex problems related to infrastructure cloud services and build automation to prevent problem recurrence. Design, write, and deploy software to improve the availability, scalability, and efficiency of Oracle products and services. Develop designs, architectures, standards, and methods for large-scale distributed systems. Facilitate service capacity planning and demand forecasting, software performance analysis, and system tuning.

Career Level - IC5
