The SRE team is responsible for the overall health, performance, and reliability of both the Demo Cloud Platform and a range of SaaS services. Demo Services (DS) leverages Oracle Cloud services as well as open-source components to deliver a full-featured, self-service, and extensible demo platform for Oracle sales, consulting, and strategic partners. In addition, we have built two SaaS services: one that helps other SaaS teams continuously test their services, and another that facilitates data creation for sales opportunities and development/test activities.
To meet the goals of the business, the SRE team works alongside the DS Development and DS Architecture teams to rapidly deploy new platform functionality through CI/CD methodologies. We are looking for candidates who have a strong passion for automation and are eager to pick up new technologies, product stacks, and industry-current solutions.
As part of the SRE team you will …
- Establish end-to-end monitoring and alerting on all critical components of the application
- Monitor application performance, take steps to improve overall performance and stability, and follow through with implementation
- Monitor and manage uptime, end-to-end performance, and operability of all service processes and dependent infrastructure to meet SLAs
- Solve complex problems related to infrastructure cloud services and prevent their recurrence
- Contribute to making our infrastructure simple, reliable, and easy to operate
- Document your system knowledge as you acquire it, create runbooks, ensure critical system information is readily available to those who need it, and turn that knowledge into repeatable actions and then into automation
- Troubleshoot complicated, cross-platform issues spanning OS, networking, and database layers in a cloud-based SaaS environment; handle live production incidents and debug application and infrastructure issues
- Take periodic on-call duty, respond to production incidents, and support development in addressing customer incidents
- Be results-driven and thrive in a development environment that is agile, collaborative, and in start-up mode, even when faced with ambiguity
- Bring a contagious sense of ownership and use all available tools to solve the issues you encounter
- Participate in the development of tools and processes that leverage observability best practices to proactively identify and resolve issues before they become incidents
- Model and maintain our Autonomous Data Warehouse and data flows, and develop, maintain, and debug our internal reporting system on Oracle Analytics Cloud
Your skills should include a subset of the following …
- You have a bachelor’s degree in Computer Science, Software Engineering, Information Systems, or equivalent, and 4+ years of relevant work experience
- You have worked in an SRE/DevOps role and managed medium to highly complex production environments at scale
- You have practical experience with continuous integration and continuous delivery methodologies, using tools like GitLab, Jenkins, or others
- You have experience with container and container management technologies: Docker, Kubernetes
- You have hands-on experience with orchestration and configuration management tools such as Ansible, Terraform, Puppet, or others
- You have experience monitoring and analyzing infrastructure performance using standard performance monitoring tools: Prometheus, Alertmanager, Grafana
- You are knowledgeable about network concepts: DNS, load balancing, VCN, firewall, proxy server, etc.
- You are familiar with Linux and its administration life cycle: deployment, upgrading, compiling, and debugging
- You are adept in one or more of the following languages: JavaScript, NodeJS, Java, Python, Perl, Go, Shell Scripting
- You are able and willing to work in an on-call rotation that will include rotating weekend coverage
- You can operate independently, make decisions, take action, and take responsibility
- You have effective communication and interpersonal skills and can work and coordinate across multiple teams
- You have a software-centric mindset
Your Bonus Skills…
- You have a master’s degree in Computer Science or related studies
- You have experience working with major cloud platforms (Oracle Cloud, Microsoft Azure, Google Cloud Platform, or AWS); any certifications are a plus
- You are adept with SQL, PL/SQL, and query performance tuning
- You have experience with data modeling, analytics, and report building
- You have a solid foundation in database administration and are comfortable with the complete database life cycle, including provisioning, backup and recovery, cloning, performance tuning, maintenance, and troubleshooting
- You have developed tools and provided scalable, maintainable, and automated solutions to support mission-critical applications
What we will offer you...
- A competitive salary with exciting benefits
- A special learning and development program to advance your skills and career
- A senior team with a strong collaborative spirit, where you will have the opportunity to work together but also lead initiatives
- An inclusive culture that celebrates what makes you unique
- Core benefits such as life insurance, health insurance, and access to retirement planning
- An Employee Assistance Program to support your mental health
- Employee resource groups that champion our diverse communities
Career Level - IC3