Storage SRE Engineer
IBM
**Introduction**
About IBM
IBM is a global technology and innovation company. It is the largest technology and consulting employer in the world, with a presence in 170 countries. The diversity and breadth of the entire IBM portfolio of research, consulting, solutions, services, systems, and software distinguish IBM from other companies in the industry.
Over the past 100 years, a lot has changed at IBM. In this new era of Cognitive Business, IBM is helping to reshape industries as diverse as healthcare, retail, banking, travel, manufacturing, and many more by bringing together our expertise in Cloud, Analytics, Security, Mobile, and the Internet of Things. We like to say, "be essential." We are changing how we create. How we collaborate. How we analyze. How we engage.
Join the next generation of innovators, inventors, and entrepreneurs who are creating the very way the world works. We want the brightest minds doing work that inspires, in an environment where growth is supported. IBMers get to discover their potential, so they’re inspired to build breakthroughs that help our clients succeed. We’re building teams of people with dynamic strengths who want their ideas to matter. Join us, and you’ll be proud to call yourself an IBMer.
Our Culture:
IBM is committed to creating a diverse environment and is proud to be an equal opportunity employer. You will receive consideration for employment without regard to your race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
Business Unit Introduction
IBM Cloud Computing is a one-stop shop that provides the cloud solutions and tools that industries need. The IBM Cloud portfolio includes infrastructure as a service (IaaS), software as a service (SaaS), and platform as a service (PaaS), offered through public, private, and hybrid cloud delivery models, in addition to the components that make up those clouds.
IBM Cloud ensures seamless integration into public and private cloud environments. The infrastructure is secure, scalable, and flexible, providing customized enterprise solutions that have made IBM Cloud the hybrid cloud market leader, with market-leading IaaS and PaaS platforms. The IBM Cloud platform is IBM's public cloud offering, providing services to global enterprises. IBM Cloud is the Cloud for Smarter Business: it is built on open technology, offers developer tools, and supports industry-specific solutions. We run the services and workloads for Watson, Blockchain, Services, Security, and IoT.
Ready to help drive IBM's success in the cloud market? This is your chance to research and learn new cloud-related technology products and services, and to design and implement rapid cloud-based prototypes while advancing your career in leading-edge technology.
**Your role and responsibilities**
Who you are:
As a Site Reliability Engineering (SRE) and DevOps Engineer in Storage, you will ensure that the designed solution meets non-functional requirements such as reliability, availability, performance, security, and maintainability. You will work closely with the development, release, and L2 support teams.
* You will bring a strong engineering focus to operations, concentrating your energy on preventing incidents and on improving observability, automation frameworks, self-service infrastructure, logging and metrics, and operational reports.
* You will be expected to use tools for logging, monitoring, event management, notification, runbook automation, ChatOps, and root cause analysis.
* You will work with Automation Engineers and QA Engineers to ensure seamless delivery of our service offerings.
* You will build sufficient expertise in the IBM Cloud control plane (IMS) to create automated monitoring processes.
Responsibilities:
* Keeping your assigned site or service up and running or getting it back up and running quickly when failure occurs
* Working closely with internal partners and teams to ensure that our infrastructure meets security, SLA, and performance requirements
* Writing, updating, and using documentation, including runbooks/playbooks
* Automating work including infrastructure needs, testing, failover solutions, failure mitigation, and much more
* Debugging complex problems across an entire stack and creating solid solutions
* Developing CI/CD processes to improve release cadence
* Continuously testing application and infrastructure resiliency under a variety of error conditions
* Partnering with security engineers and developing plans and automation to aggressively and safely respond to new risks and vulnerabilities.
* Developing, communicating, and monitoring standard processes to promote the long-term sustainability and health of operational development tasks
* Standing up and maintaining pre-production and developer environments to support the entire development organization and improve overall team velocity
* Using metrics and analytics to identify reliability issues and eliminate them through automation and tooling
* Advocating for our customers by providing them with self-diagnosis tools to resolve common issues that arise in the field
**Required technical and professional expertise**
* 4+ years of total experience
* A solid understanding of cloud infrastructure and operations is a must
* Knows their way around a Unix/Linux shell, can write shell scripts, and understands Linux internals
* Experience debugging complex problems
* Experience designing, building, and operating large-scale production systems
* Expertise in Ansible, Bash, and core Python development
* Strong familiarity with at least one of C, C++, Go, Python, or Java
* Experience with DevOps engineering or SRE
* Experience with containers and container platforms, such as Docker and Kubernetes
* Experience with standard industry tools for monitoring and observability like Prometheus and Grafana
* Experience automating infrastructure, configuration management, testing, and deployments with tools such as Ansible and Chef, and the ability to explain the Infrastructure as Code paradigm
* A strong understanding of diverse infrastructure platforms and concepts
* Has hands-on experience using source control and feature branching strategies
* Understands networking and messaging, especially between services
* Solid experience in infrastructure operations automation and IT Service Management, with hands-on exposure to data center administration, configuration, incident management, and support
* Strong communication skills