Heredia, CRI
2 days ago
Site Reliability Engineer
**Your role and responsibilities**

- Troubleshoot, monitor, and support critical production systems.
- Perform root cause analysis and manage incidents to ensure timely resolution.
- Provision and deploy environments in a cloud infrastructure (preferably IBM Cloud).
- Handle initial intake for Salesforce-related customer cases, ensuring SLA commitments are met.
- Provide on-call support, sharing rotation duties with global resources (including Poland), to minimize mean time to recovery (MTTR).
- Manage workloads and resources to maintain commitments and prevent SLA breaches.

**Required technical and professional expertise**

- Strong working knowledge of Kubernetes and cloud infrastructures, with a preference for IBM Cloud (1-3 years).
- Expertise in administration, configuration, and management of MS SQL Server 2022 (1-3 years).
- Expertise in automation platforms such as AWX.
- Proficiency in scripting languages like Python and related tools.
- Strong problem-solving skills and attention to detail.

**Preferred technical and professional experience**

- Proven experience providing on-call support for critical production systems, with a focus on root cause analysis (RCA).
- Familiarity with Salesforce infrastructure and case management processes.
- Experience with monitoring tools and incident management platforms.
- Ability to work efficiently in a global, distributed team environment.