DevSecOps & Site Reliability Engineer
McCain
Position Title: DevSecOps & Site Reliability Engineer
Position Type: Regular - Full-Time
Position Location: Florenceville GTC
Requisition ID: 32708
JOB PURPOSE:
The DevSecOps & Site Reliability Engineer will work alongside our cloud and managed infrastructure stakeholders to ensure McCain systems are operating optimally.
JOB RESPONSIBILITIES:
Designing, implementing, and administering IT infrastructure to support current and future business requirements, including physical and cloud compute/storage environments, network and communication infrastructure, and endpoint device configuration
Experience in problem solving, analyzing complex enterprise systems, navigating enterprise software, and deploying and managing workloads on cloud and on-premises systems
Drive and influence integrated DevOps solutions across business, product, platform, infrastructure, development, and support/DevOps teams that improve the design and operation of systems, making them scalable, reliable, and efficient while ensuring performance and high availability of products/services
Overseeing and maintaining backup tools, topology, and disaster recovery processes
Implementing and supporting maintenance and upgrades of system infrastructure such as host hardware (IBM iSeries/pSeries, HP, Lenovo), SAN technologies, SQL, VMware, Hyper-V, and cloud technologies
Spearhead the development of SRE solutions (monitoring and alerting, machine learning anomaly detection, self-healing and reliability testing) for both on-prem and cloud systems
Performing regular system monitoring, verifying the integrity and availability of all hardware, server resources, systems, and processes for service level integrity and performance
Improve service reliability through blameless post-incident reviews and using code to prevent or respond to problem recurrence
Define and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets to maintain high service availability
Identifying and resolving production capacity, contention, resource, and application deficiencies on both on-prem and cloud systems
Collaborate with engineering teams to improve availability, reliability, and observability of their services
Optimize existing on-premises systems and eliminate toil through automation, optimizing deployment processes, and enhancing the scalability of our infrastructure
Implementing observability and AIOps across complex cloud workloads and technology stacks
Manage daily operations and functionality of site reliability solutions and applications
Conduct post-incident analysis to identify root causes, implement corrective actions, and prevent similar issues in the future
Scripting/Automation: Python, YAML, Bash, Terraform, PowerShell
Proficiency in test framework automation, test design, test data management
Oversee enterprise patch management and CMDB updates
Perform application production support role and troubleshoot incidents
Implement security controls at every stage of the deployment pipeline to detect and mitigate vulnerabilities
Develop and maintain automated processes for security testing, deployment, and infrastructure provisioning
Implement Infrastructure as Code (IaC) practices to ensure consistent and secure infrastructure configurations
Establish and maintain continuous monitoring processes to detect and respond to security incidents promptly
Collaborate with incident response teams to investigate and address security breaches or incidents
Security Audits and Compliance:
Deploying and configuring Azure Firewalls, Azure VPN Gateways and NVAs
Manage and monitor security health of platforms to ensure that issues and risks are quickly identified and resolved
Collaborate with the IT operations and development teams to plan and execute system changes, e.g., security and audit controls as required by the business or compliance requirements
Automate manual build and release activities using DevSecOps best practices
KEY QUALIFICATION & EXPERIENCES:
7+ years’ experience in IT administration/engineering roles
3-5 years’ experience in cloud engineering, SRE roles
Bachelor’s degree in computer science, information systems or other related field (or equivalent work experience)
Strong understanding and working experience of CI/CD and GitOps
Broad exposure to IT infrastructure and application landscape with technical depth in cloud platforms
Extensive experience with documenting and optimizing operational processes
Extensive experience engineering both cloud and on-prem environments
Proficiency with container orchestration tools such as Kubernetes and Helm
Strong understanding of networking concepts and protocols
OTHER INFORMATION:
Key internal relationships: Director, IT Operations & Platform Support, Data & Analytics Teams, IT Application Support, IT Architects, ITSM Manager, Network Services, IT Security, IT Operations (internal and external).
Key external relationships: External vendors, partners and service providers.
Travel: as required.