Chennai
44 days ago
Lead I - DevOps Engineering (SRE + Terraform)

Role Proficiency:

Acts under the guidance of a Lead II/Architect, understands customer requirements, and translates them into the design of new DevOps (CI/CD) components. Capable of managing at least one Agile team.

Outcomes:

Interprets the DevOps tool/feature/component design to develop/support it in accordance with specifications
Adapts existing DevOps solutions and creates own DevOps solutions for new contexts
Codes, debugs, tests, documents, and communicates DevOps development stages and the status of DevOps development/support issues
Selects appropriate technical options for development, such as reusing, improving, or reconfiguring existing components
Optimises the efficiency, cost, and quality of DevOps processes, tools, and technology development
Validates results with user representatives; integrates and commissions the overall solution
Helps Engineers troubleshoot issues that are novel/complex and are not covered by SOPs
Designs, installs, configures, and troubleshoots CI/CD pipelines and software
Automates infrastructure provisioning on cloud/on premises with the guidance of architects
Provides guidance to DevOps Engineers so that they can support existing components
Works with diverse teams using Agile methodologies
Facilitates saving measures through automation
Mentors A1 and A2 resources
Takes part in the team's code reviews

Measures of Outcomes:

Quality of deliverables
Error rate/completion rate at various stages of SDLC/PDLC
Number of components reused
Number of domain/technology/product certifications obtained
SLA for onboarding and supporting users and tickets

Outputs Expected:

Automated components:

Deliver components that automate parts of the installation/configuration of software/tools on premises and on cloud
Deliver components that automate parts of the build/deploy for applications


Configured components:

Configure a CI/CD pipeline that can be used by application development/support teams


Scripts:

Develop/Support scripts (like Powershell/Shell/Python scripts) that automate installation/configuration/build/deployment tasks


Onboard users:

Onboard and extend existing tools to new app dev/support teams


Mentoring:

Mentor and provide guidance to peers


Stakeholder Management:

Guide the team in preparing status updates and keeping management updated about the status


Training/SOPs:

Create Training plans/SOPs to help DevOps Engineers with DevOps activities and in onboarding users


Measure Process Efficiency/Effectiveness:

Measure and monitor the efficiency/effectiveness of current processes and make changes to make them more efficient and effective


Stakeholder Management:

Share the status report with senior stakeholders

Skill Examples:

Experience in the design, installation, configuration, and troubleshooting of CI/CD pipelines and software using Jenkins/Bamboo/Ansible/Puppet/Chef/PowerShell/Docker/Kubernetes
Experience in integrating with code quality/test analysis tools like SonarQube/Cobertura/Clover
Experience in integrating build/deploy pipelines with test automation tools like Selenium/JUnit/NUnit
Experience in scripting (Python/Linux/Shell/Perl/Groovy/PowerShell)
Experience in infrastructure automation (Ansible/Puppet/Chef/PowerShell)
Experience in repository management/migration automation: Git/Bitbucket/GitHub/ClearCase
Experience in build automation scripts: Maven/Ant
Experience in artefact repository management: Nexus/Artifactory
Experience in dashboard management and automation: ELK/Splunk
Experience in configuration of cloud infrastructure (AWS/Azure/Google)
Experience in migration of applications from on-premises to cloud infrastructures
Experience in working on Azure DevOps/ARM (Azure Resource Manager)/DSC (Desired State Configuration)
Strong debugging skills in C# and .NET
Experience in setting up and managing Jira projects and Git/Bitbucket repositories
Skilled in containerization tools like Docker/Kubernetes

Knowledge Examples:

Knowledge of installation/configuration/build/deploy processes and tools
Knowledge of IaaS cloud providers (AWS/Azure/Google etc.) and their tool sets
Knowledge of the application development lifecycle
Knowledge of Quality Assurance processes
Knowledge of Quality Automation processes and tools
Knowledge of multiple tool stacks, not just one
Knowledge of build branching/merging
Knowledge of containerization
Knowledge of security policies and tools
Knowledge of Agile methodologies

Additional Comments:

Site Reliability Engineer

As a Site Reliability Engineer, you will play a pivotal role in ensuring the reliability, scalability, and performance of our systems and services, including managing our DR environment. You will work closely with development, operations, and other cross-functional teams to cultivate a culture of SRE, breaking down silos and managing incidents and problems. Your role will primarily involve developing and implementing innovative solutions for proactive detection, analysis, and resolution of issues, while also managing incidents and problems as they arise.

Key responsibilities:

Collaborate with development teams to implement highly available, scalable, and reliable systems, including disaster recovery (DR) capabilities
Define and implement SLOs, error budgets, and monitoring strategies to ensure system reliability, availability, and performance, including in DR scenarios
Develop and maintain automation solutions for deployment, monitoring, incident response, and DR operations using scripting languages and configuration management tools
Respond to incidents, including DR-related incidents, implement corrective actions to prevent recurrence, and provide RCAs
Handle change management and deployment support
Implement and manage DR strategies and procedures to ensure timely and effective recovery of systems and services in the event of a disaster
Analyze system performance metrics, identify bottlenecks, and implement optimizations to improve system performance, scalability, and DR readiness
Plan for capacity requirements and ensure systems can scale to meet demand, including DR capacity planning
Utilize Infrastructure as Code (IaC) principles and tools to provision, manage, and automate infrastructure, including DR infrastructure
Implement and maintain monitoring solutions to proactively identify and address issues before they impact users, including DR monitoring
Document operational procedures, best practices, and system architectures, and share knowledge with team members, including DR documentation
Collaborate effectively with cross-functional teams, communicate technical concepts clearly, and contribute to a culture of continuous improvement, including DR preparedness drills and exercises
Participate in an on-call rotation to address critical incidents outside of regular business hours

Qualifications:

Bachelor’s degree in Computer Science, Engineering, or a related field
2-6 years of hands-on experience in DevOps roles

Technical Skills:

Strong knowledge of cloud computing platforms such as AWS
Proficiency in scripting languages such as Python, Shell, or PowerShell
Strong experience with CI/CD tools like Jenkins and Harness
Experience with configuration management tools such as Ansible or Chef
Familiarity with containerization technologies such as Docker and container orchestration tools like Kubernetes and Helm
Experience with monitoring and observability tools such as Dynatrace
Strong understanding of networking, security, and infrastructure principles, including DR principles and practices
Extensive knowledge of Windows and Linux systems
Proficiency in managing artifacts with JFrog Artifactory
Familiarity with code quality tools such as SonarQube and SAST tools like Checkmarx
Expertise in Infrastructure as Code using Terraform/Terragrunt

Soft Skills:

Strong problem-solving skills and attention to detail
Excellent communication and collaboration abilities
Ability to work in a fast-paced, agile environment
Strong commitment to reliability, scalability, and operational excellence, including DR readiness

Good to have:

Certifications in relevant technologies such as AWS, Kubernetes, or DevOps
Experience with chaos engineering practices and tools such as Chaos Monkey or Gremlin
Experience with monitoring tools such as Datadog, Prometheus, Grafana, or the ELK stack
