Bengaluru, Karnataka, India
11 days ago
Software Engineer III: AWS, Terraform, Python, Observability Tools

We have an exciting and rewarding opportunity for you to expand your skills and make a meaningful impact. Partner with an organization committed to defining employee platform software engineering within HR Workforce Technology.

As an Observability & Automation Engineer at JPMorgan Chase within the Employee Platform, you will play a crucial role in ensuring the reliability, performance, and scalability of our cloud-based systems by implementing effective observability and automation practices and tools.

Job responsibilities

Design and implement comprehensive observability solutions to monitor and analyze the performance and health of cloud and on-premises infrastructure, applications, and services.
Develop custom monitoring tools, dashboards, and alerts to proactively identify and resolve issues before they impact users.
Collaborate with cross-functional teams to define key metrics, alerts, dashboards, service level objectives (SLOs), and service level indicators (SLIs) to measure the reliability and availability of our systems.
Implement automated testing, deployment, and provisioning processes to accelerate software delivery and ensure consistency across environments.
Work closely with DevOps and SRE teams to integrate observability tools and practices into the CI/CD pipeline and infrastructure-as-code (IaC) workflows.
Continuously evaluate and adopt new technologies, tools, and best practices to improve observability and automation capabilities.
Troubleshoot complex technical issues related to performance, scalability, and reliability, and provide timely resolutions.
Document observability and automation solutions, procedures, and configurations to facilitate knowledge sharing and enable effective collaboration.

Required qualifications, capabilities, and skills

Formal training or certification on software engineering concepts and 3+ years of applied experience.
7+ years of experience as a DevOps Engineer, Site Reliability Engineer (SRE), or similar role with a focus on observability and automation.
Hands-on experience with observability tools and technologies, including monitoring systems (e.g., AWS CloudWatch, AWS X-Ray, Prometheus, Grafana, Dynatrace), logging frameworks (e.g., ELK stack, Splunk), and distributed tracing (e.g., Jaeger, Zipkin).
Proficiency in programming/scripting languages such as Python, Bash, or Go for automation and tool development.
Experience with infrastructure automation tools such as Terraform, Ansible, or Chef for provisioning and configuration management.
Solid understanding of containerization technologies (e.g., Docker, Kubernetes) and microservices architectures.
Strong proficiency in AWS and on-premises infrastructure.
Excellent analytical, problem-solving, and communication skills.
Ability to work effectively in a fast-paced, dynamic environment and manage multiple priorities simultaneously.

Preferred qualifications, capabilities, and skills

Certifications in relevant areas such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or Certified Site Reliability Engineer (SRE).
Experience with observability platforms and solutions like Datadog, Splunk, Dynatrace, or Apica.
Familiarity with continuous integration/continuous deployment (CI/CD) pipelines and associated tools (e.g., Jenkins, Spinnaker, GitHub).
Knowledge of modern software development practices, including Agile methodologies and DevOps principles.
