Job Description:
Join us as we pursue our ground-breaking vision to make machine data accessible, usable, and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we are committed to our work, customers, having fun, and most significantly to each other’s success.

The Splunk Observability Cloud provides full-fidelity monitoring and troubleshooting across infrastructure, applications, and user interfaces, in real time and at any scale, to help our customers keep their services reliable, innovate faster, and deliver great customer experiences. Infrastructure Software Engineers at Splunk are cloud-native systems engineers who use infrastructure-as-code, microservices, automation, and efficient design to build, operate, and scale our products.

Role:
Be part of a team whose mission is to build turnkey Infrastructure Monitoring solutions for Observability platform customers. The primary focus of the core SRE role is maintaining highly available, scalable, and resilient systems. You are passionate about automation, infrastructure-as-code, microservices, and getting rid of tedious, manual tasks.

Qualifications:
5+ years of total experience, including at least 2 years of strong hands-on experience deploying, managing, and monitoring large-scale Kubernetes clusters in the public cloud.
Cloud Platforms: AWS, GCP, Azure, or another cloud provider.
Containerization & Orchestration: Docker, Kubernetes. Candidates should have experience deploying, scaling, and managing containers in production environments.
Infrastructure as Code (IaC): Terraform, CloudFormation, Ansible, Chef, or similar tools to manage infrastructure.
CI/CD Pipelines: Jenkins, GitLab CI, CircleCI. Experience with continuous integration and deployment pipelines is a key focus area.
Monitoring & Logging: Experience setting up monitoring with Prometheus, Grafana, and OTEL, and logging with tools such as ELK Stack, Splunk, or Fluentd. Ensuring system visibility and alerting is crucial for SREs.
Automation & Scripting: Strong programming skills in Python, Go, Bash, or C++ to automate manual tasks and build self-healing systems.
Networking & Security: Familiarity with security and network management protocols and tools (e.g., firewalls, load balancing, VPNs).
Collaborate with Product Managers and other teams to deliver the best user experience.
Develop scripts and implement tools and automation frameworks to reduce manual intervention.
Document changes and work with other teams to get them published.
Demonstrate exceptional problem-solving skills, with an ability to see and solve issues before they affect business productivity.
Excellent analytical and problem-solving skills, with the ability to troubleshoot complex issues and drive resolution.
Strong communication and interpersonal skills, with the ability to collaborate effectively with stakeholders at all levels.
Extensive understanding of DevOps methodologies and practices.
Extensive knowledge of continuous integration and continuous deployment (CI/CD) pipelines.

We value diversity, equity, and inclusion at Splunk and are an equal employment opportunity employer.
Qualified applicants receive consideration for employment without regard to race, religion, color, national origin, ancestry, sex, gender, gender identity, gender expression, sexual orientation, marital status, age, physical or mental disability or medical condition, genetic information, veteran status, or any other consideration made unlawful by federal, state, or local laws. We consider qualified applicants with criminal histories, consistent with legal requirements.
Thank you for your interest in Splunk!