Chicago, IL, USA
4 days ago
IT - Technical Test Lead | Performance Testing | Performance Testing - ALL
Job Seekers, Please send resumes to resumes@hireitpeople.com

Detailed Job Description:

Troubleshoot performance and availability issues in mission-critical full-stack applications, microservices, infrastructure, and legacy business applications/websites.
Work with DevOps Architects to implement fault tolerance, backup, and disaster recovery solutions.
Lead root cause analyses and investigations by identifying, analyzing, and remediating service performance and availability issues to ensure maximum service uptime and availability.
Pre-emptively pursue the discovery of system faults throughout the application lifecycle, before and after release.
Manage the incident lifecycle through to resolution and conduct Blameless Post-Incident Reviews.
Work with the QA Lead to establish best practices for measuring and monitoring availability, latency, and overall system health.
You're expected to be on call, have strong written communication skills, and be able to develop working relationships with coworkers.
Experience balancing service reliability, metrics, sustainability, technical debt, and operational toil for live services running at scale.
Implement Chaos Engineering concepts such as the Simian Army.
Work across multiple project teams simultaneously to support rapid development efforts.
Solve complex, business-critical issues that impact bottom-line financial numbers and customer loyalty/experience.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Contribute positively to open source projects and join existing communities.
Bring experience, pragmatism, empathy, and composure to interactions with teams outside of the RE organization.
Work frequently with Product teams on shared goals and cross-team projects.
Balance planned and reactive work using basic project planning techniques and technical roadmaps.
Work and collaborate across teams such as Application Services, Capacity Planning, Hardware, Network, and Datacenter Operations.
Participate in building advanced tooling for testing, monitoring, administration, and operations of multiple clusters across multiple environments.
Experience negotiating SLIs, SLOs, and SLAs with product owners.

General/Minimum Qualifications:

3-5+ years of applying reliability and chaos engineering principles with distributed cloud services.
Strong knowledge of and comfort with GNU/Linux and Windows operating systems.
Proficiency in high-level languages such as Ruby, Python, PowerShell, and Bash.
Exposure to system-level languages such as Go, C/C++/C#.
Familiarity with configuration management software such as Puppet, Chef, Ansible, or Salt.
Source control, branching and merging, packaging (git, GitHub, NuGet, npm).
Networking basics: TCP vs UDP, basic troubleshooting, HTTP load balancing, firewalls, private networks, multi-tier design, scale-out.
Databases: RDBMS, NoSQL, SQL, analytics, persistent data.
Familiarity with standard infrastructure concepts like load balancers, firewalls, and object storage, and where/when they might be used.
Service Management: Incident Response, Change, and Problem Management.
Experience with Kubernetes, Docker, Helm, and Virtual Machines.
Cloud computing concepts (one or more public cloud providers): VMs vs Docker containers, block storage vs object storage, infra automation vs install automation.
Experience operating a platform, software as a service, or shipping software.
Experience as an open-source contributor.