Charlotte, North Carolina, USA
Sr. Software Engineer - (Site Reliability Engineer)

Your Impact

The primary purpose of this role is to run the production environment by monitoring availability and taking a holistic view of system health. This includes building software and systems to manage platform infrastructure and applications to improve the reliability and quality of our suite of software solutions. This role provides primary operational support and engineering for multiple large, distributed software applications.

What You’ll Do

- Run the production environment by monitoring availability and taking a holistic view of system health.
- Build software and systems to manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing capabilities forward.
- Provide primary operational support and engineering for multiple large, distributed software applications.
- Improve reliability, quality, and reduce MTTR.
- Participate in system design consulting, platform management, capacity planning, and cost analysis.
- Measure and optimize system performance, to push our capabilities forward and innovate to continually improve.
- Gather and analyze metrics from applications and services to assist in performance tuning and fault finding.
- Contribute to capacity planning, demand forecasting, software performance analysis, and systems tuning.
- Develop and implement monitoring, observability, and alerting tools such as dashboards and logging systems to understand the health and availability of our infrastructure and applications.
- Collect and analyze information from distributed systems into simple views of the technology portfolio to identify trends and spot stability threats.
- Monitor application availability, latency, and overall system health.
- Develop self-service solutions to help increase productivity by removing toil and reducing unnecessary roadblocks.
- Resolve technical issues in production, learn to mitigate them quickly, and find ways to prevent them.
- Document every action so lessons learned turn into repeatable actions and then into automation.
- Triage, analyze, and provide solutions to critical & high-priority technical issues occurring in the ecosystem, and optimize incident management processes.
- Respond, react & communicate as per the ITSM incident management process. This process involves detection of the incident, timely communication to leadership during the life of the incident, and service restoration, followed by root cause analysis to prevent the incident from occurring in the future.
- Drive blameless postmortem culture.
- Regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, capacity & resource utilization.

Qualifications

- 5 years of demonstrated proficiency in one or more scripting languages such as Bash, Python, Go, etc.
- 5 years of experience with Kubernetes or equivalent
- 5 years of software development experience in Java, C, C++
- 5 years of experience with containers and container orchestrators - Docker, Kubernetes
- 5 years of demonstrated experience debugging and fixing system/infrastructure and application issues.
- 5 years of experience working with monitoring tools such as Prometheus, Grafana, Splunk, Google Stackdriver, etc.
- 5 years of experience creating CUJs (Critical User Journeys) by identifying SLIs/SLOs and working with service/application teams to implement monitoring and alerting tools.
- 5 years of experience with databases (SQL or NoSQL)
- 5 years of experience with log analysis and building dashboards.
- Retail knowledge is a plus.

Preferred Qualifications:

- Master's Degree in Computer Science, CIS, or related field
- 5 years of IT experience developing and implementing business systems within an organization
- 5 years of experience working with defect or incident-tracking software
- 5 years of experience writing technical documentation in a software development environment
- 3 years of experience working with an IT Infrastructure Library (ITIL) framework
- 3 years of experience leading teams, with or without direct reports
- 5 years of experience working with source code control systems
- Experience working with Continuous Integration/Continuous Deployment tools
- 5 years of experience in systems analysis, including defining technical requirements and performing high-level design for complex solutions
- 4 years' experience with Reactive programming.

Where You’ll Be

- Associates are required to relocate to the Charlotte region to foster collaboration and facilitate improved testing and support.
- Lowe’s supports a Flex Office concept where in-person work is required two days per week at the Charlotte Tech Hub.
- Most business meetings are planned around the Eastern time zone.

About Lowe’s 

Lowe’s Companies, Inc. (NYSE: LOW) is a FORTUNE® 50 home improvement company serving approximately 16 million customer transactions a week in the United States. With total fiscal year 2023 sales of more than $86 billion, Lowe’s operates over 1,700 home improvement stores and employs approximately 300,000 associates. Based in Mooresville, N.C., Lowe’s supports the communities it serves through programs focused on creating safe, affordable housing and helping to develop the next generation of skilled trade experts. For more information, visit Lowes.com. 

Lowe’s is an equal opportunity employer and administers all personnel practices without regard to race, color, religious creed, sex, gender, age, ancestry, national origin, mental or physical disability or medical condition, sexual orientation, gender identity or expression, marital status, military or veteran status, genetic information, or any other category protected under federal, state, or local law.
