AI/ML Site Reliability Engineer (SRE)
Lockheed Martin
Job ID: 687434BR
Date posted: Feb. 27, 2025
Description: Space is a critical domain, connecting our technologies, our security and our humanity. While others view space as a destination, we see it as a realm of possibilities, where we can do more: we can innovate, invest, inspire and integrate our capabilities to transform the future.
At Lockheed Martin Space, we aim to harness the full potential of space to cultivate innovation, reduce costs, and push the boundaries of what technology can achieve. We’re creating future-ready solutions, focusing on resiliency and urgency through our 21st Century Security® vision. We’re erasing boundaries and forming partnerships across industries and around the world. We’re advancing spacecraft and the workforce to fuel the next generation. And we’re reimagining how space can connect us, ensuring security and prosperity.
Join us in shaping a new era in space and find a career that's built for you.
Job Description
We are seeking an experienced Site Reliability Engineer (SRE) to join our team, responsible for designing, building, and maintaining the infrastructure for a new Artificial Intelligence (AI) and Machine Learning (ML) environment. As an SRE, you will focus on provisioning, deploying, and managing the underlying infrastructure and tools that support the development, testing, and deployment of AI/ML models. You will work closely with data science, engineering, and product teams to ensure that the AI/ML systems are scalable, reliable, and performant. You will also help upskill existing staff to implement best practices for software maintenance and management, as well as server and workstation provisioning.
Responsibilities:
+ Design, build, and maintain the underlying infrastructure for AI/ML systems, including compute resources, storage, networking, and security
+ Provision and manage AI/ML tools and frameworks, such as TensorFlow, PyTorch, and scikit-learn, to support the development, testing, and deployment of AI/ML models
+ Deploy and manage AI/ML environments, including development, testing, and production environments, to support the development, testing, and deployment of AI/ML models
+ Ensure that AI/ML systems are scalable and performant, by designing and implementing efficient architectures, and optimizing resource utilization
+ Collaborate with data science, engineering, and product teams to ensure that AI/ML systems meet business requirements and are properly integrated with other systems
+ Troubleshoot and resolve issues with AI/ML infrastructure and tools, to ensure that systems are running smoothly and efficiently
+ Stay current with the latest developments in the DevSecOps/SRE community of practice, and apply this knowledge to continuously improve our infrastructure and tools
This position is contingent upon the program award expected in Spring of 2025.
Basic Qualifications:
+ Experience with HPC hardware such as GPU-based systems (e.g., NVIDIA Tesla, Quadro), high-performance CPUs (e.g., Intel Xeon, AMD EPYC), and high-speed storage systems
+ Experience with AI/ML-specific hardware
+ Experience with networking storage fundamentals (e.g., block storage, object storage, file systems)
+ Programming skills in languages such as Python, Java, or C++
+ Experience with AI/ML
+ Experience with containerization technologies such as Docker or Kubernetes
Must have an active TS/SCI security clearance to start.
Desired Skills:
+ Experience with TensorFlow, PyTorch, or scikit-learn
+ Experience with NVIDIA DGX and H series appliances
+ Experience with monitoring and logging tools such as Splunk
+ Experience with Agile project management methodologies (e.g. Scrum, Kanban)
+ Experience with CI/CD tools such as Jenkins, GitLab, or Rancher
Security Clearance Statement: This position requires a government security clearance; you must be a US Citizen for consideration.
Clearance Level: TS/SCI
Other Important Information You Should Know
Expression of Interest: By applying to this job, you are expressing interest in this position and could be considered for other career opportunities where similar skills and requirements have been identified as a match. Should such a match be identified, you may be contacted for this and future openings.
Ability to Work Remotely: Part-time Remote Telework: The employee selected for this position will work part of their work schedule remotely and part of their work schedule at a designated Lockheed Martin facility. The specific weekly schedule will be discussed during the hiring process.
Work Schedules: Lockheed Martin supports a variety of alternate work schedules that provide additional flexibility to our employees. Schedules range from a standard 40 hours over a five-day work week to condensed schedules. Condensed schedules provide employees with additional time away from the office and are in addition to our Paid Time Off benefits.
Schedule for this Position: 4x10 hour day, 3 days off per week
Lockheed Martin is an equal opportunity employer. Qualified candidates will be considered without regard to legally protected characteristics.
The application window will close in 90 days; applicants are encouraged to apply within 5 - 30 days of the requisition posting date in order to receive optimal consideration.
At Lockheed Martin, we use our passion for purposeful innovation to help keep people safe and solve the world's most complex challenges. Our people are some of the greatest minds in the industry and truly make Lockheed Martin a great place to work.
With our employees as our priority, we provide diverse career opportunities designed to propel, develop, and boost agility. Our flexible schedules, competitive pay, and comprehensive benefits enable our employees to live a healthy, fulfilling life at and outside of work. We place an emphasis on empowering our employees by fostering an inclusive environment built upon integrity and corporate responsibility.
If this sounds like a culture you connect with, you’re invited to apply for this role. Or, if you are unsure whether your experience aligns with the requirements of this position, we encourage you to search on Lockheed Martin Jobs, and apply for roles that align with your qualifications.
Experience Level: Experienced Professional
Business Unit: SPACE
Relocation Available: Possible
Career Area: Artificial Intelligence
Type: Full-Time
Shift: First