Remote, US
15 hours ago
Director of Engineering - Infinia AI Performance
Job Locations: US-Remote
Job ID: 2025-5114
Name Linked: Remote: US
Country: United States
City: Remote
Worker Type: Regular Full-Time Employee

Overview

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is a global market leader renowned for powering many of the world's most demanding AI data centers, in industries ranging from life sciences and healthcare to financial services, autonomous cars, Government, academia, research and manufacturing.

  

"DDN's A3I solutions are transforming the landscape of AI infrastructure." – IDC 

  

“The real differentiator is DDN. I never hesitate to recommend DDN. DDN is the de facto name for AI Storage in high performance environments” - Marc Hamilton, VP, Solutions Architecture & Engineering | NVIDIA 

  

DDN is the global leader in AI and multi-cloud data management at scale. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. 

  

Our success is driven by our unwavering commitment to innovation, customer-centricity, and a team of passionate professionals who bring their expertise and dedication to every project. This is a chance to make a significant impact at a company that is shaping the future of AI and data management. 

  

Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage. 

Job Description

We are seeking an experienced and accomplished Director of Engineering to lead our AI Engineering organization. In this role, you will oversee the design, deployment, and optimization of large-scale AI/ML training and inference pipelines using Infinia as the foundational data platform. You will lead the development of connectors to open-source data-streaming frameworks such as Mosaic Streaming, Ray Data, and tf.data, as well as inference optimizations such as KV caching and LoRAX. You will guide a talented organization of engineers focused on an advanced end-to-end data platform for ingestion, transformation, preparation, and streaming for high-performance AI applications. Collaborating closely with software developers, product teams, and partners, you will lead experiments with state-of-the-art models using open-source tools and cloud platforms.

 

Key Responsibilities:

 

Leadership & Management:

Lead, mentor, and grow a team of senior ML and data engineers, fostering a culture of innovation and excellence.
Set strategic direction for the ML engineering team in alignment with company goals.
Lead strategic partnerships across all areas of AI, from conception through execution to delivery, communicating complex technical concepts effectively to non-technical stakeholders.
Track, report, and manage the team’s performance against project milestones, ensuring on-time delivery of high-quality solutions.
Partner with architects, engineers, and cross-functional teams to ensure the delivery of innovative, high-quality technical designs.
Implement and refine engineering best practices, driving continuous improvements in quality, performance, and operational efficiency.

Technical Oversight:

Lead the integration of data ingestion and streaming pipelines with open-source tools such as Ray Data, Mosaic Streaming, tf.data, and the PyTorch DataLoader.
Oversee the design of optimizations for training, such as asynchronous checkpointing, and for inference, such as KV caching and LoRAX.
Guide the integration of MLflow with DDN’s Infinia product for comprehensive experiment tracking, model versioning, and deployment.
Drive the implementation and scaling of Retrieval-Augmented Generation (RAG) pipelines to enhance generative model performance.
Stay abreast of the latest developments in MLOps, AI/ML frameworks, and tooling.
Identify and implement solutions to optimize pipeline performance, runtime, and resource utilization on Infinia.

Required Qualifications:

Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field.
15+ years of experience in machine learning engineering, with at least 10 years in a leadership role.
Proven track record of building and scaling AI/ML pipelines and managing high-performing engineering teams.
Extensive experience with Apache Spark, Apache Airflow, and MLflow or equivalent tools.
Deep understanding of machine learning frameworks and libraries (TensorFlow, PyTorch, NVIDIA NeMo).
Experience deploying open-source vector databases at scale.
Proficiency with containerization tools (Docker, Kubernetes) and infrastructure as code (Terraform, Ansible).
Solid understanding of cloud infrastructure (AWS, GCP, Azure) and distributed computing.
Excellent problem-solving and troubleshooting abilities with a keen eye for performance optimization.
Strong leadership, communication, and interpersonal skills.
Ability to drive strategic initiatives and manage multiple projects simultaneously.
This position requires participation in an on-call rotation to provide after-hours support as needed.

Preferred Skills:

Knowledge of NLP techniques and tools for model deployment.
Implementation-level understanding of ML frameworks, data loaders, and data formats.
Experience with scaling RAG pipelines and integrating them with generative AI models.
Experience in operationalizing AI/ML models in production environments.

 

 

This role offers an exceptional opportunity to lead a high-impact engineering organization at the core of DDN’s cutting-edge data solutions. If you are passionate about solving complex technical challenges and driving innovation in high-performance systems, we encourage you to apply.

DDN

Join our dynamic and driven team, where engineering excellence is at the heart of everything we do. We seek individuals who love to challenge themselves and are fueled by curiosity. Here, you'll have the opportunity to work across various areas of the company, thanks to our flat organizational structure that encourages hands-on involvement and direct contributions to our mission. Leadership is earned by those who take initiative and consistently deliver outstanding results, both in their work ethic and deliverables, making strong prioritization skills essential. Additionally, we value strong communication skills in all our engineers and researchers, as they are crucial for the success of our teams and the company as a whole.

 

Interview Process: After submitting your application, one of our recruiters will review your resume. If your application passes this stage, you will be invited to a 30-minute interview during which a member of our team will ask some basic questions. If you clear the interview, you will enter the main process, which can consist of up to four interviews in total:

 

Coding assessment: Often in a language of your choice.
Systems design: Translate high-level requirements into a scalable, fault-tolerant service (depending on role).
Real-time problem-solving: Demonstrate practical skills in a live problem-solving session.
Meet and greet with the wider team.

Our goal is to finish the main process in 2-3 weeks at most.

 

DataDirect Networks (DDN) is an Equal Opportunity/Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity, gender expression, transgender, sex stereotyping, sexual orientation, national origin, disability, protected Veteran Status, or any other characteristic protected by applicable federal, state, or local law.

 

#LI-Remote
