Senior Software Engineer, ML Inference
Apple
Cupertino, California, United States
Software and Services
We are seeking a Senior Software Engineer, ML Inference, to join Apple Maps. You will optimize and scale machine learning models, with a focus on large language models, for high-performance, production-scale inference. You will collaborate with data scientists, researchers, and infrastructure teams to ensure efficient, GPU-optimized deployment that handles tens of billions of daily requests. We value a culture of speed and agility ("iterate quickly, improve continuously"), where rapid iteration, learning from failures, and constant refinement are at the core of how we operate. In this environment, you will develop and deploy solutions swiftly, learn from each iteration, and refine your approach to deliver better results. As a self-driven, results-oriented individual with a strong work ethic, you will play a key role in guiding the technical direction of the ML Platform, solving complex problems, and leading by example. You’ll bring leadership to the team through both mentorship and hands-on contributions, helping drive innovations in model optimization and performance tuning.
**Description**
* Optimize LLMs for Inference: Implement and enhance large language models for real-time and batch inference, balancing performance and resource efficiency.
* Advanced Inference Optimization: Apply techniques such as quantization and speculative decoding to reduce model size and accelerate inference without sacrificing accuracy. Leverage quantization-aware training (QAT) and post-training quantization (PTQ) to deploy models on resource-constrained hardware.
* Cross-Functional Collaboration: Partner with data scientists, ML researchers, and infrastructure engineering teams to understand model requirements, provide feedback, and ensure smooth deployment of models into production.
* Monitoring & Resource Management: Implement monitoring tools to profile and track the performance of models running on GPUs, including real-time monitoring of GPU utilization, memory usage, and inference throughput. Manage and optimize resource allocation to ensure high availability and minimal downtime.
* Continuous Improvement & R&D: Stay on top of the latest research in LLM inference techniques, GPU optimizations, and distributed systems to bring innovative improvements to the overall system.
**Minimum Qualifications**
+ Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
+ 5+ years in software engineering focused on ML inference, GPU acceleration, and large-scale systems.
+ Expertise in deploying and optimizing LLMs for high-performance, production-scale inference.
+ Proficiency in Python, Java, or C++.
+ Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers.
+ Experience with model serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM).
+ Experience with optimization techniques like Attention Fusion, Quantization, and Speculative Decoding.
+ Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks.
+ Familiarity with cloud technologies like Docker, Kubernetes, AWS EKS for scalable deployment.
**Preferred Qualifications**
+ Master’s or PhD in Computer Science, Machine Learning, or a related field.
+ Understanding of ML Ops practices, continuous integration, and deployment pipelines for machine learning models.
+ Familiarity with model distillation, low-rank approximations, and other model compression techniques for reducing memory footprint and improving inference speed.
+ Strong understanding of distributed systems, multi-GPU/multi-node parallelism, and system-level optimization for large-scale inference.
**Pay & Benefits**
+ At Apple, base pay is one part of our total compensation package and is determined within a range. This provides the opportunity to progress as you grow and develop within a role. The base pay range for this role is between $175,800 and $312,200, and your base pay will depend on your skills, qualifications, experience, and location. Apple employees also have the opportunity to become an Apple shareholder through participation in Apple’s discretionary employee stock programs. Apple employees are eligible for discretionary restricted stock unit awards, and can purchase Apple stock at a discount if voluntarily participating in Apple’s Employee Stock Purchase Plan. You’ll also receive benefits including: comprehensive medical and dental coverage, retirement benefits, a range of discounted products and free services, and, for formal education related to advancing your career at Apple, reimbursement for certain educational expenses, including tuition. Additionally, this role might be eligible for discretionary bonuses or commission payments as well as relocation. Learn more (https://www.apple.com/careers/us/benefits.html) about Apple Benefits. Note: Apple benefit, compensation and employee stock programs are subject to eligibility requirements and other terms of the applicable plan or program.
+ Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant. (https://www.eeoc.gov/sites/default/files/2023-06/22-088\_EEOC\_KnowYourRights6.12ScreenRdr.pdf)
**Apple Footer**
Apple will not discriminate or retaliate against applicants who inquire about, disclose, or discuss their compensation or that of other applicants.
Apple will consider for employment all qualified applicants with criminal histories in a manner consistent with applicable law. If you’re applying for a position in San Francisco, review the San Francisco Fair Chance Ordinance guidelines applicable in your area.
Apple participates in the E-Verify program in certain locations as required by law.
Apple is committed to working with and providing reasonable accommodation to applicants with physical and mental disabilities.
Apple is a drug-free workplace.