New York
2 days ago
Senior Software Engineer - Data Science Platform - Generative AI Inference

Bloomberg runs on data. It's our business and our product. From the biggest banks to elite hedge funds, financial institutions need timely, accurate data to capture opportunities and evaluate risk in fast-moving markets. With petabytes of data available, a solution to transform and analyze the data is critical to our success.

Bloomberg’s Data Science Platform was established to support development efforts around data-driven science, machine learning, and business analytics. The solution aims to provide scalable compute, specialized hardware and first-class support for a variety of workloads such as ML training jobs and inference services, Spark, and Jupyter. It was developed to provide a standard set of tooling for addressing the Model Development Life Cycle (MDLC), from experimentation and training to inference. It is built on containerization, container orchestration and cloud architecture, on top of 100% open source foundations.

Production inference is a critical step in the MDLC for realizing the business value of Bloomberg AI applications, and the advent of large language models (LLMs) presents new opportunities for expanding NLP capabilities in our products. The inference solution is powered by the open source project KServe, a production-ready inference solution for both generative and predictive AI applications. We are poised for enormous user growth this year and have an ambitious roadmap of new features and user experience improvements. That’s where you come in. As a member of the inference team, you’ll have the opportunity to design and implement scalable, low-latency, high-throughput model inference solutions in a hybrid cloud environment. We are founding members of the KServe project, which standardizes ML inference within the Kubernetes ecosystem. As part of that, we regularly upstream the features we develop, present at conferences and collaborate with our peers in the industry. Open source is at the heart of our team. It's not just something we do in our free time; it is how we work.

We’ll trust you to:

Work with data scientists to understand their production use cases and requirements, and use that understanding to inform the next set of GenAI features for the inference platform.

Design solutions for problems such as scalable model deployment, low latency/high throughput inference, GPU resource optimizations and autoscaling.

Automate operations and improve telemetry for the inference platform in our infrastructure stack.

Design solutions for our multi-cloud strategy.

You’ll need to be able to:

Innovate and design solutions that meet strict production SLAs: low latency/high throughput, multi-tenancy, high availability, reliability across clusters/data centers, etc.

Fix and optimize generative inference application performance.

Provide developer and operational documentation.

Provide performance analysis and capacity planning for clusters.

Communicate and collaborate effectively with multi-functional teams.

Have a passion for providing reliable and scalable infrastructure.

You’ll need to have:

4+ years of programming experience in two or more languages (e.g., Python, Go, C++)

A degree in Computer Science, Engineering or a similar field of study, or equivalent work experience

Experience designing and implementing low-latency, highly scalable inference platforms.

Experience designing, developing, testing and deploying inference solutions for LLMs

Experience exploring emerging inference optimization techniques

Experience debugging performance issues with distributed tracing.

Experience working with distributed multi-tenant, multi-cluster systems.

Experience with distributed systems, e.g., Kubernetes, Kafka, RabbitMQ, ZooKeeper/etcd.

Strong knowledge of data structures and algorithms.

Linux systems experience (Network, OS, Filesystems).

We’d love to see:

Experience with Large Language Model inference, especially the vLLM and TensorRT-LLM runtimes.

Experience with Kubeflow/KServe, MLflow, SageMaker.

Experience working with GPU compute software and hardware.

Ability to identify and perform OS and hardware-level optimizations.

Open source involvement such as a well-curated blog, accepted contributions, or community presence.

Experience with cloud LLM providers such as AWS Bedrock, Gemini or Azure OpenAI.

Experience with configuration management systems (Terraform, Ansible)

Experience with continuous integration tools and technologies (Jenkins, Git, ChatOps)

Learn more about our work using the links below:

Keynote: Platform Building Blocks: How to build ML infrastructure with CNCF projects - https://www.youtube.com/watch?v=ncED2EMcxZ8

The State and Future of Cloud Native Model Inference - https://www.youtube.com/watch?v=786VaGAfm6I

The Hitchhiker's Guide to Kubernetes Platforms: Don’t Panic, Just Launch! - https://www.youtube.com/watch?v=a84mwXicpdc


Salary Range = 160,000 - 240,000 USD Annually + Benefits + Bonus
The referenced salary range is based on the Company's good faith belief at the time of posting. Actual compensation may vary based on factors such as geographic location, work experience, market conditions, education/training and skill level.


We offer one of the most comprehensive and generous benefits plans available, along with a range of total rewards that may include merit increases, incentive compensation (exempt roles only), paid holidays, paid time off, medical, dental, vision, short- and long-term disability benefits, 401(k) + match, life insurance, and various wellness programs, among others. The Company does not provide benefits directly to contingent workers/contractors and interns.