Dublin
Staff Engineer, Data Lake
Who we are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies—from the world’s largest enterprises to the most ambitious startups—use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone’s reach while doing the most important work of your career.

About the team

Stripe’s Data Lake team owns the storage layer, metadata services, and supporting tooling that power analytics, machine learning, and data-driven products across the company. We build and operate the petabyte-scale data lake that sits atop cloud object storage, enabling thousands of engineers, scientists, and risk analysts to discover, secure, and query data efficiently. Our platform is built on open-source technologies such as Apache Iceberg, Parquet, Trino, Spark, and Flink, and it is a critical foundation for Stripe’s decision-making, reporting, and risk systems.

What you’ll do

We’re looking for a Staff Engineer with deep experience designing, building, and scaling distributed data systems. You will help lead the next evolution of Stripe’s data lake—moving toward a secure, compliant, and cost-efficient storage layer that seamlessly supports both streaming and batch workloads. You’ll work closely with partner teams and open-source communities to create state-of-the-art infrastructure that makes data at Stripe fast, reliable, and secure.

Responsibilities

- Scope, design, and lead high-impact technical projects within the Data Lake domain
- Build and maintain the core services and tooling that power Stripe’s data lake: metadata/catalog services, table formats, ingestion pipelines, and governance controls
- Optimize end-to-end performance and cost across storage and compute engines
- Drive the adoption of emerging open-source capabilities and contribute fixes and features back to projects such as Apache Iceberg and Parquet
- Champion operational excellence, ensuring the data lake is highly available, reliable, secure, and compliant
- Partner with product and data teams to understand user needs and shape the roadmap for a “data lake as a product” experience

Who you are

We’re looking for someone who meets the minimum requirements to be considered for the role. If you meet these requirements, you are encouraged to apply. The preferred qualifications are a bonus, not a requirement.

Minimum requirements

- 10+ years of professional experience writing high-quality, production-level code or software programs
- Experience with distributed data systems such as Spark, Flink, Trino, Kafka, etc.
- Experience developing, maintaining, and debugging distributed systems built with open-source tools
- Experience building infrastructure as a product centered around user needs

Preferred qualifications

- Hands-on experience with modern open-source data lake technologies such as Apache Iceberg
- Experience operating large-scale data lakes on AWS S3, GCS, or equivalent cloud object storage
- Contributions to relevant open-source projects (Iceberg, Trino, Parquet, Spark, Flink, etc.)
- Familiarity with data governance, security, and compliance requirements in regulated environments
