At Carta, our employees set out on a mission to unlock the power of equity ownership for more people in more places. We believe that the problems we solve today unlock the opportunities of tomorrow. As a Senior Site Reliability Engineer, you’ll work to:
Build and scale our internal platform offerings (compute, storage, and networking services) to ensure the reliability and performance of our applications.
Design and implement monitoring, alerting, and incident response systems.
Collaborate with application software engineers (as needed) to guide their design and ensure it scales for what Carta needs in the long run.
Act as an agent of change and push boundaries to incrementally improve our systems as we expand globally.

The Team You’ll Work With
You’ll be joining the Infrastructure Engineering team at Carta. The Infrastructure Engineering team is responsible for providing secure, reliable, scalable, and performant infrastructure to Carta’s customers and developers.
We are Software and Infrastructure Engineers who specialize in cloud computing, networking, systems design and architecture, storage, real-time data telemetry, and the associated automation, tooling, and processes. We possess a breadth and depth of knowledge about Carta’s infrastructure and industry-wide best practices that translates into leverage for Carta’s business.
About You
You are excited by the idea of developing scalable, reliable, and efficient infrastructure that powers the entire company. We’re looking for strong communicators who enjoy collaborating to solve complex problems. Familiarity with infrastructure best practices on performance, reliability, and security and their associated tools is appreciated.
Our stack is Python, Java, Terraform, gRPC, Docker, Kubernetes, and Postgres, running on AWS. Come join us!
Cloud Platforms: Extensive experience with cloud services such as AWS, Google Cloud Platform, or Azure, including services like EC2, S3, RDS, and Lambda. Experience with Kubernetes or other container orchestration is preferred!
Infrastructure as Code (IaC): Proficient in using tools such as Terraform, Ansible, or CloudFormation for managing and provisioning cloud infrastructure.
Networking: Experience with networking concepts and tools, including Container Network Interface (CNI) and network policy implementations. Experience with proxies and service mesh is a big plus.
Monitoring and Observability: Strong knowledge of monitoring tools and practices, such as Prometheus, Grafana, ELK Stack, or Datadog, and the ability to set up and maintain comprehensive monitoring solutions.
Software Development: Proficiency in Python, with the ability to write efficient, maintainable, and scalable code.
Database Management: Strong knowledge of PostgreSQL, including performance tuning, replication, backup, and recovery.
API Services: Experience in designing, deploying, and maintaining API services, with a strong understanding of RESTful and/or GraphQL API design principles.
Knowledge of service mesh technologies such as Istio, Cilium, or Linkerd is appreciated though not essential.
Experience operating CI/CD and its associated best practices is also appreciated though not essential.