Our Purpose
Mastercard powers economies and empowers people in 200+ countries and territories worldwide. Together with our customers, we’re helping build a sustainable economy where everyone can prosper. We support a wide range of digital payments choices, making transactions secure, simple, smart and accessible. Our technology and innovation, partnerships and networks combine to deliver a unique set of products and services that help people, businesses and governments realize their greatest potential.
Title and Summary
Senior Cloud Engineer
Our Purpose:
We work to connect and power an inclusive, digital economy that benefits everyone, everywhere by making transactions safe, simple, smart and accessible. Using secure data and networks, partnerships and passion, our innovations and solutions help individuals, financial institutions, governments, and businesses realize their greatest potential. Our decency quotient, or DQ, drives our culture and everything we do inside and outside of our company. We cultivate a culture of inclusion (https://www.mastercard.us/en-us/vision/who-we-are/diversity-inclusion.html) for all employees that respects their individual strengths, views, and experiences. We believe that our differences enable us to be a better team – one that makes better decisions, drives innovation and delivers better business results.
Overview:
The Cyber and Intelligence Solutions (C&I) team is responsible for product management and the development of innovative products and services that address the evolving risk and cyber security needs of Mastercard’s customer segments. The C&I team was established to safeguard all aspects of safety and security in payments and has made great strides with new products, services and standards, positively impacting all aspects of our current and future payment ecosystem.
Ekata, a Mastercard company, is the global standard in identity verification, providing businesses worldwide the ability to link any digital transaction to the human behind it. Our Ekata Identity Engine, the first and only of its kind, uses complex machine learning to combine features derived from the billions of transactions within our proprietary network and the data from our graph to deliver industry-leading risk assessment solutions.
As a Site Reliability Engineer, you will manage our production environment, providing a highly available and scalable platform for Ekata to serve our customers. The infrastructure team acts as a resource for Engineering, helping diagnose production issues and providing guidance on improving the availability and performance of our applications. This position also develops systems, automation, and tools that make it easier for Engineering teams to deploy services in a fast, automated and reliable fashion.
In this role you will:
Key Responsibilities:
Develop and Manage SLOs:
•Lead the design and implementation of SLOs (Service Level Objectives), SLIs (Service Level Indicators), and SLAs (Service Level Agreements) across multiple teams.
•Define measurable SLOs aligned with business goals and customer expectations.
•Regularly review, track, and adjust SLOs to maintain optimal service levels and prioritize customer impact.
Monitoring and Observability:
•Establish and enforce best practices for monitoring, alerting, and observability for critical systems and services.
•Implement and manage monitoring tools and technologies (Dynatrace, Prometheus, Grafana, Datadog).
•Drive the integration of monitoring into the software development lifecycle, ensuring early detection of performance issues.
•Create dashboards, reports, and other visual aids to provide real-time insights into service health and operational performance.
Continuous Improvement:
•Lead the identification and implementation of operational efficiencies and process improvements.
•Foster a culture of resilience by continually evaluating and improving the availability, performance, and security of services.
•Collaborate with other engineering teams to implement automated remediation systems and self-healing mechanisms.
Collaboration and Leadership:
•Collaborate with development, infrastructure, and product teams to define service objectives and integrate reliability metrics into the software development lifecycle.
•Provide technical leadership and mentorship to other SREs and junior engineers.
•Promote a culture of reliability by advocating for reliability best practices across the engineering organization.
Required Qualifications:
Experience:
•Strong experience in Site Reliability Engineering, DevOps, or related fields.
•Proven experience in designing, implementing, and managing SLOs and SLIs in a production environment.
•Experience with cloud platforms (AWS, Azure) and containerization technologies (Kubernetes, Docker).
•Experience with modern monitoring and observability tools (Dynatrace, Prometheus, Grafana, Datadog).
•Strong understanding of distributed systems, high availability, and failure recovery.
•Familiarity with chaos engineering practices and tools (e.g., Gremlin, Chaos Monkey).
•Strong leadership and team collaboration skills.
•Deep understanding of service-level management, incident response, and root cause analysis.
•Excellent problem-solving and troubleshooting skills.
•Strong programming and scripting skills (e.g., Python, Go, Bash, Java, C#).
•Familiarity with CI/CD pipelines and automation frameworks.
Soft Skills:
•Strong communication skills to engage with both technical and non-technical stakeholders.
•Ability to explain complex technical issues clearly and effectively.
•Detail-oriented with an ability to manage multiple projects and priorities in a fast-paced environment.
Mastercard is a merit-based, inclusive, equal opportunity employer that considers applicants without regard to gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law. We hire the most qualified candidate for the role. In the US or Canada, if you require accommodations or assistance to complete the online application process or during the recruitment process, please contact reasonable_accommodation@mastercard.com and identify the type of accommodation or assistance you are requesting. Do not include any medical or health information in this email. The Reasonable Accommodations team will respond to your email promptly.
Corporate Security Responsibility
All activities involving access to Mastercard assets, information, and networks come with an inherent risk to the organization. Every person working for, or on behalf of, Mastercard is therefore responsible for information security and must:
Abide by Mastercard’s security policies and practices;
Ensure the confidentiality and integrity of the information being accessed;
Report any suspected information security violation or breach; and
Complete all periodic mandatory security trainings in accordance with Mastercard’s guidelines.
Pay Ranges
Vancouver, Canada: $104,000 - $167,000 CAD