Madrid
Senior DevOps Engineer (AI + Azure)

About Us

At EY wavespace Madrid - AI & Data Hub, we are a diverse, multicultural team working with technologies such as generative AI, data analytics, and robotics. Our center is dedicated to exploring the future of AI and data.

 

Overview

We’re looking for a Senior DevOps Engineer to build and run cloud and AI infrastructure at scale. You’ll own IaC with Terraform, CI/CD, Kubernetes, and Linux. You’ll also help run LLM workloads both in Azure and locally (Ollama/vLLM/llama.cpp). Your work will enable fast, secure, repeatable delivery.

 

Key responsibilities

Build and maintain Azure infrastructure with Terraform (modules, workspaces, pipelines, policies).
Design and operate CI/CD with GitHub Actions and/or Azure DevOps (multi-stage, approvals, environments).
Run containers and Kubernetes/AKS (Helm, ingress, autoscaling, node pools, storage).
Manage AI/LLM runtimes: local model runners (Ollama, vLLM, llama.cpp) and GPU/CPU configurations.
Support RAG: embeddings pipelines, vector DBs (Azure AI Search/Cognitive Search, pgvector, Milvus), data sync, retention.
Automate platform tasks with Python (tooling, CLI utilities, API glue, ops scripts).
Implement observability (Azure Monitor, Prometheus/Grafana, logs/traces/metrics, alerts, runbooks, SLOs).
Apply Zero Trust security: enforce least privilege, role-based access control (RBAC), and identity-based segmentation (Azure AD, Conditional Access, MFA).
Implement policy-as-code (OPA, Azure Policy) for compliance.
Rotate secrets and certificates via Key Vault; integrate with pipelines.
Add continuous security scanning (SAST/DAST, container image scanning).
Handle reliability: rollout strategies, health probes, incident response, postmortems.
Optimize costs: right-sizing, autoscaling, budgets, tags, reporting.

 

Key requirements

4+ years in DevOps/SRE/Platform Engineering.
Strong Linux (shell, systemd, networking, performance troubleshooting).
Terraform at scale (modules, state backends, CI/CD integration).
Deep Azure experience (AKS, VNets, Key Vault, Storage, Monitor, Identity, Networking).
CI/CD expertise (GitHub Actions and/or Azure DevOps).
Containers and Kubernetes in production.
Python or scripting for automation (solid scripting and tooling; not full-time app dev).
Hands-on with LLM setups (local runners or Azure OpenAI), embeddings, vector indexes, and RAG basics.

Nice to have

Multi-cloud exposure (AWS/GCP).
Azure AI services (Azure OpenAI, Cognitive Search).
GitOps (Argo CD/Flux), Helm packaging, OCI registries.
Eventing/queues (Event Grid, Service Bus, Kafka).
Security/compliance in cloud (CIS, NIST, Microsoft CAF).
Certifications: AZ‑104, AZ‑204, AZ‑400, AI‑900, HashiCorp Terraform Associate, CKA/CKAD.
Experience with GPU nodes, drivers, CUDA/ROCm, or CPU-only optimizations for LLMs.

How we work

Everything as code. PRs, reviews, and tests. Small batches. Trunk-based or short-lived branches. Clear runbooks and on-call rotation where needed. Measure, alert, fix, and improve.

 

Our commitment to diversity & inclusion

We are genuinely passionate about inclusion and we support individuals of all groups; we do not discriminate on the basis of race, religion, gender, sexual orientation, or disability status. 

 

 
