Job Title
QE Architect – AI / LLM Systems
Role Summary
We are looking for a visionary QE Architect – AI/LLM Systems to define and drive the overall Quality Engineering (QE) strategy for next-generation AI products. This role will focus on building scalable quality architectures, AI evaluation frameworks, and automated testing pipelines that ensure reliable, safe, and high-quality AI-driven user experiences.
The ideal candidate will bring strong thought leadership and deep technical expertise, working at the intersection of AI/ML, LLM systems, software engineering, and quality governance.
Key Responsibilities
1. QE Architecture & Strategy
Define and own the end-to-end quality architecture for all AI and LLM initiatives across the organization.
Design enterprise-level QE frameworks and reusable components for:
Conversational AI applications and chatbots
Knowledge-management bots and RAG systems
Semantic and vector-based text search
Image search and multimodal AI systems
Generative AI platforms
Establish scalable testing pipelines for model evaluation, data validation, and automation.
2. AI / LLM Evaluation Frameworks
Architect comprehensive evaluation systems for:
Prompt testing and scenario-based validation
LLM output quality, safety, bias, and consistency
Hallucination detection and mitigation
RAG correctness, grounding accuracy, and knowledge integrity
Search relevance and ranking metrics
Build automated scorecards and continuous evaluation dashboards.
3. Automation & Infrastructure
Design and implement automation frameworks for:
LLM APIs and chat agents
Multimodal AI pipelines
Vector databases and semantic search services
Architect model regression detection using:
Golden datasets
Synthetic test data generation
LLM-as-a-Judge approaches
Self-evaluation and multi-agent evaluation techniques
Integrate AI test harnesses into CI/CD and LLMOps pipelines.
4. Data Quality & Test Data Strategy
Define enterprise-wide AI test data management strategies, including:
Ground-truth datasets
Benchmark datasets
Adversarial and edge-case inputs
Safety and compliance-focused test scenarios
5. Architecture Reviews & Cross-Team Leadership
Provide architectural guidance to ML engineers, data engineers, and software teams on testability and observability.
Review AI system architectures, including model pipelines, chatflows, orchestration layers, and search systems.
Drive quality gates across experimentation, pre-production, and production rollout cycles.
6. Quality Governance & Best Practices
Establish enterprise standards for:
AI testing taxonomies and methodologies
Privacy, safety, and compliance validation
Defect classification for LLM-specific issues
Reliability, latency, and scalability benchmarks
Lead adoption of AI/ML Quality Engineering best practices across teams.
Required Qualifications
10+ years of experience in Quality Engineering, including at least 3 years in AI/ML/LLM systems.
Strong understanding of:
Large Language Models (LLMs), NLP, embeddings, and vector databases
Chatbot platforms such as Dialogflow, Rasa, Botpress, Amazon Lex, etc.
RAG pipelines and knowledge-management systems
Image search and multimodal AI architectures
Strong programming experience in Python, Java, or TypeScript, including hands-on use of ML/NLP libraries.
Proven experience building CI/CD-integrated AI test automation frameworks.
Hands-on knowledge of AI evaluation metrics such as:
Perplexity, factuality, grounding score
CER/WER, BLEU, ROUGE
MRR, NDCG, search relevance metrics
Model drift and performance stability metrics
Experience handling non-deterministic testing, probabilistic evaluation, and AI quality challenges.
Proven ability to architect and scale enterprise-grade QE systems.
Core Skills
AI / ML / LLM Systems • QE Architecture • Python / Java / TypeScript • CI/CD & LLMOps • Automation Frameworks • RAG & Vector Search • AI Quality Metrics