AI Test Engineer (Generative AI & LLM Focus)

Company: Mercedes-Benz

Location: Bengaluru

Job Description:

Field of activity: Research & Development incl. Design

Department: CCC & Product Training

Legal entity: Mercedes-Benz Research and Development India Private Limited

Site: Mercedes-Benz Research and Development India Private Limited, Bangalore

Start date: Immediately

Publication date: ..6

Job number: MER3Y

Working hours: Full-time

Tasks

About MBRDI

Mercedes-Benz Research and Development India (MBRDI), headquartered in Bengaluru with a satellite office in Pune, is the largest R&D center for Mercedes-Benz Group AG outside of Germany. Our mission is to drive innovation and excellence in automotive engineering, digitalization, and sustainable mobility solutions, shaping the future of mobility.

Job Summary:

We are seeking an AI Test Engineer specializing in Generative AI and Large Language Models (LLMs) to lead the quality assurance of our next-generation intelligent applications. This role goes beyond traditional software testing—you will validate non-deterministic AI behavior, detect hallucinations, assess ethical compliance, and ensure the reliability and safety of our LLM-powered platforms.

You will develop advanced evaluation frameworks to measure semantic accuracy, factuality, and safety, ensuring robust model performance across diverse scenarios.


Key Responsibilities:

• Model Behavioral Testing: Design and execute evaluation tests to detect hallucinations, assess factual correctness, evaluate tone, and verify safety guardrails.

• Evaluation Framework Development: Build and maintain automated evaluation systems using frameworks like RAGAS, DeepEval, or Giskard to score LLM responses.

• Adversarial Red Teaming: Conduct proactive “red teaming” to identify vulnerabilities like prompt injection, jailbreaks, and data leakage.

• Data Quality Engineering: Validate the quality and diversity of training and validation datasets to prevent “garbage in, garbage out” scenarios.

• Drift & Performance Monitoring: Implement observability systems (using tools like Weights & Biases or Arize) to detect model and feature drift in production.

• Synthetic Data Generation: Use LLMs to generate high-fidelity synthetic test datasets to cover complex edge cases.

• Ethical & Bias Auditing: Run automated and manual checks to mitigate algorithmic bias and ensure compliance with the EU AI Act and other global regulations.

Required Skills:

• AI/ML Knowledge: Strong understanding of machine learning concepts, such as supervised vs. unsupervised learning, model evaluation metrics, neural networks, NLP, etc. Solid understanding of RAG (Retrieval-Augmented Generation) pipelines, vector databases (Pinecone, Weaviate), and transformer architectures.

• Statistical Competency: Ability to apply statistical validation (distributions, variance, confidence intervals) to evaluate non-deterministic system results.

• Automation & Testing Tools: Practical experience with Playwright, Selenium, PyTest, or equivalent automation frameworks.

• Programming: High proficiency in Python (essential for testing automation and data handling).

• Collaboration & Workflow Tools: Familiarity with tools such as Confluence, Jira, Xray, and ServiceNow.

Soft Skills:

• Proactive demeanor and strong communication skills in German and English.

• Communicates in an audience-appropriate way with stakeholders from different backgrounds and hierarchical levels.

• Organizational and coordination skills with the ability to coordinate timelines, resources, and dependencies.

• Ability to moderate and clarify professional, methodological, and technical questions about the release in interdisciplinary meetings.

• Ability to communicate in critical incident situations in a structured, audience-appropriate, and solution-oriented manner.

• Provides visibility into escalation cases and confidently supports teams in root cause analysis and decision-making.

Qualifications

• Bachelor’s degree in Computer Science & Engineering, Software Engineering, Information Technology, Data Science, or a related discipline.

• 3+ years of experience in QA, data validation, or ML testing.

• Hands-on exposure to at least one end-to-end AI pipeline (data → model → evaluation → deployment).

• Prior involvement in testing or validating AI-powered features, ML models, or data-centric workflows.

• Experience testing autonomous AI agents and multi-step workflow reasoning.

• Familiarity with AI observability stacks.

• Certifications in AI Testing, Machine Learning, or Quality Assurance.

Why Join Us?

· Be part of a purpose-driven organization that is shaping the future of mobility

· Work on cutting-edge technologies and global projects

· Thrive in a collaborative, diverse, and inclusive environment

· Access world-class infrastructure and continuous learning opportunities

Equal Opportunity Statement

At MBRDI, we are committed to diversity and inclusion. We welcome applications from all qualified individuals, regardless of gender, background, or ability.

Benefits

• Employee discounts possible

• Health measures

• Company mobile phone possible

• Meal allowances

• Company pension plan

• Hybrid work possible

• Mobility offers

• Employee events

• Coaching

• Flexible working hours possible

• Childcare

• Parking

• Canteen, café

• Good transport connections

• Accessibility

• Company doctor

Posted: February 20th, 2026
