AI Engineer – Speech Modelling & Quality (STT / TTS)
Location: Bangalore/Mumbai/Hyderabad/Gurgaon/Indore
Work from Office
Role Overview
The Speech Modelling & Quality Senior Engineer is responsible for end-to-end ownership of
speech quality delivered by the Indic Speech AI platform. This role directly determines how
accurately speech is recognized and how natural, intelligible, and expressive synthesized
speech sounds across all supported Indic languages.
This role exists to ensure that improvements in model capability translate into measurable,
sustained gains in real-world user experience, and that quality does not regress as the
platform scales, new languages are added, or models are upgraded.
This role owns outcome-level quality, not just model execution.
Core Responsibilities
The role defines and owns quality metrics for speech-to-text and text-to-speech systems,
including word error rate, substitution, deletion, and insertion patterns, punctuation accuracy,
pronunciation correctness, prosody, intelligibility, and naturalness.
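As an illustration of the recognition metrics named above, word error rate is conventionally computed as (substitutions + deletions + insertions) divided by the number of reference words, using a minimum-edit-distance alignment. The sketch below is a minimal example of that calculation, not the platform's scoring pipeline. The names `ErrorCounts` and `count_errors` are hypothetical, and whitespace tokenization is an assumption; a real Indic-language scorer would likely need script-aware text normalization first.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    substitutions: int
    deletions: int
    insertions: int
    ref_words: int

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / N, where N is the number of reference words.
        if self.ref_words == 0:
            raise ValueError("WER is undefined for an empty reference")
        total = self.substitutions + self.deletions + self.insertions
        return total / self.ref_words


def count_errors(reference: str, hypothesis: str) -> ErrorCounts:
    """Align two transcripts by word-level edit distance and count error types.

    Assumption: words are separated by whitespace and already normalized.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j].
    dp = [[(0, 0, 0, 0)] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # all reference words deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # all hypothesis words inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match, no extra cost
                continue
            c, s, d, n = dp[i - 1][j - 1]
            substitution = (c + 1, s + 1, d, n)
            c, s, d, n = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, n)
            c, s, d, n = dp[i][j - 1]
            insertion = (c + 1, s, d, n + 1)
            dp[i][j] = min(substitution, deletion, insertion, key=lambda t: t[0])
    _, subs, dels, ins = dp[-1][-1]
    return ErrorCounts(subs, dels, ins, len(ref))


counts = count_errors("the cat sat on the mat", "the cat sit on mat")
print(counts, round(counts.wer, 3))
```

Keeping the three error types separate, rather than reporting only the combined rate, is what makes pattern-level analysis possible: a model that mostly deletes short function words calls for a different fix than one that substitutes similar-sounding words.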
The role performs deep error analysis across languages, accents, acoustic conditions,
device types, and usage contexts to identify systematic weaknesses in speech recognition
and synthesis.
The role drives language-specific optimization strategies, ensuring that each Indic language
is tuned independently and not treated as a secondary outcome of multilingual training.
The role collaborates with ML engineering and training teams to define data requirements,
sampling strategies, and curriculum approaches required to improve quality.
The role ensures that improvements in one language or model dimension do not introduce
regressions in others, enforcing strict quality isolation and regression testing.
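One way to picture the per-language regression check described above is a comparison of baseline and candidate metrics language by language, rather than on the average. The sketch below is only illustrative: `find_regressions`, the tolerance value, the language codes, and the WER figures are all made-up assumptions, not values from the platform.

```python
def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.005,
) -> dict[str, float]:
    """Return languages whose WER rose by more than `tolerance` (absolute)."""
    regressions = {}
    for lang, base_wer in baseline.items():
        if lang not in candidate:
            # A language missing from the candidate run is treated as a failure,
            # not silently skipped.
            raise KeyError(f"candidate has no result for language {lang!r}")
        delta = candidate[lang] - base_wer
        if delta > tolerance:
            regressions[lang] = delta
    return regressions


# Hypothetical numbers: the average improves while one language gets worse.
baseline_wer = {"hi": 0.12, "ta": 0.18, "bn": 0.15}
candidate_wer = {"hi": 0.09, "ta": 0.20, "bn": 0.15}

mean_before = sum(baseline_wer.values()) / len(baseline_wer)
mean_after = sum(candidate_wer.values()) / len(candidate_wer)
print(f"mean WER: {mean_before:.4f} -> {mean_after:.4f}")
print("regressed languages:", find_regressions(baseline_wer, candidate_wer))
```

In this made-up case the mean WER falls, so an averaged metric would report an improvement, while the per-language check still flags `ta`. That is the situation quality isolation is meant to catch.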
The role validates that training gains are preserved through inference, ensuring no quality
loss due to quantization, batching, streaming, or runtime optimizations.
Operational Ownership
The Speech Modelling & Quality Lead owns quality regressions in production. If recognition
accuracy drops, synthesized speech quality degrades, or users experience noticeable
deterioration, this role is accountable.
The role owns the pre- and post-release quality validation process, including baseline
comparisons, A/B evaluations, and rollout gating criteria.
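A rollout gate based on a baseline comparison could, for example, require statistical evidence that the candidate model is no worse than the baseline on the same test utterances. The sketch below uses a paired percentile bootstrap over utterances, which is one common choice and not necessarily the method used here. The function names, the 95% interval, the resample count, and the `max_regression` threshold are all assumptions for illustration.

```python
import random


def paired_bootstrap_wer_delta(ref_lens, errors_a, errors_b,
                               n_resamples=2000, seed=0):
    """Estimate WER(B) - WER(A) with a 95% percentile bootstrap interval.

    ref_lens[i] is the reference word count of utterance i; errors_a[i] and
    errors_b[i] are the word errors of models A and B on that same utterance.
    """
    n = len(ref_lens)
    if n == 0 or not (n == len(errors_a) == len(errors_b)):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(length <= 0 for length in ref_lens):
        raise ValueError("every reference must contain at least one word")
    rng = random.Random(seed)  # fixed seed so the gate is reproducible

    def delta(indices):
        total_words = sum(ref_lens[i] for i in indices)
        diff = sum(errors_b[i] - errors_a[i] for i in indices)
        return diff / total_words

    point = delta(range(n))
    # Resample utterances with replacement, keeping A and B paired.
    samples = sorted(
        delta([rng.randrange(n) for _ in range(n)]) for _ in range(n_resamples)
    )
    lower = samples[int(0.025 * n_resamples)]
    upper = samples[int(0.975 * n_resamples) - 1]
    return point, lower, upper


def passes_gate(ref_lens, errors_a, errors_b, max_regression=0.0):
    """Pass only if the upper bound of the WER change is within the budget."""
    _, _, upper = paired_bootstrap_wer_delta(ref_lens, errors_a, errors_b)
    return upper <= max_regression
```

Gating on the upper bound of the interval, rather than on the point estimate, makes the check conservative: a candidate whose measured improvement could plausibly be noise does not pass on that evidence alone.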
The role is responsible for ensuring that model upgrades, retraining, or data changes do not
negatively impact user-facing quality metrics.
The role participates in incident analysis when customer complaints, usage drop-offs, or
monetization anomalies are traced back to speech quality issues.
Key Interfaces
This role works closely with the PyTorch & Python ML Engineering Lead to translate quality
findings into concrete model changes.
The role interfaces with the PyTorch Lightning Training Lead to ensure training strategies
align with quality improvement goals.
The role collaborates with the GPU Inference Optimization Lead to ensure inference
optimizations do not compromise quality.
The role works with Language Guardrails teams to ensure safety mechanisms do not distort
or degrade speech output unintentionally.
The role coordinates with Monetization Analytics & Billing teams when quality changes
correlate with usage or revenue shifts.
Explicit Non-Responsibilities
This role does not own training infrastructure, GPU scheduling, or Kubernetes operations.
This role does not own raw ML pipeline implementation or inference service engineering.
This role does not define system architecture or networking behaviour.
Role Expectation
The Speech Modelling & Quality Lead is expected to operate with a user-centric and
language-centric mindset, treating speech quality as the primary product outcome.
Success in this role is measured by:
Sustained reduction in word error rates
Improved naturalness and intelligibility of synthesized speech
Language-specific quality leadership rather than averaged performance
Absence of silent quality regressions in production
Clear correlation between quality improvements and user adoption or retention
…