Machine Learning Engineer (Audio & Video Models)

Company: Lexlegis AI
Location: Mumbai
Job Description:

Key Responsibilities

● Design, train, and optimize audio and video ML models, including classification, detection, segmentation, generative models, speech processing, and multimodal architectures.

● Develop and maintain data pipelines for large-scale audio/video datasets, ensuring quality, labeling consistency, and efficient ingestion.

● Implement model evaluation frameworks that measure robustness, latency, accuracy, and overall performance across real-world conditions.

● Work with product teams to transform research prototypes into production-ready models with reliable inference performance.

● Optimize models for scalability, low latency, and edge/cloud deployment, including quantization, pruning, and hardware-aware tuning.

● Collaborate with cross-functional teams to define technical requirements and experiment roadmaps.

● Monitor and troubleshoot production models, ensuring reliability and continuous improvement.

● Stay current with trends in deep learning, computer vision, speech processing, and multimodal AI.

Required Qualifications

● Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Machine Learning, or a related field (PhD a plus).

● Strong experience with deep learning frameworks such as PyTorch or TensorFlow.

● Proven experience training and deploying audio or video models, such as:

○ Speech recognition, speech enhancement, speaker identification

○ Audio classification, event detection

○ Video classification, action recognition, tracking

○ Video-to-text, lip reading, multimodal fusion models

● Solid understanding of neural network architectures (CNNs, RNNs, Transformers, diffusion models, etc.).

● Proficiency in Python, along with ML tooling for experimentation and production (e.g., NumPy, OpenCV, FFmpeg, PyTorch Lightning).

● Experience working with GPU/TPU environments, distributed training, and model optimization.

● Ability to write clean, maintainable production-quality code.

Preferred Qualifications

● Experience with foundation models or multimodal transformers (e.g., audio-language, video-language).

● Background in signal processing, feature extraction (MFCCs, spectrograms), or codec-level audio/video understanding.

● Experience with MLOps tools (e.g., MLflow, Weights & Biases, Kubeflow, Airflow).

● Knowledge of cloud platforms (AWS, GCP, Azure) and scalable model serving frameworks.

● Experience with real-time audio/video processing for streaming applications.

● Publications, open-source contributions, or competitive ML achievements are a plus.

Experience: Minimum 2 years

Posted: February 11th, 2026