Company: Genzeon Global
Location: Hyderabad
Job Description:
Data Scientist
Job Overview
We are seeking an experienced Data Engineer with strong AI engineering expertise to design, build, and optimize large-scale healthcare data platforms. The ideal candidate will have hands-on experience in data pipelines, orchestration frameworks, EHR data integration, and AI/ML model enablement, driving the transformation of healthcare data into actionable intelligence.
Key Responsibilities
- Architect, build, and optimize scalable ETL/ELT pipelines to process structured and unstructured healthcare data.
- Design data systems that directly support AI/ML workflows, including feature engineering, data versioning, and real-time model serving.
- Ingest and normalize healthcare data from EHR systems (Epic, Cerner, etc.) with expertise in FHIR, HL7, and Epic Clarity extracts, ensuring HIPAA/HITRUST compliance.
- Implement and manage workflows using Airflow, Prefect, or similar orchestration tools.
- Collaborate with data scientists, ML engineers, and backend developers to deliver high-quality AI-ready datasets.
- Implement monitoring, anomaly detection, and automated quality checks for large-scale healthcare datasets.
Minimum Qualifications
- Bachelor’s degree in Computer Science, Data Engineering, AI/ML, or a related field.
- 4+ years of experience in data engineering with proficiency in Python, MongoDB, and distributed data systems.
- Proven track record in building data pipelines for healthcare/EHR systems.
- Experience with data lakes, cloud storage (Azure, AWS, or GCP), and scalable data architectures.
Highly Preferred Qualifications
- Expertise in AI/ML data pipelines, including feature store design, model input pipelines, and real-time data streaming.
- Proficiency with Spark, Databricks, or equivalent big data tools.
- Hands-on experience with Azure Data Factory, Synapse, and AI/ML ecosystem tools.
- Experience with RESTful APIs and microservices for exposing healthcare/AI data assets.
- Knowledge of data quality, lineage, and governance frameworks (e.g., Great Expectations, DataHub).
- Familiarity with vector databases, embeddings, and LLM integration for AI use cases.
…
Posted: March 26th, 2026