Location: Open
Position ID: 363
Job Type: Full-Time Regular

Our client is redefining how healthcare organizations access, trust, and act on patient data. Their AI-powered platform transforms fragmented clinical information into structured, traceable, and clinically meaningful intelligence that supports better care decisions across the healthcare ecosystem.

They are seeking an AI Curation Data Scientist to help expand the organization’s advanced health data extraction, normalization, and AI model training capabilities. This role is ideal for someone who thrives at the intersection of AI/ML engineering, healthcare interoperability, and large-scale data curation, and who wants to build systems that directly impact patient outcomes.

You’ll join a highly technical and collaborative remote team working on mission-critical initiatives involving EHR data processing, LLM training pipelines, de-identification workflows, and clinical data quality systems. The environment is fast-moving, deeply innovative, and focused on delivering reliable, production-grade AI solutions in healthcare.

What You’ll Do

  • Develop and optimize software pipelines for extracting and integrating structured and unstructured healthcare data
  • Build and maintain AI/ML workflows for data classification, normalization, and analysis
  • Train, fine-tune, and evaluate large language models and embedding-based systems
  • Curate and validate high-quality datasets used for LLM training and model improvement
  • Work with complex healthcare data formats including XML, JSON, FHIR, and C-CDA
  • Implement de-identification strategies and ensure compliance with PHI/PII handling policies
  • Design and execute data quality assessments, validation frameworks, and automated testing processes
  • Collaborate cross-functionally with engineering and product teams to improve scalability and system performance
  • Contribute to code repositories, testing infrastructure, and deployment best practices
  • Explore emerging AI methodologies and rapidly prototype innovative solutions in a highly iterative environment

What You’ll Need

Required Qualifications

  • Master’s degree or equivalent experience in Computer Science, Software Engineering, Statistics, Biology, or a related field
  • 5+ years of hands-on experience in AI/ML engineering, data science, software development, or predictive analytics
  • Strong experience training and tuning transformer models and LLMs
  • Significant experience curating datasets for AI model training
  • Advanced Python development experience, including building extraction, classification, or NLP tools
  • Hands-on experience with embedding models, sentence transformers, and modern LLM tooling
  • Strong experience parsing and processing complex data formats such as XML and JSON
  • Familiarity with healthcare interoperability standards such as FHIR and/or C-CDA
  • Experience with TensorFlow, PyTorch, scikit-learn, or similar ML frameworks
  • Proficiency with Git and software development best practices
  • Experience developing unit and integration tests for scientific or healthcare-focused applications
  • Strong communication skills and ability to collaborate effectively within remote teams
  • A proactive, solutions-oriented mindset with a passion for building high-impact products

Preferred Qualifications

  • Deep understanding of regex and advanced text-processing techniques
  • Experience with Unix command-line tooling such as jq, xq, sed, and bash scripting
  • Strong AWS experience, particularly around data storage and AI training infrastructure tradeoffs
  • Experience working with HIPAA, PHI/PII handling, and healthcare de-identification strategies
  • Experience extending or customizing open-source AI tooling
  • Familiarity with AI-assisted coding workflows and tools such as GitHub Copilot, Claude Code, or similar platforms
  • Experience working across multiple programming languages and distributed technical teams

Why This Role

  • Opportunity to build AI systems that directly improve healthcare outcomes
  • Work alongside experts in AI, software systems, molecular biology, and clinical medicine
  • High-impact role within a fast-growing and mission-driven environment
  • Exposure to cutting-edge challenges in healthcare interoperability, AI model training, and clinical data engineering
  • Collaborative culture that values innovation, ownership, and technical excellence
  • Fully remote role with meaningful opportunities for growth and technical leadership

Let’s Talk

If you’re excited by the opportunity to apply advanced AI and machine learning techniques to real-world healthcare challenges while working with a highly talented, mission-driven team, we’d love to connect.