AI/ML Data Engineer
Job Description
We are looking for an AI/ML Data Engineer to work for our client. The ideal candidate aligns with the responsibilities and qualifications outlined below.
About the Role
Our client is seeking an AI/ML Data Engineer to design, build, and scale modern data and machine learning platforms. You’ll partner with data scientists and product/engineering teams to develop reliable data pipelines, productionize ML models, and establish robust MLOps practices that drive measurable business impact from data and AI.
Responsibilities
- Design and build cloud-scale data pipelines (batch & streaming) using tools such as Spark/Databricks, Airflow/Prefect, Kafka/Kinesis, and dbt
- Develop feature engineering workflows, feature stores, and reusable data assets for ML
- Productionize models with ML pipelines (training, evaluation, deployment) using MLflow/SageMaker/Vertex AI/Azure ML
- Implement MLOps best practices: CI/CD for data & ML, automated testing, model registry, canary/blue‑green deployments
- Build monitoring for data quality, model performance (drift, bias, accuracy), and platform health
- Optimize data storage and compute performance (partitioning, Z‑ordering, indexing, caching)
- Enforce data governance & security (IAM, secrets management, lineage, PII handling)
- Collaborate with data scientists, analysts, and software engineers to translate requirements into scalable solutions
- Create documentation and provide enablement for stakeholders and downstream consumers
Qualifications
- 5+ years of experience in data engineering or ML engineering, including production systems
- Strong skills in Python and SQL; experience with Spark (PySpark) and Databricks or similar platforms
- Hands-on experience with at least one major cloud (AWS, Azure, or GCP) and its services for data & ML (e.g., S3/ADLS/GCS, EMR/Databricks, SageMaker/Azure ML/Vertex AI)
- Experience with workflow orchestration (e.g., Airflow, Prefect, Dagster) and CI/CD (e.g., GitHub Actions, Azure DevOps, GitLab CI)
- Containerization and deployment with Docker; Kubernetes experience is a plus
- Streaming data experience (e.g., Kafka, Kinesis, Pub/Sub) strongly preferred
- Knowledge of MLOps frameworks (MLflow, model registries), testing (unit/integration), and observability (metrics, logging)
- Familiarity with data quality frameworks (Great Expectations, Deequ), lakehouse/medallion architecture, and data governance (e.g., lineage)
What Our Client Offers
- Competitive compensation with performance bonus
- Modern data stack (Databricks/Spark, Airflow/Prefect, MLflow) and greenfield build opportunities
- High visibility to leadership and ownership over ML platform decisions
- Professional development budget (certifications, conferences, courses)