This role can be based in Plano, TX or Camas, WA.
As a Machine Learning Operations Engineer, you will design, implement, and maintain the systems that bridge the gap between machine learning development and deployment. You will work with data scientists, data engineers, and platform teams to ensure models are monitored, versioned, governed, and continuously improved. You will report to the Vice President, Technology Innovation.
The Day-to-Day:
- Build, maintain, and improve machine learning pipelines for training, testing, deployment, and monitoring
- Develop CI/CD workflows tailored for ML environments, including model versioning and reproducibility
- Implement monitoring systems for model drift, performance, and reliability
- Automate retraining and deployment workflows using Kubernetes, Docker, and cloud services
- Collaborate with security and governance teams to ensure compliance with internal and external regulations
- Partner with data scientists to translate experiments into production-ready pipelines
- Contribute to the evolution of scalable AI platforms using Azure Machine Learning, NVIDIA NIMs, and NeMo services
Your Qualifications:
- 10+ years of experience developing data-related solutions and software
- 5+ years of experience with Machine Learning Operations, DevOps, or related disciplines
- 5+ years of experience with Python and strong experience with ML frameworks (TensorFlow, PyTorch, scikit-learn)
- Hands-on expertise with Kubernetes, Docker, and CI/CD tools (Azure DevOps, Jenkins, GitHub Actions)
- Experience with monitoring tools (Prometheus, Grafana, MLflow, Weights & Biases)
- Deep knowledge of cloud-native AI services, especially Microsoft Azure AI
- Practical experience with NVIDIA NIMs and NeMo services for deployment and fine-tuning of foundation models
- Familiarity with model governance, audit, and compliance frameworks
- Bachelor's degree in Computer Science, Data Science, or equivalent work experience