ML Platform Engineer (m/f/d)
About the role
The AI Research Division of Agile Robots is looking for an ML Platform Engineer (m/f/d) who will build and operate the distributed training, deployment, and experimentation infrastructure that research, data, and robotics teams depend on to move models from prototype to production.
Your Responsibilities
- Training Infrastructure: Design and scale distributed training workflows for large models using tools such as PyTorch Distributed, DeepSpeed, and cluster schedulers like SLURM or Kubernetes.
- ML Platform: Build and maintain containerised ML environments that support reproducible experimentation and benchmarking.
- CI/CD Pipelines: Develop and maintain CI/CD pipelines for machine learning systems to enable reliable testing, training, and deployment of models.
- Lifecycle Management: Implement experiment tracking, model versioning, and reproducibility workflows using tools such as ClearML or Weights & Biases.
- Observability: Set up monitoring systems such as Prometheus and Grafana to track model performance and system health and detect drift in production.
- Cross-Team Collaboration: Work with research, data, and robotics teams to connect new models to robust production systems.
Essential Skills
- Background and Experience: Degree in Computer Science, Software Engineering, or a related field, with professional experience building and operating ML or software infrastructure in production.
- Distributed Training: Experience designing and operating distributed training systems on Kubernetes and Docker, using PyTorch Distributed, DeepSpeed, and schedulers such as SLURM.
- CI/CD for ML: Experience building CI/CD pipelines that support reliable model testing, training, and deployment.
- Cloud Infrastructure: Experience operating ML workloads on cloud infrastructure, preferably AWS.
- Experiment Tracking: Hands-on experience with experiment tracking and model versioning using tools such as MLflow or Weights & Biases.
- Observability: Experience with monitoring and drift detection using tools such as Prometheus and Grafana.
- Software Engineering: Python and system design skills, with experience building and operating ML systems beyond the prototype stage.
Beneficial Skills
- Multimodal Systems: Experience with large-scale or multimodal ML systems such as vision-language-action models.
- Infrastructure as Code: Familiarity with infrastructure-as-code tools such as Terraform.
- ML Orchestration: Experience with ML pipeline and orchestration tools.
- Distributed Compute: Exposure to high-performance or distributed compute environments.
What we offer
- A dynamic high-tech company with sound finances and world-class investors.
- An interdisciplinary, international team with more than 60 nationalities in a collaborative work environment.
- Many development opportunities as the company continues to grow.
- Challenging tasks and impactful projects alongside experts, supporting your professional and personal growth.
- Corporate Benefits Program covering health, mobility, and learning, with 100 € net per month.
- Modern office facilities with a rooftop terrace overlooking Munich, free drinks and fruit, and regular company events that contribute to a good working environment.
About us
Agile Robots SE is an international high-tech company based in Munich, Germany, with a production site in Kaufbeuren and more than 2300 employees worldwide. Our mission is to bridge the gap between artificial intelligence and robotics by developing systems that combine state-of-the-art force-moment sensing and world-leading image-processing technology. This unique combination of technologies allows us to provide user-friendly and affordable robotic solutions that enable intelligent precision assembly.
This is made possible by our employees, who make the most of each and every day with creativity and enthusiasm. Become part of this team and shape the future of robotics with us!
We are proud of our diversity and welcome your application regardless of gender and sexual identity, nationality, ethnicity, religion, age, or disability.
