The optimisation team is responsible for the research and development of the machine learning services used across the Rokt network. To help our data scientists run experiments and deploy models to production quickly and efficiently, we are looking for a Machine Learning Infrastructure Engineer to maintain and improve our Kubeflow cluster, as well as the other Kubernetes infrastructure that supports our machine learning services. In this role you will be the authority on all of our machine-learning-related infrastructure, and you will also contribute to software engineering tasks when needed.
- Maintain and improve our Kubeflow cluster, including securing the cluster and handling version upgrades.
- Provide subject-matter expertise on Kubeflow to our data scientists, including writing documentation and hosting workshops as needed.
- Maintain and develop the Kubernetes infrastructure for our real-time prediction systems.
- Implement automated monitoring of the machine learning models used in production so that we are alerted to problems quickly.
- Ensure the stability and availability of our services, including being on call for production incidents outside working hours.
- Implement CI/CD/CT (continuous integration, continuous delivery, continuous training) best practices across our machine learning lifecycle, including end-to-end pipeline orchestration to support distributed model experimentation, training, serving and monitoring.
- A good understanding of software engineering principles. Development experience with Python and Golang is a plus.
- Experience in an SRE role or similar. Experience with Kubernetes and Kubeflow in a production environment is a massive plus.
- Motivated and self-driven in a fast-paced (we truly mean fast) environment, with a proven track record of delivering impact across several teams and/or organisations.
- A BS degree in Computer Science or a similar technical field, or equivalent practical experience.
- 4+ years of experience in software development or infrastructure support roles.