Senior Platform Engineer - Kubernetes + MLOps at Harrison AI

AI Platform, Full-time, Sydney
Posted 6 months ago

About your role

As a Senior DevOps/Linux Engineer within the Harrison Platform team, you will work in a small team of Solutions Architects and DevOps engineers to support and deliver components of the Harrison Platform.

The Harrison Platform is "the machine that builds the machine". It is a common toolset for building AI as a Medical Device solutions: an MLOps platform, if you will. It is used by our ventures to accelerate, enhance and simplify their model development. A key component of the Platform is our physical Machine Learning Training Cluster, which is based on NVIDIA DGX A100 systems. Your role will focus on the physical Machine Learning Training Cluster and the Kubernetes-based software stack it hosts.

What you'll do:

  • Manage support requests
  • Perform maintenance tasks such as software upgrades
  • Liaise with vendors as required on support issues
  • Assist in developing and deploying new features and improvements to the cluster. This covers services both in the physical datacenter and within AWS
  • Write end user technical documentation
  • Deploy the infrastructure stacks for Platform (MLOps) related services as they are developed, using Terraform and Ansible.
  • Develop, update and improve a variety of Terraform modules for venture and internal consumption
  • Occasionally visit the datacenter in Sydney should the need arise. (Note that being based in Sydney is not a requirement.)

What we're looking for:

  • Linux administration skills
  • Kubernetes knowledge and experience.
  • Familiarity with physical datacenter environments, ideally with hands-on datacenter / physical infrastructure experience
  • Working knowledge of TCP/IP networking.
  • Exposure to CI/CD pipelines
  • Exposure to AWS
  • Familiarity or experience with Infrastructure as Code tooling such as Terraform or Ansible
  • Bash and Python scripting