ML / AI Engineer

Join a world-leading quantitative trading fund as it expands its next-gen machine learning research platform. You'll shape core infrastructure, partner closely with researchers, and drive high-impact engineering across large-scale, GPU-accelerated workloads.

Selby Jennings - Hong Kong - Full time

Salary: Negotiable



Responsibilities

  • Lead the design and development of a scalable, reliable, and reproducible machine learning research platform.
  • Build infrastructure to support large-scale experimentation, model training, and simulation across both on‑premise high‑performance compute environments and multi‑cloud setups.
  • Work closely with researchers to understand evolving workflows and translate those needs into robust platform capabilities.
  • Architect and optimize distributed training pipelines for high-throughput, GPU‑accelerated workloads.
  • Enhance experiment management, model versioning, artifact tracking, and data lineage to ensure transparent and repeatable research processes.
  • Develop tools and frameworks that improve feature engineering, dataset creation, and large-scale backtesting.
  • Drive initiatives to improve compute efficiency, resource allocation, and workload isolation across heterogeneous environments.
  • Enhance platform observability with improved metrics, logging, tracing, and debugging capabilities tailored to ML and distributed systems.
  • Support rapid iteration by delivering features and fixes quickly while maintaining strong engineering standards.
  • Contribute to long-term architectural planning to ensure the platform scales with growing data volumes and model complexity.


Qualifications

  • 2+ years of experience designing and building distributed systems at scale, ideally supporting research or data-heavy workloads.
  • Strong programming skills in Python with a focus on clean, maintainable, high-performance code.
  • Experience running applications on Linux-based HPC clusters and/or cloud computing platforms.
  • Solid understanding of distributed computing, parallel processing, and resource management.
  • Hands-on experience with GPU workloads and familiarity with modern ML frameworks such as PyTorch, TensorFlow, or JAX.
  • Experience optimizing data pipelines and handling large structured and unstructured datasets.
  • Strong debugging skills with the ability to diagnose issues across multiple layers of the stack.
  • Comfortable working independently in a fast-paced, research-oriented environment.
  • Strong communication skills and experience collaborating directly with researchers or data-focused teams.


Preferred Attributes

  • Experience building internal ML platforms or research tooling at scale.
  • Familiarity with experiment‑tracking tools, workflow orchestration systems, and model lifecycle management.
  • Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes).
  • Exposure to high-performance or latency-sensitive domains such as quantitative research, simulation systems, or large‑scale distributed compute.

23975008