You will be part of a high-performance platform engineering team focused on optimising and deploying advanced neural networks on a next-generation compute architecture. This role sits at the intersection of machine learning, systems, and algorithm design, with a strong focus on extracting maximum performance from hardware.
You will be responsible for:
- Driving the lowering and optimization of cutting-edge deep neural networks on a specialised compute platform
- Applying mathematical and algorithmic optimisation techniques to solve complex, large-scale problems
- Designing heuristics and optimisation strategies for computationally intensive and NP-hard challenges
- Collaborating with software teams to optimise graph-based execution and performance
- Continuously improving performance through a deep understanding of hardware-software interaction
Ideal Candidate
- You have an MS or PhD in Computer Science or a related field with 8+ years of industry experience
- You have a strong background in numerical and/or algorithmic optimisation
- You have experience designing heuristics for complex or NP-hard problems
- You have knowledge of both classical and machine learning algorithms (e.g. Computer Vision, DSP, Deep Learning)
- You have a strong foundation in graph algorithms and related data structures
- You are comfortable working at the intersection of systems, performance, and machine learning
- You are proficient in C++ (C++11 or later)
- Prior experience with frameworks such as TVM or MLIR, together with knowledge of front-end and back-end compiler techniques, is a plus
The Offer
- Competitive compensation with meaningful equity
- High-impact role in a deeply technical, low-bureaucracy environment
- Opportunity to work on cutting-edge compute systems and long-term career growth
About the employer
Our client is a Silicon Valley-based deep-tech company building a new compute architecture for real-time AI at the edge. Founded by engineers with leading research backgrounds, the company focuses on closing the gaps in current neural processing approaches through the tight integration of hardware and software.
The platform is built to run both neural network inference and conventional compute workloads efficiently across a wide range of edge devices. Unlike typical accelerators that only handle parts of an ML graph, this architecture supports end-to-end execution, including both neural network graph code and standard C++ DSP and control code, enabling greater flexibility and performance in real-world deployments.
…