NorthMark Compute & Cloud (NMC²) is backed by dedicated leadership and investment, with a clear mission as it operates at the bleeding edge of technology. Its goal is to scale and enhance the high-performance computing (HPC) and cloud infrastructure that supports its clients' research, production, and delivery, enabling breakthroughs that shape the industries of tomorrow. Its engineers build critical infrastructure to eliminate friction in scientific research, simulations, analysis, and decision-making, accelerating discovery and driving faster innovation.
The Position
The Emerging Network Architect role focuses on evaluating, designing, and integrating next-generation networking technologies to support high-performance computing (HPC) platforms. This position works at the intersection of compute, storage, and networking to assess new architectures, benchmark performance, and guide technology selection aligned with scalability, efficiency, and workload requirements. Partnering closely with vendors, internal architects, and customers, the role helps translate evolving network capabilities into practical, production-ready HPC solutions that can be validated in lab environments and deployed at scale across advanced research and compute systems.
Responsibilities:
Explore industry opportunities to apply future network architectures and technologies that improve key HPC metrics such as Model FLOPs Utilization (MFU) and performance per watt
Develop and maintain strong technical partnerships with leading networking vendors to incorporate their future roadmaps into our HPC platform and architectures
Recommend and justify hardware and software solutions aligned with performance, efficiency, and scalability objectives
Work with customers to understand their current and future HPC workloads and requirements, and how these affect our models and performance benchmarks
Evaluate incoming hardware and software thoroughly enough to verify the systems in our own environment and lab setups
Aid the team's bottleneck identification and performance evaluation for new hardware, especially network-related work such as latency and bandwidth modelling
Collaborate with storage and compute architects to stitch together individual vendors' components into a complete HPC solution
Contribute technical guidance and support to other internal teams responsible for standing up the chosen architectures at scale
Stay current on the existing and future HPC landscape, from proven vendors to start-ups, continually evaluating the best ideas and products in this space
Influence vendor roadmaps through feedback, joint initiatives, and technology evaluations
Requirements:
Bachelor's Degree or equivalent experience
Deep expertise in network architectures and interconnect topologies, with demonstrable experience applying them in HPC
Hands-on experience with high-speed fabric solutions, particularly InfiniBand and NVLink, but also Ethernet (RoCE) and Omni-Path
Expertise in various forms of packet switching, routing algorithms, flow control, and congestion management, and the ability to apply the right solutions to achieve the highest network performance
An understanding of how high-speed networking impacts compute, and experience with performance modelling and industry-standard benchmarks such as OSU Micro-Benchmarks, MPI benchmarks, and STREAM
Previous hands-on lab experience running benchmarks and test jobs on unproven hardware
Understanding of distributed and parallel storage file systems such as VAST and Lustre, particularly their needs and their impact on a system's network performance
Proven experience designing HPC clusters and parallel computing environments, with strong proficiency in Linux kernel tuning, system-level optimisations and performance profiling
Experience with emerging network trends such as CXL, PCIe Gen 6, and DPUs is not required but strongly valued
Demonstrated success working directly with clients to capture technical requirements and deliver tailored, scalable system designs across AI/ML, scientific computing, and CFD workloads