The Company
NorthMark Compute & Cloud (NMC²) is backed by dedicated leadership and investment, with a clear mission as it operates at the bleeding edge of technology. Its goal is to scale and enhance the high-performance computing (HPC) and cloud infrastructure that supports its clients' research, production, and delivery, enabling breakthroughs that shape the industries of tomorrow. Its engineers build critical infrastructure to eliminate friction in scientific research, simulations, analysis, and decision-making, accelerating discovery and driving faster innovation.
The Position
As an HPC Network Solutions Architect, you will design, integrate, and optimize high-performance networking architectures that form the backbone of HPC, AI/ML, and data-intensive workloads. You will act as a trusted advisor to customers, guiding them across the entire solution lifecycle — from requirements gathering and design, through proof-of-concept and deployment, to optimization and long-term adoption.
This is a customer-facing, technically focused role. You will collaborate closely with customers to align low-latency, high-bandwidth networking designs with their workload requirements, while also working with internal engineering and product teams to influence roadmap priorities. Your role will bridge the gap between cutting-edge networking technologies (InfiniBand, RoCE, EVPN, VXLAN) and real-world HPC adoption at scale.
This position offers the opportunity to shape the future of HPC networking, deliver measurable impact for customers, and influence vendor ecosystems by incorporating emerging innovations into enterprise-ready solutions.
Responsibilities
Act as the primary networking SME for customers adopting or scaling HPC environments.
Partner with customers to capture network performance goals, scalability requirements, and integration constraints.
Design and document end-to-end HPC network architectures, including Ethernet, InfiniBand, RoCE, EVPN, and VXLAN fabrics.
Lead proof-of-concept and benchmarking engagements, validating low-latency and high-throughput designs against workload requirements.
Optimize multi-vendor, multi-protocol data center and HPC interconnects, addressing scaling challenges such as data gravity and throughput bottlenecks.
Define integration strategies across compute, storage, orchestration, and security layers to deliver resilient, workload-aware solutions.
Conduct network performance assessments and tuning, identifying bottlenecks and recommending enhancements.
Build observability frameworks for HPC networks at scale using tools like Prometheus, Grafana, and vendor telemetry.
Collaborate with engineering, product, and operations teams to refine architecture blueprints and ensure consistent delivery.
Partner with ecosystem vendors (e.g., NVIDIA, including the former Mellanox portfolio, Cisco, Arista) to integrate cutting-edge features and influence roadmap evolution.
Stay current with emerging HPC networking technologies and protocols, giving customers forward-looking guidance on adoption strategies.
Represent the organization at customer design sessions, workshops, and industry events, building strong technical relationships.
Requirements
Demonstrated experience in HPC networking solution architecture, systems design, or data center network engineering.
Strong expertise in InfiniBand and RoCE protocols, including deployment and tuning at scale.
Hands-on experience designing and implementing large-scale Ethernet networks, including BGP, OSPF, EVPN, and VXLAN.
Deep understanding of parallel and GPU communication libraries such as MPI and NCCL, and their integration with HPC interconnects.
Proficiency with Linux-based environments and scripting (e.g., Python, Bash, PowerShell) for automation.
Experience supporting multi-vendor environments and evaluating new networking platforms.
Ability to translate complex networking requirements into clear solution architectures and present them effectively to customers.
Strong customer-facing communication skills, including the ability to engage executives and technical stakeholders alike.
Preferred Experience
Experience delivering HPC or AI/ML workloads across large-scale, low-latency network infrastructures.
Familiarity with CNI plugins (Multus, Cilium, NVIDIA CNI) for HPC/Kubernetes environments.
Exposure to automation and infrastructure-as-code practices for network provisioning (Terraform, Ansible).
Experience in vendor collaboration, including influencing feature roadmaps and participating in joint evaluations.
Contributions to open-source HPC networking or infrastructure projects.
Bachelor’s or Master’s degree in Computer Science, Networking, Engineering, or a related technical field.
Relevant networking and systems certifications such as Cisco CCNP/CCIE, Juniper JNCIP, AWS Certified Advanced Networking - Specialty, or Red Hat RHCE.