The Company
NorthMark Compute & Cloud (NMC²) is backed by dedicated leadership and investment, with a clear mission as it operates at the bleeding edge of technology. Its goal is to scale and enhance the high-performance computing (HPC) and cloud infrastructure that supports its clients' research, production, and delivery, enabling breakthroughs that shape the industries of tomorrow. Its engineers build critical infrastructure to eliminate friction in scientific research, simulations, analysis, and decision-making, accelerating discovery and driving faster innovation.
The Position
As an HPC Network Engineer, you will play a critical role in leading the design, deployment and scaling of large multi-vendor, multi-protocol data centers and high-performance compute networks. You will have broad responsibilities and the freedom to analyze and solve complex problems, which will require you to identify emerging technology solutions. You will also work with vendors to influence their roadmaps so that they deliver the features needed to meet the business’s requirements for scale and performance.
This role will require deep technical skills as you provide technical leadership to drive strategy and improvements, ensuring our network meets the demands for high performance, scalability and security. You will make a significant impact by helping solve complex challenges in a dynamic, fast-moving and highly innovative environment.
Responsibilities:
Designing, deploying and scaling large multi-vendor, multi-protocol data center and high-performance compute networks
Creating a framework for optimizing performance and observability of HPC networks at scale
Identifying emerging technology solutions and partnering with vendors and the technical community to influence improvements and meet the performance demands of next-generation networks
Optimizing network infrastructure frameworks for robustness, high throughput and low latency at scale, to reduce the impact of scaling challenges such as data gravity
Requirements:
You will be passionate about automation and technology, particularly in the infrastructure space, and willing to cross-train in multiple infrastructure-as-code (IaC) technologies and methodologies.
The ideal candidate will have:
Demonstrable experience designing and implementing large-scale Ethernet and InfiniBand networks to support HPC architectures
Experience defining and delivering measurable strategies at scale to optimize network equipment so that it meets the performance requirements defined for high-performance compute and storage systems
Experience supporting multi-vendor environments as well as exploring new platforms to deliver best-in-class services
Experience delivering scalable solutions in a hybrid/multi-cloud ecosystem, integrating on-prem, private and public cloud solutions
In-depth knowledge of network routing and switching technologies, such as BGP, OSPF, EVPN and VXLAN, and implementing best practices for them
Proficiency with Linux-based environments and with scripting languages such as Python, Bash or PowerShell for automation
In-depth knowledge of RoCE and InfiniBand protocols
Excellent ability to distill technically complex issues into clear and concise terms