Together AI is building the next-generation AI compute platform, with networking at its core.
As a Network Architect, you will define and evolve the global network architecture powering AI training, inference, and research platforms.
You will own routing, topology, traffic engineering, and control-plane strategies across data centers, cloud environments, and backbone fabrics.
The role involves designing high-bandwidth, low-latency, fault-tolerant networks supporting large-scale GPU clusters and HPC-style compute fabrics.
Responsibilities include establishing topology strategies, traffic engineering, multi-cloud interconnects, control-plane architecture, and observability primitives, as well as guiding the adoption of emerging technologies.
You will work closely with infrastructure, compute, storage, hardware, and operations teams to meet performance demands, support distributed workloads, and ensure supportability and resilience.
Mentorship and strategic influence across the organization are key aspects of this role.
Requirements
Deep experience designing and operating large-scale GPU clusters or HPC-style compute fabrics, with a strong understanding of their network demands.
Fluency in building high-throughput data center fabrics supporting tens of thousands of GPUs and multi-terabit east–west traffic.
Experience designing or operating RoCEv2 or lossless Ethernet environments at scale, including PFC/ECN tuning and congestion control.
Experience designing backbone and DCI architectures supporting GPU training clusters across multiple regions and fabrics.
Experience leading network architecture for multi-cloud, private backbone, and diverse PoP environments, with an understanding of AI workload behaviors.