TogetherAid

1-10 employees

philanthropy

2 months ago

Network Architect

Full-time
Mid Level

Description
  • Together AI is building the next-generation AI compute platform, with networking at its core.
  • As a Network Architect, you will define and evolve the global network architecture powering AI training, inference, and research platforms.
  • You will own routing, topology, traffic engineering, and control-plane strategies across data centers, cloud environments, and backbone fabrics.
  • The role involves designing high-bandwidth, low-latency, fault-tolerant networks supporting large-scale GPU clusters and HPC-style compute fabrics.
  • Responsibilities include establishing topology strategies, traffic engineering, multicloud interconnects, control-plane architecture, observability primitives, and guiding emerging technologies.
  • You will work closely with infrastructure, compute, storage, hardware, and operations teams to meet performance demands, support distributed workloads, and ensure supportability and resilience.
  • Mentorship and strategic influence across the organization are key aspects of this role.

Requirements
  • Deep experience designing and operating large-scale GPU clusters or HPC-style compute fabrics, with a clear understanding of their network demands.
  • Fluent in building high-throughput data center fabrics supporting tens of thousands of GPUs and multi-terabit east–west traffic.
  • Experience designing or operating RoCEv2 or lossless Ethernet environments at scale, including PFC/ECN tuning and congestion control.
  • Experience designing backbone and DCI architectures supporting GPU training clusters across multiple regions and fabrics.
  • Experience leading network architecture for multi-cloud, private backbone, and diverse PoP environments, with an understanding of AI workload behaviors.
  • Design experience that accounts for operational considerations, including observability, capacity modeling, automation, telemetry, and failure analysis.
  • Ability to set architectural direction in fast-evolving compute, storage, and network environments.

Benefits
  • Competitive compensation
  • Startup equity
  • Health insurance
  • Other competitive benefits