Together AI to Co-Build Turbocharged NVIDIA GB200 Cluster with 36K Blackwell GPUs in Partnership with Hypertec Cloud

Today marks an exciting milestone in our journey to accelerate AI development. We’re thrilled to announce that we are co-building one of the world’s largest optimized GPU clusters, featuring 36,000 NVIDIA GB200 NVL72 GPUs, in partnership with Hypertec Cloud.

This partnership combines Together AI’s high-performance GPU Clusters and deep AI research expertise with Hypertec Cloud’s compute infrastructure and data center capabilities to deliver next-generation infrastructure for training, fine-tuning, and inference of large generative AI models. Together, we aim to optimize resource usage, reduce operational costs, and empower companies to push the boundaries of AI without compromising on performance, cost, or reliability.

Designed by AI researchers for AI innovators, Together GPU Clusters incorporate cutting-edge technology, including Blackwell Tensor Core GPUs, Grace CPUs, NVIDIA NVLink and InfiniBand, and the Together Kernel Collection, a unique set of powerful optimizations developed by our co-founder & Chief Scientist and FlashAttention creator, Tri Dao. This integrated hardware and software stack, tuned specifically for large-scale AI training and inference, optimizes GPU usage to improve performance, scalability, and cost efficiency, enabling AI workloads to scale efficiently while reducing training time and operational costs.

“At Krea, we’re building a next-generation creative suite that brings AI-powered visual creation to everyone. Together AI provides the performance and reliability we need for real-time, high-quality image and video generation at scale. We value that Together AI is much more than an infrastructure provider – they’re a true innovation partner, enabling us to push creative boundaries without compromise.” – Victor Perez, Co-Founder, Krea

Accelerating AI Development at Scale

The partnership between Together AI and Hypertec Cloud delivers both immediate and long-term value for enterprises and AI innovators. This infrastructure backbone will power the next generation of AI innovations and provides customers the following benefits:

  • 36,000 NVIDIA Blackwell GPUs in a GB200 NVL72 cluster, starting in Q1 2025
  • Immediate access to thousands of H100 and H200 GPUs across North America
  • Secured data center capacity for over 100,000 GPUs throughout 2025
  • Industry-leading deployment times and reliability

Together AI and Hypertec: Massively Scaling AI Infrastructure

Hypertec Cloud brings decades of expertise in high-performance computing and data center operations, and is taking on an increased role in our growing network of partners. At Together AI, we’re partnering with independent cloud providers like Hypertec and deploying our inference stack on hyperscalers like AWS, aggregating and accelerating a massive network of global GPU compute. In essence, we are applying our research to create the AI acceleration supercloud: pulling together distributed cloud GPUs into a turbocharged, developer-friendly platform that maximizes performance for today’s generative AI workloads and scales seamlessly to meet tomorrow’s exponential demands.

“We are excited to announce this strategic partnership with Together AI and bring together our expertise to deliver next-generation high-performance AI solutions that are efficient and powerful,” said Jonathan Ahdoot, President of Hypertec Cloud. “With our large-scale secured data center capacity and commitment to sustainability, we will ensure that our joint customers can rapidly access highly optimized large AI clusters at scale while minimizing the impact on our planet.”

“We are excited to partner with Hypertec Cloud to expand our highly performant and reliable Together GPU Cluster footprint, serving the exponentially growing computational needs of our global customers,” said Vipul Ved Prakash, CEO of Together AI. “Through Hypertec’s strategically located data centers and Together AI’s fleet of GPU Clusters — featuring innovations like Together Kernel Collection — customers can now achieve industry-leading performance and cost-efficiency in training frontier models and running inference at scale.”

Performance that Powers Innovation

At the heart of our infrastructure lies the Together Kernel Collection (TKC), our proprietary optimization suite that pushes the boundaries of what’s possible with modern AI hardware. These optimizations aren’t just incremental improvements — they represent breakthrough advances in AI computation, delivering up to 24% faster training operations and a 75% boost in FP8 inference tasks. For our customers, this translates into significant reductions in GPU hours and operational costs.

In our tests of Black Forest Labs’ Flux model on H100 GPUs, the Together Kernel Collection significantly reduced text-to-image generation time: from 6000 ms in the baseline setup to 3050 ms with BF16 precision, and further to 2230 ms with FP8 precision, a substantial gain in inference speed.
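For reference, the speedup factors implied by those latencies can be checked with a quick calculation. The latencies are the ones quoted above; the short Python snippet below is purely illustrative:

```python
# Quick sanity check of the speedups implied by the Flux latencies quoted above.
baseline_ms = 6000   # baseline H100 setup
tkc_bf16_ms = 3050   # with Together Kernel Collection, BF16
tkc_fp8_ms = 2230    # with Together Kernel Collection, FP8

for label, latency_ms in [("TKC BF16", tkc_bf16_ms), ("TKC FP8", tkc_fp8_ms)]:
    speedup = baseline_ms / latency_ms
    print(f"{label}: {latency_ms} ms -> {speedup:.2f}x faster than baseline")
# TKC BF16: 3050 ms -> 1.97x faster than baseline
# TKC FP8: 2230 ms -> 2.69x faster than baseline
```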

Enterprise-Grade GPU Clusters at a Global Scale, Paired with AI Expertise

Together GPU Clusters provide a fully integrated and flexible AI infrastructure solution that caters to both startups and large enterprises. Configure your cluster with anywhere from 36 to tens of thousands of NVIDIA H100, H200, or GB200 NVL72 GPUs, enabling the flexibility to meet the unique needs of your AI projects. High-speed interconnects like InfiniBand and NVLink provide low-latency communication, with up to 1.8 TB/s GPU-to-GPU bandwidth to accelerate model training and convergence.
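As a rough, illustrative sketch of what that bandwidth means in practice, consider the time to move the gradients of a hypothetical 70B-parameter model (an assumed example, not a figure from this announcement) in BF16 across a 1.8 TB/s link. Real synchronization times additionally depend on topology, collective algorithms, and compute/communication overlap:

```python
# Rough illustration of what 1.8 TB/s GPU-to-GPU bandwidth means for gradient traffic.
# Assumptions (not from the announcement): a hypothetical 70B-parameter model with
# BF16 gradients (2 bytes/param), and a single full-payload transfer at link rate.
params = 70e9            # 70B parameters (assumed example model size)
bytes_per_param = 2      # BF16 gradients
link_bw_bytes = 1.8e12   # 1.8 TB/s GPU-to-GPU bandwidth (quoted above)

payload_gb = params * bytes_per_param / 1e9
transfer_s = params * bytes_per_param / link_bw_bytes
print(f"Gradient payload: {payload_gb:.0f} GB, ~{transfer_s * 1000:.0f} ms per full exchange")
# Gradient payload: 140 GB, ~78 ms per full exchange
```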

Our AI-native storage solutions, such as VAST Data and WEKA, allow for optimized workflows, seamlessly managing data ingestion, training, and inference at a petabyte scale. Together AI also offers advisory services to help you maximize model performance, providing dedicated support for kernel customization, system optimization, and strategic deployment.

Choosing Together GPU Clusters lets customers benefit from Together AI’s domain expertise. Train and run inference for your models in environments that are optimized end-to-end, leveraging Together AI’s software innovations and receiving specialized support for your cluster operations from the Together AI SRE and support teams.

Through tools like Slurm and Kubernetes, our clusters can handle both training and inference workloads with flexibility, ensuring optimal performance for demanding AI applications. Together AI’s support team works alongside your internal teams to ensure your AI infrastructure delivers maximum return on investment while scaling with your organization’s growing needs.

The Power of NVIDIA’s GB200 NVL72 in Together GPU Clusters

NVIDIA’s GB200 NVL72 is a revolutionary addition to Together GPU Clusters and expands upon the thousands of H100 and H200 GPUs Together AI and Hypertec Cloud have already deployed across North America. The GB200 provides up to 30X faster real-time inference for trillion-parameter models and up to 4X faster training compared to previous architectures. This performance is driven by its advanced Transformer Engine and FP4 precision capabilities, which allow for a more efficient representation of numerical values, improving both speed and memory usage. Built with fifth-generation NVLink, this liquid-cooled architecture creates a cohesive GPU powerhouse ideal for running high-demand models across multiple industries.
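To make the FP4 point concrete, here is a rough, illustrative estimate of weight memory for a trillion-parameter model at different precisions. The parameter count is simply the figure cited above, and the calculation counts weights only; activations, KV cache, and optimizer state are ignored:

```python
# Illustrative weight-memory footprint of a trillion-parameter model at different precisions.
# Assumptions (not from the announcement): weights only; activations, KV cache, and
# optimizer state are ignored; 1 TB = 1e12 bytes.
params = 1e12  # one trillion parameters

for label, bits in [("FP16/BF16", 16), ("FP8", 8), ("FP4", 4)]:
    terabytes = params * bits / 8 / 1e12
    print(f"{label}: {terabytes:.1f} TB of weights")
# FP16/BF16: 2.0 TB of weights
# FP8: 1.0 TB of weights
# FP4: 0.5 TB of weights
```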

Together We Forge the AI Frontier

Together AI is massively scaling its AI infrastructure platform, leveraging Hypertec’s expertise to meet the growing needs of frontier AI. This partnership ensures that Together GPU Clusters offer the reliability and rapid scalability that industry innovators and large-scale enterprises need to accelerate next-generation AI.

https://www.together.ai/blog/nvidia-gb200-together-gpu-cluster-36k