Stretching Multi Site Clusters to Nearly 1 Million GPUs

Google and NVIDIA have teamed up to give users access to nearly one million NVIDIA GPUs through the newly launched A5X instances. The announcement is part of the pair’s latest collaboration to reduce inference costs and improve token throughput. The A5X system relies on NVIDIA’s network accelerators, which enable single- and multi-cluster computing infrastructure for AI workloads.

The A5X Instance: Purpose-Built for Agentic AI Workloads

The A5X instances are Google’s latest products designed specifically to run agentic artificial intelligence workloads. They are part of Google’s AI Hypercomputer portfolio, which also powers the firm’s Gemini platform and its consumer and enterprise AI offerings. Alongside the A5X instances, Google announced a slew of Hypercomputer upgrades, including new virtual machines powered by custom Arm-based CPUs, eighth-generation tensor processors and native PyTorch TPU support.

These new capabilities target agentic AI workloads, in which a group of AI agents each tackle one piece of a larger problem or task. The A5X instances are the first from Google designed to run on NVIDIA’s latest Vera Rubin AI GPUs.

Google Virgo & ConnectX-9: Scaling to a Million Vera Rubin GPUs

According to the announcement, the A5X will use NVIDIA’s ConnectX-9 NICs, which are designed to accelerate AI workloads in cloud infrastructure running on Ethernet. The NICs, along with Google’s Virgo platform, will allow users to access as many as 80,000 Rubin GPUs in a single cluster and 960,000 GPUs in a multi-site cluster.

Hardware Architecture	Max Single Data Center Cluster (chips)	Max Multi-Site Cluster (chips)
NVIDIA Vera Rubin GPUs	80,000	960,000
Google Custom TPUs	134,000	1,000,000+
Networking backbone: NVIDIA ConnectX-9 NICs and Google Virgo platform
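
The cluster figures above invite a quick back-of-the-envelope check. The sketch below uses only the numbers reported in this article to compute how many maximum-size single-site clusters a maximum-size multi-site cluster corresponds to. The helper name `site_equivalents` is hypothetical, and the result is simply a ratio of the published maximums, not a statement about how many data centers Google actually links together.

```python
# Maximum cluster sizes as reported in the article (chip counts).
MAX_CLUSTER = {
    "NVIDIA Vera Rubin GPUs": {"single_site": 80_000, "multi_site": 960_000},
    # The article only says "more than a million" for TPUs,
    # so 1_000_000 is treated as a lower bound here.
    "Google Custom TPUs": {"single_site": 134_000, "multi_site": 1_000_000},
}


def site_equivalents(single_site: int, multi_site: int) -> float:
    """Return the ratio of the multi-site maximum to the single-site maximum."""
    return multi_site / single_site


for name, sizes in MAX_CLUSTER.items():
    ratio = site_equivalents(sizes["single_site"], sizes["multi_site"])
    print(f"{name}: {ratio:.1f}x the single-site maximum")
```

For the Rubin GPUs the ratio works out to exactly 12. For the TPUs it is at least about 7.5, since the multi-site figure is given only as a lower bound.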

The ROI: 10x Lower Inference Costs & Higher Throughput

Google’s Virgo platform connects AI chips within a single data center and across multiple sites. In addition to NVIDIA’s Rubin GPUs, it supports Google’s tensor processing units (TPUs): Virgo can connect as many as 134,000 TPUs in a single data center and more than a million chips across multiple sites. According to NVIDIA, the A5X instance can deliver 10x lower inference cost per token and 10x higher throughput per megawatt than the previous generation.
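
To make the 10x claims concrete, here is a minimal sketch of the arithmetic. The two baseline values are placeholders chosen purely for illustration and are not figures from Google or NVIDIA; the function names are likewise hypothetical. Only the factor of 10 comes from the claim reported above.

```python
# The "10x" factor NVIDIA claims versus the previous generation.
IMPROVEMENT = 10


def scaled_cost(baseline_cost_per_million_tokens: float) -> float:
    """Cost per million tokens after a 10x reduction."""
    return baseline_cost_per_million_tokens / IMPROVEMENT


def scaled_throughput(baseline_tokens_per_sec_per_mw: float) -> float:
    """Tokens per second per megawatt after a 10x increase."""
    return baseline_tokens_per_sec_per_mw * IMPROVEMENT


# Placeholder baselines, NOT published numbers.
baseline_cost = 2.00            # dollars per million tokens
baseline_throughput = 50_000.0  # tokens per second per megawatt

print(f"cost: ${baseline_cost:.2f} -> ${scaled_cost(baseline_cost):.2f} per million tokens")
print(f"throughput: {baseline_throughput:,.0f} -> "
      f"{scaled_throughput(baseline_throughput):,.0f} tokens/s per MW")
```

Whatever the real baselines are, the two claims scale independently: one divides the cost of each token by ten, while the other multiplies the tokens produced per unit of power by ten.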

NVIDIA also briefly touches on physical and industrial AI, noting that products from firms such as Cadence and Siemens run on its infrastructure and are available on Google Cloud. The firm adds that Google’s Gemini platform can also deploy agentic models and workflows across industries such as cybersecurity.

https://wccftech.com/nvidias-rubin-lands-inside-googles-virtual-machine-stretching-multi-site-clusters-to-nearly-1-million-gpus/amp/

NVIDIA’s Rubin Lands Inside Google’s Virtual Machine : US Pioneer Global VC DIFCHQ SFO NYC Singapore – Riyadh Swiss Our Mind
