Nvidia’s growing impact on enterprise infrastructure was central at its recent GTC conference. GTC is the largest AI-focused event in the industry, bringing together nearly the entire AI ecosystem.
Applications and foundation models may deliver the enterprise value and drive the investment, but it is the specialized infrastructure beneath them that makes modern AI practical. Nvidia sits at the center of it all, enabling cloud providers and on-premises solution providers alike.
Nvidia is a Platform Company
The big news from Nvidia is the launch of its next-generation Blackwell accelerators, which will bring new levels of capability to AI training and high-performance inference for generative AI. Nvidia’s new BH200 …
While customers will likely still have access to raw GPUs, Nvidia packages its accelerators as system-level products that give enterprises a turnkey, optimized, and efficient platform for AI. This starts with the Nvidia GB200 NVL72, an advanced rack-scale AI supercomputer designed for large-scale AI and HPC challenges.
It features the Grace Blackwell Superchip, which pairs high-performance Nvidia GPUs and CPUs over a 900 GB/s NVLink-C2C interconnect for seamless data access. The architecture delivers 80 petaflops of AI performance, 1.7 TB of fast memory, and support for up to 72 GPUs.
Scaling up further, Nvidia introduced its DGX SuperPOD built on DGX GB200 systems. The SuperPOD scales to tens of thousands of GPUs, using Nvidia GB200 Grace Blackwell Superchips to tackle trillion-parameter models.
The next-generation system is designed for constant uptime with full-stack resilience and features an efficient, liquid-cooled design for extreme performance. It integrates Nvidia AI Enterprise and Base Command software, streamlining AI development and deployment while maximizing developer productivity and system reliability.
AI Continues to be Cloud First
Nvidia is laser-focused on moving beyond selling raw GPUs to delivering systems-level solutions to the market. That shift has caused some recent tension with the cloud service providers, who prefer to build their own systems, but the tension seems to be fading.
Amazon's AWS, the last CSP to announce support for the current-generation DGX Cloud, jointly announced with Nvidia a strategic engagement that extends beyond DGX support to include joint development of a new AI supercomputer as part of the revamped Project Ceiba.
Oracle Cloud, one of Nvidia’s first DGX partners, also announced broad support for the GPU giant’s new systems. Taking things further, Oracle will offer Nvidia’s BlueField-3 DPUs as part of its networking stack, giving its customers a powerful new option for offloading data center tasks from CPUs.
Microsoft Azure announced support for Nvidia’s new Grace Blackwell GB200 and advanced Nvidia Quantum-X800 InfiniBand networking. Similarly, Google Cloud will support Nvidia’s GB200 NVL72 systems, which combine 72 Blackwell GPUs and 36 Grace CPUs interconnected by fifth-generation NVLink.
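The arithmetic behind these rack-scale configurations is easy to check. The following is a minimal sketch using only the totals stated in the announcements; the two-GPUs-per-superchip split is inferred from those totals (72 GPUs paired with 36 Grace CPUs), not taken from a spec sheet:

```python
# Rack-level topology of the GB200 NVL72 as described in the announcements.
GPUS_PER_RACK = 72        # Blackwell GPUs per NVL72 rack
GRACE_CPUS_PER_RACK = 36  # Grace CPUs per NVL72 rack

# Inferred layout: each Grace Blackwell Superchip pairs one Grace CPU
# with two Blackwell GPUs (72 / 36 = 2).
gpus_per_superchip = GPUS_PER_RACK // GRACE_CPUS_PER_RACK
print(f"{gpus_per_superchip} Blackwell GPUs per Grace Blackwell Superchip")

# A DGX SuperPOD reaches "tens of thousands of GPUs" by adding racks;
# ceiling division gives the rack count for a hypothetical 10,000-GPU target.
racks_for_10k_gpus = -(-10_000 // GPUS_PER_RACK)
print(f"~{racks_for_10k_gpus} NVL72 racks to reach 10,000 GPUs")
```

The point of the sketch is simply that the cloud providers are adopting the same rack-scale building block, so capacity planning reduces to counting racks rather than individual GPUs.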
OEMs are Ready for AI
Despite common perception, AI is not a cloud-only play. Dell Technologies, HPE, Supermicro, and Lenovo all have substantial AI-related businesses. In their latest earnings, Dell and HPE each reported a healthy AI-related server backlog of roughly $2 billion.
Nvidia lent its support to the on-prem story with a joint announcement that it will collaborate with Dell on a new AI Factory initiative. Dell’s AI Factory combines Dell’s portfolio of compute, storage, networking, and workstations with Nvidia’s AI Enterprise software suite and Spectrum-X networking fabric, delivering an integrated AI infrastructure.
Dell also announced updates to its PowerEdge server line-up to support Nvidia’s next-generation accelerators, including a powerful new liquid-cooled eight-processor server.
Lenovo introduced new ThinkEdge servers designed for AI. Its new liquid-cooled eight-processor ThinkSystem SR780a V3 server is designed for low power usage effectiveness (PUE). Meanwhile, the Lenovo ThinkSystem SR680a V3 is an air-cooled server that supports AI acceleration with Intel processors and a range of Nvidia GPUs. Finally, the Lenovo PG8A0N is a 1U node with open-loop liquid cooling for accelerators and supports the new Nvidia GB200 Grace Blackwell Superchip.
Hewlett Packard Enterprise didn’t introduce new servers but announced new capabilities for its targeted generative AI solutions. HPE and Nvidia are collaborating on new HPE Machine Learning Inference Software, allowing enterprises to rapidly and securely deploy ML models at scale. The latest offering will integrate with Nvidia NIM to deliver Nvidia-optimized foundation models using pre-built containers.
https://www.forbes.com/sites/stevemcdowell/2024/03/26/ai-infrastructure-takes-center-stage-at-gtc-2024/amp/

