Nvidia made two hardware announcements at this year’s Supercomputing conference in Atlanta
At SC24 – this year’s edition of the US-based annual supercomputing conference – Nvidia made a slew of announcements, spanning hardware, AI for science, computer-aided engineering, and updated versions of the company’s CUDA-Q and cuPyNumeric solutions.
Alongside announcing the general availability of the Nvidia H200 NVL PCIe GPU for air-cooled data centers, the company introduced its new GB200 NVL4 form factor.
The GB200 NVL4 connects four Blackwell GPUs to two Grace CPUs. Speaking with DCD at SC24, Dion Harris, head of data center product marketing at Nvidia, said the GB200 NVL4 was a “really unique configuration” that would be most useful for HPC workloads in the scientific community, which often have an interesting mix of CPU and GPU work.
“We deployed Grace Hopper last year and that was basically a one CPU to one GPU configuration,” Harris said. “But with Blackwell, we saw that a lot of the workloads that were being used didn’t leverage the CPU to a large extent, they were more GPU heavy, or they found that they can get better economics by doing one to two in terms of an actual super chip ratio.”
Aimed at HPC and AI-hybrid workloads, the superchip features 1.3TB of coherent memory and, according to Nvidia, delivers 2.2X the simulation, 1.8X the training, and 1.8X the inference performance of its predecessor, the GH200 NVL4 Grace Hopper Superchip.
The GB200 NVL4 is expected to be available to customers in 2H 2025.
SC24 also saw Nvidia announce the general availability of its H200 NVL PCIe form factor. The company says it is “ideal for data centers with lower power, air-cooled enterprise rack designs” and comes with flexible configurations to support AI and HPC workloads of all sizes.
Companies can use their existing racks and select the number of GPUs that best meets their needs, from one, two, four, or eight GPUs, with NVLink domains of up to four GPUs. The H200 NVL can be used to accelerate AI and HPC applications, while also improving energy efficiency through reduced power consumption.
“When we think about adopting GPUs, we have lots of exact form factors, whether it’s HGX or our new NVL72 rack configuration, those require specific types of both server enclosures and data center rack enclosures,” said Harris. However, Harris explained that because of its PCIe form factor, the H200 NVL can fit into any mainstream CPU and PCIe configuration.
According to Nvidia, NVLink gives the H200 NVL GPU-to-GPU communication that is 7X faster than PCIe Gen5, and the card has 1.5X more memory and 1.2X more HBM bandwidth than Nvidia’s H100 NVL offering. The company says this allows it to deliver up to 1.7X faster inference performance, while for HPC workloads, performance is boosted by up to 1.3X over H100 NVL and 2.5X over the Nvidia Ampere architecture generation.
The availability of a product specifically targeted at air-cooled data centers comes amid reports that Nvidia’s Blackwell processors are overheating when linked together in 72-chip data center racks.
Housed in a liquid-cooled rack design, the GB200 NVL72 contains 72 Blackwell GPUs, 36 Grace CPUs, and nine NVLink switch trays, each of which has two NVLink switches. The idea, the company said, is that this will enable the system to run as a single giant GPU, boosting performance.
When asked about the reports by DCD, Harris said Nvidia was working with all its partners “to make sure that Blackwell will be successful.”
“That’s how we approach everything. We’ve done the same thing with Hopper, we’ll do the same thing with the next generation. That’s just standard protocol,” added Harris.
Hopper will continue to have value
Despite the hype around Blackwell, Harris said that Hopper will continue to have “incredible value” among Nvidia customers.
Harris explained that when Grace Hopper launched, its tightly coupled CPU and GPU configuration made the architecture somewhat unique, allowing customers to do things that might not otherwise have been possible, such as certain aspects of climate simulation.
He added that while he was excited to see the work that will be done using Blackwell, Grace Hopper has acted as a great test case for what is possible with this chip architecture.
“A lot of that groundwork is being laid with Grace Hopper. Now, when Grace Blackwell comes to market next year, I think a lot of those [applications] will instantly carry forward, and you’ll see a lot more excitement just in terms of performance and benefits. But I think Grace Hopper is really having a transformative impact in terms of how some of these applications are being developed and run.”
https://www.datacenterdynamics.com/en/news/nvidia-announces-new-gb200-nvl4-superchip-at-sc24-but-says-theres-still-value-to-be-found-in-grace-hopper/