It is easy to say that NVIDIA Blackwell will sell like hotcakes in 2025. The company went into the platform architecture a bit more at Hot Chips 2024. Blackwell is something a lot of folks in the industry are excited about. As a quick note, NVIDIA's slides had an unusual layout, so there is a lot of white space when capturing them quickly. This was the one strange PDF of the more than a dozen posted today. Sorry for that. On the plus side, NVIDIA showed its latest data center roadmap.
Please note that we are doing these live at Hot Chips 2024 this week, so please excuse typos.
NVIDIA Blackwell Platform at Hot Chips 2024
NVIDIA is not talking about the individual GPU as much as it is talking about the cluster level for AI. That makes a lot of sense especially if you see talks from large AI shops like the OpenAI Keynote on Building Scalable AI Infrastructure at Hot Chips 2024.

NVIDIA does not just focus on building the hardware cluster; it also builds the software with optimized libraries.

The NVIDIA Blackwell platform spans from the CPU and GPU compute, to the different types of networks used for interconnects. This is chips to racks and interconnects, not just a GPU.

We did a fairly in-depth look at Blackwell during the NVIDIA GTC 2024 Keynote earlier this year.

The GPU is huge. One of the big features is the NVLink-C2C to the Grace CPU.

As NVIDIA’s newest GPU, it is also its highest-performance one.

NVIDIA uses the NVIDIA High-Bandwidth Interface (NV-HBI) to provide 10TB/s of bandwidth between the two GPU dies.

The NVIDIA GB200 Superchip is the NVIDIA Grace CPU and two NVIDIA Blackwell GPUs in a half-width platform. Two of these side-by-side means that each compute tray has four GPUs and two Arm CPUs.

NVIDIA has new FP4 and FP6 precision. Lowering the precision of compute is a well-known way to increase performance.

NVIDIA Quasar Quantization is used to figure out what can use lower precision, and therefore less compute and storage.
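The slides do not show how Quasar works internally, but the basic mechanic of trading precision for compute and storage can be sketched as follows. This is a generic uniform quantizer for illustration only: real FP4 (E2M1) uses non-uniform floating-point code points, and the `quantize_fp4_like` name and per-tensor scaling scheme here are assumptions, not NVIDIA's actual method.

```python
import numpy as np

def quantize_fp4_like(x, num_levels=16):
    """Toy symmetric quantizer with num_levels code points (FP4 has 16).

    Illustrative only -- not NVIDIA's proprietary Quasar Quantization,
    and uniform rather than the non-uniform E2M1 FP4 format.
    """
    half = num_levels // 2 - 1                      # 7 positive steps for 16 levels
    scale = np.max(np.abs(x)) / half                # per-tensor scale factor
    q = np.clip(np.round(x / scale), -half, half)   # 4-bit-sized integer codes
    return q * scale                                # dequantized approximation

x = np.array([0.9, -0.31, 0.07, 0.55])
xq = quantize_fp4_like(x)
err = np.max(np.abs(x - xq))                        # worst-case rounding error
```

The payoff is that each value is stored as a 4-bit code plus one shared scale, a 4x reduction versus FP16, and the deciding question, which Quasar is said to answer per layer, is whether the resulting rounding error is small enough to keep model quality.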

NVIDIA says FP4 for inference can get close to BF16 performance in some cases.

Here is an image generation task using FP16 inference and FP4. These rabbits are not the same, but they are fairly close at a quick glance.