Nvidia GTC 2024 Wrapup: Blackwell, MediaTek, Omniverse And Vision Pro

This year, Nvidia’s annual GTC event returned to an in-person format. It was undoubtedly the biggest one I have ever attended, and I’ve been to every GTC since 2009, when it started. One of the biggest reasons for the growth of GTC is that the company now has a presence in more markets than ever before.

The other reason is that Nvidia is the undisputed market leader in AI computing for the cloud, where most AI computation occurs, both in training and inference. That said, Nvidia does have more competition this year than it did a year ago, and I suspect that the competition will only grow over time. That made this year’s GTC an important one for the company to reassert itself as the market leader to the world and its partners.

Blackwell GPUs — B100, B200, GB200 And More

First and foremost, Nvidia needed to reestablish itself as the leader in GPU technology with market-leading hardware. That’s why Nvidia announced the Blackwell family of products that scale from a single GPU to an entire datacenter of GPUs interconnected with Nvidia’s Mellanox InfiniBand technology, which the company acquired back in 2019. That acquisition has been critical to Nvidia’s ability to build hyperscale and HPC-scale systems with low latency and high bandwidth.

The 208B-transistor Blackwell GPU combines two 104B-transistor GPU dies into a single chip and comes in two flavors. Both the B100 and B200 variants feature 192GB of 8 Gbps HBM3E memory on an 8,192-bit bus (4,096 bits per die), which works out to 8 TB/s of memory bandwidth. The differences come in performance, with Nvidia quoting peak dense FP4 tensor performance of 7 PFLOPS for the B100 and 9 PFLOPS for the B200. Both GPUs support NVLink 5's 1,800 GB/s of interconnect bandwidth, as well as PCIe 6.0.
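As a sanity check on those numbers, HBM bandwidth is simply the per-pin data rate times the bus width. A minimal sketch, assuming the 4,096-bit interface is per die so that the full two-die chip presents 8,192 bits in total:

```python
# Rough check of Blackwell's quoted HBM3E bandwidth from the per-pin
# data rate and bus width. The 8,192-bit total (4,096 bits per die,
# two dies) is an assumption; 8 Gbps and 8 TB/s are the quoted figures.
pin_rate_gbps = 8            # HBM3E data rate per pin (gigabits/second)
bus_width_bits = 4096 * 2    # two Blackwell dies, 4,096 bits each

bandwidth_gbytes = pin_rate_gbps * bus_width_bits / 8  # bits -> bytes
print(bandwidth_gbytes)  # 8192.0 GB/s, i.e. the quoted 8 TB/s
```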

Both GPUs are manufactured using TSMC’s 4NP process node. The B100 consumes 700 watts of power and the B200 consumes a hefty 1 kilowatt. Nvidia designed the B100 to be a drop-in replacement for the H100, hence the same 700-watt TDP. The B100 is roughly 80% faster than the H100, which gives you a good idea of how much faster the Blackwell architecture is than Nvidia’s Hopper architecture. This is also Nvidia’s slowest Blackwell part; the B200 is more than 10% faster in most scenarios.

The GB200 “superchip” pairs two B200 GPUs with a Grace Arm server CPU over NVLink chip-to-chip interconnects with 900 GB/s of bandwidth. The GB200 claims 20 PFLOPS of FP4 tensor performance (40 PFLOPS with sparsity), which is more than double that of a single B200. As a combined unit, the GB200 also has 384GB of HBM3E memory. This “superchip” totals roughly 496 billion transistors: each of the four Blackwell dies (two per B200 GPU) has 104 billion, plus an 80-billion-transistor Grace server chip. The GB200 has a 2,700-watt TDP and comes in two flavors, one for rack mounting and another for more compact DGX/HGX systems.
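The transistor arithmetic is easy to verify, taking the per-component figures as quoted (Nvidia's numbers, likely rounded):

```python
# Tallying the GB200 "superchip" transistor count from the per-component
# figures quoted above (in billions).
blackwell_die = 104   # transistors per Blackwell die
dies = 4              # two B200 GPUs, two dies each
grace = 80            # Grace server CPU

total = blackwell_die * dies + grace
print(total)  # 496 (billion transistors)
```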

These GB200s will be interconnected with one another using NVLink into complete racks that Nvidia calls the GB200 NVL36 and NVL72. The NVL36 puts 36 B200 GPUs in one rack across 18 compute nodes carrying a single GB200 each, while the NVL72 doubles that to 72 GPUs with two GB200s per node. The rack uses fifth-generation NVLink and NVLink switching systems to interconnect its GPUs. Nvidia claims 1.33 exaFLOPS of FP4 inference with sparsity, which is incredible when you consider what it took to get an exaFLOP of any kind of performance only a few years ago.
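The naming falls out of the GPU math, since each GB200 carries two B200s. A quick sketch of the counts described above:

```python
# Mapping the NVL36 / NVL72 names to GPU counts from the rack
# configurations described above.
nodes_per_rack = 18
gpus_per_gb200 = 2                                 # two B200 GPUs per GB200

nvl36_gpus = nodes_per_rack * 1 * gpus_per_gb200   # one GB200 per node
nvl72_gpus = nodes_per_rack * 2 * gpus_per_gb200   # two GB200s per node
print(nvl36_gpus, nvl72_gpus)  # 36 72
```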

While the Blackwell architecture seems power-hungry chip by chip, it delivers major space and power savings at scale, which reflects the customers these systems are targeted to serve. Nvidia talked about training a 1.8-trillion-parameter GPT-MoE model: on its last-generation Hopper GPUs, that takes about 8,000 GPUs drawing 15 megawatts for roughly 90 days. By comparison, a Blackwell GB200 NVL72 system training the same 1.8-trillion-parameter model would require only 2,000 GPUs and 4 megawatts, which matters because power is becoming such a serious sticking point for both AI computing and cloud computing in general. Nvidia is definitely telling the right power and performance story for its customers and partners.
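Put in energy terms, Nvidia's claimed numbers work out to roughly a 3.75x reduction for the same 90-day run, taking the quoted figures at face value:

```python
# Energy comparison for the quoted 1.8T-parameter training runs:
# Hopper at 15 MW versus GB200 NVL72 at 4 MW, both for ~90 days.
# All inputs are Nvidia's claims as quoted above.
hours = 90 * 24

hopper_mwh = 15 * hours       # megawatt-hours for the Hopper run
blackwell_mwh = 4 * hours     # megawatt-hours for the Blackwell run

print(hopper_mwh, blackwell_mwh, hopper_mwh / blackwell_mwh)
# 32400 8640 3.75
```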

NIM For Generative AI

As a leader in the AI market, Nvidia must make sure it can accelerate the adoption of AI services and applications as quickly as possible. One way to achieve that is to ensure that all the latest AI models are optimized for Nvidia hardware—or, in short, to make implementing those models as easy as possible for developers. This is why Nvidia just announced a new catalog of NIM microservices and cloud endpoints for pretrained AI models, all optimized for CUDA-capable Nvidia GPUs. Enterprises can use these NIM microservices for a host of tasks including LLM customization, inference, retrieval-augmented generation and guardrails.

Nvidia will make these microservices available on its website at no charge and integrate them with its AI Enterprise 5.0 software suite. I believe that Nvidia is taking this approach to ensure that generative AI is not slowed down by poorly optimized models or by inaccurate results stemming from a lack of guardrails or retrieval-augmented generation. Nvidia is clearly focusing on improving both time-to-market and quality of results, and I think that's a positive for the industry and its growth.

MediaTek Auto Cockpit Lineup

At GTC 2024, Nvidia and MediaTek announced the next phase of the two companies’ automotive partnership. Dimensity Auto Cockpit combines MediaTek’s SoC-building capabilities and Nvidia’s GPUs that run Drive OS. At GTC 2024, MediaTek announced four separate 3nm products ranging from mainstream to high-end cockpit solutions. The CX-1 and CY-1 are MediaTek’s premium products that are pin-compatible with one another, while the CM-1 and CV-1 are MediaTek’s more mainstream products that are also pin-compatible with one another. This is an excellent approach because it allows MediaTek’s OEM customers to mix and match designs with the appropriate SoC based on the vehicle’s needs and its price segment.

All of these Dimensity Auto Cockpit chips combine an Armv9-A CPU with a licensed Nvidia “next-gen” GPU to enable AI and RTX graphics onboard the vehicle. There is also a multi-camera HDR ISP for the many camera features that vehicles will need in the future, as well as an integrated audio DSP for the latest voice assistants to ensure a smooth natural language processing experience. While it’s unclear exactly what kind of AI performance these chipsets will deliver, AI seems to be a major focus of this partnership. Also, MediaTek will support QNX, Linux and Android Automotive OS from the start. This seems to be a great platform for many of Nvidia’s existing customers such as BYD, which recently became the number-one EV manufacturer in the world. While I do believe that MediaTek’s initial designs will be picked up by Chinese OEMs such as Geely, NIO, SAIC Motor and XPeng, there is a good chance that momentum could bring MediaTek and Nvidia other OEM customers down the road.

https://www.forbes.com/sites/moorinsights/2024/03/26/nvidia-gtc-2024-wrapup-blackwell-mediatek-omniverse-and-vision-pro/amp/