Nvidia has announced world foundation models (WFMs) for creating generative physical AI to power autonomous factories and warehouses, traffic control systems, and even surgical rooms.
At CES in Las Vegas Monday, Nvidia trumpeted a slew of AI announcements, with an emphasis on generative physical AI that promises a new revolution in factory and warehouse automation.
“AI requires us to build an entirely new computing stack to build AI factories, accelerated computing at data center scale,” Rev Lebaredian, vice president of Omniverse and simulation technology at Nvidia, said at a press conference Monday.
Nvidia defines physical AI as the embodiment of artificial intelligence in humanoids, factories, and other devices within industrial systems. Large language models (LLMs), Nvidia says, are one-dimensional: they predict the next token, such as a letter or word. Image- and video-generation models are two-dimensional; they predict the next pixel. But physical AI, which Nvidia believes will power everything from surgical rooms to data centers, warehouses, factories, traffic control systems, and smart cities, requires models that can understand and interpret a three-dimensional world.
Much as the advent of generative pre-trained transformers (GPT) led to an explosion of gen AI development over the past several years, Lebaredian believes that today’s investments in agentic AI will unlock a wave of physical AI innovation.
AI gets physical
Nvidia places robots in three distinct buckets: knowledge robots (agentic AI), generalist and humanoid robots, and transportation robots (autonomous vehicles). The latter two categories consist of robots that can understand and interact with the physical world.
“These robots are driven by physical AI models that can understand and interact with their environments,” Lebaredian said. “While language models generate text or video from text or image prompts, physical AI will generate their next action based on instructions. Physical AI will completely revolutionize the world’s industrial markets, bringing AI into 10 million factories and 200,000 warehouses.”
Lebaredian said that most people think of Nvidia’s robotics and automotive businesses as the computer in the robot or car. But, he said, the real opportunity is the AI factory. Currently, developers of humanoid robots rely on hundreds of human operators performing thousands of repetitive demonstrations to teach a handful of skills. And autonomous vehicle (AV) developers need to drive millions of miles and process, filter, and label the thousands of petabytes of data they capture.
“Ultimately, no matter how much real-world data you collect and how many miles you drive, we’ll always need synthetic data to perfect models and ensure they can perform well even in long-tail, edge-case scenarios,” Lebaredian said.
To power the next wave of AI robotics, Nvidia has created what it calls its “three computer solution,” which consists of Nvidia DGX, for training the AI-based stack in the data center; Nvidia Omniverse running on Nvidia OVX systems, for simulation and synthetic data generation; and the Nvidia AGX in-vehicle computer, for processing real-time sensor data.
On Monday, the company added Nvidia Cosmos to that mix. Cosmos’ central innovation is its generative world foundation models (WFMs): neural networks that simulate real-world environments and predict outcomes based on text, image, or video input. The platform also provides advanced tokenizers, guardrails, and an accelerated video processing pipeline.
Training physical AI models for robots has proved costly and time-consuming due to the vast amounts of real-world data and testing required. Cosmos’ WFMs promise to streamline that process by giving developers the ability to generate massive amounts of photoreal, physics-based synthetic data to train and evaluate their existing models.
“Cosmos will dramatically accelerate the time to train intelligent robots and advanced self-driving cars,” Lebaredian said.
Nvidia said Cosmos WFMs are purpose-built for physical AI research and development and can generate physics-based videos from inputs ranging from text and images to video and robot sensor or motion data. The models will be available under an open model license that will give developers the ability to customize them with datasets, such as video recordings of AV trips or robots navigating a warehouse.
In his keynote at CES, Nvidia Founder and CEO Jensen Huang said developers will be able to use the Cosmos models for numerous use cases, including:
- Video search and understanding. Developers will be able to search video data for specific training scenarios, such as snowy road conditions or warehouse congestion.
- Controllable 3D-to-real synthetic data generation. Cosmos models can generate photoreal videos from controlled 3D scenarios.
- Multiverse simulation. Cosmos and Omniverse can generate every possible future outcome of the actions an AI model could take, helping it select the best and most accurate path.
Agentic AI on display
Physical robots weren’t the only AI-enabled entities targeted by Nvidia in Monday’s announcements. The company is also bolstering its support for agentic AI with new models, new blueprints, and a major ecosystem expansion.
“AI agents are the digital workforce that will work for us and work with us,” Lebaredian said.
At CES, Nvidia introduced its Nvidia Nemotron model family for agentic AI. Lebaredian said Nvidia’s Nemotron LLMs are fully optimized versions of Meta’s open-source Llama models, using Nvidia CUDA and AI acceleration to enable the high performance and lower compute costs crucial for agentic systems running multiple LLMs.
Nvidia is offering the Nemotron family in three sizes:
- “Nano” for cost-efficient, low-latency applications on PC and edge devices
- “Super” for high accuracy and throughput on a single GPU
- “Ultra” for the highest accuracy at data center scale
Nvidia announced two new Nvidia AI blueprints. The first, PDF to podcast, is an agent that can turn documents like whitepapers and financial reports into interactive podcasts. The second blueprint is a video analytics agent for analyzing video data to enable interactive search, summarization, and report generation.
Nvidia partners also announced a new set of blueprints:
- CrewAI announced a blueprint focused on code documentation for software development.
- Daily added a voice agent blueprint.
- LangChain expanded its structured report generation blueprint, which allows users to define a topic and specify an outline to guide an agent in searching the web for information relevant to a report.
- LlamaIndex added a blueprint for a document research assistant for blog creation.
- Weights & Biases added its W&B Weave capability to the AI Blueprint for AI virtual assistants. The blueprint streamlines the process of debugging, evaluating, iterating, and tracking production performance and collecting human feedback.
Nvidia partner Accenture also announced a new AI Refinery for Industry with 12 new industry agent solutions, including revenue growth management for consumer goods and services, clinical trial companion for life sciences, industrial asset troubleshooting, and B2B marketing.
https://www.cio.com/article/3632479/nvidia-unveils-generative-physical-ai-platform-agentic-ai-advances-at-ces.html?amp=1