With artificial intelligence driving unprecedented demand for AI networking solutions, the push for increased performance, scalability and efficiency in modern infrastructure is top of mind for many companies.
Broadcom Inc.’s latest advancements, from energy-efficient chips to innovative AI networking technologies, address these challenges head-on, enabling dense AI workloads while cutting power consumption and costs. With a collaborative approach and a focus on reliability, Broadcom is setting the stage for future-ready systems designed to meet the escalating demands of AI and high-performance computing.
At SC24, during three interviews with theCUBE, Broadcom leaders shared insights into the critical innovations shaping AI and high-performance computing. Collaborative efforts with key partners and a focus on reliability and adaptability underscore the company’s vision for meeting the growing demands of AI, ensuring systems are future-ready and cost-efficient, according to Hasan Siraj (pictured), head of software products, ecosystem, at Broadcom.
“There [are] people who know how to manage Ethernet-based networks,” Siraj said. “There are troubleshooting tools, monitoring tools that are available. Whenever you’re building an AI network, you have a front end, a backend, a storage and an outband management network that’s all Ethernet. It’s a standard way of managing all of it.”
Siraj and Hemal Shah, distinguished engineer and architect at Broadcom, spoke with theCUBE Research’s John Furrier, Dave Vellante and Savannah Peterson at SC24, during an exclusive broadcast on theCUBE, SiliconANGLE Media’s livestreaming studio. They discussed Broadcom’s advancements in AI networking solutions, focusing on energy-efficient technologies, scalable systems and collaborative innovations. (* Disclosure below.)
Broadcom drives AI innovation with power-efficient networking and scalable systems
Hasan Siraj of Broadcom highlighted the company’s groundbreaking advancements in power efficiency and networking solutions, emphasizing their critical role in AI and HPC. Broadcom’s Tomahawk 5 chip and Thor 2 NIC were showcased as transformative technologies, enabling up to 75% reductions in power consumption and cooling needs while maintaining exceptional performance.
“We’ll see the scale become bigger and bigger over the next four years, but we’ll also see this go down to other verticals,” Siraj said. “We will see enterprise adopters … and from a networking perspective, we believe Ethernet will win. It’s already on its way, and it can scale from the largest clusters on the planet to whatever optimizations that are required for inference and other use cases.”
Collaboration is key to Broadcom’s strategy, as its partnerships with Dell Technologies Inc. and Denvr Dataworks Corp. help create open, scalable systems that seamlessly integrate networking, storage and computing components, Siraj stressed.
Read More: https://siliconangle.com/2024/11/20/ai-factories-dell-broadcom-denvr-dataworks-sc24/
Networking as the backbone of scalable AI
Networking plays a transformative role in enabling large-scale AI workloads, Siraj said during the keynote analysis on day 2 at SC24.
The industry has shifted from traditional server-based systems to clustered architectures, where networking serves as the essential “glue” for scalability. AI also brings unique demands, such as massive bandwidth and low latency, and networking inefficiencies can critically hinder AI job completion and infrastructure utilization, Siraj added.
It is likewise important to address challenges such as congestion management and failure recovery as GPU clusters scale to unprecedented sizes, potentially reaching millions of nodes, according to Siraj. Broadcom’s forward-looking approach aims to meet the increasing complexity of AI infrastructure with strong, future-ready solutions.
“If you are training a large model and these models are growing at an exponential, they don’t fit in a CPU, and a core of a CPU, virtualization is no play,” he explained. “This is why you cannot fit a model within a server or two servers or four servers. That is why you need a cluster. When you have a cluster and everything is spread out, you need glue to put this all together. That is networking.”
Read More: https://siliconangle.com/2024/11/20/broadcom-clustered-systems-sc24/
Scalable AI networking solutions and innovation spotlighted at SC24
High-performance computing is advancing rapidly to address the increasing demands of AI and machine learning. Scalable AI networks and open standards play a critical part in driving that innovation.
“Dell and Broadcom with our other partners, we are working to build really high bandwidth, high network utilized fabrics,” Shah said. “In partnership, what we’ll bring together is a lot of software integration, the whole diagnostic monitoring of the fabric, which makes life easy for deployments.”
Looking ahead to SC25, Shah expects a continued focus on scalability, alongside emerging developments such as advancements in Ultra Ethernet Consortium (UEC) specifications and enhancements in fabric solutions. These ongoing innovations aim to meet the ever-growing demands of the AI and machine learning landscape while maintaining a commitment to quality and ease of use.
“We should be able to talk about some of the enhancements we are doing at the solution, which are already in the works, but it’ll be more mature next year,” he said.
Read More: https://siliconangle.com/2024/11/22/scalable-ai-networks-sc24/
Find all of our reporting here, and watch the full playlist from our Nov. 19-21 broadcast below:
(* Disclosure: Dell Technologies Inc. sponsored these segments of theCUBE. Neither Dell nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)
Photo: SiliconANGLE
https://siliconangle.com/2024/11/22/ai-networking-solutions-sc24/