What does it take to power the world’s most demanding AI models, like those behind ChatGPT?
At the most fundamental level, the world’s most demanding AI models require massive GPU compute working in lockstep. As AI systems scale, efficiently bringing that compute together depends increasingly on the network that connects it. Hundreds of thousands of GPUs must continuously stay synchronized, exchange data, and recover quickly from inevitable disruptions.
At this scale, the network directly determines how much compute can be utilized.
Today, OpenAI, in collaboration with AMD, Microsoft, and other industry leaders, announced that it is contributing Multipath Reliable Connection (MRC) to the Open Compute Project (OCP), making this new network protocol available to the broader ecosystem. As a long-standing contributor to the open ecosystems advancing Ethernet for the era of AI, AMD is helping transform AI networking into an open, programmable, production-ready foundation for customers building AI infrastructure.
For AMD, and the industry at large, MRC represents more than a new networking protocol for frontier-scale supercomputers. It is an important step toward a more open, programmable, and resilient foundation for AI infrastructure. As customers build larger AI clusters across cloud, enterprise, research, and sovereign AI environments, the industry needs networks that are not only fast in ideal conditions but also consistent, adaptive, and operationally practical in real-world deployments.
MRC: Built for AI Networking at Scale
MRC is designed specifically for large-scale AI training environments where traditional single-path networking models struggle. These workloads require continuous, high-speed communication, and even brief disruptions can impact overall system progress.
Instead of sending traffic along a single path, MRC distributes packets across multiple paths simultaneously. This reduces congestion hotspots and limits the latency variation that can slow synchronized training. When failures inevitably occur, MRC reroutes traffic in near real time, avoiding the delays associated with traditional network recovery.
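To make the idea concrete, here is a minimal Python sketch of multipath spraying with fast local reroute. The class, path names, and backoff constants are invented for illustration; the actual MRC specification defines the real packet formats, congestion signals, and recovery semantics.

```python
import random

class MultipathSender:
    """Illustrative model of multipath spraying with fast local reroute.

    Hypothetical sketch only: the class name, path IDs, and backoff
    constants are invented and are not part of the MRC specification.
    """

    def __init__(self, paths):
        # Start with equal sending weight on every available path.
        self.weights = {p: 1.0 for p in paths}

    def pick_path(self):
        # Spray each packet across paths in proportion to current weights,
        # rather than pinning an entire flow to one path.
        paths = list(self.weights)
        return random.choices(paths, weights=[self.weights[p] for p in paths])[0]

    def on_congestion_signal(self, path):
        # Back off a congested path; the other paths absorb the load.
        self.weights[path] = max(self.weights[path] * 0.5, 0.05)

    def on_path_failure(self, path):
        # Fast local reroute: drop the failed path immediately instead of
        # waiting for slow, global routing convergence.
        self.weights.pop(path, None)

sender = MultipathSender(["path-0", "path-1", "path-2", "path-3"])
sender.on_path_failure("path-2")  # a link goes down
print(sender.pick_path())         # traffic continues on the surviving paths
```

The key property is that congestion and failure are handled locally by the sender, so a single bad link shifts load rather than stalling the whole flow.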
In practical terms, MRC helps turn the network into a shock absorber for AI infrastructure. Instead of forcing every event to become a disruption, MRC gives the network a way to adapt locally and quickly so workloads can continue making progress. That matters because performance at AI scale is not defined by peak bandwidth alone. It is defined by how much useful accelerator capacity remains productive under real-world conditions.
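As a back-of-envelope illustration of that last point, consider a toy model in which every disruption stalls synchronized training until recovery completes. The numbers below are hypothetical, not measurements:

```python
def effective_utilization(seconds_between_failures: float, stall_seconds: float) -> float:
    """Fraction of time accelerators stay productive in a toy model where
    each failure stalls synchronized training for `stall_seconds`."""
    return seconds_between_failures / (seconds_between_failures + stall_seconds)

# Hypothetical numbers: one disruption every 10 minutes.
print(effective_utilization(600, 30))   # ~0.952 with slow, global recovery
print(effective_utilization(600, 0.5))  # ~0.999 with fast local reroute
```

In this simple model, cutting recovery time from tens of seconds to sub-second is worth far more than a marginal increase in peak bandwidth.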
AMD Contributions: From Development to Deployment
AMD played a formative role in shaping how MRC works today. AMD co-led authorship of the specification that defines next-generation AI networking and contributed advanced congestion control technology to improve performance under real-world conditions.
More importantly, this isn’t theoretical. AMD has implemented and deployed MRC, combined with AMD networking technology, at scale in test clusters with a leading cloud provider. This validation means the design reflects how networks actually perform under sustained AI workloads.
“As GPUs and CPUs continue to drive compute, the real bottleneck in scaling AI is the network. Today’s MRC announcement from OpenAI marks a major step forward for the industry. AMD’s programmability enables us to rapidly turn innovations like this into real-world performance at scale, where consistent, resilient throughput matters more than theoretical peak bandwidth.” – Krishna Doddapaneni, CVP, Engineering, NTSG, AMD
Programmability remains a key differentiator for AMD: its networking solutions are among the only ones to combine full hardware and software programmability with proven deployments, allowing networks to adapt as workloads evolve. Before the MRC specification was developed, AMD had a pre-standard implementation of an improved RoCEv2 transport protocol that evolved into today’s MRC standard. The open programmability of the AMD Pensando™ Pollara 400 AI NIC made that early implementation possible and gave AMD the flexibility to validate it quickly. And as one of the first and only companies to deploy MRC on a 400G NIC, AMD can offer a seamless transition to the AMD Pensando “Vulcano” 800G AI NIC, which also supports the MRC transport protocol.
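The congestion-control details AMD contributed are not spelled out in this post, but the general shape of per-path adaptation that a programmable NIC can run is easy to sketch. The following toy AIMD-style controller is an illustration only; all names and constants are invented and do not reflect the contributed algorithm.

```python
class PathRateController:
    """Toy AIMD-style per-path rate controller, for illustration only."""

    def __init__(self, line_rate_gbps: float):
        self.line_rate = line_rate_gbps
        self.rate = line_rate_gbps  # current sending rate on this path

    def on_ack(self) -> None:
        # Additive increase: probe for bandwidth while the path is clean.
        self.rate = min(self.rate + 0.5, self.line_rate)

    def on_ecn_mark(self) -> None:
        # Multiplicative decrease: back off when switches mark congestion.
        self.rate = max(self.rate * 0.8, 0.1)

# One controller per path lets a multipath transport shift load toward
# the paths that are currently uncongested.
controllers = {p: PathRateController(400.0) for p in ("path-0", "path-1")}
controllers["path-0"].on_ecn_mark()
print({p: round(c.rate, 1) for p, c in controllers.items()})
```

Because the NIC is programmable, logic of this kind can be updated in the field as transport standards evolve, which is how a pre-standard RoCEv2 improvement could track the emerging MRC specification.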
This combination of a defined specification, contributed technology, and a tested implementation positions AMD at the forefront of deploying MRC in real-world AI infrastructure.
Redefining Performance for AI Infrastructure
For AI at scale, performance is defined by how systems behave under real conditions, not by peak bandwidth. Consistent throughput, effective congestion handling, and quick recovery from failures, all while keeping GPUs synchronized and productive, are what power AI networking at scale. MRC can improve training efficiency and helps make the networking protocols connecting large GPU clusters highly reliable.
By helping define and contribute to MRC, AMD is advancing AI networking from concept to practical, production-ready infrastructure.
https://www.amd.com/en/blogs/2026/amd-advances-ai-networking-at-scale-with-mrc.html

