Google’s AI-powered next-generation global network: Built for the Gemini era

For over 25 years, we’ve been relentlessly pushing the boundaries of network technology, building a global infrastructure that powers Google and Google Cloud for billions of users and enterprise customers, from answering search queries, to streaming YouTube videos, to handling the most demanding cloud workloads. We now stand at another pivotal moment, driven by the transformative power of AI, and our network is once again evolving to meet the challenges and opportunities of this new era.

Here’s a look behind the scenes at the evolution of our global network, from enabling the early days of web search, to today, powering demanding AI workloads to bring AI’s benefits to everyone — people and businesses alike.

Our network’s evolution

There have been several fundamental inflection points for the Google network over the last 25 years, leading to three distinct networking eras:

Internet era: Our journey began in the internet era, when we primarily focused on offering our users across the globe a consistently high-quality experience in terms of reliability and latency — whether they were using Search, Maps, or Gmail. Key innovations included the B2 network; Bandwidth Enforcer (BwE); B4, our first fully software-defined backbone; our Orion software-defined network (SDN) controller; and our petabit-scale SDN data-center fabric, Jupiter.

Streaming era: With the advent of YouTube and similar services, streaming video became a significant portion of global internet traffic, a trend that continues even today. We adapted our network to deliver low-jitter and high-quality video around the world through technologies such as Google Global Cache, Espresso, QUIC, and TCP BBR.

Cloud era: The rise of cloud computing demanded greater resiliency, multi-tenancy, and security, which inspired innovations such as Andromeda, gRPC, PSP, and Swift.

Alongside technology innovations, our network footprint had to scale continuously to reach every Google user and customer with a consistent, high-quality experience. Today, this network spans over 2 million miles of lit fiber, including 33 subsea cable investments, with 202 network edge locations and more than 3,000 media content delivery network (CDN) locations across the globe. It connects 42 Google Cloud regions and 127 zones. We are also the most deeply peered cloud service provider network in the world.

https://storage.googleapis.com/gweb-cloudblog-publish/images/1_-_GGN_Eras.max-2200x2200.jpg

AI is driving unprecedented network demands 

As Sundar noted in his Google I/O 2024 keynote, we’ve been AI-first in our approach for more than a decade, investing in and innovating at every layer of the stack. From research and products to infrastructure — our global network fuels these AI innovations and brings them to you wherever you are in the world. All 15 of our half-billion-user products — including seven with 2 billion users — are powered by our Gemini model, and all of them rely on the Google global network to bring us closer to our ultimate goal: making AI helpful for everyone. We take this responsibility very seriously.

The AI era presents unique challenges that require a fundamental rethinking of our network architecture from four key perspectives:

  • The wide area network (WAN) is the new local area network (LAN): In the AI era, we train the largest of our foundation models across multiple campuses and even multiple metros to pool together large numbers of TPUs. The need for scalability has never been this acute, both for Gemini and for our customers building foundation models on Google Cloud infrastructure. Moreover, these ML applications have unique traffic patterns, such as highly bursty elephant flows. Understanding and managing these flows is critical for efficient network performance.
  • AI demands zero impact from any outages: AI foundation model training, fine-tuning and inferencing are intensive processes that rely on valuable GPU/TPU resources, and a prolonged outage can be very disruptive to them. In other words, network disruptions are simply unacceptable — our customers expect always-on connected network capacity.
  • A heightened need for security and control: AI models and the data they are trained on must be protected to ensure their integrity. In addition, there are evolving compliance requirements for AI models from different regions and for data in transit.
  • Operational excellence: From creating site reliability engineering (SRE) principles and leveraging AI/ML innovations in operations, to finding failure root causes using ML, we’re always exploring new ways to deliver excellence in our network operations. Simultaneously, the challenges of costs and complexity associated with linear scaling have pushed us to seek solutions that are efficient and sustainable for our customers.

New network design principles and innovations

To address these challenges, we’ve reimagined our next-generation network from the ground up, establishing four new design principles.

https://storage.googleapis.com/gweb-cloudblog-publish/images/2_-_GGN_Design_Principles.max-2200x2200.jpg
  1. Exponential scalability: Our network needs the ability and agility to handle massive amounts of data and traffic, especially in key regions serving AI traffic. The need for scalability has never been greater. In the AI era, the WAN is the new LAN and the continent is the data center.
  2. Beyond-9s reliability: The industry has traditionally understood reliability in terms of “3-9s,” “4-9s,” or “5-9s” of availability. Increasingly, that is simply not enough, as long-tail events, well within the x-9s specifications, matter as much as the average reliability of the network. Our users and customers expect deterministic performance, a limited impact radius for incidents, and proactive and ultra-fast mitigation. We are embarking on a journey to go “beyond 9s.”
  3. Intent-driven programmability: Billions of people use our network. They have unique requirements for security, compliance, resiliency, performance, and efficiency. To address all these requirements, we need a fully intent-driven, highly programmable network.
  4. Autonomous network: Automation and zero-touch have been buzzwords for the last decade. To support the next decade’s demands, we need autonomous networks that can run at scale 24×7 with minimal human intervention.
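
To make the “x-9s” vocabulary concrete, here is a small back-of-the-envelope calculation of the yearly downtime budget each level implies. It is only an illustration: the function name is ours, and the 365-day year is a simplifying assumption.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored (assumption)


def downtime_minutes_per_year(nines: int) -> float:
    """Yearly downtime allowed by an availability of `nines` nines (e.g. 3 -> 99.9%)."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n}-9s: {downtime_minutes_per_year(n):.2f} minutes/year")
```

The budgets come out to roughly 526, 53, and 5.3 minutes per year. The figure says nothing about how that downtime is distributed: a single five-minute incident and hundreds of sub-second blips can both fit inside a 5-9s budget, which is why an average alone does not capture long-tail events.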

Guided by these four design principles, we have built our next-generation global network by making foundational networking advancements.

Multi-shard network: We are moving beyond traditional vertical scaling limitations to elastic, horizontal scalability with our multi-shard network architecture. Each network shard is independent and enables horizontal scaling; not only can we scale the network within a shard, but we can scale the number of shards in the network. This allows for swift and substantial WAN bandwidth growth to support AI infrastructure demands. In fact, from 2020 to 2025, our WAN bandwidth grew a whopping 7x.
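
For a sense of scale, the 7x figure can be converted into an implied yearly growth rate. The sketch below assumes smooth compounding over five annual intervals (2020 to 2025), which is our simplification; actual growth was not necessarily even from year to year.

```python
def implied_annual_growth(total_factor: float, years: int) -> float:
    """Constant yearly growth rate that compounds to `total_factor` over `years`."""
    return total_factor ** (1 / years) - 1


rate = implied_annual_growth(7.0, 2025 - 2020)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption, 7x over five years works out to a little under 48% growth per year.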

Multi-shard isolation, region isolation, and protective reroute: Each of our network shards has its own control plane, data plane, and management plane, and operates independently of other shards. This multi-shard isolation enables a high level of resiliency that’s rare for global backbones at our scale; in fact, it parallels the level of resiliency typically achieved via multiple independent global ISPs, without the associated complexity of managing multiple networks. Regional isolation minimizes the impact of failures and limits the impact radius. Protective ReRoute, a transport technique for shortening user-visible outages, glues it all together – it lets hosts promptly detect and route around any network failures within a few seconds. With Protective ReRoute deployed in our network, we have observed up to 93% reduction in cumulative outage minutes.
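
The idea of a host routing around a failure on its own can be sketched with a toy model. Everything below is an assumption made for illustration, not a description of the production mechanism: we model the network as a set of parallel paths selected by hashing a flow label, and the host reacts to a simulated timeout by drawing a new label so the hash may land on a different path. The names `pick_path` and `send_with_repathing` are hypothetical.

```python
import random
import zlib


def pick_path(src: str, dst: str, flow_label: int, num_paths: int) -> int:
    """Stand-in for hash-based multipath: the same inputs always map to the same path."""
    key = f"{src}|{dst}|{flow_label}".encode()
    return zlib.crc32(key) % num_paths


def send_with_repathing(src, dst, failed_paths, num_paths, rng, max_attempts=16):
    """Return (path_used, attempts); path_used is None if every attempt hit a failed path.

    Each attempt that lands on a failed path is treated as a timeout, after
    which the host draws a fresh flow label and tries again.
    """
    flow_label = rng.getrandbits(20)  # IPv6 flow labels are 20 bits wide
    for attempt in range(1, max_attempts + 1):
        path = pick_path(src, dst, flow_label, num_paths)
        if path not in failed_paths:
            return path, attempt
        flow_label = rng.getrandbits(20)  # simulated timeout: re-draw the label
    return None, max_attempts


rng = random.Random(0)  # seeded so the run is repeatable
path, attempts = send_with_repathing("host-a", "host-b", {0, 1}, 4, rng)
print(f"delivered on path {path} after {attempts} attempt(s)")
```

The point of the sketch is that recovery here needs no routing-protocol convergence: the host only changes an input to the path hash, so the time to recover is bounded by how quickly it notices the failure.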

Fully intent-driven, fine-grained programmability: We’ve built a highly programmable network with SDN controllers, standard APIs, and universal network models such as the Multi-Abstraction-Layer Topology representation, or MALT. This enables fully intent-driven network controls that allow us to tailor our network to specific application needs, and meet the unique needs of our customers. For example, these controls can be used for regulatory compliance and data sovereignty, including control over data in motion.
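
As a rough illustration of what “intent-driven” can mean for data in motion, the sketch below states an intent declaratively (which regions traffic may traverse, and a latency ceiling) and lets a resolver pick a compliant path. The `Path` and `Intent` types, the `resolve` function, and the region names are all hypothetical; they are not part of any Google API or of MALT.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Path:
    hops: Tuple[str, ...]   # regions traversed, in order
    latency_ms: float


@dataclass(frozen=True)
class Intent:
    allowed_regions: FrozenSet[str]   # data must stay inside these regions
    max_latency_ms: float


def resolve(intent: Intent, candidates: List[Path]) -> Optional[Path]:
    """Return the lowest-latency candidate that satisfies the intent, or None."""
    compliant = [
        p for p in candidates
        if set(p.hops) <= intent.allowed_regions
        and p.latency_ms <= intent.max_latency_ms
    ]
    return min(compliant, key=lambda p: p.latency_ms, default=None)


candidates = [
    Path(hops=("eu-west", "us-east", "eu-central"), latency_ms=40.0),  # leaves the EU
    Path(hops=("eu-west", "eu-central"), latency_ms=55.0),
]
intent = Intent(allowed_regions=frozenset({"eu-west", "eu-central"}), max_latency_ms=80.0)
print(resolve(intent, candidates))
```

The faster path is rejected because it transits a region outside the allowed set, so the resolver returns the slower, compliant one. The operator states the constraint once, and the choice of path follows from it.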

Autonomous network: Over the last decade, we’ve transformed our network, moving from event-driven to machine-driven to now autonomous operations. This journey is fueled by ML, which provides us with actionable intelligence. Inspired by Google DeepMind’s work with graph neural networks (GNN) for accurate arrival-time predictions in Google Maps, we used GNN to create a digital twin of our network. This twin lets us predict and prevent outages, quickly pinpoint failures and their root causes, and optimize network capacity planning. As a result, we’ve observed failure mitigation times improve from hours to minutes, boosting our network’s efficiency and resilience with minimal human intervention.
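
The kind of what-if question a network twin can answer is easy to show on a toy model. To be clear, the sketch below is not a graph neural network and not the system described above: it is a plain breadth-first search over a hand-written topology, used only to illustrate asking “which single link failure would cut this pair off?” before it happens. The topology and function names are invented for the example.

```python
from collections import deque


def reachable(links, src, dst, failed=frozenset()):
    """Breadth-first search over undirected links, skipping any link in `failed`."""
    adjacency = {}
    for a, b in links:
        if (a, b) in failed or (b, a) in failed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def single_points_of_failure(links, src, dst):
    """Links whose individual failure disconnects src from dst."""
    return [link for link in links if not reachable(links, src, dst, failed={link})]


links = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(single_points_of_failure(links, "A", "D"))
```

In this topology the triangle A, B, C survives any single link loss, while the lone C to D link is flagged. A learned model aims to answer similar questions at much larger scale and with traffic and capacity taken into account.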

A network to unlock the full potential of AI

For cloud customers, Google’s global network offers the capacity, elasticity, and scale to deploy and leverage AI effectively; 24×7 app resilience with a reliable network; security through zero-trust principles; and performance that meets the needs of AI/ML applications. Furthermore, AI-driven efficiencies reduce maintenance toil, enable faster recovery, and improve ROI. And with Cloud WAN, starting today, Google Cloud customers can use Google’s global network to connect their global enterprises. For end users, this translates to expanded global reach, resilient mission-critical applications, zero-trust security to protect their data, and a performant network for power-intensive real-time apps. Taken together, these help ensure a great user experience.

This is a truly exciting time as we continue to push the boundaries of network technology and realize the transformative potential it holds for our customers in the AI era.

To learn more, we invite you to join us at our Google Cloud Next 2025 session, where we’ll share more details and demonstrate how our network continues to uphold Google’s mission and drive our customers’ success in the Gemini era. Keep an eye out for future blogs about the groundbreaking innovations that are powering Google’s next-generation global network.

https://cloud.google.com/blog/products/networking/google-global-network-principles-and-innovations