A look at the networking inside the hyperscaler’s data centers
By any measure – quantity, square footage, industry percentage – Amazon Web Services (AWS) has one of the largest data center footprints in the world.
In total, the AWS Cloud spans 114 Availability Zones across 36 regions, and each Availability Zone may be served by one or several data centers. That number is constantly growing.
Each data center features myriad internal and external connections, and with that comes a wildly complex networking system, one that is designed and mostly developed by AWS.
“We started developing our own hardware around 15 years ago,” Matt Rehder, VP of core networking at AWS, tells DCD.
The reason behind that move, which becomes clear as Rehder goes on to detail the networking in AWS data centers, is the need for simplicity and scale.
“Commonly, companies will use slightly different hardware in each of those roles – connecting the servers, or connecting different data centers,” Rehder says. “AWS is a bit different in that way because, many years ago, we decided to basically have the same thing everywhere to make our lives simpler.”
Of course, “simpler” is a relative term. Connecting massive data centers with hundreds or thousands of servers can never be quite considered “simple.”
But, in a bid for an easier life, AWS developed what it calls a “Brick” – a rack of AWS network switches.
“It’s a building block, and we can use Bricks wherever we need to – and then we make some of them functionally unique with some software changes,” Rehder explains.
Each AWS data center has lots of Bricks, which can be either “connecting a bunch of servers to the network, connecting other AWS data centers together in the local region, connecting to third-party networks, or even connecting AWS Regions together across long distances.”
Slightly different software gets loaded onto a Brick depending on the task, but “under the covers, they look almost identical,” Rehder says. This applies even when looking at AWS Local Zones, or to an extent, Outposts.
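The "Brick" idea Rehder describes, one physical building block specialized purely by the software loaded onto it, can be sketched roughly as follows. All names, roles, and config fields here are hypothetical illustrations, not AWS's actual tooling:

```python
# Illustrative sketch of the "Brick" concept: identical hardware everywhere,
# with the role decided only by which software/config is loaded.
# Every name and value below is a made-up assumption for illustration.

BASE_HARDWARE = {"switch_model": "single-ASIC-switch", "ports": 64}

ROLE_CONFIG = {
    "server":    {"routing": "leaf", "features": ["server-facing policy"]},
    "intra-az":  {"routing": "spine", "features": ["az-interconnect"]},
    "peering":   {"routing": "edge", "features": ["third-party peering"]},
    "long-haul": {"routing": "backbone", "features": ["inter-region links"]},
}

def build_brick(role: str) -> dict:
    """Same hardware for every role; only the loaded software differs."""
    if role not in ROLE_CONFIG:
        raise ValueError(f"unknown brick role: {role}")
    return {**BASE_HARDWARE, "role": role, "software": ROLE_CONFIG[role]}

server_brick = build_brick("server")
backbone_brick = build_brick("long-haul")

# "Under the covers, they look almost identical": the hardware fields match,
# and only the software layer distinguishes the two roles.
assert all(server_brick[k] == backbone_brick[k] for k in BASE_HARDWARE)
assert server_brick["software"] != backbone_brick["software"]
```

The point of the sketch is the shape of the decision, not the details: one hardware catalog entry, many software personalities.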
Local Zones are one of AWS’ “Edge” offerings, and see the AWS cloud brought closer to end users in a data center where the company does not have a cloud region.
In some cases, these are third-party data centers, but according to Rehder, they are increasingly Amazon-owned facilities that are not part of a geographic cloud region.
“If you walk into a Local Zones facility, it looks much the same as one of our other data centers,” he says. “There are some minor variations, for example, we add some extra layers of security and resilience because it’s further from the region itself.”
Outposts, on the other hand, are AWS’ on-premises offering in which AWS hardware goes to a customer’s data center.
Though Rehder says Outposts are “different” by their nature, he adds: “We try and make it as much of a pure AWS experience as possible, so it’s the same servers we use, and the same effective network concepts, but oftentimes there are some extra layers on top of that for integrating into the customer’s network in the building.”
Brick by brick
AWS’ Bricks are of its own design, and within the Brick itself, the company also designs its own networking hardware and switches.
The hardware, Rehder notes, has a single switching ASIC within it – a strategy that differs from most vendors, which use multiple chips for this task.
“This is very good for certain places, but because you have a lot of these chips inside the system that are connected together, there’s a lot of internal complexity in the switch, in the router, and that’s hard to see or manage,” Rehder argues. “It’s much easier if you just break it down to that primitive unit, like the atomic unit, of just giving me the one chip in one device.”
AWS doesn’t actually make the ASIC that goes into its network switches – nor will it specify which company does – but it does lay out the specifications for the components and the behaviour of the switch, and it iteratively improves its software and hardware design.
“We still have vendor devices, and in our experience, our switches are much better than theirs.”
Another area of hardware that AWS has “gone very deep into” is optical transceivers, which shine laser light down the fiber in cables.
Rehder explains that AWS network switches can have 32 or 64 ports each, which creates a “fascinating complexity.”
“Back in the days of 10G or 25G, optical transceivers were generally reliable, but it gets more complicated to push more speed through the fiber, and as we’ve gotten to say 400G or 800G, the reliability has gone down,” he says.
“We started to make material investments and went deeper into specifying how these optical module designs work, and trying to understand why links were failing.
“That led to a lot of investments in our fiber plant, and in how we redesign the modules to make them simpler and more reliable.”
The biggest investment, Rehder says, was in the software that runs on these modules.
“Two or three years back, we decided to start investing in that, and we now own all the software that runs on those modules, and we actively patch them all the time,” he explains. “Now, our 400G generation is actually more reliable than our 100G. This is a huge advantage for us, especially with GenAI, where there are more links and the workloads in general are just more sensitive to failures.”
Oodles of noodles
Connecting all of this hardware is, to put it politely, rather a lot of cabling.
The network switches are stacked up and connected together in the rack, and are then connected to AWS’ server racks via fiber optic cables. According to Rehder, the cables within the racks currently use copper (which, he notes, is “much fancier” than it used to be), but this isn’t possible outside the rack due to distance limitations.
When asked how much cabling AWS has, Rehder laughs. “We have many, many, many miles of cables. I don’t know the exact number, but it’s definitely one of the harder parts of data center networking.
“The largest AI data center we have has more than 100,000 links or fiber connections within one building.”
Rehder estimates that, given the “huge density of fiber connections in all AWS data centers,” a single data center could have hundreds or thousands of miles of fiber cables within it, and they all have to be very carefully organized. This, apparently, is not as simple as a color-coding scheme.
“In the past year of my job, I’ve probably spent more time on the fiber cables – the building, how we design and install them – than I have on any other part of the technology stack,” Rehder emphasizes.
When establishing a new data center, Rehder says the company wants to turn up capacity as quickly as possible, and this means bringing in all the network racks and cables and plugging them in. “That is very real, physical work, and the sheer volume of cables is very high. The more connections you need to install, the longer the lead time you’ll have.”
He also reiterates that it isn’t like plugging in a power outlet – when working with fiber, any disturbance can cause a degradation of the signal.
To help with this, AWS uses “structured cabling,” which Rehder illustrates with a road analogy.
“You can route or bundle lots of the little pairs or threads of glass into creative physical structures. You basically have a freeway of these cables, and they can be dense – with hundreds or thousands of the fibers bundled in them, that go between rooms or rows, and then break off into smaller outlets and off-ramps into smaller ‘streets’ that flow to the racks.”
Then, by using higher-density connectors, AWS can reduce the number of things that need to be installed or touched over time.
This saves time in the actual deployment, but adds swathes of complexity during the planning stage.
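A back-of-the-envelope calculation shows why the “freeway” of bundled fiber and higher-density connectors matters. The trunk sizes below are illustrative assumptions (144-fiber trunks are a common industry size, but AWS's actual choices aren't disclosed); only the 100,000-link figure comes from the article:

```python
# Rough sketch of how trunking reduces the number of things technicians
# must install and touch. Trunk sizes are illustrative assumptions.
import math

def connections_to_install(fiber_links: int, fibers_per_trunk: int) -> int:
    """Cable runs needed to carry the given links, assuming each link is
    one transmit/receive fiber pair bundled into shared trunks."""
    fibers = fiber_links * 2  # each link = one fiber in each direction
    return math.ceil(fibers / fibers_per_trunk)

links = 100_000  # the article's figure for AWS's largest AI building

# Duplex patch cords (2 fibers each) versus a hypothetical 144-fiber trunk:
individual = connections_to_install(links, fibers_per_trunk=2)
trunked = connections_to_install(links, fibers_per_trunk=144)

print(individual)  # 100000 separate cables to route and plug
print(trunked)     # 1389 trunk runs, broken out near the racks
```

The deployment-time saving is roughly the ratio between those two numbers; the cost, as the article notes, is that the breakout points all have to be planned in advance.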
Machine learnings
The cloud giant applies a strategy it calls “oversubscription” to the networking in its data centers.
Rehder explains this as having “more capacity facing the servers than you might have for them to talk to the Internet or to talk between data centers.” The company can balance this “dynamically” when deploying capacity, so it can “turn the dial up or down depending on the workloads of the servers we have in place,” he adds.
This, however, does not apply as effectively with AI or machine learning hardware. While the oversubscription model does enable AWS to “dial up” more capacity, Rehder notes that a machine learning or generative AI server will often need “two to three times the bandwidth of what other servers would have.”
“A lot of the ML networks aren’t oversubscribed, which means you build the capacity so all the servers can indeed talk to each other all at once, but that’s something our fundamental architecture and building blocks allow us to do relatively easily,” he says. “It’s the same hardware, same switches, same concept, same operating system.”
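The ratio Rehder is describing can be made concrete with a small sketch: server-facing bandwidth divided by uplink bandwidth out of a rack or Brick. The port counts and speeds below are illustrative assumptions, not AWS figures:

```python
# Sketch of the oversubscription ratio: how much traffic the servers can
# collectively offer versus what the uplinks can carry at once.
# Port counts and speeds are illustrative assumptions.

def oversubscription_ratio(server_gbps: float, uplink_gbps: float) -> float:
    """> 1.0 means the network is oversubscribed; 1.0 is non-blocking,
    i.e. every server can talk at full rate simultaneously."""
    return server_gbps / uplink_gbps

# A general-purpose rack: the "dial" is turned toward the servers.
general = oversubscription_ratio(server_gbps=48 * 100, uplink_gbps=16 * 100)
print(general)  # 3.0 -> 3:1 oversubscribed

# An ML cluster built non-blocking, as Rehder describes:
ml = oversubscription_ratio(server_gbps=32 * 400, uplink_gbps=32 * 400)
print(ml)  # 1.0 -> not oversubscribed
```

Note how the non-blocking ML case simply moves both sides of the ratio to the same value; per the article, it is the same hardware and operating system either way, just more of the capacity pointed between servers.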
Across the industry, AI hardware is being refreshed at a rapid rate, with Nvidia having moved to a roadmap of yearly GPU updates. Networking is different: Rehder notes that new generations emerge every three to four years.
“It depends on where industry hardware is going,” says Rehder. “By staying on generations a little longer, you do get to a level of maturity with working the kinks out where it’s easier and more reliable for customers if you aren’t actually making changes all the time.”
“It is moving a little faster right now, but it’s still early days in terms of generative AI demand. We haven’t really pulled in our hardware refresh cycles yet; we are more keeping an eye on when the next generation is coming, and whether we want to do it when we normally would, or try and move a little bit earlier.”
A technology shift that Rehder sees as potentially being interesting in the context of ML is co-packaged optics. This is an advanced integration of optics and silicon on a single packaged substrate, and aims to help increase bandwidth while reducing power consumption. However, the technology remains stubbornly on the horizon, perpetually ready to come to fruition “next year.”
But Rehder believes we are getting closer to that “next year.” He says: “It’s real and it’s happening, but there’s a trade off.
“There are advantages in less power usage, and the short distances things are travelling. But if you bake all the optics into the switch, then you’ve eaten all the costs associated with it, and if you aren’t going to actually use every port on the switch to plug something in, there’s an extra cost associated.”
AWS also keeps an eye on other emerging technologies, though they aren’t necessarily in place as yet.
On the topic of Free Space Optics, Rehder responds: “Every couple of years, someone has the April Fool’s Day project of Free Space Optics with the disco ball. I don’t think we are there yet in terms of where it’s a technology that is interesting.”
He is, however, more optimistic about the potential of hollow-core fiber, though he notes that this is likely to play a bigger role outside of the data center than in.
AWS put hollow-core fiber into production for the first time in 2024. While the fiber can reduce latency, the gain within a data center itself is negligible.
“The latency inside the data center is already relatively low from the fiber in the building – they are all short runs, and there’s a marginal advantage to reducing that further.
“Everything from testing and talking to customers suggests a couple percent reduction in latency doesn’t really move the needle much in terms of performance,” Rehder shrugs.
Reliability is king
While new and experimental technology is always exciting, for a business the size of AWS, reliability and uptime are the most important things for customers.
While AWS looks at new solutions, Rehder explains that there’s always the question of “can we do something fundamentally different with new hardware?” If the answer is no, then “we don’t see an advantage in that,” he says.
He adds: “Moving to new hardware unlocks other risks, and it’s nothing against any of the providers; there are always more kinks and bugs when you are learning.
“The new generations are always going to see complexities – and when you are looking to make things faster, bigger, and better, and pushing lots of limits across lots of different processes, the jump to the next complexity level is going to be even harder than the previous one.
“There’s going to be a lot of learnings, for example, in manufacturing and design processes. We try not to bring our customers through that. We’d rather already have the kinks worked out, and know it’s going to work at scale.”
With that backdrop, full-scale outages caused by the network are very rare – but part of preventing them is always being prepared.
As Rehder tells DCD: “Everything fails. Everything will fail more than you expect it to, and it will fail in unique and exciting and creative ways.”
By opting for a “simpler” networking solution, Rehder and AWS will be hoping to avoid too much of this kind of excitement in the future.

