Generative artificial intelligence (AI) has transformed our world seemingly overnight, as individuals and enterprises use the new technology to enhance decision-making, transform customer experiences, and boost creativity and innovation. But the underlying infrastructure that powers generative AI wasn’t built in a day—in fact, it is the result of years of innovation.
AI and machine learning (ML) have been a focus for Amazon for more than 25 years, and they drive everyday capabilities like shopping recommendations and packaging decisions. Within Amazon Web Services (AWS), we’ve been focused on bringing that knowledge to our customers by putting ML into the hands of every developer, data scientist, and expert practitioner. AI is now a multibillion-dollar revenue run rate business for AWS. Over 100,000 customers across industries—including adidas, New York Stock Exchange, Pfizer, Ryanair, and Toyota—are using AWS AI and ML services to reinvent experiences for their customers. Additionally, many of the leading generative AI models are trained and run on AWS.
All of this work is underpinned by AWS’s global infrastructure, including our data centers, global network, and custom AI chips. There is no compression algorithm for experience: we’ve been building large-scale data centers for more than 15 years and servers based on graphics processing units (GPUs) for more than 12 years, so we have a massive existing footprint of AI infrastructure.
As the world changes rapidly, AWS continues to adapt and improve upon our strong infrastructure foundation to deliver new innovations that support generative AI at scale. Here are four ways we’re doing that.

Page overview

  • Delivering low-latency, large-scale networking
  • Continuously improving the energy efficiency of our data centers
  • Security from the ground up
  • AWS AI Chips
Delivering low-latency, large-scale networking

Generative AI models require massive amounts of data to train and run efficiently. The larger and more complex the model, the longer the training time, and as training time grows, operating costs rise and innovation slows. Traditional networks cannot deliver the low latency and large scale that generative AI model training demands.

We’re constantly working to reduce network latency and improve performance for customers. Our approach is unique in that we have built our own network devices and network operating systems for every layer of the stack: from the network interface card, to the top-of-rack switch, to the data center network, to the internet-facing and backbone routers. This approach gives us greater control over improving security, reliability, and performance for customers, and it enables us to innovate faster than others. For example, in 2019 we introduced Elastic Fabric Adapter (EFA), a network interface custom-built by AWS that provides operating system bypass capabilities to Amazon EC2 instances, enabling customers to run applications that require high levels of inter-node communication at scale. EFA uses Scalable Reliable Datagram (SRD), a high-performance, lower-latency network transport protocol designed by AWS, for AWS.
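As a rough sketch, a customer might request an EFA-attached instance by setting the network interface type when launching EC2 capacity. The snippet below only builds the `RunInstances` parameters (the AMI, subnet, and placement group names are placeholders); a real launch would pass this dictionary to boto3’s `ec2.run_instances` with valid credentials and an EFA-capable instance type.

```python
# Illustrative only: constructs EC2 RunInstances arguments that attach an
# Elastic Fabric Adapter (EFA). AMI ID, subnet ID, and placement group
# name are placeholder values, not real resources.

def efa_run_instances_params(ami_id: str, subnet_id: str,
                             instance_type: str = "p4d.24xlarge") -> dict:
    """Return RunInstances keyword arguments with an EFA network interface."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,   # must be an EFA-capable type
        "MinCount": 1,
        "MaxCount": 1,
        "NetworkInterfaces": [{
            "DeviceIndex": 0,
            "SubnetId": subnet_id,
            "InterfaceType": "efa",      # request an EFA instead of a standard ENI
        }],
        # A cluster placement group keeps instances physically close,
        # which helps latency-sensitive inter-node communication.
        "Placement": {"GroupName": "my-cluster-placement-group"},
    }

params = efa_run_instances_params("ami-0123456789abcdef0",
                                  "subnet-0123456789abcdef0")
# With boto3 (not invoked here): boto3.client("ec2").run_instances(**params)
```

Applications then reach the EFA through libraries such as Libfabric or MPI, which is where the operating-system-bypass path comes into play.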

More recently, we moved fast to deliver a new network for generative AI workloads. Our first-generation UltraCluster network, built in 2020, supported 4,000 graphics processing units (GPUs) with a latency of eight microseconds between servers. The new network, UltraCluster 2.0, supports more than 20,000 GPUs and reduces latency by 25%. It was built in just seven months, a speed that would not have been possible without our long-term investment in custom network devices and software. Internally, we call UltraCluster 2.0 the “10p10u” network because it delivers tens of petabits per second of throughput with a round-trip time of less than 10 microseconds. The new network yields at least a 15% reduction in the time to train a model.

Continuously improving the energy efficiency of our data centers

Training and running AI models can be energy-intensive, so efficiency efforts are critical. AWS is committed to running our business in an efficient way to reduce our impact on the environment. Not only is this the right thing to do for communities and for our planet, but it also helps AWS reduce costs, and we can then pass those cost savings on to our customers. For many years, we’ve focused on improving energy efficiency across our infrastructure. Some examples include:

  • Optimizing the longevity and airflow performance of the cooling media in our data center cooling systems.
  • Using advanced modeling methods to understand how a data center will perform before it‘s built and to optimize how we position servers in a rack and in the data hall so that we’re maximizing power utilization.
  • Building data centers to be less carbon-intensive, including using lower-carbon concrete and steel, and transitioning to hydrotreated vegetable oil for backup generators.

New research by Accenture shows these efforts are paying off. The research estimates that AWS’s infrastructure is up to 4.1 times more efficient than on-premises infrastructure, and that when workloads are optimized on AWS, the associated carbon footprint can be reduced by up to 99%. But we can’t stop there as power demand increases.

AI chips perform mathematical calculations at high speed, making them critical for ML models. They also generate much more heat than other types of chips, so new AI servers that require more than 1,000 watts of power per chip will need to be liquid-cooled. However, some AWS services utilize network and storage infrastructure that does not require liquid cooling, and therefore, cooling this infrastructure with liquid would be an inefficient use of energy. AWS’s latest data center design seamlessly integrates optimized air-cooling solutions alongside liquid cooling capabilities for the most powerful AI chipsets, like the NVIDIA Grace Blackwell Superchips. This flexible, multimodal cooling design allows us to extract maximum performance and efficiency whether running traditional workloads or AI/ML models. Our team has engineered our data centers—from rack layouts to electrical distribution to cooling techniques—so that we continuously increase energy efficiency, no matter the compute demands.

Security from the ground up

One of the most common infrastructure questions we hear from customers as they explore generative AI is how to protect their highly sensitive data. Security is our top priority, and it’s built into everything we do. Our infrastructure is monitored 24/7, and when data leaves our physical boundaries and travels between our infrastructure locations, it is encrypted at the underlying network layer. Not all clouds are built the same, and that difference is one reason a growing number of companies are moving their AI workloads to AWS.

AWS is architected to be the most secure and reliable global cloud infrastructure. Our approach to securing AI infrastructure relies on three key principles:

  • Complete isolation of the AI data from the infrastructure operator: the infrastructure operator must have no ability to access customer content and AI data, such as AI model weights and data processed with models.
  • Ability for customers to isolate AI data from themselves: the data remains inaccessible to customers’ own users and software.
  • Protected infrastructure communications: communication between devices in the ML accelerator infrastructure must be protected.

In 2017, we launched the AWS Nitro System, which protects customers’ code and data from unauthorized access during processing, fulfilling the first principle. The second principle is fulfilled by our integrated solution combining AWS Nitro Enclaves and AWS Key Management Service (AWS KMS). With Nitro Enclaves and AWS KMS, customers can encrypt their sensitive AI data using keys that they own and control, store that data in a location of their choice, and securely transfer the encrypted data to an isolated compute environment for inference. Throughout this process, the data is encrypted and isolated from the customers’ own users and software on their EC2 instance, and AWS operators cannot access it. Previously, Nitro Enclaves operated only in the CPU. Recently, we announced plans to extend this Nitro end-to-end encrypted flow to first-class integration with ML accelerators and GPUs, fulfilling the third principle.

AWS AI Chips

The chips that power generative AI are crucial, impacting how quickly, inexpensively, and sustainably you can train and run models.

For many years, AWS has innovated to reduce the costs of our services. AI is no different: by helping customers keep costs under control, we can ensure AI is accessible to customers of all sizes and industries. So for the last several years, we’ve been designing our own AI chips, including AWS Trainium and AWS Inferentia. These purpose-built chips offer superior price performance and make it more energy-efficient to train and run generative AI models. AWS Trainium is designed to speed up and lower the cost of training ML models by up to 50% compared to other training-optimized Amazon EC2 instances, and AWS Inferentia enables models to generate inferences more quickly and at lower cost, with up to 40% better price performance than other comparable inference-optimized Amazon EC2 instances. Demand for our AI chips is quite high given their favorable price performance relative to available alternatives. Trainium2, our third-generation AI chip, will be available later this year. It is designed to deliver up to four times faster training than first-generation Trainium chips and can be deployed in EC2 UltraClusters of up to 100,000 chips, making it possible to train foundation models and large language models in a fraction of the time while improving energy efficiency by up to two times.

Additionally, AWS works with partners including NVIDIA, Intel, Qualcomm, and AMD to offer the broadest set of accelerators in the cloud for ML and generative AI applications. And we’ll continue to innovate in order to deliver future generations of AWS-designed chips that deliver even better price performance for customers.

Amid the AI boom, it’s important that organizations choose the right compute infrastructure to lower costs and ensure high performance. We are proud to offer our customers the most secure, performant, cost-effective, and energy-efficient infrastructure for building and scaling ML applications.