AWS activates Project Rainier: One of the world’s largest AI compute clusters comes online

The collaborative infrastructure innovation delivers nearly half a million Trainium2 chips in record time, with Anthropic scaling to more than one million chips by the end of 2025.

A data center technician walking within a data center hall near UltraServers.

Written by Amazon Staff

Project Rainier is now in use, featuring one of the world's largest AI compute clusters with nearly half a million Trainium2 chips.
AWS deployed this massive AI infrastructure project less than one year after it was first announced, with partner Anthropic already running workloads.
Anthropic is actively using Project Rainier to build and deploy its industry-leading AI model, Claude, which AWS expects to be on more than 1 million Trainium2 chips by the end of 2025.

Project Rainier, one of the world's largest AI compute clusters, is now fully operational, less than one year after it was first announced. This groundbreaking project represents a major milestone in AWS's commitment to advancing its AI infrastructure at an unprecedented scale.

AWS collaborated with Anthropic, an AI safety and research leader, on Project Rainier, which features nearly half a million Trainium2 chips and provides more than five times the compute power Anthropic used to train its previous AI models.

Anthropic is actively using Project Rainier to build and deploy its industry-leading AI model, Claude. Claude is expected to be on more than 1 million Trainium2 chips—for workloads including training and inference—by the end of the year.

This AI compute power is being used to build and deploy future versions of Claude. The more compute that is dedicated to training this frontier model, the smarter and more accurate it will become.

A mountain of compute

Project Rainier, named after the 14,410-foot (4,392-meter) stratovolcano that can be seen from Seattle on a clear day, is an endeavor as monumental as its namesake.

A Data Center Technician working on a laptop device.

Spread across multiple data centers in the United States, the project is of a size unlike anything AWS has ever attempted before.

“Project Rainier is one of AWS’s most ambitious undertakings to date,” said Ron Diamant, an AWS distinguished engineer and head architect of Trainium. “It’s a massive, one-of-its-kind infrastructure project that will usher in the next generation of artificial intelligence models.”

Chips chips chips

To deliver on this bold vision, Project Rainier is designed as a massive “EC2 UltraCluster of Trainium2 UltraServers.” The first part refers to Amazon Elastic Compute Cloud (EC2), an AWS service that lets customers rent virtual computers in the cloud rather than buying and maintaining their own physical servers.

Trainium2 chip

The more interesting bit is Trainium2, a custom-designed AWS AI chip built specifically for training artificial intelligence systems. Unlike the general-purpose chips in your laptop or phone, Trainium2 is specialized for processing the enormous amounts of data required to teach AI models how to complete all manner of different and increasingly complex tasks—fast.

With Project Rainier, AWS has already built Trainium2 infrastructure that’s 70% larger than any other AI computing platform in AWS history.

To put the power of Trainium2 in context: a single chip is capable of completing trillions of calculations a second. If, understandably, that’s a little hard to visualize, consider that it would take one person more than 31,700 years to count to one trillion. A task that would take a human millennia to complete can be done in the blink of an eye with Trainium2.
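The counting comparison can be checked with a few lines of arithmetic. The sketch below assumes a pace of one number per second and a 365-day year; the article states neither, and the function name is purely illustrative.

```python
# Assumption: a 365-day year with no leap days.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_count(target: int, numbers_per_second: float = 1.0) -> float:
    """Years needed to count up to `target` at a steady pace.

    The default pace of one number per second is an assumption,
    not a figure from the article.
    """
    return target / numbers_per_second / SECONDS_PER_YEAR


years = years_to_count(10**12)
print(f"{years:,.0f} years to count to one trillion")
```

Under those assumptions the result lands a little above 31,700 years, in line with the figure quoted above.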

From traditional to ultra

Impressive, yes. But Project Rainier doesn’t just use one, or even a few, chips. This is where the UltraServers and UltraClusters come in.

Traditionally, servers in a data center operate independently. If and when they need to share information, that data has to travel through external network switches. This introduces latency, which is not ideal at such large scale.

A motion view within a hallway of UltraServers within a data center.

AWS’s answer to this problem is the UltraServer. A new type of compute solution, an UltraServer combines four physical Trainium2 servers, each with 16 Trainium2 chips. They communicate via specialized high-speed connections called “NeuronLinks.” Identifiable by their distinctive blue cables, NeuronLinks are like dedicated express lanes, allowing data to move much faster within the system and significantly accelerating complex calculations across all 64 chips.

When you connect thousands of these UltraServers and point them all at the same problem, you get Project Rainier, a mega “UltraCluster.”

No room for failure

Communication between components happens at two critical levels: the NeuronLinks provide high-bandwidth connections within UltraServers, while Elastic Fabric Adapter (EFA) networking technology (identified by its yellow cables) connects UltraServers inside and across data centers. This two-tier approach maximizes speed where it's most needed while maintaining the flexibility to scale across multiple data center buildings.
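The two-tier layout can be illustrated with a small sketch. It uses the figures given in the article (16 chips per server, four servers per UltraServer) but assumes a flat, hypothetical chip numbering scheme; the names `ChipLocation`, `locate`, and `interconnect` are illustrative and are not part of any AWS API.

```python
from dataclasses import dataclass

CHIPS_PER_SERVER = 16        # from the article
SERVERS_PER_ULTRASERVER = 4  # from the article
CHIPS_PER_ULTRASERVER = CHIPS_PER_SERVER * SERVERS_PER_ULTRASERVER  # 64


@dataclass(frozen=True)
class ChipLocation:
    ultraserver: int  # index of the UltraServer within the cluster
    server: int       # 0-3, physical server inside the UltraServer
    slot: int         # 0-15, chip position inside that server


def locate(chip_id: int) -> ChipLocation:
    """Map a flat chip index to a location (hypothetical numbering scheme)."""
    if chip_id < 0:
        raise ValueError("chip_id must be non-negative")
    ultraserver, within = divmod(chip_id, CHIPS_PER_ULTRASERVER)
    server, slot = divmod(within, CHIPS_PER_SERVER)
    return ChipLocation(ultraserver, server, slot)


def interconnect(chip_a: int, chip_b: int) -> str:
    """Name the tier that would carry traffic between two chips."""
    if locate(chip_a).ultraserver == locate(chip_b).ultraserver:
        return "NeuronLink"  # high-bandwidth links inside one UltraServer
    return "EFA"             # networking between UltraServers


print(interconnect(0, 63))   # chips in the same UltraServer
print(interconnect(63, 64))  # chips in neighboring UltraServers
```

The point of the sketch is only the decision rule: traffic stays on the fast local tier whenever both chips share an UltraServer and falls back to the wider network otherwise.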

So far, so good—but operating and maintaining such an enormous compute cluster is not without its challenges. To ensure all of that gigantic capacity is available to customers, reliability is paramount. That’s where the company’s approach to hardware and software development really comes to the fore. Unlike most other cloud providers, AWS builds its own hardware, and in doing so, can control every aspect of the technology stack, from a chip’s tiniest components, to the software that runs on it, to the complete design of the data center itself.

Controlling the stack

This vertical integration gives AWS a significant advantage in accelerating machine learning and reducing cost barriers to AI accessibility. With visibility across the entire stack—from chip design to software implementation to server architecture—AWS can optimize at precisely the right points in the system.

A technician working on a laptop.

Sometimes the solution lies in redesigning power delivery systems, sometimes in rewriting the software that coordinates the entire operation, and oftentimes in applying all of these at once. By maintaining comprehensive oversight of every component and system level, AWS can troubleshoot and innovate at pace.

Sustainability at scale

The exterior of a data center

As the teams that run AWS’s data centers innovate quickly, they are also focused on improving energy efficiency, from rack layouts to electrical distribution to cooling techniques. As for carbon-free energy use in data centers, all of the electricity consumed by Amazon’s operations, including its data centers, was matched with 100% renewable energy resources in 2023 and 2024.

The company is investing billions of dollars in nuclear power and battery storage, and in financing large-scale renewable energy projects around the world to power its operations. In fact, for the past five years Amazon has been the largest corporate purchaser of renewable energy in the world. The company is still on a path to be net-zero carbon by 2040. This goal remains unchanged by the addition of Project Rainier and by the company’s continued worldwide growth in general.

Last year AWS announced it would be rolling out new data center components that combine advances in power, cooling, and hardware, not only for data centers it’s currently building, but also in existing facilities. New data center components are projected to reduce mechanical energy consumption by up to 46% and reduce embodied carbon in the concrete used by 35%.

The new sites the company is constructing to support Project Rainier and beyond will include a variety of upgrades for energy efficiency and sustainability.

A technician standing between water pipes.

Some of these will have a strong focus on water stewardship. AWS engineers its facilities to use as little water as possible, and where possible none at all. One way it does this is by eliminating cooling water use in many of its facilities for most of the year, instead relying on outside air. For example, data centers in St. Joseph County, Indiana—one of the Project Rainier sites—will maximize the use of outside air for cooling. From October to March the data centers won’t use any water for cooling at all, while on an average day from April to September they’ll only use cooling water for a few hours per day.

Thanks to engineering innovations like this, AWS leads the industry in water efficiency. According to a recent Lawrence Berkeley National Laboratory (LBNL) report, the industry average for water usage efficiency (WUE), the standard measure of how efficiently water is used inside data centers, is 0.375 liters of water per kilowatt-hour. At 0.15 liters of water per kilowatt-hour, AWS’s WUE is more than twice as good as the industry average. It’s also a 40% improvement since 2021.
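The WUE comparison reduces to simple ratios. The sketch below uses only the figures quoted above and assumes that a "40% improvement" means a 40% reduction in liters per kilowatt-hour; the function names are illustrative.

```python
INDUSTRY_WUE = 0.375           # liters per kWh, industry figure quoted from the LBNL report
AWS_WUE = 0.15                 # liters per kWh, AWS figure from the article
IMPROVEMENT_SINCE_2021 = 0.40  # 40% improvement, from the article


def efficiency_ratio(baseline: float, actual: float) -> float:
    """How many times less water per kWh `actual` uses than `baseline`."""
    return baseline / actual


def implied_previous_wue(current: float, improvement: float) -> float:
    """Back out the earlier WUE, assuming `improvement` is a fractional reduction."""
    return current / (1.0 - improvement)


ratio = efficiency_ratio(INDUSTRY_WUE, AWS_WUE)
previous = implied_previous_wue(AWS_WUE, IMPROVEMENT_SINCE_2021)
print(f"AWS uses {ratio:.1f}x less water per kWh than the industry figure")
print(f"Implied 2021 WUE: {previous:.2f} L/kWh")
```

A ratio of about 2.5 is consistent with "more than twice as good," and the implied 2021 value is a derived estimate under the stated assumption rather than a number reported in the article.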

The future of AI

Project Rainier doesn't just push technical boundaries—it represents a fundamental shift in what's possible with artificial intelligence. And the implications go much further than making Claude an infinitely more sophisticated model.

A motion video of an aerial view of a data center

Project Rainier is now a template for deploying the kind of raw computational power that will allow AI to tackle challenges that have long resisted human solution, enabling breakthroughs across everything from medicine to climate science.

Just as its namesake peak stands as a defining landmark of the Pacific Northwest, Project Rainier marks a distinct before-and-after moment in computing history—one that could transform the technological landscape, chip by chip by chip.

More about AWS innovation

Look inside Annapurna Labs, where AWS designs custom chips, and explore the intricate world of AWS chips.
Discover how Amazon is making its data centers more sustainable, through hardware and liquid cooling efficiency, reducing carbon emissions, and optimizing workloads.
Learn about how AWS is already more than 50% of the way to being ‘water positive’ by 2030, meaning it will return more water to communities and the environment than it uses in its data center operations.
