
AWS and NVIDIA Expand Cloud GPU Footprint — 1M+ Blackwell and Rubin GPUs

A joint push to make inference-ready Blackwell and Rubin GPUs widely available across AWS regions.

Image: Abstract digital background of luminous circuit patterns and interconnected network nodes. © The GPU Trade Inc 2026

By The GPU Trade Staff, May 23, 2026

Amazon Web Services and NVIDIA announced an expanded infrastructure partnership that will put more than one million Blackwell- and Rubin‑generation GPUs into AWS regions beginning in 2026, a move aimed at speeding enterprises’ shift to inference‑optimized cloud services.

The hardware plan was unveiled at NVIDIA GTC and described in an AWS machine‑learning blog post and follow‑up coverage by industry outlets. AWS said the capacity commitment covers both Blackwell and the newer Rubin architectures and ties into broader software and networking integrations.

NVIDIA’s own Rubin announcement frames the platform as built for “agentic AI” — workloads that require large context windows, fast multi‑token inference, and tighter hardware‑software codesign. Rubin is presented as a rack‑scale, NVLink‑connected architecture designed to boost inference throughput per watt and to lower cost per token versus prior generations.

AWS has already begun rolling out Blackwell‑based EC2 offerings and said it will add more instance types targeted at production inference. Amazon made its G7e Blackwell instances generally available on Jan. 20, 2026, and separately announced support for RTX PRO Blackwell Server Edition variants as part of the GTC disclosures.

The emphasis in the deal is squarely on inference and agentic workflows rather than only on training capacity. NVIDIA and AWS both highlighted software and interconnect features — including NVLink, inference transfer libraries and high‑speed EFA networking — that aim to reduce inter‑token latency for large model serving.

Industry coverage framed the commitment as a response to urgent enterprise demand for inference scale, and as a shift in how hyperscalers manage scarce GPU inventory. Observers say the promise of a million GPUs changes platform engineering priorities: capacity planning, cross‑region routing and reserved capacity strategies will all come to the fore.

Operationally, the announcement bundles hardware and system changes. Rubin NVL72 rack‑scale systems pair Rubin GPUs with Vera CPUs and NVLink 6 switches to create a low‑latency, rack‑level topology, while AWS is adding software hooks such as NIXL on EFA to enable disaggregated inference across GPUs and Trainium accelerators. The architecture shift is meant to let clouds stitch together pools of inference capacity at a lower cost per token.

For customers, the immediate implication is more available high‑end inference capacity but not necessarily lower on‑demand prices. Analysts and FinOps practitioners warn that faster cards still cost a premium if utilization is low, and many organizations will need reserved capacity or capacity blocks to avoid a “region lottery” for availability.
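The utilization point is easy to see with simple arithmetic. The Python sketch below divides an hourly instance price by the tokens actually served in that hour. The function name, prices, throughputs and utilization levels are all hypothetical placeholders chosen for illustration, not published AWS or NVIDIA figures.

```python
# Hypothetical cost-per-token arithmetic; all inputs are illustrative, not real pricing.

def cost_per_million_tokens(hourly_price_usd: float,
                            peak_tokens_per_sec: float,
                            utilization: float) -> float:
    """Effective serving cost per 1M tokens.

    utilization is the assumed fraction of each billed hour spent generating
    tokens at peak throughput (0 < utilization <= 1).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = peak_tokens_per_sec * 3600 * utilization
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Hypothetical older instance: cheaper per hour, slower.
older = cost_per_million_tokens(10.0, 2_000, 0.8)
# Hypothetical newer instance: 3x the hourly price, 5x the throughput.
newer_busy = cost_per_million_tokens(30.0, 10_000, 0.8)
newer_idle = cost_per_million_tokens(30.0, 10_000, 0.15)

print(f"older, 80% busy: ${older:.2f} per 1M tokens")       # $1.74
print(f"newer, 80% busy: ${newer_busy:.2f} per 1M tokens")  # $1.04
print(f"newer, 15% busy: ${newer_idle:.2f} per 1M tokens")  # $5.56
```

Under these assumed numbers, the faster card is the cheapest option when kept busy but the most expensive when mostly idle, which is why reserved capacity only pays off if teams can keep it utilized.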

The announcement also recognizes a multi‑provider ecosystem. NVIDIA listed large cloud partners and specialist GPU cloud providers that will offer Rubin and related products starting in the second half of 2026, so enterprises planning multi‑cloud inference pipelines may finally see more consistent hardware across providers. That can simplify tooling and model tuning, but latency, data‑sovereignty and egress costs still complicate cross‑cloud deployments.

Supply and component constraints remain a practical limit. Industry reporting has flagged intense demand for HBM memory and high‑bandwidth interconnect components as Rubin production ramps, and enterprises should expect staged region rollouts rather than uniform, immediate global availability. Those bottlenecks mean the one‑million figure describes deployments spread across time and regions, not a one‑day inventory dump.

Smaller specialized cloud providers and so‑called neoclouds are an important part of the story. Companies such as CoreWeave, Lambda and Nebius are named among early Rubin adopters, and their presence offers customers alternative pricing and burst capacity when hyperscaler availability is constrained. That dynamic could put downward pressure on spot prices for inference at scale while hyperscalers pursue reserved and capacity‑block sales.

Taken together, the AWS‑NVIDIA expansion signals a practical pivot: hyperscalers are aligning supply with the immediate economics of inference and agentic AI, not just chasing raw training peak performance. For platform teams and FinOps groups, the next 12–24 months will be focused on matching model choices, utilization strategies and cross‑region routing to a new, but still finite, fleet of Blackwell and Rubin GPUs.