NVIDIA Unveils Blackwell, Targets Agentic AI Factories

Blackwell GPUs and Rubin hardware aim to scale large‑context inference and agentic AI

Two technicians kneel on the floor of a data center aisle while working on cables inside an open server rack. © The GPU Trade Inc 2026


NVIDIA used this year’s GTC keynote to press its advantage in AI silicon, spotlighting the Blackwell GPU microarchitecture alongside a set of Rubin-class products built for “agentic” AI and high-volume inference factories. The company framed the announcements as steps toward purpose-built datacenter platforms for massive, multi‑step AI workloads.

Blackwell first arrived as NVIDIA’s flagship GPU architecture for large language models and generative AI. The Blackwell B200 GPU, unveiled at GTC 2024, packs roughly 208 billion transistors across two reticle‑limited dies joined by a 10 TB/s chip‑to‑chip link. It is manufactured on a custom TSMC 4NP process, and NVIDIA claims major gains in throughput and energy efficiency over the prior Hopper‑generation H100 family.

At recent GTC events NVIDIA layered Rubin on top of the Blackwell story. The platform, often called Vera Rubin because it pairs Rubin GPUs with NVIDIA’s Vera CPU, spans new chips, rack designs and software that the company says are optimized for “agentic” AI systems: software agents that carry out long‑running, multi‑step tasks and consume enormous numbers of tokens. NVIDIA’s investor release and developer blog say Rubin includes rack‑scale NVL systems and new memory and I/O tiers to keep inference state across long workflows.
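
Much of the “inference state” a long‑running agent accumulates is the attention key/value (KV) cache, which grows linearly with context length. The minimal sketch below applies the standard KV‑cache sizing formula for a decoder‑only transformer; the model shape (80 layers, 8 grouped‑query KV heads of dimension 128, 1‑byte FP8 cache entries) is a hypothetical example, not a published Rubin or Blackwell configuration.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold the attention key/value cache of a decoder-only transformer.

    The leading factor of 2 covers storing both a key and a value vector
    per KV head, per layer, per token.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 80, 8, 128

for context in (8_000, 128_000, 1_000_000):
    size = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, context, bytes_per_elem=1)  # FP8 cache
    print(f"{context:>9,} tokens -> {size / 1e9:7.2f} GB of KV cache per sequence")
```

Under these assumptions a single one‑million‑token session needs about 164 GB of KV cache, comparable to the entire HBM capacity of a current flagship GPU. That is why offloading state to extra memory or storage tiers matters for agentic workloads.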

NVIDIA has also introduced Rubin‑class GPUs aimed specifically at massive‑context inference. These include Rubin CPX, which targets the compute‑heavy context‑processing phase of long‑context workloads, and multi‑chip superchips that target higher token throughput and lower cost‑per‑token for production inference. Citing its internal tests, the company claims some Rubin configurations deliver up to 10x higher token throughput per megawatt than comparable Blackwell systems.
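
“Tokens per megawatt” is easiest to interpret as energy per token. This minimal sketch uses a hypothetical baseline throughput and a hypothetical electricity price, not any NVIDIA measurement, to show what a 10x gain would mean for the power bill alone. Hardware cost is deliberately ignored here.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def joules_per_token(tokens_per_sec_per_mw: float) -> float:
    """Energy per generated token for a system sustaining the given rate at 1 MW.

    1 MW delivers 1e6 joules every second, so J/token = 1e6 / (tokens/s).
    """
    return 1e6 / tokens_per_sec_per_mw


def energy_usd_per_million_tokens(tokens_per_sec_per_mw: float, usd_per_kwh: float) -> float:
    """Electricity cost (only) of producing one million tokens."""
    kwh = joules_per_token(tokens_per_sec_per_mw) * 1e6 / JOULES_PER_KWH
    return kwh * usd_per_kwh


# Hypothetical inputs for illustration only.
baseline_rate = 500_000           # tokens/s per MW for an assumed baseline system
improved_rate = 10 * baseline_rate
price = 0.08                      # USD per kWh, assumed

for label, rate in (("baseline", baseline_rate), ("10x per MW", improved_rate)):
    print(f"{label:>10}: {joules_per_token(rate):.2f} J/token, "
          f"${energy_usd_per_million_tokens(rate, price):.4f} per million tokens")
```

Because energy per token is inversely proportional to tokens per megawatt, a 10x throughput gain at fixed power cuts the electricity component of cost per token by 10x. Capital cost per token depends separately on what the faster hardware costs.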

Technically, NVIDIA points to several Blackwell advances that feed into these platforms: a second‑generation Transformer Engine, dedicated decompression hardware, more high‑bandwidth memory per GPU, and tighter CPU/GPU integration with its Grace CPUs, as in the Grace Blackwell superchip. Those elements are designed to shrink model‑parallel overheads and speed both training and inference for trillion‑parameter models.
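
The memory side of “trillion‑parameter” comes down to simple arithmetic: weight storage equals parameter count times bytes per parameter. Blackwell’s second‑generation Transformer Engine adds 4‑bit floating‑point (FP4) support, so the sketch compares 16‑, 8‑ and 4‑bit weights. The per‑GPU memory figure is a parameter set to an assumed 192 GB. The count covers weights only and leaves out KV cache, activations and parallelism overhead.

```python
def weight_bytes(n_params: int, bits_per_param: int) -> int:
    """Storage needed for model weights alone at a given numeric precision."""
    return n_params * bits_per_param // 8


def min_gpus_for_weights(n_params: int, bits_per_param: int, gpu_mem_gb: int) -> int:
    """Smallest GPU count whose combined memory can hold the weights (ceiling division)."""
    gpu_bytes = gpu_mem_gb * 10**9
    return -(-weight_bytes(n_params, bits_per_param) // gpu_bytes)


TRILLION = 10**12
ASSUMED_GPU_MEM_GB = 192  # assumption; check the datasheet of a specific SKU

for name, bits in (("FP16/BF16", 16), ("FP8", 8), ("FP4", 4)):
    tb = weight_bytes(TRILLION, bits) / 10**12
    gpus = min_gpus_for_weights(TRILLION, bits, ASSUMED_GPU_MEM_GB)
    print(f"{name:>9}: {tb:.1f} TB of weights -> at least {gpus} GPUs")
```

In this example the minimum shard count falls from 11 GPUs at 16 bits to 6 at FP8 and 3 at FP4. Fewer shards mean less cross‑GPU communication, which is one way lower precision and larger memory reduce model‑parallel overhead.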

For cloud providers and hyperscalers the message is practical: to run “AI factories” at scale, adopt denser racks, faster interconnects and AI‑native storage layers. NVIDIA showcased partners and system designs built around NVL rack systems, such as the liquid‑cooled GB200 NVL72, which links 72 Blackwell GPUs in a single rack, and around HGX boards. These designs emphasize liquid cooling and high GPU counts per rack, upending the usual datacenter tradeoffs between power, cooling and floor space.
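
A rough planning sketch shows why density changes the facility problem. The GPU power (1,200 W), overhead multiplier (1.25x for CPUs, switches, pumps and conversion losses) and fleet size are hypothetical round numbers, not vendor specifications.

```python
import math
from dataclasses import dataclass


@dataclass
class RackDesign:
    name: str
    gpus_per_rack: int
    watts_per_gpu: float     # accelerator power draw (assumed)
    overhead_factor: float   # multiplier for CPUs, switches, pumps/fans, conversion losses (assumed)

    def rack_kw(self) -> float:
        """Estimated electrical (and therefore heat) load of one full rack, in kW."""
        return self.gpus_per_rack * self.watts_per_gpu * self.overhead_factor / 1000


def plan(fleet_gpus: int, design: RackDesign) -> tuple[int, float]:
    """Return (racks needed, kW per rack) to house a fleet with a given design."""
    racks = math.ceil(fleet_gpus / design.gpus_per_rack)
    return racks, design.rack_kw()


# Hypothetical designs: same GPU and overhead, different packing density.
dense = RackDesign("dense, liquid-cooled", gpus_per_rack=72, watts_per_gpu=1200, overhead_factor=1.25)
sparse = RackDesign("sparse, air-cooled", gpus_per_rack=8, watts_per_gpu=1200, overhead_factor=1.25)

for design in (dense, sparse):
    racks, kw = plan(1152, design)
    print(f"{design.name:>22}: {racks:4d} racks at {kw:6.1f} kW each")
```

Total power is the same in both cases (about 1.7 MW for 1,152 GPUs). Density only concentrates it: 16 racks at 108 kW each instead of 144 racks at 12 kW. That concentration is what forces liquid cooling and heavier power distribution, both of which many air‑cooled halls were not designed for.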

The announcements also have immediate supply‑chain implications. Blackwell‑class chips rely on advanced TSMC process nodes and complex packaging, while Rubin superchips increase demand for high‑bandwidth memory, networking silicon and liquid‑cooling hardware. That raises the stakes for chip fabs, memory suppliers and OEMs competing to meet hyperscaler purchase cycles.

NVIDIA’s integrated software stack, spanning CUDA, TensorRT‑LLM and the NeMo framework, remains a key lever that makes its hardware attractive to customers. The company argues that pairing hardware with tightly tuned inference runtimes and model tooling lowers migration friction and improves real‑world cost efficiency for token‑heavy services.

For smaller cloud providers and enterprises, the new systems pose a harder economic problem. NVIDIA’s benchmarks portray major operating‑cost savings per token, but the hardware is expensive and imposes new networking and cooling requirements. Firms will need to weigh build‑versus‑buy, rack density versus facility upgrades, and long‑term software lock‑in when deciding whether to deploy Rubin or Blackwell systems.
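
The build‑versus‑buy question largely turns on cost per token, and utilization dominates that figure once hardware is purchased. This minimal sketch amortizes capital cost and energy over the tokens a system actually produces. Every input is a hypothetical placeholder, and staff, networking, facility and software costs are deliberately excluded.

```python
HOURS_PER_YEAR = 8760


def usd_per_million_tokens(capex_usd: float, amortization_years: float,
                           power_kw: float, usd_per_kwh: float,
                           tokens_per_sec: float, utilization: float) -> float:
    """Hardware + energy cost per million tokens over the amortization period.

    Simplifying assumptions: power is billed at full draw around the clock,
    and staff, networking, facility and software costs are excluded.
    """
    hours = amortization_years * HOURS_PER_YEAR
    energy_usd = power_kw * hours * usd_per_kwh
    tokens = tokens_per_sec * 3600 * hours * utilization  # only busy time produces tokens
    return (capex_usd + energy_usd) / tokens * 1e6


# Hypothetical rack: $3M up front, 4-year depreciation, 120 kW, $0.08/kWh.
for util in (0.3, 0.6, 0.9):
    cost = usd_per_million_tokens(3_000_000, 4, 120, 0.08,
                                  tokens_per_sec=200_000, utilization=util)
    print(f"utilization {util:.0%}: ${cost:.3f} per million tokens")
```

With capital and power costs fixed, halving utilization doubles the cost per token. A rented or API alternative is billed only for what is used, so comparing this number against a provider’s per‑token or per‑GPU‑hour price is the core of the build‑versus‑buy calculation.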

Competitors and alternative architectures are not standing still. Public coverage of Rubin and Blackwell has prompted commentary that NVIDIA’s lead is widening, but vendors such as AMD, specialized AI chip startups and cloud builders continue to push differentiated approaches to inference and model scaling. The market will ultimately judge performance, price and operational costs in real deployments.

There are reasons for caution. Many of the headline performance claims come from NVIDIA’s own tests or partner demos; independent benchmarks and long‑term availability data are still emerging. Observers will be watching real‑world throughput, software maturity, and how quickly ecosystems outside NVIDIA adapt to new architectures.

What to watch next: adoption announcements from hyperscalers, independent benchmark reports and supply‑chain signals from TSMC and major OEMs. If Rubin and the newest Blackwell variants deliver their promised token‑scale efficiency in real datacenters, they could reshape cloud pricing, datacenter design and the broader economics of production AI.