NVIDIA Unveils Nemotron 3 Ultra

A 500–550B open MoE model built for agentic, long‑context workflows

NVIDIA used its Computex/GTC keynote on June 1, 2026 to introduce Nemotron 3 Ultra, a new flagship in its open‑weights Nemotron family aimed at agentic AI and long‑running workflows.

Nemotron 3 Ultra is a large mixture‑of‑experts (MoE) model with roughly 500–550 billion total parameters, of which up to about 55 billion are active per token at inference time, according to NVIDIA’s public documentation.
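
To put those figures in proportion, the short sketch below computes the share of the model that runs for any single token. It uses only the approximate numbers quoted above; the function name is illustrative, and because 55 billion is described as an upper bound, the real share may be lower.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of total parameters used per token (both values in billions)."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


# Approximate figures from the article: up to ~55B active, 500-550B total.
low = active_fraction(55, 550)   # larger total -> smaller active share
high = active_fraction(55, 500)  # smaller total -> larger active share

print(f"active share per token: {low:.1%} to {high:.1%}")
```

In other words, on the published figures roughly one parameter in ten does work for a given token, while the rest stays idle until a different token routes to it.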

NVIDIA says it will publish the model weights and training recipes as part of a broader commitment to “open‑weights” releases that include code, datasets where licensed, and evaluation tools. The company’s technical papers and preprints outline plans to share both pre‑training and post‑training artifacts.

The company positioned Nemotron 3 Ultra as hardware‑aware: NVIDIA trained and tuned the model using NVFP4, its 4‑bit floating‑point format, and targeted Blackwell‑class GPUs. The release is also optimized for the new Rubin platform family that NVIDIA is promoting for agentic PCs and servers.

NVIDIA describes Nemotron 3 Ultra as the top tier of the Nemotron 3 family, designed for agentic reasoning, chained decision making, and long‑context tasks that involve persistent memory and multi‑step workflows. The company framed the model as a building block for developers and enterprises assembling multi‑agent systems.

At the technical level, Nemotron 3 Ultra uses a hybrid or latent MoE design that activates only a subset of experts per token. Because a gating step selects which experts run, the total parameter count can scale far beyond the per‑token active budget: each token pays the compute cost of the selected experts only, not of the whole model, while the full expert pool still adds capacity.
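
The article does not spell out how Nemotron 3 Ultra routes tokens, so the following is only a minimal sketch of generic top‑k gating, the textbook form of the idea described above, and not NVIDIA's implementation. All names here (route_top_k, moe_forward, the toy experts) are hypothetical, and the router scores are passed in directly, whereas a real model would compute them from the token with a learned layer.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and normalize their weights."""
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Toy setup (hypothetical): four "experts" that just scale their input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.0]  # stand-in for learned router scores

selected = route_top_k(logits, k=2)
y = moe_forward(1.0, experts, logits, k=2)
print("selected experts:", [i for i, _ in selected])
print("mixed output:", y)
```

With k=2 of four experts, half of the toy model never executes for this token. Production MoE systems add details this sketch leaves out, such as load balancing across experts and batching tokens per expert.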

For agentic applications this matters because active parameter efficiency can cut latency and cost for long‑running sessions where the model must track state, invoke tools, and coordinate multiple components. Developers building agent orchestration frameworks stand to benefit from a model that balances sparsity with a large expert pool.
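
A back‑of‑the‑envelope sketch makes the cost argument concrete. It assumes the common rule of thumb of about 2 FLOPs per active parameter per token for a forward pass, and it ignores attention cost over long contexts, memory bandwidth, and routing overhead, all of which matter in practice. The session length and the dense 500B comparison model are hypothetical; only the 55B and 500B figures come from the article.

```python
def forward_flops(active_params: float, tokens: int) -> float:
    """Rough forward-pass compute: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params * tokens


SESSION_TOKENS = 1_000_000  # hypothetical long-running agent session

# Sparse case: up to ~55B parameters active per token (figure from the article).
sparse = forward_flops(55e9, SESSION_TOKENS)

# Hypothetical dense model of the same total size, for comparison only.
dense = forward_flops(500e9, SESSION_TOKENS)

ratio = dense / sparse
print(f"sparse session: {sparse:.2e} FLOPs")
print(f"dense session:  {dense:.2e} FLOPs")
print(f"dense / sparse: {ratio:.1f}x")
```

Under these assumptions the sparse model needs roughly a ninth of the compute of a dense model with the same parameter count, a gap that compounds over sessions spanning many tool calls.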

NVIDIA also emphasized software and ecosystem pieces: NeMo libraries, NeMo Gym RL environments, and a suite of evaluation tools accompany the Nemotron 3 family, making it easier for teams to fine‑tune, safety‑test, and deploy agent‑centric workflows at scale.

The release marks a notable shift in the industry: a major silicon vendor is publishing frontier open weights and training recipes while tying them closely to its hardware roadmap. That approach blurs the line between chip vendor and model developer and could accelerate hardware‑software co‑design for agent workloads.

There are immediate implications for clouds, enterprises, and on‑device agents. Cloud providers may offer optimized Blackwell and Rubin instances for Nemotron 3 Ultra; enterprises may run private agent fleets with the open weights; and software vendors can build orchestration layers that leverage the model’s long‑context strengths.

NVIDIA said the Nemotron 3 Ultra weights and recipes will ship with documentation and tools for reproducing key training steps. The company placed the release within its first‑half 2026 rollout window and invited partners and the open research community to begin experiments once the assets are published.