
NVIDIA pushes Nemotron 3 Ultra into GA

550B MoE model rolls out to partners as enterprises race to agentic AI

By The GPU Trade Staff June 4, 2026

NVIDIA said this week it is moving Nemotron 3 Ultra into general availability, making the model broadly accessible to partner platforms and cloud providers. The company unveiled Nemotron 3 Ultra during its GTC Taipei keynote and said the model would be available starting June 4, 2026.

NVIDIA describes Nemotron 3 Ultra as a 500–550 billion‑parameter mixture‑of‑experts (MoE) model tuned for long‑running, agentic workloads that need persistent memory and tool use over many steps. The company’s technical documentation and model README list the Ultra variant at the top of the Nemotron 3 family.

NVIDIA said the Ultra release will appear on popular model hubs and inference platforms and as NVIDIA NIM microservices, naming Hugging Face, ModelScope and OpenRouter as initial distribution points, alongside a broad set of NVIDIA cloud partners and inference providers. That rollout is intended to let enterprises plug the model into existing agent runtimes and production pipelines.

On architecture, Nemotron 3 Ultra uses a hybrid Mamba‑Transformer MoE design that activates a fraction of the full parameter set per token — a common MoE efficiency pattern that keeps compute and memory use lower than a dense model of equal total parameters. NVIDIA’s public docs describe the Ultra variant as having a large total parameter count with a much smaller active subnetwork per token.
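
For readers unfamiliar with the pattern, the following is a minimal sketch of generic top‑k MoE routing and the resulting active‑parameter fraction. It is not NVIDIA's implementation: the function names, the expert count, the top‑k value and the parameter sizes are all hypothetical and chosen only for illustration, and the Mamba and attention layers of the hybrid design are not modeled.

```python
import math
import random


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    # Softmax over the selected experts only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def active_fraction(shared_params, params_per_expert, num_experts, k):
    """Share of all parameters that a single token actually touches."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return active / total


# Hypothetical configuration, NOT Nemotron 3 Ultra's published numbers.
NUM_EXPERTS = 64
TOP_K = 4

rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routes = top_k_route(logits, TOP_K)
frac = active_fraction(shared_params=10e9, params_per_expert=8e9,
                       num_experts=NUM_EXPERTS, k=TOP_K)

print("experts used for this token:", [i for i, _ in routes])
print(f"active share of parameters: {frac:.1%}")
```

In this toy setup each token runs through only 4 of 64 experts plus the shared layers, which is why per‑token compute tracks the active count rather than the total parameter count.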

NVIDIA staff and accompanying materials promoted sharp throughput and cost gains for agentic inference. The company claimed that the Ultra model, when paired with its NVFP4 format and Blackwell platform, can deliver several times the throughput and materially lower inference costs for complex, multi‑step agent tasks. Its product materials put numbers on some of those gains, promoting up to 5x higher throughput and as much as 30% lower cost for certain agentic workloads.

The vendor push was accompanied by enterprise integration announcements. NVIDIA named CrowdStrike and Palantir among early users of Nemotron models for continuous, domain‑specific agents, and listed Cadence, Dassault Systèmes, Siemens and Synopsys as early adopters of its NemoClaw toolkits for engineering agents. SAP and ServiceNow were cited as platform partners integrating NVIDIA’s OpenShell runtime for policy and oversight.

Why this matters for enterprises: agentic AI, meaning systems that plan, call tools, and persist memory across sessions, changes procurement and cost patterns. Long‑running agents often drive high inference costs because they hold large contexts and invoke external tools; models that increase throughput per GPU can reduce cloud bills and make agent deployments economically viable. NVIDIA framed Nemotron 3 Ultra as purpose‑built for that tradeoff.
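
The link between throughput and cost can be made concrete with a back‑of‑envelope sketch. It assumes compute cost per task scales inversely with tokens per second per GPU at a fixed GPU price; the GPU price, token counts and throughput figures are hypothetical placeholders, not NVIDIA or cloud‑provider numbers, and real bills also include tool calls, storage and idle time, so the output is not a reproduction of NVIDIA's claims.

```python
def cost_per_task(gpu_hour_usd, tokens_per_task, tokens_per_sec_per_gpu):
    """Compute-only cost of one agent task on a single GPU, in USD."""
    gpu_seconds = tokens_per_task / tokens_per_sec_per_gpu
    return gpu_hour_usd * gpu_seconds / 3600.0


# Hypothetical inputs for illustration only.
GPU_HOUR_USD = 4.0           # assumed rental price per GPU-hour
TOKENS_PER_TASK = 2_000_000  # assumed tokens processed by one long-running agent task

baseline = cost_per_task(GPU_HOUR_USD, TOKENS_PER_TASK, tokens_per_sec_per_gpu=1_000)
faster = cost_per_task(GPU_HOUR_USD, TOKENS_PER_TASK, tokens_per_sec_per_gpu=5_000)

print(f"baseline: ${baseline:.2f} per task")
print(f"5x throughput: ${faster:.2f} per task")
print(f"compute cost ratio: {baseline / faster:.1f}x")
```

Swapping in measured throughput from a pilot and the actual GPU rate gives a first estimate of per‑task cost before non‑compute charges are added.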

The technical tie‑ins matter. NVIDIA is positioning Nemotron 3 Ultra to run best on its broader Vera Rubin/Blackwell compute stack and to interoperate with its OpenShell secure runtime, NemoClaw agent blueprints and CUDA‑X skills. Those integrations aim to give enterprises a packaged pathway from pilot to production, including on‑prem and hybrid cloud options.

Cloud and inference vendors are already preparing to surface Nemotron 3 Ultra as an API and as microservice images. NVIDIA’s press materials say the model will be distributed as NIM microservices for easier integration, which shortens the path for ISVs and systems integrators to embed agent capabilities into vertical workflows. That distribution model also shapes how providers will meter and price agent workloads.

Security and governance were central to the launch narrative. NVIDIA emphasized OpenShell as a secure runtime for agents, and said it is working with Microsoft, Red Hat, Canonical and others on primitives and integrations to keep agents contained, auditable and policy‑driven in enterprise environments. The company framed these controls as necessary as agents gain the ability to access files, run code and orchestrate business systems.

Nemotron 3 Ultra arrives in a crowded and rapidly changing model landscape. Industry trackers and news aggregators noted the launch alongside other compute and model announcements at Computex and GTC Taipei, and placed NVIDIA’s open‑weights push in the context of broader competition between open and closed frontier models for enterprise workloads. Observers will watch early benchmarks, pricing, and how easily partners operationalize the model.

For enterprises considering Nemotron 3 Ultra, the practical checklist is a familiar one: benchmark the model on your own workloads, test integrations with OpenShell and NemoClaw, and model costs under expected agent lifecycles. NVIDIA’s press materials include the usual forward‑looking caveats that availability, features and performance claims are subject to change, so CIOs should validate results in pilot deployments before large rollouts.