AWS adds P6‑B200 Blackwell GPUs and serverless fine‑tuning
SageMaker gains 8‑GPU Blackwell nodes and serverless Qwen 3.6 customization, easing larger agent workloads
An illustration displays a cube of microchips connected by curved lines to a digital interface featuring chat, document, and robot icons. © The GPU Trade Inc 2026
Amazon Web Services this week expanded SageMaker with two headline moves: broader availability of P6‑B200 Blackwell GPU nodes for Studio notebooks and serverless fine‑tuning support for Qwen 3.6. Both updates were posted on AWS’s site during the week of May 11–17, 2026.
The P6‑B200 capacity is now available in SageMaker Studio notebooks in US East (N. Virginia), bringing an interactive path to very large GPU nodes for developers and data scientists. AWS describes the P6‑B200 family as powered by eight NVIDIA Blackwell GPUs with 1,440 GB of high‑bandwidth GPU memory.
On SageMaker the instance appears as ml.p6‑b200.48xlarge for notebook use, and the node couples the 8 Blackwell GPUs with 192 vCPUs and about 2 TiB of system memory, marking it as a high‑capacity training and experimentation option. Cloud instance listings and AWS documentation reflect those core specs.
The P6‑B200 line builds on AWS’s earlier EC2 P6 launch and targets large foundation‑model training and multimodal work. AWS has pitched the family as delivering up to roughly 2x training performance over the prior P5en generation and as suitable for distributed training, agents, and multi‑modal reasoning workloads.
Separately, Amazon announced that SageMaker AI now supports serverless model customization for Qwen 3.6, the 27‑billion‑parameter open‑weight model from the Qwen family. The May 14 notice says SageMaker now supports both supervised fine‑tuning (SFT) and reinforcement fine‑tuning (RFT) for Qwen 3.6.
AWS framed the Qwen 3.6 feature as serverless customization: SageMaker handles infrastructure provisioning and orchestration, and in AWS’s words customers “only pay for what you use.” The announcement lists immediate availability in US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and EU (Ireland).
Taken together, the hardware and software moves make it easier both to run and to adapt larger models on AWS. The P6‑B200 nodes pool 1,440 GB of GPU memory across eight tightly interconnected GPUs, which reduces the need to shard models across many smaller GPUs, while the serverless fine‑tuning path cuts the operational overhead of customization. Those are two complementary levers for scaling agentic and multi‑step AI workloads.
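To make the memory argument concrete, here is a back‑of‑envelope sketch in Python. The node size, GPU count, and 27‑billion‑parameter figure come from the article; the bytes‑per‑parameter values are common rules of thumb assumed for illustration (2 bytes for bf16 weights, about 16 bytes per parameter for mixed‑precision full fine‑tuning with Adam) and are not AWS figures. Activations, KV cache, and framework overhead are ignored, so real requirements will be higher.

```python
# Back-of-envelope GPU memory arithmetic. Byte-per-parameter values are
# assumed rules of thumb, not figures published by AWS.
NODE_GPU_MEMORY_GB = 1440   # from the article: total GPU memory per node
GPUS_PER_NODE = 8           # from the article
PARAMS_BILLION = 27         # from the article: Qwen 3.6 parameter count

# Assumption: bf16 weights only (for example, for inference).
BYTES_PER_PARAM_BF16_WEIGHTS = 2
# Assumption: bf16 weights (2) + bf16 gradients (2) + fp32 master weights (4)
# + two fp32 Adam moment buffers (4 + 4) = 16 bytes per parameter.
BYTES_PER_PARAM_FULL_FINETUNE = 16


def footprint_gb(params_billion: float, bytes_per_param: int) -> float:
    """Approximate memory in GB (1 GB = 1e9 bytes), excluding activations."""
    # (params_billion * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billion * bytes_per_param


per_gpu_gb = NODE_GPU_MEMORY_GB / GPUS_PER_NODE
weights_gb = footprint_gb(PARAMS_BILLION, BYTES_PER_PARAM_BF16_WEIGHTS)
finetune_gb = footprint_gb(PARAMS_BILLION, BYTES_PER_PARAM_FULL_FINETUNE)

print(f"Memory per GPU: {per_gpu_gb:.0f} GB")
print(f"bf16 weights: {weights_gb:.0f} GB, fits on one GPU: {weights_gb <= per_gpu_gb}")
print(f"Full fine-tune state: {finetune_gb:.0f} GB, "
      f"fits on one node: {finetune_gb <= NODE_GPU_MEMORY_GB}")
```

Under these assumptions, the bf16 weights of a 27‑billion‑parameter model (about 54 GB) fit within a single GPU's 180 GB share, and the roughly 432 GB of full fine‑tuning state fits on one node before activations are counted.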
For enterprises, serverless fine‑tuning lowers the operational barrier to specialize open‑weight models like Qwen. Firms that lacked GPU cluster expertise or wanted to avoid reserving high‑capacity blocks can now run SFT or RFT flows without managing spot fleets or long‑running clusters. That can shorten experiments and broaden who in an organization can perform model adaptation.
There are cost and capacity tradeoffs to note. P6‑B200 nodes are large, power‑hungry instances intended for high‑throughput training and experimentation, so raw on‑demand costs per hour will be substantial compared with smaller GPU instances. Serverless customization reduces some of this friction by moving billing to usage, but teams must still plan for peak resource needs during heavy fine‑tuning jobs.
AWS’s moves also sit alongside other recent infrastructure launches designed to cut inference and deployment overhead. For example, AWS introduced G7e instances for lower‑cost inference on large models, showing a parallel push to compress operational cost across both training and serving. Together these launches signal a platform strategy that covers high‑end training, cheaper inference, and easier customization.
For customers building agentic systems (long‑running agents that need larger context windows, multimodal inputs, or multi‑step planning), the new P6‑B200 nodes ease a common bottleneck: GPU memory and interconnect. Serverless fine‑tuning then makes it simpler to align base models to corporate data, policies, or tool interfaces without a full‑time infra team. That combination is likely to speed the move of pilot projects into production.
What to watch next: pricing details and regional rollouts will shape adoption. Availability in additional regions, support for more model families in serverless fine‑tuning, and integration with SageMaker HyperPod or other multi‑node orchestration features will determine whether enterprises shift large‑scale LLM work from other clouds or on‑prem clusters to AWS. For now, AWS’s May announcements make its platform a stronger contender for large, agentic, and enterprise‑customized AI workloads.