URGENT UPDATE: The world’s leading cloud providers are rapidly adopting Nvidia’s Dynamo to revolutionize AI inference performance. This game-changing move, confirmed in a recent blog post, showcases how Amazon Web Services (AWS), Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure (OCI) are now leveraging this cutting-edge technology.
Nvidia’s Dynamo, an open-source inference-serving framework, is designed to streamline complex orchestration and boost efficiency for AI workloads across large fleets of GPUs. As cloud giants race to optimize their systems, the implications are significant for businesses relying on AI-driven solutions.
According to Nvidia, AWS is one of the first to implement Dynamo, using it to supercharge inference for clients running generative AI workloads. Integration with Amazon Elastic Kubernetes Service (EKS) lets disaggregated serving scale seamlessly on Kubernetes, both on AWS and on premises.
Google Cloud is also on board, utilizing Dynamo to enhance large language model (LLM) inference on its supercomputing platform, AI Hypercomputer. Meanwhile, Microsoft Azure is leveraging Dynamo for multi-node LLM inference on its powerful GB200-v6 virtual machines. These VMs have already set performance records, achieving an impressive 865,000 tokens per second in MLPerf Inference benchmarks.
Not to be outdone, Oracle Cloud’s team is deploying Nvidia’s Dynamo within its Superclusters, which are equipped with custom-designed networking built on RDMA over Converged Ethernet Version 2 (RoCE v2). This technology enables a staggering 400 Gb/s of connectivity between GPUs, amplifying AI inference capabilities.
Nvidia is also introducing Grove, a new open-source Kubernetes API designed to optimize workload management across extensive GPU deployments. This tool simplifies orchestration, transforming complex requirements into manageable Kubernetes pods. Grove is available as part of Dynamo or separately via GitHub, making it accessible for developers looking to enhance operational efficiency.
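To give a feel for the declarative workflow described above, here is a minimal Python sketch that builds a Grove-style custom-resource manifest for a disaggregated inference workload. Note that the API group, kind, and every field name below are illustrative assumptions, not Grove’s actual schema; consult the Grove GitHub repository for the real resource definitions.

```python
# Hypothetical sketch: describe a multi-role inference workload as a single
# declarative object that a Kubernetes orchestrator could expand into pods.
# The group/version "grove.example.com/v1alpha1" and the kind
# "InferenceWorkload" are placeholders, NOT Grove's real API.

def build_inference_workload(name: str, prefill_replicas: int,
                             decode_replicas: int) -> dict:
    """Return a manifest dict with separate prefill and decode roles,
    mirroring the disaggregated-serving pattern mentioned in the article."""
    return {
        "apiVersion": "grove.example.com/v1alpha1",  # assumed group/version
        "kind": "InferenceWorkload",                 # assumed kind
        "metadata": {"name": name},
        "spec": {
            "roles": [
                {"name": "prefill", "replicas": prefill_replicas,
                 "resources": {"nvidia.com/gpu": 1}},
                {"name": "decode", "replicas": decode_replicas,
                 "resources": {"nvidia.com/gpu": 1}},
            ],
        },
    }

manifest = build_inference_workload("llm-serving",
                                    prefill_replicas=2, decode_replicas=4)
print(manifest["metadata"]["name"])  # llm-serving
```

In practice a manifest like this would be written as YAML and applied with `kubectl` or a Kubernetes client library; the point is that one high-level object stands in for many individually managed pods.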
The urgency of these developments is underscored by the growth of distributed data centers among hyperscalers like AWS and Microsoft. AWS’s Rainier site interconnects multiple facilities on a single campus, while Microsoft’s Fairwater project spans hundreds of miles, exemplifying the need for robust and efficient AI infrastructure.
Even smaller players are joining the Nvidia ecosystem. Nebius, a European neocloud provider with substantial contracts with Meta and Microsoft, has recently partnered with Nvidia to utilize the Dynamo platform, positioning itself to meet the increasing demand for AI workloads.
“As AI inference becomes increasingly distributed, the combination of Kubernetes and Nvidia Dynamo with Grove simplifies how developers build and scale intelligent applications,” said Shruti Koparkar, Nvidia’s senior manager of product marketing for AI inference.
The adoption of Nvidia’s Dynamo across major cloud platforms marks a critical step in the evolution of AI technology. As these platforms enhance their capabilities, businesses can expect to see improved performance and efficiency when deploying AI solutions, making this a pivotal moment for the industry.
As the competition heats up among cloud service providers, all eyes will be on how these advancements impact the future of AI. Stay tuned for more updates on this developing story.
