Arrow Electronics, Inc.

Arrow Quick Hit: NVIDIA DGX Cloud — AI solutions for the enterprise

October 04, 2023 | Russ Braden

What is it?

With the rise of generative AI and the increasing demand for accelerated computing to process vast amounts of unstructured data, businesses are now seeking a more robust infrastructure for next-gen AI applications, and many are turning to the cloud.

However, AI also presents new requirements that traditional infrastructures can't fully meet, such as workloads that span CPUs, GPUs and GPU clusters. Businesses must now adapt to handle complex models that require multi-node infrastructure to support cutting-edge AI. Many realize too late that, without the right platform, they face rapidly increasing costs driven by decreased developer productivity and slower time-to-market for critical AI endeavors.

Fortunately, NVIDIA DGX Cloud overcomes the challenges of traditional IaaS offerings by providing up to three times the utilization efficiency compared to conventional AI infrastructure. This results in shorter training runs, less developer idle time, and improved overall efficiency.

Some high-level benefits of the DGX Cloud include:

  • Faster results: DGX Cloud combines NVIDIA Base Command and AI Enterprise, both of which accelerate AI development and deliver production-ready models faster with optimized libraries, frameworks and pre-trained models.
  • Dedicated serverless AI platform: Developers will be empowered with a serverless AI platform featuring dedicated, multi-node training infrastructure and scalable GPU resources based on NVIDIA DGX technology.
  • Money-saving solutions: DGX Cloud lets businesses innovate with AI without infrastructure hassles. Your customers will enjoy reliable resource allocation, error-free job execution, reduced developer idle time, and a lower TCO compared to traditional services.
  • Dedicated technical support: DGX Cloud provides direct access to NVIDIA AI experts who can enhance your customers' outcomes and optimize their work. Distinguishing it from traditional clouds, DGX Cloud includes a dedicated technical account manager, solution architect, customer service manager, and 24x7 NVIDIA Enterprise Support.

Why should you care?

There are many compelling reasons why resellers and customers should be interested in the NVIDIA DGX Cloud solution.

How does it work?

The NVIDIA DGX Cloud operates on a cloud-based infrastructure that gives users almost immediate access to the computational power of NVIDIA DGX systems for AI and deep learning workloads.

Included with DGX Cloud is an Accelerated Compute Environment (ACE), which is a private, dedicated GPU cluster with storage, high-performance networking (built on InfiniBand or RoCE), and cross-sectional bandwidth between all nodes.

Only DGX Cloud provides readily available GPU server clusters. Traditional services don't have large clusters that are ready to use, and these special builds can take many months to be deployed. Most leverage a network fabric that is not optimized for AI training and operates at slower speeds with higher latency than NVIDIA networking employed in DGX Cloud. Traditional services suffice for distributed scale-out workloads, but they are not sufficient for AI training, which can be both scaled-up and scaled-out.

The fastest way to get started using the DGX platform is with NVIDIA DGX Cloud. Key features and capabilities of the DGX Cloud solution include:

Software

Hardware

  • Multi-node capable
  • 8 NVIDIA A100 Tensor Core GPUs or 8 NVIDIA H100 Tensor Core GPUs per node (640GB of GPU memory in total)
  • Access to the latest GPU technology
  • 10TB storage per instance; scale up as needed
  • 10TB egress per month per tenant; scale up bandwidth as needed
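
As a quick check on the 640GB figure in the hardware list above, the sketch below multiplies the GPUs per node by the memory per GPU. It assumes 80GB per GPU (the A100 80GB is named elsewhere in this article; applying the same figure to the H100 is an assumption), and the function names are illustrative only.

```python
# Assumption: each GPU carries 80GB of memory (A100 80GB is named in the
# article; the same capacity is assumed here for the H100).
MEMORY_PER_GPU_GB = 80
# From the hardware list: 8 GPUs per node.
GPUS_PER_NODE = 8


def node_gpu_memory_gb(gpus_per_node=GPUS_PER_NODE,
                       memory_per_gpu_gb=MEMORY_PER_GPU_GB):
    """Total GPU memory available on a single node, in GB."""
    return gpus_per_node * memory_per_gpu_gb


def cluster_gpu_memory_gb(nodes):
    """Total GPU memory across a multi-node cluster, in GB."""
    return nodes * node_gpu_memory_gb()


if __name__ == "__main__":
    print(node_gpu_memory_gb())      # one node
    print(cluster_gpu_memory_gb(4))  # a hypothetical four-node cluster
```

The same multiplication gives a rough sense of how much aggregate GPU memory a multi-node reservation offers when sizing a large model.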

Services

Differentiation in the market

NVIDIA DGX Cloud stands out among traditional cloud services with several key differentiators:

  • A developer-first platform: More than traditional GPU IaaS, DGX Cloud offers best-of-breed AI workflow management software and optimized containers that increase developer productivity and speed time to production-ready models.
  • Serverless AI development: DGX Cloud abstracts away the complexities and gives developers a workload-focused interface to do their work without worrying about infrastructure.
  • Purpose-built for multi-node training: Ready-to-use clusters are optimized for the most complex AI workloads.
  • Hybrid-cloud support: DGX Cloud, in combination with on-premises DGX BasePOD and DGX SuperPOD, uses the Base Command Platform as a single pane to manage hybrid clouds.
  • Scalability and flexibility: Infrastructures can grow from a couple of instances to large clusters without long lead times or complexities.
  • Ease of use and management: Removes hardware complexities and focuses on AI tasks.
  • NVIDIA AI Enterprise Software Stack: This solution includes pre-trained models, optimized frameworks and accelerated data science libraries.
  • Predictable pricing: Your customers will enjoy predictable, transparent pricing models that will help them avoid unexpected costs and plan budgets effectively for their AI initiatives.
  • A team of experts: DGX Cloud includes NVIDIA Enterprise Support and a dedicated team of AI practitioners who can help optimize workloads for maximum performance.

DGX Cloud offers unique features

NVIDIA DGX Cloud differentiates itself from the competition with purpose-built hardware, high performance, scalability and ease of use. The cloud-based accessibility, best-of-breed software tools and access to AI expertise all make DGX Cloud a robust platform for AI. Its unique features include:

  • Readily accessible DGX H100 (as of January 2024) or A100-based reserved instances
  • High-performance multi-node cluster that is ready-to-scale
  • NVIDIA performance-optimized network for multi-node training
  • Direct access to NVIDIA AI experts who know LLMs and generative AI
  • Hybrid cloud-ready with single-pane view across on-premises and cloud infrastructures
  • Multi-cloud support with no lock-in
  • One predictable price with no surprises

How to position and sell NVIDIA DGX Cloud

Common use cases

NVIDIA DGX Cloud caters to diverse AI use cases by providing access to powerful multi-node clusters for computationally intensive tasks. It is a serverless AI training platform that is used for numerous AI applications and can adapt to evolving research and industry needs. Common use cases include:

  • Deep learning model training: Train complex deep learning models on large datasets, such as CNNs for image recognition, RNNs for NLP and GANs for generative AI models
  • AI research and development: Experiment with new AI algorithms, architectures and techniques without dedicated hardware, which fosters faster iteration and efficient exploration
  • Computer vision applications: Accelerate computer vision tasks like object detection, segmentation, image generation and video analysis, which demand substantial computational resources
  • Natural language processing (NLP): Allocate the necessary resources for NLP tasks like language translation, sentiment analysis and text generation, which involve training and deploying large-scale language models like BERT
  • Recommendation systems: Train recommender models that are crucial for e-commerce and content platforms with large user data processing needs
  • Drug discovery and healthcare: Train models for drug discovery, virtual screening, molecular dynamics simulations, medical image analysis and disease diagnosis
  • Autonomous vehicles and robotics: Facilitate the development and deployment of autonomous driving and robotics models
  • Financial modeling and predictive analytics: Train models for complex financial modeling, risk analysis and predictive analytics tasks
  • Scientific research: Train models for climate modeling, astrophysics simulations and drug-protein interactions in bioinformatics
  • Data science pipelines: Integrate into data science workflows to accelerate data processing, feature engineering, and model training

Target personas

NVIDIA DGX Cloud caters to various AI personas, including LOB leaders, IT leaders and AI practitioners. For enterprises that are AI-dependent and need to train complex models, such as generative AI, DGX Cloud provides the AI infrastructure they need so that teams don't have to worry about managing it. Some example target personas for NVIDIA DGX Cloud include:

  • AI practitioners benefit from multi-node training with rapid experimentation and access to GPU resources.
  • IT leaders can take advantage of the easy utilization and resource management.
  • LOB leaders will enjoy a serverless AI platform designed to train generative AI with a shorter time-to-market and lower TCO.
  • Startups and SMEs will have access to powerful GPUs without requiring significant upfront investment in hardware.

Questions to ask your customer

When engaging with your customers about NVIDIA DGX Cloud, here are some questions you can ask to gather additional valuable information:

  • Do you find your current cloud costs escalating with AI development?
  • Do you have a data center with enough power and cooling for large-scale AI infrastructure?
  • Do you struggle to get a CapEx budget for AI computing?
  • Do you have a hard time finding multi-node training capacity for your large complex models?
  • Do you wish you could access a supercomputer on an as-needed basis?

Answers to your customers' questions

  • What's the difference between DGX Cloud and Amazon SageMaker?
    Amazon SageMaker is only compatible with AWS, whereas DGX Cloud pairs with leading CSPs. Additional unique benefits that DGX Cloud customers can enjoy include:
    • Unlimited, dedicated access to NVIDIA A100 80GB Tensor Core GPUs
    • Access to the latest GPU technology
    • Multi-node readiness
    • NVIDIA networking
    • AI expertise
    • Hybrid-cloud readiness
    • Multi-cloud capabilities
    • Predictable pricing
  • What do you mean by predictable pricing?
    DGX Cloud is priced on a per-month basis, with storage, data egress, support and software included, so businesses can predict costs ahead of time.
  • What are the storage and data egress fees for DGX Cloud?
    Each DGX Cloud instance includes 10TB of storage, with additional storage available for purchase in 10TB increments. Each tenant has 10TB of data egress per month, with more data egress available for an extra fee.
  • Does DGX Cloud have hybrid capabilities?
    The Base Command Platform powers DGX Cloud and on-premises DGX systems, creating a hybrid-cloud architecture across clouds and on-premises infrastructure. DGX Cloud is the only truly hybrid cloud for AI.
  • Who answers support questions?
    NVIDIA DGX Cloud service provides world-class NVIDIA Enterprise Support. The single monthly cost covers 24x7 business-critical support without any additional fees.
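
The storage and egress allowances described in the answers above can be sketched as simple arithmetic. This is a rough planning aid only: the function names are hypothetical, no prices are modeled because the article gives none, and treating the included storage as pooled across instances is an assumption.

```python
import math

# Figures taken from the article.
INCLUDED_STORAGE_TB_PER_INSTANCE = 10
STORAGE_INCREMENT_TB = 10
INCLUDED_EGRESS_TB_PER_MONTH = 10  # per tenant


def extra_storage_increments(required_tb, instances):
    """Number of additional 10TB storage increments needed.

    Assumption: the 10TB included with each instance can be pooled.
    """
    included_tb = instances * INCLUDED_STORAGE_TB_PER_INSTANCE
    shortfall_tb = max(0, required_tb - included_tb)
    return math.ceil(shortfall_tb / STORAGE_INCREMENT_TB)


def egress_overage_tb(used_tb):
    """Monthly data egress beyond the tenant's included 10TB."""
    return max(0, used_tb - INCLUDED_EGRESS_TB_PER_MONTH)


if __name__ == "__main__":
    # Hypothetical tenant: 2 instances, 45TB of data, 12.5TB egress a month.
    print(extra_storage_increments(45, 2))
    print(egress_overage_tb(12.5))
```

Rounding up to whole increments reflects the article's statement that additional storage is purchased in 10TB blocks.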

The bottom line

NVIDIA DGX Cloud provides enterprises with the necessary resources and infrastructure to drive innovation, advance research and develop AI-driven solutions without the burden of managing the underlying infrastructure. Its high-performance computing, scalability and cost-efficiency make it an attractive choice for those seeking to bring complex models into production faster.
