Major Components inside the NVIDIA DGX A100 System
At its core, the NVIDIA DGX A100 system is built on the NVIDIA A100 GPU, which is designed to efficiently accelerate large, complex AI workloads as well as several smaller workloads, and which adds enhancements and new features that increase performance over the V100 GPU. The A100 GPU incorporates 40 GB or 80 GB of high-bandwidth HBM2 memory and larger, faster caches, and is designed to reduce AI and HPC software and programming complexity.
AgilityFlexAI is Fed by IBM Spectrum Scale V5: New Levels of Storage Performance
Substantial improvements in I/O performance
- Significantly reduced inter-node software path latency to support the newest low-latency, high-bandwidth NVMe technology
- Improved performance when many small and large block size workloads run simultaneously, enabled by the new 4 MB default block size with variable sub-block size
- Improved performance for metadata operations on a single directory issued from multiple nodes
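To illustrate why a variable sub-block size helps small files on a 4 MB block, the following is a simplified sketch rather than a description of how Spectrum Scale allocates space internally. It assumes that a file's final partial block is rounded up to whole sub-blocks, that the older layout used a fixed 1/32 of the block as the sub-block, and that the newer layout uses an 8 KiB sub-block with a 4 MiB block. These sub-block values and the function name allocated_bytes are assumptions made for illustration and are not stated in this document.

```python
KIB = 1024
MIB = 1024 * KIB


def allocated_bytes(file_size: int, block_size: int, subblock_size: int) -> int:
    """Return space consumed when the file's tail is rounded up to whole sub-blocks.

    Simplified model: ignores metadata, replication, and data stored in the inode.
    """
    full_blocks, tail = divmod(file_size, block_size)
    # Integer ceiling division: number of sub-blocks needed for the tail.
    tail_subblocks = -(-tail // subblock_size)
    return full_blocks * block_size + tail_subblocks * subblock_size


BLOCK = 4 * MIB
FIXED_SUBBLOCK = BLOCK // 32   # assumption: fixed 1/32 layout, i.e. 128 KiB
VARIABLE_SUBBLOCK = 8 * KIB    # assumption: smaller sub-block paired with a 4 MiB block

small_file = 10 * KIB
old = allocated_bytes(small_file, BLOCK, FIXED_SUBBLOCK)
new = allocated_bytes(small_file, BLOCK, VARIABLE_SUBBLOCK)
print(f"fixed sub-block: {old // KIB} KiB, variable sub-block: {new // KIB} KiB")
```

Under these assumptions, a 10 KiB file occupies 128 KiB with the fixed sub-block and 16 KiB with the smaller one, which is why a large default block size no longer has to penalize small-file workloads.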
Fully Optimized DGX Software Stack
The DGX A100 software has been built to run AI workloads at scale. A key goal is to enable practitioners to deploy deep learning frameworks, data analytics, and HPC applications on the DGX A100 with minimal setup effort. The platform software design centers on a minimal OS and driver installation on the server, with all application and SDK software provisioned through the NGC Private Registry.
The NGC Private Registry provides GPU-optimized containers for deep learning, machine learning, and high performance computing (HPC) applications, along with pretrained models, model scripts, Helm charts, and SDKs. This software has been developed, tested, and tuned on DGX A100 systems.
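The container-based model described above can be sketched as follows: the host carries only the OS, driver, and container runtime, and the workload runs from a registry image. This sketch assumes Docker with the NVIDIA Container Toolkit installed, which is what makes the --gpus option available. The image reference is a deliberate placeholder, and the helper name ngc_run_command and the train.py script are hypothetical.

```python
import shlex
from typing import List


def ngc_run_command(image: str, workload: List[str]) -> List[str]:
    """Build a `docker run` argument list that exposes all GPUs to the container."""
    return ["docker", "run", "--rm", "--gpus", "all", image, *workload]


# Placeholder only: substitute the registry path, image, and tag
# that your organization's registry actually provides.
IMAGE = "nvcr.io/<org>/<image>:<tag>"

cmd = ngc_run_command(IMAGE, ["python", "train.py"])  # train.py is hypothetical
print(shlex.join(cmd))  # shell-quoted form for review; pass `cmd` to subprocess.run to execute
```

Building the command as a list rather than a string avoids shell quoting problems when it is passed to subprocess.run.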
IBM Spectrum Discover
Metadata Management for FlexAI
• Event notifications and policy-based workflows
• Fine-grained views of storage consumption
• Fast, efficient search through petabytes of data
• Quick differentiation of mission-critical business data
• Policy-based custom tagging
• Extensible via Software Development Kit (SDK)
Game-Changing Performance
AgilityFlexAI is powered by the NVIDIA DGX A100 Accelerated Compute Server, which delivers unprecedented performance for deep learning training and inference. Organizations can now deploy data-intensive deep learning frameworks with confidence. DGX A100 enables the cutting-edge DL/ML and AI innovation that data scientists want, with the dependability that IT requires.