Major Components inside the NVIDIA DGX H100 System
At its core, the NVIDIA DGX H100 system uses the NVIDIA H100 Tensor Core GPU. The H100 is designed to efficiently accelerate large, complex AI workloads as well as many smaller workloads, and it adds enhancements and new features that increase performance over the A100 GPU. Each H100 GPU in the system incorporates 80 GB of high-bandwidth HBM3 memory and larger, faster caches, and is designed to reduce AI and HPC software and programming complexity.
AgilityFlexAI is Fed by IBM Storage Scale System
Substantial improvements in I/O performance
- Significantly reduced inter-node software path latency to support the newest low-latency, high-bandwidth NVMe technology
- Better performance for mixed small- and large-block workloads running simultaneously, thanks to the new 4 MB default block size with variable sub-block size
- Improved performance of metadata operations on a single directory from multiple nodes
- Reduces resource requirements by over 50% with GPUDirect Storage (GDS)
- Flexible access via S3, NFS, SMB, GDS, POSIX, HDFS, and CSI
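The benefit of a variable sub-block size is easiest to see with a little arithmetic. The sketch below is a simplified model, not IBM Storage Scale code: it assumes a file's tail is rounded up to whole sub-blocks and ignores metadata, data stored in the inode, and replication. The sub-block sizes it compares (8 KiB for a 4 MiB block, versus 128 KiB under an older fixed layout of 32 sub-blocks per block) are assumptions to check against the documentation for your release.

```python
KIB = 1024
MIB = 1024 * KIB


def allocated_bytes(file_size: int, block_size: int, sub_block_size: int) -> int:
    """Simplified model of on-disk space for one file.

    Whole blocks hold the leading part of the file; the remaining tail is
    rounded up to a whole number of sub-blocks. Metadata, in-inode data and
    replication are deliberately ignored (assumption).
    """
    if block_size % sub_block_size != 0:
        raise ValueError("block_size must be a multiple of sub_block_size")
    full_blocks, tail = divmod(file_size, block_size)
    # Integer ceiling division: number of sub-blocks needed for the tail.
    tail_sub_blocks = -(-tail // sub_block_size)
    return full_blocks * block_size + tail_sub_blocks * sub_block_size


if __name__ == "__main__":
    block = 4 * MIB
    # Assumed sub-block sizes: variable (8 KiB) vs. legacy fixed 1/32 of the block (128 KiB).
    for sub_block in (8 * KIB, block // 32):
        for size in (10 * KIB, 1 * MIB + 1, 4 * MIB + 1):
            used = allocated_bytes(size, block, sub_block)
            print(f"sub-block {sub_block // KIB:>4} KiB  file {size:>8} B  -> {used:>8} B allocated")
```

Under these assumptions, a 10 KiB file occupies 16 KiB with 8 KiB sub-blocks but 128 KiB with 128 KiB sub-blocks, which illustrates why a large default block size no longer penalizes small-file workloads as heavily.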
Fully Optimized DGX Software Stack
The DGX H100 software stack is built to run AI workloads at scale. A key goal is to let practitioners deploy deep learning frameworks, data analytics, and HPC applications on the DGX H100 with minimal setup effort. The platform software is designed around a minimal OS and driver installation on the server, with all application and SDK software provisioned through the NGC Private Registry.
The NGC Private Registry provides GPU-optimized containers for deep learning, machine learning, and high-performance computing (HPC) applications, along with pretrained models, model scripts, Helm charts, and SDKs. This software has been developed, tested, and tuned on DGX H100 systems.
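Because applications are delivered as containers, launching a framework on the DGX H100 mostly comes down to naming the right image and starting it with GPU access. The Python sketch below only builds the command line; it does not call Docker. It assumes the `nvcr.io/<org>/[<team>/]<repository>:<tag>` image naming used by NGC registries and a host with the NVIDIA Container Toolkit installed so that `docker run --gpus` works. The organization, team, tag, and mount path values are hypothetical placeholders.

```python
import shlex
from typing import List, Optional

NGC_REGISTRY = "nvcr.io"


def ngc_image(org: str, repository: str, tag: str, team: Optional[str] = None) -> str:
    """Build an image reference of the form nvcr.io/<org>/[<team>/]<repository>:<tag>."""
    parts = [NGC_REGISTRY, org]
    if team:  # team namespaces are optional in NGC private registries (assumption)
        parts.append(team)
    parts.append(repository)
    return "/".join(parts) + ":" + tag


def docker_run_cmd(image: str, gpus: str = "all", data_dir: Optional[str] = None) -> List[str]:
    """Return an argv list that starts `image` with GPU access via --gpus."""
    cmd = ["docker", "run", "--rm", "-it", f"--gpus={gpus}"]
    if data_dir:
        # Bind-mount a host directory (e.g. on the shared file system) into the container.
        cmd += ["-v", f"{data_dir}:/workspace/data"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    image = ngc_image("nvidia", "pytorch", "24.01-py3")  # hypothetical tag
    print(shlex.join(docker_run_cmd(image, data_dir="/gpfs/datasets")))  # hypothetical path
```

Returning an argv list rather than a single string keeps the command safe to pass to `subprocess.run`; the `--rm` and `-v` choices are illustrative defaults, not requirements of the platform.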
IBM Spectrum Discover
Metadata Management for FlexAI
IBM Spectrum Discover is modern metadata management software that provides data insight for exabyte-scale unstructured storage. IBM Spectrum Discover indexes metadata for billions of files. This metadata enables data scientists, storage administrators, and data stewards to efficiently manage, classify, and gain insights from massive amounts of unstructured data. These insights accelerate large-scale analytics, improve storage economics, and help mitigate risk, creating competitive advantage and speeding critical research.
- Event notifications and policy-based workflows
- Fine-grained views of storage consumption
- Fast, efficient search through petabytes of data
- Quickly differentiate mission-critical business data
- Policy-based custom tagging
- Extensible via a software development kit (SDK)
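To make policy-based tagging, search, and consumption views concrete, the following Python sketch models a toy metadata index in memory. It is not the IBM Spectrum Discover API or SDK; the `FileRecord`, `TagPolicy`, and `MetadataIndex` names, the paths, and the tag keys are hypothetical. It shows only the general pattern: records are indexed, policies attach custom tags to matching files, and queries run against metadata rather than scanning file contents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FileRecord:
    """Hypothetical metadata record for one file."""
    path: str
    size_bytes: int
    owner: str
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class TagPolicy:
    """Hypothetical policy: tag every record for which `predicate` is true."""
    name: str
    predicate: Callable[[FileRecord], bool]
    tag_key: str
    tag_value: str


class MetadataIndex:
    def __init__(self) -> None:
        self._records: Dict[str, FileRecord] = {}

    def ingest(self, record: FileRecord) -> None:
        # Re-ingesting the same path replaces the previous record.
        self._records[record.path] = record

    def apply_policies(self, policies: List[TagPolicy]) -> int:
        """Apply every policy to every record; return how many tags changed."""
        changed = 0
        for rec in self._records.values():
            for policy in policies:
                if policy.predicate(rec) and rec.tags.get(policy.tag_key) != policy.tag_value:
                    rec.tags[policy.tag_key] = policy.tag_value
                    changed += 1
        return changed

    def search(self, **tags: str) -> List[str]:
        """Return sorted paths whose tags match all given key=value pairs."""
        return sorted(
            r.path for r in self._records.values()
            if all(r.tags.get(k) == v for k, v in tags.items())
        )

    def usage_by_owner(self) -> Dict[str, int]:
        """Total bytes per owner: a simple storage-consumption view."""
        totals: Dict[str, int] = {}
        for r in self._records.values():
            totals[r.owner] = totals.get(r.owner, 0) + r.size_bytes
        return totals


if __name__ == "__main__":
    idx = MetadataIndex()
    idx.ingest(FileRecord("/gpfs/projA/train/img_0001.jpg", 250_000, "alice"))
    idx.ingest(FileRecord("/gpfs/projA/checkpoints/model.pt", 4_000_000_000, "alice"))
    idx.ingest(FileRecord("/gpfs/finance/q3_report.xlsx", 80_000, "bob"))
    policies = [
        TagPolicy("training-data", lambda r: "/train/" in r.path, "dataset", "training"),
        TagPolicy("critical", lambda r: r.path.startswith("/gpfs/finance/"), "class", "mission-critical"),
    ]
    print(idx.apply_policies(policies), idx.search(dataset="training"), idx.usage_by_owner())
```

Because a policy only writes a tag when it is missing or different, rerunning the policies changes nothing, which suits repeated or event-triggered policy runs.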
Game Changing Performance
AgilityFlexAI is powered by the NVIDIA DGX H100 Accelerated Compute Server, which delivers unprecedented performance for deep learning training and inference. Organizations can now deploy data-intensive deep learning frameworks with confidence. DGX H100 enables the cutting-edge DL/ML and AI innovation that data scientists want, with the dependability that IT requires.