1-844-371-4949 info@applieddatasystems.com

AgilityFlexAI NVIDIA DGX-A100

Future Proof Compute and Storage AI Appliance Uniquely Scales GPU and Storage Independently

AgilityFlexAI Cluster, Built for Maximum AI Throughput

NVIDIA DGX A100 The Universal System For AI Infrastructure

  • Eight NVIDIA A100 Tensor Core GPUs
  • 320GB Total GPU Memory
  • Five PetaFlops AI and 10 PetaOPS INT8
  • Dual AMD Rome 64-Core Processors
  • 1TB System Memory
  • Eight Mellanox Single Port ConnectX-6 VPI 200Gb HDR InfiniBand Adapters
  • One Mellanox Dual Port ConnectX-6 VPI 200Gb HDR InfiniBand Adapter
  • Two 1.92TB M.2 NVMe OS Drives
  • Four 3.84TB U.2 NVMe Data Drives
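
As a quick sanity check on the figures in the list above, the following sketch derives a few totals from them. It is a back-of-the-envelope calculation only: the function names are hypothetical, the network figure is the raw line rate of the eight single-port adapters (the dual-port adapter is left out), and real-world throughput will be lower.

```python
# Values taken from the spec list above.
GPUS = 8
TOTAL_GPU_MEMORY_GB = 320
DATA_DRIVES = 4
DATA_DRIVE_TB = 3.84
SINGLE_PORT_ADAPTERS = 8
PORT_SPEED_GBIT = 200  # HDR InfiniBand line rate per port


def per_gpu_memory_gb(total_gb: float = TOTAL_GPU_MEMORY_GB, gpus: int = GPUS) -> float:
    """Memory available on each GPU if the total is split evenly."""
    return total_gb / gpus


def raw_data_capacity_tb(drives: int = DATA_DRIVES, size_tb: float = DATA_DRIVE_TB) -> float:
    """Raw (unformatted, no RAID overhead) capacity of the data drives."""
    return round(drives * size_tb, 2)


def aggregate_line_rate_gbyte_s(adapters: int = SINGLE_PORT_ADAPTERS,
                                gbit: int = PORT_SPEED_GBIT) -> float:
    """Sum of the adapter line rates, converted from gigabits to gigabytes."""
    return adapters * gbit / 8


if __name__ == "__main__":
    print(f"Memory per GPU:      {per_gpu_memory_gb():.0f} GB")
    print(f"Raw data capacity:   {raw_data_capacity_tb():.2f} TB")
    print(f"Aggregate line rate: {aggregate_line_rate_gbyte_s():.0f} GB/s")
```

The per-GPU result matches the 40 GB of HBM2 memory quoted for each A100 elsewhere on this page.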

Multi-Instance GPU (MIG)

  • MIG can partition the A100 GPU into as many as seven instances
  • Each instance is fully isolated with its own high-bandwidth memory, cache and compute cores
  • Support for every workload, from smallest to largest with guaranteed quality of service
  • Optimize GPU utilization
  • Run simultaneous mixed workloads
  • No change to the CUDA platform
  • MIG instances present as additional GPU resources in container orchestrators like Kubernetes
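
To illustrate what running mixed workloads on partitioned GPUs can look like, here is a minimal sketch that places jobs onto GPUs using a first-fit rule. It is a simplified model, not NVIDIA's scheduler or the Kubernetes device plugin: it treats each GPU as seven interchangeable slices, ignores the placement rules of real MIG profiles, and the `place` function and job names are hypothetical.

```python
MAX_INSTANCES_PER_GPU = 7  # from the MIG list above
GPUS_PER_SYSTEM = 8        # from the DGX A100 spec list


def place(jobs: dict[str, int], gpu_count: int = GPUS_PER_SYSTEM) -> dict[str, int]:
    """Assign each job to the first GPU with enough free slices.

    jobs maps a job name to the number of slices it requests.
    Returns a mapping of job name to GPU index.
    """
    free = [MAX_INSTANCES_PER_GPU] * gpu_count
    placement: dict[str, int] = {}
    for name, slices in jobs.items():
        if not 1 <= slices <= MAX_INSTANCES_PER_GPU:
            raise ValueError(f"{name}: slice request must be 1..{MAX_INSTANCES_PER_GPU}")
        for index, available in enumerate(free):
            if available >= slices:
                free[index] -= slices
                placement[name] = index
                break
        else:
            # No GPU in the system has room for this job.
            raise RuntimeError(f"{name}: no GPU has {slices} free slices")
    return placement


if __name__ == "__main__":
    demo = {"train": 7, "infer-a": 1, "infer-b": 2, "notebook": 3, "etl": 4}
    print(place(demo))
    print("Max instances per system:", MAX_INSTANCES_PER_GPU * GPUS_PER_SYSTEM)
```

In this model a single system exposes up to 56 instances, so a large training job can hold a whole GPU while several small inference jobs share another.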

Optimized Software Stack

  • Tested and optimized DGX software stack, including:
      • Base operating system
      • All necessary system software
      • GPU-accelerated applications
      • Pre-trained models
AgilityFlexAI Cluster

ExtremeStor with IBM Spectrum Scale ECE Keeps GPUs Fed

  • Parallel access for performance
  • Access the same data from all nodes
  • Superior Performance using all NVMe
  • Large file throughput, small file IOPS
  • Tier to Object Storage seamlessly

Next-Generation NVLink and NVSwitch

  • Third-generation NVLink doubles the GPU-to-GPU direct bandwidth to 600GB/s
  • NVLink bandwidth is nearly 10x higher than PCIe Gen4
  • Second-generation NVSwitch doubles GPU-to-GPU bandwidth to 600GB/s
  • NVSwitch supports full all-to-all communication with direct GPU peer-to-peer memory addressing
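
The "nearly 10x" comparison can be checked with simple arithmetic. The sketch below assumes the comparison is against a PCIe Gen4 x16 link at its nominal bidirectional rate of about 64 GB/s. That figure is an assumption not stated on this page, and the function names are hypothetical. Both numbers are peak rates, so measured transfers will be slower.

```python
NVLINK_GB_PER_S = 600.0        # from the list above
PCIE_GEN4_X16_GB_PER_S = 64.0  # assumption: nominal x16 bidirectional rate


def speedup(fast: float = NVLINK_GB_PER_S, slow: float = PCIE_GEN4_X16_GB_PER_S) -> float:
    """Ratio of the two peak bandwidths."""
    return fast / slow


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move size_gb at the given peak bandwidth."""
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    size_gb = 40.0  # for example, the full memory of one A100
    print(f"Bandwidth ratio: {speedup():.2f}x")
    print(f"NVLink: {transfer_seconds(size_gb, NVLINK_GB_PER_S) * 1000:.0f} ms")
    print(f"PCIe:   {transfer_seconds(size_gb, PCIE_GEN4_X16_GB_PER_S) * 1000:.0f} ms")
```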

Hardware and Software Fully Integrated

FlexAI software is preloaded and hardware is fully rack integrated at the Applied Data Systems factory before shipment.

Major Components inside the NVIDIA DGX A100 System

At its core, the NVIDIA DGX A100 system uses the NVIDIA A100 GPU, which is designed to efficiently accelerate large, complex AI workloads as well as several small workloads, and includes enhancements and new features for increased performance over the V100 GPU. The A100 GPU incorporates 40 GB of high-bandwidth HBM2 memory and larger, faster caches, and is designed to reduce AI and HPC software and programming complexity.

FlexAI Powered with ExtremeStor Ultra High-Performance Storage and IBM Spectrum Scale ECE

AgilityFlexAI is Fed by IBM Spectrum Scale V5: New Levels of Storage Performance

Substantial improvements in I/O performance

  • Significantly reduced inter-node software path latency to support the newest low-latency, high-bandwidth NVMe technology
  • Improved performance when running many small and large block size workloads simultaneously, enabled by a new 4 MB default block size with variable sub-block size
  • Improved metadata operation performance to a single directory from multiple nodes
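
To show why a large block size combined with a small sub-block size helps mixed workloads, the sketch below compares the space allocated to a file when allocation is rounded up to a sub-block versus a whole block. It is a simplified model, not IBM's allocator: the 8 KiB sub-block size is an assumed value for illustration, the function name is hypothetical, and metadata and replication overhead are ignored.

```python
KIB = 1024
MIB = 1024 * KIB

BLOCK_SIZE = 4 * MIB     # default block size named in the list above
SUBBLOCK_SIZE = 8 * KIB  # assumption for illustration; check your file system


def allocated_bytes(file_size: int, allocation_unit: int) -> int:
    """Space consumed when file_size is rounded up to whole allocation units."""
    if file_size < 0 or allocation_unit <= 0:
        raise ValueError("file_size must be >= 0 and allocation_unit > 0")
    units = -(-file_size // allocation_unit)  # ceiling division on integers
    return units * allocation_unit


if __name__ == "__main__":
    for size in (10 * KIB, 100 * KIB, 5 * MIB):
        by_subblock = allocated_bytes(size, SUBBLOCK_SIZE)
        by_block = allocated_bytes(size, BLOCK_SIZE)
        print(f"{size // KIB:>5} KiB file: "
              f"{by_subblock // KIB:>5} KiB with sub-blocks, "
              f"{by_block // KIB:>5} KiB with whole blocks")
```

In this model, large files still benefit from 4 MB transfers while a small file occupies only a few sub-blocks rather than a full block.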

Fully Optimized DGX Software Stack

The DGX A100 software has been built to run AI workloads at scale. A key goal is to enable practitioners to deploy deep learning frameworks, data analytics, and HPC applications on the DGX A100 with minimal setup effort. The platform software is designed around a minimal OS and driver install on the server, with all application and SDK software provisioned through the NGC Private Registry.

The NGC Private Registry provides GPU-optimized containers for deep learning, machine learning, and high performance computing (HPC) applications, along with pretrained models, model scripts, Helm charts, and SDKs. This software has been developed, tested, and tuned on DGX A100 systems.

IBM Spectrum Discover

Metadata Management for FlexAI 

IBM Spectrum Discover is modern metadata management software that provides data insight for exabyte-scale unstructured storage. IBM Spectrum Discover indexes metadata for billions of files. This metadata enables data scientists, storage administrators, and data stewards to efficiently manage, classify and gain insights from massive amounts of unstructured data. The insights gained accelerate large-scale analytics, improve storage economics, and help with risk mitigation to create competitive advantage and speed critical research.

  • Event notifications and policy-based workflows
  • Fine-grained views of storage consumption
  • Fast, efficient search through petabytes of data
  • Quickly differentiate mission-critical business data
  • Policy-based custom tagging
  • Extensible via Software Development Kit (SDK)

Game Changing Performance

AgilityFlexAI is powered by the NVIDIA DGX A100 Accelerated Compute Server that delivers unprecedented performance for deep learning training and inference. Organizations can now deploy data-intensive, deep learning frameworks with confidence. DGX A100 enables the cutting-edge DL/ML and AI innovation data scientists desire, with the dependability IT requires.

We’d love to work on your project. We perform an extensive analysis of your existing and future needs and deliver a comprehensive solution architecture on a validated hardware and software build that ships fully integrated. Contact us for an expert FlexAI consultation today!
