1-844-371-4949 info@applieddatasystems.com

AgilityFlexAI GPU POD Solution

Future-Proof Compute and Storage AI Appliance That Scales GPU and Storage Independently

AgilityFlexAI GPU POD, Built for Maximum AI Throughput

NVIDIA DGX H100: The Gold Standard for AI Infrastructure

  • Eight NVIDIA H100 Tensor Core GPUs with 80GB each, 640GB total
  • 32 petaFLOPS of FP8 performance
  • Dual Intel Xeon Platinum 8480C processors, 112 cores total
  • 2TB of system memory
  • Ten NVIDIA Mellanox single-port ConnectX-7 400Gb NDR InfiniBand adapters
  • Two 1.92TB M.2 NVMe OS drives
  • Eight 3.84TB U.2 NVMe data drives
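
As a quick sanity check on the figures above, the aggregate numbers can be derived from the per-component specs. This is a minimal sketch: the constants come from the bullet list, while the function name `system_totals` and the derived fields (cores per socket, FP8 per GPU, raw data capacity) are illustrative and not part of any vendor tool.

```python
# Per-component figures taken from the bullet list above.
GPU_COUNT = 8
GPU_MEMORY_GB = 80
FP8_PETAFLOPS_TOTAL = 32
CPU_SOCKETS = 2
TOTAL_CPU_CORES = 112
DATA_DRIVES = 8
DATA_DRIVE_TB = 3.84


def system_totals():
    """Return aggregate figures derived from the per-component specs."""
    return {
        # 8 GPUs x 80GB each
        "gpu_memory_gb": GPU_COUNT * GPU_MEMORY_GB,
        # 32 petaFLOPS spread evenly across 8 GPUs
        "fp8_petaflops_per_gpu": FP8_PETAFLOPS_TOTAL / GPU_COUNT,
        # 112 cores across two sockets
        "cores_per_socket": TOTAL_CPU_CORES // CPU_SOCKETS,
        # Raw capacity of the data drives, before any RAID or filesystem overhead
        "raw_data_storage_tb": round(DATA_DRIVES * DATA_DRIVE_TB, 2),
    }


if __name__ == "__main__":
    for name, value in system_totals().items():
        print(f"{name}: {value}")
```

The raw data capacity is a simple product of drive count and drive size; usable capacity depends on how the drives are configured.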

Multi-Instance GPU (MIG)

  • Second-generation MIG can partition each H100 GPU into as many as seven instances
  • Each instance is fully isolated, with its own high-bandwidth memory, cache and compute cores
  • Support for every workload, from the smallest to the largest, with guaranteed quality of service
  • Optimize GPU utilization
  • Run simultaneous mixed workloads
  • No change to the CUDA platform
  • MIG instances present as additional GPU resources in container orchestrators like Kubernetes
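
To illustrate how a MIG instance can appear as a schedulable resource, here is a minimal sketch that builds a Kubernetes Pod manifest requesting one MIG slice. It assumes the cluster runs the NVIDIA device plugin with its "mixed" MIG strategy, under which slices are advertised with names such as `nvidia.com/mig-1g.10gb`. The helper name `mig_pod_manifest`, the pod name, and the container image are hypothetical placeholders, and the profile must match whatever the administrator has actually configured on the GPUs.

```python
import json


def mig_pod_manifest(name, image, mig_profile="1g.10gb", count=1):
    """Build a Pod manifest that requests `count` MIG instances.

    Assumption: the NVIDIA device plugin exposes MIG slices under the
    resource name "nvidia.com/mig-<profile>" (the "mixed" strategy).
    """
    resource = f"nvidia.com/mig-{mig_profile}"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # List the GPU devices visible inside the container.
                    "command": ["nvidia-smi", "-L"],
                    "resources": {"limits": {resource: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name; substitute a CUDA-enabled image you trust.
    manifest = mig_pod_manifest("mig-demo", "example.com/cuda-base:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with `kubectl apply -f`. The application inside the container needs no MIG-specific changes, since the slice is presented to it as an ordinary CUDA device.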

Optimized Software Stack

  • Tested and optimized DGX software stack, including:
      • NVIDIA AI Enterprise: optimized AI software
      • NVIDIA Base Command: orchestration, scheduling and cluster management
      • DGX OS, Ubuntu, Red Hat Enterprise Linux or Rocky Linux
AgilityFlexAI Cluster

IBM Storage Scale Keeps GPUs Fed

  • Parallel access for performance
  • Access the same data from all nodes
  • Superior performance from all-NVMe storage: up to 1.3M IOPS and 125GB/s per node
  • Large file throughput, small file IOPS
  • Tier to Object Storage seamlessly
  • Active File Management
  • Encryption
  • GPUDirect Storage
  • Scales from 46TB to 633YB

Next-Generation NVLink and NVSwitch

  • Fourth-generation NVLink provides up to 900GB/s of direct GPU-to-GPU bandwidth
  • NVLink offers roughly 7x the bandwidth of PCIe Gen5
  • Third-generation NVSwitch delivers 900GB/s of GPU-to-GPU bandwidth, 1.5x the 600GB/s of the previous generation
  • NVSwitch supports full all-to-all communication with direct GPU peer-to-peer memory addressing
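
The NVLink-to-PCIe comparison above is simple arithmetic, sketched below. The 900GB/s NVLink figure comes from the list; the PCIe Gen5 figure of 128GB/s is an assumption (a x16 link counted in both directions), and the function names are illustrative. The transfer times are idealized peak-rate estimates, not measured results.

```python
NVLINK_GB_PER_S = 900.0         # from the list above
PCIE_GEN5_X16_GB_PER_S = 128.0  # assumption: x16 link, both directions combined


def bandwidth_ratio(fast_gb_per_s, slow_gb_per_s):
    """How many times more bandwidth the faster link offers."""
    return fast_gb_per_s / slow_gb_per_s


def transfer_seconds(size_gb, bandwidth_gb_per_s):
    """Idealized time to move size_gb at the peak rate (no protocol overhead)."""
    return size_gb / bandwidth_gb_per_s


if __name__ == "__main__":
    ratio = bandwidth_ratio(NVLINK_GB_PER_S, PCIE_GEN5_X16_GB_PER_S)
    print(f"NVLink vs PCIe Gen5: {ratio:.2f}x")
    # Moving one GPU's worth of memory (80GB) over each link at peak rate.
    print(f"80GB over NVLink: {transfer_seconds(80, NVLINK_GB_PER_S):.3f} s")
    print(f"80GB over PCIe:   {transfer_seconds(80, PCIE_GEN5_X16_GB_PER_S):.3f} s")
```

Real transfers fall short of these peak figures, but the ratio between the two links is what matters when sizing multi-GPU training jobs.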

Hardware and Software Fully Integrated

FlexAI software is preloaded and the hardware is fully rack-integrated at the Applied Data Systems factory before shipment.

Major Components inside the NVIDIA DGX H100 System

At its core, the NVIDIA DGX H100 system is built on the NVIDIA H100 GPU, which is designed to accelerate both large, complex AI workloads and many smaller workloads efficiently, with enhancements and new features that increase performance over the A100 GPU. The H100 GPU incorporates 80GB of high-bandwidth HBM3 memory and larger, faster caches, and is designed to reduce AI and HPC software and programming complexity.

 

FlexAI Powered with IBM Storage Scale System

AgilityFlexAI is Fed by IBM Storage Scale System

Substantial improvements in I/O performance

  • Significantly reduced inter-node software path latency to support the newest low-latency, high-bandwidth NVMe technology
  • Improved performance when small and large block size workloads run simultaneously, thanks to the new 4 MB default block size with variable sub-block size
  • Improved performance for metadata operations on a single directory from multiple nodes
  • Lowers resource requirements by more than 50% with GPUDirect Storage
  • Flexible access via S3, NFS, SMB, GDS, POSIX, HDFS and CSI

Fully Optimized DGX Software Stack

The DGX H100 software has been built to run AI workloads at scale. A key goal is to enable practitioners to deploy deep learning frameworks, data analytics, and HPC applications on the DGX H100 with minimal setup effort. The platform software is designed around a minimal OS and driver install on the server, with all application and SDK software provisioned through the NGC Private Registry.

The NGC Private Registry provides GPU-optimized containers for deep learning, machine learning, and high performance computing (HPC) applications, along with pretrained models, model scripts, Helm charts, and SDKs. This software has been developed, tested, and tuned on DGX H100 systems.

IBM Spectrum Discover

Metadata Management for FlexAI 

IBM Spectrum Discover is modern metadata management software that provides data insight for exabyte-scale unstructured storage. IBM Spectrum Discover indexes metadata for billions of files. This metadata enables data scientists, storage administrators, and data stewards to efficiently manage, classify and gain insights from massive amounts of unstructured data. The insights gained accelerate large-scale analytics, improve storage economics, and help with risk mitigation to create competitive advantage and speed critical research.

  • Event notifications and policy-based workflows
  • Fine-grained views of storage consumption
  • Fast, efficient search through petabytes of data
  • Quickly differentiate mission-critical business data
  • Policy-based custom tagging
  • Extensible via Software Development Kit (SDK)

 

Game Changing Performance

AgilityFlexAI is powered by the NVIDIA DGX H100 accelerated compute server, which delivers unprecedented performance for deep learning training and inference. Organizations can now deploy data-intensive deep learning frameworks with confidence. DGX H100 enables the cutting-edge DL/ML and AI innovation data scientists desire, with the dependability IT requires.

We’d love to work on your project. We perform an extensive analysis of your existing and future needs, then deliver a comprehensive solution architecture on a validated hardware and software build that ships fully integrated. Contact us for an expert FlexAI consultation today!
