NVIDIA DGX POD
A custom-built NVIDIA DGX POD includes any number of NVIDIA DGX A100 servers, along with high-speed 200Gb/s NVIDIA Mellanox InfiniBand networking, high-speed NVMe storage using IBM Spectrum Scale file system software on top of NetApp EF600 storage arrays, Red Hat or NVIDIA's Ubuntu-based OS, Bright Cluster Manager software, SLURM, and all NVIDIA NGC software packages.
Every component is designed to work together optimally and seamlessly.
Each system is factory integrated and factory tested, ensuring you get a tightly integrated and optimized HPC cluster.
Delivered and set up onsite by our white-glove AgilityCare professional service technicians.
TS/SCI-cleared personnel are available for government or sensitive installations.
Essential Building Block of the AI Data Center
The Universal System for Every AI Workload
NVIDIA DGX A100 is the universal system for all AI infrastructure, from analytics to training to inference. It sets a new bar for compute density, packing 5 petaFLOPS of AI performance into a 6U form factor and replacing legacy infrastructure silos with one platform for every AI workload.
DGXperts: Integrated Access to AI Expertise
NVIDIA DGXperts are a global team of 16,000+ AI-fluent professionals who have built a wealth of experience over the last decade to help you maximize the value of your DGX investment.
Fastest Time to Solution
NVIDIA DGX A100 is the world’s first AI system built on the NVIDIA A100 Tensor Core GPU. Integrating eight A100 GPUs with up to 640GB of GPU memory, the system provides unprecedented acceleration and is fully optimized for NVIDIA CUDA-X™ software and the end-to-end NVIDIA data center solution stack.
Optional liquid cooling available.
ExtremeStor NVMe Storage
ExtremeStor ECE is the ultimate all-NVMe flash storage system for AI and machine learning workloads. Our solution is designed to ensure that no NVIDIA DGX A100 GPU server is left waiting on data. ExtremeStor with IBM Spectrum Scale ECE file system software delivers extreme scalability, flash-accelerated performance, and automatic policy-based storage tiering from flash through disk to cloud or even tape, helping reduce storage costs by up to 90 percent while providing better security and management efficiency in cloud, big data, and analytics environments. ExtremeStor delivers high IOPS and high bandwidth to feed data-hungry GPUs, with tens of GB/s of bandwidth per compute server, simultaneously across as many servers as needed.
ExtremeStor ECE supports several levels of erasure coding, which bring much better storage efficiency: roughly 73% with an 8+3p Reed-Solomon code and 80% with 8+2p. Better storage efficiency means less hardware and lower cost, helping customers significantly reduce their budget without compromising system availability or data reliability. ECE erasure coding also protects data better than traditional RAID 5/6. For example, 8+3p on 11 or more nodes tolerates the failure of three nodes, so the system can survive concurrent failures of multiple servers and storage devices.
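The efficiency and fault-tolerance figures follow from simple arithmetic. A k+m code stores k data strips and m parity strips per stripe, so the usable fraction of raw capacity is k/(k+m), and a stripe survives as long as no more than m of its strips are lost. The sketch below is illustrative only: it assumes a simplified placement model in which each stripe's strips are spread as evenly as possible across nodes. That model and the function names are hypothetical and do not reproduce IBM Spectrum Scale ECE's actual placement rules.

```python
import math


def storage_efficiency(data_strips: int, parity_strips: int) -> float:
    """Fraction of raw capacity that holds user data for a k+m erasure code."""
    return data_strips / (data_strips + parity_strips)


def node_failures_tolerated(data_strips: int, parity_strips: int, nodes: int) -> int:
    """Hypothetical, simplified model: strips of each stripe are spread as evenly
    as possible across nodes, so one node holds at most ceil((k+m)/n) strips of
    any stripe. A stripe stays readable while at most m strips are lost."""
    if nodes < 1:
        raise ValueError("need at least one node")
    width = data_strips + parity_strips
    max_strips_per_node = math.ceil(width / nodes)
    return parity_strips // max_strips_per_node


if __name__ == "__main__":
    for k, m in [(8, 3), (8, 2)]:
        eff = storage_efficiency(k, m)
        tolerated = node_failures_tolerated(k, m, 11)
        print(f"{k}+{m}p: efficiency {eff:.1%}, node failures tolerated on 11 nodes: {tolerated}")
```

Running it prints an efficiency of 72.7% with 3 tolerated node failures for 8+3p, and 80.0% with 2 for 8+2p. With fewer nodes than strips, this simplified model puts several strips of a stripe on one node, so node-level tolerance drops even though the parity count is unchanged.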
ExtremeStor ECE also offers:
Consistently fast performance: IBM Spectrum Scale scatters data blocks across disks, eliminating the performance degradation caused by fragmentation as the system fills
Enterprise reliability and availability: the rock-solid, high-performance GPFS file system and high-scale data management, trusted by thousands of organizations, combined with best-of-breed components for extreme performance and data protection
Flexible building blocks: ExtremeStor GPFS is delivered as a modular, repeatable, and highly supportable solution built from best-of-breed, industry-standard components
Intuitive graphical user interface: monitors performance, capacity, and network, and provides enhanced maintenance and support, including interaction with IBM support
Applied Data Systems can help with your project and custom design your own DGX POD. We understand the importance of a properly designed, balanced system, especially when it comes to network and storage considerations. Too often, these are afterthoughts.
In our experience, the storage system in particular is all too often overlooked. An improperly designed storage system starves the GPU servers of data, which increases the time to results. The whole point of GPU computing is high performance and shorter time to results.
We like to say: don't do your science experiments on a science experiment. That's why we carefully select the right components and configuration when designing your system. The right storage system meets your functional and performance requirements and fits within your budget. Understanding all the moving parts in such complex HPC systems is our expertise. We tie them together, giving you a highly integrated and supported cluster ready for the most demanding workloads.
Let's discuss your project so we can help determine the best setup for your requirements. If discussions need to take place in a classified setting, we can accommodate that too. No project or discussion is too large or too small. We're here to help. Get in touch with us now.