1-844-371-4949 info@applieddatasystems.com

“The technical, sales, support and leadership teams at ADS are outstanding. They offer cost-effective, scalable solutions for scientific compute and storage applications at all tiers of need.

Support and sales have been integral to our ability to offer IT services to research labs across our enterprise with a small team of technical experts.

We consider their team an extension of our technical and engineering staff. They are typically our first choice to work with when we fulfill customer infrastructure needs. It’s a pleasure to work with them.”

Kevin Smith
IT Manager
UC San Diego

ExtremeStor with IBM Spectrum Scale Erasure Code Edition

  1. High Performance Erasure Coding
  2. Declustered RAID
  3. End-to-end Checksum and Extreme Data Integrity
  4. High Scalability
  5. Rich Enterprise Storage Features and Internal Disk Drive Management
Extreme Scale and Performance Trusted By Thousands of Organizations Worldwide

High Performance Erasure Coding

ExtremeStor ECE supports several erasure coding levels and delivers high storage efficiency, e.g. about 73% with 8+3p and 80% with 8+2p Reed-Solomon codes. Better storage efficiency means less hardware, which lowers cost without compromising system availability or data reliability. ECE erasure coding also protects data better than traditional RAID 5/6: with 8+3p on 11 or more nodes, for example, the system tolerates three node failures and so can survive the concurrent failure of multiple servers and storage devices.
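The efficiency and fault tolerance figures for a k+mp code follow from simple arithmetic: usable capacity is the data strips divided by the total strips, and the number of parity strips is the number of strips that can be lost per stripe. The sketch below illustrates that arithmetic only; the function names are hypothetical and are not part of any ECE tooling.

```python
def erasure_code_profile(data_strips: int, parity_strips: int) -> dict:
    """Summarize a k+mp erasure code (illustrative helper, not an ECE API)."""
    total = data_strips + parity_strips
    return {
        "code": f"{data_strips}+{parity_strips}p",
        "efficiency": data_strips / total,      # usable fraction of raw capacity
        "tolerated_failures": parity_strips,    # strips that may be lost per stripe
    }


def raw_capacity_needed(usable_tb: float, data_strips: int, parity_strips: int) -> float:
    """Raw capacity required to provide a given usable capacity."""
    return usable_tb * (data_strips + parity_strips) / data_strips


for k, m in [(8, 2), (8, 3)]:
    p = erasure_code_profile(k, m)
    print(f"{p['code']}: {p['efficiency']:.1%} efficient, "
          f"tolerates {p['tolerated_failures']} lost strips")
# 8+2p: 80.0% efficient, tolerates 2 lost strips
# 8+3p: 72.7% efficient, tolerates 3 lost strips
```

For example, `raw_capacity_needed(800, 8, 2)` returns `1000.0`, i.e. 1000 TB of raw disk for 800 TB usable with 8+2p.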

Declustered RAID

ECE implements advanced declustered RAID with erasure coding. ECE declustered RAID can place a large number of disk drives across multiple servers in the same group. The ECE failure domain feature detects and analyzes the hardware topology automatically and distributes data evenly among all nodes and disk drives. By combining declustered RAID, even distribution of data and spare space, and critical and normal rebuild, ECE balances very high data reliability against low performance impact on applications.
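To make the idea of even distribution concrete, here is a toy placement sketch. It is not ECE's actual placement algorithm and it ignores failure domains, spare space and rebuild; it only shows how stripes narrower than the disk group can be rotated so that every disk ends up holding the same number of strips. All names are hypothetical.

```python
from collections import Counter


def place_stripes(num_stripes: int, stripe_width: int, num_disks: int) -> list[list[int]]:
    """Toy declustered layout: each stripe uses stripe_width distinct disks,
    and the starting disk rotates so strips spread over the whole group."""
    if stripe_width > num_disks:
        raise ValueError("stripe width cannot exceed the number of disks")
    return [
        [(start + i) % num_disks for i in range(stripe_width)]
        for start in range(num_stripes)
    ]


# 8+3p stripes (11 strips each) spread over a group of 24 disks.
layout = place_stripes(num_stripes=24, stripe_width=11, num_disks=24)
strips_per_disk = Counter(disk for stripe in layout for disk in stripe)
print(sorted(set(strips_per_disk.values())))  # [11]: every disk holds 11 strips
```

Because every disk participates in many stripes, the work of reconstructing a failed disk is shared by many surviving disks rather than by one small RAID group.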

End-to-end Checksum and Extreme Data Integrity

ECE calculates, transfers and verifies a checksum for data sent over the network. If corruption occurs during network transfer, the data is retransmitted until the transfer succeeds. ECE also calculates, stores and verifies a checksum along with other information such as the data version, the vdisk the data belongs to, and the data block and strip it occupies. This information is called the buffer trailer in ECE. It protects data from many kinds of corruption, especially silent data corruption, caused by hardware failures, offset writes, dropped writes, garbage writes, media errors and similar faults.
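The verify-and-retransmit behavior described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: CRC32 stands in for whatever checksum ECE actually uses, the checksum itself is assumed to arrive intact, a retry limit is added for safety, and `transfer` and `flaky_channel` are hypothetical names.

```python
import zlib
from typing import Callable


def transfer(payload: bytes, channel: Callable[[bytes], bytes],
             max_attempts: int = 5) -> tuple[bytes, int]:
    """Send payload through channel, retransmitting until the checksum verifies.

    Returns the received bytes and the number of attempts used.
    """
    checksum = zlib.crc32(payload)              # calculated by the sender
    for attempt in range(1, max_attempts + 1):
        received = channel(payload)             # may corrupt data in flight
        if zlib.crc32(received) == checksum:    # verified by the receiver
            return received, attempt
    raise IOError(f"checksum mismatch after {max_attempts} attempts")


def flaky_channel(corrupt_first_n: int) -> Callable[[bytes], bytes]:
    """Hypothetical network that flips one bit on the first n transfers."""
    calls = {"count": 0}

    def channel(data: bytes) -> bytes:
        calls["count"] += 1
        if calls["count"] <= corrupt_first_n:
            return bytes([data[0] ^ 0x01]) + data[1:]
        return data

    return channel


data, attempts = transfer(b"block-0042", flaky_channel(corrupt_first_n=2))
print(attempts)  # 3: two corrupted transfers were detected and retried
```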

High Scalability

One of the major advantages of Spectrum Scale and Spectrum Scale RAID is high scalability, which has been proven in many large-scale systems. A recent and impressive example is Summit, the CORAL system at the US DoE’s Oak Ridge National Laboratory (ORNL). An ECE system uses the same core storage software technologies, IBM Spectrum Scale and IBM Spectrum Scale RAID, that run inside ESS. A set of commodity servers can be built with ECE into a high-performance, reliable storage building block. Many of these building blocks can be aggregated into a single large GPFS file system, which eliminates data silos and unnecessary data movement.

Rich Enterprise Storage Features and Internal Disk Drive Management

IBM Spectrum Scale has been in production for 20+ years. It is well known as an enterprise file system with a competitive list of features that meet data management requirements across many use cases. ECE extends its footprint to storage systems based on commodity servers. ECE implements a disk hospital that detects disk failures, diagnoses the problems and identifies failing disks for the system administrator to replace. It defines a standard procedure for locating and replacing bad disk drives: it tells the administrator which server and slot a bad drive is in and, for some types of disk drives, can turn on the drive LED, which makes replacement very convenient. This is hard to implement in storage software for commodity servers because the software must remain hardware platform neutral, but ECE does it.

ExtremeStor ECE Use Cases

High performance file serving: Use ECE as the backend storage with IBM Spectrum Scale protocol services so that customers can access ECE over NFS, SMB and Object.

High performance compute tier: ECE implements high-performance erasure coding and supports tiering across storage media with different performance and cost characteristics (e.g. flash drives, spinning disks, tape and cloud storage). The policy-based Information Lifecycle Management feature makes it convenient to manage data movement among storage tiers. A typical ECE high performance compute tier is built entirely from NVMe drives, which store and accelerate GPFS metadata and the hot data set for high performance computing and analytics.

High capacity cloud storage: With space-efficient erasure coding and end-to-end data protection, ECE brings cost effectiveness and data reliability to large-scale cloud storage systems. A typical ECE system for high capacity cloud storage combines an NVMe storage pool, which stores and accelerates GPFS metadata and small data I/Os, with HDDs that hold the bulk of the user data. Cold data can be moved to a much cheaper tape system if needed.

Professional Consultation and Architecture Design

Our IBM Spectrum Scale/GPFS R&D engineers have extensive knowledge of software defined storage, distributed file systems, Information Lifecycle Management, Big Data, DR/HA/data protection strategies and multi-site deployments, including cloud integration.

We have architected multi-million dollar, enterprise level, distributed storage solutions for the National Labs and Fortune 100 companies in healthcare, finance, telecommunication, technology and energy. Our expertise is in clearly defining technical and business needs and then building innovative solutions that lead to client success.

ExtremeStor GPFS High Performance, Multi-tier, Single Namespace Architecture

Accelerate Life Science, Machine Learning and Research Workflows

In this architecture, data is ingested from high-output instruments, sequencers, cameras and microscopes into ExtremeStor GPFS high-performance, multi-tier, single namespace storage. Data is then processed and analyzed by the AgilityFlex Compute Cluster, also provided by Applied Data Systems. ExtremeStor GPFS storage gives the cluster the high throughput and low latency needed to feed data-hungry compute nodes, speeding completion.

Result files are stored back on ExtremeStor GPFS shared storage, where researchers access them from their workstations over standard NFS/SMB protocols to perform local workstation-scale analysis or to kick off subsequent HPC-scale analysis on the AgilityFlex Cluster, which again has fast access to the data. Cold data can be migrated by policy to low-cost, high-capacity drives within the same namespace, without the added complexity or expense of tiering to external object or cloud storage. External tiering (tape, object, cloud) is also available.
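As a rough illustration of policy-driven migration, the sketch below selects files that have gone cold and plans their move from a fast pool to a capacity pool while their paths stay unchanged. It is a conceptual model only and does not use IBM Spectrum Scale policy syntax; the pool names, the 30-day threshold, the sample paths and the `FileInfo` and `plan_migrations` names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str               # path in the single namespace (unchanged by migration)
    pool: str               # storage pool currently holding the file's data
    days_since_access: int


def plan_migrations(files: list[FileInfo], cold_after_days: int = 30,
                    hot_pool: str = "nvme", cold_pool: str = "hdd") -> list[tuple[str, str, str]]:
    """Return (path, from_pool, to_pool) for files in the hot pool that have gone cold."""
    return [
        (f.path, hot_pool, cold_pool)
        for f in files
        if f.pool == hot_pool and f.days_since_access >= cold_after_days
    ]


# Hypothetical sample data.
files = [
    FileInfo("/gpfs/project/run1.bam", "nvme", 90),
    FileInfo("/gpfs/project/run2.bam", "nvme", 2),
    FileInfo("/gpfs/archive/old.tar", "hdd", 400),
]
for path, src, dst in plan_migrations(files):
    print(f"migrate {path}: {src} -> {dst}")
# migrate /gpfs/project/run1.bam: nvme -> hdd
```

Only the pool assignment changes in this model, which mirrors the point that researchers keep using the same path after data moves to cheaper drives.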

Consistently Fast Performance

IBM Spectrum Scale scatters data across spinning disks, preventing fragmentation from degrading performance as the system fills

Performance of Flash with the Economics of Disk

Economical high-capacity disks are accessed in parallel for high throughput and capacity, and are combined with a high-performance flash tier and server-side flash

Intuitive Graphical User Interface

Monitors performance, capacity and network health, and simplifies maintenance and support, including interaction with IBM support

Enterprise Reliability and Availability

Rock-solid, high-performance GPFS file system and high-scale data management trusted by thousands of organizations, combined with best-of-breed components for extreme performance and data protection

Genuine IBM Support

Second- and third-tier support is provided directly by the IBM Spectrum Scale experts at IBM

Flexible Building Blocks

ExtremeStor GPFS is delivered as a modular, repeatable, and highly supportable solution consisting of best of breed industry standard components

IBM Spectrum Scale V5 with Expert Architectural Design and Support

Applied Data Systems ExtremeStor GPFS is based on the latest and most innovative IBM Spectrum Scale release v5, with expert architectural design and performance tuning by globally acknowledged IBM Spectrum Scale experts. Applied Data Systems delivers IBM Spectrum Scale on fully integrated, top quality, industry standard hardware for maximum performance, reliability and data protection. Expert support is provided by Applied Data Systems engineers, backed by IBM second level support.

IBM Spectrum Discover

Metadata Management for ExtremeStor GPFS

IBM Spectrum Discover is modern metadata management software that provides data insight for exabyte-scale unstructured storage. IBM Spectrum Discover indexes metadata for billions of files. This metadata enables data scientists, storage administrators, and data stewards to efficiently manage, classify and gain insights from massive amounts of unstructured data. The insights gained accelerate large-scale analytics, improve storage economics, and help with risk mitigation to create competitive advantage and speed critical research.

• Event notifications and policy-based workflows
• Fine-grained views of storage consumption
• Fast, efficient search through petabytes of data
• Quickly differentiate mission-critical business data
• Policy-based custom tagging
• Extensible via Software Development Kit (SDK)

We’d love to work on your project. We analyze your existing and future needs in depth and deliver a comprehensive solution architecture on a validated hardware and software build that ships fully integrated. Contact us for an expert GPFS consultation today!
