Genome Sequencing Center
A Global Leader in Plant Sequencing
The HudsonAlpha Genome Sequencing Center laboratory and computational staff are considered among the world leaders in plant sequencing, genome assembly, genome improvement and data analysis. The HudsonAlpha Genome Sequencing Center is one of the major centers in the world performing original sequencing of plants and humans. The center specializes in applying genomic techniques to understand how plants function in response to environmental changes. Its gold-standard plant reference genomes are used by over 20,000 researchers around the world.
Applied Data Systems has been a trusted supplier of advanced computing technology to the HudsonAlpha Genome Sequencing Center. HudsonAlpha recently needed to purchase new computing technology that could keep up with their exploding sequencing workload – workloads that couldn’t even be addressed by the United States’ national supercomputing resources.
The Challenge: Pioneering Advancements in Genomic Sequencing at HudsonAlpha Increased Data Rates 20X, Requiring New, Large-Memory, High-Bandwidth Servers to Keep Up
To advance their ground-breaking plant genomics work, HudsonAlpha acquired several of the latest PacBio and Illumina sequencers that scaled up sequencing dramatically, increasing their data analysis throughput by 20X compared to the previous year. HudsonAlpha’s PacBio Sequel Systems are powered by Single Molecule, Real-Time (SMRT) Sequencing technology and deliver highly accurate long reads ideal for plant genomics. These platforms deliver a comprehensive view of the genomes HudsonAlpha has under study. HudsonAlpha also uses the NovaSeq 6000, Illumina’s fastest production-scale sequencing instrument, which generates in excess of 6,400 Gb of sequence data in less than two days.
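To put the NovaSeq figure in perspective, the short Python sketch below converts the quoted output into an average rate. It assumes that "Gb" means gigabases (the usual convention for sequencer output, not gigabits) and treats "less than two days" as a 48-hour upper bound, so the result is a lower bound on the rate for a single run. The function name is illustrative only.

```python
# Figures quoted in the text above.
OUTPUT_GIGABASES = 6_400       # "in excess of 6,400 Gb" per run (assumed: gigabases)
RUN_HOURS_UPPER_BOUND = 48     # "less than two days"


def min_bases_per_second(gigabases: float, hours: float) -> float:
    """Lower bound on the average output rate, in bases per second."""
    return gigabases * 1e9 / (hours * 3600)


rate = min_bases_per_second(OUTPUT_GIGABASES, RUN_HOURS_UPPER_BOUND)
print(f"at least {rate / 1e6:.1f} million bases per second, sustained")
```

Even as a lower bound for one instrument, this is the kind of sustained data stream the downstream analysis servers have to absorb.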
Plants, Like Sugarcane, Have Highly Complex Genomes
Why Conventional HPC Compute Servers are Too Slow for HudsonAlpha
Because plant genome sequences are longer and contain multiple copies of the same sequence, plant sequencing is far more complicated than human genome sequencing. Conventional compute servers used in High Performance Computing (HPC) are primarily architected for particle physics, computational fluid dynamics, and weather and climate modelling, where relatively small, concentrated packets of data are heavily processed by the numerous CPUs in these systems. These “compact” simulations need little local memory, so these HPC servers are typically equipped with relatively small amounts of it. For the work HudsonAlpha does, conventional HPC servers have too little local memory to process genomics efficiently. This is especially true of plant genomics, which requires large memory pools to process extremely long and complex plant genomes quickly.
The Solution: Intel® Advanced Performance (AP) HPC and AI S9200WK Servers from Applied Data Systems
HudsonAlpha’s sequence analysis pipeline was I/O bound and needed servers with a lot of memory to run efficiently. HudsonAlpha purchased 12 Intel® Advanced Performance (AP) HPC and AI Servers featuring Intel® Xeon® Platinum 9200 Processors. With 192 CPUs in total, HudsonAlpha’s servers are purpose-built and performance-optimized for High Performance Computing (HPC) and Artificial Intelligence (AI) applications. Built with Intel quality, reliability and performance, Intel’s AP server offers an unprecedented 12 memory channels per CPU for memory-intensive workloads, or 24 memory channels in a single compute module, making it a strong deep learning and AI server for genomic data.
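The memory-channel figures above can be combined into a rough back-of-the-envelope estimate, sketched here in Python. The channel and CPU counts come from the text. The memory speed does not: the DDR4-2933 transfer rate and the 8-byte channel width are assumptions made only for illustration, and the result is a theoretical peak, not a measured figure from HudsonAlpha’s systems.

```python
# Figures quoted in the text above.
CHANNELS_PER_CPU = 12
CHANNELS_PER_MODULE = 24
TOTAL_CPUS = 192

# ASSUMPTIONS (not stated in the text): adjust to the installed memory.
ASSUMED_MT_PER_S = 2933        # mega-transfers per second per channel (DDR4-2933)
BYTES_PER_TRANSFER = 8         # 64-bit data bus per DDR4 channel

# Implied by the two channel figures: CPUs in one compute module.
cpus_per_module = CHANNELS_PER_MODULE // CHANNELS_PER_CPU

# Memory channels across all purchased CPUs.
total_channels = TOTAL_CPUS * CHANNELS_PER_CPU


def peak_gb_per_s(channels: int, mt_per_s: int = ASSUMED_MT_PER_S) -> float:
    """Theoretical peak memory bandwidth in GB/s for a number of channels."""
    return channels * mt_per_s * BYTES_PER_TRANSFER / 1000


print(f"CPUs per compute module: {cpus_per_module}")
print(f"Memory channels in total: {total_channels}")
print(f"Assumed peak per CPU: {peak_gb_per_s(CHANNELS_PER_CPU):.1f} GB/s")
```

The point of the estimate is the scaling: bandwidth grows with the number of channels, which is why the channel count matters for a pipeline that keeps whole genomes in memory.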
“Our data is hard to break up into small enough data sets to match conventional processors. Computationally, the Intel AP servers from Applied Data Systems are big enough to process an entire genome, so we don’t have to split a genome and run subsegments on multiple boxes and reassemble them anymore,” said Jeremy Schmutz, Faculty Investigator, HudsonAlpha Genome Sequencing Center. “Intel AP servers have enough memory to keep all the sequence data in RAM, and enough processing threads to run all the steps in our algorithms very efficiently. Huge assemblies that took months to run before take days now. The scientific problem space we can work on has opened up dramatically.”
HudsonAlpha Sequencing Analysis Infrastructure
Why HudsonAlpha Chose to Work with Applied Data Systems
Because HudsonAlpha creates many of the world’s reference plant genomes, the most important requirement for its leading-edge computational and algorithm work is reliable and accurate results. It was therefore critical for HudsonAlpha to test the system before purchasing. Applied Data Systems staged the systems recommended for HudsonAlpha in its Customer Integration and Manufacturing Center (CIMC) to verify performance and accuracy.
“ADS customized systems for our exact needs, selecting from the entire universe of technology providers, and worked with us to put test machines in place in order to verify that our unique computational work and algorithms are producing reliable results at the speeds HudsonAlpha needed,” said Schmutz. “Quick warranty repairs and returns from Applied Data Systems also help an institution like ours to keep running at maximum productivity.”
For more information on the HudsonAlpha Genome Sequencing Center, visit www.hudsonalpha.org/gsc/.