There’s Big Data, and then there’s really Big Data. Case in point: oil and gas exploration, where 3D seismic images and reservoir models require 1 petabyte of storage and more than 18 petaflops of compute power to analyze.
With an estimated 5.5 million Internet of Things devices coming online daily, Big Data is becoming the norm for businesses in just about every vertical. Organizations spent nearly $19 billion last year on high-performance computing (HPC), which applies advanced compute, memory, storage and interconnects to generate the data-driven analyses and insights that make information so valuable.
In the past, HPC was almost exclusively the domain of scientific researchers. But in recent years, these top-notch computing capabilities have become more widely available, thanks to increased vendor competition and supporting technologies that keep getting better even as they get cheaper. Now, HPC is widely used for applications as far-ranging as fraud detection, high-frequency trading, oil exploration and smart grids. That means no matter which business you’re in, you already have plenty of use cases to learn from as you develop and refine your HPC strategy.
I saw the business value of HPC firsthand as an IT manager at a major electronic design automation (EDA) company, where I was responsible for managing a 22,000-core HPC compute farm hosting EDA application environments. Now, as a principal architect in CDW’s data center practice, I help businesses navigate their HPC options, and there are a lot of them. Here are a few top considerations.
How to Choose the Right HPC Solution
All HPC systems have four major components: CPU, physical memory, storage and interconnect. Vendor choices are crucial because your organization will be better off with state-of-the-art solutions in each category. CDW partners with all the leading HPC vendors, which means we can provide unique insight into their product roadmaps and how they align with a specific organizational strategy.
For example, instead of 10Gbps or 40Gbps Ethernet interconnects, focus on solutions that use InfiniBand, which provides higher bandwidth at much lower latency. It’s also smart to consider remote direct memory access (RDMA), which lowers CPU overhead by letting one system read and write another system’s memory directly over the network, bypassing the operating system’s network stack.
Organizations also want to know if they should consider Big Data cluster computing (BDCC), a close cousin of HPC. The main difference is that BDCC uses parallel processing on a distributed file system, whereas HPC uses parallel processing on a parallel file system. With BDCC, the data computations can be supported by most of the common applications used in web analytics and file-crawling systems. The input/output streams are also handled by several systems that work in parallel, shared across several nodes, with each node processing only a small slice of the data. HPC, by contrast, focuses on computations written in lower-level code (C/C++), with a single node handling the input/output, so the data must fit into the RAM of that individual node.
A growing number of our clients are moving away from HPC in favor of BDCC, partly because it’s less expensive. But the workload is the decisive factor. If your data can be diced into chunks that are processed independently, as in genome analytics, log data analytics, financial reporting and transaction analytics, then it’s worth considering BDCC. But if applications require all the data to be available to all nodes in a cluster, as with a chip design or a film that’s being rendered, then HPC is the way to go.
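To make the "diced into chunks" test concrete, here is a minimal Python sketch of the pattern that suits BDCC-style workloads. It is an illustration only, not how any particular product works: the log format, the function names and the idea of simulating "nodes" as list slices on one machine are all assumptions made for the example.

```python
from collections import Counter

def split_into_chunks(records, num_nodes):
    """Deal records round-robin into one slice per (hypothetical) node."""
    return [records[i::num_nodes] for i in range(num_nodes)]

def count_status_codes(chunk):
    """Work one node does on its own slice: tally the last field of each line."""
    return Counter(line.split()[-1] for line in chunk)

def merge_counts(partials):
    """Combine the per-node tallies into a single result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def chunked_count(records, num_nodes):
    chunks = split_into_chunks(records, num_nodes)
    # In a real cluster each call below would run on a different node;
    # here the nodes are simulated one after another in a single process.
    partials = [count_status_codes(chunk) for chunk in chunks]
    return merge_counts(partials)

if __name__ == "__main__":
    # Assumed log format for illustration: "<method> <path> <status>"
    logs = ["GET /a 200", "GET /b 404", "GET /c 200", "POST /d 500", "GET /e 200"]
    print(chunked_count(logs, num_nodes=3))
```

The point of the sketch is that no slice needs to see any other slice: splitting the work across three simulated nodes gives the same tally as a single pass over all the records. Workloads where each step needs the whole data set at once, such as the chip design and rendering examples, do not break apart this way.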
In one important sense, HPC and BDCC are like most other technologies: They offer plenty of benefits, but also a lot of risks simply because there are so many vendor and product choices. To avoid those risks, start your HPC journey by choosing a partner that knows the leading solutions and has a long history of helping other businesses take advantage of HPC and BDCC.