Machine Learning will increase the demands on Data Center Performance

Machine learning will increase the demands on data center performance, leading to new ways to cool and power the systems there, reports Peter Judge.

AI and machine learning are increasingly performed by specialist hardware within and outside the data center. This inevitably makes more demands on the infrastructure. These technologies change the way facilities handle power and cooling, but we can relax. We’re ready to do it, according to a leading academic.

“As we move towards 2030, we will see a departure from the homogeneous compute environment of using CPUs, to a situation where we use GPUs, ASICs and FPGAs – using computing hardware silicon that is more optimized for the workload that you’re trying to process,” predicted Suvojit Ghosh, managing director of the Computing Infrastructure Research Centre (CIRC) at McMaster University in Ontario, at DCD>New York 2019.

He disagrees with Intel VP Wei Li (p13) on the value of GPUs. In some cases, adding one GPU per CPU can save about 75 percent of the capex, he said.

For years now, GPUs have been three times more efficient than CPUs at crunching numbers, said Ghosh. But AI workloads are fundamentally different from traditional ones, making a lot of use of lower-precision calculations. “If half of the computing load can be done in single precision, GPUs are not just three times better, they are 20 or 30 times better than CPUs.”

To support this heterogeneous compute resource, he expects to see heterogeneous storage, also optimized for different kinds of workload.

The arrival of specialist processors saves energy, but it also drives towards higher densities, he said. Some might have concerns about that, but he says we should embrace it: “We have this common misconception that running high density is very expensive, but that’s actually not true.”

High density deployments – maybe as high as 100kW per rack – can be cheaper because they reduce the cost of real estate, the hardware costs of masses of racks and, significantly, the cost of providing interconnects between the servers within a data center: “The cost per kilowatt goes down with density,” he asserted. “I would encourage you to look a little closer when doing your own business case.”

Of course, high density means liquid cooling, which rings alarm bells. Most people believe that the promised savings come at a high capital cost, but Ghosh says this is another myth: “When the density is right, liquid cooling is actually cheaper, even on the installation, even on the capital expenses. And not by a little bit, it’s almost a third of the cost of the total infrastructure.”

The opex savings are even more obvious, of course. Analysis from Rittal suggests that a 2MW IT load could cost $1.4 million per year to cool using conventional CRAC systems, a figure that falls to $690,000 with liquid cooled racks and free (outside air) cooling. Immersion cooling by the likes of Submer or GRC can virtually eliminate the power demand of cooling completely, taking the cost down to a mere $26,000 per year.
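To put those three figures side by side, here is a minimal Python sketch that converts each annual cooling cost into a cost per kilowatt of IT load and a percentage saving against the CRAC baseline. The dollar amounts and the 2MW load are the ones quoted above; the function and variable names are illustrative only and do not come from Rittal's analysis.

```python
# Annual cooling costs (USD) for a 2MW IT load, as quoted in the article.
IT_LOAD_KW = 2_000
ANNUAL_COOLING_COST = {
    "conventional CRAC": 1_400_000,
    "liquid cooled racks + free cooling": 690_000,
    "immersion cooling": 26_000,
}


def cost_per_kw(annual_cost, it_load_kw=IT_LOAD_KW):
    """Annual cooling cost per kW of IT load (hypothetical helper)."""
    return annual_cost / it_load_kw


def saving_vs_baseline(annual_cost, baseline_cost):
    """Percentage saved relative to the baseline cooling cost."""
    return 100 * (1 - annual_cost / baseline_cost)


baseline = ANNUAL_COOLING_COST["conventional CRAC"]
for name, cost in ANNUAL_COOLING_COST.items():
    print(
        f"{name}: ${cost_per_kw(cost):,.0f} per kW per year, "
        f"{saving_vs_baseline(cost, baseline):.1f}% below CRAC"
    )
```

On these numbers, cooling costs roughly $700 per kW per year with CRAC units, $345 with liquid cooled racks and free cooling, and $13 with immersion: savings of about 51 percent and 98 percent respectively.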

“In 2030, I would forecast we’re going to be using some form of liquid cooling,” he said. “Immersion cooling will probably be used only for ultra high density specialized applications.” For more general uses, he thinks that direct-to-chip liquid cooled servers will be good enough, offering a substantial improvement over air cooling.

So taking those predictions together, Ghosh predicts, “we are going to have lots of application specific circuits cooled by water.”

The mechanical and electrical parts of the system are also up for change, he predicted.

Today’s fault tolerant power and liquid cooling subsystems, which rely on expensive hardware redundancy, will eventually be replaced by intelligent fault prediction systems, he promised.

There will be integrated control of “IT” and “Facilities,” he suggested, and much of it will be automated, so there will be no need for facilities staff on site. One person, with the support of a smart adaptive control system, will be able to manage a “constellation” of data centers from a single pane of glass.

Of course that sort of intelligence takes us full circle. As we saw on p4, data centers are already beginning to take advantage of AI in improving their own operations, from intelligent cooling to smarter workload balancing.

In the data centers of 2030, Ghosh says, this will not just be a good idea, but an absolute essential.

To support AI, data centers must exploit AI.

Source: Data Centre Dynamics (Peter Judge)