A brief history of HPC cooling
In the modern era of massively parallel HPC systems built from commodity components, energy use has increased rapidly, and the technologies required to cool these systems have evolved.
Air cooling
Systems from the 1990s and early 2000s used air cooling. Air conditioning units were employed to keep the server rooms cool, using air as the heat transfer fluid.
As HPC systems grew in size, this became problematic, and so cold aisle containment technologies were introduced. Here, hot air generated by the servers is kept separate from air cooled by the ventilation system, resulting in hot and cold aisles separated by physical barriers. This simplified the removal of waste heat somewhat, as the contained hot aisles could be allowed to reach a higher temperature, increasing the temperature difference available to the air conditioning plant.
Rear-door cooling
Passive rear-door systems started to become common in the 2000s. These systems relied on rack rear doors operating as radiators, directly capturing the heat generated by the servers and transferring it into a fluid (usually water) piped through the doors. The warmed water was then piped away to a heat exchanger and, typically, an active refrigeration unit to lower its temperature before it was returned to the rear doors in a closed circuit.
Active rear-door cooling
As server power consumption and rack densities increased, passive rear-door cooling became insufficient to remove the waste heat. Server fans lacked the power to move air quickly enough through the door radiators, resulting in warm air recirculating within the racks. Active rear doors became common from the late 2000s through the 2010s. These doors contain fans that draw air through upgraded radiators, capturing the heat more effectively.
Air was still the transfer medium moving waste heat from the processors to the cooling fluid, and with this technology the temperature of the cooling fluid is typically raised by only a few degrees.
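A minimal sketch of why air is the limiting factor here: the comparison below estimates the coolant flow needed to carry a given heat load in air versus water. The rack load, allowable temperature rise, and fluid properties are illustrative assumptions, not figures from this section.

```python
# Rough comparison of air vs water as a heat transfer medium.
# All figures are illustrative assumptions, not values from this section.

heat_load_w = 30_000.0  # assumed waste heat for one rack, W
delta_t = 10.0          # assumed allowable coolant temperature rise, K

# Approximate textbook properties: specific heat (J/(kg.K)), density (kg/m^3)
media = {
    "air":   {"cp": 1005.0, "rho": 1.2},
    "water": {"cp": 4180.0, "rho": 997.0},
}

for name, props in media.items():
    # Q = m_dot * cp * dT  =>  m_dot = Q / (cp * dT)
    mass_flow = heat_load_w / (props["cp"] * delta_t)  # kg/s
    vol_flow_l_s = mass_flow / props["rho"] * 1000.0   # litres/s
    print(f"{name:5s}: {mass_flow:6.2f} kg/s  ({vol_flow_l_s:8.2f} L/s)")
```

Under these assumptions, air needs over three orders of magnitude more volumetric flow than water to carry the same load, which is why moving the liquid closer to the processors was the next step.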
Direct liquid cooling
Direct liquid cooling systems started to become common in the early 2020s, with the advent of higher-wattage processors whose power consumption approached 300 W.
Direct liquid cooling uses a heat transfer fluid (typically water with added propylene glycol) to remove heat directly from the CPUs and, in some cases, other system components such as memory. The fluid is typically piped from a manifold within the rack directly into the server heat sinks.
The return temperatures (carrying the waste heat) can be significantly elevated, typically 30-40 °C. This increases the efficiency of the cooling system, as a larger temperature delta makes the heat easier to remove and reject.
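One concrete reading of that claim is the flow-rate saving: for a fixed heat load Q, the required coolant flow scales as 1/ΔT, since Q = ṁ·cp·ΔT. A minimal sketch, with an assumed 100 kW load (not a figure from this section):

```python
# How required coolant flow falls as the temperature delta grows.
# The 100 kW load and delta values are illustrative assumptions.

CP_WATER = 4180.0        # J/(kg.K), approximate specific heat of water
heat_load_w = 100_000.0  # assumed heat load, W

for delta_t in (5.0, 10.0, 20.0, 30.0):
    m_dot = heat_load_w / (CP_WATER * delta_t)  # kg/s, from Q = m_dot*cp*dT
    print(f"delta_T = {delta_t:4.1f} K -> required flow = {m_dot:5.2f} kg/s")
```

Higher return temperatures also make it more feasible to reject the heat with passive (free) cooling rather than active refrigeration.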
The rack manifolds are connected to a coolant distribution unit with a heat exchanger to transfer heat from the secondary (in-rack) cooling circuit to the primary circuit, and thus to the heat disposal system.
Direct liquid cooling is usually used in tandem with active rear-door coolers, which can sit on either the primary or the secondary heat circuit.
Immersion cooling
The next logical step in cooling technology is immersion cooling. The technology has been around for several decades, but it is only now maturing and seeing wider adoption. Here, servers are immersed directly in a non-conductive heat transfer fluid, typically a mineral oil or a bespoke fluid.
Single-phase and two-phase systems are available. In a two-phase system, heat from the server causes the fluid to evaporate before it condenses within the closed-cycle tank. However, such systems are expensive, requiring special fluids, and are falling out of favour.
Single-phase systems have the servers immersed in the fluid in a rack-sized tank. The fluid convects around the tank, removing heat from the servers and exchanging it with a primary facility fluid (typically water-based) in a heat exchanger, from where the heat is transferred away from the system, typically using passive coolers.
Immersion cooling fluid temperatures are typically around 40 °C, presenting significant opportunities for reuse of the waste heat.
An additional benefit of immersion cooling is simplicity. Servers can be minimal, with no requirement for fans, heat pipes and large copper or aluminium components, thus significantly reducing manufacturing cost (and embodied carbon). Noise in the data centre environment is also greatly reduced.
The main drawback of immersion cooling is the oil itself: different working practices are required when maintaining and repairing servers that have been immersed in the cooling fluid. A key aim of this project is to understand and refine the restrictions this imposes, and to develop working practices that optimise operations.