SEATTLE, WA, UNITED STATES, March 16, 2026 /EINPresswire.com/ — Phaidra today announced a groundbreaking methodology to drastically improve the thermal stability of liquid-cooled AI data centers. This methodology is outlined in the joint white paper “AI Agents for Liquid-Cooled AI Factories.”
By successfully leveraging AI-driven, feed-forward control systems on production NVIDIA Grace Blackwell platforms, the collaboration is paving the way for the future of “DSX AI factories” — a new operational paradigm where power, cooling, and workload management are unified to maximize efficiency and computational throughput. Phaidra has integrated NVIDIA DSX Max-Q to run GPU clusters as efficiently as possible, so more of the available power can go towards running AI workloads.
The Challenge of AI Thermal Volatility: Modern AI factories are fundamentally different from traditional data centers: they are defined by massive scale, extreme density, and highly synchronized workloads. Operators of large-scale AI factory campuses, such as Applied Digital, must manage increasingly complex interactions between power infrastructure, liquid cooling systems, and rapidly fluctuating GPU workloads for their partners. When massive AI training or inference jobs are dispatched, thousands of networked GPUs ramp up simultaneously, creating “peaky” power profiles that can jump from idle to maximum capacity within seconds.
Traditional liquid cooling relies on Proportional-Integral-Derivative (PID) controllers, which wait for a sensor to register a coolant temperature change before taking action. Because coolant has high thermal inertia, this reactive feedback loop suffers from a 3-to-5-minute delay, resulting in rapid heat spikes that force GPUs to throttle performance to protect themselves. To mitigate this, operators significantly over-cool their facilities to create a safety buffer—a strategy that wastes massive amounts of energy and limits overall compute capacity.
The AI-Driven Solution: To close this latency gap, Phaidra developed a self-learning reinforcement learning (RL) AI Agent that fundamentally changes how cooling is managed. Instead of reacting after the fact to temperature changes, the AI Agent uses real-time rack power data as a leading indicator to predict and prevent thermal spikes. The agent sends optimal setpoint commands to the Coolant Distribution Unit (CDU) before the heat fully registers in the fluid, reducing the effective response delay from minutes to under 10 seconds in validated production environments.
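The contrast between the two control approaches can be sketched in a few lines of Python. This is a minimal illustration of the general idea, reactive PID feedback versus power-led feed-forward setpoint control; all function names, gains, and thermal constants are hypothetical and are not drawn from Phaidra's white paper.

```python
# Illustrative sketch only: reactive PID vs. power-led feed-forward control.
# Every name, gain, and constant below is a hypothetical assumption.

def pid_step(error, state, kp=0.8, ki=0.05, kd=0.2, dt=1.0):
    """One step of a textbook PID loop, driven by a *measured* coolant
    temperature error -- it can only act after the heat reaches the sensor."""
    state["integral"] += error * dt
    derivative = (error - state["prev_error"]) / dt
    state["prev_error"] = error
    return kp * error + ki * state["integral"] + kd * derivative

def feed_forward_setpoint(rack_power_kw, base_setpoint_c=30.0, gain_c_per_kw=0.01):
    """Use real-time rack power (a leading indicator) to lower the CDU
    coolant setpoint *before* the heat shows up in the fluid."""
    return base_setpoint_c - gain_c_per_kw * rack_power_kw

# A sudden GPU ramp: rack power jumps from idle to peak within seconds,
# and the feed-forward path adjusts the setpoint in the same time step.
for power_kw in (50, 50, 900, 900):
    print(f"{power_kw:>4} kW -> CDU setpoint {feed_forward_setpoint(power_kw):.1f} C")
```

The key point the sketch captures is that the feed-forward path needs no temperature measurement at all: the setpoint moves as soon as the power signal does, while the PID path must wait for the coolant's thermal inertia to carry the heat to a sensor.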
Proven Results at Gigawatt Scale: The new methodology underwent rigorous joint A/B testing in live production environments, including an NVIDIA DGX SuperPOD cluster running LLM training workloads and CoreWeave’s NVIDIA GB200 NVL72 environments. The results were transformative:
– Massive Reduction in Thermal Overshoot: The AI Agent reduced the magnitude of thermal spike overshoots by 75% to 80% compared to optimally tuned PID baselines during sudden load ramps.
– Unprecedented Scale: Following this successful validation, Phaidra and CoreWeave are scaling the deployment of these AI agents throughout CoreWeave’s liquid-cooled fleet, bringing AI-driven thermal management to its next generation of data center capacity.
The Pathway to Max-Q AI Factories: Phaidra has integrated NVIDIA DSX Max-Q to operate the entire AI factory as a single unit of compute, at scale. Through deep integration of Information Technology (IT) and Operational Technology (OT), this collaboration bridges the divide between white space compute and facility operations.
With thermal stability secured by Phaidra’s AI agents, facilities can safely raise their supply water temperatures, significantly reducing the burden on facility chillers. This provides the foundation for the next phase of the collaboration, where operators dynamically shift stranded power from the cooling system to revenue-generating IT compute. For a baseline 1GW AI factory, raising the coolant temperature safely could unlock billions in additional annual revenue.
“In a world where computational resources are limited by energy availability, every watt that isn’t being used for valuable token generation is a wasted watt,” the joint white paper states.
By co-designing power, cooling, and workload management systems, Phaidra, CoreWeave, NVIDIA, and critical infrastructure partner Applied Digital are setting a new standard for reliability, end-user SLAs, and peak operational efficiency in the age of AI.
Mandi Fong
Phaidra
Legal Disclaimer:
EIN Presswire provides this news content “as is” without warranty of any kind. We do not accept any responsibility or liability
for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this
article. If you have any complaints or copyright issues related to this article, kindly contact the author above.