Saturday, February 28, 2026

The AI Infrastructure Build

Cooling: The Unsexy Necessity

Post 6: Terrestrial Foundation

From Air to Liquid — Why Blackwell GPUs Changed Everything

By Randy Gipe | March 2026

NVIDIA GPUs don't just consume power. They generate massive heat.

An H100 chip: 700W. That's seven 100-watt incandescent light bulbs' worth of heat, in a chip the size of your palm.

Blackwell: 1,000W. Ten light bulbs. And you’re putting 80,000 of them in one building.

80 megawatts of heat. Continuously. 24/7.

Air conditioning can’t handle it anymore. The entire industry is shifting to liquid cooling—pumping coolant directly onto chips, or even submerging entire servers in fluid.

This is the unglamorous infrastructure nobody photographs. But without it, AI stops.

Part 1: The Heat Problem

How Much Heat Are We Talking About?

🔥 GPU HEAT GENERATION (2018-2026)

Evolution of AI chip heat:

| Chip | TDP (watts) | Heat per rack (40 GPUs) | Cooling challenge |
|---|---|---|---|
| NVIDIA V100 (2018) | 300 W | 12 kW | Air cooling sufficient |
| NVIDIA A100 (2020) | 400 W | 16 kW | Air cooling strained |
| NVIDIA H100 (2022) | 700 W | 28 kW | Liquid recommended |
| Blackwell B200 (2025) | 1,000 W | 40 kW | Liquid required |

For a 10,000 GPU cluster (Blackwell):

  • 10,000 GPUs × 1,000W = 10 MW of heat
  • Equivalent to running 10,000 space heaters simultaneously
  • Or: Heating 500 average homes in winter

Data center cooling rule of thumb:

  • For every 1 MW of IT power, need 0.3-0.5 MW of cooling power
  • 10 MW IT load → 3-5 MW cooling → 13-15 MW total facility power
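The rule of thumb above can be sketched as a quick calculation (a minimal back-of-envelope sketch; the 0.3-0.5 cooling ratio is the rough industry range quoted above, not a measured figure):

```python
def facility_power_mw(it_load_mw, cooling_ratio_low=0.3, cooling_ratio_high=0.5):
    """Estimate total facility power from IT load using the
    0.3-0.5 MW-of-cooling-per-MW-of-IT rule of thumb."""
    return (it_load_mw * (1 + cooling_ratio_low),
            it_load_mw * (1 + cooling_ratio_high))

low, high = facility_power_mw(10)  # 10 MW IT load
print(f"Total facility power: {low:.0f}-{high:.0f} MW")  # 13-15 MW
```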

This is why Post 3's power crisis matters—cooling multiplies the electricity need.

Why Air Cooling Fails at Scale

Traditional data center cooling (pre-AI):

  • Cold air blown into server racks
  • Hot air exhausted out the back
  • Works fine for 5-10 kW per rack (traditional servers)

AI data center cooling (2024+):

  • 40+ kW per rack (Blackwell)
  • Air can't absorb heat fast enough
  • GPUs overheat → throttle performance → wasted money
  • Air cooling maxes out at ~20-25 kW/rack

The physics problem:

  • Air's specific heat: ~1 kJ/(kg·K); water's: ~4.2 kJ/(kg·K)
  • Per unit mass, water absorbs ~4x as much heat as air
  • Water is also ~800x denser, so per unit volume it carries roughly 3,500x more heat
  • Result: liquid cooling is the only viable option for Blackwell-density racks
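To see concretely why air fails, apply the heat balance Q = ṁ·cp·ΔT to a single 40 kW Blackwell rack (a sketch under assumed conditions; the 10 K coolant temperature rise is a hypothetical design point, not a spec):

```python
# Heat balance: Q = m_dot * cp * dT  =>  m_dot = Q / (cp * dT)
CP_AIR = 1005.0     # specific heat of air, J/(kg*K)
CP_WATER = 4186.0   # specific heat of water, J/(kg*K)
RHO_AIR = 1.2       # density of air, kg/m^3
RHO_WATER = 1000.0  # density of water, kg/m^3

def mass_flow_kg_s(heat_w, cp, delta_t_k):
    """Mass flow needed to carry away heat_w watts at a given temperature rise."""
    return heat_w / (cp * delta_t_k)

Q_RACK = 40_000.0   # 40 kW Blackwell rack
DT = 10.0           # assumed 10 K coolant temperature rise

air_flow = mass_flow_kg_s(Q_RACK, CP_AIR, DT)      # ~4.0 kg/s of air
water_flow = mass_flow_kg_s(Q_RACK, CP_WATER, DT)  # ~0.96 kg/s of water

air_m3_s = air_flow / RHO_AIR                      # ~3.3 m^3 of air per second
water_l_min = water_flow / RHO_WATER * 1000 * 60   # ~57 L of water per minute
```

Roughly 3.3 cubic meters of air per second versus about 57 liters of water per minute for the same rack, which is why fans stop scaling long before pumps do.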

Part 2: The Liquid Cooling Revolution

Direct-to-Chip Liquid Cooling

💧 HOW DIRECT LIQUID COOLING WORKS

The system:

  1. Cold plates: Metal plates mounted directly onto GPUs/CPUs
  2. Coolant: Water or water-glycol mixture flows through cold plates
  3. Heat transfer: Coolant absorbs heat from chips (direct contact)
  4. Heat rejection: Hot coolant pumped to cooling towers/chillers outside building
  5. Circulation: Cooled fluid returns to servers, cycle repeats

Advantages:

  • Efficiency: 30-40% more efficient than air cooling (less energy for same cooling)
  • Density: Can cool 40-80 kW racks (Blackwell + future chips)
  • Noise: Quieter (no loud fans)
  • Space: Smaller cooling infrastructure footprint

Disadvantages:

  • Complexity: Plumbing, leak risks, maintenance
  • Cost: 30-50% higher capex than air cooling
  • Expertise: Requires skilled technicians (can't just swap parts like air systems)

Adoption (2026):

  • 50%+ of new AI data centers use direct liquid cooling
  • Up from <10% in 2022 (pre-H100 era)
  • Projected: 80%+ by 2028 (as Blackwell deploys at scale)

Immersion Cooling — The Extreme Solution

For ultra-high-density deployments: Submerge entire servers in liquid.

🌊 IMMERSION COOLING

How it works:

  • Servers placed in tanks filled with dielectric fluid (non-conductive, doesn't short-circuit electronics)
  • GPUs, memory, everything submerged
  • Heat transfers directly from components to fluid
  • Hot fluid pumped to heat exchangers, cooled, returned

Types of immersion:

1. Single-phase immersion:

  • Fluid stays liquid (doesn't boil)
  • Simpler, more common
  • Can cool 100-200 kW per tank

2. Two-phase immersion:

  • Fluid boils at low temperature (~50°C)
  • Vapor rises, condenses, returns as liquid
  • More efficient but complex
  • Can cool 250+ kW per tank

Advantages:

  • Extreme density: Can cool 100+ kW racks (beyond Blackwell, future-proof)
  • Efficiency: 40-50% more efficient than air (PUE ~1.05 vs. air's 1.3-1.5)
  • No dust: Sealed systems, no particulate contamination

Disadvantages:

  • Cost: 2-3x more expensive than air cooling
  • Maintenance: Accessing components requires draining tanks
  • Fluid cost: Dielectric fluids expensive ($50-200/gallon, thousands of gallons needed)
  • Psychological barrier: Operators nervous about submerging expensive GPUs

Adoption (2026):

  • ~5-10% of new AI data centers use immersion
  • Mostly hyperscalers experimenting (Microsoft, Meta testing)
  • Bitcoin miners pivoting to AI (Post 4) often use immersion (already had infrastructure)

The Cooling Adoption Curve

| Year | Air cooling | Direct liquid | Immersion | Driver |
|---|---|---|---|---|
| 2020 | 95% | 4% | 1% | A100 era (400 W, air sufficient) |
| 2023 | 70% | 25% | 5% | H100 (700 W, liquid recommended) |
| 2026 | 40% | 50% | 10% | Blackwell (1,000 W, liquid required) |
| 2028 (proj.) | 20% | 65% | 15% | Next-gen GPUs (1,200-1,500 W) |

Air cooling won't disappear (still used for inference, legacy systems), but liquid dominates new AI builds.

Part 3: The Cooling Infrastructure Winners

Vertiv — The Data Center Cooling Leader

❄️ VERTIV

What they do:

  • Data center infrastructure: Cooling, power distribution, monitoring
  • Leading provider of direct liquid cooling systems for AI

Revenue (2025):

  • ~$7.5B total revenue (up 15-20% YoY, AI-driven)
  • Thermal management (cooling): ~40% of revenue (~$3B)
  • Gross margins: ~30-35%

AI cooling products:

  • Liebert XDU coolant distribution units (CDUs) for direct-to-chip liquid cooling
  • Liebert DSE free-cooling and EconoPhase pumped-refrigerant economizer systems
  • Cold plates, manifolds, and heat rejection equipment

Customer base:

  • Hyperscalers (AWS, Azure, Google Cloud)
  • Data center REITs (Digital Realty, Equinix)
  • Enterprises deploying on-prem AI

Stock performance:

  • Nov 2022 (ChatGPT launch): ~$10
  • March 2026: ~$90-110
  • +800-1,000% gain (massive AI infrastructure winner)

Why Vertiv wins:

  • Incumbent advantage (already in 80%+ of large data centers)
  • End-to-end solutions (cooling + power + monitoring integrated)
  • Scale: Can deliver thousands of cooling units/year

Schneider Electric — The Diversified Giant

⚡ SCHNEIDER ELECTRIC

What they do:

  • Energy management, industrial automation, data center infrastructure
  • Cooling, UPS (uninterruptible power), power distribution

Revenue (2025):

  • ~€40B total (~$43B USD)
  • Data center segment: ~€8-10B (~$9-11B, 20-25% of total)
  • AI driving data center growth 20-30% YoY

AI cooling products:

  • EcoStruxure: Integrated data center management platform
  • APC by Schneider: Liquid cooling systems, in-row coolers
  • Partnerships with hyperscalers for custom solutions

Why Schneider competes:

  • Diversified (not dependent on data centers alone)
  • Global scale (operates in 100+ countries)
  • Software integration (cooling + power + monitoring via EcoStruxure)

Startups & Niche Players

LiquidStack:

  • Immersion cooling specialist
  • Two-phase immersion systems
  • Backed by Bitcoin mining pivot companies

CoolIT Systems:

  • Direct liquid cooling (cold plates, CDUs)
  • Focus: High-performance computing (HPC), AI

Asetek:

  • Liquid cooling for servers/GPUs
  • Originally gaming PC cooling (scaled to data centers)

These startups have 10-15% combined market share. Vertiv + Schneider dominate 60-70%.

Part 4: The Economics — 15-20% of Data Center Capex

Cooling Cost Breakdown

💰 EXAMPLE: 500 MW AI DATA CENTER (BLACKWELL)

Total IT load: 500 MW

Cooling requirements:

  • 500 MW IT × 1.3 PUE (Power Usage Effectiveness) = 650 MW total facility power
  • Cooling power: ~150 MW

Cooling capex (direct liquid cooling):

1. In-rack cooling (cold plates, manifolds):

  • ~50,000 servers × $5,000-8,000/server = $250-400M

2. Coolant distribution units (CDUs):

  • ~500 units × $100k-200k = $50-100M

3. Heat rejection (cooling towers, chillers):

  • 150 MW cooling capacity × $500k-1M/MW = $75-150M

4. Piping, pumps, controls:

  • $100-200M

Total cooling capex: $475-850M

Total data center capex: $3-4B (GPUs, servers, networking, cooling, building, power)

Cooling as % of total: 12-28% (average ~15-20%)

For comparison, air cooling would be:

  • ~$300-500M (30-40% cheaper)
  • But can't handle Blackwell density (wouldn't work)
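The capex line items above can be tallied in a few lines (the per-unit costs are the rough ranges from this section, not vendor quotes):

```python
# (low, high) capex estimates in USD, using the rough ranges above
capex = {
    "in_rack_cooling": (50_000 * 5_000,  50_000 * 8_000),    # cold plates/manifolds, per server
    "cdus":            (500 * 100_000,   500 * 200_000),     # coolant distribution units
    "heat_rejection":  (150 * 500_000,   150 * 1_000_000),   # per MW of cooling capacity
    "piping_pumps":    (100_000_000,     200_000_000),       # piping, pumps, controls
}

low = sum(v[0] for v in capex.values())
high = sum(v[1] for v in capex.values())
print(f"Cooling capex: ${low/1e6:.0f}M-${high/1e6:.0f}M")     # $475M-$850M
print(f"Share of $3-4B build: {low/4e9:.0%}-{high/3e9:.0%}")  # ~12%-28%
```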

Operating Costs (Opex)

Cooling also consumes power continuously:

  • Air cooling PUE: 1.3-1.5 (30-50% overhead on IT power)
  • Liquid cooling PUE: 1.15-1.25 (15-25% overhead)
  • Immersion PUE: 1.05-1.15 (5-15% overhead)

For a 500 MW IT load (assuming $0.08/kWh and 8,760 hours/year):

  • Air cooling: 150-250 MW of cooling power → $105-175M/year
  • Liquid cooling: 75-125 MW → $53-88M/year
  • Immersion: 25-75 MW → $18-53M/year

Opex savings from liquid cooling vs. air: roughly $50-90M/year

Payback on the higher capex (~$175-350M more than air cooling): roughly 2-7 years, so liquid cooling pays for itself via energy savings.
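The opex and payback arithmetic works out as follows (midpoint values; the $0.08/kWh rate is the assumption used throughout this section):

```python
HOURS_PER_YEAR = 8_760
RATE_USD_PER_KWH = 0.08  # assumed industrial electricity rate

def annual_cooling_cost_musd(cooling_mw):
    """Annual cost in $M of running cooling_mw of cooling load continuously."""
    return cooling_mw * 1_000 * HOURS_PER_YEAR * RATE_USD_PER_KWH / 1e6

air_cost = annual_cooling_cost_musd(200)     # midpoint of 150-250 MW -> ~$140M/yr
liquid_cost = annual_cooling_cost_musd(100)  # midpoint of 75-125 MW  -> ~$70M/yr

savings = air_cost - liquid_cost             # ~$70M/yr
extra_capex_musd = 250                       # midpoint of the liquid-over-air premium
payback_years = extra_capex_musd / savings   # ~3.6 years at midpoint assumptions
```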

Part 5: The Verdict — Cooling = Unglamorous but Essential

Nobody writes headlines about cooling. But without it, $400M GPU clusters become space heaters.

The picks-and-shovels thesis:

  • Vertiv: +800-1,000% since ChatGPT launch (infrastructure winner)
  • Schneider Electric: Data center segment growing 20-30% YoY
  • Cooling = 15-20% of data center capex (non-trivial)

The transition is inevitable:

  • Blackwell requires liquid (1,000W/chip)
  • Next-gen GPUs will be even hotter (1,200-1,500W projected)
  • Air cooling relegated to legacy/inference workloads
  • Liquid becomes standard by 2028

Infrastructure players capture steady returns while AI apps burn cash searching for business models.

What's Next in the Series

Post 7 (FINAL POST OF SECTION 1): Who Pays? — The $220B Capex Explosion

Microsoft, Google, Amazon, Meta spending $220 billion collectively in 2025. Where does it all go?

What we'll cover:

  • Hyperscaler capex breakdown (GPUs 40-50%, networking 20-30%, power/cooling 15-20%, buildings 10-15%)
  • OpenAI's $6B annual burn (mostly compute costs)
  • When does ROI kick in? (Azure AI revenue growing, but not yet profitable)
  • The coming capex taper? (2027-2028 risk if AI revenue doesn't materialize)

This completes Section 1: Terrestrial Foundation!

Then Section 2: The Power Solution (SMR nuclear, grid expansion)

SOURCES

GPU Heat Specifications:

  • NVIDIA product datasheets: H100, H200, Blackwell TDP (thermal design power)

Cooling Technology:

  • Vertiv, Schneider Electric product documentation (direct liquid, immersion systems)
  • Industry reports (Uptime Institute, Data Center Dynamics): PUE benchmarks, adoption rates

Company Financials:

  • Vertiv quarterly earnings (2025): Revenue growth, stock performance
  • Schneider Electric annual reports: Data center segment revenue

Cost Estimates:

  • Industry sources (JLL, CBRE): Data center construction costs, cooling capex breakdowns
