Nodes · racks · power · interconnect · bisection

GPU Cluster Sizing Console

Turn a GPU count into a datacenter build: compute the nodes, racks, power and network. Power, not floor space, is the binding constraint, and the interconnect determines whether the GPUs actually scale.

01 · Quick scope

GPU count & topology → nodes, racks and power.

Nodes / racks: 128 / 32
Facility power: 1.2 MW
Power, racks & network topology ↓
02 · Deep analysis

Cluster build console

Physical build
Nodes: 128 (8 GPUs each)
Racks: 32 (4 nodes/rack)
IT power: 909 kW
Power / rack: 37 kW (air feasible)

Network topology
NVLink domain: 8 GPUs (all-to-all in node)
Inter-node NICs: 128 (InfiniBand/Ethernet)
Bisection BW: ~26 TB/s (full fat-tree)

Facility power: 1.18 MW (PUE 1.3)

Read-out

1,024 GPUs make 128 nodes across 32 racks, drawing 909 kW of IT power (1.18 MW at the facility). At 37 kW/rack (1.18 MW ÷ 32 racks), air cooling is feasible.

GPUs share an all-to-all NVLink domain within each 8-GPU node; a fat-tree of 128 inter-node links gives ~26 TB/s bisection for all-reduce. Under-provision it and scaling stalls.

Get the GPU count from Training Cost; cost the facility power in Data Center Power.
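
The read-out arithmetic can be reproduced in a few lines. The sketch below is a minimal illustration, not the console's actual implementation: `size_cluster` and its parameter names are hypothetical, and the 700 W per GPU and 1.5 kW per-node overhead are assumed values, chosen only because they reproduce the 909 kW IT figure in the read-out.

```python
import math


def size_cluster(gpus, gpus_per_node=8, nodes_per_rack=4,
                 gpu_watts=700.0, node_overhead_watts=1500.0, pue=1.3):
    """First-pass scope: nodes, racks and power from a GPU count.

    gpu_watts and node_overhead_watts are ASSUMED inputs (not shown on the
    page); they were picked to match the 909 kW read-out.
    """
    nodes = math.ceil(gpus / gpus_per_node)
    racks = math.ceil(nodes / nodes_per_rack)
    # IT power = GPU draw plus per-node overhead (CPUs, NICs, fans, ...).
    it_kw = (gpus * gpu_watts + nodes * node_overhead_watts) / 1000.0
    # Facility power scales IT power by PUE to cover cooling and losses.
    facility_kw = it_kw * pue
    return {
        "nodes": nodes,
        "racks": racks,
        "it_kw": it_kw,
        "facility_kw": facility_kw,
        "it_kw_per_rack": it_kw / racks,
        "facility_kw_per_rack": facility_kw / racks,
    }


scope = size_cluster(1024)
for name, value in scope.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions the 37 kW/rack figure corresponds to facility power divided by racks; IT power alone works out to roughly 28 kW per rack, which is the number to compare against a rack's power and cooling rating.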

Why it matters

Why a cluster is power and network

Power, not floor space, limits AI clusters

A rack of kilowatt-class GPUs draws tens of kilowatts, far beyond the ~10 kW that traditional racks were built for. Modern AI datacenters are designed around power and cooling density, with far fewer servers per rack than legacy halls.

The interconnect is the cluster

Training spreads a model across thousands of GPUs that must exchange gradients constantly. The network (NVLink within a node, InfiniBand or Ethernet between nodes) determines whether the GPUs actually scale or stall on communication.

NVLink domains, then a fat-tree

Within a node, GPUs share an all-to-all high-bandwidth NVLink domain; across nodes, a fat-tree of InfiniBand switches provides full bisection bandwidth. The two-tier hierarchy is the standard AI-cluster topology.

Bisection bandwidth gates scaling

How much data can cross the middle of the network at once — the bisection bandwidth — sets the ceiling on collective operations like all-reduce. Under-provision it and adding GPUs stops helping.
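
As a rough illustration of that ceiling, the page's own formula (bisection ≈ GPUs × NIC ÷ 2) can be written out directly. This is a sketch under stated assumptions: the function name is hypothetical, the 400 Gb/s per-GPU NIC rate is assumed because it reproduces the ~26 TB/s read-out, and the `oversubscription` parameter is an added illustration that the console does not model.

```python
def bisection_tb_per_s(gpus, nic_gbit_per_s=400.0, oversubscription=1.0):
    """First-order bisection bandwidth in TB/s.

    Assumes one NIC per GPU at nic_gbit_per_s (ASSUMED 400 Gb/s) and a
    fat-tree whose uplinks are thinned by `oversubscription` (1.0 = full).
    """
    nic_gbyte_per_s = nic_gbit_per_s / 8.0        # bits -> bytes
    full = gpus * nic_gbyte_per_s / 2.0           # half the endpoints send across the cut
    return full / oversubscription / 1000.0       # GB/s -> TB/s


full_fat_tree = bisection_tb_per_s(1024)
thinned = bisection_tb_per_s(1024, oversubscription=4.0)
print(f"full fat-tree: {full_fat_tree:.1f} TB/s, 4:1 oversubscribed: {thinned:.1f} TB/s")
```

Read with one 400 Gb/s NIC per node (128 links) instead of one per GPU, the same formula gives only 3.2 TB/s, so the ~26 TB/s figure implies per-GPU network ports.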

Field notes

From a pile of GPUs to a machine

A GPU count is not a cluster. Turning thousands of accelerators into a machine that trains a model is a problem of power and network far more than of the chips themselves, and getting either wrong wastes the silicon. The first surprise for anyone from traditional IT is power density: a rack of kilowatt-class GPUs draws tens of kilowatts, several times what legacy racks were built for, so AI datacenters are designed around delivering and removing that power — fewer servers per rack, liquid cooling, dedicated substations.
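
To make the density point concrete, here is a minimal sketch of how many 8-GPU nodes fit under a rack power cap. The function names are hypothetical, and the 700 W per GPU and 1.5 kW per-node overhead are assumed figures rather than values stated on this page.

```python
import math


def nodes_per_rack(rack_limit_kw, gpus_per_node=8,
                   gpu_watts=700.0, node_overhead_watts=1500.0):
    """Whole nodes that fit under a rack's IT power limit (ASSUMED wattages)."""
    node_kw = (gpus_per_node * gpu_watts + node_overhead_watts) / 1000.0
    return math.floor(rack_limit_kw / node_kw)


def racks_needed(nodes, rack_limit_kw):
    """Racks required when power, not physical space, caps each rack."""
    per_rack = nodes_per_rack(rack_limit_kw)
    if per_rack == 0:
        raise ValueError("rack power limit is below the draw of a single node")
    return math.ceil(nodes / per_rack)


for limit_kw in (10.0, 30.0, 40.0):
    print(f"{limit_kw:.0f} kW rack: {nodes_per_rack(limit_kw)} node(s), "
          f"{racks_needed(128, limit_kw)} racks for 128 nodes")
```

With these assumed wattages a node draws about 7.1 kW, so a 10 kW legacy rack holds a single node, a 30 kW rack holds four, and a 40 kW rack holds five.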

The second, deeper truth is that the interconnect is the cluster. Distributed training synchronizes gradients across every GPU on every step, so the machine is only as fast as its slowest communication path. The standard answer is a two-tier hierarchy: within a node, eight GPUs share an all-to-all NVLink domain with aggregate bandwidth in the terabytes-per-second range; across nodes, a fat-tree of InfiniBand switches provides full bisection bandwidth, so any node can talk to any other at full rate simultaneously.
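
The console does not count switches, but a first-order estimate for the inter-node tier is easy to sketch. The example below assumes a non-blocking two-level (leaf/spine) fat-tree in which every switch has the same port count and half of each leaf's ports face the endpoints; the function name, the 64-port radix and the one-port-per-GPU endpoint count are all illustrative assumptions.

```python
import math


def leaf_spine_fat_tree(endpoints, radix=64):
    """Switch counts for a non-blocking two-level fat-tree (leaf/spine).

    Each leaf splits its ports evenly: radix/2 down to endpoints and
    radix/2 up to spines, so uplink capacity equals downlink capacity.
    """
    down_ports = radix // 2
    max_endpoints = radix * down_ports            # at most `radix` leaves per spine
    if endpoints > max_endpoints:
        raise ValueError("too many endpoints for two levels; a third tier is needed")
    leaves = math.ceil(endpoints / down_ports)
    # Every leaf uplink must land on a spine port.
    spines = math.ceil(leaves * down_ports / radix)
    return {"leaves": leaves, "spines": spines, "max_endpoints": max_endpoints}


fabric = leaf_spine_fat_tree(1024)   # ASSUMES one network port per GPU
print(fabric)
```

For 1,024 endpoints on 64-port switches this gives 32 leaf and 16 spine switches; beyond 2,048 endpoints the same radix requires a third switching tier.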

That bisection bandwidth, the amount of data that can cross the middle of the network at once, is the number that gates scaling. Collective operations like all-reduce move data across the whole cluster, and if the network can't carry it, communication overwhelms computation and adding GPUs stops helping: scaling efficiency and model FLOPs utilization (MFU) both collapse. This is why frontier clusters spend enormously on networking, and why a sizing exercise must include the network, not just the node and rack counts.
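
A back-of-the-envelope model shows how quickly communication can swamp computation. The sketch below uses the standard bandwidth-only cost of a ring all-reduce, in which each worker sends about 2(N-1)/N times the payload, and it ignores latency and any overlap of communication with compute. The 20 GB gradient payload, 2-second compute step and link rates are hypothetical inputs, not figures from this page.

```python
def ring_allreduce_seconds(payload_bytes, workers, link_bytes_per_s):
    """Bandwidth-only time for a ring all-reduce (latency ignored)."""
    if workers < 2:
        return 0.0
    return 2.0 * (workers - 1) / workers * payload_bytes / link_bytes_per_s


def step_efficiency(compute_s, comm_s):
    """Fraction of a step spent computing, ASSUMING no compute/comm overlap."""
    return compute_s / (compute_s + comm_s)


payload = 20e9        # hypothetical: 20 GB of gradients per step
compute = 2.0         # hypothetical: 2 s of pure compute per step

# Full-rate fabric: 50 GB/s per GPU (a 400 Gb/s NIC, assumed).
comm_full = ring_allreduce_seconds(payload, 1024, 50e9)
# Same fabric oversubscribed 4:1, leaving 12.5 GB/s per GPU across the cut.
comm_thin = ring_allreduce_seconds(payload, 1024, 12.5e9)

eff_full = step_efficiency(compute, comm_full)
eff_thin = step_efficiency(compute, comm_thin)
print(f"full rate: {comm_full:.2f} s comm, efficiency {eff_full:.2f}")
print(f"4:1 thin:  {comm_thin:.2f} s comm, efficiency {eff_thin:.2f}")
```

Real training frameworks overlap much of this communication with computation, so these numbers are pessimistic, but the direction is the point: cutting the available bandwidth by four takes the no-overlap efficiency from roughly 0.7 to under 0.4 in this example.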

This console scopes that build from a GPU count: nodes, racks, the power envelope, and a first-order bisection bandwidth. The count itself comes from the workload in the Training Cost console, the facility power flows into the Data Center Power console for energy and cost, and the own-vs-rent question belongs to the Accelerator ROI console.


Trusted by Datacenter & HPC Infrastructure Teams

4.8
Based on 2,990 reviews

Nodes, racks, power-per-rack and bisection bandwidth from a GPU count is exactly the first-pass cluster scope. The power-not-floor-space point is the one facility teams must internalize — 37kW/rack is a different building than legacy. The two-tier NVLink/InfiniBand model is right.

Dr. Viktor Petrov
Datacenter network architect
June 14, 2026

The 16k-GPU at 25MW figure is the utility-scale reality that drives our site selection. Bisection bandwidth gating all-reduce is the insight that justifies the InfiniBand spend. Pairs perfectly with the training-cost and data-center power tools.

Amara Okafor
AI infrastructure planning
May 20, 2026

Clean node/rack/power scoping with the NVLink-domain and fat-tree framing. Power-per-rack against our density limit is the check I run first. Would love switch-count and oversubscription modeling, but as a first-pass sizing tool it's exactly right.

Lukas Brandt
HPC systems engineer
March 30, 2026

We scope clusters off this before detailed design — node count, megawatts, racks. The interconnect-is-the-cluster framing reframes it from a pile of GPUs to a network. Feeds straight into the data-center power estimator. Excellent.

Mei Tan
Cloud capacity planning
December 30, 2025


Connected instruments

Related tools

Similar Calculators

More tools in the same category

Inference Cost Calculator

Estimate deployment costs for AI models across cloud, edge, and hybrid infrastructures with per-query, per-token, and per-hour pricing models. Integrates GPU/ASIC rental rates, network egress, storage, and scaling overhead for accurate inference TCO analysis.

Training Cost Calculator

Calculate AI model training expenses including GPU cluster rental, data transfer, checkpoint storage, and engineering time with distributed-training overhead modeling. Supports LLM, vision, and multimodal training with FLOPs-to-cost mapping and carbon-footprint estimation.

Model Fit Checker

Verify whether AI models fit within hardware constraints including GPU HBM capacity, on-chip SRAM, and interconnect bandwidth with layer-wise memory profiling. Supports model parallelism, pipeline parallelism, and ZeRO optimization recommendations for large-model deployment.

HBM Bandwidth Calculator

Estimate memory bandwidth requirements for AI workloads with operation-type analysis, data-movement profiling, and roofline model integration. Calculates HBM generation selection, channel count, and clock-speed requirements to eliminate memory-bound bottlenecks.

AI Chip Comparator

Compare AI accelerators across performance, cost, power, and software-ecosystem metrics with normalized benchmarking for training and inference workloads. Supports NVIDIA, AMD, Intel, Google TPU, Amazon Trainium, and custom ASICs with TCO-per-FLOP analysis.

Token Cost Estimator

Calculate infrastructure costs per token generated for LLM serving with batch-size optimization, KV-cache management, and speculative decoding impact. Models pricing for API providers and self-hosted deployments with demand-spike handling and multi-model routing.


nodes = GPUs ÷ GPUs per node · facility power = (GPUs × W per GPU + nodes × per-node overhead) × PUE · bisection ≈ GPUs × NIC bandwidth ÷ 2 · Last reviewed: 2026-06