6ND FLOPs · MFU · GPU-hours · cost & carbon

Training Cost Console

A model's training cost is one formula: ≈ 6 × parameters × tokens FLOPs, divided by your cluster's real throughput (peak FLOPS × MFU). Get the wall-clock days, GPU-hours, cost in any currency, energy and CO₂.

01 · Quick estimate

Params, tokens, GPUs & MFU → time and cost.

H100: 990 TFLOPS bf16 · 700W

Training time
10.7
days
Compute cost
$1.57M
GPU-hours, energy & carbon ↓
02 · Deep analysis

Training-run console

Compute & schedule
Total FLOPs
840.0 × 10²¹
6 × N × D
Cluster throughput
912 PFLOP/s
45% MFU
Wall-clock
10.7 d
GPU-hours
524k
Compute cost
$1.57M
USD
Energy
367 MWh
Carbon
136 t
CO₂, grid-dependent
Cost / 1k GPU-h
$3K
Read-out

Training 70B params on 2000B tokens is 840.0×10²¹ FLOPs. On 2,048 H100s at 45% MFU that runs 10.7 days (524k GPU-hours), costing $1,571,268 in compute and emitting 136 t CO₂.
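The read-out above can be reproduced with a few lines of arithmetic. This is a minimal sketch rather than the console's actual implementation: the function name and dictionary keys are illustrative, and the $3.00 per GPU-hour rate is an assumption inferred from the stated cost and GPU-hours, not a figure the read-out gives directly.

```python
def estimate_training_run(params, tokens, gpus, peak_flops, mfu, usd_per_gpu_hour):
    """Estimate compute, schedule and cost of a training run via the 6ND rule."""
    total_flops = 6 * params * tokens        # FLOPs ~= 6 x N x D
    throughput = gpus * peak_flops * mfu     # sustained cluster FLOP/s
    seconds = total_flops / throughput       # wall-clock time
    gpu_hours = gpus * seconds / 3600
    return {
        "total_flops": total_flops,
        "throughput_flops": throughput,
        "days": seconds / 86400,
        "gpu_hours": gpu_hours,
        "cost_usd": gpu_hours * usd_per_gpu_hour,
    }


run = estimate_training_run(
    params=70e9,
    tokens=2000e9,
    gpus=2048,
    peak_flops=990e12,      # H100 bf16 peak as listed by the console
    mfu=0.45,
    usd_per_gpu_hour=3.0,   # assumption: inferred from the read-out's totals
)
print(f"{run['days']:.1f} days, {run['gpu_hours']:,.0f} GPU-hours, ${run['cost_usd']:,.0f}")
```

With these inputs the sketch lands on roughly 10.7 days, about 524k GPU-hours and about $1.57M, in line with the figures shown.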

Halving MFU to about 22.5% would double the time and cost; MFU is the biggest efficiency lever after GPU choice. This figure is compute-only: add data, salaries and failed runs (+20–50%).
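Both claims in the read-out are easy to check numerically. The sketch below is illustrative only: `compute_cost_usd` is a hypothetical helper, the $3.00 per GPU-hour rate is an assumption inferred from the totals above, and the 1.20 and 1.50 multipliers simply encode the +20–50% overhead band.

```python
def compute_cost_usd(total_flops, gpus, peak_flops, mfu, usd_per_gpu_hour):
    """Compute-only cost: GPU-hours multiplied by the hourly rate."""
    seconds = total_flops / (gpus * peak_flops * mfu)
    return gpus * seconds / 3600 * usd_per_gpu_hour


# Same run as the read-out; the hourly rate is an assumed value.
base = compute_cost_usd(840e21, 2048, 990e12, 0.45, 3.0)
halved = compute_cost_usd(840e21, 2048, 990e12, 0.225, 3.0)
print(f"cost ratio at half the MFU: {halved / base:.2f}")

# Overhead band for data, salaries and failed runs (+20% to +50%).
low, high = base * 1.20, base * 1.50
print(f"all-in range: ${low:,.0f} to ${high:,.0f}")
```

Because MFU sits in the denominator, the cost ratio comes out at 2.00 regardless of the other inputs.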

Compare buying vs renting this in the Accelerator ROI console; size the cluster in GPU Cluster Sizing.

Currency conversion uses indicative rates — verify against a live source for contracts.

Why it matters

Why one formula sets the bill

Training compute is 6 × params × tokens

The 6ND rule of thumb from the scaling-law literature puts a model's training cost in one formula: roughly six FLOPs per parameter per token. It turns "how big, how much data" straight into a compute budget, and a bill.

MFU is the efficiency tax

No cluster hits peak FLOPS. Model FLOPs Utilization — typically 30–50% — captures real efficiency after communication, memory and pipeline bubbles. It can double or halve your time and cost.

Frontier runs cost tens of millions

A frontier model trained on trillions of tokens across tens of thousands of GPUs runs for months and costs tens of millions of dollars in compute alone, before salaries, data and failed runs. Compute is the headline number.

Energy and carbon scale with the run

Tens of thousands of GPUs at roughly a kilowatt each for months is tens of gigawatt-hours of energy and thousands of tonnes of CO₂ on a typical grid. It is increasingly a reported figure, and a real cost on a dirty grid.

Field notes

Six FLOPs per parameter per token

The cost of training a large model collapses, with surprising accuracy, into a single relationship: about six floating-point operations per parameter per token. Two for the forward pass, four for the backward — multiply by the parameter count and the number of training tokens and you have the total compute, the 6ND rule that underpins every training budget. It turns the two decisions that define a model, how big and how much data, directly into FLOPs and therefore dollars.
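The forward and backward split described above can be written out explicitly. This is a minimal sketch: the constant and function names are illustrative, and the 2 + 4 split is the approximation stated in the text, which ignores smaller terms such as attention over the sequence.

```python
FWD_FLOPS_PER_PARAM_TOKEN = 2   # forward pass, per the text above
BWD_FLOPS_PER_PARAM_TOKEN = 4   # backward pass, roughly twice the forward work


def training_flops(params, tokens):
    """Total training compute under the 6ND approximation."""
    flops_per_token = (FWD_FLOPS_PER_PARAM_TOKEN + BWD_FLOPS_PER_PARAM_TOKEN) * params
    return flops_per_token * tokens


# The 70B-parameter, 2T-token run used by the console.
total = training_flops(params=70e9, tokens=2000e9)
forward_share = FWD_FLOPS_PER_PARAM_TOKEN / (
    FWD_FLOPS_PER_PARAM_TOKEN + BWD_FLOPS_PER_PARAM_TOKEN
)
print(f"total: {total:.2e} FLOPs, forward share: {forward_share:.0%}")
```

Keeping the two constants separate makes it easy to see that the forward pass accounts for about a third of the total.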

But FLOPs aren't time, because no cluster runs at its peak. Model FLOPs Utilization (MFU), the fraction of theoretical peak actually achieved after communication overhead, memory limits, and pipeline bubbles, is the efficiency tax, and it's large: real runs land at 30–50%. Schedule and cost scale with 1/MFU, so a run at 40% MFU takes two and a half times as long as the peak number suggests. Using a realistic MFU rather than the spec-sheet FLOPS is the single most important thing in an honest estimate, which is why it's a first-class input here.
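MFU can also be worked backwards from a measured training throughput, which is one way to pick a realistic input. The sketch below assumes the same 6 FLOPs per parameter per token as the numerator; the function names are illustrative and the 2.0 million tokens per second figure is a hypothetical measurement, not data from any real run.

```python
def measured_mfu(params, tokens_per_second, gpus, peak_flops):
    """MFU = useful model FLOP/s divided by the cluster's theoretical peak FLOP/s."""
    achieved = 6 * params * tokens_per_second   # model FLOP/s under the 6ND rule
    peak = gpus * peak_flops
    return achieved / peak


def slowdown_vs_peak(mfu):
    """How many times longer the run takes than the spec-sheet estimate."""
    if not 0 < mfu <= 1:
        raise ValueError("MFU must be in (0, 1]")
    return 1 / mfu


# Hypothetical measurement: 70B model, 2,048 H100s, 2.0M tokens/s observed.
mfu = measured_mfu(params=70e9, tokens_per_second=2.0e6, gpus=2048, peak_flops=990e12)
print(f"MFU: {mfu:.1%}, slowdown vs peak: {slowdown_vs_peak(mfu):.2f}x")
print(f"slowdown at 40% MFU: {slowdown_vs_peak(0.40):.1f}x")
```

The second helper is just the reciprocal, which is why 40% MFU corresponds to a 2.5x longer schedule than the peak figure implies.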

At frontier scale these numbers become staggering. Hundreds of billions of parameters on tens of trillions of tokens is upward of 10²⁵ FLOPs, demanding tens of thousands of GPUs for months and tens of millions of dollars in compute alone. And that compute is only the floor — the full budget adds data pipelines, engineering teams, checkpoint storage, and the failed runs that can add a half again to the total. The compute figure is just the part that scales most violently with ambition.

It's also an energy and carbon story now. Tens of thousands of kilowatt-class GPUs for months is tens of gigawatt-hours and thousands of tonnes of CO₂, varying several-fold with how clean the grid is, and it is increasingly a reported number. Once you have the compute cost, decide whether to rent or buy it in the Accelerator ROI console, and size the cluster in the GPU Cluster Sizing console.
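Energy and carbon follow directly from GPU-hours. The sketch below is an approximation under stated assumptions: the function name is illustrative, the 0.37 kg CO₂ per kWh grid intensity is inferred from the console's example figures rather than stated by it, and the PUE of 1.0 means datacenter overhead such as cooling is left out.

```python
def energy_and_carbon(gpu_hours, watts_per_gpu, kg_co2_per_kwh, pue=1.0):
    """Return (energy in MWh, carbon in tonnes CO2) for a training run."""
    energy_mwh = gpu_hours * watts_per_gpu * pue / 1e6   # watt-hours to MWh
    carbon_t = energy_mwh * kg_co2_per_kwh               # MWh x kg/kWh = tonnes
    return energy_mwh, carbon_t


# The console's example run: about 523,756 GPU-hours on 700 W H100s.
energy_mwh, carbon_t = energy_and_carbon(
    gpu_hours=523_756,
    watts_per_gpu=700,
    kg_co2_per_kwh=0.37,   # assumption: grid intensity inferred from the example
)
print(f"{energy_mwh:.0f} MWh, {carbon_t:.0f} t CO2")

# Same run with an assumed PUE of 1.3 to include facility overhead.
energy_pue, carbon_pue = energy_and_carbon(523_756, 700, 0.37, pue=1.3)
print(f"with PUE 1.3: {energy_pue:.0f} MWh, {carbon_pue:.0f} t CO2")
```

With the default PUE this gives roughly 367 MWh and 136 t, consistent with the example run; swapping in a cleaner or dirtier grid intensity shows the several-fold swing the text describes.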


Trusted by ML Infrastructure & Research Teams

4.8
Based on 3,180 reviews

6ND for FLOPs and MFU for the efficiency tax is exactly how we budget a training run, and seeing the cost in dollars and euros settles cross-region planning. The frontier preset at tens of millions is the number leadership needs. MFU as a first-class input is what most calculators get wrong.

Dr. Sasha Volkov
ML infrastructure lead
June 14, 2026

The time-from-MFU calculation matches our actual run schedules closely. Switching GPU types to compare total cost vs time is the analysis I do before every cluster reservation. Carbon alongside cost is increasingly what we report — having both here is perfect.

Liam O'Brien
AI research engineering
May 13, 2026

Clean compute-cost floor with the right levers — params, tokens, MFU, rate. The currency support saves a spreadsheet. Would love failed-run and PUE multipliers built in, but as the core 6ND cost estimate it's exactly right and fast.

Yuki Tanaka
Cloud cost optimization
March 23, 2026

Existential numbers made concrete — what a 7B-from-scratch run actually costs in GPU-hours and dollars before we commit. The MFU sensitivity is the reality check. Pairs perfectly with the inference and accelerator-ROI tools.

Priya Nair
AI startup CTO
December 30, 2025

Connected instruments

Related tools

Similar Calculators

More tools in the same category

Inference Cost Calculator

Estimate deployment costs for AI models across cloud, edge, and hybrid infrastructures with per-query, per-token, and per-hour pricing models. Integrates GPU/ASIC rental rates, network egress, storage, and scaling overhead for accurate inference TCO analysis.

GPU Cluster Sizing

Determine optimal GPU cluster configurations for training and inference workloads with interconnect topology modeling, memory-bandwidth balancing, and fault-tolerance planning. Supports NVIDIA, AMD, and custom accelerator clusters with InfiniBand and NVLink network analysis.

Model Fit Checker

Verify whether AI models fit within hardware constraints including GPU HBM capacity, on-chip SRAM, and interconnect bandwidth with layer-wise memory profiling. Supports model parallelism, pipeline parallelism, and ZeRO optimization recommendations for large-model deployment.

HBM Bandwidth Calculator

Estimate memory bandwidth requirements for AI workloads with operation-type analysis, data-movement profiling, and roofline model integration. Calculates HBM generation selection, channel count, and clock-speed requirements to eliminate memory-bound bottlenecks.

AI Chip Comparator

Compare AI accelerators across performance, cost, power, and software-ecosystem metrics with normalized benchmarking for training and inference workloads. Supports NVIDIA, AMD, Intel, Google TPU, Amazon Trainium, and custom ASICs with TCO-per-FLOP analysis.

Token Cost Estimator

Calculate infrastructure costs per token generated for LLM serving with batch-size optimization, KV-cache management, and speculative decoding impact. Models pricing for API providers and self-hosted deployments with demand-spike handling and multi-model routing.

FLOPs ≈ 6 × params × tokens · time = FLOPs ÷ (GPUs × peak FLOPS × MFU) · cost = GPU-hours × rate · Last reviewed: 2026-06