Energy per token · cost & carbon per million tokens

Energy Per Inference Console

The true efficiency of AI serving is energy per token — power divided by throughput. It scales straight to cost and carbon per million tokens, the unit the inference economy runs on. Compute it from your hardware power and throughput, in any currency.

01 · Quick estimate

Serving power & throughput → energy and cost per token.

Energy / token: 2.24 J/token
Cost / 1M tokens: $0.05
02 · Deep analysis

Inference economics console

Per-token & per-query
Energy / token: 2.24 J
Cost / 1M tokens: $0.05
Energy / query (500 tokens): 0.31 Wh
Cost / query: $0.00002
Queries / kWh: 3.2k
CO₂ / 1M tokens: 230 g
Throughput-per-watt is the lever

At 5,600 W and 2,500 tok/s, each token costs 2.24 J, or $0.05 per million tokens. Doubling throughput per watt (batching, quantization, efficient hardware) halves energy, cost and carbon per token alike.

Current: $0.05 per 1M tokens
+50% throughput: $0.033 per 1M tokens
2× throughput: $0.025 per 1M tokens
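
As a rough sketch, the sensitivity figures above can be reproduced in a few lines of Python. The function names are illustrative, and the electricity price of $0.08/kWh is an assumption: the page does not state the price it uses, and this value was chosen only because it matches the displayed $0.05 per million tokens.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_per_token_j(power_w: float, tokens_per_s: float) -> float:
    """Joules per token: watts (joules per second) divided by tokens per second."""
    return power_w / tokens_per_s


def cost_per_million_tokens(power_w: float, tokens_per_s: float,
                            price_per_kwh: float) -> float:
    """Electricity cost of generating one million tokens."""
    kwh_per_million = energy_per_token_j(power_w, tokens_per_s) * 1e6 / JOULES_PER_KWH
    return kwh_per_million * price_per_kwh


POWER_W = 5600.0       # serving power from the example above
TOKENS_PER_S = 2500.0  # throughput from the example above
PRICE_PER_KWH = 0.08   # ASSUMED price in USD/kWh, not stated on the page

# Same power draw, higher throughput: cost falls in inverse proportion.
for label, speedup in [("Current", 1.0), ("+50% throughput", 1.5), ("2x throughput", 2.0)]:
    cost = cost_per_million_tokens(POWER_W, TOKENS_PER_S * speedup, PRICE_PER_KWH)
    print(f"{label}: ${cost:.3f} per 1M tokens")
```

Because power is held fixed while throughput rises, each row is the baseline divided by the speedup, which is why the 2× case lands at half the current cost.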
Read-out

A 500-token query uses 0.31 Wh and costs $0.00002 in electricity — the hard floor under any API price for this workload. You get 3.2k such queries per kWh.

This is the IT-level figure — multiply by your datacenter PUE for the all-in facility cost.
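
A minimal sketch of the per-query arithmetic and the PUE adjustment, assuming the 2.24 J/token and 500-token figures above. The PUE of 1.3 and the $0.08/kWh price are hypothetical placeholders, not values from this page; substitute your own.

```python
JOULES_PER_WH = 3600.0  # 1 Wh = 3,600 joules


def query_energy_wh(energy_per_token_j: float, tokens_per_query: int,
                    pue: float = 1.0) -> float:
    """Energy per query in Wh; pue=1.0 gives the IT-level figure."""
    return energy_per_token_j * tokens_per_query * pue / JOULES_PER_WH


def queries_per_kwh(energy_wh: float) -> float:
    """How many queries one kWh of electricity serves."""
    return 1000.0 / energy_wh


ENERGY_PER_TOKEN_J = 2.24  # from the read-out above
TOKENS_PER_QUERY = 500     # from the read-out above
ASSUMED_PUE = 1.3          # hypothetical facility PUE
ASSUMED_PRICE = 0.08       # hypothetical USD per kWh

it_wh = query_energy_wh(ENERGY_PER_TOKEN_J, TOKENS_PER_QUERY)
facility_wh = query_energy_wh(ENERGY_PER_TOKEN_J, TOKENS_PER_QUERY, pue=ASSUMED_PUE)

print(f"IT-level:  {it_wh:.2f} Wh/query, {queries_per_kwh(it_wh):,.0f} queries/kWh")
print(f"Facility:  {facility_wh:.2f} Wh/query, {queries_per_kwh(facility_wh):,.0f} queries/kWh")
print(f"Cost/query (IT-level): ${it_wh / 1000.0 * ASSUMED_PRICE:.5f}")
```

PUE enters as a plain multiplier on energy, so every downstream figure (cost per query, carbon per query) scales by the same factor.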

Scale it to a full fleet in the Data Center Power console.

Currency conversion uses indicative rates — verify against a live source for contracts.

Why it matters

Why energy per token rules inference

Energy per token is the true efficiency metric

For LLMs, performance per watt reduces to energy per token: the joules spent to generate one token. It folds hardware power and throughput into one number that scales directly to cost and carbon per query.

Inference dwarfs training over a model's life

A model is trained once but serves billions of queries. Cumulative inference energy quickly exceeds the one-time training cost, which is why per-token efficiency — not just training FLOPs — drives the real energy and cost story.

Throughput per watt beats raw speed

Two systems at the same tokens-per-second can differ hugely in energy per token if one draws far more power. Batching and efficient hardware raise throughput per watt, the lever that lowers cost and carbon simultaneously.
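
To illustrate, here is a small sketch comparing two hypothetical systems with identical throughput but different power draw. The system names and the 8,400 W figure are invented for illustration; only the 5,600 W / 2,500 tok/s pair comes from this page.

```python
from dataclasses import dataclass


@dataclass
class ServingSystem:
    name: str
    power_w: float
    tokens_per_s: float

    @property
    def joules_per_token(self) -> float:
        """Energy per token: power divided by throughput."""
        return self.power_w / self.tokens_per_s


# Hypothetical systems: same tokens per second, different power draw.
systems = [
    ServingSystem("system-a", power_w=5600.0, tokens_per_s=2500.0),
    ServingSystem("system-b", power_w=8400.0, tokens_per_s=2500.0),
]

# Rank by energy per token (lower is better), not by raw speed.
ranked = sorted(systems, key=lambda s: s.joules_per_token)
for s in ranked:
    print(f"{s.name}: {s.tokens_per_s:,.0f} tok/s at {s.joules_per_token:.2f} J/token")
```

Both systems look identical on a tokens-per-second leaderboard, yet the second spends 50% more energy, and therefore 50% more electricity cost and carbon, on every token.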

Cost per million tokens is the unit of the AI economy

API pricing, margins and capacity planning all run in dollars (or your currency) per million tokens. Tying that unit back to energy reveals the hard floor under inference pricing: pricing below your energy cost means serving every token at a loss.

Field notes

The joule behind every token

Every token a language model generates costs a measurable amount of energy, and at the scale AI now operates — trillions of tokens served to a global user base — that tiny per-token figure compounds into the dominant cost and carbon line of the whole enterprise. Understanding it starts with a single, exact relation: energy per token is the serving hardware's power divided by its throughput in tokens per second, because a watt is just a joule per second.
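
Written out with units, the relation looks like the following. The symbols are notation chosen here for illustration (P for serving power, R for throughput, E for energy); the numbers are the console's example of 5,600 W and 2,500 tok/s.

```latex
\[
E_{\text{tok}} = \frac{P}{R},
\qquad
\frac{\mathrm{W}}{\mathrm{tok/s}}
= \frac{\mathrm{J/s}}{\mathrm{tok/s}}
= \frac{\mathrm{J}}{\mathrm{tok}}
\]
\[
E_{\text{tok}} = \frac{5600\ \mathrm{W}}{2500\ \mathrm{tok/s}} = 2.24\ \mathrm{J/tok}
\]
\[
E_{1\mathrm{M}} = \frac{2.24 \times 10^{6}\ \mathrm{J}}{3.6 \times 10^{6}\ \mathrm{J/kWh}}
\approx 0.622\ \mathrm{kWh}
\]
```

The seconds cancel, which is why no time window appears in the result: the figure holds for any interval over which power and throughput are steady.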

That number is the right efficiency metric precisely because it unifies the two things that pull against each other. Raw tokens-per-second flatters a power-hungry system; raw wattage ignores the work done. Energy per token captures throughput-per-watt, so a fast-but-thirsty accelerator and a slower-but-efficient one are judged on the same honest basis — and it's expressed in tokens, the unit inference actually bills in, so it flows directly into cost per million tokens and carbon per query.

The strategic insight is that inference, not training, dominates a model's lifetime energy. Training is a single large expense; serving is billions of queries over months and years, and the cumulative energy overtakes training many times over. That's why the levers that raise throughput-per-watt — batching, quantization, efficient hardware, speculative decoding — matter so much: each one lowers energy per token, and therefore cost and carbon per query, simultaneously. The sensitivity bars here show that leverage directly.
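
The carbon side follows the same pattern: energy per million tokens times grid intensity. In this sketch the grid intensity of 370 g CO₂/kWh is an assumption, picked only because it roughly reproduces the 230 g per million tokens shown in the console; the halved values are hypothetical scenarios.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def co2_g_per_million_tokens(energy_per_token_j: float,
                             grid_g_per_kwh: float) -> float:
    """Grams of CO2 per million tokens: energy (kWh) times grid intensity (g/kWh)."""
    kwh_per_million = energy_per_token_j * 1e6 / JOULES_PER_KWH
    return kwh_per_million * grid_g_per_kwh


BASE_J_PER_TOKEN = 2.24    # from the console example
ASSUMED_GRID_G_KWH = 370.0  # ASSUMED grid intensity, not stated on the page

scenarios = {
    "baseline": (BASE_J_PER_TOKEN, ASSUMED_GRID_G_KWH),
    "2x throughput per watt": (BASE_J_PER_TOKEN / 2, ASSUMED_GRID_G_KWH),
    "grid at half the intensity": (BASE_J_PER_TOKEN, ASSUMED_GRID_G_KWH / 2),
    "both levers": (BASE_J_PER_TOKEN / 2, ASSUMED_GRID_G_KWH / 2),
}

for label, (joules, grid) in scenarios.items():
    print(f"{label}: {co2_g_per_million_tokens(joules, grid):.0f} g CO2 per 1M tokens")
```

Efficiency and grid intensity multiply, so they act as independent levers: halving either one halves carbon per token, and halving both cuts it to a quarter.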

Because cost per million tokens is the unit of the AI economy — the floor under API pricing and the basis of capacity planning — this console reports it in your chosen currency, tied to real power and throughput. Scale a single node's figure to a full fleet in the Data Center Power console, and remember to apply PUE for the all-in facility cost.


Trusted by ML Systems, Inference & Sustainability Teams

4.8
Based on 3,050 reviews

Energy per token is the metric we optimize on, and this expresses it cleanly — power over throughput, straight to cost per million tokens. Showing that batching halves energy per token, in cost terms and in our reporting currency, is the slide that funds the optimization work.

Dr. Naomi Sasaki
ML systems efficiency
June 4, 2026

Finally cost-per-Mtok with real currency support — I compare our serving cost floor against API prices in USD and BRL without a spreadsheet. The inference-dwarfs-training point reframed our whole efficiency roadmap. Accurate and fast.

Felipe Costa
Inference infrastructure lead
April 25, 2026

Separating energy per token from grid intensity is exactly right — it shows efficiency and clean-grid siting as two distinct levers on per-query carbon. At trillion-token scale those grams add up, and this makes the case quantitatively.

Anita Deshmukh
AI sustainability analyst
March 5, 2026

The energy floor under token pricing is the insight I needed for margin modeling. Cost per million tokens in our currency, tied to real power and throughput, is the number. Would love prefill/decode split, but as a first-order economics tool it's excellent.

Greg Holloway
AI product finance
December 30, 2025

energy per token = power ÷ throughput · cost/1M tokens = energy × price · CO₂ = energy × grid intensity · Last reviewed: 2026-06