Energy Per Inference Console
The most useful efficiency metric for AI serving is energy per token: power divided by throughput. It scales straight to cost and carbon per million tokens, the unit the inference economy runs on. Compute it from your hardware power and throughput, in any currency.
Serving power & throughput → energy and cost per token.
Inference economics console
At 5,600 W and 2,500 tok/s, each token costs 2.24 J. That is about 0.62 kWh per million tokens, or $0.05 per million at the electricity price entered. Doubling throughput per watt (batching, quantization, efficient hardware) halves all of it.
A 500-token query uses 0.31 Wh and costs about $0.00002 in electricity, the hard floor under any API price for this workload. You get roughly 3,200 such queries per kWh.
This is the IT-level figure — multiply by your datacenter PUE for the all-in facility cost.
Scale it to a full fleet in the Data Center Power console.
Currency conversion uses indicative rates — verify against a live source for contracts.
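The console figures above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the console's actual implementation: the function names are hypothetical, and the electricity price of $0.08/kWh is an assumption chosen because it is consistent with the quoted $0.05 per million tokens (the page does not state the price used).

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def energy_per_token_j(power_w: float, tokens_per_s: float) -> float:
    """Joules per token: watts (J/s) divided by tokens per second."""
    return power_w / tokens_per_s


def cost_per_million_tokens(power_w, tokens_per_s, price_per_kwh, pue=1.0):
    """Electricity cost of one million tokens; pue=1.0 is the IT-level figure."""
    kwh_per_mtok = energy_per_token_j(power_w, tokens_per_s) * 1e6 / JOULES_PER_KWH
    return kwh_per_mtok * pue * price_per_kwh


def query_energy_wh(power_w, tokens_per_s, tokens_per_query):
    """Watt-hours consumed by one query of the given token length."""
    return energy_per_token_j(power_w, tokens_per_s) * tokens_per_query / 3600.0


# Inputs from the console text; the price is an assumption (not in the source).
POWER_W = 5600.0
TOKENS_PER_S = 2500.0
PRICE_PER_KWH = 0.08  # assumed USD per kWh

e_tok = energy_per_token_j(POWER_W, TOKENS_PER_S)
cost_mtok = cost_per_million_tokens(POWER_W, TOKENS_PER_S, PRICE_PER_KWH)
wh_query = query_energy_wh(POWER_W, TOKENS_PER_S, 500)
queries_per_kwh = 1000.0 / wh_query

print(f"{e_tok:.2f} J/token")
print(f"${cost_mtok:.3f} per million tokens")
print(f"{wh_query:.2f} Wh per 500-token query")
print(f"{queries_per_kwh:.0f} queries per kWh")
```

Passing a `pue` greater than 1.0 to `cost_per_million_tokens` applies the facility multiplier mentioned above; the value to use depends on your datacenter.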
Why energy per token rules inference
For LLMs, performance per watt reduces to energy per token: the joules spent to generate one token. It folds hardware power and throughput into one number that scales straight to cost and carbon per query.
A model is trained once but may serve billions of queries. For a widely used model, cumulative inference energy can quickly exceed the one-time training energy, which is why per-token efficiency, not just training FLOPs, drives the real energy and cost story.
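To see roughly where inference overtakes training, one can divide a training energy budget by the energy per token. The sketch below uses a purely hypothetical training figure of 1 GWh (not from this page) together with the 2.24 J/token example; `breakeven_tokens` is an illustrative helper name.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def breakeven_tokens(training_energy_kwh: float, energy_per_token_j: float) -> float:
    """Tokens served before cumulative inference energy equals training energy."""
    return training_energy_kwh * JOULES_PER_KWH / energy_per_token_j


TRAINING_ENERGY_KWH = 1_000_000.0  # hypothetical 1 GWh training run
ENERGY_PER_TOKEN_J = 2.24          # 5,600 W / 2,500 tok/s example

tokens = breakeven_tokens(TRAINING_ENERGY_KWH, ENERGY_PER_TOKEN_J)
queries = tokens / 500  # 500-token queries, as in the console example

print(f"break-even after {tokens:.3g} tokens (~{queries:.3g} queries)")
```

Under these assumed inputs the break-even lands in the low billions of 500-token queries; a different training budget or serving efficiency moves it proportionally.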
Two systems at the same tokens-per-second can differ hugely in energy per token if one draws far more power. Batching and efficient hardware raise throughput per watt, the lever that lowers cost and carbon simultaneously.
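As an illustration of the point above, the sketch below compares two hypothetical systems with identical throughput but different power draw. The `ServingSystem` class and both sets of figures are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class ServingSystem:
    name: str
    power_w: float
    tokens_per_s: float

    @property
    def joules_per_token(self) -> float:
        """Energy per token: power divided by throughput."""
        return self.power_w / self.tokens_per_s

    @property
    def tokens_per_joule(self) -> float:
        """Throughput per watt, the inverse of energy per token."""
        return self.tokens_per_s / self.power_w


# Hypothetical systems: same tokens per second, B draws half the power.
system_a = ServingSystem("A", power_w=5600.0, tokens_per_s=2500.0)
system_b = ServingSystem("B", power_w=2800.0, tokens_per_s=2500.0)

for s in (system_a, system_b):
    print(f"{s.name}: {s.joules_per_token:.2f} J/token, "
          f"{s.tokens_per_joule:.3f} tok/J")

ratio = system_a.joules_per_token / system_b.joules_per_token
print(f"A uses {ratio:.1f}x the energy per token of B")
```

A tokens-per-second benchmark would rate these two systems as equal, while energy per token separates them by a factor of two.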
API pricing, margins and capacity planning all run in dollars (or your currency) per million tokens. Tying that unit back to energy reveals the hard floor under inference pricing — you can't price below your energy cost.
The joule behind every token
Every token a language model generates costs a measurable amount of energy. At the scale AI now operates, with trillions of tokens served to a global user base, that tiny per-token figure compounds into a major cost and carbon line. Understanding it starts with a single, exact relation: energy per token is the serving hardware's power divided by its throughput in tokens per second, because a watt is just a joule per second.
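Written out with units, the relation looks like this. The symbols are notation introduced here for illustration: P for serving power, R for throughput, and E_tok for energy per token.

```latex
E_{\text{tok}} = \frac{P}{R},
\qquad
\left[\frac{\mathrm{W}}{\mathrm{tok/s}}\right]
= \left[\frac{\mathrm{J/s}}{\mathrm{tok/s}}\right]
= \left[\frac{\mathrm{J}}{\mathrm{tok}}\right]

% Example with the figures quoted earlier on this page:
E_{\text{tok}} = \frac{5600\ \mathrm{W}}{2500\ \mathrm{tok/s}} = 2.24\ \mathrm{J/tok}
```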
That number is the right efficiency metric precisely because it unifies the two things that pull against each other. Raw tokens-per-second flatters a power-hungry system; raw wattage ignores the work done. Energy per token captures throughput-per-watt, so a fast-but-thirsty accelerator and a slower-but-efficient one are judged on the same honest basis — and it's expressed in tokens, the unit inference actually bills in, so it flows directly into cost per million tokens and carbon per query.
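Carbon per query follows the same pattern: energy per query multiplied by the grid's carbon intensity. The sketch below assumes two hypothetical grid intensities (50 and 500 gCO2/kWh, chosen only to show the spread) and a hypothetical helper name.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def grams_co2_per_query(energy_per_token_j, tokens_per_query,
                        grid_g_per_kwh, pue=1.0):
    """Grams of CO2 for one query: energy (kWh) x PUE x grid intensity."""
    kwh = energy_per_token_j * tokens_per_query / JOULES_PER_KWH
    return kwh * pue * grid_g_per_kwh


# Hypothetical grid intensities in gCO2 per kWh (not from the source).
grids = {"low-carbon grid": 50.0, "carbon-heavy grid": 500.0}

results = {}
for label, intensity in grids.items():
    results[label] = grams_co2_per_query(2.24, 500, intensity)
    print(f"{label}: {results[label]:.4f} gCO2 per 500-token query")
```

Energy per token and grid intensity enter as separate factors, so efficiency work and siting on a cleaner grid reduce per-query carbon independently.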
The strategic insight is that for a widely deployed model, inference rather than training tends to dominate lifetime energy. Training is a single large expense; serving is billions of queries over months and years, and its cumulative energy can overtake training many times over. That's why the levers that raise throughput-per-watt (batching, quantization, efficient hardware, speculative decoding) matter so much: each one lowers energy per token, and therefore cost and carbon per query, simultaneously. The sensitivity bars here show that leverage directly.
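A rough sensitivity table makes that leverage concrete. This is a sketch under stated assumptions, not a reproduction of the console's sensitivity bars: the gain factors are hypothetical, the price of $0.08/kWh is assumed, and real optimizations rarely deliver clean multiples.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ


def after_gain(base_j_per_token: float, gain: float) -> float:
    """Energy per token after multiplying throughput-per-watt by `gain`."""
    return base_j_per_token / gain


BASE_J_PER_TOKEN = 2.24  # 5,600 W / 2,500 tok/s example
PRICE_PER_KWH = 0.08     # assumed USD per kWh, not stated in the source

rows = []
for gain in (1.0, 1.5, 2.0, 4.0):  # hypothetical throughput-per-watt gains
    j_per_token = after_gain(BASE_J_PER_TOKEN, gain)
    cost_mtok = j_per_token * 1e6 / JOULES_PER_KWH * PRICE_PER_KWH
    rows.append((gain, j_per_token, cost_mtok))
    print(f"{gain:.1f}x perf/W: {j_per_token:.2f} J/token, "
          f"${cost_mtok:.4f} per million tokens")
```

Because cost is linear in energy per token, every row's cost shrinks by exactly the same factor as its energy.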
Because cost per million tokens is the unit of the AI economy — the floor under API pricing and the basis of capacity planning — this console reports it in your chosen currency, tied to real power and throughput. Scale a single node's figure to a full fleet in the Data Center Power console, and remember to apply PUE for the all-in facility cost.
Trusted by ML Systems, Inference & Sustainability Teams
“Energy per token is the metric we optimize on, and this expresses it cleanly — power over throughput, straight to cost per million tokens. Showing that batching halves energy per token, in cost terms and in our reporting currency, is the slide that funds the optimization work.”
“Finally cost-per-Mtok with real currency support — I compare our serving cost floor against API prices in USD and BRL without a spreadsheet. The inference-dwarfs-training point reframed our whole efficiency roadmap. Accurate and fast.”
“Separating energy per token from grid intensity is exactly right — it shows efficiency and clean-grid siting as two distinct levers on per-query carbon. At trillion-token scale those grams add up, and this makes the case quantitatively.”
“The energy floor under token pricing is the insight I needed for margin modeling. Cost per million tokens in our currency, tied to real power and throughput, is the number. Would love prefill/decode split, but as a first-order economics tool it's excellent.”
energy per token (J) = power (W) ÷ throughput (tok/s) · cost per 1M tokens = energy per 1M tokens (kWh) × electricity price (per kWh) · CO₂ = energy (kWh) × grid intensity (gCO₂/kWh) · Last reviewed: 2026-06