Question 1

How do you calculate the cost per token for self-hosting an LLM?

Accepted Answer

Compute the hourly cost of the serving GPU (its rental or amortized rate plus its electricity, including datacenter PUE), then divide by the tokens it produces per hour (throughput in tokens per second × 3600 × utilization). That gives cost per token; multiply by a million for the standard cost-per-million-tokens figure. For example, a GPU costing about $3.20/hour all-in that delivers 7.2 million tokens per hour once 80% utilization is applied costs $3.20 ÷ 7.2 ≈ $0.44 per million tokens. This calculator computes that figure and compares it to an API's per-million price, in your chosen currency.
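
The formula above can be sketched in Python roughly as follows. This is a minimal illustration, not the calculator's actual code: the function and parameter names are assumptions, and the example reads the 7.2 million tokens per hour as the delivered figure, i.e. 2,500 tokens/s at 80% utilization.

```python
def cost_per_million_tokens(hourly_cost, tokens_per_second, utilization):
    """Self-hosted serving cost per one million tokens.

    hourly_cost: all-in GPU cost per hour (rental or amortized, plus electricity)
    tokens_per_second: measured throughput while the GPU is busy
    utilization: fraction of each hour the GPU is actually serving, in (0, 1]
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000


# Assumed reading of the example: 2,500 tokens/s at 80% utilization
# delivers 7.2 million tokens per hour.
cost = cost_per_million_tokens(hourly_cost=3.20, tokens_per_second=2500, utilization=0.8)
print(f"${cost:.2f} per million tokens")
```

Keeping utilization as its own argument, rather than folding it into throughput, makes it easy to vary later.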

Question 2

Is self-hosting or using a token API cheaper?

Accepted Answer

It depends almost entirely on volume and utilization. API providers charge a per-token price that bundles their infrastructure, margin and convenience — so for low or bursty volume the API is cheaper because you pay nothing when idle. But at high, steady volume, self-hosting's raw cost (a fraction of the API list price) wins, because you amortize the GPU across enormous token output. This calculator computes both per-million-token costs and the break-even monthly volume where self-hosting overtakes the API.
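
A rough way to see the volume effect is to compare monthly bills at a low and a high volume. The sketch below assumes a self-hosted GPU is paid for around the clock whether or not it is busy, while the API is billed purely per token. The $2.00 API price, the $3.20 hourly rate and the 2,500 tokens/s throughput are placeholder inputs, and the function name is hypothetical.

```python
import math

HOURS_PER_MONTH = 730  # average month: 8760 hours / 12


def monthly_costs(volume_m, api_price_per_m, gpu_hourly_cost, peak_tokens_per_second):
    """Return (api_cost, self_hosted_cost) for a monthly volume in millions of tokens."""
    api_cost = volume_m * api_price_per_m
    # Millions of tokens one GPU could serve in a month if it never idled.
    capacity_m = peak_tokens_per_second * 3600 * HOURS_PER_MONTH / 1_000_000
    # GPUs are paid for all month, busy or not; at least one is always needed.
    gpus = max(1, math.ceil(volume_m / capacity_m))
    self_cost = gpus * gpu_hourly_cost * HOURS_PER_MONTH
    return api_cost, self_cost


for volume in (100, 5000):  # millions of tokens per month
    api, own = monthly_costs(volume, api_price_per_m=2.00,
                             gpu_hourly_cost=3.20, peak_tokens_per_second=2500)
    cheaper = "API" if api < own else "self-hosting"
    print(f"{volume} M tokens/month: API ${api:,.0f}, self-hosted ${own:,.0f}, cheaper: {cheaper}")
```

This ignores engineering and other fixed overheads, so it understates the self-hosted side.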

Question 3

Why do token APIs cost more than self-hosting per token?

Accepted Answer

Because the API price includes far more than raw compute: the provider's profit margin, their infrastructure and operations overhead, the convenience of zero setup and instant elastic scale, and the risk they absorb on utilization. You pay that premium for not running anything yourself. At low volume it's an excellent deal — you'd waste money on idle GPUs otherwise. At high volume, you're paying that markup on every one of billions of tokens, and self-hosting's lower raw cost dominates. This calculator quantifies the gap.

Question 4

What is the break-even volume for self-hosting?

Accepted Answer

It's the monthly token volume at which the total cost of self-hosting (the per-token serving cost plus fixed monthly costs like engineering and baseline infrastructure) equals what you'd pay an API for the same tokens. Below it, the API is cheaper; above it, self-hosting is. The fixed costs of running your own serving — ops, on-call, baseline capacity — mean there's a minimum scale before self-hosting pays off, even when its per-token cost is lower. This calculator estimates the break-even monthly volume in millions of tokens.
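
Under the definition above, break-even is where fixed monthly cost plus per-token serving cost equals the API bill, which rearranges to fixed cost divided by the per-million price gap. A minimal sketch, with hypothetical names and placeholder figures ($10,000 fixed, $0.45 self-hosted, $2.00 API):

```python
def break_even_million_tokens(fixed_monthly, self_cost_per_m, api_price_per_m):
    """Monthly volume (millions of tokens) where self-hosting matches the API.

    Solves: fixed_monthly + self_cost_per_m * V == api_price_per_m * V
    Returns None when self-hosting is not cheaper per token, because the
    fixed costs can then never be recovered.
    """
    saving_per_m = api_price_per_m - self_cost_per_m
    if saving_per_m <= 0:
        return None
    return fixed_monthly / saving_per_m


volume = break_even_million_tokens(fixed_monthly=10_000,
                                   self_cost_per_m=0.45,
                                   api_price_per_m=2.00)
print(f"Break-even at about {volume:,.0f} million tokens per month")
```

Returning None rather than a negative number keeps the "never breaks even" case explicit.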

Question 5

How does utilization affect self-hosted token cost?

Accepted Answer

Critically: utilization sits in the denominator. Self-hosted cost per token is the GPU's fixed hourly cost divided by the tokens it actually produces, so a GPU running at 20% utilization costs five times as much per token as one at 100%. This is why self-hosting only beats the API for steady, high-throughput workloads: an underused GPU has a poor cost per token, often worse than the API price. The API's pay-per-use model is designed to win exactly this low-utilization case. This calculator makes utilization a primary input so you can see its effect.
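
The inverse relationship is easy to check numerically. The sketch below sweeps utilization for one GPU; the $3.20 hourly cost and 2,500 tokens/s peak throughput are illustrative placeholders, and the helper name is an assumption.

```python
def cost_per_million(hourly_cost, peak_tokens_per_second, utilization):
    """Cost per million tokens when the GPU is busy only part of each hour."""
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000


HOURLY_COST = 3.20  # placeholder all-in rate
PEAK_TPS = 2500     # placeholder measured throughput, tokens per second

costs = {u: cost_per_million(HOURLY_COST, PEAK_TPS, u) for u in (1.0, 0.8, 0.5, 0.2)}
for u, c in costs.items():
    print(f"{u:>4.0%} utilization: ${c:.2f} per million tokens")

# The 20% case costs five times the 100% case, since cost scales as 1 / utilization.
ratio = costs[0.2] / costs[1.0]
```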

Question 6

Can I compare token costs in different currencies?

Accepted Answer

Yes. Use the currency selector to enter the GPU hourly rate, electricity price and API price, and see the self-hosted and API costs per million tokens in US dollars, euros, pounds, rupees, yen and other currencies. The throughput and token figures are currency-independent; only the money converts, using indicative rates. Since token pricing and infrastructure budgets are set in local currency, this makes the build-vs-buy comparison directly usable for your planning.

Question 7

What costs are in the self-hosted figure besides the GPU?

Accepted Answer

This calculator's per-token figure includes the GPU's hourly cost (rental or amortized purchase) and its electricity (power × PUE × rate). The break-even calculation adds fixed monthly costs: engineering, on-call, baseline infrastructure, and the share of capacity you keep available. A complete self-hosting cost would also include networking, storage, load balancing, and the overhead of maintenance and failed experiments. Treat the per-token number as the marginal serving cost and the fixed monthly input as the operational overhead; together they set the true break-even against the API.
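
The electricity term (power × PUE × rate) can be added to the GPU rate as in the sketch below. The function name is hypothetical, and the inputs (a $3.00/hour GPU drawing 0.7 kW, PUE of 1.3, $0.12 per kWh) are placeholders rather than figures from the calculator.

```python
def all_in_hourly_cost(gpu_rate_per_hour, power_kw, pue, electricity_per_kwh):
    """GPU hourly rate plus the electricity it draws, scaled by datacenter PUE.

    power_kw: average draw of the GPU (and its share of the server) in kilowatts
    pue: power usage effectiveness, total facility power / IT power (>= 1.0)
    electricity_per_kwh: price per kilowatt-hour in the same currency as the rate
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    electricity_per_hour = power_kw * pue * electricity_per_kwh
    return gpu_rate_per_hour + electricity_per_hour


hourly = all_in_hourly_cost(gpu_rate_per_hour=3.00, power_kw=0.7,
                            pue=1.3, electricity_per_kwh=0.12)
print(f"All-in hourly cost: ${hourly:.4f}")
```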

Question 8

How does batching affect cost per token?

Accepted Answer

Hugely, because it raises throughput per GPU. LLM token generation is typically memory-bandwidth-bound at small batch sizes, so processing many requests together amortizes each load of the model weights across more tokens, multiplying tokens per hour for the same GPU cost. Since cost per token is the hourly cost divided by tokens per hour, it falls in proportion. Continuous batching can cut self-hosted cost per token several-fold versus naive serving, which is often what makes self-hosting beat the API. This calculator takes throughput as an input, so improving it (via batching) directly lowers the computed self-hosted cost.
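
To see why batching helps, here is a deliberately idealized sketch. It assumes each decode step takes the longer of two times: reading all model weights from memory once, or doing roughly 2 FLOPs per parameter per token for the batch. It ignores KV-cache traffic, scheduling and prompt processing, so real throughput will be lower; measure rather than rely on it. The hardware numbers are round placeholders, not any specific GPU.

```python
def decode_tokens_per_second(batch_size, weight_bytes, mem_bandwidth, peak_flops, params):
    """Idealized upper bound on decode throughput for one GPU."""
    weight_load_time = weight_bytes / mem_bandwidth      # one pass over the weights per step
    compute_time = batch_size * 2 * params / peak_flops  # ~2 FLOPs per parameter per token
    step_time = max(weight_load_time, compute_time)
    return batch_size / step_time                        # one token per sequence per step


PARAMS = 8e9                # placeholder: 8B-parameter model
WEIGHT_BYTES = PARAMS * 2   # 16-bit weights
MEM_BANDWIDTH = 2e12        # placeholder: 2 TB/s
PEAK_FLOPS = 400e12         # placeholder: 400 TFLOP/s
HOURLY_COST = 3.20          # placeholder all-in rate

for batch in (1, 32, 256):
    tps = decode_tokens_per_second(batch, WEIGHT_BYTES, MEM_BANDWIDTH, PEAK_FLOPS, PARAMS)
    cost = HOURLY_COST / (tps * 3600) * 1_000_000
    print(f"batch {batch:>3}: {tps:>8,.0f} tokens/s, ${cost:.3f} per million tokens")
```

In this model throughput grows linearly with batch size until the compute term overtakes the weight-load term, after which it flattens.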

Question 9

Should I include hardware amortization or use a rental rate?

Accepted Answer

Either, consistently. If you rent GPUs, use the cloud hourly rate directly. If you own them, amortize the purchase price over the useful life and utilization to get an equivalent hourly cost (the accelerator-ROI tool does this), and use that as the rate here. Owning typically yields a lower effective hourly rate than renting at high utilization, further favoring self-hosting — but only if utilization is high enough to justify owning. This calculator takes the hourly rate as an input so it works for both rented and owned hardware.
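
For owned hardware, one simple way to get an equivalent hourly rate is straight-line amortization over the calendar hours of the useful life. The sketch below is an assumption about how to do this, not the accelerator-ROI tool's method. It leaves utilization out on purpose, so that it is applied once in the per-token step rather than twice. The $30,000 price and 4-year life are placeholders.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def amortized_hourly_rate(purchase_price, useful_life_years, residual_value=0.0):
    """Straight-line amortized cost per calendar hour of owned hardware."""
    if useful_life_years <= 0:
        raise ValueError("useful_life_years must be positive")
    depreciable = purchase_price - residual_value
    return depreciable / (useful_life_years * HOURS_PER_YEAR)


rate = amortized_hourly_rate(purchase_price=30_000, useful_life_years=4)
print(f"Equivalent hourly rate: ${rate:.4f}")
```

Power, hosting and financing costs still have to be added on top of this figure.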

Question 10

How accurate is this token-cost estimate?

Accepted Answer

The arithmetic (hourly cost ÷ tokens per hour) is exact for your inputs, and it captures the decisive economics. Accuracy depends on a realistic throughput (measured with your model, hardware and batching, not a peak figure), the right all-in hourly GPU cost, and an honest utilization. The tool models the marginal serving cost plus a fixed monthly overhead for break-even; it does not model networking, storage, or variable demand in detail. Use it for the build-vs-buy decision and break-even volume; refine throughput with load tests and the hourly rate with your actual contracts or amortization.

Question 11

Does this tool send my data anywhere?

Accepted Answer

No. All token-cost math — and the currency conversion — runs entirely in your browser in JavaScript. Nothing is uploaded and there's no telemetry.

