StackJot
All tools

Tool

AI API Cost Calculator

See exactly what GPT-4, Claude, and Gemini will cost you per call, per day, and per month — at the volume and prompt size you actually use. Helps decide which model is the right value for your workload.

ModelPer callPer dayPer month
GPT-4o
OpenAI · $2.5/M in · $10/M out
$0.00750$0.7500$22.50
GPT-4 Turbo
OpenAI · $10/M in · $30/M out
$0.0250$2.50$75.00
GPT-3.5 Turbo
OpenAI · $0.5/M in · $1.5/M out
$0.00125$0.1250$3.75
Claude 3.5 Sonnet
Anthropic · $3/M in · $15/M out
$0.0105$1.05$31.50
Claude 3 Opus
Anthropic · $15/M in · $75/M out
$0.0525$5.25$157.50
Claude 3 Haiku
Anthropic · $0.25/M in · $1.25/M out
$0.00088$0.0875$2.63
Gemini 1.5 Pro
Google · $1.25/M in · $5/M out
$0.00375$0.3750$11.25
Gemini 1.5 FlashCheapest
Google · $0.075/M in · $0.3/M out
$0.00022$0.0225$0.6750
Pricing notes. Rates shown are the standard per-million-token list prices as of 2026. Many vendors offer prompt caching (50-90% discount on cached input), batch APIs (50% discount, slower), and volume discounts. Real bills can be lower than these estimates if you use those features.

Frequently asked questions

How do AI APIs charge for usage?

All major AI APIs charge per token, billed separately for input tokens (what you send) and output tokens (what the model returns). Output tokens almost always cost more — typically 3-5x the input rate. Pricing is quoted per million tokens. For example, GPT-4o is $2.50 per million input tokens and $10 per million output tokens as of 2026.

Which AI API is cheapest?

For most general use cases in 2026, Gemini 1.5 Flash and Claude 3 Haiku are the cheapest, followed by GPT-3.5 Turbo. For higher-quality outputs, Gemini 1.5 Pro and GPT-4o are the best value among the strong models. Claude 3.5 Sonnet sits in the middle on price but is favored for writing and reasoning quality.

What is prompt caching and how does it reduce costs?

Prompt caching stores parts of your prompt (typically the system prompt or long context) on the vendor side, so repeated requests pay 50-90% less on the cached portion. Claude, GPT, and Gemini all offer some form of caching. For applications that send the same system prompt many times, caching can cut bills by 70%+. Not reflected in this calculator since it depends on your specific usage.

How do I estimate token counts before calling the API?

Use our companion AI Token Counter tool. For English prose, one token is roughly 4 characters or three-quarters of a word. A 500-word email is about 650 tokens. A 5-page PDF is roughly 2,500-3,500 tokens depending on density. For exact counts, use the official tokenizer for the model.

Are these prices accurate?

They are the published list prices as of 2026 from each vendor. Real bills can be lower if you use batch APIs (typically 50% discount with delayed processing), prompt caching (50-90% off cached inputs), or volume commitments. For billing-critical decisions, always verify against the vendor pricing page on the day you commit.

Related