Tokens vs words

Words are easy for humans to count, but AI models usually process text as tokens. Paste your text to compare characters, words, estimated tokens, and potential cost.

Calculator

Estimate your AI prompt cost

Paste a prompt, choose an example pricing profile, and estimate cost per prompt run, per day, and per month.

Input tokens are what you send to the AI model. Output tokens are what the model returns. API providers often price them separately.

Advanced settings

Prices are entered manually for now. Example: if your provider charges $2 per 1M input tokens and $10 per 1M output tokens, enter 2 and 10.
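As a rough illustration of the arithmetic, here is a minimal sketch that uses the example prices above ($2 input and $10 output per 1M tokens). The token counts, the 200 runs per day, and the 30-day month are hypothetical values chosen for the example, and the function name is an assumption, not part of any provider's API.

```python
def cost_per_run(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Estimate the cost of one prompt run in dollars.

    Prices are given per 1M tokens, and input and output are priced separately.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Example prices from the text: $2 input, $10 output per 1M tokens.
INPUT_PRICE = 2.0
OUTPUT_PRICE = 10.0

# Hypothetical workload (assumptions for illustration only).
input_tokens = 1_500
output_tokens = 500
runs_per_day = 200
days_per_month = 30

per_run = cost_per_run(input_tokens, output_tokens, INPUT_PRICE, OUTPUT_PRICE)
per_day = per_run * runs_per_day
per_month = per_day * days_per_month

print(f"per run:   ${per_run:.4f}")
print(f"per day:   ${per_day:.2f}")
print(f"per month: ${per_month:.2f}")
```

In this sketch the 500 output tokens cost more than the 1,500 input tokens, because the output price is five times the input price. That is why input and output are worth estimating separately.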

Energy usage is a rough estimate. Actual energy depends on model, hardware, provider, datacenter efficiency, workload, and region.

Words are human units

Words are useful for reading and writing, but they are not the unit most AI APIs use for billing.

Words are not billing units

A 500-word article, a 500-word JSON sample, and a 500-word code block can produce different token counts and different costs.
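To make this concrete, the sketch below compares a short prose sample and a short JSON sample that contain the same number of whitespace-separated words. The token estimate is an assumption: it uses the common rule of thumb of roughly 4 characters per token for English text, which is not an official tokenizer measurement and is not necessarily the method PromptMeter uses.

```python
import math


def describe(text: str) -> dict:
    """Return character count, word count, and a rough token estimate."""
    chars = len(text)
    words = len(text.split())  # words = whitespace-separated chunks
    # Assumption: about 4 characters per token. Real tokenizers differ.
    est_tokens = math.ceil(chars / 4)
    return {"chars": chars, "words": words, "est_tokens": est_tokens}


prose = "The quick brown fox jumps"
json_text = '{"animal":"fox", "color":"brown", "speed":42, "tags":["quick"], "legs":4}'

prose_stats = describe(prose)
json_stats = describe(json_text)

print("prose:", prose_stats)
print("json: ", json_stats)
```

Both samples count as five words, but the JSON sample has far more characters and symbols, so even this crude estimate gives it a higher token count.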

Tokens are model units

A token can be a word, part of a word, punctuation, whitespace, code, or formatting.
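The toy splitter below illustrates why pieces outnumber words once punctuation, whitespace, and symbols are counted. It is only a sketch and not a real tokenizer: real tokenizers use learned subword vocabularies and will produce different pieces and different counts. The pattern and the function name are assumptions made for this example.

```python
import re

# Toy pattern, NOT a real tokenizer: a run of word characters, a run of
# whitespace, or a single symbol each becomes one piece.
PIECE_PATTERN = re.compile(r"\w+|\s+|[^\w\s]")


def toy_pieces(text: str) -> list[str]:
    """Split text into word runs, whitespace runs, and single symbols."""
    return PIECE_PATTERN.findall(text)


sample = 'Hello, world! {"id": 42}'
words = sample.split()
pieces = toy_pieces(sample)

print(len(words), "words:", words)
print(len(pieces), "pieces:", pieces)
```

Here the sample has 4 whitespace-separated words but 14 pieces, because every comma, quote, brace, and space is counted separately.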

Counts vary by text

Language, punctuation, code, Markdown, JSON, and message structure can all change token estimates.

Cost usually follows tokens

AI cost is usually based on input and output tokens, not word count. PromptMeter estimates input tokens, output tokens, and word count side by side.

Content type guide

Simple prose is often easier to estimate. Technical text, code, JSON, and Markdown can use more tokens per word because symbols and structure count too.

Language guide

English, Spanish, German, Chinese, and Japanese can tokenize differently. Treat every language-specific estimate as approximate.

Content type orientation

Content type | Token behavior | Notes
Simple prose | Usually close to the general estimate | Varies by language and punctuation
Technical text | Often slightly denser | Acronyms and symbols can change counts
Code | Often denser | Brackets, operators and indentation matter
JSON | Often denser | Keys, quotes and repeated structure add tokens
Markdown | Variable | Lists, headings and formatting affect estimates

These are orientation tables, not official tokenizer measurements.

Language and script orientation

Language/script | Why it can vary
English | Often close to common token estimates
Spanish/French/Italian/Portuguese | Accents, longer words and punctuation can shift estimates
German/Dutch/Polish/Russian | Compound words and morphology can change token counts
Chinese/Japanese/Korean | Character-based scripts behave differently from word-based estimates
Code/structured text | Structure can matter more than natural language

These are orientation tables, not official tokenizer measurements.

Why word count can mislead

Text type | Why word count can mislead | Better estimate
Plain prose | Words may map roughly to common token estimates, but punctuation and language still matter | Estimate tokens directly
Code | Operators, brackets, indentation, and short identifiers count even when word count is low | Use a token estimate with code-heavy assumptions
JSON | Keys, quotes, braces, commas, and repeated structure add tokens | Estimate input and output tokens separately
Markdown | Headings, lists, links, and tables add formatting tokens | Compare characters, words, and token estimates
Long answers | Billing depends on generated tokens, not the words you originally sent | Use an output-token cost estimate

Cost depends on input and output tokens, not word count alone.

FAQ

Tokens vs words FAQ

Is one word the same as one token?

No. Some words are one token, some split into multiple tokens, and punctuation or formatting can count too.

Why do tokens matter for AI cost?

Providers often price API usage by tokens. More input or output tokens usually means higher cost.

Can token counts vary by model?

Yes. Different models and tokenizers can count the same text differently, so these estimates stay approximate.

Does JSON count differently from prose?

Often yes. Keys, punctuation, braces, brackets, indentation, and repeated fields can increase token density.