
Tokens vs Words vs Characters: How AI Text Is Measured

By Token Counter Team · Updated June 29, 2026 · 6 min read

Word count, character count, token count — three ways to measure the same piece of text, and none of them agree. If you work with AI tools, the one that actually controls your costs and limits is the token, not the word or character. This guide explains how all three relate, gives you the conversion ratios, and shows why they never line up perfectly.

Skip the math: the token counter on our homepage shows characters, words, and tokens together, live as you type.

The three units, defined

The conversion ratios you need

For everyday English text, use these:

From | To | Multiply by
Characters | Tokens | × 0.25 (i.e. ÷ 4)
Words | Tokens | × 1.33
Tokens | Words | × 0.75
Tokens | Characters | × 4

So a 500-word email is about 665 tokens. A 2,000-character message is about 500 tokens. These are the same approximations used by most online estimators, including ours.
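The ratios in the table above can be wrapped in a few helper functions. This is a minimal sketch for quick planning only; the function and constant names are illustrative, and the results are rough estimates for everyday English, not tokenizer output.

```python
# Rule-of-thumb ratios for everyday English text, taken from the table above.
CHARS_PER_TOKEN = 4
TOKENS_PER_WORD = 1.33
WORDS_PER_TOKEN = 0.75


def tokens_from_chars(char_count: int) -> int:
    """Estimate tokens from a character count (characters / 4)."""
    return round(char_count / CHARS_PER_TOKEN)


def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count (words x 1.33)."""
    return round(word_count * TOKENS_PER_WORD)


def words_from_tokens(token_count: int) -> int:
    """Estimate words from a token count (tokens x 0.75)."""
    return round(token_count * WORDS_PER_TOKEN)


def chars_from_tokens(token_count: int) -> int:
    """Estimate characters from a token count (tokens x 4)."""
    return token_count * CHARS_PER_TOKEN


print(tokens_from_words(500))   # 665
print(tokens_from_chars(2000))  # 500
```

The two printed values match the email and message examples above. For billing or hard limits, treat these numbers as a starting point and confirm with a real tokenizer.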

Quick conversion examples

Use these examples when you need a fast estimate before opening a token calculator:

Input size | Approx. tokens | Common use case
50 words | About 67 tokens | Short prompt or instruction.
250 words | About 333 tokens | Email, support ticket, or short brief.
1,000 words | About 1,333 tokens | Article draft or long prompt.
5,000 words | About 6,667 tokens | Report, transcript, or document chunk.
20,000 characters | About 5,000 tokens | Large paste, JSON payload, or code file.

Why they never line up exactly

Tokenizers do not split on spaces the way word counters do. They split text into subword pieces, chosen by how frequently those pieces appear in the tokenizer's training data. Common words map to one token; rare words, technical terms, and made-up words break into several pieces.

That is why two documents with identical word counts can have noticeably different token counts. Text full of jargon, code, numbers, or symbols tends to run "token-heavy."
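The effect can be illustrated with a toy tokenizer. This is a sketch under loud assumptions: the vocabulary below is hypothetical and tiny, and the greedy longest-match lookup is a simplification, not the algorithm any particular model uses. Real tokenizers learn tens of thousands of pieces from training data.

```python
# Hypothetical toy vocabulary, invented for this illustration only.
TOY_VOCAB = {"the", "cat", "sat", "token", "ization", "iz", "er", "s"}


def toy_tokenize(word: str, vocab: set = TOY_VOCAB) -> list:
    """Split one word by greedy longest-match against the vocabulary.

    Characters that match no vocabulary piece become single-character tokens.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until a vocabulary hit.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary piece starts here: fall back to one character.
            pieces.append(word[i])
            i += 1
    return pieces


def count_tokens(text: str) -> int:
    """Count toy tokens across all whitespace-separated words."""
    return sum(len(toy_tokenize(word)) for word in text.split())


print(toy_tokenize("the"))                 # ['the']
print(toy_tokenize("tokenizers"))          # ['token', 'iz', 'er', 's']
print(count_tokens("the cat sat"))         # 3
print(count_tokens("the tokenizers sat"))  # 6
```

Both sample sentences contain three words, yet the one with the rarer word costs twice as many toy tokens.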

Language matters a lot

The 4-characters-per-token rule is tuned for English. Other languages can tokenize very differently.

If you build a multilingual app, never assume English ratios — test with real sample text.
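One way to run that test is a small harness that reports characters per token for each sample. The sketch below is an assumption-heavy outline: `chars_per_token` and `placeholder_counter` are hypothetical names, and the placeholder is not a real tokenizer. It only exists so the harness runs on its own, and its numbers say nothing about any actual model.

```python
from typing import Callable, Dict


def chars_per_token(
    samples: Dict[str, str],
    count_tokens: Callable[[str], int],
) -> Dict[str, float]:
    """Return the average characters per token for each labelled sample."""
    ratios = {}
    for label, text in samples.items():
        tokens = count_tokens(text)
        # Guard against empty samples so we never divide by zero.
        ratios[label] = len(text) / tokens if tokens else 0.0
    return ratios


def placeholder_counter(text: str) -> int:
    """Stand-in only: one token per 4 UTF-8 bytes, rounded up.

    Replace this with the token-counting function of the tokenizer
    or API that matches your model.
    """
    return -(-len(text.encode("utf-8")) // 4)


# Use your own representative text for each language you support.
samples = {
    "english": "abcdefgh",
    "japanese": "日本語",
}
print(chars_per_token(samples, placeholder_counter))
```

Once a real counting function is plugged in, a ratio well below 4 for a given language is a signal that English-based estimates will undercount its tokens.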

Code is token-hungry too

Source code uses lots of punctuation, indentation, and unusual identifiers, all of which fragment into many tokens. A 100-line code file can easily use more tokens than a 100-line prose document of the same character count. If you paste code into an LLM, expect the token count to run high.

Which measure should you use?

To understand exactly how text becomes tokens, see how many tokens is my text. To turn token counts into dollars, read our LLM cost calculator guide.

Which ratio should you trust?

For SEO drafts, emails, and ordinary English prompts, the word-to-token ratio is good enough. For billing, context limits, or production prompts, use a tokenizer-aware tool. A GPT token counter is the right choice for OpenAI models, while its counts for Claude and Gemini should be treated as estimates unless you verify them with those providers' official APIs or usage reports.

The safest planning habit is to count the full request: system message, developer instructions, retrieved context, user text, and the response budget. That gives you a more realistic view than counting the user prompt alone.
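A full-request count can be sketched as a small data class that sums every part and checks the total against a context limit. The class name, field names, and all numbers below are hypothetical; use the token counts and limits that apply to your own model.

```python
from dataclasses import dataclass


@dataclass
class RequestBudget:
    """Token counts for each part of a request (all values are estimates)."""

    system_message: int
    developer_instructions: int
    retrieved_context: int
    user_text: int
    response_budget: int

    def input_tokens(self) -> int:
        """Tokens sent to the model, excluding the reserved response."""
        return (
            self.system_message
            + self.developer_instructions
            + self.retrieved_context
            + self.user_text
        )

    def total_tokens(self) -> int:
        """Input tokens plus the tokens reserved for the response."""
        return self.input_tokens() + self.response_budget

    def fits(self, context_limit: int) -> bool:
        """True if the whole request fits within the given context limit."""
        return self.total_tokens() <= context_limit


budget = RequestBudget(
    system_message=300,
    developer_instructions=150,
    retrieved_context=2500,
    user_text=200,
    response_budget=1000,
)
print(budget.input_tokens())  # 3150
print(budget.total_tokens())  # 4150
print(budget.fits(4096))      # False
print(budget.fits(8192))      # True
```

In this made-up example the user text is only 200 tokens, but the full request needs 4,150, which is why counting the user prompt alone can be misleading.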

Convert instantly: paste any text into TokenCounter.cc and see characters, words, and tokens side by side.

Frequently asked questions

How many characters is one token?

About 4 characters of English text on average, though it varies by word and model.

How many words is one token?

Roughly 0.75 words. Equivalently, 100 words is about 133 tokens.

Why is my token count higher than my word count?

Because rare words, numbers, punctuation, and non-English text split into multiple tokens each, pushing the total above the word count.

Does the same text have the same token count in GPT and Claude?

No. Each model uses a different tokenizer, so counts differ — usually within 10–20% for English. See our Claude token guide.

Token Counter Team
Maintainers of TokenCounter.cc, a free token estimation tool. The team writes about LLM tokenization, prompt efficiency, and AI API costs.