Tokens vs Words vs Characters: How AI Text Is Measured (2026)

Word count, character count, token count — three ways to measure the same piece of text, and none of them agree. If you work with AI tools, the one that actually controls your costs and limits is the token, not the word or character. This guide explains how all three relate, gives you the conversion ratios, and shows why they never line up perfectly.

Skip the math: the token counter on our homepage shows characters, words, and tokens together, live as you type. Open it →

The three units, defined

Characters — every letter, digit, space, and punctuation mark. The most granular measure. "Hello" is 5 characters.
Words — sequences of characters separated by spaces. "Hello world" is 2 words.
Tokens — sub-word chunks created by a model's tokenizer. "Hello world" is typically 2 tokens, but "Tokenization" alone might be 2 or 3.
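
The first two units can be measured with nothing but the standard library. The sketch below is a minimal illustration, with hypothetical function names; it treats a "character" as a Unicode code point and a "word" as a whitespace-separated run. It deliberately leaves tokens out, because those depend on the model's own tokenizer.

```python
def count_characters(text: str) -> int:
    # Every letter, digit, space, and punctuation mark counts as one character.
    return len(text)


def count_words(text: str) -> int:
    # split() with no arguments splits on any run of whitespace,
    # so repeated spaces do not create empty "words".
    return len(text.split())


print(count_characters("Hello"))   # 5
print(count_words("Hello world"))  # 2
```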

The conversion ratios you need

For everyday English text, use these:

From	To	Multiply by
Characters	Tokens	× 0.25 (i.e. ÷ 4)
Words	Tokens	× 1.33 (about 4/3)
Tokens	Words	× 0.75
Tokens	Characters	× 4

So a 500-word email is about 665 tokens. A 2,000-character message is about 500 tokens. These are the same approximations used by most online estimators, including ours.
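
The arithmetic behind those two figures can be written out as a small helper. This is only a sketch of the rule-of-thumb ratios from the table, with made-up function names; it is an estimate for everyday English text, not a real tokenizer.

```python
TOKENS_PER_WORD = 1.33   # English words -> tokens
CHARS_PER_TOKEN = 4      # English characters -> tokens
WORDS_PER_TOKEN = 0.75   # tokens -> English words


def tokens_from_words(words: int) -> int:
    return round(words * TOKENS_PER_WORD)


def tokens_from_chars(chars: int) -> int:
    return round(chars / CHARS_PER_TOKEN)


def words_from_tokens(tokens: int) -> int:
    return round(tokens * WORDS_PER_TOKEN)


print(tokens_from_words(500))    # 665
print(tokens_from_chars(2000))   # 500
print(words_from_tokens(1000))   # 750
```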

Quick conversion examples

Use these examples when you need a fast estimate before opening a token calculator:

Input size	Approx. tokens	Common use case
50 words	About 67 tokens	Short prompt or instruction.
250 words	About 333 tokens	Email, support ticket, or short brief.
1,000 words	About 1,333 tokens	Article draft or long prompt.
5,000 words	About 6,667 tokens	Report, transcript, or document chunk.
20,000 characters	About 5,000 tokens	Large paste, JSON payload, or code file.

Why they never line up exactly

Tokenizers do not split on spaces the way word counters do. They split text into pieces based on how frequently character sequences appeared in the tokenizer's training data. Common words map to one token; rare words, technical terms, and made-up words break into pieces. Consider:

"the" → 1 token (1 word)
"antidisestablishmentarianism" → 5+ tokens (1 word)
"GPT-4o" → 3+ tokens (1 word) because of the digits and hyphen

That is why two documents with identical word counts can have noticeably different token counts. Text full of jargon, code, numbers, or symbols tends to run "token-heavy."
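
To make the splitting behaviour concrete, here is a toy sketch. It is not how any production tokenizer works: the vocabulary is invented for this example, and the greedy longest-match rule is a simplification standing in for the learned merge rules real tokenizers use. Actual splits and counts will differ by model.

```python
# Hypothetical mini-vocabulary. Real tokenizers learn tens of thousands
# of entries from data; these were hand-picked for illustration.
TOY_VOCAB = {
    "the", "GPT",
    "anti", "dis", "establish", "ment", "arian", "ism",
}


def toy_tokenize(word: str, vocab: set[str] = TOY_VOCAB) -> list[str]:
    """Greedy longest-match split; unknown characters become single pieces."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to one character.
            pieces.append(word[i])
            i += 1
    return pieces


for w in ["the", "antidisestablishmentarianism", "GPT-4o"]:
    parts = toy_tokenize(w)
    print(w, len(parts), parts)
```

With this toy vocabulary, "the" stays whole, the long word breaks into six pieces, and "GPT-4o" breaks into four because the hyphen, digit, and trailing letter are not in the vocabulary.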

Language matters a lot

The 4-characters-per-token rule is tuned for English. Other languages tokenize very differently:

Spanish, French, German — close to English, maybe 10–30% more tokens.
Chinese, Japanese, Korean — often far more tokens per character, since each character carries more meaning and may map to one or more tokens.
Arabic, Hindi, Thai — frequently 2–3× the tokens of equivalent English.

If you build a multilingual app, never assume English ratios — test with real sample text.
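
For early budgeting, the multipliers quoted above can be turned into a planning range. The sketch below is an assumption-laden illustration: the group names and the function are hypothetical, the input is the word count of the equivalent English text, and Chinese, Japanese, and Korean are left out because no numeric range is given for them. It does not replace testing with real sample text.

```python
ENGLISH_TOKENS_PER_WORD = 1.33

# (low, high) multipliers relative to equivalent English text,
# taken from the rough ranges quoted in this section.
LANGUAGE_MULTIPLIERS = {
    "spanish_french_german": (1.10, 1.30),
    "arabic_hindi_thai": (2.0, 3.0),
}


def token_range(english_words: int, group: str) -> tuple[int, int]:
    """Return a (low, high) token estimate for text in the given language group."""
    low, high = LANGUAGE_MULTIPLIERS[group]
    base = english_words * ENGLISH_TOKENS_PER_WORD
    return round(base * low), round(base * high)


print(token_range(1000, "spanish_french_german"))  # (1463, 1729)
print(token_range(1000, "arabic_hindi_thai"))      # (2660, 3990)
```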

Code is token-hungry too

Source code uses lots of punctuation, indentation, and unusual identifiers, all of which fragment into many tokens. A 100-line code file can easily use more tokens than a 100-line prose document of the same character count. If you paste code into an LLM, expect the token count to run high.
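
One way to see the difference between prose and code is to compare how much of each is punctuation. The helper below is a hypothetical diagnostic, not a token counter: it only reports the share of characters that are neither letters, digits, nor whitespace, which is one of the traits that makes code fragment into more tokens.

```python
def symbol_ratio(text: str) -> float:
    """Share of characters that are not alphanumeric and not whitespace."""
    if not text:
        return 0.0
    symbols = sum(1 for ch in text if not ch.isalnum() and not ch.isspace())
    return symbols / len(text)


prose = "The quick brown fox jumps over the lazy dog."
code = "for (i = 0; i < n; i++) { a[i] += b[i]; }"

print(f"prose: {symbol_ratio(prose):.2f}")
print(f"code:  {symbol_ratio(code):.2f}")
```

A noticeably higher ratio is a hint that the 4-characters-per-token rule will underestimate, so a tokenizer-aware count is worth the extra step.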

Which measure should you use?

Writing for humans? Use words.
Filling a fixed-size field or database column? Use characters.
Working with any AI model — cost, limits, or performance? Use tokens.

To understand exactly how text becomes tokens, see how many tokens is my text. To turn token counts into dollars, read our LLM cost calculator guide.

Which ratio should you trust?

For SEO drafts, emails, and ordinary English prompts, the word-to-token ratio is good enough. For billing, context limits, or production prompts, use a tokenizer-aware tool. A GPT token counter is the right choice for OpenAI models, while counts for Claude and Gemini should be treated as estimates unless you verify them with the official APIs or usage reports.

The safest planning habit is to count the full request: system message, developer instructions, retrieved context, user text, and the response budget. That gives you a more realistic view than counting the user prompt alone.
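
As a rough sketch of that habit, the helper below adds up an estimate for each part of a request plus the response budget and checks the total against a limit. The part names, the 8,000-token limit, and the function names are placeholders chosen for illustration, and the counts use the 4-characters-per-token approximation rather than a real tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # English rule of thumb, not an exact count


def estimate_tokens(text: str) -> int:
    # Round up so short fragments are never counted as zero tokens.
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def plan_request(parts: dict[str, str], response_budget: int, context_limit: int) -> dict:
    """Estimate tokens per part, add the response budget, and compare to the limit."""
    per_part = {name: estimate_tokens(text) for name, text in parts.items()}
    total = sum(per_part.values()) + response_budget
    return {"per_part": per_part, "total": total, "fits": total <= context_limit}


request = {
    "system": "You are a helpful assistant.",
    "developer": "Answer in plain English and keep replies short.",
    "retrieved_context": "Placeholder for retrieved documents. " * 50,
    "user": "Summarize the context above in three bullet points.",
}

plan = plan_request(request, response_budget=500, context_limit=8000)
print(plan["per_part"])
print(plan["total"], plan["fits"])
```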

Convert instantly: paste any text into TokenCounter.cc and see characters, words, and tokens side by side. Try it →

Frequently asked questions

How many characters is one token?

About 4 characters of English text on average, though it varies by word and model.

How many words is one token?

Roughly 0.75 words. Equivalently, 100 words is about 133 tokens.

Why is my token count higher than my word count?

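Because a token is usually smaller than a word. Even ordinary English averages about 1.33 tokens per word, and rare words, numbers, punctuation, and non-English text split into multiple tokens each, pushing the total further above the word count.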

Does the same text have the same token count in GPT and Claude?

No. Each model uses a different tokenizer, so counts differ — usually within 10–20% for English. See our Claude token guide.

Token Counter Team

Maintainers of TokenCounter.cc, a free token estimation tool. We write about LLM tokenization, prompt efficiency, and AI API costs.