Word count, character count, token count — three ways to measure the same piece of text, and none of them agree. If you work with AI tools, the one that actually controls your costs and limits is the token, not the word or character. This guide explains how all three relate, gives you the conversion ratios, and shows why they never line up perfectly.
Skip the math: the token counter on our homepage shows characters, words, and tokens together, live as you type. Open it →
The three units, defined
- Characters — every letter, digit, space, and punctuation mark. The most granular measure. "Hello" is 5 characters.
- Words — sequences of characters separated by spaces. "Hello world" is 2 words.
- Tokens — sub-word chunks created by a model's tokenizer. "Hello world" is typically 2 tokens, but "Tokenization" alone might be 2 or 3.
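As a rough illustration, the first two units can be measured with nothing but Python's standard library. The helper names `count_characters` and `count_words` are made up for this sketch. Tokens cannot be counted this way, because they depend on a specific model's tokenizer.

```python
def count_characters(text: str) -> int:
    """Count every character, including spaces and punctuation."""
    return len(text)


def count_words(text: str) -> int:
    """Count sequences of characters separated by whitespace."""
    return len(text.split())


print(count_characters("Hello"))   # 5
print(count_words("Hello world"))  # 2
```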
The conversion ratios you need
For everyday English text, use these:
| From | To | Multiply by |
|---|---|---|
| Characters | Tokens | × 0.25 (i.e. ÷ 4) |
| Words | Tokens | × 1.33 |
| Tokens | Words | × 0.75 |
| Tokens | Characters | × 4 |
So a 500-word email is about 665 tokens. A 2,000-character message is about 500 tokens. These are the same approximations used by most online estimators, including ours.
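The ratios in the table can be wrapped in a few small helpers. This is a minimal sketch that assumes ordinary English text; the function and constant names are illustrative and do not come from any library, and the results are estimates rather than real tokenizer output.

```python
CHARS_PER_TOKEN = 4     # rule of thumb for English text
TOKENS_PER_WORD = 1.33  # rule of thumb for English text
WORDS_PER_TOKEN = 0.75  # inverse of the ratio above, rounded


def tokens_from_words(words: int) -> int:
    """Estimate tokens from a word count."""
    return round(words * TOKENS_PER_WORD)


def tokens_from_characters(characters: int) -> int:
    """Estimate tokens from a character count."""
    return round(characters / CHARS_PER_TOKEN)


def words_from_tokens(tokens: int) -> int:
    """Estimate words from a token count."""
    return round(tokens * WORDS_PER_TOKEN)


print(tokens_from_words(500))        # 665
print(tokens_from_characters(2000))  # 500
print(words_from_tokens(1000))       # 750
```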
Quick conversion examples
Use these examples when you need a fast estimate before opening a token calculator:
| Input size | Approx. tokens | Common use case |
|---|---|---|
| 50 words | About 67 tokens | Short prompt or instruction. |
| 250 words | About 333 tokens | Email, support ticket, or short brief. |
| 1,000 words | About 1,333 tokens | Article draft or long prompt. |
| 5,000 words | About 6,667 tokens | Report, transcript, or document chunk. |
| 20,000 characters | About 5,000 tokens | Large paste, JSON payload, or code file. |
Why they never line up exactly
Tokenizers do not split on spaces the way word counters do. They split on statistical frequency. Common words map to one token; rare words, technical terms, and made-up words break into pieces. Consider:
- "the" → 1 token (1 word)
- "antidisestablishmentarianism" → 5+ tokens (1 word)
- "GPT-4o" → 3+ tokens (1 word) because of the digits and hyphen
That is why two documents with identical word counts can have noticeably different token counts. Text full of jargon, code, numbers, or symbols tends to run "token-heavy."
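To make the splitting behavior concrete, here is a toy sketch. It is not a real tokenizer and not the algorithm any particular model uses: it assumes a tiny hand-written vocabulary (`TOY_VOCAB`, hypothetical) and splits each word by greedy longest match. Real tokenizers learn their vocabularies from data, so their splits will differ, but the effect is the same: words in the vocabulary stay whole, and everything else breaks into pieces.

```python
# Hypothetical toy vocabulary. Real tokenizers learn tens of thousands of
# entries from data, so actual splits will differ from these.
TOY_VOCAB = {
    "the", "token", "ization",
    "anti", "dis", "establish", "ment", "arian", "ism",
}


def toy_tokenize(word: str) -> list[str]:
    """Split a word by greedy longest match against TOY_VOCAB."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(word), start, -1):
            if word[start:end] in TOY_VOCAB:
                pieces.append(word[start:end])
                start = end
                break
        else:
            # Nothing in the vocabulary matches: emit a single character.
            pieces.append(word[start])
            start += 1
    return pieces


print(toy_tokenize("the"))           # ['the']
print(toy_tokenize("tokenization"))  # ['token', 'ization']
print(len(toy_tokenize("antidisestablishmentarianism")))  # 6
```

Each of the three inputs is one word, yet the toy produces 1, 2, and 6 pieces.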
Language matters a lot
The 4-characters-per-token rule is tuned for English. Other languages tokenize very differently:
- Spanish, French, German — close to English, maybe 10–30% more tokens.
- Chinese, Japanese, Korean — often far more tokens per character, since each character carries more meaning and may map to one or more tokens.
- Arabic, Hindi, Thai — frequently 2–3× the tokens of equivalent English.
If you build a multilingual app, never assume English ratios — test with real sample text.
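For early planning, the rough figures in the list above can be turned into a range instead of a single number. This is only a sketch: the multipliers are the approximate ranges quoted above, not measurements, and `LANGUAGE_MULTIPLIERS` and `estimate_token_range` are illustrative names. Chinese, Japanese, and Korean are left out because no figure is given for them here.

```python
# Multiplier ranges relative to the token count of equivalent English text.
# These are planning assumptions taken from the rough figures above.
LANGUAGE_MULTIPLIERS = {
    "english": (1.0, 1.0),
    "spanish": (1.1, 1.3),
    "french": (1.1, 1.3),
    "german": (1.1, 1.3),
    "arabic": (2.0, 3.0),
    "hindi": (2.0, 3.0),
    "thai": (2.0, 3.0),
}


def estimate_token_range(english_tokens: int, language: str) -> tuple[int, int]:
    """Return a (low, high) token estimate for the same content in another language."""
    low, high = LANGUAGE_MULTIPLIERS[language.lower()]
    return round(english_tokens * low), round(english_tokens * high)


print(estimate_token_range(1000, "German"))  # (1100, 1300)
print(estimate_token_range(1000, "Arabic"))  # (2000, 3000)
```

A range like this is only a starting point; counting real sample text with the target model's tokenizer remains the reliable check.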
Code is token-hungry too
Source code uses lots of punctuation, indentation, and unusual identifiers, all of which fragment into many tokens. A 100-line code file can easily use more tokens than a 100-line prose document of the same character count. If you paste code into an LLM, expect the token count to run high.
Which measure should you use?
- Writing for humans? Use words.
- Filling a fixed-size field or database column? Use characters.
- Working with any AI model — cost, limits, or performance? Use tokens.
To understand exactly how text becomes tokens, see how many tokens is my text. To turn token counts into dollars, read our LLM cost calculator guide.
Which ratio should you trust?
For SEO drafts, emails, and ordinary English prompts, the word-to-token ratio is good enough. For billing, context limits, or production prompts, use a tokenizer-aware tool. A GPT token counter is the right choice for OpenAI models. For Claude and Gemini, treat its counts as estimates unless you verify them with the official APIs or usage reports.
The safest planning habit is to count the full request: system message, developer instructions, retrieved context, user text, and the response budget. That gives you a more realistic view than counting the user prompt alone.
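The habit of counting the full request can be sketched as a small budget check. This example assumes the 4-characters-per-token rule for English and rounds up to stay conservative. The part names, the `request_budget` helper, and the 8,000-token limit are all hypothetical; substitute real tokenizer counts and your model's actual limit.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough English estimate: about 4 characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def request_budget(parts: dict[str, str], response_budget: int, context_limit: int) -> dict:
    """Estimate tokens for every part of a request plus the reserved response."""
    per_part = {name: estimate_tokens(text) for name, text in parts.items()}
    total = sum(per_part.values()) + response_budget
    return {"per_part": per_part, "total": total, "fits": total <= context_limit}


# Placeholder strings stand in for real request content.
parts = {
    "system": "s" * 400,
    "developer": "d" * 200,
    "retrieved_context": "c" * 2000,
    "user": "u" * 120,
}
report = request_budget(parts, response_budget=500, context_limit=8000)
print(report["total"])  # 1180
print(report["fits"])   # True
```

In this example the user text accounts for only 30 of the 1,180 estimated tokens, which is why counting the user prompt alone understates the request.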
Convert instantly: paste any text into TokenCounter.cc and see characters, words, and tokens side by side. Try it →
Frequently asked questions
How many characters is one token?
About 4 characters of English text on average, though it varies by word and model.
How many words is one token?
Roughly 0.75 words. Equivalently, 100 words is about 133 tokens.
Why is my token count higher than my word count?
Because rare words, numbers, punctuation, and non-English text split into multiple tokens each, pushing the total above the word count.
Does the same text have the same token count in GPT and Claude?
No. Each model uses a different tokenizer, so counts differ — usually within 10–20% for English. See our Claude token guide.