ChatGPT Token Limits & Context Windows Explained

By Token Counter Team · Updated June 29, 2026 · 7 min read

If you have ever seen ChatGPT "forget" the start of a long conversation, or had an API call rejected for being too long, you have hit a token limit. Every model can only hold so many tokens in mind at once — its context window. Understanding this one concept prevents a whole category of frustrating errors.

Check before you send: paste your prompt into the token counter on our homepage to see if it fits the window.

What is a context window?

A context window is the maximum number of tokens a model can process in a single request — and crucially, it counts both your input and the model's output together. If a model has a 1,000,000-token window and your prompt uses 950,000, only about 50,000 tokens are left for the reply.

The window is the model's short-term memory. Anything outside it simply does not exist as far as the model is concerned.

Input tokens + output tokens share the window

This trips people up constantly. The window is a shared budget:

input tokens + output tokens ≤ context window

So a giant prompt does not just cost more — it leaves less room for the answer. If you need a long response, you must leave headroom for it.
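
The shared budget is simple arithmetic, and a few lines of code make it concrete. The sketch below is illustrative only: the function name `room_for_output` is hypothetical, and the numbers are example values, not the limits of any particular model.

```python
def room_for_output(context_window: int, input_tokens: int) -> int:
    """Return how many tokens remain for the reply once the prompt is counted.

    Applies the rule: input tokens + output tokens <= context window.
    """
    if context_window < 0 or input_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # A prompt larger than the window leaves no room at all.
    return max(context_window - input_tokens, 0)


if __name__ == "__main__":
    # Example values only; real limits vary by model and plan.
    print(room_for_output(1_000_000, 950_000))
    print(room_for_output(200_000, 195_000))
```

A result of zero means the prompt alone already fills (or overflows) the window, so it has to be shortened before any reply is possible.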

Typical context window sizes

Windows have grown enormously. Approximate sizes for popular models in 2026:

Model family | Approx. context window
GPT-5.5 / GPT-5.4 (OpenAI) | ~1,050,000 tokens
GPT-5.4 mini / nano | ~400,000 tokens
Claude Opus 4.8 / Sonnet 4.6 | 1,000,000 tokens
Claude Haiku 4.5 | 200,000 tokens
Gemini 2.5 / 3.x (Google) | 1,000,000+ tokens

Exact limits depend on the specific model version and your plan. Always check the provider's docs.

To picture it: a 1,000,000-token window is roughly 750,000 words — several full-length novels at once. Even a 200,000-token window holds about 150,000 words, around a 500-page book.
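
These word counts imply a rule of thumb of about 0.75 English words per token (750,000 words for 1,000,000 tokens). The sketch below applies that ratio in both directions. Treat it as a rough estimate: the constant and function names are assumptions for illustration, and the real ratio varies with the tokenizer, the language, and the kind of text.

```python
import math

# Assumption: about 0.75 English words per token, as implied by the figures above.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Estimate the tokens needed for a word count, rounding up to be cautious."""
    return math.ceil(words / WORDS_PER_TOKEN)


if __name__ == "__main__":
    print(tokens_to_words(1_000_000))
    print(tokens_to_words(200_000))
    print(words_to_tokens(150_000))
```

For anything close to a limit, a real tokenizer count is more reliable than this estimate.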

Context window examples

The easiest way to understand a context window is as a shared budget. Your prompt and the answer compete for the same space.

Model window | Your input | Room left for output | Risk
128,000 tokens | 20,000 tokens | 108,000 tokens | Plenty of room for a long answer.
128,000 tokens | 120,000 tokens | 8,000 tokens | Good for summary, risky for analysis.
200,000 tokens | 195,000 tokens | 5,000 tokens | Likely to truncate or produce a short answer.
1,000,000 tokens | 900,000 tokens | 100,000 tokens | Huge, but still not unlimited.

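What happens when you exceed the limit?

It depends on where you hit it. In a long ChatGPT conversation, the oldest messages are dropped once the chat outgrows the window, so the model can no longer see them. Through the API, a request that is too long can be rejected outright. And when a prompt leaves too little room for the reply, the answer is likely to be cut short.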
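
One way to picture a long chat outgrowing its window is a sliding window that keeps the newest messages and drops the oldest. The sketch below is a simplified model of that behavior, not how any specific product is implemented: the function name `drop_oldest` is hypothetical, and the per-message token counts are supplied by the caller as made-up example values.

```python
def drop_oldest(messages: list[tuple[str, int]], window: int) -> list[tuple[str, int]]:
    """Keep the most recent (text, token_count) messages that fit within `window`."""
    kept: list[tuple[str, int]] = []
    used = 0
    # Walk backwards from the newest message and stop at the first one that will not fit.
    for text, tokens in reversed(messages):
        if used + tokens > window:
            break
        kept.append((text, tokens))
        used += tokens
    # Restore chronological order before returning.
    kept.reverse()
    return kept


if __name__ == "__main__":
    chat = [("first question", 50), ("long answer", 40), ("follow-up", 30)]
    # With an 80-token window, the first message no longer fits and is dropped.
    print(drop_oldest(chat, 80))
```

Whatever falls outside the returned list is invisible to the model, which is why it appears to "forget" the start of the conversation.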

How to stay under the limit

  1. Count first. Run long prompts through a token counter before sending.
  2. Trim context. Remove irrelevant history, boilerplate, and duplicate instructions.
  3. Summarize old turns. Replace a long back-and-forth with a short summary of what matters.
  4. Chunk large documents. Split big files and process them in pieces, or use retrieval to send only the relevant sections.
  5. Reserve output space. Leave enough of the window free for the reply length you actually need.
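
Several of these steps can be combined into a small pre-flight check: estimate the prompt size, set aside output space, and split a document that is too large. The sketch below assumes a rough ratio of 0.75 English words per token in place of a real tokenizer, and every name in it (`estimate_tokens`, `max_input_tokens`, `chunk_by_tokens`) is hypothetical.

```python
import math

# Assumption: rough English average; a real tokenizer will give different counts.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on a whitespace word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


def max_input_tokens(context_window: int, reserved_output: int) -> int:
    """Tokens available for the prompt once output space is set aside."""
    return max(context_window - reserved_output, 0)


def chunk_by_tokens(text: str, max_tokens: int) -> list[str]:
    """Split text into word-based chunks whose estimate stays within max_tokens."""
    words_per_chunk = math.floor(max_tokens * WORDS_PER_TOKEN)
    if words_per_chunk < 1:
        raise ValueError("max_tokens is too small to hold a single word")
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


if __name__ == "__main__":
    document = "word " * 1000          # stand-in for a large file
    budget = max_input_tokens(1000, 200)  # example window and reserve
    if estimate_tokens(document) > budget:
        chunks = chunk_by_tokens(document, budget)
        print(f"Split into {len(chunks)} chunks")
```

Splitting on whitespace keeps the example short. In practice, breaking at paragraph or section boundaries usually preserves more meaning per chunk.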

For more efficiency tactics that also cut your bill, see 10 ways to reduce token usage. To translate windows into real costs, read the cost calculator guide.

How much output space should you reserve?

As a practical rule, reserve more output tokens than you think you need. A short answer may need only a few hundred tokens, but a detailed analysis, code generation task, or multi-section report can need thousands. If your prompt is near the context limit, ask for a shorter answer or split the job into smaller chunks.

For current model limits and pricing, check official provider docs before relying on a number in production: OpenAI pricing, Anthropic pricing, and Gemini pricing.

Will it fit? Paste your prompt into TokenCounter.cc and compare it against your model's window in seconds.

Frequently asked questions

What is ChatGPT's token limit?

It depends on the model. The largest GPT-5 models handle roughly 1,050,000 tokens of combined input and output; smaller variants such as GPT-5.4 mini and nano handle around 400,000.

Does the context window include the response?

Yes. Input and output share the same token budget, so a longer prompt leaves less room for the answer.

How many words is a 1 million token window?

Roughly 750,000 words of English: several full-length novels, or about five 500-page books at around 150,000 words each.

Why does ChatGPT forget earlier parts of a long chat?

Once the conversation exceeds the context window, the oldest messages are dropped, so the model can no longer see them.

Token Counter Team
Maintainers of TokenCounter.cc, a free token estimation tool. We write about LLM tokenization, prompt efficiency, and AI API costs.