If you have ever seen ChatGPT "forget" the start of a long conversation, or had an API call rejected for being too long, you have hit a token limit. Every model can only hold so many tokens in mind at once — its context window. Understanding this one concept prevents a whole category of frustrating errors.
Check before you send: paste your prompt into the token counter on our homepage to see if it fits the window.
What is a context window?
A context window is the maximum number of tokens a model can process in a single request — and crucially, it counts both your input and the model's output together. If a model has a 1,000,000-token window and your prompt uses 950,000, only about 50,000 tokens are left for the reply.
The window is the model's short-term memory. Anything outside it simply does not exist as far as the model is concerned.
Input tokens + output tokens share the window
This trips people up constantly. The window is a shared budget:
input tokens + output tokens ≤ context window
So a giant prompt does not just cost more — it leaves less room for the answer. If you need a long response, you must leave headroom for it.
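The shared budget can be checked with a few lines of arithmetic. The following is a minimal sketch in Python: the function names `remaining_output_budget` and `fits_in_window` are illustrative and not part of any provider's SDK, and the token counts are assumed to come from a tokenizer or token counter.

```python
def remaining_output_budget(context_window: int, input_tokens: int) -> int:
    """Tokens left for the reply once the prompt is counted (never negative)."""
    return max(context_window - input_tokens, 0)


def fits_in_window(context_window: int, input_tokens: int, output_tokens: int) -> bool:
    """True when input tokens + output tokens <= context window."""
    return input_tokens + output_tokens <= context_window


# Example from the text: a 1,000,000-token window with a 950,000-token prompt.
print(remaining_output_budget(1_000_000, 950_000))  # 50000
print(fits_in_window(1_000_000, 950_000, 60_000))   # False
```

Running a check like this before sending shows immediately whether a long reply is realistic or the prompt needs trimming first.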
Typical context window sizes
Windows have grown enormously. Approximate sizes for popular models in 2026:
| Model family | Approx. context window |
|---|---|
| GPT-5.5 / GPT-5.4 (OpenAI) | ~1,050,000 tokens |
| GPT-5.4 mini / nano | ~400,000 tokens |
| Claude Opus 4.8 / Sonnet 4.6 | 1,000,000 tokens |
| Claude Haiku 4.5 | 200,000 tokens |
| Gemini 2.5 / 3.x (Google) | 1,000,000+ tokens |
Exact limits depend on the specific model version and your plan. Always check the provider's docs.
To picture it: a 1,000,000-token window is roughly 750,000 words — several full-length novels at once. Even a 200,000-token window holds about 150,000 words, around a 500-page book.
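These word estimates follow from a rule of thumb of about 0.75 English words per token, and the page figure implies roughly 300 words per page. Both ratios are assumptions built into the sketch below. Real ratios vary by tokenizer, language, and page layout, so treat the results as ballpark figures only.

```python
WORDS_PER_TOKEN = 0.75  # assumed average for English text
WORDS_PER_PAGE = 300    # assumed; implied by 150,000 words being about 500 pages


def tokens_to_words(tokens: int) -> int:
    """Approximate English word count for a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def tokens_to_pages(tokens: int) -> int:
    """Approximate page count, using the assumed words-per-page figure."""
    return round(tokens_to_words(tokens) / WORDS_PER_PAGE)


for window in (200_000, 1_000_000):
    print(window, tokens_to_words(window), tokens_to_pages(window))
# 200000 150000 500
# 1000000 750000 2500
```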
Context window examples
The easiest way to understand a context window is as a shared budget. Your prompt and the answer compete for the same space.
| Model window | Your input | Room left for output | Risk |
|---|---|---|---|
| 128,000 tokens | 20,000 tokens | 108,000 tokens | Plenty of room for a long answer. |
| 128,000 tokens | 120,000 tokens | 8,000 tokens | Good for summary, risky for analysis. |
| 200,000 tokens | 195,000 tokens | 5,000 tokens | Likely to truncate or produce a short answer. |
| 1,000,000 tokens | 900,000 tokens | 100,000 tokens | Huge, but still not unlimited. |
What happens when you exceed the limit?
- In the API: the request is rejected with an error before the model generates anything. You are typically not charged, but you get no answer.
- In the ChatGPT app: the oldest messages are silently dropped from context. The chat keeps working, but the model genuinely cannot recall the trimmed parts — which is why it "forgets" earlier details in very long threads.
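The app-side behavior can be pictured as a sliding window over the conversation. The sketch below only illustrates that idea and is not how ChatGPT is actually implemented. `estimate_tokens` is a crude, hypothetical stand-in for a real tokenizer (it assumes about 0.75 words per token), and `trim_to_window` is a made-up helper name.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming ~0.75 words per token. Not a real tokenizer."""
    return math.ceil(len(text.split()) / 0.75)


def trim_to_window(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages whose estimated tokens fit within the budget."""
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so the oldest messages are dropped first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


chat = [
    "My name is Ada and I live in Lisbon.",
    "What is a context window?",
    "It is the maximum number of tokens a model can process.",
    "What is my name?",
]
print(trim_to_window(chat, budget=30))
```

With a budget of 30 estimated tokens, the first message no longer fits, so the final question about the user's name arrives without the message that contained the answer.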
How to stay under the limit
- Count first. Run long prompts through a token counter before sending.
- Trim context. Remove irrelevant history, boilerplate, and duplicate instructions.
- Summarize old turns. Replace a long back-and-forth with a short summary of what matters.
- Chunk large documents. Split big files and process them in pieces, or use retrieval to send only the relevant sections.
- Reserve output space. Leave enough of the window free for the reply length you actually need.
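The chunking tactic from the list can be sketched as follows. This is a minimal illustration that splits on paragraph boundaries. `estimate_tokens` is a hypothetical rough estimator (about 0.75 words per token) that should be swapped for the model's real tokenizer, and `chunk_paragraphs` is an illustrative name. A single paragraph larger than the budget is kept as its own oversized chunk and is not split further.

```python
import math


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming ~0.75 words per token. Not a real tokenizer."""
    return math.ceil(len(text.split()) / 0.75)


def chunk_paragraphs(text: str, max_tokens: int) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_tokens each."""
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        cost = estimate_tokens(paragraph)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and used + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(paragraph)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Six synthetic paragraphs of 30 words each (about 40 estimated tokens apiece).
document = "\n\n".join(f"Paragraph {i} " + "word " * 28 for i in range(6))
chunks = chunk_paragraphs(document, max_tokens=100)
print(len(chunks), [estimate_tokens(c) for c in chunks])  # 3 [80, 80, 80]
```

Each chunk can then be sent as its own request, with the per-chunk results combined afterwards.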
For more efficiency tactics that also cut your bill, see 10 ways to reduce token usage. To translate windows into real costs, read the cost calculator guide.
How much output space should you reserve?
As a practical rule, reserve more output tokens than you think you need. A short answer may need only a few hundred tokens, but a detailed analysis, code generation task, or multi-section report can need thousands. If your prompt is near the context limit, ask for a shorter answer or split the job into smaller chunks.
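One way to turn this rule into a pre-flight check is sketched below. The function name `plan_request` and its three outcomes are illustrative assumptions, not a provider feature, and the token counts are assumed to come from a real token counter.

```python
def plan_request(context_window: int, prompt_tokens: int, desired_output: int) -> str:
    """Suggest whether to send as-is, shorten the reply, or split the job."""
    room = context_window - prompt_tokens
    if room <= 0:
        return "split: the prompt alone fills the window"
    if room >= desired_output:
        return f"ok: {room} tokens available for the reply"
    return f"shorten: cap the reply at {room} tokens or split the job"


print(plan_request(128_000, 20_000, 4_000))    # ok: 108000 tokens available for the reply
print(plan_request(200_000, 195_000, 8_000))   # shorten: cap the reply at 5000 tokens or split the job
print(plan_request(128_000, 130_000, 1_000))   # split: the prompt alone fills the window
```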
For current model limits and pricing, check official provider docs before relying on a number in production: OpenAI pricing, Anthropic pricing, and Gemini pricing.
Will it fit? Paste your prompt into TokenCounter.cc and compare it against your model's window in seconds.
Frequently asked questions
What is ChatGPT's token limit?
It depends on the model. Modern GPT-5 models handle about 1,000,000 tokens of combined input and output; smaller variants like GPT-5.4 nano handle around 400,000.
Does the context window include the response?
Yes. Input and output share the same token budget, so a longer prompt leaves less room for the answer.
How many words is a 1 million token window?
Roughly 750,000 words of English: several full-length novels, or about eight 300-page books at roughly 300 words per page.
Why does ChatGPT forget earlier parts of a long chat?
Once the conversation exceeds the context window, the oldest messages are dropped, so the model can no longer see them.