
Module 2 Lesson 2: Tokens, Context, and Output Length

Master the constraints of AI: context windows, token limits, and how to manage long conversations.

Tokens, Context, and Output Length

Every interaction with ChatGPT is bound by hard numerical limits: how much text the model can see at once, and how much it can produce in one reply. Understanding these limits is the difference between a beginner and a power user.

1. What is a Token?

A token is a chunk of text, roughly four characters or about three-quarters of an English word. A useful rule of thumb is that 1,000 tokens ≈ 750 words. Token counts include both your prompt and the AI's response combined.
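You can get a rough feel for token counts without a real tokenizer. The sketch below combines the two rules of thumb above (~4 characters per token, ~0.75 words per token); `estimate_tokens` is a hypothetical helper, not an official counter. For exact counts, use the tokenizer that matches your model, such as OpenAI's tiktoken library.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate from the ~4 chars/token and ~0.75 words/token
    rules of thumb. A heuristic sketch, not a real tokenizer."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)

# A 750-word passage should land near 1,000 tokens.
sample = "word " * 750
print(estimate_tokens(sample))
```

Real tokenizers split on learned subword boundaries, so punctuation-heavy or non-English text can diverge noticeably from this estimate.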

2. The Context Window

Think of the Context Window as the AI's "Short-Term Memory." It is the total number of tokens the model can "see" at any one time.

  • If the window is 128k tokens, the model can keep track of a small book.
  • If you exceed the window, the model starts to "forget" the beginning of the chat.
```mermaid
graph TD
    History[Chat History] --> Window{Context Window}
    NewPrompt[New Message] --> Window
    Window -->|Under Limit| FullContext[Model 'remembers' everything]
    Window -->|Over Limit| Truncated[Model 'forgets' oldest messages]
```
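The truncation behavior in the diagram can be sketched in a few lines. Assume each message carries a token count; when the total exceeds the window, the oldest messages are dropped first. This is a simplified model — real providers vary (some always preserve the system prompt, for example):

```python
def trim_to_window(messages, window_limit):
    """Drop the oldest messages until the total token count fits the
    context window. messages is a list of (text, token_count) pairs;
    a simplified sketch, not any provider's actual trimming logic."""
    kept = list(messages)
    while kept and sum(tokens for _, tokens in kept) > window_limit:
        kept.pop(0)  # the model "forgets" the oldest message first
    return kept

history = [("You are a helpful assistant.", 10),
           ("Tell me about tokens.", 8),
           ("Tokens are chunks of text...", 50)]
print(trim_to_window(history, 60))  # first message is dropped: 68 > 60
```

Notice that the first message to go is often the one that set up the whole conversation, which is exactly why long chats "drift" off their original instructions.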

3. Output Length Constraints

Models also have a Maximum Output Limit. Even if the context window is huge, the model might stop writing after 4,000 tokens of generation to prevent runaway costs or infinite loops.
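In API terms, the output cap is usually a request parameter, separate from the context window. The payload below follows the shape of OpenAI's Chat Completions API (`max_tokens`), but treat the field names as illustrative and check your provider's reference:

```python
# Illustrative request payload; field names follow OpenAI's Chat
# Completions API, but verify against your provider's documentation.
request = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "Summarize this 20-page report."}
    ],
    "max_tokens": 500,  # output cap: the reply stops after ~500 tokens,
                        # no matter how large the context window is
}
```

If a response ends mid-sentence, you have usually hit this cap; asking the model to "continue" starts a new generation with a fresh output budget.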

4. Strategies for Long Content

  • Summarization: Ask the AI periodically to "Summarize our conversation so far" to compress the context.
  • Chunking: Break large documents into smaller pieces and process them one by one.
  • Pinning: Mention critical facts explicitly in your latest prompt if the chat is getting long.
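Of these strategies, chunking is the most mechanical, so it is easy to sketch. The hypothetical `chunk_document` helper below splits a document on words and overlaps adjacent chunks, so a fact sitting on a boundary appears in both pieces:

```python
def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50):
    """Split a long document into overlapping word-based chunks.

    chunk_size and overlap are in words, not tokens; a rough sketch of
    the chunking strategy, not a production text splitter.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the document
    return chunks

pieces = chunk_document("word " * 1200, chunk_size=500, overlap=50)
print(len(pieces))  # a 1,200-word text yields 3 overlapping chunks
```

You would then process each chunk in its own prompt (for example, "Summarize this section"), and optionally combine the per-chunk results in a final pass.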

Hands-on: Testing the Limit

  1. Start a long conversation or paste a very long (but safe) text.
  2. Ask the AI to repeat a specific fact from the very beginning.
  3. If it fails or hallucinates, you've likely hit the context window limit of that specific model version.

Key Takeaways

  • Tokens are both a measure of length and a measure of cost (for API users).
  • Context Windows are finite; conversation history is trimmed when full.
  • Efficiency means saying more with fewer tokens.
