
Module 11 Lesson 3: Long-Horizon Planning
Why does the AI forget its original goal halfway through a task? In our final lesson of Module 11, we explore 'Long-Horizon Planning' and the limits of AI persistence.
If you've ever tried to build an AI agent that performs 10 consecutive tasks (e.g., "Research a company, find its CEO, find their email, write a draft, check for errors, send the email..."), you've likely noticed that the AI often fails around Step 4 or 5.
This is known as the problem of Long-Horizon Planning. In this final lesson of Module 11, we explore why models that are brilliant at "Writing" are often terrible at "Planning."
1. The Meandering Brain
Because LLMs predict one token at a time, they are essentially "following their nose."
- In a 3-word sentence, it's easy to stay on track.
- In a 10-step plan, each step introduces a small amount of "statistical noise."
- By Step 6, the model might be "Attending" more to the details of Step 5 than to the original goal you gave it at the very beginning.
This is called Goal Drift. The AI starts doing a sub-task and forgets that the sub-task was only a means to an end.
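In practice, teams often instrument their agents to catch drift early. The toy check below is purely illustrative (the function and keyword heuristic are my own, not from any library): it flags drift when keywords from the original goal stop appearing in the agent's latest output. Real systems tend to use embedding similarity or an LLM judge instead.
```python
# Toy drift check (illustrative only): flag when keywords from the original
# goal stop appearing in the agent's most recent output.
def drift_score(original_goal: str, latest_output: str) -> float:
    """Return the fraction of goal keywords missing from the latest output."""
    def clean(w: str) -> str:
        return w.strip(".,!?'\"").lower()
    goal_terms = {clean(w) for w in original_goal.split() if len(w) > 3}
    out_terms = {clean(w) for w in latest_output.split()}
    if not goal_terms:
        return 0.0
    return len(goal_terms - out_terms) / len(goal_terms)

goal = "Book a flight to Tokyo in June"
steps = [
    "Searching flight sites for Tokyo departures in June.",
    "Comparing flight prices and seat maps for the June dates.",
    "Drafting an email to the hotel about late check-in.",  # the agent has drifted
]
for i, output in enumerate(steps, 1):
    print(f"Step {i}: drift = {drift_score(goal, output):.2f}")
```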
2. Planning vs. Action
Humans plan using a blueprint. Before we start building a house, we know where the kitchen will be. LLMs "build" the house by placing one brick and then deciding where the next one goes. If the first brick is slightly crooked, the whole house eventually falls over.
Why Agents Fail:
- Error Accumulation: Small per-step errors compound. If each step is only 95% reliable, a 5-step chain succeeds roughly 77% of the time, and a 20-step chain barely a third of the time (see the quick calculation after this list).
- No Backtracking: If an LLM realizes it made a mistake 200 tokens ago, it can't "delete" those tokens. It has to keep moving forward, often trying to "justify" its previous mistake, which leads to a hallucination loop.
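To make the compounding concrete, here is a quick back-of-envelope calculation (the 95% per-step success rate is an assumed figure for illustration, not a measurement):
```python
# If each step independently succeeds with probability p, the whole chain
# only succeeds with probability p ** n -- reliability decays geometrically.
p_step = 0.95  # assumed per-step success rate
for n in (1, 5, 10, 20):
    p_chain = p_step ** n
    print(f"{n:>2} steps: chain succeeds ~{p_chain:.0%}, at least one error ~{1 - p_chain:.0%}")
```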
```mermaid
graph TD
    Start["Original Goal: 'Book a Flight'"] --> S1["Step 1: Search Sites (Success)"]
    S1 --> S2["Step 2: Find Prices (Success)"]
    S2 -- "Small Error: Picks wrong date" --> S3["Step 3: Check Seat Map (Success)"]
    S3 --> S4["Step 4: Enter Identity Info (Success)"]
    S4 --> GoalDrift["Step 5: Fails to book because the date was wrong/unavailable"]
    GoalDrift --> Hallucination["AI Claims: 'I booked it!' (to be helpful)"]
```
3. The Context Window Bottleneck
Even if a model has a context window of 1 million tokens, its Effective Attention is limited. As the "history" of the task gets longer, the "Signal-to-Noise" ratio drops. The original instruction (The Signal) gets buried under thousands of tokens of tool outputs and intermediate thoughts (The Noise).
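One common mitigation is to prune old history and restate the goal at the end of every prompt, so the Signal is never buried. The sketch below illustrates the idea under assumed prompt conventions; it is not any particular framework's API.
```python
# Sketch: keep the prompt's "signal" high by truncating old history and
# re-stating the original goal right before asking for the next action.
def build_prompt(original_goal: str, history: list[str], budget_chars: int = 2000) -> str:
    kept: list[str] = []
    used = 0
    for entry in reversed(history):          # walk newest entries first
        if used + len(entry) > budget_chars:
            break                            # drop anything older than the budget
        kept.append(entry)
        used += len(entry)
    kept.reverse()                           # restore chronological order
    return (
        "Recent steps:\n" + "\n".join(kept) + "\n\n"
        f"Original goal (do not lose sight of this): {original_goal}\n"
        "Next action:"
    )
```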
4. How to Solve the Planning Problem
To fix this, developers are moving away from "One Giant Prompt" and toward:
- Hierarchical Agents: One "Manager" model that only does the planning, and several "Worker" models that each handle one small, short task.
- State Machines: Using traditional code to force the AI to follow a rigid map (e.g., "You cannot go to Step 3 until Step 2 is verified as TRUE"); see the sketch after this list.
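Here is a minimal, framework-free sketch of the State Machine idea (the step names and verifiers are hypothetical): ordinary code owns the sequence, each step gets a bounded job, and a deterministic check, not the model's own claim, decides whether the plan may advance.
```python
# Minimal state-machine sketch: plain code owns the plan; the LLM (stubbed
# here with lambdas) is only asked to perform one bounded step at a time.
from typing import Callable

Step = tuple[str, Callable[[], str], Callable[[str], bool]]  # name, run, verify

def run_plan(steps: list[Step], max_retries: int = 2) -> bool:
    for name, run, verify in steps:
        for attempt in range(1 + max_retries):
            result = run()                   # e.g., one short LLM call or tool call
            if verify(result):               # deterministic check, not the model's opinion
                print(f"{name}: verified")
                break
            print(f"{name}: verification failed (attempt {attempt + 1}), retrying")
        else:
            print(f"{name}: giving up, aborting plan")
            return False                     # no silent "I booked it!" claims
    return True

# Hypothetical usage with stubbed steps:
plan = [
    ("find_price", lambda: "price=420 USD date=2024-06-12", lambda r: "date=" in r),
    ("book_seat",  lambda: "confirmation=ABC123",            lambda r: r.startswith("confirmation=")),
]
run_plan(plan)
```
Because the controller, not the model, decides when to stop or retry, the failure mode from the diagram above ("I booked it!") is replaced by an explicit abort.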
Lesson Exercise
Goal: Spot the Goal Drift.
- Ask an LLM to play a game of "20 Questions."
- After 10 questions, ask it: "What were the first 3 questions you asked me?"
- Check if it remembers.
- Now, tell it a very long, boring story, and halfway through, tell it: "Note that my favorite color is Magenta."
- At the very end of the story, ask it: "What is my favorite color?"
Observation: You'll see how the "Density" of information can cause the model to lose track of the specific goals or facts you established at the beginning.
Conclusion of Module 11
You've now completed the "Check and Balance" module. You know what LLMs can't do:
- Lesson 1: Bridge the System 1 vs System 2 Reasoning Gap.
- Lesson 2: Distinguish between Correlation and Causation.
- Lesson 3: Maintain objective consistency in Long-Horizon Planning.
Final Module: We look toward the horizon. In Module 12: The Future of Large Language Models, we'll discuss AGI, Multi-modal models, and how AI might change the way we live and work forever.