
The Order of Information Matters: Exploiting Model Attention
Discover why where you place your instructions can change everything. Learn about the 'U-Shaped Accuracy Curve,' the Recency Bias, and how to structure long prompts for maximum reliability.
In the previous lessons, we learned what to put in a prompt (The Four Pillars). In this lesson, we will learn where to put it.
It might seem logical that a prompt is read from top to bottom, like a book. While this is technically true, Large Language Models (LLMs) do not treat every part of a prompt with equal weight. Because of the way attention mechanisms work, a model's "memory" of a prompt is often distorted. This has led to two critical findings in AI research that every prompt engineer must master: the Recency Bias and the Lost in the Middle phenomenon.
By understanding these patterns, you can structure your prompts to ensure the model focuses on the most critical information precisely when it needs to.
1. The Recency Bias: The Final Word is King
LLMs have a strong tendency to follow the instructions placed at the very end of a prompt over those placed at the beginning.
Why it Happens
As the model processes your prompt, it creates a "state" (the KV Cache). The most recent tokens are the most "active" in the model's current attention window. If you provide 10 pages of context and then a small instruction at the end, that instruction is what determines the immediate "next-token" generation.
The Pro Strategy: Always repeat your most critical constraints (e.g., "Output JSON ONLY!") at the very bottom of the prompt, right before the model begins its reply.
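As a minimal sketch of this strategy, here is a small helper (the function name and reminder wording are illustrative, not a standard API) that appends critical constraints to the tail of any prompt, where Recency Bias gives them the most weight:

```python
def with_final_constraints(prompt: str, constraints: list[str]) -> str:
    """Repeat critical constraints at the very end of a prompt,
    right before the model begins its reply."""
    reminder = "\n".join(f"- {c}" for c in constraints)
    return f"{prompt}\n\nCRITICAL - FOLLOW EXACTLY:\n{reminder}"

final_prompt = with_final_constraints(
    "Summarize the attached report.",
    ["Output JSON ONLY!", "Maximum 200 words."],
)
```

Whatever you place in that final block overrides softer phrasing earlier in the prompt, so keep it short and imperative.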
2. The "Lost in the Middle" Phenomenon
Research from Stanford and other institutions has shown that LLMs are very good at retrieving information from the beginning and end of a long prompt, but their performance drops significantly for information placed in the middle.
The U-Shaped Accuracy Curve
- High Accuracy: First 10% of the prompt.
- Low Accuracy: Middle 60% of the prompt.
- High Accuracy: Last 10% of the prompt.
graph TD
    A[Start of Prompt: Instructions] -->|High Attention| B[Middle of Prompt: Data/Context]
    B -->|Low Attention - Lost!| C[End of Prompt: Final Instructions]
    C -->|Highest Attention| D[Final Output]
    style A fill:#2ecc71,color:#fff
    style B fill:#e74c3c,color:#fff
    style C fill:#3498db,color:#fff
How to Beat the Curve:
If you have multiple pieces of context, put the most important piece at the very top or the very bottom. Never put your most critical data in the middle of a large text block.
3. The "Instruction Sandwich" Pattern
To deal with these biases, professional engineers use the Instruction Sandwich.
- Top Layer: High-level instructions and Persona (Set the Stage).
- Middle Layer: Raw data and context (The Meat).
- Bottom Layer: Reiteration of constraints and output format (The Guardrails).
Example Pattern:
Task: Summarize the following legal document.
<ROLE>
You are a Senior Legal Counsel.
</ROLE>
### DOCUMENT START ###
[10 pages of legal text...]
### DOCUMENT END ###
### FINAL COMMANDS ###
1. The summary must be under 300 words.
2. Focus on indemnification clauses.
3. Return as a Markdown table.
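The sandwich above can also be assembled programmatically. A minimal sketch (the function name and delimiter tags mirror the example pattern and are otherwise arbitrary):

```python
def build_sandwich(role: str, document: str, commands: list[str]) -> str:
    """Compose an Instruction Sandwich: persona on top, raw data in
    the middle, numbered constraints at the bottom."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(commands, start=1))
    return (
        f"<ROLE>\n{role}\n</ROLE>\n\n"
        f"### DOCUMENT START ###\n{document}\n### DOCUMENT END ###\n\n"
        f"### FINAL COMMANDS ###\n{numbered}"
    )

prompt = build_sandwich(
    "You are a Senior Legal Counsel.",
    "[10 pages of legal text...]",
    ["The summary must be under 300 words.", "Return as a Markdown table."],
)
```

Centralizing the layout in one function keeps the guardrails at the bottom even as the document in the middle changes.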
4. Technical Implementation: Dynamic Ordering in Python
In a FastAPI application, you can ensure your instructions are always "fresh" in the model's mind by appending a "Footer" to every prompt.
Python Example: The Footer Injection
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# A static footer that enforces our strict rules
GLOBAL_FOOTER = """
---
REMINDER:
- Output only valid JSON.
- No conversational filler.
- If unsure, return {"error": "insufficient_data"}.
"""

class ProcessRequest(BaseModel):
    user_prompt: str
    context: str

@app.post("/process")
async def process(request: ProcessRequest):
    # We "sandwich" the context between the instructions and the footer
    final_prompt = (
        f"Instructions: Process this data.\n\n"
        f"Context: {request.context}\n\n"
        f"User Input: {request.user_prompt}\n\n"
        f"{GLOBAL_FOOTER}"
    )
    # Send final_prompt to your model here (e.g., AWS Bedrock)
    return {"prompt_sent": final_prompt}
By programmatically injecting the GLOBAL_FOOTER, you ensure that the Recency Bias works for you, not against you.
5. Deployment: Optimizing for Context Windows in Docker
When you deploy your AI service with Docker, you are often constrained by the host machine's RAM and the per-token cost of cloud model calls, so every token you place in the context window should earn its position.
Chunking and Ordering
In a RAG system, when you retrieve 5 chunks from your Vector DB, don't just order them by similarity (1, 2, 3, 4, 5). Better Strategy: Order them (1, 3, 5, 4, 2). This puts the most relevant chunk (1) at the top and the second most relevant (2) at the bottom, exploiting the U-shaped attention curve.
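This reordering can be implemented in a few lines (a sketch; LangChain ships a similar transformer called LongContextReorder). Given chunks sorted most-to-least relevant, it interleaves them so the strongest chunks sit at the edges of the context and the weakest land in the low-attention middle:

```python
def reorder_for_attention(chunks: list[str]) -> list[str]:
    """Place the most relevant chunks at the start and end of the
    context, pushing the least relevant into the middle."""
    front, back = [], []
    for i, chunk in enumerate(chunks):
        (front if i % 2 == 0 else back).append(chunk)
    return front + back[::-1]

print(reorder_for_attention(["c1", "c2", "c3", "c4", "c5"]))
# Prints ['c1', 'c3', 'c5', 'c4', 'c2'] - best chunks at both edges
```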
6. Real-World Case Study: The Ignored API Key
A developer was building an AI tool that analyzed code. They put the instruction "Do NOT display the API keys found in the code" in the middle of a 5-page system prompt. The Failure: The model repeatedly displayed the API keys in its output. The Fix: Moving that instruction to the very bottom of the prompt (the "Recency" position) stopped the leaking immediately.
7. SEO and Content Hierarchy
The same principle applies to SEO. Google's crawlers (and AI models that summarize web pages) prioritize the information in your H1, the first paragraph, and the concluding summary. When prompting an AI to write a blog post, instruct it to place the most high-value "Search Intent" keywords at the beginning and end of the article to satisfy both human readers and search algorithms.
Summary of Module 3, Lesson 3
- The Recency Bias: The last instruction is the most powerful.
- The Lost in the Middle Problem: Models struggle with information in the center of long prompts.
- Use the Instruction Sandwich: Wrap your data in instructions at the top and bottom.
- Programmatic footers are your friend: Use Python to ensure rules are always the last thing the model sees.
In the next lesson, we will look at Delimiters and Formatting (Refining the Structure)—how to use visual indicators to keep these ordered sections separate and clear.
Practice Exercise: The Attention Test
- The "Middle" Test: Provide a prompt with 50 unrelated facts. In the exact middle (fact #25), put a secret code: "The secret code is 9999." Ask the model to "List all the facts."
- Analyze: Did the model miss the secret code? It likely did.
- The "Top/Bottom" Test: Move the secret code to the 1st or 50th position.
- Analyze: The model will now almost certainly find it. Use this insight to plan your future RAG architectures.
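To run this experiment repeatably, you can generate the test prompt in code. A sketch (the filler-fact wording is arbitrary; only the secret's position matters):

```python
def build_attention_test(secret_position: int, total: int = 50) -> str:
    """Build a needle-in-a-haystack prompt: `total` filler facts with
    a secret code buried at `secret_position` (1-indexed)."""
    facts = [f"Fact {i}: Item {i} weighs {i * 3} grams." for i in range(1, total + 1)]
    facts[secret_position - 1] = f"Fact {secret_position}: The secret code is 9999."
    return "List all the facts.\n\n" + "\n".join(facts)

middle_test = build_attention_test(25)  # buried in the middle
edge_test = build_attention_test(50)    # the Recency position
```

Send both prompts to the same model and compare whether the secret code survives into the output.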