Token Lineage: Tracking the Thread

Learn how to trace the flow of tokens through complex agentic chains. Master the 'Lineage Map' for debugging cost and accuracy.

In a multi-agent system, a single token in the final answer might have a long lineage.

  • It started in a Search Result (Agent A).
  • It was Summarized (Agent B).
  • It was Coded into a script (Agent C).
  • It was Verified (Agent D).

If the final answer is wrong, where did the "Bad Token" come from? If the bill is high, which agent "Bloated" the lineage?

Token Lineage is the practice of tracking the origin and transformation of data across your agentic graph.


1. The Lineage Metadata Pattern

To track tokens across agents, you must append an origin_id to your internal data structures.

The Flow:

  1. Agent A creates a fact. Metadata: {"origin": "search_agent", "source": "wikipedia"}.
  2. Agent B reads the fact. When it passes the fact to Agent C, it Preserves the metadata.
  3. The Final Answer now carries a "Map" of where its tokens came from.
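The flow above can be sketched in a few lines. This is a minimal illustration, not a prescribed schema; the `TaggedFact` class, the `hops` field, and the agent functions are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedFact:
    content: str
    metadata: dict = field(default_factory=dict)


def search_agent() -> TaggedFact:
    # Agent A creates the fact and stamps its origin
    return TaggedFact(
        content="The Eiffel Tower is 330m tall.",
        metadata={"origin": "search_agent", "source": "wikipedia"},
    )


def summarize_agent(fact: TaggedFact) -> TaggedFact:
    # Agent B transforms the content but preserves the metadata,
    # appending itself to a hop list so the "map" stays complete
    hops = fact.metadata.get("hops", []) + ["summarize_agent"]
    return TaggedFact(
        content=fact.content.upper(),  # stand-in for a real summary
        metadata={**fact.metadata, "hops": hops},
    )


fact = summarize_agent(search_agent())
print(fact.metadata["origin"])  # still "search_agent"
```

The key move is that each agent copies the incoming metadata forward instead of constructing a fresh dict, so the origin tag survives every handoff.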

2. Visualizing the "Token Inflation" Map

Lineage allows you to see how many "Supporting Tokens" were required to produce a single "Result Token."

graph LR
    S[Search: 10k tokens] -->|Summarize| A[Agent: 500 tokens]
    A -->|Consolidate| B[Result: 50 tokens]
    
    subgraph "Lineage Efficiency"
        S -- 20:1 -- A
        A -- 10:1 -- B
    end

Audit Question: If a lineage has a 1000:1 inflation ratio (e.g., browsing 50 websites to find one phone number), you should consider Specializing the tool (Module 11.2) to return only the number, bypassing the LLM's raw browsing entirely.
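Computing these ratios from your telemetry is straightforward. A minimal sketch, using made-up hop records matching the diagram above (the 1000:1 threshold is the audit heuristic from the text, not a universal constant):

```python
# Hypothetical hop records: (stage, input_tokens, output_tokens)
hops = [
    ("search", 10_000, 500),
    ("summarize", 500, 50),
]

for stage, tokens_in, tokens_out in hops:
    ratio = tokens_in / tokens_out
    flag = "  <-- consider specializing the tool" if ratio >= 1000 else ""
    print(f"{stage}: {ratio:.0f}:1{flag}")

# End-to-end inflation: first input vs. final output
total_ratio = hops[0][1] / hops[-1][2]
print(f"end-to-end inflation: {total_ratio:.0f}:1")  # 200:1
```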


3. Implementation: The Trace Observer (Python)

Python Code: Tracking State Lineage

import time

# count_tokens and log_lineage are assumed helpers: a tokenizer wrapper
# (e.g. around tiktoken) and a write to the telemetry database (Module 16.3).

class Fact:
    def __init__(self, content, author_agent):
        self.content = content
        self.author = author_agent
        self.timestamp = time.time()
        # We track how many tokens this fact cost at birth
        self.birth_token_cost = count_tokens(content)

def transfer_fact(fact, target_agent):
    print(f"Transferring {fact.content[:20]} from {fact.author} to {target_agent}")
    # We log the handoff in our telemetry database (Module 16.3)
    log_lineage(fact.author, target_agent, fact.birth_token_cost)

4. The "Lineage Pruning" Strategy

Once you see the lineage, you can Prune it. If Agent C only needs the content of a fact, Strip the Metadata before sending it to the LLM.

Token Saving: Metadata is for Humans (debugging). Data is for AI (processing). By moving metadata to a "Sidecar Database" (Module 11.5) and sending only pure data to the agents, you save 10-20% on every inter-agent turn.
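A minimal sketch of the sidecar pattern, using an in-memory dict as a stand-in for the real sidecar database; the `strip_for_llm` helper and the field names are illustrative:

```python
import uuid

sidecar_db = {}  # stand-in for the Sidecar Database (Module 11.5)

def strip_for_llm(fact: dict) -> str:
    """Park the metadata in the sidecar; return only pure content for the LLM."""
    lineage_id = str(uuid.uuid4())
    sidecar_db[lineage_id] = {k: v for k, v in fact.items() if k != "content"}
    return fact["content"]

fact = {
    "content": "Paris is the capital of France.",
    "origin": "search_agent",
    "source": "wikipedia",
}
prompt_chunk = strip_for_llm(fact)  # only the content reaches the model
```

The lineage survives for human debugging in the sidecar, but the LLM never pays tokens for it.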


5. Summary and Key Takeaways

  1. Tag the Origin: Know which agent created which fact.
  2. Preserve through Handoffs: Don't lose the source metadata as facts move through the chain.
  3. Inflation Analysis: Identify nodes where input tokens exploded but output quality didn't improve.
  4. Sidecar Metadata: Log lineage to your dashboard, but keep it out of the LLM context to save tokens.

In the next lesson, Visualizing Cost with Grafana, we look at how to turn these technical traces into "Board-Ready" dashboards.


Exercise: The Lineage Map

  1. Run a 3-step agent chain (Search -> Summarize -> Write).
  2. Track the 'Source ID' of the final answer.
  3. Ask:
    • Which agent contributed the most tokens to the final answer?
    • Which agent consumed the most tokens but contributed the least?
    • Strategy: The agent with the lowest "Input-to-Contribution" ratio is your target for the next efficiency rewrite.
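One way to sketch that final comparison, with made-up per-agent telemetry (the numbers and the `contribution_ratio` helper are illustrative):

```python
# Hypothetical telemetry: tokens each agent consumed vs. tokens it
# contributed to the final answer.
agents = {
    "search":    {"consumed": 10_000, "contributed": 120},
    "summarize": {"consumed": 600,    "contributed": 80},
    "write":     {"consumed": 300,    "contributed": 150},
}

def contribution_ratio(stats):
    return stats["contributed"] / stats["consumed"]

# Lowest input-to-contribution ratio = best target for an efficiency rewrite
worst = min(agents, key=lambda a: contribution_ratio(agents[a]))
print(f"rewrite target: {worst}")  # search (120 / 10,000 = 0.012)
```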

Congratulations on completing Module 17 Lesson 2! You are now a token lineage expert.
