Structured Context: YAML vs Markdown vs JSON for Graphs

The syntax of success. Explore the pros and cons of different data formats for presenting Knowledge Graph subgraphs to an LLM and find the one that balances token cost with reasoning accuracy.

How you "Package" your graph facts matters as much as the facts themselves. When you send a subgraph to an LLM, you are sending a structured block of data. Some formats are very "Noisy" (using too many curly braces and quotes), while others are "Fragile" (relying on indentation that might get lost).

In this final lesson of Module 11, we will compare the three most popular formats for Graph Context: YAML, Markdown, and JSON. We will look at the Token Overhead of each and see how different formats impact the "Attention" of the LLM. By the end, you'll know exactly which "Syntax" to use for your specific RAG deployment.


1. JSON: The Machine Standard

[
  {"subj": "Sudeep", "rel": "WORKS_AT", "obj": "Google"},
  {"subj": "Google", "rel": "INDUSTRY", "obj": "Tech"}
]
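  • Pros: Widely understood by models. Explicit delimiters leave no confusion about where a fact starts and ends.
  • Cons: The most token-expensive of the three. Every quote (") and brace ({) costs tokens, and therefore money. Roughly 30-40% of the tokens are structural "Fluff" rather than content.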
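If you do stay with JSON, part of the overhead is optional whitespace. The sketch below is a minimal illustration, not a benchmark: the helper name `to_json_context` is hypothetical, and it compares character length only, which is a rough stand-in for token count. It uses the standard library's `json.dumps` with compact separators.

```python
import json

# Example triples, mirroring the JSON sample above.
triples = [
    {"subj": "Sudeep", "rel": "WORKS_AT", "obj": "Google"},
    {"subj": "Google", "rel": "INDUSTRY", "obj": "Tech"},
]


def to_json_context(facts, compact=True):
    """Serialize facts as JSON (hypothetical helper).

    Compact mode removes the spaces after ',' and ':' and all indentation.
    """
    if compact:
        return json.dumps(facts, separators=(",", ":"), ensure_ascii=False)
    return json.dumps(facts, indent=2, ensure_ascii=False)


pretty = to_json_context(triples, compact=False)
compact = to_json_context(triples, compact=True)

# Character counts only; measure real tokens with your model's tokenizer.
print(len(pretty), len(compact))
```

The compact form carries the same facts, so it parses back to the identical list. The quotes and braces remain, which is why the other formats can still come out leaner.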

2. YAML: The Clean Middle

- subject: Sudeep
  relation: WORKS_AT
  object: Google
- subject: Google
  relation: INDUSTRY
  object: Tech
  • Pros: Much cleaner than JSON. Fewer "Decorative" characters.
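  • Cons: Relies on indentation. If the prompt gets mangled (for example, whitespace is stripped or collapsed during templating) or the context is cut off near the window limit, the structure can "Fade" and facts blur into each other.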
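One way to reduce the indentation risk is to generate the YAML in code with a fixed layout instead of assembling it by hand. The following is a minimal sketch under stated assumptions: `to_yaml_context` and `yaml_scalar` are hypothetical helpers, the quoting rule is deliberately conservative, and it relies on JSON string literals being acceptable as YAML double-quoted scalars (true for YAML 1.2). It is not a full YAML emitter.

```python
import json
import re

# A value is left unquoted only if it is a simple word or phrase.
_PLAIN = re.compile(r"[A-Za-z](?:[A-Za-z0-9_ ]*[A-Za-z0-9_])?")
# Words a YAML parser may read as booleans or null instead of strings.
_RESERVED = {"true", "false", "null", "yes", "no", "on", "off", "y", "n"}


def yaml_scalar(value):
    """Return the value as a plain scalar if safe, otherwise double-quoted."""
    text = str(value)
    if _PLAIN.fullmatch(text) and text.lower() not in _RESERVED:
        return text
    return json.dumps(text, ensure_ascii=False)


def to_yaml_context(facts):
    """Emit a list of mappings with a fixed two-space indent."""
    lines = []
    for fact in facts:
        for i, (key, value) in enumerate(fact.items()):
            prefix = "- " if i == 0 else "  "
            lines.append(f"{prefix}{key}: {yaml_scalar(value)}")
    return "\n".join(lines)


triples = [
    {"subject": "Sudeep", "relation": "WORKS_AT", "object": "Google"},
    {"subject": "Google", "relation": "INDUSTRY", "object": "Tech"},
]
print(to_yaml_context(triples))
```

Quoting only when needed keeps the output close to the clean form shown above, while names that contain a colon or look like a boolean do not silently change meaning.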

3. Markdown (Triplet Notation): The Token Champion

- (Sudeep) -[:WORKS_AT]-> (Google)
- (Google) -[:INDUSTRY]-> (Tech)
  • Pros: The most Token-Efficient. Uses symbols (-, ->, []) that the LLM has very likely already seen in Cypher queries and technical documentation.
  • Cons: Can be ambiguous if your entity names contain characters the notation itself uses, such as ->, parentheses, or brackets.
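The ambiguity problem can be handled by escaping entity names before they are placed in a line. The sketch below shows one possible convention, backslash-escaping; the function names and the choice of escaped characters are assumptions made for illustration, not an established standard. Whatever convention you pick, state it in the prompt so the model knows how to read it.

```python
# Characters that carry meaning in the arrow notation, plus the
# backslash itself so that escaping stays reversible.
_SPECIAL = set("\\()[]>")


def escape_name(name):
    """Backslash-escape notation characters inside an entity name."""
    return "".join("\\" + ch if ch in _SPECIAL else ch for ch in name)


def to_triplet_line(subj, rel, obj):
    """Format one fact as a Markdown list item in arrow notation."""
    return f"- ({escape_name(subj)}) -[:{rel}]-> ({escape_name(obj)})"


def to_triplet_context(triples):
    """Format (subject, relation, object) tuples, one fact per line."""
    return "\n".join(to_triplet_line(s, r, o) for s, r, o in triples)


print(to_triplet_context([
    ("Sudeep", "WORKS_AT", "Google"),
    ("Project [Internal]", "OWNED_BY", "Google"),
]))
```

Ordinary names pass through unchanged, so the common case costs nothing extra. Only names that collide with the notation pay for the extra backslashes.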

4. Comparison Table: Token Density

| Format | Approx. Tokens per 10 Facts | Reasoning Clarity |
|---|---|---|
| JSON | ~300 | High |
| YAML | ~220 | Medium-High |
| Markdown | ~150 | Highest |
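These figures are rough, and the real numbers depend on your entity names, key names, and tokenizer, so it is worth measuring on your own data. The sketch below is one way to set that up. The `compare_formats` helper is hypothetical, and by default it counts characters with `len`, which is only a proxy for tokens. Pass in your own model's token-counting function as `count` to get real numbers.

```python
import json


def compare_formats(triples, count=len):
    """Render the same facts in three formats and measure each one.

    `triples` is a list of (subject, relation, object) tuples.
    `count` maps a string to a size; the default `len` counts characters,
    NOT tokens. Swap in a real tokenizer-based counter for real figures.
    """
    as_json = json.dumps(
        [{"subj": s, "rel": r, "obj": o} for s, r, o in triples]
    )
    as_yaml = "\n".join(
        f"- subject: {s}\n  relation: {r}\n  object: {o}"
        for s, r, o in triples
    )
    as_markdown = "\n".join(
        f"- ({s}) -[:{r}]-> ({o})" for s, r, o in triples
    )
    return {
        "json": count(as_json),
        "yaml": count(as_yaml),
        "markdown": count(as_markdown),
    }


facts = [
    ("Sudeep", "WORKS_AT", "Google"),
    ("Google", "INDUSTRY", "Tech"),
]
print(compare_formats(facts))
```

Note that key names matter: the YAML sample uses longer keys (subject, relation, object) than the JSON sample (subj, rel, obj), so a fair comparison should hold the key names constant or account for the difference.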

The Winner: For most Graph RAG systems, Markdown Triplet Notation is the best default. Each line reads as a directed edge, so the "Flow" of the graph is visible in the text itself, which can make relationships easier for the LLM to follow.

graph LR
    subgraph "Token Efficiency"
    JSON[JSON: Expensive]
    YAML[YAML: Balanced]
    MD[Markdown: Lean]
    end
    
    style MD fill:#34A853,color:#fff
    style JSON fill:#f44336,color:#fff

5. Summary and Exercises

The syntax of your context defines the Economy and Accuracy of your system.

  • JSON is the safest but most expensive.
  • YAML is readable but fragile.
  • Markdown Triplet Notation is the leanest and the recommended default for most Graph RAG systems.
  • Consistency is key: Pick one format and use it across all your Few-Shot examples (Lesson 5).

Exercises

  1. Token Counting: Take 5 triplets and write them in all three formats. Use a tokenizer tool to see the difference in count.
  2. Edge Case: If your entity name contains a bracket (e.g., (Project [Internal])), how would you escape it in Markdown format?
  3. Visualization: Draw a relationship between "Sun" and "Earth" in Markdown style. Now, draw it in YAML style. Which looks more like a "Connection"?

Congratulations! You have completed Module 11: Prompting and Context Construction. You now know how to build the perfect "Information Package" for your AI.

In Module 12: Reasoning and Multi-Hop Inference, we will look at how to guide the AI as it parses this context to find deep answers.
