
Structured Context: YAML vs Markdown vs JSON for Graphs
The syntax of success. Explore the pros and cons of different data formats for presenting Knowledge Graph subgraphs to an LLM and find the one that balances token cost with reasoning accuracy.
How you package your graph facts matters as much as the facts themselves. When you send a subgraph to an LLM, you are sending a structured block of data. Some formats are noisy (heavy on curly braces and quotes), while others are fragile (relying on indentation that can get lost).
In this final lesson of Module 11, we will compare three popular formats for graph context: YAML, Markdown, and JSON. We will look at the token overhead of each and at how the format affects what the LLM pays attention to. By the end, you'll know which syntax to use for your specific RAG deployment.
1. JSON: The Machine Standard
[
{"subj": "Sudeep", "rel": "WORKS_AT", "obj": "Google"},
{"subj": "Google", "rel": "INDUSTRY", "obj": "Tech"}
]
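As a rough illustration, the array above can be produced with Python's standard `json` module. This is a minimal sketch: the `to_json_context` helper and the sample `triples` list are hypothetical names introduced here, and the character tally at the end measures characters, not tokens, so treat it only as a hint of how much of the payload is syntax.

```python
import json

# Hypothetical sample triples mirroring the example above.
triples = [
    {"subj": "Sudeep", "rel": "WORKS_AT", "obj": "Google"},
    {"subj": "Google", "rel": "INDUSTRY", "obj": "Tech"},
]

def to_json_context(facts, compact=True):
    """Serialize facts to JSON; compact mode drops optional whitespace."""
    if compact:
        return json.dumps(facts, separators=(",", ":"))
    return json.dumps(facts, indent=2)

compact = to_json_context(triples)
pretty = to_json_context(triples, compact=False)

# Count characters that are JSON syntax rather than graph content.
# (None of the sample values contain these characters themselves.)
structural = sum(compact.count(ch) for ch in '"{}[]:,')
print(compact)
print(f"{structural} of {len(compact)} characters are JSON syntax")
```

Dropping the optional whitespace with `separators=(",", ":")` is a cheap way to trim the payload without changing what the model has to parse.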
- Pros: Reliably parsed by models. Explicit delimiters leave no confusion about where a fact starts and ends.
- Cons: The most token-expensive. Every quote (") and brace ({) costs you money; about 30-40% of the tokens are structural "fluff".
2. YAML: The Clean Middle
- subject: Sudeep
  relation: WORKS_AT
  object: Google
- subject: Google
  relation: INDUSTRY
  object: Tech
- Pros: Much cleaner than JSON. Fewer decorative characters.
- Cons: Relies on indentation. If the prompt is mangled (for example, leading whitespace is stripped) or the block is truncated near the context-window limit, nothing marks which keys belong to which fact and the structure fades.
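The fragility is easy to reproduce. The sketch below uses a hand-written emitter (`to_yaml_context`, a hypothetical helper, not a library function) so that it stays self-contained; it assumes every value is a plain string that needs no YAML quoting. In practice a YAML library would usually do the emitting.

```python
def to_yaml_context(facts):
    """Emit a YAML sequence of mappings with explicit two-space indentation.

    Assumes values are simple strings with no YAML-special characters
    (no colons, quotes, or leading dashes); real data would need quoting.
    """
    lines = []
    for fact in facts:
        for i, (key, value) in enumerate(fact.items()):
            # The first key of each fact opens the list item; the rest are indented.
            prefix = "- " if i == 0 else "  "
            lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines)

triples = [
    {"subject": "Sudeep", "relation": "WORKS_AT", "object": "Google"},
    {"subject": "Google", "relation": "INDUSTRY", "object": "Tech"},
]

yaml_block = to_yaml_context(triples)
print(yaml_block)

# Simulate a pipeline step that strips leading whitespace from every line:
# continuation lines lose the indent that ties them to their "- " item.
mangled = "\n".join(line.lstrip() for line in yaml_block.splitlines())
print(mangled)
```

After the strip, `relation:` and `object:` sit at column zero, and the text no longer forms a valid sequence of mappings.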
3. Markdown (Triplet Notation): The Token Champion
- (Sudeep) -[:WORKS_AT]-> (Google)
- (Google) -[:INDUSTRY]-> (Tech)
- Pros: The most token-efficient. Uses symbols (-, ->, []) that the LLM already understands from being trained on Cypher and technical documentation.
- Cons: Can be ambiguous if your entity names contain special characters like ->.
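A renderer for this notation fits in a few lines. The sketch below is one possible shape, with hypothetical names (`RESERVED`, `has_reserved`, `to_triplet_lines`); rather than choosing an escaping scheme, it simply refuses entity names that contain the notation's own symbols, so ambiguous facts are caught before they reach the prompt.

```python
# Symbols the notation itself relies on; an entity name containing one
# of these would be hard to tell apart from the surrounding syntax.
RESERVED = ("(", ")", "[", "]", "->")

def has_reserved(name):
    """Return True if the name contains a symbol the notation uses."""
    return any(token in name for token in RESERVED)

def to_triplet_lines(facts):
    """Render (subject, relation, object) tuples as '- (s) -[:REL]-> (o)' lines."""
    lines = []
    for subj, rel, obj in facts:
        if has_reserved(subj) or has_reserved(obj):
            raise ValueError(f"ambiguous entity name in fact: {(subj, rel, obj)!r}")
        lines.append(f"- ({subj}) -[:{rel}]-> ({obj})")
    return "\n".join(lines)

facts = [("Sudeep", "WORKS_AT", "Google"), ("Google", "INDUSTRY", "Tech")]
rendered = to_triplet_lines(facts)
print(rendered)
```

Raising an error is a deliberately conservative choice for a sketch; a production pipeline might escape or rename the offending entity instead.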
4. Comparison Table: Token Density
| Format | Tokens per 10 Facts | Reasoning Clarity |
|---|---|---|
| JSON | ~300 | High |
| YAML | ~220 | Medium-High |
| Markdown | ~150 | Highest |
The Winner: For most Graph RAG systems, Markdown triplet notation is the superior choice. Each line reads as a directed edge from subject to object, so the flow of the graph is visible in the text itself, which can make chains of relationships easier for the LLM to follow.
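The figures in the table are approximate, and real counts depend on the tokenizer of the model you target, so it is worth measuring on your own data. The harness below is a minimal sketch with hypothetical helper names (`as_json`, `as_yaml`, `as_markdown`, `compare`). It accepts any counting function; the default, `len`, counts characters rather than tokens and is only a stand-in until you pass a counter built on your model's tokenizer.

```python
import json

def as_json(facts):
    return json.dumps([{"subj": s, "rel": r, "obj": o} for s, r, o in facts])

def as_yaml(facts):
    # Hand-written emitter; assumes values need no YAML quoting.
    return "\n".join(
        f"- subject: {s}\n  relation: {r}\n  object: {o}" for s, r, o in facts
    )

def as_markdown(facts):
    return "\n".join(f"- ({s}) -[:{r}]-> ({o})" for s, r, o in facts)

def compare(facts, count=len):
    """Apply `count` to each rendering of the same facts.

    The default, len, counts characters, NOT tokens. Pass a function
    that wraps your model's tokenizer to get real token counts.
    """
    renderers = {"json": as_json, "yaml": as_yaml, "markdown": as_markdown}
    return {name: count(render(facts)) for name, render in renderers.items()}

facts = [("Sudeep", "WORKS_AT", "Google"), ("Google", "INDUSTRY", "Tech")]
sizes = compare(facts)
for name, size in sizes.items():
    print(f"{name:<8} {size}")
```

Because all three renderings come from the same list of facts, any difference in the result is due to the format alone.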
graph LR
subgraph "Token Efficiency"
JSON[JSON: Expensive]
YAML[YAML: Balanced]
MD[Markdown: Lean]
end
style MD fill:#34A853,color:#fff
style JSON fill:#f44336,color:#fff
5. Summary and Exercises
The syntax of your context shapes both the cost and the accuracy of your system.
- JSON is the safest but most expensive.
- YAML is readable but fragile.
- Markdown triplet notation is the leanest option and the recommended default for most Graph RAG systems.
- Consistency is key: Pick one format and use it across all your Few-Shot examples (Lesson 5).
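One way to enforce that consistency is to route every block of facts, few-shot and live alike, through a single renderer. The sketch below assumes a simple Context / Question / Answer prompt layout; that layout and the names `render` and `build_prompt` are illustrative assumptions, not something prescribed by this course.

```python
def render(facts):
    """Single source of truth for the context format (triplet notation here)."""
    return "\n".join(f"- ({s}) -[:{r}]-> ({o})" for s, r, o in facts)

def build_prompt(examples, context_facts, question):
    """Assemble a few-shot prompt.

    examples: list of (facts, question, answer) tuples.
    Every block of facts goes through render(), so the few-shot
    examples and the live context always share one format.
    """
    parts = []
    for facts, q, a in examples:
        parts.append(f"Context:\n{render(facts)}\nQuestion: {q}\nAnswer: {a}")
    # The live query ends with an open "Answer:" for the model to complete.
    parts.append(f"Context:\n{render(context_facts)}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)

examples = [
    (
        [("Sudeep", "WORKS_AT", "Google"), ("Google", "INDUSTRY", "Tech")],
        "Which industry does Sudeep work in?",
        "Tech",
    )
]
prompt = build_prompt(examples, [("Earth", "ORBITS", "Sun")], "What does Earth orbit?")
print(prompt)
```

Switching formats later then means changing `render` in one place instead of hunting through hand-written examples.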
Exercises
- Token Counting: Take 5 triplets and write them in all three formats. Use a tokenizer tool to see the difference in count.
- Edge Case: If your entity name contains a bracket (e.g., (Project [Internal])), how would you escape it in Markdown format?
- Visualization: Draw a relationship between "Sun" and "Earth" in Markdown style. Now, draw it in YAML style. Which looks more like a "Connection"?
Congratulations! You have completed Module 11: Prompting and Context Construction. You now know how to build the perfect "Information Package" for your AI.
In Module 12: Reasoning and Multi-Hop Inference, we will look at how to guide the AI as it parses this context to find deep answers.