
Reducing Irrelevant Context
Master techniques to strip noise and maintain high-density information for your LLM generation step.
"Garbage in, garbage out" applies perfectly to RAG. Even if you retrieve the right document, if it's surrounded by 2,000 words of legal boilerplate or website headers, the LLM might hallucinate or fail to see the relevant details.
The Goal: High Information Density
Your objective is to provide the LLM with the maximum amount of relevant information using the minimum number of tokens.
Technique 1: Post-Retrieval Scraping
If you retrieve a chunk from a web page, don't send the full HTML. Use a library like BeautifulSoup or Trafilatura to extract only the narrative text.
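As a minimal sketch of the idea using only the Python standard library (in practice you would reach for Trafilatura or BeautifulSoup), the parser below keeps visible narrative text and drops scripts, styles, and common navigation boilerplate:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style and nav boilerplate tags."""

    SKIP_TAGS = {"script", "style", "header", "footer", "nav"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside a tag we want to ignore
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return only the narrative text of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

Dedicated libraries handle edge cases (malformed HTML, boilerplate detection heuristics) far better, but the principle is the same: the LLM should never see `<div class="cookie-banner">`.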
Technique 2: LLM-Based Summarization
Before sending retrieved chunks to your final prompt, run a fast, cheap model (like Claude 3 Haiku) to summarize them.
```python
# Ask the cheap model to keep only what matters for this query.
summary_prompt = (
    f"Summarize the following document focusing only on facts "
    f"related to {user_query}: {retrieved_doc}"
)
```
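Wrapped as a helper, the pattern looks like this. The `llm` argument here is a stand-in for any callable around your cheap summarization model (an assumption, not a specific SDK):

```python
def summarize_for_query(retrieved_doc: str, user_query: str, llm) -> str:
    """Query-focused summarization: ask a fast, cheap model to keep
    only the facts relevant to the user's question.

    `llm` is any callable taking a prompt string and returning text.
    """
    prompt = (
        "Summarize the following document, focusing only on facts "
        f"related to the question: {user_query}\n\n"
        f"Document:\n{retrieved_doc}"
    )
    return llm(prompt)
```

Run this per retrieved chunk before assembling the final prompt; the latency cost is real (one extra model call per chunk), which is why the table below rates it "Very High" gain but "High" latency.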
Technique 3: Contextual Compression
This is a LangChain feature where you use an embedding model to "compress" a document by keeping only the sentences that are semantically close to the query.
```python
from langchain.retrievers.document_compressors import EmbeddingsFilter

# Keep only the sentences whose embedding similarity to the query
# clears the threshold; everything else is dropped.
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
compressed_docs = embeddings_filter.compress_documents(docs, query)
```
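If you are not using LangChain, the same idea fits in a few lines: embed the query and each sentence, then keep only sentences above a cosine-similarity threshold. The `embed` parameter below is a stand-in for any embedding model, not a specific API:

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def compress(sentences, query, embed, threshold=0.76):
    """Keep only sentences semantically close to the query.

    `embed` is any callable mapping text -> vector (a stand-in for
    your real embedding model).
    """
    q_vec = embed(query)
    return [s for s in sentences if cosine(embed(s), q_vec) >= threshold]
```

The threshold is the key tuning knob: too high and you drop supporting context, too low and the noise comes back.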
Technique 4: Removing Redundancy
If retrieval returns 5 paragraphs that all say "The price is $50", remove 4 of them. Redundant context wastes tokens and can confuse the model.
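A simple way to catch near-duplicates without another model call is word-level Jaccard similarity against the chunks you have already kept (a rough sketch; production systems often use embedding similarity or MinHash instead):

```python
def deduplicate(chunks, threshold=0.9):
    """Drop chunks whose word overlap (Jaccard similarity) with an
    already-kept chunk meets or exceeds the threshold."""
    kept = []
    for chunk in chunks:
        words = set(chunk.lower().split())
        is_dup = any(
            len(words & set(k.lower().split()))
            / max(len(words | set(k.lower().split())), 1)
            >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(chunk)
    return kept
```

Order matters: because the first occurrence wins, sort chunks by retrieval score first so you keep the highest-ranked copy of each fact.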
Measuring Success
Use the Context Precision metric. This measures the signal-to-noise ratio in your retrieved chunks.
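In its simplest, unweighted form, context precision is just the fraction of retrieved chunks that are actually relevant (evaluation libraries such as RAGAS compute a rank-aware variant, but the intuition is the same):

```python
def context_precision(retrieved_chunks, is_relevant):
    """Fraction of retrieved chunks that are relevant to the query.

    `is_relevant` is a judge callable (human label or LLM grader)
    returning True/False for each chunk.
    """
    if not retrieved_chunks:
        return 0.0
    relevant = sum(1 for c in retrieved_chunks if is_relevant(c))
    return relevant / len(retrieved_chunks)
```

Track this number before and after applying the techniques above; if cleanup is working, precision rises while total token count falls.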
| Strategy | Performance Gain | Latency Hit |
|---|---|---|
| Raw context (no reduction) | Baseline | 0 ms |
| Header/Footer Stripping | High | Low |
| LLM Summarization | Very High | High |
| Semantic Filtering | Medium | Medium |
Exercises
- Why is "Noise" in retrieval more dangerous than having no information at all?
- Write a function that removes all lines from a text chunk that contain fewer than 5 words (often headers or noise).
- How can you use "Metadata" to decide whether to skip a retrieved document entirely?