
Reducing Irrelevant Context
Master techniques to strip noise and maintain high-density information for your LLM generation step.
"Garbage in, garbage out" applies perfectly to RAG. Even if you retrieve the right document, if it's surrounded by 2,000 words of legal boilerplate or website headers, the LLM might hallucinate or fail to see the relevant details.
The Goal: High Information Density
Your objective is to provide the LLM with the maximum amount of relevant information using the minimum number of tokens.
Technique 1: Post-Retrieval Scraping
If you retrieve a chunk from a web page, don't send the full HTML. Use a library like BeautifulSoup or Trafilatura to extract only the narrative text.
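As a minimal sketch of the idea using only the Python standard library (in practice you would reach for Trafilatura or BeautifulSoup), the parser below keeps visible narrative text and drops scripts, styles, and common navigation boilerplate:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping script/style and nav boilerplate tags."""

    SKIP_TAGS = {"script", "style", "header", "footer", "nav"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside a tag we want to ignore
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return only the narrative text of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
```

Dedicated libraries handle edge cases (malformed HTML, boilerplate detection heuristics) far better, but the principle is the same: the LLM should never see `<div class="cookie-banner">`.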
Technique 2: LLM-Based Summarization
Before sending retrieved chunks to your final prompt, run a fast, cheap model (like Claude 3 Haiku) to summarize them.
```python
# Ask the cheap model to keep only what matters for this query.
summary_prompt = (
    f"Summarize the following document focusing only on facts "
    f"related to {user_query}: {retrieved_doc}"
)
```
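Wrapped as a helper, the pattern looks like this. The `llm` argument here is a stand-in for any callable around your cheap summarization model (an assumption, not a specific SDK):

```python
def summarize_for_query(retrieved_doc: str, user_query: str, llm) -> str:
    """Query-focused summarization: ask a fast, cheap model to keep
    only the facts relevant to the user's question.

    `llm` is any callable taking a prompt string and returning text.
    """
    prompt = (
        "Summarize the following document, focusing only on facts "
        f"related to the question: {user_query}\n\n"
        f"Document:\n{retrieved_doc}"
    )
    return llm(prompt)
```

Run this per retrieved chunk before assembling the final prompt; the latency cost is real (one extra model call per chunk), which is why the table below rates it "Very High" gain but "High" latency.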
Technique 3: Contextual Compression
This is a LangChain feature where you use an embedding model to "compress" a document by keeping only the sentences that are semantically close to the query.
```python
from langchain.retrievers.document_compressors import EmbeddingsFilter

# Keep only the sentences whose embedding similarity to the query
# clears the threshold; everything else is dropped.
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
compressed_docs = embeddings_filter.compress_documents(docs, query)
```
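If you are not using LangChain, the same idea fits in a few lines: embed the query and each sentence, then keep only sentences above a cosine-similarity threshold. The `embed` parameter below is a stand-in for any embedding model, not a specific API:

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def compress(sentences, query, embed, threshold=0.76):
    """Keep only sentences semantically close to the query.

    `embed` is any callable mapping text -> vector (a stand-in for
    your real embedding model).
    """
    q_vec = embed(query)
    return [s for s in sentences if cosine(embed(s), q_vec) >= threshold]
```

The threshold is the key tuning knob: too high and you drop supporting context, too low and the noise comes back.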
Technique 4: Removing Redundancy
If retrieval returns 5 paragraphs that all say "The price is $50", remove 4 of them. Redundant context wastes tokens and can confuse the model.
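A simple way to catch near-duplicates without another model call is word-level Jaccard similarity against the chunks you have already kept (a rough sketch; production systems often use embedding similarity or MinHash instead):

```python
def deduplicate(chunks, threshold=0.9):
    """Drop chunks whose word overlap (Jaccard similarity) with an
    already-kept chunk meets or exceeds the threshold."""
    kept = []
    for chunk in chunks:
        words = set(chunk.lower().split())
        is_dup = any(
            len(words & set(k.lower().split()))
            / max(len(words | set(k.lower().split())), 1)
            >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(chunk)
    return kept
```

Order matters: because the first occurrence wins, sort chunks by retrieval score first so you keep the highest-ranked copy of each fact.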
Measuring Success
Use the Context Precision metric. This measures the signal-to-noise ratio in your retrieved chunks.
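In its simplest, unweighted form, context precision is just the fraction of retrieved chunks that are actually relevant (evaluation libraries such as RAGAS compute a rank-aware variant, but the intuition is the same):

```python
def context_precision(retrieved_chunks, is_relevant):
    """Fraction of retrieved chunks that are relevant to the query.

    `is_relevant` is a judge callable (human label or LLM grader)
    returning True/False for each chunk.
    """
    if not retrieved_chunks:
        return 0.0
    relevant = sum(1 for c in retrieved_chunks if is_relevant(c))
    return relevant / len(retrieved_chunks)
```

Track this number before and after applying the techniques above; if cleanup is working, precision rises while total token count falls.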
| Strategy | Performance Gain | Latency Hit |
|---|---|---|
| Raw context (no reduction) | Baseline | 0 ms |
| Header/Footer Stripping | High | Low |
| LLM Summarization | Very High | High |
| Semantic Filtering | Medium | Medium |
Exercises
- Why is "Noise" in retrieval more dangerous than having no information at all?
- Write a function that removes all lines from a text chunk that contain fewer than 5 words (often headers or noise).
- How can you use "Metadata" to decide whether to skip a retrieved document entirely?