
Text Embeddings
Master the fundamentals of text-to-vector transformation, model selection, and vector space theory.
Embeddings are the core of RAG. They convert human language into numerical vectors (arrays of floating-point numbers) so that pieces of text with similar meanings end up close to each other in high-dimensional space.
How They Work
An embedding model takes a string of text and outputs a vector (e.g., 1536 numbers for OpenAI's text-embedding-3-small).
```python
# Conceptual example
vector = model.embed("What is RAG?")
# Output: [0.012, -0.045, 0.231, ...]
```
Key Properties
- Semantic Density: Unlike keyword search (which matches exact terms), embeddings capture the "idea" of a sentence, so "car" and "automobile" land close together.
- Cosine Similarity: The primary way we measure "closeness" between two vectors; it compares the angle between them rather than their magnitude (see the sketch after this list).
- Fixed Dimension: Every output from a given model has the same number of dimensions, regardless of input length.
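A minimal sketch of cosine similarity using NumPy. The toy 3-dimensional vectors are illustrative only; real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity([0.1, 0.8, 0.3], [0.2, 0.7, 0.4]))  # close to 1.0
```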
Choosing a Text Embedding Model
| Model | Provider | Dims | Key Strength |
|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | Cost & Efficiency |
| titan-embed-text-v2 | AWS | 1024 | Cloud Integrated |
| bge-small-en-v1.5 | Open Source | 384 | Speed (Local) |
| voyage-2 | Voyage AI | 1024 | Retrieval Accuracy |
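For the open-source row, here is a hedged sketch of embedding text locally. It assumes the sentence-transformers package is installed; "BAAI/bge-small-en-v1.5" is the model ID published on Hugging Face:

```python
from sentence_transformers import SentenceTransformer

# Downloads the model weights on first run; everything runs locally afterwards
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

vectors = model.encode(["What is RAG?", "Retrieval-Augmented Generation explained"])
print(vectors.shape)  # (2, 384) -- matches the 384 dims in the table above
```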
The MTEB Benchmark
If you are looking for the "best" model, refer to the Massive Text Embedding Benchmark (MTEB) leaderboard on Hugging Face. It ranks models based on their performance across retrieval, clustering, and classification tasks.
Practical Implementation (OpenAI)
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_embedding(text: str, model: str = "text-embedding-3-small") -> list[float]:
    # Newlines can degrade embedding quality, so flatten them to spaces
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding
```
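A quick usage sketch, reusing the cosine_similarity helper defined earlier (assumes a valid API key is set):

```python
v1 = get_embedding("What is RAG?")
v2 = get_embedding("Explain Retrieval-Augmented Generation")

print(len(v1))                    # 1536 for text-embedding-3-small
print(cosine_similarity(v1, v2))  # semantically similar, so expect a high score
```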
Exercises
- Compare the word "Apple" (the fruit) and "Apple" (the company) in vector space using two different sentences (a starter sketch follows this list).
- What happens to the embedding if you change a single word to its synonym?
- Why is it important to use the same model for both ingestion and querying?
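A starter sketch for the first exercise, reusing get_embedding and cosine_similarity from above. The three sentences are only examples; try your own:

```python
fruit = get_embedding("I ate a crisp apple with my lunch.")
company = get_embedding("Apple announced a new iPhone today.")
tech = get_embedding("Microsoft released a new laptop this week.")

# If the model captures context, the company sentence should sit closer
# to the tech sentence than to the fruit sentence.
print(cosine_similarity(company, tech))
print(cosine_similarity(company, fruit))
```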