Module 10 Lesson 2: Embeddings with Local Models
Turning words into math. Understanding the 'Embeddings' that power local semantic search.
Embeddings: The AI's Index
To build a RAG system, we need to solve a math problem: How does a computer "know" that the word "Puppy" is related to the word "Dog"?
Traditional search (Control + F) looks for exact spelling. If you search for "Canine," it won't find the word "Dog." Embeddings solve this by turning words into coordinates in a multidimensional map.
1. What is a Vector?
Imagine a 3D map:
- X-axis: Small vs Large.
- Y-axis: Feline vs Canine.
- Z-axis: Domestic vs Wild.
The word "Cat" might be at coordinate [1, 5, 2].
The word "Kitten" might be at [0.9, 4.9, 2.1].
Because those two lists of numbers are "close" to each other in math space, the computer knows they are semantically related.
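To make "closeness" concrete, here is a minimal Python sketch using those toy coordinates. The "Lion" coordinates are invented for illustration:

import math

cat = [1.0, 5.0, 2.0]      # "Cat" on our toy 3-axis map
kitten = [0.9, 4.9, 2.1]   # "Kitten" sits almost on top of it
lion = [8.0, 5.0, 9.0]     # invented "Lion": large, feline, wild

def distance(a, b):
    # Euclidean distance: smaller means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(distance(cat, kitten))  # ~0.17 -> very close, semantically related
print(distance(cat, lion))    # ~9.9  -> far apart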
2. Using Ollama as an "Embedding Engine"
Most people use Ollama for chat (Generation). But it is also a world-class "Embedder."
API Example:
curl http://localhost:11434/api/embeddings -d '{
  "model": "mxbai-embed-large",
  "prompt": "The dog is in the garden."
}'
The response will be a long list of numbers. The length depends on the model: mxbai-embed-large returns a 1,024-dimension vector, while a full LLM like Llama 3 returns 4,096.
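The same request can be made from code. Below is a minimal Python sketch, assuming Ollama is running locally and the requests library is installed (pip install requests):

import requests

def embed(text, model="mxbai-embed-large"):
    # Ollama's embeddings endpoint returns {"embedding": [floats]}
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

vector = embed("The dog is in the garden.")
print(len(vector))  # 1024 for mxbai-embed-large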
3. Picking an Embedding Model
You can use Llama 3 for embeddings, but it's like using a Ferrari to deliver mail: too big and too slow. Instead, use specialized, tiny models:
- mxbai-embed-large: (~670 MB) The current state of the art for local embeddings.
- nomic-embed-text: Excellent for long documents.
- all-minilm: Tiny and extremely fast.
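To feel the size-versus-speed tradeoff yourself, here is a rough Python sketch that times one embedding per model. It assumes all three models have already been pulled (ollama pull <name>), and the first request to each model also includes its load time:

import time
import requests

MODELS = ["mxbai-embed-large", "nomic-embed-text", "all-minilm"]

for model in MODELS:
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": "The dog is in the garden."},
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    print(f"{model}: {len(resp.json()['embedding'])} dims in {elapsed:.2f}s")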
4. The Workflow of Search
- Ingestion: You turn all the sentences in your PDF into "Vectors" and save them.
- Querying: When a user asks a question, you turn the question into a vector.
- Distance math: You find the PDF vector that is "Closest" to the Question vector.
This "Closeness" is usually measured with Cosine Similarity, which compares the angle between the two vectors rather than their raw distance, as shown in the sketch below.
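Putting the three steps together, this minimal Python sketch builds a tiny in-memory index and ranks it against a question. It assumes Ollama is running with mxbai-embed-large pulled; the sample sentences are invented:

import requests

def embed(text, model="mxbai-embed-large"):
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 = same direction
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# 1. Ingestion: embed every sentence once and store the vectors.
sentences = [
    "The dog is in the garden.",
    "Stocks fell sharply on Monday.",
    "My cat sleeps all day.",
]
index = [(s, embed(s)) for s in sentences]

# 2. Querying: embed the question with the SAME model.
query = embed("Where is the canine?")

# 3. Distance math: rank stored vectors by similarity to the query.
ranked = sorted(index, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
print(ranked[0][0])  # expected: "The dog is in the garden."

For real documents, you would save the vectors at ingestion time (step 1) rather than recomputing them on every query.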
Key Takeaways
- Embeddings turn human meaning into numeric vectors.
- Semantic Search allows AI to find "related ideas" even if the spelling is different.
- Ollama provides small, specialized Embedding Models (mxbai, nomic) for high-speed indexing.
- Local embeddings ensure that the "Fingerprint" of your data stays on your machine.