What are Embeddings? Mapping Meaning to Coordinates

Deep dive into the core engine of vector search. Learn what embeddings are, how they represent conceptual space, and why 'Dimension' is the most important word in AI infrastructure.

What are Embeddings?

Welcome to Module 2: Embeddings Fundamentals. In the previous module, we treated embeddings as a "black box" that turns words into numbers. Now, we pull back the curtain.

An Embedding is a representation of data as a point in a continuous, high-dimensional vector space. It is the result of mapping a high-level concept (like a sentence or an image) into a numerical format that a computer can perform arithmetic on.

In this lesson, we will explore the intuition behind these "meaning-coordinates" and why they are the breakthrough that enabled modern Large Language Models (LLMs).


1. The Intuition: From One Dimension to Many

To understand embeddings, let's start with a simple 1D example: Temperature. We can map words to a 1D line based on how "Hot" they are:

  • "Freezing" -> 0.0
  • "Chilly" -> 0.2
  • "Warm" -> 0.7
  • "Boiling" -> 1.0

In this 1D space, calculating similarity is easy. "Chilly" is closer to "Freezing" than it is to "Boiling."
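
To see that in code, here is a tiny sketch using the toy 1D values above (the numbers are illustrative, not produced by any real model):

temps = {"Freezing": 0.0, "Chilly": 0.2, "Warm": 0.7, "Boiling": 1.0}

# In one dimension, "distance" is just the absolute difference between values
print(abs(temps["Chilly"] - temps["Freezing"]))  # 0.2 -> close
print(abs(temps["Chilly"] - temps["Boiling"]))   # 0.8 -> far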

The Problem: A Single Dimension Is Too Simple

But human language is complex. A word like "Apple" can't be mapped on a single line. Is it a fruit? A tech company? A red object? A round object?

To capture the "meaning" of "Apple," we need more dimensions.

  • Dimension 1: Is it Food? (0 to 1)
  • Dimension 2: Is it Tech? (0 to 1)
  • Dimension 3: Is it Red? (0 to 1)

Now, "Apple" becomes a vector: [0.9, 0.8, 0.9] (It's food, it's tech, it's red). An "Orange" might be: [0.9, 0.1, 0.0] (It's food, it's NOT tech, it's NOT red).

Conceptual vector space (toy values):

  • Apple: [0.9, 0.8, 0.9]
  • Orange: [0.9, 0.1, 0.0]
  • Microsoft: [0.1, 0.9, 0.2]

In this 3D space, "Apple" is close to "Orange" in the Food dimension, but close to "Microsoft" in the Tech dimension. The more dimensions we add, the more nuanced the meaning becomes.
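
To make that concrete, here is a small sketch comparing the toy vectors above dimension by dimension (the values are illustrative; a real model learns hundreds of dimensions on its own):

import numpy as np

# Toy dimensions: [is_food, is_tech, is_red]
apple     = np.array([0.9, 0.8, 0.9])
orange    = np.array([0.9, 0.1, 0.0])
microsoft = np.array([0.1, 0.9, 0.2])

# Per-dimension differences show where two concepts agree or disagree
print(np.abs(apple - orange))     # [0.  0.7 0.9] -> agree on "food", differ on "tech" and "red"
print(np.abs(apple - microsoft))  # [0.8 0.1 0.7] -> agree on "tech", differ elsewhere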


2. Sparse vs. Dense Embeddings

Historically, we used Sparse Embeddings (like One-Hot Encoding).

Sparse Embeddings (The Old Way)

If you have a vocabulary of 10,000 words, you represent "Cat" as a list of 10,000 numbers, where every number is 0 except for the one corresponding to "Cat" which is 1.

  • Cat = [0, 0, 1, 0, ... 0]

Problem: Every word is equidistant from every other word. The vector for "Cat" tells you nothing about its relationship to "Kitten."
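
A quick sketch of why that is a problem, using a toy 5-word vocabulary in place of the 10,000-word one (the word order in the vocabulary is made up):

import numpy as np

# One-hot vectors for a toy 5-word vocabulary
cat    = np.array([1, 0, 0, 0, 0])
kitten = np.array([0, 1, 0, 0, 0])
car    = np.array([0, 0, 1, 0, 0])

# Every pair of distinct one-hot vectors is exactly the same distance apart,
# so the representation says nothing about "Cat" being related to "Kitten"
print(np.linalg.norm(cat - kitten))  # 1.414...
print(np.linalg.norm(cat - car))     # 1.414...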

Dense Embeddings (The Modern Way)

Modern embeddings are Dense. Every word is represented by a fixed-size vector of floating-point numbers (e.g., 768 or 1536 dimensions). Every position in that vector captures some abstract "feature" of the data.

We don't know exactly what "Dimension 45" represents—it might be "femininity," "liquid state," or "historical significance"—but the model has learned these patterns from billions of pages of text.


3. The Power of Vector Arithmetic

One of the most famous examples in AI history is the Word2Vec calculation. If the embedding model is well-trained, you can perform math on the vectors to discover relationships:

King - Man + Woman = Queen

If you take the vector for "King," subtract the vector for "Man," and add the vector for "Woman," the resulting coordinates will be closer to the vector for "Queen" than any other word in the database.

This suggests that the model hasn't just memorized words; it has captured the relational logic of concepts.
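
Here is a minimal sketch of the idea with hand-made 2D vectors, where dimension 1 loosely means "royalty" and dimension 2 loosely means "maleness" (the values are illustrative, not taken from a real Word2Vec model):

import numpy as np
from scipy.spatial.distance import cosine

# Toy 2D vectors: [royalty, maleness]
vocab = {
    "king":  np.array([0.9, 0.9]),
    "queen": np.array([0.9, 0.1]),
    "man":   np.array([0.1, 0.9]),
    "woman": np.array([0.1, 0.1]),
}

result = vocab["king"] - vocab["man"] + vocab["woman"]  # -> [0.9, 0.1]

# Find the vocabulary word whose vector is closest to the result
closest = min(vocab, key=lambda word: cosine(result, vocab[word]))
print(closest)  # queen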


4. How Models "Learn" Embeddings

Embeddings are not manually defined by humans. They are learned through a process called Self-Supervised Learning.

A model is given a sentence with a word missing:
"The quick brown [MASK] jumps over the lazy dog."

The model tries to guess the missing word. To get better at guessing, it has to learn that "fox", "dog", and "cat" are interchangeable in this context. Therefore, their vectors start to move closer together in the high-dimensional space.
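
You can watch this guessing game in action with the Hugging Face transformers library (a sketch that assumes the library is installed and the bert-base-uncased model can be downloaded):

from transformers import pipeline

# BERT was trained on exactly this "fill in the blank" objective
unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("The quick brown [MASK] jumps over the lazy dog."):
    # Each prediction contains the guessed token and the model's confidence
    print(prediction["token_str"], round(prediction["score"], 3))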


5. Python Example: Visualizing Vector Closeness

Let's use the scipy library to calculate the distance between four concepts to see how our intuition matches the math.

import numpy as np
from scipy.spatial.distance import cosine

# Let's pretend we have 3-dimensional embeddings for 4 items:
# Format: [is_animal, is_pet, is_wild]
cat  = np.array([1.0, 1.0, 0.1])
dog  = np.array([1.0, 0.9, 0.2])
wolf = np.array([1.0, 0.1, 0.9])
car  = np.array([0.0, 0.0, 0.0])  # no overlap with any of our dimensions

def print_similarity(a_name, b_name, a_vec, b_vec):
    # Similarity = 1 - Distance
    # (The lower the distance, the higher the similarity)
    # Cosine is undefined for an all-zero vector, so treat that case as 0 similarity.
    if not np.any(a_vec) or not np.any(b_vec):
        sim = 0.0
    else:
        sim = 1 - cosine(a_vec, b_vec)
    print(f"Similarity between {a_name} and {b_name}: {sim:.4f}")

print_similarity("Cat", "Dog", cat, dog)
print_similarity("Cat", "Wolf", cat, wolf)
print_similarity("Dog", "Wolf", dog, wolf)
print_similarity("Cat", "Car", cat, car)

What we learn from this:

  • Cat and Dog are highly similar (~0.99) because they are both domestic animals.
  • Cat and Wolf are only somewhat similar (~0.62): they are both animals, but their "wild" vs "pet" dimensions pull them apart.
  • Cat and Car show zero similarity because they share no conceptual overlap (cosine similarity is undefined for an all-zero vector, which is why the helper treats that case as 0).

6. Embedding Models You Should Know

In this course, we will primarily use three types of models:

  1. Proprietary Models (OpenAI/Google):
    • Example: text-embedding-3-small (1536 dimensions).
    • Pros: Extremely high quality, easy API access.
    • Cons: Costly at scale, data ownership concerns.
  2. Open Source Models (HuggingFace):
    • Example: BAAI/bge-small-en-v1.5 (see the sketch after this list).
    • Pros: Free to use, can run locally (Ollama), private.
    • Cons: Requires your own compute (CPU/GPU).
  3. Multimodal Models (CLIP):
    • These models bridge the gap between images and text.
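
For the open-source route, here is a minimal sketch using the sentence-transformers library with the BAAI/bge-small-en-v1.5 model mentioned above (it assumes the library is installed and the model weights can be downloaded):

from sentence_transformers import SentenceTransformer

# Download (on the first run) and load a small open-source embedding model
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

embedding = model.encode("How do I bake a cake?")
print(embedding.shape)  # (384,) -- this model produces 384-dimensional vectors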

7. The Workflow of an Embedding

To visualize the workflow in your code:

  1. Input: "How do I bake a cake?"
  2. Tokenizer: Breaks input into tokens [How, do, I, bake, a, cake].
  3. Transformer: Contextualizes the tokens.
  4. Pooling: Averages the token vectors into a single Sentence Embedding.
  5. Output: [-0.012, 0.554, ... 1536 times ...]

This final output is what you save into your vector database.
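
Steps 2-4 look roughly like this with the transformers library (a sketch that assumes the sentence-transformers/all-MiniLM-L6-v2 model is available; it produces 384-dimensional vectors rather than 1536):

import torch
from transformers import AutoTokenizer, AutoModel

model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# 1-2. Input and tokenization
inputs = tokenizer("How do I bake a cake?", return_tensors="pt")

# 3. The transformer contextualizes each token
with torch.no_grad():
    token_vectors = model(**inputs).last_hidden_state  # shape: (1, num_tokens, 384)

# 4. Mean pooling: average the token vectors (ignoring padding) into one sentence embedding
mask = inputs["attention_mask"].unsqueeze(-1)
sentence_embedding = (token_vectors * mask).sum(dim=1) / mask.sum(dim=1)

# 5. Output: a single vector you can store in your vector database
print(sentence_embedding.shape)  # torch.Size([1, 384])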


Summary and Key Takeaways

Embeddings are the DNA of the AI stack. Without them, we are back to simple string matching.

  • Embeddings are coordinates in a high-dimensional space.
  • Distance = Relationship: Points that are close together share similar meanings.
  • Dimensionality matters: More dimensions capture more nuance but cost more to compute and store.
  • Models learn relationships by training on massive datasets to predict context.

In the next lesson, we will look specifically at How Text Embeddings Work, exploring the difference between words, sub-words, and how a whole paragraph is squashed into a single vector.


Exercise: Conceptual Dimensions

Think of the word "Coffee."

  1. If you had to describe "Coffee" using only 5 values (0.0 to 1.0), what would those 5 dimensions be? (e.g., Temperature, Caffeine Level, Color, Price, Health).
  2. Assign values to "Coffee," "Green Tea," and "Water" based on those 5 dimensions.
  3. Which two are mathematically "closer"?

This exercise simulates exactly what a neural network does, just at a much smaller scale (real models use 1536+ dimensions).
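
If you want to check your answer numerically, here is a small sketch you can adapt (the dimension names and values are placeholders; substitute your own from step 2):

import numpy as np
from scipy.spatial.distance import cosine

# Example dimensions: [temperature, caffeine, color_darkness, price, healthiness]
coffee    = np.array([0.9, 0.9, 0.9, 0.5, 0.4])  # replace with your own values
green_tea = np.array([0.8, 0.4, 0.2, 0.4, 0.8])  # replace with your own values
water     = np.array([0.3, 0.0, 0.0, 0.1, 1.0])  # replace with your own values

print("Coffee vs Green Tea:", 1 - cosine(coffee, green_tea))
print("Coffee vs Water:    ", 1 - cosine(coffee, water))
print("Green Tea vs Water: ", 1 - cosine(green_tea, water))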
