Module 2 Lesson 2: How LLMs Work (Conceptual)

From Tokens to Embeddings. Understanding the mechanics of how a computer 'reads' meaning.


How does a machine "understand" the difference between a poem and a piece of code? It doesn't use words; it uses math. The process happens in three main steps: tokenization, embedding, and next-token prediction.

1. Tokens: The AI's Vocabulary

Models don't see "words." They see tokens. A token is a chunk of text: a whole word, part of a word, or a punctuation mark.

  • Short, common words are usually 1 token.
  • Long words (like "incomprehensible") are split into multiple tokens ("In-compre-hen-sible").
  • On average, 1,000 tokens ≈ 750 words of English text.
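The splitting above can be sketched in code. This is a toy greedy longest-match tokenizer, not a real BPE implementation, and the vocabulary is invented purely for illustration:

```python
# Toy illustration of subword tokenization (not a real BPE tokenizer).
# The vocabulary and the resulting splits are invented for demonstration.
VOCAB = ["In", "compre", "hen", "sible", "Hello", " world"]

def toy_tokenize(text, vocab):
    """Greedily match the longest known chunk at each position."""
    tokens = []
    i = 0
    while i < len(text):
        match = None
        for piece in sorted(vocab, key=len, reverse=True):
            if text.startswith(piece, i):
                match = piece
                break
        if match is None:          # unknown character: emit it alone
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens

print(toy_tokenize("Incomprehensible", VOCAB))
# → ['In', 'compre', 'hen', 'sible']
```

Real tokenizers learn their vocabulary from data, so the actual splits a model produces will differ, but the principle is the same: frequent chunks get their own token, rare words are assembled from pieces.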

2. Embeddings: Numerical Meaning

Computers can't calculate with letters, so they turn every token into a vector (a list of numbers). Tokens with similar meanings end up with similar vectors.

  • "King" and "Queen" are close together in this "Meaning Space."
  • "Apple" and "Dog" are far apart.
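"Close together" has a precise meaning here: the angle between two vectors, measured by cosine similarity. The tiny 3-dimensional embeddings below are invented for illustration; real models use hundreds or thousands of dimensions:

```python
import math

# Toy 3-dimensional embeddings. The numbers are invented for illustration;
# real embeddings are learned during training.
EMBED = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.1, 0.9],
    "dog":   [0.5, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine(EMBED["king"], EMBED["queen"]))   # high: similar meanings
print(cosine(EMBED["apple"], EMBED["dog"]))    # lower: unrelated meanings
```

The famous result that "king" and "queen" sit near each other in meaning space falls straight out of this arithmetic.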

3. Next-Token Prediction: The High-Stakes Guess

When you ask an LLM a question, it is performing a high-speed game of "Fill in the blank."

  • Prompt: "The capital of France is..."
  • AI brain: "Statistically, based on everything in my training data, the overwhelmingly most likely next word is..."
  • Output: "Paris."
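At the final step the model has, in effect, a probability for every token in its vocabulary. The distribution below is invented for illustration (a real model scores tens of thousands of candidate tokens); picking the single highest-probability token is called greedy decoding:

```python
# A toy next-token distribution for the prompt "The capital of France is".
# These probabilities are invented; a real model computes a score for
# every token in its vocabulary on every step.
next_token_probs = {
    " Paris": 0.97,
    " Lyon":  0.01,
    " a":     0.01,
    " the":   0.01,
}

# Greedy decoding: always pick the single most likely token.
best = max(next_token_probs, key=next_token_probs.get)
print(best)  # → " Paris"
```

In practice models often sample from this distribution rather than always taking the top token, which is why the same prompt can produce different answers.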

Visualizing the Pipeline

```mermaid
graph LR
    T[Text: 'Hello world'] --> Tok[Tokenization: 'Hello', ' world']
    Tok --> Emb[Embeddings: 0.1, 0.9, ...]
    Emb --> Pred[Prediction Engine]
    Pred --> Out['!']
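The pipeline runs in a loop: each predicted token is appended to the context and the model is asked again. In this sketch a lookup table of invented continuations stands in for the neural network, but the loop itself is the real autoregressive structure:

```python
# Sketch of the autoregressive loop. A real LLM replaces this lookup
# table with a neural network; the continuations here are invented.
CONTINUATIONS = {
    "Hello": " world",
    "Hello world": "!",
}

def predict_next(context):
    """Stand-in for the model: map context to the next token."""
    return CONTINUATIONS.get(context, "")   # empty string = stop

def generate(prompt):
    text = prompt
    while (token := predict_next(text)):
        text += token
    return text

print(generate("Hello"))  # → "Hello world!"
```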

Why It Feels Intelligent

Because the model has seen so many examples of human reasoning, its "guess" for the next word takes into account the context of your whole conversation. It isn't just picking the most common word overall; it's picking the word that most plausibly completes the thought.


💡 Guidance for Learners

When an LLM makes a mistake, it's usually because the statistics of its training data pointed it toward a plausible-sounding but incorrect answer (a "hallucination").


Summary

  • Tokens are the building blocks of AI language.
  • Embeddings turn text into mathematical coordinates that capture meaning.
  • The LLM's only job is to predict the most likely next token.
  • Intelligence is a "side effect" of very accurate statistical prediction.
