
Module 1 Lesson 1: Defining Large Language Models
Welcome to the first lesson of the LLM course! We start by defining what Large Language Models actually are, why they are 'large', and what they can (and cannot) do.
Welcome to the foundation of modern AI. If you've used ChatGPT, Claude, or Gemini, you've interacted with a Large Language Model (LLM). But what exactly is happening behind the curtain?
In this lesson, we will demystify the "Large", the "Language", and the "Model" to build a rock-solid mental framework for the rest of the course.
1. What is a "Language Model"?
At its simplest, a Language Model (LM) is a mathematical tool designed to predict the next piece of text in a sequence.
Think of it like a hyper-advanced version of the "autocorrect" or "predictive text" on your smartphone. When you type "How are...", your phone might suggest "you" or "things". A language model does exactly this, but it doesn't just look at the last word; it draws on billions of patterns it learned during training.
The Probability Game
LLMs don't "know" facts the way a database does. Instead, they calculate probabilities. If you ask an LLM to complete "The capital of France is...", it calculates that "Paris" has a 99.9% probability of being the correct next token based on its training data.
```mermaid
graph LR
    Input["'The sky is...'"] --> LM["Language Model"]
    LM -- "85% prob" --> Blue["'blue'"]
    LM -- "10% prob" --> Grey["'grey'"]
    LM -- "5% prob" --> Dark["'dark'"]
```
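The prediction step in the diagram can be sketched with a toy "bigram" model: count which word follows which, then turn the counts into probabilities. This is a deliberately minimal sketch — real LLMs use deep neural networks over sub-word tokens, not word counts — but the probability game is the same.

```python
from collections import Counter, defaultdict

# A tiny corpus standing in for the "billions of patterns" seen in training.
corpus = (
    "the sky is blue . the sky is grey . the sky is blue . "
    "the sky is dark . the sky is blue ."
).split()

# Count how often each word follows each preceding word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_token_probs(word):
    """Return P(next token | previous token) as a dict of probabilities."""
    counts = following[word]
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

probs = next_token_probs("is")
print(probs)  # → {'blue': 0.6, 'grey': 0.2, 'dark': 0.2}
```

Just as in the diagram, "blue" wins not because the model *knows* the sky is blue, but because it was the most frequent continuation in the data.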
2. Why are they called "Large"?
The word "Large" in LLM refers to two massive scales:
- Parameters: These are essentially the "neural connections" within the model. A model like GPT-4 is estimated to have over a trillion parameters. These are the adjustable "knobs" that were tuned during training to represent knowledge.
- Training Data: They are trained on virtually the entire public internet—books, articles, code, and conversations. This scale allows them to understand not just grammar, but also logic, coding, and specialized domains like medicine or law.
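To get an intuition for what "large" means in hardware terms, here is a back-of-the-envelope calculation of the memory needed just to *store* the weights, assuming 2 bytes per parameter (16-bit precision) and ignoring activations and serving overhead:

```python
# Rough memory footprint of a model's weights alone.
def weight_memory_gb(num_parameters, bytes_per_param=2):
    """Gigabytes needed to store the weights (2 bytes = fp16/bf16)."""
    return num_parameters * bytes_per_param / 1e9

print(weight_memory_gb(7e9))   # → 14.0   (a 7B model: ~14 GB)
print(weight_memory_gb(1e12))  # → 2000.0 (a trillion parameters: ~2 TB)
```

A trillion-parameter model would not fit on any single GPU, which is why frontier models are split across many accelerators.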
3. What Problem do LLMs Solve?
LLMs are the ultimate "Unstructured Data Processors." Before LLMs, computers were great at math and structured data (like spreadsheets) but terrible at human nuance.
LLMs excel at:
- Summarization: Turning a 50-page PDF into 5 bullet points.
- Translation: Moving between human languages (and programming languages) with high fidelity.
- Creative Synthesis: Generating ideas, drafts, or marketing copy.
- Code Generation: Converting human intent into executable software.
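In practice, you steer an LLM toward any of these tasks with a prompt: an instruction wrapped around your raw text. A minimal sketch of prompt construction for summarization — the `build_summary_prompt` helper is hypothetical, not any particular vendor's API:

```python
def build_summary_prompt(document: str, max_bullets: int = 5) -> str:
    """Wrap raw text in an instruction asking a model to summarize it."""
    return (
        f"Summarize the following document in at most {max_bullets} "
        f"bullet points:\n\n{document}"
    )

prompt = build_summary_prompt("Q3 revenue grew 12% while costs fell 3%...")
# This string would then be sent to an LLM as the user message.
print(prompt)
```

The same pattern — instruction plus unstructured input — underlies translation, drafting, and code-generation prompts as well.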
4. What LLMs cannot (yet) do
It is critical to understand the boundaries of this technology:
- True Reasoning: LLMs simulate reasoning through pattern matching. They often fail at simple logic puzzles when the specific pattern wasn't in their training data.
- Real-Time Knowledge: Unless they are connected to an external source (such as a search tool or retrieval-augmented generation, RAG), they only know what was in their training data.
- Deterministic Reliability: Because they are probabilistic, they can give different answers to the same question, and they sometimes produce confident but false statements (hallucinations).
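The non-determinism comes from sampling: even with a fixed probability distribution, each generation step draws a token at random. A toy illustration using the distribution from the "The sky is..." diagram (real models sample from tens of thousands of tokens, often modulated by a "temperature" setting):

```python
import random

# Fixed next-token distribution, as in the earlier "The sky is..." diagram.
tokens = ["blue", "grey", "dark"]
probs = [0.85, 0.10, 0.05]

# Sample the next token ten times from the *same* distribution.
draws = [random.choices(tokens, weights=probs)[0] for _ in range(10)]
print(draws)  # varies run to run: mostly 'blue', sometimes 'grey' or 'dark'
```

Run this twice and you will likely get two different lists — the same reason asking an LLM the same question twice can yield two different answers.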
Lesson Exercise
Goal: Identify three tasks suited for LLMs and three that are not.
- Suited: summarizing a meeting transcript, refactoring a legacy Python function, brainstorming names for a new startup.
- Not Suited: performing complex multi-step accounting with 100% precision (better for Excel), predicting the exact stock price 10 minutes from now, or performing physical heart surgery.
Summary and Next Steps
In this lesson, we defined LLMs as probabilistic next-token predictors that leverage massive scale to solve unstructured text problems.
Coming Up in Lesson 2: We will dive deeper into How LLMs Differ from Traditional Software—explaining why you can't just "debug" an AI model the way you fix a line of Java or Python code.