Introduction to AI Agents: Transitioning from Chat to Agency

Up until this point in the course, we have treated the LLM as a "Question and Answer" box: you send a prompt and you get a text string back. But the true power of an LLM Engineer lies in building Agents.

An agent is an LLM that is given a Goal, a set of Tools, and the ability to Reason in a Loop. It doesn't just talk; it does things to achieve an outcome.

1. What Defines an AI Agent?

An AI Agent is a system where the LLM is in the driver's seat of an application's logic.

The Chatbot vs. The Agent

Chatbot: User: "Summarize this." $\rightarrow$ Bot: "Here is the summary." (Dead end: one prompt, one response.)
Agent: User: "Apply for this job for me." $\rightarrow$ Agent: "I need to find the job portal $\rightarrow$ I will research the company $\rightarrow$ I will draft a cover letter $\rightarrow$ I will upload the file $\rightarrow$ Done." (Autonomous loop: the agent chooses each step itself.)

graph TD
    A[User Goal] --> B{LLM Reasoning Engine}
    B -- "I need information" --> C[Tool: Web Search]
    C --> D[Observation: Search Results]
    D --> B
    B -- "I am ready" --> E[Tool: Email API]
    E --> F[Task Complete]

2. The Four Pillars of Agency

To build a professional agent, you must implement these four capabilities:

Reasoning Engine (The Brain): The LLM that decides what to do next. Typically a large model like Claude 3.5 Sonnet.
Planning: The ability to break a complex goal ("Build a website") into smaller steps (1. Header, 2. CSS, 3. Content).
Memory:
- Short-term: The current chat history.
- Long-term: Persistent state (e.g., "The user hates the color blue").
Tool Use (Capabilities): Giving the LLM "Hands" like database access, web search, or the ability to execute code.
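The memory and tool pillars can be sketched in plain Python. This is a minimal illustration under assumptions, not a production design: the names `TOOLS`, `register_tool`, `web_search`, and `AgentMemory` are hypothetical, and the search tool is a stub that returns canned text instead of calling a real API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical tool registry: maps a tool name the LLM can request
# to the Python function that actually does the work.
TOOLS: Dict[str, Callable[..., str]] = {}


def register_tool(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = func
        return func
    return decorator


@register_tool("web_search")
def web_search(query: str) -> str:
    # Stub: a real tool would call a search API here.
    return f"(stub) results for: {query}"


@dataclass
class AgentMemory:
    # Short-term: what has been said and observed during the current task.
    short_term: List[str] = field(default_factory=list)
    # Long-term: facts that persist across sessions (in practice, a database).
    long_term: Dict[str, str] = field(default_factory=dict)


memory = AgentMemory()
memory.long_term["disliked_color"] = "blue"

# The agent looks a tool up by name and stores the result as an observation.
observation = TOOLS["web_search"]("site header examples")
memory.short_term.append(observation)
```

Keeping tools in a name-to-function mapping means the LLM only has to output a tool name and arguments; your code stays in charge of what actually runs.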

3. The "Stateful" Mindset

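In a standard script, code runs top to bottom. In an agent, execution follows a Graph: each step is a node, and the path between nodes can branch, loop, or return to an earlier node.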

An agent can reach a point of failure, decide to "go back" to a previous step, and try a different approach. This is why we use frameworks like LangGraph (which we will cover in Lesson 7.3). An agent's execution path is directed by its own reasoning, not by a hardcoded if/else statement.
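The "go back and try again" behavior can be sketched without any framework. The example below is plain Python, not LangGraph, and every name in it (`draft`, `review`, `NODES`, `run_graph`) is hypothetical. Each node returns the name of the next node, so a rejected result can route execution back to an earlier step. The `review` node uses a simple counter as a stand-in for the judgment an LLM would make.

```python
from typing import Callable, Dict

State = Dict[str, object]


def draft(state: State) -> str:
    # Each visit produces a new attempt.
    state["attempts"] = int(state["attempts"]) + 1
    state["draft"] = f"draft v{state['attempts']}"
    return "review"


def review(state: State) -> str:
    # Stand-in for an LLM judgment: reject the first attempt, accept the second.
    if int(state["attempts"]) < 2:
        return "draft"  # go back to the previous step and try again
    return "done"


NODES: Dict[str, Callable[[State], str]] = {"draft": draft, "review": review}


def run_graph(start: str, state: State, max_steps: int = 10) -> State:
    node = start
    path = []
    for _ in range(max_steps):  # cap the steps so a cycle cannot run forever
        if node == "done":
            break
        path.append(node)
        node = NODES[node](state)  # the node itself picks the next node
    state["path"] = path
    return state


final = run_graph("draft", {"attempts": 0})
print(final["path"])   # ['draft', 'review', 'draft', 'review']
print(final["draft"])  # draft v2
```

The path visits `draft` twice because the next node is chosen at run time from the current state, which is the property graph-based frameworks are built around.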

4. Why "Vanilla" LLMs are not Agents

A model like GPT-4o is Passive. It cannot "decide" to wake up and check your email. It only responds when you call its API.

The Agent Shell is the Python code YOU write that wraps the model.

Your code provides the while loop.
Your code provides the try/except logic for tool failures.
Your code manages the Memory store.
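As one possible shape for the tool-failure handling, the sketch below wraps every tool call so that an exception becomes a readable observation instead of crashing the loop. The names `call_tool_safely` and `flaky_tool` are hypothetical, and the `ERROR:` message format is an assumption chosen for illustration.

```python
from typing import Callable, Dict


def call_tool_safely(tools: Dict[str, Callable[..., str]], name: str, **kwargs) -> str:
    """Run a tool and always return a string the LLM can read.

    Failures are returned as observations instead of being raised.
    """
    if name not in tools:
        return f"ERROR: unknown tool '{name}'"
    try:
        return tools[name](**kwargs)
    except Exception as exc:  # broad on purpose: any tool failure becomes an observation
        return f"ERROR: {name} failed: {exc}"


def flaky_tool(path: str) -> str:
    # Hypothetical tool that always fails, to exercise the error path.
    raise FileNotFoundError(f"{path} not found")


tools = {"read_file": flaky_tool}
print(call_tool_safely(tools, "read_file", path="resume.pdf"))
# ERROR: read_file failed: resume.pdf not found
print(call_tool_safely(tools, "send_email", to="hr@example.com"))
# ERROR: unknown tool 'send_email'
```

Returning the error as text lets it be appended to the history like any other result, so the model can see what went wrong and choose a different approach on the next turn.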

Code Concept: The "Mental Model" of an Agent Loop

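def autonomous_agent(goal, llm, tools, max_steps=10):
    """Run a think/act loop until the LLM signals FINISH or the step budget runs out.

    `llm` and `tools` are placeholders for your own model client and tool registry.
    """
    state = {"goal": goal, "history": [], "finished": False}

    for _ in range(max_steps):  # Safeguard: a bounded loop can never run forever
        # 1. THINK: Ask the LLM what the next step is
        brain_response = llm.decide_next_step(state)

        # 2. ACT: If it wants to use a tool, call the function
        if brain_response.action == "CALL_TOOL":
            try:
                result = tools.execute(brain_response.tool_name, brain_response.args)
            except Exception as exc:
                # Feed the failure back so the LLM can try a different approach
                result = f"Tool error: {exc}"
            state["history"].append(result)

        # 3. EVALUATE: Did we finish?
        elif brain_response.action == "FINISH":
            state["finished"] = True
            break

    # Report success only if the LLM actually signalled completion
    return "Goal Accomplished!" if state["finished"] else "Stopped: step limit reached."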

Summary

Agency is about autonomy, tools, and loops.
Agents use LLMs as Reasoning Engines, not just text generators.
The four pillars are Reasoning, Planning, Memory, and Tools.
As an engineer, you are building the "Loop" and the "Safeguards" around the model.

In the next lesson, we will look at the ReAct Pattern, the specific logic structure that allows agents to think and act reliably.

Exercise: Identify the Agentic Task

Which of these tasks requires an Agent rather than a Chain?

"Translate this JSON file from English to Spanish."
"Research the top 5 competitors of NVIDIA, summarize their latest earnings reports, and calculate their average P/E ratio."
"Generate 10 variants of a marketing headline."

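Answer: #2. It requires planning, multiple tool calls (Search, PDF extraction), and intermediate reasoning (Calculating the average). Tasks #1 and #3 are single-pass transformations whose steps are known in advance, so a fixed chain is enough.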