The Performance Scale: Latency vs. Accuracy

Faster or Smarter? Learn how to balance the speed of your AI response with the quality of the intelligence.

The Speed of Sentiment

In the previous lesson, we looked at Cost. Now, we look at Performance. In AI, "Performance" is usually a trade-off between two competing goals:

  1. Latency: How fast does the user get the answer?
  2. Accuracy/Quality: How "Right" is the answer?

On the AWS Certified AI Practitioner exam, you will be given a specific business requirement and asked to choose a model that fits the "Performance Profile."
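
To make the trade-off concrete, you can measure latency yourself. Below is a minimal sketch that times a single Amazon Bedrock call using boto3's Converse API; the region, model ID, and prompt are placeholder assumptions, and the same pattern works with any Bedrock model you have access to.

import time
import boto3

# Assumed region and model ID; substitute whatever is enabled in your account.
client = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

start = time.perf_counter()
response = client.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "Say hello in one sentence."}]}],
)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"Round-trip latency: {elapsed_ms:.0f} ms")
print(response["output"]["message"]["content"][0]["text"])

Run the same prompt against a small and a large model, and the latency gap becomes obvious immediately.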


1. High Latency, High Accuracy (The Specialist)

The Scenario: A lawyer needs to find a tiny error in a 500-page contract.

  • The Priorities: Accuracy is the only thing that matters. If the AI is wrong, the lawyer might lose millions. They don't mind waiting 30 seconds for the answer.
  • The Choice: Use a Large Foundation Model (e.g., Claude 3 Opus or Llama 3.1 405B), as sketched below.
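
A hedged sketch of what that call might look like (the model ID, file name, and inferenceConfig values are assumptions for illustration): an accuracy-first request uses a large model, a temperature of zero for deterministic output, and a generous token budget so the model has room to explain its finding. In practice a 500-page contract would need chunking or retrieval, but the shape of the call is the point.

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")
contract_text = open("contract.txt").read()  # the contract, as plain text (assumed file)

# Large model: slower and more expensive, but far better at needle-in-a-haystack reasoning.
response = client.converse(
    modelId="anthropic.claude-3-opus-20240229-v1:0",  # assumed model ID
    messages=[{
        "role": "user",
        "content": [{"text": f"Find any clause that contradicts Section 4:\n\n{contract_text}"}],
    }],
    inferenceConfig={"temperature": 0.0, "maxTokens": 4096},  # deterministic, room to explain
)
print(response["output"]["message"]["content"][0]["text"])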

2. Low Latency, Low-to-Medium Accuracy (The Speedster)

The Scenario: A customer is typing into a live chat and expects an "Instant" auto-complete or a quick "Hello."

  • The Priorities: Speed is the priority. If the AI takes 10 seconds to say "Hello," the customer will leave. Small errors in grammar are acceptable if the response is fast.
  • The Choice: Use a Small Language Model (SLM) (e.g., Claude 3 Haiku or Mistral 7B); see the streaming sketch below.
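
For chat, perceived latency matters even more than total generation time, so the usual pattern is a small model plus streaming: the first tokens reach the user while the rest are still being generated. A minimal sketch with Bedrock's ConverseStream API (region and model ID are assumptions):

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Small model + streaming: the user sees the first words almost immediately.
response = client.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model ID
    messages=[{"role": "user", "content": [{"text": "Greet the customer warmly."}]}],
)

# Print each text fragment as it arrives instead of waiting for the full reply.
for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)
print()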

3. The "Cold Start" Problem

If you are using AWS Lambda or SageMaker Serverless Inference, you might experience a "Cold Start."

  • This is the delay while AWS "Wakes up" your server after it hasn't been used for a while.
  • For a website that needs a response in under 500ms, a "Cold Start" of 5 seconds is unacceptable. In this case, you must use Provisioned Throughput or Real-time Endpoints (Always On); the two endpoint configurations sketched below show the difference.
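
To illustrate the difference on the SageMaker side (endpoint, model, and instance names here are placeholder assumptions), the two configurations below deploy the same model serverless, which can cold-start after idle periods, versus always-on real-time, which keeps the instance warm:

import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Serverless: pay per request, but risk a multi-second cold start after idle time.
sm.create_endpoint_config(
    EndpointConfigName="my-model-serverless",  # assumed name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",  # assumed; the model must already exist in SageMaker
        "ServerlessConfig": {"MemorySizeInMB": 4096, "MaxConcurrency": 10},
    }],
)

# Real-time (always on): you pay for the instance around the clock, but there is no cold start.
sm.create_endpoint_config(
    EndpointConfigName="my-model-realtime",  # assumed name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",
        "InstanceType": "ml.g5.xlarge",  # assumed instance type
        "InitialInstanceCount": 1,
    }],
)

The same idea applies on Bedrock: Provisioned Throughput reserves dedicated capacity, so your requests never wait for a model to spin up.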

4. Visualizing the Performance Curve

graph LR
    subgraph Speed_Domain
    A[Claude Haiku]
    B[Mistral 7B]
    end
    
    subgraph Logic_Domain
    C[Claude Sonnet]
    D[Llama 70B]
    end
    
    subgraph Brain_Domain
    E[Claude Opus]
    F[Llama 405B]
    end
    
    A & B -->|LOW Latency / LOW Cost| G[UI Interaction]
    C & D -->|MED Latency / MED Cost| H[General Assistance]
    E & F -->|HIGH Latency / HIGH Cost| I[Scientific / Legal Analysis]
    
    Note[As complexity increases, latency increases]

5. Summary: Know Your User

Before choosing a model:

  • Ask: "Is this for a machine (Batch) or a human (Real-time)?"
  • Ask: "What is the cost of a mistake?"
  • Ask: "What is the cost of a delay?"

Exercise: Identify the Performance Need

A gaming company is using AI to generate "NPC Dialogue" (speech for non-player characters) while the player is walking through a forest. If the AI takes longer than 200ms to generate the speech, the game will stutter. Which model should they choose?

  • A. Anthropic Claude 3 Opus (High Accuracy/High Latency).
  • B. Anthropic Claude 3 Haiku (Medium Accuracy/Low Latency).
  • C. A custom-trained 500B parameter model in SageMaker.
  • D. Amazon Transcribe.

The Answer is B! Haiku is designed specifically for high-speed, low-latency tasks where real-time interaction is critical.


Knowledge Check

What typically happens to the 'Inference Latency' as you increase the size and complexity of the foundation model you are using?

(Answer: it increases. Larger models perform more computation per token, so each response takes longer to generate.)

What's Next?

Performance is one thing, but what about "Risks"? We’ve talked about hackers, but what about "Infrastructure"? Find out in Lesson 3: Infrastructure Risk Awareness.
