
Cross-Model Engineering: Optimizing Prompts for Different Models
Master the nuances of model-specific prompting. Learn how to tailor your instructions for Claude, Llama, and Titan to achieve maximum accuracy at the lowest cost.
One Size Does Not Fit All
In the AWS Certified Generative AI Developer – Professional exam, you will encounter scenarios where an application switches from one model to another (e.g., from Llama 3 to Claude 3.5). You might assume you can just copy-paste the prompt. You are wrong.
Each foundation model has been trained on different datasets and formatted with different tokens. What works perfectly for Claude might confuse Llama. In this lesson, we will learn the nuances and secrets of Model-Specific Optimization.
1. The Anthropic Claude Style (The XML King)
Claude models are unique in how they process structure. They are highly responsive to XML tags.
- Best Practice: Wrap your different context blocks and instructions in clear XML tags like `<context>`, `<instructions>`, and `<data>`.
- Formatting: Use the `messages` API format: `[{"role": "user", "content": "..."}]`.
- The "Think" Pattern: Claude loves being told to think before answering. It respects the `<thinking>` tag structure (see the sketch after this list).
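A minimal sketch of this pattern, assuming the `bedrock-runtime` client and the Anthropic Messages request body on Bedrock; the model ID, report text, and tag names are illustrative:

```python
import json
import boto3

# Sketch of the Claude "XML King" pattern: instructions, context, and a
# think-then-answer structure wrapped in explicit XML tags.
client = boto3.client("bedrock-runtime")

report_text = "Q3 revenue grew 12% while cloud costs fell 8%."  # placeholder data

prompt = f"""<instructions>
Summarize the report in three bullet points.
Reason inside <thinking> tags first, then put the final bullets inside <answer> tags.
</instructions>
<context>
{report_text}
</context>"""

response = client.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }),
)
print(json.loads(response["body"].read())["content"][0]["text"])
```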
2. The Meta Llama Style (The Prompt Wrapper)
Llama models (especially when used in SageMaker) often expect specific "Instruction Wrappers" to know where the prompt ends and the user query begins.
- Tokens: Llama uses tokens like `[INST]` and `[/INST]`.
- Best Practice: If you are using the raw weights on SageMaker, you must manually wrap your prompt (see the sketch after this list):
  `<s>[INST] <<SYS>> You are a helpful assistant <</SYS>> What is the capital of France? [/INST]`
- The Bedrock Difference: When you use Llama through the Bedrock Converse API, AWS handles these tokens for you automatically.
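If you do host Llama yourself, here is a minimal sketch of building the wrapper manually and calling a SageMaker endpoint; the endpoint name is hypothetical, and the `inputs`/`parameters` payload shape follows the common JumpStart convention, so verify it against your deployment:

```python
import json
import boto3

# Sketch only: the endpoint name is hypothetical and the payload keys
# ("inputs", "parameters") assume the common SageMaker JumpStart schema for Llama.
runtime = boto3.client("sagemaker-runtime")

system = "You are a helpful assistant"
question = "What is the capital of France?"

# Manually wrap the prompt with the instruction tokens shown above.
prompt = f"<s>[INST] <<SYS>> {system} <</SYS>> {question} [/INST]"

response = runtime.invoke_endpoint(
    EndpointName="my-llama-endpoint",  # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": 256}}),
)
print(json.loads(response["Body"].read()))
```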
3. The Amazon Titan Style (Direct and Concise)
Titan models are built for efficiency. They prefer direct, concise instructions without excessive "fluff" or complex XML structures.
- Best Practice: Keep your system instructions at the top and the data at the bottom.
- Titan Image Generator: Requires very specific keywords (e.g., "Photorealistic", "4k") to achieve high-end results compared to Stable Diffusion.
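A minimal sketch of this direct style with Titan Text on Bedrock, using `invoke_model` and the `inputText` request body; the model ID, prompt, and generation settings are illustrative:

```python
import json
import boto3

# Sketch of the direct Titan style: instruction first, data last, no XML scaffolding.
client = boto3.client("bedrock-runtime")

prompt = (
    "Summarize the following customer review in one sentence.\n\n"
    "Review: The checkout flow was fast, but shipping took two weeks."
)

response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({
        "inputText": prompt,
        "textGenerationConfig": {"maxTokenCount": 256, "temperature": 0.2},
    }),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])
```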
4. Comparing Response Structures
| Feature | Claude 3+ | Llama 3 | Titan Text |
|---|---|---|---|
| Logic/Reasoning | Highest (prefers XML) | High (prefers [INST]) | Medium (prefers direct) |
| JSON Support | Native/Structured | Good | High focus on Lite tasks |
| Multi-modal | Excellent (Vision) | Vision emerging | Image/Embeddings |
5. Iterative Selection: The Bedrock Playground
Before you write a single line of code, you should use the Amazon Bedrock Playground to compare performance.
- Side-by-Side Comparison: Open two windows, one with Claude and one with Llama.
- Identical Prompt Test: Paste your prompt into both.
- Observe: Does one model hallucinate while the other doesn't? Is one much faster?
- Tune: Adjust the prompt specifically for the model that is failing until it succeeds.
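If you want to repeat this side-by-side test programmatically rather than in the console, a minimal sketch with the Converse API can send the same prompt to two models and compare output and latency; the model IDs and prompt are illustrative:

```python
import time
import boto3

# Sketch: run an identical prompt against two models and compare answers and latency.
client = boto3.client("bedrock-runtime")
messages = [{"role": "user", "content": [{"text": "Explain VPC peering in two sentences."}]}]

for model_id in ["anthropic.claude-3-sonnet-20240229-v1:0", "meta.llama3-8b-instruct-v1:0"]:
    start = time.time()
    response = client.converse(modelId=model_id, messages=messages)
    elapsed = time.time() - start
    text = response["output"]["message"]["content"][0]["text"]
    print(f"{model_id} ({elapsed:.1f}s): {text[:120]}")
```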
6. Pro-Tip: The "Converse API" Shortcut
As a Professional Developer, you should use the Amazon Bedrock Converse API whenever possible.
The Converse API provides a consistent interface that works across almost all models in Bedrock. It handles the specific "Role" mappings and "Token" wrappers for you behind the scenes. This allows you to swap a Llama model for a Claude model by changing just one line of code (the ModelID).
# The professional way to build model-agnostic code
import boto3

client = boto3.client("bedrock-runtime")
response = client.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # Change this to swap models
    messages=[{"role": "user", "content": [{"text": "Hello world"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
Knowledge Check: Test Your Optimization Knowledge
A developer is migrating a GenAI application from an open-source model to Anthropic Claude 3.5 Sonnet on Amazon Bedrock. Which change to the prompt structure is most likely to improve the model's ability to follow complex instructions?
Summary
Models have personalities. By tailoring your prompts to their specific training (using XML for Claude, or the Converse API for consistency), you ensure that your application is both high-performing and easy to maintain.
This concludes Module 12. In the next module, we move to a more advanced way of optimization: Model Tuning and Fine-tuning.
Next Module: Precision Surgery: Fine-tuning Foundation Models