
The Memory of AI: Vector Stores and Embeddings
Master the math behind the meaning. Deep dive into vector embeddings, similarity search, and choosing between OpenSearch, Aurora, and Pinecone for your AWS GenAI architecture.
Mathematical Meaning
In traditional software, we store data as strings or numbers. In Generative AI, we store data as Vectors. A vector is simply a list of numbers (a coordinate) in a high-dimensional space. The magic of "Embeddings" is that similar concepts are placed close together in this space.
In this lesson, we will explore how to turn text into math and how to store those numbers in a way that allows us to find the "needle in the haystack" in milliseconds.
1. What are Embeddings?
An Embedding is a representation of an object (text, image, audio) in a dense vector space.
- The word "King" is mathematically close to "Queen."
- The word "King" is mathematically far from "Toaster."
Dimensions
Embedding models have a fixed number of Dimensions. For example:
- Amazon Titan Text Embeddings v2: Up to 1024 dimensions (configurable to 256, 512, or 1024).
- Cohere Embed English v3: 1024 dimensions.
The Rule: More dimensions usually mean more nuance, but they also mean more storage space and slower search speeds.
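To make that trade-off concrete, here is a quick back-of-the-envelope sketch (assuming the usual 4 bytes per float32 dimension, and the 256/512/1024 output sizes Titan Text Embeddings v2 supports):

```python
# Back-of-the-envelope: raw storage for 1M documents at float32 (4 bytes/dim).
docs = 1_000_000
for dims in (256, 512, 1024):
    gb = docs * dims * 4 / 1e9
    print(f"{dims:>4} dims x {docs:,} docs ~ {gb:.1f} GB of raw vectors")
```

Doubling the dimensions doubles your raw storage (and index size) before you have answered a single query.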
2. Similarity Metrics: Measuring Distance
How does the computer "know" that two vectors are similar? It uses math to measure the distance between them.
| Metric | Business Use Case |
|---|---|
| Cosine Similarity | The industry standard for text. Measures the angle between two vectors, ignoring their lengths. |
| Euclidean Distance | Measures the literal straight-line distance between two points. Sensitive to vector magnitude. |
| Inner Product | The dot product. Use it when both direction and magnitude should influence the score. |
In the AIP-C01 exam, assume Cosine Similarity for most RAG/Text scenarios.
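Here is a minimal NumPy sketch of all three metrics, using toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

king = np.array([0.9, 0.8, 0.1])
queen = np.array([0.85, 0.75, 0.2])
toaster = np.array([0.1, 0.05, 0.9])

def cosine_similarity(a, b):
    # Angle between vectors; 1.0 means identical direction.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def euclidean_distance(a, b):
    # Straight-line distance; smaller means more similar.
    return np.linalg.norm(a - b)

def inner_product(a, b):
    # Dot product; rewards both alignment and magnitude.
    return np.dot(a, b)

print(cosine_similarity(king, queen))    # close to 1.0
print(cosine_similarity(king, toaster))  # much lower
print(euclidean_distance(king, queen))   # small
print(inner_product(king, queen))
```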
3. Vector Stores on AWS: The Options
AWS provides several ways to store and search these vectors. Choosing the right one is a classic "Professional" exam topic.
Amazon OpenSearch Service (Managed & Serverless)
- Strengths: Built-in k-NN plugin, incredibly fast, handles billions of records.
- The Choice: Use OpenSearch Serverless for "Least Operational Overhead."
Amazon Aurora (with pgvector)
- Strengths: Best if your metadata is complex and lives in a relational database.
- The Choice: Use this if you want to perform a SQL join between your AI knowledge and your transactional customer data.
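A hypothetical sketch of that pattern with psycopg and pgvector is shown below. The doc_chunks and customers tables and their columns are illustrative assumptions, not a required schema; `<=>` is pgvector's cosine-distance operator, and get_embedding() is the Bedrock helper defined in Section 4.

```python
import psycopg  # assumes an Aurora PostgreSQL cluster with the pgvector extension enabled

# get_embedding() is the Bedrock helper from Section 4.
query_vec = get_embedding("Which orders mention late delivery?")
vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"

with psycopg.connect("postgresql://user:pass@my-aurora-endpoint/mydb") as conn:
    rows = conn.execute(
        """
        SELECT d.chunk_text, c.customer_name, c.account_tier
        FROM doc_chunks d
        JOIN customers c ON c.customer_id = d.customer_id  -- relational join...
        ORDER BY d.embedding <=> %s::vector                -- ...ranked by cosine distance
        LIMIT 5
        """,
        (vec_literal,),
    ).fetchall()
```

This is the capability no pure vector database gives you: semantic ranking and a transactional join in one query.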
Amazon Neptune (Graph Database)
- Strengths: Best for "Knowledge Graphs" where the relationship between entities (e.g., "Person X is the CEO of Company Y") is as important as the text meaning.
4. The Embedding Workflow in Python
```python
import boto3
import json

bedrock = boto3.client(service_name='bedrock-runtime')

def get_embedding(text):
    # Using Titan Text Embeddings v2
    body = json.dumps({"inputText": text})
    response = bedrock.invoke_model(
        body=body,
        modelId='amazon.titan-embed-text-v2:0',
        accept='application/json',
        contentType='application/json'
    )
    response_body = json.loads(response.get('body').read())
    return response_body['embedding']

# 'Cat' and 'Kitten' will have very similar vector coordinates
vec_cat = get_embedding("Cat")
vec_kitten = get_embedding("Kitten")
print(f"First 5 dimensions of 'Cat': {vec_cat[:5]}")
```
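As a quick sanity check, you can feed these vectors into the cosine_similarity() helper from Section 2. Exact scores vary by model, but the related pair should score far higher than the unrelated one:

```python
# Reusing numpy and cosine_similarity() from Section 2.
import numpy as np

score_related = cosine_similarity(np.array(vec_cat), np.array(vec_kitten))
score_unrelated = cosine_similarity(np.array(vec_cat), np.array(get_embedding("Toaster")))

print(f"Cat vs Kitten:  {score_related:.3f}")    # typically high
print(f"Cat vs Toaster: {score_unrelated:.3f}")  # noticeably lower
```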
5. Decision Factors for the Exam
When choosing a vector store, evaluate:
- Scale: If you have 100 docs, use anything. If you have 100 million, use OpenSearch.
- Consistency: Do you need ACID compliance (SQL)? Use Aurora.
- Cost: OpenSearch Serverless has a minimum baseline cost (billed in OpenSearch Compute Units, or OCUs). For very small projects, a local vector store like Chroma or FAISS on an EC2 instance might be cheaper (though less resilient), as sketched below.
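For the small-project end of that spectrum, a local FAISS index really is just a few lines. A minimal sketch with random stand-in vectors (flat index, exact search, no persistence):

```python
import faiss
import numpy as np

dims = 1024
index = faiss.IndexFlatL2(dims)  # flat = exhaustive search, exact results

# Stand-ins for real embeddings; FAISS expects float32.
vectors = np.random.rand(100, dims).astype("float32")
index.add(vectors)

query = np.random.rand(1, dims).astype("float32")
distances, ids = index.search(query, 3)  # 3 nearest neighbors
print(ids[0], distances[0])
```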
6. Indexing Strategies for Performance
- Flat Indexing: Checks EVERY vector against your query. Perfect accuracy, but very slow as data grows.
- HNSW (Hierarchical Navigable Small World): The industry standard. It builds a "neighborhood map" of vectors so it only checks the most likely candidates. Extremely fast.
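Here is what an HNSW-backed index looks like in practice, sketched with opensearch-py. The endpoint, index name, and engine choice are placeholder assumptions, and OpenSearch Serverless additionally requires SigV4 authentication (omitted here):

```python
from opensearchpy import OpenSearch

client = OpenSearch(
    hosts=[{"host": "my-domain-endpoint", "port": 443}],
    use_ssl=True,  # auth omitted for brevity
)

index_body = {
    "settings": {"index": {"knn": True}},  # enable the k-NN plugin for this index
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 1024,                # must match your embedding model
                "method": {
                    "name": "hnsw",               # the "neighborhood map" index
                    "space_type": "cosinesimil",  # cosine similarity for text
                    "engine": "nmslib",
                },
            },
            "chunk_text": {"type": "text"},
        }
    },
}
client.indices.create(index="rag-chunks", body=index_body)
```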
At query time, the flow looks like this:

```mermaid
graph TD
    User[User Query] --> Embed[Embedding Model]
    Embed --> QueryVec[Query Vector]
    QueryVec --> Search{Vector Index}
    Search -->|HNSW Search| Match1[Best Match]
    Search -->|HNSW Search| Match2[2nd Best Match]
    Match1 --> RAG[RAG Context]
    Match2 --> RAG
```
Knowledge Check: Test Your Vector Knowledge
A company is building a RAG application and needs a vector store that can easily scale to handle unpredictable bursts in traffic without the need to manually manage or provision servers. Which AWS service is the most appropriate choice?
Summary
You now understand the "Math" of AI. You know how to turn text into vectors and where to store them.
This concludes Domain 1: Foundation Model Integration and Data Management. You have mastered more than 30% of the exam material. In the next module, we move to Domain 2: Knowledge Bases and RAG Architectures, where we put these vectors to work.
Next Module: The Grounding of AI: Retrieval-Augmented Generation (RAG) Concepts