
OpenSearch Mapping: Defining Vector Fields and k-NN Logic
Learn how to configure OpenSearch for vector search: the JSON mapping syntax for knn_vector fields and how to choose your engine and search method (HNSW vs. IVF).
OpenSearch Vector Fields and k-NN Search
In Pinecone and Chroma, the schema is largely implicit. In OpenSearch, you must be explicit. To enable vector search, you define a Mapping (a schema) that tells the engine exactly how to handle your high-dimensional data.
This lesson explores the knn_vector field type, the importance of the settings block, and how to configure your HNSW parameters directly within the JSON mapping.
1. Enabling k-NN on an Index
Before you can add a vector field, you must tell the index that it is a "k-NN" index. This activates the background graph-building logic.
{
  "settings": {
    "index": {
      "knn": true
    }
  }
}
2. Defining the knn_vector Mapping
The knn_vector field is the core of your AI search. Here is a production-grade mapping example:
{
  "mappings": {
    "properties": {
      "my_vector_field": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "nmslib",
          "parameters": {
            "m": 16,
            "ef_construction": 200
          }
        }
      },
      "text_content": { "type": "text" },
      "category": { "type": "keyword" }
    }
  }
}
Breakdown of Parameters:
- space_type: l2 (Euclidean), cosinesimil (Cosine), or innerproduct. (Note the different naming convention in OpenSearch!)
- engine: Choosing between nmslib, faiss, or lucene (Module 7, Lesson 1).
- parameters: Fine-tuning the HNSW graph (Module 3).
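To make the space_type choice concrete, here is a minimal pure-Python sketch of what each metric computes. (In production the engine's optimized native implementation does this work; this code is only illustrative.)

```python
import math

def l2(a, b):
    # Euclidean distance: lower = closer match
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosinesimil(a, b):
    # Cosine similarity: 1.0 = identical direction, 0.0 = orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def innerproduct(a, b):
    # Dot product: larger = more similar
    return sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.0, 1.0]
print(l2(a, b))            # 1.4142135623730951
print(cosinesimil(a, b))   # 0.0
print(innerproduct(a, b))  # 0.0
```

Notice that for these two orthogonal vectors, cosine similarity and inner product both return 0, while l2 returns the straight-line distance between the points.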
3. Querying the Vector Field
Searching in OpenSearch uses the Query DSL (Domain Specific Language). To run a vector search, you use the knn query block.
{
  "size": 5,
  "query": {
    "knn": {
      "my_vector_field": {
        "vector": [0.1, 0.2, 0.3, ...],
        "k": 5
      }
    }
  }
}
4. Why Engines Matter: NMSLIB vs. Lucene
Inside OpenSearch, you have a choice of underlying vector engines, each with different performance trade-offs.
- nmslib: Generally faster than Lucene for large indices. It manages the HNSW graph outside of the JVM (Java Virtual Machine) heap, preventing expensive Garbage Collection pauses.
- Lucene: Better for Hybrid Search. If you need to combine the vector search with a complex boolean filter (WHERE user_id = 'A'), Lucene's native integration makes this much smoother and more accurate.
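As a sketch of that hybrid case: recent OpenSearch versions let you attach a filter clause directly inside the knn query block (supported by the lucene and faiss engines). The user_id field below is illustrative, not part of the mapping above:

```python
# Sketch: a filtered k-NN query body. The `filter` runs as part of the
# vector search itself, rather than as a post-filter on the k results.
query_body = {
    "size": 5,
    "query": {
        "knn": {
            "my_vector_field": {
                "vector": [0.1, 0.2, 0.3],  # truncated for readability
                "k": 5,
                "filter": {
                    "term": {"user_id": "A"}
                }
            }
        }
    }
}
```

With nmslib, by contrast, filtering happens after the approximate search, so a strict filter can leave you with fewer than k results.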
5. Python Example: Creating the Index and Searching
Let's look at how to implement this mapping using the opensearch-py client.
from opensearchpy import OpenSearch

client = OpenSearch(...)  # Connection logic from Lesson 1
index_name = "products_ai"

# 1. Define the index with k-NN settings
index_body = {
    "settings": {
        "index": {
            "knn": True,
            "knn.algo_param.ef_search": "100"
        }
    },
    "mappings": {
        "properties": {
            "product_vector": {
                "type": "knn_vector",
                "dimension": 1536,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",
                    "engine": "nmslib"
                }
            },
            "product_name": {"type": "text"}
        }
    }
}

# 2. Create the index
client.indices.create(index_name, body=index_body)

# 3. Perform a k-NN search
query_body = {
    "size": 2,
    "query": {
        "knn": {
            "product_vector": {
                "vector": [0.1] * 1536,  # Dummy vector
                "k": 2
            }
        }
    }
}

response = client.search(body=query_body, index=index_name)
print(response['hits']['hits'])
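The example above creates the index and searches it, but never adds any documents. As a sketch (assuming the same products_ai index), documents can be loaded with the bulk helper in opensearch-py; the build_bulk_actions function below is an illustrative helper, not part of the client library:

```python
# Sketch: building a bulk payload for opensearch-py's helpers.bulk.
def build_bulk_actions(index_name, docs):
    """Turn plain dicts into bulk actions for helpers.bulk(client, actions)."""
    return [
        {"_index": index_name, "_id": i, "_source": doc}
        for i, doc in enumerate(docs)
    ]

docs = [
    {"product_name": "Wireless headphones", "product_vector": [0.2] * 1536},
    {"product_name": "Bluetooth speaker", "product_vector": [0.3] * 1536},
]
actions = build_bulk_actions("products_ai", docs)

# With a live cluster:
# from opensearchpy import helpers
# helpers.bulk(client, actions)
# client.indices.refresh(index="products_ai")  # make docs searchable now
```

Each vector you index must match the mapped dimension exactly (1536 here), or OpenSearch will reject the document.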
6. Managing RAM in OpenSearch
Because nmslib and faiss manage memory outside of the JVM heap, you must be careful not to over-allocate the heap on your OpenSearch instances.
A common rule for OpenSearch Vector Nodes:
- 50% of RAM goes to the JVM (for standard filters and caching).
- 50% of RAM goes to the OS / Native Memory (for the k-NN graph index).
If you give 90% of your RAM to the JVM, the k-NN graphs will no longer fit in native memory, your vector search will be forced onto disk, and your latency will jump from 5 ms to 5,000 ms.
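As a rough sizing sketch: the OpenSearch k-NN documentation suggests estimating HNSW native memory at about 1.1 × (4 × dimension + 8 × m) bytes per vector. The helper below is a back-of-the-envelope estimate, not a guarantee:

```python
# Sketch: estimating off-heap (native) memory for an HNSW index,
# following the sizing formula from the OpenSearch k-NN docs.
def hnsw_memory_gb(num_vectors, dimension, m=16):
    bytes_needed = 1.1 * (4 * dimension + 8 * m) * num_vectors
    return bytes_needed / (1024 ** 3)

# 10M vectors at 1536 dimensions with m=16:
print(round(hnsw_memory_gb(10_000_000, 1536), 1))  # → 64.3
```

That ~64 GB must fit in the native-memory half of your node's RAM, which is why a 128 GB node with a 64 GB JVM heap is about the minimum for an index of this size.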
Summary and Key Takeaways
Configuring OpenSearch for vector search requires a blend of SQL-like schema design and AI infrastructure tuning.
- Enable k-NN at the index level before anything else.
- Choose your engine (nmslib for speed, Lucene for hybrid).
- Map your vectors carefully, matching the dimension and space_type to your model.
- RAM/JVM Balance: Ensure your server has enough "Native Memory" (off-heap) to store your HNSW graphs.
In the next lesson, we will look at Combining keyword and vector search, learning the "Secret Sauce" of OpenSearch: Hybrid Retrieval.
Exercise: Mapping Logic
You are building a "Music Discovery Engine."
- Embedding: 1024D vectors from a custom audio model.
- Model requirement: Dot Product similarity.
- Write a JSON mapping for a field named melody_vector.
- Which space_type do you use in OpenSearch for Dot Product?
- If you want the search to be extremely fast even with 10M songs, which engine should you pick?