Python: Implementing Hybrid Search

We have reached the culmination of Module 7. You have seen the architecture, the mappings, and the decision frameworks for OpenSearch. Now, we write the code.

In this lesson, we will build a complete Python search client. We will go through the process of:

Creating a Search Pipeline for normalization.
Ingesting documents with text and vectors.
Executing a Hybrid Query that targets both the inverted index and the k-NN index.
Handling the results in the order produced by the pipeline's normalized, weighted score combination.

1. Prerequisites: The "Search Pipeline"

Before we can run a hybrid query in Python, we must ensure OpenSearch is configured to merge the scores. We do this once during the setup of our application.

from opensearchpy import OpenSearch

# Demo credentials for a local development cluster; load real ones from the environment.
client = OpenSearch(hosts=[{'host': 'localhost', 'port': 9200}], http_auth=('admin', 'admin'))

def create_search_pipeline():
    """Create (or overwrite) the search pipeline that merges hybrid sub-query scores."""
    pipeline_id = "norm-pipeline"
    pipeline_body = {
        "description": "Normalize scores for hybrid search",
        "phase_results_processors": [
            {
                "normalization": {
                    # Rescale each sub-query's scores to a common 0..1 range
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        # Weights are positional: 0.3 for the first sub-query (keywords),
                        # 0.7 for the second (vectors)
                        "parameters": {"weights": [0.3, 0.7]}
                    }
                }
            }
        ]
    }
    client.transport.perform_request('PUT', f'/_search/pipeline/{pipeline_id}', body=pipeline_body)
    print(f"Pipeline '{pipeline_id}' created.")

# create_search_pipeline()
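
To build intuition for what the pipeline does with the two score lists, here is a minimal pure-Python sketch of min-max normalization followed by a weighted arithmetic mean. It is a simplified model, not OpenSearch's implementation: the function names and sample scores are made up for illustration, and it assumes a document missing from one result set contributes 0 for that sub-query.

```python
def min_max(scores):
    """Rescale a {doc_id: raw_score} dict to the 0..1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a top match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine(keyword_scores, vector_scores, weights=(0.3, 0.7)):
    """Weighted arithmetic mean of two normalized score sets.

    Simplifying assumption: a document missing from one result set
    contributes 0 for that sub-query.
    """
    kw, vec = min_max(keyword_scores), min_max(vector_scores)
    w_kw, w_vec = weights
    combined = {}
    for doc_id in set(kw) | set(vec):
        total = w_kw * kw.get(doc_id, 0.0) + w_vec * vec.get(doc_id, 0.0)
        combined[doc_id] = total / (w_kw + w_vec)
    # Highest combined score first
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores: BM25 scores are unbounded, k-NN scores are not.
bm25 = {"doc1": 8.0, "doc2": 4.0, "doc3": 2.0}
knn = {"doc2": 0.9, "doc3": 0.8, "doc4": 0.5}

ranked = combine(bm25, knn)
for doc_id, score in ranked:
    print(f"{doc_id}: {score:.3f}")
```

In this sample, doc2 ranks first because it scores well in both lists, while doc1 (the best keyword match) drops behind doc3 because the vector side carries 70% of the weight.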

2. Setting Up the Index Mapping

We need an index that supports both text (for Keywords) and knn_vector (for Semantic).

def create_hybrid_index(index_name):
    settings = {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "content_text": {"type": "text"},
                "content_vector": {
                    "type": "knn_vector",
                    "dimension": 1536,
                    "method": {
                        "name": "hnsw",
                        "space_type": "cosinesimil",
                        "engine": "nmslib"
                    }
                },
                "metadata": {"type": "keyword"}
            }
        }
    }
    client.indices.create(index=index_name, body=settings)
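
Because the mapping fixes the vector dimension at 1536, a document whose embedding has a different length will be rejected at index time. One way to catch that earlier is a small client-side check before ingestion. The sketch below is an assumption about how you might do this; validate_document and EXPECTED_DIMENSION are hypothetical names, not part of opensearch-py.

```python
import math

EXPECTED_DIMENSION = 1536  # must match "dimension" in the index mapping


def validate_document(doc, dimension=EXPECTED_DIMENSION):
    """Return a list of problems found in a document; an empty list means OK."""
    problems = []

    text = doc.get("content_text")
    if not isinstance(text, str) or not text.strip():
        problems.append("content_text must be a non-empty string")

    vector = doc.get("content_vector")
    if not isinstance(vector, (list, tuple)):
        problems.append("content_vector must be a list of numbers")
    elif len(vector) != dimension:
        problems.append(
            f"content_vector has {len(vector)} values, expected {dimension}"
        )
    elif not all(isinstance(v, (int, float)) and math.isfinite(v) for v in vector):
        problems.append("content_vector contains non-numeric or non-finite values")

    return problems


good_doc = {"content_text": "How to reset a password", "content_vector": [0.1] * 1536}
bad_doc = {"content_text": "Too short", "content_vector": [0.1, 0.2, 0.3]}

print(validate_document(good_doc))
print(validate_document(bad_doc))
```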

3. The Hybrid Search Function

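This is the core of your application. Notice how we pass the query twice: once as a string for the keyword (match) sub-query and once as an embedding vector for the k-NN sub-query. The order of the sub-queries matters, because the pipeline applies its weights by position.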

def search_hybrid(query_text, query_vector, index_name="my_index"):
    search_query = {
        "size": 5,
        "query": {
            "hybrid": {
                "queries": [
                    {
                        "match": {
                            "content_text": query_text
                        }
                    },
                    {
                        "knn": {
                            "content_vector": {
                                "vector": query_vector,
                                "k": 10
                            }
                        }
                    }
                ]
            }
        }
    }
    
    # Passing the pipeline via params
    response = client.search(
        index=index_name,
        body=search_query,
        params={"search_pipeline": "norm-pipeline"}
    )
    
    return response['hits']['hits']
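
If you want to unit-test the request body without a running cluster, one option is to separate building the query from sending it. The helper below is a sketch under that assumption; build_hybrid_query is a hypothetical name, and the k >= size check is a convention chosen for this example rather than an OpenSearch requirement.

```python
def build_hybrid_query(query_text, query_vector, size=5, k=10):
    """Build the hybrid search body: keyword sub-query first, k-NN second."""
    if k < size:
        # Convention for this sketch: ask the k-NN side for at least as many
        # candidates as we intend to return.
        raise ValueError(f"k ({k}) should be >= size ({size})")

    return {
        "size": size,
        "query": {
            "hybrid": {
                "queries": [
                    # Position 0: matched with the first pipeline weight
                    {"match": {"content_text": query_text}},
                    # Position 1: matched with the second pipeline weight
                    {"knn": {"content_vector": {"vector": list(query_vector), "k": k}}},
                ]
            }
        },
    }


body = build_hybrid_query("password reset", [0.12, 0.33, 0.5], size=5, k=10)
print(body["query"]["hybrid"]["queries"][0])
```

The returned dictionary can be passed as the body argument of client.search, together with the search_pipeline parameter, exactly as search_hybrid does.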

4. Handling Results: The Metadata Advantage

Unlike Pinecone, where a query returns only IDs and scores unless you explicitly request metadata, OpenSearch returns the full source document with every hit by default. This allows you to build rich UIs immediately.

# Vector truncated for display; a real query vector must contain all 1536 values.
results = search_hybrid("password reset", [0.12, 0.33, ...])

for hit in results:
    score = hit['_score']
    text = hit['_source']['content_text']
    category = hit['_source']['metadata']
    
    print(f"[{score:.4f}] Category: {category}")
    print(f"Content: {text[:100]}...")
    print("-" * 30)
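
In practice some documents may lack a field, and direct key access like hit['_source']['metadata'] would then raise a KeyError. A more defensive variant might look like the sketch below. The summarize_hits name and the sample hits are invented for illustration; only the _id, _score and _source keys reflect the actual response shape.

```python
def summarize_hits(hits, preview_chars=100):
    """Convert raw hits into plain dicts, tolerating missing fields."""
    rows = []
    for hit in hits:
        source = hit.get("_source", {})
        text = source.get("content_text", "")
        rows.append({
            "id": hit.get("_id"),
            "score": hit.get("_score") or 0.0,
            "category": source.get("metadata", "unknown"),
            "preview": text[:preview_chars],
        })
    return rows


# Hand-written sample hits standing in for response['hits']['hits'].
sample_hits = [
    {"_id": "1", "_score": 0.91,
     "_source": {"content_text": "Reset your password from the login page.",
                 "metadata": "account"}},
    {"_id": "2", "_score": 0.42,
     "_source": {"content_text": "Contact support for locked accounts."}},
]

for row in summarize_hits(sample_hits):
    print(f"[{row['score']:.4f}] {row['category']}: {row['preview']}")
```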

5. The "Enterprise" Tip: Bulk Helpers

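When dealing with large enterprise datasets, don't call the standard index() method once per document. Use the helpers.bulk method instead. It batches many documents into each HTTP request, which is typically far faster than sending one request per document. Note that retries are off by default: pass max_retries (along with initial_backoff and max_backoff) if you want documents rejected with HTTP 429 to be retried.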

from opensearchpy import helpers

def bulk_ingest(docs):
    actions = [
        {
            "_index": "my_index",
            "_source": doc
        }
        for doc in docs
    ]
    helpers.bulk(client, actions)
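
The actions above let OpenSearch generate document IDs, so running the ingest twice creates duplicates. One possible refinement, sketched here as an assumption rather than a required pattern, is to derive a deterministic _id from the content and to yield actions lazily so the whole dataset never sits in memory. The generate_actions name and the choice of SHA-256 over content_text are illustrative.

```python
import hashlib


def generate_actions(docs, index_name="my_index"):
    """Yield one bulk action per document with a content-derived _id."""
    for doc in docs:
        # Same text always produces the same ID, so re-ingesting overwrites
        # the existing document instead of adding a duplicate.
        doc_id = hashlib.sha256(doc["content_text"].encode("utf-8")).hexdigest()
        yield {"_index": index_name, "_id": doc_id, "_source": doc}


docs = [
    {"content_text": "Reset your password from the login page.", "metadata": "account"},
    {"content_text": "Contact support for locked accounts.", "metadata": "support"},
]

actions = list(generate_actions(docs))
print(actions[0]["_id"][:12], actions[0]["_index"])
```

Because helpers.bulk accepts any iterable of actions, the generator can be passed to it directly, for example helpers.bulk(client, generate_actions(docs), chunk_size=500).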

Summary and Module 7 Wrap-up

You have completed the transition to Enterprise Vector Search.

Architecture: You understand that OpenSearch is a document store + a keyword engine + a vector database.
Configuration: You can define knn_vector mappings and choose a k-NN engine (nmslib/faiss).
Logic: You can implement Hybrid Retrieval using score normalization and weighted combination.
Code: You have a working Python pattern for production search pipelines.

What's Next?

In Module 8: CRUD Operations in Vector Databases, we go back to basics but with a "Production" twist. We will learn how to handle updates (UPSERTS), deletions, and the terrifying problem of Re-indexing when your model changes.

Exercise: Building a Re-ranker Hook

Modify the search_hybrid function above:

Retrieve the top 20 results using the hybrid query.
If the highest score is below 0.5, print a warning: "Low confidence result."
Look into the rank_features field in OpenSearch. How could you add "User Popularity" as a third signal in your hybrid search?
