LangChain Integration: Using your Custom LLM Class

The Framework Bridge. Learn how to plug your private fine-tuned model into the LangChain ecosystem by writing a custom LLM provider class.

You have your fine-tuned model running on a FastAPI server (Module 13). Now you want to use it inside a complex application with chains, routers, and pre-processors.

The standard framework for orchestrating this is LangChain. While LangChain ships with built-in support for providers like OpenAI and Anthropic, it knows nothing about your specific model. To use it, you need to create a Custom LLM Wrapper, which lets LangChain "speak" to your fine-tuned model just as if it were GPT-4.

In this lesson, we will write the Python class that connects your private intelligence to the world's most popular AI framework.


1. Why Wrap Your Model?

  1. Uniform Interface: You can swap your fine-tuned model for GPT-4 with a single line of code for testing (see the sketch after this list).
  2. Tool Access: Once your model is a "LangChain LLM," it can automatically use hundreds of LangChain tools (Google Search, Python REPL, SQL).
  3. Observability: LangChain (and LangSmith) can track the performance and latency of your custom model automatically.
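
To make the first point concrete, here is a minimal sketch of that one-line swap. It assumes the langchain-openai package, an OpenAI API key for the hosted baseline, and the FineTunedCustomLLM wrapper we build in section 3; the rest of the chain never changes.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")

# Baseline: a hosted model
llm = ChatOpenAI(model="gpt-4o")

# Swap in your private model -- the only line that changes:
# llm = FineTunedCustomLLM(model_url="http://localhost:8080", model_name="my-specialist")

chain = prompt | llm
print(chain.invoke({"text": "The quarterly numbers are up 12%."}).content)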

2. The Custom LLM Blueprint

To integrate with LangChain, you create a class that inherits from BaseChatModel. You only need to implement one main method, _generate, plus a small _llm_type property that identifies your model.
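
In skeleton form, the contract looks like this (a minimal, self-contained sketch that returns a canned reply; the real HTTP-backed implementation follows in section 3):

from typing import Any, List, Optional
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import AIMessage, BaseMessage
from langchain_core.outputs import ChatGeneration, ChatResult

class EchoChatModel(BaseChatModel):
    """Smallest possible chat model: echoes the last message back."""

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[Any] = None,
        **kwargs: Any,
    ) -> ChatResult:
        # Normally you would call your backend here.
        reply = AIMessage(content=f"(echo) {messages[-1].content}")
        return ChatResult(generations=[ChatGeneration(message=reply)])

    @property
    def _llm_type(self) -> str:
        # Short identifier used in logs and serialization.
        return "echo-chat-model"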


Visualizing the Framework Interop

graph LR
    A["LangChain Components (Chains, Agents)"] --> B["Your Custom LLM Class"]
    B --> C["HTTP Request (FastAPI)"]
    C --> D["your-fine-tuned-model (vLLM)"]
    
    subgraph "The Framework Bridge"
    B
    end
    
    D --> E["Response Output"]
    E --> B
    B --> A

3. Implementation: The LangChain Wrapper

Here is the production-ready code to connect LangChain to your microservice.

from typing import Any, List, Optional
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import BaseMessage, AIMessage
from langchain_core.outputs import ChatResult, ChatGeneration
import requests

class FineTunedCustomLLM(BaseChatModel):
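    """LangChain chat-model wrapper that proxies calls to the FastAPI inference server from Module 13."""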
    model_url: str
    model_name: str

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[Any] = None,
        **kwargs: Any,
    ) -> ChatResult:
        # 1. Prepare the payload for your FastAPI wrapper (Module 13)
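        #    NOTE: only the last user message is forwarded here; a sketch for
        #    sending the full chat history follows after this code block.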
        last_message = messages[-1].content
        payload = {
            "prompt": last_message,
            "max_tokens": 500,
            "temperature": 0.0
        }

        # 2. Make the HTTP call to your server
        response = requests.post(f"{self.model_url}/generate", json=payload, timeout=60)
        response.raise_for_status()
        data = response.json()

        # 3. Wrap the text in an AIMessage
        message = AIMessage(content=data["data"])
        generation = ChatGeneration(message=message)
        return ChatResult(generations=[generation])

    @property
    def _llm_type(self) -> str:
        return "custom_fine_tuned_model"

# Usage
llm = FineTunedCustomLLM(model_url="http://localhost:8080", model_name="my-specialist")
print(llm.invoke("What is the status of project X?").content)
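
The wrapper above forwards only the last message to the server. If your chains include system prompts or multi-turn history, one variation (a sketch that still assumes the Module 13 endpoint accepts a single "prompt" string) is to flatten the whole message list into one role-tagged prompt before sending it. Ideally, the flattening should mirror the exact chat template you used during fine-tuning in Module 8.

from typing import List
from langchain_core.messages import BaseMessage, HumanMessage, SystemMessage

def messages_to_prompt(messages: List[BaseMessage]) -> str:
    """Flatten a LangChain message list into a single role-tagged prompt string."""
    parts = []
    for m in messages:
        if isinstance(m, SystemMessage):
            parts.append(f"System: {m.content}")
        elif isinstance(m, HumanMessage):
            parts.append(f"User: {m.content}")
        else:
            parts.append(f"Assistant: {m.content}")
    parts.append("Assistant:")  # cue the model to respond
    return "\n".join(parts)

# Inside _generate, the payload construction then becomes:
# payload = {"prompt": messages_to_prompt(messages), "max_tokens": 500, "temperature": 0.0}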

4. Why "BaseChatModel" vs "BaseLLM"?

  • BaseLLM: For simple text-in, text-out models (Legacy).
  • BaseChatModel: For models that use System/User/Assistant roles. Since we fine-tuned our model on conversation data in Module 8, you should almost always use BaseChatModel. This allows you to use LangChain's powerful ChatPromptTemplate features.
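
For example, because the wrapper is a chat model, it drops straight into a ChatPromptTemplate with explicit roles (a short sketch reusing the class and URL from section 3):

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise project-status assistant."),
    ("human", "{question}"),
])

chain = prompt | FineTunedCustomLLM(
    model_url="http://localhost:8080", model_name="my-specialist"
)
print(chain.invoke({"question": "What is the status of project X?"}).content)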

Summary and Key Takeaways

  • Custom Wrappers are the glue that connects your private model to the LangChain ecosystem.
  • Interchangeability: You can now use your model anywhere LangChain expects an LLM.
  • HTTP Proxy: Your LangChain class acts as a client for your FastAPI inference server.
  • Legacy Warning: Always prefer BaseChatModel over the older BaseLLM for modern, instruction-tuned models.

In the next lesson, we will look at a more advanced agentic structure: LangGraph and Agents: Specializing the Reasoning Loop.


Reflection Exercise

  1. If you update your model weights but keep the FastAPI server running at the same URL, do you need to change your LangChain code?
  2. Why is it useful to inherit from BaseChatModel even if you only plan to use one single prompt? (Hint: Think about future features like 'Streaming' and 'Memory').
