
Shipping to Production: Deployment Checklist
Cross the finish line. Learn the infrastructure, monitoring, and scaling steps required to take your local agent from a prototype to a live, production-grade service.
Live Deployment of a Basic Agent
You have reached a major milestone. You now have all the components of a production agent system. But "It works on my machine" is not "It works for 1,000 users."
In this lesson, we will cover the final hurdle: Deployment. We will look at the architecture of a live agent system, the critical monitoring steps, and the "Launch Day" checklist.
1. The Production Architecture (Cloud-Native)
In production, you don't run everything on one server. You use a distributed set of services.
- Frontend: Hosted on Vercel or AWS Amplify (Global CDN).
- Backend API: Hosted on AWS ECS, Google Cloud Run, or a Kubernetes cluster.
- Database: Managed PostgreSQL (AWS RDS or Supabase) with the LangGraph checkpointer.
- Cache: Managed Redis (Upstash or Redis Cloud) for rate limiting.
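To make the cache layer concrete, here is a minimal sketch of a Redis-backed rate limiter for the agent API. The REDIS_URL variable, the limit values, and the /chat route are illustrative assumptions, not fixed names from this course:

```python
# Minimal sketch: Redis-backed rate limiting for the agent API.
# Assumes a managed Redis reachable via REDIS_URL (illustrative name).
import os

import redis
from fastapi import FastAPI, HTTPException, Request

r = redis.Redis.from_url(os.environ["REDIS_URL"])
LIMIT = 20           # requests allowed per window (example value)
WINDOW_SECONDS = 60  # window length (example value)

app = FastAPI()

def check_rate_limit(request: Request) -> None:
    key = f"ratelimit:{request.client.host}"
    count = r.incr(key)                # atomic increment per client
    if count == 1:
        r.expire(key, WINDOW_SECONDS)  # start the window on the first hit
    if count > LIMIT:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")

@app.post("/chat")
def chat(request: Request):
    check_rate_limit(request)
    return {"status": "ok"}  # hand off to the agent from here
```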
2. Environment Variable Audit
Before you ship, you must ensure your .env file is sanitized: every production secret is set, and no development values or test keys remain.
The "Must-Have" Production Secrets
- OPENAI_API_KEY (or the Anthropic/Google equivalent)
- DATABASE_URL (encrypted connection string)
- LANGCHAIN_API_KEY (for tracing in LangSmith)
- JWT_SECRET_KEY (for user authentication)
- SENTRY_DSN (for error tracking)
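A cheap way to enforce this list is a fail-fast check at process startup, so a missing secret kills the deploy instead of the first user request. A minimal sketch using the variable names above:

```python
# Fail fast at startup if a required production secret is missing.
import os
import sys

REQUIRED = [
    "OPENAI_API_KEY",
    "DATABASE_URL",
    "LANGCHAIN_API_KEY",
    "JWT_SECRET_KEY",
    "SENTRY_DSN",
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    sys.exit(f"Refusing to start, missing secrets: {', '.join(missing)}")
```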
3. The "Cold Start" and Latency Optimization
Stateful agents can be slow to start if your server has to re-load a massive Docker image or re-initialize a database connection every time.
Optimization Strategies
- Connection Pooling: Use SQLAlchemy or pgbouncer to keep database connections open.
- Pre-warming: In Cloud Run or Lambda, keep a "Provisioned" instance running so the first user doesn't wait 10 seconds.
- Graph Pre-Compilation: Ensure your LangGraph is compiled on startup, not on every request.
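Here is a sketch combining connection pooling and graph pre-compilation in a FastAPI startup hook. The build_graph() factory is a hypothetical stand-in for the graph you assembled in earlier modules, and the pool sizes are example values:

```python
# Minimal sketch: do the expensive setup once at startup, not per request.
import os
from contextlib import asynccontextmanager

from fastapi import FastAPI
from sqlalchemy import create_engine

from my_agent.graph import build_graph  # hypothetical module from earlier lessons

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Connection pooling: keep a small pool of open Postgres connections.
    app.state.engine = create_engine(
        os.environ["DATABASE_URL"],
        pool_size=5,          # steady-state connections (example value)
        max_overflow=10,      # burst headroom (example value)
        pool_pre_ping=True,   # drop dead connections before handing them out
    )
    # Graph pre-compilation: pay the compile cost once, reuse on every request.
    app.state.agent = build_graph().compile()
    yield
    app.state.engine.dispose()

app = FastAPI(lifespan=lifespan)
```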
4. Monitoring: The "Agent Vital Signs"
Once live, you need to see what is happening in real-time.
- LangSmith: Watch the reasoning traces (Module 4.4).
- Prometheus/Grafana: Watch CPU/Memory usage of your agent containers.
- Sentry: Catch the crashes that the LLM doesn't see (e.g., Database connection lost).
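Two of these take only a few lines to wire up. A sketch, assuming the Sentry Python SDK and the LangSmith environment-variable tracing from Module 4.4:

```python
# Minimal sketch: initialize error tracking and tracing at startup.
import os

import sentry_sdk

# Sentry catches the infrastructure crashes the LLM never sees.
sentry_sdk.init(
    dsn=os.environ["SENTRY_DSN"],
    traces_sample_rate=0.1,  # sample 10% of transactions (example value)
)

# LangSmith tracing is driven by environment variables.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Production"  # one project per environment
# LANGCHAIN_API_KEY must already be set in the environment.
```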
5. Security: The Final Lock Down
- Encryption in Transit: Every connection must be HTTPS.
- Encryption at Rest: The checkpointer database must be encrypted.
- Port Masking: Your API should only expose port 443. Ports like 5432 (Postgres) or 6379 (Redis) should be hidden behind a VPC.
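TLS usually terminates at your load balancer, but the app itself can refuse plain HTTP as a backstop. A minimal sketch using the redirect middleware FastAPI re-exports from Starlette:

```python
# Minimal sketch: redirect any plain-HTTP request to HTTPS.
from fastapi import FastAPI
from fastapi.middleware.httpsredirect import HTTPSRedirectMiddleware

app = FastAPI()
app.add_middleware(HTTPSRedirectMiddleware)
```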
6. Launch Day Checklist
- Rate Limits: Are the production API keys set to the "Tier 5" billing level to avoid 429s?
- Persistence: Did you run the SQL migration to create the checkpoints table in production?
- Frontend: Is the API_URL in the React app pointing to api.yourdomain.com instead of localhost?
- Tracing: Is LangSmith enabled for the "Production" project?
- Budget: Have you set a "Hard Cap" in the OpenAI dashboard to prevent a $5,000 surprise if the agent loops?
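Several of these checks can be scripted into a pre-launch smoke test that runs before you flip DNS. A sketch, assuming the checkpoints table from the persistence module and an API_URL environment variable (illustrative name):

```python
# Minimal sketch: automated pre-launch checks for the deployment checklist.
import os
import sys

from sqlalchemy import create_engine, inspect

failures = []

# Persistence: does the checkpoints table exist in production?
engine = create_engine(os.environ["DATABASE_URL"])
if not inspect(engine).has_table("checkpoints"):
    failures.append("Missing 'checkpoints' table - run the SQL migration.")

# Frontend: is the API URL still pointing at localhost?
api_url = os.environ.get("API_URL", "")
if not api_url or "localhost" in api_url:
    failures.append(f"API_URL looks wrong: {api_url!r}")

# Tracing: is LangSmith pointed at the Production project?
if os.environ.get("LANGCHAIN_PROJECT") != "Production":
    failures.append("LANGCHAIN_PROJECT is not set to 'Production'.")

if failures:
    sys.exit("Launch blocked:\n" + "\n".join(failures))
print("All checks passed.")
```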
Summary: Congratulations!
You have built more than a "Chatbot." You have built a Production AI Service.
- It is Stateful.
- It is Secure (Isolated).
- It is Observable.
- It is Authenticated.
This concludes Core Training. From Module 11 onwards, we move into Expert Patterns: Tool Engineering, Local LLMs, and System Operations (SysOps).
Exercise: Deployment Review
- The Choice: Would you deploy your agent API to AWS Lambda (Serverless) or AWS ECS (Containers)?
- Pros/Cons: Consider that an agent might take 2 minutes to finish a task; Lambda has a 15-minute execution limit, while ECS is cheaper for sustained high volume.
- Cost Management: If 100 users all use the agent at the same time and each uses 50,000 tokens, how much will you pay in the first hour?
- (Hint: Do the math for GPT-4o-mini prices; a calculation sketch follows the exercise).
- Recovery: What is the FIRST thing you do if you see a "Memory Limit Exceeded" error in your production logs?
- (Hint: Look at Module 7.3).
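For the cost question, the arithmetic fits in a few lines. The price below is a placeholder, not the real GPT-4o-mini rate; look up the current pricing and substitute it:

```python
# Back-of-envelope cost estimate. The price is a PLACEHOLDER -
# substitute the current GPT-4o-mini rate from the pricing page.
users = 100
tokens_per_user = 50_000
price_per_million_tokens = 0.60  # USD, example value only

total_tokens = users * tokens_per_user  # 5,000,000 tokens
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"{total_tokens:,} tokens -> ${cost:.2f} in the first hour")
```

You are ready. Ship it.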