Module 14 Exercises: Architecture and Design

Build the blueprint. Practice designing complex, real-world Kubernetes systems for AI, SaaS, and Databases.

In Module 14, we moved beyond individual components and started building complete systems. These exercises will test your ability to think like a System Architect: you will need to balance competing requirements for security, performance, and reliability.


Exercise 1: The RAG Pipeline Blueprint

  1. Scenario: You are building a Retrieval-Augmented Generation (RAG) system. It includes:
    • A Vector Database (Qdrant).
    • An Embedding Service (Python/FastAPI).
    • An Orchestrator (LangChain/FastAPI).
    • A Frontend (Next.js).
  2. Architecture Task: Draw (in text or Mermaid) the network path of a user's question.
    • Which components should have a Public Ingress?
    • Which components should be isolated with a NetworkPolicy?
  3. Storage: Which components need a Persistent Volume? Which can be Stateless?

Exercise 2: Shared Cluster Governance

  1. Scenario: You are hosting 3 internal teams on one cluster: Data-Science, Web-Front, and Internal-Tools.
  2. Resource Allocation:
    • You have 16 GPUs total.
    • The Data-Science team is "Greedy" and often tries to use all 16.
    • The Internal-Tools team has a tiny budget.
  3. Task: Define the ResourceQuotas for each of the three namespaces. How do you ensure the Web-Front team always has at least 2 GPUs available for their production search feature?

Exercise 3: The Database "Chaos" Test

  1. Preparation: You have a 3-node PostgreSQL cluster running with an Operator.
  2. The Test: Describe exactly what you would do to simulate a "Regional Outage."
  3. Analysis:
    • When you delete the Primary Pod, how does the application know the new IP address of the promoted Replica? (Hint: Does the Service name change?)
    • How do you verify that data written 1 second before the crash was successfully replicated?

Exercise 4: Scaling Strategy Design

  1. Scenario: Your AI inference workers take 60 seconds to boot up because they have to download a large model.
  2. Problem: If you use standard CPU scaling, your users will experience a 60-second delay every time the cluster scales up.
  3. Solution: Design a "Warm Pool" or "Predictive" scaling strategy. How would you use CronJobs (Module 3.2) or Over-provisioning (Module 8.3) to solve this?

Solutions (Self-Check)

Exercise 1 Answer:

  • Ingress: ONLY the Frontend should be public.
  • NetworkPolicy: The Vector DB should only allow traffic from the Orchestrator. The Embedding service should only allow traffic from the Orchestrator.
  • Storage: Only the Vector DB needs a Persistent Volume. Everything else is stateless and can be scaled horizontally.
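The isolation rule above can be sketched as a NetworkPolicy. This is a minimal sketch, assuming the pods carry labels `app: qdrant` and `app: orchestrator` (adjust to match your own Deployment labels); port 6333 is Qdrant's default REST port.

```yaml
# Allow the Vector DB to receive traffic ONLY from the Orchestrator.
# Labels (app: qdrant, app: orchestrator) are assumptions for this sketch.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: qdrant-allow-orchestrator
spec:
  podSelector:
    matchLabels:
      app: qdrant        # policy applies to the Vector DB pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: orchestrator   # only the Orchestrator may connect
      ports:
        - protocol: TCP
          port: 6333              # Qdrant's default REST port
```

A second, nearly identical policy (with the Embedding Service's label in `podSelector`) covers the other isolation requirement.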

Exercise 2 Solution:

  • Data-Science: Quota = 14 GPUs.
  • Web-Front: Quota = 2 GPUs (Guaranteed by giving them higher PriorityClass or a dedicated node group).
  • Internal-Tools: Quota = 0 GPUs.
  • Governance: By capping Data-Science at 14 GPUs, Web-Front is mathematically guaranteed their 2 (14 + 2 + 0 = 16).
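The allocation above maps directly onto three ResourceQuota objects. This is a sketch assuming namespace names `data-science`, `web-front`, and `internal-tools`, and the NVIDIA device plugin's `nvidia.com/gpu` resource name; adjust both for your cluster.

```yaml
# Per-namespace GPU caps. Extended resources are quota'd via the
# "requests.<resource-name>" key.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: data-science
spec:
  hard:
    requests.nvidia.com/gpu: "14"   # the "greedy" team is capped here
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: web-front
spec:
  hard:
    requests.nvidia.com/gpu: "2"    # production search's guaranteed share
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: internal-tools
spec:
  hard:
    requests.nvidia.com/gpu: "0"    # tiny budget: no GPU access at all
```

Note the quota only caps requests; pairing Web-Front with a higher PriorityClass (or a dedicated node group) is what makes their 2 GPUs effectively reserved rather than merely permitted.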

Exercise 3 Hint:

  • Service: No, the Service name stays the same (e.g., my-db-primary). The Operator updates the Service's Selector to point to the new pod. The application never needs to change its connection string!
  • Data: You look at the LSN (Log Sequence Number) in Postgres to ensure the replica was in sync with the primary at the moment of the crash.
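Postgres reports WAL positions as LSN strings like `16/B374D848` (two hex numbers: the high 32 bits and the low 32 bits of a byte offset). Server-side, `pg_wal_lsn_diff()` computes the gap; the sketch below does the same arithmetic in Python, useful for comparing the last value of `pg_current_wal_lsn()` seen on the old primary against `pg_last_wal_replay_lsn()` on the promoted replica. The function names here are ours, not a library API.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN string ("high/low" in hex) to an absolute byte offset."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)

def replication_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica had not yet replayed; 0 means fully in sync."""
    return lsn_to_bytes(primary_lsn) - lsn_to_bytes(replica_lsn)

# If the replica's replay LSN equals the primary's last write LSN,
# the write made 1 second before the crash was replicated.
print(replication_lag_bytes("16/B374D848", "16/B374D848"))  # prints 0
```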

Exercise 4 Logic:

  • Over-provisioning: Create a "Placeholder" pod with the same resource requests but a very low PriorityClass. When a "Real" AI pod needs to start, K8s preempts the placeholder instantly, handing its reserved capacity to the AI pod. The node and its resources are already warm before the scale-up request arrives, so users never see the 60-second boot delay.
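The placeholder pattern can be sketched as a PriorityClass plus a Deployment. All names and the replica count are assumptions for this sketch; the key idea is a priority value below every real workload, so the scheduler evicts the placeholder first.

```yaml
# A negative-priority class: placeholder pods are preempted as soon as
# any real workload needs their capacity.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: "Placeholder pods; preempted by any real workload."
---
# A warm pool of "do-nothing" pods that reserve GPU capacity so nodes
# are already provisioned when a real inference worker arrives.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-warm-pool
spec:
  replicas: 2
  selector:
    matchLabels:
      app: gpu-placeholder
  template:
    metadata:
      labels:
        app: gpu-placeholder
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9   # minimal "sleep forever" image
          resources:
            requests:
              nvidia.com/gpu: "1"   # same resource shape as a real worker
            limits:
              nvidia.com/gpu: "1"
```

For the "Predictive" variant from the exercise, a CronJob (Module 3.2) can scale this Deployment up before known traffic peaks and back down afterwards.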

Summary of Module 14

Congratulations! You have built the most critical systems in the Kubernetes ecosystem.

  • You designed an AI Inference Pipeline.
  • You implemented a Multi-tenant SaaS Platform.
  • You managed a High-Availability Database.

You are now ready for the Final Challenge. In Module 15: The Capstone Project, you will build a global-scale AI platform from the ground up.
