The Capstone Project - Part 2: Implementation and Automation

Build the machine. Implement the Helm charts, GitOps workflows, and CI/CD pipelines needed to deploy the OmniVision platform across a global, multi-cluster environment.

The Capstone Project: Part 2 - Building the Automation Engine

In Part 1, we designed the OmniVision AI architecture on paper. We have our EKS and GKE clusters, our Zero-Trust security model, and our scaling strategy. Now, it is time to build.

In a modern enterprise, we never "manually" deploy anything. Everything must be declarative. If a cluster in Oregon is destroyed, we should be able to recreate it perfectly by simply pointing our automation at a Git repository.

In this second part of the Capstone, we will master the Implementation phase. We will build a unified Helm Chart for OmniVision, structure our GitOps Repository, and create a GitHub Actions Pipeline that builds, scans, and deploys our AI agents to the world.


1. The Unified Helm Chart: One Blueprint, Many Clouds

We will create a single chart named omnivision-platform. Instead of hardcoding AWS- or GCP-specific values, we drive the cloud differences through a toggle in values.yaml.

values.yaml:

global:
  cloudProvider: "aws" # or "gcp"
  imageRegistry: "<account>.dkr.ecr.us-east-1.amazonaws.com"

aiWorker:
  image: "omnivision-worker"
  tag: "latest" # Overwritten by CI
  resources:
    limits:
      nvidia.com/gpu: 1

Note that template directives cannot live inside values.yaml itself; Helm only renders files under templates/. The cloud-specific node affinity therefore belongs in the Deployment template, where it switches on the toggle:

templates/deployment.yaml (excerpt):

      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              {{- if eq .Values.global.cloudProvider "aws" }}
              - key: "eks.amazonaws.com/nodegroup"
                operator: In
                values: ["gpu-workers"]
              {{- else }}
              - key: "cloud.google.com/gke-accelerator"
                operator: In
                values: ["nvidia-tesla-t4"]
              {{- end }}
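
To sanity-check that both branches render correctly, you can template the chart locally for each provider (the release name and chart path are illustrative):

helm template omnivision ./omnivision-platform --set global.cloudProvider=aws
helm template omnivision ./omnivision-platform --set global.cloudProvider=gcp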

2. The GitOps Repository Structure

We follow the Folder-per-Environment pattern for ArgoCD (Module 11.4).

omnivision-gitops/
  infrastructure/
    cluster-aws/
      argo-app.yaml
      values-overrides.yaml # AWS specific IDs
    cluster-gcp/
      argo-app.yaml
      values-overrides.yaml # GCP specific IDs
  apps/
    omnivision-ui/
    omnivision-worker/

When we push a change to values-overrides.yaml in the cluster-aws folder, ArgoCD immediately syncs the Oregon cluster while leaving the Belgium cluster untouched, which enables region-by-region testing.
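
Each argo-app.yaml is a standard ArgoCD Application pointed at its own folder. A sketch is below, assuming the omnivision-platform chart is vendored into each cluster folder; the repo URL, destination server, and namespace are placeholders:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: omnivision-aws
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/omnivision-gitops
    targetRevision: main
    path: infrastructure/cluster-aws # folder-per-environment
    helm:
      valueFiles:
      - values-overrides.yaml # AWS-specific IDs
  destination:
    server: https://kubernetes.default.svc # or the remote cluster's API endpoint
    namespace: omnivision
  syncPolicy:
    automated:
      prune: true
      selfHeal: true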


3. The CI/CD Pipeline: The "Secure Conveyor Belt"

Our GitHub Actions pipeline must be a fortress.

The Stages:

  1. Build: Create the Docker image using Buildx for cache optimization.
  2. Scan: Run Trivy (Module 10.4). Stop the build if high-severity vulnerabilities are found.
  3. Sign: Use Sigstore Cosign (Module 10.4) to digitally sign the image.
  4. Update Git: The pipeline then automatically opens a Pull Request in the omnivision-gitops repo to update image.tag to the new version (a workflow sketch follows the diagram below).

The end-to-end flow:

sequenceDiagram
    participant Dev as Developer
    participant CI as GitHub Actions
    participant ECR as ECR / GCR
    participant Git as GitOps Repo
    participant Argo as ArgoCD
    
    Dev->>CI: Push Code
    CI->>CI: Build & Scan (Trivy)
    CI->>ECR: Push Signed Image
    CI->>Git: Update image.tag=v1.2
    Git->>Argo: Webhook Trigger
    Argo->>Argo: Diff & Sync
    Argo->>ECR: Pull Signed Image
    Argo-->>Dev: Deployment Successful
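
A condensed sketch of this pipeline as a GitHub Actions workflow is below. The actions used (docker/build-push-action, aquasecurity/trivy-action, sigstore/cosign-installer) are real, but the registry placeholder and image name are illustrative, registry authentication is omitted, and the final PR stage is only indicated in a comment:

name: omnivision-worker
on:
  push:
    branches: [main]

jobs:
  build-scan-sign:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write # OIDC token for keyless Cosign signing
    env:
      IMAGE: "<account>.dkr.ecr.us-east-1.amazonaws.com/omnivision-worker:${{ github.sha }}"
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      # Registry login (e.g. aws-actions/amazon-ecr-login) omitted for brevity

      # 1. Build: load the image locally so it can be scanned before pushing
      - uses: docker/build-push-action@v5
        with:
          context: .
          load: true
          tags: ${{ env.IMAGE }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      # 2. Scan: fail the job on HIGH/CRITICAL findings
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.IMAGE }}
          severity: HIGH,CRITICAL
          exit-code: "1"

      # 3. Sign: push only after the scan passes, then sign the image
      - run: docker push ${{ env.IMAGE }}
      - uses: sigstore/cosign-installer@v3
      - run: cosign sign --yes ${{ env.IMAGE }}

      # 4. Update Git: open a PR against omnivision-gitops that bumps
      #    aiWorker.tag (e.g. with peter-evans/create-pull-request)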

4. Implementing Sidecar Injection

For deep observability, we need to ensure every worker pod has a FluentBit agent to ship logs to Loki (Module 9.3).

Instead of asking developers to add this to their Helm charts, we use a Mutating Admission Webhook (Module 12.4): we install the FluentBit Operator.

  • We label our application namespaces logging=enabled.
  • The Operator sees the label and automatically injects the sidecar into every pod.
  • This ensures 100% Log Coverage regardless of who deployed the app.
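
Opting a namespace in is then a single label. A sketch (the namespace name is illustrative, and the exact label key depends on how the operator is configured):

kubectl label namespace omnivision-prod logging=enabled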

5. Practical Example: The "Canary" Rollout Manifest

Combining what we learned in Module 11.5, our Argo Rollout will handle the promotion between versions.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: omnivision-worker
spec:
  replicas: 10 # illustrative
  selector:
    matchLabels:
      app: omnivision-worker
  template:
    metadata:
      labels:
        app: omnivision-worker
    spec:
      containers:
      - name: worker
        image: omnivision-worker:latest # Overwritten by CI via image.tag
  strategy:
    canary:
      analysis:
        templates:
        - templateName: gpu-health-check
      steps:
      - setWeight: 5
      - pause: { duration: 10m }
      - setWeight: 50
      - pause: { duration: 30m }

If the GPU Health Check (which queries Prometheus for CUDA errors) fails during the 5% phase, the rollout automatically aborts and rolls back to the stable version.
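
For completeness, gpu-health-check would be an Argo Rollouts AnalysisTemplate. A minimal sketch, assuming a Prometheus counter named omnivision_cuda_errors_total and an in-cluster Prometheus address (both placeholders):

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: gpu-health-check
spec:
  metrics:
  - name: cuda-error-rate
    interval: 1m
    failureLimit: 1 # a single failed measurement aborts the rollout
    successCondition: result[0] < 0.01
    provider:
      prometheus:
        address: http://prometheus.monitoring.svc:9090
        query: sum(rate(omnivision_cuda_errors_total[5m]))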


6. Next Steps

You have built the automation. The "Machine" is now capable of deploying code globally with a single Git commit. In the final part of the Capstone Project, we will focus on Operational Excellence: setting up Grafana dashboards, performing a Disaster Recovery drill, and wrapping up with the course graduation.

Your Thinking Exercise:

If your CI/CD pipeline correctly signs an image but a hacker manages to push a malicious, unsigned image with the same name to your ECR registry, will your cluster run it? (Hint: Re-read Part 1 on Admission Controllers).


