Module 6 Lesson 5: Rollbacks and Error Recovery

The emergency exits. Learn how to perform instant rollbacks when a deployment goes wrong and how to automate recovery using GitLab's built-in tools.

No matter how many tests you have, things will eventually break. A senior DevOps engineer is judged not by whether they have outages, but by how fast they fix them.

1. The Manual Rollback (The "Easy" Button)

Inside the GitLab Operate -> Environments page:

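  • Next to each earlier successful deployment in the environment's history, there is a "Rollback" button.
  • Clicking it re-runs the "Deploy" job of that earlier deployment, so the environment goes back to the version built from that commit.
  • Tip: This is why "Immutable Artifacts" (Module 3) are so important. The old build (artifact or container image) must still exist for that job to deploy. If its artifacts have expired, or a mutable tag such as latest now points to the new build, the rollback either fails or silently redeploys the broken version!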
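The selection step behind the button is simple: walk the environment's deployment history back past the release you want to undo and stop at the most recent successful one. The minimal Python sketch below models only that step. The `Deployment` record, its fields, and `rollback_target` are hypothetical stand-ins, not GitLab's API objects; the real button also re-runs the chosen job for you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record of one deployment to an environment (not a GitLab API type).
    deploy_id: int
    commit: str
    status: str  # e.g. "success" or "failed"


def rollback_target(history: list[Deployment]) -> Optional[Deployment]:
    """Return the most recent successful deployment before the last one.

    `history` is ordered oldest -> newest; the last entry is the deployment
    you want to undo.
    """
    # Walk backwards, skipping the deployment being undone.
    for deployment in reversed(history[:-1]):
        if deployment.status == "success":
            return deployment
    return None  # nothing earlier to roll back to


history = [
    Deployment(1, "a1b2c3", "success"),
    Deployment(2, "d4e5f6", "success"),
    Deployment(3, "0789ab", "failed"),
    # The deployment we want to undo: the job succeeded, but it broke the site.
    Deployment(4, "cdef01", "success"),
]
target = rollback_target(history)
print(target.commit if target else "no rollback target")  # prints "d4e5f6"
```

The failed deployment 3 is skipped because it never went live, so the walk lands on deployment 2, the last version known to work.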

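2. Automated Rollbacks

On supported setups, GitLab can detect a problem and roll back automatically.

  • With auto rollback enabled for a project, a critical alert raised against the environment after a deployment (for example, one based on the health metrics from Module 5) makes GitLab redeploy the most recent successful deployment. This feature is not available on every GitLab tier or version, so confirm your instance supports it before relying on it.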
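The same idea can also be built into your own pipeline: a post-deploy job probes the new release for a fixed window and triggers a rollback when the probe keeps failing. This is a do-it-yourself pattern, not GitLab's built-in auto rollback. In the Python sketch below, `check_health` and `roll_back` are hypothetical callables you would wire to your own health endpoint and deploy tooling, and the 120-second window and 3-failure threshold are arbitrary example values.

```python
import time
from typing import Callable


def watch_and_maybe_roll_back(
    check_health: Callable[[], bool],  # hypothetical: True if the new release is healthy
    roll_back: Callable[[], None],     # hypothetical: redeploys the previous release
    watch_seconds: float = 120.0,      # example watch window after the deploy
    interval_seconds: float = 10.0,    # time between probes
    max_consecutive_failures: int = 3, # example threshold before giving up
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Probe a fresh deployment; roll back if it fails too many times in a row.

    Returns True if a rollback was triggered, False if the release stayed
    healthy enough for the whole watch window.
    """
    failures = 0
    checks = max(1, int(watch_seconds // interval_seconds))
    for _ in range(checks):
        if check_health():
            failures = 0  # a healthy probe resets the failure streak
        else:
            failures += 1
            if failures >= max_consecutive_failures:
                roll_back()
                return True
        sleep(interval_seconds)
    return False


# Demo with a fake probe: healthy once, then three failures in a row.
probe_results = iter([True, False, False, False, True])
rolled_back = []
triggered = watch_and_maybe_roll_back(
    check_health=lambda: next(probe_results),
    roll_back=lambda: rolled_back.append("previous release"),
    watch_seconds=50,
    interval_seconds=10,
    sleep=lambda _: None,  # skip real waiting in the demo
)
print(triggered, rolled_back)  # prints: True ['previous release']
```

Requiring several consecutive failures, rather than reacting to a single bad probe, keeps one slow response from reverting a perfectly good release.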

3. "Fix Forward" vs "Roll Back"

  • Roll Back: Reverting to the old version. (Best for "Site is Down" emergencies).
  • Fix Forward: Pushing a new, quick fix to the main branch. (Best for "Typo in a button" or small UI bugs).
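The rule of thumb above fits in a few lines of Python. This is a minimal sketch: the two boolean inputs are a simplification, and the final "when in doubt, roll back" fallback is an assumption of this sketch rather than a rule stated in the lesson.

```python
from enum import Enum


class Strategy(Enum):
    ROLL_BACK = "roll back"
    FIX_FORWARD = "fix forward"


def choose_strategy(site_down: bool, fix_is_small_and_known: bool) -> Strategy:
    # "Site is Down": restoring the last good version is the fastest way back.
    if site_down:
        return Strategy.ROLL_BACK
    # A typo or small UI bug with an obvious fix can ship as a normal commit.
    if fix_is_small_and_known:
        return Strategy.FIX_FORWARD
    # Assumption: when the cause is unclear, prefer the known-good version.
    return Strategy.ROLL_BACK


print(choose_strategy(site_down=False, fix_is_small_and_known=True).value)  # prints "fix forward"
```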

4. The "Post-Mortem"

Once the site is back up, you must use the GitLab Audit Events and Pipeline Logs to find out:

  1. Why did the tests pass but the deployment failed?
  2. Was it a server configuration issue?
  3. How can we add a new "Quality Gate" (Module 5) to ensure this specific bug NEVER happens again?
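As a concrete example of question 3: suppose a post-mortem found that a deploy script emptied the web root. A regression gate could be a small script, run by a pipeline job right after the deploy, that fails when the directory is missing or an expected file is gone. The Python sketch below assumes `/var/www` and `index.html` as hypothetical defaults; adjust both to your own layout.

```python
import sys
from pathlib import Path


def web_root_problems(web_root: Path, required_files: list[str]) -> list[str]:
    """Return the problems found in the deployed web root (empty list = OK)."""
    if not web_root.is_dir():
        return [f"{web_root} is missing or not a directory"]
    problems = []
    for name in required_files:
        if not (web_root / name).is_file():
            problems.append(f"expected file {name} not found in {web_root}")
    return problems


def main(argv: list[str]) -> int:
    # Hypothetical defaults: first argument is the web root, the rest are required files.
    web_root = Path(argv[0]) if argv else Path("/var/www")
    required = argv[1:] or ["index.html"]
    problems = web_root_problems(web_root, required)
    for problem in problems:
        print(f"QUALITY GATE FAILED: {problem}", file=sys.stderr)
    # A non-zero exit code makes the CI job, and therefore the pipeline, fail.
    return 1 if problems else 0
```

To run it as a job script, call `sys.exit(main(sys.argv[1:]))` from the entry point; the job fails exactly when the gate finds a problem.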

Exercise: The Emergency Drill

  1. Imagine your "Production" deployment script just deleted the /var/www folder. What is your 5-second plan to fix it?
  2. Go to Operate -> Environments and find the "Rollback" button. Research: Does the rollback re-run the test stage too?
  3. Why is the "Roll Back" strategy better for user experience than leaving the site "Broken" while you try to fix it?
  4. Search: How do you use the when: on_failure keyword in a .gitlab-ci.yml to send an alert to a phone?

Summary

You have completed Module 6: Deployment Strategies. You now have the skills to get code to the user using SSH, manage complex staging/production pipelines, use advanced patterns like Blue-Green, and recover instantly when things go wrong.

Next Module: the container transformation in Module 7: Containerized Pipelines.
