Module 3 Lesson 4: Caching Dependencies

Stop waiting for 'npm install'. Learn the difference between artifacts and caching, and how to use the cache to cut minutes off every pipeline run.

The #1 reason pipelines are slow is downloading libraries.

  • Every build runs npm install or pip install.
  • This can take a couple of minutes every time.

A cache lets you save these libraries between pipeline runs, so later runs don't have to download them again.

1. Cache vs. Artifacts

This is the most common confusion:

  • Artifacts: Pass data forward to the next stage of the same pipeline. (e.g. Build -> Test).
  • Cache: Pass data forward to the next pipeline (e.g. Today's run -> Tomorrow's run).
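
As a rough sketch of how the two fit together in one .gitlab-ci.yml: the build job below hands its output to the test job through an artifact, while node_modules/ is cached for future pipelines. The job names, the dist/ folder, and the `npm run build` script are assumptions made for illustration, not part of the lesson's project.

```yaml
stages:
  - build
  - test

# Cache: reused by later pipelines (best effort, may be missing)
cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/

build-job:
  stage: build
  script:
    - npm install
    - npm run build   # assumed to write its output to dist/
  artifacts:
    # Artifact: handed to later stages of THIS pipeline
    paths:
      - dist/

test-job:
  stage: test
  script:
    - npm install     # still needed in case the cache is missing
    - ls dist/        # dist/ was downloaded from build-job's artifacts
    - npm test
```

Rule of thumb: if a later job cannot work without the files, use an artifact. If the files only make the job faster, use the cache.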

2. Setting Up the Cache

# Global cache for the whole project
cache:
  # Use the lock file as the cache key.
  # As long as the lock file doesn't change, the same cache is reused.
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/

test-job:
  script:
    # Much faster on a cache hit: most packages are already in node_modules/
    - npm install
    - npm test

Visualizing the Process

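graph TD
    Start["Job starts"] --> Decision{"Cache found for key?"}
    Decision -->|Hit| Restore["Restore node_modules/"]
    Decision -->|Miss| Empty["Start without node_modules/"]
    Restore --> Install["npm install"]
    Empty --> Install
    Install --> Save["Save node_modules/ to cache"]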

3. Choosing the Cache Key

  • Constant Key: key: project-wide. Every run on every branch uses the same cache. Simple, but the cache can get "dirty" as stale packages pile up.
  • Branch Key: key: $CI_COMMIT_REF_SLUG. Every branch has its own cache.
  • Lock File Key (Pro): key: { files: [package-lock.json] }. The key is derived from the lock file, so a new cache is created only when your dependencies actually change.
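
The three options above can be sketched side by side as per-job caches. The job names and the `npm run lint` and `npm run build` scripts are hypothetical. In a real project you would normally pick one strategy rather than mix all three.

```yaml
# Option 1: constant key, one cache shared by every branch and run
lint-job:
  cache:
    key: project-wide
    paths:
      - node_modules/
  script:
    - npm install
    - npm run lint    # hypothetical script

# Option 2: branch key, each branch keeps its own cache
test-job:
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - node_modules/
  script:
    - npm install
    - npm test

# Option 3: lock file key, a new cache only when package-lock.json changes
build-job:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
  script:
    - npm install
    - npm run build   # hypothetical script
```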

4. Why Cache is "Fragile"

  1. Not Guaranteed: Unlike Artifacts, GitLab might delete a cache if it needs space. Your script should always be able to run npm install from scratch if the cache is missing.
  2. Storage Costs: Large caches (like 1GB of Docker layers) can fill up your runner's hard drive quickly.
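
One way to keep a job safe when the cache is missing is sketched below. Note that this swaps in a different pattern from Section 2: it caches npm's download folder (.npm/) instead of node_modules/ and uses `npm ci`, which rebuilds node_modules/ from the lock file on every run. The second job uses `policy: pull` so it only reads the cache and never re-uploads it. The job names and the `npm run lint` script are assumptions.

```yaml
cache:
  key:
    files:
      - package-lock.json
  paths:
    - .npm/           # npm's download cache, not node_modules/

test-job:
  script:
    # Works with an empty cache too: anything not found in .npm/
    # is downloaded from the registry as usual.
    - npm ci --cache .npm --prefer-offline
    - npm test

lint-job:
  # A job-level cache replaces the global one, so key and paths are repeated.
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
    policy: pull      # download the cache, but skip the upload at the end
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run lint    # hypothetical script
```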

Exercise: The Speed Benchmark

  1. Create a project with a package.json that includes a heavy library (like lodash).
  2. Run a pipeline that does npm install WITHOUT cache. Note the time.
  3. Add the cache block from Section 2.
  4. Run the pipeline again.
  5. Run it a third time. Was the third run significantly faster than the first?
  6. Modify the package-lock.json. Watch the logs: Does the runner say "Creating cache" or "Restoring cache"?
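
If you would rather compare both variants in a single pipeline, a possible setup is sketched here. The job names are made up. `cache: []` switches the global cache off for one job, so you can compare the two job durations in the pipeline view.

```yaml
cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/

install-no-cache:
  cache: []           # override the global cache: this job never uses it
  script:
    - npm install

install-with-cache:
  script:
    - npm install     # uses the global cache defined above
```

The first pipeline only creates the cache, so expect the difference to show up from the second run onward.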

Summary

Caching is the difference between a pipeline that developers love (fast) and a pipeline that developers ignore (slow). By properly caching your vendor folders, you respect your developers' time and your company's compute budget.

Next Lesson: Logical control: Job control (only, except, rules).
