
Module 3 Lesson 4: Caching Dependencies
Stop waiting for 'npm install'. Learn the difference between Artifacts and Caching, and how to use Cache to speed up your pipeline by 500%.
The #1 reason pipelines are slow is Downloading Libraries.
- Every build runs npm install or pip install.
- This takes 2 minutes every time.
Cache allows you to "Save" these libraries between pipeline runs.
1. Cache vs. Artifacts
This is the most common confusion:
- Artifacts: Pass data forward to the next stage of the same pipeline. (e.g. Build -> Test).
- Cache: Pass data forward to the next pipeline (e.g. Today's run -> Tomorrow's run).
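To make the distinction concrete, here is a minimal sketch of one job that uses both. The job names, the dist/ folder, and the npm run build script are assumptions for illustration, not part of this lesson's project.

```yaml
stages:
  - build
  - test

build-job:
  stage: build
  script:
    - npm install
    - npm run build        # assumed to write its output to dist/
  # Cache: reused by later pipelines (best effort, may be missing)
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
  # Artifacts: handed to later stages of this same pipeline
  artifacts:
    paths:
      - dist/
    expire_in: 1 week

test-job:
  stage: test
  script:
    - ls dist/             # dist/ arrives here as an artifact from build-job
```

The rule of thumb in this sketch: things you download (node_modules/) go in the cache, things you build (dist/) go in artifacts.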
2. Setting Up the Cache
# Global cache for the whole project
cache:
  # Using the lock file as a key.
  # If the lock file doesn't change, the cache remains valid!
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/

test-job:
  script:
    - npm install # Much faster on a cache hit, because node_modules/ is already populated
    - npm test
Visualizing the Process
graph TD
Start[Input] --> Process[Processing]
Process --> Decision{Check}
Decision -->|Success| End[Complete]
Decision -->|Retry| Process
3. Choosing the Cache Key
- Constant Key: key: project-wide. Every run uses the same cache. Simple, but can get "Dirty" as leftovers from other branches and old versions pile up.
- Branch Key: key: $CI_COMMIT_REF_SLUG. Every branch has its own cache.
- Lock File Key (Pro): key: { files: [package-lock.json] }. The cache key only changes when package-lock.json changes, i.e. when you actually add, remove, or upgrade a library.
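The three options above can be sketched as job-level cache blocks. The job names are hypothetical and exist only to show the options side by side; in a real project you would pick one strategy.

```yaml
# Option 1: constant key, one cache shared by every branch and run
constant-key-job:
  script:
    - npm install
  cache:
    key: project-wide
    paths:
      - node_modules/

# Option 2: branch key, one cache per branch
branch-key-job:
  script:
    - npm install
  cache:
    key: $CI_COMMIT_REF_SLUG
    paths:
      - node_modules/

# Option 3: lock file key, a new cache whenever package-lock.json changes
lockfile-key-job:
  script:
    - npm install
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
```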
4. Why Cache is "Fragile"
- Not Guaranteed: Unlike Artifacts, GitLab might delete a cache if it needs space. Your script should always be able to run npm install from scratch if the cache is missing.
- Storage Costs: Large caches (like 1GB of Docker layers) can fill up your runner's hard drive quickly.
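One way to stay safe when the cache is missing is to treat it purely as an optimization. The sketch below swaps in a different technique from Section 2: it caches npm's download folder (.npm/) instead of node_modules/ and runs npm ci, which always does a clean install from the lock file. The job names and the npm run lint script are assumptions for illustration.

```yaml
cache:
  key:
    files:
      - package-lock.json
  paths:
    - .npm/                # npm's download cache, not node_modules/

test-job:
  script:
    # Works with or without a cache: a miss only means more downloading
    - npm ci --cache .npm --prefer-offline
    - npm test

lint-job:
  # A job-level cache replaces the global one, so key and paths are repeated
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - .npm/
    policy: pull           # restore the cache, but skip uploading it afterwards
  script:
    - npm ci --cache .npm --prefer-offline
    - npm run lint         # assumed script in package.json
```

Setting policy: pull on jobs that never change dependencies avoids re-uploading the same cache at the end of every job.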
Exercise: The Speed Benchmark
- Create a project with a package.json that includes a heavy library (like lodash).
- Run a pipeline that does npm install WITHOUT cache. Note the time.
- Add the cache block from Section 2.
- Run the pipeline again.
- Run it a third time. Was the third run significantly faster than the first?
- Modify the package-lock.json. Watch the logs: Does the runner say "Creating cache" or "Restoring cache"?
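If you want the benchmark numbers printed in the job log, a job along these lines could help. This is only a sketch: the job name benchmark-job and the echoed messages are made up for this exercise, and it assumes the runner's shell supports POSIX arithmetic and the date command.

```yaml
benchmark-job:
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
  script:
    # Report whether a cache was restored before installing
    - if [ -d node_modules ]; then echo "node_modules restored from cache"; else echo "no cache found, installing from scratch"; fi
    # Time the install step in whole seconds
    - START=$(date +%s)
    - npm install
    - echo "npm install took $(( $(date +%s) - START )) seconds"
```

Compare the printed duration across the first, second, and third runs of the exercise.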
Summary
Caching is the difference between a pipeline that developers love (fast) and a pipeline that developers ignore (slow). By properly caching your vendor folders, you respect your developers' time and your company's compute budget.
Next Lesson: Logical control: Job control (only, except, rules).