
Implementing LoRA with the PEFT Library: Hands-on Efficiency
We have learned the math (Matrix Decomposition), the logic (Frozen Weights), and the knobs (Rank and Alpha). Now, we put it into code.
To implement LoRA in the modern ecosystem, we use the PEFT (Parameter-Efficient Fine-Tuning) library from Hugging Face. This library acts as a "Wrapper" for any base model. It handles the freezing of weights, the insertion of the adapter layers, and the merging of weights at the end—all with just a few lines of Python.
In this final lesson of Module 9, we will build a complete LoRA configuration.
1. The PEFT Workflow
The PEFT workflow consists of three main steps:
- Define the Config: Set your Rank, Alpha, and Target Modules.
- Wrap the Model: Turn a standard model into a `PeftModel`.
- Train normally: Use the same `Trainer` class from Module 8. A minimal end-to-end sketch of all three steps follows below.
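To make the three steps concrete before we examine each knob, here is a minimal sketch of the whole workflow. The hyperparameters are illustrative, and `train_dataset` is a placeholder for a tokenized dataset you would prepare as in Module 8:

```python
from peft import LoraConfig, get_peft_model, TaskType
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Step 1: Define the Config (the knobs are covered in detail below)
lora_config = LoraConfig(r=16, lora_alpha=32, task_type=TaskType.CAUSAL_LM)

# Step 2: Wrap the Model
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = get_peft_model(model, lora_config)

# Step 3: Train normally -- 'train_dataset' is a placeholder for
# the tokenized dataset you built in Module 8
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./lora-run"),
    train_dataset=train_dataset,
)
trainer.train()
```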
2. Implementation: The LoRA Configuration
Here is how you set up a professional-grade LoRA adapter for a Mistral or Llama model.
```python
from peft import LoraConfig, get_peft_model, TaskType

# 1. Define the LoRA Config
lora_config = LoraConfig(
    # The 'Capacity' knobs from Lesson 4
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    # Where to apply the LoRA (the 'All-Linear' strategy)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Don't train bias terms; declare the task (Causal Language Modeling)
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)

# 2. Wrap the Model
# Assuming 'model' is already loaded via transformers (Module 8)
model = get_peft_model(model, lora_config)

# 3. Print the 'Trainable Parameters'
# This will show that well under 1% of the weights are being updated!
model.print_trainable_parameters()
# Example output (Mistral-7B, r=16, all-linear targets):
# trainable params: 41,943,040 || all params: 7,283,675,136 || trainable%: 0.5758
```
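You can sanity-check that printout by hand. With the document's $W' = W + (A \times B)$ convention, each targeted layer gets an $A$ of shape $d_{out} \times r$ and a $B$ of shape $r \times d_{in}$, so it contributes $r \times (d_{in} + d_{out})$ trainable parameters. A back-of-the-envelope sketch, assuming Mistral-7B-v0.1 shapes (hidden size 4096, intermediate size 14336, 8 KV heads of dimension 128, 32 layers):

```python
# Estimate LoRA trainable params, assuming Mistral-7B-v0.1 shapes
r = 16
num_layers = 32
module_shapes = {            # (d_in, d_out) of each targeted projection
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),  # grouped-query attention: 8 heads x 128 dims
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

# Each targeted layer adds r * (d_in + d_out) parameters
per_layer = sum(r * (d_in + d_out) for d_in, d_out in module_shapes.values())
print(f"{per_layer * num_layers:,} trainable LoRA params")
# 41,943,040 -- matching print_trainable_parameters() above
```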
3. Saving and Loading LoRA Adapters
When you save a PEFT model, you don't save the whole 14GB model. You only save the adapter (the $A$ and $B$ matrices).
Saving
```python
model.save_pretrained("./lora-adapter-v1")
```
Inside this folder, you will see a weights file named `adapter_model.safetensors` (`adapter_model.bin` on older PEFT versions), usually 50MB-100MB, and an `adapter_config.json`.
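If you want to verify this yourself, here is a quick sketch that lists the folder contents and sizes (exact file names vary slightly across PEFT versions):

```python
from pathlib import Path

# List the saved adapter files and their sizes in MB
for f in sorted(Path("./lora-adapter-v1").iterdir()):
    print(f"{f.name}: {f.stat().st_size / 1e6:.1f} MB")
```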
Loading for Inference
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# a. Load the base model (frozen)
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

# b. 'Plug in' the adapter
model = PeftModel.from_pretrained(base_model, "./lora-adapter-v1")

# The model is now specialized and ready for use!
```
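Because adapters are just small attachments, one loaded base model can host several of them and switch on demand. A sketch using PEFT's `load_adapter` and `set_adapter`, assuming a second adapter saved at `./lora-adapter-v2` (hypothetical path):

```python
# Attach the first adapter under an explicit name
model = PeftModel.from_pretrained(base_model, "./lora-adapter-v1", adapter_name="v1")

# Attach a second adapter ('./lora-adapter-v2' is a hypothetical path)
model.load_adapter("./lora-adapter-v2", adapter_name="v2")

# Switch the active adapter without reloading the multi-GB base model
model.set_adapter("v2")
model.set_adapter("v1")
```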
Visualizing the Adapter Attachment
```mermaid
graph LR
    A["Raw Model (Disk)"] --> B["Load into VRAM (Base)"]
    C["Adapter Config (Disk)"] --> D["Attach Adapters to Base Layers"]
    B --> E["Combined Model (Ready to Train/Run)"]
    D --> E
    subgraph "PEFT Integration"
        D
    end
```
4. The "Merging" Step (Production)
If you want the fastest possible performance in production, you "Merge" the weights. This computes $W' = W + \frac{\alpha}{r}(A \times B)$ as a one-time operation (the $\frac{\alpha}{r}$ scaling comes from the Alpha knob in Lesson 4), so no adapter math runs at inference time.
```python
# Create a single merged model
merged_model = model.merge_and_unload()

# Now save the FULL model (14GB) for high-speed production use
merged_model.save_pretrained("./production-ready-model")
```
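Once merged and saved, the checkpoint is a plain transformers model; serving it needs no PEFT at all. A minimal sketch:

```python
from transformers import AutoModelForCausalLM

# The merged checkpoint loads like any standard model:
# no peft import, no adapter files, no extra latency
model = AutoModelForCausalLM.from_pretrained("./production-ready-model")
```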
Summary and Key Takeaways
- PEFT library: The standard tool for implementing adapters.
- Config: Use `r`, `lora_alpha`, and `target_modules` to define your adapter.
- Print Trainable: Always check `print_trainable_parameters()` to verify your efficiency gains.
- Portability: Adapters are tiny files (MBs) compared to base models (GBs).
- Merging: Use `merge_and_unload()` to eliminate inference latency in production.
Congratulations! You have completed Module 9. You are now a master of Parameter-Efficient Fine-Tuning. You know how to build models that are both smart and efficient.
In Module 10, we will look at how to tell if your model is actually good: Evaluation and Metrics.
Reflection Exercise
- If you have two different adapters (one for French and one for Spanish), can you swap them on the same base model in real-time? How does this change your server architecture?
- Look at the `target_modules` list. Why is `gate_proj` included? (Hint: Does the 'knowledge' of the model only live in the Attention layers?)
SEO Metadata & Keywords
Focus Keywords: Implementing LoRA with PEFT, get_peft_model tutorial, LoRA target modules list, save_pretrained PEFT, merging LoRA weights.
Meta Description: Get hands-on with AI efficiency. Learn how to use the Hugging Face PEFT library to implement LoRA adapters, track trainable parameters, and package your models for production.