
Module 3 Lesson 5: Prompting Local Models

Talking to the machine: why prompting a local 8B model requires a different approach than prompting ChatGPT.

Prompting Local Models: Better Inputs = Better Outputs

If you are used to ChatGPT (GPT-4), you are used to a model that is "lazy-proof." It understands your intent even if your prompt is vague.

Local models (especially 8B and smaller) don't have that luxury. To get GPT-4-level results from a local model, you need to be a better "Prompt Engineer."

1. The Power of "Be Explicit"

Cloud models can guess your context. Local models need you to provide it.

  • Bad Prompt: "Write a python script."
  • Good Prompt: "You are an expert Python developer. Write a script using the requests library to fetch data from a JSON API. Include error handling for a 404 status code."
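
If you prefer to script this rather than type it into the CLI, here is a minimal sketch that sends the good prompt to a local model through Ollama's REST API. It assumes Ollama is running on its default port (11434); llama3:8b is just a placeholder, so substitute whichever model you have pulled:

    import requests

    # Send one prompt to Ollama's local REST API. "stream": False
    # returns a single JSON object instead of a stream of chunks.
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3:8b",  # placeholder: use any model you have pulled
            "prompt": (
                "You are an expert Python developer. Write a script using "
                "the requests library to fetch data from a JSON API. "
                "Include error handling for a 404 status code."
            ),
            "stream": False,
        },
        timeout=120,
    )
    response.raise_for_status()
    print(response.json()["response"])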

2. Using "Few-Shot" Prompting

Few-shot prompting means giving the model a few examples of what you want before asking your question. This is the single most effective way to improve local model performance.

Example Prompt:

Classify the sentiment of these movie reviews:
1. "I hate this movie, it was too slow." -> Sentiment: Negative
2. "The acting was superb and the plot was tight." -> Sentiment: Positive
3. "The cinematography was great but the ending was dull." -> Sentiment: 

Because examples #1 and #2 establish the format and the logic, the 8B model can now reliably complete #3 in the same style.
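
When scripting, a few-shot prompt is just careful string assembly. A minimal sketch under the same assumptions as before (local Ollama, placeholder model name):

    import requests

    # Labeled examples teach the model the pattern; the final,
    # unlabeled item is the one we want classified.
    examples = [
        ('"I hate this movie, it was too slow."', "Negative"),
        ('"The acting was superb and the plot was tight."', "Positive"),
    ]
    query = '"The cinematography was great but the ending was dull."'

    prompt = "Classify the sentiment of these movie reviews:\n"
    for i, (review, label) in enumerate(examples, start=1):
        prompt += f"{i}. {review} -> Sentiment: {label}\n"
    prompt += f"{len(examples) + 1}. {query} -> Sentiment:"

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3:8b", "prompt": prompt, "stream": False},
        timeout=120,
    )
    print(resp.json()["response"].strip())  # the model's label for review #3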


3. The Role of the "System Prompt"

In Ollama, the "System Prompt" is the invisible instruction that sets the model's personality and boundaries.

  • System: "You are a helpful assistant." (Standard)
  • System: "You are a grumpy Linux sysadmin. Use technical jargon and be brief." (Custom)

We will learn how to bake these into your own models in Module 5, but you can test one right now inside an interactive ollama run session using: /set system "You are a pirate."
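
Outside the CLI, you can also set a system prompt per request through Ollama's chat endpoint. A minimal sketch, same assumptions as above (the user question is just an illustration):

    import requests

    # /api/chat takes role-tagged messages; the "system" message sets
    # the personality and boundaries for the whole conversation.
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3:8b",  # placeholder model name
            "messages": [
                {
                    "role": "system",
                    "content": "You are a grumpy Linux sysadmin. "
                               "Use technical jargon and be brief.",
                },
                {"role": "user", "content": "How do I check disk usage?"},
            ],
            "stream": False,
        },
        timeout=120,
    )
    print(resp.json()["message"]["content"])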


4. Temperature: Creativity vs. Accuracy

Local models expose a setting called Temperature (0.8 by default in Ollama). In the API it is passed per request, as sketched after the list below.

  • Lower Temp (0.1 - 0.2): The model is nearly deterministic, almost always picking the most likely next word. Use this for code and factual analysis.
  • Higher Temp (0.9 - 1.0): The model takes risks. Use this for creative writing and brainstorming.
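
In the API, temperature lives in the request's options field. A minimal sketch showing both ends of the dial (same local-Ollama assumptions; the two prompts are just illustrations):

    import requests

    def ask(prompt: str, temperature: float) -> str:
        """Send one prompt to a local Ollama model at a given temperature."""
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={
                "model": "llama3:8b",  # placeholder model name
                "prompt": prompt,
                "stream": False,
                # Ollama reads sampling parameters from "options"
                "options": {"temperature": temperature},
            },
            timeout=120,
        )
        return resp.json()["response"]

    # Low temperature for factual/code tasks, high for brainstorming.
    print(ask("List the HTTP status codes used for redirects.", 0.1))
    print(ask("Brainstorm five names for a coffee shop on Mars.", 1.0))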

5. Output Formatting (JSON)

One of the hardest things for a small model is staying in a specific format. To help it:

  1. Ask for JSON explicitly.
  2. Provide the JSON schema.
  3. Remind it: "Respond ONLY with the JSON object. Do not provide an introduction."
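
Ollama can also help from its side: setting "format": "json" in the request constrains the model to emit valid JSON. A minimal sketch combining that flag with points 1-3 above (same assumptions as the earlier sketches; the extraction task is just an example):

    import json
    import requests

    # Point 1: ask for JSON. Point 2: give the schema. Point 3: forbid preamble.
    prompt = (
        "Extract the name and age from this sentence: 'Ada is 36 years old.'\n"
        'Respond ONLY with a JSON object matching this schema: '
        '{"name": string, "age": number}. Do not provide an introduction.'
    )

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3:8b",  # placeholder model name
            "prompt": prompt,
            "stream": False,
            "format": "json",  # Ollama constrains the output to valid JSON
        },
        timeout=120,
    )

    # json.loads raises if the model still drifted out of format,
    # which is exactly the failure you want to catch early.
    data = json.loads(resp.json()["response"])
    print(data["name"], data["age"])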

Summary Checklist for Better Prompting

  • Role: Did I tell the model who it is?
  • Context: Did I provide enough background?
  • Examples: Did I use 2-3 examples (Few-shot)?
  • Format: Did I specify the expected structure?

Key Takeaways

  • Local models are less "forgiving" than cloud models.
  • Few-shot prompting is your best tool for accuracy.
  • Specifying roles (System Prompts) drastically changes the quality of output.
  • Control Temperature based on whether you need "Facts" or "Vibes."
