Length and Verbosity Management: Mastering the Word Count

How to stop AI from rambling: learn specific prompting techniques for controlling sentence counts, word counts, and paragraph structure so your outputs stay concise and impactful.


One of the most common complaints about Large Language Models is that they are too chatty.

By default, LLMs are biased toward generating longer responses because their helpfulness alignment training (RLHF) rewards them for being thorough. In a professional application, however, extra words are token waste: they slow down the user interface, increase your AWS Bedrock bill, and often bury the actual answer under a mountain of conversational filler.

In this lesson, we will learn how to "Clamp" the model's output. We will move beyond saying "be concise" and learn the mathematical and structural constraints that force a model to stick to a specific word count or sentence limit.


1. Why "Be Concise" Often Fails

"Concise" is a relative term. To an LLM, a "concise summary" of a book might be 3 pages. To a mobile developer, a "concise summary" is 3 sentences.

If you don't provide a specific numerical boundary, the model will default to its own internal definition of conciseness, which is almost always longer than yours.


2. Technique 1: Numerical Constraints (The Sentence Count)

Models are surprisingly bad at counting words while they are typing them (because tokens don't map perfectly to words). However, they are excellent at counting Sentences or Bullet Points.

The Better Way:

  • Bad: "Explain this in about 50 words."
  • Good: "Explain this in exactly 2 sentences."
  • Professional: "Provide exactly 3 bullet points, with each point being fewer than 15 words."

graph TD
    A[Vague: 'Be Brief'] --> B[Unknown Output Length]
    C[Specific: '2 Sentences'] --> D[Predictable Output Length]
    E[Constraint: 'Max 100 Tokens'] --> F[Hard Cost Limit]
    
    style D fill:#2ecc71,color:#fff
    style F fill:#3498db,color:#fff
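The shift from vague adjectives to countable units can be enforced in code. A minimal sketch (the helper names `build_summary_prompt` and `count_sentences` are hypothetical, and the regex-based sentence splitter is a rough heuristic):

```python
import re

def build_summary_prompt(text: str, sentences: int) -> str:
    # State the boundary as a countable unit (sentences), not a fuzzy adjective
    return f"Explain the following in exactly {sentences} sentences:\n{text}"

def count_sentences(output: str) -> int:
    # Rough post-hoc check that the model respected the limit
    return len([s for s in re.split(r"[.!?]+", output) if s.strip()])

prompt = build_summary_prompt("Quarterly revenue rose 12% on strong cloud demand.", 2)
```

Pairing the prompt-side constraint with a cheap post-hoc counter lets you retry or truncate when the model drifts over the limit.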

3. Technique 2: The "Negative Incentive"

You can discourage verbosity by telling the model what it is not allowed to include.

  • "Do NOT include an introduction or an outro."
  • "Do NOT repeat information from the prompt."
  • "If the answer is a single word, provide only that word and nothing else."

4. Technique 3: Maximum Tokens (The Hard Limit)

In your API calls to AWS Bedrock, you have a parameter called max_tokens. This is the final firewall.

If you set max_tokens: 50, generation is hard-stopped at token 50, whether or not the sentence is finished.

  • The Risk: The model might stop mid-sentence.
  • The Reward: You are guaranteed never to pay for more than 50 tokens.

Best Practice: Set a max_tokens that is 20% higher than what you actually want. This gives the model "Room to breathe" and finish its sentence, while still preventing a 1,000-token rambling session.
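The 20% headroom rule is trivial to encode. A sketch (the function name is hypothetical):

```python
def token_budget(target_tokens: int, headroom: float = 0.20) -> int:
    # Add ~20% headroom so the model has room to finish its final sentence,
    # while still capping a runaway rambling session
    return round(target_tokens * (1 + headroom))
```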


5. Technical Implementation: The Length-Controller in Python

In your FastAPI application, you can use the max_tokens parameter as a dynamic variable based on the type of request.

from typing import Literal

from fastapi import FastAPI
from langchain_aws import ChatBedrock

app = FastAPI()

llm = ChatBedrock(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0")

@app.post("/summarize")
async def summarize(text: str, mode: Literal["brief", "detailed"] = "brief"):
    # Map high-level modes to hard limits: a prompt-level sentence cap,
    # with the API-level token cap as the backstop
    token_limit = 50 if mode == "brief" else 500
    sentence_limit = 1 if mode == "brief" else 10

    prompt = f"Summarize in {sentence_limit} sentences: {text}"

    # Pass the hard limit directly to the model call
    response = await llm.ainvoke(prompt, max_tokens=token_limit)
    return {"summary": response.content}

6. Deployment: Verbosity Auditing in Kubernetes

In a high-scale environment, you should track the Efficiency Ratio of your prompts.

  • Ratio = (Value-bearing tokens) / (Total tokens).

If your "Summarizer" pod in Kubernetes is consistently returning 200 tokens but the "Instruction" asked for 50, your pod should flag an "Efficiency Alert." You may need to refine your prompt or add stronger negative constraints to your system message.


7. Real-World Case Study: The SMS Bot

A travel company was using AI to send SMS updates. Because SMS has a hard 160-character limit, every extra word from the AI meant the message was split into two SMS charges.

The Failure: "Please keep your answer short." (Resulted in ~200-character messages.)

The Fix: "Constraint: You are writing for an SMS interface. Your entire response MUST be under 140 characters. If you go over, the user will not receive the message."

By framing the limit as a critical failure condition, the model's adherence to the character count improved significantly.
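In production you would pair that prompt with a pre-send guard, so an over-length reply never reaches the SMS gateway. A sketch under the case study's assumptions (the constant and function names are hypothetical):

```python
SMS_LIMIT = 140  # self-imposed cap, leaving headroom under the 160-char SMS limit

SMS_CONSTRAINT = (
    "Constraint: You are writing for an SMS interface. Your entire response "
    f"MUST be under {SMS_LIMIT} characters. If you go over, the user will not "
    "receive the message."
)

def fits_single_sms(message: str) -> bool:
    # Guard before sending: an over-length reply means a second SMS charge
    return len(message) < SMS_LIMIT
```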


8. SEO and "Readability Scores"

In SEO, Dwell Time is a ranking factor. If a user arrives at your page and sees a "wall of text," they leave. By prompting your AI to use Varying Sentence Lengths and keeping paragraphs to under 4 sentences, you increase the readability and engagement of your content, leading to higher search rankings.
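The "under 4 sentences per paragraph" rule can be audited automatically before content is published. A sketch (the function name is hypothetical, and the regex splitter is a rough heuristic):

```python
import re

def long_paragraphs(text: str, max_sentences: int = 4) -> list[int]:
    # Return 1-based indices of paragraphs that exceed the sentence budget
    # (a simple "wall of text" detector for generated content)
    flagged = []
    for i, para in enumerate(text.split("\n\n"), start=1):
        sentences = [s for s in re.split(r"[.!?]+", para) if s.strip()]
        if len(sentences) > max_sentences:
            flagged.append(i)
    return flagged
```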


Summary of Module 5, Lesson 3

  • Avoid the "Helpfulness Trap": Adhere to numerical limits, not subjective adjectives.
  • Count Sentences, Not Words: Models are better at counting logical units.
  • Use Max Tokens: Set a hard API limit to protect your budget.
  • The SMS Rule: Frame length constraints as "Critical Requirements" to ensure adherence.

In the next lesson, we will look at Markdown, Tables, and Structured Text—how to organize those concise tokens into a beautiful, readable layout.


Practice Exercise: The Executive Summary Challenge

  1. The Context: Provide a 5-paragraph text about a company's financial results.
  2. Task 1: "Summarize in 100 words." (Check the actual word count).
  3. Task 2: "Summarize in exactly 5 bullet points. Total words must be under 50."
  4. Task 3: "Convert the summary into a 1-sentence headline for a busy CEO."
  5. Analyze: Which version felt the most "Professional"? Usually, the high-constraint version (Task 3) provides the most immediate value.
  6. Analyze: How many tokens did Task 1 use vs Task 3? Notice the massive ROI of brevity.
    • Token count Task 1: ~150
    • Token count Task 3: ~25
    • Cost savings: roughly 80% per request.
    • Takeaway: precision wins in a production environment. Tokens are money. End of lesson! (See how I managed the length there?)
