Every wasted token is money. Every wasted token is latency. Every wasted token dilutes the model's attention on what actually matters. In 2026, as AI becomes infrastructure, prompt efficiency is less "nice to have" and more "margin on every API call."
These are the 10 most common mistakes we've seen across thousands of prompts — including our own early ones at Promptolis. Some are obvious; some are sneaky. All of them cost you.
Mistake 1: The politeness tax
"Hi! Could you please, if it's not too much trouble, help me understand the following concept? I'd really appreciate if you could explain it in a way that's easy to understand. Thank you so much in advance!"
"Explain [concept] in plain language, 3 paragraphs."
Cost: ~40 tokens wasted per prompt. Over 1,000 prompts that's ~40,000 tokens of pure overhead in direct cost, plus meaningfully degraded model attention.
Models don't have feelings. They don't reward politeness with better output. Every "please" and "thank you" is token spend with zero utility return.
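You can sanity-check the politeness tax yourself. A minimal sketch, using the common ~4-characters-per-token heuristic rather than a real tokenizer (actual counts come from the model's own tokenizer, e.g. a library like tiktoken):

```python
# Rough token estimate: ~4 characters per token is a common heuristic.
# Real counts require the model's own tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

polite = ("Hi! Could you please, if it's not too much trouble, help me "
          "understand the following concept? I'd really appreciate if you "
          "could explain it in a way that's easy to understand. "
          "Thank you so much in advance!")
terse = "Explain [concept] in plain language, 3 paragraphs."

saved = estimate_tokens(polite) - estimate_tokens(terse)
print(saved)  # tokens saved per prompt, roughly
```

Run that over your prompt library and the per-prompt savings multiply fast.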
Mistake 2: Role inflation
"You are a world-class senior software engineer with 20 years of experience at FAANG companies, specializing in distributed systems, having authored multiple books on scalable architecture, and deeply knowledgeable about AWS, GCP, and Azure..."
"You are a senior backend engineer focused on distributed systems."
Cost: 60-80 tokens wasted. The model extracts "senior backend engineer" + "distributed systems" and ignores the rest.
One or two specific descriptors beat an elaborate resume. More is not better.
Mistake 3: Instruction stacking without structure
"Write me a blog post about productivity and make sure it's optimized for SEO and includes 5 subheadings and has a word count of 1500 and uses active voice and targets the keyword 'productivity tips' and has a compelling introduction and a strong conclusion and formats in Markdown."
```
- Topic: productivity
- Keyword target: "productivity tips"
- Length: 1500 words
- Structure: 5 H2 subheadings + intro + conclusion
- Voice: active
- Format: Markdown
```
Cost: Not tokens — accuracy. Unstructured instruction stacking gets partial compliance. Structured lists get full compliance.
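If you build prompts programmatically, generating the structured form is trivial. A sketch (the function name and constraint keys are illustrative, not from any library):

```python
def structured_prompt(task: str, constraints: dict[str, str]) -> str:
    """Render a task plus its constraints as a bulleted spec
    instead of one run-on sentence of stacked instructions."""
    lines = [f"Write a {task}."]
    lines += [f"- {key}: {value}" for key, value in constraints.items()]
    return "\n".join(lines)

spec = structured_prompt("blog post", {
    "Topic": "productivity",
    "Keyword target": '"productivity tips"',
    "Length": "1500 words",
    "Structure": "5 H2 subheadings + intro + conclusion",
    "Voice": "active",
    "Format": "Markdown",
})
print(spec)
```

The same dict can double as your compliance checklist when you review the output.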
Mistake 4: The "please be specific" trap
"Please be specific and concrete. Don't be vague. Include examples. Be detailed."
"Include 3 specific examples with numbers. For each, state the context, the action, and the measurable result."
"Be specific" is itself vague. The meta-instruction doesn't fix the specificity problem — a specific instruction does.
Mistake 5: Hedging in the system prompt
"You're an expert but remember you might be wrong, always acknowledge uncertainty, don't make claims without evidence, be humble..."
"Flag claims you're uncertain about with [uncertain: reason]. State confidence level on key claims (high/medium/low)."
Vague humility instructions produce vague over-hedging in output. Specific uncertainty markup produces useful calibration.
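A side benefit of the explicit markup: it's machine-readable. A sketch of pulling the `[uncertain: reason]` markers out of a response so flagged claims can be routed for review (the marker format follows the instruction above; the function is illustrative):

```python
import re

# Matches the "[uncertain: reason]" markers the system prompt asks for.
UNCERTAIN = re.compile(r"\[uncertain:\s*([^\]]+)\]")

def uncertain_reasons(response: str) -> list[str]:
    """Extract the reasons the model flagged as uncertain."""
    return [m.strip() for m in UNCERTAIN.findall(response)]

reply = ("Revenue grew 40% [uncertain: figure from a 2023 estimate]. "
         "The API supports batching.")
print(uncertain_reasons(reply))  # ['figure from a 2023 estimate']
```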
Mistake 6: The "improve this" loop
Round 1: "Improve this paragraph."
Round 2: "Make it better."
Round 3: "Make it more engaging."
Round 4: "Hmm, not quite. Try again."
"Identify the 3 weakest sentences and rewrite each. For each rewrite, explain what you changed and why."
Generic improvement requests produce random variations. Specific diagnostic requests produce directed improvements.
Mistake 7: Re-specifying context
(Message 1) [full context]
(Message 2) "As I said before, my company is a B2B SaaS with..."
(Message 3) "Remember my company is B2B SaaS..."
(Message 4) "Like I mentioned, we're a B2B SaaS..."
Set context once. Reference it with a keyword: "For our B2B SaaS (see above)..."
Repeating context wastes tokens on every turn. Long chats compound this quickly.
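In code, "set context once" means putting it in the system message and letting the conversation history carry it. A sketch using the common chat-completions message shape (the actual API call is omitted; the context string is invented for illustration):

```python
# Long-lived context goes in the system message exactly once.
CONTEXT = "Company: B2B SaaS, 40 employees, sells workflow automation."

messages = [{"role": "system", "content": CONTEXT}]

def ask(question: str) -> list[dict]:
    # Each turn appends only the new question; the context rides
    # along in the history instead of being re-pasted every time.
    messages.append({"role": "user", "content": question})
    return messages

ask("For our B2B SaaS (see context), draft a pricing-page headline.")
```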
Mistake 8: The mega-prompt antipattern
A 2,000-token prompt that tries to do 15 things at once: analyze, summarize, translate, critique, generate variants, optimize for SEO, format as Markdown, match brand voice, etc.
Five focused 400-token prompts, each doing one thing well.
Why: Models degrade with complexity. A 15-instruction prompt gets ~60% compliance on each instruction. A 1-instruction prompt gets ~95%. Do the math.
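Doing that math, under the (simplifying) assumption that each instruction's compliance is independent — the compliance figures are the article's rough estimates, not measurements:

```python
# Chance that ALL instructions are satisfied at once, assuming
# independent per-instruction compliance rates.
p_all_mega = 0.60 ** 15   # 15 stacked instructions at ~60% each
p_all_split = 0.95 ** 5   # five focused prompts at ~95% each

print(f"mega-prompt: {p_all_mega:.5f}, split: {p_all_split:.2f}")
```

Under those assumptions the mega-prompt satisfies everything well under 1% of the time, while the split approach lands around 77%.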
Mistake 9: Asking for N items without constraints
"Give me 20 ideas for blog posts."
Result: 20 ideas, most of them generic or obvious, many overlapping.
"Give me 20 ideas for blog posts. Requirements:
- Each must target a different specific long-tail keyword
- No two can be rephrasings of the same concept
- Each must include the estimated search volume (roughly)
- Skip the 5 most obvious ideas; push for the 15 underserved ones"
Constraints force quality. Without them, the model gives you the fastest possible answers — which are rarely the best.
Mistake 10: Trusting the model's self-evaluation
"How good is this output, on a scale of 1-10?"
Result: "This is a strong response, I'd rate it 8/10."
Why it's wrong: Training bias. Models almost always self-rate favorably. Asking for self-critique in the same session is close to useless.
Open a fresh session. Paste the output. Ask: "Critique this harshly. What are the 3 weakest elements? Rank by severity."
Fresh context = less bias. Adversarial framing = more useful critique.
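A sketch of wiring this up: the critique runs in a brand-new conversation containing only the pasted output, so the critic has no memory of having produced it. Message shape follows the common chat-completions format; the client call itself is omitted:

```python
def fresh_critique_messages(output: str) -> list[dict]:
    """Build a single-message conversation for adversarial critique,
    with none of the original session's history attached."""
    return [{
        "role": "user",
        "content": (
            "Critique this harshly. What are the 3 weakest elements? "
            "Rank by severity.\n\n" + output
        ),
    }]

msgs = fresh_critique_messages("Our draft landing-page copy goes here.")
print(msgs[0]["content"][:40])
```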
The 4 quick wins
If you do nothing else, these 4 changes will save you 20-40% of token spend AND improve quality:
1. Delete politeness. Every time. 10-40 tokens saved per prompt. Adds up quickly.
2. Use XML for anything complex. See the XML Prompt Method.
3. One task per prompt. Never "do X AND Y AND Z" if you can split.
4. Replace vague requests with measurable ones. "Concise" → "under 100 words." "Engaging" → "include a question in the first sentence."
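"Measurable" means a script can verify it. A sketch of two such checks, matching the examples in point 4 (both functions are illustrative helpers, crude by design):

```python
def meets_word_limit(text: str, limit: int = 100) -> bool:
    """Checkable version of 'concise': under `limit` words."""
    return len(text.split()) < limit

def opens_with_question(text: str) -> bool:
    """Checkable version of 'engaging': a question mark appears
    before the first period (crude first-sentence check)."""
    q, p = text.find("?"), text.find(".")
    return q != -1 and (p == -1 or q < p)

print(meets_word_limit("A short answer."))                      # True
print(opens_with_question("Ever miss a deadline? Here's why.")) # True
```

You can't write `meets_engaging()`. That's exactly why "engaging" doesn't belong in a prompt.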
The cost math
A company running 100K prompt calls per month at an average 200 tokens of waste per prompt is throwing away:
- 20M tokens/month
- At $3/M for GPT-5: $60/month
- At $15/M for Opus 4: $300/month
- Annually (Opus): $3,600+ in pure waste
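The arithmetic above, as a reusable calculator (prices are the article's example rates, not quoted from any vendor's current price list):

```python
def monthly_waste_usd(calls: int, waste_tokens: int,
                      usd_per_million: float) -> float:
    """Dollar cost of wasted prompt tokens per month."""
    return calls * waste_tokens / 1_000_000 * usd_per_million

# 100K calls/month, 200 wasted tokens per call:
print(monthly_waste_usd(100_000, 200, 3.0))   # 60.0  ($3/M rate)
print(monthly_waste_usd(100_000, 200, 15.0))  # 300.0 ($15/M rate)
```

Plug in your own call volume and model pricing to size your waste before the quarterly audit.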
Plus the opportunity cost of degraded output quality (harder to measure, probably higher).
If you're using AI at any scale, audit your prompts quarterly. Remove politeness, fix instruction stacking, break up mega-prompts. The ROI is nearly immediate.
The meta-principle
Good prompt engineering is subtractive. Most prompts get better when you remove words, not add them. The minimum-viable prompt that produces your target output is almost always the best prompt.
Start with the shortest version that could possibly work. Add only if output is insufficient. Stop when output is good enough. Resist the urge to over-instruct.