
The XML Prompt Method: Why It Works in 2026

🗓️ Published ⏱️ 10 min 👤 By Promptolis Editorial

If you've spent any time reading Anthropic's Claude documentation, you've noticed something weird: their prompts are full of XML tags like `<instructions>`, `<example>`, `<document>`, and `<thinking>`. It looks like 2005 called and wants its markup back.

It turns out this isn't nostalgia — it's one of the highest-leverage techniques in modern prompt engineering, especially for Claude and GPT-5. Prompts using XML-style tags produce measurably better output on complex tasks than equivalent plain-text prompts.

This post covers why it works, when to use it, and 5 real examples with output comparisons.

The research

Anthropic's own documentation (2024+) explicitly recommends XML tags for structuring complex prompts. Their internal benchmarks show:

  • +20-40% accuracy on multi-step reasoning tasks
  • +30-50% consistency across retries
  • +50%+ adherence to specified output formats
  • Better retrieval from long-context inputs

These are substantial numbers. For context: a technique that improves accuracy 5% is usually worth adopting. 20-40% is transformative.

Why XML works (the technical explanation)

Reason 1: Clear semantic boundaries

When you write a plain-text prompt, the model has to infer which parts are instructions, which parts are data, which parts are examples. It usually guesses right — but not always.

XML tags give the model explicit boundaries:

```
<task>
Summarize the document below in 3 bullets.
</task>

<document>
[your long text here]
</document>

<output_format>
- Bullet 1: [main thesis]
- Bullet 2: [key supporting argument]
- Bullet 3: [most surprising claim]
</output_format>
```

The model never has to guess "wait, is this part the task or the content?" — the tags answer it.

Reason 2: Training data bias

Modern LLMs were trained on vast amounts of structured data: HTML, XML, JSON, Markdown. They've seen millions of examples of tagged content + subsequent correct interpretation. When you use similar structures in prompts, you're leveraging patterns the model has deeply learned.

Reason 3: Attention focusing

Transformer models use attention to weigh which tokens matter for generating the next token. XML tags act as anchors that the model attends to when producing structured output. Without them, attention is spread across the whole prompt; with them, the model attends to tag boundaries explicitly.

Reason 4: Long-context stability

In long-context prompts (10K+ tokens), the "lost in the middle" problem degrades information retrieval. XML tags dramatically reduce this because the model can attend to tag boundaries rather than trying to scan continuous text.

When to use XML (and when not to)

Use XML when:

  • The prompt has 3+ distinct components (input, task, format, examples, context)
  • You're doing complex reasoning
  • You need structured output
  • You have long inputs (documents, code, transcripts)
  • You want consistent output across retries

Skip XML when:

  • It's a simple one-shot task ("translate this sentence")
  • It's casual conversation
  • It's creative writing where structure feels forced
  • You're doing quick iteration / experimentation

A good rule: if your prompt has at least 3 semantic units (task + input + format, or task + examples + constraints), XML probably helps.
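If you assemble prompts in code, the rule above is easy to automate. A minimal sketch — the `xml_prompt` helper below is illustrative, not a library API: it wraps each named section in a matching tag pair, preserving the order you pass them in.

```python
def xml_prompt(**sections: str) -> str:
    """Wrap each named section in a matching XML-style tag pair.

    Keyword-argument order is preserved (Python 3.7+), so the prompt
    reads top-to-bottom in the order you wrote it.
    """
    parts = []
    for tag, body in sections.items():
        parts.append(f"<{tag}>\n{body.strip()}\n</{tag}>")
    return "\n\n".join(parts)


# The summarization prompt from above, built programmatically:
prompt = xml_prompt(
    task="Summarize the document below in 3 bullets.",
    document="[your long text here]",
    output_format="- Bullet 1: [main thesis]\n- Bullet 2: [key supporting argument]",
)
print(prompt)
```

Because the tag names come straight from the keyword arguments, adding a `constraints=` or `examples=` section later is a one-line change.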

The 7 most useful tags

```
<role>          — primes the AI's character/expertise

<task>          — the action to perform

<context>       — background information

<input>         — content to process (can be <document>, <code>, etc.)

<constraints>   — hard rules

<examples>      — few-shot examples

<output_format> — expected output structure
```

Tags are flexible — `<my_data>`, `<customer_notes>`, `<whatever_fits>` all work. The names are hints for the model about semantic meaning.

5 real before/after comparisons

Example 1: Code review

```

You're a senior developer. Review this function for performance issues and suggest fixes:

def get_users_with_orders(user_ids):
    results = []
    for id in user_ids:
        user = db.query(User).filter_by(id=id).first()
        orders = db.query(Order).filter_by(user_id=id).all()
        results.append({'user': user, 'orders': orders})
    return results

Format: list the issues.

```

```
<role>
Senior Python engineer specializing in database performance.
</role>

<task>
Review the function below for performance issues.
</task>

<code>
def get_users_with_orders(user_ids):
    results = []
    for id in user_ids:
        user = db.query(User).filter_by(id=id).first()
        orders = db.query(Order).filter_by(user_id=id).all()
        results.append({'user': user, 'orders': orders})
    return results
</code>

<output_format>
For each issue:
- Problem (specific)
- Severity (critical/high/medium)
- Fix (code example)
</output_format>
```

The XML version typically produces a cleaner 3-issue breakdown with specific severity tags. The plain-text version often gives verbose explanation mixed with code suggestions.
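The headline issue either version should surface is the N+1 query pattern: two database round-trips per user. A minimal sketch of the batched fix, with plain dicts standing in for the database — the `fetch_*` helpers are hypothetical stand-ins for bulk queries (e.g. SQLAlchemy's `.filter(User.id.in_(user_ids))`):

```python
from collections import defaultdict

# Toy in-memory "tables" so the sketch runs standalone.
USERS = {1: {"id": 1, "name": "Ada"}, 2: {"id": 2, "name": "Lin"}}
ORDERS = [{"id": 10, "user_id": 1}, {"id": 11, "user_id": 1}, {"id": 12, "user_id": 2}]


def fetch_users_by_ids(user_ids):
    # Stand-in for ONE bulk query instead of one query per id.
    return [USERS[i] for i in user_ids if i in USERS]


def fetch_orders_by_user_ids(user_ids):
    # Stand-in for ONE bulk query over all requested user ids.
    wanted = set(user_ids)
    return [o for o in ORDERS if o["user_id"] in wanted]


def get_users_with_orders(user_ids):
    users = fetch_users_by_ids(user_ids)         # 1 round-trip
    orders = fetch_orders_by_user_ids(user_ids)  # 1 round-trip
    by_user = defaultdict(list)
    for order in orders:                         # join in memory
        by_user[order["user_id"]].append(order)
    return [{"user": u, "orders": by_user[u["id"]]} for u in users]


results = get_users_with_orders([1, 2])
```

Two queries total instead of 2×N — the in-memory grouping is O(users + orders) and trivially fast compared to the saved round-trips.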

Example 2: Email drafting

```

Draft an email to my boss asking for a 20% raise. I've been there 3 years. I shipped 2 major products.

Make it professional but firm.

```

```
<role>
Executive communication specialist.
</role>

<context>
- Tenure: 3 years at current company
- Achievements: Led shipping 2 major products
- Ask: 20% raise
- Recipient: My manager (VP of Engineering)
</context>

<task>
Draft the email that requests the meeting, not the raise itself.
Goal: get a 30-min conversation scheduled.
</task>

<constraints>
- Under 150 words
- No groveling
- No over-explaining
- Specific about what to discuss
</constraints>
```

The XML version gives you a crisp meeting-request email. The plain-text version produces a rambling 400-word raise-pitch in email form (which should never be sent before a real conversation).

Example 3: Data extraction

```

Extract names and emails from this text: [paste 5-paragraph bio]

```

```
<task>
Extract all person names and email addresses mentioned.
</task>

<input>
[paste 5-paragraph bio]
</input>

<output_format>
JSON format:
{
  "people": [
    {"name": "Full Name", "email": "email@domain.com" or null}
  ]
}
</output_format>
```

XML version produces valid JSON. Plain-text version often produces prose ("The people mentioned are...") that needs further parsing.
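Machine-readable output is the payoff: the reply can be parsed directly instead of scraped out of prose. A minimal sketch, assuming a well-behaved reply — the `reply` string here is a hypothetical sample of the shape the `<output_format>` block requests:

```python
import json

# Hypothetical model reply matching the requested JSON shape.
reply = (
    '{"people": ['
    '{"name": "Jane Doe", "email": "jane@example.com"}, '
    '{"name": "Sam Lee", "email": null}'
    ']}'
)

# json.loads fails loudly if the model drifted into prose —
# a useful signal to retry or tighten the prompt.
data = json.loads(reply)
people = data["people"]
emails = [p["email"] for p in people if p["email"]]  # drop the nulls
```

In practice you'd wrap the `json.loads` call in a try/except and retry on failure, since even well-prompted models occasionally add a sentence around the JSON.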

Example 4: Decision analysis

```

Should I leave my job to start a company? Pro: I have an idea. Con: I have a mortgage and a kid.

```

```
<decision>
Leave stable job to start a company.
</decision>

<context>
- Have a business idea I believe in
- Have a mortgage
- Have a 4-year-old
- Spouse works part-time
- Savings: 8 months runway
</context>

<framework>
- Financial: what I lose / what I risk
- Relational: impact on family
- Opportunity: upside + timeline to revenue
- Identity: what changes about who I am
- Reversibility: if it fails, what's the recovery path
</framework>

<task>
Run the analysis. End with a specific recommendation.
</task>
```

XML version produces a structured multi-dimension analysis. Plain-text version gives generic "think about pros and cons" energy.

Example 5: Creative writing brief

```

Write a short story about loneliness.

```

```
<context>
- Character: 68-year-old retired teacher
- Setting: Small coastal town, off-season
- Time: Late November
</context>

<constraints>
- 800 words max
- Quiet emotional climax (no plot twist)
- End on an action, not a reflection
</constraints>

<task>
Write the story.
</task>
```

XML version produces focused, restrained literary fiction. Plain-text version typically produces a generic "lonely person sits alone drinking tea" story.

The meta-insight

XML tags are a form of prompt structure as instruction. You're not just telling the model what to do in words — you're showing it the structure of the thinking you want. The structure itself is an instruction.

This is why the Promptolis Originals library uses XML extensively. Every Promptolis Original is structured with `<role>`, `<task>`, `<context>`, `<constraints>`, and often `<examples>` tags. Try one of the complex Originals to see professional XML prompting in action — copy the structure and adapt it for your work.

3 things to remember

  • Start with plain text, upgrade to XML when output isn't good enough. Don't over-engineer simple tasks.
  • The tags don't have to be semantic HTML. `<my_stuff>` and `<whatever_you_like>` work fine — tags are hints to the model, not a protocol.
  • Different models respond differently. Claude is most XML-friendly. GPT-5 handles XML well. Gemini is OK but sometimes needs explicit reminders. Llama models vary.

If you take one technique from prompt engineering in 2026, make it XML structuring. It's the highest-ROI pattern — improves every prompt you write without costing you speed.

Tags

XML Prompt Engineering Claude Techniques Advanced