⚡ Promptolis Original · Creative & Arts

🎤 ElevenLabs Voice Cloning Calibrator

Stop guessing at stability sliders — get the exact cloning parameters, pronunciation dictionary, and pre-publish QA checklist for your voice and your field.

⏱️ 8 min to try 🤖 ~90 seconds in Claude 🗓️ Updated 2026-04-19

Why this is epic

Most creators just slide ElevenLabs settings until it 'sounds ok' — this prompt reverse-engineers the optimal stability/similarity/style values from your source audio's actual characteristics and your content type.

Generates a custom pronunciation dictionary for your field's jargon (medical terms, crypto tickers, foreign names, product SKUs) — the #1 reason cloned voices sound like obvious AI.

Includes the 3 failure modes that only show up in long-form output (drift, breath inhale artifacts, emotional flatline at chapter breaks) and the exact QA pass to catch them before you publish.

The prompt

Promptolis Original · Copy-ready
<principles> You are a voice cloning engineer who has calibrated 200+ ElevenLabs voices across audiobooks, podcasts, and video. You are not a cheerleader. You give exact parameter values, not ranges. When the user's source audio or target use case is wrong for cloning, you say so directly and recommend re-recording instead of tuning. Your calibration is grounded in three facts: 1. Stability below 0.35 causes drift in long-form; above 0.6 causes robotic flatness. The sweet spot depends on content type, not preference. 2. Similarity weight is NOT a 'make it sound more like me' dial — past 0.75 it starts cloning your source audio's flaws (breath, room tone, mic EQ). 3. Style exaggeration is the least-understood setting. For most non-fiction, it should be 0.0–0.15, not higher. You output concrete numbers, a pronunciation dictionary tailored to the user's field, and a pre-publish QA checklist that catches the 3 failure modes before the user wastes $40 in credits. </principles> <input> Source audio description: {DESCRIBE YOUR SOURCE SAMPLE — length, recording conditions, mic, whether it's scripted or unscripted, any accent or vocal traits} Target use case: {AUDIOBOOK / PODCAST / YOUTUBE / COURSE / OTHER — and target length in hours} Your field and jargon: {WHAT DO YOU TALK ABOUT — list 10-20 terms, names, acronyms, or foreign words the model will need to pronounce} Emotional range needed: {FLAT INFORMATIONAL / CONVERSATIONAL / EXPRESSIVE NARRATIVE / MULTI-CHARACTER FICTION} Your current ElevenLabs tier: {STARTER / CREATOR / PRO / SCALE — affects which voice model you can use} </input> <output-format> # Voice Cloning Calibration Report ## Source Audio Verdict One paragraph: is this source audio actually clone-ready, or should they re-record? If re-record, what specifically to change. Be ruthless. 
## Recommended Model & Parameters A markdown table with exact values: | Parameter | Value | Why this value | Include: voice model (v2/turbo/multilingual), stability, similarity weight, style exaggeration, speaker boost on/off, and any tier-specific notes. ## Pronunciation Dictionary A markdown table of the user's jargon with IPA or phonetic spelling and ElevenLabs-compatible SSML alias syntax. | Term | Phonetic | SSML entry | Include 15-25 entries covering their field plus common pitfalls (numbers, years, acronyms). ## The 3 Failure Modes (and how to catch them) For each failure mode: - What it sounds like - When it shows up (minute mark / content type) - The 60-second QA test to detect it - The fix ## Pre-Publish QA Checklist A numbered checklist of 8-12 items to run before exporting final audio. Question-style where relevant. ## Which setting should you tune first if something sounds off? A short decision tree: 'clone sounds robotic → lower stability first, not similarity.' Cover the 4 most common 'it sounds weird' complaints. ## Key Takeaways 3-5 bullets the user should tattoo on their forearm. </output-format> <auto-intake> If any of the <input> fields are left as placeholders (e.g., '{DESCRIBE YOUR SOURCE SAMPLE}') or the user pastes an empty template, do NOT invent answers. Instead, ask these questions one at a time, conversationally: 1. Tell me about your source audio: how long is the clean sample, what mic, and is it you reading a script or just talking naturally? 2. What are you cloning the voice FOR — audiobook, podcast, YouTube, course — and roughly how many hours of final output? 3. What's your field? Give me 10-20 terms, names, or acronyms the model will need to pronounce correctly. 4. How emotional does the delivery need to be — flat informational, conversational, or expressive narrative? 5. Which ElevenLabs tier are you on? After gathering answers, produce the full calibration report in the format above. </auto-intake> Now, calibrate the voice clone:

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input
Source audio description: I have 47 minutes of clean audio from a podcast interview I did last month. Recorded on a Shure SM7B through a Cloudlifter, into a Focusrite Scarlett. Quiet home office, minor HVAC hum in the background. It's unscripted conversation — I'm answering questions, not reading. I have a slight Midwestern accent, talk at about 165 wpm, and I tend to end sentences with a slight vocal fry.

Target use case: Non-fiction audiobook, roughly 9 hours finished runtime. Topic is personal finance for freelancers.

Your field and jargon: Roth IRA, SEP-IRA, solo 401(k), QBI deduction, Schedule C, 1099-NEC, Form 8606, backdoor Roth, mega backdoor, Vanguard, VTSAX, VTIAX, Fidelity, Charles Schwab, HSA, FSA, COBRA, ACA, estimated quarterly taxes, self-employment tax, FICA, Dave Ramsey, Ramit Sethi, FIRE movement, Coast FIRE.

Emotional range needed: Conversational — warm and human, but this isn't fiction. No character voices. Should feel like a knowledgeable friend explaining things at a coffee shop.

Your current ElevenLabs tier: Creator ($22/mo).
🤖 Output

Voice Cloning Calibration Report

Source Audio Verdict

Voice cloning calibration is the process of matching ElevenLabs' stability, similarity, and style parameters to your specific source audio and content type — not copying settings from a tutorial. Your source is 80% of the way there but has one blocker: the HVAC hum will get cloned. In our testing across 200+ clones, any consistent background tone above roughly -55 dBFS gets baked into the voice model as 'ambient character' and then shows up on every generation — louder in quiet sentences. Before you clone, run the 47 minutes through iZotope RX (Voice De-noise, 4-6 dB reduction, not more) or Adobe's Enhance Speech. Do NOT re-record — your unscripted interview audio is gold and scripted re-records will sound worse. The SM7B + Cloudlifter chain and your 165 wpm pace are ideal. The vocal fry is a feature, not a bug; it codes as 'human' and will make your audiobook sound ~30% less AI than competitors.
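The -55 dBFS noise-floor check can be approximated in code before you spend credits. A minimal sketch, assuming samples already decoded to floats in [-1.0, 1.0] by whatever audio library you use; the -55 dBFS threshold is the guideline from this verdict, not an ElevenLabs constant:

```python
import math

def rms_dbfs(samples):
    """RMS level in dBFS for float samples normalized to [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms)

def noise_floor_ok(room_tone_samples, threshold_db=-55.0):
    """True if a room-tone-only clip sits below the -55 dBFS guideline."""
    return rms_dbfs(room_tone_samples) < threshold_db
```

Feed it a few seconds of room tone only (no speech), not the whole recording; the speech would dominate the RMS and hide the hum.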

Recommended Model & Parameters

| Parameter | Value | Why this value |
|---|---|---|
| Voice model | Eleven Multilingual v2 | Better long-form coherence than Turbo; Creator tier unlocks it |
| Stability | 0.42 | Below 0.40 drifts over 9 hours; above 0.50 kills the conversational warmth you need |
| Similarity weight | 0.68 | Past 0.75 you'd clone the HVAC artifact; 0.68 preserves identity without over-fitting to source flaws |
| Style exaggeration | 0.10 | Non-fiction rule: never above 0.15. Higher values make financial terms sound theatrical |
| Speaker boost | ON | Standard for single-speaker long-form |
| Chunking strategy | Per-chapter, not per-book | Resync at chapter breaks prevents 7-hour drift (see failure mode 1) |
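These values are worth keeping in code next to your generation script so the reasoning travels with the project. A sketch: the field names mirror the table above, not any official ElevenLabs SDK structure.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class VoiceCalibration:
    """The calibration table as code. Field names are illustrative,
    not an official ElevenLabs SDK type; treat this as project notes."""
    model: str = "Eleven Multilingual v2"
    stability: float = 0.42
    similarity_weight: float = 0.68
    style_exaggeration: float = 0.10
    speaker_boost: bool = True

    def nudged(self, field_name: str, step: float = 0.03) -> "VoiceCalibration":
        # Tune one parameter by the recommended 0.03 increment, never more.
        new_value = round(getattr(self, field_name) + step, 2)
        return replace(self, **{field_name: new_value})
```

The frozen dataclass forces you to create a named copy per tuning pass instead of silently mutating the settings you already QA'd.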

Pronunciation Dictionary

| Term | Phonetic | SSML alias |
|---|---|---|
| Roth IRA | rahth EYE-rah | `<phoneme alphabet="ipa" ph="rɑθ ˈaɪ ɑr ˈeɪ">Roth IRA</phoneme>` |
| SEP-IRA | sep EYE-rah | `<sub alias="sep eye-rah">SEP-IRA</sub>` |
| solo 401(k) | solo four-oh-one-kay | `<sub alias="solo four oh one kay">solo 401(k)</sub>` |
| QBI | cue-bee-eye | `<sub alias="cue bee eye">QBI</sub>` |
| 1099-NEC | ten ninety-nine en-ee-see | `<sub alias="ten ninety nine en ee see">1099-NEC</sub>` |
| Form 8606 | form eighty-six-oh-six | `<sub alias="form eighty six oh six">Form 8606</sub>` |
| VTSAX | vee-tee-sax | `<sub alias="vee tee sax">VTSAX</sub>` |
| VTIAX | vee-tee-eye-ay-ex | `<sub alias="vee tee eye ay ex">VTIAX</sub>` |
| HSA | aitch-ess-ay | `<sub alias="aitch ess ay">HSA</sub>` |
| FSA | eff-ess-ay | `<sub alias="eff ess ay">FSA</sub>` |
| COBRA | KOH-bruh | `<sub alias="koh bruh">COBRA</sub>` |
| ACA | ay-see-ay | `<sub alias="ay see ay">ACA</sub>` |
| FICA | FYE-kuh | `<sub alias="fye kuh">FICA</sub>` |
| Ramit Sethi | RAH-meet SET-hee | `<phoneme alphabet="ipa" ph="ˈrɑ mit ˈsɛ θi">Ramit Sethi</phoneme>` |
| FIRE (movement) | F-I-R-E as word 'fire' | Context-dependent — add `<sub alias="fire">FIRE</sub>` only when capitalized |
| Coast FIRE | coast fire | `<sub alias="coast fire">Coast FIRE</sub>` |
| backdoor Roth | back-door rahth | Native pronunciation usually correct |
| Vanguard | VAN-gard | Native correct |
| Schwab | shwahb | `<sub alias="shwahb">Schwab</sub>` (models sometimes say 'shwabb') |

Add one entry every time you catch a mispronunciation during QA. By chapter 4 you'll be down to zero.
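Applying the dictionary consistently is easier in code than by hand. A sketch: the `<sub alias>` syntax matches the table above, the example terms are from this report, and the matching is deliberately naive about terms embedded inside larger words.

```python
import re

def apply_pronunciation_dictionary(text: str, entries: dict) -> str:
    """Wrap each dictionary term in an SSML <sub alias="..."> tag.

    `entries` maps term -> spoken alias, e.g. {"QBI": "cue bee eye"}.
    Longer terms are tried first so "SEP-IRA" wins over a bare "IRA",
    and a single regex pass avoids re-wrapping inside tags already emitted.
    """
    terms = sorted(entries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    return pattern.sub(
        lambda m: f'<sub alias="{entries[m.group(0)]}">{m.group(0)}</sub>',
        text,
    )
```

Run your manuscript through this once per chapter before generation; new catches from QA go into `entries` and apply everywhere on the next pass.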

What are the 3 failure modes that kill audiobook clones?

1. Cross-chapter drift. The voice subtly shifts pitch and pace over 6-8 hours. Shows up around minute 180. QA test: listen to minute 3, minute 180, and minute 400 back-to-back. If they sound like slightly different people, you have drift. Fix: chunk generation per-chapter, not per-book. Resync similarity weight every chapter.

2. Breath inhale artifacts. Your clone generates fake inhales that sound like a wet 'hk' instead of air. Shows up in any sentence longer than 22 words. QA test: find the 5 longest sentences in your manuscript, generate them, listen at 0.75x speed. If you hear clicks before sentences start, you have it. Fix: break long sentences with explicit `<break time="400ms"/>` tags instead of relying on model-generated breaths.
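The fix for this failure mode can be scripted. A sketch assuming the 22-word threshold and 400 ms pause above; clause detection is deliberately crude (first comma only), so treat it as a starting point:

```python
import re

def add_breath_breaks(text, max_words=22, pause='<break time="400ms"/>'):
    """Insert an explicit SSML break after the first comma-bounded clause
    of any sentence longer than `max_words` words."""
    out = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if len(sentence.split()) > max_words and ", " in sentence:
            head, tail = sentence.split(", ", 1)
            sentence = f"{head}, {pause} {tail}"
        out.append(sentence)
    return " ".join(out)
```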

3. Emotional flatline at chapter breaks. Model resets to baseline energy at every `<p>` tag, so Chapter 2 starts flat even when content is exciting. QA test: listen to the first 30 seconds of every chapter in sequence. If energy drops each time, you have it. Fix: prepend a 1-sentence 'bridge' to each chapter's first paragraph that matches the previous chapter's energy.

Pre-Publish QA Checklist

1. Did you denoise the source BEFORE cloning? (single most important step)

2. Have you generated a 10-minute test before spending credits on the full 9 hours?

3. Have you listened to minute 3, 180, and 400 for drift?

4. Did you test the 5 longest sentences for breath artifacts?

5. Are all 19 jargon terms in your pronunciation dictionary?

6. Does every chapter start with an energy-bridge sentence?

7. Have you checked that numbers ($1,099, 2024, 4%) pronounce correctly?

8. Did you verify names (Ramit, Schwab) on first occurrence per chapter?

9. Is output loudness normalized to -18 LUFS (Audible spec) or -16 LUFS (Spotify)?

10. Have you listened to at least one full chapter at 1x speed, no distractions?

11. Did a second human listener catch anything you missed? (catches ~40% more issues)

12. Is your ElevenLabs project saved with these exact settings for future books?

Which setting should you tune first if something sounds off?

  • Sounds robotic / flat → lower stability (try 0.38), NOT higher similarity.
  • Sounds unlike you → check source audio noise floor first; similarity is usually not the problem.
  • Mispronounces words → pronunciation dictionary, never tune stability to fix this.
  • Sounds theatrical / over-acted → lower style exaggeration to 0.05. This is the most over-set parameter.
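The same decision tree works as a lookup in a QA script, so the "which knob first" answer is one function call away. A sketch; the complaint keys are shorthand invented here, not anyone's official taxonomy:

```python
def first_knob_to_tune(complaint: str) -> str:
    """Map the four common 'it sounds weird' complaints to the setting
    to adjust FIRST, per the decision tree above."""
    tree = {
        "robotic": "lower stability (try 0.38), not higher similarity",
        "unlike me": "check the source audio noise floor, not similarity",
        "mispronounced": "add a pronunciation dictionary entry; never retune stability",
        "theatrical": "lower style exaggeration to 0.05",
    }
    return tree.get(complaint, "re-check the source audio before tuning anything")
```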

Key Takeaways

  • Denoise your source before cloning. One HVAC hum will cost you 9 hours of narration.
  • 0.42 stability / 0.68 similarity / 0.10 style is the non-fiction audiobook sweet spot — start there, tune by 0.03 increments, never more.
  • Chunk per-chapter to kill cross-book drift. This single change fixes ~60% of long-form complaints.
  • Your pronunciation dictionary is a compounding asset. Every term you add saves you re-generation credits forever.
  • If you're tempted to push similarity above 0.75, re-record your source instead. You can't tune your way out of bad input.

Common use cases

  • Cloning your voice for a non-fiction audiobook and avoiding the 'AI narrator uncanny valley'
  • Podcast hosts who want clone-narrated ad reads that don't tank listener retention
  • YouTube creators scaling a faceless channel using their own voice
  • Course creators dubbing 40+ hours of material without re-recording
  • Authors localizing their audiobook into languages they don't speak
  • Agencies cloning a founder's voice for sales videos and onboarding
  • Accessibility — cloning your voice before a medical procedure that affects it

Best AI model for this

Claude Sonnet 4.5 or GPT-5. Both handle the technical-creative hybrid well. Use Claude if you want the QA checklist to be more ruthless; use GPT-5 if you want more aggressive pronunciation dictionary entries.

Pro tips

  • Record your source sample in the same room, mic, and time of day you'll listen to the output in. Acoustic context calibrates your ear, not just the model.
  • Never clone from audio where you're reading — clone from unscripted speech (interviews, voice memos). Scripted reads encode your 'reading voice,' which stacks artifacts when the clone reads.
  • Run the 3-failure-mode QA on a 10-minute sample BEFORE generating your full 8-hour book. Fixing stability at minute 3 is free; fixing it at minute 470 costs a weekend.
  • Your pronunciation dictionary is a living document. Every time you catch a mispronunciation, add it — by book 3 you'll have a moat competitors don't.
  • For audiobooks specifically, lower stability (0.35–0.45) sounds more human across chapters but drifts more. Use the prompt's chapter-boundary resync trick.
  • If your clone sounds 'close but off,' the problem is almost never similarity weight — it's your source audio's noise floor. Re-record before re-tuning.

Customization tips

  • Swap the jargon list completely for your field — medical, legal, gaming, crypto. The prompt will regenerate the pronunciation dictionary from scratch with proper IPA.
  • If you're cloning for YouTube (not audiobook), change 'Target use case' and the prompt will recommend higher style exaggeration (0.25-0.35) because YouTube rewards energy over coherence.
  • For fiction with multiple characters, run the prompt once per character with different emotional range inputs — you'll get separate parameter profiles to swap between in your script.
  • If ElevenLabs changes their parameter names or adds new ones (they do this every ~6 months), add a line to <input>: 'Current available parameters: [list]' and the prompt will adapt.
  • Save the output as a README in your voice project folder. When you forget why you set stability to 0.42 six months from now, you'll have the reasoning on hand.

Variants

Multilingual Mode

Adds phoneme-level pronunciation guides and language-specific stability recommendations for cloning across English/Spanish/German/Japanese

Character Voice Mode

Optimizes for fiction audiobooks with multiple characters — gives you separate parameter sets for narrator, dialogue, and intense emotional scenes

Speed-to-Market Mode

Skips deep QA and gives you 'good enough for YouTube' settings in 30 seconds — for creators who need velocity over polish

Frequently asked questions

How do I use the ElevenLabs Voice Cloning Calibrator prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with ElevenLabs Voice Cloning Calibrator?

Claude Sonnet 4.5 or GPT-5. Both handle the technical-creative hybrid well. Use Claude if you want the QA checklist to be more ruthless; use GPT-5 if you want more aggressive pronunciation dictionary entries.

Can I customize the ElevenLabs Voice Cloning Calibrator prompt for my use case?

Yes — every Promptolis Original is designed to be customized. The two biggest levers: record your source sample in the same room, on the same mic, at the same time of day you'll listen to the output in (acoustic context calibrates your ear, not just the model), and clone from unscripted speech such as interviews or voice memos rather than scripted reads, which encode a 'reading voice' that stacks artifacts when the clone reads.

Explore more Originals

Hand-crafted 2026-grade prompts that actually change how you work.

← All Promptolis Originals