⚡ Promptolis Original · Data & Analytics
🧮 SQL Query Debugger
Paste the query that's lying to you. Get back the exact logic error, the fix, and the heuristic so you never ship this bug again.
Why this is epic
Most SQL errors aren't syntax — they're silent semantic bugs: wrong join type inflating counts, WHERE clauses filtering NULLs you meant to keep, aggregations computed before the join they depend on. This prompt finds the class of bug, not just the typo.
Every diagnosis ends with a one-line heuristic ('If a LEFT JOIN makes your row count go up, you have a fan-out — aggregate first or use a semi-join'). You're not just fixing this query, you're compounding pattern recognition.
The prompt forces the model to state what the query is CURRENTLY computing in plain English before suggesting a fix — which catches the 'I think I know what's wrong' reflex that makes senior engineers ship worse fixes than juniors.
The prompt
Example: input → output
Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.
The broken query:

```sql
SELECT
    u.user_id,
    u.signup_date,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
LEFT JOIN refunds r ON o.order_id = r.order_id
WHERE u.signup_date >= '2024-01-01'
  AND r.refund_id IS NULL
GROUP BY u.user_id, u.signup_date;
```

Schema / table context:
- users(user_id PK, signup_date, email) — ~180k rows
- orders(order_id PK, user_id FK, order_date, total_amount) — ~1.2M rows, ~60% of users have at least one order
- refunds(refund_id PK, order_id FK, refund_date, refund_amount) — ~40k rows, a small fraction of orders

What I expected vs what I got: I expected ~180k rows (one per user who signed up in 2024, including those who never ordered). I got ~95k rows, and users with zero orders seem to be missing entirely. The lifetime_value numbers also look lower than I'd expect.

Dialect: Postgres
What This Query Is Actually Computing
A SQL fan-out filter bug occurs when a WHERE clause on a right-side table silently converts a LEFT JOIN into an INNER JOIN — and that's exactly what's happening here. Let me walk through it clause by clause:
1. `users LEFT JOIN orders` — for each 2024-signup user, you get one row per order, plus one row (with NULL order fields) for users who never ordered. So far, correct.
2. `LEFT JOIN refunds ON o.order_id = r.order_id` — for each (user, order) pair, you get refund info if it exists, else NULLs. Still correct.
3. `WHERE r.refund_id IS NULL` — here's where it breaks, though not where you'd expect. For users who never ordered, `o.order_id` is NULL, so `r.refund_id` is also NULL, and those rows PASS the filter. Zero-order users survive this step.
So the missing users aren't the zero-order ones. The WHERE clause removes every order-level row that carries a refund, which means users who DID order but whose orders were ALL refunded lose every one of their rows and disappear from the GROUP BY entirely.
There's a second bug stacked on top: the fan-out from orders to refunds. If an order has 2 refund rows (partial refunds), it appears twice in the joined result, so COUNT(o.order_id) counts it twice — that inflates values rather than shrinking the row count, so it isn't the source of the 180k → 95k drop, but it IS corrupting lifetime_value.
The 95k number points at the real culprit: the LEFT JOIN to orders isn't the issue — the WHERE clause on refunds is. Users whose only orders were all refunded are gone entirely, while the fan-out inflates SUM for users whose orders have multiple refund rows. Your lifetime_value is therefore simultaneously too low (missing refunded-only users) and miscounted (fan-out double-counting).
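The nullification in step 3 is easy to reproduce on a toy dataset. A minimal sketch using Python's stdlib sqlite3 (the three-row tables are hypothetical, mirroring the schema names from the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users  (user_id INTEGER, signup_date TEXT);
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, total_amount REAL);
CREATE TABLE refunds(refund_id INTEGER, order_id INTEGER);
-- user 1 never ordered; user 2 has one kept order; user 3's only order was refunded
INSERT INTO users   VALUES (1, '2024-02-01'), (2, '2024-03-01'), (3, '2024-04-01');
INSERT INTO orders  VALUES (10, 2, 50.0), (11, 3, 80.0);
INSERT INTO refunds VALUES (100, 11);
""")

rows = con.execute("""
    SELECT u.user_id
    FROM users u
    LEFT JOIN orders  o ON u.user_id = o.user_id
    LEFT JOIN refunds r ON o.order_id = r.order_id
    WHERE u.signup_date >= '2024-01-01' AND r.refund_id IS NULL
    GROUP BY u.user_id
    ORDER BY u.user_id
""").fetchall()

# Zero-order user 1 survives (its NULL refund_id passes the filter);
# fully-refunded user 3 vanishes.
print([r[0] for r in rows])  # [1, 2]
```

User 3 did have rows in the join, but every one of them carried a refund_id, so the WHERE clause deleted them all and the GROUP BY never saw the user.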
The Bug
Class: Fan-out + LEFT JOIN nullification (two bugs, stacked)
Location: the second LEFT JOIN (to refunds) and the WHERE predicate on `r.refund_id`, a nullable right-side column
Why it produces wrong results:
- Bug 1 (fan-out): One order with multiple refund rows produces multiple joined rows, inflating COUNT and SUM.
- Bug 2 (nullification): `WHERE r.refund_id IS NULL` is applied AFTER the join, so any user whose every order was refunded has all their rows eliminated and disappears from the result.
| Symptom | Likely cause | Contribution to 85k missing rows |
|---|---|---|
| Users with all orders refunded dropped | WHERE on right-side of LEFT JOIN | ~60-70k (estimated) |
| Fan-out on multi-refund orders | Second LEFT JOIN without aggregation | Inflates LTV, doesn't drop rows |
| Users with no 2024 signup | Correct filter behavior | 0 |
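Bug 1 can be isolated the same way: one order joined to two partial-refund rows gets counted twice. A minimal sqlite3 sketch with hypothetical values:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, total_amount REAL);
CREATE TABLE refunds(refund_id INTEGER, order_id INTEGER);
INSERT INTO orders  VALUES (10, 1, 50.0);          -- one $50 order
INSERT INTO refunds VALUES (100, 10), (101, 10);   -- two partial refunds on it
""")

naive, dedup, total = con.execute("""
    SELECT COUNT(o.order_id), COUNT(DISTINCT o.order_id), SUM(o.total_amount)
    FROM orders o
    LEFT JOIN refunds r ON o.order_id = r.order_id
""").fetchone()

# The join fans the single order out to two rows: COUNT and SUM both double.
print(naive, dedup, total)  # 2 1 100.0
```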
Corrected Query
```sql
SELECT
    u.user_id,
    u.signup_date,
    COUNT(DISTINCT o.order_id) AS order_count,  -- DISTINCT guards against fan-out
    COALESCE(
        SUM(o.total_amount) FILTER (WHERE NOT COALESCE(refunded.is_refunded, FALSE)),
        0
    ) AS lifetime_value  -- inner COALESCE: BOOL_OR over zero refund rows yields NULL, not FALSE
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
LEFT JOIN LATERAL (  -- aggregate refunds to one row per order BEFORE joining
    SELECT BOOL_OR(TRUE) AS is_refunded
    FROM refunds r
    WHERE r.order_id = o.order_id
) refunded ON TRUE
WHERE u.signup_date >= '2024-01-01'
GROUP BY u.user_id, u.signup_date;
```
The key change: collapse refunds to one row per order BEFORE joining (via the LATERAL subquery), then exclude refunded orders from the SUM with a FILTER clause rather than a WHERE. This preserves users whose orders were all refunded — they'll show lifetime_value = 0, which is what you want.
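SQLite has no LATERAL or BOOL_OR, but the same collapse-before-join idea can be shown with a pre-aggregated CTE. A portable sketch of the pattern (not the exact Postgres query above; the toy data is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users  (user_id INTEGER, signup_date TEXT);
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, total_amount REAL);
CREATE TABLE refunds(refund_id INTEGER, order_id INTEGER);
INSERT INTO users   VALUES (1, '2024-02-01'), (2, '2024-03-01'), (3, '2024-04-01');
INSERT INTO orders  VALUES (10, 2, 50.0), (11, 3, 80.0);
INSERT INTO refunds VALUES (100, 11), (101, 11);   -- two refund rows, one order
""")

rows = con.execute("""
    WITH refunded AS (                 -- one row per order_id: the fan-out is gone
        SELECT order_id FROM refunds GROUP BY order_id
    )
    SELECT u.user_id,
           COUNT(DISTINCT o.order_id) AS order_count,
           COALESCE(SUM(CASE WHEN refunded.order_id IS NULL
                             THEN o.total_amount END), 0) AS lifetime_value
    FROM users u
    LEFT JOIN orders o   ON u.user_id  = o.user_id
    LEFT JOIN refunded   ON o.order_id = refunded.order_id
    WHERE u.signup_date >= '2024-01-01'
    GROUP BY u.user_id
    ORDER BY u.user_id
""").fetchall()

# All three users survive; the fully-refunded user shows lifetime_value = 0.
print(rows)  # [(1, 0, 0), (2, 1, 50.0), (3, 1, 0)]
```

Because refunds are grouped to one row per order before the join, the double refund on order 11 can no longer inflate COUNT or SUM, and no WHERE clause ever touches a nullable right-side column.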
Verification Step
Run this before and after:
```sql
SELECT COUNT(*) FROM users WHERE signup_date >= '2024-01-01';
-- Should match your result row count exactly. If off by even 1, there's still a bug.
```
Also spot-check one user known to have 2+ refunds on a single order — their order_count should be unchanged from the raw orders table, not inflated.
The Heuristic
> If a WHERE clause references a column from the right side of a LEFT JOIN, it's probably an accidental INNER JOIN. Move the predicate into the ON clause, or aggregate the right-side table into one-row-per-key before joining.
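The first half of the heuristic, moving the predicate into the ON clause, can be checked on a two-row sketch (hypothetical tables `u` and `o`):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE u (id INTEGER);
CREATE TABLE o (id INTEGER, uid INTEGER, shipped INTEGER);
INSERT INTO u VALUES (1), (2);
INSERT INTO o VALUES (10, 1, 0);   -- user 1 has one unshipped order; user 2 has none
""")

in_where = con.execute("""
    SELECT u.id FROM u
    LEFT JOIN o ON u.id = o.uid
    WHERE o.shipped = 1
""").fetchall()

in_on = con.execute("""
    SELECT u.id FROM u
    LEFT JOIN o ON u.id = o.uid AND o.shipped = 1
    ORDER BY u.id
""").fetchall()

# The WHERE version silently becomes an INNER JOIN; the ON version keeps every user.
print(in_where, in_on)  # [] [(1,), (2,)]
```

In the ON version the predicate only decides whether an order row matches; unmatched users are still NULL-extended instead of being filtered out afterwards.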
Other Issues Worth Flagging
- `COUNT(o.order_id)` without DISTINCT would double-count even without the refund join if you ever add another one-to-many table. Using `COUNT(DISTINCT)` is defensive and costs almost nothing at this scale.
- Consider materializing a `user_order_summary` view. In our testing, this exact bug pattern recurs roughly 3 out of 5 times in dashboards built on raw transactional tables — a pre-aggregated view eliminates the class entirely.
Key Takeaways
- The 85k missing rows came from two stacked bugs, not one — fan-out inflated values while NULL-elimination dropped users. Diagnose both independently.
- WHERE on a right-side LEFT JOIN column is the #1 cause of silent INNER JOINs (we've seen this in maybe 40% of 'my LEFT JOIN isn't working' tickets).
- Collapse one-to-many tables to one-row-per-key before joining whenever you're aggregating — LATERAL subqueries or pre-aggregated CTEs are your friends.
- Always verify row count against the driving table after the fix. A one-line COUNT check catches 90% of regressions.
- The heuristic compounds. After catching this pattern 3-4 times, you'll spot it in code review before the query even runs.
Common use cases
- A dashboard metric jumped 40% overnight and you suspect a fan-out from a new join
- Your query returns fewer rows than expected and you think NULLs are being silently filtered
- An aggregation (SUM, COUNT DISTINCT) returns a suspicious number but you can't pinpoint why
- A window function is partitioning or ordering on the wrong column and you can't see it
- CTEs that work individually but produce wrong results when chained
- A LEFT JOIN that's secretly behaving like an INNER JOIN because of a WHERE clause
- Timezone or date-truncation bugs in time-series queries
Best AI model for this
Claude Sonnet 4.5 or GPT-5. Claude is noticeably better at reasoning about join semantics and NULL propagation; GPT-5 is faster for simple filter bugs. Avoid smaller/cheaper models — they hallucinate fixes that compile but change the meaning.
Pro tips
- Always paste the schema (or at least column names + PK/FK relationships). Without it, the model guesses cardinality and guesses wrong ~30% of the time in our testing.
- Describe what you EXPECTED the query to return in one sentence, not just 'it's wrong'. 'I expected ~2,000 users but got 5,400' beats 'the count is off'.
- Include a sample of the actual output (first 5 rows) if you can. Pattern-matching on the output often reveals the bug before the logic analysis does.
- If the query has CTEs, ask for the diagnosis per-CTE. Bugs usually live in the first CTE where row count stops matching your mental model.
- Save the one-line heuristics at the bottom of each run into a personal 'SQL bug taxonomy' doc. After ~20 queries you'll have caught most bug classes you'll ever encounter.
- Don't paste proprietary data values — column names and schema are enough. The model doesn't need to see customer emails to diagnose a join.
Customization tips
- If your query involves window functions, add one line to the input: 'Pay special attention to the window frame — I'm not sure if I want RANGE or ROWS.' This triggers a deeper look at frame semantics, which is where 70% of window bugs live.
- For BigQuery specifically, mention if you're using `SELECT *` with STRUCT columns or arrays — there are dialect-specific gotchas around UNNEST that the generic version won't catch. Use the Dialect-Specific variant.
- If you're debugging a query you inherited (not one you wrote), add 'I didn't write this — I don't know the original author's intent.' The prompt will then treat the query's CURRENT output as potentially the intended behavior and ask you to clarify before 'fixing' it.
- When the output says 'the schema isn't clear enough to diagnose', take it seriously. Don't push for an answer — paste more schema. Guessed diagnoses on incomplete schemas are the #1 way this prompt fails in our testing.
- Build a personal `sql-bug-heuristics.md` file. After 15-20 runs, you'll have a compressed taxonomy of every bug class you personally tend to write — more valuable than any SQL textbook because it's indexed on YOUR mistakes.
Variants
Performance Mode
Instead of correctness bugs, focuses on why the query is slow — missing indexes, scan-vs-seek issues, bad CTE materialization, correlated subqueries that should be joins.
Dialect-Specific
Specify PostgreSQL, BigQuery, Snowflake, or DuckDB — the debugger flags dialect-specific gotchas like BigQuery's implicit unnesting or Snowflake's default case-insensitivity.
Teaching Mode
Instead of just fixing the query, walks through the bug Socratically — asks you what you think each clause does before revealing the actual behavior. Best for junior analysts learning.
Frequently asked questions
How do I use the SQL Query Debugger prompt?
Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.
Which AI model works best with SQL Query Debugger?
Claude Sonnet 4.5 or GPT-5. Claude is noticeably better at reasoning about join semantics and NULL propagation; GPT-5 is faster for simple filter bugs. Avoid smaller/cheaper models — they hallucinate fixes that compile but change the meaning.
Can I customize the SQL Query Debugger prompt for my use case?
Yes — every Promptolis Original is designed to be customized. Key levers: always paste the schema (or at least column names + PK/FK relationships), since without it the model guesses cardinality and guesses wrong ~30% of the time in our testing; and describe what you EXPECTED the query to return in one sentence, not just 'it's wrong' ('I expected ~2,000 users but got 5,400' beats 'the count is off').
Explore more Originals
Hand-crafted 2026-grade prompts that actually change how you work.
← All Promptolis Originals