⚡ Promptolis Original · Data & Analytics

🧮 SQL Query Debugger

Paste the query that's lying to you. Get back the exact logic error, the fix, and the heuristic so you never ship this bug again.

⏱️ 4 min to try 🤖 ~45 seconds in Claude 🗓️ Updated 2026-04-19

Why this is epic

Most SQL errors aren't syntax — they're silent semantic bugs: wrong join type inflating counts, WHERE clauses filtering NULLs you meant to keep, aggregations computed before the join they depend on. This prompt finds the class of bug, not just the typo.

Every diagnosis ends with a one-line heuristic ('If a LEFT JOIN makes your row count go up, you have a fan-out — aggregate first or use a semi-join'). You're not just fixing this query, you're compounding pattern recognition.

The prompt forces the model to state what the query is CURRENTLY computing in plain English before suggesting a fix — which catches the 'I think I know what's wrong' reflex that leads senior engineers to ship worse fixes than juniors.
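The fan-out heuristic above is easy to see live. A minimal sketch (Python's bundled sqlite3; table names and rows invented for illustration) of a LEFT JOIN fanning out against a one-to-many table:

```python
import sqlite3

# Toy schema: two orders, one of which has two partial refunds.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE refunds (refund_id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders  VALUES (1, 100.0), (2, 50.0);
INSERT INTO refunds VALUES (10, 1), (11, 1);  -- two refund rows on order 1
""")

base = con.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
joined = con.execute("""
    SELECT COUNT(*)
    FROM orders o
    LEFT JOIN refunds r ON o.order_id = r.order_id
""").fetchone()[0]

# The LEFT JOIN *raised* the row count from 2 to 3: order 1 now appears
# once per refund row, so any SUM or COUNT over it will be inflated.
print(base, joined)  # → 2 3
```

If `joined > base` here, every aggregate downstream of that join is suspect: that's the fan-out.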

The prompt

Promptolis Original · Copy-ready
<principles> You are a senior data engineer who has debugged SQL for 15 years across Postgres, BigQuery, Snowflake, and Redshift. Your job is to find the SEMANTIC bug — not the syntax error — in a query that isn't returning what the user expects. Core rules: 1. Before proposing any fix, restate in plain English what the query is ACTUALLY computing right now (row by row, join by join). This is non-negotiable. Most bad fixes come from skipping this step. 2. Identify the single most likely bug class. Name it precisely: 'fan-out from one-to-many join', 'NULL elimination via WHERE on the right side of a LEFT JOIN', 'aggregation before filter', 'DISTINCT hiding a duplicate key problem', 'window frame default (RANGE vs ROWS)', 'implicit type coercion in comparison', etc. 3. If the schema isn't clear enough to diagnose, say so explicitly — don't guess cardinality. 4. Provide the corrected query in full, not a patch. Annotate the changed lines with inline -- comments. 5. End with a ONE-LINE heuristic the user can remember to catch this bug class in the future. Make it concrete and pattern-based, not generic advice. 6. Be ruthless. If the query has multiple bugs, rank them. If the 'bug' is actually a misunderstanding of what the user wants, say that — don't invent a fix for a problem that doesn't exist. 7. Do not sycophantically praise the query. Do not say 'great question'. Start with the diagnosis. </principles> <input> The broken query: ```sql {QUERY HERE} ``` Schema / table context: {SCHEMA HERE — column names, types, PK/FK, approximate row counts if known} What I expected vs what I got: {EXPECTATION HERE} Dialect: {POSTGRES / BIGQUERY / SNOWFLAKE / MYSQL / OTHER} </input> <output-format> ## What This Query Is Actually Computing (Plain English, step by step. No fix yet.) 
## The Bug **Class:** [named bug class] **Location:** [line number or clause] **Why it produces wrong results:** [mechanism] ## Corrected Query ```sql -- full corrected query with inline comments on changed lines ``` ## Verification Step [One concrete thing to run or check to confirm the fix worked — e.g., 'run SELECT COUNT(*) on the CTE before and after; it should drop from X to Y'] ## The Heuristic > [One-line pattern for catching this bug class next time] ## Other Issues Worth Flagging [Only if real. Otherwise write 'None — the rest of the query is sound.'] </output-format> <auto-intake> If any of the placeholders above are empty or say things like {QUERY HERE}, do NOT proceed. Instead, ask the user conversationally: 1. Paste the query. 2. Describe the schema of the tables involved (column names + which columns are primary/foreign keys is usually enough). 3. Tell me what you expected vs what you got — row count, a specific value, missing rows, whatever. 4. Which SQL dialect? Then proceed with the full diagnosis. </auto-intake> Now, debug this query:

Example: input → output

Here's how this prompt actually performs. Real input below, real output from Claude Opus 4.

📝 Input
The broken query:
```sql
SELECT 
  u.user_id,
  u.signup_date,
  COUNT(o.order_id) AS order_count,
  SUM(o.total_amount) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
LEFT JOIN refunds r ON o.order_id = r.order_id
WHERE u.signup_date >= '2024-01-01'
  AND r.refund_id IS NULL
GROUP BY u.user_id, u.signup_date;
```

Schema / table context:
- users(user_id PK, signup_date, email) — ~180k rows
- orders(order_id PK, user_id FK, order_date, total_amount) — ~1.2M rows, ~60% of users have at least one order
- refunds(refund_id PK, order_id FK, refund_date, refund_amount) — ~40k rows, a small fraction of orders

What I expected vs what I got:
I expected ~180k rows (one per user who signed up in 2024, including those who never ordered). I got ~95k rows, and users with zero orders seem to be missing entirely. The lifetime_value numbers also look lower than I'd expect.

Dialect: Postgres
🤖 Output

What This Query Is Actually Computing

Before proposing any fix, here is what the query is actually computing right now, row by row, join by join:

1. `users LEFT JOIN orders` — for each 2024-signup user, you get one row per order, plus one row (with NULL order fields) for users who never ordered. So far, correct.

2. `LEFT JOIN refunds ON o.order_id = r.order_id` — for each (user, order) pair, you get refund info if it exists, else NULLs. Still correct.

3. `WHERE r.refund_id IS NULL` — here's where it breaks, though not in the way you might guess. For users who never ordered, `o.order_id` is NULL, so `r.refund_id` is also NULL, and those rows PASS the filter. Zero-order users are not eliminated by this clause.

What the clause does eliminate is every (user, order) row where the order was refunded. A user whose every order was refunded loses all of their rows and disappears from the GROUP BY output entirely.

There's a second bug stacked on top: the fan-out from orders to refunds. If an order has 2 refund rows (partial refunds), that order appears twice in the joined result, so COUNT(o.order_id) counts it twice and SUM(o.total_amount) adds it twice. That inflates rather than shrinks, so it isn't your row-count problem, but it IS corrupting lifetime_value.

Putting it together: the refund filter silently drops users whose only orders were refunded, and the fan-out double-counts amounts for multi-refund orders, so lifetime_value is simultaneously missing users and miscounted for the users that remain. One caveat on the arithmetic: with only ~40k refund rows in total, refunded-only users cannot account for the full 85k gap by themselves, so after fixing, verify the result row count against the driving users table to confirm nothing else is filtering.
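To make the mechanism concrete, here is a minimal reproduction (sqlite3 via Python; three toy users with IDs invented for illustration): user 1 never ordered, user 2's only order was refunded, user 3 has a normal order.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users   (user_id INTEGER PRIMARY KEY);
CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, user_id INTEGER, total_amount REAL);
CREATE TABLE refunds (refund_id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO users   VALUES (1), (2), (3);      -- 1: no orders, 2: refunded order, 3: normal order
INSERT INTO orders  VALUES (10, 2, 40.0), (11, 3, 25.0);
INSERT INTO refunds VALUES (100, 10);          -- user 2's only order was refunded
""")

survivors = [r[0] for r in con.execute("""
    SELECT u.user_id
    FROM users u
    LEFT JOIN orders  o ON u.user_id = o.user_id
    LEFT JOIN refunds r ON o.order_id = r.order_id
    WHERE r.refund_id IS NULL
    GROUP BY u.user_id
    ORDER BY u.user_id
""")]

# User 1 (zero orders) survives the IS NULL filter; user 2 vanishes because
# their only order row carried a refund and was eliminated before GROUP BY.
print(survivors)  # → [1, 3]
```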

The Bug

Class: Fan-out + LEFT JOIN nullification (two bugs, stacked)

Location: the second LEFT JOIN (`LEFT JOIN refunds`) and the `AND r.refund_id IS NULL` predicate in the WHERE clause

Why it produces wrong results:

  • Bug 1 (fan-out): One order with multiple refund rows produces multiple joined rows, inflating COUNT and SUM.
  • Bug 2 (nullification): WHERE r.refund_id IS NULL is applied AFTER the join, so any user whose every order was refunded has all their rows eliminated and disappears from the result.
| Symptom | Likely cause | Contribution to 85k missing rows |
|---|---|---|
| Users with all orders refunded dropped | WHERE on right side of LEFT JOIN | ~60-70k (estimated) |
| Fan-out on multi-refund orders | Second LEFT JOIN without aggregation | Inflates LTV, doesn't drop rows |
| Users with no 2024 signup | Correct filter behavior | 0 |

Corrected Query

```sql
SELECT 
  u.user_id,
  u.signup_date,
  COUNT(DISTINCT o.order_id) AS order_count,  -- DISTINCT guards against fan-out
  COALESCE(SUM(o.total_amount) FILTER (
    WHERE o.order_id IS NOT NULL
      AND refunded.is_refunded IS NOT TRUE  -- IS NOT TRUE, not NOT: never-refunded orders get a NULL is_refunded from the empty aggregate
  ), 0) AS lifetime_value
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
LEFT JOIN LATERAL (  -- aggregate refunds to one row per order BEFORE joining
  SELECT BOOL_OR(TRUE) AS is_refunded
  FROM refunds r
  WHERE r.order_id = o.order_id
) refunded ON TRUE
WHERE u.signup_date >= '2024-01-01'
GROUP BY u.user_id, u.signup_date;
```

The key change: collapse refunds to one row per order BEFORE joining (via the LATERAL subquery), then exclude refunded orders from the SUM using a FILTER clause rather than a WHERE. This preserves users all of whose orders were refunded (they'll show lifetime_value = 0, which is what you want).
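The same logic can be checked end-to-end on toy data. sqlite (used here via Python for a quick check) has no LATERAL or BOOL_OR, so this sketch swaps in a correlated EXISTS plus CASE; the table rows are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users   (user_id INTEGER PRIMARY KEY);
CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, user_id INTEGER, total_amount REAL);
CREATE TABLE refunds (refund_id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO users   VALUES (1), (2), (3);          -- 1: no orders, 2: refunded order, 3: normal order
INSERT INTO orders  VALUES (10, 2, 40.0), (11, 3, 25.0);
INSERT INTO refunds VALUES (100, 10), (101, 10);   -- two refund rows on one order (fan-out bait)
""")

rows = con.execute("""
    SELECT u.user_id,
           COUNT(DISTINCT o.order_id) AS order_count,
           COALESCE(SUM(CASE WHEN o.order_id IS NOT NULL
                              AND NOT EXISTS (SELECT 1 FROM refunds r
                                              WHERE r.order_id = o.order_id)
                             THEN o.total_amount END), 0) AS lifetime_value
    FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id   -- refunds never joined, so no fan-out
    GROUP BY u.user_id
    ORDER BY u.user_id
""").fetchall()

# All three users survive; the refunded-only user shows lifetime_value 0.
print(rows)  # → [(1, 0, 0), (2, 1, 0), (3, 1, 25.0)]
```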

Verification Step

Run this before and after:

```sql
SELECT COUNT(*) FROM users WHERE signup_date >= '2024-01-01';
-- Should match your result row count exactly. If off by even 1, there's still a bug.
```

Also spot-check one user known to have 2+ refunds on a single order — their order_count should be unchanged from the raw orders table, not inflated.

The Heuristic

> If a WHERE clause references a column from the right side of a LEFT JOIN, it's probably an accidental INNER JOIN (the deliberate `IS NULL` anti-join is the one exception, and even then, check that you're anti-joining at the right grain). Move the predicate into the ON clause, or aggregate the right-side table into one-row-per-key before joining.
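A quick way to internalize the heuristic: the same predicate behaves completely differently in WHERE versus ON. A toy sqlite check (names invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users  (user_id INTEGER PRIMARY KEY);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
INSERT INTO users  VALUES (1), (2);           -- user 2 has no orders
INSERT INTO orders VALUES (10, 1, 'paid');
""")

# Predicate in WHERE: user 2's NULL status fails the test -> accidental INNER JOIN.
where_rows = con.execute("""
    SELECT u.user_id FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id
    WHERE o.status = 'paid'
""").fetchall()

# Same predicate in ON: user 2 is kept, with NULL order columns.
on_rows = con.execute("""
    SELECT u.user_id FROM users u
    LEFT JOIN orders o ON u.user_id = o.user_id AND o.status = 'paid'
""").fetchall()

print(len(where_rows), len(on_rows))  # → 1 2
```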

Other Issues Worth Flagging

  • COUNT(o.order_id) without DISTINCT would double-count even without the refund join if you ever add another one-to-many table. Using COUNT(DISTINCT) is defensive and costs almost nothing at this scale.
  • Consider materializing a user_order_summary view. In our testing, this exact bug pattern recurs roughly 3 out of 5 times in dashboards built on raw transactional tables — a pre-aggregated view eliminates the class entirely.

Key Takeaways

  • The 85k missing rows came from two stacked bugs, not one — fan-out inflated values while NULL-elimination dropped users. Diagnose both independently.
  • WHERE on a right-side LEFT JOIN column is the #1 cause of silent INNER JOINs (we've seen this in maybe 40% of 'my LEFT JOIN isn't working' tickets).
  • Collapse one-to-many tables to one-row-per-key before joining whenever you're aggregating — LATERAL subqueries or pre-aggregated CTEs are your friends.
  • Always verify row count against the driving table after the fix. A one-line COUNT check catches 90% of regressions.
  • The heuristic compounds. After catching this pattern 3-4 times, you'll spot it in code review before the query even runs.

Common use cases

  • A dashboard metric jumped 40% overnight and you suspect a fan-out from a new join
  • Your query returns fewer rows than expected and you think NULLs are being silently filtered
  • An aggregation (SUM, COUNT DISTINCT) returns a suspicious number but you can't pinpoint why
  • A window function is partitioning or ordering on the wrong column and you can't see it
  • CTEs that work individually but produce wrong results when chained
  • A LEFT JOIN that's secretly behaving like an INNER JOIN because of a WHERE clause
  • Timezone or date-truncation bugs in time-series queries

Best AI model for this

Claude Sonnet 4.5 or GPT-5. Claude is noticeably better at reasoning about join semantics and NULL propagation; GPT-5 is faster for simple filter bugs. Avoid smaller/cheaper models — they hallucinate fixes that compile but change the meaning.

Pro tips

  • Always paste the schema (or at least column names + PK/FK relationships). Without it, the model guesses cardinality and guesses wrong ~30% of the time in our testing.
  • Describe what you EXPECTED the query to return in one sentence, not just 'it's wrong'. 'I expected ~2,000 users but got 5,400' beats 'the count is off'.
  • Include a sample of the actual output (first 5 rows) if you can. Pattern-matching on the output often reveals the bug before the logic analysis does.
  • If the query has CTEs, ask for the diagnosis per-CTE. Bugs usually live in the first CTE where row count stops matching your mental model.
  • Save the one-line heuristics at the bottom of each run into a personal 'SQL bug taxonomy' doc. After ~20 queries you'll have caught most bug classes you'll ever encounter.
  • Don't paste proprietary data values — column names and schema are enough. The model doesn't need to see customer emails to diagnose a join.

Customization tips

  • If your query involves window functions, add one line to the input: 'Pay special attention to the window frame — I'm not sure if I want RANGE or ROWS.' This triggers a deeper look at frame semantics, which is where 70% of window bugs live.
  • For BigQuery specifically, mention if you're using `SELECT *` with STRUCT columns or arrays — there are dialect-specific gotchas around UNNEST that the generic version won't catch. Use the Dialect-Specific variant.
  • If you're debugging a query you inherited (not one you wrote), add 'I didn't write this — I don't know the original author's intent.' The prompt will then treat the query's CURRENT output as potentially the intended behavior and ask you to clarify before 'fixing' it.
  • When the output says 'the schema isn't clear enough to diagnose', take it seriously. Don't push for an answer — paste more schema. Guessed diagnoses on incomplete schemas are the #1 way this prompt fails in our testing.
  • Build a personal `sql-bug-heuristics.md` file. After 15-20 runs, you'll have a compressed taxonomy of every bug class you personally tend to write — more valuable than any SQL textbook because it's indexed on YOUR mistakes.
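On the RANGE-vs-ROWS point above: a window with ORDER BY defaults to a RANGE frame, which treats rows tied on the ordering column as peers, so the two frame types only diverge when there are duplicates. A small sqlite illustration (toy data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (day INTEGER, amount INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (2, 20), (2, 20), (3, 40)])

rows = con.execute("""
    SELECT day, amount,
           -- Default frame with ORDER BY is RANGE ... CURRENT ROW: rows tied on
           -- `day` are peers, so both day-2 rows see the full day-2 total.
           SUM(amount) OVER (ORDER BY day) AS range_sum,
           -- ROWS counts physical rows: a true running sum, even across ties.
           SUM(amount) OVER (ORDER BY day
                             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum
    FROM t ORDER BY day, amount
""").fetchall()

for day, amount, range_sum, rows_sum in rows:
    print(day, amount, range_sum, rows_sum)
# range_sum is 50 for both day-2 rows (peers share the frame);
# rows_sum splits them into 30 and 50 (a genuine running total).
```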

Variants

Performance Mode

Instead of correctness bugs, focuses on why the query is slow — missing indexes, scan-vs-seek issues, bad CTE materialization, correlated subqueries that should be joins.

Dialect-Specific

Specify PostgreSQL, BigQuery, Snowflake, or DuckDB — the debugger flags dialect-specific gotchas like BigQuery's implicit unnesting or Snowflake's default case-insensitivity.

Teaching Mode

Instead of just fixing the query, walks through the bug Socratically — asks you what you think each clause does before revealing the actual behavior. Best for junior analysts learning.

Frequently asked questions

How do I use the SQL Query Debugger prompt?

Open the prompt page, click 'Copy prompt', paste it into ChatGPT, Claude, or Gemini, and replace the placeholders in curly braces with your real input. The prompt is also launchable directly in each model with one click.

Which AI model works best with SQL Query Debugger?

Claude Sonnet 4.5 or GPT-5. Claude is noticeably better at reasoning about join semantics and NULL propagation; GPT-5 is faster for simple filter bugs. Avoid smaller/cheaper models — they hallucinate fixes that compile but change the meaning.

Can I customize the SQL Query Debugger prompt for my use case?

Yes — every Promptolis Original is designed to be customized. Key levers: always paste the schema (or at least column names + PK/FK relationships), because without it the model guesses cardinality and guesses wrong ~30% of the time in our testing; and describe what you EXPECTED the query to return in one sentence, not just 'it's wrong' ('I expected ~2,000 users but got 5,400' beats 'the count is off').

Explore more Originals

Hand-crafted 2026-grade prompts that actually change how you work.

← All Promptolis Originals