πŸ“ Blog

ChatGPT Images 2.0: 7 Prompts That Look Right But Are Wrong

πŸ—“οΈ Published ⏱️ 7 min πŸ‘€ By Promptolis Editorial

This is the article that should have been written the week ChatGPT Images 2.0 launched. Instead, every AI blog rushed out "Top 30 Amazing Prompts" lists. Most of those prompts produce visually convincing images that fail the second test: the test where the output actually has to be correct.

We tested gpt-image-2 against the seven categories where it most reliably produces "looks right but is wrong" outputs. These aren't gotchas; they're documented failure modes from OpenAI's own launch notes, the OpenAI Developer Community bug threads, and the deception-research analysis from 36kr. If you're using gpt-image-2 for anything client-facing, educational, or commercial, you need to know exactly where to be skeptical.

Each section: the prompt category, why it fails, what the failure costs you, and the right alternative tool.

If you want the foundational review first: ChatGPT Images 2.0 Honest Guide. This article goes deeper on the failure modes that guide briefly mentions.

---

1. Technical Diagrams and Cross-Sections

```
Technical cross-section diagram of a car engine showing all parts
labeled — pistons, crankshaft, valves, intake manifold, exhaust manifold.
```

What you get: A visually convincing technical illustration. The aesthetic is correct. The labels are readable. The parts look mechanically plausible.

Why it's wrong: OpenAI's launch documentation explicitly states "physical reasoning remains weak." Specific failure modes we've documented:

  • Labels frequently point to the wrong part
  • Mechanical relationships shown don't function physically (gears that don't actually mesh, valves positioned where they couldn't open)
  • Critical parts are sometimes omitted entirely
  • Cross-section geometry can contradict how a real engine looks in section view

What it costs you: If this image goes into a textbook, a YouTube tutorial, or any educational context, you've taught wrong information. Students remember imagery. Wrong imagery is sticky.

What to use instead:

  • For accuracy: licensed technical illustration from sources like Shutterstock's technical category, Adobe Stock, or a commissioned specialist
  • For speed + accuracy: hire an illustrator from Reedsy or Upwork who's done 50+ technical illustrations in your category
  • For professional documentation: use proper CAD software with section-view rendering

When gpt-image-2 is acceptable: purely decorative or impressionistic technical imagery where accuracy doesn't matter (a stylized car for a car-themed birthday card, decorative engine art for a garage).

---

2. Anything Requiring Specific Counts

```
Generate an aerial photograph of a marathon with exactly 47 runners
crossing the finish line.
```

What you get: An aerial photograph that looks correct, with somewhere between 31 and 52 runners depending on the generation.

Why it's wrong: The OpenAI Developer Community documented gpt-image-2 generating a Boston Marathon visual claiming "127 years of tradition" when the correct number is 129. Asked to recount its own generated content, it claimed "41 people" for an image that actually contained 35.

The model cannot reliably:

  • Generate a specific count of objects
  • Count its own generated content accurately
  • Maintain numerical accuracy in stat-based imagery

What it costs you: In sports content, infographics, inventory visualizations, and any analytical imagery, every numerical claim becomes suspect. If the count of runners in your marathon image is wrong, why should anyone trust the data label "127 years of tradition" you'll inevitably composite next to it?

What to use instead:

  • Use qualitative phrasing in your prompts: "a dense crowd of runners" not "47 runners"
  • Generate the visual concept, then composite numbers and labels in Figma using verified data
  • For real data visualization: D3.js, Datawrapper, or Tableau (actual data tools)

When gpt-image-2 is acceptable: when the count doesn't matter at all (background crowd, abstract masses, unspecified quantities).

---

3. Brand Logo Reproduction

```
Generate a Coca-Cola advertising mockup showing a product on a
vintage diner counter, with the Coca-Cola logo clearly visible.
```

What you get: A vintage diner scene with a logo that looks 70% right. The "C" curves are nearly correct. The script font is an approximation. The red is close. The kerning drifts.

Why it's wrong: OpenAI documents this directly: "the model still struggles to reproduce specific logos with pixel accuracy." Beyond the aesthetic problem, AI-generated reproductions of registered trademarks have unresolved IP implications: you can't reliably claim fair use for an AI-generated approximation of someone else's brand mark.

What it costs you: For client work involving any real brand (yours or someone else's), you have legal exposure. For your own brand, AI logo approximations dilute brand identity over time.

What to use instead:

  • Generate the scene and lifestyle context in gpt-image-2
  • Specify whitespace where the logo will appear: "leave clean space in lower-right for logo composite"
  • Composite the actual logo SVG in Figma or Photoshop
  • For your own brand: use your real licensed brand assets, never AI approximations
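If you want to script that composite step instead of doing it in Photoshop, here is a minimal Pillow sketch, with stand-in images in place of your actual scene render and rasterized logo SVG (the placeholder rectangle is not a real mark):

```python
from PIL import Image

def place_logo(scene: Image.Image, logo: Image.Image, margin: int = 32) -> Image.Image:
    """Alpha-composite the real brand asset into the reserved lower-right whitespace."""
    out = scene.convert("RGBA")
    mark = logo.convert("RGBA")
    x = out.width - mark.width - margin
    y = out.height - mark.height - margin
    out.alpha_composite(mark, (x, y))  # in-place paste that respects transparency
    return out

# Stand-ins; in practice, Image.open(...) the gpt-image-2 scene and your logo asset
scene = Image.new("RGB", (800, 600), (230, 220, 200))
logo = Image.new("RGBA", (160, 60), (200, 16, 46, 255))
final = place_logo(scene, logo)
```

Because the mark is pasted, not generated, the kerning, curves, and color are exactly those of your licensed asset.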

When gpt-image-2 is acceptable: stylized illustrations or concept art where logos are deliberately impressionistic and not meant to represent real brands.

---

4. Anything Resembling Authentic Documentation

```
Generate a screenshot of a tweet from a verified user with profile photo,
verification badge, like and retweet counts, and timestamp.
```

What you get: A pixel-convincing tweet screenshot, complete with apparent verification badge.

Why it's wrong, beyond the obvious ethical problem: This is the deception capability that the Chinese tech publication 36kr called out in their April 2026 analysis "Caution: Avoid Being Deceived by ChatGPT Images 2.0." Their finding: gpt-image-2 produces near-perfect fakes of social media screenshots, academic articles with valid-looking DOI numbers, official documents with seals, medical prescriptions, and handwritten homework.

What it costs you:

  • Reputational damage if a "for example" tweet image circulates outside your context
  • Platform bans (Twitter, LinkedIn, and others are actively flagging AI-generated platform imagery as deceptive)
  • Legal exposure under emerging AI-disclosure regulations
  • Erosion of public trust in actual screenshots and documentation

What to use instead:

  • For tutorials about real platforms: take real screenshots of your own real account
  • For "examples" of social posts: mock them up in Figma with obvious styling that signals "this is an illustration, not a real post"
  • For "what an academic paper looks like" educational content: use a real (cited) academic paper screenshot, not a generated one
  • For any documentation imagery: photograph or scan real documentation, then redact identifying information

When gpt-image-2 is acceptable: never for content that could be mistaken for authentic. Even with disclaimers, the imagery itself has a lifespan beyond your context.

---

5. Origami, Knots, Folds, and Physical Reasoning

```
Generate a tutorial image showing the step-by-step folding of an
origami crane with each step clearly visible.
```

What you get: Visually convincing fold steps that don't physically work. If you tried to follow the fold pattern shown in the image, you'd reach an impossible state by step 3.

Why it's wrong: Origami is a documented gpt-image-2 failure mode. The model produces fold patterns that look correct but violate the geometry of paper folding. Same for knots (knots that don't actually hold), gears that don't mesh, joints that don't articulate.

What it costs you: Tutorial credibility. If you publish an origami tutorial with AI-generated fold steps and someone tries to follow them, they'll fail and learn that your content can't be trusted.

What to use instead:

  • Real photographs of real folded paper at each step
  • Established origami diagram conventions (the Yoshizawa-Randlett system) drawn manually or in vector software
  • Stock origami illustration libraries

When gpt-image-2 is acceptable: decorative imagery suggesting "origami aesthetic" where no one is expected to follow the fold pattern.

---

6. Reflections, Mirrors, and Optical Effects

```
Generate a still life of a glass perfume bottle on a marble surface
with accurate reflection of the bottle and surrounding scene visible.
```

What you get: A beautiful still life. Reflections that are approximately correct in placement but optically wrong. The reflection in the bottle's glass is independent of what's actually in front of the bottle. The surface reflection on marble doesn't match the bottle's geometry.

Why it's wrong: Reflections require optical reasoning, the same family of physical reasoning that gpt-image-2 fails at. Outputs look beautiful at first glance but fail any careful inspection.

What it costs you: For professional product photography, this is the difference between "AI-generated, you can tell" and "real photograph." Sophisticated viewers spot the optical errors immediately. For luxury product imagery, this matters.

What to use instead:

  • Real product photography for hero shots
  • Use gpt-image-2 for the surrounding scene/context, then composite a real product photo into it
  • For pure illustration where realism doesn't matter: stylize aggressively so reflections aren't expected

When gpt-image-2 is acceptable: stylized illustration where the reflections are deliberately impressionistic, mood-piece imagery where viewers don't inspect details.

---

7. Iterative Refinement Past the Second Pass

```
[Generation 1] Generate a hero image of a workspace with morning light.
[Generation 2] Make it warmer.
[Generation 3] Move the laptop slightly to the left.
[Generation 4] Add more depth-of-field.
[Generation 5] The wood texture is too prominent, soften it.
```

What you get: Generation 1 is great. Generation 2 is fine. Generation 3 is acceptable. Generation 4 starts looking strange. Generation 5 produces visible artifacts and quality degradation.

Why it's wrong: The OpenAI Developer Community thread documents this as the "noise amplification bug": sequential generations within the same session reuse data from prior outputs, amplifying noise patterns. After 3-5 iterations, images visibly degrade.

OpenAI has not officially acknowledged this bug. Plan around it.

What it costs you: That perfect image you were 90% of the way to gets ruined when you push for the final 10%. The temptation to keep iterating is strong; the model rewards you with worse outputs.

What to do instead:

  • Limit iterations to 2 refinements per session
  • After 2 refinements: start a fresh session with a refined prompt incorporating what you learned
  • For complex final adjustments: composite in Figma rather than asking gpt-image-2 to refine further
  • Reload the browser tab between major generation sessions
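The two-refinement budget is easy to enforce mechanically. This is a sketch of the idea, not a real API: `generate` is whatever function actually calls your image model, injected so the budget logic stays independent of it.

```python
from typing import Callable

MAX_REFINEMENTS = 2  # past this point, degradation tends to set in

class ImageSession:
    """Wraps a generation function and refuses refinements past the budget."""

    def __init__(self, generate: Callable[[str], bytes]):
        self.generate = generate
        self.refinements = 0

    def first_pass(self, prompt: str) -> bytes:
        self.refinements = 0
        return self.generate(prompt)

    def refine(self, tweak: str) -> bytes:
        if self.refinements >= MAX_REFINEMENTS:
            raise RuntimeError(
                "Refinement budget spent; start a fresh session with a "
                "rewritten prompt instead of iterating further."
            )
        self.refinements += 1
        return self.generate(tweak)

# Demo with a fake generator; swap in a real model call in practice
session = ImageSession(lambda prompt: b"<image bytes>")
session.first_pass("hero image of a workspace with morning light")
session.refine("make it warmer")
session.refine("add more depth-of-field")
# A third session.refine(...) would raise instead of degrading the image
```

Calling `first_pass` again with a rewritten prompt resets the budget, which mirrors the fresh-session advice above.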

When this rule doesn't apply: if your first generation is a no-go (wrong concept, wrong style), abandon and start fresh rather than iterating. Iteration is for fine-tuning, not concept correction.

---

How to Use gpt-image-2 Without Falling Into These Traps

Five rules that emerge from these seven failure modes:

  • AI generates the scene; you control the brand-critical elements. Never trust gpt-image-2 with logos, exact text, or specific brand mark reproduction. Composite those in Figma using your licensed assets.
  • AI generates qualitative; you fact-check quantitative. Never trust gpt-image-2 with specific counts, statistics, or numerical claims. Use qualitative prompting and composite verified numbers separately.
  • AI generates impressionistic; specialists generate technical. Never trust gpt-image-2 with technical accuracy (engine diagrams, optics, physics, anatomy). Use specialist tools or hire specialists.
  • AI generates obviously stylized; reality requires real source material. Never trust gpt-image-2 with imagery that could be mistaken for authentic documentation. Stylize aggressively or use real source material with proper disclosure.
  • AI generates fresh; refinement happens elsewhere. Limit iterations to 2 per session. For polish past that point, switch to Figma or start a fresh session.

Apply these five rules and gpt-image-2 becomes one of the most useful production tools in 2026. Ignore them and you'll ship work that looks right and is quietly wrong, the worst category of professional output.

---

Get the Full Prompt Pack

ChatGPT Images 2.0 Prompts Pack: 30 prompts explicitly built around these constraints. Each prompt documents what the model does well and where it fails. MIT-licensed, free.

For the foundational review: ChatGPT Images 2.0 Honest Guide.

For when you need a different tool entirely: ChatGPT Images 2.0 vs Midjourney v7 — When Each Wins.

— Atilla

Tags

AI Image Generation · ChatGPT Images 2.0 · Failure Modes · Production Quality · AI Limitations

📬 Promptolis Newsletter

One research-backed AI prompt per week. Free. Unsubscribe anytime.

No spam. No sales funnels. Just good prompts. · Or subscribe directly on Beehiiv →
