# Model selection — full decision tree

The two-line rule: **default to Gemini Flash; promote to Pro for brand-critical / large / multi-reference / non-Latin-text work; reach for OpenAI gpt-image-1.5 for transparent-bg PNGs and 5-reference identity-preservation edits.**

This file is the long version, by asset type.
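
As a rough illustration, the two-line rule can be written as a small function. This is a minimal sketch, not part of any SDK: the `Job` fields, the `pick_model` name, and the returned labels are all hypothetical, and checking the gpt-image-1.5 cases first is an assumption, since the rule itself does not state a precedence.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical description of one image request (field names are illustrative)."""
    brand_critical: bool = False
    large_output: bool = False              # e.g. a 4K final deliverable
    multi_reference: bool = False           # style guide or several reference images
    non_latin_text: bool = False
    transparent_background: bool = False
    identity_preserving_edit: bool = False  # up to 5 references kept at high fidelity


def pick_model(job: Job) -> str:
    """Apply the two-line rule. Returned labels are shorthand, not API model IDs."""
    # Assumption: the gpt-image-1.5 cases take precedence over the Pro cases.
    if job.transparent_background or job.identity_preserving_edit:
        return "gpt-image-1.5"
    if (job.brand_critical or job.large_output
            or job.multi_reference or job.non_latin_text):
        return "gemini-pro"
    # Default for everything else.
    return "gemini-flash"
```

For example, `pick_model(Job(brand_critical=True))` yields the Pro label, while a job with no flags set falls through to Flash.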

## Comparison matrix

| Capability | Gemini Flash 3.1 | Gemini Pro 3 | OpenAI gpt-image-1.5 |
|---|---|---|---|
| **Max resolution** | 4K (4096px) | 4K (4096px) | 1536px on long edge |
| **Default resolution** | 1K | 1K | 1024² |
| **Aspect ratios** | 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9, plus extreme (1:4, 4:1, 1:8, 8:1) | Same minus extreme | 1:1 (1024²), 2:3 (1024×1536), 3:2 (1536×1024) only |
| **Reference images** | Up to 14 | Up to 14 | Up to 16 inputs; first 5 preserved at high fidelity |
| **Native transparent bg** | ❌ generate on white, post-process | ❌ generate on white, post-process | ✅ `background: "transparent"` |
| **Text rendering — English** | Very good | Best in class (~94% char accuracy) | Very good |
| **Text rendering — Hebrew/Arabic/CJK** | Flaky, RTL shaping unreliable | Better but still unreliable for RTL | Decent for Latin-script and English; weak for non-Latin |
| **Multi-turn / chat editing** | ✅ native, with thought-signature passing | ✅ native | ❌ stateless; use `/edits` endpoint with previous image |
| **Identity preservation on edits** | Good | Up to 5 character refs | Excellent: `input_fidelity=high` preserves first 5 inputs |
| **Thinking / reasoning before drawing** | Optional, `thinkingLevel: minimal\|high` | On by default, not user-tunable | N/A |
| **Output format** | PNG only (base64) | PNG only (base64) | PNG / JPEG / WebP, transparent supported on PNG/WebP |
| **Cost / image (typical)** | $0.07 (1K) / $0.10 (2K) / $0.15 (4K) | $0.13 (1K-2K) / $0.24 (4K) | $0.009 low / $0.034 medium / $0.133 high (1024²) |
| **Latency** | ~2-4s | ~6-10s (with thinking) | ~10-30s, "complex prompts may take up to 2 minutes" |
| **SynthID watermark** | Always on, invisible | Always on, invisible | Not applied |
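
The aspect-ratio row is the one most likely to rule a model out before any other consideration. A minimal lookup sketch, transcribing that row of the matrix; the dictionary keys and the `models_supporting` helper are hypothetical names, not API identifiers:

```python
# Ratios copied from the "Aspect ratios" row of the matrix above.
COMMON = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
EXTREME = {"1:4", "4:1", "1:8", "8:1"}

ASPECT_RATIOS = {
    "gemini-flash": COMMON | EXTREME,      # full list plus the extreme ratios
    "gemini-pro": COMMON,                  # "same minus extreme"
    "gpt-image-1.5": {"1:1", "2:3", "3:2"},
}


def models_supporting(ratio: str) -> list[str]:
    """Return the model labels whose matrix entry lists the given ratio."""
    return sorted(m for m, ratios in ASPECT_RATIOS.items() if ratio in ratios)
```

Calling `models_supporting("8:1")` leaves only the Flash label, which matches the extreme-ratio note in the matrix.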

## Decision by asset type

### Brand logo / mark / wordmark

| Stage | Model | Why |
|---|---|---|
| **Exploration** (10-30 variants) | Gemini Flash 1K | Cheap, fast, produces enough variation per call |
| **Final deliverable, English text only** | Gemini Pro 2K or 4K | Best text rendering, highest quality |
| **Final deliverable, Hebrew/Arabic text** | Gemini Pro 4K — generate **text-free mark only**, composite the wordmark in vector after | See [hebrew-rtl.md](hebrew-rtl.md) |
| **Need transparent PNG immediately** | gpt-image-1.5 high, `background=transparent` | Skips background-removal step |

For brand-consistent **variants** (5 different logos in the same style), prefer Gemini Pro and pass the brand style guide as references. Pro accepts up to 14 reference images and does role-assignment well.

### Icon set (multiple icons sharing a style)

| Stage | Model | Why |
|---|---|---|
| **Single icon, English style** | Gemini Flash 1K, square | Cheap, repeatable |
| **Coherent set of 4-12 icons** | Two strategies: (a) Gemini Flash, generate as a single 3×4 grid in one call; (b) Gemini Pro multi-turn — generate first icon, then for each next icon attach the first as reference and chain | Strategy (a) gives best within-set consistency. Strategy (b) gives clean isolated icons but needs the explicit "match the style of the attached reference" clause every turn |
| **Need transparent PNGs without post-processing** | gpt-image-1.5 high, `background=transparent` | Same as logo case |

Icons under 24px need post-vectorization to look crisp regardless of model.
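
Strategy (a) returns one sheet that still has to be cut into individual icons. The sketch below only computes the crop rectangles, assuming the model produced an evenly spaced grid of 3 rows by 4 columns with no gutters (real output may need manual adjustment). The `grid_boxes` name is hypothetical; the `(left, upper, right, lower)` tuple order is chosen so the boxes can be passed to an image library's crop call, such as Pillow's `Image.crop`.

```python
def grid_boxes(width: int, height: int, rows: int, cols: int) -> list[tuple[int, int, int, int]]:
    """Return (left, upper, right, lower) boxes for each cell, row by row."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Integer division on the scaled index keeps cells contiguous
            # even when the image size is not an exact multiple of the grid.
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes
```

For a 1024×1024 sheet, `grid_boxes(1024, 1024, 3, 4)` gives 12 boxes, one per icon.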

### Mobile UI mockup

| Stage | Model | Why |
|---|---|---|
| **Wireframe / lo-fi** | Gemini Flash 1K, 9:16 | Cheap iteration |
| **Hi-fi UI with realistic content** | Gemini Pro 2K, 9:16 | Pro's text rendering keeps button labels and headlines legible at small sizes |
| **iPhone or device-framed mockup** | Gemini Pro 2K, 9:16 (or 16:9 for landscape device shot) | Pro handles the device-frame composition and small UI text well |
| **Hebrew/RTL UI** | Gemini Pro 2K — generate the chrome/layout, composite Hebrew text in post | See [hebrew-rtl.md](hebrew-rtl.md) |
| **Need transparent UI elements** | gpt-image-1.5 high | Rare for full mockups; common for individual UI elements |

### Web dashboard / SaaS UI

| Stage | Model | Why |
|---|---|---|
| **Quick concept** | Gemini Flash 2K, 16:9 | Cheap |
| **Hi-fi production-grade** | Gemini Pro 4K, 16:9 | 4K is needed for retina rendering at full-screen sizes; thinking mode helps with multi-region layouts (sidebar + header + cards + chart + table) |
| **Hebrew/RTL dashboard** | Gemini Pro 4K, generate UI without Hebrew text, composite in post | See [hebrew-rtl.md](hebrew-rtl.md) |

Dashboards are where Pro's thinking mode and text-rendering advantage matter most. Don't use Flash for hi-fi dashboards — small chart labels and table cells will smear.

### Marketing hero image / banner

| Stage | Model | Why |
|---|---|---|
| **Concept exploration** | Gemini Flash 2K | Cheap; the model's narrative-prompt strength suits hero composition |
| **Final hero, no headline text** | Gemini Pro 4K at the target aspect ratio (16:9, or 21:9 for ultra-wide; the extreme 4:1 and 8:1 banner ratios are Flash-only) | Quality bar is high for hero work |
| **Final hero with headline text** | Gemini Pro 4K — generate background, composite headline in design tool | Even Pro is unreliable for marketing-grade headline typography |
| **Hero with people / faces** | Gemini Pro (up to 5 character references) | Pro keeps faces consistent across compositions |

### Product photography / hero shot

| Stage | Model | Why |
|---|---|---|
| **Lighting / angle exploration** | Gemini Flash 2K | Iterate camera angles cheaply |
| **Final hero shot** | Gemini Pro 4K | Photographic quality at scale |
| **Product on transparent bg (catalog cutout)** | gpt-image-1.5 high, `background=transparent` | One-step cutout |
| **Identity-preserving virtual try-on / model-with-product** | gpt-image-1.5 with `input_fidelity=high` and ≤5 references | This is gpt-image-1.5's signature workflow; see the cookbook virtual try-on prompt in [openai-gpt-image-1-5.md](openai-gpt-image-1-5.md) §9 |

### Infographic / diagram

| Stage | Model | Why |
|---|---|---|
| **All cases** | Gemini Pro 4K, with `tools: [{google_search: {}}]` for fact-checking | Pro's thinking mode, best-in-class text rendering, and grounding make it the only viable choice. Flash will smear small labels; gpt-image-1.5 struggles with the multi-region layout |

### Illustration / editorial / stylized work

| Stage | Model | Why |
|---|---|---|
| **All cases** | Gemini Pro for final, Flash for exploration | Both Gemini models have strong stylistic range. gpt-image-1.5 tends toward photographic realism even when asked for illustration |

## When OpenAI gpt-image-1.5 wins outright

In these cases gpt-image-1.5, used as the cookbook playbook describes, beats both Gemini models:

1. **Native transparent PNG** in one shot — `background: "transparent"` on PNG/WebP output. Saves a background-removal step.
2. **Identity-preserving virtual try-on or sketch-to-render edits** — `input_fidelity=high` preserves the first 5 inputs at high fidelity. The cookbook's clothing-swap example is the gold standard.
3. **Mask-based inpainting** — pass an alpha-channel mask for precise edit regions. Gemini doesn't expose explicit masks.
4. **Cost-sensitive English-text production** at low/medium quality — $0.009-0.034 per image is hard to beat.
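
Item 1 carries a constraint worth enforcing before a request is sent: transparency is only available on PNG and WebP output. A minimal sketch of that check, which builds a plain dictionary and calls no API. The `background` value comes from this file; the `quality` and `output_format` field names and the `build_gpt_image_request` helper are assumptions for illustration and should be checked against the current OpenAI image API reference.

```python
def build_gpt_image_request(prompt: str, *, quality: str = "high",
                            output_format: str = "png",
                            transparent: bool = False) -> dict:
    """Assemble request fields for a gpt-image-1.5 call (hypothetical helper)."""
    # Per the matrix, transparency is supported on PNG/WebP but not JPEG.
    if transparent and output_format not in {"png", "webp"}:
        raise ValueError("transparent background requires png or webp output")
    request = {
        "model": "gpt-image-1.5",
        "prompt": prompt,
        "quality": quality,              # assumed field name
        "output_format": output_format,  # assumed field name
    }
    if transparent:
        request["background"] = "transparent"
    return request
```

Failing early here avoids paying for a generation whose output format cannot carry an alpha channel.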

## When Gemini Pro wins outright

1. **Anything requiring 4K final output.**
2. **Text-heavy images** (posters, infographics, packaging with multiple labels).
3. **Brand work using a style guide of 5+ reference images.**
4. **Multi-turn iterative refinement** where you keep editing the same image with "change X, keep Y."
5. **Scenes requiring world knowledge or factual accuracy** (historical scenes, scientific diagrams) — Pro's thinking mode + optional Google Search grounding.
6. **Multi-language text** that's not Hebrew/Arabic.

## When Gemini Flash wins outright

1. **High-volume exploration**: 30 logo variants at 1K cost $2.10 on Flash (30 × $0.07) versus $3.90 on Pro (30 × $0.13).
2. **Real-time / latency-sensitive** UX where ~3s response matters.
3. **0.5K thumbnail / preview** generation (Pro doesn't support 0.5K).
4. **Extreme aspect ratios** like 1:8 or 8:1 banners (Pro doesn't support these).
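
The batch arithmetic behind item 1 can be checked against the cost row of the comparison matrix. A minimal sketch, assuming the per-image prices listed there are current; the `PRICE_PER_IMAGE` table and `batch_cost` helper are hypothetical names, and `Decimal` is used so the totals stay exact to the cent.

```python
from decimal import Decimal

# Per-image prices transcribed from the "Cost / image" row of the matrix.
PRICE_PER_IMAGE = {
    ("gemini-flash", "1K"): Decimal("0.07"),
    ("gemini-flash", "2K"): Decimal("0.10"),
    ("gemini-flash", "4K"): Decimal("0.15"),
    ("gemini-pro", "1K"): Decimal("0.13"),
    ("gemini-pro", "2K"): Decimal("0.13"),
    ("gemini-pro", "4K"): Decimal("0.24"),
}


def batch_cost(model: str, size: str, count: int) -> Decimal:
    """Total cost in USD for `count` images at the given model and size."""
    return PRICE_PER_IMAGE[(model, size)] * count


flash_total = batch_cost("gemini-flash", "1K", 30)
pro_total = batch_cost("gemini-pro", "1K", 30)
print(f"Flash: ${flash_total}, Pro: ${pro_total}, saved: ${pro_total - flash_total}")
```

The Flash total reproduces the $2.10 figure from item 1, and the same call with the Pro key gives the corresponding Pro total for comparison.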

## Tiebreakers

If you're 50/50 between Gemini Flash and Pro: **pick Flash for the first 3 calls, then promote to Pro for the final.** This is the cheapest correct workflow.

If you're 50/50 between Gemini Pro and gpt-image-1.5: **pick Gemini Pro unless you specifically need transparent backgrounds or 5-reference identity preservation.**