# image-generation — a Claude Code skill

A Claude Code skill for generating logos, icons, UI mockups, hero images, and product shots using **OpenAI gpt-image-1.5** and **Google Gemini Nano Banana 2 / Pro** (gemini-3.1-flash-image-preview / gemini-3-pro-image-preview).

The skill packages:

- **Model selection logic** — when to reach for Flash vs Pro vs gpt-image-1.5, by asset type and brief.
- **Provider-specific prompt-engineering references** — the two providers have *opposite* prompt structures (OpenAI wants labeled segments + negative phrasing; Gemini wants narrative paragraphs + positive phrasing only). Mixing them up degrades outputs badly.
- **Asset templates** — fill-in-the-blank prompt scaffolds for logos, icon sets, mobile UI, dashboards, hero images, product shots.
- **Hebrew/RTL guidance** — text rendering in non-Latin scripts is a known weak spot; this skill documents the workaround (generate text-free, composite in post).
- **Bundled scripts** — `scripts/openai-image.sh` and `scripts/gemini-image.sh` wrap each provider's API, handling reference images (`--ref`), aspect ratio, size, and model-specific quirks.
- **Iteration discipline** — a self-critique loop where Claude reads the saved image with its multimodal vision, scores against the brief, and decides ship / edit / rewrite before showing the user.

## Install

Drop the directory at `~/.claude/skills/image-generation/`:

```bash
git clone https://github.com/shaharsha/claude-skill-image-generation.git ~/.claude/skills/image-generation
```

Then store your API keys wherever you normally keep secrets, and export them before invoking the scripts:

```bash
export OPENAI_IMAGE_API_KEY='sk-proj-...'
export GEMINI_IMAGE_API_KEY='AQ.Ab8RN...'
```
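A quick way to confirm that both keys are actually exported, and not just set in the current shell, is a small check like the sketch below. `require_keys` is a hypothetical helper and is not part of the skill; it relies on `printenv`, which only sees exported variables.

```bash
# Hypothetical helper (not shipped with the skill): succeed only if every
# named variable is exported and non-empty.
require_keys() {
  status=0
  for name in "$@"; do
    # printenv prints nothing for variables that are unset or not exported.
    if [ -z "$(printenv "$name")" ]; then
      echo "not exported or empty: $name" >&2
      status=1
    fi
  done
  return "$status"
}

require_keys OPENAI_IMAGE_API_KEY GEMINI_IMAGE_API_KEY \
  || echo "export both keys before invoking the scripts" >&2
```

The check reports every missing key in one pass instead of stopping at the first one.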

The skill's [SKILL.md](SKILL.md) references a per-user file at `~/.claude/projects/-Users-shaharshavit/memory/api-keys.md` for key storage — adjust that path to match your own setup.

## Entry point

Claude Code loads [SKILL.md](SKILL.md) when the skill is invoked. Start there to see the full workflow.

## Layout

```
SKILL.md                          ← agent entry point
README.md                         ← you are here
examples.md                       ← worked end-to-end examples

reference/
  model-selection.md              ← when to use which model, by asset type
  openai-gpt-image-1-5.md         ← OpenAI prompt grammar + API quirks
  gemini-image.md                 ← Gemini prompt grammar + API quirks
  pricing.md                      ← per-image cost tables for budget tracking
  hebrew-rtl.md                   ← Hebrew/Arabic text-in-image workflow

templates/
  logo.md
  icon-set.md
  ui-mobile.md
  ui-dashboard.md
  hero-image.md
  product-shot.md

scripts/
  README.md
  openai-image.sh                 ← POST /v1/images/generations & /edits
  gemini-image.sh                 ← POST /v1beta/models/<model>:generateContent
```
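For orientation, the request paths listed for the two scripts can be expanded into full URLs as in the sketch below. The helper names are hypothetical, and the hosts (`api.openai.com` and `generativelanguage.googleapis.com`) are assumptions based on each provider's public API; check the scripts themselves for what they actually call.

```bash
# Hypothetical helpers for illustration; the hosts are assumed, not taken
# from the bundled scripts.

# $1 is the operation: "generations" or "edits".
openai_endpoint() {
  printf 'https://api.openai.com/v1/images/%s\n' "$1"
}

# $1 is the model id, e.g. gemini-3-pro-image-preview.
gemini_endpoint() {
  printf 'https://generativelanguage.googleapis.com/v1beta/models/%s:generateContent\n' "$1"
}

openai_endpoint generations
gemini_endpoint gemini-3.1-flash-image-preview
```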

## Notes

- This is a personal skill for use inside Claude Code. The bash scripts assume `bash`, `curl`, `jq`, `base64`, and `file` are on PATH.
- Both API keys must belong to paid-tier accounts; `gpt-image-1.5` additionally requires OpenAI organization verification.
- Generated outputs default to `./generated-images/` in the current working directory.
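
To check the tool assumptions from the first note before running anything, a minimal sketch along these lines may help. `missing_tools` is a hypothetical helper and is not part of the skill.

```bash
# Hypothetical helper (not shipped with the skill): print the names of any
# listed tools that are not found on PATH, separated by spaces.
missing_tools() {
  _missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || _missing="${_missing:+$_missing }$tool"
  done
  printf '%s\n' "$_missing"
}

missing="$(missing_tools bash curl jq base64 file)"
if [ -n "$missing" ]; then
  echo "missing tools: $missing" >&2
else
  echo "all required tools found"
fi
```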