# Roadmap

[English](./roadmap.md) | [中文](./zh/roadmap.md)

---

> PPT Master is a solo-maintained open source project, driven by **priority rather than fixed timelines**. This roadmap is here to align expectations: what's already shipped, what's under ongoing maintenance and evolution, and what's intentionally out of scope. Priorities shift with user feedback and real usage signals — no committed delivery windows.
>
> **Where we are**: AI generates SVG from scratch → converts to DrawingML for natively editable PPTX. The core axis is **pixel-fidelity across four renderers** (PowerPoint / Keynote / LibreOffice / WPS) **+ real native shapes**. Every direction below serves that axis.

---

## Recent capability evolution

The past two months' structural capability growth. Single flags / incremental polish go to the commit log.

### 2026-03 — Native PPTX route takes shape

- **Direct export to natively editable PPTX** — `svg_to_pptx` adds glow / rotate / text-decoration / stroke-linejoin; the full SVG → DrawingML chain becomes usable
- Chart / layout template JSON indexes ship, AI selection path connected

### 2026-04 — Pipeline at scale

- **Source-less generation**: `topic-research` workflow supports "topic only, no source files"
- **PPTX export step-change**: SVG clipPath → DrawingML picture geometry, marker → native arrows, output consolidated to `exports/`
- **Chart library expands to 70 templates + three icon libraries** (simple-icons / phosphor-duotone / brand-logo)
- **`spec_lock.md` machine-readable contract**: Strategist locks the spec, Executor re-reads it before every page — cross-page consistency gets a real guarantee
- **Per-element animation on by default** + recorded narration / video export ([`workflows/generate-audio.md`](../skills/ppt-master/workflows/generate-audio.md))

### 2026-05 — Visual editing + AI image systematization

- **Live Preview enters the main pipeline** ([`workflows/live-preview.md`](../skills/ppt-master/workflows/live-preview.md)) — browser preview, click elements to write annotations, say "apply my annotations" and the AI rewrites that region (built on [@WodenJay](https://github.com/WodenJay)'s [PR #85](https://github.com/hugohe3/ppt-master/pull/85))
- **Replicate any PPTX as a template** ([`workflows/create-template.md`](../skills/ppt-master/workflows/create-template.md)) — PPTX → SVG reverse + OOXML theme / master / layout / asset extraction
- **AI image three-dimension system** rendering × palette × type + Strategist h.5 lock, downstream consumes a fixed contract
- **AI image `hero_page` dual-track** — local insert + full-canvas hero image coexist
- **Brand identity preset subsystem** ([`workflows/create-brand.md`](../skills/ppt-master/workflows/create-brand.md)) — extract and reuse brand palette / typography / logo / voice
- **Visual self-review workflow** ([`workflows/visual-review.md`](../skills/ppt-master/workflows/visual-review.md)) — rubric-based per-page check of AI-generated SVGs
- **AI image: Type concept boundary clarification** — Type is now narrowed to "the internal geometric skeleton of a local infographic block" (11 real skeletons); the four pseudo-types (hero / background / portrait / typography) fold back into `page_role: hero_page` plus four composition primitives (single-subject / portrait / typographic / atmospheric); hero_page text layering rule (visual keywords embedded, editable text via SVG overlay)
- **Brutalist AI newspaper example deck shipped** ([`examples/ppt169_brutalist_ai_newspaper_2026/`](../examples/ppt169_brutalist_ai_newspaper_2026/)) — first of the three P0 capability-backing demos: wall-to-wall small type + irregular columns + halftone monochrome + single-spot red + real native shapes; 10-page editorial annual report stressing text-position precision and cross-page consistency
- **Kubernetes Blueprint example deck shipped** ([`examples/ppt169_kubernetes_blueprint_2026/`](../examples/ppt169_kubernetes_blueprint_2026/)) — second of the three P0 capability-backing demos: isometric technical-drawing aesthetic + blueprint cyan/amber palette + hand-authored SVG geometry (no raster images) + custom drawing-in animation; 10-page Kubernetes architecture walkthrough stressing geometric shape generalization and chart-structure extensibility
- **AI image `custom` escape hatch** — `rendering` / `palette` / hero composition each accept `custom` + a one-paragraph `*_behavior` prose, replacing the false "default to vector-illustration / cool-corporate" fallback; end-to-end contract spans [`image-renderings/_index.md`](../skills/ppt-master/references/image-renderings/_index.md) §1.5, [`image-palettes/_index.md`](../skills/ppt-master/references/image-palettes/_index.md) §2, Strategist h.5 hard-rule (≤1 custom per dimension; one candidate may carry both), spec_lock fields, and Image_Generator Step 2 consumption branch
- **Template architecture: three-kind consolidation** ([`docs/templates-architecture.md`](./templates-architecture.md)) — brand / layout / deck split into three independent dirs with per-kind schemas + segment-level fusion + git-style conflict resolution; SKILL.md Step 3 dispatches per `kind`, trigger rule remains "explicit path only"
- **Pattern fill PPTX safety net** — `svg_quality_checker.py` now warns on `<pattern>` without `data-pptx-pattern` (silent fallback to `ltUpDiag`) and errors on values outside OOXML `ST_PresetPatternVal` (schema-failed PPTX that won't open); `shared-standards.md §7` documents the closed preset enum and the required `<rect fill="<bg>"/>` child convention
- **LaTeX math formula rendering shipped** ([`scripts/latex_render.py`](../skills/ppt-master/scripts/latex_render.py)) — Strategist locks one of three policies (`mixed` / `render-all` / `text-only`) inside the Typography confirmation and writes an explicit `images/formula_manifest.json`; the renderer walks a codecogs → quicklatex → mathpad → wikimedia fallback chain and emits transparent PNGs that land in §VIII as `Acquire Via: formula` / `Status: Rendered` rows; formula-heavy decks (academic / engineering / educational) finally have a native rendering route. Formula selection is a Strategist decision — the renderer never scans source files for `$...$` markers
- **Live preview direct editing — L1 / L2 / L3** ([`workflows/live-preview.md`](../skills/ppt-master/workflows/live-preview.md)) — the browser editor gains deterministic in-place edits with no AI round-trip: text content (L1), presentation attributes like fill / stroke / font-size (L2), and on-canvas geometry (L3) — drag a selected element to move it, arrow-key nudge (`Shift` = 10px), multi-select, plus a right-click overlap picker for stacked shapes. Edits stage with `Ctrl+Z` undo + coalescing and write to `svg_output/` on **Apply changes**; moves persist through finalize / export (moved text frames, promoted multi-line tspans, repositioned icons all reproduce in the PPTX). Re-export stays chat-driven; on-canvas resize handles are not yet implemented (resize via the geometry inputs)

### 2026-06 — Mode & visual-style dual catalogs + PPTX intake & content-strategy expansion

- **Replicate any PPTX's design → refill content route** ([`workflows/template-fill-pptx.md`](../skills/ppt-master/workflows/template-fill-pptx.md)) — when a user supplies an existing `.pptx` plus new material / a topic and asks to "reuse this deck's design / fill the content back in", this standalone workflow edits the PPTX directly and never enters the SVG generation pipeline. Output stays natively editable (it reuses the original slides' shapes / layouts, not a screenshot refill); it isolates private parts on reuse, exposes chart data, and runs capacity checks. Trigger follows the template rule — only on an explicit ask to reuse an existing deck — and it deliberately does not reflow / add pages / swap images (that's the from-scratch main route). Distinguished from Non-goals #53 below
- **Three executors retired → mode + visual-style dual catalogs** ([`references/modes/`](../skills/ppt-master/references/modes/) + [`references/visual-styles/`](../skills/ppt-master/references/visual-styles/)) — the old three `executor-*.md` (general / consultant / consultant-top) entangled domain · audience · persuasion · narrative on one axis; split into two orthogonal catalogs (following the `image-renderings` pattern: flat dir + `_index` + on-demand read + Strategist locks one). **mode** = narrative skeleton (`pyramid` / `narrative` / `instructional` / `showcase`; consultant + top merge into pyramid since their narrative core is identical); **visual-style** = SVG layout aesthetic (`swiss-minimal` / `editorial` / `soft-rounded` / `dark-tech`, each paired with an image-rendering, **zero HEX** — color truth stays in confirmation e + image-palettes). Strategist `§d` locks `mode` + `visual_style` independently into `spec_lock`; Executor loads the two locked files; any mode × any style. Render coordinates stay in `templates/charts/`
- **Prompt constraint-strength decoupling** ([`docs/rules/prompt-style.md`](rules/prompt-style.md) §4) — three explicit strength tiers — rule (`Hard rule` / `Forbidden`) / default (`Default — … may override`) / reference (`Reference — not a constraint`) — plus an "objective failure vs taste" test and a checker boundary, so the model can tell "must keep vs may deviate" at a glance; the visual-style catalog is Reference-strength throughout
- **visual-style catalog grows to 18, aligned with image-renderings + examples reclaimed** — first four distilled from the [examples library](../examples/) (`brutalist` / `blueprint` / `memphis` / `zine`), then six filling in the hand-drawn / textured renderings that have a layout twin (`sketch-notes` / `ink-notes` / `chalkboard` / `paper-cut` / `vintage-poster` / `pixel-art`), then four more reclaimed from still-uncovered example aesthetics: `ink-wash` (rice-paper whitespace, from 藏拙 / 李子柒) · `glassmorphism` (dark frosted-glass + gradient light, from glassmorphism_demo, split out of soft-rounded) · `photo-editorial` (full-bleed photography dominates, text captions, from Pritzker / fashion_weekly) · `data-journalism` (Bloomberg/Economist multi-column micro-charts + sidebars, from global_ai_capital). The catalog is regrouped into five families (corporate-product / editorial-publication / expressive-print / hand-drawn-brush / specialty). **The test**: a rendering earns a visual-style twin only when it defines a whole-page layout language, not merely how an inserted image looks — so photo-*led composition* gets `photo-editorial` (paired with corporate-photo), while purely atmospheric renderings (nature / warm-scene / fantasy-animation) stay imagery-only and just pair with a layout style. Zero HEX, Reference strength throughout
- **mode catalog grows to 5: add `briefing`** — fills the "neutral information delivery" cell: no thesis, no story, no teaching, no spectacle — topic titles, even weight, complete and scannable, for status updates / reference decks / catalogs / meeting packs / FAQs that inform without arguing. The five now partition presentation *intent* more nearly MECE: persuade (pyramid) · tell a story (narrative) · teach (instructional) · impress (showcase) · simply inform (briefing). `_index` gains a `briefing` vs `pyramid` tie-breaker ("if you'd have to invent a thesis to fit pyramid, it's briefing"). Five presets plus a `custom` escape hatch for a bespoke direction none of them captures (a special cadence, a multi-mode fusion, a particular posture) — either user-requested or Strategist-recommended, confirmed like every lock; a deck always locks one value, and a fusion is one `custom` describing the acts. The only thing to avoid is defaulting to `custom` when a preset genuinely fits. Mirrors the truth-precedence rule that a user-authored outline or direction overrides the mode
- **mode / visual-style system validated on real decks + four calibration tightenings landed** — the 5 modes + 18 visual-styles + the `custom` escape hatch were exercised on five covering decks (briefing×data-journalism / narrative×photo-editorial / instructional×chalkboard / showcase×glassmorphism / custom×zine, with the narrative deck going through the AI-image branch): **zero selection misses** (all four Close-calls tie-breakers fired and held against the real grey-zone pulls), **discipline fully honored** (zero HEX / Reference strength / whole-page layout language), **the `custom` mechanism works** (a `mode_behavior` prose paragraph carried a 10-page generation and reads as plain language at confirmation, not a bare token), **mode ⟂ visual_style holds** (any combination without bleed, including a positive check of the "keynote/launch = mode not style" route), export 5/5 decks × every page with 0 failures. Tightened four spots from the real signal: `strategist §e` now locks the full neutral tier set the visual_style implies up front (killing the mid-deck color top-ups seen in three consecutive decks), `executor-base §1` re-skins chart/layout template gradients-shadows-fills to the deck's visual_style (templates supply structure, not skin; mirror templates stay verbatim per §1.1), `briefing §1` makes `core_message` state coverage rather than a claim (briefing-only; global §IX assertion semantics stay for narrative/instructional/pyramid), and `svg_quality_checker` fixes font-family drift false positives (delimiter-matched capture + stack normalization) and drops the font-size ceiling for showcase mode and poster visual-styles
- **Optional spec-review checkpoint shipped** ([`workflows/refine-spec.md`](../skills/ppt-master/workflows/refine-spec.md)) — an opt-in pause after the Eight Confirmations: when the user explicitly asks (default OFF), the Strategist produces the full `design_spec.md` + `spec_lock.md`, then stops so the user can revise any part of the spec (outline / color / typography / layout / image strategy / page rhythm) before generation, keeping both files in sync. Same shape as the split-mode note — it never fires on its own, the default pipeline is unchanged, and it surfaces as one opt-in line inside the Eight Confirmations. Review lenses (logical clarity / information density / focus / register / emotional resonance / chapter balance + the design dimensions) give a direction only, never a numeric threshold (`Reference` strength). Prompted by [@cuberoocp](https://github.com/cuberoocp) in [issue #173](https://github.com/hugohe3/ppt-master/issues/173)
- **Interactive visual Eight Confirmations page (Step 4)** ([`scripts/confirm_ui/server.py`](../skills/ppt-master/scripts/confirm_ui/server.py), field schema [`scripts/docs/confirm_ui.md`](../skills/ppt-master/scripts/docs/confirm_ui.md)) — the Eight Confirmations move from chat-only to a browser page that auto-launches by default at Step 4: enumerable fields (canvas / mode / visual_style / icons / image usage / formula & generation policy) list common options from `catalogs.json`, while generative fields present candidates — color swatches, live font previews (CJK / Latin previewed independently), and AI-image rendering × palette picks; plus live swatch feedback on custom HEX, a live combined color × typography preview, and a body-size range hinted from the chosen canvas. It **shares port `5050`** with the Step 6 Live Preview (they never run at once — the page auto-`--shutdown`s at the end of Step 4 to free the port; if the port is busy the launcher auto-advances to the next free one). On principle **chat stays the canonical channel and the page is a convenience layer**: the page's `result.json` is authoritative over the recommendations and is consumed in place downstream (image plan / `image_strategy` / fonts / the split and refine-spec toggles all read from it), and any failure to open / timeout / headless host degrades losslessly to the chat-summary fallback
- **A batch of source-conversion fidelity gains** — less information dropped as source material enters the pipeline: `doc_to_md` converts Word OMML / Office Math equations to inline LaTeX, `pdf_to_md` recognizes `Figure N |` pipe-delimited captions, and `ppt_to_md` preserves hyperlinks already in the source deck (run-level external `[text](url)` / slide-internal jumps `[text](#slide-N)` / shape-level clicks, with dangerous-scheme filtering and anchor-text Markdown escaping) and transcribes native chart data into Markdown tables (values survive the conversion instead of collapsing to a single picture). Caption recognition is based on [@suay1113](https://github.com/suay1113)'s [PR #191](https://github.com/hugohe3/ppt-master/pull/191); hyperlink preservation is distilled from [@ZhaoZuohong](https://github.com/ZhaoZuohong)'s [PR #155](https://github.com/hugohe3/ppt-master/pull/155)

- **Content-faithful PPT beautification / re-layout shipped** ([`workflows/beautify-pptx.md`](../skills/ppt-master/workflows/beautify-pptx.md)) — mirror of `template-fill`: template-fill reuses a deck's design with new content, beautify keeps the content and redoes the layout. Given an existing PPTX, **every text string is preserved verbatim (nothing added / removed / reworded)**; it **extracts and inherits the source deck's visual identity (palette / fonts, `theme` or `observed` offered as two confirm-page candidates)** and redoes only layout / hierarchy / whitespace, strict 1:1 page count and order, with charts / tables regenerated natively from extracted data (values frozen) and source pictures re-laid-out. Technically still "generate a native, editable PPTX from scratch" (`ppt_to_md` extracts content → main pipeline → a brand-new deck), not a patch over the original, so it stays clear of Non-goals #53. Adds `beautify_identity.py` / `beautify_inventory.py`; the confirm page is seeded from the source for the user to review. Honest v1 ceiling: it does not relieve information overload (a crowded page improves within itself; true re-pagination is the main pipeline), does not guarantee coordinate-level paste-back, and combo / dual-axis / waterfall charts lose the un-captured plots

- **Multi-deck PPTX intake + `analysis/` source-name prefixing** — a main-pipeline project can now combine several source decks: each writes its own `<stem>.identity.json` / `<stem>.slide_library.json`, with every deck's digest inlined under `decks[]` in the single index `source_profile.json` (preserving the "Strategist must-read `source_profile.json`" single-entry contract — one entry for a one-deck project, several for a combined one; re-importing the same stem replaces its entry). `beautify` / `template-fill` stay single-deck (1:1) and read their own `<stem>.*` artifacts

- **Material divergence (a free-text field under §c Audience)** — a main-pipeline content-strategy question added as a **free-text box** under the audience field: the user states in their own words how closely to follow the source vs how freely to reshape it (blank = balanced default). Deliberately not a fixed set of options, not recommended from analyzing the source, and not coupled to page count — it is purely the user's stated intent. However freely they ask, facts **stay sourced** — reorganize / reframe / expand / connect what is in the source, never import facts from outside it (that is the `topic-research` job). The Strategist reads the prose when authoring the §IX outline and records it in `design_spec §I`; it is **not** written to `spec_lock` (the Executor never reads it). `mode` and divergence are orthogonal. Beautify / template-fill freeze content and do not surface this field

- **A batch of default-behavior and intake standardizations** — per-element entrance animation now defaults off (page transition `fade` only; element builds are opt-in via `-a auto` / `animations.json`), removing the auto-cascade "AI deck" tell; per-project `icons/` copies chosen icons into the project at selection time and embeds project-first; `analysis/` is established as the canonical must-read layer for machine-extracted facts (PPTX intake bundle + `image_analysis.csv`); the main pipeline treats a source deck's identity (palette / fonts / layout) as **reference, not constraint** (inherit or redesign by Strategist judgment, defaulting to fresh design); the confirm page gains a custom color input

---

## In progress / Next

Directions actively underway or up next, with no committed timeline.

- **Real-usage calibration of multi-deck intake and material divergence (just landed)** — multi-deck combined intake (`<stem>` prefixing + a `decks[]` merge index) and the material-divergence free-text field (under §c audience) both shipped (see "2026-06" above); next is calibration from real usage: same-stem collisions across decks currently let the later one overwrite the former — whether dedupe / numbering is needed awaits signal; and whether the Strategist reads the free-text intent accurately, and whether the "facts stay sourced" line holds when the user asks for free reshaping, awaits real generation. Neither gets a mechanical threshold added pre-emptively
- Otherwise: the mode / visual-style validation and calibration is settled (see "2026-06" above) — the structure (5 modes + 18 visual-styles + custom) is locked, the four adjacent pairs are consolidated into one Close-calls table, and the four calibration tightenings have landed. Future direction is driven by real usage signal and feedback; long-running improvements are under "Ongoing maintenance" below, and evaluated-out directions under "Non-goals"

---

## Ongoing maintenance directions

Long-running improvements with no committed timeline. Only real directions are listed; specific fixes / single flags go to the commit log.

- **Prompt slimming** — compress per-role prompt token footprint and improve cache hit rate without sacrificing quality, for indirect cost / speed gains. Complements "Pure speed optimization" below: indirect optimization yes, quality-sacrificing speedups no.

---

## Non-goals

The directions below come up repeatedly and have been evaluated as **not on the path**. Listing them is not a value judgment on the underlying need — they simply don't fit this project's main route. If you specifically need these capabilities, consider other tools or forking.

### Read arbitrary PPTX templates → fill text only

**Issues**: [#53](https://github.com/hugohe3/ppt-master/issues/53), [#118](https://github.com/hugohe3/ppt-master/issues/118)

PPT Master's main route is "AI generates SVG from scratch → DrawingML", with the whole pipeline built around full control of every shape / text / layout. "Parse existing PPTX placeholders + only refill text" is a different product shape requiring handling of arbitrary master / theme / placeholder systems — orthogonal to where this architecture invests.

**The basic need is actually simple**: if you just need "replace Excel data into fixed positions in a PPT template", have the AI write a few lines of `python-pptx`. You don't need this pipeline.

> **vs. the `template-fill-pptx` route**: "reuse a deck's *own* design and refill it with new content" is a supported capability (see "2026-06" above), with natively-editable output. What's out of scope here is the other shape — parsing an **arbitrary third-party** template's master / theme / placeholder system to do text-only substitution. Different investment points; don't conflate them.

### Switch to native PowerPoint charts (Excel-native chart)

**Issues**: [#99](https://github.com/hugohe3/ppt-master/issues/99), [#100](https://github.com/hugohe3/ppt-master/issues/100)-class

Pixel-fidelity across the four renderers (PowerPoint / Keynote / LibreOffice / WPS) is the project's spine. Switching to native PowerPoint charts breaks that — the same PPTX renders different chart layouts across renderers. Charts as SVG is **by design**, not a capability gap.

If you need data-driven native Excel charts, pick a different tool or manually replace charts in PowerPoint post-export — this project won't build that path in.

### uv as default / required dependency

**Issue**: [#111](https://github.com/hugohe3/ppt-master/issues/111)

`pip + requirements.txt` is the only official install path because it works in every Python environment with no extra learning cost. uv is a fine tool, but making it default raises the bar for new users. If you personally prefer uv, use it in your fork — it won't affect the main line.

### Pure speed optimization

**Issue**: [#97](https://github.com/hugohe3/ppt-master/issues/97)

In the cost / speed / quality triangle this project picks **quality**. ~20 minutes for a high-quality PPTX is the current reasonable point.

Will do: indirect improvements via prompt slimming / cache hit rate.
Won't do: trading quality for "throw a few pages together" speed.

If speed-sensitive and quality-tolerant, Gamma / similar AI tools are a better fit.

### CLI / SaaS / desktop app form factors

The product form is firmly **chat-driven AI IDE skill** (Claude Code / Cursor / VS Code + Copilot / Codebuddy).

Won't do: standalone CLI (`ppm`-style), SaaS web service, Electron shell. Any "make it run independently of chat" proposal will be declined. Chat is the interaction core, not a wrapper.

---

## Feedback channels

- **Issues**: [github.com/hugohe3/ppt-master/issues](https://github.com/hugohe3/ppt-master/issues) — bugs / proposals
- **Discussions**: [github.com/hugohe3/ppt-master/discussions](https://github.com/hugohe3/ppt-master/discussions) — usage / experience sharing
- **Email**: heyug3@gmail.com

Before proposing a new direction, scan the **Non-goals** above. If your request falls there, it's unlikely to land — but we're happy to discuss other paths to your underlying need.