# @imgly/pdf-importer

## 0.2.1

### Fidelity fixes

- Fixed later fragments of a colored text run being imported as default black. A producer can draw one styled run (e.g. white "FOR DUTY") that pdf.js splits into several kerned text items. The walker records one color snapshot per `showText`, so an earlier fragment consumed the run's only snapshot and a later fragment fell through to a `null` fill, which is painted as default RGB black. A later fragment now inherits the fill from same-line siblings that already consumed a snapshot, provided those donors are fill-painting (render modes 0/2/4/6), sit within a width-scaled horizontal window, and unanimously agree on one color. Otherwise it falls back to the safe `null` path.
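
The inheritance rule can be sketched roughly as follows. This is an illustration of one reading of the rule, not the importer's actual code: the `Fragment` shape, `inheritFill`, `WINDOW_SCALE`, and `BASELINE_TOLERANCE` are all hypothetical names and values. The only facts taken from the entry above are the fill-painting render modes (0/2/4/6), the width-scaled window, the unanimity requirement, and the `null` fallback.

```typescript
// Hypothetical shapes: none of these names come from the importer's source.
type RGB = readonly [number, number, number];

interface Fragment {
  x: number;          // left edge of the fragment
  width: number;      // horizontal extent of the fragment
  baseline: number;   // y position of the text baseline
  renderMode: number; // PDF text rendering mode, 0..7
  fill: RGB | null;   // null means no color snapshot was available
}

// PDF text rendering modes 0, 2, 4 and 6 paint the glyph fill.
const FILL_PAINTING_MODES = new Set([0, 2, 4, 6]);

// Assumed tuning values; the changelog does not state the real ones.
const WINDOW_SCALE = 2;
const BASELINE_TOLERANCE = 0.5;

function sameColor(a: RGB, b: RGB): boolean {
  return a[0] === b[0] && a[1] === b[1] && a[2] === b[2];
}

// Horizontal distance between two fragments; 0 when they overlap.
function horizontalGap(a: Fragment, b: Fragment): number {
  const left = Math.max(a.x, b.x);
  const right = Math.min(a.x + a.width, b.x + b.width);
  return Math.max(0, left - right);
}

// Returns the inherited fill, or null when no donor qualifies or donors disagree.
function inheritFill(target: Fragment, consumedSiblings: readonly Fragment[]): RGB | null {
  const maxGap = target.width * WINDOW_SCALE;
  let agreed: RGB | null = null;
  for (const donor of consumedSiblings) {
    const fill = donor.fill;
    if (fill === null) continue;
    if (!FILL_PAINTING_MODES.has(donor.renderMode)) continue;
    if (Math.abs(donor.baseline - target.baseline) > BASELINE_TOLERANCE) continue;
    if (horizontalGap(donor, target) > maxGap) continue;
    // Any disagreement between qualifying donors falls back to the null path.
    if (agreed !== null && !sameColor(agreed, fill)) return null;
    agreed = fill;
  }
  return agreed;
}
```

Returning `null` on any disagreement keeps the previous behavior in ambiguous cases, so the sketch only ever adds a color when every qualifying donor agrees.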

## 0.2.0

### Text

- Merge adjacent text runs from the same logical font family that differ only by weight, style, or color into a single editable text block with per-character style spans. Inline emphasis in headlines (e.g. regular + bold + italic words sharing a baseline) now imports as one paragraph block instead of several, with `getTextFontWeights` / `getTextFontStyles` reporting the mixed styles. The merger bridges the word-space gap between font subsets and re-applies style spans through the text-fit pass. Color-only merging (the previous behavior) is preserved. (#15583)

### Page-level block emission (used by `@imgly/idml-importer`)

- Added `PDFParser.loadAsBlocks({ pageIndex })` — emits one PDF page as a detached, grouped CE.SDK block (mirrors `engine.block.loadFromString` semantics). Returns `{ blocks, logger }` so callers can forward parser warnings into their own diagnostic stream. Used by `@imgly/idml-importer` to import `<PDF>` embeds; available to other consumers that need page-level emission without creating a top-level scene. (#15265)
- Added `PDFParser.getPageCount()` so callers can size loops over `loadAsBlocks` without parsing twice. The first call extracts the intermediate representation (IR); `parse()` and `loadAsBlocks()` reuse the same cached IR. (#15265)
- The wrapper group returned from `loadAsBlocks` is anchored to the PDF `MediaBox` via an invisible spacer rect, so the group's bbox always matches the page size even when the content covers only part of the page. (#15265)
- New warning code `PAGE_INDEX_OUT_OF_RANGE` emitted when `loadAsBlocks` is called with a `pageIndex` outside the document's range. (#15265)
- **BREAKING**: Dropped the top-level `Logger` named export from the package entry. The class is still reachable structurally via `PDFParser.parse().logger` / `loadAsBlocks(...).logger`; the explicit re-export collided with dts-bundle-generator's auto-emitted referenced-types export, producing a duplicate `export { Logger }` in the bundled `.d.ts`. Consumers that need to type the logger instance can use `Awaited<ReturnType<PDFParser['parse']>>['logger']`. (#15265)
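
A page loop over these two methods might look like the sketch below. It is written against a structural stand-in rather than the real `PDFParser` type, because this changelog does not show how a parser instance is constructed or whether the methods are asynchronous. `PageSource`, `PageResult`, and `loadAllPages` are hypothetical names, and a zero-based `pageIndex` is an assumption.

```typescript
// Structural stand-ins; the real PDFParser types may differ.
interface PageResult<Block, Log> {
  blocks: Block;
  logger: Log;
}

interface PageSource<Block, Log> {
  getPageCount(): number | Promise<number>;
  loadAsBlocks(options: {
    pageIndex: number;
  }): PageResult<Block, Log> | Promise<PageResult<Block, Log>>;
}

// Loads every page in order and keeps each page's logger next to its blocks,
// so the caller can forward warnings into its own diagnostics.
async function loadAllPages<Block, Log>(
  source: PageSource<Block, Log>
): Promise<PageResult<Block, Log>[]> {
  const pageCount = await source.getPageCount();
  const pages: PageResult<Block, Log>[] = [];
  // Assumption: pageIndex is zero-based.
  for (let pageIndex = 0; pageIndex < pageCount; pageIndex++) {
    pages.push(await source.loadAsBlocks({ pageIndex }));
  }
  return pages;
}
```

Bounding the loop with `getPageCount()` keeps every index inside the document's range, so `PAGE_INDEX_OUT_OF_RANGE` should not be triggered by this loop.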

### Fidelity fixes

- Preserve PDF dash patterns and line caps on stroked vector paths — dotted/dashed rules previously flattened to solid lines. The dotted-line idiom (a zero-length dash with a round cap) now renders as dots. The dash/cap APIs are feature-detected, so the peer-dependency floor stays at `>=1.70`; on engines older than 1.76 the stroke renders solid instead. (#15874)
- Tilted-image placement keeps its true visible width — a sub-degree tilt no longer inflates the axis-aligned bounding box and crops the image fill. (#15874)
- Image placements emit their true sub-3° rotation instead of snapping to upright, fixing large off-centre clipped images that imported visibly displaced. (#15874)
- `filterBackgroundArtifacts` no longer drops white shapes that merely sit inscribed in the page bounding box (e.g. a diamond filling the page); it now verifies the SVG path vertices sit at the bbox's four corners before treating a shape as a paper backdrop. (#15265)
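
The corner test described for `filterBackgroundArtifacts` can be illustrated with a small geometric check. This is a sketch only: `coversBBoxCorners`, the `Point` and `BBox` shapes, and the tolerance value are assumptions, and the real filter works on SVG path data rather than a ready-made vertex list.

```typescript
// Hypothetical helper types; not taken from the importer's source.
interface Point {
  x: number;
  y: number;
}

interface BBox {
  x: number;
  y: number;
  width: number;
  height: number;
}

// True only if each of the four bbox corners has a path vertex on it.
function coversBBoxCorners(
  vertices: readonly Point[],
  bbox: BBox,
  tolerance = 0.01 // assumed tolerance, in the same units as the bbox
): boolean {
  const corners: Point[] = [
    { x: bbox.x, y: bbox.y },
    { x: bbox.x + bbox.width, y: bbox.y },
    { x: bbox.x + bbox.width, y: bbox.y + bbox.height },
    { x: bbox.x, y: bbox.y + bbox.height },
  ];
  return corners.every((corner) =>
    vertices.some(
      (v) =>
        Math.abs(v.x - corner.x) <= tolerance &&
        Math.abs(v.y - corner.y) <= tolerance
    )
  );
}
```

A full-page rectangle passes this check, while a diamond inscribed in the same bbox fails it because its vertices sit at the edge midpoints rather than the corners.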

### Requirements

- Node.js ≥ 22 is now required (previously ≥ 20).

## 0.1.0

First public release of `@imgly/pdf-importer` — convert PDF files into
CE.SDK scenes using `@imgly/pdfjs-dist` (an IMG.LY fork of Mozilla
pdf.js, https://github.com/imgly/pdf.js).

### Pipeline

- Extract → post-process → emit pipeline: walks pdf.js operator streams (`extract/walker.mjs`), emits drawable blocks (images, vector paths, text outlines) in paint order, uses `page.getTextContent()` for editable text runs, merges adjacent same-font/size runs with horizontal overlap into multi-line paragraph blocks, and writes the IR as CE.SDK blocks.
- Text is always imported as full-paragraph blocks (no per-character or per-word splitting mode).
- PDF points are converted to inches (1pt = 1/72 inch) for CE.SDK design units.
- Spot color detection (CutContour, Thru-cut, etc.) for cut/fold mark handling.
- `addGfontsAssetLibrary` helper to register the shared `@imgly/gfonts` asset source.
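
As a worked example of the unit conversion above, the sketch below turns PDF points into inches. `pointsToInches` and `POINTS_PER_INCH` are illustrative names, not exports of this package.

```typescript
// 1 pt = 1/72 inch, as stated in the pipeline notes.
const POINTS_PER_INCH = 72;

function pointsToInches(points: number): number {
  return points / POINTS_PER_INCH;
}

// A 612 x 792 pt page is 8.5 x 11 inches (US Letter).
const letterWidthInches = pointsToInches(612);
const letterHeightInches = pointsToInches(792);
```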

### Font handling

- Font-strategy cascade with lazy vector fallback. Shipped presets: `editableFirstStrategy` (default), `exactFidelityStrategy`, `assetLibraryStrategy`. `createFontStrategy` / `createFontCascade` for custom cascades.

### Image handling

- Image-modulated luminosity SMask support (per-pixel alpha compositing of soft masks).
- PDF fill rule (`f`/`F` vs `f*`/`B*`) is forwarded to the `vector_path` shape.
- Skia 9-patch tiling-pattern decomposition is accepted as-is.

### Robustness

- 180°-rotated text runs are no longer dropped during text pickup.
- Tiling-pattern bitmaps and pdf.js `objs` cleanup race conditions are handled.
