export type LiteParseInput = string | Buffer | Uint8Array; export type OutputFormat = "json" | "text" | "markdown"; export type ImageMode = "off" | "placeholder" | "embed"; export interface LiteParseConfig { ocrLanguage: string; ocrEnabled: boolean; ocrServerUrl?: string; /** Extra HTTP headers sent with every request to `ocrServerUrl`. */ ocrServerHeaders?: Record; tessdataPath?: string; maxPages: number; targetPages?: string; dpi: number; outputFormat: OutputFormat; /** How to surface raster images in markdown output (default: "placeholder"). */ imageMode: ImageMode; /** Render hyperlink annotations as `[text](url)` in markdown output (default: true). */ extractLinks: boolean; preserveVerySmallText: boolean; password?: string; quiet: boolean; numWorkers: number; } export interface TextItem { text: string; x: number; y: number; width: number; height: number; fontName?: string; fontSize?: number; confidence?: number; /** Rotation in degrees (viewport space). Defaults to 0 when omitted. */ rotation?: number; } /** * A vector-graphic primitive supplied to {@link LiteParse.parsePages}. `kind` * selects the variant: `"stroke"` (uses `x1/y1/x2/y2`) or `"rect"` (uses * `x/y/width/height`, top-left origin). Coordinates are viewport space (72 DPI), * matching the text items. `hasFill`/`hasStroke` carry the paint intent even * when the color is unknown, so ruled-table edge detection still treats a * colorless stroked rect as stroked. */ export interface Graphic { kind: "stroke" | "rect"; x1?: number; y1?: number; x2?: number; y2?: number; x?: number; y?: number; width?: number; height?: number; hasFill?: boolean; hasStroke?: boolean; fillColor?: string; strokeColor?: string; lineWidth?: number; } /** * A page of pre-extracted text supplied to {@link LiteParse.parsePages}. * Coordinates are viewport space (top-left origin, 72 DPI). `graphics` is * optional; when supplied it enables ruled-table and horizontal-rule detection. */ export interface PageInput { pageNumber: number; pageWidth: number; pageHeight: number; textItems: TextItem[]; graphics?: Graphic[]; } export interface ParsedPage { pageNum: number; width: number; height: number; text: string; textItems: TextItem[]; } export interface ExtractedImage { /** Reference id used in the markdown output (e.g. `![](image_p1_0.png)` → `"p1_0"`). */ id: string; page: number; format: string; bytes: Buffer; } export interface ParseResult { pages: ParsedPage[]; text: string; /** Populated only when configured with `imageMode: "embed"`. */ images: ExtractedImage[]; } export interface ScreenshotResult { pageNum: number; width: number; height: number; imageBuffer: Buffer; } export declare class LiteParse { private _native; private _config; constructor(userConfig?: Partial); parse(input: LiteParseInput): Promise; /** * Parse from pre-extracted pages, skipping PDFium text extraction. Runs only * grid projection + the configured output formatter, so the caller's own * text-extraction / font-recovery owns the text content. Synchronous: no * PDFium load and no OCR on this path. */ parsePages(pages: PageInput[]): ParseResult; screenshot(input: LiteParseInput, pageNumbers?: number[]): Promise; getConfig(): LiteParseConfig; } export interface SearchItemsOptions { phrase: string; caseSensitive?: boolean; } export declare function searchItems(items: TextItem[], options: SearchItemsOptions): TextItem[]; export default LiteParse; //# sourceMappingURL=lib.d.ts.map