/**
 * sanitizeDraft: minimal pre-submit cleanup for chat-composer drafts.
 *
 * Mirrors the conservative behaviour ChatGPT / Claude / Telegram
 * actually ship: clean the obvious junk the user didn't intend to
 * send, touch nothing that *could* be intentional.
 *
 * **What we DO touch:**
 *
 *   1. Trim leading/trailing whitespace (spaces, tabs, newlines,
 *      NBSP). The user typing `\n\n hello \n` meant `hello`.
 *   2. Normalise line endings: `\r\n` / `\r` → `\n`. Pasted Windows
 *      / old-Mac text gets the same internal shape, so the LLM
 *      tokeniser and markdown renderer see one canonical form.
 *   3. Strip zero-width / invisible characters that web-paste
 *      smuggles in: ZWSP (U+200B), ZWNJ (U+200C), ZWJ (U+200D),
 *      BOM / ZWNBSP (U+FEFF). They're invisible, can disrupt LLM
 *      tokenisation, and in pasted text the user rarely meant to
 *      include them.
 *
 * **What we DO NOT touch (and why):**
 *
 *   - **Internal whitespace runs** (3+ spaces, tabs, blank lines).
 *     Code indentation depends on these. ChatGPT preserves them as
 *     typed: "    if (x):\n        return" stays four-space-indented.
 *     Collapsing them is the path to subtly broken code snippets.
 *
 *   - **Bidi marks and overrides** (U+200E LRM, U+200F RLM,
 *     U+202A..U+202E). Legitimately used in Arabic / Hebrew /
 *     mixed-direction text. Stripping them silently breaks RTL users.
 *     If a specific deployment wants to block them as a security
 *     measure, do it at that layer with explicit user-visible feedback.
 *
 *   - **Tabs vs spaces** beyond rule 1. Could be either code or
 *     prose; without parsing markdown we can't tell.
 *
 *   - **Emoji, mentions, URLs, code spans**: passthrough text. One
 *     caveat follows from rule 3: emoji ZWJ sequences lose their
 *     U+200D joiners and render as separate emoji.
 *
 * The function is intentionally tiny: every rule earns its keep
 * with a concrete "user pasted X from Y, got nonsense" story.
 *
 * **Idempotent**: `sanitizeDraft(sanitizeDraft(x)) === sanitizeDraft(x)`.
 */
export function sanitizeDraft(input: string): string {
  if (!input) return '';

  // Strip zero-width invisibles. Done FIRST so the trim below sees
  // the real content edges: a stray ZWSP at the start would
  // otherwise count as non-whitespace and survive the trim.
  // U+200B ZWSP, U+200C ZWNJ, U+200D ZWJ, U+FEFF BOM/ZWNBSP.
  // Written as \u escapes so the character class stays readable and
  // cannot be silently dropped by an editor or a copy/paste.
  let s = input.replace(/[\u200B\u200C\u200D\uFEFF]/g, '');

  // Normalise line endings.
  s = s.replace(/\r\n?/g, '\n');

  // Trim outer whitespace (includes \n, \t, NBSP via String.trim).
  s = s.trim();

  return s;
}

/**
 * Convenience predicate: true when the draft is non-empty AFTER
 * sanitisation. Use to gate Send buttons / Enter submits so an empty
 * or whitespace-only draft never produces a real message.
 *
 * No cheaper than writing `sanitizeDraft(input).length > 0` inline,
 * since we still allocate the cleaned string. Kept as a named helper
 * for call-site clarity.
 */
export function isSubmittableDraft(input: string): boolean {
  return sanitizeDraft(input).length > 0;
}
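
// Illustrative sketch, not part of the original module: why the
// zero-width strip runs before the trim in sanitizeDraft. It relies
// only on built-in String methods. The helper name `zwspTrimOrderDemo`
// and the sample input are assumptions made up for this example.
function zwspTrimOrderDemo(): { trimOnly: string; stripThenTrim: string } {
  // A draft pasted with a leading ZWSP (U+200B) and padded with spaces.
  const raw = '\u200B  hello  ';

  // trim() does not treat U+200B as whitespace, so the ZWSP shields the
  // leading spaces and only the trailing ones are removed.
  const trimOnly = raw.trim();

  // Stripping first exposes the real content edges to trim().
  const stripThenTrim = raw.replace(/\u200B/g, '').trim();

  return { trimOnly, stripThenTrim };
}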