# excel-l10n

A configurable tool for modern JS workflows that extracts translatable content from Excel (XLSX) workbooks, segments it, exports it to XLIFF or JSON, and merges the translations back. It is inspired by Okapi's OpenXML filter and natively supports multiple target-language columns per sheet, with rich Excel filter options set through a simple JSON/YAML configuration.

## Installation

```bash
# As a dependency in your project
npm install excel-l10n

# Or globally for CLI usage
npm install -g excel-l10n
```

## Quick Start

### CLI Usage

```bash
# Extract to XLIFF (uses SRX segmentation if enabled in config)
excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf --src-lang en

# Extract to XLIFF 1.2 (default is 2.1)
excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf --src-lang en --xliff-version 1.2

# Extract to JSON
excel-l10n extract -c config.yml -i workbook.xlsx -o out.json --format json

# Merge translated file back
excel-l10n merge -c config.yml -i workbook.xlsx -t translated.xlf -o workbook.translated.xlsx
```

### Programmatic Usage

```typescript
import { parseConfig, extract, exportUnitsToXliff, parseTranslated, merge } from 'excel-l10n';

// Load configuration
const config = parseConfig('config.yml');

// Extract translatable content
const units = await extract('workbook.xlsx', config);

// Export to XLIFF
const xliff = await exportUnitsToXliff(units, config, { srcLang: 'en' });

// After translation, parse and merge back
const translated = parseTranslated(xliff, 'xlf');
await merge('workbook.xlsx', 'workbook.translated.xlsx', translated, config);
```

## CLI Commands

Advanced:

- Per-locale XLIFF export (one file per target language):

  ```bash
  excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf --per-locale
  # emits out.fr.xlf, out.de.xlf, ... (based on targetColumns in config)
  ```

- Single bilingual XLIFF with explicit target language:

  ```bash
  excel-l10n extract -c config.yml -i workbook.xlsx -o out.fr.xlf --target-lang fr
  ```

- Merge all translations in one run (auto-detect trgLang per file):

  ```bash
  # -t points to a directory containing .xlf/.xliff/.json files
  excel-l10n merge -c config.yml -i workbook.xlsx -t ./translated/ -o workbook.merged.xlsx

  # or a comma-separated list
  excel-l10n merge -c config.yml -i workbook.xlsx -t out.fr.xlf,out.de.xlf -o workbook.merged.xlsx
  ```
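
The per-locale naming shown above (`out.xlf` becoming `out.fr.xlf`, `out.de.xlf`, ...) can be sketched as follows. This is only an illustration of the naming pattern: `perLocalePaths` is a hypothetical helper, not part of the `excel-l10n` API, and the tool's actual handling of unusual paths may differ.

```typescript
// Hypothetical helper (not part of excel-l10n): derive one output path per locale
// by inserting the locale code before the file extension.
function perLocalePaths(outPath: string, locales: string[]): string[] {
  const dot = outPath.lastIndexOf('.');
  const slash = Math.max(outPath.lastIndexOf('/'), outPath.lastIndexOf('\\'));
  // Only treat the last dot as an extension separator if it is inside the file name.
  const hasExt = dot > slash + 1;
  const base = hasExt ? outPath.slice(0, dot) : outPath;
  const ext = hasExt ? outPath.slice(dot) : '';
  return locales.map((locale) => `${base}.${locale}${ext}`);
}

// Locales are taken from the keys of a targetColumns mapping (locale -> column letter).
const exampleTargetColumns: Record<string, string> = { fr: 'C', de: 'D' };
console.log(perLocalePaths('out.xlf', Object.keys(exampleTargetColumns)));
// [ 'out.fr.xlf', 'out.de.xlf' ]
```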

Quick inline (no config file):

```bash
# extract with inline flags
excel-l10n extract -i in.xlsx --sheet "Sheet1" --source A --target fr=B,de=C -o out.xlf --src-lang en
```

Run `excel-l10n --help` for details.
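
To make the `--target fr=B,de=C` syntax concrete, here is a minimal sketch of how such a value could be turned into a `targetColumns` mapping (locale to column letter). `parseTargetFlag` is a hypothetical name used for illustration; the CLI's own parser and its validation rules are not shown in this README and may behave differently.

```typescript
// Hypothetical parser (not the CLI's actual implementation) for values like "fr=B,de=C".
function parseTargetFlag(value: string): Record<string, string> {
  const targets: Record<string, string> = {};
  for (const pair of value.split(',')) {
    const [locale, column] = pair.split('=').map((part) => part.trim());
    // Assumption: columns are given as Excel column letters (A, B, ..., AA, ...).
    if (!locale || !column || !/^[A-Za-z]+$/.test(column)) {
      throw new Error(`Invalid target mapping: "${pair}"`);
    }
    targets[locale] = column.toUpperCase();
  }
  return targets;
}

console.log(parseTargetFlag('fr=B,de=C'));
// { fr: 'B', de: 'C' }
```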

## API Reference

### Core Functions

```typescript
// Configuration
parseConfig(pathOrObject: string | Config): Config

// Extraction
extract(xlsxPath: string, config: Config): Promise<TranslationUnit[]>

// Export
exportUnitsToXliff(units: TranslationUnit[], config: Config, options?: {
  srcLang?: string;
  trgLang?: string;
  generator?: string;
}): Promise<string>

exportUnitsToJson(units: TranslationUnit[], config: Config, options?: {
  fileName?: string;
}): Promise<string>

// Parsing
parseTranslated(content: string, format: 'xlf' | 'json'): TranslationUnit[]

// Merging
merge(
  inputXlsxPath: string,
  outputXlsxPath: string,
  translatedUnits: TranslationUnit[],
  config: Config
): Promise<void>
```

## Using as a Library

When using `excel-l10n` as a dependency in your project:

```typescript
import { parseConfig, extract, exportUnitsToXliff, parseTranslated, merge } from 'excel-l10n';

async function localizeWorkbook() {
  // Option 1: Load config from file
  let config = parseConfig('./localization-config.yml');
  
  // Option 2: Create config programmatically (replaces the config loaded above)
  config = {
    workbook: {
      sheets: [{
        namePattern: 'Sheet1',
        sourceColumns: ['B'],
        targetColumns: { fr: 'C', de: 'D' },
        html: { enabled: true },
        headerRow: 1,
        valuesStartRow: 2
      }]
    },
    global: {
      srcLang: 'en',
      xliffVersion: '2.1'
    }
  };
  
  // Extract
  const units = await extract('./input.xlsx', config);
  console.log(`Extracted ${units.length} translation units`);
  
  // Export to XLIFF
  const xliff = await exportUnitsToXliff(units, config, { srcLang: 'en' });
  
  // ... send to translation service ...
  
  // Parse translated XLIFF
  const translated = parseTranslated(xliff, 'xlf');
  
  // Merge back to Excel
  await merge('./input.xlsx', './output.xlsx', translated, config);
}
```

### TypeScript Support

Full TypeScript definitions are included. Import types as needed:

```typescript
import type { Config, TranslationUnit, Segment } from 'excel-l10n';
```

## Segmentation (SRX)

- SRX rules are supported via `segmentation.rules.srxPath` (see `examples/default_rules.srx`).
- If no matching rule is found for the locale, a pragmatic built-in sentence splitter is used.
- The locale is derived from `sheet.sourceLocale` or `global.srcLang`.
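
The locale lookup and the fallback behaviour described above can be sketched roughly as follows. Both functions are hypothetical illustrations: the precedence of `sheet.sourceLocale` over `global.srcLang` is an assumption, and the splitter below is a deliberately naive stand-in, not the library's built-in one.

```typescript
interface SheetLike { sourceLocale?: string }
interface GlobalLike { srcLang?: string }

// Assumption: the sheet-level locale wins over the global source language.
function resolveLocale(sheet: SheetLike, global: GlobalLike): string | undefined {
  return sheet.sourceLocale ?? global.srcLang;
}

// Naive fallback splitter: a sentence ends at a run of ., ! or ? (or at end of text).
// It will wrongly split abbreviations and decimals such as "1.2", which is exactly
// the kind of case that locale-specific SRX rules are meant to handle.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+(?:[.!?]+|$)/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

console.log(resolveLocale({}, { srcLang: 'en' }));
// en
console.log(splitSentences('Hello world. How are you? Fine!'));
// [ 'Hello world.', 'How are you?', 'Fine!' ]
```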

## XLIFF version selection

- Set `global.xliffVersion` to `"1.2"` or `"2.1"` (default: `"2.1"`) to control XLIFF output format.
- Use CLI flag `--xliff-version 1.2` to override the config.
- XLIFF 2.1 uses `<pc>` elements for inline codes.
- XLIFF 1.2 uses `<g>` elements for inline codes.
- Both versions are OASIS-standard formats accepted by most Translation Management Systems.

## Placeholders and inline codes

- Configure `inlineCodeRegexes` per sheet to detect non-translatable tokens (e.g., `{0}`, `%s`).
- XLIFF export converts tokens to `<ph id="..."/>` with a per-segment placeholder map preserved in a `<note category="ph">` JSON payload for roundtrip.
- JSON export preserves a placeholder map under `unit.meta.placeholders` without altering source text.
- During merge, placeholder markers (e.g., `[[ph:ph1]]` in translated content) are rehydrated back into original tokens.
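
The roundtrip described above (detect tokens, keep a placeholder map, rehydrate markers on merge) can be sketched like this. `protectTokens` and `rehydrate` are hypothetical names, and using `[[ph:phN]]` markers on the extraction side is a simplification for illustration; the actual XLIFF export emits `<ph>` elements as noted above.

```typescript
interface ProtectedText {
  text: string;                      // source text with tokens replaced by markers
  placeholders: Map<string, string>; // marker id -> original token
}

// Replace every match of the given regexes with a [[ph:phN]] marker.
function protectTokens(source: string, patterns: RegExp[]): ProtectedText {
  const placeholders = new Map<string, string>();
  let counter = 0;
  let text = source;
  for (const pattern of patterns) {
    // Force the global flag so that all occurrences are replaced.
    const flags = pattern.flags.indexOf('g') >= 0 ? pattern.flags : pattern.flags + 'g';
    text = text.replace(new RegExp(pattern.source, flags), (token: string) => {
      counter += 1;
      const id = `ph${counter}`;
      placeholders.set(id, token);
      return `[[ph:${id}]]`;
    });
  }
  return { text, placeholders };
}

// Restore the original tokens; unknown markers are left untouched.
function rehydrate(translated: string, placeholders: Map<string, string>): string {
  return translated.replace(/\[\[ph:([A-Za-z0-9_]+)\]\]/g, (marker: string, id: string) =>
    placeholders.get(id) ?? marker);
}

const protectedSource = protectTokens('Hello {0}, you have %s new messages', [/\{\d+\}/, /%s/]);
console.log(protectedSource.text);
// Hello [[ph:ph1]], you have [[ph:ph2]] new messages
console.log(rehydrate('Bonjour [[ph:ph1]], vous avez [[ph:ph2]] nouveaux messages', protectedSource.placeholders));
// Bonjour {0}, vous avez %s nouveaux messages
```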

## HTML inline tags

- When HTML content is detected in cells (e.g., `<div>This is <b>bold</b> text</div>`), inline tags are converted to XLIFF inline elements.
- **XLIFF 2.1**: HTML tags like `<b>bold</b>` become `<pc id="1" dataRef="html_b">bold</pc>`
- **XLIFF 1.2**: HTML tags like `<b>bold</b>` become `<g id="1" ctype="bold">bold</g>`
- Inline elements are properly recognized and protected by Translation Management Systems.
- During merge, XLIFF inline elements are converted back to their original HTML tags.
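
As an illustration of the mappings above, the sketch below converts simple, attribute-free, non-nested `<b>`, `<i>` and `<u>` tags into the two XLIFF forms and back. `htmlToInline` and `inlineToHtml` are hypothetical names; the tool's real converter presumably handles nesting, attributes, and more tags, none of which is attempted here.

```typescript
type XliffVersion = '1.2' | '2.1';

// XLIFF 1.2 ctype values for the three tags handled in this sketch.
const CTYPE_BY_TAG: Record<string, string> = { b: 'bold', i: 'italic', u: 'underlined' };

// Convert <b>/<i>/<u> pairs whose content has no further markup.
function htmlToInline(html: string, version: XliffVersion): string {
  let id = 0;
  return html.replace(/<(b|i|u)>([^<]*)<\/\1>/g, (_match: string, tag: string, inner: string) => {
    id += 1;
    return version === '2.1'
      ? `<pc id="${id}" dataRef="html_${tag}">${inner}</pc>`
      : `<g id="${id}" ctype="${CTYPE_BY_TAG[tag]}">${inner}</g>`;
  });
}

// Reverse direction for the XLIFF 2.1 form only.
function inlineToHtml(xliff: string): string {
  return xliff.replace(/<pc id="\d+" dataRef="html_(b|i|u)">([^<]*)<\/pc>/g, '<$1>$2</$1>');
}

console.log(htmlToInline('This is <b>bold</b> text', '2.1'));
// This is <pc id="1" dataRef="html_b">bold</pc> text
console.log(htmlToInline('This is <b>bold</b> text', '1.2'));
// This is <g id="1" ctype="bold">bold</g> text
```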

## XLIFF notes

If `global.exportComments` is true, XLIFF export includes extra `<note>` entries per unit:

- `category=header` — the header cell text for the source column (`headerRow`).
- `category=metadataRows` — a JSON object of metadata row values for this column.
- `category=comments` — cell notes/comments if `translateComments` is enabled.

These notes help maintain roundtrip context (sheet/row/col are always included as a base note).

## Style preservation

When `preserveStyles` is true:
- A minimal style snapshot (font name/size/bold/italic/color, alignment, fill color) is captured at extract time.
- During merge, the snapshot is reapplied to the target cell. If no snapshot exists, styles are copied from the source cell.

Rich text run-level formatting is not preserved in the MVP; this can be extended in future iterations.

## Config highlights

- Sheet selection via `namePattern`.
- `sourceColumns` and `targetColumns` (locale → column letter). Optional auto-create targets.
- Row/column filtering: `headerRow`, `valuesStartRow`, `skipHiddenRows`, `skipHiddenColumns`, `excludedRows/Columns`.
- Color exclusion via `excludeColors`.
- Formula handling via `extractFormulaResults`.
- Merged regions policy via `treatMergedRegions` (top-left | expand | skip).
- Comments via `translateComments`.
- Notes export via `global.exportComments`.
- Merge fallback via `global.mergeFallback` (default: `source`). When a segment lacks a `<target>`, choose to use its `<source>` or leave it empty (`empty`).
- XLIFF version via `global.xliffVersion` (default: `2.1`). Choose between XLIFF 1.2 and 2.1 output formats.
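
The `global.mergeFallback` rule from the list above amounts to a small decision, sketched here. `resolveTargetText` and the `SegmentLike` shape are hypothetical and only illustrate the documented behaviour; they are not the library's internal types.

```typescript
type MergeFallback = 'source' | 'empty';

interface SegmentLike {
  source: string;
  target?: string; // undefined when the translated file has no <target> for the segment
}

// Pick the text to write into the target cell during merge.
function resolveTargetText(segment: SegmentLike, fallback: MergeFallback = 'source'): string {
  if (segment.target !== undefined) {
    return segment.target;
  }
  return fallback === 'source' ? segment.source : '';
}

console.log(resolveTargetText({ source: 'Save', target: 'Enregistrer' })); // Enregistrer
console.log(resolveTargetText({ source: 'Save' }));                        // Save
console.log(JSON.stringify(resolveTargetText({ source: 'Save' }, 'empty'))); // ""
```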

### Example CLI flows

- Extract per-locale XLIFFs, then merge all at once:

```bash
excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf --per-locale
# Translate out.fr.xlf, out.de.xlf, ... and collect the translated files in ./outdir
excel-l10n merge -c config.yml -i workbook.xlsx -t ./outdir -o workbook.merged.xlsx
```

- Extract a single bilingual XLIFF for French and merge only FR:

```bash
excel-l10n extract -c config.yml -i workbook.xlsx -o out.fr.xlf --target-lang fr
# Edit out.fr.xlf
excel-l10n merge -c config.yml -i workbook.xlsx -t out.fr.xlf -o workbook.fr.xlsx --target-lang fr
```

See `src/config/schema.json` for the full JSON Schema.

## Tests

- Unit tests cover config parsing, SRX segmentation, utilities and more.
- Integration testing can be added to validate end-to-end roundtrips (example scaffold included under `tests/`).

## Status

MVP implementation with SRX segmentation, placeholders, XLIFF notes, and style preservation. Further enhancements planned:
- Rich text run-level formatting support
- Streaming for very large workbooks
- Expanded Okapi option coverage

## Pseudo-translation

Generate fake translations to test UI expansion and encoding.

```bash
# XLSX → XLSX pseudo
excel-l10n pseudo -c config.yml -i workbook.xlsx -o pseudo.xlsx --target-lang fr

# XLIFF → XLIFF pseudo
excel-l10n extract -c config.yml -i workbook.xlsx -o out.xlf
excel-l10n pseudo -t out.xlf -o out.pseudo.xlf --expand 0.3 --wrap "⟦,⟧"
```

Behavior:
- Wrap text with markers (default ⟦ ⟧)
- Expand length by +30% (configurable)
- Replace characters with accented/uncommon variants
- Preserve placeholders: `{0}`, `%s`, `[[ph:ph1]]`
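
The behaviour listed above can be approximated with a short sketch. `pseudoTranslate`, the accent table, and the `~` padding character are assumptions made for illustration; the `pseudo` command's actual character substitutions and expansion strategy may differ.

```typescript
// Small illustrative substitution table (assumption, not the tool's actual mapping).
const ACCENTED: Record<string, string> = {
  a: 'á', e: 'é', i: 'í', o: 'ö', u: 'ü', c: 'ç', n: 'ñ',
  A: 'Å', E: 'É', O: 'Ø', U: 'Ü',
};

// Tokens that must survive untouched: {0}, %s and [[ph:...]] markers.
// The single capturing group makes split() keep the tokens at odd indices.
const PROTECTED_TOKEN = /(\{\d+\}|%s|\[\[ph:[^\]]+\]\])/;

function pseudoTranslate(
  text: string,
  expand: number = 0.3,
  wrap: [string, string] = ['⟦', '⟧'],
): string {
  let translatableLength = 0;
  const body = text
    .split(PROTECTED_TOKEN)
    .map((part, index) => {
      if (index % 2 === 1) return part; // protected token, keep as-is
      translatableLength += part.length;
      return part.replace(/[A-Za-z]/g, (ch) => ACCENTED[ch] ?? ch);
    })
    .join('');
  // Pad by the requested fraction of the translatable text length.
  const padding = '~'.repeat(Math.ceil(translatableLength * expand));
  return `${wrap[0]}${body}${padding}${wrap[1]}`;
}

console.log(pseudoTranslate('Save {0} files'));
// ⟦Sávé {0} fílés~~~~⟧
```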

## Validate translations

Automatically check translated XLIFF/JSON for common issues.

```bash
excel-l10n validate -t translated/ --json --length-factor 2.5
```

Checks include:
- Missing targets
- Placeholder mismatches (`{0}`, `[[ph:ph1]]`)
- Length warnings (ratio > factor)
- ICU categories preserved (plural/select)

Exit codes: `0` means no findings; `1` means at least one finding was reported.
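
A few of these checks are simple enough to sketch. The code below is a hypothetical illustration of the missing-target, placeholder, and length checks for a single source/target pair, with the exit code derived from the findings; the names, the finding shape, and the exact comparison rules are assumptions rather than the `validate` command's implementation.

```typescript
interface Finding {
  kind: 'missing-target' | 'placeholder-mismatch' | 'length';
  message: string;
}

const PLACEHOLDER = /\{\d+\}|%s|\[\[ph:[^\]]+\]\]/g;

// Collect placeholders in a stable order so that two texts can be compared.
function placeholdersOf(text: string): string[] {
  return (text.match(PLACEHOLDER) ?? []).slice().sort();
}

function validatePair(source: string, target: string | undefined, lengthFactor: number = 2.5): Finding[] {
  if (target === undefined || target.trim() === '') {
    return [{ kind: 'missing-target', message: 'Target is missing or empty' }];
  }
  const findings: Finding[] = [];
  if (placeholdersOf(source).join('|') !== placeholdersOf(target).join('|')) {
    findings.push({ kind: 'placeholder-mismatch', message: 'Source and target placeholders differ' });
  }
  if (source.length > 0 && target.length / source.length > lengthFactor) {
    findings.push({ kind: 'length', message: `Target is more than ${lengthFactor}x the source length` });
  }
  return findings;
}

// Exit code convention from above: 0 = OK, 1 = findings.
function exitCodeFor(findings: Finding[]): number {
  return findings.length === 0 ? 0 : 1;
}

console.log(exitCodeFor(validatePair('Hello {0}', 'Bonjour {0}'))); // 0
console.log(exitCodeFor(validatePair('Hello {0}', 'Bonjour')));     // 1
```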

## ICU handling (plural/select)

ICU plural/select blocks are protected during XLIFF export so structure is not accidentally broken. Inner texts are represented with placeholders to preserve logic while still surfacing translatable parts. Validation ensures ICU categories (e.g., one, other) are preserved between source and target.

Example:

```
{count, plural, one {1 file} other {# files}}
```
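
To illustrate the category check for a message like the one above, here is a minimal sketch that lists the top-level categories of the first plural/select block and verifies that every source category also appears in the target. `icuCategories` and `categoriesPreserved` are hypothetical helpers; a production validator would more likely rely on a full ICU MessageFormat parser.

```typescript
// Return the top-level category names (e.g. "one", "other") of the first
// plural/select/selectordinal block found in the message.
function icuCategories(message: string): string[] {
  const header = /\{\s*\w+\s*,\s*(?:plural|select|selectordinal)\s*,/.exec(message);
  if (!header) return [];
  const categories: string[] = [];
  let depth = 0;
  let current = '';
  for (let i = header.index + header[0].length; i < message.length; i++) {
    const ch = message.charAt(i);
    if (ch === '{') {
      // A brace at depth 0 opens a branch; the text before it is the category name.
      if (depth === 0 && current.trim() !== '') categories.push(current.trim());
      current = '';
      depth += 1;
    } else if (ch === '}') {
      if (depth === 0) break; // closing brace of the ICU block itself
      depth -= 1;
    } else if (depth === 0) {
      current += ch;
    }
  }
  return categories;
}

// Assumption: "preserved" means every source category still exists in the target.
function categoriesPreserved(source: string, target: string): boolean {
  const targetCategories = icuCategories(target);
  return icuCategories(source).every((category) => targetCategories.indexOf(category) >= 0);
}

const icuSource = '{count, plural, one {1 file} other {# files}}';
console.log(icuCategories(icuSource));
// [ 'one', 'other' ]
console.log(categoriesPreserved(icuSource, '{count, plural, other {# fichiers}}'));
// false
```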

## Streaming mode (experimental)

For very large workbooks, you can enable streaming extraction.

```bash
excel-l10n extract -c config.yml -i huge.xlsx -o huge.xlf --stream
```

Note: the current release only exposes the `--stream` flag and the corresponding API. A true streaming reader for constant-memory processing is planned for a later version.

## License

Polyform Noncommercial 1.0.0. For personal and non-commercial use only.

Commercial licensing inquiries: kaandemirel@yahoo.com
