[![npm version](https://badge.fury.io/js/web-csv-toolbox.svg)](https://badge.fury.io/js/web-csv-toolbox) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com) ![node version](https://img.shields.io/node/v/web-csv-toolbox) [![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fkamiazya%2Fweb-csv-toolbox.svg?type=shield)](https://app.fossa.com/projects/git%2Bgithub.com%2Fkamiazya%2Fweb-csv-toolbox?ref=badge_shield) ![GitHub code size in bytes](https://img.shields.io/github/languages/code-size/kamiazya/web-csv-toolbox) ![npm](https://img.shields.io/npm/dm/web-csv-toolbox) [![codecov](https://codecov.io/gh/kamiazya/web-csv-toolbox/graph/badge.svg?token=8RbDcXHTFl)](https://codecov.io/gh/kamiazya/web-csv-toolbox) # `🌐 web-csv-toolbox 🧰` A CSV Toolbox utilizing Web Standard APIs. 🔗 [![GitHub](https://img.shields.io/badge/-GitHub-181717?logo=GitHub&style=flat)](https://github.com/kamiazya/web-csv-toolbox) [![npm](https://img.shields.io/badge/-npm-CB3837?logo=npm&style=flat)](https://www.npmjs.com/package/web-csv-toolbox) [![API Reference](https://img.shields.io/badge/-API%20Reference-3178C6?logo=TypeScript&style=flat&logoColor=fff)](https://kamiazya.github.io/web-csv-toolbox/) [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/kamiazya/web-csv-toolbox) [![Sponsor](https://img.shields.io/badge/-GitHub%20Sponsor-fff?logo=GitHub%20Sponsors&style=flat)](https://github.com/sponsors/kamiazya) [![CodSpeed Badge](https://img.shields.io/endpoint?url=https://codspeed.io/badge.json)](https://codspeed.io/kamiazya/web-csv-toolbox) [![format: Biome](https://img.shields.io/badge/format%20with-Biome-F7B911?logo=biome&style=flat)](https://biomejs.dev/) [![test: Vitest](https://img.shields.io/badge/tested%20with-Vitest-6E9F18?logo=vitest&style=flat)](https://vitest.dev/) [![build: Vite](https://img.shields.io/badge/build%20with-Vite-646CFF?logo=vite&style=flat)](https://rollupjs.org/)

--- ## Key Concepts ✨ - 🌐 **Web Standards first.** - Utilizing the Web Standards APIs, such as the [Web Streams API](https://developer.mozilla.org/en/docs/Web/API/Streams_API). - ❤️ **TypeScript friendly & User friendly.** - Fully typed and documented. - 0️⃣ **Zero dependencies.** - Using only Web Standards APIs. - 💪 **Property-based testing.** - Using [fast-check](https://fast-check.dev/) and [vitest](https://vitest.dev). - ✅ **Cross-platform.** - Works on browsers, Node.js, and Deno. ## Key Features 📗 - 🌊 **Efficient CSV Parsing with Streams** - 💻 Leveraging the [WHATWG Streams API](https://streams.spec.whatwg.org/) and other Web APIs for seamless and efficient data processing. - 🛑 **AbortSignal and Timeout Support**: Ensure your CSV processing is cancellable, including support for automatic timeouts. - ✋ Integrate with [`AbortController`](https://developer.mozilla.org/docs/Web/API/AbortController) to manually cancel operations as needed. - ⏳ Use [`AbortSignal.timeout`](https://developer.mozilla.org/docs/Web/API/AbortSignal/timeout_static) to automatically cancel operations that exceed a specified time limit. - 🛡️ **Memory Safety Protection**: Built-in limits prevent memory exhaustion attacks. - 🔒 Configurable maximum buffer size (default: 10M characters) to prevent DoS attacks via unbounded input. - 🚨 Throws `RangeError` when buffer exceeds the limit. - 📊 Configurable maximum field count (default: 100,000 fields/record) to prevent excessive column attacks. - ⚠️ Throws `RangeError` when field count exceeds the limit. - 💾 Configurable maximum binary size (default: 100MB bytes) for BufferSource inputs. - 🛑 Throws `RangeError` when binary size exceeds the limit. - 🎨 **Flexible Source Support** - 🧩 Parse CSVs directly from `string`s, `ReadableStream`s, `Response` objects, `Blob`/`File` objects, or `Request` objects. - ⚙️ **Advanced Parsing Options**: Customize your experience with various delimiters and quotation marks. - 🔄 Defaults to `,` and `"` respectively. - 💾 **Specialized Binary CSV Parsing**: Leverage Stream-based processing for versatility and strength. - 🔄 Flexible BOM handling. - 🗜️ Supports various compression formats. - 🔤 Charset specification for diverse encoding. - 🚀 **Using WebAssembly for High Performance**: WebAssembly is used for high performance parsing. (_Experimental_) - 📦 WebAssembly is used for high performance parsing. - ⚠️ **Experimental**: WASM automatic initialization (base64-embedded) is experimental and may change in future versions. - 📦 **Lightweight and Zero Dependencies**: No external dependencies, only Web Standards APIs. - 📚 **Fully Typed and Documented**: Fully typed and documented with [TypeDoc](https://typedoc.org/). ## Installation 📥 ### With Package manager 📦 This package can then be installed using a package manager. ```sh # Install with npm $ npm install web-csv-toolbox # Or Yarn $ yarn add web-csv-toolbox # Or pnpm $ pnpm add web-csv-toolbox ``` ### From CDN (unpkg.com) 🌐 ```html ``` #### Deno 🦕 You can install and use the package by specifying the following: ```js import { parse } from "npm:web-csv-toolbox"; ``` ## Entry Points 🚪 This library provides two entry points to suit different needs: For a deeper comparison and migration guidance, see: - docs/explanation/main-vs-slim.md ### `web-csv-toolbox` (Default - Full Features) **Best for**: Most users who want automatic WASM initialization and all features ```typescript import { loadWASM, parseStringToArraySyncWASM } from 'web-csv-toolbox'; // Optional but recommended: preload to reduce first‑parse latency await loadWASM(); const records = parseStringToArraySyncWASM(csv); ``` **Characteristics:** - ✅ Full features including synchronous WASM APIs - ✅ Automatic WASM initialization on first use (not at import time) - 💡 Call `loadWASM()` at startup to reduce first‑parse latency (optional) - ⚠️ **Experimental**: WASM auto-init embeds WASM as base64, may change in future - ⚠️ Larger bundle size (WASM embedded in main bundle) ### `web-csv-toolbox/slim` (Slim Entry - Smaller Bundle) **Best for**: Bundle size-sensitive applications and production optimization ```typescript import { loadWASM, parseStringToArraySyncWASM } from 'web-csv-toolbox/slim'; // Manual initialization required await loadWASM(); const records = parseStringToArraySyncWASM(csv); ``` **Characteristics:** - ✅ Smaller main bundle (WASM not embedded) - ✅ External WASM loading for better caching - ✅ Explicit control over initialization timing - ❌ Requires manual `loadWASM()` call before using WASM features **Comparison:** | Aspect | Main | Slim | |--------|------|------| | **Initialization** | Automatic | Manual (`loadWASM()` required) | | **Bundle Size** | Larger (WASM embedded) | Smaller (WASM external) | | **Caching** | Single bundle | WASM cached separately | | **Use Case** | Convenience, prototyping | Production, bundle optimization | > **Note**: Both entry points export the same full API (feature parity). The only difference is WASM initialization strategy and bundle size. ## Usage 📘 > **Note for Bundler Users**: When using Worker-based execution strategies (e.g., `EnginePresets.responsive()`, `EnginePresets.responsiveFast()`) with bundlers like Vite or Webpack, you must explicitly specify the `workerURL` option. See the [Bundler Integration Guide](./docs/how-to-guides/using-with-bundlers.md) for configuration details. ### Parsing CSV files from strings ```js import { parse } from 'web-csv-toolbox'; const csv = `name,age Alice,42 Bob,69`; for await (const record of parse(csv)) { console.log(record); } // Prints: // { name: 'Alice', age: '42' } // { name: 'Bob', age: '69' } ``` ### Parsing CSV files from `ReadableStream`s ```js import { parse } from 'web-csv-toolbox'; const csv = `name,age Alice,42 Bob,69`; const stream = new ReadableStream({ start(controller) { controller.enqueue(csv); controller.close(); }, }); for await (const record of parse(stream)) { console.log(record); } // Prints: // { name: 'Alice', age: '42' } // { name: 'Bob', age: '69' } ``` ### Parsing CSV files from `Response` objects ```js import { parse } from 'web-csv-toolbox'; const response = await fetch('https://example.com/data.csv'); for await (const record of parse(response)) { console.log(record); } // Prints: // { name: 'Alice', age: '42' } // { name: 'Bob', age: '69' } ``` ### Parsing CSV files from `Blob` or `File` objects ```js import { parse } from 'web-csv-toolbox'; // From file input const fileInput = document.querySelector('input[type="file"]'); fileInput.addEventListener('change', async (e) => { const file = e.target.files[0]; for await (const record of parse(file)) { console.log(record); } // Prints: // { name: 'Alice', age: '42' } // { name: 'Bob', age: '69' } }); ``` ### Parsing CSV files from `Request` objects (Server-side) ```js import { parse } from 'web-csv-toolbox'; // Cloudflare Workers / Service Workers export default { async fetch(request) { if (request.method === 'POST') { for await (const record of parse(request)) { console.log(record); } // Prints: // { name: 'Alice', age: '42' } // { name: 'Bob', age: '69' } return new Response('OK', { status: 200 }); } } }; ``` ### Parsing CSV files with different delimiters and quotation characters ```js import { parse } from 'web-csv-toolbox'; const csv = `name\tage Alice\t42 Bob\t69`; for await (const record of parse(csv, { delimiter: '\t' })) { console.log(record); } // Prints: // { name: 'Alice', age: '42' } // { name: 'Bob', age: '69' } ``` ### Parsing CSV files with headers ```js import { parse } from 'web-csv-toolbox'; const csv = `Alice,42 Bob,69`; for await (const record of parse(csv, { header: ['name', 'age'] })) { console.log(record); } // Prints: // { name: 'Alice', age: '42' } // { name: 'Bob', age: '69' } ``` ### Working with Headerless CSV Files Some CSV files don’t include a header row. You can provide custom headers manually: ```typescript import { parse } from 'web-csv-toolbox'; // Example: Sensor data without headers const sensorData = `25.5,60,1024 26.1,58,1020 24.8,62,1025`; // Provide headers explicitly for await (const record of parse(sensorData, { header: ['temperature', 'humidity', 'pressure'] })) { console.log(`Temp: ${record.temperature}°C, Humidity: ${record.humidity}%, Pressure: ${record.pressure} hPa`); } // Output: // Temp: 25.5°C, Humidity: 60%, Pressure: 1024 hPa // Temp: 26.1°C, Humidity: 58%, Pressure: 1020 hPa // Temp: 24.8°C, Humidity: 62%, Pressure: 1025 hPa ``` ### `AbortSignal` / `AbortController` Support Support for [`AbortSignal`](https://developer.mozilla.org/docs/Web/API/AbortSignal) / [`AbortController`](https://developer.mozilla.org/docs/Web/API/AbortController), enabling you to cancel ongoing asynchronous CSV processing tasks. This feature is useful for scenarios where processing needs to be halted, such as when a user navigates away from the page or other conditions that require stopping the task early. #### Example Use Case: Abort with user action ```js import { parse } from 'web-csv-toolbox'; const controller = new AbortController(); const csv = "name,age\nAlice,30\nBob,25"; try { // Parse the CSV data then pass the AbortSignal to the parse function for await (const record of parse(csv, { signal: controller.signal })) { console.log(record); } } catch (error) { if (error instanceof DOMException && error.name === 'AbortError') { // The CSV processing was aborted by the user console.log('CSV processing was aborted by the user.'); } else { // An error occurred during CSV processing console.error('An error occurred:', error); } } // Some abort logic, like a cancel button document.getElementById('cancel-button') .addEventListener('click', () => { controller.abort(); }); ``` #### Example Use Case: Abort with timeout ```js import { parse } from 'web-csv-toolbox'; // Set up a timeout of 5 seconds (5000 milliseconds) const signal = AbortSignal.timeout(5000); const csv = "name,age\nAlice,30\nBob,25"; try { // Pass the AbortSignal to the parse function const result = await parse.toArray(csv, { signal }); console.log(result); } catch (error) { if (error instanceof DOMException && error.name === 'TimeoutError') { // Handle the case where the processing was aborted due to timeout console.log('CSV processing was aborted due to timeout.'); } else { // Handle other errors console.error('An error occurred during CSV processing:', error); } } ``` ## Supported Runtimes 💻 ### Works on Node.js | Versions | Status | | -------- | ------ | | 20.x | ✅ | | 22.x | ✅ | | 24.x | ✅ | > Note: For Node environments, the WASM loader uses `import.meta.resolve`. Node.js 20.6+ is recommended. On older Node versions, pass an explicit URL/Buffer to `loadWASM()`. ### Works on Browser | OS | Chrome | Firefox | Default | | ------- | ------ | ------- | ------------- | | Windows | ✅ | ✅ | ✅ (Edge) | | macOS | ✅ | ✅ | ⬜ (Safari *) | | Linux | ✅ | ✅ | - | > **\* Safari**: Basic functionality is expected to work, but it is not yet automatically tested in our CI environment. ### Others - Verify that JavaScript is executable on the Deno. [![Deno CI](https://github.com/kamiazya/web-csv-toolbox/actions/workflows/deno.yaml/badge.svg)](https://github.com/kamiazya/web-csv-toolbox/actions/workflows/deno.yaml) ### Platform-Specific Usage Guide 📚 For detailed examples and best practices for your specific runtime environment, see: **[Platform-Specific Usage Guide](./docs/how-to-guides/platform-usage/)** This guide covers: - 🌐 **Browser**: File input, drag-and-drop, Clipboard API, FormData, Fetch API - 🟢 **Node.js**: Buffer, fs.ReadStream, HTTP requests, stdin/stdout - 🦕 **Deno**: Deno.readFile, Deno.open, fetch API - ⚡ **Edge**: Cloudflare Workers, Deno Deploy, Vercel Edge Functions - 🐰 **Bun**: File API, HTTP server ## APIs 🧑‍💻 ### High-level APIs 🚀 These APIs are designed for **Simplicity and Ease of Use**, providing an intuitive and straightforward experience for users. - **`function parse(input[, options]): AsyncIterableIterator`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parse-1.html) - Parses various CSV input formats into an asynchronous iterable of records. - **`function parse.toArray(input[, options]): Promise`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parse.toArray.html) - Parses CSV input into an array of records, ideal for smaller data sets. The `input` paramater can be: - a `string` - a [ReadableStream](https://developer.mozilla.org/docs/Web/API/ReadableStream) of `string`s or [Uint8Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array)s - a [BufferSource](https://webidl.spec.whatwg.org/#BufferSource) ([Uint8Array](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array), [ArrayBuffer](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer), or other [TypedArray](https://developer.mozilla.org/docs/Web/JavaScript/Reference/Global_Objects/TypedArray)) - a [Response](https://developer.mozilla.org/docs/Web/API/Response) object - a [Blob](https://developer.mozilla.org/docs/Web/API/Blob) or [File](https://developer.mozilla.org/docs/Web/API/File) object - a [Request](https://developer.mozilla.org/docs/Web/API/Request) object (server-side) ### Middle-level APIs 🧱 These APIs are optimized for **Enhanced Performance and Control**, catering to users who need more detailed and fine-tuned functionality. - **`function parseString(string[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseString-1.html) - Efficient parsing of CSV strings. - **`function parseBinary(buffer[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseBinary-1.html) - Parse CSV binary data from BufferSource (Uint8Array, ArrayBuffer, or other TypedArray). - **`function parseResponse(response[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseResponse-1.html) - Customized parsing directly from `Response` objects. - **`function parseRequest(request[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseRequest-1.html) - Server-side parsing from `Request` objects (Cloudflare Workers, Service Workers, etc.). - **`function parseBlob(blob[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseBlob-1.html) - Parse CSV data from `Blob` or `File` objects. - **`function parseFile(file[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseFile-1.html) - Parse `File` objects with automatic filename tracking in error messages. - **`function parseStream(stream[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseStream-1.html) - Stream-based parsing for larger or continuous data. - **`function parseStringStream(stream[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseStringStream-1.html) - Combines string-based parsing with stream processing. - **`function parseBinaryStream(stream[, options])`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseBinaryStream-1.html) - Parses binary streams with precise control over data types. ### Low-level APIs ⚙️ These APIs are built for **Advanced Customization and Pipeline Design**, ideal for developers looking for in-depth control and flexibility. The low-level APIs follow a 3-tier architecture: #### Parser Models (Tier 1: Simplified Composition) Combines Lexer and Assembler for streamlined usage without sacrificing flexibility. - **`function createStringCSVParser(options?)`** - Factory function for creating format-specific CSV parsers. - Returns `FlexibleStringObjectCSVParser` (default) or `FlexibleStringArrayCSVParser` based on `outputFormat` option. - Parses CSV strings by composing `FlexibleStringCSVLexer` and CSV Record Assembler. - Stateful parser maintains internal lexer and assembler instances for streaming. - Use with `StringCSVParserStream` for streaming workflows. - **Low-level API**: Accepts `CSVProcessingOptions` only (no `engine` option). - **Streaming mode**: When using `parse(chunk, { stream: true })`, you must call `parse()` without arguments at the end to flush any remaining data. ```typescript // Object format (default) const objectParser = createStringCSVParser({ header: ['name', 'age'] }); // Array format const arrayParser = createStringCSVParser({ header: ['name', 'age'], outputFormat: 'array' }); // Process chunks const records1 = objectParser.parse('Alice,30\nBob,', { stream: true }); const records2 = objectParser.parse('25\nCharlie,', { stream: true }); // Flush remaining data (required!) const records3 = objectParser.parse(); ``` - **Direct class usage**: - `FlexibleStringObjectCSVParser` - Always outputs object records - `FlexibleStringArrayCSVParser` - Always outputs array records - **`function createBinaryCSVParser(options?)`** - Factory function for creating format-specific binary CSV parsers. - Returns `FlexibleBinaryObjectCSVParser` (default) or `FlexibleBinaryArrayCSVParser` based on `outputFormat` option. - Parses binary CSV data (BufferSource: Uint8Array, ArrayBuffer, or other TypedArray) by composing `TextDecoder` with string CSV parser. - Uses `TextDecoder` with `stream: true` option for proper multi-byte character handling across chunk boundaries. - Supports various character encodings (utf-8, shift_jis, etc.) via `charset` option. - BOM handling via `ignoreBOM` option, fatal error mode via `fatal` option. - Use with `BinaryCSVParserStream` for streaming workflows. - **Low-level API**: Accepts `BinaryCSVProcessingOptions` only (no `engine` option). - **Streaming mode**: When using `parse(chunk, { stream: true })`, you must call `parse()` without arguments at the end to flush TextDecoder and parser buffers. ```typescript // Object format (default) const objectParser = createBinaryCSVParser({ header: ['name', 'age'], charset: 'utf-8' }); // Array format const arrayParser = createBinaryCSVParser({ header: ['name', 'age'], outputFormat: 'array', charset: 'utf-8' }); const encoder = new TextEncoder(); // Process chunks const records1 = objectParser.parse(encoder.encode('Alice,30\nBob,'), { stream: true }); const records2 = objectParser.parse(encoder.encode('25\n'), { stream: true }); // Flush remaining data (required!) const records3 = objectParser.parse(); ``` - **Direct class usage**: - `FlexibleBinaryObjectCSVParser` - Always outputs object records - `FlexibleBinaryArrayCSVParser` - Always outputs array records #### Lexer (Tier 2: Stage 1 - Tokenization) Low-level tokenization with full control over CSV syntax. - **`function createStringCSVLexer(options?)` / `class FlexibleStringCSVLexer`** - Factory helper plus underlying class for the standalone lexer used across the toolkit. - Configure delimiters, quotation, buffer limits, and cancellation per stream. - Returns tokens (field values, row delimiters, etc.) for manual processing. #### Assembler (Tier 2: Stage 2 - Record Assembly) Converts tokens into structured records with flexible formatting. - **`function createCSVRecordAssembler(options)`** - Factory that returns either an object- or array-format assembler based on `outputFormat`. - Applies new options like `includeHeader` and `columnCountStrategy` consistently across environments. - **`class FlexibleCSVObjectRecordAssembler` / `class FlexibleCSVArrayRecordAssembler`** - Specialized assemblers when you need full control over object vs tuple output or want to extend behavior. - `FlexibleCSVRecordAssembler` remains for backward compatibility but now delegates to these focused implementations. #### Streaming Transformers (Tier 3: TransformStream Integration) Web Streams API integration for all processing tiers. - **`class StringCSVParserStream`** - `TransformStream` for streaming string parsing. - Wraps Parser instances (accepts parser in constructor, doesn't construct internally). - Configurable backpressure handling via `backpressureCheckInterval` option. - Custom queuing strategies support for fine-tuned performance. - **`class BinaryCSVParserStream`** - `TransformStream` for streaming binary parsing. - Handles UTF-8 multi-byte characters across chunk boundaries. - Integration-ready for fetch API and file streaming. - Backpressure management with configurable check intervals. - **`createStringCSVLexerTransformer()`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/createStringCSVLexerTransformer.html) - Factory function to create a StringCSVLexerTransformer with customizable queuing strategies. - **`createCSVRecordAssemblerTransformer()`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/createCSVRecordAssemblerTransformer.html) - Factory function to create a CSVRecordAssemblerTransformer with cooperative backpressure support. These factory functions are the recommended way to create transformer instances. They encapsulate internal lexer/assembler initialization and provide sensible defaults, insulating your code from internal implementation changes. Direct class instantiation (`new StringCSVLexerTransformer(customLexer)`) is only needed when injecting a custom lexer implementation—see [Custom Lexer/Assembler](./docs/how-to-guides/custom-csv-parser.md#low-level-custom-lexerassembler) for advanced use cases. #### Customizing Queuing Strategies Both `createStringCSVLexerTransformer()` and `createCSVRecordAssemblerTransformer()` support custom queuing strategies following the Web Streams API pattern. Strategies are passed as function arguments with **data-type-aware size counting** and **configurable backpressure handling**. **Function signature:** ```typescript createStringCSVLexerTransformer(options?, streamOptions?, writableStrategy?, readableStrategy?) createCSVRecordAssemblerTransformer(options?, streamOptions?, writableStrategy?, readableStrategy?) ``` **Default queuing strategies (starting points, not benchmarked):** ```typescript // StringCSVLexerTransformer defaults createStringCSVLexerTransformer( { delimiter: ',' }, // CSV options { backpressureCheckInterval: 100 }, // Check every 100 tokens { highWaterMark: 65536, // 64KB of characters size: (chunk) => chunk.length, // Count by string length }, new CountQueuingStrategy({ highWaterMark: 1024 }) // 1024 tokens ) // CSVRecordAssemblerTransformer defaults createCSVRecordAssemblerTransformer( { header: ['name', 'age'] }, // Assembler options { backpressureCheckInterval: 10 }, // Check every 10 records new CountQueuingStrategy({ highWaterMark: 1024 }), // 1024 tokens new CountQueuingStrategy({ highWaterMark: 256 }) // 256 records ) ``` **Key Features:** 🎯 **Smart Size Counting:** - Character-based counting for string inputs (accurate memory tracking) - Token-based counting between transformers (smooth pipeline flow) - Record-based counting for output (intuitive and predictable) ⚡ **Cooperative Backpressure:** - Monitors `controller.desiredSize` during processing - Yields to event loop when backpressure detected - Prevents blocking the main thread - Critical for browser UI responsiveness 🔧 **Tunable Backpressure Check Interval:** - `backpressureCheckInterval` (in options): How often to check for backpressure (count-based) - Lower values (5-25): More responsive, slight overhead - Higher values (100-500): Less overhead, slower response - Customize based on downstream consumer speed > ⚠️ **Important**: These defaults are theoretical starting points based on data flow characteristics, **not empirical benchmarks**. Optimal values vary by runtime (browser/Node.js/Deno), file size, memory constraints, and CPU performance. **Profile your specific use case** to find the best values. **When to customize:** - 🚀 **High-throughput servers**: Higher `highWaterMark` (128KB+, 2048+ tokens), higher `backpressureCheckInterval` (200-500) - 📱 **Memory-constrained environments**: Lower `highWaterMark` (16KB, 256 tokens), lower `backpressureCheckInterval` (10-25) - 🐌 **Slow consumers** (DB writes, API calls): Lower `highWaterMark`, lower `backpressureCheckInterval` for responsive backpressure - 🏃 **Fast processing**: Higher values to reduce overhead **Example - High-throughput server:** ```typescript import { createStringCSVLexerTransformer, createCSVRecordAssemblerTransformer } from 'web-csv-toolbox'; const response = await fetch('large-dataset.csv'); await response.body .pipeThrough(new TextDecoderStream()) .pipeThrough(createStringCSVLexerTransformer( { delimiter: ',' }, { backpressureCheckInterval: 200 }, // Less frequent checks { highWaterMark: 131072, // 128KB size: (chunk) => chunk.length, }, new CountQueuingStrategy({ highWaterMark: 2048 }) // 2048 tokens )) .pipeThrough(createCSVRecordAssemblerTransformer( {}, // Use default assembler options { backpressureCheckInterval: 20 }, // Less frequent checks new CountQueuingStrategy({ highWaterMark: 2048 }), // 2048 tokens new CountQueuingStrategy({ highWaterMark: 512 }) // 512 records )) .pipeTo(yourRecordProcessor); ``` **Example - Slow consumer (API writes):** ```typescript import { createStringCSVLexerTransformer, createCSVRecordAssemblerTransformer } from 'web-csv-toolbox'; await csvStream .pipeThrough(createStringCSVLexerTransformer()) // Use defaults .pipeThrough(createCSVRecordAssemblerTransformer( {}, // Use default assembler options { backpressureCheckInterval: 2 }, // Very responsive new CountQueuingStrategy({ highWaterMark: 512 }), new CountQueuingStrategy({ highWaterMark: 64 }) )) .pipeTo(new WritableStream({ async write(record) { await fetch('/api/save', { method: 'POST', body: JSON.stringify(record) }); } })); ``` **Benchmarking:** Use the provided benchmark tool to find optimal values for your use case: ```bash pnpm --filter web-csv-toolbox-benchmark queuing-strategy ``` See `benchmark/queuing-strategy.bench.ts` for implementation details. ### Experimental APIs 🧪 These APIs are experimental and may change in the future. #### Parsing using WebAssembly for high performance. You can use WebAssembly to parse CSV data for high performance. ⚠️ **Experimental Notice**: - WASM automatic initialization is experimental and may change in future versions - Currently embeds WASM as base64 in the main bundle - Future versions may change the loading strategy for better bundle size optimization **WASM Limitations:** - Parsing with WebAssembly is faster than parsing with JavaScript, but it takes time to load the WebAssembly module. - Supports only UTF-8 encoding csv data. - Quotation characters are only `"`. (Double quotation mark) - If you pass a different character, it will throw an error. - Record output is always object-shaped; `outputFormat: 'array'` requires the JavaScript engine (`engine: { wasm: false }`). ```ts import { loadWASM, parseStringToArraySyncWASM } from "web-csv-toolbox"; // load WebAssembly module await loadWASM(); const csv = "a,b,c\n1,2,3"; // parse CSV string const result = parseStringToArraySyncWASM(csv); console.log(result); // Prints: // [{ a: "1", b: "2", c: "3" }] ``` - **`function loadWASM(): Promise`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/loadWASM.html) - Loads the WebAssembly module. - **`function parseStringToArraySyncWASM(string[, options]): CSVRecord[]`**: [📑](https://kamiazya.github.io/web-csv-toolbox/functions/parseStringToArraySyncWASM.html) - Parses CSV strings into an array of records. ## Options Configuration 🛠️ ### Common Options ⚙️ | Option | Description | Default | Notes | | ---------------- | ------------------------------------- | ------------ | ---------------------------------------------------------------------------------- | | `delimiter` | Character to separate fields | `,` | | | `quotation` | Character used for quoting fields | `"` | | | `maxBufferSize` | Maximum internal buffer size (characters) | `10 * 1024 * 1024` | Set to `Number.POSITIVE_INFINITY` to disable (not recommended for untrusted input). Measured in UTF-16 code units. | | `maxFieldCount` | Maximum fields allowed per record | `100000` | Set to `Number.POSITIVE_INFINITY` to disable (not recommended for untrusted input) | | `header` | Custom headers for the parsed records | First row | If not provided, the first row is used as headers | | `outputFormat` | Record shape (`'object'` or `'array'`) | `'object'` | `'array'` returns type-safe tuples; not available when running through WASM today | | `includeHeader` | Emit header row when using array output | `false` | Only valid with `outputFormat: 'array'` — the header becomes the first emitted record | | `columnCountStrategy` | Handle column-count mismatches when a header is provided | `'keep'` for array format / `'pad'` for object format | Choose between `keep`, `pad`, `strict`, or `truncate` to control how rows align with the header | | `signal` | AbortSignal to cancel processing | `undefined` | Allows aborting of long-running operations | #### Record Output Formats High-level and mid-level parsers now let you choose whether records come back as objects (default) or as tuple-like arrays: ```ts const header = ["name", "age"]; // Object output (default) for await (const record of parse(csv, { header })) { record.name; // string } // Array output with named tuples const rows = await parse.toArray(csv, { header, outputFormat: "array", includeHeader: true, columnCountStrategy: "pad", engine: { wasm: false }, // Array output currently runs on the JS engine only }); // rows[0] === ['name', 'age'] (header row) // rows[1] has type readonly [name: string, age: string] ``` - `outputFormat: 'object'` (default) returns familiar `{ column: value }` objects. - `outputFormat: 'array'` returns readonly tuples whose indices inherit names from the header for stronger TypeScript inference. - `includeHeader: true` prepends the header row when you also set `outputFormat: 'array'`. - `columnCountStrategy` controls how rows with too many or too few columns are treated when a header is present: - `keep`: emit rows exactly as they appear (default for array output with inferred headers) - `pad`: fill short rows with `undefined` and truncate long rows (default for object output) - `strict`: throw if the row length differs from the header - `truncate`: discard columns beyond the header length without padding short rows > ⚠️ Array output is not yet available inside the WebAssembly execution path. If you request `outputFormat: 'array'`, force the JavaScript engine with `engine: { wasm: false }` (or run in an environment where WASM is disabled). ### Advanced Options (Binary-Specific) 🧬 | Option | Description | Default | Notes | | --------------------------------- | ------------------------------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------- | | `charset` | Character encoding for binary CSV inputs | `utf-8` | See [Encoding API Compatibility](https://developer.mozilla.org/en-US/docs/Web/API/Encoding_API/Encodings) for the encoding formats that can be specified. | | `maxBinarySize` | Maximum binary size for BufferSource inputs (bytes) | `100 * 1024 * 1024` (100MB) | Set to `Number.POSITIVE_INFINITY` to disable (not recommended for untrusted input) | | `decompression` | Decompression algorithm for compressed CSV inputs | | See [DecompressionStream Compatibility](https://developer.mozilla.org/en-US/docs/Web/API/DecompressionStream#browser_compatibility). Default support: gzip, deflate. deflate-raw is runtime-dependent and experimental (requires `allowExperimentalCompressions: true` for Response/Request inputs). | | `ignoreBOM` | Whether to ignore Byte Order Mark (BOM) | `false` | See [TextDecoderOptions.ignoreBOM](https://developer.mozilla.org/en-US/docs/Web/API/TextDecoderStream/ignoreBOM) for more information about the BOM. | | `fatal` | Throw an error on invalid characters | `false` | See [TextDecoderOptions.fatal](https://developer.mozilla.org/en-US/docs/Web/API/TextDecoderStream/fatal) for more information. | | `allowExperimentalCompressions` | Allow experimental/future compression formats | `false` | When enabled, passes unknown compression formats to runtime. Use cautiously. See example below. | ## Performance & Best Practices ⚡ ### Memory Characteristics web-csv-toolbox uses different memory patterns depending on the API you choose: #### 🌊 Streaming APIs (Memory Efficient) ##### Recommended for large files (> 10MB) ```js import { parse } from 'web-csv-toolbox'; // ✅ Memory efficient: processes one record at a time const response = await fetch('https://example.com/large-data.csv'); for await (const record of parse(response)) { console.log(record); // Memory footprint: ~few KB per iteration } ``` - **Memory usage**: O(1) - constant per record - **Suitable for**: Files of any size, browser environments - **Max file size**: Limited only by available storage/network #### 📦 Array-Based APIs (Memory Intensive) ##### Recommended for small files (< 1MB) ```js import { parse } from 'web-csv-toolbox'; // ⚠️ Loads entire result into memory const csv = await fetch('data.csv').then(r => r.text()); const records = await parse.toArray(csv); // Memory footprint: entire file + parsed array ``` - **Memory usage**: O(n) - proportional to file size - **Suitable for**: Small datasets, quick prototyping - **Recommended max**: ~10MB (browser), ~100MB (Node.js) ### Platform-Specific Considerations | Platform | Streaming | Array-Based | Notes | |----------|-----------|-------------|-------| | **Browser** | Any size | < 10MB | Browser heap limits apply (~100MB-4GB depending on browser) | | **Node.js** | Any size | < 100MB | Use `--max-old-space-size` flag for larger heaps | | **Deno** | Any size | < 100MB | Similar to Node.js | ### Performance Tips #### 1. Use streaming for large files ```js import { parse } from 'web-csv-toolbox'; const response = await fetch('https://example.com/large-data.csv'); // ✅ Good: Streaming approach (constant memory usage) for await (const record of parse(response)) { // Process each record immediately console.log(record); // Memory footprint: O(1) - only one record in memory at a time } // ❌ Avoid: Loading entire file into memory first const response2 = await fetch('https://example.com/large-data.csv'); const text = await response2.text(); // Loads entire file into memory const records = await parse.toArray(text); // Loads all records into memory for (const record of records) { console.log(record); // Memory footprint: O(n) - entire file + all records in memory } ``` #### 2. Enable AbortSignal for timeout protection ```js import { parse } from 'web-csv-toolbox'; // Set up a timeout of 30 seconds (30000 milliseconds) const signal = AbortSignal.timeout(30000); const response = await fetch('https://example.com/large-data.csv'); try { for await (const record of parse(response, { signal })) { // Process each record console.log(record); } } catch (error) { if (error instanceof DOMException && error.name === 'TimeoutError') { // Handle timeout console.log('CSV processing was aborted due to timeout.'); } else { // Handle other errors console.error('An error occurred during CSV processing:', error); } } ``` #### 3. Use WebAssembly parser for CPU-intensive workloads (Experimental) ```js import { parseStringToArraySyncWASM } from 'web-csv-toolbox'; // Compiled WASM code for improved performance (UTF-8 only) // See CodSpeed benchmarks for actual performance metrics const records = parseStringToArraySyncWASM(csvString); ``` ### Known Limitations - **Delimiter/Quotation**: Must be a single character (multi-character delimiters not supported) - **WASM Parser**: UTF-8 encoding only, double-quote (`"`) only - **Streaming**: Best performance with chunk sizes > 1KB ### Security Considerations For production use with untrusted input, consider: - Setting timeouts using `AbortSignal.timeout()` to prevent resource exhaustion - Using `maxBinarySize` option to limit BufferSource inputs (default: 100MB bytes) - Using `maxBufferSize` option to limit internal buffer size (default: 10M characters) - Using `maxFieldCount` option to limit fields per record (default: 100,000) - Implementing additional file size limits at the application level - Validating parsed data before use #### Implementing Size Limits for Untrusted Sources When processing CSV files from untrusted sources (especially compressed files), you can implement size limits using a custom TransformStream: ```js import { parse } from 'web-csv-toolbox'; // Create a size-limiting TransformStream class SizeLimitStream extends TransformStream { constructor(maxBytes) { let bytesRead = 0; super({ transform(chunk, controller) { bytesRead += chunk.length; if (bytesRead > maxBytes) { controller.error(new Error(`Size limit exceeded: ${maxBytes} bytes`)); } else { controller.enqueue(chunk); } } }); } } // Example: Limit decompressed data to 10MB const response = await fetch('https://untrusted-source.com/data.csv.gz'); const limitedStream = response.body .pipeThrough(new DecompressionStream('gzip')) .pipeThrough(new SizeLimitStream(10 * 1024 * 1024)); // 10MB limit try { for await (const record of parse(limitedStream)) { console.log(record); } } catch (error) { if (error.message.includes('Size limit exceeded')) { console.error('File too large - possible compression bomb attack'); } } ``` **Note**: The library automatically validates Content-Encoding headers when parsing Response objects, rejecting unsupported compression formats. #### Using Experimental Compression Formats By default, the library only supports well-tested compression formats: `gzip` and `deflate`. Some runtimes may support additional formats like `deflate-raw` or Brotli, but these are runtime-dependent and not guaranteed. If you need to use these formats, you can enable experimental mode: ```js import { parse } from 'web-csv-toolbox'; // ✅ Default behavior: Only known formats const response = await fetch('data.csv.gz'); await parse(response); // Works // ⚠️ Experimental: Allow future formats const response2 = await fetch('data.csv.br'); // Brotli compression try { await parse(response2, { allowExperimentalCompressions: true }); // Works if runtime supports Brotli } catch (error) { // Runtime will throw if format is unsupported console.error('Runtime does not support this compression format'); } ``` **When to use this:** - Your runtime supports a newer compression format (e.g., Brotli in modern browsers) - You want to use the format before this library explicitly supports it - You trust the compression format source **Cautions:** - Error messages will come from the runtime, not this library - No library-level validation for unknown formats - You must verify your runtime supports the format ## How to Contribute 💪 ## Star ⭐ The easiest way to contribute is to use the library and star the [repository](https://github.com/kamiazya/web-csv-toolbox/). ### Questions 💭 Feel free to ask questions on [GitHub Discussions](https://github.com/kamiazya/web-csv-toolbox/discussions). ### Report bugs / request additional features 💡 Please create an issue at [GitHub Issues](https://github.com/kamiazya/web-csv-toolbox/issues/new/choose). ### Financial Support 💸 Please support [kamiazya](https://github.com/sponsors/kamiazya). > Even just a dollar is enough motivation to develop 😊 ## License ⚖️ This software is released under the MIT License, see [LICENSE](https://github.com/kamiazya/web-csv-toolbox/blob/main/LICENSE). [![FOSSA Status](https://app.fossa.com/api/projects/git%2Bgithub.com%2Fkamiazya%2Fweb-csv-toolbox.svg?type=large)](https://app.fossa.com/projects/git%2Bgithub.com%2Fkamiazya%2Fweb-csv-toolbox?ref=badge_large)