# asyncLLM

[![npm version](https://img.shields.io/npm/v/asyncllm.svg)](https://www.npmjs.com/package/asyncllm)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![bundle size](https://img.shields.io/bundlephobia/minzip/asyncllm)](https://bundlephobia.com/package/asyncllm)

Fetch LLM responses across multiple providers as an async iterable.

- 🚀 Lightweight (~2KB) and dependency-free
- 🔄 Works with multiple LLM providers (OpenAI, Anthropic, Gemini, and more)
- 🌐 Browser and Node.js compatible
- 📦 Easy to use with ES modules

## Installation

Add this to your script:

```js
import { asyncLLM } from "asyncllm";
```

To use via CDN, add this to your HTML file:

```html
```

To use locally, install via `npm`:

```bash
npm install asyncllm
```

... and add this to your HTML file:

```html
```

## Usage

### Streaming

Call `asyncLLM()` just like you would use `fetch` with any LLM provider with streaming responses.

- [OpenAI Chat Completion Streaming](https://platform.openai.com/docs/api-reference/chat-streaming). Many providers including [Anthropic](https://docs.anthropic.com/en/api/openai-sdk), [Gemini](https://ai.google.dev/gemini-api/docs/openai), [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/start/openai), [OpenRouter](https://openrouter.ai/docs/quickstart), [Groq](https://console.groq.com/docs/api-reference#chat-create), [Cerebras](https://inference-docs.cerebras.ai/resources/openai), [Azure](https://learn.microsoft.com/en-us/azure/ai-foundry/openai/reference), etc. follow the OpenAI Chat Completion API.
- [OpenAI Responses API Streaming](https://platform.openai.com/docs/api-reference/responses-streaming).
- [Anthropic Streaming](https://docs.anthropic.com/en/api/messages-streaming)
- [Gemini Streaming](https://ai.google.dev/gemini-api/docs/text-generation?lang=rest#generate-a-text-stream)
- [Gemini Interactions API Streaming](https://ai.google.dev/gemini-api/docs/interactions)

The result is an async generator that yields objects with `content`, `error`, `tools`, and `message` properties.

For example, to update the DOM with the LLM's response:

```javascript
import { asyncLLM } from "https://cdn.jsdelivr.net/npm/asyncllm@2";

const body = {
  model: "gpt-5-nano",
  // You MUST enable streaming, else the API will return an {error}
  stream: true,
  messages: [{ role: "user", content: "Hello, world!" }],
};

// apiKey holds your OpenAI API key
for await (const { content, error } of asyncLLM("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
  body: JSON.stringify(body),
})) {
  if (content) document.getElementById("output").textContent = content;
}
```

Each iteration yields an object like the ones below. `content` holds the full text received so far, not just the latest chunk:

```js
{ content: "", message: { "id": "chatcmpl-...", ...} }
{ content: "Hello", message: { "id": "chatcmpl-...", ...} }
{ content: "Hello!", message: { "id": "chatcmpl-...", ...} }
{ content: "Hello! How", message: { "id": "chatcmpl-...", ...} }
...
{ content: "Hello! How can I assist you today?", message: { "id": "chatcmpl-...", ...} }
```

### Anthropic and Gemini Adapters

Adapters convert OpenAI chat completions request bodies to the [Anthropic](https://docs.anthropic.com/en/api/messages) or [Gemini](https://ai.google.dev/gemini-api/docs/text-generation?lang=rest) formats. For example:

```javascript
import { anthropic } from "https://cdn.jsdelivr.net/npm/asyncllm@2/dist/anthropic.js";
import { gemini } from "https://cdn.jsdelivr.net/npm/asyncllm@2/dist/gemini.js";

// Create an OpenAI chat completions request
const body = {
  messages: [{ role: "user", content: "Hello, world!" }],
  temperature: 0.5,
};

// Fetch request with the Anthropic API
const anthropicResponse = await fetch("https://api.anthropic.com/v1/messages", {
  method: "POST",
  // The Anthropic API requires an anthropic-version header
  headers: { "Content-Type": "application/json", "x-api-key": "YOUR_API_KEY", "anthropic-version": "2023-06-01" },
  // anthropic() converts the OpenAI chat completions request to Anthropic's format
  body: JSON.stringify(anthropic({ ...body, model: "claude-3-haiku-20240307" })),
}).then((r) => r.json());

// Fetch request with the Gemini API
const geminiResponse = await fetch(
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-8b:generateContent",
  {
    method: "POST",
    // Gemini API keys are sent in the x-goog-api-key header, as in the streaming examples
    headers: { "Content-Type": "application/json", "x-goog-api-key": "YOUR_API_KEY" },
    // gemini() converts the OpenAI chat completions request to Gemini's format
    body: JSON.stringify(gemini(body)),
  },
).then((r) => r.json());
```

Here are the OpenAI parameters each adapter supports:

| OpenAI Parameter                    | Anthropic | Gemini |
| ----------------------------------- | --------- | ------ |
| messages                            | Y         | Y      |
| system message                      | Y         | Y      |
| temperature                         | Y         | Y      |
| max_tokens                          | Y         | Y      |
| top_p                               | Y         | Y      |
| stop sequences                      | Y         | Y      |
| stream                              | Y         | Y      |
| presence_penalty                    |           | Y      |
| frequency_penalty                   |           | Y      |
| logprobs                            |           | Y      |
| top_logprobs                        |           | Y      |
| n (multiple candidates)             |           | Y      |
| metadata.user_id                    | Y         |        |
| tools/functions                     | Y         | Y      |
| tool_choice                         | Y         | Y      |
| parallel_tool_calls                 | Y         |        |
| response_format.type: "json_object" |           | Y      |
| response_format.type: "json_schema" |           | Y      |

Content types:

| OpenAI | Anthropic | Gemini |
| ------ | --------- | ------ |
| Text   | Y         | Y      |
| Images | Y         | Y      |
| Audio  |           | Y      |

Image sources:

| OpenAI        | Anthropic | Gemini |
| ------------- | --------- | ------ |
| Data URI      | Y         | Y      |
| External URLs |           | Y      |

### OpenAI Responses API streaming

```javascript
import { asyncLLM } from "https://cdn.jsdelivr.net/npm/asyncllm@2";

const body = {
  model: "gpt-5-mini",
  // You MUST enable streaming, else the API will return an {error}
  stream: true,
  input: "Hello, world!",
};

for await (const data of asyncLLM("https://api.openai.com/v1/responses", {
  method: "POST",
  headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
  body: JSON.stringify(body),
})) {
  console.log(data);
}
```

This will log something like this on the console:

```js
{ content: "Hello", message: { "item_id": "msg_...", ...} }
{ content: "Hello!", message: { "item_id": "msg_...", ...} }
{ content: "Hello! How", message: { "item_id": "msg_...", ...} }
...
{ content: "Hello! How can I assist you today?", message: { "item_id": "msg_...", ...} }
```

### Anthropic streaming

The package includes an Anthropic adapter that converts OpenAI chat completions requests to Anthropic's format, allowing you to use the same code structure across providers.

```javascript
import { asyncLLM } from "https://cdn.jsdelivr.net/npm/asyncllm@2";
import { anthropic } from "https://cdn.jsdelivr.net/npm/asyncllm@2/dist/anthropic.js";

// Option 1: use the anthropic() adapter to convert an OpenAI chat completions request to Anthropic's format.
const adaptedBody = anthropic({
  // Same as OpenAI example above
});

// Option 2: write the request body in Anthropic's own format and call asyncLLM() directly.
const body = {
  model: "claude-3-haiku-20240307",
  // You MUST enable streaming, else the API will return an {error}
  stream: true,
  max_tokens: 10,
  messages: [{ role: "user", content: "What is 2 + 2" }],
};

// Send either adaptedBody or body
for await (const data of asyncLLM("https://api.anthropic.com/v1/messages", {
  method: "POST",
  // The Anthropic API requires an anthropic-version header
  headers: { "Content-Type": "application/json", "x-api-key": apiKey, "anthropic-version": "2023-06-01" },
  body: JSON.stringify(body),
})) {
  console.log(data);
}
```

### Gemini streaming

The package includes a Gemini adapter that converts OpenAI chat completions requests to Gemini's format, allowing you to use the same code structure across providers.
```javascript
import { asyncLLM } from "https://cdn.jsdelivr.net/npm/asyncllm@2";
import { gemini } from "https://cdn.jsdelivr.net/npm/asyncllm@2/dist/gemini.js";

// Option 1: use the gemini() adapter to convert an OpenAI chat completions request to Gemini's format.
const adaptedBody = gemini({
  // Same as OpenAI example above
});

// Option 2: write the request body in Gemini's own format and call asyncLLM() directly.
const body = {
  contents: [{ role: "user", parts: [{ text: "What is 2+2?" }] }],
};

// Send either adaptedBody or body
for await (const data of asyncLLM(
  // You MUST use a streaming endpoint, else the API will return an {error}
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-8b:streamGenerateContent?alt=sse",
  {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-goog-api-key": apiKey,
    },
    body: JSON.stringify(body),
  },
)) {
  console.log(data);
}
```

### Gemini Interactions API streaming

Gemini also supports streaming via the [Interactions API](https://ai.google.dev/gemini-api/docs/interactions), which is similar to OpenAI's Responses API and supports tools/function calls and server-side state.

```javascript
import { asyncLLM } from "https://cdn.jsdelivr.net/npm/asyncllm@2";

const body = {
  model: "gemini-2.5-flash",
  stream: true,
  input: "Say exactly: OK",
};

for await (const data of asyncLLM("https://generativelanguage.googleapis.com/v1beta/interactions?alt=sse", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-goog-api-key": apiKey,
  },
  body: JSON.stringify(body),
})) {
  console.log(data);
}
```

### Function Calling

asyncLLM supports function calling (aka tools).
Here's an example with OpenAI chat completions:

```javascript
for await (const { tools } of asyncLLM("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
  },
  body: JSON.stringify({
    model: "gpt-5-nano",
    stream: true,
    messages: [
      { role: "system", content: "Get delivery date for order" },
      { role: "user", content: "Order ID: 123456" },
    ],
    tool_choice: "required",
    tools: [
      {
        type: "function",
        function: {
          name: "get_delivery_date",
          parameters: { type: "object", properties: { order_id: { type: "string" } }, required: ["order_id"] },
        },
      },
    ],
  }),
})) {
  console.log(JSON.stringify(tools));
}
```

`tools` is an array of objects with `name`, `id` (for OpenAI/Anthropic, and Gemini Interactions), and `args` properties. It streams like this:

```json
[{"name":"get_delivery_date","id":"call_F8YHCjnzrrTjfE4YSSpVW2Bc","args":""}]
[{"name":"get_delivery_date","id":"call_F8YHCjnzrrTjfE4YSSpVW2Bc","args":"{\""}]
[{"name":"get_delivery_date","id":"call_F8YHCjnzrrTjfE4YSSpVW2Bc","args":"{\"order"}]
[{"name":"get_delivery_date","id":"call_F8YHCjnzrrTjfE4YSSpVW2Bc","args":"{\"order_id"}]
[{"name":"get_delivery_date","id":"call_F8YHCjnzrrTjfE4YSSpVW2Bc","args":"{\"order_id\":\""}]
[{"name":"get_delivery_date","id":"call_F8YHCjnzrrTjfE4YSSpVW2Bc","args":"{\"order_id\":\"123"}]
[{"name":"get_delivery_date","id":"call_F8YHCjnzrrTjfE4YSSpVW2Bc","args":"{\"order_id\":\"123456"}]
[{"name":"get_delivery_date","id":"call_F8YHCjnzrrTjfE4YSSpVW2Bc","args":"{\"order_id\":\"123456\"}"}]
```

Use a library like [partial-json](https://www.npmjs.com/package/partial-json) to parse the `args` incrementally.

### Streaming Config

asyncLLM accepts a `config` object with the following properties:

- `fetch`: Custom fetch implementation (defaults to global `fetch`).
- `onResponse`: Async callback function that receives the Response object before streaming begins.
  If the callback returns a promise, it will be awaited before continuing the stream.

Here's how you can use `onResponse` and a custom fetch implementation:

```javascript
import { asyncLLM } from "https://cdn.jsdelivr.net/npm/asyncllm@2";

const body = {
  // Same as OpenAI example above
};

// Optional configuration. You can ignore it for most use cases.
const config = {
  onResponse: async (response) => {
    console.log(response.status, response.headers);
  },
  // You can use a custom fetch implementation if needed
  fetch: fetch,
};

for await (const { content } of asyncLLM(
  "https://api.openai.com/v1/chat/completions",
  {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(body),
  },
  config,
)) {
  console.log(content);
}
```

### Streaming from text

You can parse streamed SSE events from a text string (e.g. from a cached response) using the `fetchText` helper from asyncSSE:

```javascript
import { asyncLLM } from "https://cdn.jsdelivr.net/npm/asyncllm@2";
import { fetchText } from "https://cdn.jsdelivr.net/npm/asyncsse@1/dist/fetchtext.js";

const text = `
data: {"candidates": [{"content": {"parts": [{"text": "2"}],"role": "model"}}]}

data: {"candidates": [{"content": {"parts": [{"text": " + 2 = 4\\n"}],"role": "model"}}]}

data: {"candidates": [{"content": {"parts": [{"text": ""}],"role": "model"}}]}
`;

// Stream events from text
for await (const event of asyncLLM(text, {}, { fetch: fetchText })) {
  console.log(event);
}
```

This outputs something like:

```js
{ content: "2", message: { "candidates": [...] } }
{ content: "2 + 2 = 4\n", message: { "candidates": [...] } }
```

This is particularly useful for testing SSE parsing without making actual HTTP requests.

### Error handling

If an error occurs, it will be yielded in the `error` property. For example:

```javascript
for await (const { content, error } of asyncLLM("https://api.openai.com/v1/chat/completions", {
  method: "POST",
  // ...
})) {
  if (error) console.error(error);
  else console.log(content);
}
```

The `error` property is set if:

- The underlying API (e.g.
  OpenAI, Anthropic, Gemini) returns an error in the response (e.g. `error.message` or `message.error` or `error`)
- The fetch request fails (e.g. network error)
- The response body cannot be parsed as JSON

## API

### `asyncLLM(request: string | Request, options?: RequestInit, config?: SSEConfig): AsyncGenerator`

Fetches streaming responses from LLM providers and yields events.

- `request`: The URL or Request object for the LLM API endpoint
- `options`: Optional [fetch options](https://developer.mozilla.org/en-US/docs/Web/API/fetch#parameters)
- `config`: Optional configuration object for SSE handling
  - `fetch`: Custom fetch implementation (defaults to global fetch)
  - `onResponse`: Async callback function that receives the Response object before streaming begins. If the callback returns a promise, it will be awaited before continuing the stream.

Returns an async generator that yields [`LLMEvent` objects](#llmevent).

#### LLMEvent

- `content`: The text content of the response
- `tools`: Array of tool call objects with:
  - `name`: The name of the tool being called
  - `args`: The arguments for the tool call as a JSON-encoded string, e.g. `{"order_id":"123456"}`
  - `id`: Optional unique identifier for the tool call (e.g. OpenAI's `call_F8YHCjnzrrTjfE4YSSpVW2Bc` or Anthropic's `toolu_01T1x1fJ34qAmk2tNTrN7Up6`). It is present for OpenAI, Anthropic, and Gemini Interactions. Gemini streaming does not return an id.
- `message`: The raw message object from the LLM provider (may include id, model, usage stats, etc.)
- `error`: Error message if the request fails

### Node.js usage

```javascript
import { asyncLLM } from "asyncllm";

// Rest of the usage is the same as in the browser examples
```

## Development

```bash
git clone https://github.com/sanand0/asyncllm.git
cd asyncllm

npm install
npm run lint && npm run build && npm test

npm publish
git commit . -m"$COMMIT_MSG"; git tag $VERSION; git push --follow-tags
```

## Release notes

- [2.4.0](https://npmjs.com/package/asyncllm/v/2.4.0): 18 Dec 2025. Added Gemini Interactions API streaming support (text, tools, completion metadata)
- [2.3.1](https://npmjs.com/package/asyncllm/v/2.3.1): 31 Jul 2025. Standardized package.json & README.md, renamed index.js to asyncllm.js
- [2.2.0](https://npmjs.com/package/asyncllm/v/2.2.0): 23 Apr 2025. Added [OpenAI Responses API](https://platform.openai.com/docs/api-reference/responses-streaming)
- [2.1.2](https://npmjs.com/package/asyncllm/v/2.1.2): 25 Dec 2024. Update repo links
- [2.1.1](https://npmjs.com/package/asyncllm/v/2.1.1): 9 Nov 2024. Document standalone adapter usage
- [2.1.0](https://npmjs.com/package/asyncllm/v/2.1.0): 7 Nov 2024. Added `id` to tools to support unique tool call identifiers from providers
- [2.0.1](https://npmjs.com/package/asyncllm/v/2.0.1): 5 Nov 2024. Multiple tools support. **Breaking change**: `tool` and `args` are no longer part of the response. Instead, it has `tools`, an array of `{ name, args }`. Gemini adapter returns `toolConfig` instead of `toolsConfig`
- [1.2.2](https://npmjs.com/package/asyncllm/v/1.2.2): 3 Nov 2024. Added streaming from text documentation via `config.fetch`. Upgrade to asyncSSE 1.3.1 (bug fix).
- [1.2.1](https://npmjs.com/package/asyncllm/v/1.2.1): 3 Nov 2024. Added `config.fetch` for custom fetch implementation
- [1.2.0](https://npmjs.com/package/asyncllm/v/1.2.0): 2 Nov 2024. Added `config.onResponse(response)` that receives the Response object before streaming begins
- [1.1.3](https://npmjs.com/package/asyncllm/v/1.1.3): 2 Nov 2024. Ensure `max_tokens` for Anthropic. Improve error handling
- [1.1.1](https://npmjs.com/package/asyncllm/v/1.1.1): 30 Oct 2024. Added [Anthropic adapter](#anthropic)
- [1.1.0](https://npmjs.com/package/asyncllm/v/1.1.0): 30 Oct 2024. Added [Gemini adapter](#gemini)
- [1.0.0](https://npmjs.com/package/asyncllm/v/1.0.0): 15 Oct 2024. Initial release with [asyncLLM](#asyncllm) and [LLMEvent](#llmevent)

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.