
# browser-llm-engine

A library for running large language models (LLMs) directly in the browser using [Wllama](https://github.com/nadchif/wllama). It provides a simple interface for loading `.gguf` or `.bin` models (e.g., from Hugging Face) and generating text completions, including **token-by-token streaming**.

---

## Features

- **Plug-and-Play**: Easy to integrate into your web projects.  
- **Local or Remote Models**: Load a model from a URL (e.g., Hugging Face) or from local `File` objects.  
- **Token-by-Token Streaming**: Handle partial results in real time via the `onNewToken` callback.  
- **Templates**: Uses [Jinja](https://github.com/huggingface/jinja) to format chat-based prompts.  
- **Lightweight**: Bundles a minimal set of dependencies.

---

## Table of Contents

1. [Installation](#installation)  
2. [Usage](#usage)  
   - [Quick Start](#quick-start)  
   - [Streaming](#streaming)  
   - [Loading Local Files](#loading-local-files)  
3. [Preset Models](#preset-models)  
4. [API](#api)  
   - [createLlmEngine](#createllmengine)  
   - [loadModel](#loadmodel)  
   - [formatChat](#formatchat)  
   - [createCompletion](#createcompletion)  
   - [exit](#exit)  
5. [Local Development](#local-development)  
6. [License](#license)

---

## Installation

```bash
npm install browser-llm-engine
```

Or with Yarn:

```bash
yarn add browser-llm-engine
```

---

## Usage

### Quick Start

```js
import { createLlmEngine, CHAT_ROLE, PRESET_MODELS } from 'browser-llm-engine';

(async () => {
  // 1) Create an engine instance
  const llm = createLlmEngine({
    // Optional: provide custom WASM paths or config
    wasmPaths: {}
  });

  // 2) Load a preset model from the library
  const modelUrl = PRESET_MODELS["SmolLM2 (360M)"].url;
  await llm.loadModel(modelUrl, {
    progressCallback: (progress) => console.log(`Loading: ${progress}%`),
  });

  // 3) Generate a completion
  const result = await llm.createCompletion("Hello from the browser!");
  console.log("Full model response:", result);

  // 4) Clean up
  await llm.exit();
})();
```

That’s it! You have a working LLM in the browser.

---

### Streaming

To get partial tokens as they are generated, supply an `onNewToken` callback:

```js
import { createLlmEngine, PRESET_MODELS } from 'browser-llm-engine';

const llm = createLlmEngine();
await llm.loadModel(PRESET_MODELS["SmolLM2 (360M)"].url);

let outputSoFar = "";
await llm.createCompletion("What's the weather today?", {
  nPredict: 128,
  sampling: { temp: 0.7, penalty_repeat: 1.1 },
  onNewToken: (token) => {
    outputSoFar += token;
    console.log("Streamed token:", token);
  }
});

console.log("Final streamed output:", outputSoFar);
```
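
Logging every token is fine for a demo, but updating the page once per token can be wasteful. The sketch below shows one way to build an `onNewToken` handler that accumulates text and passes it to a render callback every few tokens. It assumes, as in the example above, that each token arrives as a string; `createTokenBuffer`, `render`, and `flushEvery` are hypothetical names, not part of the library.

```js
// Hypothetical helper (not part of browser-llm-engine): buffers streamed
// tokens and calls `render` with the full text every `flushEvery` tokens.
// Assumes each token arrives as a string, as in the streaming example.
function createTokenBuffer(render, flushEvery = 4) {
  let text = "";
  let pending = 0;

  return {
    // Pass this function as the `onNewToken` option.
    onNewToken(token) {
      text += token;
      pending += 1;
      if (pending >= flushEvery) {
        pending = 0;
        render(text);
      }
    },
    // Call once generation finishes to render any remaining tokens.
    flush() {
      if (pending > 0) {
        pending = 0;
        render(text);
      }
      return text;
    },
  };
}
```

In use, you might create the buffer with `createTokenBuffer((text) => { output.textContent = text; })`, pass `buffer.onNewToken` to `createCompletion`, and call `buffer.flush()` once the completion resolves. Here `output` stands in for any element on your page.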

---

### Loading Local Files

To load a model from files on the user's machine, pass the selected `File` objects to `loadModel`:

```html
<input type="file" id="modelFile" multiple />
<script type="module">
  import { createLlmEngine } from 'browser-llm-engine';

  const fileInput = document.getElementById("modelFile");
  const llm = createLlmEngine();

  fileInput.addEventListener("change", async () => {
    try {
      // fileInput.files is a FileList
      await llm.loadModel(fileInput.files);
      console.log("Model loaded locally!");
    } catch (error) {
      console.error("Failed to load local model:", error);
    }
  });
</script>
```
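
Because the input above accepts multiple files, it can help to validate the selection before handing it to `loadModel`. The following is a minimal sketch, assuming models use the `.gguf` or `.bin` extensions mentioned earlier. Sorting by name is a convenience for multi-part models and is an assumption; the library may not require any particular order. `pickModelFiles` is a hypothetical helper.

```js
// Hypothetical helper (not part of browser-llm-engine): keeps only files
// with a .gguf or .bin extension and sorts them by name.
// Accepts a FileList or any array-like of objects with a `name` property.
function pickModelFiles(fileList) {
  const files = Array.from(fileList).filter((file) =>
    /\.(gguf|bin)$/i.test(file.name)
  );
  if (files.length === 0) {
    throw new Error("No .gguf or .bin model files were selected.");
  }
  // Name order keeps multi-part files in sequence when part numbers are
  // zero-padded (e.g. ...-00001-of-00003.gguf).
  return files.sort((a, b) => a.name.localeCompare(b.name));
}
```

The result is a plain array of `File` objects, so it could be passed as `await llm.loadModel(pickModelFiles(fileInput.files))`.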

---

## Preset Models

The library ships a `models.json` with references to a few hosted models. You can access them through the `PRESET_MODELS` export:

```js
import { PRESET_MODELS } from 'browser-llm-engine';

console.log("Available models:", PRESET_MODELS);
```

Feel free to add or remove entries if you fork this library.
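
Since presets are looked up by display name, a typo in the key only surfaces later as an `undefined` error. The sketch below resolves a name to its URL and lists the available names when the lookup fails. It assumes the `{ [name]: { url } }` shape implied by the Quick Start; `getPresetUrl` is a hypothetical helper, not a library export.

```js
// Hypothetical helper (not part of browser-llm-engine): resolves a preset
// name to its URL, assuming the `{ [name]: { url } }` shape used in Quick Start.
function getPresetUrl(presets, name) {
  const entry = Object.prototype.hasOwnProperty.call(presets, name)
    ? presets[name]
    : undefined;
  if (!entry || !entry.url) {
    const known = Object.keys(presets).join(", ");
    throw new Error(`Unknown preset "${name}". Available presets: ${known}`);
  }
  return entry.url;
}
```

For example, `getPresetUrl(PRESET_MODELS, "SmolLM2 (360M)")` would return the same URL used in the Quick Start.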

---

## API

### `createLlmEngine(config?)`
Creates a new engine instance.  
- **Parameters:**  
  - `config` (Object) – Optional configuration, e.g. `{ wasmPaths: { ... } }`.

### `loadModel(source, options?)`
Loads the model from either a remote URL or local `File` objects.  
- **Parameters:**  
  - `source` (String | File[] | FileList) – The source of the model.  
  - `options` (Object) – Additional load options:
    - `progressCallback` (Function) – `(progress) => {}` for tracking loading progress  
    - `useCache` (Boolean) – Cache the model for faster reloads  
    - `allowOffline` (Boolean) – If false, tries to fetch from network
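
A progress callback may fire many times during a download, so it can be useful to report only when the visible percentage changes. This is a sketch under the assumption that `progress` is a number from 0 to 100, as the `Loading: ${progress}%` log in the Quick Start suggests; `createProgressReporter` is a hypothetical helper.

```js
// Hypothetical helper (not part of browser-llm-engine): wraps a reporter so it
// only fires when the rounded percentage changes.
// Assumes `progress` is a number from 0 to 100, as the Quick Start log suggests.
function createProgressReporter(report) {
  let lastPercent = -1;
  return (progress) => {
    // Ignore values that cannot be shown as a percentage.
    if (!Number.isFinite(progress)) return;
    const percent = Math.min(100, Math.max(0, Math.round(progress)));
    if (percent !== lastPercent) {
      lastPercent = percent;
      report(percent);
    }
  };
}
```

The returned function can be passed directly as `progressCallback`, for instance `await llm.loadModel(modelUrl, { progressCallback: createProgressReporter((p) => { bar.value = p; }) })`, where `bar` is a hypothetical `<progress>` element.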

### `formatChat(messages, useProvidedTemplate?)`
Takes an array of messages (each with `role` and `content`) and formats them into a single prompt with Jinja.
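
As an illustration, a single chat turn might format the messages and then pass the resulting prompt to `createCompletion`. The sketch below makes two assumptions that the text above does not confirm: roles are the plain strings `"system"` and `"user"` (the library also exports `CHAT_ROLE`, whose members are not documented here), and `formatChat` returns the prompt either directly or as a promise. `askChat` is a hypothetical helper.

```js
// Sketch of a single chat turn. `llm` is an engine that has already loaded a
// model. Assumptions: roles are the strings "system" and "user", and
// formatChat returns the prompt string or a promise of it.
async function askChat(llm, userText, systemText = "You are a helpful assistant.") {
  const messages = [
    { role: "system", content: systemText },
    { role: "user", content: userText },
  ];
  // `await` works whether formatChat is synchronous or asynchronous.
  const prompt = await llm.formatChat(messages);
  return llm.createCompletion(prompt);
}
```

Called as `await askChat(llm, "Summarize WebAssembly in one sentence.")`, this resolves to whatever `createCompletion` returns for the formatted prompt.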

### `createCompletion(prompt, options?)`
Generates a text completion for a given `prompt`.  
- **Parameters:**
  - `prompt` (String) – The text to generate from.  
  - `options` (Object) – Generation settings:
    - `nPredict` (Number) – Maximum number of tokens to predict (default 512)  
    - `sampling` (Object) – Sampling parameters, e.g. `{ temp: 0.7, penalty_repeat: 1.1 }`  
    - `onNewToken` (Function) – Callback invoked for each streamed token

### `exit()`
Cleans up resources used by Wllama.  
- **Example:**
  ```js
  await llm.exit();
  ```
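
If loading or generation throws, a bare `await llm.exit()` at the end of a script is never reached. One possible pattern is to wrap the work in `try`/`finally` so cleanup always runs. `withEngine` below is a hypothetical helper, not part of the library; it only assumes that the engine exposes the `exit()` method described above.

```js
// Hypothetical helper (not part of browser-llm-engine): runs `task` with an
// engine and always calls exit(), even if the task throws.
async function withEngine(llm, task) {
  try {
    return await task(llm);
  } finally {
    await llm.exit();
  }
}
```

A call might look like `await withEngine(createLlmEngine(), async (llm) => { await llm.loadModel(modelUrl); return llm.createCompletion("Hello"); })`. Any error from the task still propagates to the caller after `exit()` has run.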

---

## Local Development

If you want to **develop locally**:

1. Clone the repo:  
   ```bash
   git clone https://github.com/you/browser-llm-engine.git
   cd browser-llm-engine
   ```
2. Install dependencies:  
   ```bash
   npm install
   ```
3. Build the library:  
   ```bash
   npm run build
   ```
   This will create `dist/` with both ESM and CJS bundles.
4. _(Optional)_ Start a dev server (if you add a script in `package.json`):  
   ```bash
   npm run dev
   ```
5. Open `index.html` (or any dev test page) in your browser to play around with the library.

---

## License

This project is released under the [MIT License](./LICENSE). Feel free to fork, adapt, and contribute!

---

**Happy coding and enjoy using your LLM in the browser!**