# llm-metrics

<div align="center">

[![npm version](https://img.shields.io/npm/v/llm-metrics.svg?style=for-the-badge)](https://www.npmjs.com/package/llm-metrics)
[![npm downloads](https://img.shields.io/npm/dm/llm-metrics.svg?style=for-the-badge)](https://www.npmjs.com/package/llm-metrics)
[![CI](https://img.shields.io/github/actions/workflow/status/Arakiss/llm-metrics/ci.yml?branch=main&style=for-the-badge&label=CI)](https://github.com/Arakiss/llm-metrics/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge)](https://opensource.org/licenses/MIT)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.9-blue.svg?style=for-the-badge&logo=typescript)](https://www.typescriptlang.org/)
[![Node.js](https://img.shields.io/badge/Node.js-22+-green.svg?style=for-the-badge&logo=node.js)](https://nodejs.org/)
[![Bun](https://img.shields.io/badge/Bun-1.3+-black.svg?style=for-the-badge&logo=bun)](https://bun.sh/)

**Metrics collection system for LLMs and AI agents**

Track performance, latency, and usage metrics for agents, tools, and LLM requests. Perfect for monitoring LLM applications, AI agents, and agentic systems.

[Installation](#installation) • [Quick Start](#quick-start) • [Documentation](#documentation) • [Contributing](#contributing)

</div>

---

`llm-metrics` is a framework-agnostic metrics collection system designed for **LLM applications** and **AI agents**. It is written in TypeScript and provides type-safe APIs, built-in validation, and pluggable persistence backends.

## ✨ Features

- 🚀 **Framework-agnostic** - Works with any JavaScript/TypeScript project (Next.js, Express, Hono, etc.)
- 📊 **Multiple metric types** - Track agents, tools, latency, and request timing
- 💾 **Flexible persistence** - In-memory by default, pluggable persistence backends (PostgreSQL, MongoDB, Redis)
- ✅ **Type-safe** - Full TypeScript support with strict types and IntelliSense
- 🔍 **Validation** - Built-in metric validation (configurable, prevents invalid data)
- 📈 **Aggregations** - Built-in summary statistics, percentiles, and histograms
- 🎨 **Formatting** - Human-readable metric formatting utilities for logging
- 🔌 **Extensible** - Custom persistence backends, loggers, and event hooks
- ⚡ **Zero dependencies** - No runtime dependencies, lightweight and fast
- 🔎 **Query API** - Flexible filtering by context, time range, duration, metadata, etc.
- 📦 **Batch operations** - Efficient batch recording for migrations and imports
- 📊 **Derived metrics** - Rate calculations, error rates, and trend analysis
- 🧪 **Well tested** - Comprehensive test suite (154 tests, 290+ assertions)
- 📦 **ESM-only** - Modern JavaScript, no CommonJS legacy code

## 📦 Installation

Install `llm-metrics` from npm:

```bash
npm install llm-metrics
```

Or using your preferred package manager:

```bash
# Bun
bun add llm-metrics

# Yarn
yarn add llm-metrics

# pnpm
pnpm add llm-metrics
```

### Requirements

- **Node.js** >= 22.0.0 (LTS) or **Bun** >= 1.3.0
- **ESM-only** - This package uses ES Modules only (no CommonJS support)
- **TypeScript** 5.6+ (recommended for type safety)

## 🚀 Quick Start

Get started with `llm-metrics` in under 2 minutes:

```typescript
import { metricsCollector, measureAgent, measureTool } from 'llm-metrics';

// Measure an agent execution (e.g., LLM agent, AI assistant)
const result = await measureAgent(
  'memory-manager',        // Agent identifier
  'conversation-123',     // Context ID (conversation, session, etc.)
  async () => {
    // Your agent code here
    const facts = await extractFacts();
    return { facts, count: facts.length };
  }
);

// Measure a tool execution (e.g., database query, API call)
const toolResult = await measureTool(
  'search-database',      // Tool name
  'conversation-123',      // Context ID
  async () => {
    // Your tool code here
    return await db.query('SELECT * FROM users');
  }
);

// Get summary statistics
const summary = metricsCollector.getSummary(3600000); // Last hour
console.log(`Agents executed: ${summary.totalAgentsExecuted}`);
console.log(`Average duration: ${summary.averageAgentDuration}ms`);
console.log(`Tools called: ${summary.totalToolsCalled}`);
```

## 📚 Core Concepts

### Metrics Types

`llm-metrics` supports four types of metrics optimized for LLM and AI agent workflows:

1. **🤖 Agent Metrics** - Track execution of AI agents, LLM calls, or long-running processes
   - Duration, success/failure, custom metadata
   - Perfect for monitoring agent performance and reliability

2. **🔧 Tool Metrics** - Track individual tool/function calls (function calling, RAG queries, etc.)
   - Success rate, execution time, error tracking
   - Essential for debugging tool usage in agentic systems

3. **⏱️ Latency Metrics** - Track specific operations or bottlenecks
   - Embedding generation, vector search, cache lookups
   - Identify performance bottlenecks in your LLM pipeline

4. **📡 Request Timing Metrics** - Track client vs server timing for requests
   - Client-side latency, server processing time, streaming duration
   - Understand end-to-end user experience

### Storage Architecture

- **In-memory** - Fast access, limited by `maxMetrics` (default: 1000)
  - Perfect for real-time monitoring and debugging
  - Automatically rotates oldest metrics when limit reached

- **Persistence** - Optional backend for long-term storage
  - PostgreSQL, MongoDB, Redis, or any custom backend
  - Implement `MetricsPersistence` interface for your database
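
The rotation behavior described above can be pictured with a small bounded buffer. This is only a sketch of FIFO eviction under a size limit; `BoundedBuffer` is a hypothetical name and is not the library's internal implementation.

```typescript
// Hypothetical bounded buffer: a sketch of FIFO rotation, not library internals.
class BoundedBuffer<T> {
  private readonly items: T[] = [];
  private readonly maxItems: number;

  constructor(maxItems: number) {
    if (!Number.isInteger(maxItems) || maxItems <= 0) {
      throw new RangeError('maxItems must be a positive integer');
    }
    this.maxItems = maxItems;
  }

  // Append an item, then drop the oldest entries if the limit is exceeded.
  push(item: T): void {
    this.items.push(item);
    const overflow = this.items.length - this.maxItems;
    if (overflow > 0) {
      this.items.splice(0, overflow);
    }
  }

  // Return a copy so callers cannot mutate the internal array.
  toArray(): T[] {
    return [...this.items];
  }
}

const buffer = new BoundedBuffer<number>(3);
for (const n of [1, 2, 3, 4, 5]) {
  buffer.push(n);
}
console.log(buffer.toArray()); // [3, 4, 5]
```

With a limit of 3, the two oldest values are dropped, which is why a larger `maxMetrics` trades memory for a longer in-memory history.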

## 💡 Use Cases

Perfect for:
- **LLM Applications** - Monitor GPT-4, Claude, Gemini API calls
- **AI Agents** - Track agent execution, tool usage, and performance
- **RAG Systems** - Measure vector search, embedding generation latency
- **Agentic Workflows** - Monitor multi-step agent operations
- **Production Monitoring** - Track metrics in production LLM applications

## 📖 Usage Examples

### Basic Agent Tracking

```typescript
import { measureAgent } from 'llm-metrics';

const result = await measureAgent(
  'data-processor',
  'session-123',
  async () => {
    // Process data
    const processed = await processData();
    return processed;
  }
);
```

### Agent Tracking with Custom Metadata

```typescript
import { measureAgentWithMetrics } from 'llm-metrics';

const result = await measureAgentWithMetrics(
  'memory-manager',
  'conversation-456',
  async () => {
    const facts = await extractFacts();
    return { facts, count: facts.length };
  },
  (result) => ({
    // Derive metadata from the value returned by the agent callback
    factsExtracted: result.count,
  })
);
```

### Tool Tracking

```typescript
import { measureTool } from 'llm-metrics';

const result = await measureTool(
  'database-query',
  'request-789',
  async () => {
    return await db.query('SELECT * FROM users');
  }
);
```

### Manual Metric Recording

```typescript
import { metricsCollector } from 'llm-metrics';

// Record agent metrics manually
metricsCollector.recordAgent({
  agentId: 'custom-agent',
  contextId: 'context-123',
  startTime: Date.now() - 5000,
  endTime: Date.now(),
  duration: 5000,
  metadata: {
    customField: 'value',
    itemsProcessed: 42,
  },
});

// Record latency metrics
metricsCollector.recordLatency({
  operation: 'cache-lookup',
  startTime: Date.now() - 100,
  endTime: Date.now(),
  duration: 100,
  metadata: {
    cacheHit: true,
  },
});
```

### Request Timing (Client vs Server)

```typescript
import { metricsCollector } from 'llm-metrics';

metricsCollector.recordRequestTiming({
  contextId: 'request-123',
  serverTimeToFirstChunk: 500,
  serverStreamDuration: 2000,
  serverTotalDuration: 2500,
  clientTimeToFirstChunk: 800, // From Performance API
  clientRequestStart: performance.now(),
  networkLatencyEstimate: 300, // client - server difference
  metadata: {
    model: 'gpt-4',
    messageCount: 5,
  },
});
```
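
The `networkLatencyEstimate` value above is the client time to first chunk minus the server time to first chunk. A minimal sketch of that arithmetic follows; `estimateNetworkLatency` is a hypothetical helper, and clamping at zero is an assumption made here to absorb timer jitter.

```typescript
// Hypothetical helper: estimate network latency as client TTFC minus server TTFC.
function estimateNetworkLatency(
  clientTimeToFirstChunk: number,
  serverTimeToFirstChunk: number,
): number {
  // Assumption: clamp at zero, since jitter can make the client value slightly smaller.
  return Math.max(0, clientTimeToFirstChunk - serverTimeToFirstChunk);
}

console.log(estimateNetworkLatency(800, 500)); // 300
```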

## ⚙️ Configuration

Customize `llm-metrics` to fit your needs:

### Custom Persistence Backend

```typescript
import { metricsCollector } from 'llm-metrics';
// Interfaces are types only, so import them with `import type`
import type {
  MetricsPersistence,
  AgentMetrics,
  ToolMetrics,
  LatencyMetrics,
  RequestTimingMetrics,
} from 'llm-metrics';

class MyDatabasePersistence implements MetricsPersistence {
  async persistAgentMetrics(metrics: AgentMetrics): Promise<void> {
    // Save to your database
    await db.insert('agent_metrics', metrics);
  }

  async persistToolMetrics(metrics: ToolMetrics): Promise<void> {
    await db.insert('tool_metrics', metrics);
  }

  async persistLatencyMetrics(metrics: LatencyMetrics): Promise<void> {
    await db.insert('latency_metrics', metrics);
  }

  async persistRequestTimingMetrics(metrics: RequestTimingMetrics): Promise<void> {
    await db.insert('request_timing_metrics', metrics);
  }

  async getAgentMetrics(timeRangeMs?: number, contextId?: string): Promise<AgentMetrics[]> {
    // Retrieve from database
    return await db.query('SELECT * FROM agent_metrics WHERE ...');
  }

  // ... implement other get methods
}

// Configure persistence
metricsCollector.setPersistence(new MyDatabasePersistence());
```
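
For tests or local development, an in-memory store can stand in for a real database. The sketch below covers only the agent pair of methods and declares its own `AgentMetricsLike` type (a local copy of the `AgentMetrics` shape documented in this README) so that it runs on its own. The class name and the interpretation of `timeRangeMs` as "started within the last N milliseconds" are assumptions.

```typescript
// Local copy of the AgentMetrics shape documented in this README.
interface AgentMetricsLike {
  agentId: string;
  contextId?: string;
  startTime: number;
  endTime?: number;
  duration?: number;
  metadata?: Record<string, unknown>;
  error?: string;
}

// Hypothetical in-memory store covering only the agent methods.
class InMemoryAgentStore {
  private readonly rows: AgentMetricsLike[] = [];

  async persistAgentMetrics(metrics: AgentMetricsLike): Promise<void> {
    // Store a shallow copy so later mutations by the caller are not reflected.
    this.rows.push({ ...metrics });
  }

  async getAgentMetrics(timeRangeMs?: number, contextId?: string): Promise<AgentMetricsLike[]> {
    // Assumption: timeRangeMs selects records that started within the last N ms.
    const cutoff = timeRangeMs === undefined ? -Infinity : Date.now() - timeRangeMs;
    return this.rows.filter(
      (row) => row.startTime >= cutoff && (contextId === undefined || row.contextId === contextId),
    );
  }
}

const store = new InMemoryAgentStore();
store
  .persistAgentMetrics({ agentId: 'demo-agent', contextId: 'ctx-1', startTime: Date.now() })
  .then(() => store.getAgentMetrics(60_000, 'ctx-1'))
  .then((rows) => console.log(rows.length)); // 1
```

A full adapter would add the tool, latency, and request timing methods in the same style.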

### Custom Logger

```typescript
import { metricsCollector } from 'llm-metrics';
import type { MetricsLogger } from 'llm-metrics';

class MyLogger implements MetricsLogger {
  info(message: string, data?: Record<string, unknown>): void {
    console.log(`[INFO] ${message}`, data);
  }

  debug(message: string, data?: Record<string, unknown>): void {
    console.debug(`[DEBUG] ${message}`, data);
  }

  warn(message: string, data?: Record<string, unknown>): void {
    console.warn(`[WARN] ${message}`, data);
  }

  error(message: string, data?: Record<string, unknown>): void {
    console.error(`[ERROR] ${message}`, data);
  }
}

metricsCollector.setLogger(new MyLogger());
```

### Collector Configuration

```typescript
import { MetricsCollector, metricsCollector } from 'llm-metrics';
import type { MetricsCollectorConfig } from 'llm-metrics';

const config: MetricsCollectorConfig = {
  maxMetrics: 5000, // Keep more metrics in memory
  validateMetrics: true, // Enable validation (default)
  throwOnValidationError: false, // Don't throw, just log (default)
};

const customCollector = new MetricsCollector(undefined, undefined, config);

// Or configure existing collector
metricsCollector.configure({
  maxMetrics: 2000,
});
```

## API Reference

### MetricsCollector

#### Methods

- `recordAgent(metrics: AgentMetrics): void` - Record agent metrics
- `recordTool(metrics: ToolMetrics): void` - Record tool metrics
- `recordLatency(metrics: LatencyMetrics): void` - Record latency metrics
- `recordRequestTiming(metrics: RequestTimingMetrics): void` - Record request timing
- `getSnapshot(): MetricsSnapshot` - Get all current metrics
- `getSummary(timeRangeMs?: number): MetricsSummary` - Get aggregated statistics
- `getContextMetrics(contextId: string): Promise<...>` - Get metrics for a context
- `clear(): void` - Clear all metrics
- `setPersistence(persistence: MetricsPersistence): void` - Configure persistence
- `setLogger(logger: MetricsLogger): void` - Configure logger
- `configure(config: Partial<MetricsCollectorConfig>): void` - Update configuration

### Helper Functions

- `measureAgent<T>(agentId, contextId?, execute): Promise<T>` - Measure agent execution
- `measureAgentWithMetrics<T>(agentId, contextId, execute, extractMetadata): Promise<T>` - Measure with metadata extraction
- `measureTool<T>(toolName, contextId, execute): Promise<T>` - Measure tool execution
- `measureToolWithMetadata<T>(toolName, contextId, execute, extractMetadata): Promise<T>` - Measure tool with metadata
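
To illustrate what a measuring helper of this kind does, here is a minimal, self-contained sketch: it times an async callback, records the result, and rethrows errors after recording them. `measureSketch`, `TimedRecord`, and `timedRecords` are hypothetical names for this example only; the library's helpers may behave differently.

```typescript
// Hypothetical record shape used only by this sketch.
interface TimedRecord {
  label: string;
  startTime: number;
  endTime: number;
  duration: number;
  error?: string;
}

const timedRecords: TimedRecord[] = [];

// Run `execute`, record its timing, and rethrow any error after recording it.
async function measureSketch<T>(label: string, execute: () => Promise<T>): Promise<T> {
  const startTime = Date.now();
  const finish = (error?: string): void => {
    const endTime = Date.now();
    const entry: TimedRecord = { label, startTime, endTime, duration: endTime - startTime };
    if (error !== undefined) {
      entry.error = error;
    }
    timedRecords.push(entry);
  };

  try {
    const value = await execute();
    finish();
    return value;
  } catch (err) {
    finish(err instanceof Error ? err.message : String(err));
    throw err;
  }
}

measureSketch('demo', async () => 42).then((value) => {
  console.log(value); // 42
});
```

The caller always gets the original return value or the original error, so wrapping existing code in a helper like this does not change its control flow.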

### Formatting Utilities

- `formatDuration(ms: number): string` - Format duration (e.g., "1.5s", "2m 5s")
- `formatDurationDetailed(ms: number): string` - Detailed duration format
- `formatAgentMetrics(metrics: AgentMetrics): string` - Human-readable agent metrics
- `formatToolMetrics(metrics: ToolMetrics): string` - Human-readable tool metrics
- `formatLatencyMetrics(metrics: LatencyMetrics): string` - Human-readable latency metrics
- `formatMetricsSummary(summary: MetricsSummary): string` - Human-readable summary

### Validation

- `validateAgentMetrics(metrics: AgentMetrics): ValidationResult` - Validate agent metrics
- `validateToolMetrics(metrics: ToolMetrics): ValidationResult` - Validate tool metrics
- `validateLatencyMetrics(metrics: LatencyMetrics): ValidationResult` - Validate latency metrics
- `validateRequestTimingMetrics(metrics: RequestTimingMetrics): ValidationResult` - Validate request timing
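
The exact rules the validators apply are not listed here. As an illustration of the kind of checks metric validation typically involves, the following sketch validates a reduced agent record. All names (`AgentMetricsInput`, `SketchValidationResult`, `validateAgentSketch`) and the specific rules are assumptions, not the library's `ValidationResult` contract.

```typescript
// Reduced, local version of the agent record used only by this sketch.
interface AgentMetricsInput {
  agentId: string;
  startTime: number;
  endTime?: number;
  duration?: number;
}

// Hypothetical result shape; the library's ValidationResult may differ.
interface SketchValidationResult {
  valid: boolean;
  errors: string[];
}

function validateAgentSketch(metrics: AgentMetricsInput): SketchValidationResult {
  const errors: string[] = [];

  if (metrics.agentId.trim() === '') {
    errors.push('agentId must not be empty');
  }
  if (!Number.isFinite(metrics.startTime) || metrics.startTime < 0) {
    errors.push('startTime must be a non-negative finite number');
  }
  if (metrics.endTime !== undefined && metrics.endTime < metrics.startTime) {
    errors.push('endTime must not precede startTime');
  }
  if (metrics.duration !== undefined && (!Number.isFinite(metrics.duration) || metrics.duration < 0)) {
    errors.push('duration must be a non-negative finite number');
  }

  return { valid: errors.length === 0, errors };
}

console.log(validateAgentSketch({ agentId: 'a', startTime: 2000, endTime: 1000 }).valid); // false
```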

## Types

### AgentMetrics

```typescript
interface AgentMetrics {
  agentId: string;
  contextId?: string; // Generic context ID (conversationId, sessionId, requestId, etc.)
  startTime: number; // Timestamp in milliseconds
  endTime?: number; // Timestamp in milliseconds
  duration?: number; // Duration in milliseconds
  metadata?: Record<string, unknown>; // Custom metadata
  error?: string; // Error message if failed
}
```

### ToolMetrics

```typescript
interface ToolMetrics {
  toolName: string;
  contextId?: string;
  startTime: number;
  endTime?: number;
  duration?: number;
  success: boolean;
  error?: string;
  metadata?: Record<string, unknown>;
}
```

### LatencyMetrics

```typescript
interface LatencyMetrics {
  operation: string;
  startTime: number;
  endTime: number;
  duration: number;
  metadata?: Record<string, unknown>;
}
```

### RequestTimingMetrics

```typescript
interface RequestTimingMetrics {
  contextId?: string;
  serverTimeToFirstChunk: number; // milliseconds
  serverStreamDuration: number; // milliseconds
  serverTotalDuration: number; // milliseconds
  clientTimeToFirstChunk?: number; // milliseconds (from Performance API)
  clientRequestStart?: number; // performance.now() timestamp
  networkLatencyEstimate?: number; // milliseconds
  metadata?: Record<string, unknown>;
}
```

## Examples

See the [`examples/`](./examples/) directory for complete, runnable examples:

- **[Next.js API Route](./examples/nextjs-api-route.ts)** - Integration with Next.js API routes
- **[Express Middleware](./examples/express-middleware.ts)** - Express middleware for automatic request tracking
- **[AI SDK Integration](./examples/ai-sdk-integration.ts)** - Integration with Vercel AI SDK
- **[Export Metrics](./examples/export-metrics.ts)** - Export metrics to JSON and CSV
- **[Aggregations](./examples/aggregations.ts)** - Advanced aggregations and histograms
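- **[Event Hooks](./examples/event-hooks.ts)** - Event hooks for integrations and alerting
- **[Query and Filter](./examples/query-filter.ts)** - Querying metrics with filter criteria
- **[Batch Operations](./examples/batch-operations.ts)** - Recording metrics in batch
- **[Derived Metrics](./examples/derived-metrics.ts)** - Rates, error rates, and trends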

## Advanced Usage

### Event Hooks

Use event hooks to integrate with external systems, dashboards, or alerting:

```typescript
import { MetricsCollector, metricsCollector } from 'llm-metrics';

// Set up callbacks
metricsCollector.setCallbacks({
  onAgentRecorded: (metrics) => {
    // Send to monitoring service, update dashboard, etc.
    console.log('Agent executed:', metrics.agentId, metrics.duration);
  },
  onToolRecorded: (metrics) => {
    // Track tool usage, alert on failures, etc.
    if (!metrics.success) {
      console.error('Tool failed:', metrics.toolName);
    }
  },
});

// Or configure during construction
// (`persistence` and `logger` are your own instances, or `undefined` to skip them)
const collector = new MetricsCollector(persistence, logger, {
  callbacks: {
    onAgentRecorded: (metrics) => { /* ... */ },
    onToolRecorded: (metrics) => { /* ... */ },
  },
});
```

See [`examples/event-hooks.ts`](./examples/event-hooks.ts) for complete examples.
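
For alerting, a hook is often just a small function with a threshold. The sketch below builds such a callback; `createSlowAgentHook` and `AgentEvent` are hypothetical names, and `AgentEvent` is a reduced local stand-in for the agent metrics object.

```typescript
// Reduced local stand-in for the agent metrics passed to a hook.
interface AgentEvent {
  agentId: string;
  duration?: number;
}

// Hypothetical factory: returns a hook that notifies when an agent exceeds a threshold.
function createSlowAgentHook(
  thresholdMs: number,
  notify: (message: string) => void,
): (metrics: AgentEvent) => void {
  return (metrics) => {
    if (metrics.duration !== undefined && metrics.duration > thresholdMs) {
      notify(`Agent ${metrics.agentId} took ${metrics.duration}ms (threshold ${thresholdMs}ms)`);
    }
  };
}

const slowAgentMessages: string[] = [];
const slowAgentHook = createSlowAgentHook(5000, (message) => slowAgentMessages.push(message));

slowAgentHook({ agentId: 'fast-agent', duration: 120 });
slowAgentHook({ agentId: 'slow-agent', duration: 7200 });
console.log(slowAgentMessages.length); // 1
```

A function built this way could be passed as `onAgentRecorded` in the `setCallbacks` call, with `notify` wired to your own alerting channel.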

### Query and Filter API

Query metrics with flexible filter criteria:

```typescript
import { metricsCollector } from 'llm-metrics';

// Filter by multiple context IDs
const metrics = metricsCollector.queryMetrics({
  contextIds: ['session-123', 'session-456'],
});

// Filter by agent IDs
const agentMetrics = metricsCollector.queryMetrics({
  agentIds: ['data-processor'],
});

// Filter by time range
const recentMetrics = metricsCollector.queryMetrics({
  startTime: Date.now() - 3600000, // Last hour
  endTime: Date.now(),
});

// Filter by duration range
const slowMetrics = metricsCollector.queryMetrics({
  minDuration: 5000, // Slower than 5 seconds
});

// Filter by metadata
const dataMetrics = metricsCollector.queryMetrics({
  metadata: { category: 'data' },
});

// Combine multiple filters
const complexFilter = metricsCollector.queryMetrics({
  contextIds: ['session-123'],
  minDuration: 1000,
  maxDuration: 5000,
  metadata: { category: 'data' },
});
```

See [`examples/query-filter.ts`](./examples/query-filter.ts) for complete examples.
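
When several criteria are combined, a natural reading is that a metric must satisfy all of them. The sketch below shows that AND-style matching over plain objects. `matchesFilter`, `MetricLike`, and `FilterSketch` are hypothetical, and the AND semantics are an assumption about how combined filters behave.

```typescript
// Reduced local shapes used only by this sketch.
interface MetricLike {
  contextId?: string;
  startTime: number;
  duration?: number;
}

interface FilterSketch {
  contextIds?: string[];
  startTime?: number;
  endTime?: number;
  minDuration?: number;
  maxDuration?: number;
}

// Assumption: every criterion that is present must match (logical AND).
function matchesFilter(metric: MetricLike, filter: FilterSketch): boolean {
  if (
    filter.contextIds !== undefined &&
    (metric.contextId === undefined || !filter.contextIds.includes(metric.contextId))
  ) {
    return false;
  }
  if (filter.startTime !== undefined && metric.startTime < filter.startTime) return false;
  if (filter.endTime !== undefined && metric.startTime > filter.endTime) return false;
  if (filter.minDuration !== undefined && (metric.duration === undefined || metric.duration < filter.minDuration)) {
    return false;
  }
  if (filter.maxDuration !== undefined && (metric.duration === undefined || metric.duration > filter.maxDuration)) {
    return false;
  }
  return true;
}

const sampleMetrics: MetricLike[] = [
  { contextId: 'session-123', startTime: 1000, duration: 3000 },
  { contextId: 'session-123', startTime: 2000, duration: 9000 },
  { contextId: 'session-456', startTime: 3000, duration: 2000 },
];

const matched = sampleMetrics.filter((metric) =>
  matchesFilter(metric, { contextIds: ['session-123'], minDuration: 1000, maxDuration: 5000 }),
);
console.log(matched.length); // 1
```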

### Batch Operations

Record multiple metrics efficiently in batch:

```typescript
import { metricsCollector } from 'llm-metrics';

// Record multiple agents in batch
metricsCollector.recordAgents([
  { agentId: 'agent-1', startTime: Date.now(), /* ... */ },
  { agentId: 'agent-2', startTime: Date.now(), /* ... */ },
]);

// Record multiple tools in batch
metricsCollector.recordTools([
  { toolName: 'tool-1', startTime: Date.now(), success: true, /* ... */ },
  { toolName: 'tool-2', startTime: Date.now(), success: false, /* ... */ },
]);

// Record multiple latency metrics in batch
metricsCollector.recordLatencies([
  { operation: 'op-1', startTime: Date.now() - 100, endTime: Date.now(), duration: 100 },
  { operation: 'op-2', startTime: Date.now() - 50, endTime: Date.now(), duration: 50 },
]);

// Record multiple request timings in batch
metricsCollector.recordRequestTimings([
  { contextId: 'req-1', serverTimeToFirstChunk: 500, serverStreamDuration: 2000, serverTotalDuration: 2500 },
  { contextId: 'req-2', serverTimeToFirstChunk: 300, serverStreamDuration: 1000, serverTotalDuration: 1300 },
]);
```

Batch operations are useful for:
- Migrating metrics from another system
- Importing historical data
- Other bulk operations

They are also more efficient than many individual `record*()` calls.

See [`examples/batch-operations.ts`](./examples/batch-operations.ts) for complete examples.

### Derived Metrics

Calculate simple derived metrics like rates and trends:

```typescript
import {
  metricsCollector,
  calculateAgentDerivedMetrics,
  calculateToolDerivedMetrics,
  calculateTrend,
} from 'llm-metrics';

const snapshot = metricsCollector.getSnapshot();

// Calculate agent derived metrics
const agentDerived = calculateAgentDerivedMetrics(snapshot.agents, 3600000); // Last hour
console.log(`Error Rate: ${agentDerived.errorRate}%`);
console.log(`Requests/Second: ${agentDerived.requestsPerSecond}`);

// Calculate tool derived metrics
const toolDerived = calculateToolDerivedMetrics(snapshot.tools, 3600000);
console.log(`Success Rate: ${toolDerived.successRate}%`);

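// Calculate trends (`currentRate` and `previousRate` are values you computed
// for two time periods, e.g. requests per second for this hour and the last)
const trend = calculateTrend(currentRate, previousRate);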
console.log(`Change: ${trend.changePercent}%`);
```

Available derived metrics:
- **Rates**: Requests per second, operations per second
- **Error Rates**: Error percentage, success percentage
- **Trends**: Change between time periods, percentage change

See [`examples/derived-metrics.ts`](./examples/derived-metrics.ts) for complete examples.
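
The arithmetic behind these derived metrics is simple. The sketch below spells out one common definition of each; the function names are hypothetical and the formulas are assumptions about what "error rate", "rate", and "percentage change" mean here, not the library's exact calculations.

```typescript
// Error rate as a percentage of all executions (0 when there is no data).
function errorRatePercent(total: number, failed: number): number {
  return total === 0 ? 0 : (failed / total) * 100;
}

// Events per second over a window given in milliseconds.
function ratePerSecond(count: number, windowMs: number): number {
  if (windowMs <= 0) {
    throw new RangeError('windowMs must be positive');
  }
  return count / (windowMs / 1000);
}

// Percentage change between two non-negative rates.
function changePercent(current: number, previous: number): number {
  if (previous === 0) {
    // No baseline: report no change for 0 -> 0, otherwise unbounded growth.
    return current === 0 ? 0 : Infinity;
  }
  return ((current - previous) / previous) * 100;
}

console.log(errorRatePercent(200, 50)); // 25
console.log(ratePerSecond(7200, 3_600_000)); // 2
console.log(changePercent(150, 100)); // 50
```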

### Custom Metadata Extraction

```typescript
import { measureAgentWithMetrics } from 'llm-metrics';

const result = await measureAgentWithMetrics(
  'data-processor',
  'batch-123',
  async () => {
    const data = await processBatch();
    return {
      items: data.items,
      errors: data.errors,
      stats: data.stats,
    };
  },
  (result) => ({
    itemsProcessed: result.items.length,
    errorCount: result.errors.length,
    averageScore: result.stats.averageScore,
    customMetric: result.stats.customValue,
  })
);
```

### Time-Range Filtering

```typescript
import { metricsCollector } from 'llm-metrics';

// Last hour
const lastHour = metricsCollector.getSummary(3600000);

// Last 24 hours
const lastDay = metricsCollector.getSummary(86400000);

// All time
const allTime = metricsCollector.getSummary();
```

### Context-Based Queries

```typescript
import { metricsCollector } from 'llm-metrics';

// Get all metrics for a specific context (conversation, session, etc.)
const contextMetrics = await metricsCollector.getContextMetrics('conversation-123');

console.log(`Agents: ${contextMetrics.agents.length}`);
console.log(`Tools: ${contextMetrics.tools.length}`);
console.log(`Latency operations: ${contextMetrics.latency.length}`);
```

## Best Practices

1. **Use context IDs** - Always provide `contextId` to track metrics across operations
2. **Extract meaningful metadata** - Use metadata to store domain-specific information
3. **Configure persistence** - For production, use a persistence backend
4. **Enable validation** - Keep validation enabled to catch errors early
5. **Monitor memory usage** - Adjust `maxMetrics` based on your needs
6. **Use helper functions** - Prefer `measureAgent`/`measureTool` over manual recording

## Performance Considerations

- **In-memory storage** - Fast but limited by `maxMetrics` (default: 1000)
- **Persistence is async** - Persistence operations don't block metric recording
- **Validation overhead** - Can be disabled for maximum performance if needed
- **FIFO eviction** - Oldest metrics are removed when limit is reached
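
The "persistence is async" point can be pictured as a fire-and-forget write: the in-memory record happens synchronously, and the backend write is started but not awaited. The sketch below shows that pattern with hypothetical names (`recordNonBlocking`, `inMemory`, `persisted`); it is not the library's implementation.

```typescript
// Hypothetical helper: write to memory now, persist later without blocking the caller.
function recordNonBlocking<T>(
  memory: T[],
  item: T,
  persist: (item: T) => Promise<void>,
  onError: (error: unknown) => void,
): void {
  memory.push(item); // synchronous in-memory write

  // Start persistence on a later microtask; failures go to onError, not the caller.
  void Promise.resolve()
    .then(() => persist(item))
    .catch(onError);
}

const inMemory: string[] = [];
const persisted: string[] = [];

recordNonBlocking(
  inMemory,
  'metric-1',
  async (item) => {
    persisted.push(item);
  },
  console.error,
);

// The in-memory write is visible immediately; the persisted copy arrives later.
console.log(inMemory.length, persisted.length); // 1 0
```

One consequence of this design is that a slow or failing backend cannot delay or break the code being measured, at the cost of persistence errors surfacing only through the error handler.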

## 🔨 Building with Bun

This package is fully compatible with Bun and can be bundled directly:

```bash
# Bundle with Bun
bun build ./src/index.ts --outdir ./dist --target bun

# Or use Bun's bundler in your project
bun build node_modules/llm-metrics/dist/index.js --outdir ./bundled
```

## 🛠️ Technical Details

### Modern JavaScript Only

This package uses **ES Modules (ESM) only**:
- ✅ ES2022+ syntax
- ✅ Native ESM imports/exports
- ✅ Compatible with Bun 1.3+, Node.js 22+ (LTS), Deno
- ❌ No CommonJS support
- ❌ No legacy browser support

**Requirements** (either runtime):
- Node.js >= 22.0.0 (LTS)
- Bun >= 1.3.0

## 📊 Project Status

- ✅ **v0.6.0** - Latest release
- ✅ **154 tests** passing (290+ assertions)
- ✅ **~95% code coverage** (comprehensive edge case coverage)
- ✅ **100% TypeScript** type coverage
- ✅ **ESM-only** (modern JavaScript)
- ✅ **Zero dependencies** (runtime)

## 🔌 Creating Custom Adapters

Want to create your own persistence adapter? See [src/adapters/README.md](./src/adapters/README.md) for:
- Adapter interface documentation
- PostgreSQL adapter example
- MongoDB adapter example
- Redis adapter example
- Best practices and testing guidelines

## 🤝 Contributing

Contributions are welcome! Please see [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

### Development Setup

```bash
# Clone the repository
git clone https://github.com/Arakiss/llm-metrics.git
cd llm-metrics

# Install dependencies
bun install

# Run tests
bun test

# Build
bun run build
```

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built for the LLM and AI agent ecosystem
- Inspired by the need for better observability in agentic systems
- Designed with performance and developer experience in mind

## 🔗 Links

- **npm**: https://www.npmjs.com/package/llm-metrics
- **GitHub**: https://github.com/Arakiss/llm-metrics
- **Issues**: https://github.com/Arakiss/llm-metrics/issues
- **Releases**: https://github.com/Arakiss/llm-metrics/releases
- **Changelog**: https://github.com/Arakiss/llm-metrics/blob/main/CHANGELOG.md
- **Contributing**: https://github.com/Arakiss/llm-metrics/blob/main/CONTRIBUTING.md
- **Security**: https://github.com/Arakiss/llm-metrics/blob/main/SECURITY.md

---

<div align="center">

Made with ❤️ for the LLM community

[⭐ Star on GitHub](https://github.com/Arakiss/llm-metrics) • [📦 Install from npm](https://www.npmjs.com/package/llm-metrics) • [🐛 Report Bug](https://github.com/Arakiss/llm-metrics/issues)

</div>