# POLARIS Framework v2.1

<img src="https://github.com/user-attachments/assets/651b4783-9175-4970-a965-18065ce817ec" alt="POLARIS Logo" width="300" />

**POLARIS** (Policy Optimization via Layered Agents and Recursive Inference Search) — A multi-agent decision-making framework with dual execution modes: flat parallel inference and a 4-layer creative pipeline.

[![npm version](https://img.shields.io/npm/v/polaris-framework.svg)](https://www.npmjs.com/package/polaris-framework)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Node.js](https://img.shields.io/badge/node-%3E%3D18.0.0-brightgreen.svg)](https://nodejs.org)

## What's New in v2.1

- **Rate Limiter Integration** — Built-in provider-aware rate limiting with jitter now wired into the engine's parallel execution, preventing 429 errors automatically
- **API Key Validation** — All agents (OpenAI, Anthropic, Google) now validate API keys at construction time with clear error messages
- **Retryable Error Hierarchy** — `PolarisError` now carries a `retryable` flag; new `APIError` subclass auto-classifies transient failures (429, 5xx) as retryable
- **Temperature Clamping** — Agent temperatures are clamped to `[0, 2]` at construction, preventing invalid API requests
- **Pipeline Caching** — `LayerPipeline` instances are cached across inferences for reduced allocation overhead
- **Secure Session IDs** — Uses `crypto.randomUUID()` instead of `Math.random()` for cryptographically sound session identifiers
- **Reduced Dependencies** — Removed unused `uuid`, `jest`, `ts-jest`, `@types/jest` packages
- **Stricter Types** — Eliminated multiple `any` casts in agent implementations
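
As a rough illustration of the temperature clamping listed above, the sketch below shows one way a value can be forced into `[0, 2]`. `clampTemperature` and its fallback default are hypothetical and not part of the POLARIS API; the library's own handling of non-numeric input may differ.

```typescript
// Hypothetical helper, not exported by polaris-framework.
const MIN_TEMPERATURE = 0;
const MAX_TEMPERATURE = 2;

function clampTemperature(value: number, fallback: number = 0.7): number {
  // NaN and ±Infinity cannot be clamped meaningfully, so use an assumed default.
  if (!Number.isFinite(value)) return fallback;
  return Math.min(MAX_TEMPERATURE, Math.max(MIN_TEMPERATURE, value));
}
```

With this sketch, `clampTemperature(2.5)` yields `2` and `clampTemperature(-1)` yields `0`, so an out-of-range setting never reaches the provider.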

### Previous (v2.0)

- **Polaris Creativa Pipeline** — 4-layer creative pipeline (Divergent → Inquisitor → Synthesizer → Orchestrator) with decay-based reinjection loop
- **Strategy Pattern** — Layer-blind agents with injected prompt/output strategies
- **Structured Outputs** — OpenAI `json_schema` for deterministic parsing in Layers 2 & 4
- **ROI-Aware Routing** — Token budget tracking with forced delivery at budget exhaustion
- **AbortController Timeouts** — Per-agent latency bounding in Layer 1
- **Dual-Mode Engine** — `PolarisEngine` supports both `"flat"` and `"pipeline"` modes
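
The per-agent timeout idea can be sketched with the standard `AbortController` API. `withTimeout` is a hypothetical helper, not a POLARIS export, and it only bounds latency when the wrapped work actually observes the signal (for example by passing it to `fetch`).

```typescript
// Hypothetical helper: runs `work` and aborts its signal after `timeoutMs`.
async function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // The callee is responsible for rejecting when the signal fires.
    return await work(controller.signal);
  } finally {
    // Always clear the timer so a fast result does not leave it pending.
    clearTimeout(timer);
  }
}
```

In the pipeline configuration shown in Quick Start, `agentTimeoutMs` would play the role of `timeoutMs` here.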

## Quick Start

### Installation

```bash
pnpm add polaris-framework
# or
npm install polaris-framework
```

### Environment Setup

```bash
cp .env.example .env
# Edit .env with your API keys
```

Required keys depend on which providers you use:

```env
OPENAI_API_KEY=sk-...        # For OpenAI agents + pipeline mode
ANTHROPIC_API_KEY=sk-ant-... # For Anthropic agents
GOOGLE_API_KEY=AI...         # For Google agents
```

### Flat Mode (Multi-Agent Inference)

The simplest way to use POLARIS — multiple agents evaluate the same state in parallel, each from their assigned role's perspective.

```typescript
import { TaskBuilder, openAiAgent, PolarisEngine } from "polaris-framework";

// 1. Define your task with roles
const task = TaskBuilder.create("analysis", "Strategy Analysis")
  .description("Analyze a business strategy from multiple perspectives")
  .commonDomain("GENERAL")
  .commonRoles(["ANALYST", "RISKY", "CONSERVATIVE"])
  .goals("Comprehensive strategy analysis")
  .build();

// 2. Create role-aware agents (prompts are built automatically from role + task)
const agents = [
  openAiAgent({ role: task.roles.ANALYST, task, model: "gpt-4o" }),
  openAiAgent({ role: task.roles.RISKY, task, model: "gpt-4o", temperature: 0.9 }),
  openAiAgent({ role: task.roles.CONSERVATIVE, task, model: "gpt-4o-mini", temperature: 0.3 }),
];

// 3. Run inference (`state` and `actions` are your domain-specific inputs, defined elsewhere)
const engine = new PolarisEngine({ task, agents });
const result = await engine.inference({ state, actions });

// result.agentOutputs → array of AgentOutput (score, confidence, reasoning)
// result.sentinelAnalysis → bias detection and diversity metrics (if sentinel enabled)
// result.recommendation → consensus-based action recommendation
```

### Pipeline Mode (Polaris Creativa)

For complex creative tasks that benefit from iterative refinement. Four specialized layers collaborate through a decay loop.

```typescript
import {
  PolarisEngine,
  OpenAIStrategyAgent,
  DivergentPromptStrategy,
  DivergentOutputParser,
  InquisitorPromptStrategy,
  InquisitorOutputParser,
  SynthesizerPromptStrategy,
  SynthesizerOutputParser,
  OrchestratorPromptStrategy,
  OrchestratorOutputParser,
  INQUISITOR_SCHEMA,
  ORCHESTRATOR_SCHEMA,
} from "polaris-framework";

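// Create layer-specific agents (same class, different strategies).
// `task`, `explorerRole`, `state`, and `actions` are assumed to be defined elsewhere;
// `inquisitorAgent`, `synthesizerAgent`, and `orchestratorAgent` are built like the
// divergent agents below, each with its own strategy pair.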
const divergentAgents = [
  new OpenAIStrategyAgent({
    name: "Explorer-A",
    model: "gpt-4o",
    role: explorerRole,
    task,
    promptStrategy: new DivergentPromptStrategy(),
    outputParser: new DivergentOutputParser(),
  }),
  // ...more divergent agents for diversity
];

const engine = new PolarisEngine({
  task,
  agents: [],
  mode: "pipeline",
  pipelineConfig: {
    task,
    divergentAgents,
    inquisitorAgent, // Filters with Structured Outputs
    synthesizerAgent, // Weaves into cohesive draft
    orchestratorAgent, // Decides: deliver or rerun
    decayConfig: {
      maxIterations: 5,
      baseTemperature: 0.8,
      temperatureIncrement: 0.05,
      temperatureCeiling: 1.2,
      promptMutationFactor: 0.1,
      tokenBudget: 100000,
      agentTimeoutMs: 30000,
      minSurvivingAgents: 1,
    },
  },
});

const result = await engine.inference({ state, actions });
// result.pipelineResult.finalOutput → the refined creative output
// result.pipelineResult.totalIterations → how many decay cycles ran
// result.pipelineResult.iterationBreakdowns → per-iteration timing and verdicts
```

> See [`examples/pipeline-demo.ts`](examples/pipeline-demo.ts) for a complete, runnable pipeline example.

### One-Line Setup

For quick prototyping with sensible defaults:

```typescript
import { quickStart } from "polaris-framework";

const { preset, createEngine } = quickStart("chess");
const engine = createEngine({ openai: "your-api-key" });
const result = await engine.inference({ state });
```

Available presets: `"chess"`, `"debate"`, `"decision"`, `"general"`.

## Architecture

POLARIS has two execution modes sharing the same agent system, task definitions, and output format:

```
┌─────────────────────────────────────────────────┐
│                  PolarisEngine                   │
│                                                  │
│   mode: "flat"           mode: "pipeline"        │
│   ┌──────────┐           ┌──────────────────┐   │
│   │ Parallel │           │  LayerPipeline   │   │
│   │ Agents   │           │  (4-layer loop)  │   │
│   └──────────┘           └──────────────────┘   │
│         │                        │               │
│         └────────┬───────────────┘               │
│                  ▼                                │
│            EngineOutput                           │
└─────────────────────────────────────────────────┘
```

### Flat Inference

All agents evaluate the same state in parallel and return independent outputs. Each agent builds its own prompt from its `AgentRole` and `PolarisEngineTask`. No inter-agent communication — pure parallel evaluation with built-in rate limiting.

**Best for:** scoring, classification, simple analysis, quick multi-perspective evaluation.
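
The fan-out described above can be pictured with a minimal sketch. The `SketchAgent` and `SketchOutput` types are stand-ins invented for this example, not the library's `AgentOutput` or agent classes, and tolerating individual agent failures is an assumption about flat mode rather than documented behavior.

```typescript
// Illustrative stand-ins; the real types live in polaris-framework.
interface SketchOutput {
  agent: string;
  score: number;
  confidence: number;
}

interface SketchAgent {
  name: string;
  evaluate: (state: unknown) => Promise<SketchOutput>;
}

// Run every agent on the same state concurrently.
// Outputs are collected in completion order, not agent order.
async function evaluateInParallel(
  agents: SketchAgent[],
  state: unknown,
): Promise<{ outputs: SketchOutput[]; failures: string[] }> {
  const outputs: SketchOutput[] = [];
  const failures: string[] = [];
  await Promise.all(
    agents.map(async (agent) => {
      try {
        outputs.push(await agent.evaluate(state));
      } catch {
        // One failing agent does not discard the others' results.
        failures.push(agent.name);
      }
    }),
  );
  return { outputs, failures };
}
```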

### Pipeline (Polaris Creativa)

A 4-layer creative pipeline with a decay-based quality loop:

```
 Layer 1          Layer 2          Layer 3          Layer 4
┌──────────┐    ┌────────────┐   ┌──────────────┐  ┌──────────────┐
│ Divergent │──▶│ Inquisitor │──▶│ Synthesizer  │──▶│ Orchestrator │
│ T=0.8-1.0│    │ T=0.1-0.3  │   │  T=0.3-0.5   │  │  T=0.1       │
│ (parallel)│    │ (filter)   │   │  (weave)     │  │ (quality gate)│
└──────────┘    └────────────┘   └──────────────┘  └──────┬───────┘
                                                          │
                                              ┌───────────┤
                                              │  deliver?  │
                                              │  or rerun? │
                                              ▼            ▼
                                           OUTPUT     REINJECTION ──▶ Layer 1
```

| Layer | Name             | Temperature | Purpose                                           |
| ----- | ---------------- | ----------- | ------------------------------------------------- |
| 1     | **Divergent**    | 0.8–1.0     | Parallel creative exploration (multiple agents)   |
| 2     | **Inquisitor**   | 0.1–0.3     | Filter for logical coherence (Structured Outputs) |
| 3     | **Synthesizer**  | 0.3–0.5     | Weave validated fragments into a draft            |
| 4     | **Orchestrator** | 0.1         | ROI-aware quality gate (Structured Outputs)       |

The Orchestrator's `deliver`/`rerun` decision triggers the **decay loop**: on rerun, Layer 1 receives accumulated correction feedback and progressively higher temperatures until delivery or budget exhaustion.

**Best for:** creative writing, complex synthesis, iterative refinement tasks.
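
To make the decay loop more concrete, here is a minimal sketch of how the `decayConfig` values from Quick Start could combine. The field names come from that example, but the linear ramp and the forced-delivery rule are assumptions for illustration; the actual `LayerPipeline` logic may differ.

```typescript
// Field names mirror the decayConfig shown in Quick Start; the formulas are assumed.
interface DecaySketchConfig {
  maxIterations: number;
  baseTemperature: number;
  temperatureIncrement: number;
  temperatureCeiling: number;
  tokenBudget: number;
}

// Layer 1 temperature for a 0-based iteration: linear ramp, capped at the ceiling.
function temperatureForIteration(cfg: DecaySketchConfig, iteration: number): number {
  const raw = cfg.baseTemperature + iteration * cfg.temperatureIncrement;
  return Math.min(cfg.temperatureCeiling, raw);
}

// Delivery is forced once iterations or tokens run out, whatever the verdict says.
function shouldDeliver(
  cfg: DecaySketchConfig,
  verdict: "deliver" | "rerun",
  iteration: number,
  tokensUsed: number,
): boolean {
  if (verdict === "deliver") return true;
  const lastIteration = iteration + 1 >= cfg.maxIterations;
  const budgetExhausted = tokensUsed >= cfg.tokenBudget;
  return lastIteration || budgetExhausted;
}
```

Under these assumptions, a base of `0.8` with an increment of `0.05` reaches the `1.2` ceiling after eight reruns, although `maxIterations: 5` would force delivery well before that.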

### Strategy Pattern

Pipeline agents are **layer-blind** — they don't know which layer they're in. Behavior is entirely determined by the injected `PromptStrategy` and `OutputParserStrategy`:

```typescript
// Same agent class, different strategies → different layer behavior
new OpenAIStrategyAgent({
  promptStrategy: new DivergentPromptStrategy(), // Makes it Layer 1
  outputParser: new DivergentOutputParser(),
});

new OpenAIStrategyAgent({
  promptStrategy: new InquisitorPromptStrategy(), // Makes it Layer 2
  outputParser: new InquisitorOutputParser(),
  structuredOutputSchema: INQUISITOR_SCHEMA, // Deterministic JSON parsing
});
```
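
For readers new to the pattern, the sketch below shows the general shape of strategy injection in isolation. The `SketchPromptStrategy` and `SketchOutputParser` interfaces and the `makeAgent` factory are simplified stand-ins invented for this example; the library's `PromptStrategy` and `OutputParserStrategy` signatures may look different.

```typescript
// Hypothetical, simplified interfaces (not the polaris-framework types).
interface SketchPromptStrategy {
  buildPrompt(input: string): string;
}

interface SketchOutputParser<T> {
  parse(raw: string): T;
}

// A layer-blind agent: it only builds a prompt, calls a model, and parses the reply.
function makeAgent<T>(
  prompt: SketchPromptStrategy,
  parser: SketchOutputParser<T>,
  callModel: (prompt: string) => string,
): { run: (input: string) => T } {
  return {
    run: (input: string): T => parser.parse(callModel(prompt.buildPrompt(input))),
  };
}

// Stand-in for an LLM call so the sketch runs offline.
const echoModel = (prompt: string): string => prompt.toUpperCase();

// Two "layers" from the same factory, differing only in the injected strategies.
const divergentSketch = makeAgent<string[]>(
  { buildPrompt: (s) => `ideas:${s}` },
  { parse: (raw) => raw.split(":") },
  echoModel,
);

const inquisitorSketch = makeAgent<{ verdict: string }>(
  { buildPrompt: (s) => `check:${s}` },
  { parse: (raw) => ({ verdict: raw }) },
  echoModel,
);
```

The point of the design is that adding a new layer means writing a new strategy pair, not a new agent class.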

### Sentinel Oversight

Optional meta-cognitive analysis of agent outputs:

- **Bias Detection** — Systematic, temporal, positional, confirmation, and anchoring bias
- **Diversity Analysis** — Score diversity, confidence spread, reasoning variety
- **Score Adjustments** — Automatic corrections when bias is detected
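
One simple way to think about score diversity is as the spread of agent scores. The sketch below uses the population standard deviation and an arbitrary threshold; both are assumptions for illustration, since the Sentinel's actual metrics are not documented in this README.

```typescript
// Hypothetical metric: population standard deviation of agent scores.
function scoreDiversity(scores: number[]): number {
  if (scores.length === 0) return 0;
  const mean = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const variance = scores.reduce((sum, s) => sum + (s - mean) ** 2, 0) / scores.length;
  return Math.sqrt(variance);
}

// Flags ensembles whose scores are nearly identical (threshold is an assumed value).
function isLowDiversity(scores: number[], threshold: number = 0.05): boolean {
  return scoreDiversity(scores) < threshold;
}
```

A very low spread can mean genuine agreement, but it can also hint that the agents share the same bias, which is the kind of signal an oversight layer would want to surface.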

## Agent Providers

| Provider      | Models                                 | Features                             |
| ------------- | -------------------------------------- | ------------------------------------ |
| **OpenAI**    | GPT-5.4, GPT-5.4 Mini, O4-mini         | Structured Outputs, function calling |
| **Anthropic** | Claude Opus 4.8, Sonnet 4.6, Haiku 4.5 | Long context, safety                 |
| **Google**    | Gemini 3.5 Flash, Gemini 3.1 Pro       | Multi-modal, large context           |

All providers are created through ergonomic factory functions:

```typescript
import { openAiAgent, anthropicAgent, googleAgent, createEnsemble } from "polaris-framework";

// Individual agents
const analyst = openAiAgent({ role, task, model: "gpt-5.4-mini" });
const debater = anthropicAgent({ role, task, model: "claude-sonnet-4-6" });
const researcher = googleAgent({ role, task, model: "gemini-3.5-flash" });

// Or create a diverse ensemble in one call
const agents = createEnsemble(task, {
  roles: ["ANALYST", "RISKY", "CONSERVATIVE"],
  providers: ["openai", "anthropic", "google"],
});
```

## Error Handling

POLARIS provides a typed error hierarchy with retryability classification:

```typescript
import { PolarisError, APIError, ConfigurationError, ValidationError } from "polaris-framework";

try {
  const result = await engine.inference({ state });
} catch (error) {
  if (error instanceof PolarisError) {
    console.log(error.code); // "API_ERROR", "VALIDATION_ERROR", etc.
    console.log(error.retryable); // true for 429s and 5xx errors
    console.log(error.context); // Structured error metadata
  }
}
```

| Error Class          | Code                  | Retryable      | When                      |
| -------------------- | --------------------- | -------------- | ------------------------- |
| `PolarisError`       | `POLARIS_ERROR`       | No             | Base class for all errors |
| `APIError`           | `API_ERROR`           | Auto (429/5xx) | Provider API failures     |
| `ConfigurationError` | `CONFIGURATION_ERROR` | No             | Invalid config            |
| `ValidationError`    | `VALIDATION_ERROR`    | No             | Input validation failures |
| `AgentError`         | `AGENT_ERROR`         | No             | Agent lifecycle issues    |
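
The `retryable` flag lends itself to a small retry wrapper. The following is a sketch under stated assumptions: `withRetry`, `backoffDelayMs`, and the backoff values are hypothetical, and the local `isRetryable` check stands in for `error instanceof PolarisError` so the block runs without the library installed.

```typescript
// Local stand-in for the documented flag; real code would check
// `error instanceof PolarisError` and then read `error.retryable`.
function isRetryable(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { retryable?: unknown }).retryable === true
  );
}

// Mirrors the table above: 429 and 5xx responses are treated as transient.
function isTransientStatus(status: number): boolean {
  return status === 429 || (status >= 500 && status <= 599);
}

// Exponential backoff: 500ms, 1000ms, 2000ms, ... capped at maxMs (assumed values).
function backoffDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retries `fn` only for retryable errors, up to `maxAttempts` total attempts.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts: number = 3): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const lastAttempt = attempt + 1 >= maxAttempts;
      if (!isRetryable(error) || lastAttempt) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

A call site would then look like `await withRetry(() => engine.inference({ state }))`, with non-retryable errors such as `ValidationError` surfacing immediately.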

## Rate Limiting

The built-in rate limiter spaces out requests with jitter to help avoid 429 errors during parallel execution:

```typescript
import { RateLimiter, ProviderRateLimiters } from "polaris-framework";

// Pre-configured provider limits (used automatically by the engine)
const openaiLimiter = ProviderRateLimiters.openai(); // 8 concurrent, 125ms gap
const anthropicLimiter = ProviderRateLimiters.anthropic(); // 2 concurrent, 1200ms gap
const googleLimiter = ProviderRateLimiters.google(); // 3 concurrent, 1000ms gap

// Or configure manually
const custom = new RateLimiter({
  maxConcurrent: 4,
  minDelayMs: 500,
  maxQueueSize: 50,
});

// Schedule rate-limited execution
const result = await openaiLimiter.schedule(() => agent.evaluate(state));
```

> **Note:** The engine automatically applies a conservative rate limiter when it runs agents in parallel. You only need to configure one manually if you want custom limits.
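
For intuition about what such a limiter does, here is a minimal sketch that combines a concurrency cap with jittered spacing between start times. It is not the library's `RateLimiter`: the option names `maxConcurrent` and `minDelayMs` mirror the example above, while `jitterMs`, `createLimiter`, `jitteredDelay`, and the jitter formula are assumptions.

```typescript
interface SketchLimiterOptions {
  maxConcurrent: number;
  minDelayMs: number;
  jitterMs?: number; // assumed option: extra random gap of up to this many ms
}

// Gap between consecutive task starts: the minimum delay plus a random extra.
function jitteredDelay(
  minDelayMs: number,
  jitterMs: number,
  random: () => number = Math.random,
): number {
  return minDelayMs + Math.floor(random() * jitterMs);
}

function createLimiter(options: SketchLimiterOptions) {
  const { maxConcurrent, minDelayMs, jitterMs = 0 } = options;
  let active = 0; // slots currently in use
  let nextStartAt = 0; // earliest timestamp at which the next task may start
  const waiting: Array<() => void> = [];

  const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

  async function schedule<T>(task: () => Promise<T>): Promise<T> {
    if (active >= maxConcurrent) {
      // Queue up; a finishing task hands its slot over directly.
      await new Promise<void>((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      // Reserve a start time synchronously so concurrent callers are spaced apart.
      const now = Date.now();
      const startAt = Math.max(now, nextStartAt);
      nextStartAt = startAt + jitteredDelay(minDelayMs, jitterMs);
      if (startAt > now) await sleep(startAt - now);
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) next(); // transfer the slot to the next waiter
      else active--; // nobody waiting, so free the slot
    }
  }

  return { schedule };
}
```

Handing the slot straight to the next waiter, rather than decrementing and re-incrementing the counter, keeps a newly arriving call from slipping in ahead of the queue and exceeding the cap.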

## Development

```bash
pnpm run dev              # Watch mode
pnpm run build            # Build TypeScript
pnpm test                 # All tests (9 test suites)
pnpm run test:basic       # Framework basics
pnpm run test:pipeline    # Pipeline & strategy tests
pnpm run test:inference   # Business decisions
pnpm run test:philosophy  # Philosophical discourse
pnpm run test:chess       # Chess analysis
pnpm run type-check       # Type checking only
pnpm run lint             # ESLint with auto-fix
pnpm run format           # Prettier formatting
pnpm run demo:pipeline    # Run the pipeline demo
```

## Project Structure

```
src/
├── agents/          # Agent implementations (flat + strategy)
│   ├── base/        #   BaseAgent, StrategyAgent abstractions
│   ├── factories/   #   Ergonomic agent creation functions
│   └── web/         #   OpenAI, Anthropic, Google implementations
├── config/          # Presets, model catalog, and configuration
├── domains/         # Domain-specific state/action types (chess, philosophy)
├── engine/          # PolarisEngine + LayerPipeline
├── errors/          # Typed error hierarchy with retryability
├── sentinel/        # Bias detection & diversity analysis
├── strategies/      # Prompt + output parser strategies (8 total)
├── types/           # TypeScript definitions (layer, task, agent-output)
└── utils/           # Logger, config, rate-limiter, math, validation, statistics
```

## Examples

| File                                                               | Description                                      |
| ------------------------------------------------------------------ | ------------------------------------------------ |
| [`examples/basic-usage.ts`](examples/basic-usage.ts)               | Quick start, custom tasks, agent factories       |
| [`examples/inference-examples.ts`](examples/inference-examples.ts) | Multi-agent inference, presets, selective agents |
| [`examples/pipeline-demo.ts`](examples/pipeline-demo.ts)           | Full pipeline demo with decay loop explanation   |

## Test Suite

| Domain     | Tests                                  | Description                                      |
| ---------- | -------------------------------------- | ------------------------------------------------ |
| Basic      | Component Validation, Setup, Benchmark | Framework init, agent creation, engine lifecycle |
| Inference  | Multi-Agent Business Decision          | Real LLM API calls with consensus                |
| Philosophy | Consciousness Debate                   | Multi-provider philosophical discourse           |
| Chess      | Tactics + Position Analysis            | Strategic role-based analysis                    |
| Strategies | 46 assertions                          | All 8 prompt/output strategies                   |
| Pipeline   | 24 assertions                          | Decay loop, message propagation, forced delivery |

See [`test/README.md`](test/README.md) for details.

## Use Cases

- **Decision Support** — Multi-perspective analysis with role-aware agents
- **Creative Writing** — 4-layer pipeline for iterative quality improvement
- **Game AI** — Chess analysis with strategic roles
- **Business Analysis** — Multi-agent strategic planning
- **Research** — Collaborative AI analysis with quality gates
- **Content Generation** — Blog posts, stories, reports with iterative refinement

## Contributing

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes with tests
4. Run checks: `pnpm run type-check && pnpm test`
5. Submit a pull request

## Citation

```bibtex
@software{vallejo2026polaris,
  title={POLARIS: Policy Optimization via Layered Agents and Recursive Inference Search},
  author={Vallejo, Diego},
  year={2026},
  version={2.1.0},
  url={https://github.com/DiegoVallejoDev/Polaris}
}
```

## License

MIT License — see [LICENSE](LICENSE) file for details.

## Author

Diego Vallejo — [diegovallejo.dev](https://www.diegovallejo.dev/)