# LLM Provider Support

AgentRails supports multiple LLM providers for evaluating your agent's responses. The evaluator is configured independently of your agent, so you can use any provider regardless of which LLM your agent uses.

## Supported Providers

### OpenAI

**Models:** GPT-4 Turbo, GPT-4, GPT-3.5 Turbo

```javascript
module.exports = {
  llm: {
    provider: "openai",
    apiKey: process.env.OPENAI_API_KEY,
    model: "gpt-4-turbo-preview", // optional, default
    temperature: 0.3, // optional, default
  },
  agent: async (input) => {
    /* your agent */
  },
  tests: [
    /* your tests */
  ],
};
```

**Setup:**

```bash
npm install openai
export OPENAI_API_KEY="sk-..."
```

### Anthropic Claude

**Models:** Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Sonnet

```javascript
module.exports = {
  llm: {
    provider: "anthropic",
    apiKey: process.env.ANTHROPIC_API_KEY,
    model: "claude-3-5-sonnet-20241022", // optional, default
    temperature: 0.3, // optional
  },
  agent: async (input) => {
    /* your agent */
  },
  tests: [
    /* your tests */
  ],
};
```

**Setup:**

```bash
npm install @anthropic-ai/sdk
export ANTHROPIC_API_KEY="sk-ant-..."
```

### Google Gemini

**Models:** Gemini Pro, Gemini Pro Vision

```javascript
module.exports = {
  llm: {
    provider: "google",
    apiKey: process.env.GOOGLE_API_KEY,
    model: "gemini-pro", // optional, default
    temperature: 0.3, // optional
  },
  agent: async (input) => {
    /* your agent */
  },
  tests: [
    /* your tests */
  ],
};
```

**Setup:**

```bash
npm install @google/generative-ai
export GOOGLE_API_KEY="..."
```

### Grok (xAI)

**Models:** Grok Beta

```javascript
module.exports = {
  llm: {
    provider: "grok",
    apiKey: process.env.XAI_API_KEY,
    model: "grok-beta", // optional, default
    temperature: 0.3, // optional
    baseURL: "https://api.x.ai/v1", // optional, default
  },
  agent: async (input) => {
    /* your agent */
  },
  tests: [
    /* your tests */
  ],
};
```

**Setup:**

```bash
# Grok uses the OpenAI SDK
npm install openai
export XAI_API_KEY="..."
```

## Choosing a Provider

### When to use OpenAI

- **Best for:** General purpose, well-documented, stable API
- **Pros:** Excellent at following JSON format, reliable, fast
- **Cons:** More expensive than alternatives

### When to use Anthropic

- **Best for:** Long-context evaluations, detailed reasoning
- **Pros:** Excellent reasoning, large context window (200k tokens), good at nuanced evaluation
- **Cons:** Slightly slower, requires a separate SDK

### When to use Google Gemini

- **Best for:** Cost-effective evaluation, multimodal inputs
- **Pros:** Free tier available, fast, good for image inputs
- **Cons:** Newer, less consistent at producing valid JSON

### When to use Grok

- **Best for:** Evaluating responses about recent news and current events
- **Pros:** Access to real-time information, X integration
- **Cons:** Still in beta, limited availability

## Cost Comparison

Approximate costs in USD per 1M tokens:

| Provider  | Model             | Input (per 1M tokens) | Output (per 1M tokens) |
| --------- | ----------------- | --------------------- | ---------------------- |
| OpenAI    | GPT-4 Turbo       | $10                   | $30                    |
| OpenAI    | GPT-3.5 Turbo     | $0.50                 | $1.50                  |
| Anthropic | Claude 3.5 Sonnet | $3                    | $15                    |
| Google    | Gemini Pro        | Free tier, then $0.50 | $1.50                  |
| Grok      | Grok Beta         | TBD                   | TBD                    |
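
To turn the table into a budget figure, multiply the tokens you expect to send and receive by the per-million prices. The sketch below assumes a made-up workload (1,000 evaluations, each with 2,000 input and 500 output tokens); the `estimateRunCost` helper and the lookup keys are illustrative names, not API model IDs. Gemini and Grok are left out because the table gives no fixed price for them.

```javascript
// Prices in USD per 1M tokens, copied from the table above (approximate).
// The keys are illustrative labels, not API model identifiers.
const PRICES = {
  "gpt-4-turbo": { input: 10, output: 30 },
  "gpt-3.5-turbo": { input: 0.5, output: 1.5 },
  "claude-3.5-sonnet": { input: 3, output: 15 },
};

// Estimated cost in USD of `evaluations` evaluator calls.
function estimateRunCost(model, evaluations, inputTokensPerEval, outputTokensPerEval) {
  if (!Object.prototype.hasOwnProperty.call(PRICES, model)) {
    throw new Error(`No price listed for "${model}"`);
  }
  const price = PRICES[model];
  const inputCost = (evaluations * inputTokensPerEval * price.input) / 1e6;
  const outputCost = (evaluations * outputTokensPerEval * price.output) / 1e6;
  return inputCost + outputCost;
}

// Assumed workload: 1,000 evaluations, 2,000 input + 500 output tokens each.
for (const model of Object.keys(PRICES)) {
  console.log(`${model}: $${estimateRunCost(model, 1000, 2000, 500).toFixed(2)}`);
}
// gpt-4-turbo: $35.00
// gpt-3.5-turbo: $1.75
// claude-3.5-sonnet: $13.50
```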

## Best Practices

1. **Match model to task complexity:**

   - Simple pass/fail: GPT-3.5 Turbo or Gemini Pro
   - Nuanced evaluation: GPT-4 Turbo or Claude 3.5 Sonnet

2. **Use different providers for redundancy:**

   ```javascript
   // Run same tests with multiple evaluators
   const providers = ["openai", "anthropic", "google"];
   ```

3. **Set temperature low (0.1-0.3):**

   - Low temperature = more consistent evaluation
   - High temperature = more creative but less reliable

4. **Your agent can use a different LLM:**
   ```javascript
   // Agent uses Claude, Evaluator uses GPT-4
   module.exports = {
     llm: { provider: "openai", apiKey: process.env.OPENAI_API_KEY },
     agent: async (input) => {
       // Your agent calls Claude internally
       return await yourClaudeAgent.chat(input);
     },
   };
   ```
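
Practice 2 leaves open how to reconcile verdicts when several evaluators judge the same test. One option is a simple majority vote. The sketch below assumes you have already collected one `{ passed, reasoning }` result per provider; the `combineVerdicts` helper and the extra `provider` field are hypothetical and not part of AgentRails.

```javascript
// Hypothetical helper: majority vote over per-provider verdicts.
// Each verdict is assumed to look like { provider, passed, reasoning }.
function combineVerdicts(verdicts) {
  if (verdicts.length === 0) {
    throw new Error("No verdicts to combine");
  }
  const passCount = verdicts.filter((v) => v.passed).length;
  // Strict majority required, so a tie counts as a failure.
  const passed = passCount > verdicts.length / 2;
  const dissenting = verdicts
    .filter((v) => v.passed !== passed)
    .map((v) => v.provider);
  return {
    passed,
    unanimous: dissenting.length === 0,
    dissenting,
    reasoning: verdicts.map((v) => `[${v.provider}] ${v.reasoning}`).join("\n"),
  };
}

const combined = combineVerdicts([
  { provider: "openai", passed: true, reasoning: "Answers the question." },
  { provider: "anthropic", passed: true, reasoning: "Accurate and on topic." },
  { provider: "google", passed: false, reasoning: "Misses the refund policy." },
]);
console.log(combined.passed, combined.dissenting); // true [ 'google' ]
```

Keeping the dissenting providers in the result makes it easy to flag non-unanimous tests for manual review instead of silently trusting the majority.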

## Adding a New Provider

To add a new provider:

1. Implement the `LLMEvaluator` interface in `src/evaluator.ts`
2. Add the provider to the `LLMProvider` type in `src/types.ts`
3. Update the `createEvaluator` factory function
4. Add tests in `tests/evaluator.test.ts`

Example:

```typescript
export class CustomEvaluator implements LLMEvaluator {
  async evaluate(
    input: string | Record<string, any>,
    actualResponse: string | Record<string, any>,
    expectedBehavior?: string,
    exampleResponses?: string[]
  ): Promise<{ passed: boolean; reasoning: string }> {
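    // Your implementation: call your provider, then map its answer
    // to a { passed, reasoning } verdict.
    throw new Error("CustomEvaluator.evaluate is not implemented yet");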
  }
}
```
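
As a rough illustration of steps 1 and 3, here is a toy evaluator wired into a factory, written in plain JavaScript so it runs without a build step. Everything in it is an assumption made for the example: `KeywordEvaluator`, the `"keyword"` provider name, and the `keywords` option are hypothetical, and the real `createEvaluator` in AgentRails may be structured differently.

```javascript
// Toy evaluator with the same `evaluate` shape as the interface above.
// It passes when the response contains every keyword; no LLM is called.
class KeywordEvaluator {
  constructor(keywords) {
    this.keywords = keywords;
  }

  // `input`, `expectedBehavior`, and `exampleResponses` are accepted to
  // match the interface but are unused in this toy implementation.
  async evaluate(input, actualResponse, expectedBehavior, exampleResponses) {
    const text =
      typeof actualResponse === "string"
        ? actualResponse
        : JSON.stringify(actualResponse);
    const haystack = text.toLowerCase();
    const missing = this.keywords.filter(
      (keyword) => !haystack.includes(keyword.toLowerCase())
    );
    return {
      passed: missing.length === 0,
      reasoning:
        missing.length === 0
          ? "All keywords present"
          : `Missing keywords: ${missing.join(", ")}`,
    };
  }
}

// Hypothetical factory sketch: one `case` per supported provider.
function createEvaluator(config) {
  switch (config.provider) {
    case "keyword":
      return new KeywordEvaluator(config.keywords ?? []);
    default:
      throw new Error(`Unsupported provider: ${config.provider}`);
  }
}
```

A deterministic evaluator like this is also handy in step 4, since tests can exercise the factory and the result shape without making network calls.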

Pull requests welcome!
