# Context precision scorer

The `createContextPrecisionScorer()` function creates a scorer that evaluates how relevant and well-positioned retrieved context pieces are for generating expected outputs. It uses **Mean Average Precision (MAP)** to reward systems that place relevant context earlier in the sequence.

It's especially useful for these use cases:

## RAG system evaluation

Ideal for evaluating retrieved context in RAG pipelines where:

- Context ordering matters for model performance
- You need to measure retrieval quality beyond basic relevance
- Early relevant context is more valuable than later relevant context

## Context window optimization

Use when optimizing context selection for:

- Limited context windows
- Token budget constraints
- Multi-step reasoning tasks

## Parameters

**model** (`MastraModelConfig`): The language model to use for evaluating context relevance

**options** (`ContextPrecisionMetricOptions`): Configuration options for the scorer

**Note**: Either `context` or `contextExtractor` must be provided. If both are provided, `contextExtractor` takes precedence.

## `.run()` returns

**score** (`number`): Mean Average Precision score between 0 and the configured `scale` (default `1`, so scores fall in the 0-1 range)

**reason** (`string`): Human-readable explanation of the context precision evaluation

## Scoring details

### Mean Average Precision (MAP)

Context Precision uses **Mean Average Precision** to evaluate both relevance and positioning:

1. **Context Evaluation**: Each context piece is classified as relevant or irrelevant for generating the expected output
2. **Precision Calculation**: For each relevant context at position `i`, precision = `relevant_items_so_far / (i + 1)`
3. **Average Precision**: Sum all precision values and divide by total relevant items
4. **Final Score**: Multiply by scale factor and round to 2 decimals

### Scoring formula

```text
MAP = (Σ Precision@k) / R

Where:
- Precision@k = (relevant items in positions 1...k) / k
- R = total number of relevant items
- Only calculated at positions where relevant items appear
```

### Score interpretation

- **0.9-1.0**: Excellent precision - all relevant context appears early in the sequence
- **0.7-0.9**: Good precision - most relevant context is well-positioned
- **0.4-0.7**: Moderate precision - relevant context is mixed with irrelevant pieces
- **0.1-0.4**: Poor precision - little relevant context, or it appears late
- **0.0**: No relevant context found

### Reason analysis

The reason field explains:

- Which context pieces were deemed relevant/irrelevant
- How positioning affected the MAP calculation
- Specific relevance criteria used in evaluation

### Optimization insights

Use results to:

- **Improve retrieval**: Filter out irrelevant context before ranking
- **Optimize ranking**: Ensure relevant context surfaces early
- **Tune chunk size**: Balance context detail vs. relevance precision
- **Evaluate embeddings**: Test different embedding models for better retrieval

### Example calculation

Given context: `[relevant, irrelevant, relevant, irrelevant]`

- Position 0: Relevant → Precision = 1/1 = 1.0
- Position 1: Skip (irrelevant)
- Position 2: Relevant → Precision = 2/3 = 0.67
- Position 3: Skip (irrelevant)

MAP = (1.0 + 0.67) / 2 ≈ **0.83**
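The steps above can be sketched in a few lines of TypeScript. This is an illustrative re-implementation of the MAP formula, not the scorer's internal code; the relevance judgments that the scorer obtains from the model are represented here as a plain boolean array:

```typescript
// Compute Mean Average Precision over an ordered list of relevance judgments.
// `relevance[i]` is true if the context piece at position i was judged relevant.
function meanAveragePrecision(relevance: boolean[], scale = 1): number {
  let relevantSoFar = 0
  let precisionSum = 0

  relevance.forEach((isRelevant, i) => {
    if (isRelevant) {
      relevantSoFar += 1
      // Precision@k at this position: relevant items so far / (i + 1)
      precisionSum += relevantSoFar / (i + 1)
    }
  })

  if (relevantSoFar === 0) return 0

  // Average over relevant items, apply scale, round to 2 decimals
  const map = precisionSum / relevantSoFar
  return Math.round(map * scale * 100) / 100
}

// [relevant, irrelevant, relevant, irrelevant] → (1.0 + 0.67) / 2 ≈ 0.83
meanAveragePrecision([true, false, true, false])
```

Note how a relevant item at position 0 always contributes a precision of 1.0, while the same item pushed later contributes less, which is exactly why the metric rewards early placement.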

## Scorer configuration

### Dynamic context extraction

```typescript
const scorer = createContextPrecisionScorer({
  model: 'openai/gpt-5.4',
  options: {
    contextExtractor: (input, output) => {
      // Extract context dynamically based on the query
      const query = input?.inputMessages?.[0]?.content || ''

      // Example: retrieve from a vector database
      // (`vectorDB` is a placeholder for your own retrieval client)
      const searchResults = vectorDB.search(query, { limit: 10 })
      return searchResults.map(result => result.content)
    },
    scale: 1,
  },
})
```

### Large context evaluation

```typescript
const scorer = createContextPrecisionScorer({
  model: 'openai/gpt-5.4',
  options: {
    context: [
      // Simulate retrieved documents from vector database
      'Document 1: Highly relevant content...',
      'Document 2: Somewhat related content...',
      'Document 3: Tangentially related...',
      'Document 4: Not relevant...',
      'Document 5: Highly relevant content...',
      // ... up to dozens of context pieces
    ],
  },
})
```

## Example

Evaluate RAG system context retrieval precision for different queries:

```typescript
import { runEvals } from '@mastra/core/evals'
import { createContextPrecisionScorer } from '@mastra/evals/scorers/prebuilt'
import { myAgent } from './agent'

const scorer = createContextPrecisionScorer({
  model: 'openai/gpt-5.4',
  options: {
    contextExtractor: (input, output) => {
      // Extract context from agent's retrieved documents
      return output.metadata?.retrievedContext || []
    },
  },
})

const result = await runEvals({
  data: [
    {
      input: 'How does photosynthesis work in plants?',
    },
    {
      input: 'What are the mental and physical benefits of exercise?',
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    })
  },
})

console.log(result.scores)
```

For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).

To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.

## Comparison with context relevance

Choose the right scorer for your needs:

| Use Case                 | Context Relevance    | Context Precision         |
| ------------------------ | -------------------- | ------------------------- |
| **RAG evaluation**       | When usage matters   | When ranking matters      |
| **Context quality**      | Nuanced levels       | Binary relevance          |
| **Missing detection**    | ✓ Identifies gaps    | ✗ Not evaluated           |
| **Usage tracking**       | ✓ Tracks utilization | ✗ Not considered          |
| **Position sensitivity** | ✗ Position agnostic  | ✓ Rewards early placement |

## Related

- [Answer Relevancy Scorer](https://mastra.ai/reference/evals/answer-relevancy): Evaluates if answers address the question
- [Faithfulness Scorer](https://mastra.ai/reference/evals/faithfulness): Measures answer groundedness in context
- [Custom Scorers](https://mastra.ai/docs/evals/custom-scorers): Creating your own evaluation metrics