# Answer similarity scorer

The `createAnswerSimilarityScorer()` function creates a scorer that evaluates how similar an agent's output is to a ground truth answer. This scorer is specifically designed for CI/CD testing scenarios where you have expected answers and want to ensure consistency over time.

## Parameters

**model** (`LanguageModel`): The language model used to evaluate semantic similarity between outputs and ground truth.

**options** (`AnswerSimilarityOptions`): Configuration options for the scorer.

**options.requireGroundTruth** (`boolean`): Whether to require ground truth for evaluation. If `false`, missing ground truth returns a score of `0`.

**options.semanticThreshold** (`number`): Weight for semantic matches vs exact matches (0-1).

**options.exactMatchBonus** (`number`): Additional score bonus for exact matches (0-1).

**options.missingPenalty** (`number`): Penalty per missing key concept from ground truth.

**options.contradictionPenalty** (`number`): Penalty for contradictory information. High value ensures wrong answers score near 0.

**options.extraInfoPenalty** (`number`): Mild penalty for extra information not present in ground truth (capped at 0.2).

**options.scale** (`number`): Score scaling factor.

This function returns an instance of the MastraScorer class. The `.run()` method accepts the same input as other scorers (see the [MastraScorer reference](https://mastra.ai/reference/evals/mastra-scorer)), but **requires ground truth** to be provided in the run object.
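As a sketch of a direct `.run()` call with custom options (the option values and the model identifier here are illustrative, and the exact run-object shape is defined in the MastraScorer reference):

```typescript
import { createAnswerSimilarityScorer } from '@mastra/evals/scorers/prebuilt'

// Model identifier is illustrative; pass any supported LanguageModel.
const scorer = createAnswerSimilarityScorer({
  model: 'openai/gpt-4o-mini',
  options: {
    semanticThreshold: 0.8,    // weight semantic matches at 80% of exact matches
    exactMatchBonus: 0.2,      // bonus applied when units match exactly
    contradictionPenalty: 1.0, // drive contradictory answers toward 0
  },
})

// Ground truth must be present in the run object.
const result = await scorer.run({
  input: 'What is the capital of France?',
  output: 'Paris is the capital of France',
  groundTruth: 'The capital of France is Paris',
})

console.log(result.score, result.reason)
```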

## `.run()` returns

**runId** (`string`): The id of the run (optional).

**score** (`number`): Similarity score between 0 and 1 (or 0 and `scale` if a custom scale is used). Higher scores indicate better similarity to ground truth.

**reason** (`string`): Human-readable explanation of the score with actionable feedback.

**preprocessStepResult** (`object`): Extracted semantic units from output and ground truth.

**analyzeStepResult** (`object`): Detailed analysis of matches, contradictions, and extra information.

**preprocessPrompt** (`string`): The prompt used for semantic unit extraction.

**analyzePrompt** (`string`): The prompt used for similarity analysis.

**generateReasonPrompt** (`string`): The prompt used for generating the explanation.

## Scoring details

The scorer uses a multi-step process:

1. **Extract**: Breaks down output and ground truth into semantic units
2. **Analyze**: Compares units and identifies matches, contradictions, and gaps
3. **Score**: Calculates weighted similarity with penalties for contradictions
4. **Reason**: Generates human-readable explanation

Score calculation: `max(0, base_score - contradiction_penalty - missing_penalty - extra_info_penalty) × scale`
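The calculation can be sketched as a pure function (an illustration of the formula above, not the library's internal code):

```typescript
// Illustrative sketch of the scoring formula; not the actual library internals.
function answerSimilarityScore(
  baseScore: number,           // weighted semantic/exact match score, 0-1
  contradictionPenalty: number,
  missingPenalty: number,
  extraInfoPenalty: number,    // capped at 0.2 by the scorer
  scale: number = 1,
): number {
  return Math.max(0, baseScore - contradictionPenalty - missingPenalty - extraInfoPenalty) * scale
}

// A strong match with one missing key concept and no contradictions:
answerSimilarityScore(0.9, 0, 0.15, 0, 1) // → 0.75

// A contradicted answer is clamped to 0 rather than going negative:
answerSimilarityScore(0.5, 1.0, 0, 0, 1) // → 0
```

Because the result is clamped at 0 before scaling, a high `contradictionPenalty` guarantees that confidently wrong answers score near zero regardless of partial matches.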

## Example

Evaluate agent responses for similarity to ground truth across different scenarios:

```typescript
import { runEvals } from '@mastra/core/evals'
import { createAnswerSimilarityScorer } from '@mastra/evals/scorers/prebuilt'
import { myAgent } from './agent'

const scorer = createAnswerSimilarityScorer({ model: 'openai/gpt-4o-mini' })

const result = await runEvals({
  data: [
    {
      input: 'What is 2+2?',
      groundTruth: '4',
    },
    {
      input: 'What is the capital of France?',
      groundTruth: 'The capital of France is Paris',
    },
    {
      input: 'What are the primary colors?',
      groundTruth: 'The primary colors are red, blue, and yellow',
    },
  ],
  scorers: [scorer],
  target: myAgent,
  onItemComplete: ({ scorerResults }) => {
    console.log({
      score: scorerResults[scorer.id].score,
      reason: scorerResults[scorer.id].reason,
    })
  },
})

console.log(result.scores)
```

For more details on `runEvals`, see the [runEvals reference](https://mastra.ai/reference/evals/run-evals).

To add this scorer to an agent, see the [Scorers overview](https://mastra.ai/docs/evals/overview) guide.