---
name: mastra-evals
description: Documentation for @mastra/evals. Use when working with @mastra/evals APIs, configuration, or implementation.
metadata:
  package: "@mastra/evals"
  version: "1.5.0"
---

## When to use

Use this skill whenever you work with @mastra/evals and need package-specific knowledge of its APIs, configuration, or implementation.

## How to use

Read the individual reference documents linked below for detailed explanations and code examples.

### Docs

- [Built-in scorers](references/docs-evals-built-in-scorers.md) - Overview of Mastra's ready-to-use scorers for evaluating AI outputs across quality, safety, and performance dimensions.
- [Scorers overview](references/docs-evals-overview.md) - Overview of scorers in Mastra, detailing their capabilities for evaluating AI outputs and measuring performance.
- [Quick Checks](references/docs-evals-quick-checks.md) - Zero-LLM micro-scorers for fast, deterministic assertions on agent output text and tool usage.

### Reference

- [Reference: Answer relevancy scorer](references/reference-evals-answer-relevancy.md) - Documentation for the Answer Relevancy Scorer in Mastra, which evaluates how well LLM outputs address the input query.
- [Reference: Answer similarity scorer](references/reference-evals-answer-similarity.md) - Documentation for the Answer Similarity Scorer in Mastra, which compares agent outputs against ground truth answers for CI/CD testing.
- [Reference: Bias scorer](references/reference-evals-bias.md) - Documentation for the Bias Scorer in Mastra, which evaluates LLM outputs for various forms of bias, including gender, political, racial/ethnic, or geographical bias.
- [Reference: Quick Checks](references/reference-evals-checks.md) - API reference for Quick Checks, zero-LLM composable micro-scorers for common assertions like text matching, tool usage, and tool ordering.
- [Reference: Completeness scorer](references/reference-evals-completeness.md) - Documentation for the Completeness Scorer in Mastra, which evaluates how thoroughly LLM outputs cover key elements present in the input.
- [Reference: Content similarity scorer](references/reference-evals-content-similarity.md) - Documentation for the Content Similarity Scorer in Mastra, which measures textual similarity between strings and provides a matching score.
- [Reference: Context precision scorer](references/reference-evals-context-precision.md) - Documentation for the Context Precision Scorer in Mastra. Evaluates the relevance and precision of retrieved context for generating expected outputs using Mean Average Precision.
- [Reference: Context relevance scorer](references/reference-evals-context-relevance.md) - Documentation for the Context Relevance Scorer in Mastra. Evaluates the relevance and utility of provided context for generating agent responses using weighted relevance scoring.
- [Reference: Faithfulness scorer](references/reference-evals-faithfulness.md) - Documentation for the Faithfulness Scorer in Mastra, which evaluates the factual accuracy of LLM outputs compared to the provided context.
- [Reference: Hallucination scorer](references/reference-evals-hallucination.md) - Documentation for the Hallucination Scorer in Mastra, which evaluates the factual correctness of LLM outputs by identifying contradictions with provided context.
- [Reference: Keyword coverage scorer](references/reference-evals-keyword-coverage.md) - Documentation for the Keyword Coverage Scorer in Mastra, which evaluates how well LLM outputs cover important keywords from the input.
- [Reference: Noise sensitivity scorer](references/reference-evals-noise-sensitivity.md) - Documentation for the Noise Sensitivity Scorer in Mastra. A CI/testing scorer that evaluates agent robustness by comparing responses between clean and noisy inputs in controlled test environments.
- [Reference: Prompt alignment scorer](references/reference-evals-prompt-alignment.md) - Documentation for the Prompt Alignment Scorer in Mastra. Evaluates how well agent responses align with user prompt intent, requirements, completeness, and appropriateness using multi-dimensional analysis.
- [Reference: Rubric scorer](references/reference-evals-rubric.md) - Documentation for the Rubric Scorer in Mastra. An LLM-as-judge scorer that grades an agent output against a checklist of criteria and returns a binary verdict with per-criterion feedback, designed to drive `isTaskComplete` loops.
- [Reference: Scorer utils](references/reference-evals-scorer-utils.md) - Utility functions for extracting data from scorer run inputs and outputs, including text content, reasoning, system messages, and tool calls.
- [Reference: Textual difference scorer](references/reference-evals-textual-difference.md) - Documentation for the Textual Difference Scorer in Mastra, which measures textual differences between strings using sequence matching.
- [Reference: Tone consistency scorer](references/reference-evals-tone-consistency.md) - Documentation for the Tone Consistency Scorer in Mastra, which evaluates emotional tone and sentiment consistency in text.
- [Reference: Tool call accuracy scorers](references/reference-evals-tool-call-accuracy.md) - Documentation for the Tool Call Accuracy Scorers in Mastra, which evaluate whether LLM outputs call the correct tools from available options.
- [Reference: Toxicity scorer](references/reference-evals-toxicity.md) - Documentation for the Toxicity Scorer in Mastra, which evaluates LLM outputs for racist, biased, or toxic elements.
- [Reference: Trajectory accuracy scorers](references/reference-evals-trajectory-accuracy.md) - Documentation for the Trajectory Accuracy Scorers in Mastra, which evaluate whether an agent or workflow follows the expected sequence of actions.


Read [assets/SOURCE_MAP.json](assets/SOURCE_MAP.json) for source code references.