# dataset.startExperiment()

**Added in:** `@mastra/core@1.4.0`

Runs an experiment on the dataset and waits for completion. Executes all items against a target (agent, workflow, or scorer) with optional scoring.

## Usage example

```typescript
import { Mastra } from '@mastra/core'

const mastra = new Mastra({
  /* storage config */
})

const dataset = await mastra.datasets.get({ id: 'dataset-id' })

// Run against a registered agent with scorers
const summary = await dataset.startExperiment({
  targetType: 'agent',
  targetId: 'my-agent',
  scorers: ['accuracy', 'relevancy'],
  maxConcurrency: 10,
})

console.log(`${summary.succeededCount}/${summary.totalItems} succeeded`)
console.log(`Status: ${summary.status}`)
```

## Parameters

**targetType** (`'agent' | 'workflow' | 'scorer'`): Type of registered target to run items against. Use with `targetId`.

**targetId** (`string`): ID of the registered target. Use with `targetType`.

**scorers** (`(MastraScorer | string)[]`): Scorers to evaluate each result. Pass `MastraScorer` instances or registered scorer IDs.

**name** (`string`): Display name for the experiment.

**description** (`string`): Description of the experiment.

**metadata** (`Record<string, unknown>`): Arbitrary metadata for the experiment.

**version** (`number`): Pin to a specific dataset version. Defaults to the latest version.

**maxConcurrency** (`number`): Maximum concurrent item executions. Defaults to `5`.

**signal** (`AbortSignal`): AbortSignal for cancelling the experiment.

**itemTimeout** (`number`): Per-item execution timeout in milliseconds.

**maxRetries** (`number`): Maximum retries per item on failure. Defaults to `0` (no retries). Abort errors are never retried.
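Timeouts, retries, and cancellation compose. A minimal sketch continuing from the usage example above (the 60-second deadline, workflow ID, and pinned version are illustrative):

```typescript
// Cancel the whole run after 60 seconds; cap each item at 10 seconds
// and retry each failed item up to twice.
const controller = new AbortController()
const timer = setTimeout(() => controller.abort(), 60_000)

try {
  const summary = await dataset.startExperiment({
    targetType: 'workflow',
    targetId: 'my-workflow', // illustrative registered workflow ID
    version: 3,              // illustrative: pin to dataset version 3 instead of latest
    itemTimeout: 10_000,     // per-item timeout in milliseconds
    maxRetries: 2,           // abort errors are never retried
    signal: controller.signal,
  })
  console.log(`Skipped ${summary.skippedCount} items`)
} finally {
  clearTimeout(timer)
}
```

Items still pending when the signal fires are counted in `skippedCount` rather than `failedCount`.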

## Returns

**result** (`Promise<ExperimentSummary>`): Summary of the completed experiment.

**result.experimentId** (`string`): Unique ID of the experiment.

**result.status** (`'pending' | 'running' | 'completed' | 'failed'`): Final status of the experiment.

**result.totalItems** (`number`): Total number of items in the dataset.

**result.succeededCount** (`number`): Number of items that succeeded.

**result.failedCount** (`number`): Number of items that failed.

**result.skippedCount** (`number`): Number of items skipped (e.g., due to abort).

**result.completedWithErrors** (`boolean`): `true` if the run completed but some items failed.

**result.startedAt** (`Date`): When the experiment started.

**result.completedAt** (`Date`): When the experiment completed.

**result.results** (`ItemWithScores[]`): All item results with their scores.

**result.results.itemId** (`string`): ID of the dataset item.

**result.results.itemVersion** (`number`): Dataset version of the item when executed.

**result.results.input** (`unknown`): Input data passed to the target.

**result.results.output** (`unknown | null`): Output from the target, or `null` if failed.

**result.results.groundTruth** (`unknown | null`): Expected output from the dataset item.

**result.results.error** (`{ message: string; stack?: string; code?: string } | null`): Structured error if execution failed.

**result.results.startedAt** (`Date`): When item execution started.

**result.results.completedAt** (`Date`): When item execution completed.

**result.results.retryCount** (`number`): Number of retry attempts.

**result.results.scores** (`ScorerResult[]`): Results from all scorers for this item.

**result.results.scores.scorerId** (`string`): ID of the scorer.

**result.results.scores.scorerName** (`string`): Display name of the scorer.

**result.results.scores.score** (`number | null`): Computed score, or `null` if the scorer failed.

**result.results.scores.reason** (`string | null`): Reason/explanation for the score.

**result.results.scores.error** (`string | null`): Error message if the scorer failed.
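A sketch of post-processing a returned summary: collecting failed item IDs and averaging scores per scorer. The types and sample data below are illustrative stand-ins for the `ItemWithScores` and `ScorerResult` shapes documented above; in practice the array comes from `summary.results`:

```typescript
// Illustrative subsets of the documented result shapes.
type ScorerResult = {
  scorerId: string
  scorerName: string
  score: number | null
  reason: string | null
  error: string | null
}

type ItemWithScores = {
  itemId: string
  output: unknown
  error: { message: string } | null
  scores: ScorerResult[]
}

// Sample data standing in for summary.results.
const results: ItemWithScores[] = [
  {
    itemId: 'a',
    output: 'ok',
    error: null,
    scores: [{ scorerId: 'accuracy', scorerName: 'Accuracy', score: 0.9, reason: null, error: null }],
  },
  { itemId: 'b', output: null, error: { message: 'timeout' }, scores: [] },
]

// IDs of items whose execution failed.
const failed = results.filter((r) => r.error !== null).map((r) => r.itemId)

// Mean score per scorer, skipping scorers that returned null.
const means = new Map<string, { sum: number; n: number }>()
for (const r of results) {
  for (const s of r.scores) {
    if (s.score === null) continue
    const agg = means.get(s.scorerId) ?? { sum: 0, n: 0 }
    agg.sum += s.score
    agg.n += 1
    means.set(s.scorerId, agg)
  }
}

console.log('failed:', failed)
for (const [id, { sum, n }] of means) {
  console.log(`${id}: ${(sum / n).toFixed(2)}`)
}
```

The same pattern works for surfacing per-scorer errors via `scores[].error`.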

## Related

- [dataset.startExperimentAsync()](https://mastra.ai/reference/datasets/startExperimentAsync)
- [dataset.listExperiments()](https://mastra.ai/reference/datasets/listExperiments)
- [DatasetsManager.compareExperiments()](https://mastra.ai/reference/datasets/compareExperiments)