# ExtractParams

`ExtractParams` configures LLM-based metadata extraction from document chunks.

## Example

```typescript
import { MDocument } from '@mastra/rag'

const doc = MDocument.fromText(text)
const chunks = await doc.chunk({
  extract: {
    title: true, // Extract titles using default settings
    summary: true, // Generate summaries using default settings
    keywords: true, // Extract keywords using default settings
  },
})

// Example output:
// chunks[0].metadata = {
//   documentTitle: "AI Systems Overview",
//   sectionSummary: "Overview of artificial intelligence concepts and applications",
//   excerptKeywords: "KEYWORDS: AI, machine learning, algorithms"
// }
```

## Parameters

The `extract` parameter accepts the following fields:

**title** (`boolean | TitleExtractorsArgs`): Enable title extraction. Set to true for default settings, or provide custom configuration.

**summary** (`boolean | SummaryExtractArgs`): Enable summary extraction. Set to true for default settings, or provide custom configuration.

**questions** (`boolean | QuestionAnswerExtractArgs`): Enable question generation. Set to true for default settings, or provide custom configuration.

**keywords** (`boolean | KeywordExtractArgs`): Enable keyword extraction. Set to true for default settings, or provide custom configuration.

**schema** (`SchemaExtractArgs`): Enable structured metadata extraction using a Zod schema.

## Extractor arguments

### `TitleExtractorsArgs`

**llm** (`MastraLanguageModel`): AI SDK language model to use for title extraction.

**nodes** (`number`): Number of title nodes to extract.

**nodeTemplate** (`string`): Custom prompt template for title node extraction. Must include the `{context}` placeholder.

**combineTemplate** (`string`): Custom prompt template for combining titles. Must include the `{context}` placeholder.

### `SummaryExtractArgs`

**llm** (`MastraLanguageModel`): AI SDK language model to use for summary extraction.

**summaries** (`('self' | 'prev' | 'next')[]`): Summary types to generate. Allowed values are `'self'` (current chunk), `'prev'` (previous chunk), and `'next'` (next chunk).

**promptTemplate** (`string`): Custom prompt template for summary generation. Must include the `{context}` placeholder.
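
As a rough illustration of the three summary types, the sketch below resolves which chunk each type describes relative to the current one. `summarySource` is a hypothetical helper written for this page, not part of `@mastra/rag`, and the handling of missing neighbors (returning `undefined`) is an assumption.

```typescript
type SummaryKind = 'self' | 'prev' | 'next'

// Hypothetical helper (not a Mastra API): returns the chunk text that a given
// summary kind would describe for the chunk at `index`, or undefined when
// that neighbor does not exist (first chunk has no 'prev', last has no 'next').
function summarySource(
  chunks: string[],
  index: number,
  kind: SummaryKind,
): string | undefined {
  const offset = kind === 'prev' ? -1 : kind === 'next' ? 1 : 0
  return chunks[index + offset]
}

const sampleChunks = ['Intro', 'Methods', 'Results']

console.log(summarySource(sampleChunks, 1, 'self')) // Methods
console.log(summarySource(sampleChunks, 1, 'prev')) // Intro
console.log(summarySource(sampleChunks, 0, 'prev')) // undefined
```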

### `QuestionAnswerExtractArgs`

**llm** (`MastraLanguageModel`): AI SDK language model to use for question generation.

**questions** (`number`): Number of questions to generate.

**promptTemplate** (`string`): Custom prompt template for question generation. Must include both the `{context}` and `{numQuestions}` placeholders.

**embeddingOnly** (`boolean`): If true, only generate embeddings without actual questions.

### `KeywordExtractArgs`

**llm** (`MastraLanguageModel`): AI SDK language model to use for keyword extraction.

**keywords** (`number`): Number of keywords to extract.

**promptTemplate** (`string`): Custom prompt template for keyword extraction. Must include both the `{context}` and `{maxKeywords}` placeholders.
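
The prompt templates above rely on placeholders such as `{context}`, `{numQuestions}`, and `{maxKeywords}`. The sketch below shows one way such placeholders could be checked and filled. Both `fillTemplate` and `missingPlaceholders` are hypothetical helpers for illustration only; how `@mastra/rag` substitutes values internally is not shown here.

```typescript
// Hypothetical helper (not a Mastra API): replaces each {name} placeholder
// with the matching value, leaving unknown placeholders untouched.
function fillTemplate(
  template: string,
  values: Record<string, string | number>,
): string {
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name)
      ? String(values[name])
      : match,
  )
}

// Hypothetical check: lists the required placeholders a template is missing.
function missingPlaceholders(template: string, required: string[]): string[] {
  return required.filter((name) => !template.includes(`{${name}}`))
}

const keywordTemplate = 'Extract {maxKeywords} key terms from: {context}'

console.log(missingPlaceholders(keywordTemplate, ['context', 'maxKeywords']))
// []

console.log(
  fillTemplate(keywordTemplate, {
    maxKeywords: 5,
    context: 'Neural networks learn from data.',
  }),
)
// Extract 5 key terms from: Neural networks learn from data.
```

Running a check like `missingPlaceholders` before passing a custom template can catch a forgotten placeholder early, before any LLM call is made.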

### `SchemaExtractArgs`

**schema** (`ZodType`): Zod schema defining the structure of the data to extract.

**llm** (`MastraLanguageModel`): AI SDK language model to use for extraction.

**instructions** (`string`): Instructions for the LLM on what to extract.

**metadataKey** (`string`): Key to nest extraction results under. If omitted, results are spread into the metadata object.
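
To make the effect of `metadataKey` concrete, the sketch below contrasts the nested and spread placements on plain objects. `placeExtraction` and the `source` field are hypothetical and exist only for illustration; this is not the library's implementation.

```typescript
type ChunkMetadata = Record<string, unknown>

// Hypothetical helper (not a Mastra API) showing the two placement modes.
function placeExtraction(
  existing: ChunkMetadata,
  extracted: ChunkMetadata,
  metadataKey?: string,
): ChunkMetadata {
  return metadataKey
    ? { ...existing, [metadataKey]: extracted } // nested under the key
    : { ...existing, ...extracted } // spread into the top level
}

const extracted = { productName: 'Neural Net 2000', category: 'electronics' }

const nested = placeExtraction({ source: 'catalog' }, extracted, 'product')
console.log(nested)
// { source: 'catalog', product: { productName: 'Neural Net 2000', category: 'electronics' } }

const spread = placeExtraction({ source: 'catalog' }, extracted)
console.log(spread)
// { source: 'catalog', productName: 'Neural Net 2000', category: 'electronics' }
```

In this sketch, spread fields overwrite existing metadata keys with the same name, so nesting under a `metadataKey` is the safer choice when schema field names might collide with other metadata.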

## Advanced example

```typescript
import { MDocument } from '@mastra/rag'
import { z } from 'zod'

const doc = MDocument.fromText(text)
const chunks = await doc.chunk({
  extract: {
    // Title extraction with custom settings
    title: {
      nodes: 2, // Extract 2 title nodes
      nodeTemplate: 'Generate a title for this: {context}',
      combineTemplate: 'Combine these titles: {context}',
    },

    // Summary extraction with custom settings
    summary: {
      summaries: ['self'], // Generate summaries for current chunk
      promptTemplate: 'Summarize this: {context}',
    },

    // Question generation with custom settings
    questions: {
      questions: 3, // Generate 3 questions
      promptTemplate: 'Generate {numQuestions} questions about: {context}',
      embeddingOnly: false,
    },

    // Keyword extraction with custom settings
    keywords: {
      keywords: 5, // Extract 5 keywords
      promptTemplate: 'Extract {maxKeywords} key terms from: {context}',
    },

    // Schema extraction with Zod
    schema: {
      schema: z.object({
        productName: z.string(),
        category: z.enum(['electronics', 'clothing']),
      }),
      instructions: 'Extract product information.',
      metadataKey: 'product',
    },
  },
})

// Example output (abbreviated; actual values depend on the model):
// chunks[0].metadata = {
//   documentTitle: "AI in Modern Computing",
//   sectionSummary: "Overview of AI concepts and their applications in computing",
//   questionsThisExcerptCanAnswer: "1. What is machine learning?\n2. How do neural networks work?",
//   excerptKeywords: "1. Machine learning\n2. Neural networks\n3. Training data",
//   product: {
//     productName: "Neural Net 2000",
//     category: "electronics"
//   }
// }
```

## Document grouping for title extraction

When using the `TitleExtractor`, you can group multiple chunks for title extraction by giving them the same `docId` in their `metadata` field. All chunks with the same `docId` receive the same extracted title. A chunk with no `docId` is treated as its own document for title extraction.

**Example:**

```ts
import { MDocument } from '@mastra/rag'

const doc = new MDocument({
  docs: [
    { text: 'chunk 1', metadata: { docId: 'docA' } },
    { text: 'chunk 2', metadata: { docId: 'docA' } },
    { text: 'chunk 3', metadata: { docId: 'docB' } },
  ],
  type: 'text',
})

await doc.extractMetadata({ title: true })
// The first two chunks will share a title, while the third chunk will be assigned a separate title.
```
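
The grouping rule described in this section can be sketched on plain objects as follows. `groupForTitles` and the `__chunk_<index>` fallback key are hypothetical, chosen only to illustrate that chunks without a `docId` end up in groups of their own; the library's internal grouping may differ.

```typescript
interface TitleChunk {
  text: string
  metadata: { docId?: string }
}

// Hypothetical helper (not a Mastra API): chunks sharing a docId form one
// group; a chunk without a docId gets its own group, keyed by its index.
function groupForTitles(chunks: TitleChunk[]): Map<string, TitleChunk[]> {
  const groups = new Map<string, TitleChunk[]>()
  chunks.forEach((chunk, index) => {
    const key = chunk.metadata.docId ?? `__chunk_${index}`
    const group = groups.get(key) ?? []
    group.push(chunk)
    groups.set(key, group)
  })
  return groups
}

const titleGroups = groupForTitles([
  { text: 'chunk 1', metadata: { docId: 'docA' } },
  { text: 'chunk 2', metadata: { docId: 'docA' } },
  { text: 'chunk 3', metadata: { docId: 'docB' } },
  { text: 'chunk 4', metadata: {} },
])

// One title would be extracted per group and shared by its members.
titleGroups.forEach((members, key) => {
  console.log(key, members.map((chunk) => chunk.text))
})
```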