# MDocument

The MDocument class processes documents for RAG applications. The main methods are `.chunk()` and `.extractMetadata()`.

## Constructor

**docs** (`Array<{ text: string, metadata?: Record<string, any> }>`): Array of document chunks with their text content and optional metadata

**type** (`'text' | 'html' | 'markdown' | 'json' | 'latex'`): Type of document content
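As a rough sketch of the shapes described above, the following local types mirror the two constructor parameters. `DocType`, `DocInput`, `MDocumentInit`, and `isDocType` are hypothetical names introduced here for illustration and are not exports of `@mastra/rag`. Grouping the two parameters into one options object is also an assumption.

```typescript
// Hypothetical local types mirroring the documented constructor parameters.
type DocType = 'text' | 'html' | 'markdown' | 'json' | 'latex'

interface DocInput {
  text: string
  metadata?: Record<string, any>
}

// Assumption: both parameters travel together in one options object.
interface MDocumentInit {
  docs: DocInput[]
  type: DocType
}

const DOC_TYPES: readonly string[] = ['text', 'html', 'markdown', 'json', 'latex']

// Narrow an arbitrary string (for example, from a config file) to a documented type.
function isDocType(value: string): value is DocType {
  return DOC_TYPES.indexOf(value) !== -1
}

// Example value that satisfies the documented shape.
const init: MDocumentInit = {
  docs: [{ text: 'Your content here', metadata: { source: 'example' } }],
  type: 'text',
}
```

A guard like `isDocType` can be useful when the content type comes from user input rather than from a literal in code.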

## Static methods

### `fromText()`

Creates a document from plain text content.

```typescript
static fromText(text: string, metadata?: Record<string, any>): MDocument
```

### `fromHTML()`

Creates a document from HTML content.

```typescript
static fromHTML(html: string, metadata?: Record<string, any>): MDocument
```

### `fromMarkdown()`

Creates a document from Markdown content.

```typescript
static fromMarkdown(markdown: string, metadata?: Record<string, any>): MDocument
```

### `fromJSON()`

Creates a document from JSON content.

```typescript
static fromJSON(json: string, metadata?: Record<string, any>): MDocument
```
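When content arrives as files, one possible way to choose among the four factories above is by file extension. The helper below is a hypothetical sketch, not part of the library: `factoryNameFor` and `FactoryName` are names invented here, and the extension mapping is an assumption. It only returns the name of the factory method, so it runs without `@mastra/rag` installed.

```typescript
// The four static factory names documented above.
type FactoryName = 'fromText' | 'fromHTML' | 'fromMarkdown' | 'fromJSON'

// Hypothetical helper: map a filename to a factory name by its extension.
// Unknown or missing extensions fall back to plain text.
function factoryNameFor(filename: string): FactoryName {
  const dot = filename.lastIndexOf('.')
  const ext = dot === -1 ? '' : filename.slice(dot + 1).toLowerCase()
  switch (ext) {
    case 'html':
    case 'htm':
      return 'fromHTML'
    case 'md':
    case 'markdown':
      return 'fromMarkdown'
    case 'json':
      return 'fromJSON'
    default:
      return 'fromText'
  }
}
```

Because all four factories share the `(content: string, metadata?)` signature shown above, the returned name could then be used to pick the matching static method. Note that `fromJSON()` takes a string, so an already-parsed object would need `JSON.stringify()` first.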

## Instance methods

### `chunk()`

Splits the document into chunks and optionally extracts metadata.

```typescript
async chunk(params?: ChunkParams): Promise<Chunk[]>
```

See [chunk() reference](https://mastra.ai/reference/rag/chunk) for detailed options.

### `getDocs()`

Returns the array of processed document chunks.

```typescript
getDocs(): Chunk[]
```

### `getText()`

Returns an array with the text of each chunk.

```typescript
getText(): string[]
```

### `getMetadata()`

Returns an array with the metadata object of each chunk.

```typescript
getMetadata(): Record<string, any>[]
```
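The three accessors above can be read as different views of the same chunk list. The sketch below models that relationship with plain arrays. `ChunkLike` is a hypothetical stand-in for the library's `Chunk` type, and the index alignment between the text and metadata arrays is an assumption rather than a documented guarantee.

```typescript
// Hypothetical stand-in for a processed chunk: text plus a metadata object.
interface ChunkLike {
  text: string
  metadata: Record<string, any>
}

// Plays the role of what getDocs() might return.
const docs: ChunkLike[] = [
  { text: 'First chunk', metadata: { section: 'intro' } },
  { text: 'Second chunk', metadata: { section: 'body' } },
]

// Views analogous to getText() and getMetadata().
const texts: string[] = docs.map((d) => d.text)
const metadata: Record<string, any>[] = docs.map((d) => d.metadata)

// Re-pair the two views by index, assuming both arrays are aligned.
function zipChunks(texts: string[], metadata: Record<string, any>[]): ChunkLike[] {
  if (texts.length !== metadata.length) {
    throw new Error('texts and metadata must have the same length')
  }
  return texts.map((text, i) => ({ text, metadata: metadata[i] }))
}

const rebuilt = zipChunks(texts, metadata)
```

Keeping text and metadata as parallel arrays is convenient when an embedding step consumes only the strings while the metadata is stored alongside the resulting vectors.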

### `extractMetadata()`

Extracts metadata using the specified extractors. See [ExtractParams reference](https://mastra.ai/reference/rag/extract-params) for details.

```typescript
async extractMetadata(params: ExtractParams): Promise<MDocument>
```

## Examples

```typescript
import { MDocument } from '@mastra/rag'

// Create a document from Markdown so the header-based chunking below has headings to split on
const doc = MDocument.fromMarkdown('# Title\n\n## Section\n\nYour content here')

// Split into chunks with metadata extraction
const chunks = await doc.chunk({
  strategy: 'markdown',
  headers: [
    ['#', 'title'],
    ['##', 'section'],
  ],
  extract: {
    summary: true, // Extract summaries with default settings
    keywords: true, // Extract keywords with default settings
  },
})

// Access the processed chunks, their text, and their metadata
const docs = doc.getDocs()
const texts = doc.getText()
const metadata = doc.getMetadata()
```