# harden-react-markdown

A wrapper for [react-markdown](https://www.npmjs.com/package/react-markdown) that ensures that untrusted
markdown does not contain images from and links to unexpected origins.

This is particularly important for markdown returned from [LLMs in AI agents which might have been subject to prompt
injection](https://vercel.com/blog/building-secure-ai-agents).

## Secure prefixes

This package validates URL prefixes and URL origins. Prefix allow-lists can be circumvented
with open redirects, so make sure the prefixes are specific enough to avoid such attacks.

E.g. it is more secure to allow `https://example.com/images/` than it is to allow all of
`https://example.com/` which may contain open redirects.
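A minimal illustration of the risk, using a plain string `startsWith` check as a stand-in for prefix matching (this is not the package's implementation) and a hypothetical open redirect endpoint at `https://example.com/redirect?to=`. The redirect URL passes the broad prefix but not the narrow one:

```typescript
// Hypothetical open redirect on an otherwise trusted site: following this
// link would bounce the user on to evil.com.
const redirectUrl = "https://example.com/redirect?to=https://evil.com/";

// Simplified stand-in for prefix matching (not the package's implementation).
function matchesAnyPrefix(url: string, prefixes: string[]): boolean {
  return prefixes.some((prefix) => url.startsWith(prefix));
}

const broad = matchesAnyPrefix(redirectUrl, ["https://example.com/"]);
const narrow = matchesAnyPrefix(redirectUrl, ["https://example.com/images/"]);

console.log(broad); // true: the redirect slips through the broad prefix
console.log(narrow); // false: /redirect is outside /images/
```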

Additionally, URLs may contain path traversal like `/../`. This package does not resolve these.
It is your responsibility to ensure that your web server does not allow such traversal.
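Standard URL parsing shows why server-side checks still matter. The sketch below uses only the built-in WHATWG `URL` class (it does not call this package) with hypothetical paths: a literal `../` segment is collapsed during parsing, but a percent-encoded slash (`%2F`) is not treated as a path separator, so the URL keeps its `/images/` prefix even though a server that decodes `%2F` before resolving paths could still traverse out of that directory.

```typescript
// Literal dot segments are resolved by the URL parser: /images/ is gone.
const literal = new URL("https://example.com/images/../admin/secret.png");
console.log(literal.href); // https://example.com/admin/secret.png

// An encoded slash is not a path separator, so "..%2Fadmin" stays a single
// segment and the /images/ prefix survives parsing unchanged.
const encoded = new URL("https://example.com/images/..%2Fadmin/secret.png");
console.log(encoded.href); // https://example.com/images/..%2Fadmin/secret.png
console.log(encoded.href.startsWith("https://example.com/images/")); // true
```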

## Features

- 🔒 **URL Filtering**: Blocks links and images that don't match allowed URL prefixes
- 🔧 **Drop-in Replacement**: Works with any react-markdown compatible component

## Installation

```bash
npm install harden-react-markdown react react-markdown
# or
yarn add harden-react-markdown react react-markdown
# or
pnpm add harden-react-markdown react react-markdown
```

## Quick Start

```tsx
import React from "react";
import ReactMarkdown from "react-markdown";
import hardenReactMarkdown from "harden-react-markdown";

// Create a hardened version of ReactMarkdown
const HardenedMarkdown = hardenReactMarkdown(ReactMarkdown);

function MyComponent() {
  const markdown = `
# My Document
[Safe Link](https://github.com/user/repo)
[Blocked Link](https://malicious-site.com)
![Safe Image](https://via.placeholder.com/150)
![Blocked Image](https://evil.com/tracker.gif)
  `;

  return (
    <HardenedMarkdown
      defaultOrigin="https://mysite.com"
      allowedLinkPrefixes={["https://github.com/", "https://docs.github.com/"]}
      allowedImagePrefixes={["https://via.placeholder.com/", "/"]}
    >
      {markdown}
    </HardenedMarkdown>
  );
}
```

## API

### `hardenReactMarkdown(MarkdownComponent)`

Creates a hardened version of any react-markdown compatible component.

#### Parameters

- `MarkdownComponent`: A React component that accepts `Options` from react-markdown

#### Returns

A new component that accepts all props of the original component, plus the security props listed below.

### Props

#### `defaultOrigin?: string`

- The origin to resolve relative URLs against
- Required when `allowedLinkPrefixes` or `allowedImagePrefixes` are provided
- Example: `"https://mysite.com"`
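Resolving a URL against `defaultOrigin` can be sketched with the standard `URL` constructor. This is an illustration under that assumption; `resolveAgainstOrigin` is a hypothetical helper, not something this package exports. The last case shows why validation should run on the resolved URL: a protocol-relative URL looks relative but points at another origin.

```typescript
// Hypothetical helper: resolve a possibly-relative URL against a default
// origin with the standard URL constructor; null if it cannot be parsed.
function resolveAgainstOrigin(url: string, defaultOrigin: string): URL | null {
  try {
    return new URL(url, defaultOrigin);
  } catch {
    return null;
  }
}

const origin = "https://mysite.com";

// Root-relative path: picks up the default origin.
console.log(resolveAgainstOrigin("/images/logo.png", origin)?.href);
// https://mysite.com/images/logo.png

// Absolute URL: the base is ignored.
console.log(resolveAgainstOrigin("https://github.com/user/repo", origin)?.href);
// https://github.com/user/repo

// Protocol-relative URL: resolves to a different origin entirely.
console.log(resolveAgainstOrigin("//evil.com/tracker.gif", origin)?.href);
// https://evil.com/tracker.gif
```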

#### `allowedLinkPrefixes?: string[]`

- Array of URL prefixes that are allowed for links
- Links not matching these prefixes will be blocked and shown as `[blocked]`
- Use `"*"` to allow all URLs. This disables prefix filtering, but `javascript:` and `data:` URLs are still always blocked
- Default: `[]` (blocks all links)
- Example: `['https://github.com/', 'https://docs.example.com/']` or `['*']`

#### `allowedImagePrefixes?: string[]`

- Array of URL prefixes that are allowed for images
- Images not matching these prefixes will be blocked and shown as placeholders
- Use `"*"` to allow all URLs. This disables prefix filtering, but `javascript:` URLs are still always blocked, and `data:` URLs are blocked unless `allowDataImages` is enabled (which permits only `data:image/*`)
- Default: `[]` (blocks all images)
- Example: `['https://via.placeholder.com/', '/']` or `['*']`

#### `allowDataImages?: boolean`

- When set to `true`, allows `data:image/*` URLs (base64-encoded images) in image sources
- This is useful for scenarios where images are embedded directly in markdown (e.g., documents converted from PDF or .docx)
- Only `data:image/*` URLs are allowed; other `data:` URLs (like `data:text/html`) remain blocked for security
- `data:` URLs are never allowed in links, regardless of this setting
- Default: `false` (blocks all `data:` URLs)
- Example: `true`
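The `data:` rule above can be sketched as a case-insensitive scheme and MIME-prefix check. This is a hypothetical illustration (`isAllowedDataImage` is not part of the package's API) that only decides the `data:` case; ordinary URLs would still go through prefix matching.

```typescript
// Hypothetical helper (not the package's API): returns true only for
// data:image/* sources, and only when allowDataImages is enabled.
function isAllowedDataImage(src: string, allowDataImages: boolean): boolean {
  if (!allowDataImages) return false;
  // Compare case-insensitively: "DATA:IMAGE/PNG" is the same scheme and type.
  const normalized = src.trim().toLowerCase();
  return normalized.startsWith("data:image/");
}

const pixel =
  "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==";

console.log(isAllowedDataImage(pixel, true)); // true
console.log(isAllowedDataImage(pixel, false)); // false: option disabled
console.log(isAllowedDataImage("data:text/html,<script>alert(1)</script>", true)); // false: not an image type
```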

#### `allowedProtocols?: string[]`

- Array of custom URL protocols that are allowed in links
- Useful for deep links to applications (e.g., `tel:`, `mailto:`, `postman:`, `vscode:`, `slack:`)
- Use `"*"` to allow all protocols that can be parsed as valid URLs
- Dangerous protocols (`javascript:`, `data:`, `file:`, `vbscript:`) are **always blocked** regardless of this setting
- Default: `[]` (only allows built-in safe protocols: `https:`, `http:`, `mailto:`, `irc:`, `ircs:`, `xmpp:`, `blob:`)
- Example: `['tel:', 'postman:', 'vscode:']` or `['*']`
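The protocol rules above can be sketched as follows. This is a hypothetical illustration (`isProtocolAllowed` is not part of the package), assuming the href is parsed with the standard `URL` class and the always-blocked list is consulted before anything else, as the "always blocked" wording requires. Relative links are out of scope for this sketch.

```typescript
// Protocols documented as always blocked, regardless of configuration.
const ALWAYS_BLOCKED = new Set(["javascript:", "data:", "file:", "vbscript:"]);

// Protocols documented as allowed by default.
const BUILT_IN_SAFE = new Set([
  "https:", "http:", "mailto:", "irc:", "ircs:", "xmpp:", "blob:",
]);

// Hypothetical helper (not the package's API): decides whether the protocol
// of an absolute link href is allowed under a given allowedProtocols list.
function isProtocolAllowed(href: string, allowedProtocols: string[]): boolean {
  let protocol: string;
  try {
    // The URL parser lowercases the scheme, removes tabs and newlines anywhere
    // in the input, and trims leading/trailing spaces.
    protocol = new URL(href).protocol;
  } catch {
    return false; // not an absolute URL; this sketch simply rejects it
  }
  if (ALWAYS_BLOCKED.has(protocol)) return false; // checked first, always wins
  if (BUILT_IN_SAFE.has(protocol)) return true;
  const custom = allowedProtocols.map((p) => p.toLowerCase());
  return custom.includes("*") || custom.includes(protocol);
}

console.log(isProtocolAllowed("tel:+1234567890", ["tel:"])); // true
console.log(isProtocolAllowed("tel:+1234567890", [])); // false
console.log(isProtocolAllowed("customapp://action", ["*"])); // true
console.log(isProtocolAllowed("JaVaScRiPt:alert(1)", ["*"])); // false
console.log(isProtocolAllowed("java\tscript:alert(1)", ["*"])); // false
```

Parsing first, rather than matching on the raw string, is what makes obfuscated variants such as `JaVaScRiPt:` land in the blocked set.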

#### `linkBlockPolicy?: BlockPolicyType`

- Controls how blocked links are handled
- `"indicator"` (default): Renders as plain text with `[blocked]` suffix and the blocked URL in a title attribute
- `"text-only"`: Renders just the link text without any indicator or URL
- `"remove"`: Removes the blocked link entirely from the output

#### `imageBlockPolicy?: BlockPolicyType`

- Controls how blocked images are handled
- `"indicator"` (default): Renders as a placeholder span with `[Image blocked: {alt text}]`
- `"text-only"`: Renders just the alt text (images with no alt text are removed)
- `"remove"`: Removes the blocked image entirely from the output

All other props are passed through to the wrapped markdown component.

## Examples

### Basic Usage with Default Blocking

```tsx
const HardenedMarkdown = hardenReactMarkdown(ReactMarkdown);

// Blocks all links and images by default (both prefix lists default to [])
<HardenedMarkdown>{markdownContent}</HardenedMarkdown>;
```

### Allow Specific Domains

```tsx
<HardenedMarkdown
  defaultOrigin="https://mysite.com"
  allowedLinkPrefixes={[
    "https://github.com/",
    "https://docs.github.com/",
    "https://www.npmjs.com/",
  ]}
  allowedImagePrefixes={[
    "https://via.placeholder.com/",
    "https://images.unsplash.com/",
    "/", // Allow relative images
  ]}
>
  {markdownContent}
</HardenedMarkdown>
```

### Relative URL Handling

```tsx
<HardenedMarkdown
  defaultOrigin="https://mysite.com"
  allowedLinkPrefixes={["https://mysite.com/"]}
  allowedImagePrefixes={["https://mysite.com/"]}
>
  {`
  [Relative Link](/internal-page)
  ![Relative Image](/images/logo.png)
  `}
</HardenedMarkdown>
```

### Allow All URLs (Wildcard)

```tsx
<HardenedMarkdown
  defaultOrigin="https://mysite.com"
  allowedLinkPrefixes={["*"]}
  allowedImagePrefixes={["*"]}
>
  {`
  [Any Link](https://anywhere.com/link)
  ![Any Image](https://untrusted-site.com/image.jpg)
  `}
</HardenedMarkdown>
```

**Note**: Using `"*"` disables prefix filtering entirely (dangerous protocols such as `javascript:` are still blocked). Only use this when you trust the markdown source.

### Allow Base64 Images

```tsx
<HardenedMarkdown
  defaultOrigin="https://mysite.com"
  allowedImagePrefixes={["https://mysite.com/"]}
  allowDataImages={true}
>
  {`
  ![Base64 Image](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==)
  ![Regular Image](https://mysite.com/image.png)
  `}
</HardenedMarkdown>
```

**Note**: This is particularly useful when converting documents from formats like PDF or .docx where images are embedded as base64. Only `data:image/*` URLs are allowed; other `data:` URLs remain blocked for security.

### Custom Protocol Support

Enable custom protocols for deep linking to applications and services:

```tsx
<HardenedMarkdown allowedProtocols={["tel:", "mailto:", "postman:", "vscode:", "slack:"]}>
  {`
  [Call us](tel:+1234567890)
  [Email support](mailto:support@example.com)
  [Open in Postman](postman://open/collection)
  [View in VS Code](vscode://file/path/to/file.ts)
  [Join Slack](slack://channel?id=C123456)
  `}
</HardenedMarkdown>
```

**Common use cases:**
- **`tel:`** - Phone number links that open the dialer on mobile devices
- **`mailto:`** - Email links (allowed by default, but shown here for completeness)
- **`sms:`** - SMS/text message links
- **`postman:`**, **`vscode:`**, **`slack:`** - Deep links to desktop applications
- **Custom app protocols** - Links to your own Electron or native applications

You can also use the wildcard to allow any custom protocol:

```tsx
<HardenedMarkdown allowedProtocols={["*"]}>
  {`[Custom Protocol Link](customapp://action)`}
</HardenedMarkdown>
```

**Security Note**: Even with `allowedProtocols={["*"]}`, dangerous protocols like `javascript:`, `data:`, `file:`, and `vbscript:` are **always blocked**. Custom protocols are handed off to OS-level protocol handlers rather than executing in the browser context, but the receiving application then has to handle the link safely, so only allow protocols whose handlers you trust.

### Block Policies

Instead of the default `[blocked]` indicator, you can choose how blocked links and images are rendered:

```tsx
<HardenedMarkdown
  defaultOrigin="https://mysite.com"
  allowedLinkPrefixes={["https://trusted.com/"]}
  allowedImagePrefixes={["https://trusted.com/"]}
  linkBlockPolicy="text-only" // Show link text only, no [blocked] indicator
  imageBlockPolicy="remove" // Remove blocked images entirely
>
  {markdownContent}
</HardenedMarkdown>
```

Available policies: `"indicator"` (default), `"text-only"`, `"remove"`.

### Custom Components

```tsx
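import ReactMarkdown from "react-markdown";
import type { Options } from "react-markdown";
import hardenReactMarkdown from "harden-react-markdown";

// Any component that accepts react-markdown's Options can be hardened
const CustomMarkdown = (props: Options) => (
  <div className="custom-wrapper">
    <ReactMarkdown {...props} />
  </div>
);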

const HardenedCustomMarkdown = hardenReactMarkdown(CustomMarkdown);

<HardenedCustomMarkdown
  defaultOrigin="https://mysite.com"
  allowedLinkPrefixes={["https://trusted.com/"]}
>
  {markdownContent}
</HardenedCustomMarkdown>;
```

## Security Features

### URL Filtering

- **Links**: Filters `href` attributes in `<a>` elements
- **Images**: Filters `src` attributes in `<img>` elements
- **Relative URLs**: Resolves relative URLs against `defaultOrigin` before validating them
- **Path Traversal**: Normalizes URLs before matching; this is not a substitute for server-side traversal protection (see [Secure prefixes](#secure-prefixes))
- **Wildcard Support**: Use `"*"` prefix to disable filtering (only when markdown is trusted)
- **Prefix Matching**: Validates that URLs start with allowed prefixes and have matching origins
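The prefix-plus-origin rule can be sketched as follows, assuming both the candidate URL and each prefix are normalized with the standard `URL` class before comparison. `isUrlAllowed` and `tryParse` are hypothetical helpers, not the package's API, and protocol checks are omitted. Comparing origins is what defeats look-alike hosts such as `https://github.com.evil.com/` or `https://github.com@evil.com/`, which a plain string `startsWith` against `https://github.com` would accept.

```typescript
// Parse url against base; null when it is not a valid URL.
function tryParse(url: string, base: string): URL | null {
  try {
    return new URL(url, base);
  } catch {
    return null;
  }
}

// Hypothetical helper (not the package's source): a URL is allowed when it
// shares an origin with an allowed prefix AND its normalized href starts
// with that prefix's normalized href.
function isUrlAllowed(url: string, allowedPrefixes: string[], defaultOrigin: string): boolean {
  const target = tryParse(url, defaultOrigin); // relative URLs resolve here
  if (target === null) return false; // unparseable URLs are blocked
  return allowedPrefixes.some((prefix) => {
    const allowed = tryParse(prefix, defaultOrigin);
    return (
      allowed !== null &&
      target.origin === allowed.origin &&
      target.href.startsWith(allowed.href)
    );
  });
}

const origin = "https://mysite.com";
console.log(isUrlAllowed("https://github.com/user/repo", ["https://github.com/"], origin)); // true
console.log(isUrlAllowed("https://github.com.evil.com/x", ["https://github.com"], origin)); // false
console.log("https://github.com.evil.com/x".startsWith("https://github.com")); // true: naive check is fooled
console.log(isUrlAllowed("https://github.com@evil.com/", ["https://github.com/"], origin)); // false
console.log(isUrlAllowed("/images/logo.png", ["/"], origin)); // true
```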

### Blocked Content Handling

Behavior is configurable per element type via `linkBlockPolicy` and `imageBlockPolicy`:

- **`"indicator"`** (default): Blocked links show a `[blocked]` suffix; blocked images show `[Image blocked: {alt}]`
- **`"text-only"`**: Outputs just the link text or image alt text with no indicator
- **`"remove"`**: Removes blocked elements entirely from the output
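As a rough model of these policies, the sketch below maps a blocked link or image to the text a reader would see. It is a hypothetical plain-string approximation: the actual component renders React elements (and, for links under `"indicator"`, also keeps the URL in a `title` attribute), the helper names are invented for illustration, and the exact spacing before `[blocked]` is an assumption.

```typescript
// Local type mirroring the three documented policy values.
type BlockPolicyType = "indicator" | "text-only" | "remove";

// Hypothetical: visible text for a blocked link, or null if nothing renders.
function renderBlockedLinkText(text: string, policy: BlockPolicyType): string | null {
  if (policy === "remove") return null; // link removed entirely
  if (policy === "text-only") return text; // link text, no indicator
  return `${text} [blocked]`; // "indicator" (default)
}

// Hypothetical: visible text for a blocked image, or null if nothing renders.
function renderBlockedImageText(alt: string, policy: BlockPolicyType): string | null {
  if (policy === "remove") return null; // image removed entirely
  if (policy === "text-only") return alt === "" ? null : alt; // no alt text: removed
  return `[Image blocked: ${alt}]`; // "indicator" (default)
}

console.log(renderBlockedLinkText("Blocked Link", "indicator")); // Blocked Link [blocked]
console.log(renderBlockedImageText("tracker", "indicator")); // [Image blocked: tracker]
console.log(renderBlockedImageText("", "text-only")); // null
```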

### Attack Prevention

- **XSS Prevention**: Blocks `javascript:`, `data:`, `vbscript:`, `file:` and other dangerous protocols (always, regardless of configuration)
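- **Redirect Protection**: Blocks links to origins that are not on the allow-list (open redirects hosted on an allowed origin are not detected; see [Secure prefixes](#secure-prefixes))
- **Tracking Prevention**: Blocks unauthorized image tracking pixels
- **Domain Spoofing Prevention**: Validates full URLs, including their origins, not just domains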
- **Custom Protocols**: Optional support for custom protocols (e.g., `tel:`, `postman:`, `vscode:`) with explicit opt-in via `allowedProtocols`

## TypeScript Support

Full TypeScript support with strict type checking:

```tsx
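import type { ComponentProps } from "react";
import ReactMarkdown from "react-markdown";
import type { Options } from "react-markdown";
import hardenReactMarkdown from "harden-react-markdown";

// Type-safe component creation
const HardenedMarkdown = hardenReactMarkdown(ReactMarkdown);

// Inferred prop types include both react-markdown Options and security options
type Props = ComponentProps<typeof HardenedMarkdown>;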

// Works with custom markdown components
const CustomMarkdown = (props: Options & { customProp?: string }) => (
  <ReactMarkdown {...props} />
);

const HardenedCustom = hardenReactMarkdown(CustomMarkdown);
// Props now include customProp + security options
```

## Testing

The package includes comprehensive tests covering:

- Basic markdown rendering
- URL filtering for links and images
- Relative URL handling
- Security bypass prevention
- Edge cases and malformed URLs
- TypeScript type safety

Run tests:

```bash
npm test
```

## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## License

MIT License - see the [LICENSE](LICENSE) file for details.

## Security

If you discover a security vulnerability, please send an e-mail to <security@vercel.com>.
