# PG vector store

The `PgVector` class provides vector similarity search on [PostgreSQL](https://www.postgresql.org/) using the [pgvector](https://github.com/pgvector/pgvector) extension, so you can store and query embeddings inside your existing PostgreSQL database.

## Constructor options

**connectionString** (`string`): PostgreSQL connection URL

**host** (`string`): PostgreSQL server host

**port** (`number`): PostgreSQL server port

**database** (`string`): PostgreSQL database name

**user** (`string`): PostgreSQL user

**password** (`string`): PostgreSQL password

**ssl** (`boolean | ConnectionOptions`): Enable SSL or provide custom SSL configuration

**schemaName** (`string`): Name of the schema the vector store uses. Falls back to the default schema if not provided.

**max** (`number`): Maximum number of pool connections (default: 20)

**idleTimeoutMillis** (`number`): Idle connection timeout in milliseconds (default: 30000)

**pgPoolOptions** (`PoolConfig`): Additional pg pool configuration options

## Constructor examples

### Connection String

```ts
import { PgVector } from '@mastra/pg'

const vectorStore = new PgVector({
  id: 'pg-vector',
  connectionString: 'postgresql://user:password@localhost:5432/mydb',
})
```

### Host/Port/Database Configuration

```ts
const vectorStore = new PgVector({
  id: 'pg-vector',
  host: 'localhost',
  port: 5432,
  database: 'mydb',
  user: 'postgres',
  password: 'password',
})
```

### Advanced Configuration

```ts
const vectorStore = new PgVector({
  id: 'pg-vector',
  connectionString: 'postgresql://user:password@localhost:5432/mydb',
  schemaName: 'custom_schema',
  max: 30,
  idleTimeoutMillis: 60000,
  pgPoolOptions: {
    connectionTimeoutMillis: 5000,
    allowExitOnIdle: true,
  },
})
```

## Methods

### `createIndex()`

**indexName** (`string`): Name of the index to create

**dimension** (`number`): Vector dimension (must match your embedding model)

**metric** (`'cosine' | 'euclidean' | 'dotproduct'`): Distance metric for similarity search (Default: `cosine`)

**indexConfig** (`IndexConfig`): Index configuration (Default: `{ type: 'ivfflat' }`)

**buildIndex** (`boolean`): Whether to build the index (Default: `true`)

**metadataIndexes** (`string[]`): Array of metadata field names to create btree indexes on. Improves query performance when filtering by these metadata fields.

#### `IndexConfig`

**type** (`'flat' | 'hnsw' | 'ivfflat'`): Index type (Default: `ivfflat`)

**type.flat** (`flat`): Sequential scan (no index) that performs exhaustive search.

**type.ivfflat** (`ivfflat`): Clusters vectors into lists for approximate search.

**type.hnsw** (`hnsw`): Graph-based index offering fast search times and high recall.

**ivf** (`IVFConfig`): IVF configuration

**ivf.lists** (`number`): Number of lists. If not specified, automatically calculated based on dataset size. (Minimum 100, Maximum 4000)

**hnsw** (`HNSWConfig`): HNSW configuration

**hnsw.m** (`number`): Maximum number of connections per node (default: 8)

**hnsw.efConstruction** (`number`): Build-time complexity (default: 32)
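The parameters above combine into a single options object. The sketch below writes the shapes out as local types so it stands on its own; these types mirror the tables above and are not imported from `@mastra/pg`, and the dimension and HNSW values are illustrative assumptions rather than recommendations.

```typescript
// Local types that mirror the parameter tables above. They are written out
// here for illustration and are not imported from @mastra/pg.
type IndexConfig = {
  type?: 'flat' | 'hnsw' | 'ivfflat'
  ivf?: { lists?: number }
  hnsw?: { m?: number; efConstruction?: number }
}

type CreateIndexParams = {
  indexName: string
  dimension: number
  metric?: 'cosine' | 'euclidean' | 'dotproduct'
  indexConfig?: IndexConfig
  buildIndex?: boolean
  metadataIndexes?: string[]
}

const createIndexParams: CreateIndexParams = {
  indexName: 'my_vectors',
  dimension: 384, // illustrative; use your embedding model's output size
  metric: 'cosine',
  indexConfig: {
    type: 'hnsw',
    hnsw: { m: 16, efConstruction: 64 }, // illustrative values
  },
  metadataIndexes: ['category'], // btree index for filters on `category`
}

// With a PgVector instance named `vectorStore`:
// await vectorStore.createIndex(createIndexParams)
```

Omitting `indexConfig` falls back to the documented default of `{ type: 'ivfflat' }`.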

#### Memory Requirements

HNSW indexes require significant shared memory during construction. For 100K vectors:

- Small dimensions (64d): \~60MB with default settings
- Medium dimensions (256d): \~180MB with default settings
- Large dimensions (384d+): \~250MB+ with default settings

Higher `m` or `efConstruction` values increase memory requirements significantly. Adjust your system's shared memory limits if needed.

### `upsert()`

**indexName** (`string`): Name of the index to upsert vectors into

**vectors** (`number[][]`): Array of embedding vectors

**metadata** (`Record<string, any>[]`): Metadata for each vector

**ids** (`string[]`): Optional vector IDs (auto-generated if not provided)

### `query()`

**indexName** (`string`): Name of the index to query

**queryVector** (`number[]`): Query vector

**topK** (`number`): Number of results to return (Default: `10`)

**filter** (`Record<string, any>`): Metadata filters

**includeVector** (`boolean`): Whether to include the vector in the result (Default: `false`)

**minScore** (`number`): Minimum similarity score threshold (Default: `0`)

**options** (`{ ef?: number; probes?: number }`): Additional options for HNSW and IVF indexes

**options.ef** (`number`): HNSW search parameter

**options.probes** (`number`): IVF search parameter
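Because `ef` applies to HNSW indexes and `probes` to IVF indexes, code that works with more than one index type has to choose which option to send. The sketch below shows one way to do that; `searchOptionsFor` and its `effort` argument are hypothetical helpers and not part of `@mastra/pg`, and the numeric values are placeholders.

```typescript
type IndexType = 'flat' | 'hnsw' | 'ivfflat'
type SearchOptions = { ef?: number; probes?: number }

// Hypothetical helper: map one "search effort" number onto the option
// that matches the index type. Flat indexes take neither option.
function searchOptionsFor(type: IndexType, effort: number): SearchOptions {
  if (type === 'hnsw') return { ef: effort }
  if (type === 'ivfflat') return { probes: effort }
  return {}
}

const queryParams = {
  indexName: 'my_vectors',
  queryVector: [0.1, 0.2, 0.3],
  topK: 5,
  filter: { category: 'product' },
  minScore: 0.5,
  options: searchOptionsFor('hnsw', 100),
}

// With a PgVector instance named `vectorStore`:
// const results = await vectorStore.query(queryParams)
```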

### `listIndexes()`

Returns an array of index names as strings.

### `describeIndex()`

**indexName** (`string`): Name of the index to describe

Returns:

```typescript
interface PGIndexStats {
  dimension: number
  count: number
  metric: 'cosine' | 'euclidean' | 'dotproduct'
  type: 'flat' | 'hnsw' | 'ivfflat'
  config: {
    m?: number
    efConstruction?: number
    lists?: number
    probes?: number
  }
}
```

### `deleteIndex()`

**indexName** (`string`): Name of the index to delete

### `updateVector()`

Updates a vector by ID, or the vector(s) matching a metadata filter. Provide either `id` or `filter`, but not both.

**indexName** (`string`): Name of the index containing the vector

**id** (`string`): ID of the vector to update (mutually exclusive with filter)

**filter** (`Record<string, any>`): Metadata filter to identify vector(s) to update (mutually exclusive with id)

**update** (`{ vector?: number[]; metadata?: Record<string, any>; }`): Object containing the vector and/or metadata to update

At least one of `vector` or `metadata` must be provided in the `update` object.

```typescript
// Update by ID
await pgVector.updateVector({
  indexName: 'my_vectors',
  id: 'vector123',
  update: {
    vector: [0.1, 0.2, 0.3],
    metadata: { label: 'updated' },
  },
})

// Update by filter
await pgVector.updateVector({
  indexName: 'my_vectors',
  filter: { category: 'product' },
  update: {
    metadata: { status: 'reviewed' },
  },
})
```
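The two rules above (exactly one of `id` or `filter`, and at least one of `vector` or `metadata`) can be checked before making the call. The following is a minimal sketch of such a pre-flight check; `checkUpdateVectorArgs` is a hypothetical helper written for this page and does not reproduce whatever validation `PgVector` performs internally.

```typescript
// Local type mirroring the parameters documented above.
type UpdateVectorArgs = {
  indexName: string
  id?: string
  filter?: Record<string, unknown>
  update: { vector?: number[]; metadata?: Record<string, unknown> }
}

// Hypothetical pre-flight check: returns an error message, or null if the
// arguments satisfy both documented rules.
function checkUpdateVectorArgs(args: UpdateVectorArgs): string | null {
  const hasId = args.id !== undefined
  const hasFilter = args.filter !== undefined
  if (hasId === hasFilter) {
    return 'Provide exactly one of `id` or `filter`.'
  }
  if (args.update.vector === undefined && args.update.metadata === undefined) {
    return 'Provide `vector`, `metadata`, or both in `update`.'
  }
  return null
}
```

Returning a message instead of throwing keeps the helper easy to use in form validation or logging; throwing an `Error` would work equally well.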

### `deleteVector()`

**indexName** (`string`): Name of the index containing the vector

**id** (`string`): ID of the vector to delete

Deletes a single vector by ID from the specified index.

```typescript
await pgVector.deleteVector({ indexName: 'my_vectors', id: 'vector123' })
```

### `deleteVectors()`

Delete multiple vectors by IDs or by metadata filter. Either `ids` or `filter` must be provided, but not both.

**indexName** (`string`): Name of the index containing the vectors to delete

**ids** (`string[]`): Array of vector IDs to delete (mutually exclusive with filter)

**filter** (`Record<string, any>`): Metadata filter to identify vectors to delete (mutually exclusive with ids)
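The mutual exclusion between `ids` and `filter` can also be expressed at the type level. The sketch below uses a local union type, written for illustration and not exported by `@mastra/pg`, so that TypeScript rejects an object that sets both fields.

```typescript
// Local type for illustration. The `never` fields make TypeScript reject
// objects that set both `ids` and `filter`.
type DeleteVectorsParams =
  | { indexName: string; ids: string[]; filter?: never }
  | { indexName: string; filter: Record<string, unknown>; ids?: never }

const deleteByIds: DeleteVectorsParams = {
  indexName: 'my_vectors',
  ids: ['vector123', 'vector456'],
}

const deleteByFilter: DeleteVectorsParams = {
  indexName: 'my_vectors',
  filter: { category: 'product' },
}

// With a PgVector instance named `pgVector`:
// await pgVector.deleteVectors(deleteByIds)
// await pgVector.deleteVectors(deleteByFilter)
```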

### `disconnect()`

Closes the database connection pool. Call it when you are done using the store.
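In a short-lived script, a `try`/`finally` wrapper is one way to make sure the pool is closed even when the work throws. The sketch below is an assumption-laden illustration: `withStore` is a hypothetical helper, and the `Disconnectable` interface assumes `disconnect()` may return a promise.

```typescript
// Structural type: anything with a disconnect() method qualifies.
interface Disconnectable {
  disconnect(): Promise<void> | void
}

// Hypothetical helper: run `work`, then always close the store's pool.
async function withStore<S extends Disconnectable, T>(
  store: S,
  work: (store: S) => Promise<T>,
): Promise<T> {
  try {
    return await work(store)
  } finally {
    await store.disconnect()
  }
}

// With a PgVector instance named `vectorStore`:
// const names = await withStore(vectorStore, (s) => s.listIndexes())
```

A long-running server would more likely keep one store for its lifetime and call `disconnect()` once during shutdown.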

### `buildIndex()`

**indexName** (`string`): Name of the index to define

**metric** (`'cosine' | 'euclidean' | 'dotproduct'`): Distance metric for similarity search (Default: `cosine`)

**indexConfig** (`IndexConfig`): Configuration for the index type and parameters

Builds or rebuilds an index with the specified metric and configuration. Any existing index is dropped before the new one is created.

```typescript
// Define HNSW index
await pgVector.buildIndex('my_vectors', 'cosine', {
  type: 'hnsw',
  hnsw: {
    m: 8,
    efConstruction: 32,
  },
})

// Define IVF index
await pgVector.buildIndex('my_vectors', 'cosine', {
  type: 'ivfflat',
  ivf: {
    lists: 100,
  },
})

// Define flat index
await pgVector.buildIndex('my_vectors', 'cosine', {
  type: 'flat',
})
```

## Response types

Query results are returned in this format:

```typescript
interface QueryResult {
  id: string
  score: number
  metadata: Record<string, any>
  vector?: number[] // Only included if includeVector is true
}
```
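As an illustration of working with this shape, the sketch below keeps only results at or above a score threshold and formats them for logging. `summarizeResults` is a hypothetical helper, and the sample data is made up for the example.

```typescript
interface QueryResult {
  id: string
  score: number
  metadata: Record<string, any>
  vector?: number[]
}

// Hypothetical helper: drop low-scoring results and format the rest.
function summarizeResults(results: QueryResult[], threshold: number): string[] {
  return results
    .filter((result) => result.score >= threshold)
    .map((result) => `${result.id} (${result.score.toFixed(2)})`)
}

const sample: QueryResult[] = [
  { id: 'a', score: 0.91, metadata: { category: 'product' } },
  { id: 'b', score: 0.4, metadata: { category: 'product' } },
]

console.log(summarizeResults(sample, 0.5)) // [ 'a (0.91)' ]
```

Passing `minScore` to `query()` applies the same kind of threshold on the database side, which avoids transferring rows that would be discarded.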

## Error handling

The store throws typed errors that can be caught:

```typescript
try {
  await store.query({
    indexName: 'index_name',
    queryVector: queryVector,
  })
} catch (error) {
  if (error instanceof VectorStoreError) {
    console.log(error.code) // 'connection_failed' | 'invalid_dimension' | etc
    console.log(error.details) // Additional error context
  }
}
```

## Index configuration guide

### Performance Optimization

#### IVFFlat Tuning

- **lists parameter**: Set to `sqrt(n) * 2` where n is the number of vectors
- More lists = better accuracy but slower build time
- Fewer lists = faster build but potentially lower accuracy
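The `sqrt(n) * 2` rule can be written as a small function. The sketch below also clamps the result to the 100 to 4000 range documented for `ivf.lists`; combining the two is an assumption made for this example, and `suggestIvfLists` is a hypothetical helper, not the calculation `PgVector` runs when `lists` is omitted.

```typescript
// Hypothetical helper: apply the sqrt(n) * 2 guideline, then clamp to the
// documented bounds for ivf.lists (minimum 100, maximum 4000).
function suggestIvfLists(vectorCount: number): number {
  const guideline = Math.round(Math.sqrt(vectorCount) * 2)
  return Math.min(4000, Math.max(100, guideline))
}

console.log(suggestIvfLists(1_000)) // 100 (guideline gives 63, raised to the minimum)
console.log(suggestIvfLists(250_000)) // 1000
console.log(suggestIvfLists(1_000_000)) // 2000
console.log(suggestIvfLists(10_000_000)) // 4000 (guideline gives 6325, capped)
```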

#### HNSW Tuning

- **m parameter**:

  - 8-16: Moderate accuracy, lower memory
  - 16-32: High accuracy, moderate memory
  - 32-64: Very high accuracy, high memory

- **efConstruction**:

  - 32-64: Fast build, good quality
  - 64-128: Slower build, better quality
  - 128-256: Slowest build, best quality

### Index Recreation Behavior

The system automatically detects configuration changes and only rebuilds indexes when necessary:

- Same configuration: Index is kept (no recreation)
- Changed configuration: Index is dropped and rebuilt
- This avoids the performance cost of unnecessary index recreation
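The rule above amounts to comparing the existing index settings with the requested ones. The following sketch illustrates that idea only; it is not `PgVector`'s actual detection logic, and both `needsRebuild` and the set of compared fields are assumptions based on the fields reported by `describeIndex()`.

```typescript
type IndexSettings = {
  metric: 'cosine' | 'euclidean' | 'dotproduct'
  type: 'flat' | 'hnsw' | 'ivfflat'
  m?: number
  efConstruction?: number
  lists?: number
}

// Assumed set of fields that define an index's configuration.
const comparedFields: (keyof IndexSettings)[] = ['metric', 'type', 'm', 'efConstruction', 'lists']

// Hypothetical check: rebuild only if at least one compared field differs.
function needsRebuild(existing: IndexSettings, requested: IndexSettings): boolean {
  return comparedFields.some((field) => existing[field] !== requested[field])
}

const existing: IndexSettings = { metric: 'cosine', type: 'hnsw', m: 8, efConstruction: 32 }

console.log(needsRebuild(existing, { ...existing })) // false: same configuration
console.log(needsRebuild(existing, { ...existing, m: 16 })) // true: m changed
```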

## Best practices

- Re-evaluate your index configuration as your dataset and query patterns change.
- Adjust parameters such as `lists` and `m` based on dataset size and query requirements.
- Use `describeIndex()` to check an index's vector count, metric, type, and configuration.
- Rebuild indexes periodically, especially after significant data changes.

## Direct pool access

The `PgVector` class exposes its underlying PostgreSQL connection pool as a public field:

```typescript
pgVector.pool // instance of pg.Pool
```

This enables advanced usage such as running direct SQL queries, managing transactions, or monitoring pool state. When using the pool directly:

- You are responsible for releasing clients (`client.release()`) after use.
- The pool remains accessible after calling `disconnect()`, but new queries will fail.
- Direct access bypasses any validation or transaction logic provided by PgVector methods.

This design supports advanced use cases but requires careful resource management by the user.
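A common pattern for the first point is to acquire a client, run the work inside a transaction, and release the client in a `finally` block. The sketch below uses small structural interfaces (`PoolLike`, `ClientLike`) instead of importing `pg`, so it is self-contained; they cover only the `connect()`, `query()`, and `release()` calls used here, and `inTransaction` is a hypothetical helper.

```typescript
// Minimal structural types covering the pg.Pool / client calls used below.
interface ClientLike {
  query(sql: string, params?: unknown[]): Promise<unknown>
  release(): void
}

interface PoolLike {
  connect(): Promise<ClientLike>
}

// Hypothetical helper: run `work` in a transaction and always release the client.
async function inTransaction<T>(
  pool: PoolLike,
  work: (client: ClientLike) => Promise<T>,
): Promise<T> {
  const client = await pool.connect()
  try {
    await client.query('BEGIN')
    const result = await work(client)
    await client.query('COMMIT')
    return result
  } catch (error) {
    await client.query('ROLLBACK') // undo partial work before rethrowing
    throw error
  } finally {
    client.release() // return the client to the pool in every case
  }
}

// Intended usage with a PgVector instance named `pgVector`:
// await inTransaction(pgVector.pool, (client) => client.query('SELECT 1'))
```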

## Usage example

### Local embeddings with fastembed

Embeddings are numeric vectors used by memory's `semanticRecall` to retrieve related messages by meaning (not keywords). This setup uses `@mastra/fastembed` to generate vector embeddings.

Install `@mastra/fastembed` to get started:

**npm**:

```bash
npm install @mastra/fastembed@latest
```

**pnpm**:

```bash
pnpm add @mastra/fastembed@latest
```

**Yarn**:

```bash
yarn add @mastra/fastembed@latest
```

**Bun**:

```bash
bun add @mastra/fastembed@latest
```

Add the following to your agent:

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'
import { PostgresStore, PgVector } from '@mastra/pg'
import { fastembed } from '@mastra/fastembed'

export const pgAgent = new Agent({
  id: 'pg-agent',
  name: 'PG Agent',
  instructions:
    'You are an AI agent with the ability to automatically recall memories from previous interactions.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    storage: new PostgresStore({
      id: 'pg-agent-storage',
      connectionString: process.env.DATABASE_URL!,
    }),
    vector: new PgVector({
      id: 'pg-agent-vector',
      connectionString: process.env.DATABASE_URL!,
    }),
    embedder: fastembed,
    options: {
      lastMessages: 10,
      semanticRecall: {
        topK: 3,
        messageRange: 2,
      },
    },
  }),
})
```

## Related

- [Metadata Filters](https://mastra.ai/reference/rag/metadata-filters)