# Database Design & Data Architecture Standards

## Database Design Checklist

- [ ] Tables use consistent naming conventions ([Naming Conventions](#naming-conventions))
- [ ] Primary keys chosen appropriately ([Primary Keys](#primary-key-patterns))
- [ ] Foreign keys defined with cascade rules ([Relationships](#relationships-and-constraints))
- [ ] Indexes on foreign keys and frequent queries ([Indexing Strategy](#indexing-strategy))
- [ ] Check constraints enforce business rules ([Check Constraints](#check-constraints))
- [ ] Timestamp columns on all tables ([Audit Columns](#audit-columns))
- [ ] Soft delete pattern where appropriate ([Soft Deletes](#soft-deletes))
- [ ] Audit tables/triggers for sensitive data ([Data Integrity](#data-integrity-patterns))
- [ ] Proper data types chosen ([Data Types](#data-type-selection))
- [ ] Normalization applied appropriately ([Normalization](#normalization-principles))
- [ ] Database-level validation in place ([Constraints](#database-constraints))
- [ ] Migration scripts include rollback ([Migration Patterns](#migration-patterns))
- [ ] Backup strategy implemented ([Backup and Recovery](#backup-and-recovery))
- [ ] Performance monitoring configured ([Performance Monitoring](#performance-monitoring))
- [ ] Security permissions follow least privilege ([Security](#security-considerations))
- [ ] Connection pooling configured ([Connection Pooling](#connection-pooling))
- [ ] Query optimization reviewed with EXPLAIN ([Query Optimization](#query-optimization))
- [ ] Retention policies defined ([Data Retention](#data-retention-policies))
- [ ] File storage strategy implemented ([File Storage](#file-storage-standards))
- [ ] NoSQL considerations evaluated ([NoSQL](#nosql-considerations))
- [ ] Documentation maintained ([Schema Documentation](#schema-documentation))

## Core Philosophy

Database design is the foundation of application performance and scalability. Good schema design prevents technical debt, ensures data integrity, and makes complex queries possible. Design for tomorrow's scale while keeping today's simplicity.

## Schema Design Principles

### Naming Conventions
Use consistent, descriptive naming across all database objects.

**Why:** Consistent naming reduces cognitive load, prevents errors, and makes queries self-documenting. It also helps new developers understand the schema quickly.

```sql
-- Tables: Plural, snake_case
users
user_profiles
order_items

-- Columns: Singular, snake_case
id
created_at
user_id
is_active

-- Indexes: idx_table_columns
idx_users_email
idx_orders_user_id_created_at

-- Constraints: constraint_type_table_columns
pk_users_id
fk_posts_user_id
uq_users_email
ck_users_age

-- Example table with all conventions
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    
    CONSTRAINT uq_users_email UNIQUE (email),
    CONSTRAINT ck_users_email CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
);

CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_created_at ON users(created_at DESC);
```

### Primary Key Design
Choose appropriate primary key strategies.

**Why:** Primary keys are immutable identifiers that affect performance, storage, and integration patterns. The right choice depends on your specific needs for ordering, privacy, and distribution.

```sql
-- UUID: Best for distributed systems, privacy
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid()
);

-- Serial/AutoIncrement: Simple, efficient for single database
CREATE TABLE products (
    id SERIAL PRIMARY KEY
);

-- Composite: For many-to-many relationships
CREATE TABLE user_roles (
    user_id UUID REFERENCES users(id),
    role_id UUID REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)
);

-- Natural keys: Only when truly immutable
CREATE TABLE countries (
    iso_code CHAR(2) PRIMARY KEY,  -- ISO 3166-1 alpha-2
    name VARCHAR(100) NOT NULL
);

-- ULID/KSUID: Sortable + unique (implement as function)
CREATE TABLE events (
    id VARCHAR(26) PRIMARY KEY DEFAULT generate_ulid(),
    -- Sortable by creation time while still being unique
);
```

### Data Types Selection
Choose the most appropriate data type for each column.

**Why:** Correct data types ensure data integrity, optimize storage, improve query performance, and prevent subtle bugs. They serve as the first line of defense against bad data.

```sql
-- Text types
VARCHAR(n)     -- Variable length with limit (email, username)
TEXT           -- Unlimited text (comments, descriptions)
CHAR(n)        -- Fixed length (country codes, status codes)

-- Numeric types
INTEGER        -- Whole numbers (-2B to 2B)
BIGINT         -- Large whole numbers (for IDs that might exceed INT)
DECIMAL(p,s)   -- Exact decimals for money (DECIMAL(10,2))
FLOAT/DOUBLE   -- Approximate decimals (scientific data, coordinates)

-- Date/Time types
DATE           -- Date only (birth_date)
TIME           -- Time only (opening_hours)
TIMESTAMP      -- Date+Time without timezone (legacy data)
TIMESTAMPTZ    -- Date+Time with timezone (ALWAYS use for new data)

-- Boolean
BOOLEAN        -- TRUE/FALSE/NULL (is_active, is_verified)

-- JSON (PostgreSQL)
JSONB          -- Binary JSON, indexed, preferred over JSON
JSON           -- Text JSON, preserves formatting

-- Arrays (PostgreSQL)
INTEGER[]      -- Array of integers (tag_ids)
TEXT[]         -- Array of strings (tags, categories)

-- Example: Choosing right types
CREATE TABLE products (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    sku VARCHAR(50) NOT NULL,                    -- Limited string
    name VARCHAR(255) NOT NULL,                  -- Limited string
    description TEXT,                             -- Unlimited text
    price DECIMAL(10, 2) NOT NULL,              -- Money, exact
    weight FLOAT,                                 -- Physical measurement
    quantity INTEGER NOT NULL DEFAULT 0,         -- Whole number
    tags TEXT[],                                  -- Array of strings
    metadata JSONB DEFAULT '{}',                 -- Flexible data
    is_available BOOLEAN DEFAULT true,           -- Boolean flag
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP -- Always TZ
);
```

### Timestamp Standards
Implement consistent timestamp tracking.

**Why:** Timestamps are crucial for debugging, auditing, and data analysis. Consistent timestamp columns across all tables enable powerful time-based queries and simplify data synchronization.

```sql
-- Standard timestamp columns for all tables
CREATE TABLE base_table (
    -- ... other columns ...
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMPTZ  -- For soft deletes
);

-- Automatic updated_at trigger (PostgreSQL)
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = CURRENT_TIMESTAMP;
    RETURN NEW;
END;
$$ language 'plpgsql';

CREATE TRIGGER update_users_updated_at 
    BEFORE UPDATE ON users
    FOR EACH ROW 
    EXECUTE FUNCTION update_updated_at_column();

-- Soft delete pattern
CREATE TABLE posts (
    id UUID PRIMARY KEY,
    title VARCHAR(255),
    deleted_at TIMESTAMPTZ,
    deleted_by UUID REFERENCES users(id)
);

-- Query only active records
CREATE VIEW active_posts AS 
SELECT * FROM posts WHERE deleted_at IS NULL;

-- Hard delete after retention period
DELETE FROM posts 
WHERE deleted_at < CURRENT_TIMESTAMP - INTERVAL '90 days';
```

## Normalization Principles

### Normalization Levels
Apply appropriate normalization for your use case.

**Why:** Normalization eliminates redundancy and ensures data consistency, but over-normalization can hurt performance. The key is finding the right balance for your specific needs.

```sql
-- First Normal Form (1NF): Atomic values, no repeating groups
-- ❌ BAD: Multiple values in single column
CREATE TABLE users_bad (
    id INTEGER,
    name VARCHAR(255),
    phone_numbers VARCHAR(255)  -- "123-456-7890, 098-765-4321"
);

-- ✅ GOOD: Separate table for multiple values
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255)
);

CREATE TABLE user_phones (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    phone_number VARCHAR(20),
    type VARCHAR(20)  -- 'mobile', 'home', 'work'
);

-- Second Normal Form (2NF): No partial dependencies
-- ❌ BAD: Partial dependency on composite key
CREATE TABLE order_items_bad (
    order_id INTEGER,
    product_id INTEGER,
    product_name VARCHAR(255),  -- Depends only on product_id
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

-- ✅ GOOD: Move product details to products table
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255)
);

CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id)
);

-- Third Normal Form (3NF): No transitive dependencies
-- ❌ BAD: Transitive dependency
CREATE TABLE employees_bad (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255),
    department_id INTEGER,
    department_name VARCHAR(255),  -- Depends on department_id
    department_budget DECIMAL(10,2)  -- Depends on department_id
);

-- ✅ GOOD: Separate departments table
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255),
    budget DECIMAL(10,2)
);

CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name VARCHAR(255),
    department_id INTEGER REFERENCES departments(id)
);
```

### Strategic Denormalization
Know when to break normalization rules.

**Why:** Denormalization trades storage for performance. It's valuable for read-heavy workloads, reporting, and reducing complex joins, but requires careful management to maintain consistency.

```sql
-- Denormalized for performance: Store calculated values
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    
    -- Normalized approach would calculate these from order_items
    subtotal DECIMAL(10,2),      -- Denormalized
    tax_amount DECIMAL(10,2),     -- Denormalized  
    total_amount DECIMAL(10,2),   -- Denormalized
    item_count INTEGER,           -- Denormalized
    
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

-- Maintain consistency with triggers
CREATE OR REPLACE FUNCTION update_order_totals()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE orders SET
        subtotal = (SELECT SUM(price * quantity) FROM order_items WHERE order_id = NEW.order_id),
        item_count = (SELECT COUNT(*) FROM order_items WHERE order_id = NEW.order_id),
        total_amount = subtotal + tax_amount
    WHERE id = NEW.order_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Denormalized for querying: Materialized views
CREATE MATERIALIZED VIEW user_statistics AS
SELECT 
    u.id,
    u.email,
    COUNT(DISTINCT o.id) as total_orders,
    SUM(o.total_amount) as lifetime_value,
    MAX(o.created_at) as last_order_date,
    AVG(o.total_amount) as average_order_value
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id, u.email;

-- Refresh periodically
CREATE INDEX idx_user_statistics_lifetime_value ON user_statistics(lifetime_value DESC);
REFRESH MATERIALIZED VIEW CONCURRENTLY user_statistics;
```

## Indexing Strategy

### Index Types and Usage
Choose the right index type for your queries.

**Why:** Proper indexing can improve query performance by orders of magnitude. Different index types optimize for different query patterns, and the wrong index can actually hurt performance.

```sql
-- B-Tree Index (default): Equality and range queries
CREATE INDEX idx_users_created_at ON users(created_at);
SELECT * FROM users WHERE created_at > '2024-01-01';

-- Unique Index: Enforce uniqueness + performance
CREATE UNIQUE INDEX uq_users_email ON users(email);
SELECT * FROM users WHERE email = 'user@example.com';

-- Composite Index: Multiple columns (order matters!)
CREATE INDEX idx_orders_user_id_created_at ON orders(user_id, created_at DESC);
SELECT * FROM orders WHERE user_id = ? AND created_at > ?;

-- Partial Index: Index subset of rows
CREATE INDEX idx_users_active ON users(email) WHERE is_active = true;
SELECT * FROM users WHERE email = ? AND is_active = true;

-- Expression Index: Index computed values
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = LOWER(?);

-- GIN Index: Full-text search and arrays (PostgreSQL)
CREATE INDEX idx_products_tags ON products USING GIN(tags);
SELECT * FROM products WHERE tags @> ARRAY['electronics'];

-- Full-text search index (PostgreSQL)
CREATE INDEX idx_products_search ON products USING GIN(to_tsvector('english', name || ' ' || description));
SELECT * FROM products WHERE to_tsvector('english', name || ' ' || description) @@ to_tsquery('laptop');

-- Spatial Index: Geographic queries (PostGIS)
CREATE INDEX idx_locations_coords ON locations USING GIST(coordinates);
SELECT * FROM locations WHERE ST_DWithin(coordinates, point, 1000);
```

### Index Design Guidelines
Follow best practices for index creation.

**Why:** Too many indexes slow down writes, too few slow down reads. Strategic indexing based on actual query patterns ensures optimal performance without unnecessary overhead.

```sql
-- Index foreign keys (almost always)
CREATE INDEX idx_posts_user_id ON posts(user_id);

-- Index columns used in WHERE clauses
CREATE INDEX idx_users_status ON users(status) WHERE status != 'active';

-- Index columns used in ORDER BY
CREATE INDEX idx_products_price ON products(price DESC);

-- Covering index: Include all needed columns
CREATE INDEX idx_orders_covering ON orders(user_id, created_at) INCLUDE (total_amount, status);
-- Query can be satisfied entirely from index
SELECT total_amount, status FROM orders WHERE user_id = ? AND created_at > ?;

-- Monitor index usage
SELECT 
    schemaname,
    tablename,
    indexname,
    idx_scan as index_scans,
    idx_tup_read as tuples_read,
    idx_tup_fetch as tuples_fetched,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Find missing indexes (PostgreSQL)
SELECT 
    schemaname,
    tablename,
    attname,
    n_distinct,
    array_to_string(most_common_vals, ',') as common_values,
    correlation
FROM pg_stats
WHERE tablename = 'your_table'
AND correlation < 0.1  -- Poor correlation might benefit from index
ORDER BY n_distinct DESC;

-- Remove unused indexes
DROP INDEX IF EXISTS idx_unused_index;
```

## Relationships and Constraints

### Foreign Key Design
Implement referential integrity properly.

**Why:** Foreign keys prevent orphaned data, enforce business rules at the database level, and make relationships explicit. They're your safety net against application bugs.

```sql
-- Basic foreign key
CREATE TABLE posts (
    id UUID PRIMARY KEY,
    user_id UUID NOT NULL REFERENCES users(id),
    title VARCHAR(255)
);

-- Foreign key with explicit actions
CREATE TABLE comments (
    id UUID PRIMARY KEY,
    post_id UUID NOT NULL,
    user_id UUID NOT NULL,
    content TEXT,
    
    CONSTRAINT fk_comments_post 
        FOREIGN KEY (post_id) 
        REFERENCES posts(id) 
        ON DELETE CASCADE      -- Delete comments when post deleted
        ON UPDATE CASCADE,     -- Update if post ID changes
        
    CONSTRAINT fk_comments_user
        FOREIGN KEY (user_id)
        REFERENCES users(id)
        ON DELETE SET NULL     -- Keep comment but remove user reference
        ON UPDATE CASCADE
);

-- Self-referential foreign key
CREATE TABLE employees (
    id UUID PRIMARY KEY,
    name VARCHAR(255),
    manager_id UUID,
    
    CONSTRAINT fk_employee_manager
        FOREIGN KEY (manager_id)
        REFERENCES employees(id)
        ON DELETE SET NULL
);

-- Composite foreign key
CREATE TABLE order_items (
    order_id UUID,
    product_id UUID,
    variant_id UUID,
    quantity INTEGER,
    
    PRIMARY KEY (order_id, product_id, variant_id),
    
    CONSTRAINT fk_order_items_product_variant
        FOREIGN KEY (product_id, variant_id)
        REFERENCES product_variants(product_id, id)
);
```

### Check Constraints
Enforce business rules at the database level.

**Why:** Check constraints catch bad data before it enters the database, serving as the last line of defense against application bugs and ensuring data integrity even during direct database access.

```sql
-- Simple check constraints
CREATE TABLE users (
    id UUID PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    age INTEGER,
    status VARCHAR(20),
    
    CONSTRAINT ck_users_age CHECK (age >= 0 AND age <= 150),
    CONSTRAINT ck_users_email CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'),
    CONSTRAINT ck_users_status CHECK (status IN ('active', 'inactive', 'suspended'))
);

-- Complex business rules
CREATE TABLE products (
    id UUID PRIMARY KEY,
    name VARCHAR(255),
    price DECIMAL(10,2),
    cost DECIMAL(10,2),
    discount_percentage INTEGER,
    minimum_quantity INTEGER DEFAULT 1,
    maximum_quantity INTEGER,
    
    CONSTRAINT ck_products_price_positive CHECK (price > 0),
    CONSTRAINT ck_products_profit_margin CHECK (price > cost),
    CONSTRAINT ck_products_discount_range CHECK (discount_percentage BETWEEN 0 AND 100),
    CONSTRAINT ck_products_quantity_logic CHECK (maximum_quantity >= minimum_quantity)
);

-- Date range constraints
CREATE TABLE subscriptions (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    start_date DATE NOT NULL,
    end_date DATE,
    
    CONSTRAINT ck_subscription_dates CHECK (
        end_date IS NULL OR end_date > start_date
    )
);

-- Conditional constraints
CREATE TABLE orders (
    id UUID PRIMARY KEY,
    status VARCHAR(20),
    paid_at TIMESTAMPTZ,
    shipped_at TIMESTAMPTZ,
    
    CONSTRAINT ck_orders_payment_logic CHECK (
        (status != 'paid' OR paid_at IS NOT NULL) AND
        (status != 'shipped' OR shipped_at IS NOT NULL)
    )
);
```

## Query Optimization

### Query Performance Patterns
Write efficient queries from the start.

**Why:** Poor queries can bring down entire applications. Understanding how databases execute queries helps you write performant code that scales with your data.

```sql
-- Use EXPLAIN ANALYZE to understand query performance
EXPLAIN ANALYZE
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;

-- Avoid SELECT * in production
-- ❌ BAD
SELECT * FROM users WHERE status = 'active';

-- ✅ GOOD: Select only needed columns
SELECT id, email, name FROM users WHERE status = 'active';

-- Use proper JOIN types
-- INNER JOIN: Only matching records
SELECT o.*, u.email
FROM orders o
INNER JOIN users u ON o.user_id = u.id;

-- LEFT JOIN: All from left, matching from right
SELECT u.*, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.id;

-- Optimize IN clauses with EXISTS
-- ❌ BAD for large datasets
SELECT * FROM users 
WHERE id IN (SELECT user_id FROM orders WHERE total > 100);

-- ✅ GOOD: EXISTS stops at first match
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.user_id = u.id AND o.total > 100
);

-- Batch updates instead of individual
-- ❌ BAD: N queries
UPDATE users SET status = 'active' WHERE id = 1;
UPDATE users SET status = 'active' WHERE id = 2;

-- ✅ GOOD: Single query
UPDATE users SET status = 'active' WHERE id IN (1, 2, 3);

-- Use window functions for analytics
SELECT 
    id,
    email,
    created_at,
    ROW_NUMBER() OVER (ORDER BY created_at DESC) as user_rank,
    COUNT(*) OVER () as total_users,
    LAG(created_at) OVER (ORDER BY created_at) as previous_user_signup
FROM users;
```

### Common Performance Pitfalls
Avoid these query anti-patterns.

**Why:** These patterns seem innocent but can cause exponential performance degradation as data grows. Identifying and avoiding them early prevents crisis situations in production.

```sql
-- N+1 Query Problem
-- ❌ BAD: Fetching related data in loop
const users = await db.query('SELECT * FROM users');
for (const user of users) {
    const orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [user.id]);
}

-- ✅ GOOD: Single query with JOIN or batch fetch
const usersWithOrders = await db.query(`
    SELECT u.*, 
           JSON_AGG(o.*) as orders
    FROM users u
    LEFT JOIN orders o ON u.id = o.user_id
    GROUP BY u.id
`);

-- Missing indexes on foreign keys
-- ❌ BAD: No index on foreign key column
CREATE TABLE posts (
    user_id UUID REFERENCES users(id)  -- No index!
);

-- ✅ GOOD: Index foreign keys
CREATE INDEX idx_posts_user_id ON posts(user_id);

-- Using functions on indexed columns
-- ❌ BAD: Function prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- ✅ GOOD: Use expression index or store lowercase
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Or store email as lowercase and query directly

-- Implicit type conversions
-- ❌ BAD: String to number conversion
SELECT * FROM orders WHERE user_id = '123';  -- user_id is INTEGER

-- ✅ GOOD: Use correct types
SELECT * FROM orders WHERE user_id = 123;
```

## Migration Patterns

### Schema Migration Strategy
Manage database changes safely.

**Why:** Database migrations are high-risk operations that can cause downtime or data loss. A systematic approach ensures changes are trackable, reversible, and safely deployable.

```sql
-- Migration file naming: timestamp_descriptive_name.sql
-- 20240115143022_add_users_table.sql
-- 20240116091533_add_index_users_email.sql

-- Always include up and down migrations
-- UP Migration
CREATE TABLE IF NOT EXISTS users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

CREATE UNIQUE INDEX IF NOT EXISTS uq_users_email ON users(email);

-- DOWN Migration
DROP INDEX IF EXISTS uq_users_email;
DROP TABLE IF EXISTS users;

-- Safe column addition (non-breaking)
ALTER TABLE users 
ADD COLUMN IF NOT EXISTS phone_number VARCHAR(20);

-- Safe column removal (multiple steps)
-- Step 1: Stop using column in application
-- Step 2: Mark as deprecated
ALTER TABLE users 
ALTER COLUMN old_column SET DEFAULT NULL;
COMMENT ON COLUMN users.old_column IS 'DEPRECATED: Will be removed in v2.0';

-- Step 3: Remove column (after deployment)
ALTER TABLE users 
DROP COLUMN IF EXISTS old_column;

-- Safe column rename (multiple steps)
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN username VARCHAR(100);

-- Step 2: Copy data
UPDATE users SET username = user_name WHERE username IS NULL;

-- Step 3: Add constraints to new column
ALTER TABLE users ALTER COLUMN username SET NOT NULL;

-- Step 4: Remove old column (after code deployment)
ALTER TABLE users DROP COLUMN user_name;

-- Safe NOT NULL addition
-- Step 1: Add column as nullable
ALTER TABLE users ADD COLUMN status VARCHAR(20);

-- Step 2: Backfill data
UPDATE users SET status = 'active' WHERE status IS NULL;

-- Step 3: Add NOT NULL constraint
ALTER TABLE users ALTER COLUMN status SET NOT NULL;
```

### Zero-Downtime Migrations
Deploy schema changes without stopping the application.

**Why:** Modern applications need to stay online during deployments. Zero-downtime migrations enable continuous deployment and improve user experience by eliminating maintenance windows.

```sql
-- Adding index concurrently (PostgreSQL)
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);
-- Application stays online during index creation

-- Adding column with default (PostgreSQL 11+)
ALTER TABLE users 
ADD COLUMN credits INTEGER DEFAULT 0 NOT NULL;
-- Fast operation, no table rewrite

-- Changing column type safely
-- Step 1: Add new column
ALTER TABLE products ADD COLUMN price_new DECIMAL(12,2);

-- Step 2: Dual write from application
-- Application writes to both price and price_new

-- Step 3: Backfill historical data
UPDATE products 
SET price_new = CAST(price AS DECIMAL(12,2)) 
WHERE price_new IS NULL;

-- Step 4: Switch reads to new column
-- Step 5: Drop old column
ALTER TABLE products DROP COLUMN price;
ALTER TABLE products RENAME COLUMN price_new TO price;

-- Adding constraint safely
-- Step 1: Add as NOT VALID (doesn't check existing data)
ALTER TABLE orders 
ADD CONSTRAINT ck_orders_total_positive 
CHECK (total_amount >= 0) NOT VALID;

-- Step 2: Validate in background
ALTER TABLE orders 
VALIDATE CONSTRAINT ck_orders_total_positive;
```

## Data Integrity Patterns

### Transaction Management
Use transactions to maintain consistency.

**Why:** Transactions ensure that related changes either all succeed or all fail, preventing partial updates that leave data in an inconsistent state. They're essential for maintaining data integrity.

```typescript
// TypeScript/Node.js transaction example
async function transferMoney(
  fromUserId: string, 
  toUserId: string, 
  amount: number
) {
  const client = await pool.connect();
  
  try {
    await client.query('BEGIN');
    
    // Deduct from sender
    const deductResult = await client.query(
      'UPDATE accounts SET balance = balance - $1 WHERE user_id = $2 AND balance >= $1 RETURNING balance',
      [amount, fromUserId]
    );
    
    if (deductResult.rowCount === 0) {
      throw new Error('Insufficient funds');
    }
    
    // Add to receiver
    await client.query(
      'UPDATE accounts SET balance = balance + $1 WHERE user_id = $2',
      [amount, toUserId]
    );
    
    // Record transaction
    await client.query(
      'INSERT INTO transactions (from_user_id, to_user_id, amount) VALUES ($1, $2, $3)',
      [fromUserId, toUserId, amount]
    );
    
    await client.query('COMMIT');
    
  } catch (error) {
    await client.query('ROLLBACK');
    throw error;
  } finally {
    client.release();
  }
}

-- SQL transaction with isolation level
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Lock rows to prevent concurrent modifications
SELECT * FROM inventory 
WHERE product_id = '123' 
FOR UPDATE;

UPDATE inventory 
SET quantity = quantity - 1 
WHERE product_id = '123' AND quantity > 0;

INSERT INTO order_items (order_id, product_id, quantity) 
VALUES ('abc', '123', 1);

COMMIT;
```

### Audit Trail Pattern
Track all changes for compliance and debugging.

**Why:** Audit trails provide accountability, enable debugging of data issues, meet compliance requirements, and allow reconstruction of historical states.

```sql
-- Audit table for each main table
CREATE TABLE users_audit (
    audit_id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    operation VARCHAR(10) NOT NULL CHECK (operation IN ('INSERT', 'UPDATE', 'DELETE')),
    user_id UUID NOT NULL,
    changed_by UUID,  -- User who made the change
    changed_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    old_values JSONB,
    new_values JSONB,
    ip_address INET,
    user_agent TEXT
);

-- Trigger to automatically audit changes
CREATE OR REPLACE FUNCTION audit_trigger_function()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO users_audit (operation, user_id, new_values, changed_by)
        VALUES ('INSERT', NEW.id, to_jsonb(NEW), current_setting('app.current_user_id')::UUID);
        RETURN NEW;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO users_audit (operation, user_id, old_values, new_values, changed_by)
        VALUES ('UPDATE', NEW.id, to_jsonb(OLD), to_jsonb(NEW), current_setting('app.current_user_id')::UUID);
        RETURN NEW;
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO users_audit (operation, user_id, old_values, changed_by)
        VALUES ('DELETE', OLD.id, to_jsonb(OLD), current_setting('app.current_user_id')::UUID);
        RETURN OLD;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_audit_trigger
AFTER INSERT OR UPDATE OR DELETE ON users
FOR EACH ROW EXECUTE FUNCTION audit_trigger_function();

-- Query audit history
SELECT 
    operation,
    changed_at,
    changed_by,
    old_values->>'email' as old_email,
    new_values->>'email' as new_email
FROM users_audit
WHERE user_id = '123'
ORDER BY changed_at DESC;
```

## Backup and Recovery

### Backup Strategy
Implement comprehensive backup procedures.

**Why:** Data loss can kill businesses. A proper backup strategy ensures you can recover from hardware failures, human errors, or security incidents while meeting your recovery objectives.

```bash
# Full backup daily
pg_dump -h localhost -U postgres -d myapp -F custom -b -v -f /backups/myapp_$(date +%Y%m%d).backup

# Incremental backup with WAL archiving (PostgreSQL)
# postgresql.conf
archive_mode = on
archive_command = 'rsync -a %p /backup/archive/%f'
wal_level = replica

# Point-in-time recovery setup
# Recovery to specific time
restore_command = 'cp /backup/archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'

# Test backup restoration regularly
createdb myapp_test
pg_restore -h localhost -U postgres -d myapp_test /backups/myapp_20240115.backup

# Automated backup script with retention
#!/bin/bash
BACKUP_DIR="/backups"
DB_NAME="myapp"
RETENTION_DAYS=30

# Create backup
pg_dump -F custom -b -v -f "$BACKUP_DIR/${DB_NAME}_$(date +%Y%m%d_%H%M%S).backup" $DB_NAME

# Remove old backups
find $BACKUP_DIR -name "${DB_NAME}_*.backup" -mtime +$RETENTION_DAYS -delete

# Upload to S3 (off-site backup)
aws s3 sync $BACKUP_DIR s3://my-backup-bucket/postgres/ --delete
```

### Data Retention Policies
Implement data lifecycle management.

**Why:** Data retention policies ensure compliance with regulations (GDPR, CCPA), reduce storage costs, improve performance, and minimize liability from holding unnecessary data.

```sql
-- Soft delete with retention period
CREATE TABLE user_data (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    data JSONB,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    deleted_at TIMESTAMPTZ,
    purge_at TIMESTAMPTZ  -- When to hard delete
);

-- Mark for deletion with retention period
UPDATE user_data 
SET deleted_at = CURRENT_TIMESTAMP,
    purge_at = CURRENT_TIMESTAMP + INTERVAL '30 days'
WHERE user_id = ?;

-- Automated purge job
DELETE FROM user_data 
WHERE purge_at < CURRENT_TIMESTAMP;

-- Archive old data to separate table
-- Step 1: Create archive table
CREATE TABLE orders_archive (LIKE orders INCLUDING ALL);

-- Step 2: Move old data
INSERT INTO orders_archive 
SELECT * FROM orders 
WHERE created_at < CURRENT_TIMESTAMP - INTERVAL '2 years';

DELETE FROM orders 
WHERE created_at < CURRENT_TIMESTAMP - INTERVAL '2 years';

-- Partitioning for efficient data management (PostgreSQL)
CREATE TABLE events (
    id UUID DEFAULT gen_random_uuid(),
    event_type VARCHAR(50),
    data JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Drop old partitions efficiently
DROP TABLE events_2022_01;
```

## Performance Monitoring

### Key Metrics to Track
Monitor database health proactively.

**Why:** Monitoring catches problems before they become critical, enables capacity planning, and provides data for optimization decisions. You can't improve what you don't measure.

```sql
-- Query performance statistics (PostgreSQL)
SELECT 
    query,
    calls,
    total_time,
    mean_time,
    max_time,
    rows,
    100.0 * total_time / SUM(total_time) OVER () AS percentage
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;

-- Table size and bloat
SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS indexes_size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Connection monitoring
SELECT 
    datname,
    usename,
    application_name,
    state,
    COUNT(*) as connection_count
FROM pg_stat_activity
GROUP BY datname, usename, application_name, state
ORDER BY connection_count DESC;

-- Cache hit ratio (should be > 99%)
SELECT 
    sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as cache_hit_ratio
FROM pg_statio_user_tables;

-- Lock monitoring
SELECT 
    blocking.pid AS blocking_pid,
    blocking.usename AS blocking_user,
    blocked.pid AS blocked_pid,
    blocked.usename AS blocked_user,
    blocked.query AS blocked_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid))
WHERE blocked.pid != pg_backend_pid();
```

## Security Considerations

### Database Security Checklist
Protect your data from unauthorized access.

**Why:** Database breaches expose sensitive user data, destroy trust, and can result in massive fines. Security must be built in from the start, not added later.

```sql
-- Principle of least privilege
-- Create application user with limited permissions
CREATE USER app_user WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE myapp TO app_user;
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
-- Never grant SUPERUSER or CREATE privileges to application users

-- Row-level security (PostgreSQL)
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY documents_owner_policy ON documents
    FOR ALL
    TO app_user
    USING (owner_id = current_setting('app.current_user_id')::UUID);

-- Encrypt sensitive data
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Store encrypted
INSERT INTO sensitive_data (user_id, ssn_encrypted)
VALUES (
    '123',
    pgp_sym_encrypt('123-45-6789', 'encryption_key')
);

-- Retrieve decrypted
SELECT 
    user_id,
    pgp_sym_decrypt(ssn_encrypted, 'encryption_key') as ssn
FROM sensitive_data
WHERE user_id = '123';

-- SQL injection prevention
-- ❌ NEVER do this
const query = `SELECT * FROM users WHERE email = '${userInput}'`;

-- ✅ Always use parameterized queries
const query = 'SELECT * FROM users WHERE email = $1';
const result = await db.query(query, [userInput]);

-- Audit database access
CREATE TABLE access_log (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    user_id UUID,
    table_name VARCHAR(100),
    operation VARCHAR(10),
    accessed_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    ip_address INET,
    query TEXT
);
```

## File Storage Standards

### File Storage Strategy
Design scalable, secure file storage systems.

**Why:** Files don't belong in databases - they bloat backups, hurt performance, and make replication slow. A proper file storage strategy ensures scalability, performance, and cost-effectiveness.

```sql
-- Store metadata in database, files in object storage
CREATE TABLE user_files (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    file_key VARCHAR(255) NOT NULL,        -- S3 key or file path
    file_name VARCHAR(255) NOT NULL,       -- Original filename
    file_size BIGINT NOT NULL,             -- Size in bytes
    mime_type VARCHAR(100),                -- application/pdf, image/jpeg, etc.
    storage_location VARCHAR(50) DEFAULT 's3', -- s3, local, gcs
    bucket_name VARCHAR(100),              -- For multi-bucket setups
    
    -- Metadata
    width INTEGER,                         -- For images
    height INTEGER,                        -- For images
    duration INTEGER,                      -- For videos (seconds)
    checksum VARCHAR(64),                  -- SHA-256 hash
    
    -- Access control
    is_public BOOLEAN DEFAULT false,
    expires_at TIMESTAMPTZ,                -- For temporary files
    
    -- Audit fields
    uploaded_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    uploaded_by UUID REFERENCES users(id),
    last_accessed_at TIMESTAMPTZ,
    access_count INTEGER DEFAULT 0,
    
    -- Indexes
    CONSTRAINT uq_file_key UNIQUE(file_key),
    INDEX idx_user_files_user_id (user_id),
    INDEX idx_user_files_mime_type (mime_type),
    INDEX idx_user_files_uploaded_at (uploaded_at DESC)
);

-- Separate table for file versions
CREATE TABLE file_versions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id UUID REFERENCES user_files(id) ON DELETE CASCADE,
    version_number INTEGER NOT NULL,
    file_key VARCHAR(255) NOT NULL,
    file_size BIGINT NOT NULL,
    checksum VARCHAR(64),
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    created_by UUID REFERENCES users(id),
    
    UNIQUE(file_id, version_number)
);
```

### File Organization Patterns
Structure files for easy management and access.

**Why:** Good file organization prevents naming collisions, enables efficient cleanup, simplifies debugging, and makes migration easier. It also improves security through obscurity.

```typescript
// Path structure patterns
// Pattern 1: By date and user (good for user content)
// /uploads/2024/01/15/user_123/uuid_filename.ext
const getDateBasedPath = (userId: string, filename: string): string => {
  const date = new Date();
  const year = date.getFullYear();
  const month = String(date.getMonth() + 1).padStart(2, '0');
  const day = String(date.getDate()).padStart(2, '0');
  const fileId = generateUUID();
  const extension = path.extname(filename);
  
  return `uploads/${year}/${month}/${day}/user_${userId}/${fileId}${extension}`;
};

// Pattern 2: By content type (good for CDN optimization)
// /images/avatars/user_123/uuid.jpg
// /documents/invoices/2024/01/uuid.pdf
const getTypeBasedPath = (type: string, subtype: string, filename: string): string => {
  const fileId = generateUUID();
  const extension = path.extname(filename);
  
  return `${type}/${subtype}/${fileId}${extension}`;
};

// Pattern 3: By hash (deduplication)
// /files/a7/b3/a7b3c2d1e5f6.jpg (first 2 chars / next 2 chars / full hash)
const getHashBasedPath = async (fileBuffer: Buffer): Promise<string> => {
  const hash = crypto.createHash('sha256').update(fileBuffer).digest('hex');
  const extension = await getFileExtension(fileBuffer);
  
  return `files/${hash.slice(0, 2)}/${hash.slice(2, 4)}/${hash}${extension}`;
};

// Database record example
interface FileRecord {
  id: string;
  userId: string;
  fileKey: 'uploads/2024/01/15/user_123/550e8400-e29b-41d4-a716-446655440000.jpg';
  fileName: 'profile-photo.jpg';  // Original name
  fileSize: 102400;
  mimeType: 'image/jpeg';
  width: 800;
  height: 600;
  checksum: 'a7b3c2d1e5f6...';
}
```

### Direct Upload Pattern
Enable secure client-side uploads.

**Why:** Direct uploads reduce server load, improve upload speed, enable better progress tracking, and prevent large files from overwhelming your application servers.

```typescript
// Generate presigned URL for direct S3 upload
async function getPresignedUploadUrl(
  userId: string,
  filename: string,
  mimeType: string
): Promise<UploadCredentials> {
  // Validate file type and size
  if (!ALLOWED_MIME_TYPES.includes(mimeType)) {
    throw new Error('File type not allowed');
  }
  
  const fileKey = getDateBasedPath(userId, filename);
  const uploadId = generateUUID();
  
  // Store pending upload in database
  await db.query(
    `INSERT INTO pending_uploads 
     (id, user_id, file_key, expires_at) 
     VALUES ($1, $2, $3, $4)`,
    [uploadId, userId, fileKey, new Date(Date.now() + 3600000)] // 1 hour
  );
  
  // Generate presigned URL
  const command = new PutObjectCommand({
    Bucket: process.env.S3_BUCKET,
    Key: fileKey,
    ContentType: mimeType,
    Metadata: {
      userId,
      uploadId,
      originalName: filename
    }
  });
  
  const presignedUrl = await getSignedUrl(s3Client, command, { 
    expiresIn: 3600 // 1 hour
  });
  
  return {
    uploadId,
    uploadUrl: presignedUrl,
    fileKey,
    expiresAt: new Date(Date.now() + 3600000)
  };
}

// Confirm upload completion
async function confirmUpload(uploadId: string, userId: string): Promise<void> {
  // Verify upload exists and belongs to user
  const pending = await db.query(
    'SELECT * FROM pending_uploads WHERE id = $1 AND user_id = $2',
    [uploadId, userId]
  );
  
  if (!pending.rows[0]) {
    throw new Error('Invalid upload');
  }
  
  // Verify file exists in S3
  const headCommand = new HeadObjectCommand({
    Bucket: process.env.S3_BUCKET,
    Key: pending.rows[0].file_key
  });
  
  const metadata = await s3Client.send(headCommand);
  
  // Store file record
  await db.query(
    `INSERT INTO user_files 
     (user_id, file_key, file_name, file_size, mime_type, checksum)
     VALUES ($1, $2, $3, $4, $5, $6)`,
    [
      userId,
      pending.rows[0].file_key,
      pending.rows[0].original_name,
      metadata.ContentLength,
      metadata.ContentType,
      metadata.ETag
    ]
  );
  
  // Clean up pending upload
  await db.query('DELETE FROM pending_uploads WHERE id = $1', [uploadId]);
}
```

### Image Processing Pipeline
Handle image transformations efficiently.

**Why:** On-demand image processing saves storage, ensures consistent quality, enables responsive images, and provides better user experience with optimized formats.

```sql
-- Track image variants
CREATE TABLE image_variants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    original_file_id UUID REFERENCES user_files(id) ON DELETE CASCADE,
    variant_key VARCHAR(50) NOT NULL,  -- thumbnail, medium, large, webp
    file_key VARCHAR(255) NOT NULL,
    width INTEGER,
    height INTEGER,
    file_size BIGINT,
    format VARCHAR(20),  -- jpeg, webp, avif
    quality INTEGER,     -- compression quality used
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    
    UNIQUE(original_file_id, variant_key),
    INDEX idx_image_variants_original (original_file_id)
);

-- Configuration table for image processing rules
CREATE TABLE image_processing_rules (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(50) UNIQUE NOT NULL,
    max_width INTEGER,
    max_height INTEGER,
    quality INTEGER DEFAULT 85,
    format VARCHAR(20) DEFAULT 'jpeg',
    fit VARCHAR(20) DEFAULT 'cover', -- cover, contain, fill, inside, outside
    is_active BOOLEAN DEFAULT true
);

-- Insert standard rules
INSERT INTO image_processing_rules (name, max_width, max_height, quality, format) VALUES
('thumbnail', 200, 200, 80, 'jpeg'),
('small', 400, 400, 85, 'jpeg'),
('medium', 800, 800, 85, 'jpeg'),
('large', 1600, 1600, 90, 'jpeg'),
('webp_medium', 800, 800, 85, 'webp'),
('avif_medium', 800, 800, 85, 'avif');
```

### File Access Control
Implement secure file access patterns.

**Why:** Files often contain sensitive data that requires access control. Proper patterns prevent unauthorized access while enabling sharing when needed.

```typescript
// Secure file access with signed URLs
async function getSecureFileUrl(
  fileId: string,
  userId: string,
  expiresIn = 3600
): Promise<string> {
  // Check user has access to file
  const file = await db.query(
    `SELECT f.* FROM user_files f
     WHERE f.id = $1 
     AND (
       f.user_id = $2  -- Owner
       OR f.is_public = true  -- Public file
       OR EXISTS (  -- Shared with user
         SELECT 1 FROM file_shares fs 
         WHERE fs.file_id = f.id 
         AND fs.shared_with_user_id = $2
         AND (fs.expires_at IS NULL OR fs.expires_at > NOW())
       )
     )`,
    [fileId, userId]
  );
  
  if (!file.rows[0]) {
    throw new Error('File not found or access denied');
  }
  
  // Update access tracking
  await db.query(
    `UPDATE user_files 
     SET last_accessed_at = NOW(), 
         access_count = access_count + 1 
     WHERE id = $1`,
    [fileId]
  );
  
  // Generate signed URL
  const command = new GetObjectCommand({
    Bucket: process.env.S3_BUCKET,
    Key: file.rows[0].file_key,
    ResponseContentDisposition: `inline; filename="${file.rows[0].file_name}"`
  });
  
  return getSignedUrl(s3Client, command, { expiresIn });
}

// File sharing table
CREATE TABLE file_shares (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id UUID REFERENCES user_files(id) ON DELETE CASCADE,
    shared_by_user_id UUID REFERENCES users(id),
    shared_with_user_id UUID REFERENCES users(id),
    permission VARCHAR(20) DEFAULT 'view', -- view, download, edit
    expires_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    
    UNIQUE(file_id, shared_with_user_id)
);
```

### File Cleanup and Lifecycle
Manage file lifecycle and storage costs.

**Why:** Orphaned files waste storage and money. Automated cleanup ensures you only pay for files you actually need while maintaining data retention compliance.

```sql
-- Track file references to prevent orphaned files
CREATE TABLE file_references (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    file_id UUID REFERENCES user_files(id) ON DELETE CASCADE,
    entity_type VARCHAR(50) NOT NULL,  -- 'post', 'comment', 'message'
    entity_id UUID NOT NULL,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    
    UNIQUE(file_id, entity_type, entity_id),
    INDEX idx_file_references_entity (entity_type, entity_id)
);

-- Cleanup orphaned files (no references and not accessed recently)
WITH orphaned_files AS (
    SELECT f.id, f.file_key
    FROM user_files f
    LEFT JOIN file_references fr ON f.id = fr.file_id
    WHERE fr.id IS NULL
    AND f.created_at < NOW() - INTERVAL '7 days'
    AND (f.last_accessed_at IS NULL OR f.last_accessed_at < NOW() - INTERVAL '30 days')
    AND f.is_public = false
)
DELETE FROM user_files
WHERE id IN (SELECT id FROM orphaned_files);

-- Archive old files to cheaper storage
UPDATE user_files
SET storage_location = 'glacier',
    archived_at = NOW()
WHERE created_at < NOW() - INTERVAL '1 year'
AND access_count < 5
AND storage_location = 's3';

-- Set up lifecycle rules in S3
{
  "Rules": [{
    "Id": "ArchiveOldFiles",
    "Status": "Enabled",
    "Transitions": [{
      "Days": 90,
      "StorageClass": "STANDARD_IA"
    }, {
      "Days": 365,
      "StorageClass": "GLACIER"
    }],
    "NoncurrentVersionTransitions": [{
      "NoncurrentDays": 30,
      "StorageClass": "STANDARD_IA"
    }],
    "NoncurrentVersionExpiration": {
      "NoncurrentDays": 90
    }
  }]
}
```

### Backup and Disaster Recovery for Files
Ensure file availability and durability.

**Why:** Files are often irreplaceable user data. Proper backup strategies prevent data loss from accidental deletion, corruption, or service outages.

```typescript
// Multi-region replication setup
const replicationConfig = {
  Role: 'arn:aws:iam::account-id:role/replication-role',
  Rules: [{
    ID: 'ReplicateAll',
    Status: 'Enabled',
    Priority: 1,
    Filter: {},
    Destination: {
      Bucket: 'arn:aws:s3:::backup-bucket-us-west-2',
      StorageClass: 'STANDARD_IA'
    },
    DeleteMarkerReplication: {
      Status: 'Enabled'
    }
  }]
};

// Database backup includes file metadata
-- Backup file metadata with database
pg_dump --table=user_files --table=file_versions --table=file_shares

// Verify file integrity periodically
async function verifyFileIntegrity(): Promise<void> {
  const files = await db.query(
    'SELECT id, file_key, checksum FROM user_files WHERE verified_at < NOW() - INTERVAL \'30 days\' LIMIT 100'
  );
  
  for (const file of files.rows) {
    try {
      const headCommand = new HeadObjectCommand({
        Bucket: process.env.S3_BUCKET,
        Key: file.file_key
      });
      
      const metadata = await s3Client.send(headCommand);
      
      if (metadata.ETag !== file.checksum) {
        await logIntegrityError(file.id, file.checksum, metadata.ETag);
      }
      
      await db.query(
        'UPDATE user_files SET verified_at = NOW() WHERE id = $1',
        [file.id]
      );
    } catch (error) {
      await logMissingFile(file.id, file.file_key);
    }
  }
}
```

## NoSQL Considerations

### When to Use NoSQL
Choose the right tool for the job.

**Why:** NoSQL databases excel at specific use cases where relational databases struggle. Understanding when to use each ensures optimal performance and developer experience.

```javascript
// Document store (MongoDB) - Good for:
// - Flexible schemas
// - Nested data
// - Rapid prototyping

// User profile with nested data
{
  "_id": "user_123",
  "email": "user@example.com",
  "profile": {
    "name": "John Doe",
    "preferences": {
      "theme": "dark",
      "notifications": {
        "email": true,
        "push": false
      }
    }
  },
  "addresses": [
    {
      "type": "home",
      "street": "123 Main St",
      "city": "Portland"
    }
  ]
}

// Key-Value store (Redis) - Good for:
// - Caching
// - Sessions
// - Real-time features

// Cache user session
SETEX session:abc123 3600 '{"userId": "123", "role": "admin"}'

// Rate limiting
INCR api_calls:user:123
EXPIRE api_calls:user:123 60

// Time-series (InfluxDB) - Good for:
// - Metrics
// - IoT data
// - Analytics

// Write metric
INSERT temperature,location=room1 value=22.5

// Graph database (Neo4j) - Good for:
// - Social networks
// - Recommendations
// - Fraud detection

// Friend relationships
CREATE (u1:User {id: '123'})-[:FRIENDS_WITH]->(u2:User {id: '456'})
```

### Hybrid Approach
Combine SQL and NoSQL for optimal results.

**Why:** Different data has different requirements. Using the right database for each use case optimizes performance, developer experience, and operational costs.

```typescript
// PostgreSQL for core business data
const user = await postgres.query(
  'SELECT * FROM users WHERE id = $1',
  [userId]
);

// Redis for caching
const cacheKey = `user:${userId}`;
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);

// Elasticsearch for full-text search
const searchResults = await elasticsearch.search({
  index: 'products',
  body: {
    query: {
      match: { description: searchTerm }
    }
  }
});

// S3 for file storage
const fileUrl = await s3.upload({
  Bucket: 'user-uploads',
  Key: `${userId}/${filename}`,
  Body: fileBuffer
});

// Store reference in PostgreSQL
await postgres.query(
  'INSERT INTO user_files (user_id, file_url) VALUES ($1, $2)',
  [userId, fileUrl]
);
```

