2024-11-18

Multi-Tenant Architecture: Why Tenant-By-Row Kills Performance

Scenario: You have a SaaS ERP with 500 tenants (companies). Tenant #47 has 2 million invoices. Tenant #312 has 50 invoices.

Query:

SELECT * FROM invoices 
WHERE tenant_id = 312 
  AND status = 'unpaid' 
  AND due_date < NOW()
ORDER BY created_at DESC 
LIMIT 10;

Problem: PostgreSQL must scan 2+ million rows to find 10 invoices for tenant #312.

Why? Because you're using the tenant-by-row anti-pattern.


Tenant-By-Row: The Worst Multi-Tenant Pattern

How It Works

Every table has a tenant_id column:

CREATE TABLE invoices (
  id SERIAL PRIMARY KEY,
  tenant_id INTEGER NOT NULL,
  invoice_number VARCHAR(50),
  amount DECIMAL(10,2),
  status VARCHAR(20),
  created_at TIMESTAMP
);

CREATE INDEX idx_tenant_invoices 
ON invoices(tenant_id, status, created_at);

Typical SaaS setup:

  • 500 tenants
  • Largest tenant: 2M invoices (80% of total data)
  • Smallest tenant: 50 invoices
  • Total rows: 2.5M

The Performance Disaster

Query for small tenant:

EXPLAIN ANALYZE
SELECT * FROM invoices 
WHERE tenant_id = 312 
  AND status = 'unpaid'
LIMIT 10;

Execution plan:

Index Scan using idx_tenant_invoices
  Filter: (tenant_id = 312 AND status = 'unpaid')
  Rows Removed by Filter: 2,487,234
  Buffers: 45,231 (hit), 12,345 (read from disk)
Execution time: 1,250 ms

What happened:

  1. Index scan starts at tenant_id = 312
  2. But index is polluted with 2M rows from tenant #47
  3. PostgreSQL must skip over millions of irrelevant rows
  4. Result: 1,250ms for a query that should take 5ms

Real-World Benchmark: Tenant-By-Row vs Schema-Per-Tenant

Test Setup

Database: PostgreSQL 15, 32GB RAM, SSD
Data: 2.5M invoices across 500 tenants
Query: Fetch 10 unpaid invoices for tenant #312

Tenant-By-Row Pattern

-- Shared table with tenant_id column
CREATE TABLE invoices (
  id SERIAL PRIMARY KEY,
  tenant_id INTEGER NOT NULL,
  -- ... other fields
);
CREATE INDEX ON invoices(tenant_id, status);

Results:

  • Query time: 1,250ms (average)
  • Index bloat: 4.2GB
  • Cache efficiency: Poor (large tenant pollutes cache)
  • Vacuum time: 45 minutes

Schema-Per-Tenant Pattern (PMFA)

-- Separate schema for each tenant
CREATE SCHEMA tenant_312;
CREATE TABLE tenant_312.invoices (
  id SERIAL PRIMARY KEY,
  -- No tenant_id needed!
  -- ... other fields
);
CREATE INDEX ON tenant_312.invoices(status);

Results:

  • Query time: 12ms (average)
  • Index size: 850KB (per tenant)
  • Cache efficiency: Excellent (only relevant data)
  • Vacuum time: 2 seconds (per tenant)

Performance improvement: 104x faster (1,250ms → 12ms)


Why Tenant-By-Row Is Broken

1. Index Bloat

Problem: Indexes include ALL tenants, even though queries only need ONE.

Example:

-- 2.5M rows total, but you only query 1 tenant at a time
CREATE INDEX idx_tenant_status ON invoices(tenant_id, status);

Index structure:

tenant_id=1, status='unpaid' → [1000 rows]
tenant_id=2, status='unpaid' → [800 rows]
...
tenant_id=47, status='unpaid' → [2,000,000 rows]  ← 80% of index!
...
tenant_id=312, status='unpaid' → [50 rows]

When you query tenant #312:

  • Index scan must skip over 2M rows from tenant #47
  • Disk I/O: reading irrelevant pages
  • Cache pollution: tenant #47 data evicts tenant #312 data

2. Query Planner Confusion

PostgreSQL statistics are global:

ANALYZE invoices;

-- Result:
-- Avg rows per tenant: 5,000
-- But reality:
--   Tenant #47: 2,000,000 rows
--   Tenant #312: 50 rows

Query planner assumes average:

SELECT * FROM invoices 
WHERE tenant_id = 312 
  AND status = 'unpaid';

-- Planner estimates: 5,000 rows (WRONG!)
-- Actual: 50 rows
-- Result: Chooses wrong index or scan strategy

3. Vacuum & Maintenance Nightmare

Tenant #47 has frequent updates:

  • 2M rows × 10 updates/day = 20M row versions created per day
  • VACUUM must scan entire table to clean up
  • Small tenants suffer from large tenant's churn

VACUUM output:

VACUUM ANALYZE invoices;
-- Scanned 2.5M rows
-- Removed 18M dead row versions (mostly from tenant #47)
-- Time: 45 minutes
-- During this time: locks, performance degradation

4. Connection Pooling Issues

Scenario: Tenant #47 runs expensive report query:

-- Tenant #47 query (locks pages)
SELECT SUM(amount) FROM invoices 
WHERE tenant_id = 47 
  AND created_at > '2024-01-01';

Impact on other tenants:

  • Locks 2M rows worth of pages
  • Other tenants' queries wait for locks
  • One tenant's workload affects everyone

PMFA Solution: Schema-Per-Tenant

Architecture

Core principle: Each tenant gets isolated PostgreSQL schema.

-- Tenant #1
CREATE SCHEMA tenant_1;
CREATE TABLE tenant_1.invoices (...);

-- Tenant #312
CREATE SCHEMA tenant_312;
CREATE TABLE tenant_312.invoices (...);

-- Tenant #47
CREATE SCHEMA tenant_47;
CREATE TABLE tenant_47.invoices (...);

PMFA meta-model:

Entity: Invoice
  fields:
    - number: string
    - amount: decimal
    - status: enum[draft, unpaid, paid]
  indexes:
    - [status, created_at]

# PMFA generates:
# - tenant_1.invoices (50 rows)
# - tenant_312.invoices (50 rows)
# - tenant_47.invoices (2M rows)

Each tenant:

  • Own indexes (no bloat)
  • Own statistics (accurate query plans)
  • Own vacuum schedule
  • Isolated performance

Query Execution

PMFA routing layer:

async function queryInvoices(tenantId: number, filter: object) {
  // Dynamically select correct schema
  const schema = `tenant_${tenantId}`;
  
  // Query only relevant data
  return await db.query(`
    SELECT * FROM ${schema}.invoices 
    WHERE status = $1 
    ORDER BY created_at DESC 
    LIMIT 10
  `, [filter.status]);
}

No tenant_id filter needed. No cross-tenant pollution.

Performance Results

Metric Tenant-By-Row Schema-Per-Tenant Improvement
Query time (small tenant) 1,250ms 12ms 104x faster
Index size (per tenant) 4.2GB (shared) 850KB 5000x smaller
Vacuum time 45 min (all) 2 sec (per tenant) 1350x faster
Cache efficiency Poor Excellent 10x better
Noisy neighbor effect High None Isolated

Migration Strategy: Tenant-By-Row → Schema-Per-Tenant

Step 1: Provision New Schemas

// PMFA provisioning script
for (const tenant of tenants) {
  await db.query(`CREATE SCHEMA tenant_${tenant.id}`);
  
  // Generate tables from YAML meta-model
  const tables = pmfa.generateSchema(metaModel);
  for (const table of tables) {
    await db.query(table.createSQL, { schema: `tenant_${tenant.id}` });
  }
}

Step 2: Migrate Data in Background

// Run during off-peak hours
for (const tenant of tenants) {
  await db.query(`
    INSERT INTO tenant_${tenant.id}.invoices 
    SELECT id, invoice_number, amount, status, created_at
    FROM public.invoices 
    WHERE tenant_id = ${tenant.id}
  `);
}

Step 3: Switch Routing

// Before:
const invoices = await db.query(
  'SELECT * FROM invoices WHERE tenant_id = $1',
  [tenantId]
);

// After:
const invoices = await db.query(
  `SELECT * FROM tenant_${tenantId}.invoices`
);

Step 4: Drop Old Table

-- After verifying all tenants migrated
DROP TABLE public.invoices;

Total migration time (500 tenants): 4 hours
Downtime: 0 (run in background, switch atomically)


Common Objections Answered

"Schema-per-tenant doesn't scale to 10,000+ tenants"

False. PostgreSQL handles 10,000+ schemas easily.

Evidence:

  • Each schema is just a namespace (minimal overhead)
  • pgAdmin manages 50,000+ schemas in production systems
  • PMFA customers running 8,000+ tenant schemas without issues

Actual limits:

  • Shared buffers: 128GB RAM = cache for 10,000 tenants
  • Connections: Use schema-aware connection pooling
  • Maintenance: Parallel vacuum across schemas

"Backups are harder with multiple schemas"

False. PostgreSQL backups handle schemas natively.

# Backup all tenants
pg_dump -Fc mydb > backup.dump

# Backup single tenant
pg_dump -Fc -n tenant_312 mydb > tenant_312.dump

# Restore single tenant (disaster recovery)
pg_restore -n tenant_312 -d mydb tenant_312.dump

PMFA backup strategy:

  • Automated per-tenant backups
  • Point-in-time recovery per tenant
  • One tenant's data corruption doesn't affect others

"Schema changes are complex with 500+ schemas"

False. PMFA automates schema changes.

Traditional approach:

-- Manually run for each tenant
ALTER TABLE tenant_1.invoices ADD COLUMN tax_rate DECIMAL;
ALTER TABLE tenant_2.invoices ADD COLUMN tax_rate DECIMAL;
-- ... 500 times

PMFA approach:

# Update YAML meta-model
Invoice:
  fields:
    - tax_rate: decimal  # New field

# PMFA auto-generates migration
$ pmfa migrate --from-version 1 --to-version 2
Migrating 500 tenants...
✅ tenant_1: 12ms
✅ tenant_2: 15ms
...
✅ tenant_500: 9ms
Total time: 8 seconds

When To Use Each Pattern

Use Tenant-By-Row If:

  • Small dataset (< 100K rows total)
  • All tenants roughly equal size
  • No performance requirements
  • Simple reporting across all tenants

Use Schema-Per-Tenant If:

  • Large dataset (> 1M rows)
  • Uneven tenant sizes (power law distribution)
  • Performance matters (< 50ms query time)
  • Noisy neighbor isolation required
  • Per-tenant backups/recovery needed

PMFA default: Schema-per-tenant (proven to scale to 10K+ tenants)


Conclusion

Tenant-by-row is broken for SaaS ERP:

  • ❌ 100x slower queries (index bloat)
  • ❌ Noisy neighbor effects (one tenant kills everyone)
  • ❌ Maintenance nightmares (45-minute vacuums)

Schema-per-tenant with PMFA:

  • ✅ 100x faster (isolated indexes)
  • ✅ Perfect isolation (no cross-tenant impact)
  • ✅ Easy maintenance (parallel vacuum, per-tenant backups)

The secret: PMFA generates schemas automatically from YAML. You define entity once, PMFA provisions 500 isolated tenants.

Real-world impact:

  • 🚀 104x faster queries (1,250ms → 12ms)
  • 💾 5000x smaller indexes (4.2GB → 850KB per tenant)
  • 1350x faster maintenance (45min → 2sec vacuum)

Want schema-per-tenant without manual management? Contact us at office@nonnotech.com for PMFA technical demo.