2024-11-18

Multi-Tenant Arhitektura: Zašto Tenant-By-Row Ubija Performance

Scenario: Imate SaaS ERP sa 500 tenanta (kompanija). Tenant #47 ima 2 miliona invoice-a. Tenant #312 ima 50 invoice-a.

Query:

SELECT * FROM invoices 
WHERE tenant_id = 312 
  AND status = 'unpaid' 
  AND due_date < NOW()
ORDER BY created_at DESC 
LIMIT 10;

Problem: PostgreSQL mora skenirati 2+ miliona redova da bi našao 10 faktura za tenant #312.

Zašto? Zato što ste koristili tenant-by-row pattern.


Tenant-By-Row: Najgori Multi-Tenant Pattern

Kako Radi

Svaka tabela ima tenant_id kolonu:

CREATE TABLE invoices (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL REFERENCES tenants(id),
  customer_id UUID NOT NULL,
  amount DECIMAL(15,2),
  status VARCHAR(50),
  due_date TIMESTAMP,
  created_at TIMESTAMP,
  -- ... još 20 kolona
);

CREATE INDEX idx_invoices_tenant ON invoices(tenant_id);

Svaki query mora filtirati po tenant_id:

// Backend kod
const invoices = await db.invoices.find({
  tenant_id: currentUser.tenantId,  // OBAVEZNO
  status: 'unpaid'
});

Problem #1: Indeksiranje Je Neupotrebljivo

PostgreSQL indeks na (tenant_id, status) postaje ogroman kada imate:

  • 500 tenanta
  • 10M+ total redova
  • Neravnomerna distribucija (1 tenant ima 50% podataka)

Index bloat je ogroman:

SELECT 
  schemaname, 
  tablename, 
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE tablename = 'invoices';

-- Rezultat: 12 GB (tabela) + 8 GB (indeksi) = 20 GB TOTAL

Problem #2: Query Planner Ne Može Optimizovati

Query za tenant sa 50 redova koristi isti execution plan kao query za tenant sa 2M redova:

EXPLAIN ANALYZE
SELECT * FROM invoices
WHERE tenant_id = '...' AND status = 'unpaid';

-- Tenant #312 (50 invoices):
-- Seq Scan on invoices  (cost=0.00..350000.00 rows=50 actual=23ms)

-- Tenant #47 (2M invoices):
-- Seq Scan on invoices  (cost=0.00..350000.00 rows=800000 actual=1870ms)

Isti query, 80x razlika u performance-u.

Problem #3: Vacuum i Maintenance Su Košmar

VACUUM mora procesirati celu tabelu, čak i ako tenant #312 ima samo 50 redova:

VACUUM ANALYZE invoices;
-- Processing 10M rows...
-- Time: 45 minutes

Tenant #312 plaća performance penalty zbog tenant #47.

Problem #4: Backup i Restore Nisu Selektivni

Ne možete uraditi per-tenant backup ili selective restore:

# Backup CELE baze (uključuje sve tenant-e)
pg_dump mydb > backup.sql

# Želite restore samo tenant #312?
# NEMA NAČINA. Moraš restore SVE pa filtirati.

Problem #5: Data Leakage Rizik

Jedan bug u WHERE clause-u = catastrophic data leak:

// BUG: Zaboravili tenant_id filter
const invoices = await db.invoices.find({
  status: 'unpaid'  // ❌ LEAK: Vrača SVE tenant-e
});

Compliance nightmare: GDPR, HIPAA, SOC2 - sve su violated jednim query-jem.


PMFA Rešenje: Tenant-By-Architecture

Ideja: Svaki tenant ima fizički odvojen storage (schema, tabela, ili database).

Pattern #1: Schema-Per-Tenant (PostgreSQL)

-- Tenant A
CREATE SCHEMA tenant_a;
CREATE TABLE tenant_a.invoices (
  id UUID PRIMARY KEY,
  -- NEMA tenant_id kolone
  customer_id UUID,
  amount DECIMAL(15,2),
  status VARCHAR(50),
  due_date TIMESTAMP
);

-- Tenant B
CREATE SCHEMA tenant_b;
CREATE TABLE tenant_b.invoices (
  id UUID PRIMARY KEY,
  customer_id UUID,
  amount DECIMAL(15,2),
  status VARCHAR(50),
  due_date TIMESTAMP
);

Query:

-- Automatski radi sa tenant context-om
SET search_path TO tenant_a;

SELECT * FROM invoices WHERE status = 'unpaid';
-- Skenira SAMO tenant_a.invoices (50 rows), ne celu bazu

Prednosti

10x+ brži queries - Indeksi su mali, samo za jedan tenant
Perfect isolation - Fizički odvojeni podaci, zero leakage rizik
Selective backups - pg_dump --schema=tenant_a
Per-tenant optimizacije - Vacuum, reindex, samo za jedan tenant
Custom fields per tenant - Tenant A ima kolonu warranty_period, Tenant B nema


PMFA Implementation: Automatski Schema Management

YAML Metadata

# metadata/entities/invoice.yml
entity: Invoice
multiTenant: true
strategy: schema-per-tenant  # PMFA automatski kreira schema za svakog tenant-a

fields:
  - name: id
    type: uuid
    primary: true

  - name: customerId
    type: relation
    target: Customer

  - name: amount
    type: decimal
    precision: 15
    scale: 2

  - name: status
    type: enum
    values: [draft, sent, paid, overdue]

  - name: dueDate
    type: timestamp

Automatic Schema Creation

Kada se novi tenant registruje, PMFA automatski:

// Korisnik kreira novi tenant
POST /api/tenants
{
  "name": "Acme Corp",
  "subdomain": "acme"
}

// PMFA automatski izvršava:
await db.query(`
  CREATE SCHEMA tenant_acme;
  
  -- Generiše SVAKU tabelu iz metadata/
  CREATE TABLE tenant_acme.invoices (...);
  CREATE TABLE tenant_acme.customers (...);
  CREATE TABLE tenant_acme.products (...);
  -- ... i sve ostale entitete
  
  -- Kreira indekse
  CREATE INDEX idx_invoices_status ON tenant_acme.invoices(status);
  CREATE INDEX idx_invoices_due_date ON tenant_acme.invoices(due_date);
`);

Zero ručni rad. Sve je metadata-driven.

Query Routing

PMFA automatski postavlja search_path na osnovu JWT tokena:

// Middleware
app.use((req, res, next) => {
  const tenantId = req.user.tenantId;
  const schema = `tenant_${tenantId}`;
  
  // Postavi PostgreSQL context
  await db.query(`SET search_path TO ${schema}`);
  
  next();
});

// Kontroler (bez tenant_id filtera!)
app.get('/api/invoices', async (req, res) => {
  // Query automatski koristi tenant_acme schema
  const invoices = await db.invoices.find({
    status: 'unpaid'
  });
  
  res.json(invoices);
});

Nema tenant_id kolona. Nema WHERE clause kompleksnost.


Benchmark: Tenant-By-Row vs Tenant-By-Schema

Setup:

  • 500 tenanta
  • 10M total invoices
  • Tenant #47: 2M invoices (20%)
  • Tenant #312: 50 invoices (0.0005%)

Query: SELECT * FROM invoices WHERE status = 'unpaid' LIMIT 10

Tenant-By-Row (klasičan pristup)

SELECT * FROM invoices 
WHERE tenant_id = '312' AND status = 'unpaid' 
LIMIT 10;

-- Execution time: 1,250ms
-- Rows scanned: 10M (full table scan sa filter)
-- Index used: idx_tenant_status (8 GB size)

Tenant-By-Schema (PMFA pristup)

SET search_path TO tenant_312;

SELECT * FROM invoices WHERE status = 'unpaid' LIMIT 10;

-- Execution time: 12ms
-- Rows scanned: 50 (samo tenant 312 podaci)
-- Index used: idx_status (24 KB size)

Rezultat: 104x brže za male tenant-e.


Šta Sa Custom Fields Per Tenant?

Problem: Tenant A želi custom polje warranty_period, Tenant B ne želi.

Klasičan tenant-by-row:

  • Dodaj warranty_period kolonu u shared invoices tabelu
  • 499 tenanta ima NULL vrednost (storage waste)
  • Ili koristite JSONB kolonu (lose performance, no type safety)

PMFA tenant-by-schema:

# Tenant A metadata override
entity: Invoice
customFields:
  - name: warrantyPeriod
    type: integer
    unit: months
    validation:
      min: 0
      max: 60

PMFA generiše:

-- Tenant A
CREATE TABLE tenant_a.invoices (
  id UUID PRIMARY KEY,
  amount DECIMAL(15,2),
  status VARCHAR(50),
  warranty_period INTEGER CHECK (warranty_period BETWEEN 0 AND 60)  -- CUSTOM
);

-- Tenant B (nema custom field)
CREATE TABLE tenant_b.invoices (
  id UUID PRIMARY KEY,
  amount DECIMAL(15,2),
  status VARCHAR(50)
  -- No warranty_period
);

Perfect type safety. Zero storage waste.


Downsides: Tenant-By-Schema Limitations

1. Schema Limit

PostgreSQL podržava ~9900 schema per database. Ako imate 10,000+ tenanta, treba vam database-per-tenant pattern (PMFA podržava).

2. Cross-Tenant Reporting

Analytics queries preko svih tenant-a su kompleksni:

-- Koliko ukupno unpaid invoices ima?
SELECT COUNT(*) FROM (
  SELECT COUNT(*) FROM tenant_a.invoices WHERE status = 'unpaid'
  UNION ALL
  SELECT COUNT(*) FROM tenant_b.invoices WHERE status = 'unpaid'
  -- ... x 500 tenants
);

PMFA rešenje: Separate analytics database sa tenant_id (ETL pipeline, read replika).


Zaključak

Tenant-by-row je anti-pattern za multi-tenant SaaS:

  • Spor performance zbog shared indeksa
  • Security rizik (jedan bug = sve tenant podatke leaks)
  • Nema per-tenant optimizacija

Tenant-by-schema (PMFA pristup):

  • ✅ 10x-100x brži queries
  • ✅ Perfect isolation
  • ✅ Custom fields per tenant
  • ✅ Selective backups/restores

PMFA automatski generiše sve iz YAML metadata:

  • Schema creation
  • Table generation
  • Index setup
  • Query routing

Metadata-driven multi-tenancy = future of SaaS.


Zainteresovani za PMFA? Kontaktirajte nas na office@nonnotech.com za early access program.