Multi-Tenant SaaS Platform

Overview

Designed and built a multi-tenant SaaS platform from the ground up as the founding engineer. The platform scaled from first customer to $800K MRR in 18 months, serving 2,000+ business tenants with 99.9% uptime.

Challenge

The founding team had product-market fit validated through a prototype but needed a production-grade platform that could:

Scale tenant isolation without proportional infrastructure cost increases
Handle unpredictable growth — some tenants would 10x their usage overnight
Maintain sub-200ms API response times across all tenants
Support rapid feature iteration without risking platform stability

Warning

The prototype was a single-tenant Rails app. Everything needed to be rebuilt for multi-tenancy — auth, data isolation, billing, and deployment.

Solution

Multi-Tenancy Architecture

Chose a hybrid isolation model: shared application layer with schema-level database isolation.

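// Tenant context middleware (Express-style signature: req, res, next)
async function tenantMiddleware(req: Request, res: Response, next: NextFunction) {
  try {
    const tenantId = extractTenantId(req);
    const tenant = await tenantCache.get(tenantId);

    if (!tenant) throw new TenantNotFoundError(tenantId);

    // Set PostgreSQL schema for this request.
    // search_path is a per-connection setting, so this statement must run on
    // the same connection (or transaction) that serves the rest of the request.
    await db.raw(`SET search_path TO tenant_${tenant.schemaId}, public`);

    req.tenant = tenant;
    return next();
  } catch (err) {
    // Forward async failures to the framework's error handler
    return next(err);
  }
}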
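
Because the schema name is interpolated into SQL, it is worth validating before use. The sketch below is one possible way to do that, not the platform's actual code: `tenantSchemaName`, `searchPathStatement`, and the allowed-character pattern are hypothetical. It also uses `SET LOCAL`, which PostgreSQL scopes to the current transaction, on the assumption that each request runs inside one.

```typescript
// Hypothetical helpers; names and limits are illustrative assumptions.

// Allow only lowercase letters, digits, and underscores so the value cannot
// break out of the identifier position. 56 characters plus the "tenant_"
// prefix stays within PostgreSQL's 63-byte identifier limit.
const SCHEMA_ID_PATTERN = /^[a-z0-9_]{1,56}$/;

function tenantSchemaName(schemaId: string): string {
  if (!SCHEMA_ID_PATTERN.test(schemaId)) {
    throw new Error(`Invalid tenant schema id: ${JSON.stringify(schemaId)}`);
  }
  return `tenant_${schemaId}`;
}

// SET LOCAL lasts only until the current transaction ends, so the setting
// does not carry over to another request that reuses the pooled connection.
function searchPathStatement(schemaId: string): string {
  return `SET LOCAL search_path TO "${tenantSchemaName(schemaId)}", public`;
}
```

Under these assumptions, the statement would be issued as the first query of the transaction that serves the request.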

Infrastructure as Code

Every environment is reproducible via Terraform:

# Shared Aurora cluster; Serverless v2 scales capacity with aggregate tenant load
resource "aws_rds_cluster" "main" {
  cluster_identifier = "saas-${var.environment}"
  engine             = "aurora-postgresql"
  engine_mode        = "provisioned"
 
  serverlessv2_scaling_configuration {
    min_capacity = 0.5
    max_capacity = 64
  }
}

Deployment Pipeline

Zero-downtime deployments with automated canary releases:

Build and test in CI
Deploy to canary (5% of traffic)
Monitor error rates and latency for 10 minutes
Auto-promote or auto-rollback based on metrics
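
The promote-or-rollback step above can be sketched as a pure decision function. This is a minimal illustration under stated assumptions, not the pipeline's real logic: the `CanarySample` shape, the 1-percentage-point error allowance, and the nearest-rank percentile method are all hypothetical. The 200 ms default simply mirrors the latency target named in this write-up.

```typescript
// Hypothetical types and thresholds, for illustration only.
interface CanarySample {
  requests: number;
  errors: number;
  latenciesMs: number[];
}

type CanaryDecision = "promote" | "rollback";

// Nearest-rank percentile: the smallest sample with at least p% of
// samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function decideCanary(
  canary: CanarySample,
  baseline: CanarySample,
  maxErrorRateIncrease = 0.01, // assumed allowance over the baseline error rate
  maxP95Ms = 200, // assumed to match the sub-200ms target
): CanaryDecision {
  // With no canary traffic there is no evidence, so stay conservative.
  if (canary.requests === 0 || canary.latenciesMs.length === 0) return "rollback";

  const canaryErrorRate = canary.errors / canary.requests;
  const baselineErrorRate =
    baseline.requests === 0 ? 0 : baseline.errors / baseline.requests;

  if (canaryErrorRate - baselineErrorRate > maxErrorRateIncrease) return "rollback";
  if (percentile(canary.latenciesMs, 95) > maxP95Ms) return "rollback";
  return "promote";
}
```

Comparing against a baseline rather than a fixed error ceiling keeps a noisy day on the stable fleet from blocking every release.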

Deployment Frequency

An average of 15 production deployments per week.

Technical Decisions

Why Next.js: Server components let us render tenant-specific dashboards without shipping tenant data to the client. The App Router's layout system naturally maps to multi-tenant UI patterns.

Why Aurora PostgreSQL: Serverless v2 scaling handles tenant usage spikes without pre-provisioning. Schema-per-tenant isolation provides strong data boundaries without the operational overhead of database-per-tenant.

Why Terraform: With 2,000+ tenants, managing infrastructure by hand is not practical. Everything from DNS to database schemas is codified and version-controlled.

Results

$800K MRR reached in 18 months from first customer
99.9% uptime since launch (3 incidents in 18 months, all < 15 min)
2,000+ tenants on shared infrastructure
Sub-200ms p95 API response time across all endpoints
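
As a sanity check on the uptime figure, the arithmetic can be worked through directly. The sketch below assumes an average month of 365/12 days and takes the stated upper bound of three incidents at 15 minutes each. The function names are hypothetical.

```typescript
const MINUTES_PER_DAY = 24 * 60;
// Assumption: an average month is 365 / 12 days.
const DAYS_PER_MONTH = 365 / 12;

function totalMinutes(months: number): number {
  return months * DAYS_PER_MONTH * MINUTES_PER_DAY;
}

// Downtime allowed over the period for a given availability target.
function downtimeBudgetMinutes(months: number, targetAvailability: number): number {
  return totalMinutes(months) * (1 - targetAvailability);
}

// Availability achieved over the period for a given amount of downtime.
function availability(months: number, downtimeMinutes: number): number {
  return 1 - downtimeMinutes / totalMinutes(months);
}

// A 99.9% target over 18 months allows about 788 minutes (roughly 13 hours).
const budget = downtimeBudgetMinutes(18, 0.999);
// Three incidents of at most 15 minutes each is at most 45 minutes of downtime.
const worstCase = availability(18, 3 * 15);

console.log(`budget: ${budget.toFixed(0)} min, availability: ${(worstCase * 100).toFixed(3)}%`);
```

Under these assumptions, at most 45 minutes of downtime is well inside the 99.9% budget and works out to roughly 99.994% availability.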

Tech Stack

Next.js, Node.js, PostgreSQL, AWS, Redis, Terraform