Overview
Designed and built a multi-tenant SaaS platform from the ground up as the founding engineer. The platform scaled from first customer to $800K MRR in 18 months, serving 2,000+ business tenants with 99.9% uptime.
Challenge
The founding team had product-market fit validated through a prototype but needed a production-grade platform that could:
- Scale tenant isolation without proportional infrastructure cost increases
- Handle unpredictable growth — some tenants would 10x their usage overnight
- Maintain sub-200ms API response times across all tenants
- Support rapid feature iteration without risking platform stability
The prototype was a single-tenant Rails app. Everything needed to be rebuilt for multi-tenancy — auth, data isolation, billing, and deployment.
Solution
Multi-Tenancy Architecture
Chose a hybrid isolation model: shared application layer with schema-level database isolation.
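// Tenant context middleware (Express-style signature, inferred from NextFunction)
async function tenantMiddleware(req: Request, res: Response, next: NextFunction) {
  try {
    const tenantId = extractTenantId(req);
    const tenant = await tenantCache.get(tenantId);
    if (!tenant) throw new TenantNotFoundError(tenantId);
    // Set the PostgreSQL schema for this request. search_path is per connection, so with
    // a pool this must run on the same connection as the request's queries.
    await db.raw(`SET search_path TO tenant_${tenant.schemaId}, public`);
    req.tenant = tenant;
    return next();
  } catch (err) {
    // Forward failures to the error handler instead of leaving a rejected promise
    return next(err);
  }
}
Infrastructure as Code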
Every environment is reproducible via Terraform:
# Shared Aurora cluster: Serverless v2 capacity (in ACUs) scales with aggregate tenant load
resource "aws_rds_cluster" "main" {
  cluster_identifier = "saas-${var.environment}"
  engine             = "aurora-postgresql"
  engine_mode        = "provisioned"

  serverlessv2_scaling_configuration {
    min_capacity = 0.5
    max_capacity = 64
  }
}
Deployment Pipeline
Zero-downtime deployments with automated canary releases:
- Build and test in CI
- Deploy to canary (5% of traffic)
- Monitor error rates and latency for 10 minutes
- Auto-promote or auto-rollback based on metrics
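
The promote-or-rollback step above can be sketched as a pure decision function. This is a minimal illustration, not the pipeline's actual logic: the `WindowMetrics` and `CanaryPolicy` types, the `decideCanary` name, and every threshold are assumptions made for the example.

```typescript
// Hypothetical types and thresholds; none of these names come from the original pipeline.
interface WindowMetrics {
  requests: number;     // requests observed during the monitoring window
  errors: number;       // failed requests in the same window
  p95LatencyMs: number; // 95th percentile latency for the window
}

interface CanaryPolicy {
  maxErrorRateDelta: number; // allowed absolute increase in error rate over baseline
  maxP95LatencyMs: number;   // latency ceiling for the canary
  minRequests: number;       // below this, the sample is too small to trust
}

type CanaryDecision = "promote" | "rollback";

function errorRate(m: WindowMetrics): number {
  return m.requests === 0 ? 0 : m.errors / m.requests;
}

function decideCanary(
  baseline: WindowMetrics,
  canary: WindowMetrics,
  policy: CanaryPolicy,
): CanaryDecision {
  // Too little canary traffic: fail safe rather than promote on thin evidence.
  if (canary.requests < policy.minRequests) return "rollback";
  // Compare against the baseline so a platform-wide blip does not mask a regression.
  if (errorRate(canary) - errorRate(baseline) > policy.maxErrorRateDelta) return "rollback";
  if (canary.p95LatencyMs > policy.maxP95LatencyMs) return "rollback";
  return "promote";
}
```

Keeping the decision free of I/O means it can be unit-tested against recorded metric windows. Fetching the metrics and shifting traffic would live in the deployment tooling around it.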
Technical Decisions
Why Next.js: Server components let us render tenant-specific dashboards without shipping tenant data to the client. The App Router's layout system naturally maps to multi-tenant UI patterns.
Why Aurora PostgreSQL: Serverless v2 scaling handles tenant usage spikes without pre-provisioning. Schema-per-tenant isolation provides strong data boundaries without the operational overhead of database-per-tenant.
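
One way to keep schema-per-tenant boundaries intact under connection pooling is to scope `search_path` to a transaction with `SET LOCAL` and to validate the schema identifier before interpolating it. The sketch below assumes a minimal `SqlClient` interface and an identifier pattern chosen for the example. `withTenantSchema` and `tenantSchemaName` are hypothetical helpers, not the platform's code.

```typescript
// Minimal client shape assumed for the sketch; a real driver would be wrapped to fit it.
interface SqlClient {
  query(sql: string): Promise<unknown>;
}

// Assumed format for schema ids. "tenant_" (7 chars) + 56 stays within
// PostgreSQL's 63-byte identifier limit.
const SCHEMA_ID_PATTERN = /^[a-z0-9_]{1,56}$/;

function tenantSchemaName(schemaId: string): string {
  if (!SCHEMA_ID_PATTERN.test(schemaId)) {
    throw new Error(`Invalid tenant schema id: ${JSON.stringify(schemaId)}`);
  }
  return `tenant_${schemaId}`;
}

async function withTenantSchema<T>(
  client: SqlClient,
  schemaId: string,
  work: (client: SqlClient) => Promise<T>,
): Promise<T> {
  const schema = tenantSchemaName(schemaId); // validate before opening a transaction
  await client.query("BEGIN");
  try {
    // SET LOCAL lasts only until the transaction ends, so a pooled connection
    // does not carry one tenant's search_path into the next request.
    await client.query(`SET LOCAL search_path TO "${schema}", public`);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

A request handler would run all of its queries inside the `work` callback, so every statement executes on the connection that holds the tenant's `search_path`.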
Why Terraform: With 2,000+ tenants, manual infrastructure management is impractical. Everything from DNS to database schemas is codified and version-controlled.
Results
- $800K MRR reached in 18 months from first customer
- 99.9% uptime since launch (3 incidents in 18 months, all < 15 min)
- 2,000+ tenants on shared infrastructure
- Sub-200ms p95 API response time across all endpoints
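
As a sanity check on the uptime figure, the arithmetic can be worked through directly. The sketch approximates 18 months as 540 days and treats each incident as lasting the full 15 minutes, which gives an upper bound on downtime. Both are assumptions, since the exact durations are not listed here.

```typescript
const MINUTES_PER_DAY = 24 * 60;

// Downtime allowed over a period for a given uptime target (e.g. 0.999).
function downtimeBudgetMinutes(days: number, uptimeTarget: number): number {
  return days * MINUTES_PER_DAY * (1 - uptimeTarget);
}

// Uptime fraction implied by a given amount of downtime over a period.
function uptimeFraction(days: number, downtimeMinutes: number): number {
  return 1 - downtimeMinutes / (days * MINUTES_PER_DAY);
}

// Assumption: 18 months approximated as 18 * 30 = 540 days.
const days = 18 * 30;
const budget = downtimeBudgetMinutes(days, 0.999); // about 777.6 minutes, roughly 13 hours
const worstCase = 3 * 15;                          // upper bound: 3 incidents, each under 15 minutes
const impliedUptime = uptimeFraction(days, worstCase);

console.log(`99.9% budget: ${budget.toFixed(1)} min`);
console.log(`Worst-case downtime: ${worstCase} min`);
console.log(`Implied uptime: ${(impliedUptime * 100).toFixed(3)}%`);
```

Under these assumptions the three incidents use at most 45 of roughly 778 budgeted minutes, so the 99.9% figure is a conservative statement of the record.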