Overview
Designed and built a custom payment orchestration layer that processes $50M+ annually across multiple payment providers. The system handles intelligent routing, automatic retry logic, and real-time reconciliation with a 99.99% success rate.
Challenge
The existing payment integration was a direct Stripe integration that:
- Failed silently on network timeouts, leaving orders in limbo
- Had no retry logic, so failed charges required manual intervention
- Couldn't route between providers for cost optimization
- Lacked reconciliation, so monthly accounting took days of manual work
Analysis of 6 months of payment data revealed that 3.2% of initial charge attempts failed but were recoverable. At $50M annual volume, that's $1.6M in potentially lost revenue.
Solution
Idempotent Payment Pipeline
Every payment operation is idempotent and observable:
// Payment operations are idempotent via idempotency keys
async function processPayment(intent: PaymentIntent): Promise<PaymentResult> {
  const idempotencyKey = `pay_${intent.orderId}_${intent.attempt}`;
  // Check for existing result
  const existing = await db.paymentResult.findUnique({
    where: { idempotencyKey },
  });
  if (existing) return existing;
  // Process with timeout and circuit breaker
  const result = await circuitBreaker.fire(async () => {
    return stripe.paymentIntents.create(
      {
        amount: intent.amount,
        currency: intent.currency,
        customer: intent.customerId,
        metadata: { orderId: intent.orderId },
      },
      { idempotencyKey }
    );
  });
  // Persist result atomically
  return db.paymentResult.create({
    data: { idempotencyKey, ...mapResult(result) },
  });
}
Intelligent Retry Strategy
Not all failures are equal. The retry engine classifies failures and applies appropriate strategies:
const RETRY_STRATEGIES: Record<FailureType, RetryConfig> = {
  network_timeout: {
    maxAttempts: 3,
    backoff: "exponential",
    baseDelay: 1000,
  },
  rate_limited: {
    maxAttempts: 5,
    backoff: "exponential",
    baseDelay: 5000,
  },
  card_declined: {
    maxAttempts: 1, // Don't retry hard declines
    backoff: "none",
    baseDelay: 0,
  },
  insufficient_funds: {
    maxAttempts: 3,
    backoff: "fixed",
    baseDelay: 86400000, // Retry daily
  },
};
Real-Time Reconciliation
Every transaction is reconciled against provider records within 5 minutes:
- Payment processed → event emitted
- Reconciliation worker picks up event
- Fetches provider-side record via API
- Compares amounts, fees, and status
- Flags discrepancies for review
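The comparison step in the list above could look roughly like the following. This is a minimal sketch, not the production worker: the PaymentRecord and Discrepancy shapes and the findDiscrepancies name are assumptions made for illustration, and amounts are assumed to be integers in minor units (for example, cents) so that equality checks are exact.

```typescript
// Hypothetical record shape; amount and fee are integers in minor units (e.g. cents).
interface PaymentRecord {
  paymentId: string;
  amount: number;
  fee: number;
  status: "succeeded" | "failed" | "pending";
}

// One entry per field that differs between our record and the provider's.
interface Discrepancy {
  paymentId: string;
  field: "amount" | "fee" | "status";
  internal: string | number;
  provider: string | number;
}

// Compare the internal record with the provider-side record field by field.
// An empty result means the two records agree on amount, fee, and status.
function findDiscrepancies(
  internal: PaymentRecord,
  provider: PaymentRecord
): Discrepancy[] {
  const fields = ["amount", "fee", "status"] as const;
  const discrepancies: Discrepancy[] = [];
  for (const field of fields) {
    if (internal[field] !== provider[field]) {
      discrepancies.push({
        paymentId: internal.paymentId,
        field,
        internal: internal[field],
        provider: provider[field],
      });
    }
  }
  return discrepancies;
}
```

In a setup like this, a non-empty result would be what gets flagged for review; fetching the provider record and emitting the event are left out of the sketch.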
Technical Decisions
Why TypeScript: Payment code demands type safety. Runtime type errors in payment processing can mean lost revenue or double charges. TypeScript catches many of these at compile time.
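As an illustration of the compile-time checks, consider the retry table. The sketch below assumes definitions for FailureType and RetryConfig (the case study names these types but does not show them) and adds a hypothetical nextDelayMs helper. Two things fail to compile if the types drift: a failure type with no entry in the table, and a backoff kind the switch does not handle.

```typescript
// Assumed definitions; the case study names these types without showing them.
type FailureType =
  | "network_timeout"
  | "rate_limited"
  | "card_declined"
  | "insufficient_funds";
type Backoff = "exponential" | "fixed" | "none";

interface RetryConfig {
  maxAttempts: number; // total attempts, including the first one
  backoff: Backoff;
  baseDelay: number; // milliseconds
}

// Record<FailureType, ...> makes a missing failure type a compile-time error.
const strategies: Record<FailureType, RetryConfig> = {
  network_timeout: { maxAttempts: 3, backoff: "exponential", baseDelay: 1000 },
  rate_limited: { maxAttempts: 5, backoff: "exponential", baseDelay: 5000 },
  card_declined: { maxAttempts: 1, backoff: "none", baseDelay: 0 },
  insufficient_funds: { maxAttempts: 3, backoff: "fixed", baseDelay: 86400000 },
};

// Hypothetical helper: delay before the next attempt, given how many attempts
// have already been made. Returns null when no further attempt is allowed.
function nextDelayMs(failure: FailureType, attemptsMade: number): number | null {
  const cfg = strategies[failure];
  if (attemptsMade >= cfg.maxAttempts) return null;
  switch (cfg.backoff) {
    case "none":
      return null;
    case "fixed":
      return cfg.baseDelay;
    case "exponential":
      // Doubles with each attempt already made: base, 2x base, 4x base, ...
      return cfg.baseDelay * 2 ** (attemptsMade - 1);
    default: {
      // If a new Backoff kind is added, this assignment stops compiling.
      const unreachable: never = cfg.backoff;
      throw new Error("Unhandled backoff: " + String(unreachable));
    }
  }
}
```

The doubling factor and the reading of maxAttempts as a total count are choices made for this sketch; the original retry engine may interpret them differently.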
Why Circuit Breaker: When Stripe has a partial outage, continuing to send requests makes the problem worse. The circuit breaker pattern prevents cascading failures and gives the application fast feedback: while the circuit is open, calls fail immediately instead of waiting on a timeout.
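To make the pattern concrete, here is a minimal circuit breaker sketch. It is not the breaker used in the project (the case study only shows a call to circuitBreaker.fire); the class name, the threshold and timeout parameters, and the injectable clock are all assumptions for illustration.

```typescript
type BreakerState = "closed" | "open" | "half_open";

// Minimal sketch: opens after N consecutive failures, then lets a probe
// request through once resetTimeoutMs has elapsed.
class SimpleCircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    resetTimeoutMs: number,
    now: () => number = () => Date.now() // injectable clock, handy for tests
  ) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Returns false while the circuit is open and the reset timeout has not elapsed.
  allowRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half_open"; // allow a probe to test whether the provider recovered
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    // A failed probe reopens immediately; otherwise open at the threshold.
    if (this.state === "half_open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  // Wraps an async call: rejects immediately when open, otherwise records the outcome.
  async fire<T>(action: () => Promise<T>): Promise<T> {
    if (!this.allowRequest()) {
      throw new Error("Circuit open: request rejected without calling the provider");
    }
    try {
      const result = await action();
      this.recordSuccess();
      return result;
    } catch (err) {
      this.recordFailure();
      throw err;
    }
  }
}
```

This sketch does not limit how many probes run concurrently in the half-open state, and it has no per-call timeout; a production breaker would usually add both.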
Why PostgreSQL: ACID transactions are non-negotiable for payment state management. The idempotency key pattern relies on unique constraints and atomic inserts.
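The unique-constraint idea can be sketched as follows. The SQL string shows one way a PostgreSQL-backed version might claim a key atomically; the table and column names are hypothetical. The InMemoryKeyStore class is only a stand-in that mimics the constraint's behavior so the example runs without a database: the first caller to claim a key wins, and every later caller sees that the key is taken.

```typescript
// One possible PostgreSQL statement (hypothetical table and column names).
// With ON CONFLICT DO NOTHING, RETURNING yields a row only if the insert happened,
// so "no row returned" means another request already claimed this key.
const CLAIM_KEY_SQL = `
  INSERT INTO payment_results (idempotency_key, status)
  VALUES ($1, 'pending')
  ON CONFLICT (idempotency_key) DO NOTHING
  RETURNING idempotency_key`;

// In-memory stand-in for the unique constraint, for illustration only.
class InMemoryKeyStore {
  private readonly claimed = new Set<string>();

  // Returns true if this call claimed the key, false if it was already claimed.
  claim(idempotencyKey: string): boolean {
    if (this.claimed.has(idempotencyKey)) return false;
    this.claimed.add(idempotencyKey);
    return true;
  }
}

// Hypothetical usage: only the caller that wins the claim goes on to charge.
function shouldProcess(store: InMemoryKeyStore, orderId: string, attempt: number): boolean {
  const key = `pay_${orderId}_${attempt}`; // same key format as the pipeline
  return store.claim(key);
}
```

In the database version the atomicity comes from the unique index itself rather than from application code, which is why the check and the insert have to be a single statement.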
Results
- $50M+ processed annually through the orchestration layer
- 99.99% success rate (up from 96.8% with the direct integration)
- 12% recovery rate on initially failed payments
- 180ms average latency for the full payment pipeline
- Zero reconciliation discrepancies in 12 months of operation