Overview
Built a real-time growth analytics platform that consolidates revenue, churn, and engagement metrics across 10+ products into a single dashboard. The system processes 2M+ data points daily and delivers actionable insights with sub-second dashboard loads.
Challenge
Growth data was scattered across multiple tools:
- Stripe for payment metrics
- Mixpanel for product analytics
- Google Analytics for traffic
- Custom databases for product-specific KPIs
Each product team had its own reporting process, making cross-product analysis impossible. Monthly board reports took 2+ days to compile manually.
When a critical churn spike happened in one product, it took 3 days to notice because the data was buried in Stripe's dashboard — by then, 47 customers had already cancelled.
Solution
Data Ingestion Pipeline
# Unified event schema for all data sources
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from google.cloud import bigquery

client = bigquery.Client()

@dataclass
class GrowthEvent:
    product_id: str
    event_type: str  # revenue, churn, signup, activation
    value: Decimal
    currency: str
    timestamp: datetime
    metadata: dict

# BigQuery streaming insert for real-time ingestion
def ingest_events(events: list[GrowthEvent]):
    rows = [event.to_bq_row() for event in events]
    client.insert_rows_json(
        f"{PROJECT}.growth.events",
        rows,
        row_ids=[e.idempotency_key for e in events],
    )
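Each source adapter maps its native payload onto this shared schema before calling ingest_events. As a rough sketch, assuming a hypothetical map_stripe_invoice adapter and a new_invoices batch (neither is the production code), a Stripe payment might be ingested like this:
# Illustrative adapter: turn a Stripe invoice payment into a GrowthEvent
# (payload fields follow Stripe's invoice object; the product mapping is assumed)
from datetime import datetime, timezone
from decimal import Decimal

def map_stripe_invoice(payload: dict) -> GrowthEvent:
    return GrowthEvent(
        product_id=payload["metadata"]["product_id"],
        event_type="revenue",
        value=Decimal(payload["amount_paid"]) / 100,  # Stripe amounts are in cents
        currency=payload["currency"],
        timestamp=datetime.fromtimestamp(payload["created"], tz=timezone.utc),
        metadata={"invoice_id": payload["id"]},
    )

ingest_events([map_stripe_invoice(inv) for inv in new_invoices])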
Real-Time Aggregation
Pre-computed materialized views keep dashboard queries fast:
-- Materialized view refreshed every 60 seconds
CREATE MATERIALIZED VIEW growth_daily AS
SELECT
    product_id,
    DATE(timestamp) AS date,
    SUM(CASE WHEN event_type = 'revenue' THEN value END) AS revenue,
    COUNT(CASE WHEN event_type = 'churn' THEN 1 END) AS churned,
    COUNT(CASE WHEN event_type = 'signup' THEN 1 END) AS signups,
    ROUND(
        CAST(COUNT(CASE WHEN event_type = 'churn' THEN 1 END) AS NUMERIC) /
        NULLIF(LAG(COUNT(*)) OVER (PARTITION BY product_id ORDER BY DATE(timestamp)), 0) * 100,
        2
    ) AS churn_rate
FROM growth_events
GROUP BY product_id, DATE(timestamp);
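The dashboard API then reads from the pre-aggregated view rather than raw events. A minimal sketch of that read path through the BigQuery Python client follows; the dataset path and the fetch_daily_metrics helper name are assumptions for illustration, not the original code:
# Dashboard read-path sketch: last 30 days of pre-aggregated metrics for one product
from google.cloud import bigquery

client = bigquery.Client()

def fetch_daily_metrics(product_id: str) -> list[dict]:
    query = f"""
        SELECT date, revenue, churned, signups, churn_rate
        FROM `{PROJECT}.growth.growth_daily`
        WHERE product_id = @product_id
          AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
        ORDER BY date
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("product_id", "STRING", product_id)
            ]
        ),
    )
    return [dict(row) for row in job.result()]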
Alerting System
Anomaly detection triggers real-time alerts:
def check_anomaly(product_id: str, metric: str, current: float):
    # Compare against the 30-day rolling average
    baseline = get_rolling_average(product_id, metric, days=30)
    std_dev = get_rolling_stddev(product_id, metric, days=30)
    z_score = abs(current - baseline) / std_dev if std_dev > 0 else 0
    if z_score > 2.5:  # 2.5 standard deviations from baseline
        send_alert(
            product_id=product_id,
            metric=metric,
            current=current,
            baseline=baseline,
            severity="high" if z_score > 3.5 else "medium",
        )
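The rolling-baseline helpers aren't shown above. One way they could be backed by the growth_daily view is sketched below; the query shape, dataset path, and metric whitelist are assumptions rather than the original implementation:
# Assumed sketch of the rolling-statistics helpers, computed over growth_daily
from google.cloud import bigquery

client = bigquery.Client()

def _rolling_stat(product_id: str, metric: str, days: int, agg: str) -> float:
    # metric and agg are interpolated into SQL, so restrict them to known values
    assert agg in ("AVG", "STDDEV") and metric in ("revenue", "churned", "signups")
    query = f"""
        SELECT {agg}({metric}) AS stat
        FROM `{PROJECT}.growth.growth_daily`
        WHERE product_id = @product_id
          AND date >= DATE_SUB(CURRENT_DATE(), INTERVAL @days DAY)
    """
    job = client.query(
        query,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("product_id", "STRING", product_id),
                bigquery.ScalarQueryParameter("days", "INT64", days),
            ]
        ),
    )
    row = next(iter(job.result()))  # aggregate query always returns exactly one row
    return float(row["stat"] or 0.0)

def get_rolling_average(product_id: str, metric: str, days: int = 30) -> float:
    return _rolling_stat(product_id, metric, days, "AVG")

def get_rolling_stddev(product_id: str, metric: str, days: int = 30) -> float:
    return _rolling_stat(product_id, metric, days, "STDDEV")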
Technical Decisions
Why BigQuery: The analytics queries involve scanning millions of rows with complex aggregations. BigQuery's columnar storage and auto-scaling compute handle this without infrastructure management.
Why D3.js: Off-the-shelf charting libraries couldn't handle the custom visualizations needed — cohort heatmaps, funnel flows with branching, and interactive revenue waterfalls. D3 provided the flexibility.
Why WebSockets: Dashboard users expect real-time updates. Polling would create unnecessary load on the API. WebSocket connections push updates only when new data arrives.
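The write-up doesn't name the server framework, so as a hedged sketch only, the push side could look like this with FastAPI's WebSocket support: clients subscribe once, and the aggregation path broadcasts whenever fresh metrics land.
# Illustrative WebSocket push: hold open connections and broadcast new metrics
# (FastAPI is an assumption; the original doesn't specify the server framework)
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
connections: set[WebSocket] = set()

@app.websocket("/ws/metrics")
async def metrics_ws(ws: WebSocket):
    await ws.accept()
    connections.add(ws)
    try:
        while True:
            await ws.receive_text()  # keep the socket open; client messages are ignored
    except WebSocketDisconnect:
        connections.discard(ws)

async def broadcast_update(payload: dict):
    # Called from the ingestion/aggregation path only when new data arrives
    for ws in list(connections):
        try:
            await ws.send_json(payload)
        except Exception:
            connections.discard(ws)  # drop dead connections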
Results
- 10+ products tracked from a single dashboard
- 2M+ data points processed daily
- Sub-second dashboard loads via pre-computed aggregations
- Alert latency under 30 seconds for anomaly detection
- Board reports generated automatically, saving 2+ days monthly