Built for scale: the architecture behind ParkAttach

The ParkAttach quote endpoint is sized for the biggest OTAs on the planet — Booking.com and Expedia-scale integrations from day one. This post walks through the headline numbers, what they actually mean, and the architectural choices that produce them.

The aim isn’t to brag. It’s to make the technical contract concrete so partner engineering teams reviewing the platform know exactly what they’re signing up to.

The numbers

50,000 quote requests per minute is the planning headroom — roughly 833 queries per second, the design target that every architectural decision is measured against. Per-event payload is ~2 KB for a nested quote response with car-park and nearby-supply sub-arrays. Reserve traffic is sized at ~10% of quote, commit at ~5%.
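The arithmetic behind those figures can be checked with a short sketch. The constants come from the paragraph above; the function name `sizing` and the bandwidth line are illustrative additions, and ~2 KB is treated as exactly 2,048 bytes for simplicity.

```typescript
// Back-of-envelope sizing from the stated planning numbers.
const QUOTES_PER_MINUTE = 50_000;
const PAYLOAD_BYTES = 2 * 1024; // assumption: "~2 KB" taken as 2,048 bytes
const RESERVE_RATIO = 0.1;      // reserve traffic as a share of quote traffic
const COMMIT_RATIO = 0.05;      // commit traffic as a share of quote traffic

function sizing(quotesPerMinute: number) {
  const quoteQps = quotesPerMinute / 60;
  return {
    quoteQps,
    reserveQps: quoteQps * RESERVE_RATIO,
    commitQps: quoteQps * COMMIT_RATIO,
    // Quote response payload only, in MiB per second.
    quoteMiBps: (quoteQps * PAYLOAD_BYTES) / (1024 * 1024),
  };
}

const planned = sizing(QUOTES_PER_MINUTE);
console.log(
  Math.round(planned.quoteQps),
  Math.round(planned.reserveQps),
  Math.round(planned.commitQps),
  planned.quoteMiBps.toFixed(2),
);
```

At the design target this works out to about 833 quote, 83 reserve, and 42 commit requests per second, and roughly 1.6 MiB/s of quote response payload.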

5,000 RPS measured ceiling per single replica, with p99 latency of 36 ms against preprod (warm cache, 150 concurrent connections). At that point the per-quote CPU work dominates — quote-event assembly, JSON serialisation, KafkaJS produce — and adding concurrency raises latency without raising throughput.

Sub-2 ms wall-clock on the warm path, with all three cache layers populated. The path is two Redis round trips and one Postgres capacity query; the second Redis round trip and the Postgres query run in parallel.

Horizontal scaling beyond that. Past the per-replica ceiling, we scale out behind a load balancer. Raising the single-replica ceiling would mean profiling the assembly path, and that’s not a high-priority lever today: horizontal scaling is cheap, and 5,000 RPS is already well above any single OTA’s realistic peak.
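As a rough sanity check on these numbers, the sketch below computes how many replicas a given load would need and applies Little's law to the benchmark figures. The `maxUtilisation` safety margin is a hypothetical parameter, not something the platform is stated to use.

```typescript
const PER_REPLICA_CEILING_RPS = 5_000; // measured ceiling quoted above

// Replicas needed to serve `targetRps` while keeping each replica at or below
// `maxUtilisation` of its ceiling (hypothetical margin, default 50%).
function replicasNeeded(
  targetRps: number,
  ceilingRps: number = PER_REPLICA_CEILING_RPS,
  maxUtilisation = 0.5,
): number {
  return Math.max(1, Math.ceil(targetRps / (ceilingRps * maxUtilisation)));
}

// Little's law: mean latency = in-flight requests / throughput.
function meanLatencyMs(concurrency: number, throughputRps: number): number {
  return (concurrency * 1000) / throughputRps;
}

console.log(replicasNeeded(50_000 / 60)); // design target of ~833 QPS
console.log(replicasNeeded(20_000));      // a hypothetical much larger load
console.log(meanLatencyMs(150, 5_000));   // benchmark conditions
```

Under these assumptions one replica covers the 833 QPS design target, and 150 in-flight requests at 5,000 RPS imply a mean latency of about 30 ms, which sits plausibly below the measured 36 ms p99.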

Three cache layers, decaying independently

The hot path is read-heavy and bursty. The cache layout is what makes the numbers above possible. Three layers, each scoped to the right shape of data:

Layer 1: venue cache

A Redis hash keyed by venue GUID — or aliased per OTA’s venue_proxy row, so any OTA’s first lookup warms the cache for every other OTA’s subsequent lookup regardless of which key form they use. The blob is venue + nearby car parks (local_parking) + car-park metadata. Stops at the car-park level deliberately — agreements and tariffs decay on different cadences and shouldn’t ride this cache’s TTL.

A channel-marker sub-field lets us short-circuit cross-OTA hot venues: if we know this venue has commercial coverage for the calling channel, we skip the per-operator iteration.
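To illustrate the aliasing idea, here is a minimal in-memory sketch in which a venue GUID and any number of per-OTA alias keys resolve to the same cached blob. It uses a `Map` as a stand-in for Redis, and the `venue-alias:` key shape, the class name, and the blob fields are all assumptions made for illustration; the post does not show the real key scheme.

```typescript
// In-memory stand-in for Redis, used only to illustrate the key layout.
type VenueBlob = { venueGuid: string; carParkGuids: string[] };

class VenueCacheSketch {
  private blobs = new Map<string, VenueBlob>(); // quote-tree:{guid} -> blob
  private aliases = new Map<string, string>();  // alias key -> venue GUID

  // Hypothetical alias key shape for an OTA's venue_proxy row.
  static aliasKey(otaSlug: string, proxyId: string): string {
    return `venue-alias:${otaSlug}:${proxyId}`;
  }

  // One warm-up stores the blob once and registers every alias that points at it.
  warm(blob: VenueBlob, aliasKeys: string[]): void {
    this.blobs.set(`quote-tree:${blob.venueGuid}`, blob);
    for (const key of aliasKeys) this.aliases.set(key, blob.venueGuid);
  }

  // Accepts either a venue GUID or an alias key and resolves to the same blob.
  get(venueRef: string): VenueBlob | undefined {
    const guid = this.aliases.get(venueRef) ?? venueRef;
    return this.blobs.get(`quote-tree:${guid}`);
  }
}

const venueCache = new VenueCacheSketch();
const aliasA = VenueCacheSketch.aliasKey("ota-a", "hotel-123");
const aliasB = VenueCacheSketch.aliasKey("ota-b", "H-9");
venueCache.warm({ venueGuid: "venue-1", carParkGuids: ["cp-1", "cp-2"] }, [aliasA, aliasB]);
console.log(venueCache.get(aliasB) === venueCache.get("venue-1"));
```

The point of the shape is that the blob is stored once; a lookup by either OTA's identifier or by GUID lands on the same entry.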

Layer 2: per-car-park cache

Tariffs, maintenance windows, and car-park details live under one Redis hash keyed by car-park GUID. Shared across every venue that uses the car park — a downtown garage that’s “nearby parking” for five hotels caches once and serves all five.

Layer 3: per-(operator, OTA) permission cache

An in-memory map keyed by (parking_operator_org_id, ota_organization_id), listing every active agreement between that pair. Channel filtering is a sub-microsecond in-memory predicate against this map.

This is the layer that lets the venue cache stop at car-park level. An operator with car parks at fifty venues caches its agreement-with-Expedia once in Layer 3, not duplicated across fifty venue blobs.

The map is sized by the number of (operator, OTA) pairs that have ever transacted, which is orders of magnitude smaller than venues × OTAs. It is held per process, so replicas can briefly disagree with each other, and we accept that drift. Each entry’s TTL is clamped to the soonest-expiring agreement within it.
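A minimal sketch of such a permission cache follows, showing the pair key, the in-memory channel predicate, and the TTL clamp. The `Agreement` field names, the `maxTtlMs` ceiling, and the class itself are assumptions for illustration rather than the platform's actual types.

```typescript
interface Agreement {
  channelSlug: string; // hypothetical field names
  activeFrom: number;  // epoch ms, inclusive
  activeUntil: number; // epoch ms, exclusive
}

interface PermissionEntry {
  agreements: Agreement[];
  expiresAt: number;
}

class PermissionCache {
  private entries = new Map<string, PermissionEntry>();
  private maxTtlMs: number;

  constructor(maxTtlMs: number) {
    this.maxTtlMs = maxTtlMs;
  }

  private static key(operatorOrgId: string, otaOrgId: string): string {
    return `${operatorOrgId}|${otaOrgId}`;
  }

  put(operatorOrgId: string, otaOrgId: string, agreements: Agreement[], now: number): void {
    // Clamp the TTL to the soonest-expiring agreement in the entry.
    const soonest = Math.min(Infinity, ...agreements.map((a) => a.activeUntil));
    const expiresAt = Math.min(now + this.maxTtlMs, soonest);
    this.entries.set(PermissionCache.key(operatorOrgId, otaOrgId), { agreements, expiresAt });
  }

  // true/false when the entry is fresh; undefined on a miss or an expired
  // entry, so the caller knows to reload from the source of truth.
  isPermitted(operatorOrgId: string, otaOrgId: string, channelSlug: string, now: number): boolean | undefined {
    const key = PermissionCache.key(operatorOrgId, otaOrgId);
    const entry = this.entries.get(key);
    if (!entry || now >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.agreements.some(
      (a) => a.channelSlug === channelSlug && a.activeFrom <= now && now < a.activeUntil,
    );
  }
}
```

Clamping the TTL this way means an entry can never outlive the earliest agreement it contains, so the predicate does not need a separate invalidation path for expiry.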

The hot path with all three caches warm

HMGET quote-tree:{venueRef} broad channel:{slug} — one Redis round trip. broad JSON parses to venue + car parks; channel:{slug} says AVAILABLE.
In-memory Layer 3 lookup for each unique operator at the venue. Filter agreements by channel and active period. Drop car parks whose operator has no matching agreement. Sub-microsecond.
In parallel: pipelined HMGET quote-tree:carpark:{guid} for each survivor (Redis, one round trip regardless of N), and the live capacity query (Postgres, one round trip). Different stores so they run naturally in parallel.

Two Redis round trips, one Postgres query, ~1–2 ms wall clock.
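The three steps above can be sketched as a single async function with the stores injected as dependencies. The interface and function names are hypothetical, and returning `null` when the channel marker is not AVAILABLE is an assumption standing in for whatever non-warm path the platform takes.

```typescript
interface CarParkRef { guid: string; operatorOrgId: string }
interface QuoteLine { guid: string; details: unknown; spaces: number | undefined }

interface HotPathDeps {
  // Redis round trip 1: venue blob plus channel marker (Layer 1).
  fetchVenue(venueRef: string, channelSlug: string): Promise<{ channelAvailable: boolean; carParks: CarParkRef[] }>;
  // In-memory Layer 3 predicate, no I/O.
  isPermitted(operatorOrgId: string, otaOrgId: string, channelSlug: string): boolean;
  // Redis round trip 2: pipelined per-car-park fetch (Layer 2).
  fetchCarParkDetails(guids: string[]): Promise<Map<string, unknown>>;
  // Live Postgres capacity query, never cached.
  fetchCapacity(guids: string[]): Promise<Map<string, number>>;
}

async function warmPathQuote(
  deps: HotPathDeps,
  venueRef: string,
  otaOrgId: string,
  channelSlug: string,
): Promise<QuoteLine[] | null> {
  const venue = await deps.fetchVenue(venueRef, channelSlug);
  if (!venue.channelAvailable) return null; // assumption: handled outside the warm path

  const survivors = venue.carParks.filter((cp) =>
    deps.isPermitted(cp.operatorOrgId, otaOrgId, channelSlug),
  );
  if (survivors.length === 0) return []; // empty list: no capacity query needed

  const guids = survivors.map((cp) => cp.guid);
  // Different stores, so the two fetches are issued together and awaited once.
  const [details, capacity] = await Promise.all([
    deps.fetchCarParkDetails(guids),
    deps.fetchCapacity(guids),
  ]);
  return survivors.map((cp) => ({
    guid: cp.guid,
    details: details.get(cp.guid),
    spaces: capacity.get(cp.guid),
  }));
}
```

Because the Layer 2 fetch and the capacity query are started before either is awaited, the wall-clock cost of the second stage is the slower of the two, not their sum.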

Capacity is never cached

There’s a deliberate exception in the design: the capacity overlap query against the GIST index on prebooking.parking_period is live, per-request, every time we return a non-empty car-park list. Capacity is the one data point that can’t tolerate any staleness, because quoting available space against held reservations the cache hasn’t seen yet would oversell. So it pays the round trip every time. It runs in parallel with the Layer 2 fetch, so the added latency is the slower of the two round trips rather than their sum.
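For readers unfamiliar with range-overlap queries, here is a hedged sketch of what such a capacity check could look like. Only prebooking.parking_period and the GIST index come from the post; the `car_park_id` column, the `tstzrange` type, the query text, and the helper names are assumptions. The in-memory `overlaps` function mirrors the semantics of PostgreSQL's `&&` operator for half-open ranges.

```typescript
// Hypothetical query text. A GIST index on a range column can serve the
// && (overlap) operator used here.
const CAPACITY_SQL = `
  SELECT car_park_id, count(*) AS held
  FROM prebooking
  WHERE car_park_id = ANY($1)
    AND parking_period && tstzrange($2, $3, '[)')
  GROUP BY car_park_id`;

interface Period { start: number; end: number } // half-open [start, end), epoch ms

// Two half-open ranges overlap when each starts before the other ends.
function overlaps(a: Period, b: Period): boolean {
  return a.start < b.end && b.start < a.end;
}

// Conservative estimate: every booking that overlaps the request counts
// against capacity, even if those bookings do not overlap each other.
function spacesRemaining(totalSpaces: number, bookings: Period[], request: Period): number {
  const held = bookings.filter((b) => overlaps(b, request)).length;
  return Math.max(0, totalSpaces - held);
}

console.log(CAPACITY_SQL.trim().split("\n").length, "line query");
console.log(spacesRemaining(3, [{ start: 0, end: 10 }, { start: 5, end: 15 }], { start: 8, end: 12 }));
```

Counting every overlapping booking can understate availability, but it errs on the side the post cares about: it never oversells.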

Kafka as a load-bearing dependency from day one

The API produces events directly to Kafka — quote, reserve, commit, cancel — with no Redis-Stream intermediate, no drain process. Schemas live in a shared event-contract package; the four topics are consumed by independent groups for urgency, look-to-book ratio, coverage analytics, and future BI workloads.

The producer-side discipline is strict:

Fire-and-forget on the hot path. events.emitQuote(event).catch(logger.warn) — never awaited on the response path.
Kafka outages do not fail quotes. The producer wrapper swallows broker errors after a warn-level log + metric. At-most-once delivery is acceptable for analytics.
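linger.ms ~5 ms batching collapses bursts into fewer broker round-trips. The reduction scales with event rate: a 5 ms window holds about 25 events at 5,000 events per second, so produce requests to the broker drop by roughly that factor at the per-replica ceiling.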
Backpressure ceiling. If the in-flight buffer fills (sustained broker outage), we drop the event with a warn — never grow the JS heap.
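The producer rules above can be sketched as a small wrapper. `EventSender` is a hypothetical stand-in for the real Kafka producer, and the class name, the `maxInFlight` limit, and the log messages are illustrative; batching is left out to keep the sketch short.

```typescript
interface EventSender {
  send(topic: string, payload: string): Promise<void>; // stand-in for a Kafka producer
}
interface WarnLog {
  warn(message: string): void;
}

class FireAndForgetEmitter {
  private inFlight = 0;
  dropped = 0;
  private sender: EventSender;
  private log: WarnLog;
  private maxInFlight: number;

  constructor(sender: EventSender, log: WarnLog, maxInFlight: number) {
    this.sender = sender;
    this.log = log;
    this.maxInFlight = maxInFlight;
  }

  // Returns immediately; the request handler never awaits the produce.
  emit(topic: string, event: unknown): void {
    if (this.inFlight >= this.maxInFlight) {
      // Backpressure ceiling: drop rather than buffer without bound.
      this.dropped++;
      this.log.warn(`dropping ${topic} event: in-flight buffer full`);
      return;
    }
    this.inFlight++;
    void this.produce(topic, event);
  }

  // Swallows every failure after a warn, so a broker outage cannot fail a quote.
  private async produce(topic: string, event: unknown): Promise<void> {
    try {
      await this.sender.send(topic, JSON.stringify(event));
    } catch (err) {
      this.log.warn(`kafka produce failed: ${String(err)}`);
    } finally {
      this.inFlight--;
    }
  }
}
```

Bounding in-flight sends rather than queueing them is what keeps a sustained broker outage from turning into heap growth: once the ceiling is hit, new events are counted and discarded.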

A KAFKA_DISABLED=true env var swaps the producer for a no-op client in test environments without a broker. Plain JSON wire format, zod runtime validation on the consumer side — no Schema Registry, because the operational tax doesn’t earn its keep until there are multi-language consumers downstream and there aren’t yet.
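A sketch of both halves of that paragraph follows: selecting a no-op producer from the environment, and validating plain JSON on the consumer side. The post uses zod for validation; a hand-written check is swapped in here only to keep the example dependency-free. The `QuoteEventSketch` fields and all function names are hypothetical.

```typescript
interface QuoteEventSketch {
  venueRef: string;   // hypothetical fields, not the real event contract
  occurredAt: string;
}

interface QuoteProducer {
  emitQuote(event: QuoteEventSketch): Promise<void>;
}

// Accepts events and does nothing, for environments without a broker.
const noopProducer: QuoteProducer = { emitQuote: async () => {} };

function selectProducer(
  env: Record<string, string | undefined>,
  real: QuoteProducer,
): QuoteProducer {
  return env.KAFKA_DISABLED === "true" ? noopProducer : real;
}

// Consumer-side runtime validation of the JSON wire format.
// Returns undefined for anything that does not match the expected shape.
function parseQuoteEvent(raw: string): QuoteEventSketch | undefined {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return undefined;
  }
  if (typeof value !== "object" || value === null) return undefined;
  const obj = value as Record<string, unknown>;
  if (typeof obj.venueRef !== "string" || typeof obj.occurredAt !== "string") return undefined;
  return { venueRef: obj.venueRef, occurredAt: obj.occurredAt };
}
```

Validating at the consumer means a malformed message is rejected where it would do damage, without a registry in the produce path.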

The urgency service is lifecycle-isolated

Urgency signals (spacesRemaining, recentSales, lastBookedAt, activeViewers, capacityUtilisationPct) are computed by a dedicated service that holds 24 hours of in-memory state and serves the API over an HTTP call with a 50 ms timeout. Three things about that design matter for partner integrations:

The service is decoupled from API restarts. API deploys are minute-scale; the urgency state takes hours to fully warm from live Kafka consumption. If they shared a process, every API deploy would flush the analytics signal partners depend on. So the urgency service deploys on its own cadence, runs two replicas with distinct consumer groups, and the API is a stateless HTTP client of it.

There’s no Kafka replay on startup. Each urgency replica starts at the latest offset. Per-window honesty ensures partners always know whether a window is observed or unknown: a freshly started replica that has been ingesting for less than 12 hours returns absent for last12h and last24h, not zero.

Failure is invisible to the OTA. The 50 ms timeout via AbortController, against a tuned undici.Pool with 32 persistent keep-alive connections, means urgency-side failures degrade to absent fields on the quote response. The quote still returns 200. Partners building UI against urgency signals must treat absence as “no signal”, never as zero — and that contract is documented in the wire schema.
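The degrade-to-absent behaviour can be sketched with an AbortController and a hard timeout. To stay self-contained, this example takes an injectable fetch-like function instead of the undici.Pool the post describes; the `UrgencySignalsSketch` type, `fetchUrgency`, and the URL parameter are hypothetical.

```typescript
type UrgencySignalsSketch = { spacesRemaining?: number; recentSales?: number };

// Minimal fetch-like contract so the client can be exercised without a network.
type FetchLike = (
  url: string,
  init: { signal: AbortSignal },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Resolves to the signals, or to undefined on timeout, connection error,
// non-2xx status, or unparseable body. It never rejects.
async function fetchUrgency(
  fetchFn: FetchLike,
  url: string,
  timeoutMs = 50,
): Promise<UrgencySignalsSketch | undefined> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchFn(url, { signal: controller.signal });
    if (!res.ok) return undefined;
    return (await res.json()) as UrgencySignalsSketch;
  } catch {
    return undefined; // every failure mode degrades to "no signal"
  } finally {
    clearTimeout(timer);
  }
}
```

The caller spreads the result into the quote response only when it is defined, which is how absence, rather than zero, ends up on the wire.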

Bucketed memory layout

Inside the urgency service, recent-sales counters are stored as Uint32Array[144] per car park — 10-minute resolution, 24-hour retention, ~580 B per car park. On the ingest path: bucket[idx]++ and you’re done. Zero per-event JavaScript-object allocation, zero per-event garbage collection pressure.

At full population that’s roughly 290 MB of raw counter data per replica for a half-million car parks (500,000 × 576 B), before any per-object overhead. Tractable. The same shape generalises to any future per-car-park time-windowed counter.
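A minimal sketch of one such counter, combining the bucketed layout with the observed-or-absent window rule described earlier. The class and method names are hypothetical, and the ring-buffer rotation is one plausible way to age out old buckets, not necessarily how the service does it. Windows are approximated to whole 10-minute buckets.

```typescript
const BUCKET_COUNT = 144;              // 24 hours at 10-minute resolution
const BUCKET_MS = 10 * 60 * 1000;
const BYTES_PER_CAR_PARK = BUCKET_COUNT * Uint32Array.BYTES_PER_ELEMENT; // 576

class RecentSalesCounter {
  private counts = new Uint32Array(BUCKET_COUNT);
  private startedAt: number;
  private lastBucket: number; // absolute index of the newest bucket written

  constructor(startedAtMs: number) {
    this.startedAt = startedAtMs;
    this.lastBucket = Math.floor(startedAtMs / BUCKET_MS);
  }

  // Zero the buckets that have been skipped or have aged out since the last write.
  private advance(nowMs: number): number {
    const current = Math.floor(nowMs / BUCKET_MS);
    const steps = Math.min(current - this.lastBucket, BUCKET_COUNT);
    for (let i = 1; i <= steps; i++) {
      this.counts[(this.lastBucket + i) % BUCKET_COUNT] = 0;
    }
    if (current > this.lastBucket) this.lastBucket = current;
    return current;
  }

  // Ingest path: one increment, no per-event object allocation.
  record(nowMs: number): void {
    const current = this.advance(nowMs);
    this.counts[current % BUCKET_COUNT]++;
  }

  // Sum over the trailing window, or undefined when this replica has not been
  // ingesting for the whole window (absent, never zero).
  lastHours(hours: number, nowMs: number): number | undefined {
    if (nowMs - this.startedAt < hours * 3_600_000) return undefined;
    const current = this.advance(nowMs);
    const n = Math.min(hours * 6, BUCKET_COUNT);
    let sum = 0;
    for (let i = 0; i < n; i++) {
      const idx = (((current - i) % BUCKET_COUNT) + BUCKET_COUNT) % BUCKET_COUNT;
      sum += this.counts[idx];
    }
    return sum;
  }
}
```

Each counter owns a fixed 576-byte buffer, so memory grows with the number of car parks and not with event volume.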

What this costs partners

For OTAs, integrating against ParkAttach is a single bearer-token API and a wire-schema contract. None of the above is visible: the quote returns in ~1–2 ms on the warm path, the response shape is stable, and urgency signals appear as optional fields you render in your UI or ignore.

For parking operators, the connector contract is HTTP + JSON, with three capability tiers so you can join the platform at the level your existing parking platform supports today. You don’t need to handle hot-path traffic if you can’t — Layer 1 caches your data at the platform level.

The architecture is what it is so that you don’t have to think about it. Which is, in the end, the only honest measure of a good integration platform.

If you’re an engineering team evaluating ParkAttach for an integration, we’d love to walk through any of the above in more detail.