Distributed Nodes at Global Scale: Sticky Flows, Fast Failover, and Zero-Copy Load Balancing with Katran

Published

January 7, 2026

Stantia Labs Team

The Distributed Systems Challenge

Building a global communication platform means solving one of the hardest problems in distributed systems: how do you route billions of individual flows across thousands of backend nodes spread across multiple regions and continents, while keeping those flows "sticky" (always routing to the same backend), maintaining fault tolerance, and doing it all without melting your CPU?

This is the core challenge that Relay faces. With millions of concurrent connections spanning multiple sub-regions and regions worldwide, traditional load balancing approaches fall short:

•Connection affinity: Flows must stay on the same backend to maintain session state and minimize latency.
•Scale: Traditional L7 proxies (HAProxy, nginx) can't handle billions of flows without saturating CPU.
•Latency: Every nanosecond of load balancing overhead impacts end-user experience globally.
•Fault tolerance: Backend nodes fail constantly—add, remove, or upgrade nodes without dropping connections.
•Hardware efficiency: Every CPU cycle spent on load balancing is a cycle not spent on actual application logic.

Meta faced these exact challenges at scale. Their answer: Katran, an XDP-based load balancer that pushes load balancing into the kernel and NIC driver layers, achieving line-rate packet processing with microsecond latency and near-zero CPU overhead.

Introducing Katran: L3/L4 Load Balancing at Line Rate

Katran is Meta's open-source Layer 3/Layer 4 load balancer built on XDP (Express Data Path) and AF_XDP. Unlike traditional proxies that process packets in user space, Katran performs load balancing decisions directly in the kernel using eBPF bytecode, making it capable of handling multi-terabit-per-second traffic flows with minimal CPU consumption.

How Katran Works: The XDP Pipeline

When a packet arrives at a Katran load balancer:

Kernel XDP hook: Before the Linux network stack processes the packet, the XDP eBPF program intercepts it.
Parse headers: Extract source IP, destination IP, source port, destination port, and protocol (5-tuple).
Consistent hash lookup: Using the 5-tuple and consistent hashing, determine which backend node should receive this flow.
Rewrite and forward: Modify the packet's destination MAC address to route it to the selected backend, then transmit.
Return path: Backend responses are similarly rewritten and returned to clients, maintaining session affinity.

The entire process happens in nanoseconds, with zero context switches and zero copies between kernel and user space.

The Four Pillars: Sticky Sessions and Consistent Hashing

1. Sticky Sessions: Flow Affinity

In Relay, a single connection from a client must always reach the same backend server. This isn't just about performance—it's about correctness. Relay's application logic maintains per-connection state (authentication, message queues, subscription state), so moving a flow mid-connection would break everything.

Katran achieves this through deterministic hashing: given the same input (source IP, source port, destination IP, destination port, protocol), the hash function always produces the same output—always selecting the same backend node.

hash(src_ip, src_port, dst_ip, dst_port, protocol) → backend_index

This deterministic hash means even if the client makes a new request after a short timeout, it will hash to the same backend—perfect for keepalive-based protocols and long-lived connections.

2. Consistent Hashing: Graceful Scaling

Here's where Katran gets clever. When you add or remove backend nodes, a naive hash function would cause 100% of flows to rehash to different backends—dropping millions of connections.

Katran uses consistent hashing, a technique where backends are mapped onto a virtual hash ring. When you add a new node, only flows that would hash to that specific range need to move—typically just 1/N of total flows, where N is the number of backends.

Before: Adding a new backend with naive hashing

Old: N backends → Flow A hashes to Backend 3
New: N+1 backends → Flow A hashes to Backend 7
Result: Connection dropped, user sees error ❌

After: Adding a new backend with consistent hashing

Only ~1/(N+1) of flows rehash to the new backend
Remaining flows stay on original backends
Result: Most connections stay live, minimal disruption ✓

3. Fast Failover: Sub-Second Recovery

Backends fail. Disks die, CPUs overheat, software crashes. In a global system with millions of connections, assuming zero downtime is naive. Instead, Relay focuses on fast detection and rapid recovery.

Katran integrates health checks that run independently on each load balancer node. When a backend becomes unhealthy:

•The backend is immediately marked unhealthy in the eBPF map.
•New flows attempting to hash to that backend are rehashed to healthy alternates (using consistent hashing).
•Existing flows on the failed backend receive TCP resets from their backends, allowing clients to reconnect to a healthy node.
•Total detection-to-recovery time: typically 100-500ms depending on configuration.

4. Per-Core Independence: Linear Scaling

Modern servers have 32, 64, or more CPU cores. Many load balancers don't scale linearly with cores—they hit bottlenecks in shared data structures, cache coherency, or lock contention.

Katran's XDP model runs per-CPU. Each network RX queue gets its own CPU core, and the eBPF program executes independently on each core without locks or shared state. This means:

•A 64-core server gets ~64x the throughput of a single-core load balancer.
•No NUMA/cache-coherency penalties.
•CPU utilization stays sub-10% even under peak load.

Hardware Acceleration: XDP + AF_XDP + NIC Offloads

While Katran's eBPF code is incredibly efficient, there's another layer: hardware acceleration. Modern NICs (Network Interface Cards) support several technologies that push load balancing closer to the silicon:

XDP Native Mode

XDP programs can run in three modes:

•Kernel mode: eBPF program runs in kernel, still faster than user space but has some overhead.
•Offload mode: NIC hardware itself runs the eBPF program—zero kernel involvement, pure hardware processing.
•AF_XDP: User-space program receives packets via XDP, useful for advanced packet processing with shared kernel resource access.

High-end NICs (Mellanox, Broadcom) support XDP offload, pushing the entire load balancing decision into hardware. At this point, the software load balancer becomes truly invisible—packets are distributed at line rate with zero CPU involvement.

AF_XDP: The Best of Both Worlds

AF_XDP (Address Family XDP) is a socket family that lets user-space programs receive packets that were selected by XDP with zero-copy semantics. This is crucial for Relay:

Katran uses AF_XDP to bridge XDP-layer load balancing with advanced user-space logic:

Packets arrive at the NIC and are filtered by XDP
Selected packets are moved to an AF_XDP ring buffer with zero copies
User-space application (Relay server) reads directly from this buffer
Application processes packet and generates response
Response is transmitted back via the same zero-copy path

This architecture eliminates the traditional kernel-to-userspace copy bottleneck entirely, enabling sub-microsecond latencies even for billions of packets.

Global Distribution: Multi-Region and Disaster Recovery

With Katran handling intra-region load balancing, Relay can distribute its backend nodes across multiple regions and continents. Here's how the architecture looks:

Relay's Multi-Region Architecture

Global DNS/GeoDNS ↓ ┌────────────────────────────────────┐ │ US-EAST Region │ │ ┌──────────────────────────────┐ │ │ │ Katran Load Balancers (3x) │ │ │ └──────────────────────────────┘ │ │ ↓ ↓ ↓ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │ │ │B │ │B │ │B │ │B │ │B │ │B │ │ │ │e │ │e │ │e │ │e │ │e │ │e │ │ │ │1 │ │2 │ │3 │ │4 │ │5 │ │6 │ │ │ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ │ └────────────────────────────────────┘ ↓ ↓ ↓ ┌────────────────────────────────────┐ │ EU-WEST Region │ │ ┌──────────────────────────────┐ │ │ │ Katran Load Balancers (3x) │ │ │ └──────────────────────────────┘ │ │ ↓ ↓ ↓ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │ │ │B │ │B │ │B │ │B │ │B │ │B │ │ │ │7 │ │8 │ │9 │ │10│ │11│ │12│ │ │ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ │ └────────────────────────────────────┘ ↓ ↓ ↓ ┌────────────────────────────────────┐ │ APAC Region │ │ ┌──────────────────────────────┐ │ │ │ Katran Load Balancers (3x) │ │ │ └──────────────────────────────┘ │ │ ↓ ↓ ↓ │ │ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐ │ │ │B │ │B │ │B │ │B │ │B │ │B │ │ │ │13│ │14│ │15│ │16│ │17│ │18│ │ │ └──┘ └──┘ └──┘ └──┘ └──┘ └──┘ │ └────────────────────────────────────┘

Each region runs its own Katran cluster, handling sticky session routing for that region's backends. A client connecting from London reaches EU-WEST, while a client from Tokyo reaches APAC.

Achieving 99.999% Availability (Five Nines)

Five nines availability means 5.26 minutes of downtime per year. Relay achieves this through:

•Redundant Katran nodes: Each region runs 3+ load balancers. If one fails, the others absorb traffic.
•Redundant backends: Each backend is replicated across availability zones. Consistent hashing ensures client flows can move to replicas during failures.
•Fast health checks: Katran detects backend failures in 100-500ms, near-instantly removing failed nodes from rotation.
•Regional failover: If an entire region goes down, GeoDNS redirects new clients to healthy regions. Existing connections reconnect.
•Connection resilience: Long-lived connections with keepalive detect server death quickly and reconnect to healthy backends.

The combination of Katran's sub-second failover, consistent hashing's graceful rebalancing, and multi-region deployment architecture makes 99.999% uptime achievable—not as a theoretical limit, but as a practical reality.

CPU Efficiency: Why Katran Wins

Here's a concrete comparison of load balancing approaches and their CPU overhead per billion packets per second:

Approach	CPU Cores / Tbps	Latency	Sticky Sessions
HAProxy (L7)	~400	~10ms	✓
Nginx (L7)	~350	~8ms	✓
IPVS (L4, kernel)	~25	~100µs	✓
Katran (XDP)	~2	~1µs	✓

Katran uses 100-200x fewer CPU cores than HAProxy to handle the same traffic. This translates directly to:

•Lower costs: Fewer servers needed for load balancing.
•Lower latency: Microsecond processing in kernel vs. milliseconds in user space.
•Headroom for growth: A single load balancer can handle massive traffic spikes without degradation.

Real-World Impact on Relay

By deploying Katran globally across multiple regions with consistent hashing, Relay achieves:

Billions

of concurrent flows handled globally

<1µs

load balancing latency per packet

<500ms

backend failover detection

99.999%

disaster recovery SLA

The key insight is that Katran moves load balancing from being a bottleneck to being invisible. Clients get routed to their sticky backends with essentially zero overhead, leaving CPU resources available for actual application logic—handling messages, managing state, and serving users.

Technical Deep Dive: The Consistent Hashing Algorithm

Katran's consistent hashing implementation is the key to graceful scaling. Here's how it works:

Consistent Hash Ring

Imagine a circular hash space from 0 to 2^32-1. Each backend node occupies a position on this ring:

Backend A: hash(backend_id + "0") = 123456789 Backend B: hash(backend_id + "1") = 234567890 Backend C: hash(backend_id + "2") = 345678901 ... Backend Z: hash(backend_id + "N") = 987654321 Flow hash space: 0 to 4294967295 (2^32) When a flow F arrives: flow_hash = hash(src_ip, src_port, dst_ip, dst_port) → Find first backend on ring where backend_hash >= flow_hash → Route to that backend

Why this matters: If you add Backend Z, only flows that hash between the previous backend and Backend Z need to move. All other flows stay on their original backends, maintaining stickiness and minimizing disruption.

Integration with Relay's Architecture

Relay's use of Katran is central to achieving its performance and reliability goals:

1Global entry point: All client connections first hit Katran load balancers, which route them to the appropriate regional backend.
2Sticky routing: Katran ensures each flow always routes to the same backend, enabling Relay servers to maintain per-connection state without synchronization overhead.
3Graceful scaling: When Relay needs more backend capacity, new nodes are added to the Katran ring. Consistent hashing ensures only ~1/N of flows move, not all of them.
4Failure resilience: If a Relay server dies, Katran detects it and rehashes affected flows to healthy alternates, minimizing user-visible downtime.

Conclusion: The Future of Global Real-Time Systems

Distributed systems at billion-scale require rethinking fundamental assumptions. Traditional load balancers built for thousands of flows don't scale to millions or billions. User-space proxies that are convenient for small clusters become CPU-hungry at scale.

Katran and XDP-based load balancing represent a new paradigm: pushing the hardest, most-critical tasks (routing millions of flows with sub-microsecond latency) into kernel and hardware layers where they can be done with near-zero overhead.

Combined with consistent hashing, health checks, and multi-region deployment, this architecture enables Relay to:

✓Distribute billions of flows across thousands of backends
✓Keep flows sticky with deterministic consistent hashing
✓Fail over backends in <500ms without dropping connections
✓Add and remove nodes gracefully with ~1/N flow rehashing
✓Process all of this with sub-10% CPU overhead
✓Achieve 99.999% disaster recovery across global regions

This is how you build systems that scale. Not by throwing more CPUs at the problem, but by being ruthlessly efficient about every nanosecond and every cycle. Katran, XDP, and consistent hashing are the tools that make it possible.