Load Balancing: Algorithms, Layers, and Tradeoffs

A load balancer distributes incoming traffic across multiple servers to maximize throughput, minimize latency, and avoid overloading any single node. It operates at Layer 4 (TCP/UDP) or Layer 7 (HTTP), using algorithms like round-robin, least-connections, or consistent hashing, plus health checks to route around failed instances.

What a Load Balancer Does

A load balancer (LB) sits between clients and a pool of backend servers, spreading requests so no single server becomes a bottleneck. It improves availability (routes around dead nodes), scalability (add servers behind one virtual IP), and performance. It also enables zero-downtime deploys by draining connections from servers being updated.
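The draining step behind zero-downtime deploys can be modeled in a few lines. This is a minimal in-process sketch, assuming a hypothetical `Pool` class and a least-connections pick for new traffic. A draining server keeps serving its existing connections but receives no new ones, and the deploy may stop it once its count reaches zero.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Backend:
    name: str
    active: int = 0         # connections currently being served
    draining: bool = False  # set when the server is about to be updated


class Pool:
    """Hypothetical backend pool that supports connection draining."""

    def __init__(self, names: List[str]):
        self.backends: Dict[str, Backend] = {n: Backend(n) for n in names}

    def open_connection(self) -> str:
        # Draining servers are skipped for NEW connections only.
        eligible = [b for b in self.backends.values() if not b.draining]
        chosen = min(eligible, key=lambda b: b.active)  # ties go to the first server
        chosen.active += 1
        return chosen.name

    def close_connection(self, name: str) -> None:
        self.backends[name].active -= 1

    def start_drain(self, name: str) -> None:
        self.backends[name].draining = True

    def safe_to_stop(self, name: str) -> bool:
        # The deploy can restart the server once in-flight work has finished.
        b = self.backends[name]
        return b.draining and b.active == 0
```

In use: open a connection (it lands on `a`), call `start_drain("a")`, and further connections go to `b`; `safe_to_stop("a")` only turns true after `a`'s last connection closes. Real load balancers usually add a drain timeout so a stuck connection cannot block a deploy forever.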

Load balancers can be hardware appliances (F5 BIG-IP, Citrix), software (NGINX, HAProxy, Envoy), or managed cloud services (AWS ELB/ALB/NLB, GCP Cloud Load Balancing). DNS-based load balancing (round-robin A records, or GeoDNS like Route 53) distributes at a coarser, global level.

Layer 4 vs Layer 7

Layer 4 (transport) load balancers route based on IP and TCP/UDP port without inspecting payload. They're extremely fast and protocol-agnostic (AWS NLB handles millions of requests/sec with ultra-low latency). Layer 7 (application) load balancers parse HTTP, enabling content-based routing (route /api to one pool, /images to another), TLS termination, header rewriting, and cookie-based sticky sessions, at higher CPU cost.

Aspect	Layer 4 (NLB)	Layer 7 (ALB)
Routes on	IP + port	URL, headers, cookies
Performance	Higher throughput, lower latency	Lower throughput, more CPU per request (parses HTTP)
Features	Basic, fast forwarding	Path routing, TLS termination, header rewrites
Use case	Raw TCP/UDP, gaming, databases	Web apps, microservices
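To make the Layer 4 / Layer 7 split concrete, here is a hedged sketch of the information each layer routes on. `l4_pick` is a hypothetical flow-hash policy that only sees the connection tuple and never reads the payload; `l7_pick` parses the HTTP request line and chooses a pool by the longest matching path prefix (the `/api` vs `/images` example above). Names, pools, and the hashing scheme are illustrative assumptions, not any product's implementation.

```python
import hashlib
from typing import Dict, List, Tuple


def l4_pick(conn: Tuple[str, int, str, int], backends: List[str]) -> str:
    """Layer 4: pick a backend from (src_ip, src_port, dst_ip, dst_port) only.

    The payload is never inspected, so this works for any protocol on top of TCP/UDP.
    """
    key = "|".join(map(str, conn)).encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[h % len(backends)]  # same flow -> same backend


def _prefix_matches(path: str, prefix: str) -> bool:
    # Match on a path-segment boundary so "/apix" does not match "/api".
    return path == prefix or path.startswith(prefix.rstrip("/") + "/")


def l7_pick(raw_request: bytes, routes: Dict[str, List[str]], default: List[str]) -> List[str]:
    """Layer 7: parse the HTTP request line and route by URL path prefix."""
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    _method, path, _version = request_line.split(" ", 2)
    matches = [p for p in routes if _prefix_matches(path, p)]
    if not matches:
        return default
    return routes[max(matches, key=len)]  # longest prefix wins
```

`l7_pick` returns a pool rather than a single server; one of the algorithms in the next section would then choose a member of that pool. Parsing the request is exactly the extra per-request CPU cost the table attributes to Layer 7.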

Load Balancing Algorithms

The algorithm decides which backend gets each request. The right choice depends on whether requests are uniform or variable in cost, and whether you need a client pinned to a consistent server.

Round-robin: cycle through servers evenly; simple but ignores server load and request cost.
Weighted round-robin: give each server a weight so larger servers receive proportionally more traffic.
Least-connections: send to the server with fewest active connections; good for long-lived/variable requests.
Least-response-time: route to the server with the best combination of recent response time and active connections.
IP hash / consistent hashing: route a given client (or key) to the same server every time, useful for session affinity and cache locality; consistent hashing also limits how many clients are remapped when servers are added or removed.
Random with two choices (power of two choices): pick two servers at random and route to the less loaded one; cheap and surprisingly effective at scale.
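The algorithms listed above can be sketched compactly. This is an illustrative, single-process version, assuming load is tracked as a dict of active connection counts. The weighted round-robin shown is a common "smooth" variant that interleaves picks (weights 5/1/1 give `a a b a c a a` rather than five `a`s in a row), and the consistent-hash ring uses virtual nodes with an arbitrary default of 100 per server.

```python
import bisect
import hashlib
import itertools
import random
from typing import Dict, Iterator, List


def round_robin(servers: List[str]) -> Iterator[str]:
    """Round-robin: hand out servers in a fixed cycle, ignoring load."""
    return itertools.cycle(servers)


class SmoothWeightedRoundRobin:
    """Weighted round-robin that interleaves picks instead of sending bursts.

    Each pick, every server's score grows by its weight; the top scorer wins
    and is pushed back down by the total weight.
    """

    def __init__(self, weights: Dict[str, int]):
        self.weights = weights
        self.score = {s: 0 for s in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        for s, w in self.weights.items():
            self.score[s] += w
        best = max(self.score, key=self.score.get)  # ties go to the first server
        self.score[best] -= self.total
        return best


def least_connections(active: Dict[str, int]) -> str:
    """Least-connections: the server with the fewest open connections wins."""
    return min(active, key=active.get)


def power_of_two_choices(active: Dict[str, int], rng: random.Random) -> str:
    """Sample two distinct servers and keep the less loaded one (needs >= 2 servers)."""
    a, b = rng.sample(list(active), 2)
    return a if active[a] <= active[b] else b


class ConsistentHashRing:
    """Consistent hashing with virtual nodes.

    Adding or removing a server only remaps keys whose ring segment changes
    owner, instead of reshuffling almost everything as hash(key) % N would.
    """

    def __init__(self, servers: List[str], vnodes: int = 100):
        self.vnodes = vnodes
        self.ring: List[int] = []        # sorted virtual-node positions
        self.owner: Dict[int, str] = {}  # position -> server
        for s in servers:
            self.add(s)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, server: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{server}#{i}")
            bisect.insort(self.ring, pos)
            self.owner[pos] = server

    def remove(self, server: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{server}#{i}")
            self.ring.remove(pos)
            del self.owner[pos]

    def lookup(self, key: str) -> str:
        # Walk clockwise to the first virtual node after the key's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self.ring, self._hash(key)) % len(self.ring)
        return self.owner[self.ring[idx]]
```

A useful property to check: after `ring.add("s4")`, every key whose owner changed now maps to `s4`, and the other servers' keys stay put. That is why consistent hashing preserves cache locality when the pool is resized.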

Health Checks, Sticky Sessions, and HA

Health checks (active probes, or passive monitoring of real traffic) detect unhealthy servers, remove them from rotation, and re-add them once they recover. Sticky sessions (session affinity) pin a user to one server via a cookie or IP hash. They are needed when session state lives in a server's memory, but they hurt even distribution; externalizing session state (e.g., to Redis) is the cleaner pattern.
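Health checking usually includes hysteresis so a single flaky probe does not make a server flap in and out of rotation. The sketch below is a hedged model: `probe` is a hypothetical callable (for example, one that performs an HTTP GET against a health endpoint), and the `rise`/`fall` thresholds borrow their names from HAProxy's server options. The same counters could just as well be fed passive signals, such as failed real requests.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HealthState:
    healthy: bool = True
    ok_streak: int = 0
    fail_streak: int = 0


class HealthChecker:
    """Remove a server after `fall` consecutive failures; re-add after `rise` successes."""

    def __init__(self, servers: List[str], probe: Callable[[str], bool],
                 rise: int = 2, fall: int = 3):
        self.probe = probe  # hypothetical: returns True if the server looks healthy
        self.rise, self.fall = rise, fall
        self.state: Dict[str, HealthState] = {s: HealthState() for s in servers}

    def run_once(self) -> None:
        """One round of active probes across all servers."""
        for server, st in self.state.items():
            if self.probe(server):
                st.ok_streak += 1
                st.fail_streak = 0
                if not st.healthy and st.ok_streak >= self.rise:
                    st.healthy = True   # recovered: back into rotation
            else:
                st.fail_streak += 1
                st.ok_streak = 0
                if st.healthy and st.fail_streak >= self.fall:
                    st.healthy = False  # repeatedly failing: out of rotation

    def in_rotation(self) -> List[str]:
        return [s for s, st in self.state.items() if st.healthy]
```

The load-balancing algorithm then chooses only among `in_rotation()`. Larger `fall` values tolerate transient blips at the cost of sending traffic to a dead server for longer.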

The load balancer itself must not be a single point of failure. Production setups run active-passive or active-active LB pairs with a floating virtual IP (VRRP/keepalived) or use managed multi-AZ services. Global setups layer GeoDNS or anycast in front of regional load balancers.
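The decision at the heart of an active-passive pair is "which live node owns the virtual IP right now". This is a deliberately simplified model, not VRRP itself: each hypothetical `LBNode` has a priority and a last-heartbeat timestamp, a node counts as dead after `dead_after` seconds of silence (VRRP declares a master down after roughly three missed advertisement intervals), and the highest-priority live node wins. Real VRRP/keepalived also handles preemption and announcing the moved IP to the network.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LBNode:
    name: str
    priority: int       # higher priority wins ownership of the VIP
    last_advert: float  # time of the node's most recent heartbeat


def vip_owner(nodes: Dict[str, LBNode], now: float, dead_after: float = 3.0) -> Optional[str]:
    """Return the node that should hold the floating virtual IP, or None if all are down."""
    alive = [n for n in nodes.values() if now - n.last_advert <= dead_after]
    if not alive:
        return None
    return max(alive, key=lambda n: n.priority).name
```

With `lb1` at priority 150 and `lb2` at 100, `lb1` holds the VIP while it keeps advertising; once `lb1` has been silent for more than `dead_after` seconds, `lb2` takes over, and clients keep using the same IP throughout.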

Frequently asked questions

What's the difference between L4 and L7 load balancing?

L4 balances on IP and TCP/UDP port without reading the payload, making it fast and protocol-agnostic. L7 understands HTTP, enabling routing by URL path, headers, or cookies, plus TLS termination, at the cost of more CPU per request.

How does a load balancer avoid being a single point of failure?

By running redundant LB instances in active-passive or active-active mode with a floating virtual IP via VRRP/keepalived, or by using a managed cloud LB that's distributed across availability zones. For global traffic, GeoDNS or anycast is layered on top.

When should I use least-connections over round-robin?

Use least-connections when request durations vary widely (e.g., some requests hold connections for seconds). Round-robin assumes uniform request cost and can overload a server stuck with slow requests.
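A tiny deterministic simulation shows the failure mode. Assumptions (all hypothetical): three servers, one request per tick, and every third request is slow, holding its connection for 30 ticks. Round-robin then sends every slow request to the same server, while least-connections spreads them out.

```python
from typing import List


def simulate(policy: str, durations: List[int], n_servers: int = 3) -> int:
    """Return the peak number of concurrent connections seen on any one server.

    One request arrives per tick; request t holds its connection for durations[t] ticks.
    """
    active: List[List[int]] = [[] for _ in range(n_servers)]  # end tick of each open connection
    peak = 0
    for t, d in enumerate(durations):
        for conns in active:
            conns[:] = [end for end in conns if end > t]  # close finished connections
        if policy == "round_robin":
            s = t % n_servers
        elif policy == "least_connections":
            s = min(range(n_servers), key=lambda i: len(active[i]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        active[s].append(t + d)
        peak = max(peak, max(len(c) for c in active))
    return peak


# Every third request is slow (30 ticks); the others finish in 1 tick.
workload = [30, 1, 1] * 20
rr_peak = simulate("round_robin", workload)  # 10: server 0 receives every slow request
lc_peak = simulate("least_connections", workload)
```

Round-robin is not wrong here because it is unfair in count (each server gets exactly a third of the requests); it fails because it ignores how long each request occupies the server.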

What are sticky sessions and what's their downside?

Sticky sessions pin a client to the same backend server, usually via a cookie, so in-memory session state stays accessible. The downside is uneven load and lost sessions when a server dies. Storing sessions in Redis avoids the need for stickiness.
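Cookie-based stickiness, including the "server died" downside, can be sketched as follows. This assumes a hypothetical cookie name `lb_backend` and round-robin for first-time clients; real load balancers use their own cookie names and formats. If the pinned server is no longer healthy, the client is re-pinned elsewhere and any session state that lived only in the dead server's memory is gone.

```python
import itertools
from typing import Dict, List, Optional, Tuple

COOKIE = "lb_backend"  # hypothetical affinity cookie name


class StickyBalancer:
    """Cookie-based session affinity layered on round-robin."""

    def __init__(self, servers: List[str]):
        self.servers = list(servers)
        self.healthy = set(servers)  # would be maintained by health checks
        self._rr = itertools.cycle(self.servers)

    def _next_healthy(self) -> str:
        for _ in range(len(self.servers)):
            s = next(self._rr)
            if s in self.healthy:
                return s
        raise RuntimeError("no healthy backends")

    def route(self, cookies: Dict[str, str]) -> Tuple[str, Optional[str]]:
        """Return (backend, Set-Cookie header value, or None if no new pin is needed)."""
        pinned = cookies.get(COOKIE)
        if pinned in self.healthy:
            return pinned, None  # honor the existing pin
        # No pin yet, or the pinned server died: choose a new one and re-pin.
        # Session data held only in the dead server's memory is lost at this point.
        s = self._next_healthy()
        return s, f"{COOKIE}={s}; Path=/; HttpOnly"
```

With sessions in Redis instead, `route` could ignore the cookie entirely and any backend could serve any request, which is the point made in the answer above.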