Load Balancing: Algorithms, Layers, and Tradeoffs
A load balancer distributes incoming traffic across multiple servers to maximize throughput, minimize latency, and avoid overloading any single node. It operates at Layer 4 (TCP/UDP) or Layer 7 (HTTP), using algorithms like round-robin, least-connections, or consistent hashing, plus health checks to route around failed instances.
What a Load Balancer Does
A load balancer (LB) sits between clients and a pool of backend servers, spreading requests so no single server becomes a bottleneck. It improves availability (routes around dead nodes), scalability (add servers behind one virtual IP), and performance (no server saturates while others sit idle). It also enables zero-downtime deploys by draining connections from servers being updated.
Load balancers can be hardware appliances (F5 BIG-IP, Citrix ADC), software (NGINX, HAProxy, Envoy), or managed cloud services (AWS Elastic Load Balancing with its ALB and NLB types, GCP Cloud Load Balancing). DNS-based load balancing (round-robin A records, or geolocation routing in a service like Amazon Route 53) distributes at a coarser, global level.
Layer 4 vs Layer 7
Layer 4 (transport) load balancers route based on IP address and TCP/UDP port without inspecting the payload. They're extremely fast and protocol-agnostic (AWS describes NLB as handling millions of requests per second at very low latency). Layer 7 (application) load balancers parse HTTP, which enables content-based routing (route /api to one pool, /images to another), TLS termination, header rewriting, and cookie-based sticky sessions, at a higher CPU cost per request.
| Aspect | Layer 4 (e.g., AWS NLB) | Layer 7 (e.g., AWS ALB) |
|---|---|---|
| Routes on | IP + port | URL, headers, cookies |
| Performance | Higher throughput | Lower (parses HTTP) |
| Features | Basic, fast | Path routing, TLS termination, rewrites |
| Use case | Raw TCP, gaming, DB | Web apps, microservices |
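To make the difference concrete, here is a minimal sketch of the two routing decisions. The pool names, the `Request` type, and the port rule in `route_l4` are hypothetical and exist only for illustration; a real Layer 7 balancer would take the path from a parsed HTTP request.

```python
from dataclasses import dataclass

# Hypothetical pool names and path prefixes, chosen only for illustration.
ROUTES = [
    ("/api", "api-pool"),
    ("/images", "image-pool"),
]
DEFAULT_POOL = "web-pool"


@dataclass(frozen=True)
class Request:
    src_ip: str
    dst_port: int
    path: str  # only a Layer 7 balancer sees this, because it parses HTTP


def route_l4(req: Request) -> str:
    """Layer 4 view: the decision can use only address and port."""
    return "tls-pool" if req.dst_port == 443 else DEFAULT_POOL


def route_l7(req: Request) -> str:
    """Layer 7 view: pick a pool by the longest matching path prefix."""
    best, best_len = DEFAULT_POOL, -1
    for prefix, pool in ROUTES:
        # Match "/api" and "/api/...", but not "/apiary".
        matches = req.path == prefix or req.path.startswith(prefix + "/")
        if matches and len(prefix) > best_len:
            best, best_len = pool, len(prefix)
    return best
```

The Layer 4 function never touches `path`, which is why it stays cheap and protocol-agnostic; the Layer 7 function pays for parsing but can split traffic by content.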
Load Balancing Algorithms
The algorithm decides which backend gets each request. The right choice depends on whether requests are uniform or variable in cost, and whether you need a client pinned to a consistent server.
- Round-robin: cycle through servers evenly; simple but ignores server load and request cost.
- Weighted round-robin: give each server a weight so higher-capacity servers receive proportionally more requests.
- Least-connections: send to the server with fewest active connections; good for long-lived/variable requests.
- Least-response-time: prefer the server with the best combination of observed response time and active connections.
- IP hash / consistent hashing: route a client consistently to the same server, useful for session affinity and cache locality. Consistent hashing also limits how many clients get remapped when a server is added or removed.
- Random with two choices (power of two choices): pick two servers at random, route to the less loaded; cheap and surprisingly effective at scale.
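The selection rules above can be sketched in a few lines each. This is a simplified illustration, not how any particular load balancer implements them: the function names, the `HashRing` class, the dictionary of active connection counts, and the choice of 100 virtual nodes per server are all assumptions made for the example.

```python
import bisect
import hashlib
import itertools
import random
from typing import Dict, Iterator, List, Optional


def round_robin(servers: List[str]) -> Iterator[str]:
    """Yield servers in a fixed repeating order, ignoring load."""
    return itertools.cycle(servers)


def least_connections(active: Dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def two_choices(active: Dict[str, int],
                rng: Optional[random.Random] = None) -> str:
    """Sample two distinct servers and keep the less loaded one."""
    rng = rng or random
    a, b = rng.sample(list(active), 2)  # needs at least two servers
    return a if active[a] <= active[b] else b


class HashRing:
    """Consistent hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, servers: List[str], vnodes: int = 100):
        # Each server is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{s}#{i}"), s) for s in servers for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def lookup(self, client_key: str) -> str:
        """Return the first server clockwise from the client's hash."""
        idx = bisect.bisect_right(self._keys, self._hash(client_key))
        return self._ring[idx % len(self._ring)][1]
```

With a ring, removing one server only remaps the clients that were assigned to it; a plain `hash(ip) % len(servers)` scheme would reshuffle most clients whenever the pool size changes.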
Health Checks, Sticky Sessions, and HA
Health checks (active probes or passive monitoring) detect unhealthy servers and remove them from rotation, then re-add them when they recover. Sticky sessions (session affinity) pin a user to one server via cookies or IP hash. They are needed when session state lives in a server's memory, but they skew the load distribution; externalizing session state (for example, to Redis) is the cleaner pattern.
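The remove-and-re-add cycle for health checks can be sketched as a small state machine. The consecutive-result thresholds (`fall` and `rise`) and their default values are assumptions added here to avoid flapping on a single bad probe; the `BackendHealth` class and `in_rotation` helper are hypothetical names, and the probe itself (an HTTP request or TCP connect) is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BackendHealth:
    fall: int = 3          # consecutive failures before removal (assumed value)
    rise: int = 2          # consecutive successes before re-adding (assumed value)
    healthy: bool = True
    _streak: int = 0       # consecutive results that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result; return whether the backend is in rotation."""
        if probe_ok == self.healthy:
            self._streak = 0           # result agrees with current state
            return self.healthy
        self._streak += 1
        threshold = self.rise if probe_ok else self.fall
        if self._streak >= threshold:
            self.healthy = probe_ok    # flip state after enough evidence
            self._streak = 0
        return self.healthy


def in_rotation(pool: Dict[str, BackendHealth]) -> List[str]:
    """Names of the backends that should currently receive traffic."""
    return [name for name, health in pool.items() if health.healthy]
```

A single failed probe followed by a success leaves the backend in rotation; only a run of `fall` failures removes it, and a run of `rise` successes brings it back.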
The load balancer itself must not be a single point of failure. Production setups run active-passive or active-active LB pairs with a floating virtual IP (VRRP/keepalived) or use managed multi-AZ services. Global setups layer GeoDNS or anycast in front of regional load balancers.
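As a rough illustration of active-passive failover, the sketch below picks which LB node should hold the floating virtual IP. It is a deliberately simplified model of priority-based election in the spirit of VRRP, assuming preemption is enabled; it omits advertisements, timers, and tie-breaking, and the `LBNode` type and node names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LBNode:
    name: str
    priority: int       # higher priority wins, as in VRRP
    alive: bool = True


def vip_owner(nodes: List[LBNode]) -> Optional[str]:
    """Return the live node with the highest priority, or None if all are down."""
    live = [n for n in nodes if n.alive]
    if not live:
        return None
    return max(live, key=lambda n: n.priority).name


# Usage: the standby takes over when the primary fails,
# and the primary reclaims the VIP when it recovers.
pair = [LBNode("lb-a", priority=150), LBNode("lb-b", priority=100)]
owner_normal = vip_owner(pair)
pair[0].alive = False
owner_failover = vip_owner(pair)
pair[0].alive = True
owner_recovered = vip_owner(pair)
```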
Frequently asked questions
What's the difference between L4 and L7 load balancing?
L4 balances on IP and TCP/UDP port without reading the payload, making it fast and protocol-agnostic. L7 understands HTTP, enabling routing by URL path, headers, or cookies, plus TLS termination, at the cost of more CPU per request.
How does a load balancer avoid being a single point of failure?
By running redundant LB instances in active-passive or active-active mode with a floating virtual IP via VRRP/keepalived, or by using a managed cloud LB that's distributed across availability zones. Global deployments add GeoDNS or anycast in front of the regional load balancers.
When should I use least-connections over round-robin?
Use least-connections when request durations vary widely (e.g., some requests hold connections for seconds). Round-robin assumes uniform request cost and can overload a server stuck with slow requests.
What are sticky sessions and what's their downside?
Sticky sessions pin a client to the same backend server, usually via a cookie, so in-memory session state stays accessible. The downside is uneven load and lost sessions when a server dies. Storing sessions in an external store such as Redis avoids the need for stickiness.