LESSON 2.7 MODULE 02 — SERVER ARCHITECTURE ADVANCED

Web Server Concurrency Limits:
Nginx vs. Apache / LiteSpeed

Adjusting worker_connections and keepalive_requests in Nginx — and their architectural equivalents in Apache and LiteSpeed — to absorb high-traffic spikes without exhausting the TCP connection pool or dropping client sessions.

// SCHEMA 01 — NGINX EVENT LOOP vs. APACHE PROCESS MODEL LIVE RENDER
Figure: Nginx asynchronous event loop vs. Apache Prefork process-per-connection model.

Left panel (Nginx, async event loop): each worker process runs a single-threaded, non-blocking epoll event loop and multiplexes thousands of simultaneous connections, never blocking on a slow client. worker_connections: 1024–4096, all served by one worker. Max connections = worker_processes × worker_connections, e.g. 4 workers × 2048 = 8,192 simultaneous TCP connections. RAM cost: ~10–20MB per worker process; connection count has near-zero marginal RAM cost.

Right panel (Apache Prefork, process per connection): one OS process per active connection, each consuming 8–25MB of RAM and blocking entirely while a slow client reads the response. RAM cost is linear with connection count: 1,000 connections × 18MB = ~18GB (up to 25GB at 25MB per process), against roughly 4 workers and 40MB for Nginx at the same load. MaxRequestWorkers ceiling shown as 150. OOM kill risk under a sustained spike without MPM tuning.

Footer notes: LiteSpeed uses an event-driven model similar to Nginx (async, non-blocking) with native PHP LSAPI, achieving Nginx-class concurrency with Apache .htaccess compatibility. Apache Worker/Event MPM closes the gap with thread-based handling, but per-thread RAM overhead still exceeds Nginx’s marginal connection cost.

Nginx multiplexes thousands of TCP connections through a single non-blocking event loop per worker process, keeping marginal RAM cost per connection near zero. Apache Prefork allocates one OS process per active connection — each consuming 8–25MB of RAM — making it RAM-exhaustible under sustained concurrency spikes. LiteSpeed’s event-driven model matches Nginx’s concurrency architecture while maintaining Apache .htaccess compatibility, making it the operationally lowest-friction upgrade path from Apache on shared stacks.

Core Mechanism

Web server concurrency architecture determines the upper bound on simultaneous TCP connections a server can maintain before it begins dropping new client requests or queuing them behind a backlog that exhausts the OS socket buffer. The fundamental dichotomy is between process-per-connection (Apache Prefork) and event-driven multiplexing (Nginx, LiteSpeed, Apache Event MPM). In the process-per-connection model, every active client holds an OS process in memory for the full duration of the connection — including idle keepalive intervals, slow TLS negotiation, and sluggish client upload speeds. This makes RAM consumption a direct linear function of concurrent connection count. In the event-driven model, a single worker process uses the Linux epoll syscall to monitor thousands of socket file descriptors simultaneously, switching between them only when I/O events are ready — never blocking idle on a slow client. The worker’s RAM cost is fixed regardless of how many connections it is multiplexing.
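The difference between linear and near-fixed RAM cost can be made concrete with a back-of-envelope model. This is only an illustrative sketch: the function names are hypothetical, the 18 MB per process and 15 MB per worker figures sit inside the ranges quoted in this lesson, and the 8 KB per socket is an assumed upper bound rather than a measurement.

```python
def prefork_ram_mb(connections: int, process_mb: float = 18.0) -> float:
    """Process-per-connection: RAM grows linearly with connection count."""
    return connections * process_mb


def event_loop_ram_mb(connections: int, workers: int = 4,
                      worker_mb: float = 15.0, socket_kb: float = 8.0) -> float:
    """Event loop: fixed cost per worker plus a small per-socket buffer."""
    return workers * worker_mb + connections * socket_kb / 1024


if __name__ == "__main__":
    # Compare both models at a few assumed connection counts.
    for conns in (100, 1_000, 5_000):
        print(f"{conns:>5} conns: prefork ~{prefork_ram_mb(conns):,.0f} MB, "
              f"event loop ~{event_loop_ram_mb(conns):,.1f} MB")
```

Under these assumptions, 1,000 connections cost about 18,000 MB in the process-per-connection model and roughly 68 MB in the event-driven one. Real figures depend on loaded modules, PHP integration, and kernel socket buffer settings.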

In Nginx, the connection capacity equation is explicit and operator-controlled: total simultaneous connections = worker_processes × worker_connections. The worker_processes directive should be set to auto, which maps one worker per available CPU core. The worker_connections directive, set inside the events block, defines how many connections each worker can hold open concurrently. That count covers every connection the worker owns, including upstream connections to proxied or FastCGI backends, not only client sockets. The compiled-in default is 512, and stock nginx.conf files commonly ship with 1024; both are conservative for modern production servers, which routinely require 4096–16384, constrained only by the open file descriptor ceiling (ulimit -n, or worker_rlimit_nofile) and available RAM for socket buffers. Each open connection consumes one file descriptor at the OS level, so the descriptor limit is the hard system ceiling above which even a correctly configured Nginx will refuse new connections with a “too many open files” error.
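The capacity equation and the descriptor ceiling can be checked together before a config change is deployed. The sketch below uses hypothetical helper names; the `reserve` allowance for log files and listening sockets is an assumption, and the descriptor limit is passed in by hand (for example from `ulimit -n` or the worker_rlimit_nofile value).

```python
def nginx_capacity(worker_processes: int, worker_connections: int) -> int:
    """Upper bound on simultaneously open connections across all workers."""
    return worker_processes * worker_connections


def fits_fd_limit(worker_connections: int, nofile_limit: int,
                  reserve: int = 64) -> bool:
    """True if one worker's connection slots fit under its descriptor limit.

    `reserve` is an assumed allowance for logs, listening sockets, and so on.
    """
    return worker_connections + reserve <= nofile_limit


if __name__ == "__main__":
    print("capacity:", nginx_capacity(4, 2048))
    print("4096 slots under a 1024 fd limit:", fits_fd_limit(4096, 1024))
    print("4096 slots under a 65535 fd limit:", fits_fd_limit(4096, 65535))
```

With 4 workers and 2048 connections each, the capacity is 8,192, matching the example in the diagram. A worker configured for 4096 connections does not fit under a 1024 descriptor limit, which is the situation that produces “too many open files”.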

Keepalive configuration is the second critical tuning axis and operates in direct tension with connection pool capacity. A keepalive connection holds a TCP socket open after a request completes, allowing the same client to reuse it for subsequent requests without a three-way handshake. This reduces latency for returning visitors but consumes a worker_connections slot for up to the full keepalive_timeout duration. Under high-traffic spikes, a long keepalive timeout (e.g., the 75-second Nginx default) combined with a sustained stream of new visitors means the connection pool fills with idle keepalive sockets, leaving little capacity for new connection accepts. The production tuning target is a keepalive_timeout of 10–15 seconds and a keepalive_requests ceiling of 100–500, which bounds each persistent socket’s lifespan and prevents any single client from monopolising a connection slot indefinitely.
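A rough way to reason about this tension is a steady-state estimate: idle sockets ≈ new connections per second × keepalive_timeout. The sketch below is a simplification, not a model of Nginx internals. It assumes every client goes quiet after its burst and holds its socket until the timeout fires, and the 80 connections per second used in the demo is an invented figure.

```python
def idle_keepalive_slots(new_conns_per_sec: float,
                         keepalive_timeout_s: float) -> float:
    """Steady-state idle sockets if every client goes quiet after its burst."""
    return new_conns_per_sec * keepalive_timeout_s


def headroom(total_slots: int, new_conns_per_sec: float,
             keepalive_timeout_s: float) -> float:
    """Slots left for new accepts; negative means the pool is oversubscribed."""
    return total_slots - idle_keepalive_slots(new_conns_per_sec,
                                              keepalive_timeout_s)


if __name__ == "__main__":
    slots = 4 * 1024  # worker_processes x worker_connections
    for timeout in (75, 15):
        print(f"keepalive_timeout {timeout}s -> headroom "
              f"{headroom(slots, 80, timeout):,.0f} slots")
```

At the assumed arrival rate, a 75-second timeout oversubscribes a 4,096-slot pool by 1,904 slots, while a 15-second timeout leaves 2,896 slots free.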

Server Concurrency Architecture Matrix

The following table maps the operative concurrency configuration parameters across Nginx, Apache (Prefork and Event MPM), and LiteSpeed. These are the directives that define the effective connection ceiling on each platform. Understanding their interaction with RAM, file descriptor limits, and PHP-FPM worker counts is prerequisite to setting any of them correctly — each parameter is a constraint in a system of interdependent limits, not an independent dial.

| Parameter | Nginx | Apache Prefork | Apache Event / LiteSpeed |
|---|---|---|---|
| Concurrency Model | Async event loop (epoll), non-blocking | 1 process per connection, blocking | Async threaded (Event) / async event (LSWS) |
| Max Connections Directive | worker_processes × worker_connections | MaxRequestWorkers (150 in common distro configs; compiled-in default 256) | MaxRequestWorkers / MaxConnections |
| RAM Per Connection | ~0 marginal (socket buffer only, ~4–8KB) | 8–25MB per process (heap + stack) | ~1–2KB per thread (Event) / ~4KB (LSWS) |
| Keepalive Timeout | keepalive_timeout 15s; | KeepAliveTimeout 5 (lower default) | KeepAliveTimeout / LSWS: keepAliveTimeout |
| Keepalive Requests Cap | keepalive_requests 100; | MaxKeepAliveRequests 100 | MaxKeepAliveRequests / LSWS: maxKeepAliveReq (Admin Panel) |
| OS File Descriptor Ceiling | worker_rlimit_nofile 65535; | LimitNOFILE in systemd unit | Controlled via ulimit -n / LSWS config |
| PHP Integration | FastCGI via php-fpm (separate process pool) | mod_php (in-process) or mod_proxy_fcgi | Native LSAPI (lowest latency PHP integration) |
| Traffic Spike Behaviour | Queues connections in backlog; degrades gracefully | Hits MaxRequestWorkers ceiling; excess connections wait in the ListenBacklog queue and time out once it overflows | Event: similar to Nginx; LSWS: auto-scales threads |
| Recommended Starting Config | worker_processes auto; worker_connections 4096; | MaxRequestWorkers = RAM_MB / avg_process_MB | Event: MaxRequestWorkers 400–800 with thread tuning |

// NODE 009 — TOOL INTEGRATION

WooCommerce PHP Worker Calculator

This tool is required here because worker_connections in Nginx and MaxRequestWorkers in Apache do not operate in isolation — they are the upstream TCP layer sitting above a PHP-FPM worker pool whose size is independently bounded by RAM. Setting worker_connections 8192 in Nginx while your PHP-FPM pool has only 10 workers means Nginx will accept 8,192 TCP connections but immediately queue all but 10 of them at the FastCGI layer, creating backpressure that manifests as upstream timeout errors in the Nginx error log. The web server concurrency ceiling and the PHP worker ceiling must be calibrated together as a stack. Use this calculator to derive the correct pm.max_children value for your WooCommerce install’s RAM footprint per worker, then size your worker_connections accordingly so the Nginx layer never accepts more simultaneous PHP-bound requests than the FPM pool can service without queueing.
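The coupling between the two layers reduces to two small calculations: how many FPM workers the RAM budget allows, and how many PHP-bound requests are left waiting once all of them are busy. The following is a minimal sketch with hypothetical function names, not the calculator itself; the 2,048 MB budget and 64 MB per worker in the demo are assumed values.

```python
def pm_max_children(php_ram_budget_mb: int, avg_worker_mb: int) -> int:
    """Largest FPM pool that fits in the RAM set aside for PHP."""
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    return php_ram_budget_mb // avg_worker_mb


def queued_php_requests(concurrent_php_requests: int, max_children: int) -> int:
    """Requests left waiting at the FastCGI layer once every worker is busy."""
    return max(0, concurrent_php_requests - max_children)


if __name__ == "__main__":
    print("pm.max_children:", pm_max_children(2048, 64))
    # The scenario from the text: 8,192 PHP-bound connections, 10 FPM workers.
    print("queued:", queued_php_requests(8192, 10))
```

For the scenario described above, 8,182 of the 8,192 accepted requests would sit in the queue, which is where the upstream timeout errors come from.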


Nginx Concurrency Configuration Reference

The Nginx configuration surface for concurrency control is compact but interdependent. Every directive in the events block and the http keepalive context must be understood as part of a single capacity equation: adjusting one without recalculating its dependents produces unpredictable results under load. The multi_accept on directive, for instance, instructs each worker to accept all queued connections in a single accept() loop iteration rather than one per event cycle, which increases burst absorption capacity at the cost of slightly higher per-spike CPU usage. The use epoll directive explicitly selects Linux’s scalable I/O event notification interface instead of the older and less efficient select() and poll() methods. Nginx normally picks the most efficient method available on the platform by itself, so on Linux the directive mainly documents intent and guards against an unexpected fallback.

The keepalive_timeout and keepalive_requests directives in the http block govern how long and how many times an idle persistent connection will be tolerated. Setting keepalive_timeout 0 disables keepalive entirely — appropriate only for very high-traffic public endpoints where connection slots are perpetually scarce and client reuse patterns are low-value (e.g., CDN origin pulls, ad traffic ingestion). For WordPress sites with returning authenticated users or WooCommerce checkout flows, a timeout of 10–20 seconds and a request cap of 100–200 is the correct balance: long enough to serve a multi-page session over a single TCP connection, short enough to reclaim the slot before the next traffic burst arrives.

# ── /etc/nginx/nginx.conf ───────────────────────────────────────────────────
worker_processes auto;              # 1 worker per CPU core
worker_rlimit_nofile 65535;         # OS file descriptor ceiling per worker

events {
    worker_connections 4096;        # Connections per worker; total = workers × 4096
    use epoll;                      # Linux epoll; Nginx normally selects it automatically
    multi_accept on;                # Accept all queued conns per event cycle
}

http {
    # ── KEEPALIVE TUNING ───────────────────────────────────────────────────
    keepalive_timeout 15;           # Idle keepalive socket TTL in seconds
    keepalive_requests 200;         # Max requests per persistent connection
    # keepalive_timeout 0 disables keepalive (for pure CDN/ad traffic endpoints only)

    # ── CONNECTION QUEUE & TIMEOUT ─────────────────────────────────────────
    client_header_timeout 10s;      # Abort if client header not sent in 10s
    client_body_timeout 10s;        # Abort if client body not sent in 10s
    send_timeout 10s;               # Abort if client stops reading response
    reset_timedout_connection on;   # Immediately free RAM on timed-out conn

    # ── UPSTREAM FASTCGI (PHP-FPM) ─────────────────────────────────────────
    upstream php_fpm {
        server unix:/run/php/php8.2-fpm.sock;
        keepalive 64;               # Persistent upstream conns to FPM pool
    }

    server {
        # ── BACKLOG TUNING (OS-level socket queue) ─────────────────────────
        listen 443 ssl http2 backlog=4096;
        # backlog: max pending TCP connections before kernel starts refusing SYNs
        # Default: 511; raise to match worker_connections under spike conditions
        # The kernel caps the effective value at net.core.somaxconn
    }
}

# ── /etc/security/limits.conf (or systemd override) ────────────────────────
# www-data soft nofile 65535
# www-data hard nofile 65535
# Required: OS must permit the file descriptor count that worker_rlimit_nofile sets

# ── VERIFY ACTIVE LIMITS ───────────────────────────────────────────────────
# cat /proc/$(pgrep -o nginx)/limits | grep 'open files'
# ss -s                                 # Total TCP connection state summary
# nginx -T | grep worker_connections    # Confirm running config value

// SCHEMA 02 — KEEPALIVE SLOT EXHAUSTION UNDER TRAFFIC SPIKE LIVE RENDER
Figure: Nginx keepalive_timeout slot exhaustion, showing how idle keepalive connections block new client accepts across a traffic spike window (phase 1: spike arrival; phase 2: pool fills; phase 3: exhaustion; phase 4: outcome).

Scenario A (keepalive_timeout 75s, default high): with worker_connections at 1024, idle keepalive sockets from previous clients are held for the full 75s TTL and not reclaimed. The pool fills, the worker_connections ceiling is hit, new connections queue in the kernel backlog, and once the backlog is exhausted they are refused or reset. Slot reclaim: ~75s. Spike absorption: low.

Scenario B (keepalive_timeout 15s, keepalive_requests 200): idle slots are reclaimed within about 15s of last use, leaving free headroom for the burst, and new connections are accepted. Slot reclaim: ~15s. Spike absorption: high.

Note: at keepalive_requests 1000 (default), one connection can hold a slot for 1,000 requests; cap it at 100–200 to bound the monopoly risk.

With keepalive_timeout 75s (the Nginx default, often copied unchanged), idle persistent connections occupy worker_connections slots for over a minute after their last request. Under a traffic spike, these unreleased slots fill the pool, forcing new client connections into the kernel backlog; once the backlog is also full, the kernel drops or resets further connection attempts, which clients see as timeouts or connection refused errors. Reducing the timeout to 15 seconds lets slots be reclaimed within the same window as the spike, maintaining headroom for new connection accepts and greatly reducing the risk of slot exhaustion without sacrificing session reuse benefits for active users.

// NODE 015 — TOOL INTEGRATION

Ad Traffic Cache Bypass Calculator

This tool is required here because paid advertising traffic — from Google Ads, Meta, or programmatic networks — generates a distinct and highly predictable concurrency pattern that invalidates generic worker_connections sizing assumptions. Ad traffic arrives in concentrated burst windows synchronized to campaign scheduling, often saturating your connection pool within seconds of a campaign going live. Crucially, ad landing page traffic from UTM-tagged URLs frequently bypasses the full-page cache (due to query string parameters triggering a cache-miss condition in Nginx FastCGI cache or WP Rocket), meaning every ad click becomes a live PHP-FPM execution rather than a served cache file — multiplying the per-connection PHP worker cost by the cache bypass ratio. Use this calculator to model the actual PHP-bound concurrency load generated by your ad traffic volume and cache bypass rate, then derive the worker_connections and PHP-FPM pm.max_children values that can absorb your campaign burst without dropping connections or queuing PHP requests into timeout.
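The load such a campaign places on PHP can be approximated as the uncached arrival rate multiplied by the average PHP execution time. The sketch below is a simplified steady-state estimate with hypothetical names, not the calculator's actual logic; the 1.5 safety factor for burstiness and all figures in the demo are assumptions.

```python
import math


def php_bound_concurrency(clicks_per_sec: float, cache_bypass_ratio: float,
                          avg_php_seconds: float) -> float:
    """Average PHP requests in flight: uncached arrival rate x service time."""
    return clicks_per_sec * cache_bypass_ratio * avg_php_seconds


def workers_needed(clicks_per_sec: float, cache_bypass_ratio: float,
                   avg_php_seconds: float, safety_factor: float = 1.5) -> int:
    """Round up after applying an assumed safety margin for burstiness."""
    in_flight = php_bound_concurrency(clicks_per_sec, cache_bypass_ratio,
                                      avg_php_seconds)
    return math.ceil(in_flight * safety_factor)


if __name__ == "__main__":
    # Assumed campaign: 40 clicks/s, 75% cache bypass, 0.5 s per PHP request.
    print("in flight:", php_bound_concurrency(40, 0.75, 0.5))
    print("pm.max_children target:", workers_needed(40, 0.75, 0.5))
```

In this assumed campaign, about 15 PHP requests are in flight on average, so a pool of 23 workers covers the load with the chosen margin. A lower bypass ratio shrinks the requirement proportionally, which is why fixing query-string cache misses is often cheaper than adding workers.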


Apache MPM & LiteSpeed Concurrency Tuning

Apache’s concurrency ceiling is controlled by the active Multi-Processing Module (MPM). The Prefork MPM — the default on many cPanel and legacy shared hosting stacks — uses the MaxRequestWorkers directive as its hard connection ceiling. The correct value is not a round number chosen by intuition; it is a calculated result: divide available RAM (total server RAM minus OS + MySQL overhead) by the average Apache process size in megabytes, measured via ps aux --sort=-%mem | grep apache2. A server with 2GB RAM reserved for Apache and an average process size of 20MB yields a maximum of 100 simultaneous connections before OOM risk. Exceeding this number does not improve throughput — it causes memory pressure that degrades performance for all concurrent requests and ultimately triggers the OOM killer.
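The Prefork sizing rule above is a single integer division. A minimal sketch follows; the function name is hypothetical and the 4,000 MB total with 2,000 MB reserved for the OS and MySQL are assumed figures, chosen to reproduce the 2GB example in the text.

```python
def max_request_workers(total_ram_mb: int, reserved_mb: int,
                        avg_process_mb: int) -> int:
    """Prefork ceiling: RAM left for Apache divided by average process size."""
    available = total_ram_mb - reserved_mb
    if available <= 0 or avg_process_mb <= 0:
        raise ValueError("no RAM left for Apache or invalid process size")
    return available // avg_process_mb


if __name__ == "__main__":
    # 2,000 MB left for Apache at 20 MB per process.
    print("MaxRequestWorkers:", max_request_workers(4000, 2000, 20))
```

The result is 100, the same ceiling as in the worked example. The average process size should be re-measured after plugin or PHP version changes, since it shifts the result directly.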

The Apache Event MPM is the architectural bridge between Prefork’s process model and Nginx’s event loop. It assigns one thread to manage multiple keepalive connections asynchronously, reserving full process/thread weight only for requests that are actively being served. This dramatically reduces RAM consumption for keepalive-heavy workloads and is the recommended MPM configuration for any Apache server that cannot be migrated to Nginx or LiteSpeed. Its key tuning directives are ThreadsPerChild (threads per child process, default 25), MaxRequestWorkers (total thread ceiling), and AsyncRequestWorkerFactor (the multiplier controlling how many idle keepalive connections each thread manages; default 2, safely raisable to 4 on modern hardware).

LiteSpeed Web Server (LSWS) uses a native asynchronous event architecture with its own LSAPI PHP handler that bypasses FastCGI overhead entirely, achieving the lowest per-request PHP execution latency of the three platforms. Its concurrency tuning is managed primarily through the LiteSpeed Admin Panel’s Server → Tuning → Max Connections and PHP LSAPI worker count settings, rather than flat configuration files.
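For the Event MPM directives described above, the Apache documentation gives the absolute connection ceiling, reached when all worker threads are idle, as (AsyncRequestWorkerFactor + 1) × MaxRequestWorkers. The sketch below applies that formula and also derives how many child processes a given thread ceiling needs. The function names are hypothetical and the values in the demo are examples, not recommendations.

```python
def event_mpm_max_connections(max_request_workers: int,
                              async_request_worker_factor: float = 2) -> int:
    """Connection ceiling when every worker thread is idle."""
    return int((async_request_worker_factor + 1) * max_request_workers)


def child_processes_needed(max_request_workers: int,
                           threads_per_child: int = 25) -> int:
    """Child processes required to host the configured thread ceiling."""
    return -(-max_request_workers // threads_per_child)  # ceiling division


if __name__ == "__main__":
    print("ceiling at factor 2:", event_mpm_max_connections(400))
    print("ceiling at factor 4:", event_mpm_max_connections(400, 4))
    print("child processes:", child_processes_needed(400))
```

With MaxRequestWorkers 400, the ceiling is 1,200 connections at the default factor and 2,000 at a factor of 4, spread over 16 child processes of 25 threads each. The effective ceiling falls as threads become busy, so this is an upper bound rather than a sustained capacity.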

Takeaway

The web server concurrency limit is the first gate through which every incoming TCP connection must pass. Misconfiguring it (by leaving worker_connections at the 1024 found in most stock configs, by failing to set worker_rlimit_nofile high enough to match it, or by allowing keepalive timeouts to hold slots open for 75 seconds during a burst event) means the server begins refusing legitimate clients before CPU or RAM resources are anywhere near exhausted. This class of production failure is commonly misdiagnosed as a hardware capacity problem when it is a configuration problem solvable without spending a cent on infrastructure. Profile your active connection count under load with ss -s, cross-reference it against worker_connections × worker_processes, and validate your keepalive slot reclaim rate against your traffic spike duration before concluding that vertical scaling is the answer.

For WooCommerce and ad-traffic-heavy WordPress sites, the critical insight is that the web server connection layer and the PHP-FPM worker layer are a coupled system — tuning one without the other creates a mismatch that manifests as upstream timeout errors in Nginx logs or 502/504 responses under load. The correct tuning sequence is always: calculate PHP-FPM pm.max_children from RAM budget first (NODE 009), then size worker_connections to match or slightly exceed the FPM pool capacity, then model ad traffic cache bypass load (NODE 015) to validate that the PHP-bound concurrency ceiling is sufficient for campaign burst scenarios.

// DIAGNOSTIC GATEWAY — LESSON 2.7

A 4-core Nginx server has worker_processes 4 and worker_connections 1024. During a flash sale, ss -s shows 3,800 established TCP connections and the Nginx error log reports “worker_connections are not enough”. CPU is at 22% and RAM has 3GB free. What is the correct remediation sequence?