How Many Open Connections Can an App Have? Scaling Chat Servers with Load Balancers and Horizontal Scaling

March 4, 2025

Scaling WebSockets Load Balancing Horizontal Scaling Chat Apps AWS

Here’s the article formatted in MDX with the requested metadata:

How Many Open Connections Can an App Have? Scaling Chat Servers with Load Balancers and Horizontal Scaling

Building large-scale applications such as chat servers, gaming platforms, or real-time collaboration tools presents a major challenge: handling millions of concurrent connections efficiently. Every connection consumes system resources, and at a certain point, a single server reaches its limit. This article explores:

  • How many open connections a server can handle
  • File descriptor limitations and OS constraints
  • Scaling using load balancers and horizontal scaling
  • How large-scale apps like WhatsApp or Discord manage millions of users

1. Understanding Open Connections & System Limits

When a server handles client connections (e.g., WebSockets, TCP, or HTTP keep-alive), each open connection consumes a file descriptor (FD). The OS uses file descriptors to track open files, network sockets, and pipes.

File Descriptor Limits

Operating systems impose a maximum number of open file descriptors per process:

| OS | Default Soft Limit (ulimit -n) | Hard Limit | |---------|---------------------------------|------------| | Linux | 1024 | 65,535+ (with tuning) | | macOS | 256–1024 | 16,384 (modifiable) | | Windows | Uses SOCKET handles, different limits |

  • Soft Limit: The default max number of open connections per process.
  • Hard Limit: The absolute max (can be increased by the admin).

Example: Connection Limits Per Server

  • If a chat server has a 1024-file descriptor limit, it can handle only 1024 active WebSocket connections.
  • Increasing the limit (ulimit -n 100000) allows 100K+ connections per server, but memory, CPU, and kernel settings also matter.

How Many Users Can a Server Handle?

Each WebSocket connection consumes 50KB–100KB of memory. On a server with 16GB RAM, assuming:

  • 50KB per connection → 320K users max before running out of memory.
  • 100KB per connection → 160K users max.

Since a single server has limits, horizontal scaling and load balancing are required to support millions of connections.


2. Why Chat Apps Require Many Open Connections

Unlike traditional HTTP-based apps where connections are short-lived, chat apps keep connections open indefinitely using WebSockets or TCP sockets.

Challenges in Managing Connections

  1. File Descriptor Exhaustion: A single server has a limit on open sockets.
  2. Memory & CPU Overhead: Each connection consumes RAM and processing power.
  3. Single Point of Failure: If one server manages all connections, it becomes a bottleneck and a failure risk.

Solution: Load Balancing & Horizontal Scaling

To overcome these challenges, chat applications distribute load across multiple servers using:

  • Load balancers to distribute WebSocket connections.
  • Horizontal scaling (adding more servers when traffic increases).
  • Optimized OS settings (sysctl tuning) for better connection handling.

3. Using Load Balancers for Large-Scale WebSockets

A load balancer distributes incoming WebSocket or TCP connections across multiple backend servers. This ensures that:

  • No single server is overloaded.
  • Connections are evenly spread across available resources.
  • The system can scale dynamically by adding more servers.

Types of Load Balancers

1. TCP Load Balancers (Layer 4)

  • Routes traffic based on IP/port.
  • Does not inspect HTTP/WebSocket headers.
  • Examples: HAProxy, AWS NLB, Nginx (stream module).

2. HTTP Load Balancers (Layer 7)

  • Routes traffic based on WebSocket handshake (ws:// or wss://).
  • Can terminate and re-establish WebSocket connections.
  • Examples: Nginx (HTTP mode), AWS ALB, Traefik.

Example: Load Balancing WebSockets with Nginx

This example shows how Nginx can distribute WebSocket connections across multiple chat servers:

http {
    upstream websocket_servers {
        server chatserver1:3000;
        server chatserver2:3000;
    }

    server {
        listen 80;

        location /ws/ {
            proxy_pass http://websocket_servers;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "Upgrade";
        }
    }
}
  • Incoming WebSocket connections (/ws/) are routed to multiple backend servers.
  • If one server is overwhelmed, new connections are sent to another.

4. Horizontal Scaling for Millions of Connections

Even with load balancing, a single machine can still only handle a finite number of connections. Horizontal scaling helps by distributing traffic across multiple machines.

How Horizontal Scaling Works

  1. Multiple chat servers run WebSocket services.
  2. Load balancers distribute incoming connections.
  3. More servers are added dynamically when needed.

5. Databases & Message Queues in Chat Apps

Besides handling connections, messages must be stored and delivered efficiently.

Databases for Chat Apps

  1. Relational Databases (MySQL, PostgreSQL) – Good for structured user data.
  2. NoSQL Databases (MongoDB, Cassandra, DynamoDB) – Scale better for chat message storage.

Message Queues for Real-Time Delivery

To ensure real-time message delivery, chat apps use message queues:

  • Kafka/RabbitMQ: Handles real-time chat events.
  • Redis Pub/Sub: Synchronizes WebSocket servers.

6. How WhatsApp & Discord Scale to Millions of Users

Apps like WhatsApp and Discord need to scale globally to support millions of concurrent users.

WhatsApp (Erlang-Based Architecture)

  • Uses Erlang, which handles millions of lightweight threads efficiently.
  • Shards connections across data centers to distribute the load.

Discord (Massive WebSocket Scaling)

  • Runs thousands of WebSocket servers across multiple data centers.
  • Uses Redis & Kafka for real-time event processing.
  • Deploys AWS/GCP for auto-scaling WebSocket clusters.

7. Summary: Scaling Chat Apps for Millions of Users

File Descriptor Limits – A single machine has an OS-enforced limit on open connections.
Load Balancers – Distribute WebSocket connections across multiple backend servers.
Horizontal Scaling – More servers = more connections handled.
Message Queues – Synchronize messages across WebSocket servers (Redis, Kafka).
Database Scaling – Store chat history efficiently using NoSQL or sharded databases.

Final Thoughts

Handling millions of concurrent users requires combining all these strategies. Load balancing, horizontal scaling, database optimization, and real-time synchronization ensure that large-scale chat applications remain fast, reliable, and scalable.

🚀 If you're building a chat app, start with scalability in mind from Day 1!