How Many Open Connections Can an App Have? Scaling Chat Servers with Load Balancers and Horizontal Scaling
Building large-scale applications such as chat servers, gaming platforms, or real-time collaboration tools presents a major challenge: handling millions of concurrent connections efficiently. Every connection consumes system resources, and at a certain point, a single server reaches its limit. This article explores:
- How many open connections a server can handle
- File descriptor limitations and OS constraints
- Scaling using load balancers and horizontal scaling
- How large-scale apps like WhatsApp or Discord manage millions of users
1. Understanding Open Connections & System Limits
When a server handles client connections (e.g., WebSockets, TCP, or HTTP keep-alive), each open connection consumes a file descriptor (FD). The OS uses file descriptors to track open files, network sockets, and pipes.
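To make the link between sockets and file descriptors concrete, here is a minimal Python sketch. The helper name `open_socket_pair_fds` is made up for illustration. It opens a connected pair of sockets and reports the integer descriptors (or, on Windows, socket handles) the OS assigned to them.

```python
import socket


def open_socket_pair_fds() -> tuple[int, int]:
    """Open a connected socket pair and return the descriptors the OS assigned."""
    a, b = socket.socketpair()
    try:
        # fileno() exposes the descriptor number that backs each socket.
        return a.fileno(), b.fileno()
    finally:
        # Closing the sockets releases both descriptors back to the OS.
        a.close()
        b.close()


fd_a, fd_b = open_socket_pair_fds()
print(f"socket descriptors: {fd_a}, {fd_b}")
```

Every client connection a server accepts holds one such descriptor until it is closed, which is why the per-process descriptor limit caps the number of simultaneous connections.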
File Descriptor Limits
Operating systems impose a maximum number of open file descriptors per process:
| OS | Default Soft Limit (ulimit -n) | Hard Limit |
|---------|---------------------------------|------------|
| Linux | 1024 | 65,535+ (with tuning) |
| macOS | 256–1024 | 16,384 (modifiable) |
| Windows | Uses SOCKET handles, different limits | |
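- Soft Limit: The limit on open file descriptors that the kernel currently enforces for a process. The process can raise it on its own, up to the hard limit.
- Hard Limit: The ceiling for the soft limit. Only a privileged user (root) can raise it.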
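As a rough sketch, a process can inspect and raise its own descriptor limit with Python's standard `resource` module (available on Unix-like systems only). The helper names `nofile_limits` and `raise_soft_limit` are hypothetical. The sketch assumes the process is unprivileged, so it never asks for more than the hard limit.

```python
import resource  # standard library, Unix-like systems only


def nofile_limits() -> tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target: int) -> int:
    """Raise the soft limit toward `target`, capped at the hard limit.

    Never lowers the current soft limit. Returns the soft limit now in effect.
    """
    soft, hard = nofile_limits()
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # an unprivileged process cannot exceed hard
    new_soft = max(soft, target)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


soft, hard = nofile_limits()
print(f"soft={soft} hard={hard}")
```

A server could call `raise_soft_limit(100_000)` at startup. If the hard limit is lower than the target, the call settles for the hard limit, and an administrator has to raise that separately.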
Example: Connection Limits Per Server
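- If a chat server has a 1024-file descriptor limit, it can hold somewhat fewer than 1024 active WebSocket connections, because the listening socket, standard streams, log files, and database connections also use descriptors.
- Increasing the limit (ulimit -n 100000) allows 100K+ connections per server, provided the hard limit permits it; memory, CPU, and kernel settings also matter.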
How Many Users Can a Server Handle?
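As a rough working figure, assume each WebSocket connection consumes 50KB–100KB of memory (the real number depends on buffer sizes and per-user application state). On a server with 16GB RAM, that gives a theoretical upper bound, ignoring the memory the OS and the application itself need:
- 50KB per connection → 16GB / 50KB = 320K users max before running out of memory.
- 100KB per connection → 16GB / 100KB = 160K users max.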
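The arithmetic behind these estimates can be written out as a small sketch. The function name and the `reserved_bytes` parameter are assumptions added for illustration. Decimal units (1GB = 10^9 bytes, 1KB = 10^3 bytes) are used so the results match the figures above.

```python
GB = 10**9  # decimal units, matching the estimates in the text
KB = 10**3


def max_connections_by_memory(ram_bytes: int,
                              bytes_per_connection: int,
                              reserved_bytes: int = 0) -> int:
    """Upper bound on connections when memory is the only constraint.

    `reserved_bytes` is memory set aside for the OS and the application
    itself (an assumption; the estimates in the text reserve nothing).
    """
    usable = ram_bytes - reserved_bytes
    return usable // bytes_per_connection


print(max_connections_by_memory(16 * GB, 50 * KB))   # 320000
print(max_connections_by_memory(16 * GB, 100 * KB))  # 160000
# Reserving a hypothetical 4GB for the OS and app lowers the ceiling:
print(max_connections_by_memory(16 * GB, 50 * KB, reserved_bytes=4 * GB))  # 240000
```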
Since a single server has limits, horizontal scaling and load balancing are required to support millions of connections.
2. Why Chat Apps Require Many Open Connections
Unlike traditional HTTP-based apps where connections are short-lived, chat apps keep connections open indefinitely using WebSockets or TCP sockets.
Challenges in Managing Connections
- File Descriptor Exhaustion: A single server has a limit on open sockets.
- Memory & CPU Overhead: Each connection consumes RAM and processing power.
- Single Point of Failure: If one server manages all connections, it becomes a bottleneck and a failure risk.
Solution: Load Balancing & Horizontal Scaling
To overcome these challenges, chat applications distribute load across multiple servers using:
- Load balancers to distribute WebSocket connections.
- Horizontal scaling (adding more servers when traffic increases).
- Optimized OS settings (sysctl tuning) for better connection handling.
3. Using Load Balancers for Large-Scale WebSockets
A load balancer distributes incoming WebSocket or TCP connections across multiple backend servers. This ensures that:
- No single server is overloaded.
- Connections are evenly spread across available resources.
- The system can scale dynamically by adding more servers.
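One common way to meet these goals is a least-connections policy: each new connection goes to the backend that currently holds the fewest. The sketch below is an in-memory illustration only, not how any particular load balancer is implemented. The class and server names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LeastConnectionsBalancer:
    """Tracks active connections per backend and picks the least loaded one."""
    active: dict[str, int] = field(default_factory=dict)

    def add_server(self, name: str) -> None:
        # New servers start with zero connections, so they attract new traffic.
        self.active.setdefault(name, 0)

    def connect(self) -> str:
        if not self.active:
            raise RuntimeError("no backend servers registered")
        # Ties go to the server that was registered first.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def disconnect(self, server: str) -> None:
        if self.active.get(server, 0) > 0:
            self.active[server] -= 1


lb = LeastConnectionsBalancer()
lb.add_server("chatserver1")
lb.add_server("chatserver2")
first_four = [lb.connect() for _ in range(4)]
print(first_four)  # alternates between the two servers

lb.add_server("chatserver3")  # scale out: add a server at runtime
print(lb.connect())           # chatserver3, because it has no connections yet
```

Long-lived WebSocket connections make this policy attractive: a plain round-robin can leave older servers holding far more connections than a newly added one.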
Types of Load Balancers
1. TCP Load Balancers (Layer 4)
- Routes traffic based on IP/port.
- Does not inspect HTTP/WebSocket headers.
- Examples: HAProxy, AWS NLB, Nginx (stream module).
2. HTTP Load Balancers (Layer 7)
- Routes traffic based on WebSocket handshake (ws:// or wss://).
- Can terminate and re-establish WebSocket connections.
- Examples: Nginx (HTTP mode), AWS ALB, Traefik.
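The practical difference is that a Layer 7 balancer can read the HTTP request that opens a WebSocket and route on it, while a Layer 4 balancer sees only IP addresses and ports. The following sketch shows the kind of check a Layer 7 router might make. The function names and the pool names are hypothetical, and real balancers apply many more rules.

```python
def is_websocket_upgrade(headers: dict[str, str]) -> bool:
    """Return True if the request headers ask to upgrade to WebSocket."""
    normalized = {key.lower(): value for key, value in headers.items()}
    upgrade = normalized.get("upgrade", "").strip().lower()
    # The Connection header may carry several comma-separated tokens.
    connection_tokens = {
        token.strip().lower()
        for token in normalized.get("connection", "").split(",")
    }
    return upgrade == "websocket" and "upgrade" in connection_tokens


def choose_pool(path: str, headers: dict[str, str]) -> str:
    """Pick a backend pool from the request path and headers (Layer 7 routing)."""
    if path.startswith("/ws/") and is_websocket_upgrade(headers):
        return "websocket_pool"  # hypothetical pool of chat servers
    return "web_pool"            # hypothetical pool for ordinary HTTP traffic


handshake = {"Upgrade": "websocket", "Connection": "keep-alive, Upgrade"}
print(choose_pool("/ws/chat", handshake))  # websocket_pool
print(choose_pool("/index.html", {}))      # web_pool
```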
Example: Load Balancing WebSockets with Nginx
This example shows how Nginx can distribute WebSocket connections across multiple chat servers:
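# Nginx requires an events block; the defaults are enough for this example.
events {}

http {
    upstream websocket_servers {
        server chatserver1:3000;
        server chatserver2:3000;
    }

    server {
        listen 80;

        location /ws/ {
            proxy_pass http://websocket_servers;
            # The WebSocket upgrade needs HTTP/1.1 and these two headers.
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "Upgrade";
        }
    }
}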
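- Incoming WebSocket connections (/ws/) are routed to multiple backend servers.
- By default, Nginx assigns new connections round-robin. Adding the least_conn directive to the upstream block sends each new connection to the server with the fewest active connections, which keeps a busy server from being overwhelmed.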
4. Horizontal Scaling for Millions of Connections
Even with load balancing, a single machine can still only handle a finite number of connections. Horizontal scaling helps by distributing traffic across multiple machines.
How Horizontal Scaling Works
- Multiple chat servers run WebSocket services.
- Load balancers distribute incoming connections.
- More servers are added dynamically when needed.
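A back-of-the-envelope sizing calculation shows how many servers such a setup needs. The sketch below assumes a per-server capacity of 160K connections (the 100KB-per-connection estimate from earlier) and a 25% headroom for traffic spikes and failover. The headroom figure and the function name are assumptions for illustration.

```python
def servers_needed(target_connections: int,
                   per_server_capacity: int,
                   headroom_percent: int = 25) -> int:
    """Number of servers required, keeping `headroom_percent` of each one free."""
    usable = per_server_capacity * (100 - headroom_percent) // 100
    if usable <= 0:
        raise ValueError("headroom leaves no usable capacity")
    # Integer ceiling division: round up so no connection is left unserved.
    return (target_connections + usable - 1) // usable


print(servers_needed(1_000_000, 160_000))                      # 9
print(servers_needed(1_000_000, 160_000, headroom_percent=0))  # 7
```

Running every server at its theoretical limit leaves nowhere for connections to go when one machine fails, which is why the headroom matters more than the exact per-server number.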
5. Databases & Message Queues in Chat Apps
Besides handling connections, messages must be stored and delivered efficiently.
Databases for Chat Apps
- Relational Databases (MySQL, PostgreSQL) – Good for structured user data.
- NoSQL Databases (MongoDB, Cassandra, DynamoDB) – Scale better for chat message storage.
Message Queues for Real-Time Delivery
To ensure real-time message delivery, chat apps use message queues:
- Kafka/RabbitMQ: Handles real-time chat events.
- Redis Pub/Sub: Synchronizes WebSocket servers.
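To illustrate why WebSocket servers need synchronizing, consider two users in the same room who are connected to different servers. The sketch below uses a small in-memory broker as a stand-in for a pub/sub system such as Redis; it does not use the Redis client API. All class names and the `room:general` channel are hypothetical.

```python
from typing import Callable


class InMemoryBroker:
    """In-process stand-in for a pub/sub broker (not the Redis API)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[str], None]]] = {}

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel: str, message: str) -> None:
        # Every subscribed server receives every message on the channel.
        for callback in self._subscribers.get(channel, []):
            callback(message)


class ChatServer:
    """One WebSocket server; it only knows its own locally connected users."""

    def __init__(self, name: str, broker: InMemoryBroker,
                 channel: str = "room:general") -> None:
        self.name = name
        self.broker = broker
        self.channel = channel
        self.inboxes: dict[str, list[str]] = {}  # user -> delivered messages
        broker.subscribe(channel, self._deliver_locally)

    def connect(self, user: str) -> None:
        self.inboxes[user] = []

    def send(self, sender: str, text: str) -> None:
        # Publish instead of delivering directly, so other servers see it too.
        self.broker.publish(self.channel, f"{sender}: {text}")

    def _deliver_locally(self, message: str) -> None:
        for inbox in self.inboxes.values():
            inbox.append(message)


broker = InMemoryBroker()
server1 = ChatServer("chatserver1", broker)
server2 = ChatServer("chatserver2", broker)
server1.connect("alice")
server2.connect("bob")
server1.send("alice", "hi")
print(server2.inboxes["bob"])  # ['alice: hi']
```

Bob receives Alice's message even though neither server holds both connections. With a real broker the publish step crosses the network, but the shape of the flow is the same.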
6. How WhatsApp & Discord Scale to Millions of Users
Apps like WhatsApp and Discord need to scale globally to support millions of concurrent users.
WhatsApp (Erlang-Based Architecture)
- Uses Erlang, whose runtime schedules millions of lightweight processes efficiently.
- Shards connections across data centers to distribute the load.
Discord (Massive WebSocket Scaling)
- Runs thousands of WebSocket servers across multiple data centers.
- Uses Redis & Kafka for real-time event processing.
- Deploys on cloud infrastructure (AWS/GCP) to auto-scale WebSocket clusters.
7. Summary: Scaling Chat Apps for Millions of Users
✅ File Descriptor Limits – A single machine has an OS-enforced limit on open connections.
✅ Load Balancers – Distribute WebSocket connections across multiple backend servers.
✅ Horizontal Scaling – More servers = more connections handled.
✅ Message Queues – Synchronize messages across WebSocket servers (Redis, Kafka).
✅ Database Scaling – Store chat history efficiently using NoSQL or sharded databases.
Final Thoughts
Handling millions of concurrent users requires combining all these strategies. Load balancing, horizontal scaling, database optimization, and real-time synchronization ensure that large-scale chat applications remain fast, reliable, and scalable.
🚀 If you're building a chat app, start with scalability in mind from Day 1!