Scaling from zero to millions of users

6 min read · Aug 28, 2023

Large systems are not built overnight; a mature system takes time. A system should be designed to meet business requirements first, and as the business evolves, we scale it to meet the demand. Over-engineering at the beginning is therefore not a good idea: every business has unique needs, growth depends on user demand, and honestly, predicting exactly what users will want at the start is uncertain.

In this article, we’ll begin with a small-scale system and gradually introduce various scenarios to explore potential challenges. As we encounter these challenges, we’ll gradually enhance the system’s complexity in response to real-world business demands. Our aim is to use these scenarios as practical examples to both identify and solve problems effectively.

Scenario 1:

Let’s say you have a coffee shop and you want an online platform to sell coffee. Since your online customer base is currently limited, your initial focus will be on creating a web application. To efficiently manage customer information, product details, and order details, a database will be essential. And to retrieve data from the database, you will need an API server. The API server will act as an interface between the web application and the database.

Server setup:

Okay, so you’ve developed a website and an API, and chosen a database. Initially, since the user base is small, you’ve housed everything on a single server. The technical details of developing these components are not within the scope of this discussion; our focus here is solely on scaling the system to handle increased user traffic.

To get your site online, you acquired a server and deployed all components onto it. The user flow operates as follows: when a user accesses your site, the web server serves the web page. Subsequently, the web page sends a request to the API server to fetch data from the database.
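The single-server flow above can be sketched in a few lines. This is a toy illustration, not the author’s actual stack: the “database” is an in-memory SQLite table, the “API” is just a function, and the product names and prices are made up.

```python
import sqlite3

# Single-server setup: web layer, API layer, and database all in one process.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.execute("INSERT INTO products VALUES (1, 'Espresso', 2.50), (2, 'Latte', 3.75)")

def api_get_products():
    """API layer: fetch product rows from the database for the web page."""
    rows = db.execute("SELECT id, name, price FROM products").fetchall()
    return [{"id": r[0], "name": r[1], "price": r[2]} for r in rows]

def render_menu():
    """Web layer: the page requests data from the API and renders it."""
    products = api_get_products()
    return "\n".join(f"{p['name']}: ${p['price']:.2f}" for p in products)

print(render_menu())
```

In a real deployment each layer would be a separate process communicating over HTTP, but the request path (web page → API → database) is the same.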

Application setup and user flow in a single server

Problem:

Since the initial user traffic is low, a single server is good enough. But let’s say that after some time your customer base grows, and you see the server struggling during peak hours: some requests are failing. Each server has a fixed amount of memory and CPU, so it has a maximum capacity for how many requests it can process at a time. The server has hit that capacity, which is why it is now struggling.
So, how would you fix that?

Solution:

In your application, there are three main components: the website, the API, and the database. Continuing to host all of them on a single server is no longer viable; it’s now crucial to distribute them across separate servers. Once this separation is implemented, the website will communicate with the API server, which will in turn interact with the database server. This distributed setup significantly improves scalability and efficiency, even during peak hours.

Separate the servers

This separation yields numerous advantages:

  1. Independent Growth: Each component can now expand independently.
  2. Resource Allocation: Previously, when housed on a single server, the components had to compete for resources (memory, CPU). If the database took a lot of CPU and memory to process a complex query, the API had to throttle, and vice versa, so performance suffered. Now, isolated on separate servers, they won’t encroach on each other’s resources.
  3. Optimized Resource Allocation: Adjusting resources (RAM, CPU) for specific components becomes feasible. The database server might demand substantial RAM/CPU for complex queries or indexing, which is less critical for the API server. This separation allows you to select servers with varying capabilities — higher RAM/CPU for the database server and more modest resources for the API server.
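The separation can be captured in a small deployment map. The hostnames here are hypothetical, but the shape shows the point: after the split, each component only knows the address of the next tier, so each tier can be resized or replaced independently.

```python
# Hypothetical deployment config after splitting the single server:
# each tier gets its own host, and each component only talks to the next tier.
CONFIG = {
    "web": {"host": "web.coffeeshop.example", "talks_to": "api"},
    "api": {"host": "api.coffeeshop.example", "talks_to": "db"},
    "db":  {"host": "db.coffeeshop.example",  "talks_to": None},
}

def downstream(component):
    """Return the host a component sends its requests to, if any."""
    target = CONFIG[component]["talks_to"]
    return CONFIG[target]["host"] if target else None
```

Changing the database server (say, to one with more RAM) now only requires updating one entry, without touching the web or API tiers.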

Scenario 2:

Again, everything was going fine until, after some time, you noticed a similar problem. This time you found that the API server is unable to process some of the requests.

Problem:

Though the scenario is similar to the previous one, this time you found that the database server’s load is steady while the API is throttling. The problem arises because some of the APIs have very complex logic, and executing that logic is time-consuming. That’s why the API server struggles to handle a high volume of requests at peak hours.

How would you fix that?

Solution:

You can fix this in two ways:

  1. Vertical scaling: Since the API server needs to perform complex logic, extending its memory and CPU might solve the problem. So if you were using 8 GB of RAM earlier, you can now choose 16 GB or more, and you can extend the processing power the same way. Your application may then return to a steady state.
    But let’s look at some of the pros and cons of this solution
    Pros:
    a. Comparatively easy to manage. (Nothing fancy needed.)
    b. Easy to adopt (In cloud you can easily extend the resource of the existing server)
    c. System is less complicated (As everything is in one place)
    Cons:
    a. Hard limit on max capacity (RAM/CPU). (You can’t put unlimited RAM and CPU in a single machine.)
    b. Single point of failure. (As this is the only server handling all requests, if it crashes or goes offline, the whole system shuts down.)
  2. Horizontal scaling: You may replicate the API server. Both servers will then process requests independently, and the load will be shared by two identical servers.
    Pros:
    a. Highly scalable.
    b. Theoretically no limit on the total number of servers. (You can add as many as you need.)
    c. Highly available. (If one server goes down, other servers will take over the responsibility.)
    Cons:
    a. A bit more difficult to configure. (There will be multiple servers to manage.)
    b. Needs another component, a Load Balancer. (To use the servers optimally.)

Load Balancer:

We can’t move on without talking about this component; it is critical for horizontal scaling. In this approach there are multiple identical servers handling API requests. But from the user’s perspective, how will the client know which server to send requests to?

You might think of putting the list of API server IP addresses in your web application, but updating that list is cumbersome. The client also won’t know which server is free to take a request. So this is not an appropriate solution.

The alternative is to introduce a new component: the Load Balancer. It will act as a middleware between the client and the API servers, and it will know the list of API servers and their status (resource utilization, health stats, average response time, etc.).

Load balancing (Horizontal scaling)

All incoming requests will flow through the Load Balancer, and since it holds information about the API servers and their operational status, it can intelligently direct each request to the appropriate server and optimize response time.
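A minimal sketch of this routing decision, assuming a least-connections strategy (pick the server with the fewest in-flight requests). Real load balancers such as NGINX or HAProxy offer this and other strategies; the server names here are made up.

```python
class LoadBalancer:
    """Toy load balancer: tracks the API servers and routes each request
    to the one with the fewest in-flight requests (least-connections)."""

    def __init__(self, servers):
        # Map server name -> number of requests it is currently handling.
        self.active = {s: 0 for s in servers}

    def route(self):
        # Pick the server with the lowest current load.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def done(self, server):
        # Called when a server finishes a request.
        self.active[server] -= 1

lb = LoadBalancer(["api-1", "api-2"])
first = lb.route()   # both idle, so the first server is chosen
second = lb.route()  # the other server now has the lower load
```

A production balancer would also weigh in health checks and response times, as the article notes, but the core idea is the same: the routing decision lives in one place, not in the client.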

The client will now communicate with the API server via the Load Balancer. Being the central communication gateway, this setup allows for the implementation of traffic control measures, such as rate limiting, at this primary point.
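Rate limiting at this gateway is often done with a token bucket. This is a simplified sketch (no clock; `refill` would be driven by a timer in practice), just to show the shape of the check the Load Balancer can apply per client.

```python
class TokenBucket:
    """Sketch of a rate limiter at the load balancer: each client gets a
    bucket of tokens, and a request is allowed only if a token remains."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tokens = capacity

    def allow(self):
        if self.tokens > 0:
            self.tokens -= 1
            return True
        return False  # over the limit: reject (e.g. HTTP 429) or queue

    def refill(self):
        # Driven by a fixed interval (e.g. once per second) in a real system.
        self.tokens = self.capacity

bucket = TokenBucket(capacity=3)
results = [bucket.allow() for _ in range(4)]  # [True, True, True, False]
```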

Another important point to note is that the API servers won’t be public anymore; only the Load Balancer will be able to communicate with them.

Advantages:
- Easy to add/remove servers without bothering clients.
- Easy to scale up/down.
- Servers can be added or removed without any downtime, because there will always be some servers available to handle requests; while a server is being added or removed, requests are handled by the other active servers.
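The zero-downtime property above can be illustrated with a small round-robin pool (a simplification: a real balancer would drain in-flight requests before dropping a server, and the server names are hypothetical).

```python
class ServerPool:
    """Sketch of zero-downtime changes: the balancer rotates over a pool
    that can grow or shrink while requests keep flowing."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0

    def add(self, server):
        self.servers.append(server)

    def remove(self, server):
        # Drop a server; the remaining servers absorb its traffic.
        self.servers.remove(server)
        self._next = 0  # reset the cursor so rotation restarts cleanly

    def route(self):
        # Round-robin: each request goes to the next server in the pool.
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        return server

pool = ServerPool(["api-1", "api-2"])
pool.add("api-3")     # scale up: no restart needed
pool.remove("api-1")  # scale down: api-2 and api-3 keep serving
```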

Fair enough??
Maybe not!! Because as user traffic grows, more complexities will arise and decisions will need to be made for different scenarios. There are a lot more parts to improve.

Think of the database. What if a single database server is unable to handle all of the queries at peak hours? How would you scale your database? Scaling a database is not like scaling an API server, because API servers are stateless but databases aren’t.

Let’s discuss that topic in another story. I’m eager to hear your feedback on this story. Your thoughts and impressions are valuable to me.


Written by Md. Anjarul Islam

Experienced Senior Software Engineer at Brainstation-23 with a strong background in JavaScript technologies. Passionate about system design and problem solving.
