This is the 1st post in a series on System Design.
“Scalability is the property of a system to handle a growing amount of work by adding resources to the system.” — Wikipedia
Software systems that can accommodate growth are referred to as scalable in software engineering. Scalability refers to how well a software system can cope with growth in some area of its operation. Operational dimensions include:
- The number of simultaneous requests a system can handle from users or external sources.
- A system’s ability to process and manage a large amount of data.
- A system’s ability to derive value from its stored data, for example through predictive analytics.
- Maintaining a stable, consistent response time as the request load increases.
Scaling up or scaling out refers to increasing a system’s capability in some dimension by increasing resources. Similarly, scaling down or scaling in refers to decreasing resources.
Internet Live Stats lets you view live internet traffic for numerous services. If you dig around, you will find some staggering statistics about Google searches, tweets, and blog posts written per day. The figures are not direct measurements but estimates derived statistically from multiple data sources.
Design Principles of Scalability
The purpose of scaling a system is to increase its capacity in some application-specific dimension. A common dimension is increasing the number of requests that a system can handle in a given time frame. This is known as the system’s throughput.
Replication and optimization are the two basic principles for scaling our systems.
The first strategy for scalability is replication.
To increase throughput, we replicate the software processing resources. Replication is easy in cloud-based software systems, where processing resources can be replicated thousands of times with the click of a mouse. It only helps where the bottleneck is, though: adding capacity to processing paths that aren’t overburdened adds unnecessary cost without improving scalability.
An application can be scaled vertically or horizontally.
Vertical scaling — To handle increased loads, a system is vertically scaled by adding more resources to its existing nodes.
Horizontal scaling — System resources can be increased by adding more servers with similar hardware.
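As a rough illustration of horizontal scaling, the sketch below spreads requests across several identical servers in round-robin order. The class name `RoundRobinBalancer` and the server names are hypothetical and chosen only for this example; a real deployment would normally rely on a managed load balancer rather than hand-written code.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hypothetical balancer that hands each request to the next server in turn."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        # cycle() repeats the server list endlessly, which gives round-robin order.
        self._next_server = cycle(self.servers)

    def route(self, request_id):
        """Return the name of the server chosen for this request."""
        return next(self._next_server)


# Three replicas of the same web server (names are made up for the example).
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.route(request_id) for request_id in range(6)]
print(assignments)
# ['web-1', 'web-2', 'web-3', 'web-1', 'web-2', 'web-3']
```

Adding a fourth server to the list increases capacity without touching the request-handling code, which is the appeal of scaling out.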
The second strategy for scalability is optimization.
We can increase our capacity without increasing our resources if we can use more efficient algorithms, add more indexes to our databases, or even rewrite our server in a faster programming language.
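To make the indexing idea concrete, here is a minimal in-memory sketch that compares a full scan with a lookup through an index. A Python dictionary stands in for a database index, and the record layout is an assumption made for illustration; real database indexes are more involved, but the trade-off is similar: extra memory and write cost in exchange for faster reads.

```python
import timeit

# Hypothetical data set: 100,000 user records.
records = [{"id": i, "name": f"user-{i}"} for i in range(100_000)]


def find_by_scan(rows, user_id):
    """Check every record until a match is found (no index)."""
    for row in rows:
        if row["id"] == user_id:
            return row
    return None


# Build the "index" once: a mapping from id to record.
index_by_id = {row["id"]: row for row in records}


def find_by_index(index, user_id):
    """Look the record up directly through the index."""
    return index.get(user_id)


# Both approaches return the same record; only the cost differs.
assert find_by_scan(records, 99_999) == find_by_index(index_by_id, 99_999)

scan_time = timeit.timeit(lambda: find_by_scan(records, 99_999), number=20)
index_time = timeit.timeit(lambda: find_by_index(index_by_id, 99_999), number=20)
print(f"scan:  {scan_time:.4f} s for 20 lookups")
print(f"index: {index_time:.6f} s for 20 lookups")
```

The exact timings depend on the machine, but the indexed lookup does not slow down as the data set grows, whereas the scan does.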
Scalability and Costs
Imagine a web-based system that can handle 1,000 concurrent requests with a mean response time of 1 second. The business now requires this system to handle 10,000 concurrent requests with the same response time. A simple load test shows how the system performs at that load without any changes.
With the projected load, we see a steady increase in mean response time to 15 seconds. Clearly, this does not meet our requirements in its current deployment configuration. The system doesn’t scale.
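The figures above can be turned into throughput estimates with Little’s Law, which states that the average number of requests in a system equals the arrival rate multiplied by the mean time each request spends in it. The sketch below assumes the system is in a steady state and that "concurrent requests" means the average number of requests in flight; the function name is invented for this example.

```python
def throughput_from_littles_law(concurrent_requests, mean_response_seconds):
    """Estimate requests per second as L / W (Little's Law, steady state assumed)."""
    if mean_response_seconds <= 0:
        raise ValueError("mean_response_seconds must be positive")
    return concurrent_requests / mean_response_seconds


baseline = throughput_from_littles_law(1_000, 1.0)     # current load
overloaded = throughput_from_littles_law(10_000, 15.0)  # load test result
target = throughput_from_littles_law(10_000, 1.0)       # business requirement

print(round(baseline, 1))    # 1000.0
print(round(overloaded, 1))  # 666.7
print(round(target, 1))      # 10000.0
```

Under these assumptions, the overloaded system completes fewer requests per second than it did at the original load, and meeting the requirement means a tenfold increase in throughput.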
Some engineering effort is required to achieve the required performance. This graph shows the system’s performance after it has been modified. With 10,000 concurrent requests, it now provides the specified response time, so the system has been scaled successfully.
However, a major question remains. How much effort and how many resources were required to achieve this performance? Perhaps it was simply a matter of running the web server on a more powerful (virtual) machine. Reprovisioning in the cloud might take no more than 30 minutes. A slightly more complex change would be to run multiple instances of the web server to increase capacity. With no code changes required, this should be a simple, low-cost configuration change for the application. Either of these would be an excellent outcome.
Scaling a system isn’t always this easy. There are many possible reasons; here are a few:
- With 10,000 concurrent requests, the database becomes less responsive and needs an upgrade.
- The web server generates a lot of content dynamically, which increases response time. Changing the code to generate content more efficiently could reduce processing time per request.
- When many requests attempt to access and update the same records simultaneously, hotspots in the database are created. Redesigning the schema and reloading the database, along with code changes to the data access layer, are required.
- The web server framework was chosen for ease of development rather than scalability. Because of the model it imposes, the code cannot be scaled to meet the requested load, and a complete rewrite is necessary. Should another framework be used? Or even another programming language?
Systems that are not intrinsically scalable may incur huge downstream costs and consume significant resources when they are expanded to meet requirements.
Hyperscale systems are software systems that scale exponentially while costs grow linearly.
Scalability Testing Characteristics
Scalability tests have distinguishing characteristics. They focus on:
- Usage of memory
- Usage of the CPU
- Usage of network bandwidth
- Load times
- Time to respond
- Requests processed
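Two of these measurements, time to respond and requests processed, can be collected with a very small harness. The sketch below is only an outline: `handle_request` is a hypothetical stand-in that sleeps for 10 ms instead of calling a real service, and the concurrency levels are arbitrary. A real test would typically use a dedicated load-testing tool and would also record memory, CPU, and network usage.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(_request_id):
    """Hypothetical request: sleep to simulate work and return the time taken."""
    start = time.perf_counter()
    time.sleep(0.01)
    return time.perf_counter() - start


def run_load_test(concurrency, total_requests):
    """Send total_requests using `concurrency` worker threads and summarise."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(handle_request, range(total_requests)))
    elapsed = time.perf_counter() - started
    return {
        "concurrency": concurrency,
        "requests": len(latencies),
        "mean_response_s": statistics.mean(latencies),
        "throughput_rps": len(latencies) / elapsed,
    }


# Repeat the same test at increasing load and compare the results.
for level in (1, 5, 10):
    print(run_load_test(concurrency=level, total_requests=50))
```

Comparing the rows shows whether response time stays stable and throughput keeps rising as the load increases, which is what a scalability test looks for.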
In general, availability and scalability are highly compatible. By replicating resources, we create multiple instances of a service, any of which can handle a request from any user. If one instance fails, the others remain available, and the system merely runs at reduced capacity.
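A minimal sketch of this behaviour, assuming a client that simply tries each replica in order, is shown below. The `Replica` class, the `send_with_failover` function, and the service names are all invented for the example.

```python
class Replica:
    """Hypothetical service instance that may or may not be healthy."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is unavailable")
        return f"{self.name} handled {request}"


def send_with_failover(replicas, request):
    """Try each replica in turn and return the first successful response."""
    for replica in replicas:
        try:
            return replica.handle(request)
        except ConnectionError:
            continue  # this instance is down, so try the next one
    raise RuntimeError("no replica available")


replicas = [Replica("svc-1", healthy=False), Replica("svc-2"), Replica("svc-3")]
response = send_with_failover(replicas, "GET /status")
print(response)  # svc-2 handled GET /status
```

The request still succeeds with one instance down; only when every replica has failed does the caller see an error.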
Scalability and availability become complicated when state is involved. Consider a database: whenever state is replicated for scalability and availability, keeping the replicas consistent becomes a concern.
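The consistency problem can be shown with a toy model of asynchronous replication, in which a read replica only sees a write after a separate replication step has run. Everything here (`Primary`, `ReadReplica`, `replicate`, the `balance` key) is a simplified assumption for illustration, not how any particular database works.

```python
class Primary:
    """Hypothetical primary node that accepts writes."""

    def __init__(self):
        self.data = {}
        self.pending = []  # writes not yet shipped to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))


class ReadReplica:
    """Hypothetical replica that serves reads from its own copy of the data."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


def replicate(primary, replica):
    """Apply all pending writes to the replica (the asynchronous step)."""
    for key, value in primary.pending:
        replica.data[key] = value
    primary.pending.clear()


primary, replica = Primary(), ReadReplica()
primary.write("balance", 100)
replicate(primary, replica)

primary.write("balance", 80)          # new write reaches the primary only
stale = replica.read("balance")       # replica has not caught up yet
replicate(primary, replica)
fresh = replica.read("balance")       # replica is consistent again

print(stale, fresh)  # 100 80
```

Between the write and the next replication step, a client reading from the replica sees an outdated value, which is the kind of inconsistency that replicated state introduces.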
An application’s ability to scale quickly and cost-effectively should be a defining characteristic of its software architecture. There are two basic ways to achieve scalability, namely by increasing system capacity, typically through replication, and by optimizing system performance.