Optimization of the e-Test project architecture

Load testing and software architecture refactoring

Challenge

After a previous positive experience working with us, the National Institute for Educational Measurements asked us to help with an interesting problem. During the implementation of the e-Test project and the initial testing with real users, it became clear that the system could not handle the required load of 10,000 concurrent users. Our goal was to identify the system’s bottleneck and propose appropriate measures.

Analysis

First, we needed to measure the real performance metrics of e-Test. Through Gatling load tests, we found that the system became unusable at around 1,000 concurrent users; at that point, the 90th percentile of server response times exceeded 5 seconds. The next step was to determine which architectural element was the bottleneck. White-box testing showed that the problem was neither the data centre’s network bandwidth nor the application server but, unsurprisingly, the database server.
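The 90th-percentile criterion mentioned above can be illustrated with a minimal sketch. It assumes the nearest-rank definition of a percentile and treats 5 seconds as the usability limit; the function names and the sample data are hypothetical and are not taken from the project's Gatling setup.

```python
import math


def percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def is_usable(samples_ms, limit_ms=5000):
    """Assumed criterion: the run passes if the 90th percentile of
    response times does not exceed limit_ms."""
    return percentile(samples_ms, 90) <= limit_ms


# Invented response times in milliseconds, for illustration only.
light_load = [100 * i for i in range(1, 11)]   # 100 ms .. 1000 ms
heavy_load = [6000] * 10                        # every request takes 6 s

light_ok = is_usable(light_load)
heavy_ok = is_usable(heavy_load)
```

A percentile is used rather than an average because a few very slow requests can hide behind a healthy-looking mean, while the 90th percentile reflects what the slowest tenth of users actually experience.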

Solution

The logical first step was to optimize the database queries and indexes, which roughly doubled the system’s performance, but that was still not enough. A more decisive change was needed at the architecture level, so we proposed adding the Redis in-memory store as a buffer zone in front of the database.
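One common way to place an in-memory store in front of a database is the cache-aside pattern, sketched below. The text does not say how Redis was wired into e-Test, so this is only an assumption about the general idea: reads go to the store first and reach the database only on a miss. `InMemoryStore` is a hypothetical stand-in that mirrors the `get` and `set(name, value, ex=...)` calls of a Redis client so the sketch runs without a server; `CachedTestRepository`, `fake_db_lookup`, and the key format are likewise invented for illustration.

```python
import json


class InMemoryStore:
    """Hypothetical stand-in for a Redis client (get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        return self._data.get(name)

    def set(self, name, value, ex=None):
        # 'ex' (expiry in seconds) is accepted but ignored in this stand-in.
        self._data[name] = value
        return True


class CachedTestRepository:
    """Cache-aside reads: try the store first, fall back to the database."""

    def __init__(self, store, load_from_db, ttl_seconds=60):
        self._store = store
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self.db_reads = 0  # counts how often the database is actually hit

    def get_test(self, test_id):
        key = f"test:{test_id}"
        cached = self._store.get(key)
        if cached is not None:
            return json.loads(cached)        # cache hit: no database access
        self.db_reads += 1                   # cache miss: query the database
        record = self._load_from_db(test_id)
        self._store.set(key, json.dumps(record), ex=self._ttl)
        return record


def fake_db_lookup(test_id):
    """Invented placeholder for a real database query."""
    return {"id": test_id, "title": "Sample test"}


repo = CachedTestRepository(InMemoryStore(), fake_db_lookup)
first = repo.get_test(7)    # served by the database, then cached
second = repo.get_test(7)   # served by the store
```

With this arrangement, repeated reads of the same record reach the database once per expiry window instead of once per request, which is why such a layer can relieve a database that has become the bottleneck.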

Benefits for the client

Load tests confirmed that, after our recommendations were implemented, the 90th-percentile response time dropped from the original 5 seconds at 1,000 users to 80 milliseconds at 10,000 concurrent users. With a relatively simple change at the architecture level, response times fell more than 60-fold (from over 5,000 ms to 80 ms) while the system served ten times as many concurrent users.