After a previous positive experience with us, the National Institute for Educational Measurements asked us to help with an interesting problem. During the implementation of the e-Test project and initial testing with real users, it turned out that the system could not handle the required load of 10000 concurrent users. Our goal was to identify the system's bottlenecks and propose appropriate measures.
First, we needed to measure the actual performance of e-Test. Using Gatling load tests, we found that the system became unusable at approximately 1000 concurrent users: the 90th percentile of server response times exceeded 5 seconds. The next step was to find out which architectural component was the bottleneck. White-box testing showed that the problem was neither the bandwidth nor the application server but (unsurprisingly) the database server.
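To make the metric concrete, here is a minimal sketch of how a 90th-percentile response time can be computed from raw samples. It is not taken from the project and it is not how Gatling computes its reports; the nearest-rank method, the function name `percentile`, and the sample values are all assumptions made for illustration. Only the 5-second threshold comes from the measurements described above.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up response times in milliseconds (illustrative, not measured data).
response_times_ms = [120, 180, 250, 300, 450, 900, 1500, 3200, 5400, 7100]

p90_ms = percentile(response_times_ms, 90)

# Threshold from the text: a 90th percentile above 5 seconds is unusable.
UNUSABLE_ABOVE_MS = 5000
is_usable = p90_ms <= UNUSABLE_ABOVE_MS
```

With these sample values, one request in ten takes longer than `p90_ms`, which is why a percentile says more about the user experience than an average does.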
The logical first step was to optimize the database queries and indexes. This doubled the system's performance, but it was still not enough. It was clear that a more radical change was needed, so we suggested placing Redis in front of the relational database as a kind of "buffer".
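The text does not say exactly how Redis was wired in, so the following is only one possible reading: a read-through cache, where repeated reads are answered from the fast store and the relational database is queried only on a miss. Everything here is hypothetical: `InMemoryStore` is a plain dictionary standing in for a Redis client so that the sketch runs without a server, and `SlowDatabase`, `get_question`, and the key format are invented for illustration.

```python
class InMemoryStore:
    """Stand-in for a Redis client (assumption: only get/set are needed)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None on a miss.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


class SlowDatabase:
    """Hypothetical relational database that counts the queries it serves."""

    def __init__(self, rows):
        self._rows = rows
        self.query_count = 0

    def load_question(self, question_id):
        self.query_count += 1
        return self._rows[question_id]


def get_question(question_id, cache, db):
    """Serve from the cache when possible; fall back to the database."""
    key = f"question:{question_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db.load_question(question_id)
    cache.set(key, value)  # later reads skip the database
    return value


db = SlowDatabase({1: "What is 2 + 2?"})
cache = InMemoryStore()

# 1000 simulated requests for the same test question.
answers = [get_question(1, cache, db) for _ in range(1000)]
```

In this sketch the database serves one query instead of 1000. A real deployment would also need expiry or invalidation of cached entries, and writes (such as submitted answers) would need their own strategy, none of which is covered here.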
Load tests showed that, by following our recommendations, the institute reduced the 90th-percentile response time from the original 5 seconds at 1000 users to 80 milliseconds at 10000 concurrent users. With a relatively simple architectural change, we were able to speed up the system several hundred times.
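One way to read the "several hundred times" figure is to combine the drop in response time with the tenfold increase in load. The short calculation below uses only the numbers quoted above; multiplying the two ratios is an assumption about how the overall speedup was meant, not a stated methodology.

```python
# Figures quoted in the text.
before_p90_ms = 5000   # 5 seconds at 1000 concurrent users
before_users = 1000
after_p90_ms = 80      # 80 milliseconds at 10000 concurrent users
after_users = 10000

latency_ratio = before_p90_ms / after_p90_ms   # how much faster each response is
load_ratio = after_users / before_users        # how many more users are served

# Assumption: overall speedup is taken as the product of both ratios.
combined_ratio = latency_ratio * load_ratio

print(f"latency: {latency_ratio}x, load: {load_ratio}x, combined: {combined_ratio}x")
```

Response times alone improved about 62-fold; only when the higher load is factored in does the figure reach the hundreds.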