Streply's mission is to develop a tool that helps developers create better, faster and safer applications. However, every application is different, and some areas, such as user behaviour analysis or performance, need to be approached individually. We faced such a case recently. A client who had already been using Streply for several months needed our help to integrate Streply more deeply into their application in order to eliminate problems that occurred in a non-standard situation.

See how we helped our client reduce load time in their app by 10 times.

Client

OPB Ariadna is the biggest independent nationwide research panel in Poland. Almost 400,000 respondents complete an average of 150,000-200,000 surveys per month. In order to ensure the highest quality of service, the speed and reliability of the survey system are crucial for the company. OPB Ariadna's development team has been using Streply for almost a year to improve the security and speed of the company's applications.

Problem

In November 2022, the company approached us with a problem it had run into while running a very large survey. Because of the large number of questions and the extensive survey script, the survey software, which normally performs well, was loading with a noticeable delay. At peak times, when most respondents were completing the survey, the questionnaire took an average of 10 seconds to load and quite often as long as 20 seconds. Speed and convenience for respondents are key factors for the company, so it asked us to use the Performance module and logs to find the source of the problem and to carry out an audit with a suggested solution.

Solution

We began our work by preparing a completely new environment. We set up a new server and placed on it a copy of the survey system dedicated to the problematic survey. This separated the survey that caused the problem from the other polls, which were working well, so problems with one survey did not affect the operation of the entire company.

Next, based on documentation from the client's software development team, we worked out where in the code to add Streply logs and performance transactions.

We carried out the work in parallel in two ways:

At each stage of the application's operation, we created a log entry that recorded what was being executed and in what order.
We attached a performance measurement point to each log entry, so we also knew where the bottlenecks were occurring.
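
The two steps above can be sketched roughly as follows. This is a minimal illustration in Python using only the standard library, not Streply's SDK. The `stage` helper, the `records` list and the stage names are hypothetical and chosen only for this example.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("survey")

# Hypothetical in-memory store of (stage name, duration in seconds),
# kept in execution order.
records = []


@contextmanager
def stage(name):
    """Log that a stage started and record how long it took."""
    logger.info("start %s", name)
    started = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - started
        records.append((name, elapsed))
        logger.info("end %s (%.4f s)", name, elapsed)


def handle_request():
    # Hypothetical stages of a single survey request; sleep stands in for work.
    with stage("load_structure"):
        time.sleep(0.01)
    with stage("render_page"):
        time.sleep(0.005)


handle_request()
```

After one request, `records` holds the stages in the order they ran together with their durations, which is the information needed to see both the execution order and the slow spots.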

Thanks to the extensive use of logs, we were able to find a problem that only showed up in extreme cases, when the survey script was very large. In some cases, when a survey was complex, the poll system loaded the survey structure from the file twice, so a large amount of data was loaded into memory twice. For smaller studies the resulting delay was minimal rather than drastic, but the overhead was still generated every time and was completely unnecessary.
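
One common way to guard against this kind of double load is to memoise the loader so the file is read once and later calls reuse the parsed result. The sketch below shows the idea; it is not the client's actual fix, which the audit does not describe in detail. The JSON file format, the `load_structure` function and the `read_count` counter are assumptions made for illustration.

```python
import json
import os
import tempfile
from functools import lru_cache

read_count = 0  # counts real file reads, only to make the effect visible


@lru_cache(maxsize=None)
def load_structure(path):
    """Read and parse the survey structure once per path; later calls reuse it."""
    global read_count
    read_count += 1
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Demo with a hypothetical survey file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    json.dump({"questions": ["q1", "q2"]}, tmp)

first = load_structure(tmp.name)   # reads and parses the file
second = load_structure(tmp.name)  # served from memory, no second read
os.remove(tmp.name)
```

Note that `lru_cache` hands back the same object on every call, so callers should treat the cached structure as read-only.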

Then, having diagnosed the source of the problem, we started to analyse the application's performance. We deployed the logging and performance points described above on both the new server and the servers already in use. Thanks to the large amount of data collected from other studies, we determined the average duration for each point and then analysed the cases that came out significantly above the average.
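
The averaging step can be illustrated with a short Python sketch. The sample data is made up, and the threshold of twice the average is an assumption standing in for "significantly above the average"; `find_slow_samples` is a hypothetical helper, not part of Streply.

```python
from collections import defaultdict
from statistics import mean


def find_slow_samples(samples, factor=2.0):
    """Return per-point averages and the samples slower than factor * average."""
    by_point = defaultdict(list)
    for point, duration in samples:
        by_point[point].append(duration)
    averages = {point: mean(values) for point, values in by_point.items()}
    slow = [(p, d) for p, d in samples if d > factor * averages[p]]
    return averages, slow


# Made-up durations in seconds, one tuple per measurement.
samples = [
    ("load_structure", 0.10),
    ("load_structure", 0.12),
    ("load_structure", 0.11),
    ("load_structure", 0.95),
    ("render_page", 0.05),
    ("render_page", 0.06),
]
averages, slow = find_slow_samples(samples)
```

With this data only the 0.95 s `load_structure` measurement is flagged, because it is the only one more than twice its point's average.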

By placing measurement points at each stage of code execution, we were able to produce a report on the places in the code that took longer to execute than they should. The system ran very fast during standard testing, but we were able to identify a few places where there was a relatively simple opportunity to optimise.

Result

Thanks to our audit, the company serving the client made two modifications to the survey system. The audit told the development team exactly which situations required their attention, in which parts of the code the problems lay, and how we suggested fixing each issue.

First of all, the problem of repeatedly loading the survey structure into memory was fixed, which significantly reduced server usage for large studies. This error was not noticeable in the case of standard surveys, but an exceptionally large poll with a lot of respondents completing the survey at the same time caused the server to slow down. Eliminating this bug only took a few hours, and the benefit it brought was significant in terms of reduced loading times and server usage.

Another modification was the use of a cache to hold information that would not be modified again. Previously, the system downloaded and generated all the data every time the survey system was called, which produced tens of thousands of completely unnecessary operations. Because the input data did not change, the result of these operations was the same every time, so there was no need to repeat them.

With the change made, the system retrieved only the data that could change. The rest of the data was stored in Redis, in the form the system expected, so it no longer had to be rebuilt on every call. Once a respondent completed the survey, the system cleared the data for that respondent so as not to take up space unnecessarily.

The survey was completed by a total of 24,000 respondents. Altogether, thanks to our audit, we were able to reduce the average loading time by more than 10 times, to 2-3 seconds per request. The survey was conducted successfully, with no slow loading problems and no errors.

PhD Tomasz Baran Poland's Research Panel
