How to Optimize Performance in a Clojure Web Application

BrunoBonacci · August 1, 2024, 11:15am

Hi,

I agree with p-himik's comment that it is crucial to identify, in production, which area of your code is slow and, if possible, which inputs make it slow.

I’ve highlighted “in production” because the long tail you see there could be quite different from the one you see on your offline test bench.

To better analyse this sort of issue I’ve developed a library called µ/log, which can be used for structured logging. In particular, there is a macro called µ/trace which provides tracing of key parts of your code.

The key difference between µ/log and other tracing libraries is that µ/log is designed to capture contextual data, for example your request parameters. This is fundamentally important for understanding which inputs produce slower responses.

Once you instrument your code with µ/trace it should be much easier to identify and reproduce the code paths which are slower than your expected SLA.
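As a rough illustration of what that instrumentation could look like, here is a minimal sketch. The namespace, `find-orders`, `handle-request` and the request keys are hypothetical stand-ins for your own code, and I use the alias `u` instead of `μ` only to make it easier to type.

```clojure
(ns example.handler
  (:require [com.brunobonacci.mulog :as u]))

;; Hypothetical slow operation standing in for real business logic:
;; the bigger the :limit, the longer it takes.
(defn find-orders [user-id limit]
  (Thread/sleep (long (* 5 limit)))
  {:user-id user-id :orders (vec (range limit))})

;; Hypothetical handler. u/trace times the body and records the
;; key/value pairs in the vector as part of the trace event, so slow
;; calls can be correlated with the inputs that caused them.
;; u/trace returns the value of its body.
(defn handle-request [{:keys [user-id limit]}]
  (u/trace ::find-orders
    [:user-id user-id, :limit limit]
    (find-orders user-id limit)))

(comment
  ;; Events go nowhere until a publisher is started; the console
  ;; publisher is the simplest one to try at the REPL.
  (def stop-publisher (u/start-publisher! {:type :console}))
  (handle-request {:user-id "u-42" :limit 10})
  ;; start-publisher! returns a function which stops the publisher.
  (stop-publisher))
```

Each trace event carries the duration (`:mulog/duration`, in nanoseconds) and the outcome (`:mulog/outcome`) alongside the pairs you supplied, so in this sketch you could look for the `:limit` values associated with the longest durations.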

Then, once you have a general idea of which key action in your system is slowing down, clj-async-profiler should provide enough detail to highlight exactly where the extra time goes.
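A minimal sketch of such a profiling session at the REPL might look like the following. `slow-action` is a hypothetical stand-in for the code path you identified, and port 8080 is an arbitrary choice. I am also assuming a recent clj-async-profiler release and a JVM started with `-Djdk.attach.allowAttachSelf`; please check the README for the details that apply to your version.

```clojure
(ns example.profiling
  (:require [clj-async-profiler.core :as prof]))

;; Hypothetical hot path: sums the squares of 0..n-1.
(defn slow-action [n]
  (reduce + (map #(* % %) (range n))))

(comment
  ;; Profile the suspect code path while exercising it repeatedly,
  ;; so that there are enough samples to build a flame graph.
  (prof/profile
    (dotimes [_ 200]
      (slow-action 100000)))

  ;; Serve the generated flame graphs so they can be browsed at
  ;; http://localhost:8080
  (prof/serve-ui 8080))
```

Ideally you would drive the profiled block with the same inputs that the traces showed to be slow, rather than with synthetic ones.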

A few years ago I wrote a few scripts to compare the performance of web servers in a specific scenario (see these benchmarks). The work was done to analyse the long tail in a production web service.
Although most of the scripts could be out of date now, I would recommend two tools for offline benchmarks:

wrk2 is by far the best tool to benchmark the long tail (up to six nines, i.e. the 99.9999th percentile).
gatling.io has really useful charts to understand how many RPS (requests per second) your web server can withstand before you need to scale out.

These tools are more useful to ensure that there is no regression over time. For example, you could add a load test directly into your CI/CD pipeline.

If you need more help or want to chat about how to interpret the benchmark results, I’ll be happy to help.

Bruno