How to debug slow start-up time?

Webdev_Tory · September 18, 2019, 7:41pm

I write single-page applications with a full Clojure stack, based on Luminus. I’ve been doing this for some time before recently encountering a problem; we couldn’t deploy our uberjar/war; it would just fail. Debugging step one, just running the jar locally, seemed to fail too – seemed. It turned out it wasn’t failing but was taking 3-7 minutes for the application to start. This is why trying to deploy it to our public server (into Wildfly) was failing: it was timing out. My attempt to fix this problem by removing most of the routes did not seem to effect my start times.

I’ve never had a program that compiles ten or twenty times faster than it actually deploys. How can I go about profiling or debugging this to find out what’s causing the slowdown?

seancorfield · September 18, 2019, 7:47pm

What exact version of Java/JDK are you running on? What exact version of Clojure are you running on?

There was a performance regression caused by a recent JVM version update that affected startup times and should be fixed by updating to Clojure 1.10.1.

It should only affect you if you have user.clj on your classpath… but if you recently updated your JVM, this could be why…

Webdev_Tory · September 18, 2019, 8:08pm

In answer to your post, I upgraded to Clojure 10.0.1 (was on 0.0), and also verified that there is no user.clj on my dev classpath.

java --version
openjdk 11.0.4 2019-07-16
OpenJDK Runtime Environment (build 11.0.4+11-suse-1.1-x8664)
OpenJDK 64-Bit Server VM (build 11.0.4+11-suse-1.1-x8664, mixed mode)

Execution results (4 minute deploy time)

13:57$ java -jar target/humhelp.jar
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.xnio.nio.NioXnio$2 (file:/home/torysa/workspace/Clojure/humhelp-clj/target/humhelp.jar) to constructor sun.nio.ch.EPollSelectorProvider()
WARNING: Please consider reporting this to the maintainers of org.xnio.nio.NioXnio$2
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
14:01

This particular application has always been a little slow to start; (mount.core/start) takes longer than on other applications, but I’m not actually sure whether that’s related here or not. The next steps I’ll follow:

check if other applications have started demonstrating this problem since upgrading my java
Start interspersing println in the code to see if I can figure out where the bottleneck is at

Webdev_Tory · September 18, 2019, 8:22pm

UPDATE: I’ve verified that this is NOT a problem on others of my projects (same JVM). I’ll start the labor of println debugging now, unless people have more civilized ways of finding chokepoints.

seancorfield · September 18, 2019, 8:51pm

You didn’t answer my question about your Clojure version.

vlaaad · September 18, 2019, 10:10pm

clj-async-profiler will make you a nice flamegraph showing where jvm spends most of the time.

Webdev_Tory · September 18, 2019, 11:03pm

I upgraded to Clojure 10.0.1

seancorfield · September 18, 2019, 11:31pm

Cool. I just wanted to eliminate the JVM/user.clj performance regression as a possibility here.

Webdev_Tory · September 18, 2019, 11:37pm

Well, at least in Dev, I’ve narrowed down to the problem. I don’t know if this holds true in prod (jar) version or not, but by systematically checking each Mount component, the whole thing loads in under 100ms except for my routing component (Reitit). This one increases the load time by 200ms for each route, even though all routes have an identical signature (they just point to the base SPA case, since the SPA will handle the rest). I will do a little more work with reitit and will see if I can learn anything on their side.

Webdev_Tory · September 19, 2019, 12:32am

I’ve made an issue with full details on the reitit repo: https://github.com/metosin/reitit/issues/314

I found that the relationship of routes to start-time also holds true for jar startup times, except that the slow startup delay is magnified.

ShiTianshu · September 19, 2019, 6:41am

How long will it take for start a repl for development.

Webdev_Tory · September 19, 2019, 7:21am

The repl itself starts in a normal amount of time. But when you do (mount.core/start), you will see the times mentioned in the issue details above. On jar execution (not repl) The start action happens automatically and, when all routes are included, takes 4-7 minutes.

ShiTianshu · September 19, 2019, 1:00pm

I think there’re two things you need to take a look. The first is how long it will take to load your main namespace. The second is how long it take to start each state.

ikitommi · September 19, 2019, 2:39pm

reitit compiles the middleware chain for each route on router creation. If some mw takes 1sec to create, having it in 100 routes takes 100sec. Usually, this is not a problem as the mw creation just costs few function calls. You can also put the mw outside of the router so they are created only once.

We’ll redefine the default mw creation lifecycle in reitit 1.0.0, so that it’s more… obvious, like with interceptors today.

Webdev_Tory · September 19, 2019, 8:07pm

The issue on reitit was solved, with details in the issue linked above. To the point of this thread regarding how to debug slow startup time, my big learning was of a benefit of a component lifecycle (in my case, provided by Mount): by systematically doing mount.core/stop and mount.core/start-without I was able to step through each portion of my application, identifying which one was slowing things down. When I found out it was routes, I was able to do this same process without ever restarting my REPL by simply commenting-out lines of my routes and doing (time (mount.core/start)) to see precisely how adding/removing a route effected my start times. Then, when the @ikitommi et al at reitit helped me find the error (in this case a bad middleware usage), I was able to immediately test things without that middleware by precisely the same process.

TL;DR: Mount proved to be invaluable for locating a run-time bug and worked with a rapid feedback cycle. Also, the Metosin folks are awesome.

seancorfield · September 20, 2019, 10:04pm

My “objection” to Mount is that it is based on global state. I prefer Component which gives you the same dependency management and start/stop functionality – and makes it easy to run up test systems with mock components or run multiple instances of a component configured different ways.

And, yes, those Metosin folks are awesome

Glad you figured it out – without needing to restart your REPL

Webdev_Tory · September 24, 2019, 4:12pm

My main reason for favoring Mount was that it came with Luminus and had a nice emphasis on fun that didn’t seem evident with Component (much in the same spirit of Reagent/Reframe over Ohm). I’d be interested in learning more about your mock components and other distinctions of Component over Mount (knowing that debate tends toward holy war sometimes).

Sean Corfield via ClojureVerse notifications@clojureverse.org writes:

seancorfield · September 24, 2019, 6:00pm

Since you can create Component instances on the fly and compose systems as needed – and start/stop them – it’s easy to construct Component-based systems that contain variants of specific subcomponents.

One example, from work, is where we test a distributed system in place by spinning up multiple Component systems that differ in the configuration they use and can then interact with each other within a single “test system”.

It’s pretty typical dependency injection stuff from the OOP world (and why DI is so useful in testing).

system · March 25, 2020, 6:00am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.