This might be more of a JVM question than a Clojure question, but I hope you won’t mind if I ask it here.
I am aware of Amdahl’s law but I don’t know how to apply it.
I am looking for some intuitions or rough rules of thumb.
I am at a company that uses Hubspot for marketing. We also use many other 3rd parties:

- Luma for events
- Calendly for leadership meetings
- Stripe for payment processing
- Swarm for lead discovery
- Mailgun for email
- plus a few others
I wrote a Clojure app that uses the APIs of those 3rd-party services to pull in all their data and store it in a central database. (I am using MongoDB, in part to cope with the many divergent 3rd-party schemas that I only partly interact with – it’s not worth my time to fully map those schemas.)
I also need to find clues about our users and then push those clues to Hubspot. For example, suppose a person has the email tim@example.com, and I find that tim@example.com also appears in our Luma, Calendly, and Stripe data. We want to gather up that data and push it to the Hubspot Contact that we maintain for tim@example.com. One question, for instance, is “Where did tim@example.com first appear in our system?” To answer it, I have to look at every date in Luma, Calendly, and Stripe where tim@example.com appears, and find the earliest one.
In other words, there are a lot of background processes, each trying to find some data about our users, so we can aggregate that data and push it to the appropriate Hubspot Contact.
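To make the “first appeared” question concrete, the logic boils down to something like this sketch. (`dates-for-email` is a hypothetical helper that queries one source’s Mongo collection and returns the `java.util.Date` values where the email appears; it is not real code from my app.)

```clojure
;; Sketch only. `dates-for-email` is a hypothetical helper that queries
;; one source's Mongo collection for an email and returns java.util.Dates.
(defn first-seen
  "Earliest date at which `email` appears across our 3rd-party sources."
  [email]
  (let [dates (mapcat #(dates-for-email % email)
                      [:luma :calendly :stripe])]
    (when (seq dates)
      (apply min-key #(.getTime ^java.util.Date %) dates))))
```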
To run things in the background, I’ve been using a thread pool, and for scheduling I rely on the at-at library:
In my core.clj, in my main function, I set up a thread pool and initiate all the background tasks. I pass the same thread pool to each task so they can use it to schedule further tasks.
At first I had everything start simultaneously, but the server began to suffer, so I now start each task at a random time after startup (a random delay of up to 5 minutes), which spreads out some of the initial load.
Most of these tasks run only once every 6 hours, and most take only 5 to 20 minutes, but when they all run at the same time they strain the server.
(let [tp (at/mk-pool)]
(log/initiate tp)
(world/initiate tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(reports/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(push/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(delete-old-data/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(algorithm/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(mailgun/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(stripe/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(luma/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(hubspot/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(supabase/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(calendly/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(user/initiate tp) tp)
(at/at (+ (long (rand-int 300000)) (at/now)) #(judge/initiate tp) tp))
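An alternative I have been considering is to replace the random jitter with fixed, evenly spaced offsets, so the startup load is spread deterministically rather than by chance. This is only a sketch of the same setup against the same at-at calls, with an arbitrary 30-second spacing:

```clojure
;; Sketch: spread task starts evenly instead of randomly.
;; Each task begins 30 seconds after the previous one (spacing is arbitrary).
(let [tp    (at/mk-pool)
      tasks [reports/initiate push/initiate delete-old-data/initiate
             algorithm/initiate mailgun/initiate stripe/initiate
             luma/initiate hubspot/initiate supabase/initiate
             calendly/initiate user/initiate judge/initiate]]
  (log/initiate tp)
  (world/initiate tp)
  (doseq [[i task] (map-indexed vector tasks)]
    (at/at (+ (* i 30000) (at/now)) #(task tp) tp)))
```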
I am currently running this on an EC2 instance at AWS. The instance has 8 CPUs and 32 gigs of RAM.
I run htop to see how much load the server is under. Since I have 8 CPUs, I figure any load average under 8 is fine – though I gather the Linux load average counts not only runnable threads but also threads in uninterruptible I/O wait, so for I/O-heavy work like mine I’m not sure load alone tells me whether the CPUs are actually saturated.
But lately I have added some new tasks, and now the load hits 10 when all the background tasks run at once.
This server is not memory constrained. Of the 32 gigs of RAM, the most I think I’ve ever seen in use is 7 gigs, and that is rare. Even with every task running simultaneously, RAM in use is usually only 5 or 6 gigs. But the load goes up to 10, as I said.
I am trying to think about how to use the JVM scheduler to both speed things up and spread out the load.
At one point I started wrapping some database updates in their own functions and pushing them to the thread pool, and this gave me a significant speedup:
(at/at (+ 100 (at/now))
       #(world/create (merge item {:item-is-imported "yes"
                                   :imported-via-api-from "hubspot"
                                   :item-type item-type
                                   :hubspot-id hubspot-id})
                      :hubspot-id)
       tp)
But now I am wondering whether doing this also increases the load on the server.
I’m assuming that when a thread is sleeping it imposes no burden on the server.
I also assume that feeding small functions to the thread pool allows the JVM scheduler to efficiently spread work to all of the CPUs. (Said differently, the JVM scheduler would not be able to efficiently spread large tasks, involving thousands of database calls, to the different CPUs, unless I first break up those large tasks into small tasks.)
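Here is roughly what I mean by breaking a large task into small ones. Instead of one function looping over thousands of records, each record’s update becomes its own task on the pool. (A sketch only; `update-record!` is a hypothetical per-record database call, not real code from my app.)

```clojure
;; Sketch: one big task doing thousands of DB calls in sequence...
(defn import-all-monolithic [records]
  (doseq [r records]
    (update-record! r)))

;; ...versus one small task per record, so the pool can spread the
;; work across CPUs. `update-record!` is a hypothetical DB call.
(defn import-all-fine-grained [tp records]
  (doseq [r records]
    (at/at (+ 100 (at/now)) #(update-record! r) tp)))
```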
But I’m also wondering whether dividing the work into many small tasks lets the JVM put too much pressure on the server.
There is very little happening on this server, other than this one app that I’m running, and it is not public, so I have almost total control regarding how fast the tasks should be fed to the server.
By default, the at-at library from Overtone creates a thread pool with the thread count set to the number of CPUs plus two. I have accepted that default. Am I correct that there would be less strain on the server if I set the thread count to exactly the number of CPUs?
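If I understand the at-at API correctly, `mk-pool` accepts a `:cpu-count` option, so pinning the pool to exactly 8 threads would look something like this (I haven’t verified this option myself):

```clojure
;; Sketch, assuming mk-pool accepts a :cpu-count option
;; (I believe it does, but I haven't verified).
(def tp (at/mk-pool :cpu-count 8)) ; exactly one thread per CPU
```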
So, with all that as background, I am curious about:
- How do I control the pace of work?
- How do I determine which tasks put the greatest strain on the server?
- Are there any clever “emergency brakes” I can implement to keep the server from being overwhelmed?
- When can I improve performance by breaking a task into smaller, more fine-grained tasks that are fed to the thread pool independently of one another? (And when do I hit the limits described by Amdahl’s law?)
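To make the “emergency brake” question concrete, the kind of thing I have in mind is a global `java.util.concurrent.Semaphore` that every task acquires before doing heavy work, capping how many can run at once. This is only a sketch of the idea, not something I have tried; the permit count of 4 is arbitrary:

```clojure
(import 'java.util.concurrent.Semaphore)

;; Sketch: cap heavy work at 4 concurrent tasks, no matter how many
;; are scheduled. Untried; the permit count is arbitrary.
(defonce heavy-work-permits (Semaphore. 4))

(defn with-brake
  "Run `f`, but only after acquiring a permit; blocks if 4 are busy."
  [f]
  (.acquire heavy-work-permits)
  (try
    (f)
    (finally
      (.release heavy-work-permits))))

;; Usage: wrap a task's body before handing it to the pool, e.g.
;; (at/at (+ 100 (at/now)) #(with-brake do-heavy-thing) tp)
```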