How do I develop intuitions for where to performance optimize?

I’m working on a small social networking site. I need to match people to other people, based on their similarities. I’ve initially taken a naive approach: a background process creates a match-preliminary between every pair of users, then other background processes assess the two people in each match-preliminary, raising or lowering the overall score, until finally, in some cases, the score becomes high enough that the match-preliminary is promoted to a match. That is, if we find, after multiple types of analysis, that two users have a lot in common, then we create a match between them.

I currently have 30 background processes going, each of which assesses the two users on a different aspect of the match, especially what they’ve written and what their careers have been about.

I’m aware that if I did all the analysis in a single function, this would be more efficient, but I thought different processes for different aspects of the match would be more flexible, as a programming model. With this style of programming I can potentially take any one aspect of the analysis and make it a separate app.

Even given that my approach has been inefficient, I am still surprised by the strain on the server. I’m running an EC2 instance that has 96 gigs of RAM and 36 CPUs. A powerful machine. And yet, with only a few hundred users, my app uses up 14 gigs of RAM and, when I run htop, I see that the server load has risen to 8 or 9.

This app talks to MongoDB. I have done nothing, so far, to optimize the connection to MongoDB. I have not added any indexes to MongoDB. I plan to do that soon, though it would be helpful to have some intuitions about what doing so would mean, and how it would help. In other words, I am curious, should I assume that server load is high because my functions are slow, and the functions are slow because the queries to the database are slow? And therefore adding indexes to MongoDB would bring down the server load? Or is the issue only the wastefulness of the original algorithm, that is, creating a match-preliminary between every two pairs of users?

I’ve also added this line:

(Thread/sleep 100)

to try to spread out the load. But I assume having the functions pause like this, rather than race through as fast as possible, contributes to the memory usage, as each function is going slowly and holding onto memory while it does so. Do you think this ultimately makes server load worse?

The people in this network are not equal, so the relationship of user1 to user2 needs to be scored differently from the relationship of user2 to user1, a fact which doubles the number of relationships that need to be built and scored.

As a programming model, I like the flexibility of doing each assessment in its own background task, but the strain on the server surprises me.

I’m curious whether there are any tweaks that might reduce the server load. Are there tweaks to garbage collection that might reclaim memory more quickly?

Here is the code I use to create the match-preliminary:

(defn user-matches-preliminary-create
  "Creates a preliminary match record for every ordered pair of searchable users,
  unless one already exists."
  []
  (doseq [user1 (world/find-many {:item-type "user"
                                  :searchingEnabled true})]
    (doseq [user2 (world/find-many {:item-type "user"
                                    :searchingEnabled true})]
      ;; pause between pairs to spread out the load
      (Thread/sleep 100)
      (let [preliminary (first (world/query {:item-type "user_matches_preliminary"
                                             :user-id (:user-id user1)
                                             :other-user-id (:user-id user2)}))]
        ;; only create a record if this ordered pair hasn't been scored yet
        (when (nil? preliminary)
          (world/create {:item-type "user_matches_preliminary"
                         :user-id (:user-id user1)
                         :other-user-id (:user-id user2)
                         :match-score 0
                         :not-allowed []}))))))

When it comes to performance and memory usage, you have three main friends: knowledge of algorithmic complexity, profiling, and tracing.

You preliminarily match every user to every user, twice. That’s O(n^2), and it doesn’t scale.
It’s not my domain, but I would be surprised if there were no more efficient algorithm.
It sounds very similar to a map and the task of finding places that are close to a specific place in terms of travel distance. Again, not my domain, so I actually don’t have a clue, but on the surface it sounds reasonable to first find the closest places within some specific radius, without any regard for roads, and then filter those places based on the roads. So also a two-pass approach, but it doesn’t involve matching every place to every place.
Places on a map are just dots in 2D space. You have an N-dimensional space, but conceptually it’s the same problem, unless you have some metrics that really, truly cannot be boiled down to a number.
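
To make that two-pass idea concrete in Clojure, here’s a minimal sketch. The coarse-key function is hypothetical (not from your code); it stands in for any cheap bucketing attribute such as city, industry, or age band. Only pairs that share a bucket ever reach the expensive scoring.

;; Group users by a cheap attribute, then only pair users within a bucket.
;; `coarse-key` is a hypothetical function, e.g. :city or a composite key.
(defn candidate-pairs
  [users coarse-key]
  (for [[_ bucket] (group-by coarse-key users)
        user1 bucket
        user2 bucket
        :when (not= (:user-id user1) (:user-id user2))]
    [user1 user2]))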

As for profiling - well, you just do it. There’s no reason to guess when you can measure.

Same for tracing, although usually it’s just something that replaces profiling and debugging in production.
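
For example, with the criterium library (my suggestion, it isn’t mentioned anywhere in this thread) you can benchmark one of the calls from your posted code; the user ids here are placeholders:

;; Add criterium to your dependencies, then at the REPL:
(require '[criterium.core :as crit])

;; Benchmarks a single preliminary-match lookup with warm-up and statistics,
;; which is much more reliable than a one-off timing.
(crit/quick-bench
 (world/query {:item-type "user_matches_preliminary"
               :user-id "some-user-id"
               :other-user-id "some-other-user-id"}))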


It might sound weird, but what helps me is saying what the code does out loud. Not in code, just describing the rough steps. No tools. Even better when someone else listens and asks questions. You’d be surprised how often code doesn’t actually do what you think it does. Sort of proofreading to check your assumptions. Repeatedly.

I guess some experience helps, but you’ll get a good intuition for what things roughly cost after a while. Just (time ...) is enough to check assumptions. Good tools help but aren’t required for the “easy wins”.
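
For example, to check the assumption that loading all users is cheap, something like this is enough (using the find-many call from your posted code; count forces the whole result so lazy loading doesn’t hide the cost):

;; How long does one "load all users" round trip really take?
(time (count (world/find-many {:item-type "user"
                               :searchingEnabled true})))
;; prints "Elapsed time: ... msecs" and returns the user count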

For your actual example code you get something like “load all users from the db, then for each user load all users from the db again and then for each …”. Kinda like a red flag and something that can easily be optimized to load all users only once.

Of course I can’t see what your code actually does, so there may be some lazy seqs involved that only load one user at a time and never hold them all in memory. But if it does load everything repeatedly for each user, that will create a lot of unnecessary strain on the system/db, since everything has to make it from the db into your memory and get GC’d out of memory repeatedly.
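
As a rough sketch of that “load only once” version, reusing your world/find-many, world/query, and world/create functions (and also skipping the case where a user is paired with themselves, which the original didn’t):

;; Load the searchable users a single time, then iterate the pairs in memory.
;; `vec` forces the whole result once, so no lazy reloading happens later.
;; The db is now hit once for the user list, once per existence check, and
;; once per missing preliminary, instead of reloading every user for every user.
(defn user-matches-preliminary-create
  []
  (let [users (vec (world/find-many {:item-type "user"
                                     :searchingEnabled true}))]
    (doseq [user1 users
            user2 users
            :when (not= (:user-id user1) (:user-id user2))]
      (when (nil? (first (world/query {:item-type "user_matches_preliminary"
                                       :user-id (:user-id user1)
                                       :other-user-id (:user-id user2)})))
        (world/create {:item-type "user_matches_preliminary"
                       :user-id (:user-id user1)
                       :other-user-id (:user-id user2)
                       :match-score 0
                       :not-allowed []})))))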


The performance issues clearly come from the algorithm. You can try to tweak it with multithreading or optimised data structures, or use a profiler to find the most time-consuming parts. But none of this will change the fact that the algorithm itself is the bottleneck.

If you have n users with m properties each, you have (n(n-1)/2)*m steps. That is too much; you have to find a way to reduce it. Unfortunately, I think there is not much you can do about it directly. What you have is a high-dimensional vector space, and you calculate the distance between points. Sorry, this is a bit of mathematics. Imagine a person has properties. You can write the properties in the format [p1, p2, p3, …, pn]. You can imagine this as a vector in an n-dimensional vector space. If you have another person with properties [p’1, p’2, p’3, …, p’n], you calculate the Euclidean distance between them. There is no shortcut for this.
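
In Clojure terms, that distance calculation is just this (a minimal sketch, assuming each user’s properties have already been turned into an equal-length vector of numbers):

;; Euclidean distance between two equal-length numeric property vectors.
(defn euclidean-distance [p1 p2]
  (Math/sqrt (reduce + (map (fn [a b]
                              (let [d (- a b)]
                                (* d d)))
                            p1 p2))))

;; (euclidean-distance [1 0 3] [2 2 3]) ;=> roughly 2.236

Doing this once is cheap; the problem is that the naive approach does it for every pair.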

That being said, you should find an alternative. I have two ideas for you.

  1. The vector space is very sparse. Normally a person has only a few properties. This is a special case in mathematics. There are mathematical approximations that, for example, use a clustering of the data so that the calculation is faster: not perfect, but good enough. Unfortunately this is not easy; you have to learn and understand the mathematics behind it.

  2. You should try to find ways to reduce the number of pairs. The most restrictive properties should come first. Best of all is if you find properties that make a match impossible. For example, if you develop a dating app, you never want to match a man with a man, unless both explicitly say so. Or you normally never want to match people whose ages differ by more than 5 years. Or you don’t want to match people who live too far away from each other. You can do such checks first and exclude these pairs early. This saves you a lot of effort (see the sketch after this list).
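
Here is a minimal sketch of that early-exclusion idea in Clojure. The specific rules and fields (:age, :city) are made up for illustration; they are not from the original post:

;; Cheap checks that can rule a pair out before any expensive analysis runs.
;; Each rule returns true when the pair is definitely not worth scoring.
(def exclusion-rules
  [(fn [u1 u2] (> (Math/abs (- (:age u1 0) (:age u2 0))) 5))
   (fn [u1 u2] (not= (:city u1) (:city u2)))])

(defn worth-scoring?
  [user1 user2]
  (not-any? (fn [rule] (rule user1 user2)) exclusion-rules))

Running worth-scoring? before creating a match-preliminary keeps the expensive background processes from ever seeing most pairs.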


There is already a lot of good advice in this thread, and I won’t repeat it. Here’s my approach to thinking about performance (or more precisely, server-side performance of code running on the JVM; client-server systems and ClojureScript are different):

  1. When writing code, I don’t think about performance. I only think about not doing obviously stupid things. Obviously stupid, as in getting the same data from the database twice because of the way code is structured, for example. Or using an incorrect algorithm, when a better one exists.
  2. After the code is written, I check if I need to worry about performance at all. In the last several years that hasn’t happened, our computers are ridiculously fast and the JVM is an engineering marvel. I just checked, and I last renewed my YourKit profiler license in 2015!
  3. If I do need to worry about performance, I measure. Intuition has never worked reliably for me, even after many years of experience. What I expect to find is 1-2 major performance problems, with the first one getting me easily >80% of the possible gains.
  4. I fix the main performance problem. In my past work that sometimes meant just noticing a thing I did in a stupid way (most often), having to write the code differently, or having to rewrite it in a lower-level language. I try to keep the code in Clojure, if that’s not possible I go down to Java, and then there is always C and assembly (I did have to get down to assembly once!). But most of the time Clojure is enough!

This is a great thread! I don’t have much to add other than that in general, understanding algorithms the way people have explained here is a great place to start for optimizing performance (assuming that is your bottleneck). Many of the most common problems in computer science have been heavily researched and solved, so you don’t need to invent an algorithm from scratch. You can find papers that describe the optimal solutions and then implement them, or find a library that has already implemented it.
