I have a fairly basic web app that relies on Ring and Jetty. I followed the examples given online and set things up in what I assume is the standard way.
;; assumes ring.adapter.jetty (aliased as jetty), ring.middleware.json/wrap-json-body,
;; and clojure.pprint/pprint are required, and that `handler` is defined elsewhere
(defn initiate
  []
  (try
    (jetty/run-jetty
     (wrap-json-body handler {:keywords? true :bigdecimals? true})
     {:port 7001
      :join? false})
    (catch Exception e (pprint e))))
I deployed this on an EC2 instance with 8 CPUs and 32 gigs of RAM. I set up Apache to proxy to this app, so requests to a certain URL actually go to the app rather than being served by Apache.
All of this was working well for several weeks, and I have continued to add more features. Also, we opened to the public and started to get some traffic.
Then I ran into a strange problem: sometimes the app seems to hang, and Apache concludes that the app is down. An outside request gets a 500 response with a “Proxy is down” error.
At first I assumed there was an uncaught Exception happening somewhere. But then I thought, how could that be? I have only one handler, I give that handler to Jetty, and the whole thing is wrapped in a try/catch block. All of that happens, I assume, on the main thread, so if any Exception were thrown on the main thread, I should see it.
So maybe the exception was happening on a background thread? I use at-at to give me a thread pool, and I run some tasks on background threads.
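Roughly, the background setup looks like this (simplified; I'm assuming the overtone/at-at library here, and do-background-work! is just a stand-in for the real tasks):

(require '[overtone.at-at :as at])

;; one pool shared by all background tasks
(def background-pool (at/mk-pool))

;; e.g. run a task every 60 seconds; do-background-work! stands in for the real work
(at/every 60000 #(do-background-work!) background-pool)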
But if the Exception happened on a background thread, it should not lock up Jetty, correct?
But anyway, out of an abundance of caution and curiosity, I wrapped almost every function in try/catch blocks, and still, I did not see any Exception happening.
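For example, around the handler itself the wrapping amounts to a middleware like this (a sketch; wrap-log-exceptions is just a name I'm using here):

;; catch and log anything the handler throws while serving a request,
;; and return a 500 instead of letting the exception escape
(defn wrap-log-exceptions [handler]
  (fn [request]
    (try
      (handler request)
      (catch Exception e
        (pprint e)
        {:status 500
         :headers {"Content-Type" "text/plain"}
         :body "internal error"}))))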
So when the app locks up and fails to respond, I don’t think the problem is an Exception.
If I log into the server and run “htop” while the app is non-responsive, nothing seems obviously wrong. We use about 1.5 gigs of the 32 gigs of RAM. Load sometimes rises to 2, and I guess that is averaged over the 8 CPUs. But a load of 2 doesn’t seem like all that much; I’ve certainly seen worse on other apps that remained responsive.
Another question I have, which is very basic, is about the Ring/Jetty connection. I am assuming this is multithreaded? I assume I don’t need to do any configuration to enable Jetty to handle multiple requests at once?
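For what it's worth, my understanding is that the adapter does accept thread-pool options if tuning ever became necessary, something like this (the numbers are only illustrative, roughly the defaults as I understand them):

(jetty/run-jetty
 (wrap-json-body handler {:keywords? true :bigdecimals? true})
 {:port 7001
  :join? false
  :min-threads 8     ;; illustrative; I believe this is the default
  :max-threads 50})  ;; illustrative; I believe the default pool is 50 threads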
Assuming that is true, I moved on to other possible answers.
The key things are:
- a non-responsive app
- no Exceptions
- plenty of free RAM
- a load of no more than 2
Would I need to do anything to tell the app that it is allowed to use most of the RAM? Aside from Apache, the app is the only thing we run on this server.
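Right now I'm not passing any JVM options at all. If I did need to raise the heap, I assume it would be something like this (assuming Leiningen; the project name and the figures are just examples):

;; project.clj (assuming Leiningen; values are examples only)
(defproject myapp "0.1.0"
  :jvm-opts ["-Xms4g" "-Xmx24g"])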
But all of that is probably a distraction from the real issue.
I came up with another theory, and I’m curious what you all think about it.
Two weeks ago I noticed that 1% of our content got about 95% of our traffic. So I decided I would create an in-memory hot cache to hold that 1%. The “hot cache” is simply a hash-map, wrapped in an atom.
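Stripped down, the cache looks like this (a sketch; the real key and value shapes are more involved):

;; the whole “hot cache”: a map from content id to the content itself
(defonce hot-cache (atom {}))

;; writers (the main thread and the background tasks) do this
(defn cache-put! [id content]
  (swap! hot-cache assoc id content))

;; readers do this
(defn cache-get [id]
  (get @hot-cache id))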
I’m thinking the episodes when the app seems non-responsive are caused by contention on the atom? Some of the background tasks write to and read from it, and likewise the main thread writes to and reads from it. I’m thinking that if the background tasks overwhelm the atom with writes while the main thread needs to read from it, the contention basically leaves the main thread blocked?
If the main thread and the background threads were all trying to read from and write to the same atom, what would the symptoms be? I suspect this is the problem, but I’d like to verify it. How would I verify this theory?
The question is really more general to all kinds of Clojure programming I might do in the future: how can I tell when an atom is overwhelmed with too many writes?
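The best idea I’ve come up with so far is to count swap! retries, since the function passed to swap! gets re-run whenever another thread wins the compare-and-set race. Something like this (a sketch; counted-cache-put! and hammer are names I’m making up for the test):

(import '[java.util.concurrent.atomic AtomicLong])

(def swap-attempts  (AtomicLong. 0))
(def swap-successes (AtomicLong. 0))

;; same write path as the cache above, but counting how often the swap fn
;; runs versus how often a swap actually succeeds; the difference is retries
(defn counted-cache-put! [id content]
  (swap! hot-cache
         (fn [m]
           (.incrementAndGet swap-attempts) ;; once per compare-and-set attempt
           (assoc m id content)))
  (.incrementAndGet swap-successes))        ;; once per successful swap

;; hammer the cache from n threads and report the retry count
(defn hammer [n writes-per-thread]
  (let [threads (mapv (fn [t]
                        (Thread. #(dotimes [i writes-per-thread]
                                    (counted-cache-put! [t i] i))))
                      (range n))]
    (run! #(.start ^Thread %) threads)
    (run! #(.join ^Thread %) threads)
    {:attempts  (.get swap-attempts)
     :successes (.get swap-successes)
     :retries   (- (.get swap-attempts) (.get swap-successes))}))

The idea is that if retries are a large fraction of attempts, the atom really is being hammered; if not, I probably need another theory.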