I have a Clojure web application that serves multiple clients (~80 and growing). Each client is a research project with its own MySQL database on our server, so we have ~80 databases that the application connects to (each client has its own subdomain, and the web request is mapped from domain to database).
I started the app from the Luminus template, which uses the HikariCP connection pool. We had far fewer clients then, so I created one pool per client. When the number of MySQL connections grew, I reduced the number of connections per pool to 5, but that still means we have 400 open connections on our server. I don't think that is optimal.
I would like to solve this. I asked a question a few years ago on Stack Overflow (mysql - HikariCP number of connections in multiple pools - Stack Overflow) and got the suggestion to create a new connection for each request (so no connection pooling). I have not actually tried that yet, because I (erroneously?) believe it would be preferable to use a connection pool with a fixed maximum number of connections, so that my connections don't grow beyond control if multiple simultaneous requests hit the server. And although the load is not super high on the server, we sometimes get quite a few simultaneous requests, and if a connection pool reduces response time compared to creating new connections, I would of course prefer that.
My idea is instead to have only one connection pool, with a database user that can access all client databases, and to wrap each query in a USE statement that first selects the client's database.
But that requires that there is no race condition: no other process may use the connection between the USE statement and the actual query. If I understand correctly, with-db-connection "checks out" the connection from the pool and returns it when it is "closed"?
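Roughly, the pattern I have in mind looks like this (sketched with next.jdbc, assuming a single pooled datasource `ds`; `query-for-client` and `db-name` are made-up names):

```clojure
;; Rough sketch of the idea with next.jdbc, assuming a single pooled
;; datasource `ds` whose user can access all client databases.
;; NOTE: `db-name` must come from a whitelist -- it is concatenated
;; into SQL, so a raw request value would be an injection risk.
(require '[next.jdbc :as jdbc])

(defn query-for-client [ds db-name sql-params]
  (with-open [conn (jdbc/get-connection ds)]
    ;; conn is checked out of the pool here, so nothing else can run
    ;; between the USE and the actual query on this connection
    (jdbc/execute! conn [(str "USE " db-name)])
    (jdbc/execute! conn sql-params)))
```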
Does it make sense to use a connection pool like that? Or is there any other way that I can reduce the number of open connections without opening one connection on each request?
You are correct. That will check out a connection from the pool, bound to that local conn so it is entirely private to that function in that particular invocation (and whatever you pass it into), and then "close" it by giving it back to the pool. You'll get as many open connections as you have concurrent requests on that pool, modulo whatever the pool keeps open per its configuration.
We use multiple databases at work and we have a "reporting" connection pool using a particular user account that has select-only access to all databases, and we have queries that run across multiple databases by using dbname.tablename in the queries (rather than just tablename).
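For example (schema and table names invented), a single connection can hit several schemas in one query:

```clojure
;; Illustration only -- schema and table names are made up.
;; One connection, one query, two "databases" (schemas):
(jdbc/execute! conn
  ["SELECT a.id, b.id
      FROM client_a.participants AS a
      JOIN client_b.participants AS b ON a.email = b.email"])
```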
You should probably check the connection-pooling library's documentation to make sure it doesn't make assumptions about multiple vs. single databases in how it handles pooling, just in case there are any subtle caveats.
Why don't you think this is optimal? Are you running out of memory? If not, this still seems ideal to me.
Unless you're saying that users can create new DBs and thus new pools dynamically? So in theory your server is unbounded and user behavior could result in maxing out your memory?
OK, I'm not familiar enough with JDBC, but this makes no sense to me. The connection you're reusing is to a particular machine; I guess maybe JDBC can partially rebind a connection to a new DB and still reuse some of the object's memory? But I'd also expect the benefit here to be much lower, since you'd need to renegotiate the connection and handshake and everything again with the other DB.
Personally, I'd try what was suggested to you and just get rid of the pool entirely. Then measure your request response times. If the latency is acceptable, you don't need any more complexity.
If they are too slow, measure the time to create a connection; if that is where the slowdown is, reintroduce a pool.
With regards to the pool at that point, I would investigate whether there's a way to create a pool that acts more like a least-frequently-used cache holding connections to different databases. That way you can set a global maximum of connections across all DBs, and it'll reuse connections for the clients who are most active and drop them for those who are least active.
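I have not tested this, but as a sketch of that idea using clojure.core.cache (an LRU cache here, a close cousin of the LFU idea) plus next.jdbc; all names, credentials, and sizes below are made up:

```clojure
;; Untested sketch: keep at most 10 per-client pools alive, evicting the
;; least-recently-used one. Caveat: an evicted HikariDataSource should
;; also be .close'd, which this simple version does not handle.
(require '[clojure.core.cache.wrapped :as cw]
         '[next.jdbc.connection :as connection])
(import '(com.zaxxer.hikari HikariDataSource))

(def pool-cache (cw/lru-cache-factory {} :threshold 10))

(defn pool-for [db-name]
  (cw/lookup-or-miss pool-cache db-name
    (fn [dbname]
      (connection/->pool HikariDataSource
        {:jdbcUrl  (str "jdbc:mysql://localhost:3306/" dbname) ; hypothetical
         :username "app"      ; hypothetical credentials
         :password "secret"
         :maximumPoolSize 2}))))
```

With :threshold 10 and :maximumPoolSize 2, the global ceiling would be about 20 open connections instead of 400.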
Edit:
(jdbc/execute! conn [(str "USE " db-name)])
Oh, sorry, did you mean you have one database but multiple instances within it? I assumed you had multiple databases (not instances within a single one).
Perhaps I'm using the wrong terminology. I have one MySQL server running which has 80 databases (or schemas, as they are called in MySQL), one for each client.
Oh, maybe USE to switch databases is not quite as innocent as I thought, then? I imagined just connecting to any one of the 80 databases with a database user that has access to all of them, and then switching to the one I want for the specific query. But maybe that switch entails a lot of overhead?
Users cannot add new databases; I have complete control over that and can also add more memory if needed. It just feels bad to have 400 open connections when 10 would probably do, if I were able to bind them to different databases for each query.
MySQL uses "database" where a lot of other vendors talk about "schema", so that's the root of the confusion here, I think?
Example:
We have a cluster of MySQL servers (with a primary and one or more secondaries). We use a proxy process on each of our application servers, which manages the actual connections to the primary and/or secondary MySQL server instances. Our applications talk to the proxy and have connection pools (in JDBC, in memory), and the proxy in turn takes those connections and routes them to the actual MySQL servers, so there are two layers of connection pooling going on here.
Inside the primary (database) server, we have multiple "databases" (schemas). Typically, when you set up a datasource in JDBC, you specify the default database it should use (:dbname in the Clojure hash map used to describe the "db-spec" for both clojure.java.jdbc and next.jdbc). If you access a database via a user account with permission to access multiple "databases", you can select .. from db1.table1 .. or select .. from db2.table2 .., so a connection to one "database" can actually get access to multiple "databases" in MySQL.
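As a concrete (hypothetical) illustration of such a db-spec and a cross-schema query; host, credentials, and table names are made up:

```clojure
;; Hypothetical db-spec: "client_a" is just the *default* database for
;; connections from this datasource; the account can reach other schemas.
(require '[next.jdbc :as jdbc])

(def db-spec {:dbtype   "mysql"
              :dbname   "client_a"        ; default database
              :host     "db.example.com"  ; made-up host/credentials
              :user     "reporting"
              :password "secret"})

(def ds (jdbc/get-datasource db-spec))

;; query a different schema over the same datasource:
(jdbc/execute! ds ["SELECT * FROM client_b.participants LIMIT 5"])
```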
Where I offered the caveat to @DrLjotsson around this is to do with how the connection pooling library may or may not use the selected database name (schema name!) in how it organizes the pools. I am fuzzy on the details, but I remember reading some cautions a while ago about trying to use some connection pooling software with multiple databases (schemas). I just had a quick read over the HikariCP and c3p0 docs and can't find anything about that now… I did see that c3p0 partitions pools based on credentials (but HikariCP does not). So… maybe that caveat was just me being inherently suspicious of potential complexity.
It seems what you could do here is have one pool for each DB schema (or even separate DB servers if you wanted), but have each pool dynamically resize itself as demand happens. So you could potentially set it up so some pools even go down to zero when not actively used. I have not tried this lib, but it seems well supported, and it looks like this would be possible with it.
Edit: Hmm, or maybe it only does metrics; its documentation is a bit poor.
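Whatever that library does, HikariCP itself can be configured to let a per-client pool drain to zero connections while that client is idle; the values below are illustrative:

```clojure
;; Illustrative HikariCP settings for a per-client pool that holds no
;; connections while that client is inactive (all values are made up):
{:jdbcUrl         "jdbc:mysql://localhost:3306/client_db"
 :minimumIdle     0       ; pool may shrink to zero idle connections
 :maximumPoolSize 5       ; hard cap per client
 :idleTimeout     60000}  ; retire idle connections after 60 s
```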