Connection pooling and multiple databases

DrLjotsson · February 21, 2021, 7:25pm

I have a Clojure web application that serves multiple clients (~80 and growing), each client is a research project that has their own MySQL database on our server. So we have ~80 databases that the application connects to (each client has their own subdomain, so the web request is mapped from domain to database).

I started the app using the Luminus template that uses the HikariCP connection pool. We had much fewer clients then so I created one pool for each client. When the number of MySQL connections grew, I reduced the number of connections per pool to 5, but that still means that we have 400 open connections on our server. I don’t think that is optimal.

I would like to solve this. I asked a question a few years ago on Stackoverflow (mysql - HikariCP number of connections in multiple pools - Stack Overflow) and got the suggestion to create a new connection for each request (so no connection pooling). I have not actually tried that yet, because I (erroneously?) believe that it would be preferable if I can use connection pool with a fixed number of max connections so that my connections don’t grow beyond control if multiple simultaneous requests are made to the server. And also, although the load is not super high on the server, sometimes we get quite a few simultaneous requests and if a connection pool would reduce response time compared to creating new connections, I would of course prefer that.

My idea is instead to have only one connection pool with a database user that can access all client databases. And to wrap each query like this

(defn wrap-query
  [pool db-name query]
  (jdbc/with-db-connection [conn pool]
    (jdbc/execute! conn (str "USE " db-name)) 
    (jdbc/query conn query)))

But that requires that there is no race condition and no other process can use the connection between the USE query and the actual query. But if I understand correctly, with-db-connection “checks out” the connection from the pool and returns it when it is “closed”?

Does it make sense to use a connection pool like that? Or is there any other way that I can reduce the number of open connections without opening one connection on each request?

seancorfield · February 21, 2021, 7:44pm

You are correct. That will check out a connection from the pool, bound to that local conn so it is entirely private to that function in that particular invocation (and whatever you pass it into), and then “close” it by giving it back to the pool. You’ll get as many open connections as you have concurrent requests on that pool, modulo whatever the pool keeps open per the pool config.

We use multiple databases at work and we have a “reporting” connection pool using a particular user account that has select-only access to all databases, and we have queries that run across multiple databases by using dbname.tablename in the queries (rather than just tablename).

You probably should check the connection pooling library’s documentation to make sure they don’t make assumptions about multiple vs single databases in how they handling pooling – just in case there are any subtle caveats.

DrLjotsson · February 21, 2021, 8:19pm

Thanks @seancorfield! May I ask what you think such a caveat in HikariCP could be? That something could go wrong if USE is used?

didibus · February 21, 2021, 9:24pm

Why don’t you think this is optimal? Are you running out of memory? If not, this still seems ideal to me.

Unless you’re saying that users can create new DB and thus new pools dynamically? So in theory your server is unbounded and user behavior could result in maxing out your memory?

Ok, I’m not familiar enough with JDBC, but this makes no sense to me. The connection that you’re reusing is to a particular machine, I guess maybe JDBC can partially rebind a connection to a new DB and still reuse some of the object memory? But I’d also feel the benefit here would be much lower, since you’d need to renegotiate the connection and handshake and everything again with the other DB.

Personally, I’d try what you were suggested, just get rid of the pool entirely. Now measure your request response time. If they’re an acceptable latency, you don’t need anymore complexity.

If they are too slow, measure the time to create the connection, if this is where the slowdowns are, reintroduce a pool.

With regards to the pool at this point, I would investigate if there’s a way to create a pool that is more like a Least Frequently Used cache that can hold connections to different databases. That way you can set a global maximum of connection accros all DB and it’ll reuse them for the clients who are most active, and won’t for those who are least active.

Edit:

(jdbc/execute! conn (str "USE " db-name))

Oh, sorry did you mean you have one Database but multiple instance within it? I assumed you had multiple databases (not instances within a single one).

DrLjotsson · February 21, 2021, 9:52pm

Thanks @didibus.

Perhaps I’m using the wrong terminology. I have one MySQL server running which has 80 databases (or schemas as they are called in MySQL), one for each client.

Oh, maybe USE to switch database is not quite as innocent as I thought then? I imagined just connecting to anyone of the 80 databases with a database user that has access to all databases and then switching the one I want to use for the specific query. But that switch maybe entails a lot of overhead?

DrLjotsson · February 21, 2021, 9:57pm

Maybe it’s not as bad as I think. It’s just that HikariCP’s readme (GitHub - brettwooldridge/HikariCP: 光 HikariCP・A solid, high-performance, JDBC connection pool at last.) has a lot of bad things to say about multiple database connections.

Users cannot add new databases, I have complete control over that and can also add more memory if needed. It just feels bad to have 400 open connections when 10 would probably do if I was able to bind them to different databases for each query.

seancorfield · February 21, 2021, 10:50pm

MySQL uses “database” where a lot of other vendors talk about “schema” so that’s the root of the confusion here I think?

Example:

We have a cluster of MySQL servers (with a primary and one or more secondaries). We use a proxy process on each of our application servers, which manages the actual connections to the primary and/or secondary MySQL server instance. Our applications talk to the proxy and have connection pools (in JDBC, in memory) and the proxy in turn takes those connections and routes them to the actual MySQL servers – so there are two layers of connection pooling going on here.

Inside the primary (database) server, we have multiple “databases” (schemas). Typically, when you set up a datasource in JDBC, you specify the default database it should use (:dbname in the Clojure hash map used to describe the “db-spec” for both clojure.java.jdbc and next.jdbc). If you access a database via a user account with permission to access multiple “databases”, you can select .. from db1.table1 .. or select .. from db2.table2 .. so a connection to one “database” can actually get access to multiple “databases” in MySQL

Where I offered the caveat to @DrLjotsson around this is to do with how the connection pooling library may or may not use the selected database name (schema name!) in how it organizes the pools. I am fuzzy on the details but I remember reading some cautions some time ago about trying to use some connection pooling software with multiple databases (schemas). I just had a quick read over the HikariCP and c3p0 docs and can’t find anything about that now… I did see that c3p0 partitions pools based on credentials (but HikariCP does not). So… maybe that caveat was just me being inherently suspicious about potential complexity

didibus · February 22, 2021, 1:23am

Ya, that confused me. Hum, I don’t know if HikariCP schema property can be dynamically changed? That could be useful if so.

Also, I found this: GitHub - vladmihalcea/flexy-pool: FlexyPool adds metrics and failover strategies to a given Connection Pool, allowing it to resize on demand.

It seems what you could do here is have one pool for each DB schema (or even seperate DB server if you wanted), but have each pool dynamically resize themselves as demand for them happens. So potentially you could set it up so some pool even go down to zero when not actively used. Have not tried this lib but it seems well supported and like it be possible to do this with it.

Edit: Hum, or maybe it only does metrics, it has a bit of a poor documentation.

seancorfield · February 22, 2021, 2:30am

You can already do something very close to that with both HikariCP and c3p0 by using the appropriate settings around idle pool sizes etc.

system · August 23, 2021, 2:30pm

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.