Cool. What’s the rationale of next.jdbc in comparison with clojure.java.jdbc? What led you to develop it?
There are a number of drivers for a new version of clojure.java.jdbc:
- Performance: there’s a surprising amount of overhead in how ResultSet objects are converted to sequences of hash maps – which can be really noticeable for large result sets – so I want a better way to handle that.
- A more modern API, based on using qualified keywords and transducers etc.: :qualifier and reducible-query in recent clojure.java.jdbc versions were steps toward that, but there’s a lot of “legacy” API in the library and I want to present a more focused, more streamlined API so folks naturally use the IReduceInit / transducer approach from day one and benefit from qualified keywords. I’m still contemplating whether there are reasonable ways to integrate with clojure.spec (for example, if you have specs of your data model, could next.jdbc leverage that somehow?).
- Simplicity: clojure.java.jdbc uses a variety of ways to execute SQL, which can lead to inconsistencies and surprises – query, execute!, and db-do-commands are all different ways to execute different types of SQL statement, so you have to remember which is which and you often have to watch out for restrictions in the underlying JDBC API.
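The IReduceInit / transducer approach mentioned above can be sketched like this (a sketch against the early next.jdbc API under discussion, where reducible! was the entry point; con is assumed to be an open java.sql.Connection, and the fruit table and qualified-key casing are assumptions that depend on your database):

```clojure
(require '[next.jdbc :as jdbc])

;; reducible! returns an IReduceInit: nothing runs against the
;; database until it is reduced, and no intermediate sequence of
;; hash maps is built up
(into []
      (map :fruit/name)   ; transducer applied row by row
      (jdbc/reducible! con ["SELECT * FROM fruit"]))
```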
Those are my three primary drivers. In addition, the db-spec-as-hash-map approach in clojure.java.jdbc has caused a lot of frustration and confusion in the past, especially with the wide range of conflicting options that are supported. next.jdbc is heavily protocol-based so it’s easier to mix’n’match how you use it with direct Java JDBC code (and the protocol-based approach contributes to the improved performance overall). There’s a much clearer path of db-spec -> DataSource -> Connection now, which should steer people toward more connection reuse and better-performing apps.
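That db-spec -> DataSource -> Connection path can be sketched as follows (function names as they appear in released next.jdbc, which may differ from the snapshot under discussion; the db-spec values are placeholders):

```clojure
(require '[next.jdbc :as jdbc])

;; db-spec (a plain hash map) -> DataSource
(def ds (jdbc/get-datasource {:dbtype "h2:mem" :dbname "example"}))

;; DataSource -> Connection, reused across several operations
(with-open [con (jdbc/get-connection ds)]
  (jdbc/execute! con ["CREATE TABLE fruit (name VARCHAR(32))"])
  (jdbc/execute! con ["INSERT INTO fruit (name) VALUES ('Apple')"]))
```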
I also wanted datafy/nav support baked right in (it was added to clojure.java.jdbc back in December as an undocumented, experimental API in a separate namespace). I wanted it to be “free” in terms of performance (it isn’t quite – my next round of changes should address that).
The API so far is still very much a work in progress. I’m still very conflicted about the “syntactic sugar” SQL functions (insert!, query, update!, and delete!). They go beyond what I really want to include in the API, but I know that their equivalents in clojure.java.jdbc are heavily used (based on the number of questions and JIRA issues I get).
My latest round of changes exposed the mapped-function-over-rows API more prominently, but I’m still not happy with the “feel” of that aspect of the API yet (it creates a tension with the datafication behavior).
So, while I’m comfortable putting it out there and getting feedback – and I’ve had lots of great feedback so far – expect to see more changes, possibly some dramatic ones, in the next month or so before I actually settle on where the library will live and what the published artifacts will look like. (@didibus this is a change of position compared to what I said above two days ago – based on extensive discussions with someone at Cognitect since then – sorry!)
For the people not on Slack: I have a small microservice I was already benchmarking, with just six functions using the database. It was very easy to change over from clojure.java.jdbc. Performance was about the same, CPU usage a bit lower. Most importantly, there were thousands of transactions during testing, which all went well.
Great work.
Thanks for the detailed explanation, Sean. I use clojure.java.jdbc mainly through the Korma lib so I don’t have extensive experience with it, but I have to admit I am disoriented by JDBC’s API when I have to use it directly, so I’m glad to hear you’re working on a simplified API (query, execute!, etc.).
Also, is there any plan for supporting result-set streaming?
connection = PG::Connection.open(...)
# Note: This will NOT execute the query as soon as this line is evaluated by Ruby!
connection.send_query("SELECT * FROM gigantic_query")
connection.set_single_row_mode
connection.get_result.stream_each do |row|
# Do something with the row!
end
Source: https://medium.com/@johnmcclean/jdbc-processing-options-with-cyclops-react-49d62b02f775
Unfortunately, getting database drivers to stream results is highly database-specific. I know people who have succeeded in doing this for both MySQL and PostgreSQL – but they needed different configurations. So the short answer is “yes, you can do this” – but not in any way that is portable, and therefore not in any way that either clojure.java.jdbc or next.jdbc can directly support out of the box.
You typically need to set the connection to not auto-commit (there’s a keyword argument for that, in both libs), and you need to set the fetch size (again, there’s a keyword argument for that), and then some other incantations. And you need to use reducible-query in clojure.java.jdbc or reducible! in next.jdbc.
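Putting those incantations together for PostgreSQL might look roughly like this (a sketch using clojure.java.jdbc; the option names exist in that library, but whether streaming actually kicks in depends on your driver and settings, and db-spec and the table name are placeholders):

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; PostgreSQL only streams when auto-commit is off and a fetch
;; size is set; MySQL needs a different configuration entirely
(with-open [con (jdbc/get-connection db-spec {:auto-commit? false})]
  (reduce (fn [n _row] (inc n))   ; process rows without realizing them all
          0
          (jdbc/reducible-query {:connection con}
                                ["SELECT * FROM gigantic_table"]
                                {:fetch-size 500})))
```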
I’ve written up more of my thoughts about the motivation and design of next.jdbc (some of it based on my comments above): https://github.com/seancorfield/next-jdbc/blob/master/README.md
What’s clear to me at this point is that I need to rethink the row realization aspect of the library and I need to focus more on the three primary intended use cases (single row fetch, multi-row fetch, process result set).
Thank you, everyone, for all the feedback so far!
Will the default be to return maps with namespaced keywords in the future? Is there support for other return types? I’m a bit worried about the “qualify all the things” trend in low-level libraries, at least before records can be optimized to support them. There was some discussion that the next version of Ring could also lean on namespaced keys(??).
About perf: to get close to native Java performance, we should do two things:
- Precompile the queries: given a connection and optionally parameters, the query is just executed. All options, factories etc. are done in the (pre)compile stage.
- Fast immutable query data: persistent maps are much slower than classes/types/records.

For an average low-traffic enterprise app, fully dynamic response maps with qualified keys read from ResultSet metadata are just great (and a good companion to clojure.spec), but for high-traffic (and high-profile!) apps, we should be able to use idiomatic Clojure with ~Java speed.
Did a small spike of porting some of the perf stuff from our libs into a minimalistic query-only JDBC wrapper: it allows statically populating return records, generating response records from a (compiled) query, precompiling everything including map responses, etc. In simple criterium micro-benchmarks, it seems to be 10x faster on the query side (including a roundtrip into the H2 in-memory database!). Not sure if I’m using next.jdbc correctly yet, so the results might be totally bogus. Please correct me if I’m using it wrong.
Anyway, I think that a clean separation of compilation and execution time would allow a single library to serve both use cases: fast tools for those (high-profile apps/startups) who need the perf, and a “simple” API layer that just runs the compilation and the execution in sequence, effectively giving the current API (and perf) of next.jdbc.
What do you think? How to proceed? A big internal refactor PR to next.jdbc? Something else? Would be great to have just one solid and maintained solution for the community, and outside of the contribs.
Quick results, a modified criterium suite:
(defrecord Fruit [id name appearance cost grade])
;; java-fast mapping
(defn rs->fruit [^ResultSet rs]
(->Fruit
(.getObject rs 1)
(.getObject rs 2)
(.getObject rs 3)
(.getObject rs 4)
(.getObject rs 5)))
;; 680ns (vanilla java)
(bench!
"java"
(java-query "SELECT * FROM fruit" con))
;; 690ns (hand-crafted function to populate the record)
(let [query (p/compile "SELECT * FROM fruit" {:row rs->fruit})]
(bench!
"p: manual, record"
(query con)))
;; 710ns (infer code of rs->fruit from a given record class)
(let [query (p/compile "SELECT * FROM fruit" {:row (p/rs->record Fruit)})]
(bench!
"p: derived, record"
(query con)))
;; 710ns (create a new record for the unique resultset using compile-time inspection
;; + a positional constructor for it, memoized + macro / eval => know what you are doing!)
(let [query (p/compile "SELECT * FROM fruit" {:con con :row-gen p/record-row-gen})]
(bench!
"p: compiled map"
(query con)))
;; 1800ns (precompiled fully dynamic map result, just like next.jdbc / java.jdbc)
(let [query (p/compile "SELECT * FROM fruit" {:con con})]
(bench!
"p: dynamic map"
(query con)))
;; 3000ns (fully dynamic map result, just like next.jdbc / java.jdbc)
(let [query (p/compile "SELECT * FROM fruit")]
(bench!
"p: dynamic map"
(query con)))
;; 6700ns
(bench!
"next.jdbc: reducible!"
(into [] (map (partial into {})) (j/reducible! con ["select * from fruit"])))
;; 7000ns
(bench!
"next.jdbc: execute!"
(j/execute!
con
["select * from fruit"]
(partial into {})
{}))
;; 8000ns
(bench!
"java.jdbc: query"
(j1/query {:connection con} ["SELECT * FROM fruit"]))
The current performance-focused route is via reducible!, for processing large result sets. That doesn’t create a hash map at all if you only use operations that look up keys.
The other two scenarios that are goals of the library are fetch a single, fully-realized row and fetch a fully-realized sequence of rows. I’ve been talking to Ghadi about various options for allowing extensible row-building. If I go that route, it could produce hash maps with qualified keys, arrays of column names/row values, or anything else you want – including records – but qualified keywords will be the default choice because I believe that is the “right” default for most Clojure apps, especially those that also use spec.
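As a sketch of how that extensible row-building looks from the caller’s side, using names from a released next.jdbc version (which may differ from the snapshot under discussion; con is assumed to be an open connection):

```clojure
(require '[next.jdbc :as jdbc]
         '[next.jdbc.result-set :as rs])

;; default: rows as hash maps with qualified keys, e.g. :fruit/name
(jdbc/execute! con ["SELECT * FROM fruit"])

;; opt out: plain lower-case unqualified keys, e.g. :name
(jdbc/execute! con ["SELECT * FROM fruit"]
               {:builder-fn rs/as-unqualified-lower-maps})
```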
I’d be interested in seeing what you have in mind for a separation of “compile” and “execute” (but not as a PR – and I’ll remind everyone of https://github.com/seancorfield/next-jdbc/blob/master/CONTRIBUTING.md at this point).
Sorry, didn’t notice the CONTRIBUTING. Pushed my ideas with tests into https://github.com/metosin/porsas. Hopefully some of the work can be joined later.
I’ve reworked next.jdbc to include RowBuilder and ResultSetBuilder, inspired by discussions with Ghadi. I’ve added an example of building Fruit records with a custom builder.
I’ve also moved the “sugar” functions to next.jdbc.sql, deleted execute! and execute-one! from result-set, and reimplemented them as part of the Executable protocol so they no longer go through the reducible! path (which speeds them up quite a bit).
I’m not sure how much further I want to pursue performance at this point, given the flexibility I still want users of the library to have (and I am not moving to a “precompile”/“execute” model, beyond what you can do with creating a PreparedStatement and then executing/reducing it).
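That PreparedStatement reuse might be sketched like this (function names – jdbc/prepare and next.jdbc.prepare/set-parameters – are as in released next.jdbc and may differ from the snapshot under discussion):

```clojure
(require '[next.jdbc :as jdbc]
         '[next.jdbc.prepare :as prep])

;; "compile" once, execute many times with different parameters
(with-open [ps (jdbc/prepare con ["SELECT * FROM fruit WHERE cost > ?"])]
  (jdbc/execute! (prep/set-parameters ps [10]))
  (jdbc/execute! (prep/set-parameters ps [50])))
```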
Feedback is welcome but I’m getting close to the point where I want to decide on a permanent home and make a release…
I don’t understand why you need datafy at all – why not just return navigables that return other navigables? Your use of datafy seems like a no-op, and since datafy works on everything (returning identity by default), just returning navigables would do the same with fewer hops.
Another thing that slightly worries me is that datafy in this case is basically a memory leak: you keep a reference to a connection that is useful only during development (probably?).
For now, datafy is (almost) a no-op in this case – it takes a row in a result set and makes it navigable. The idiom is Thing -> datafy -> (navigable) data -> nav -> new Thing – that’s just how the protocols work and what REBL etc. expect. Could I just make rows Navigable and rely on datafy being a no-op on them always? Maybe, but making them truly Datafiable feels like the “right thing to do” based on the intended idiomatic usage. The datafy behavior could be changed/enhanced down the line and this way we already have a hook for that.
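That Thing -> datafy -> data -> nav -> new Thing idiom can be illustrated without a database at all (a toy sketch: the orders/users “tables” and the foreign-key logic are invented for illustration, not next.jdbc’s implementation):

```clojure
(require '[clojure.datafy :as d]
         '[clojure.core.protocols :as p])

(def users  {42 {:user/id 42 :user/name "Rich"}})
(def orders {1  {:order/id 1 :order/user-id 42}})

;; a "row" made Navigable via metadata: nav-ing the foreign-key
;; column fetches the referenced "row"
(def row
  (with-meta (orders 1)
    {`p/nav (fn [row k v]
              (if (= k :order/user-id) (users v) v))}))

;; datafy is (almost) a no-op here; nav follows the foreign key
(d/nav (d/datafy row) :order/user-id 42)
;; => {:user/id 42 :user/name "Rich"}
```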
As for the memory leak, the reference isn’t going to be any longer-lived than the data it is attached to – and the recommendation for next.jdbc is to pass around a connectable such as a DataSource, which would be much longer-lived anyway (if you’re using a connection-pooled datasource, that’s what you’d be passing to most operations anyway).
It is one of the caveats of using REBL with this that if you do pass a Connection instead, and it gets .close’d before REBL calls nav, you’ll get an error.
Hello, I got this very confusing ClassCastException/CompilerException – “java.lang.Character cannot be cast to java.lang.String” at prepare.clj line 82 – when I forgot to put the SQL into a vector:
(let [my-datasource (get-datasource "jdbc:h2:tcp://localhost:1521/opt/h2-data/mb;MODE=Oracle;USER=sa")]
(with-open [connection (get-connection my-datasource)]
(execute-one! connection "SELECT ENABLED FROM organization"))) ; <- ERR
I realize this is an alpha lib, but perhaps this is something to put on the todo list: provide a better error message here? Thanks for the library, BTW!
I think it should be (execute-one! connection ["SELECT ENABLED FROM organization"]).
Edit: the question was about error messages. Could the functions allow both strings and vectors?
No, I’m not making that mistake again: Clojure Don’ts: The Heisenparameter – Digital Digressions by Stuart Sierra
I did it in clojure.java.jdbc and it propagated conditional logic through a lot of the library (since the API was so wide), it made the specs harder to write, and it also impacts performance: everyone pays the price just so some people can omit [ .. ] around SQL statements that have no parameters. next.jdbc’s API is much narrower so it wouldn’t be as bad, but it was a deliberate decision not to do this.
Options are:
- Add an assert
- Add a :pre condition
- Add an explicit conditional
- Provide specs and let developers instrument code in development if they want
The first three all incur a cost for everyone (the first two could be turned off for production, but I don’t know how many people really do that) so I don’t want to go down that path. The fourth is opt-in, so I am more inclined to do that. If you had specs for next.jdbc, would that address your request @HolyJak?
Good points. And yes, it would (provided the Getting Started guide mentioned it and demonstrated how to turn them on).
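The opt-in spec approach might look like this (a hypothetical sketch: my-execute! and its spec are invented here to show the shape of such checking, not next.jdbc’s actual specs):

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as st])

;; the sql-params argument must be a vector: a SQL string plus params
(s/def ::sql-params (s/cat :sql string? :params (s/* any?)))

(s/fdef my-execute!
  :args (s/cat :connectable some? :sql-params (s/spec ::sql-params)))

(defn my-execute! [connectable sql-params] ,,,)

;; development only: a bare string now fails with a clear spec error
;; instead of a cryptic ClassCastException
(st/instrument `my-execute!)
```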