Next.jdbc -- early access

Cool. What’s the rationale of next.jdbc in comparison with clojure.java.jdbc ? What led you to develop it ?

1 Like

There are a number of drivers for a new version of clojure.java.jdbc:

  • Performance: there’s a surprising amount of overhead in how ResultSet objects are converted to sequences of hash maps – which can be really noticeable for large result sets – so I want a better way to handle that.
  • A more modern API, based on using qualified keywords and transducers etc: :qualifier and reducible-query in recent clojure.java.jdbc versions were steps toward that but there’s a lot of “legacy” API in the library and I want to present a more focused, more streamlined API so folks naturally use the IReduceInit / transducer approach from day one and benefit from qualified keywords. I’m still contemplating whether there are reasonable ways to integrate with clojure.spec (for example, if you have specs of your data model, could next.jdbc leverage that somehow?).
  • Simplicity: clojure.java.jdbc uses a variety of ways to execute SQL which can lead to inconsistencies and surprises – query, execute!, db-do-commands are all different ways to execute different types of SQL statement so you have to remember which is which and you often have to watch out for restrictions in the underlying JDBC API.

Those are my three primary drivers. In addition, the db-spec-as-hash-map approach in clojure.java.jdbc has caused a lot of frustration and confusion in the past, especially with the wide range of conflicting options that are supported. next.jdbc is heavily protocol-based so it’s easier to mix’n’match how you use it with direct Java JDBC code (and the protocol-based approach contributes to the improved performance overall). There’s a much clearer path of db-spec -> DataSource -> Connection now, which should steer people toward more connection reuse and better performing apps.

I also wanted datafy/nav support baked right in (it was added to clojure.java.jdbc back in December as an undocumented, experimental API in a separate namespace). I wanted it to be “free” in terms of performance (it isn’t quite – my next round of changes should address that).

The API so far is still very much a work-in-progress. I’m still very conflicted about the “syntactic sugar” SQL functions (insert!, query, update!, and delete!). They go beyond what I really want to include in the API, but I know that their equivalents in clojure.java.jdbc are heavily used (based on the number of questions and JIRA issues I get).

My latest round of changes exposed the mapped-function-over-rows API more prominently, but I’m still not happy with the “feel” of that aspect of the API yet (it creates a tension with the datafication behavior).

So, while I’m comfortable to put it out there and get feedback – and I’ve had lots of great feedback so far – expect to see more changes, possible some dramatic ones, in the next month or so before I actually settle on where the library will live and what the published artifacts will look like. (@didibus this is a change of position compared to what I said above two days ago – based on extensive discussions with someone at Cognitect since then – sorry!).

4 Likes

For the people not on slack. I have a small microservice I was already benchmarking, with just 6 functions using the database. It was very easy to change from clojure.java.jdbc. Performance was about the same, cpu a bit lower. Most importantly there were thousands of transactions during testing which all went well.
Great work.

2 Likes

Thanks for the detailed explanation Sean. I use clojure.java.jdbc mainly through the korma lib so I have not an extended experience with it, but I have to reckon I am disoriented by jdbc’s API when I have to directly use it so I’m glad to hear you’re working on a simplified API query, execute!, etc…

Also is there any plan for supporting result sets streaming ?

connection = PG::Connection.open(...)

# Note: This will NOT execute query as soon as this line is evaluated by Ruby!
connection.send_query("SELECT * FROM gigantic_query")
connection.set_single_row_mode
connection.get_result.stream_each do |row|
  # Do something with the row!
end

Source: https://medium.com/@johnmcclean/jdbc-processing-options-with-cyclops-react-49d62b02f775

1 Like

Unfortunately, getting database drivers to stream results is highly database-specific. I know people who have succeeded in doing this for both MySQL and PostgreSQL – but they needed different configurations. So, the short answer is “yes, you can do this” – but not in any way that is portable and therefore not in any way that either clojure.java.jdbc or next.jdbc can directly support out of the box.

You typically need to set the connection to not auto-commit (there’s a keyword argument for that, in both libs), and you need to set fetch-size (again, there’s a keyword argument for that), and then some other incantations. And you need to use the reducible-query! in clojure.java.jdbc or reducible! in next.jdbc.

1 Like

I’ve written up more of my thoughts about the motivation and design of next.jdbc (some of it based on my comments above): https://github.com/seancorfield/next-jdbc/blob/master/README.md

What’s clear to me at this point is that I need to rethink the row realization aspect of the library and I need to focus more on the three primary intended use cases (single row fetch, multi-row fetch, process result set).

Thank you, everyone, for all the feedback so far!

3 Likes

Will the default be to return maps with namespaced keywords in the future? Is there support for other return types? I’m bit worried with the “qualify all the things” in the low level libraries, at least before Records can be optimized to support them. There was some discussion that the next version of Ring could also lean on namespaced keys(??).

About perf: to get close to the native java performance, we should do two things:

  1. Precompile the queries: given an connection and optionally parameters, the query is just executed. All options, factories etc. are done in the (pre)compile stage.

  2. Fast immutable query data: Persistent maps are much slower than Classes/Types/Records.

For an average low traffic enterprise app, fully dynamic response maps with qualified keys read from ResultSet metadata are just great (and a good companion to clojure.spec), but for high traffic (and high profile!) apps, we should be able use idiomatic clojure with ~Java-speed.

Did a small spike of porting some of the perf stuff from our libs into a minimalistic query-only jdbc wrapper: it allows statically populating return records, generating response records from (compiled) query, precompiling everything, including map responses etc. In the simple criterium micro-benchmarks, it seems to be 10x faster on query-side (including a roundtrup into the h2 memory database!). Not sure if I’m using next.jdbc correctly yet, so the results might be totally bogus. Please correct if I’m using it wrong.

Anyway, I think that the clean separation of compilation and execution time would allow a single library to serve both use cases: fast tools for those (high-profile apps/startups) who need the perf and a “simple” api layer that just runs the compilation and the execution in a sequence, effectively giving the current api (and perf) of current jdbc.next.

What do you think? How to proceed? Big internal refactor PR to jdbc.next? Something else? Would be great to have just one solid and maintained solution for the community, and outside of the contribs.

Quick results, a modified criterium suite:

(defrecord Fruit [id name appearance cost grade])

;; java-fast mapping
(defn rs->fruit [^ResultSet rs]
  (->Fruit
    (.getObject rs 1)
    (.getObject rs 2)
    (.getObject rs 3)
    (.getObject rs 4)
    (.getObject rs 5)))

;; 680ns (vanilla java)
(bench!
  "java"
  (java-query "SELECT * FROM fruit" con))

;; 690ns (hand-crafted function to populate the record)
(let [query (p/compile "SELECT * FROM fruit" {:row rs->fruit})]
  (bench!
    "p: manual, record"
    (query con)))

;; 710ns (infer code of rs->fruit from a given record class)
(let [query (p/compile "SELECT * FROM fruit" {:row (p/rs->record Fruit)})]
  (bench!
    "p: derived, record"
    (query con)))

;; 710ns (create a new record for the unique resultset using compile-time inspection 
;; + a positional constructor for it, memoized + macro / eval => know what you are doing!)
(let [query (p/compile "SELECT * FROM fruit" {:con con :row-gen p/record-row-gen})]
  (bench!
    "p: compiled map"
    (query con)))

;; 1800ns (precompiled fully dynamic map result, just like next.jdbc / java.jdbc)
(let [query (p/compile "SELECT * FROM fruit" {:con con})]
  (bench!
    "p: dynamic map"
    (query con)))

;; 3000ns (fully dynamic map result, just like next.jdbc / java.jdbc)
(let [query (p/compile "SELECT * FROM fruit")]
  (bench!
    "p: dynamic map"
    (query con)))

;; 6700ns
(bench!
  "next.jdbc: reducible!"
  (into [] (map (partial into {})) (j/reducible! con ["select * from fruit"])))

;; 7000ns
(bench!
  "next.jdbc: execute!"
  (j/execute!
    con
    ["select * from fruit"]
    (partial into {})
    {}))

;; 8000ns
(bench!
  "java.jdbc: query"
  (j1/query {:connection con} ["SELECT * FROM fruit"]))
3 Likes

The current performance-focused route is via reducible!, for processing large result sets. That doesn’t create a hash map at all if you only use operations that lookup keys.

The other two scenarios that are goals of the library are fetch a single, fully-realized row and fetch a fully-realized sequence of rows. I’ve been talking to Ghadi about various options for allowing extensible row-building. If I go that route, it could produce hash maps with qualified keys, arrays of column names/row values, or anything else you want – including records – but qualified keywords will be the default choice because I believe that is the “right” default for most Clojure apps, especially those that also use spec.

I’d be interested in seeing what you have in mind for a separation of “compile” and “execute” (but not as a PR – and I’ll remind everyone of https://github.com/seancorfield/next-jdbc/blob/master/CONTRIBUTING.md at this point).

2 Likes

Sorry, didn’t notice the CONTRIBUTING. Pushed my ideas with tests into https://github.com/metosin/porsas. Hopefully some of the work can be joined later.

2 Likes

I’ve reworked next.jdbc to include RowBuilder and ResultSetBuilder inspired by discussions with Ghadi. I’ve added an example of building Fruit records with a custom builder.

I’ve also moved the “sugar” functions to next.jdbc.sql, deleted execute! and execute-one! from result-set and reimplemented them as part of the Executable protocol so they no longer go through the reducible! path (which speeds them up quite a bit).

I’m not sure how much further I want to pursue performance at this point, given the flexibility I still want users of the library to have (and I am not moving to a “precompile”/“execute”, beyond what you can do with creating a PreparedStatement and then executing/reducing it).

Feedback is welcome but I’m getting close to the point where I want to decide on a permanent home and make a release…

3 Likes

See Seancorfield/next.jdbc "1.0.0-alpha8"

2 Likes

I don’t understand why do you need datafy at all, why not just return navigables that return other navigables? Your use of datafy seems like a no-op, and since datafy works on everything (returning identity by default), just returning navigables will do the same with fewer hops.

Another thing that slightly worries me is that datafy in this case is basically a memory leak: you keep a reference to a connection that is useful only during development (probably?).

For now, datafy is (almost) a no-op in this case – it takes a row in a result set and makes it navigable. The idiom is Thing -> datafy -> (navigable) data -> nav -> new Thing – that’s just how the protocols work and what REBL etc expect. Could I just make rows Navigable and rely on datafy being a no-op on them always? Maybe, but making them truly Datafiable feels like the “right thing to do” based on the intended idiomatic usage. The datafy behavior could be changed/enhanced down the line and this way we already have a hook for that.

As for the memory leak, the reference isn’t going to be any longer-lived than the data it is attached to – and the recommendation for next.jdbc is to pass around a connectable such as a Datasource which would be much longer-lived anyway (if you’re using a connection pooled datasource, that’s what you’d be passing to most operations anyway).

It is one of the caveats of using REBL with this, that if you do pass a Connection instead, and it gets .closed before REBL calls nav, you’ll get an error anyway.

1 Like

Hello, I got this very confusing

ClassCastException/CompilerException “java.lang.Character cannot be cast to java.lang.String” at “prepare.clj” 82

When I forgot to put the SQL into a vector:

(let [my-datasource (get-datasource "jdbc:h2:tcp://localhost:1521/opt/h2-data/mb;MODE=Oracle;USER=sa")]
    (with-open [connection (get-connection my-datasource)]
      (execute-one! connection "SELECT ENABLED FROM organization"))) ; <- ERR

I realize this is alpha lib but anyway, perhaps something to put on the todo list, to
provide better error message here?

Thanks for the library, BTW!

I think it should be (execute-one! connection ["SELECT ENABLED FROM organization"]).

Edit: the question was about error messages. Could the functions allow both strings and vectors?

No, I’m not making that mistake again Clojure Don’ts: The Heisenparameter – Digital Digressions by Stuart Sierra

I did it in clojure.java.jdbc and it propagated conditional logic through a lot of the library (since the API was so wide) and it made the specs harder to write and it also impacts performance: everyone pays the price just so some people can omit [ .. ] around SQL statements that have no parameters.

next.jdbc’s API is much narrower so it wouldn’t be as bad but it was a deliberate decision not to do this.

Options are:

  • Add an assert
  • Add a :pre condition
  • Add an explicit conditional
  • Provide specs and let developers instrument code in development if they want

The first three all incur a cost for everyone (the first two could be turned off for production but I don’t know how many people really do that) so I don’t want to go down that path.

The fourth is opt-in so I am more inclined to do that. If you had specs for next.jdbc, would that address your request @HolyJak?

Good points. And yes, it would (providing the Getting started guide mentioned it and demonstrated how to turn them on.

1 Like

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.