HoneySQL -- What do you want from 2.0?

seancorfield · August 7, 2020, 6:02am

At World Singles Networks llc, we use HoneySQL very heavily. One of my former colleagues gave a talk about our use of HoneySQL at Clojure/West 2015. Fairly recently, I officially took over the repo from the original author Justin Kramer (and it’s been through a number of maintainers over the years). It has always sought to be vendor-neutral, focusing on ANSI SQL and this has often been a tension when looking at issues and pull requests. The general message has been to look at extension libraries (such as nilenso’s honeysql-postgres) for anything that is “non-standard” but it’s become clear over the years that this is neither very helpful nor an entirely tenable situation: the nilenso library is, I believe, the only HoneySQL extension library – we do not have equivalents for other databases.

I think part of the problem there is that it’s non-trivial to add new syntax support to HoneySQL and also that folks tend to extend it locally and don’t feel their individual extensions are worth turning into a library.

As noted in the GitHub issues, I’ve been thinking about what “2.0” might look like for quite a while and I’d like to widen the discussion here and see what sort of things folks would like to see in a future, reimagined version of the library.

From my point of view, I don’t think 2.0 can be backward compatible – it needs to be new namespaces, possibly a different data structure (although that is part that I’d like to retain), with simpler helpers (the merge/non-merge versions cause confusion and the helpers are not always consistent with the underlying data structure), a much easier extension process, and a bigger tent for SQL dialects.

I think it needs to encompass more of the SQL dialects – and where those conflict, it needs to provide dialect-specific ways to format the data structure into SQL strings (the different priorities of SET between MySQL and PostgreSQL come to mind – see Issue #235).

So let’s have a no-holds-barred discussion about what you’d like to see in HoneySQL, what problems you’ve hit, what just plain bugs you about it…

orestis · August 7, 2020, 2:51pm

How timely! I just finished a round of implementing a somewhat complex query using both HugSQL and HoneySQL to compare the two approaches.

HoneySQL wins when I want to manipulate the query at runtime: While HugSQL gives you the full Clojure power it’s a bit weird. The case that made me lean towards HoneySQL is adding a where clause that requires an extra join. I can express that like so:

(defn with-userlabels [q label-ids]
  (-> q
      (h/merge-left-join [:userlabels_array :ul] [:= :ul.user_id :users.id])
      (h/merge-where (userlabels-qf label-ids))))

Things that have been bugging me (i.e. why did I even consider HugSQL?)

documentation is hard to come by. The README covers some common cases but I feel that for many things I had to read the source (more on that later) or just google and see an issue where jkk would give some guidance.
I cannot understand the principles of organising the namespaces/functions. I have to require the helpers (obviously), the types (to use Postgres arrays), the format (to register the && array operator) and the core in order to get the job done.
The parameter handling is very eager, and raw/inline is also confusing. I want to construct the string where posts.state = 'publish' (note the single quotes, this is a literal string) and I had to resort to (honeysql/raw "'publish'") because inline was not quoting the string, and just passing in a string would generate a parameter (which is not what I want, since a literal string will be auto-casted by postgres).
Not generating pretty-printed SQL makes for very hard visual inspection.
I can only use the helpers and still generate an invalid sqlmap, which I will only find about when I call format. (I mean that format will throw an exception).
Most of the time I had to trial-and-error to figure out the correct syntax of things, e.g. the with helper. This is mostly because it’s not clear when I need to use a vector and when I need var-args. Reading the source wasn’t very enlightening.
Overabundance of keywords. I like Datalog’s use of quoted lists.

With all that said… it’s still an improvement for doing programmatic manipulation of sql queries. And, once you have a complex example in your codebase or a few small helpers for common stuff, all these issues disappear

Here’s some wild ideas for 2.0:

Pretty printed SQL
An interactive playground that auto-generates pretty-printed SQL as you type.
A curated set of examples for said playground
Better naming (builders, handlers, helpers oh my!)
tagged literals for escape hatches
More emphasis on named parameters rather than positional. The way e.g. psql handles them looks very interesting. The rationale being that I would much rather define my own “slots” in my query explicitly.
Perhaps support for outputs that are not sqlvec but that could be literally copied-pasted into an SQL program. Bonus if named parameters could be passed-through as-is in a way that psql understands.

Thanks for taking the maintenance burden of HoneySQL, and thanks for opening this discussion that will inevitable lead to a ton of bike-shedding

seancorfield · August 7, 2020, 4:06pm

Thanks for the detailed feedback. I agree with pretty much everything you wrote, except perhaps for named parameters (and the link you gave doesn’t seem to address that?).

A big issue for a lot of people seems to be writing function calls, especially ones that take non-simple parameters (e.g., date_sub( col, interval n hour ) is hard to write in HoneySQL). Is function call syntax not a pain point for you @orestis?

orestis · August 7, 2020, 4:10pm

Yes it is - I ended up writing my own cast, coalesce and jsonb helpers that do the right thing. I think the % prefix just muddled the waters.

What I haven’t found in honey sql that hugsql seems to nail is explicit quoting for identifiers, values and raw strings. I’m afk at the moment but I can share more explicit examples next week.

slifin · August 7, 2020, 5:13pm

For context I don’t really use HoneySQL in anger but as a hobby I am writing a PHP library that can communicate via Transit and Babushka to HoneySQL

The pretty print feature is going to be big for me, the kinds of queries where I want to use HoneySQL are nearly always large and so having multiple lines will be a game changer

I’m still not 100% clear on when to use raw vs inline vs param, that might just be something for the docs

SamiSandqvist · August 9, 2020, 8:27am

I use HoneySQL as well as the postgresql extensions at work. I am fairly familiar with the internals as I’ve had to write my own ltree support. My biggest pain point is the use of macros to generate the helpers. I use Cursive, so I have to type them every time. IMHO the few lines of code saved in the library is not worth the loss of helpful docstrings and autocomplete for some users.

And while it is not a major pain point, I would add input checking to the helpers so that the data representation will actually result valid SQL.

(-> (sql/select :foo)
    (sql/delete-from :bar))

should be an error. This would require the data format to remember the type of the clause being built.

The above change to the data format would probably help with the formatting order, which while easy to extend does not seem very robust. For example, doing INSERT INTO … RETURNING * (with the postgres extension) works fine until you try INSERT INTO … SELECT … RETURNING * as the returning clause in the extension has its priority set too low.

Devices_For · August 10, 2020, 4:48pm

IMO, it should support postgres only, will eliminate lots of complexity from the library and path the way to leverage postgres to its fullest.

seancorfield · August 10, 2020, 5:04pm

HoneySQL’s design goal has always been (and will always continue to be) multi-database support. Like many current HoneySQL users, I do not use PostgreSQL. The same applies to clojure.java.jdbc and next.jdbc: they will always support mulitple databases.

I’m perfectly happy to extend the OOTB support for PostgreSQL – even though I do not use it – but it must also always continue to work for other databases – and the fact that every database vendor supports extensions to ANSI SQL (as well as several not implementing all of ANSI SQL!) means that HoneySQL will always have more complexity out of the box than a dedicated PG-only solution.

I’d even be happy for HoneySQL to default to PostgreSQL out of the box as long as it is easy to pass a flag to format to select a different dialect (or some other approach). Given that formatting SET as part of UPDATE is dialect-specific, even some basic SQL that works on PostgreSQL would fail on other databases – and HoneySQL must support that somehow.

slifin · August 13, 2020, 9:30pm

This is probably the responsibility of a different library or out of scope at the least but if the internals are being thought about again

For every data -> SQL string function if there was the same function but in reverse a well formed SQL string could be reversed into the honeySQL structure

It does mean the SQL string has to be exactly as honeySQL expects but it does mean it’s safe to use raw SQL for most of the query which I think aids reading and editor support and then back to data when needed

seancorfield · August 13, 2020, 9:55pm

I’ve seen a couple of requests for this over the years but it is definitely out of scope because SQL syntax is large and complex and varies from database to database.

seancorfield · September 30, 2020, 8:00pm

I wanted to give everyone an update on HoneySQL 2.0 since I have been working on it quite a bit this past week. Here are some high-level notes about the status, based on issues raised in this thread – with more detailed notes following below:

When a release is cut, it will be seancorfield/honeysql "2.0.0-alpha1" so that we don’t perpetuate the unqualified lib name – see the last section of https://insideclojure.org/2020/07/28/clj-exec/ for a bit more background on that whole issue.
The new version will contain only honey.sql and honey.sql.helpers so it will not conflict with 1.0 and you’ll be able to use both versions side-by-side as you migrate piecemeal.
Documentation will be written for the data DSL and the helpers with lots of examples – and differences between 1.0 and 2.0.
There will be a :pretty? true formatting mode that inserts newlines to make the SQL readable.
Invoking functions will be a lot easier and a lot more consistent.
Inline/raw/etc are all going to be “just functions” (technically “just special syntax” – see below) and strings will be SQLized properly, i.e., [:inline "foo"] produces 'foo' in the SQL “as expected”
The helpers will all have docstrings to explain their behavior (although they will still be variadic since they are intended for people to write – for programs, generate the data DSL directly) – and they’ll be documented, as will the data DSL (see above comment about documentation!).
Anywhere that a keyword is accepted and treated as “something special” (table, column, function, operator, etc) you will be able to use a symbol instead if you want.
The whole tagged literal thing will go away and so will the types and the call syntax.
It will support vendor dialects with the default being an extensive ANSI/PostgreSQL blend – it will support everything that you currently need nilenso/honeysql-postgres for and as much of the PostgreSQL syntax beyond that as I can reasonably include. There will also be a MySQL dialect (and the beginnings of SQL Server and Oracle – which will evolve over time).
The extension mechanism will be through a few register-*! functions to add new clauses, new operators, and new “syntax” (essentially new function-like things).
You will be able to set a default dialect at app startup and then override it per-call (to format).

I mentioned (back in August, in this thread) that things like date_sub( col, interval n hour ) were hard to write in 1.0. In 2.0, that’s [:date_sub :col [:interval 24 :hour]] or '[date_sub col [interval 24 hour]] if you prefer to use quoted symbols like Datalog syntax. You can also use lists if you want: '(date_sub col (interval 24 hour)) – things like interval are implemented via the “special syntax” machinery, which are function-like expressions that can generate arbitrary SQL from their arguments, e.g., cast, inline, raw.

The %func.arg syntax will still be valid (apparently enough people use this that forcing them to write [:func arg] would be unpopular) but it won’t be enhanced with SQLization etc.

The current status is:

There’s a v2 branch with this work-in-progress code on it and the start of documentation (right now that’s just a “differences” file, for the most part).
Nearly all of the 1.0 formatting tests pass with the 2.0 code – with a few specific changes around sql/call, sql/inline, and sql/raw etc.
None of the parameterizer stuff is implemented yet but I have a plan and that will be the next piece of work (specifically named parameters will be the very next piece of work I do).
It already supports PostgreSQL’s upsert features via ON CONFLICT / DO UPDATE SET / DO NOTHING etc with other PG-specific stuff “coming soon”.

You can track progress via the 2.0 milestone. Feel free to add comments on any of the open issues there (or create new issues if you think something is missing).

zerg000000 · October 16, 2020, 10:31am

To me, as a happy honeySQL user. I want to see a few enhancements in 2.0.

SQL map validation. I want to at least knowing which part getting wrong by honeySQL definition rather than an unknown exception.
More generic way to extend honeySQL for other SQL dialects.

Thanks for your hard works!

seancorfield · February 2, 2021, 12:59am

I’m getting very close to a 2.0.0-alpha1 release!

In the last few days, I finally got to the point where all the tests pass on v2! That means that everything you could do with v1, you can now do with v2 although there are a few changes (mostly as noted in my posts above).

I’m currently working on the documentation(!) although there are still a handful of code-level changes and enhancements I need to make – based mostly on my “promises” in this thread.

See https://github.com/seancorfield/honeysql/tree/v2 for the current state of play – in particular see differences between v1 and v2.

If you want to take it for a spin:

seancorfield/honeysql {:git/url "https://github.com/seancorfield/honeysql" 
                       :sha "4cbeb170ddf4ba0d7238c43b092ffb0a33062474"}

I’m hoping to cut 2.0.0-alpha1 tomorrow so there will be a version on Clojars for folks not using the Clojure CLI / deps.edn.

Notable things still missing (there are GitHub issues for all of these):

docstrings on the helper functions
nilenso-postgres feature parity (because you can’t use that library with v2 due to the changes in the extension machinery) – although some of the features already are present (see some things noted in posts above)
TOP, FETCH, OFFSET
Spec for the DSL (this is substantially harder than I initially thought because it needs to effectively replicate the syntax of SQL itself – but I’m still planning to do something about validation)

borkdude · February 2, 2021, 9:36am

HoneySQL is one of those libs that just happen to work with babashka. At least v1 did.
Trying v2 as a git lib:

#!/usr/bin/env bb

(require '[babashka.deps :as deps])

(deps/add-deps '{:deps {seancorfield/honeysql {:git/url "https://github.com/seancorfield/honeysql"
                                               :sha "4cbeb170ddf4ba0d7238c43b092ffb0a33062474"}}})

(ns script
  (:refer-clojure :exclude [for group-by set update])
  (:require [honey.sql :as sql]
            #_[honey.sql.helpers :as h]))

(def sqlmap {:select [:a :b :c]
             :from   [:foo]
             :where  [:= :f.a "baz"]})

(sql/format sqlmap) ;;=> ["SELECT a, b, c FROM foo WHERE f.a = ?" "baz"]

At least this still works!

When I uncomment [honey.sql.helpers :as h] then I get the error:

Type:     java.lang.AssertionError
Message:  Assert failed: (= (clojure.core/set (conj (clojure.core/deref (clojure.core/deref (var h/base-clause-order))) :composite)) (clojure.core/set (map keyword (keys (ns-publics *ns*)))))
Location: 92:1

I’m going to take a look and see if this can be fixed.

@seancorfield If there is something in honeysql v2 that babashka doesn’t/can’t support, are you open to making the library .cljc so the parts that babashka doesn’t support can be dealt with using reader conditionals? I’m willing to help out with this.

EDIT: I see the library is already .cljc - awesome!
EDIT2: I found the issue. This assert is failing and I think I can fix that.

seancorfield · February 2, 2021, 5:39pm

The assert is there purely to ensure dev-time consistency between the ordered list of clauses and the list of helper functions. Moving that check into the Clojure tests would be a sensible move at this point (I think there’s a similar assert in the main namespace?).

system · August 4, 2021, 5:39am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.