HoneySQL -- What do you want from 2.0?

At World Singles Networks llc, we use HoneySQL very heavily. One of my former colleagues gave a talk about our use of HoneySQL at Clojure/West 2015. Fairly recently, I officially took over the repo from the original author Justin Kramer (and it’s been through a number of maintainers over the years). It has always sought to be vendor-neutral, focusing on ANSI SQL and this has often been a tension when looking at issues and pull requests. The general message has been to look at extension libraries (such as nilenso’s honeysql-postgres) for anything that is “non-standard” but it’s become clear over the years that this is neither very helpful nor an entirely tenable situation: the nilenso library is, I believe, the only HoneySQL extension library – we do not have equivalents for other databases.

I think part of the problem there is that it’s non-trivial to add new syntax support to HoneySQL and also that folks tend to extend it locally and don’t feel their individual extensions are worth turning into a library.

As noted in the GitHub issues, I’ve been thinking about what “2.0” might look like for quite a while and I’d like to widen the discussion here and see what sort of things folks would like to see in a future, reimagined version of the library.

From my point of view, I don’t think 2.0 can be backward compatible – it needs to be new namespaces, possibly a different data structure (although that is part that I’d like to retain), with simpler helpers (the merge/non-merge versions cause confusion and the helpers are not always consistent with the underlying data structure), a much easier extension process, and a bigger tent for SQL dialects.

I think it needs to encompass more of the SQL dialects – and where those conflict, it needs to provide dialect-specific ways to format the data structure into SQL strings (the different priorities of SET between MySQL and PostgreSQL come to mind – see Issue #235).

So let’s have a no-holds-barred discussion about what you’d like to see in HoneySQL, what problems you’ve hit, what just plain bugs you about it…

10 Likes

How timely! I just finished a round of implementing a somewhat complex query using both HugSQL and HoneySQL to compare the two approaches.

HoneySQL wins when I want to manipulate the query at runtime: While HugSQL gives you the full Clojure power it’s a bit weird. The case that made me lean towards HoneySQL is adding a where clause that requires an extra join. I can express that like so:

(defn with-userlabels [q label-ids]
  (-> q
      (h/merge-left-join [:userlabels_array :ul] [:= :ul.user_id :users.id])
      (h/merge-where (userlabels-qf label-ids))))

Things that have been bugging me (i.e. why did I even consider HugSQL?)

  • documentation is hard to come by. The README covers some common cases but I feel that for many things I had to read the source (more on that later) or just google and see an issue where jkk would give some guidance.
  • I cannot understand the principles of organising the namespaces/functions. I have to require the helpers (obviously), the types (to use Postgres arrays), the format (to register the && array operator) and the core in order to get the job done.
  • The parameter handling is very eager, and raw/inline is also confusing. I want to construct the string where posts.state = 'publish' (note the single quotes, this is a literal string) and I had to resort to (honeysql/raw "'publish'") because inline was not quoting the string, and just passing in a string would generate a parameter (which is not what I want, since a literal string will be auto-casted by postgres).
  • Not generating pretty-printed SQL makes for very hard visual inspection.
  • I can only use the helpers and still generate an invalid sqlmap, which I will only find about when I call format. (I mean that format will throw an exception).
  • Most of the time I had to trial-and-error to figure out the correct syntax of things, e.g. the with helper. This is mostly because it’s not clear when I need to use a vector and when I need var-args. Reading the source wasn’t very enlightening.
  • Overabundance of keywords. I like Datalog’s use of quoted lists.

With all that said… it’s still an improvement for doing programmatic manipulation of sql queries. And, once you have a complex example in your codebase or a few small helpers for common stuff, all these issues disappear :slight_smile:

Here’s some wild ideas for 2.0:

  • Pretty printed SQL
  • An interactive playground that auto-generates pretty-printed SQL as you type.
  • A curated set of examples for said playground
  • Better naming (builders, handlers, helpers oh my!)
  • tagged literals for escape hatches
  • More emphasis on named parameters rather than positional. The way e.g. psql handles them looks very interesting. The rationale being that I would much rather define my own “slots” in my query explicitly.
  • Perhaps support for outputs that are not sqlvec but that could be literally copied-pasted into an SQL program. Bonus if named parameters could be passed-through as-is in a way that psql understands.

Thanks for taking the maintenance burden of HoneySQL, and thanks for opening this discussion that will inevitable lead to a ton of bike-shedding :smiley:

7 Likes

Thanks for the detailed feedback. I agree with pretty much everything you wrote, except perhaps for named parameters (and the link you gave doesn’t seem to address that?).

A big issue for a lot of people seems to be writing function calls, especially ones that take non-simple parameters (e.g., date_sub( col, interval n hour ) is hard to write in HoneySQL). Is function call syntax not a pain point for you @orestis?

Yes it is - I ended up writing my own cast, coalesce and jsonb helpers that do the right thing. I think the % prefix just muddled the waters.

What I haven’t found in honey sql that hugsql seems to nail is explicit quoting for identifiers, values and raw strings. I’m afk at the moment but I can share more explicit examples next week.

1 Like

For context I don’t really use HoneySQL in anger but as a hobby I am writing a PHP library that can communicate via Transit and Babushka to HoneySQL

The pretty print feature is going to be big for me, the kinds of queries where I want to use HoneySQL are nearly always large and so having multiple lines will be a game changer

I’m still not 100% clear on when to use raw vs inline vs param, that might just be something for the docs

1 Like

I use HoneySQL as well as the postgresql extensions at work. I am fairly familiar with the internals as I’ve had to write my own ltree support. My biggest pain point is the use of macros to generate the helpers. I use Cursive, so I have to type them every time. IMHO the few lines of code saved in the library is not worth the loss of helpful docstrings and autocomplete for some users.

And while it is not a major pain point, I would add input checking to the helpers so that the data representation will actually result valid SQL.

(-> (sql/select :foo)
    (sql/delete-from :bar))

should be an error. This would require the data format to remember the type of the clause being built.

The above change to the data format would probably help with the formatting order, which while easy to extend does not seem very robust. For example, doing INSERT INTO … RETURNING * (with the postgres extension) works fine until you try INSERT INTO … SELECT … RETURNING * as the returning clause in the extension has its priority set too low.

1 Like

IMO, it should support postgres only, will eliminate lots of complexity from the library and path the way to leverage postgres to its fullest.

HoneySQL’s design goal has always been (and will always continue to be) multi-database support. Like many current HoneySQL users, I do not use PostgreSQL. The same applies to clojure.java.jdbc and next.jdbc: they will always support mulitple databases.

I’m perfectly happy to extend the OOTB support for PostgreSQL – even though I do not use it – but it must also always continue to work for other databases – and the fact that every database vendor supports extensions to ANSI SQL (as well as several not implementing all of ANSI SQL!) means that HoneySQL will always have more complexity out of the box than a dedicated PG-only solution.

I’d even be happy for HoneySQL to default to PostgreSQL out of the box as long as it is easy to pass a flag to format to select a different dialect (or some other approach). Given that formatting SET as part of UPDATE is dialect-specific, even some basic SQL that works on PostgreSQL would fail on other databases – and HoneySQL must support that somehow.

2 Likes

This is probably the responsibility of a different library or out of scope at the least but if the internals are being thought about again

For every data -> SQL string function if there was the same function but in reverse a well formed SQL string could be reversed into the honeySQL structure

It does mean the SQL string has to be exactly as honeySQL expects but it does mean it’s safe to use raw SQL for most of the query which I think aids reading and editor support and then back to data when needed

I’ve seen a couple of requests for this over the years but it is definitely out of scope because SQL syntax is large and complex and varies from database to database.