Blog post: DuckDB - Data power tools for your laptop, now in Clojure

Harold · September 11, 2023, 4:14pm

Hello! Please read our new blog post about accessing DuckDB from Clojure through TMD (tech.ml.dataset): TechAscent - DuckDB - Data power tools for your laptop, now in Clojure

Wherein we join 1.4B+ rows on a laptop in 1s.

The story continues to coalesce: single developers or small teams using functional data science can accomplish processing tasks that would otherwise require longer timelines, more headcount, bigger machines, and dramatic, unwieldy tools.

ChipNowacek · September 11, 2023, 7:20pm

Thanks and congrats.

maxweber · September 12, 2023, 5:38am

Thanks a lot! Super interesting.

We want to rework our system that calculates the business metrics for our SaaS. We struggle to fit our Clojure and Datomic data sources into the Google BigQuery ecosystem, mainly because you need to convert everything to fit a relational database schema. Would you recommend giving DuckDB + TMD a try in this situation?

Harold · September 12, 2023, 3:14pm

Hey Max - I’ll copy/paste my response from reddit here, since I think it has some good ideas in it.

Thanks a lot! Super interesting.

You’re welcome. Thank you.

We want to rework our system that calculates the business metrics of our SaaS.

Now that sounds super interesting. (:

We struggle to fit our Clojure and Datomic data sources into the Google BigQuery ecosystem, since you need to convert everything into a relational database schema.

Right! It’s interesting that there’s no mention of schema anywhere in the article; however, both TMD and DuckDB are strongly typed: columns are homogeneous and tables are (for all intents and purposes) rectangular.

We’re getting a lot of leverage in this area from two things: (1) DuckDB’s CSV import detects data types automatically and creates the table schema with no additional user input (and it does this surprisingly well, imo). (2) Both TMD and DuckDB know the types of all their columns, so data moving between them can generally discover and late-bind schema information, again with no additional user input.
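To make point (1) concrete, here is a minimal sketch of the general idea behind automatic column-type detection for CSV input, written in Python with only the standard library. It is not DuckDB’s actual sniffer, which is far more sophisticated; the function `infer_schema`, the candidate type list, and its order (BIGINT, DOUBLE, BOOLEAN, DATE, falling back to VARCHAR) are illustrative assumptions.

```python
import csv
import io
from datetime import date

# Hypothetical, simplified type checks; the names only mirror SQL types.
def _is_bigint(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

def _is_double(s):
    try:
        float(s)
        return True
    except ValueError:
        return False

def _is_boolean(s):
    return s.lower() in ("true", "false")

def _is_date(s):
    try:
        date.fromisoformat(s)  # accepts ISO dates such as 2023-09-11
        return True
    except ValueError:
        return False

# Tried in order, most specific first; VARCHAR is the fallback.
CANDIDATES = [
    ("BIGINT", _is_bigint),
    ("DOUBLE", _is_double),
    ("BOOLEAN", _is_boolean),
    ("DATE", _is_date),
]

def infer_schema(csv_text):
    """Return {column_name: type_name} for a rectangular CSV with a header,
    choosing for each column the first candidate type that every non-empty
    value satisfies (empty strings are treated as NULL)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = list(zip(*reader))  # transpose rows into columns
    schema = {}
    for name, values in zip(header, columns):
        present = [v for v in values if v != ""]
        chosen = "VARCHAR"
        for type_name, check in CANDIDATES:
            if present and all(check(v) for v in present):
                chosen = type_name
                break
        schema[name] = chosen
    return schema

if __name__ == "__main__":
    sample = "id,price,active,signup\n1,9.99,true,2023-09-11\n2,12.50,false,2023-09-12\n"
    print(infer_schema(sample))
```

Because every value in a column must pass the same check, the inferred schema is exactly what makes the columns homogeneous; a single stray string demotes the whole column to VARCHAR.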

That hides a lot of the drama associated with ‘the relational database schema’ you mention, but it’s still there under the hood. We regard this as a good thing: this columnar orientation is both why TMD can use so little RAM and why the DuckDB query engine can be so fast.
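As a rough illustration of the memory side of that claim, the sketch below contrasts row-oriented records with one homogeneous, contiguous typed array per column. It uses Python’s standard `array` module, not TMD or DuckDB internals, and the `id`/`price` fields and helper names are made up for the example. Each typed column costs a fixed number of bytes per value, while each row dict is a separate object with its own overhead.

```python
import sys
from array import array

def to_columns(rows):
    """Turn row records (dicts sharing the keys 'id' and 'price') into one
    homogeneous typed array per column. 'q' is a signed long long and 'd' a
    C double (8 bytes each on common platforms)."""
    return {
        "id": array("q", (r["id"] for r in rows)),
        "price": array("d", (r["price"] for r in rows)),
    }

def column_bytes(col):
    # Payload of a typed column: element count times fixed element width.
    return len(col) * col.itemsize

def rows_bytes(rows):
    # Shallow size of the row dicts alone; the boxed int/float objects they
    # point to would add even more on top of this.
    return sum(sys.getsizeof(r) for r in rows)

if __name__ == "__main__":
    rows = [{"id": i, "price": i * 0.5} for i in range(100_000)]
    cols = to_columns(rows)
    print("row dicts:", rows_bytes(rows), "bytes")
    print("columns:  ", sum(column_bytes(c) for c in cols.values()), "bytes")
    # Homogeneity is enforced: a float cannot be stored in an int column.
    try:
        array("q", [1, 2.5])
    except TypeError as err:
        print("rejected:", err)
```

The trade-off is the one described above: the schema (one fixed type per column) has to exist somewhere, and in exchange values pack tightly and can be scanned column-at-a-time.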

Would you recommend giving DuckDB + TMD a try for this situation?

So the answer here is a definite maybe: it depends, mostly on the exact data shapes, quantities, and so on. Our typical strategy is to do the actual data processing in Clojure where possible, since that is by far the most flexible option, and then graduate to involving other tools as necessary.

Happy to discuss it further if you’d like. Fill out our contact form here, and we can get an email thread going or hop on a call:

TechAscent - Effective Software Solutions

maxweber · September 13, 2023, 12:55pm

Thanks a lot for your reply! I just wrote to you via the contact form you mentioned.
