2021-08 - Plans & Hopes for Clojure Data Science

geokon-gh · August 8, 2021, 5:48am

Thought I’d share here: I’ve sort of ironed out a literate Clojure workflow

https://geokon-gh.github.io/literate-clojure.html

It allows me to have a single-file Clojure “project” that loads libraries dynamically and generates plots with inline SVG. (I’ve got other documents and projects in the pipeline, but unfortunately they’re not ready to share yet.)

It’s made me really strongly believe in the thi-ng/geom architecture - with SVG hiccup being the correct “interchange” format. I think it’s a shame geom never took off, but I get that it’s a bit harder to get into: instead of being a monolithic plotting library with one entry point (like some central plot function with a million options), it’s a composable set of namespaced mini-libraries with SVG as the ultimate output. Some of the mini-libraries are lower level (matrix/transform/color/math) and some are higher level and built on SVG (viz for plotting, other more complex ones for generating meshes and 3D images).

It’s all highly composable and the resulting SVGs are very flexible. This has two primary benefits:

The first is that it’s naturally very easy to extend the existing functionality. It’s very easy to write your own custom visualizations/plotting functions and to tweak/dial in graphs. The existing plotting functions are quite capable and complex (geom/core.org at master · thi-ng/geom · GitHub) and are a great starting point for making your own - which you will inevitably need to do. There are tons of options already, but many things are missing. If you, say, want some error bars on your scatter plot, then you’ll probably need to implement them yourself. All the code (so far) has been very digestible, and I’m not a Clojure guru by any stretch. I’ve never felt so comfortable looking at and extending someone else’s codebase.
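To make the error-bar example concrete, here is a minimal sketch in plain SVG hiccup. It assumes points are already in pixel coordinates as `[x y err]` triples; `scatter`, `error-bars` and that point format are hypothetical names for illustration, not part of thi-ng/geom, which would normally handle the data-to-pixel mapping through its axis specs.

```clojure
;; Hypothetical helpers: plain data in, SVG hiccup out. No library needed.

(defn scatter
  "Returns an SVG hiccup group with one circle per [x y err] point."
  [points]
  (into [:g {:fill "steelblue"}]
        (map (fn [[x y _]] [:circle {:cx x :cy y :r 3}]))
        points))

(defn error-bars
  "Returns an SVG hiccup group with one vertical line per [x y err] point,
  spanning y-err to y+err (pixel coordinates assumed)."
  [points]
  (into [:g {:stroke "black" :stroke-width 1}]
        (map (fn [[x y err]]
               [:line {:x1 x :y1 (- y err) :x2 x :y2 (+ y err)}]))
        points))

;; Made-up sample data, purely for illustration.
(def points [[20 80 10] [60 50 5] [100 30 8]])

;; Both layers are just nested vectors, so combining them is plain data composition.
(def plot
  [:svg {:xmlns "http://www.w3.org/2000/svg" :width 120 :height 100}
   (scatter points)
   (error-bars points)])
```

Since the result is ordinary Clojure data, the error-bar layer can be added to, removed from, or restyled within an existing plot without touching the function that drew the points.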

The second is the flexibility of the resulting SVG hiccup. You can manipulate the hiccup directly and modify it in any way you’d like. SVG is also pretty pleasant to work with and seems well designed. It’s very modular and you can embed SVG in other SVG. If you want to add bar plots to your scatter plot, you just make the two plots and svg/group them together. Boom, done. So you can generate different graphs, apply transformations, etc., and compose them to generate multi-plot visualizations very easily. There is a bit of a learning curve, but once you get the hang of it, it starts to feel like you can plot anything with a bit of effort. There is not a lot of “meta” functionality though - but it’s very easy to write your own. I wanted to be able to arrange plots in a grid, but you don’t get a MATLAB-y figure(i),subplot(i,n,m,"blah") type of functionality. So I wrote a thing to do that in a few hours on a weekend. It felt very accessible, and I feel like I’m in control and not just subject to what the library provides (if that makes sense?). I’m rarely fighting the system or trying to massage it into doing what is typical in MATLAB/R/etc.
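A grid helper of the kind described above can be sketched in a few lines of plain hiccup manipulation. This is an illustration under assumptions, not the code I actually wrote: `grid`, `cols`, `cell-w` and `cell-h` are hypothetical names, and each plot is assumed to be an SVG hiccup fragment drawn relative to its own origin.

```clojure
(defn grid
  "Arranges SVG hiccup fragments left-to-right, top-to-bottom in a grid
  with `cols` columns. Each plot is wrapped in a group translated to the
  top-left corner of its cell-w x cell-h cell."
  [cols cell-w cell-h plots]
  (into [:g {}]
        (map-indexed
         (fn [i plot]
           (let [col (mod i cols)     ; column index within the row
                 row (quot i cols)]   ; row index
             [:g {:transform (str "translate("
                                  (* col cell-w) "," (* row cell-h) ")")}
              plot])))
        plots))

;; Four placeholder "plots" laid out as a 2x2 grid of 200x150 cells.
(def figure
  [:svg {:xmlns "http://www.w3.org/2000/svg" :width 400 :height 300}
   (grid 2 200 150
         [[:rect {:width 190 :height 140 :fill "none" :stroke "black"}]
          [:circle {:cx 95 :cy 70 :r 50}]
          [:line {:x1 0 :y1 0 :x2 190 :y2 140 :stroke "black"}]
          [:text {:x 10 :y 20} "fourth panel"]])])
```

Because SVG groups nest, the output of `grid` is itself a fragment that can be placed inside another grid or combined with other layers.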

And then in the end you can display/render the SVG in a myriad of ways. You are not platform constrained in any way. You can open a webview, you can use Batik/SVG Salamander, or you can just serialize the hiccup and spit it to a file. I even wrote a quick svg-hiccup to JavaFX renderer (using JavaFX graphics primitives - not a webview) for a larger GUI data processing application that has some simple in-window plots: corascope/svg.clj at master · geokon-gh/corascope · GitHub
It was very easy: a day or two of fiddling.
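The “serialize and spit to a file” route is simple enough to sketch by hand. geom ships its own serializer, so the hand-rolled `hiccup->str` below is only a hypothetical stand-in to show how little is involved; it does no escaping and handles only tags, attribute maps, strings, numbers and nested seqs.

```clojure
(require '[clojure.string :as str])

(defn hiccup->str
  "Minimal SVG hiccup serializer for illustration only.
  Handles [tag attrs? & children], nested seqs, and scalar text.
  Does not escape attribute values or text content."
  [node]
  (cond
    (vector? node)
    (let [[tag & more] node
          ;; The attribute map is optional; treat it as empty when absent.
          [attrs children] (if (map? (first more))
                             [(first more) (rest more)]
                             [{} more])
          attr-str (str/join (for [[k v] attrs]
                               (str " " (name k) "=\"" v "\"")))]
      (str "<" (name tag) attr-str ">"
           (str/join (map hiccup->str children))
           "</" (name tag) ">"))

    (seq? node) (str/join (map hiccup->str node)) ; e.g. output of `map`/`for`
    (nil? node) ""
    :else (str node)))                            ; strings and numbers
```

Writing a file is then a one-liner such as `(spit "plot.svg" (hiccup->str my-plot))`, where `my-plot` is any SVG hiccup vector and the file name is arbitrary.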

Anyway, I’m just throwing it out there in case people are looking for alternatives. Last I poked around Scicloj and company, it was all very Vega JS webstack focused. That’s a pragmatic solution, but it felt very un-Clojure-y and non-extensible. (And of course everyone loves to rewrite functionality that exists in other libraries.) So here is a less capable but more pure-Clojure solution. Hope it’s useful for someone.