2021-10 - Plans & Hopes for Clojure Data Science

Since the beginning of 2021, we’ve had a habit of a monthly thread where people could share their hopes for the emerging data science ecosystem.

Through periodic updates, we may help each other catch up and think about the bigger picture and the way our efforts may tie together. It’s also a good way for each of us to remind ourselves individually of what we have done, and what we would like to do in the near future.

It would be great if you all would consider the following questions and briefly mention your views towards them. Please skip anything that you find irrelevant. Keep in mind, these are only prompts to get you thinking.

  • Are you working on anything related to the Clojure ecosystem for data science / scientific computing / data tooling / data engineering? Let us know about it.
  • Have you been doing anything interesting in the last month?
  • Is there any new realization or change in your hopes and beliefs about the ecosystem’s future?
  • What are you hoping to create/learn/explore in the coming month? … and in the coming 3 months?
  • What developments are you hoping to see in the ecosystem and community in the coming month? … and in the coming 3 months?

Also: if you are interested in seeing what you or others have written in the past few months, here are some links to the previous threads:

Looking forward to hearing about what everyone has been up to and hopes to be up to!


I am thinking that for the first non-beta release of scicloj.ml the following 3 stabilisation tasks are pending:

  1. Use Malli and the existing metadata for the model hyperparameters to detect when wrong parameters are given to a model. Currently they are silently ignored, which is how Smile behaves. We already have this metadata for the models, so it should be easy to convert it to Malli schemas and fail on wrong parameters:
    scicloj.ml.smile/classification.clj at 3e4f1abef32f099358252fde5aa07069159e5719 · scicloj/scicloj.ml.smile · GitHub

  2. Use Malli to specify precisely the output of model evaluation, which is a complex map. This would then be the way to make it part of the API. It is nearly done: metamorph.ml/ml.clj at fba1897377fd2932a6449538cdd78cf928b525d6 · scicloj/metamorph.ml · GitHub

  3. Have minimal support for experiment tracking of the three important pieces of information needed around any model evaluation:
    • Metric
    • Train data
    • Code used
      → Declarative pipeline
      → Source code of all functions used in a pipeline

This is a purely optional feature, to be enabled via the options passed to the evaluate-models function.
See it here in action: scicloj.ml-tutorials/experiment_tracking.clj at train-test-result-change · scicloj/scicloj.ml-tutorials · GitHub

I have only just started with Malli, but I find it very promising for further “protecting” the public functions in scicloj.ml from wrong data.
To me, one of the biggest “beginner hurdles” in scicloj.ml is the “cryptic errors” that appear when its API is used wrongly (= passing wrong things to the functions).
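To illustrate the idea from task 1, here is a minimal sketch of how a closed Malli map schema could reject unknown hyperparameters instead of ignoring them. The schema name `random-forest-options`, the keys `:trees` and `:max-depth`, and the helper `check-options` are hypothetical and are not part of the scicloj.ml API; only `malli.core/validate`, `malli.core/explain` and `malli.error/humanize` are real Malli functions.

```clojure
(ns example.hyperparams
  (:require [malli.core :as m]
            [malli.error :as me]))

;; Hypothetical hyperparameter schema for one model.
;; {:closed true} makes Malli reject any key that is not listed here,
;; so a misspelled option is reported instead of being ignored.
(def random-forest-options
  [:map {:closed true}
   [:trees {:optional true} pos-int?]
   [:max-depth {:optional true} pos-int?]])

(defn check-options
  "Returns opts unchanged when they match schema.
  Otherwise throws an ex-info whose data holds human-readable errors."
  [schema opts]
  (if (m/validate schema opts)
    opts
    (throw (ex-info "Invalid model options"
                    {:errors (me/humanize (m/explain schema opts))}))))
```

With this sketch, `(check-options random-forest-options {:trees 100})` returns the map unchanged, while a misspelled key such as `{:treees 100}` throws, and the exception data says which key was the problem. That kind of message is what could replace the cryptic errors mentioned above.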


I have been working on tools in the machine reading and discovery space, starting with Forth in the 1980s, through Java and some Python these days, but now migrating to Clojure and quite possibly Datalog. Machine reading and NLP seem ripe for Clojure/Datalog exploration.


Back in August I showed my literate Clojure workflow using orgmode and SVGs.

I’ve since been using it to make maps in Clojure/orgmode. It’s basically just a matter of gluing a few libraries together and dumping out an SVG.


This is part of a larger project I’m in the middle of, so there is a lot of “extra” stuff going on with curling GeoTIFFs and wrangling GeoJSON - so sorry about the huge code blocks. You can scroll past that to see the results towards the end. I think, given what it’s doing, the result is actually pretty minimal in terms of lines of code.

This is just a proof of concept. Lots of small issues… weird corner cases with huge shapefiles, code isn’t designed to work in the Southern Hemisphere… stuff like that :)
