Since the beginning of 2021, we’ve had a habit of running a monthly thread where people can share their hopes for the emerging data science ecosystem. Until now, those threads have lived on the Clojurians Zulip. We decided to move them to Clojureverse. Many new friends are getting involved, and it seems important to have this dialogue in a more visible place.
Through periodic updates, we can help each other catch up and think about the bigger picture and how our efforts may tie together. It’s also a good way for each of us to remind ourselves of what we have done and what we would like to do in the near future.
It would be great if you all would consider the following questions and briefly share your views on them. Please skip anything that you find irrelevant. Keep in mind that these are only prompts to get you thinking.
Are you working on anything related to the Clojure ecosystem for data science / scientific computing / data tooling / data engineering? Let us know about it.
Have you been doing anything interesting in the last month?
Is there any new realization or change in your hopes and beliefs about the ecosystem’s future?
What are you hoping to create/learn/explore in the coming month? … and in the coming 3 months?
What developments are you hoping to see in the ecosystem and community in the coming month? … and in the coming 3 months?
Also: if you are interested in seeing what you or others have written in the past few months, here are some links to the previous threads:
I started working on my vision for a feature processing service. The super high-level concept can be seen here: https://gist.github.com/jcpsantiago/320e3665a9bd749fc25ede0341c6323c. Such a system would compute features for models on demand and also store the results of those computations in a database (as part of a larger “feature store” system), so that data scientists can train models without having to rewrite the data transformation code multiple times.
For me personally, it’s a necessary step to finally deploy my company’s anti-fraud (XGBoost) model using Clojure. At the moment I’m stuck with R because the recipes package does all the preprocessing, which is then used by the workflows package during cross-validation.
I’m still surprised nobody has done this (especially the larger companies using Clojure; looking at you, Nubank) instead of using single-threaded Python pipelines, rewriting code in Scala, or dumping everything into complicated pieces of software such as Kafka and Spark.
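To make the feature processing idea above more concrete, here is a minimal sketch in Python. It is not the design from the gist: the `FeatureService` class, the two feature definitions, the event fields, and the SQLite table layout are all hypothetical, chosen only to show one transformation path that serves on-demand requests and also persists results for later training.

```python
import json
import sqlite3

# Hypothetical feature definitions: feature name -> function of a raw event.
# These are illustrative assumptions, not taken from the original post.
FEATURES = {
    "amount_cents": lambda event: int(round(event["amount"] * 100)),
    "is_foreign": lambda event: int(event["country"] != event["card_country"]),
}


class FeatureService:
    """Computes features on demand and stores each result for reuse."""

    def __init__(self, conn):
        self.conn = conn
        # Assumed storage layout: one JSON payload of features per entity.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS features ("
            "entity_id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    def compute(self, entity_id, event):
        # Apply every feature definition to the raw event.
        features = {name: fn(event) for name, fn in FEATURES.items()}
        # Persist the result so training reads the same values that were served.
        self.conn.execute(
            "INSERT OR REPLACE INTO features (entity_id, payload) VALUES (?, ?)",
            (entity_id, json.dumps(features, sort_keys=True)),
        )
        self.conn.commit()
        return features

    def training_rows(self):
        # Read stored features back without re-running any transformation code.
        rows = self.conn.execute(
            "SELECT entity_id, payload FROM features ORDER BY entity_id"
        ).fetchall()
        return [(entity_id, json.loads(payload)) for entity_id, payload in rows]
```

Under these assumptions, a scoring request would call `compute` with a raw event, while a training job would call `training_rows`, so the transformation logic is written only once.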
This is a bit late for a September update, but I thought I’d still take the opportunity to share a short one.
currently working on:
a new version of Notespace, which should be more seamless to use in various Clojure dev environments, and also easier to understand and to contribute to
a couple of the Scicloj study groups
planning the re:Clojure pre-conference workshops, which will take place in November
new realization:
Making sure that things are teachable is a good way to make sure they are usable. Therefore, preparing the November workshops is a valuable part of improving the emerging stack of tools and libraries.
personal hopes:
1 month:
Make the new Notespace work for our current needs.
Help prepare the November workshops.
3 months:
Help prepare the data science stack to be shared with people who are new to Clojure.
community hopes:
1 month:
Polish the existing libraries in the fields of data wrangling, visualization, statistics/machine-learning, and tooling.
Prepare the November workshops.
3 months:
Bring the current data science stack to a state where it can be introduced to people who are new to Clojure.