What is going on in the Clojure data science scene, and why does it matter to you?
Yesterday’s data science public meeting was one of the most important experiences I’ve had in this community.
I could not sleep well, so here are some notes. This is a personal, partial view.
The video will be up soon anyway. A couple of us are also working on more informative, analytical blog posts that will provide a better view of the state of the art.
What happened in the meeting?
For many of us, any meeting is an effort in this unusual period. As usual, people from diverse places attended, with very different hours on their local clocks.
Why would anybody bother to join yet another technical meeting early in the morning or in the middle of the night? What are we seeking here? Why is it so important to figure out a way to do data science in Clojure?
Let us come back to that later.
We began with some login problems. Trying to solve that quickly, I carelessly shared the meeting link publicly. Tweeting it was a bad idea. Within seconds, the meeting started filling up with new visitors. It took a moment to realize: they were not there for anything related to the meeting topic. This was bad, a reminder of how horrible the internet can be.
It was comforting to see how caring and supportive this community could be in a moment of trouble. We recovered and started the conversation with a different link. Apologies to anybody who did not get that link.
In the meeting, new faces appeared, and new voices were heard. No less important, there were many quiet voices, listening. With some of the new friends, we had good email conversations while preparing for the meeting. There are very special and beautiful minds in this community. When they are in dialogue, beautiful things may emerge.
The lightning talks presented diverse topics: tooling, new libraries, applied research, conceptual visions. One repeating theme of innovation was workflow and interaction in data exploration.
In the discussion afterwards, probably the major topic was making the ecosystem accessible: how to make it easy for anyone to pick up and use Clojure for common data science tasks.
One related issue was setting up a working environment. In Clojure, such things are relatively easy, thanks to the JVM and the emphasis on stability and simplicity. But common scientific computing tasks typically require native libraries and platform-dependent setup. Thus, setting up an environment in a reproducible way is not always trivial. We discussed some options to cope with that.
Another related chat was about the format and agenda of future meetings, and how they can be made useful as practical introductions to tools, libraries and workflows.
We also had a nice Q&A conversation, where some members shared their problems and needs, and others offered possible answers, essentially drawing a picture of the current ecosystem landscape.
A Lispy approach
Most of this world does not make sense. In our search for sense, we sometimes need to explore. The Lisp way is about that. It is about making things open, visible and dynamic enough so that we can wonder, look and explore.
Clojure makes that practical. As the ecosystem is growing, it is becoming an enjoyable platform for explorations of data, randomness, logic, uncertainty and AI.
Arguably, this is relevant to any person who is willing to explore the world, and is a little bit open to hearing about Lisp.
What is happening to us?
We are going through a transition.
For some time, a small handful of people have been working on a set of core libraries that would make Clojure into a great platform for data science. The core functionality for data processing, scientific computing and machine learning is now maturing. Most of it is already there. Hopefully, soon it will be reasonably complete, robust, fast and scalable.
What is still missing are specific libraries for certain applied fields. NLP, time series analysis, geospatial processing, probabilistic modelling, and feature engineering are some examples. Some of these are low-hanging fruit. Conceptually, these missing libraries lie one layer above the existing stack. Several of them will be mostly about creating abstractions and grammars to wrap existing libraries. Clojurians love to do that, so we can be hopeful about this stage.
In this phase, we need to go wider. Diverse problems, use cases and views are important in picking these fruits. More people need to be involved. Coordination and dialogue are becoming more important.
After all that matures, maybe in several months, we will be in a different situation. We will be able to reach out to the wide world and offer Clojure as a platform for data science.
You are needed
If you are a Clojurian, then this whole story is probably very relevant to you. Hopefully, you will find it exciting. Certainly, it needs you. You can make a difference and be influential in this story.
If you are not a Clojurian, but curious about anything in this post, then you are certainly needed. Your view of things will probably affect us in new, refreshing ways. This can help a lot in figuring out our future directions.
Let us make that concrete.
Your problem domain
These days, we are still learning to use the new emerging libraries and tools. Any use case will teach us something.
If you have any data-related problem that you can discuss with the community, then let us talk. Let us think together and see how the tools can be applied. Usually, that can be very fruitful.
A blog post, a tutorial, or a small library wrapping the stack for your specific need can have an impact, and can be a joy for others.
From its early days, the ClojureScript stack and culture have been a source of novel ideas that anticipated and inspired the progress of other client-side technologies.
Yesterday’s talks showed how fruitful these ideas can be for data exploration.
Of course, some of the tools we saw did require lots of careful work to develop. But the barriers are gradually becoming lower. It is now relatively easy to invent and experiment with user interfaces and workflows.
If you are interested in UI development and wish to contribute to some of the tooling and visualization libraries, then there are beautiful problems waiting for your attention.
Several of the emerging libraries are looking for contributors or co-maintainers.
Lots of functionality is already there or almost there, just needing someone to wrap it up with a neat API, and demonstrate its concrete use in a problem.
At the Scicloj community, we are trying to build a space for library authors and users to think together about the Clojure data science ecosystem.
There are many ways to contribute to this process. Even organizing one small meeting can have an important impact on the group and on concrete projects. We have seen that several times.
Personally, I think that I have experienced mostly kindness, patience and helpfulness in this community. But that does not mean we are not facing any human challenges.
Can we make the community into a place where any member feels at home? Can we make sure that our ongoing processes are driven by group reasoning? Are all the voices being heard, and can they be influential? Can we achieve diversity, and can we welcome beginners?
These are all crucial questions.
It seems that several members are continuously thinking about them. Hopefully, we will come up with answers. We need you there with us. We need a more diverse group of people to reason about all that.
If you wish to join as a community organizer, that would be wonderful. If you have 30 minutes a month to think a little and write your opinion about our challenges, that would be wonderful too.
What will happen now?
Through our process of building this community, we are having ups and downs, surprises and strange realizations. It is difficult to imagine the coming few months.
What I wish to suggest here is that we try to be in dialogue.
The #data-science stream of the Clojurians Zulip chat is the main place for these discussions. But most of the relevant people are here at Clojureverse too.
If you are interested in anything mentioned in this post, it would be great to talk with you, in person or publicly.