Fundamentals study group

daslu · October 14, 2020, 7:55pm

Hi. Below is a suggestion of something that I think we need to do.

It follows some Scicloj conversations at the Clojurians Zulip #data-science stream, but I think it is not just relevant to data science people. It is for people who want to reimagine what can be built on top of Clojure.

Our basic building blocks have been changing. We are no longer just using and building plain Clojure libraries, at least not in the old sense of it. Clojure is still the language that we use, and most things are built on top of clojure.core. But new layers have been emerging in between, and they completely change the situation.

What problems are possible/impossible, easy/hard, slow/fast to solve? What are the best practices for building tools and libraries? What are the recommended ways to perform basic tasks?
The answers to these questions are changing, thanks to new abstractions and new infrastructure.

Efficient array programming, access to the GPU, C-level performance, distributed programming, zero-copy connections to other ecosystems, and Graal Native compilation are some of the things that are gradually becoming part of everyday Clojure REPL sessions.

Recently, one of the main reasons for this has been @cnuernber’s tech.datatype library, which several other libraries rely on. But now things are gradually shifting towards dtype-next.
Geni, Neanderthal and tvm-clj are some of the other game-changers.

We need things to go wide and not only deep – we hope that many diverse libraries will be built on top of these new layers. To support that, it is probably time to learn.

Therefore we wish to organize a study group for people who wish to learn the new abstraction layers and build things on top of them.

Here is a suggestion for the agenda and the format.

The first topic will be dtype-next. Maybe we have another topic in parallel (Geni?).
We commit to a continuous learning process at a good pace (as much as it is possible to commit to anything in 2020).
We assume that we all know Clojure but are new to the topics we learn.
We set some goals of contributions to existing and new libraries. These goals are driving our learning focus.
We meet every 2 weeks.
Between meetings, we keep reading and discussing, maybe with some “homework”.
Video recordings are used just inside the learning group, not shared publicly (so that we feel more comfortable speaking nonsense).
We keep discussing our progress with library authors, asking them for recommended directions.
Maybe sometimes a member of the group may prepare a particular topic and present it (e.g., Spark’s RDD data structures, Automatic Differentiation).

This is just a suggestion, a starting point for a discussion.

Do you have any thoughts about this?
Would any of you like to join such a study group?

mvarela · October 15, 2020, 8:54am

Thoughts: it seems like an awesome idea!
Joining: I’d like to, but I suspect I don’t have enough bandwidth to commit. I’d love to follow somehow, though.

daslu · October 16, 2020, 9:16pm

Thanks @mvarela!

daslu · October 16, 2020, 9:16pm

Since there have been several ideas for other study groups (practical machine learning, data visualization, reading Dragan’s books, etc.), let us add a question here:

Who wants to be part of the organizing team that will make the study groups happen?

icosahedron · October 16, 2020, 10:59pm

I’d be up for joining the study group. I’m interested in using Clojure for data science projects. dtype-next sounds like a good first subject.

HolyJak · October 17, 2020, 3:29pm

I’d be happy to join the study group. It is a new domain to me but one that has interested me for a long time.

daslu · October 17, 2020, 6:21pm

Hi @icosahedron, @HolyJak!
Wonderful, let us be in touch soon.

bsless · October 17, 2020, 7:37pm

The distributed systems and performance aspects sound interesting to me (each with their own field, eh?). You already know some of the work I’ve been doing on the latter, and I have been doing some work on the former, although it’s not ready to share yet. Another aspect I’m curious about and didn’t even make it to the list is how we could leverage category theory to write better Clojure code. There are three monad libraries I’m aware of and I’ve only seen them get very limited use. Is there untapped potential there or is it not worth the effort?

zackteo · October 18, 2020, 4:57am

I think this is great! I hope this is something I can find a good time for and commit to.

@bsless For distributed systems specifically, a study group has been going on here for a few months: https://www.reddit.com/r/mit6824clojure/ . They are following MIT’s 6.824: Distributed Systems course but tackling the labs in Clojure. (Unfortunately, I wasn’t able to keep up with it as the weeks went by, partly because other things came up at the time.) It may be something we can look at or ask about if and when we have a study group of that nature.

tbrooke · October 20, 2020, 7:40pm

I’ve been following some of the Scicloj discussions and watched some of the meetings, but I would love to get more deeply involved in a study group. All of this is new to me.

samedhi · October 21, 2020, 4:00am

I have a suggestion.

Most people are not first movers. They will adopt something only after they see value. My suggestion is to build a core with the first movers and a “beaten path” for the late joiners.

I suggest that you try to make some sort of record of what was covered in every meeting. Write down the problems and solution sets. Log the meetings. Record the sessions. Leave breadcrumbs that future people will see and follow.

I am suggesting that you make it easier for people who didn’t join initially to join at a later date. Make it possible for someone who is motivated to “catch up” to the core group.

daslu · October 21, 2020, 12:22pm

Thanks, @samedhi, this is really important.

Our current approach is to record study meetings and share them with the group participants, but not upload them to YouTube. This way, you have some breadcrumbs as you said, but hopefully, people feel more comfortable talking.

Does it make sense to you?

I really hope we can also take some notes and organize the accumulating knowledge. But we cannot promise that at the moment, considering the pace and breadth we are aspiring to achieve in the coming weeks and months. At the moment, there are very few organizing hands, so being able to take notes will depend on the commitment and spirit of group participants.

samedhi · October 21, 2020, 3:18pm

Makes sense to me @daslu.

The YouTube thing is a bit out there; I wasn’t even sure about suggesting it. There are several valid concerns with publishing interactions, and it may not be feasible with an open group.

I also want to convey that I think it is completely acceptable to be a community that does not create any of these artifacts. I only want to highlight that if you do leave artifacts around, people may be more likely to join at a later date.

Random thoughts:

Shared Google Doc per meeting that everyone can edit.
https://drawpile.net/ lets you edit a shared canvas.
Shared Calendar invites seem to help a lot with attendance, maybe some service makes that easy?

Good luck!

daslu · October 21, 2020, 3:40pm

Many thanks @samedhi.

BTW there is also Precursor, an open source Clojure/script collaborative sketching app.

didibus · October 21, 2020, 10:55pm

That’s neat, though it seems pretty light on features, and didn’t work well on my mobile Chrome browser with Android 11.

lambduhhh · October 26, 2020, 4:07pm

I am a self-taught professional developer and native Clojurian starting to dip my toes into the world of data science. I love working in groups, learning and teaching, so if there is anything you think I’d be able to offer to the group, I’d love to be involved.
I recently started a YouTube channel where I will be focusing on Clojure and touching on some data science topics as well.

I also came across a cool project that is relevant to Geni, Clojure, Python and Chris’s TDL… Here’s the link for anyone interested. Chris is a pretty awesome guy!

github.com/zero-one-group/geni/blob/develop/docs/simple_performance_benchmark.md

# A Simple Performance Benchmark

The Geni project was initiated by [Zero One Group's](https://zero-one-group.com/) data team in mid-2020 partly due to our frustrations with Pandas' unpredictable performance. We could have gone the PySpark way, but since the rest of the team had started using Clojure, we wanted to have a crack at using Clojure for our data jobs.

The following piece does not attempt to present fair, rigorous performance benchmark results. Instead, it illustrates typical speedups that were up for grabs for our team and for our specific use cases. Therefore, the results presented here should be taken with a grain of salt.

## Dummy Retail Data

In mid-2020, we worked on a customer segmentation project for one of Indonesia's retail giants. We were working with more than 20 million transactions and 4 million customers. We simulate reasonably representative dummy data with Geni. The crux of the simulations is as follows:

```clojure
(-> skeleton-df
    (g/select
      {:trx-id    (transaction-id-col)
       :member-id (g/int (g/rexp 5e-6))
       :quantity  (g/int (g/inc (g/rexp)))
       :price     (g/pow 2 (g/random-int 16 20))
       :style-id  (g/int (g/rexp 1e-2))
       :brand-id  (g/int (g/rexp 1e-2))
       :year      2019
       ;; ... the file preview is truncated here; see the linked file for the rest
```

daslu · October 26, 2020, 4:35pm

@lambduhhh wonderful, looking forward to learning from your channel!

To everyone here: we are still planning the so-called “fundamentals” study group, which is more-or-less around learning the abstractions and implementations that enable the benchmarks shared by @lambduhhh here.

In the meantime, we are planning several ad-hoc study meetings about (“classical”) machine learning in Clojure for mid-November.
If interested, it would be great if you could mark your preferred times in the survey below.

daslu · October 26, 2020, 4:37pm

More details are discussed at the Clojurians Zulip:

Also, see our general plan of talks and meetings here:

lambduhhh · October 26, 2020, 5:27pm

OK, filled out! Is there a Slack channel (on the Clojure Slack, specifically for the meet) as well, or just Zulip?

daslu · October 26, 2020, 5:37pm

@lambduhhh great!

There is a #data-science channel at Clojurians Slack, with really nice conversations, but most of the activity around this topic is at Zulip nowadays.

Here is some more info on the relevant Zulip streams: