Appreciation of core.matrix

Some background: core.matrix is an interface and home to multiple matrix/linear algebra libraries. In the best case, you can make a one-line code change and switch matrix libraries. I have the sense that core.matrix doesn’t get a huge amount of affection these days, although I still see issues and PRs submitted, so maybe I’m wrong. I believe there’s more interest in the Neanderthal matrix library, which I am sure is quite justified. There is a core.matrix interface to Neanderthal, denisovan, but I understand that there are often advantages to writing code directly in Neanderthal.
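To make the "one-line code change" concrete, here is a minimal sketch (the implementation keywords `:ndarray` and `:vectorz` are real core.matrix implementations, but the matrices here are made-up toy data, and `:vectorz` assumes vectorz-clj is on the classpath):

```clojure
(ns example.core
  (:require [clojure.core.matrix :as m]))

;; Pick the backing implementation once; the rest of the code is unchanged.
(m/set-current-implementation :ndarray)    ; pure-Clojure ndarray backend
;; (m/set-current-implementation :vectorz) ; the only line that changes to switch

(def a (m/matrix [[1 2] [3 4]]))
(def b (m/matrix [[5 6] [7 8]]))

(m/mmul a b) ; matrix product, independent of the chosen backend
```

Everything below the `set-current-implementation` line is written against the core.matrix protocols, which is what makes the swap possible.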

Recently I was implementing a simple neural network algorithm that’s part of a proof by Siegelmann and Sontag (or see Siegelmann’s book), and thought I’d try Neanderthal. (I’d only used core.matrix in the past.) Neanderthal worked perfectly, but trying to run Siegelmann and Sontag’s algorithm in Neanderthal made me realize that Neanderthal’s floating point numbers weren’t sufficiently precise for the algorithm.

So I rewrote my (very few lines of) code in core.matrix, and implemented the network with clojure.lang.Ratios in core.matrix’s ndarray library. Ratios are native Clojure arbitrary-precision rational numbers, and ndarray is written in pure Clojure, so it preserves Clojure’s numeric type semantics. That was exactly what I needed. I got the algorithm working and finally understood the proof.
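For illustration, a sketch of the kind of exact arithmetic this enables (the weights and inputs here are made-up values, not Siegelmann and Sontag’s actual encoding):

```clojure
(ns example.exact
  (:require [clojure.core.matrix :as m]))

(m/set-current-implementation :ndarray) ; pure Clojure, so Ratios survive

(def w (m/matrix [[1/3 1/9]
                  [2/9 1/3]]))  ; weights as exact Ratios
(def x (m/matrix [1/4 1/2]))    ; input vector

(m/mmul w x)
;; entries stay exact Ratios: 1/3 * 1/4 + 1/9 * 1/2 = 1/12 + 1/18 = 5/36, etc.
;; with doubles, fractions like these would be rounded on the very first step,
;; and the error compounds as the network iterates
```

A float-backed implementation would coerce `1/3` to `0.3333…` immediately; ndarray just multiplies and adds whatever numeric types Clojure hands it.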

I’m not aware of any other language in which I could have done this! (For example, OCaml, my other favorite language, has a good matrix library and a couple of arbitrary-precision libraries, but no matrix library with arbitrary-precision numbers.) The fact that we have core.matrix in Clojure, with a common interface to several libraries, means that even in unusual use cases like mine, there is a good chance that there’s a matrix library that will work. And if you write to the core.matrix interface, you may not have to change your code when you need a different library. (Of course, it’s not always so simple. Sometimes core.matrix code that works with one library can become slow or even break when you switch libraries.)

I think that core.matrix was and is a very difficult project, for which Mike Anderson deserves enormous credit. Maybe it’s too difficult to sustain without a big community of users and contributors, and I don’t think it has that. Most people who need a matrix library quite reasonably want the fastest possible floating-point matrix code, and for that, writing directly to Neanderthal may be the best solution.

However, core.matrix has been incredibly useful to me. I needed to say that. I hope that it persists and grows.


Thanks for this article! It really clarifies the relationship between these libraries nicely.

What an inspiring use case, and a nice argument for using abstraction layers.

I haven’t used core.matrix in a while. But in the past, I really liked it as a user. And its approach towards abstraction taught me a lot.


Original author of core.matrix here :slightly_smiling_face:

I’m glad you have found core.matrix useful and this is exactly what it is supposed to be: an abstraction layer over the infinite possible variety of underlying concrete implementations. The motivation came from discussions at a Clojure Conj many years ago (2011 I think?), when several people observed the problem of many competing (but useful!) different implementations of matrix libraries in the Java ecosystem. core.matrix was simply a pragmatic interpretation of the fundamental ideas of array programming (APL etc.) that could work in a uniform way across different underlying implementations (Clatrix, Vectorz and good old nested Clojure vectors, mostly, at the time). It was inspired by the Clojure sequence abstraction, which demonstrated how you can create higher-level code that works effectively over a wide range of different sequential types.

I haven’t been using Clojure personally for array programming for a while, but I still think the ideas are important and that Clojure is an amazing tool for data science with these kind of abstractions. I had a great time building core.matrix and it helped me grow as a developer. I’m still happy to maintain the library and merge PRs for anyone who wants to contribute.

My one regret, perhaps, is that I didn’t communicate the ideas well enough and Clojure didn’t get massive traction in the data science / AI space. I still think Clojure is better in many ways than Python etc. but it’s a critical mass game when you need to build large amounts of tooling and implementations that work together. Hopefully Clojure and core.matrix can continue to deliver amazing value in the niches where they thrive!


I’m not some gnarly numerics dude, but having used both last year, I was left with the impression that Neanderthal is a much better option if you don’t mind the MKL dependency (which is often an issue for me).

With core.matrix, as you mention, code will work under one backend and not under another. I kept coming across bugs with backends, and even things that didn’t work on the default backend.

I might in the end be wrong about this (since I think Julia also picked it up), but I feel the whole nD-array thing is something that looks theoretically really cool but is actually very harmful. From a practical standpoint, 99% of use cases don’t need more than 2D arrays (see: MATLAB), and the added layer of abstraction just makes the interface more confusing. The whole issue of “are vectors 1-by-N arrays or 1D arrays” now becomes some center-stage thing I need to wrap my head around instead of a background detail.
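The 1D-versus-1-by-N distinction the poster mentions shows up directly in core.matrix; a small sketch of how the two shapes behave (using the default backend):

```clojure
(require '[clojure.core.matrix :as m])

(m/shape (m/array [1 2 3]))     ; => [3]    a rank-1 vector
(m/shape (m/array [[1 2 3]]))   ; => [1 3]  a rank-2, 1-by-N matrix

;; the two behave differently under transpose, mmul, broadcasting, etc.
(m/shape (m/transpose (m/array [1 2 3])))   ; rank-1 transpose is a no-op: [3]
(m/shape (m/transpose (m/array [[1 2 3]]))) ; rank-2 transpose: [3 1]
```

In a 2D-only world (MATLAB-style) this distinction never arises, which is part of the poster’s argument.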

But the real issue is that it’s completely disconnected from how the machine works or how memory is laid out. I ended up feeling like some overly clever theoretical physicist who tried to make a new paradigm, made everything generic and about spherical cows, and then just chucked out half a century of work on computational linear algebra. The end result is that it sort of maps more easily to textbook math (though that’s usually 2D as well), but you end up very disconnected from the performance of what you’re writing.

BLAS is actually not just some crufty old thing :slight_smile: - it’s an incredibly well-thought-out set of primitive operations (a smaller set than what you would think of as “primitive” in a linear algebra class). Very simple operations on paper end up being impractical on a machine, and you need to find ways around them and look for simplifications algorithmically (something like MATLAB will do this automagically for you most of the time).
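For a flavor of how small that primitive set is, here is a sketch of the three classic BLAS levels as exposed through Neanderthal (the vectors and matrix are made-up toy data, and this assumes the MKL-backed native namespace is available):

```clojure
(require '[uncomplicate.neanderthal.core :refer [axpy! dot mv mm entry]]
         '[uncomplicate.neanderthal.native :refer [dv dge]])

(def x (dv [1 2 3]))
(def y (dv [4 5 6]))
(def a (dge 2 3 [1 2 3 4 5 6]))  ; 2x3 double matrix, column-major like BLAS

;; Level 1 (vector-vector): axpy! computes y <- alpha*x + y, in place
(axpy! 2.0 x y)

;; Level 1: dot product
(dot x y)

;; Level 2 (matrix-vector): mv computes A*x
(mv a x)

;; Level 3 (matrix-matrix): mm is the gemm workhorse, for a compatible b
;; (mm a b)
```

Almost everything else in dense numerical linear algebra is built by composing these few calls, which is the "well thought out" part the poster is pointing at.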

And Neanderthal is just a wrapper around BLAS. It’s in effect doing much less… and the result is that it feels less buggy/flimsy. Everything just works, and works fast. The downside is that you do need to write a lot more boilerplate, because there are basically no convenience functions (and hence fewer footguns). It’s more challenging to implement algorithms, but you end up writing stuff that is much more “correct”. I rewrote a QR decomposition from core.matrix to Neanderthal, and it made me really appreciate how much more nuanced it was compared to what you’d learn “on paper,” so to speak (and how crappy my core.matrix code was, haha). Getting the thing to run with just BLAS functions was actually tricky, but the end result felt a lot more correct and was fast. I maybe could have then rewritten things back into core.matrix and had better code, but the library doesn’t push you to write good code.

I also really enjoyed the matrix types that Neanderthal has - I think they’re very practical and also push you to think about your code harder. The equivalent in core.matrix wasn’t nearly as coherent.

I think in the end if you’re writing linear algebra code you typically need to have a good sense of what your computer is actually doing in the end - and that’s just much easier to reason about with BLAS.

If you just need to slap together some equations and test something from a book/paper, then core.matrix is sort of fine, but nowadays I honestly just use Octave/MATLAB. It’s less of a headache :smiley:

But, that’s like… just my opinion man. I’m happy we have both :stuck_out_tongue:
