i have a new M1 Apple MBPro and am
trying to make it work for me. for one of my code bases
to do useful things, i’ve had to depend on libraries like Intel’s
oneAPI Math Kernel Library (oneMKL) and Clojure
packages like the uncomplicate/neanderthal library
(https://neanderthal.uncomplicate.org/).
for java on the new M1 i have Azul’s Zulu build of the openjdk (v17).
i’m also seeing lots of activity around other jdks,
and am too ignorant to know whether i should hope to find
better implementations associated with openjdk, temurin, …?
i’m wondering if anyone has thoughts/insights/pointers re:
how solid math libraries (like BLAS) best land on Apple’s Accelerate
Framework, M1 custom AMX2, etc.?
I’m not informed enough to answer all of those questions, but just responding to the title of the post:
You probably know that core.matrix is more or less an interface to various underlying matrix libraries. There are at least two pure-Clojure implementations, and at least one pure Java implementation. There’s also a ClojureScript implementation. One or more of those ought to run on any hardware that supports Clojure or ClojureScript. (core.matrix implementations don’t necessarily implement all of the same functions, and there will be corner cases for each, but for the functions that they do implement, you can write identical code. In the best case scenario, there’s just one line of code to switch between them.)
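To make the "one line to switch" idea concrete, here is a minimal Java sketch of the general pattern: user code is written once against an interface, and the backend is chosen in a single place. This is not core.matrix's actual API; `VectorBackend`, `LoopBackend`, `StreamBackend` and `squaredNorm` are hypothetical names invented for the illustration.

```java
import java.util.stream.IntStream;

public class SwitchableBackend {
    // Hypothetical, deliberately tiny "matrix API": a single dot product.
    interface VectorBackend {
        double dot(double[] x, double[] y);
    }

    // Backend 1: a plain loop.
    static final class LoopBackend implements VectorBackend {
        public double dot(double[] x, double[] y) {
            double sum = 0.0;
            for (int i = 0; i < x.length; i++) {
                sum += x[i] * y[i];
            }
            return sum;
        }
    }

    // Backend 2: the same operation expressed with streams.
    static final class StreamBackend implements VectorBackend {
        public double dot(double[] x, double[] y) {
            return IntStream.range(0, x.length)
                            .mapToDouble(i -> x[i] * y[i])
                            .sum();
        }
    }

    // User code: written once, works with any backend.
    static double squaredNorm(VectorBackend backend, double[] x) {
        return backend.dot(x, x);
    }

    public static void main(String[] args) {
        // The only line that changes when switching implementations.
        VectorBackend backend = new LoopBackend();
        System.out.println(squaredNorm(backend, new double[] {3.0, 4.0}));
    }
}
```

Swapping `new LoopBackend()` for `new StreamBackend()` leaves `squaredNorm` untouched, which is the property the interface approach is meant to give you.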
None of the core.matrix implementations is likely to run as fast as Neanderthal on the hardware it runs on, especially when it can run on a GPU–unless you use Neanderthal as an implementation of core.matrix, but that won’t expose all of Neanderthal’s capabilities.
thanks for your reply @mars0i ! your understanding seems about the same as mine.
i think this winds up being more of a java/jvm question than a clojure one: which ones can best leverage Apple’s Accelerate framework for BLAS? when this gets solved i guess the core.matrix implementation interface will need to get written.
I think if you want something working any time soon the best way to get going is to use dtype-next’s ffi system and bind to the framework’s C BLAS functions directly.
Well, something based on the framework. I agree with previous comment that core.matrix has pure java implementations. Additionally you could try ojAlgo.
It really depends what you are trying to do. The good news is that you have various options. The bad news is that all have pros and cons.
If you want flexibility and cross-platform compatibility, I would suggest a pure JVM option like vectorz-clj. This isn’t M1 specific, but will take advantage of any M1 optimisations provided by the JVM. I find this convenient and good enough for most normal purposes. I would recommend this as long as it meets your performance needs.
If you care about maximum performance, you will need to use native CPU / GPU implementations. The problem here is that (to my knowledge) nobody has made a really good, complete and well-maintained implementation. This is sadly a big “missing piece” in the Clojure numerical computing space. I made an experimental GPU implementation (https://github.com/mikera/vectorz-opencl) but it is not complete, so YMMV. There is also Clatrix (CPU with BLAS) which may work for you, but that is also quite old and not much maintained.
Libraries which do not support core.matrix (like Neanderthal) may also do what you want, but of course you then give up the advantages of working with the core.matrix API abstraction and will be tied to specific implementation decisions. You will need to check M1 support here.
You can of course use any of the many Java matrix libraries through Java interop. Many of these are very good and very mature. But again, you won’t get core.matrix compatibility unless there is a wrapper implementation.
Sadly, it brings yet another incompatible ndarray API. This is exactly the problem core.matrix was designed to solve, but people seem to like reinventing their own matrix APIs…
I wouldn’t recommend going in that direction. In my admittedly rather limited experience, code written against a performant BLAS/LAPACK API (i.e. Neanderthal) looks very different from code written against a more casual ndarray API. core.matrix makes it easy to turn math equations into code (kinda like MATLAB) - but it’s also really easy to write code that performs very badly. The BLAS API forces you to write faster code.
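A rough illustration of that difference, as a self-contained Java sketch rather than real Neanderthal or core.matrix code: the same computation, a*x + y, written once in an equation-like style where every operation returns a new array, and once in the style of the BLAS level-1 routine axpy, which updates y in place. The class and method names are made up for the example.

```java
import java.util.Arrays;

public class AxpyStyles {
    // Equation-like style: each operation allocates and returns a new array.
    static double[] scale(double a, double[] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = a * x[i];
        }
        return out;
    }

    static double[] add(double[] x, double[] y) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] + y[i];
        }
        return out;
    }

    // BLAS style, modelled on axpy: y := a*x + y, overwriting y.
    static void axpy(double a, double[] x, double[] y) {
        for (int i = 0; i < x.length; i++) {
            y[i] += a * x[i];
        }
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {10.0, 20.0, 30.0};

        // Reads like the formula, but allocates two arrays
        // (one in scale, one in add) and walks the data twice.
        double[] z = add(scale(2.0, x), y);

        // One pass, no allocation, but the caller must accept that y is mutated.
        axpy(2.0, x, y);

        System.out.println(Arrays.toString(z));
        System.out.println(Arrays.toString(y));
    }
}
```

Both versions compute the same values. The in-place one makes the cost visible in the call itself, which is roughly why code written against a BLAS-style API ends up structured so differently.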
I’d write core.matrix code as a first pass and then deal with performance later …