After all this time learning (and enjoying) functional programming with Clojure, and knowing that Spark exists for “data mining”: are there Clojure alternatives? It seems like the same thing could be done in Clojure, since Spark itself is written in Scala.
Is anything being used as a Spark alternative?
One option is a Clojure library for Google Cloud Dataflow / Apache Beam.
Like Spark, Google Cloud Dataflow is rather complex to use. Therefore I often just use one big cloud instance and do all the data processing in plain Clojure. Recently, more and more Clojure libs for data science have been published, like this one:
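For illustration, a tiny sketch of that one-big-instance style (not from the post above), assuming the `techascent/tech.ml.dataset` library is on the classpath; the file name is hypothetical:

```clojure
(require '[tech.v3.dataset :as ds])

;; On a large instance, load a multi-gigabyte CSV straight into memory.
;; ->dataset also reads Parquet, Arrow, and other formats.
(def data (ds/->dataset "events.csv"))

;; Summary statistics for every column, computed locally with no cluster.
(ds/descriptive-stats data)
```

The point is that a single big machine plus a columnar in-memory library often replaces a distributed job entirely.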
I think one way to look at this is to ask yourself what you want a Spark-like tool for. In my experience, people tend to consider Spark when they have one of these needs:
1. Processing data that’s larger than the available RAM, i.e. medium-sized to “big” data.
2. Streaming processing.
3. Working with data-lake files, like Parquet on S3.
4. Decent DataFrame-based processing.
So depending on which particular use case you care about, there are already several options. Use cases 2, 3, and 4 have been addressed by several people above.
And, personally, for 1 and 3 I would urge you to reconsider whether you need Spark at all. It is a very heavy tool that is usually unnecessary when the data is only medium-sized. Modern SQL engines like DuckDB can handle serious data sizes (gigabytes and up), and on the Clojure side all you need is an existing tool like HoneySQL or next-jdbc.
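A minimal sketch of that DuckDB-over-next-jdbc approach (my assumption of what the poster means, not their code), assuming `seancorfield/next.jdbc` and the `org.duckdb/duckdb_jdbc` driver are on the classpath; the Parquet file name is hypothetical:

```clojure
(require '[next.jdbc :as jdbc])

;; "jdbc:duckdb:" with no path gives an in-memory DuckDB database.
(def ds (jdbc/get-datasource {:jdbcUrl "jdbc:duckdb:"}))

;; DuckDB can scan Parquet files directly from SQL (and from S3 with its
;; httpfs extension), covering use cases 1 and 3 without a cluster.
(jdbc/execute! ds
  ["SELECT count(*) AS n FROM read_parquet('events.parquet')"])
```

Queries like this stream over the file rather than loading it whole, which is why larger-than-RAM data stops being a reason to reach for Spark.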