For a client I’m evaluating potential solutions for a specific ETL job. One of the contenders is Onyx, which seems like it could be a great match. There was some pretty positive buzz around it in 2014-2016, but since the team was acquired by Confluent things have gone eerily quiet. The site says ©2016, and the Twitter accounts went dead around 2018. It seems at least one person is keeping the lights on by merging stuff on GitHub from time to time, so that’s something, but it’s not a lot.
I’m curious to hear from people who have used/are using Onyx. How was the experience? Do you consider this an issue? And lastly, how complex would you say Onyx is?
This last question is of course hard to quantify, but I’d like to get a sense of it for two reasons: I don’t want to drag more complexity into this project than the size of the problem justifies, and if Onyx turns out to be unmaintained, I need to feel comfortable that we can do our own maintenance and troubleshooting, basically taking ownership of the code if necessary.
Hi, my org used Onyx and was quite happy with it, but we moved away from it to Apache Beam out of concern for its long-term sustainability.
I recently announced an alpha-stage Clojure library for Apache Beam on the Clojure Google Group, https://groups.google.com/d/msg/clojure/DTxPnCbb_Wo/jCd9uRivCQAJ, and meeting the ideals of Onyx on a well-supported platform is one of my goals. (Here is thurber, the Clojure library: https://github.com/atdixon/thurber)
Given that thurber has only just been publicized and is in alpha, it may be too early for you to consider it for a real client. (Though I’m happy to help you navigate it if you do consider it.)
I will say that both Onyx and Beam (and modern streaming frameworks generally) have a steepish learning curve. But for straightforward ETL jobs you may have a faster path. Onyx had some nice levers that let you trade performance for implementation ease, whereas Beam makes you do things “the right way” (i.e. the “scalable” way) out of the box.
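To illustrate what I mean by a “straightforward ETL job” where a framework may be overkill: the shape is usually extract, transform, load over modest data. Here’s a framework-free Python sketch (the CSV fields and totals logic are made up for illustration; this is not Onyx or Beam code):

```python
import csv
import io
import json

# Hypothetical input: CSV of raw events (field names invented for illustration).
RAW = """user_id,amount,status
1,10.50,ok
2,3.25,error
1,7.00,ok
"""

def extract(text):
    """Parse CSV rows into dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Keep successful events and total the amounts per user."""
    totals = {}
    for r in rows:
        if r["status"] == "ok":
            uid = int(r["user_id"])
            totals[uid] = totals.get(uid, 0.0) + float(r["amount"])
    return totals

def load(totals):
    """Serialize to the sink format (here: JSON lines on stdout)."""
    return "\n".join(json.dumps({"user_id": u, "total": t})
                     for u, t in sorted(totals.items()))

print(load(transform(extract(RAW))))
```

When the job fits in one process like this, a streaming framework mostly adds the ceremony (partitioning, checkpointing, serialization boundaries) without adding value; the frameworks earn their keep once the data or the uptime requirements outgrow a single machine.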
I will also add that the Beam maintainers are very active on their mailing lists, really engaged and thorough in responding to questions, and they seem to be on a rapid release cycle.
Hope this helps some.
Hi there, I guess I’m late to the conversation,
and I don’t have experience with Onyx either.
Just curious what solution you settled on in this case.
I think Kafka Streams, possibly together with KSQL, might be interesting for similar scenarios.
Personally, the ETL needs I’ve encountered so far could be solved in SQL.
dbt is a great tool.
This, however, won’t work if batch processing is not an option. Regarding throughput, I think you can go pretty far without one of these really big systems.
We also had great success with TimescaleDB, a time-series database built on Postgres. It handles event data pretty efficiently, and with continuous aggregates you can build “realtime” views for certain use cases.
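For a rough sense of what a continuous aggregate buys you: it keeps a precomputed, time-bucketed rollup of an event table up to date, so queries hit the small rollup instead of the raw events. A stdlib Python sketch of that rollup (the event tuples are hypothetical, and `time_bucket` here just mimics the idea of TimescaleDB’s bucketing, not its API):

```python
from collections import defaultdict

def time_bucket(width_s, ts):
    """Floor a Unix timestamp to the start of its bucket."""
    return ts - (ts % width_s)

# Hypothetical events: (unix_ts, value)
events = [(0, 2.0), (30, 4.0), (60, 1.0), (100, 7.0)]

def rollup(events, width_s=60):
    """Per-bucket count and average: the kind of view a continuous
    aggregate maintains incrementally as events arrive."""
    acc = defaultdict(lambda: [0, 0.0])  # bucket -> [count, sum]
    for ts, v in events:
        b = time_bucket(width_s, ts)
        acc[b][0] += 1
        acc[b][1] += v
    return {b: {"count": c, "avg": s / c} for b, (c, s) in acc.items()}

print(rollup(events))
# → {0: {'count': 2, 'avg': 3.0}, 60: {'count': 2, 'avg': 4.0}}
```

The difference in the database is that this rollup is maintained incrementally and refreshed on a policy, rather than recomputed from scratch on every query.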
The project has been put on ice because of funding constraints, but we were looking at a fairly simple solution using Kinesis (basically AWS’s Kafka-alike) and Datomic.
This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.