I’m curious to hear from people who have used/are using Onyx. How was the experience? Do you consider this an issue? And lastly, how complex would you say Onyx is?
This last question is of course hard to quantify, but I’d like to get a sense of it for two reasons. I don’t want to drag more complexity onto this project than is justified by the size of the problem, and if Onyx looks like it is becoming unmaintained, then I need to feel comfortable that we’ll be able to do our own maintenance and troubleshooting, basically taking ownership of the code if necessary.
Given that thurber has only just been publicized and is still in alpha, it may be too early for you to look at it for a real client. (Though I’m happy to help you navigate it if you do consider it.)
I will say that both Onyx and Beam (and modern streaming frameworks generally) have a steepish learning curve, though for straightforward ETL jobs you may have a faster path. Onyx had some nice levers that let you trade performance for implementation ease, but Beam makes you do things “the right way” (i.e. the “scalable” way) out of the box.
I will also add that the Beam maintainers are very active on their mailing lists and have been really engaged and thorough in responding to questions. They also seem to be on a rapid release cycle.
Hi there, I guess I’m late to the conversation, and I don’t have any experience with Onyx either.
I’m just curious what solution you settled on in this case.
I think Kafka Streams, possibly together with KSQL, might be interesting for certain similar scenarios.
Personally, the ETL needs I have encountered so far could all be solved in SQL. dbt is a great tool for that.
This, however, won’t work if batch processing is not an option. Regarding throughput, I think you can go pretty far without one of these really big systems.
We also had great success with TimescaleDB, which is a time-series database built on Postgres. It handles event data pretty efficiently, and with continuous aggregates you can build “realtime” views for certain use cases.
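To make the continuous-aggregate idea a bit more concrete, here is a rough sketch in plain Python. This is not TimescaleDB’s API, just an illustration of the underlying technique: events are rolled up into fixed time buckets as they arrive, so a “realtime” view reads a small pre-aggregated table instead of rescanning raw events. The five-minute bucket width and the names `bucket_start` and `RollingAggregate` are made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=5)  # hypothetical bucket width
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta = BUCKET) -> datetime:
    """Round a timezone-aware timestamp down to the start of its bucket."""
    return ts - ((ts - EPOCH) % width)


class RollingAggregate:
    """Keeps a running count and sum per (bucket, key), updated per event."""

    def __init__(self) -> None:
        self.count = defaultdict(int)
        self.total = defaultdict(float)

    def add(self, ts: datetime, key: str, value: float) -> None:
        # Update only the bucket this event falls into.
        slot = (bucket_start(ts), key)
        self.count[slot] += 1
        self.total[slot] += value

    def average(self, ts: datetime, key: str):
        # Read from the pre-aggregated state; None if the bucket is empty.
        slot = (bucket_start(ts), key)
        n = self.count.get(slot, 0)
        return self.total[slot] / n if n else None


agg = RollingAggregate()
noon = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
agg.add(noon + timedelta(minutes=1), "sensor-a", 10.0)
agg.add(noon + timedelta(minutes=4), "sensor-a", 20.0)
agg.add(noon + timedelta(minutes=6), "sensor-a", 40.0)
```

With this state, `agg.average(noon, "sensor-a")` reads only the 12:00 bucket and `agg.average(noon + timedelta(minutes=5), "sensor-a")` only the 12:05 bucket. As I understand it, a real continuous aggregate does the equivalent inside the database and persists the result, so treat this only as a mental model.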
The project has been put on ice because of funding constraints, but we were looking at a fairly simple solution using Kinesis (basically AWS’s Kafka-alike) and Datomic.