Well - I am working on a project that receives a lot of events through SSE channels, puts them into Kafka queues, and then pulls them off again to feed remote databases through an HTTP API. The devil is in the details: no events may be lost, and both the producer and the remote databases are stateful. So I need to keep a high-water mark of the last event I read (so I can resubscribe), and, when syncing, I have to ask each database which event it has reached; then do a back-scan, push the whole backlog, and only then start streaming new events. All of this must be non-blocking (we have thousands of databases, and average latency should stay under 100 ms).
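The back-scan-then-stream idea can be sketched in a few lines of Clojure. Everything here is a toy stand-in I made up for illustration: a vector plays the Kafka log, an atom plays a remote database, and `catch-up!` is a hypothetical name, not the project's real API.

```clojure
;; Toy stand-ins: a vector plays the Kafka log, an atom plays the DB.
(def log (vec (range 10)))                          ; events at offsets 0..9

(defn last-applied-offset [db] (count @db))         ; ask the DB where it is
(defn read-backlog [from hwm] (subvec log from hwm)) ; back-scan the gap
(defn push! [db event] (swap! db conj event))       ; HTTP push, pretend

(defn catch-up!
  "Replays everything `db` missed up to high-water mark `hwm`,
  after which live streaming can take over."
  [db hwm]
  (doseq [e (read-backlog (last-applied-offset db) hwm)]
    (push! db e))
  db)

(def db (atom [0 1 2 3]))   ; this DB only saw the first four events
(catch-up! db 10)
@db                         ;; => [0 1 2 3 4 5 6 7 8 9]
```

In the real system the replay would go through the async HTTP pushers, but the control flow is the same: ask, back-scan, replay, then switch to streaming.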
How it was: interesting. I wrote the Kafka module by wrapping the Java interfaces, and though Kafka looks simple, there are many ways you may want to consume data: one or more topics at once, a single topic, a single partition; a topic or partition from offset to offset, from timestamp to timestamp, and so on. On top of that, topics holding state are compacted, so you have to keep only the last version of each key. Kafka is very low level, but version 1.0 is pretty nice. I use JSON for data representation, and Spec to make sure everything works as expected and that data is what I expect it to be. For Kafka, I ended up separating the reading operation into a function that does the heavy lifting and a kind of reducer that receives a page of data, processes it, and returns what to do next (keep on reading, stop, skip). I am considering releasing the Kafka wrappers if anybody is interested.
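A minimal sketch of that heavy-lifting/reducer split, with plain Clojure data standing in for Kafka: `fetch-page` is a hypothetical paging function, and the reducer's return map drives the loop. The names and the `{:action ...}` shape are my invention, not the real wrappers.

```clojure
(defn consume
  "Drives `fetch-page` (offset -> seq of records, empty when exhausted)
  and hands each page to `reducer`, a fn of [state page] returning
  {:action :continue :state s}  process the next page,
  {:action :skip :to n :state s} jump ahead to offset n,
  {:action :stop :state s}      end now.
  Returns the final state."
  [fetch-page reducer init-state]
  (loop [offset 0, state init-state]
    (let [page (fetch-page offset)]
      (if (empty? page)
        state
        (let [{:keys [action state to]} (reducer state page)]
          (case action
            :stop     state
            :skip     (recur to state)
            :continue (recur (+ offset (count page)) state)))))))

;; Toy usage: sum values in pages of five, stopping once the sum passes 10.
(def data (vec (range 20)))
(defn fetch-page [offset]
  (subvec data (min offset 20) (min (+ offset 5) 20)))

(consume fetch-page
         (fn [sum page]
           (let [sum' (+ sum (reduce + page))]
             {:action (if (> sum' 10) :stop :continue) :state sum'}))
         0)
;; => 45
```

The nice property is that the paging loop is written once, and each consumption pattern (one topic, one partition, offset range, timestamp range) only supplies a different reducer.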
As we have potentially tens of SSE ingestors spreading data across thousands of databases, I use bounded thread pools on both the ingestors and the pushers where I need blocking operations, and async HTTP (http-kit) for everything else. Running thread pools from Clojure is easy-peasy, though you have to be careful with exceptions: an exception inside a pooled task is silently held until somebody derefs the result, so it is easy to lose failures.
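A minimal sketch of that pattern, assuming a plain `java.util.concurrent` fixed pool: the `submit!` wrapper is my own name, and its job is just to make exceptions surface instead of dying silently inside the returned future.

```clojure
(import '[java.util.concurrent Executors ExecutorService Callable])

;; A bounded pool: at most 4 tasks run at once, the rest queue up.
(defonce ^ExecutorService pool (Executors/newFixedThreadPool 4))

(defn submit!
  "Submits `f` to the bounded pool, wrapping it so a failing task
  reports its error and yields ::failed instead of a buried exception."
  [f]
  (.submit pool
           ^Callable
           (fn []
             (try (f)
                  (catch Throwable t
                    ;; in the real system this would be a proper log call
                    (println "task failed:" (.getMessage t))
                    ::failed)))))

;; Deref the returned Future to get the value or ::failed.
@(submit! (fn [] (+ 1 2)))   ;; => 3
@(submit! (fn [] (/ 1 0)))   ;; => ::failed (after printing the error)
```

Without the `try`/`catch`, the division error above would vanish until someone happened to deref the future, which is exactly the trap mentioned above.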
The best thing I learnt, still, is Spec - it saved my *** countless times, and it is extremely useful for validating/conforming data. Using the same specs to validate function inputs, external data, and HTTP APIs was a big gain (did I mention we also have an HTTP inspection/configuration layer, as Kafka's own tooling is pretty poor?).
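To show the "one spec, three uses" idea with clojure.spec: the event shape below is invented for illustration, not the real schema, but the mechanics are the standard `clojure.spec.alpha` API.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::id string?)
(s/def ::ts int?)                 ; epoch millis
(s/def ::payload map?)
(s/def ::event (s/keys :req-un [::id ::ts ::payload]))

;; 1. validate external data (e.g. a parsed SSE/JSON event)
(s/valid? ::event {:id "e-1" :ts 1700000000000 :payload {}})   ;; => true

;; 2. explain failures - invaluable when a remote feed misbehaves
(s/explain-str ::event {:id 42})   ;; human-readable description of what's wrong

;; 3. guard function input with the very same spec
(defn push-event! [event]
  {:pre [(s/valid? ::event event)]}
  :pushed)
```

Because the same `::event` spec guards all three layers, a malformed event is rejected with the same explanation whether it arrives over SSE, over the HTTP API, or as a bad argument inside the code.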