Core.async. Traversing remote folder

core_async

#1

I spent so many hours already trying to wrap my head around this. Can someone please give me a conceptual idea how to solve it.

The task is to traverse a Box folder using the Box API in the most efficient way possible.
To get a folder's content you have to send a separate request (sometimes several, because of pagination), then another for each sub-folder, yet another for each sub-sub-folder, and so on.

So my idea was:

  • to feed a bunch of urls (specific to a folder-id) to an async channel
  • then, in a go block, when the response comes back:
    • for every subfolder - keep adding urls into the urls-ch channel
    • put folder info (along with the vector of its parents) into a responses-ch channel
  • After all that - read values off responses-ch and “stitch” things together using parents vector of the folder,
    • with every change - push the resulting tree into a final-tree-ch channel.

Now, all this is dandy and works nicely. Except for the very last, missing piece.

How do I determine that it’s done fetching and there’s no more updates to the final-tree?

Things are happening asynchronously, and I can’t just block the main thread - the channels never close. There’s no way to find out how many items are currently sitting in urls-ch and/or responses-ch. I guess simply keeping a count of items put and items taken, and comparing the two, could work (I dunno).

I’ve been trying all sorts of things, so far unsuccessfully. Please help me.

upd: Maybe using pub-sub? But I still can’t wrap my head around that either


#2

Are the Box APIs async?


#3

What do you mean? It’s just a regular REST API. They do throttle you though. I think they allow no more than 4 simultaneous requests per second (or something like that) for free accounts. But hey, this is not important - Box API or whatnot - it could be some other kind of remote process, or any other slow/remote thing that could be done using concurrent async requests. I can’t wrap my head around this concept - if you have a bunch of requests being processed asynchronously, what constitutes “done processing”? How do you check for that?


#4

Your channel with urls is basically a work queue. Processing an item off the queue can add more work to the queue. If at a certain point you are not actively processing any items, and the queue is empty, then you are done.
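To make the "queue is empty and nothing is in flight" condition concrete, here is a minimal sketch that tracks outstanding work in an atom. Everything in it is hypothetical: fake-tree and fetch-children stand in for the real Box request, and the buffer size of 1024 is an assumption (it has to be large enough that workers never block on a full work channel, otherwise they could deadlock). The key point is that children are added to the counter before the current item is subtracted, so the counter can only reach zero when the whole traversal is finished.

```clojure
(require '[clojure.core.async :as async])

;; Hypothetical stand-in for the remote API: folder id -> sub-folder ids.
(def fake-tree
  {:root [:a :b]
   :a    [:a1 :a2]
   :b    []
   :a1   []
   :a2   [:a2x]
   :a2x  []})

(defn fetch-children
  "Pretend remote call; returns the sub-folder ids of folder-id."
  [folder-id]
  (get fake-tree folder-id []))

(defn traverse
  "Visits every folder reachable from root-id using n-workers threads.
  Returns a channel that emits each visited folder id and then closes."
  [root-id n-workers]
  (let [work-ch   (async/chan 1024) ; assumed big enough to never fill up
        result-ch (async/chan 1024)
        pending   (atom 1)]         ; the root is one outstanding item
    (async/>!! work-ch root-id)
    (dotimes [_ n-workers]
      (async/thread
        (loop []
          ;; <!! returns nil once work-ch is closed, which ends the worker.
          (when-some [id (async/<!! work-ch)]
            (let [children (fetch-children id)]
              ;; Count the new work BEFORE finishing this item,
              ;; so pending never drops to zero too early.
              (swap! pending + (count children))
              (doseq [c children]
                (async/>!! work-ch c))
              (async/>!! result-ch id)
              ;; This item is finished; if nothing else is outstanding, we are done.
              (when (zero? (swap! pending dec))
                (async/close! work-ch)
                (async/close! result-ch)))
            (recur)))))
    result-ch))
```

A consumer can then simply drain the result channel, for example with (async/<!! (async/into [] (traverse :root 3))), which returns only after the channel has closed. The order of the visited ids is not deterministic when there is more than one worker.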

If you only process one URL at a time then I think it’s relatively straightforward: after finishing each request you poll the channel without blocking. If you get nil back, there is nothing left to do, and you can put a :ready sentinel value on your responses channel. Otherwise you proceed to process the URL you just received.
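A rough sketch of that single-worker variant, assuming a core.async version that includes poll! (a non-blocking take that returns nil when nothing is immediately available). The names traverse-serial, collect-until-ready and demo-tree are made up for illustration, and the fetch function is passed in so the real request can be swapped for a plain map here. Because the same thread is the only producer and the only consumer, a nil from poll! really does mean the queue is empty and nothing is in progress.

```clojure
(require '[clojure.core.async :as async])

(defn traverse-serial
  "Processes one folder at a time. fetch-children is any function from a
  folder id to its sub-folder ids. Returns a channel of visited ids,
  terminated by the sentinel :ready."
  [root-id fetch-children]
  (let [urls-ch      (async/chan 1024) ; assumed large enough for the whole frontier
        responses-ch (async/chan 1024)]
    (async/>!! urls-ch root-id)
    (async/thread
      (loop []
        (if-some [id (async/poll! urls-ch)]
          (do
            ;; Processing an item may enqueue more work.
            (doseq [c (fetch-children id)]
              (async/>!! urls-ch c))
            (async/>!! responses-ch id)
            (recur))
          ;; Nothing queued and nothing in progress: signal completion.
          (async/>!! responses-ch :ready))))
    responses-ch))

(defn collect-until-ready
  "Reads from ch until the :ready sentinel arrives; returns the values seen."
  [ch]
  (loop [acc []]
    (let [v (async/<!! ch)]
      (if (= v :ready)
        acc
        (recur (conj acc v))))))

;; Hypothetical folder structure; a map works as the fetch function.
(def demo-tree {:root [:a :b], :a [:a1], :b [], :a1 []})

(collect-until-ready (traverse-serial :root demo-tree))
;; => [:root :a :b :a1]
```

This trick does not carry over to several concurrent workers: an empty queue could just mean another worker is still mid-request and about to add more URLs.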

Honestly though, I’ve lately started using simple Java BlockingQueues more than core.async. You gain introspection (i.e. you can count the items in the queue and peek at the head), but you lose the ability to generate a state machine to coordinate multiple channels with the go macro.
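For reference, the introspection mentioned above looks roughly like this via Java interop. It uses java.util.concurrent.LinkedBlockingQueue, one of the standard BlockingQueue implementations; the queue name and the folder ids are placeholders.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue TimeUnit))

;; Unbounded queue of pending folder ids (placeholder values).
(def ^LinkedBlockingQueue url-queue (LinkedBlockingQueue.))

(.put url-queue "folder-1")
(.put url-queue "folder-2")

;; Introspection that core.async channels do not offer:
(.size url-queue) ; => 2, number of queued items
(.peek url-queue) ; => "folder-1", head of the queue, not removed

;; Take the head, waiting at most 100 ms; returns nil on timeout.
(.poll url-queue 100 TimeUnit/MILLISECONDS) ; => "folder-1"
```

Note that an empty queue on its own still does not prove the traversal is finished while a worker is mid-request, so some count of in-flight items is still needed.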