Hosting dependencies on a distributed file system?

camdez · July 27, 2018, 12:17am

Has anyone experimented with / thought deeply about hosting project dependencies on a content-addressable distributed file system (e.g. IPFS)?

At first blush, this approach makes a ton of sense to me—we would no longer need storage / bandwidth sponsorships for large, central repositories; content addressability takes care of ensuring artifacts aren’t tampered with; and it provides interesting opportunities for leveraging data locality. e.g. your coworkers likely use most of the same dependencies you do—why not fetch them directly from your coworkers’ machines? (For even more fun, you can do this in an environment where you don’t even have an Internet connection, like on an airplane.)
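To make the tamper-resistance point concrete, here is a minimal sketch of content addressing. It assumes a plain hex SHA-256 digest as the address, which is a simplification: IPFS itself uses multihash-based content identifiers, and the function names below are hypothetical.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an address from the bytes themselves (assumption: hex SHA-256)."""
    return hashlib.sha256(data).hexdigest()


def verify(address: str, data: bytes) -> bool:
    """Check that fetched bytes match the address they were requested under."""
    return content_address(data) == address


# Hypothetical artifact contents, for illustration only.
artifact = b"(ns example.core)"
address = content_address(artifact)

untampered_ok = verify(address, artifact)
tampered_ok = verify(address, artifact + b" ; injected")
```

Because the address is computed from the content, it does not matter which peer served the bytes: any modification changes the digest and the check fails.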

I realize this is hardly the most pressing issue in the Clojure community, but it sounds like a fun project / possibly The Way Things Should Be™, especially for immutability-loving Clojurians. Am I overlooking anything important?

GKlijs · July 27, 2018, 6:04am

If a coworker’s machine needs to be running and accessible to you in order to build the code, that seems brittle. With deps.edn you can have git dependencies. I think that’s a lot nicer than what you’re suggesting.

camdez · July 27, 2018, 3:12pm

Ah, perhaps that required more explanation. You certainly wouldn’t run it as you’re describing.

Despite a coworker’s machine being an equal peer in the network, you’d still want to have always-on supernodes reliably hosting content for times when other peers weren’t available. But think of them (in this case) like a lowest level behind a series of caches. For example, compare with Datomic—best case scenario the data you’re accessing is already in memory, if not, fall back to Memcached, and if it’s not there, fall back to Datomic itself. In the ideal case, you never hit that underlying Datomic server. Same with these supernodes.

But the nice thing about a system like IPFS is that there’s not a technological distinction between the “layers”—it’s a single, consistent abstraction.
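The layered fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not how IPFS is implemented: each "layer" is modeled as a plain dictionary lookup, and the names `fetch`, `local_cache`, `coworker`, and `supernode` are hypothetical.

```python
from typing import Callable, Optional, Sequence

# Every layer exposes the same interface: address in, bytes (or None) out.
Source = Callable[[str], Optional[bytes]]


def fetch(address: str, sources: Sequence[Source]) -> Optional[bytes]:
    """Try each source in order and return the first hit, or None."""
    for source in sources:
        data = source(address)
        if data is not None:
            return data
    return None


# Hypothetical layers, nearest first; the supernode is the always-on fallback.
local_cache = {}
coworker = {"abc123": b"artifact bytes"}
supernode = {"abc123": b"artifact bytes", "def456": b"other artifact"}

sources = [local_cache.get, coworker.get, supernode.get]

from_peer = fetch("abc123", sources)       # served by the coworker layer
from_supernode = fetch("def456", sources)  # falls through to the supernode
missing = fetch("missing", sources)        # no layer has it
```

Since every layer has the same shape, adding or removing a layer is just a change to the list, which is the "single, consistent abstraction" point.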

Git has some of the properties of IPFS (and, in fact, you can piggyback git on IPFS), but with git we use a decentralized model rather than peering. And, in practice, we largely tend to centralize (i.e. how many of us are transmitting patches directly rather than sharing through a git remote?). If git could automatically discover other remotes to read from, then you’d be looking at something similar.

camdez · July 27, 2018, 3:35pm

I had a little time to do some reading this morning, and this is certainly a path that has been trodden before:

https://github.com/ipfs/notes/issues/171 - great primer / discussion of core considerations
https://github.com/whyrusleeping/gx - prior art (for Go lang, but ostensibly extensible)
https://ipfs.io/blog/15-ipfs-weekly-9/ - covers the topic and points out this was actually how IPFS got its start (!)

I also realized I had missed this fairly extensive discussion on Reddit about deps.edn and git dependencies, which raised some interesting considerations and cleared up, for me, a couple of points I believe Rich was trying to convey in his Spec-ulation talk.

Looking over the source for tools.deps.alpha, the abstraction looks pretty clean / simple and an IPFS-backed dependency extension seems like a natural fit.

I’d love to hear more opinions; this still makes a lot of sense to me.

fingertoe · July 30, 2018, 9:30pm

IPFS and git are pretty darn similar… I don’t think it is any different at all – with a git dependency you still need the machine hosting your git repository to be up… The only real difference is that with IPFS, any IPFS client in the world with the file could deliver it to you – not just the specific machine at the endpoint of the URL in your deps file.

I think eventually IPFS will become a mainstream way of doing things. Not sure when, but it does make a ton of sense.

system · January 29, 2019, 9:30am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.