State management for the server?

@dave.liepmann that is great. Cells are one of the things that really made me happy about Maria.

We do want to explore different evaluation semantics, including dependencies across notes, caching, etc. Hopefully, these will be decoupled from the general solution (just as, afaik, Maria can be useful without Maria cells, and Maria cells could, in principle, be useful in other UI systems).

I will look into the resources you suggested, thanks.

Before diving in, it seems to me that cells are similar to Reagent reactions, and Javelin cells. Is there a conceptual distinction you wish to point out (except for probably being a bit easier to start with)?

By the way, I have had the opportunity to talk with a 10-year-old boy, relatively new to programming, and to learn some Maria and ClojureScript together. Cells were one of the things that worked for him. It was nice to see how you made them easy enough to learn quickly.


You're giving me an idea for the next meetup :slight_smile:


I haven't worked extensively enough with those to compare them in depth.

I'll add that cells are also often a nicer way to do general front-end programming.


I do feel cells are underrated. I use Javelin extensively, and the Maria.cloud flavor looks similar. But cells are just a reference type. To me, the harder questions come when we try to build a higher layer of abstraction.

  • How do I group state within one "component"?
  • How do I deal with side effects that a component must trigger? Should a state change be treated like every other side effect or should it be different?
  • How do I communicate between components?

If you have only the components: editor, filesystem and repl, there are basically infinite ways to structure these. Months later, if you want to add tabs, you are again faced with the fact that you could implement this 100 different ways.

I think something like citrus or domino, or anything imposing structure, at the very least gives you some guidelines on how to implement these components so that everything isn't ad hoc. Still, even these systems each allow a bunch of different ways to do things within them.

Ultimately the problem I want solved, and I know this is way too vague, is:
How do I prevent my code from becoming spaghetti over time, as it grows and I want to add arbitrary components into a system?
For some reason I've found this much more difficult in UI programs than in other types of programs.


Another thing I struggle with is the difference between singleton components and components that can be instantiated multiple times at runtime. Often, up front, it's easier to think about a system as having, for example, one editor and one repl. And that's all you need right now.

But what if we later want to have multiple editors in our program? Or multiple repls per editor?

One way to address this is to make nothing a singleton: everything should be instantiable. This ends up looking somewhat like OOP, and it makes things a lot more complicated to design up front in re-frame/citrus/domino-like systems. These systems alone don't feel ideal for managing "entities" like this; you end up wanting something like DataScript or a Fulcro-style DB to manage the individual entities, and those setups are a lot more involved.

But the problem with treating things as singletons is that when you want to change something from a singleton to an instantiable entity, as you inevitably will, the change tends to be non-trivial and touch a lot of your codebase.
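To make the trade-off concrete, here is a minimal sketch (in Python, with invented names, not anyone's actual API) of the "nothing is a singleton" direction: all component state lives in one normalized store keyed by entity type and id, loosely in the spirit of a DataScript/Fulcro-style client DB, so adding a second editor is just another entry rather than a codebase-wide change.

```python
# Hypothetical normalized entity store: {entity-type: {id: state}}.
# Updates are pure: each function returns a new store value.
import itertools

_ids = itertools.count(1)

def create_entity(store, etype, state):
    """Add a new entity instance; return (new-store, id)."""
    eid = next(_ids)
    table = dict(store.get(etype, {}))
    table[eid] = state
    return {**store, etype: table}, eid

def update_entity(store, etype, eid, **changes):
    """Merge changes into one entity's state; return the new store."""
    table = dict(store[etype])
    table[eid] = {**table[eid], **changes}
    return {**store, etype: table}

store = {}
store, ed1 = create_entity(store, "editor", {"buffer": ""})
store, ed2 = create_entity(store, "editor", {"buffer": ""})
# A repl points at its editor by id, rather than being nested inside it:
store, r1 = create_entity(store, "repl", {"editor": ed1, "history": []})
store = update_entity(store, "editor", ed1, buffer="(+ 1 2)")
```

The point of the flat, id-keyed layout is that "one editor" and "many editors" are the same code path; nothing has to be restructured when the second instance shows up.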


I see, that clarifies it.

I think I'd say my only worry, if you go with a dataflow approach, is that such systems tend to have horrid performance. Eventually, you can end up in a place where a lot of caching is required, plus additional logic for computing the minimal set of changes and all that.

The benefit is that the user gets this "live" feeling, as every change cascades and re-renders automatically.

But I'd think a bit about whether the UX needs this. If the user had a way to make changes and choose when to "apply" them, they'd be more in control of when the re-computation and re-render happen. So if they know they're about to re-run some heavy computation and can expect some waiting time, maybe they choose not to apply right away and make some more changes. In effect, the user controls the amount of batching done.

It might be that you can pull off some hybrid, where the user chooses per code block whether it auto-recomputes on changes to its inputs or needs to be manually triggered.

Another possible hybrid is to never auto-recompute, but still track when things are invalidated and highlight the chain of cascading changes. You might need a heuristic for this; it could be as simple as tracking which block uses which other block, so that any change in a block marks the others as needing recomputation, even if the change technically doesn't affect the output.
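As a rough illustration of that heuristic (sketched in Python, with a made-up notebook of blocks): record which blocks each block reads from, and on an edit mark all transitive readers as stale, without running anything.

```python
# Coarse invalidation: any change to a block marks every transitive
# dependent as stale, for highlighting rather than recomputation.
from collections import defaultdict

deps = {                 # block -> blocks it reads from (hypothetical notebook)
    "clean":  {"load"},
    "viz":    {"clean"},
    "report": {"viz", "clean"},
}

# Invert to: block -> blocks that read from it.
dependents = defaultdict(set)
for blk, inputs in deps.items():
    for inp in inputs:
        dependents[inp].add(blk)

def invalidate(changed):
    """Return every block made stale by editing `changed`."""
    stale, frontier = set(), [changed]
    while frontier:
        for d in dependents[frontier.pop()]:
            if d not in stale:
                stale.add(d)
                frontier.append(d)
    return stale

invalidate("load")   # marks clean, viz and report as stale
```

Nothing here actually re-runs; the stale set is exactly what a UI could highlight as "out of date, re-run at your discretion".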

I think that if the goal is a notebook-like experience, the dataflow approach might be worth the performance hit in many cases. Typically, one of the pain points of using notebooks is stale state, which this would take care of.


Maybe I need to be more specific: the difference I'm talking about is between recomputing on every keystroke and recomputing at the user's discretion.

While there's a spectrum here (recompute on every x keystrokes, on every full symbol edit, every y seconds, etc.), I think doing it at the user's discretion can actually be a good UX.

At first, a user might appreciate re-computation on every keystroke, until it becomes too slow, of course; then they curse the tool and go back to their old workflow.

In both cases, the dependency of changes is tracked by the tool. So I'm not suggesting the user has to figure out what to recompute and manually visit each block in the right order to re-eval them; this should still be tracked by the tool. That said, the user could choose how many changes to make before the graph is re-computed.

But to be honest, I think sometimes further isolation might be needed. Let's say I'm playing with a code block that pulls in a large dataset and does some filtering and cleaning over it. Maybe some other block takes this as input, millions of rows, and generates some visualisation over it.

This visualisation block is really slow over the large input. My code block itself is already pretty slow. So as I work on the filtering and cleaning, I'm constantly hammered by the visualisation re-computing over and over, even though I don't care for it yet. I'm still figuring out the right filtering and cleaning I want. Why have this visualisation re-compute on every change?

In this scenario, I'd like to be able to say that only the code block I'm currently on re-evaluates, nothing else. And even that block not on each keystroke, since it's a block filtering and cleaning over millions of rows, but only once I'm interested in seeing the effect of my new batch of changes.

And only once Iā€™m satisfied with that, would I want to say, okay, now re-sync all dependent code blocks as well so that I see the effect on my entire notebook.

Now I don't know, maybe there's a way to make this even nicer. Like start by re-computing everything on change, but keep track of performance. If something is found to take too long, it automatically switches off from the auto-recompute chain and now requires a manual trigger. Visual indicators could make this intuitive to the user. And if the user really cared, maybe they could force it back to auto-recompute, and vice versa.
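That demotion idea could be sketched roughly like this (in Python; the threshold, class, and function names are all invented for illustration): time each recompute, and once a block exceeds the budget, drop it out of the auto-recompute chain so it needs a manual trigger from then on.

```python
import time

class Block:
    """A notebook block whose auto-recompute status degrades if it gets slow."""
    def __init__(self, fn, slow_seconds=0.5):
        self.fn = fn
        self.slow_seconds = slow_seconds  # hypothetical demotion threshold
        self.auto = True                  # starts in the auto-recompute chain
        self.output = None

    def recompute(self, *inputs):
        start = time.perf_counter()
        self.output = self.fn(*inputs)
        if time.perf_counter() - start > self.slow_seconds:
            self.auto = False             # too slow: now requires a manual trigger
        return self.output

def propagate(block, *inputs):
    """Run a block automatically only while it is still in auto mode."""
    if block.auto:
        return block.recompute(*inputs)
    return block.output                   # possibly stale; the UI would flag it
```

A user override would just be flipping `block.auto` back, with the visual indicator showing which mode each block is in.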

I admit I haven't done any heavy lifting with back-end re-frame; I've just used it in cljc to make for easy testing with our clj testing suites (since I still haven't found a good way to do cljs testing).

Another aspect to consider is certain I/O events, like sending an email or downloading a dataset. The notion of when to re-evaluate them, and when not to, can get tricky as well.

I don't really have a good answer to these, but one thing I've seen is to model them in isolation. You could have pure code blocks and side-effecting code blocks. The side-effecting code blocks could have policies. Those could even be functions, which return true if the side-effect block needs to be recomputed and false otherwise, in which case it would just reuse the output from last time.
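A hedged sketch of such a policy mechanism (Python, invented names): the policy predicate decides whether the effect really runs again; otherwise the cached output is reused.

```python
# Side-effecting block with a recompute policy. `policy(new, old)` returns
# True when the effect should actually run again on the new input.
def make_effect_block(effect_fn, policy):
    cache = {"ran": False, "output": None, "last_input": None}

    def run(inp):
        if not cache["ran"] or policy(inp, cache["last_input"]):
            cache["output"] = effect_fn(inp)
            cache["ran"] = True
            cache["last_input"] = inp
        return cache["output"]
    return run

sent = []  # stands in for actually emailing a report
send_report = make_effect_block(
    effect_fn=lambda report: sent.append(report) or len(sent),
    policy=lambda new, old: new != old,  # only re-send when the report changed
)
send_report("v1")   # sends
send_report("v1")   # policy says no: cached output, no second email
send_report("v2")   # content changed: sends again
```

The policy here is trivially "did the input change", but it could equally be "at most once per day" or "only on manual trigger", which is where the flexibility of plain functions pays off.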

I'm thinking something that could be nice, maybe, is that across namespaces, things are always cached and need to be forcefully re-computed, but within a namespace, things are auto-recomputed on change.

So in that scenario, as a user, I can decide to factor something out into its own namespace whenever a block becomes too slow.

I can also imagine doing that when I need to make some side effect idempotent. Say I have a send-report block at the end of my notebook, which sends a final report. I probably don't want that report sent on each change. If I moved it to its own namespace, future re-evaluation of the calling namespace would not cause it to re-evaluate, so the side effect could be made idempotent.

Anyways, I'm kind of doing a brain dump of the possible edge cases and concerning scenarios I can imagine happening. Not saying any of this is relevant or a good idea. Just food for consideration and things to bring to the hammock.


For the style of interaction you've outlined, the best thing is probably a long-running process doing "parse on save" for the notebook namespace so you can do automatic view updates. In order for this to be efficient, you'll need to be able to diff the bits of code in the file against the previous state so as to only recompute what has changed, but you'll also want a dependency graph so you can update code that depends on the code that changed. Once you've built that, you'll start to want a way to propagate changes automatically along the dependency graph. Once you've built the updating mechanism, you'll have cells.
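That pipeline (diff against the previous save, then recompute changed blocks and their dependents in dependency order) could be sketched as follows, in Python with an invented notebook; it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
# Plan a minimal recompute: diff block sources, grow the dirty set along
# the dependency graph, then order it topologically.
from graphlib import TopologicalSorter

def recompute_plan(deps, old_src, new_src):
    """deps: block -> set of blocks it reads. Returns blocks to re-run, in order."""
    dirty = {b for b in new_src if old_src.get(b) != new_src[b]}
    grew = True
    while grew:                      # add transitive dependents of dirty blocks
        grew = False
        for blk, inputs in deps.items():
            if blk not in dirty and inputs & dirty:
                dirty.add(blk)
                grew = True
    # static_order lists each block after all of its dependencies
    order = list(TopologicalSorter(deps).static_order())
    return [b for b in order if b in dirty]

deps = {"load": set(), "clean": {"load"}, "viz": {"clean"}}
old = {"load": "A", "clean": "B", "viz": "C"}
new = {"load": "A", "clean": "B'", "viz": "C"}
recompute_plan(deps, old, new)   # only "clean" changed, so "viz" follows it
```

An unchanged `load` stays untouched, which is the efficiency the "parse on save" process needs; wiring this plan to run automatically is the step where it turns into cells.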


Managing server-side state in a structured way is largely the motivation behind Domino https://domino-clj.github.io/


Thanks @Yogthos!

I have been looking into Domino in the last few days following @jjttjj's link. Really nice and elegant!
(Not yet sure whether it is flexible enough for my use case, with a varying collection of paths).

There are some changes and improvements coming down the pipe for Domino in the near future. I'd be interested to hear your particular use case to see if perhaps it could be accommodated better.

