Lazy-seq magically works for extremely large recursion. Why?

Hi all,

Reading in recent Would loop and recur exist if Clojure was build on top language with no StackOverflows? Clojure does not do TCO (recur is needed).

But it works for lazy sequences.

I understand that it being lazy, the recursion doesn’t happen till it’s called. But I did call it as you can see bellow. It took a lot of time to compute but it worked:

(defn my-repeat [x]
  (cons x (lazy-seq (my-repeat x))))

(nth (my-repeat 5) 1000000000)
;; => 5

Could anyone please explain how is that possible? Shouldn’t I get StackOverflowError?

Thank you Clojureverse.

Ozzy

1 Like

If you look at the source of lazy-seq, you’ll see it is transforming the code into a “thunk” – a function of no arguments that yields the expression when called – which is invoked when a new element needs to be realized. So the stack never gets deeper than one call, made each time a new element is “requested” (realized) as the lazy sequence is unraveled.

3 Likes

It’s a great question, and isn’t immediately obvious from reading just the documentation.

lazy-seq is a macro which creates a LazySeq instance. The reason it is a macro is that whatever you pass to it, it wraps that up in an 0-arity function (aka thunk), which is a way of creating a delayed computation. A lazy sequence is really a delay, with a slightly different mechanism to triggering its evaluation than an actual delay.

With a delay, you can pass in any body of expressions, and it is only evaluated when you dereference the delay, via either deref or @.

With a lazy sequence, you can only pass in a body which evaluates to a seq, that is anything for which seqable? returns true. When you call lazy-seq at the REPL, provided you pass it a valid body, it will be realized immediately. However, if you call it and bind its value to a symbol (via def or within a let block or similar), then it won’t do anything with the body you pass it, until you actual request it, for example by calling take, nth and so on, on it.

Now, here’s the real kicker: your function isn’t actually using recursion! It just looks that way. When you call my-repeat as follows:

(def result1 (my-repeat 5))

your function returns immediately with the result of the following:

(cons 5 (lazy-seq (my-repeat 5))

which actually creates a Cons instance, which is seqable, and which has two items: 5 and a lazy sequence. It is important to note that the Cons instance (which when you print it in the REPL looks like a list, delimited using parentheses) is not lazy.

If you just call say:

(nth result1 0)

and don’t attempt to evaluate the whole return value, say by printing it to the REPL, then the lazy sequence which is at index 1 is never evaluated, and you just retrieve the item at index 0, which is 5.

If you then call

(nth result1 1)

now the lazy sequence which is the item at index 1 of result1 above is evaluated for the very first time. So up until now result1 has looked like:

(cons 5 <<unevaluated lazy sequence>> )

but now, when we try to evaluate the lazy sequence, it becomes:

(cons 5 (cons 5 <<unevaluated lazy sequence>> ))

which can be expressed as:

(5 5 <<unevaluated lazy sequence>> )

since it is a Cons instance which looks and behaves like a list in most ways, at least when it comes to retrieving items from it.

The above is now the new state of result1 after our call to (nth result1 1). If this sounds like result1 has been mutated, that’s because it has. Or specifically, the lazy sequence that was initially at index 1, after our original call to my-repeat, has now been realized, and will always return the value of 5 whenever we attempt to call say (nth result1 1).

At this stage result1 appears as if it has two realized items followed by a lazy sequence (at least from the point of view of any function we use to request items from it is concerned.) If we attempt to request an index which has already been realized, we get back the item at that index. If we attempt to retrieve an item which has not yet been realized, what happens is that the last lazy sequence is evaluated, then its value is returned, and that will contain the next item in the sequence together with another lazy sequence, and so on, until we return the item at the index we requested.

So each time we realize a lazy sequence at the end of result1, we return the next element together with a new lazy sequence. We only ever do this outside the body of my-repeat, hence recursion is not even possible.

4 Likes

That’s a great explanation. Thank you.

The most important part is that It’s not recursion and Clojure will keep in memory only elements, that were realized. So it won’t keep all 1000000000 elements when I call (nth (my-repeat 5) 1000000000)

Not even those, unless you need them. If the consumer of a lazy sequence “holds the head” of the sequence, then the realized portion of the sequence accumulates in memory. But if the consumer does not “hold the head” then the sequence is just water under the bridge.

Head-holding is discussed in an announcement from the Clojure 1.0 days: Clojure - Making Clojure Lazier (especially the “Don’t hang onto your head” section). Also, Mike Fikes wrote up an interesting exploration at FikesFarm Blog: ClojureScript Head Holding .

1 Like

Simple illustration of holding the head:

(def fives (my-repeat 5))
(nth fives 1000000000)

It looks like it should work the same as your example, @leonardobettany, but it will eat up memory and/or CPU. fives is reference to the beginning of the sequence, so as nth progresses through the sequence looking for the 1000000000th element, fives refers to the first element, which refers to a realized second element, which refers to a realized third element, …, and so on, and they all have to be kept available, up until the 1000000000th element–because fives is a reference to the first element. In your example, though, Clojure can forget about the earlier elements as it searches for the 1000000000th element, because nothing will ever use the earlier elements once the nth-search has gone beyond them.

1 Like

EDIT: As it turns out (see subsequent posts) the following is true for ClojureScript but not for Clojure.

@mars0i, I don’t believe your explanation is correct, at least not in ClojureScript, and probably not in Clojure either.

In both of the follwing cases, Clojure/Script will realize all the elements in the sequence returned by (my-repeat 5), so whether you do:

(def fives (my-repeat 5))
(nth fives 1000000000)

or

(nth (my-repeat 5) 1000000000)

the same process I have outlined in my post above will happen. As nth evaluates, the sequence is built up in memory and stays in memory until nth's internal index reaches n (which is 1e9 here). So it will eat up memory and use CPU in both cases while going through this process, and in both cases the entire realized collection will have to be kept in memory, until it reaches n. So I believe the following:

isn’t true, at least in ClojureScript and I strongly suspect Clojure as well. I don’t believe it can forget about (i.e. release resources for, via garbage collection) earlier elements as it searches for the 1e9th element, because the nth function itself holds onto the sequence as it is being realized. However:

If by this you mean once nth has finished executing, then this is correct for the second example.

Now, while technically speaking your example results in head retention, it is a byproduct of the fact that you are actually holding onto not just the head but the entire sequence realized thus far, via the global reference fives. Defining a lazy-sequence and binding it to a global in this way is a big no-no and should be avoided unless you have a compelling reason to do so, because it will never be garbage collected. I really should have mentioned it in my original post.

EDIT: The warning in the previous paragraph was really intended for beginners. As long as you know what you are doing, then ignore this warning completely.

Head retention becomes an issue, and something you need to be wary of, when creating a very large lazy sequence within a let block, and then consuming it in a particular way such that the consumer forces the head, and hence the entire lazy sequence, to remain in memory even after it no longer really needs it. There is a good example I’ll dig up which illustrates this well when I get a chance.

Thanks @outrovurt. You might be right.

There is nevertheless a difference between the two versions of the nth operation. Though I would never use time for real benchmarking (I’ve used Criterium in the past, but there may be better options now), time provides good evidence of that something different is going on: I repeatedly see an order of magnitude difference in time results using nth fives vs nth (my-repeat 5) in a freshly started repl. Either version has apparently left a lot of garbage to be collected, because Clojure is slow if I try it again in the same session, no matter which version I ran. I’m not sure I understand it, but it’s clear that there’s some sort of difference.

(btw, “a big no no … unless you have a compelling reason” is a good rule of thumb. I would add “or understand what you are doing”. I bind lazy sequences to globals all the time. It’s a key part of one of my favorite strategies for working with Clojure, and I do understand the risks. I know that the kind of work that I do with Clojure, and my goals, are a bit different from those of most Clojure folks.)

Pretty sure this is incorrect for nth. The head of the coerced sequence seq is bound to a variable, but it’s mutated internally via first/next traversal during iteration. There’s no retention of the original head or earlier entries during traversal (local references are cleared on clj jvm). GC is thus able to collect previous entries (assuming no external references to heads that end up caching the sequence). The logic is similar in cljs where the linear traversal works via loop/recur, so the previous heads are ditched as the traversal progresses. I’ve had plenty of larger-than-memory sequences where I can compute the nth because of these properties. Perhaps there are degenerate cases I’m not aware of though.

edit looks like mike fikes post (already referenced) addresses cljs lack of locals clearing. Looks like reducible variants are the solution. Good to know if I dip over into cljs land.

1 Like

So conclusion, it’s not true in Clojure, but is true in ClojureScript because the lack of locals clearing will retain pointers to the elements?

Seems to be. Looks like CLJS (or at least Nolen) is uninterested in fixing this. “More effort than it’s worth, few people have reported issues with head-holding.”

This semantic difference in Clojurescript will bite someone sometime. Maybe me (because I don’t use Clojurescript often, might forget). There’s a reference to the difference on the Differences from Clojure page, but it’s only meaningful if you already know what “locals clearing” means and what its implications are for lazy sequences. I would have thought there’d be a more prominent warning. Lazy sequences are after all one of the signature features of Clojure. (Oh well. I guess that’s what blog posts are for :slight_smile: , like Fikes’.)

Then you definitely have a “compelling reason” :wink: But yes, absolutely, if you know what you’re doing, then do it, and that pretty much goes for anything.

This is actually mind-blowing, I had no idea that this is how it worked in Clojure. Makes sense when you stop and really consider it, why hold onto earlier elements when done with them? I work mostly in CLJS-land so foolishly assumed that this was how it works in Clojure as well. Really good to know all these things.

2 Likes

I work mostly in CLJ-land so foolishly assumed that this was how it works in ClojureScript as well :wink: Glad to still be learning new things; this discussion actually might save me some headaches down the road as I settle more into the browser side.