Maps vs records

mars0i · April 20, 2021, 3:09am

Sorry this post is so long … thanks to everyone who actually reads it!

In another thread, @seancorfield mentioned that there’s a sentiment around these days that one should prefer using maps over records. I expressed surprise, and @didibus subsequently wrote a series of posts (starting with post #19 in “What is 2021 recommendation for Specs?”) that explained why when one is feeding data in and out to/from various sources, it’s better to use maps so that it’s easy to adjust to changes in those data sources. You can use a :type key (or a key with some other name) in each map with particular keyword values to specify the expected fields, and you can then test that the data is in the right form at specified interface points using spec so that it will fail early rather than biting you later. These posts were great! (I’m leaving out a lot of valuable details.)

I said, and still say, that this all makes sense to me for the kind of data processing that didibus was talking about. And I still am uncomfortable with the advice that one should default to maps over records. I think records are perfectly good, and that they should be part of the “there’s more than one way to do things” that any good language such as Clojure allows. I think that in fact, moving between maps and records is often easy, so I’m not that worried that people who would benefit from records will lose that benefit if they’re taught to use maps instead: they can easily switch when they see a benefit to it. I do have a small worry that I’ll express at the end of this post.

Something was bugging me about the “default to maps” advice, and I think I figured out what it is. I offer this in the spirit of clarification, as much to let others help me understand better as for me to (maybe) help others understand how I’m seeing things. It may be that there are things that I am just misunderstanding.

First, I think that in the kind of scenarios that didibus described, it absolutely makes sense to me to use maps and spec as didibus indicated. Yes.

However, in my work, and in many contexts in which Clojure is, or could be used, I believe worrying about changing data sources/sinks is not a big issue. I’ve used spec once, in order to learn about it, for data validation in a Clojurescript form. I could probably use it more than I have, but not using it hasn’t caused much trouble. Using spec in my code would actually be more trouble. I don’t write as much code as some folks here, but still, I don’t think that maps by default + spec is optimal as a general rule, and I don’t think that advice that makes sense for a particular kind of application ought to be considered general advice. Of course, you could still use maps all of the time, even if you didn’t think it was worth checking your data with spec.

In the other thread, I explained that I liked records because they partially documented the data structure that I expect. Here’s another part of what bothers me about the map-default strategy–when it’s not called for by long-term data management needs. If I define a record and then mistype its name, the compiler will catch it. If I use maps with a :type key, and I mistype a keyword value, or even mistype :type, the compiler won’t care. Of course, because Clojure is dynamically typed, there are lots of things the compiler doesn’t catch, and you just have to know that and deal with it. But if I’m using maps with :type keys instead of records, spec becomes much more important. It’s essentially doing what the Clojure compiler does with records. (Spec can do a lot more; I’m just talking about validating type keyword values in maps.) If you’re already spec’ing your data at carefully chosen points, going from records to maps might not be a big deal–and then you get the flexibility that didibus described.

But if I don’t have to deal with spec, and I don’t have to worry about changing data sources/sinks, then using maps with a :type key feels very low-level. It means I’m constructing types with no help from the compiler, and then I have to do my own type checking using spec. Clojure is then functioning as a lower-level language than it could be. I’m not letting the language do the work, and I’m making my work harder, rather than easier.
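To make the worry concrete, here is a minimal sketch. The `Snipe` record, the `:snipe` type value, and the `::agent` spec are hypothetical names chosen for illustration; they don't come from any real project.

```clojure
(require '[clojure.spec.alpha :as s])

;; With a record, a mistyped constructor name is an unresolved symbol:
;; (->Snpie 10) would fail to compile.
(defrecord Snipe [energy])

;; With maps, a mistyped :type key or value is just different data.
(def ok    {:type :snipe :energy 10})
(def typo1 {:tpye :snipe :energy 10})   ; key typo, no complaint
(def typo2 {:type :snpie :energy 10})   ; value typo, no complaint

(:type typo1) ;=> nil

;; A spec can do the check the compiler doesn't do for maps.
(s/def ::type #{:snipe})
(s/def ::energy number?)
(s/def ::agent (s/keys :req-un [::type ::energy]))

(s/valid? ::agent ok)    ;=> true
(s/valid? ::agent typo1) ;=> false
(s/valid? ::agent typo2) ;=> false
```

The spec only helps at the points where you actually call it, which is the extra work being described here.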

I’m in favor of Clojure appealing to a broad audience. If new users are taught that “this is how you do it in Clojure” until you have advanced knowledge, and that way of doing things is more difficult, less elegant, more involved, and more bug-prone for their applications than alternative strategies that are discouraged, then those users might be less likely to continue with Clojure. If someone doesn’t have the kind of data management and validation needs that benefit from the default map + spec strategy, they might feel that Clojure is a little less appealing, and go to another language (e.g. Python). So I’m in favor of new users learning about the map + spec strategy (which I didn’t know about until the past week) but I’m not in favor of them being told that that’s the way everything should be handled. I don’t think it’s a big deal either way, but that’s what I’m thinking.

Appendix:

Maybe the reason that it’s good to use what I described as “lower-level” strategies with the kind of data management context that didibus clarified is that it’s a context where a lower level matters. If the structure of your data can change, then you in effect have to deal with a lower level; you can’t just specify the data structures once and for all, and then forget about the details. You have to build in flexibility so that you can respond to internal changes in structure. Not sure if this is the right way to put things.

seancorfield · April 20, 2021, 3:47am

If you define a record with fields foo and bar and then you try to access it with :baz, you’re in exactly the same situation as the hash map {:foo "value" :bar "value"}: it’s legal code and you’ll get nil back rather than an error. If you ask (contains? data :baz) you’ll get false in both cases. I think you are getting a false sense of security with records, thinking that the compiler is going to do more for you than it really is?
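A small sketch of that comparison, assuming a hypothetical record named `FooBar` with the two fields mentioned above:

```clojure
(defrecord FooBar [foo bar])

(def rec (->FooBar "value" "value"))
(def m   {:foo "value" :bar "value"})

;; Looking up a key that was never declared is legal in both cases.
(:baz rec)            ;=> nil
(:baz m)              ;=> nil

(contains? rec :baz)  ;=> false
(contains? m :baz)    ;=> false
```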

I’m curious now about your programming background to try to understand why you seem to find (static) types to be important here and you don’t see them as restrictive and getting in the way?

I’ve worked with both statically typed and dynamically typed languages (stretching back into FP languages in the ‘80s that predated and inspired Haskell) and after all that, I’ve come down solidly on the side of dynamically typed languages, so I’m always interested to hear folks’ backstory when they come to Clojure and seem very attracted to records and protocols.

mars0i · April 20, 2021, 4:31am

Yes, that’s a good point about what happens when you use a keyword that’s not defined with the record. Not sure I like it, but it’s how records work. They still have this benefit, though:

user=> (defrecord a-rec [foo bar])
user.a-rec
user=> (->a-rec 2 3)
#user.a-rec{:foo 2, :bar 3}
user=> (->arec 2 3)
Syntax error compiling at (/private/var/folders/68/d0l7z7p906l07fj6s7j5_ygm0000gq/T/form-init2781839833354661334.clj:1:1).
Unable to resolve symbol: ->arec in this context

I don’t find records to get in my way. I do appreciate advantages of statically typed languages (such as OCaml and the non-monadic half of Haskell), but also appreciate the advantages of a dynamically typed language like Clojure. You can’t have both, but it’s not that I want Clojure to be statically typed. I like it the way it is–mostly, of course–nothing’s perfect. (The things that bug me are orthogonal to this discussion.)

Not sure what will be helpful about my background. Started with dBase III, DOS Basic, VAX Basic, SQL, Common Lisp, Scheme, lisps with OO extensions, Standard ML, some serious Bash scripting, Perl, more SQL, Java, PHP, C, didn’t code for a while, Perl, Common Lisp, NetLogo, R, Clojure (with Java when I have to), OCaml, Haskell (very recent). Some Python at one point because I had to. The switch from Common Lisp to Clojure was a very good one. I’m unlikely to go back. I don’t want to go back to Scheme, either, although I no longer know it well.

Some of those were only hobby languages at the time, and some were for real work. Somewhere along the way I learned a little assembler (definitely hobby). I think those are the major highlights. I’m leaving out all of the little scripting languages and extension languages that I didn’t bother to remember, dialects I played with, other experiments and investigations. Programming started as a hobby, then I worked professionally to pay for grad school. That was database work, sysadmin, web stores, a moderately complex inventory and shipping management system. Now I use code for exploratory research.

didibus · April 20, 2021, 4:50am

What you can do is create a constructor for your map, and then you get the same benefit and even more if you combine it with spec.

(defn make-rec
  [foo bar]
  {:foo foo
   :bar bar})

(make-rec 2 3)
;;=> {:foo 2, :bar 3}
(make-reec 2 3)
Syntax error compiling at ...
Unable to resolve symbol: make-reec in this context

Now you can also spec this:

(require '[clojure.spec.alpha :as s])

(s/def ::foo string?)
(s/def ::bar number?)
(s/def ::rec (s/keys :req-un [::foo ::bar]))
(s/fdef make-rec
  :args (s/cat :foo ::foo :bar ::bar)
  :ret ::rec)

(defn make-rec
  [foo bar]
  {:foo foo
   :bar bar})

At this point, if you call (doc make-rec) it’ll print the spec, and document that foo is a string and bar a number. You can now instrument your code at the REPL and in your tests and the following will fail:

(make-rec 2 3)
;; Spec validation error at :foo expected string?

Now you can also stub make-rec, or generate valid rec maps for testing or when trying things out. You can use the spec later if you ever start passing a rec in/out of your app to validate that what you have is a valid rec, etc.

Of course, if you don’t need any of that, skip the spec; just by creating make-rec on its own you’ve gotten back all the benefits you described. The constructor helps you create a rec, and it pretty clearly shows you what keys a rec contains.
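For anyone who wants to try the instrumented version end to end, here is a self-contained sketch. It uses `clojure.spec.test.alpha/instrument`; the sample values are made up, and it is one way to wire this up rather than the only one.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.spec.test.alpha :as stest])

(s/def ::foo string?)
(s/def ::bar number?)
(s/def ::rec (s/keys :req-un [::foo ::bar]))

(defn make-rec
  [foo bar]
  {:foo foo
   :bar bar})

(s/fdef make-rec
  :args (s/cat :foo ::foo :bar ::bar)
  :ret ::rec)

;; Turn on argument checking for make-rec (REPL and tests only).
(stest/instrument `make-rec)

(make-rec "a" 3)
;;=> {:foo "a", :bar 3}

;; With instrumentation on, a non-string foo throws an ex-info.
(def bad-call-failed?
  (try
    (make-rec 2 3)
    false
    (catch clojure.lang.ExceptionInfo _ true)))

;; The same spec can validate a map arriving from elsewhere.
(s/valid? ::rec {:foo "a" :bar 3}) ;=> true
(s/valid? ::rec {:foo "a"})        ;=> false
```

Instrumentation only checks the :args part of the fdef; it does not check :ret.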

seancorfield · April 20, 2021, 5:30am

Thanks. That’s a fascinating background! And that definitely clarifies that you have no particular leaning toward statically-typed languages, so thank you. We have a lot of overlap in the tech we’ve used but overall we’ve certainly walked very different paths.

Over the decades I’ve found that people can have very different preferences in programming languages and styles, even if they’ve walked similar paths – I think there are a lot of aspects of programming that are very subjective.

For me, Clojure resonates because most of what I like about it seems to be echoed by the core team, Cognitect, etc. But Clojure also resonates with folks for very different reasons and it’s always interesting just how broad a community we have.

rudolfvesely · April 20, 2021, 12:15pm

From my perspective Maps by default is a good general rule. I’d add to that rule: Use Records when you need a performance boost or want to use them with Protocols. But how often do you need that?

When I moved from Elixir I felt confused about Records. First I compared them with Elixir structs and then I realized that they’re not really…

But it won’t catch key mistypes:

(defrecord Color [red green blue])

(map->Color {:rad 50 :green 100 :blue 255})
;; => #user.Color{:red nil, :green 100, :blue 255, :rad 50}

Sure, you can use ->Color but would you use it with 5 arguments? What about 7? Most likely not…

And now you need to change something (“immutable” style of change):

(assoc my-color :greeen 123)

No exceptions… Records are somewhere between Map and completely static struct.

I used constructor-like functions even for Elixir structs (I called them new ;-)) since they’re a great way to document and unify the creation of more complicated data structures, and you have complete freedom, so you can for example create multiple arities.

(defn new-something
  ([] {:x 0 :y 0 :z 0})
  ([x y] {:x x :y y :z (+ x y)})
  ([x y z] ...

Phill · April 20, 2021, 12:36pm

The decision can be a bit multi-faceted.

You can extend a protocol to a record type, so you can arrange faster polymorphism than with multimethods.

Also, records may be a little faster than maps, because the record is a Java class with the members you specified. But accessing a non-declared record member is correspondingly slower than a map.

OTOH, records’ baggage (dissoc’ing a declared member gives you back a plain map, not a record) may distract from the slight benefit.

There have been some attempts to re-cast some of the (more peripheral) core features with records.

clojure.data.xml uses records instead of clojure.xml’s maps.
Somewhere there is a “fast-zip” that ported clojure.zip to records for the zip-overhead metadata.

Edit: P.S. Remember that records are a relatively late addition. Whatever the task, you can definitely get it done with maps.
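A short sketch of the two points above: protocol dispatch on a record versus a multimethod on a :type key, and what dissoc does to a record. `Shape`, `Circle`, and `area-mm` are hypothetical names for illustration only.

```clojure
;; Polymorphism via a protocol implemented by a record.
(defprotocol Shape
  (area [this]))

(defrecord Circle [r]
  Shape
  (area [_] (* Math/PI r r)))

;; The map-based equivalent: a multimethod dispatching on :type.
(defmulti area-mm :type)
(defmethod area-mm :circle [{:keys [r]}]
  (* Math/PI r r))

(def c (->Circle 2.0))

(= (area c) (area-mm {:type :circle :r 2.0})) ;=> true

;; Adding an extra key keeps the record type...
(record? (assoc c :color :red)) ;=> true

;; ...but removing a declared field returns a plain map.
(record? (dissoc c :r))         ;=> false
```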

mars0i · April 21, 2021, 1:21am

Thanks for all of these comments. I understand the points, and very much appreciate people spelling out these alternatives. (!)

This is kind of a strange experience for me. I’ve been using Clojure on and off since 2013, have spent time here, in the Google group, and in Slack, and I read two or three Clojure books when I was starting out, and parts of one or two others later–and I have never come across the advice to prefer maps. And now I’m being told, yeah, that’s what everyone has always said! (The first book I read still talked about defstruct, and I spent time figuring out that defrecord was definitely preferable.)

So now I understand that the benefits of records are not that great, that there are advantages to using maps instead of records, and that the costs of using maps this way (e.g. defining your own constructors) are not high. I’ll keep this all in mind going forward. Thanks everyone. I’ll still probably use records often, but I will have a better idea of why, and of when to avoid them. Great!

Apart from small exploratory projects that don’t use records or map-based record substitutes at all, a lot of what I do with Clojure is to write agent-based models (nothing to do with Clojure’s state-management agents). This is a situation in which there’s no data being read, typically, and faster really is better, if the cost of making things faster isn’t high. If I can perform dozens or hundreds of simulation runs with hundreds of agents in thousands of time steps, faster, that’s good. It means I can get results in 20 minutes or an hour on my own computer, and not have to use batch submission on a cluster, which I am lucky to have, but isn’t entirely convenient. Speed isn’t the be-all and end-all, for me, but faster is good, and using records instead of maps is an easy move for me. So I’ll probably prefer records to represent agents for simulations.

(There’s a more important reason that has nothing to do with this discussion: I often use a Java ABM library, MASON, in which agents are Java class instances. Representing agents as Clojure records means that I can have the pleasure of writing convenient, largely idiomatic Clojure code to manage MASON agents. It makes MASON a lot more fun. [This is one of the cases in which I think Emerick’s type definition flow chart is wrong–but I think that can’t be helped. There are just too many considerations. I use proxy, reify, and gen-class in other parts of my MASON code. I’ve used deftype, but that’s for a speed optimization I have to really need before I’d use it again.])

mjmeintjes · April 21, 2021, 1:43am

That’s the thing with general advice - it can only ever be general and you need to understand your own problem domain well enough to make the decision that’s right for the situation.

I think the general advice to prefer maps is solid, but as you say, if your situation has different requirements then there is nothing wrong with records.

joinr · April 21, 2021, 1:11pm

What did your profiling show when you switched to using records?

As implemented, records still leave some performance on the table that can be regained.

mars0i · April 21, 2021, 3:28pm

Interesting question. I started with records. I use maps a lot, of course, but never to implement agents in an agent-based model. It could be that maps wouldn’t be much slower for this purpose, but I could imagine why they might be, and there’s no cost in this case to sticking with records as opposed to maps.
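If I wanted a rough answer, something like the following crude sketch would be a starting point. `Agent2D`, the field names, and the sizes are all made up, and `time` is only a coarse measure (no JIT warm-up control), so the numbers it prints should be treated as a hint rather than a result.

```clojure
;; Hypothetical agent type with the same keys as the map version.
(defrecord Agent2D [x y energy])

(defn total-energy
  "Sums :energy over a collection of agents (records or maps)."
  [agents]
  (reduce + (map :energy agents)))

(def n 10000)

(def record-agents (mapv (fn [i] (->Agent2D i i 1.0)) (range n)))
(def map-agents    (mapv (fn [i] {:x i :y i :energy 1.0}) (range n)))

;; Same work on both representations; compare the elapsed times printed.
(time (dotimes [_ 100] (total-energy record-agents)))
(time (dotimes [_ 100] (total-energy map-agents)))
```

A proper benchmarking library would give more trustworthy numbers than `time`.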

fastrecord looks interesting. I was unaware of clj-fast. Thanks! I definitely want to investigate it when I have time. With MASON, I’ve been happy with the speed I’m getting from Clojure so far, but I’m planning some work in the next year that might require many more runs, agents, timesteps, etc. than I’ve had to use in the past, so perhaps fastrecord or other optimizations in clj-fast could have a big impact.

I’ve done some comparisons of MASON with pure Java vs. MASON with Clojure, and between using deftype as opposed to defrecord with MASON with Clojure. There are definitely contexts in which deftype is faster than defrecord, and contexts in which Java is faster than Clojure. Using deftype takes away a lot of the joy of using Clojure. After writing a simulation with MASON using deftype to define agents, I felt as if I might as well use Java instead. And I don’t want that.

joinr · April 21, 2021, 3:56pm

If all you have to do to interface with MASON’s framework is implement some interfaces (like Steppable), reify would also work. The SimState design is kind of an indictment of the OOP inheritance weakness too, one where you could just pack some data along with the base SimState class instead of requiring inheritance all over (decouple the simulation data / state from the methods…). Same with all the visual components. Lots of inheritance heavy code. OTOH, you can define simple wrapper classes/types that can delegate toward functions or whatever and have a reusable substrate, then just re-use those to allow parameterizing everything via composition (no need for multiple gen-class or proxies per project to get inheritance).
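As a sketch of the delegation idea, here is reify wrapping a plain function behind a Java interface. MASON isn't on the classpath here, so `java.util.function.Function` is used purely as a stand-in for an interface like Steppable; `fn->function` is a hypothetical helper name.

```clojure
;; Wrap any one-argument Clojure fn as an instance of a Java interface.
;; java.util.function.Function stands in for a framework interface.
(defn fn->function
  [f]
  (reify java.util.function.Function
    (apply [_ x] (f x))))

;; The behavior lives in ordinary functions; the wrapper is reusable.
(def step-inc    (fn->function inc))
(def step-double (fn->function #(* 2 %)))

(.apply step-inc 1)     ;=> 2
(.apply step-double 21) ;=> 42
```

The same shape would apply to a framework interface: one small wrapper, parameterized by functions, instead of a new subclass per behavior.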

In your masonclj example, the fact you’re using records with fields < 8, and not really implementing any interfaces or protocols on them, and not using primitive types for the fields, indicates that arraymaps would probably perform identically (I don’t think you are exploiting any fast paths). Doing stuff like key-based destructuring isn’t helping (it expands to polymorphic clojure.core/get despite the type being known, and despite having direct field access available)

(let [cfg-data @cfg-data$
        {:keys [snipe-field ]} subenv ;;this could be a direct field access on SubEnv record...
        snipe-field' (move-snipes rng cfg-data snipe-field)]
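A minimal sketch of the three access styles being compared. This `SubEnv` is a hypothetical one-field stand-in with a simplified field name, not the record from masonclj.

```clojure
;; Hypothetical stand-in record with a single field.
(defrecord SubEnv [snipes])

(def subenv (->SubEnv [:s1 :s2]))

;; 1. Key-based destructuring: a generic, map-style lookup.
(def via-destructuring
  (let [{:keys [snipes]} subenv]
    snipes))

;; 2. Keyword lookup on the record.
(def via-keyword (:snipes subenv))

;; 3. Direct field access via interop; the type hint avoids reflection.
(def via-field (.-snipes ^SubEnv subenv))

(= via-destructuring via-keyword via-field) ;=> true
```

All three return the same value; the difference is only in how the lookup is carried out.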

I explored some of these things in the icpc2019 optimization where records were used (and further optimized), and ended up with an experiment called structural to make direct field accesses and destructuring easier to do.

Looking back through the MASON stuff reminds me a lot of interfacing with Piccolo2D and adapting it, living with the primacy of inheritance-based design and how that was a pain to elevate into something more amenable to composition. Might be nice to provide an alternative to MASON that’s compatible with clojure’s focus on data and composition. I have some stuff that I use for work doing discrete event / agent based simulation, and clojure has worked out well.

mars0i · April 21, 2021, 3:58pm

It’s also reasonable for me to wonder whether it would be worth using your spork library, @joinr. I’m used to MASON, of course, so that’s one reason to continue with it. But I’ve had to struggle with it sometimes. It seems clear that it’s often possible to use ABM software and discrete event simulation software for the same purposes. The concerns are a bit different, and one has to work through a different conceptual framework, maybe, and figure out what is and isn’t relevant to one’s needs.

joinr · April 21, 2021, 4:02pm

I don’t have near the documentation or examples/demos that MASON has (I’m not an academic, and instead have been using it professionally for analysis). There also hasn’t been exactly a huge call from the Clojure folks for these kinds of libraries. I’ve mulled around extracting and packaging out the stuff into something more focused, combining the entity component store, behavior trees, and the discrete event stuff into a simple library.

Fyi, odoyle-rules by Zach Oakes is another really interesting take on these kinds of problems, and as demonstrated in his demos (doing game stuff), can act as a really nice layer for defining complex agent/entity behavior.

dsim is another interesting take, although I’m not as familiar with it. It also tries to lift everything into the declarative realm. I haven’t messed with it though.

mars0i · April 22, 2021, 3:32am

Yeah.

Thanks for the links. I’ve added them to my brief Notes on writing agent-based models in Clojure. Maybe that will help one person some day. Not too many readers of an obscure page buried in an obscure repo. I suppose I could start a blog instead.

joinr · April 22, 2021, 1:10pm

That’s a really good set of notes. I think some of the problems you mentioned with MASON (and your adaption with masonclj) may be surmountable to make the out-of-box clojure experience better (or similarly adapt MASON’s stuff to work with clojure libs).

mars0i · April 23, 2021, 5:07am

Thanks @joinr.

Some of the problems have to do with design choices that I suspect Rich H. et al. would not want to change, and I don’t think it would be reasonable of me to ask Sean Luke, the main author of MASON, to modify it just for Clojure. MASON is well designed from the point of view of Java traditions, and it’s complex. The intersection of the set of MASON users and Clojure users is pretty small.

There is or was some interest among MASON people in figuring out how to use MASON with other languages, but other than what I’ve done, the experiments using MASON in other languages were all imperative in nature. There was a Scheme example like that.

I’m amazed at how widespread functional programming interest has become in the last decade or two, but my sense is that it’s still a fraction of all programmers.

I’d love it if others would use MASON with Clojure, maybe using masonclj, but I realize that it’s a tough sell. Assuming that my macros are enough for a person’s needs, and it’s never necessary to look under the hood of the macros, where you have to understand things like gen-class, you still have to develop enough understanding of Java that you can understand MASON, and then use proxy and/or reify to set up the classes needed to configure the GUI for your app. (I thought about trying to make that easier from Clojure, but MASON gives you a great deal of flexibility in configuring the GUI with Java subclassing and method overrides, so any macros I would write would only work for a small number of cases. I will want more than any macro I could write to try to make the GUI code easier.)

So to use MASON with Clojure, you end up having to be pretty comfortable with Clojure, Java, and basic Clojure interop. There are lots of people with those skills, sure, but then again, take the intersection of that set and the set of people interested in ABM (and who aren’t sufficiently satisfied with NetLogo), and it’s going to be small.

mars0i · April 23, 2021, 5:11am

This conversation is making me wonder whether it would be worthwhile to set up a site that would collect information, links, papers, etc. about functional ABM, DES, and related software, e.g. maybe some game engines. I’ve come across some experimental Haskell ABM projects, for example. This is just an idle thought at the moment. Not sure whether I’d want to commit to anything like that right now. (I shouldn’t be spending so much time posting on Clojureverse, as it is.)

joinr · April 23, 2021, 1:54pm

Yeah, we can move the now off-topic spam on ABM back over to zulip maybe. I’m definitely interested in the topic in general.
