"Just use maps" in Java?

seancorfield · February 15, 2021, 6:38pm

As someone who started with FP at university (in the early '80s) but then learned OOP (to be employable in the early '90s), this definitely resonates.

So much of the early push for OOP was around reuse but it never fulfilled that promise – except where folks could build more general abstractions: the concrete classes never provided reuse and early C++/Java didn’t have the language features to support generic abstractions properly.

Part of why design patterns looked so promising in those early days was the lure of reusable abstractions. In the end, we needed generic type support – templates in C++ and “generics” in Java – to be able to provide true reusability, something the FP community had known about for years by that point.

The more that code can “ignore” the specificity of its parameter types, the more reusable that code is likely to be. It will be interesting to see how much of an impact Java’s record type has on the mainstream style(s) of Java programming (it’s still rooted in concrete types but it should encourage more “plain ol’ data” usage and more decoupling of behavior and data).

didibus · February 15, 2021, 10:07pm

I’m referring to the constraints put in place by the Java style of Object Oriented Programming. Think of Essential Constraints, which are what you’re referring to, and Accidental Constraints, which are what I’m referring to.

For example, in Java every Object needs a custom implementation of toString, because by default it simply returns the object’s class name plus its hash code.

So if you model some domain information with an Object, and try to print it, you don’t get the modeled information back, but the accidental details of the Object.

Now, to have a default toString that could return a view of the modeled information, well you need to manually override it. That’s because there’s not a generic toString function that takes some data for it to print. Each modeled entity comes with its own toString which can only access the data on the object.

Even inheritance doesn’t help you:

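class Foo {
  @Override
  public String toString() {
    // This won't compile: Foo knows nothing about the fields of its subclasses
    // return this.studentName;

    // This won't either: an object's fields can't be iterated (without reflection)
    // String allInfo = "";
    // for (field : this) {
    //   allInfo = allInfo + ", field:" + field.name + ", value:" + field.value;
    // }

    // So this is what Java's default Object.toString() does
    return getClass().getName() + "@" + Integer.toHexString(hashCode());
  }
}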

class Student extends Foo {
  private String studentName;

  // And this is what you need to do over and over for each of your modeled domain entities
  @Override
  public String toString() {
    return "studentName: " + this.studentName;
  }
}

class Teacher extends Foo {
  private String teacherName;

  // And this is what you need to do over and over for each of your modeled domain entities
  @Override
  public String toString() {
    return "teacherName: " + this.teacherName;
  }
}

The reason you can’t do it is because only methods of the Object Student can refer to the information inside it. And Objects do not expose their information in a generic way (unless you use reflection, but that’s a different style of programming than OO).

This is solved by modeling your data as a Map and toString as a separate function:

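var student = Map.of("studentName", "John");
var teacher = Map.of("teacherName", "Mark");

// One function for every map-shaped entity; nothing to override per type
static String toString(Map<?, ?> m) {
  StringBuilder info = new StringBuilder();
  for (Map.Entry<?, ?> entry : m.entrySet()) {
    info.append(entry.getKey()).append(", ").append(entry.getValue()).append(", ");
  }
  return info.toString();
}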
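A runnable variant of the idea above, as a minimal sketch rather than anything from the original post: the class name MapDescribeDemo and the method name describe are made up for illustration, and a LinkedHashMap is assumed so that keys come out in insertion order (Map.of makes no ordering promise).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical demo class: one generic function renders any map-shaped entity.
class MapDescribeDemo {

    // Works for every entity modeled as a map; nothing is overridden per type.
    static String describe(Map<String, ?> entity) {
        StringJoiner joined = new StringJoiner(", ");
        for (Map.Entry<String, ?> entry : entity.entrySet()) {
            joined.add(entry.getKey() + ": " + entry.getValue());
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> student = new LinkedHashMap<>();
        student.put("studentName", "John");

        Map<String, Object> teacher = new LinkedHashMap<>();
        teacher.put("teacherName", "Mark");

        System.out.println(describe(student)); // studentName: John
        System.out.println(describe(teacher)); // teacherName: Mark
    }
}
```

The same describe call serves both entities, which is the reuse the per-class toString overrides could not give.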

Another example is what I had in my previous post, take a Player:

class Player {

  private int x;
  private int y;
  private final List<Item> inventory = new ArrayList<>();
  private String name;

  public Player(String name, int x, int y) {
    this.name = name;
    this.x = x;
    this.y = y;
  }

  public void moveBy(int byX, int byY) {
    this.x += byX;
    this.y += byY;
  }

  public void addToInventory(Item item) {
    this.inventory.add(item);
  }

  public int getX() { return this.x; }
  public int getY() { return this.y; }
  public String getName() { return this.name; }
}

Now say that after the game is released we add a Tank class as well. It isn’t a Player: it doesn’t have a name or an inventory of items, so it doesn’t make sense for it to derive from Player. But it does have x,y int coordinates, and we would like to reuse the moveBy method from Player. Well, there is an accidental constraint from the fact that we used OO to model this operation as coupled to the Player itself, so we can’t just reuse moveBy here for Tank; we have to define a new moveBy method on Tank as well. Whereas with a separate function, and without encapsulating the fields, this wouldn’t be an issue, because we’d be able to make it generic over the data and do:

// Not exactly 1:1 Java syntax just representative of the idea
public static void moveBy<T>(T m, int byX, int byY) : where T has int x, int y {
    m.x += byX;
    m.y += byY;
}

This won’t actually work in Java, but in some languages (like Clojure) it does, and there are other languages, even typed ones, where it can work too: T can be constrained to anything with fields x and y, and thus the above can be type-safe and compiled.

In Java, you would be forced to refactor your data model. Probably you’d need to extract x,y into an abstract Coordinate class with the moveBy method on it, and then change the hierarchies of Player and Tank to extend from it, etc. If you can’t shenanigan the hierarchy like that, then maybe an interface is needed, such as Movable with setX and setY methods as well as moveBy. You’d have Tank and Player implement the interface, then extract the moveBy method into a static function (or maybe a default implementation on the interface), etc.
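To make the interface route concrete, here is a rough sketch of what that refactor might look like. The wrapper class MovableDemo and the getX/getY accessors on the interface are assumptions added for illustration (moveBy needs to read the coordinates as well as set them); Player and Tank are trimmed down to their coordinates.

```java
// Hypothetical wrapper so the sketch compiles as a single file.
class MovableDemo {

    // The interface exposes just enough for moveBy to be shared.
    interface Movable {
        int getX();
        int getY();
        void setX(int x);
        void setY(int y);

        // Default method (Java 8+): one implementation reused by every implementer.
        default void moveBy(int byX, int byY) {
            setX(getX() + byX);
            setY(getY() + byY);
        }
    }

    static class Player implements Movable {
        private final String name;
        private int x;
        private int y;

        Player(String name, int x, int y) { this.name = name; this.x = x; this.y = y; }

        public String getName() { return name; }
        public int getX() { return x; }
        public int getY() { return y; }
        public void setX(int x) { this.x = x; }
        public void setY(int y) { this.y = y; }
    }

    static class Tank implements Movable {
        private int x;
        private int y;

        Tank(int x, int y) { this.x = x; this.y = y; }

        public int getX() { return x; }
        public int getY() { return y; }
        public void setX(int x) { this.x = x; }
        public void setY(int y) { this.y = y; }
    }

    public static void main(String[] args) {
        Player player = new Player("John", 0, 0);
        Tank tank = new Tank(5, 5);
        player.moveBy(1, 2);
        tank.moveBy(-1, 0);
        System.out.println(player.getX() + "," + player.getY()); // 1,2
        System.out.println(tank.getX() + "," + tank.getY());     // 4,5
    }
}
```

Note what it cost: both classes had to change, and the coordinates that were private are now writable by anyone holding a Movable.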

mars0i · February 15, 2021, 10:28pm

Nice post, @ericnormand! Clarifies a lot, some of which I had thought about less clearly, and some of which I had never thought about.

A kind of peripheral technical, or rather philosophical, point is that even if you did want to model the world, as in a simulation, what makes things work in the world are not only monadic properties (x is a mother), but also relations (x is mother of y, x is sitting between y and z). Java’s OOP (unlike, say, CLOS) distorts things by requiring relations to be treated as if they were properties, for no reason that has anything to do with modeling the world. (The most absurd variant imo is equals as a method of a number class.)

(If anyone wants to see philosophical roots of the idea that relations are crucial, see the critique in Bertrand Russell’s The Problems of Philosophy of some of his contemporary philosophical metaphysicians.)

Richard_Heller · February 15, 2021, 11:29pm

That’s by design. Many times, especially when debugging complex algorithms, you’re more interested in which object in memory is being referenced and don’t really care about the contents. At some point, languages that don’t do that annoy me when you think two objects are the same because the contents are identical but they’re actually different objects in memory. So, it’s a design decision done on purpose. I’m actually glad they did it that way.

Right, because you designed it poorly to begin with. You’d also have to refactor your Clojure code if Tank called its location x-pos and y-pos instead of x and y like Player did. You can screw things up in both cases. Setting up a poor design strawman doesn’t work.

Java is one of those languages, if you design things correctly. What you just described is an interface. There are several ways of dealing with that interface to not need to duplicate code.

It’s two different ways of approaching things and, like I’ve said dozens of times on here, in the end they both do the same thing. Which one you’d rather use is a personal choice. One of the key reasons Java took off is because of the extensive toolkit of data structures it offers. The Java community rejected the idea of turning everything into a nail so that your hammer works. Clojure thinks that way of doing it is awesome. Neither one is more right than the other.

Yehonathan_Sharvit · February 16, 2021, 4:23am

With Java records, one adheres to Principle #1 and #3 but breaks Principle #2.

didibus · February 16, 2021, 7:27pm

This is one of the biggest distinctions: Java-style OO doesn’t have one way to model domains and another to model abstractions. This is one of the cons of OO in my opinion. When you’re implementing language abstractions, like algos or data-structures, that’s a nice default behavior, but when domain modeling it isn’t. The ability to define datatypes with custom semantics, and even to let them encapsulate data and perform protected mutations, is valuable for those use cases. Java is great there, and Clojure only slightly improved on those features, since Java does a pretty good job at that already. The problem comes when you try to build domain applications and need to model your domain information and its operations. Those same tools are not as useful for this: there’s never a good reason to want equality to be based on the object’s memory location, for example, and all the issues @ericnormand pointed out come into play as well.

It seems every Java code base I work on “just happens” to be designed poorly, then.

But more seriously, I actually will disagree with you here. When there was no use case for a Tank, it was quite a nice, simple design for the model. If someone had gone overboard with abstractions, you’d have had the other issue Java code bases often face: an overabundance of abstractions that actually muddies the water over the useful domain, due to Speculative Generality.

And really this is a recurring pattern in Java code bases in my professional experience. Either the model isn’t abstract enough, so it cannot adapt to future requirements without major refactors, or there are too many unnecessary abstractions put in place to allow adapting to speculative future requirements. Those abstractions often don’t quite hit the mark, because the requirement ends up different than what was anticipated, or the requirement never comes and now the abstraction is just lingering added complexity serving no purpose.

If you know of a way to design Player initially which doesn’t suffer from those issues please let me know how you would design Player.

I don’t have to change the structure of my domain model when using a data-oriented model the way I have to when using OO. This is all I need to do:

(defn moveBy
  [m byX byY]
  ;; Use :x/:y when present, otherwise fall back to :x-pos/:y-pos
  (if (and (contains? m :x) (contains? m :y))
    (-> m
        (update :x + byX)
        (update :y + byY))
    (-> m
        (update :x-pos + byX)
        (update :y-pos + byY))))

And now I can reuse moveBy for both Player and Tank.

Whereas, again, in Java you’d have to do one of:

Which are all much more intrusive changes to your data model.

Did you read all of what I wrote? I already discussed the interface case in Java. I do believe it is the best way to design this in Java, but it is an intrusive refactor to do so after the fact, and you cannot always easily predict beforehand what will need to be an interface and what won’t, or how granular to make them.

And since we’re on the topic of interfaces, I’m curious where everyone would put them. Are interfaces OO? Or are they data-oriented? Does an interface “separate code and data”?

It could be that a data-oriented inspired Java code base is one which defines all methods behind an interface?

In my Player/Tank example, an interface isn’t enough, you also need Java’s newer feature of having a default method in them so you can share an implementation between all types that implement the interface. This is another thing to make clear, Java has evolved more recently towards non-OO approaches, like was pointed out they are adding “records”, and they added interfaces a while back, and then default interface methods, they added streams, etc. At some point we need to establish if we want to discuss latest Java vs latest Clojure ? Or do we discuss OO vs DO which maybe requires establishing similar definitions we did for DO for OO as well.

I feel the beginning of my reply indicates the opposite: Java seems to have a single hammer, the Object, and wants to shoehorn everything into it, nailing all use cases with Objects. As another example, how annoying are Runnable and Callable? Having to wrap a function inside an Object just because Java tries really hard for everything to be an Object. It’d be nice if it just had functions as well, for when you have a screw instead of a nail.
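A small sketch of the Runnable complaint, with a made-up class name (RunnableDemo). The first form wraps the behavior in an anonymous class; the second uses a lambda (Java 8+), which trims the ceremony but still evaluates to an instance of the Runnable interface rather than a free-standing function.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: the same behavior wrapped in an object two ways.
class RunnableDemo {

    static List<String> demo() {
        List<String> log = new ArrayList<>();

        // Pre-Java 8 style: an anonymous class just to carry one method.
        Runnable verbose = new Runnable() {
            @Override
            public void run() {
                log.add("anonymous class");
            }
        };

        // Java 8+ lambda: shorter, but its type is still the Runnable interface.
        Runnable concise = () -> log.add("lambda");

        verbose.run();
        concise.run();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [anonymous class, lambda]
    }
}
```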

Ya, both offer Turing completeness, and you can use either one to build any kind of application. How familiar and effective the programmer is at one or the other matters as well. That said, if you are the engineer in charge of the design of the application code base, you do need to choose how to model your domain information and its operations. It is thus worth understanding the various options and their pros/cons. Maybe this is a case of evenly matched trade-offs, and it’s not worth debating as extensively as we all are, I’ll grant you that. But I think the exercise of discussing pros/cons actually deepens your understanding of the design space, and no matter which approach you choose you’ll now do a better job, as you’ll probably understand it better.

Richard_Heller · February 17, 2021, 11:00am

Obviously, I disagree with you here, but that’s fine. Doesn’t matter. I feel we’ve gone pretty far into the weeds and should bring it back to the original question, which is the constraint tying the function to its data. You haven’t removed that constraint, it’s still there. Saying it’ll work with anything that has an x and a y isn’t true. If something has an x that’s a string and a y that’s an array, it won’t work. The data passed in is still tied to what the function requires, however you want to word that. It needs to be of the function’s parameter type, or be a subtype of it, whatever. What you’ve removed are any guarantees around the constraint, and you’ve turned it into a brittle, loose coupling.

What you’re proposing, i.e. everything’s a map with magic key names to duck type the data, is the core of how JavaScript works. Trying to write robust, error resilient, reusable code with that paradigm is much more problematic.

didibus · February 17, 2021, 6:30pm

That’s not what I’m saying, which is why I keep insisting I think you misinterpret me. What you say is true, but it’s not related to what I’m saying.

The constraint is not about the logical compatibility of your data and your operations, such as “x” representing a position in a world and not the current health. The constraint is about the paradigm forcing the fields of an Object to only be accessible to the methods of the same Object.

This constraint is core to OO, without it you just have traditional procedural programming with procedures and structs.

The core of OO is that you have these things called Objects, they contain data, and they can receive messages asking them to make modifications to the data they contain. Thus all operations over the data must be defined as message handlers in the Object which contains the data; those are called methods. Method calls are the mechanism for sending the Object a message, and this explains the Object.method() message sending notation.

Now this means that those message handlers are not as easy to reuse, because they’re tightly coupled to their Object. Inheritance is the way to reuse them, by defining some Object superset of others, they can inherit message handlers of their parents. Prior to default interface methods this was the only way to reuse them. Which means there was no way to have the same method work on more than one Object except for making the objects parent/child.

class Player {

  private int x;
  private int y;

  [...]

  public void move(int x, int y) {
    this.x = x;
    this.y = y;
  }
}

class Tank {

  private int x;
  private int y;

  [...]
}

Look at this example, now clearly the move method works for the Tank class as well, so how can I reuse it? I can have Tank extend Player (inheritance), but that is illogical to my domain, a Tank is not a Player. So what can I do? Now I’ll let you answer that, but I’d bet it requires changing the way the data model and their operations were designed with a big refactor.

On the other hand, had we not had such a constraint, and we allowed functions to operate over data passed to them as an argument such as in the data-oriented style:

(defrecord Player [x y])
(defrecord Tank [x y])

(defn move
  [m x y]
  (-> m
      (assoc :x x)
      (assoc :y y)))

We can simply reuse move as is.

P.S.: This is orthogonal to type checks. If this were a statically typed language, you could declare that the m argument to move is of type Tank OR Player, and the whole thing would be type safe, yet you could still reuse it, because you removed the OO constraint and separated code from data. Similarly, in Clojure you can add dynamic type checks, say by speccing move to validate that m is a Tank or Player record at run-time.
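For Java readers, a rough analogue of that last point, with maps as the data and a run-time check in front of the function. Everything here is an assumption made for illustration: the class MapMoveDemo, the helper requirePosition, and the choice to validate the shape of the data (Integer values under "x" and "y") rather than a record type, which is a looser check than the spec described above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class: a free-standing move over map-shaped data.
class MapMoveDemo {

    // Run-time check standing in for a spec: "x" and "y" must hold Integers.
    static void requirePosition(Map<String, Object> m) {
        if (!(m.get("x") instanceof Integer) || !(m.get("y") instanceof Integer)) {
            throw new IllegalArgumentException(
                    "expected Integer values under keys x and y, got: " + m);
        }
    }

    // Returns an updated read-only copy instead of mutating the argument.
    static Map<String, Object> move(Map<String, Object> m, int x, int y) {
        requirePosition(m);
        Map<String, Object> moved = new HashMap<>(m);
        moved.put("x", x);
        moved.put("y", y);
        return Collections.unmodifiableMap(moved);
    }

    public static void main(String[] args) {
        Map<String, Object> tank = new HashMap<>();
        tank.put("x", 1);
        tank.put("y", 2);

        Map<String, Object> movedTank = move(tank, 10, 20);
        System.out.println(movedTank.get("x") + "," + movedTank.get("y")); // 10,20
        System.out.println(tank.get("x") + "," + tank.get("y"));           // 1,2
    }
}
```

A map whose x is a string is rejected with an IllegalArgumentException, so the reuse does not have to come at the price of silently accepting the wrong data.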

Yehonathan_Sharvit · February 18, 2021, 4:33am

Do you mean that, in a sense, traditional procedural programming with procedures and structs is more flexible than programming with objects?
Could you elaborate about that?

Phill · February 18, 2021, 11:19am

In the context of the present discussion, we must put our foot down here. Costs are relative. The “overhead” of immutability is the price I happily pay for my ticket out of the tar pit!

Ironically, when data are not immutable, “copying the data” quickly gets complicated. How do you copy data that has pointers to itself and other stuff? How do you decide when to bother making a copy? How deeply must you copy the stuff?

A program gets littered with a bureaucracy of “safe copies” whose individual usefulness is hard to assess, except for a bitter suspicion that it needs a few more where it does not have them…

Taking the taxing decisions of “must I copy?” and “how shall I copy?” at the end of every subroutine bogs the good programmer down. And it does not bog down the sloppy programmer at all!..

Phill · February 18, 2021, 5:03pm

Quoting Alan Perlis: “It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.” Perlisisms - "Epigrams in Programming" by Alan J. Perlis

Yehonathan_Sharvit · February 18, 2021, 10:29pm

I know this quote of Alan Perlis but I don’t see how it applies to procedures and structs. To me, it applies when we adhere to DOP Principle #2

Structs are not generic data structures: you have to define a struct for a player, a struct for a tank, a struct for a position etc…

didibus · February 18, 2021, 11:10pm

No, but it gets complicated, and I’m not as familiar with the procedural style, so if anyone wants to correct me go ahead.

A procedure has inputs and a return value, can be called arbitrarily to run, and is allowed to do side-effects. It differs from an impure function in that it isn’t first-class: you can’t create them at run-time, pass them around as arguments to other procedures, have them close over environment variables, etc., whereas you can with an impure function.

As opposed to methods, procedures are not tied to any particular set of data. They can access any data in scope, as well as their inputs. Other than that, methods and procedures are the same.

In procedural programming, you can structure your data in heterogeneous containers called structs or in arrays of homogeneous data.

So a struct is not very flexible, it is static in its structure, you can’t combine two of them together, or do any kind of set operations on them, etc.

Where it gets complicated is that normally procedural languages are missing a lot of other things you’d need to make structs flexible in their use. C is kind of the biggest procedural language, and it doesn’t support union types, so you actually can’t do what you can in Clojure in C, because the procedure is itself very static.

struct Player {
  int health;
  int x;
  int y;
};

struct Tank {
  int x;
  int y;
};

void movePlayer(struct Player *player, int x, int y) {
   player->x = x;
   player->y = y;
}

Now the problem is that the movePlayer function needs to know the type of struct in order to know the memory layout for it to change the value of x and y. The two structs have different memory layouts, so you can’t do the same thing you do for one on the other.

That’s because a struct is very very simple and static like I said, it’s like a fixed size array where the struct definition defines the start index and end index of each field.

But, I don’t know if this is a restriction of the procedural paradigm? Or if it’s more that no procedural language with more advanced features has been designed. What you would need is a generic procedure, something like:

void movePlayer<T>(struct T *m, int x, int y) {
   m->x = x;
   m->y = y;
}

So I could imagine a procedural language that supports that, at which point, ya in this case it would be more flexible than OO.

That said, there are many other issues with procedural languages, and some parts are less flexible than OO, so even with such generic procedures, I think I’d say OO is more flexible overall. The big one is that mutation is very risky if unconstrained, which is where OO’s encapsulation of mutable data behind methods is really worth its salt.

seancorfield · February 18, 2021, 11:32pm

Not sure what you mean here because C definitely does have a union construct that allows you to declare overlapping memory layouts.

In fact, if you declared your Player struct so the position was first, you could have a union that overlapped Player and Tank and could access x or y from either type using that union.

struct Player {
  int x;
  int y;
  int health;
};

struct Tank {
  int tx;
  int ty;
};

union Movable {
  struct Player p;
  struct Tank t;
};

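void moveThing( void* thing, int x, int y ) {
  /* Player and Tank both start with two ints, so writing through
     either union member lands on the same bytes */
  ((union Movable*)thing)->p.x = x;
  ((union Movable*)thing)->p.y = y;
}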

And it doesn’t even matter that the fields have different names between Player and Tank at this point: all that matters is the memory layout matches for those first two int fields.

I’m not claiming this is portable or even particularly safe, and I definitely wouldn’t claim it is “good practice” but this sort of overlapping of memory structures is often very necessary when you are doing low-level systems programming in C.

didibus · February 18, 2021, 11:38pm

Ah great, my C is amateur at best. That’s exactly the kind of correction I was hoping to get.

mvarela · February 19, 2021, 7:35am

I think it depends a bit on whose terminology you’re using. IIRC, in Pascal, for example (I guess I’m showing my age here), procedures do not have return values, but rather can return values via reference arguments. I guess in C the equivalent would be something returning void. As opposed to functions, which actually have a return type. Note that this is orthogonal to purity and referential transparency.

Yehonathan_Sharvit · February 23, 2021, 7:50am

New food for thought: a Java library named Paguro that embraces DOP.

Paguro simplifies a lot how to write code that manipulates data in Java.

Please share your thoughts about “Just use maps” with Paguro.

For instance, the following piece of Clojure code that turns a nested collection inside out to produce a map

(def emails
  [["Fred" ["fred@gmail.com", "fred@hello.com"]]
   ["Jane" ["jane@gmail.com", "jane@hello.com"]]])

(->> (mapcat (fn [[person emails]]
          (map (fn [email]
                 [email person])
                 emails))
        emails)
     (into {})) 

;; {"fred@gmail.com" "Fred",
;;  "fred@hello.com" "Fred",
;;  "jane@gmail.com" "Jane", 
;;  "jane@hello.com" "Jane"}

is written like this with Paguro

vec(tup("Fred", vec("fred@gmail.com", "fred@hello.com")),
    tup("Jane", vec("jane@gmail.com", "jane@hello.com")))
    .flatMap(person -> person._2()
             .map(email -> tup(email, person._1())))
    .toImMap(x -> x)

// PersistentHashMap(Tuple2("fred@hello.com","Fred"),Tuple2("jane@hello.com","Jane"),Tuple2("fred@gmail.com","Fred"),Tuple2("jane@gmail.com","Jane"))
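For comparison, here is roughly what the same inversion might look like with only the JDK’s java.util.stream API. This is a sketch, not taken from the Paguro docs; the class name EmailIndexDemo and the method name invert are made up, and the input is assumed to be a map from person to a list of emails.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class: person -> emails turned into email -> person.
class EmailIndexDemo {

    static Map<String, String> invert(Map<String, List<String>> emailsByPerson) {
        return emailsByPerson.entrySet().stream()
                // One (email, person) pair per email address.
                .flatMap(person -> person.getValue().stream()
                        .map(email -> new SimpleEntry<String, String>(email, person.getKey())))
                // toMap throws IllegalStateException if two people share an email.
                .collect(Collectors.toMap(pair -> pair.getKey(), pair -> pair.getValue()));
    }

    public static void main(String[] args) {
        Map<String, List<String>> emails = new LinkedHashMap<>();
        emails.put("Fred", Arrays.asList("fred@gmail.com", "fred@hello.com"));
        emails.put("Jane", Arrays.asList("jane@gmail.com", "jane@hello.com"));

        Map<String, String> byEmail = invert(emails);
        System.out.println(byEmail.get("fred@hello.com")); // Fred
        System.out.println(byEmail.get("jane@gmail.com")); // Jane
    }
}
```

The JDK has no tuple type, so the pair is carried in a SimpleEntry, and the result is an ordinary mutable map rather than a persistent one.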

Want more? Take a look at detailed examples in Paguro GitHub repo.

Or play with a Live example here.

mvarela · February 23, 2021, 8:27am

It does seem better than plain java, but it is so noisy compared to Clojure!

Yehonathan_Sharvit · February 23, 2021, 9:19am

No one doubts that Clojure is the best language for DOP.

The question is: is Java DOP (with Paguro) better than classic OOP Java?

mvarela · February 23, 2021, 11:26am

I suspect you’ll still end up fighting with Java’s OOP-ness in any case