"Just use maps" in Java?

Yehonathan_Sharvit · February 12, 2021, 11:30am

In my book about Data-Oriented programming, I am doing my best to illustrate, for non-Clojure developers, the Data-Oriented programming principles that we embrace so naturally in Clojure. As Rich Hickey framed it: “We just use maps!”

The three main principles of Data-Oriented programming:

Separate code from data
Model entities with generic data structures
Data is immutable

In theory, we could adhere to the three principles in Java by constraining ourselves to:

Code in classes with static methods only
Data in immutable collections
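To make the two constraints concrete, here is a minimal sketch in plain Java (10+ for `Map.copyOf`). It uses only JDK unmodifiable collections rather than a persistent-collection library, and the library entity and its keys are hypothetical. Unlike Clojure's collections, these are copied on every change instead of sharing structure.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical example: data in unmodifiable collections, behavior in static methods only.
final class LibraryOps {
    private LibraryOps() {} // no instances: the class only groups functions

    // Returns a new unmodifiable map; the input map is never changed.
    static Map<String, Object> rename(Map<String, Object> library, String newName) {
        Map<String, Object> copy = new HashMap<>(library);
        copy.put("name", newName);
        return Map.copyOf(copy);
    }

    static int bookCount(Map<String, Object> library) {
        // The cast is the price of heterogeneous values in a Map<String, Object>.
        return ((List<?>) library.get("books")).size();
    }

    public static void main(String[] args) {
        Map<String, Object> library = Map.of(
            "name", "Town Library",
            "books", List.of("Watchmen", "7 Habits"));
        Map<String, Object> renamed = rename(library, "City Library");
        System.out.println(library.get("name")); // Town Library
        System.out.println(renamed.get("name")); // City Library
        System.out.println(bookCount(renamed));  // 2
    }
}
```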

I would like to ask the Java experts among us whether the Data-oriented paradigm could be implemented practically in Java.

If you think that it can’t, please explain why.

If you think that it can, please explain how, by addressing at least the following points:

What Java library for persistent data structures do you recommend?
How do we parse a JSON string into a persistent data structure?
How do you communicate to the database with persistent data structures?

Phill · February 12, 2021, 5:47pm

“Just use maps” in Java is doable, but you’ve got casts everywhere or you bend over backwards to force everything into a String. Distortions notwithstanding, it can actually be a reasonable design (among the “least bad” designs) when the same map keys are most conveniently treated as if they were record members for some purposes, and for other purposes as expandos to iterate over and treat generically… because if there’s one thing that’s even more tedious in Java than “just use maps”, it’s reflection!
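A small sketch of those two uses of the same map, assuming a `Map<String, Object>` with hypothetical `name` and `age` keys. Record-style access needs a cast, while generic iteration over the entries needs no reflection at all.

```java
import java.util.Map;
import java.util.TreeMap;

final class JustUseMaps {
    // Record-style access: we know "age" holds an Integer, but the compiler doesn't, so we cast.
    static int age(Map<String, Object> person) {
        return (Integer) person.get("age");
    }

    // Expando-style access: every key is treated generically, no reflection needed.
    static String describe(Map<String, Object> entity) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Object> e : new TreeMap<>(entity).entrySet()) { // sorted for stable output
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> person = Map.of("name", "Ada", "age", 36);
        System.out.println(age(person) + 1);  // 37
        System.out.println(describe(person)); // age=36;name=Ada;
    }
}
```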

liborio7 · February 12, 2021, 6:34pm

I will try to give my opinion on your Data-Oriented programming principles for Java.

Separate code from data

In Java this is quite easy to achieve. You can build a POJO that defines your data structure and several helpers/services/managers that handle that data structure to implement some business logic. That being said, Java is Java and no one can stop you from putting business logic inside the POJO. However, you can use some libraries to try to minimize this risk (e.g. Lombok).

Model entities with generic data structures

The better answer for this is what @Phill has already said. You can use maps and reflection, but I would never suggest it, and I have never seen anyone actually follow this approach.

Data is immutable

You can define immutable data with the help of some libraries for POJOs (Lombok) and collections (Guava).

mars0i · February 12, 2021, 6:46pm

In Clojure, defrecord gives you objects with named fields. In fact, for interop purposes, defrecord defines a Java class. So defrecords are not really “generic”. Yet they also work like maps, so all of the map-oriented functions are available for working with defrecords, which under some circumstances will be converted back into maps. (Of course there are potential risks with that kind of conversion–but in practice, not many, in my limited experience.)

I’m not sure how that fits into the Data-Oriented paradigm, but I think it’s one of the useful, pleasurable aspects of Clojure programming. Java can’t possibly provide this kind of convenience.

jarirajari · February 12, 2021, 11:11pm

“whether the Data-oriented paradigm could be implemented practically in Java” TL;DR: Yes, but you shouldn’t. Here is why.

Let’s start by citing Data-Oriented Architecture: A Loosely-Coupled Real-Time SOA, section 2.2.1, “Data-Oriented Programming Principles”: DOP is based on the following principles, elucidated by Kuznetsov:

Expose the data and meta-data
Hide the code
Separate data and code, or data-handling and application-logic.
Generate data-handling code from interfaces

First, looking at this, I would say that DOP seems to focus more on data than on programming, if you will. Second, coming back to the questions: I have come to question the usefulness of the Object-Oriented Programming paradigm, though not fully. What I don’t like is the coupling of data and behavior (i.e. the logic), which combined creates state.

Separation can be done quite easily in Java, but the moment you add abstraction to your design, you start writing a lot of boilerplate code. For example, if you model, let’s say, a process, you often write a Java class that a) builds up state, b) is mutable, and c) mixes logic into the bundle. I solved this by writing a data class called ProcessData (i.e. a data entity; perhaps Records could be used for this in Java 15), and then another (wrapper) class called Process that wraps (only) behavior around any ProcessData instance.
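A rough sketch of that split, assuming Java 16+ records (they were still a preview feature in Java 14 and 15). The `name`/`status` components and the `start`/`finish` methods are hypothetical, not taken from the original code.

```java
final class ProcessExample {
    private ProcessExample() {}

    // Data only: an immutable record with generated value equality.
    record ProcessData(String name, String status) {}

    // Behavior only: wraps any ProcessData and returns new values instead of mutating.
    static final class Process {
        private final ProcessData data;

        Process(ProcessData data) { this.data = data; }

        ProcessData start()  { return new ProcessData(data.name(), "RUNNING"); }
        ProcessData finish() { return new ProcessData(data.name(), "DONE"); }
    }

    public static void main(String[] args) {
        ProcessData created = new ProcessData("import-job", "CREATED");
        ProcessData running = new Process(created).start();
        System.out.println(created.status()); // CREATED
        System.out.println(running.status()); // RUNNING
    }
}
```

Note that even this small version needs two types where Clojure would need a map and a couple of functions, which is the extra-code cost described next.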

I started rewriting my code this way but stopped and reverted my changes after a while. Not that it wasn’t working, but I realized that a) I was writing a lot of extra code just because the abstraction required it, b) I wasn’t focusing on delivering actual value, and c) writing that extra code caused a lot of repetition and extra classes. This is when I started to think that maybe I was actually fighting against the OO paradigm itself, and that this was the root cause of my problems.

IMHO, while Java can handle immutability, generic data structures, separation of code from data, etc., I wouldn’t recommend choosing Java for this type of programming. That said, if you want to try this out in Java, you definitely can; just don’t forget lambdas (the syntactic sugar) and the various mapper libraries. But you are definitely going to write a lot of extra code: extra classes, extra methods (because of method signatures), type casts, etc.

didibus · February 13, 2021, 12:00pm

If you think that it can’t, please explain why.

You’re fighting against the idioms, and that’s an uphill battle.
Because of the types, your maps can’t contain heterogeneous data unless you make everything Object, which gets rid of the static type benefits of Java in the first place.
The default equality semantics in Java don’t work well here. Plain objects are only equal to themselves unless you override equals and hashCode; the standard maps do compare by contents, but they are mutable, so that equality can change after the fact.
You still need mutation at some point. In Clojure we have the concept of managed mutation, which often wraps immutable data structures, like an atom with a map in it. You’d need something similar in Java as well, or you will just reintroduce potential thread unsafety at the boundaries where you have mutation.
Where are all the 100 functions that operate over those same data structures? Without those, manipulating and transforming the data won’t be as convenient, and that convenience is one of the benefits of putting your data in data structures: you can reuse the same large set of data manipulation functions no matter what “entity” you’re modeling.
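A quick check of Java’s default equality semantics shows where the friction actually lies: the `java.util.Map` contract does define content-based `equals`, but plain classes compare by identity unless you override `equals` and `hashCode`, and a mutable map’s equality can change after the fact. The `Point` class here is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class EqualityDemo {
    private EqualityDemo() {}

    // Hypothetical plain class that does not override equals/hashCode, so equality is identity.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) {
        Map<String, Integer> a = new HashMap<>(Map.of("x", 10, "y", 20));
        Map<String, Integer> b = new HashMap<>(Map.of("x", 10, "y", 20));
        System.out.println(a.equals(b));                                 // true: Map compares contents
        System.out.println(new Point(10, 20).equals(new Point(10, 20))); // false: identity by default

        // The catch: a mutable map's equality can change after the fact.
        b.put("x", 99);
        System.out.println(a.equals(b));                                 // false
    }
}
```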

These are why I think it would be challenging to a point where the friction would be too high. You can do it, but I don’t think that you should.

Like others have said, it would be better to compromise with:

Code in classes with static methods only
Data in immutable classes with only fields on them (with value equality semantics).
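A minimal sketch of that compromise, assuming Java 16+ records, which generate value-based `equals`, `hashCode`, and `toString`. The `Coordinates` record and `Geometry` helper are illustrative names.

```java
import java.util.Set;

final class CompromiseDemo {
    private CompromiseDemo() {}

    // Data: an immutable class with only fields, plus generated value equality and hashCode.
    record Coordinates(int x, int y) {}

    // Code: static methods only, operating on the data passed in.
    static final class Geometry {
        private Geometry() {}

        static Coordinates translate(Coordinates c, int dx, int dy) {
            return new Coordinates(c.x() + dx, c.y() + dy);
        }
    }

    public static void main(String[] args) {
        Coordinates c = new Coordinates(10, 20);
        System.out.println(c.equals(new Coordinates(10, 20)));               // true
        System.out.println(Set.of(c).contains(Geometry.translate(c, 0, 0))); // true: hashCode agrees
        System.out.println(Geometry.translate(c, 5, 5).x());                 // 15
    }
}
```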

mvarela · February 13, 2021, 4:21pm

I’d add a point 5.5: data literals. It’s amazing how helpful they are, and how painful basic stuff like populating a map or building a JSON object is in Java or C#.

Yehonathan_Sharvit · February 13, 2021, 4:57pm

What are, in your opinion, the main practical insights a Java developer would gain from learning Data-Oriented programming, besides wanting to move to Clojure?

didibus · February 14, 2021, 3:53am

Hum… I’m not too sure. I think starting to think more in terms of values, and possibly starting to design more value objects (immutable objects) with value equality.

I feel one part of data-oriented to me actually relates a lot to Domain Driven Design.

You model your domain as a mixture of Values, with value semantics, and Identities over those.

Like Coordinate is a value object with fields X and Y. You make its equals be equal to other Coordinates of the same X and Y value. Then you make Player an Identity which has an ID and it has a Coordinate value. This would be a typical DDD setup.

Ok, this post will be a bit all over the place, since I’m not really sure and exploring the idea. I feel Java maybe just can’t really be used in a data-oriented way, unless you’d build so many other constructs and then use those.

Recapping here:

A standard Java design:

class Player {

  private int x;
  private int y;

  public Player(int x, int y) {
    this.x = x;
    this.y = y;
  }

  // Mutates this player in place
  public void move(int x, int y) {
    this.x = x;
    this.y = y;
  }

  public int getX() { return this.x; }
  public int getY() { return this.y; }
}

Notice that, first of all, I had to write so much code to get what is basically just {:x x :y y} in Clojure, and even then I don’t actually have the same thing yet: it lacks value equality and proper hash codes, and it isn’t immutable.

Now in standard Java the Player identity isn’t modeled explicitly, right now it’s based on the instance of the player object you’d create:

Player playerOne = new Player(10, 20);

Now the identity is playerOne, but that’s just an alias to the real identity, which is actually the Object memory address.

Also, two players are equal if they are the same Object. And I can’t really extract the coordinates in any way, like there’s no structure for them, and in Java you can’t just create structures dynamically, so I’d need to explicitly create a Coordinate class for it.

Now in Clojure, you could do this, but you most likely wouldn’t:

(def player-one {:x 10 :y 20})

I mean, look, in one line of code I have all the previous code I wrote in Java.

But ok, in Clojure you’d model the identity explicitly instead:

(def players {:player-one {:x 10 :y 20}})

Which is interesting, because even your list of players is now runtime data. To get all players you just do (keys players). Depending on the use case, you could arrive at this in Java as well:

Map<String, Player> players = Map.of("playerOne", new Player(10, 20));

Don’t even remember if that’s a valid way to construct a map but oh well. It’s not as intuitive to get here from Java, and why is the player identity a string? And also, you now can’t mix more things in this map, so in Clojure you’d eventually get:

(def game-state (atom {:players {:player-one {:x 10 :y 20}}}))

And I mean wow, already we’ve accomplished so much domain modeling in a single line of code. There could be ways to do this in Java, but most likely, at best, you’d keep each of the keys in your game-state as top-level variables, with their identity being an implicit variable reference and no real key associated with them.
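For comparison, a rough Java analog of that `atom`, assuming the JDK’s `AtomicReference` as the managed-mutation cell and unmodifiable `Map.of`/`Map.copyOf` maps (Java 10+) as the values. It is only a sketch: there are no watchers or validators, and each update copies the map rather than sharing structure.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

final class GameState {
    private GameState() {}

    // Rough analog of (atom {...}): one mutable reference holding an immutable value.
    static final AtomicReference<Map<String, Map<String, Integer>>> PLAYERS =
        new AtomicReference<>(Map.of("playerOne", Map.of("x", 10, "y", 20)));

    // Rough analog of swap!: build a new immutable map from the current one.
    // updateAndGet may retry under contention, so the function must have no side effects.
    static void move(String id, int x, int y) {
        PLAYERS.updateAndGet(players -> {
            Map<String, Map<String, Integer>> next = new HashMap<>(players);
            next.put(id, Map.of("x", x, "y", y));
            return Map.copyOf(next);
        });
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> before = PLAYERS.get();
        move("playerOne", 30, 50);
        System.out.println(before.get("playerOne").get("x"));        // 10: old snapshot untouched
        System.out.println(PLAYERS.get().get("playerOne").get("x")); // 30
    }
}
```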

Ok, let me go back to Java. In DDD you’d make a change like so:

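class Coordinates {
  private final int x;
  private final int y;

  public Coordinates(int x, int y) {
    this.x = x;
    this.y = y;
  }

  public int getX() { return this.x; }
  public int getY() { return this.y; }

  // Value equality: two Coordinates are equal when x and y are equal
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof Coordinates)) return false;
    Coordinates other = (Coordinates) o;
    return this.x == other.x && this.y == other.y;
  }

  // Must agree with equals so Coordinates work as map keys and set members
  @Override
  public int hashCode() {
    return 31 * x + y;
  }
}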

class Player {

  private Coordinates coordinates;

  public Player(Coordinates c) {
    this.coordinates = c;
  }

  // The Player (identity) now points at a new immutable Coordinates value
  public void move(int x, int y) {
    this.coordinates = new Coordinates(x, y);
  }
}

Now we’ve introduced a value object, and thus the concept of value/identity is forming more clearly. The Player is the identity for a succession of immutable values: it represents changing values over time and can be referred to by name (identified).

In Clojure, we already had that, because everything was already just values to start with. The map {:x 10 :y 20} is already a Coordinates value object by itself, no special care needed, and the Player was a key/value pair of the identity over that map:

{:player-one {:x 10 :y 20}}

This is the same as the DDD version in Java.

So that’s already getting us a bit more data-oriented-ish, but we don’t meet all the requirements you’ve defined for data-oriented.

Now, in true DDD, we’d also be forced to be explicit about the identity, because DDD generally assumes you want a database, or some way to add/remove/update your domain’s identities.

So you’d change things like:

class Player {

  private final String id;
  private Coordinates coordinates;

  public Player(String id, Coordinates c) {
    this.id = id;
    this.coordinates = c;
  }

  public void move(int x, int y) {
    this.coordinates = new Coordinates(x, y);
  }

  public String getId() { return this.id; }
}

Now in Clojure your player is already like that:

{:player-one {:x 10 :y 20}}

That’s the same thing: you want the id? (first (keys player)), that’s it. But you can refine it a little if you prefer, and there are a few different ways depending on what’s most convenient, like:

{:id :player-one
 :coordinates {:x 10 :y 20}}

This is a simple change, and maybe you prefer explicit keys for the various pieces of information inside the Player; that’s all you need to do to get the same exact thing as the Java version.

Ok, one last thing: you could take the concept of identity/values to the extreme in Java, because in theory you still have a lot of mutation. If your Player had another value object, say an inventory, you could end up in an inconsistent state. Imagine code that moves the player to an item and picks it up, which should result in the player both being at that coordinate and having the item in the inventory. In Java, it would be easy to have a bug where the player doesn’t end up with the updated coordinates but does end up with the item, because nothing on the object protects the invariant between the separate mutations of inventory and coordinates. Even though each field holds an immutable value, the player data as a whole is not a value object yet. So here’s the full version you would really want:

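import java.util.ArrayList;
import java.util.List;

class PlayerData {

  private final Coordinates coordinates;
  private final List<Item> inventory;

  public PlayerData(Coordinates c, List<Item> inventory) {
    this.coordinates = c;
    this.inventory = List.copyOf(inventory); // defensive, unmodifiable copy
  }

  // Each "change" returns a new PlayerData instead of mutating this one
  public PlayerData move(int x, int y) {
    return new PlayerData(new Coordinates(x, y), this.inventory);
  }

  public PlayerData addItem(Item item) {
    List<Item> newInventory = new ArrayList<>(this.inventory);
    newInventory.add(item);
    return new PlayerData(this.coordinates, newInventory);
  }
}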

class Player {

  private final String id;
  private PlayerData playerData;

  public Player(String id, PlayerData playerData) {
    this.id = id;
    this.playerData = playerData;
  }

  // Swaps in a whole new value at once, so coordinates and inventory always change together
  public void update(PlayerData newPlayerData) {
    this.playerData = newPlayerData;
  }

  public String getId() { return this.id; }
  public PlayerData getPlayerData() { return this.playerData; }
}

And now you can use it like:

playerOne.update(playerOne.getPlayerData().move(30, 50).addItem(item));

Seems we’re slowly getting closer to data-oriented programming here. Let’s see in Clojure:

(def player-one
  {:id :player-one
   :coordinates {:x 10 :y 20}
   :inventory [:guitar]})

(-> player-one
    (assoc :coordinates {:x 30 :y 50})
    (update :inventory conj item))

Ya, so the Java and Clojure versions are both starting to look a lot alike. In Clojure we don’t need the PlayerData abstraction, because assoc and update on the player are already immutable and protect our invariants. Also, we can easily get the player data out if we want, without needing it to be explicitly defined:

;; This gives us the PlayerData
(dissoc player-one :id)

Alright, I feel we’re getting close. To make it fully data-oriented, I think we need to make it more functional and split data and methods into separate things. So finally I present data-oriented Java:

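import java.util.ArrayList;
import java.util.List;

class PlayerData {

  private final Coordinates coordinates;
  private final List<Item> inventory;

  public PlayerData(Coordinates c, List<Item> inventory) {
    this.coordinates = c;
    this.inventory = List.copyOf(inventory); // unmodifiable copy
  }

  public Coordinates getCoordinates() {
    return this.coordinates;
  }

  // Safe to share: the list cannot be modified
  public List<Item> getInventory() {
    return this.inventory;
  }

  // Static functions: the data comes in as an argument, new data comes out
  public static PlayerData move(PlayerData pd, int x, int y) {
    return new PlayerData(new Coordinates(x, y), pd.getInventory());
  }

  public static PlayerData addItem(PlayerData pd, Item item) {
    List<Item> newInventory = new ArrayList<>(pd.getInventory());
    newInventory.add(item);
    return new PlayerData(pd.getCoordinates(), newInventory);
  }
}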

class Player {

  private final String id;
  private final PlayerData playerData;

  public Player(String id, PlayerData playerData) {
    this.id = id;
    this.playerData = playerData;
  }

  public String getId() { return this.id; }
  public PlayerData getPlayerData() { return this.playerData; }

  // Returns a new Player rather than mutating this one
  public static Player update(Player p, PlayerData pd) {
    return new Player(p.getId(), pd);
  }
}

Separate code from data

Yes, our code is static. It happens to be grouped in the same “namespace” (i.e. the class, in Java), but we could move it somewhere else if we wanted, because it doesn’t belong to the object.

Model entities with generic data structures

Not quite, which is why we need to implement a move and an add-item ourselves and can’t just reuse existing data-structure functions to do so. But in Java this is probably better in my opinion. At least we’ve reduced our data to dumb classes that only have data, and a clear distinction between identity/value.

Data is immutable

Somewhat, we had to do a lot to protect data from being changed after it is set, such as cloning the list, and making the field private, and not having any setters.
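Much of that protective work disappears with Java 16+ records. Here is a sketch of the same design under that assumption; the `Item` record is hypothetical, since the snippets above never define it. Records supply the private final fields, accessors, and value equality, and a compact constructor makes the defensive copy.

```java
import java.util.ArrayList;
import java.util.List;

final class DataOrientedPlayer {
    private DataOrientedPlayer() {}

    // Hypothetical Item type: the snippets in the thread never define one.
    record Item(String name) {}

    record Coordinates(int x, int y) {}

    // The compact constructor stores an unmodifiable copy, so callers can't mutate the inventory later.
    record PlayerData(Coordinates coordinates, List<Item> inventory) {
        PlayerData {
            inventory = List.copyOf(inventory);
        }
    }

    // Identity (id) plus the current value.
    record Player(String id, PlayerData playerData) {}

    // Code kept apart from the data: static functions that return new values.
    static PlayerData move(PlayerData pd, int x, int y) {
        return new PlayerData(new Coordinates(x, y), pd.inventory());
    }

    static PlayerData addItem(PlayerData pd, Item item) {
        List<Item> next = new ArrayList<>(pd.inventory());
        next.add(item);
        return new PlayerData(pd.coordinates(), next);
    }

    static Player update(Player p, PlayerData pd) {
        return new Player(p.id(), pd);
    }

    public static void main(String[] args) {
        Player one = new Player("playerOne",
            new PlayerData(new Coordinates(10, 20), List.of(new Item("guitar"))));
        Player moved = update(one, addItem(move(one.playerData(), 30, 50), new Item("amp")));
        System.out.println(moved.playerData().coordinates().x());  // 30
        System.out.println(moved.playerData().inventory().size()); // 2
        System.out.println(one.playerData().inventory().size());   // 1: the original is untouched
    }
}
```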

Now, would this style in Java be a good idea? Honestly I don’t know; I’ve never tried it on a code base. I don’t know how important a style is relative to the ergonomics of a language. This style doesn’t feel as ergonomic in Java as in Clojure, so I’m not sure how it would fare against other styles used in Java.

P.S.: Wrote this all on my phone, so there might be some mistake in my code.

Richard_Heller · February 14, 2021, 4:34am

Could it be implemented? Yes. Have people done it before? Yes. Will it ever catch on? No.

Separate code from data. This is a fallacy that even Clojure programs don’t do. There’s never been code written that works no matter what data you hand it. Every program’s functions take specific data. That data may be “just maps”, but it can’t be any maps. It has to be maps of a specific form with specific data layout. Classes group those functions together into a logical unit and specify what the map that’s handed to them has to look like.
Model entities with generic data structures. If by generic you mean dynamic and can hold any data, that idea has been soundly rejected by the Java community. You lose type safety. There’s a reason Node projects tend to move to TypeScript and Python added a typing system. Especially for larger projects with many developers, knowing exactly what a function takes as arguments and exactly what it returns is very beneficial, as opposed to the “everything just takes maps, figure it out” approach that dynamic languages love so much. If it’s not beneficial, why was spec added? And why do so many people use it?
Data is immutable. This fad has fallen out of favor for the most part. There are times when it’s useful and times when it’s not. The times when it’s useful, copying the data tends to do the job well enough with less overhead. Immutable data is easy to achieve, just don’t change the data. Pure functions are really the beneficial part and they don’t require special data structures.

didibus · February 14, 2021, 5:22am

That’s not true, tons of functions are generic over maps: keys, vals, reduce, group-by, sort, count, assoc, dissoc, filter, =, get, select-keys, etc.

This is true in Java as well.
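As a sketch of the Java side, the JDK’s stream collectors already provide generic operations that work on any list of maps, regardless of which entity the maps describe. `groupBy` and `selectKeys` below are illustrative helpers named after the Clojure functions, not JDK methods.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GenericOps {
    private GenericOps() {}

    // Groups any list of maps by the value under `key`; rows without that key are skipped
    // (Collectors.groupingBy does not accept null keys).
    static Map<Object, List<Map<String, Object>>> groupBy(List<Map<String, Object>> rows, String key) {
        return rows.stream()
                .filter(row -> row.get(key) != null)
                .collect(Collectors.groupingBy(row -> row.get(key)));
    }

    // Keeps only the entries whose keys are listed, like Clojure's select-keys.
    static Map<String, Object> selectKeys(Map<String, Object> m, List<String> keys) {
        return m.entrySet().stream()
                .filter(e -> keys.contains(e.getKey()))
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> players = List.of(
            Map.of("id", "p1", "team", "red"),
            Map.of("id", "p2", "team", "blue"),
            Map.of("id", "p3", "team", "red"));
        System.out.println(groupBy(players, "team").get("red").size()); // 2
        System.out.println(selectKeys(players.get(0), List.of("id")));  // {id=p1}
    }
}
```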

I think you interpret the meaning wrong. What is meant by this is not that functions will always work on every piece of data imaginable, but that they can be made to work on more than one piece of data.

If the function is coupled with the data, you can’t do that, because well, you can’t independently call that function and give it some other data to work on, it will access the data inside itself as a side effect.

Java solves this limitation with interfaces, which effectively decouple code and data. The function can now be generic over multiple different data by providing a different implementation for each, thus the same function now works with more than one piece of data.

But this is also simply solved by just passing the data to the function and let it do whatever it wants based on the passed data, instead of only allowing the function to work only for the fields in the class it is defined.

The other thing that is meant by separating code and data (but maybe only in a Clojure context) is that it should be possible, as a consumer of the data, to write more functions over it. Java combines code and data to offer encapsulation: the data is protected from external modification, but that also means you can’t manipulate it freely from the outside, so you can’t freely extend the set of things you can do with it. There’s good reason for that in Java, since you often want to protect invariants over data that is mutable. But if the data is immutable and copied on change, this makes no sense, so why would you limit the code that can change the data to only the code defined internally alongside it?

Yehonathan_Sharvit · February 14, 2021, 8:16am

This is incredible @didibus: I can’t believe you typed all of this on your phone!

I think we can write in Java something similar to what we are used to in Clojure.
Of course, it’s much more verbose. But doable.

Here is my attempt to implement your example with the players, leveraging immutable collections from https://github.com/hrldcpr/pcollections:

Generation of data

PMap playerData = HashTreePMap.from(Map.of("playerOne",
                                         HashTreePMap.from(Map.of(
         "id", "playerOne",
         "coordinates", HashTreePMap.from(Map.of("x", 10, "y", 20))))));

Accessing data


class GameQuery {
    static PMap getCoordinates(PMap playerData, String id) {
      Object coordinates = ((PMap)playerData.get(id)).get("coordinates");
      return (PMap) coordinates;
    }
}

Example of query

GameQuery.getCoordinates(playerData, "playerOne");

Modifying data

class GameMutation {
    static PMap addPlayer(PMap playerData, String id) {
      return playerData.plus(id, HashTreePMap.from(Map.of("id", id)));
    }

    static PMap move(PMap playerData, String id, int x, int y) {
      PMap coordinates = HashTreePMap.from(Map.of("x", x,
                                                   "y", y));

      // Update the player's own map, then put it back into the game state,
      // so the caller gets the whole new state rather than a single player
      PMap nextPlayer = ((PMap) playerData.get(id)).plus("coordinates",
                                                         coordinates);
      return playerData.plus(id, nextPlayer);
    }
}

Example of mutation

GameMutation.move(playerData, "playerOne", 42, 42);

Now some questions arise:

Is it worth it to represent the whole system as an immutable hash map and write code like this in Java?
Is there a way to make the Java code less verbose?
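On the verbosity question, one option is a handful of generic helpers written once and reused for every entity. The sketch below uses JDK unmodifiable maps instead of PCollections, so each update copies the map (O(n)) rather than sharing structure; `assoc`, `dissoc`, `assocIn`, and `getIn` are hypothetical helpers named after their Clojure counterparts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Assoc {
    private Assoc() {}

    // Returns a new unmodifiable map with key set to value (Map.copyOf rejects null values).
    static Map<String, Object> assoc(Map<String, ?> m, String key, Object value) {
        Map<String, Object> copy = new HashMap<>(m);
        copy.put(key, value);
        return Map.copyOf(copy);
    }

    // Returns a new unmodifiable map without the given key.
    static Map<String, Object> dissoc(Map<String, ?> m, String key) {
        Map<String, Object> copy = new HashMap<>(m);
        copy.remove(key);
        return Map.copyOf(copy);
    }

    // Walks the path, creating empty maps where needed, and rebuilds each level on the way out.
    static Map<String, Object> assocIn(Map<String, ?> m, List<String> path, Object value) {
        String head = path.get(0);
        if (path.size() == 1) {
            return assoc(m, head, value);
        }
        Object child = m.get(head);
        Map<String, ?> childMap = child instanceof Map ? castMap(child) : Map.of();
        return assoc(m, head, assocIn(childMap, path.subList(1, path.size()), value));
    }

    // Follows the path; returns null as soon as a level is missing or is not a map.
    static Object getIn(Map<String, ?> m, List<String> path) {
        Object current = m;
        for (String key : path) {
            if (!(current instanceof Map)) return null;
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    @SuppressWarnings("unchecked")
    private static Map<String, ?> castMap(Object o) {
        return (Map<String, ?>) o;
    }

    public static void main(String[] args) {
        Map<String, Object> state = Map.of("playerOne",
            Map.of("id", "playerOne", "coordinates", Map.of("x", 10, "y", 20)));
        Map<String, Object> next = assocIn(state, List.of("playerOne", "coordinates"),
            Map.of("x", 42, "y", 42));
        System.out.println(getIn(state, List.of("playerOne", "coordinates", "x"))); // 10
        System.out.println(getIn(next, List.of("playerOne", "coordinates", "x")));  // 42
        System.out.println(getIn(next, List.of("playerOne", "id")));                // playerOne
    }
}
```

The casts don’t go away; they move into one place, which is about as much as Java’s type system allows for heterogeneous nested maps.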

Richard_Heller · February 14, 2021, 2:37pm

Aside from the fact that no program is written solely with primitives, those functions you listed only work with Clojure’s core data structures. If you implemented your own hashmaps and tried to pass your data structure into them, it wouldn’t work. They expect the data they’re given to implement a specific interface and have a specific form.

What you described here is polymorphism. There isn’t one function that takes different data, there are multiple functions and which one is called is determined at runtime. The individual functions only work on the data they’re expecting to receive. If the wrong function is somehow dispatched, it’ll break.

The function only works if the map it’s given has the fields that it’s looking for. Classes guarantee those fields are present. Clojure’s functions don’t.

Yes, you can. The primary mantra in the OOP world is “reusable abstractions.” Everything is meant to be used by the outside. What classes/interfaces give you is the contract between the two pieces of code.

The bottom line is, code and data are never separate. It’s impossible to separate them.

didibus · February 14, 2021, 9:14pm

I think you’re still misinterpreting the idea. You’re writing an application, you have to model some real-life user domain, there is information to model, you need to find a way to capture and structure the information of the domain. There are many approaches to this domain modeling. Here we’re saying the approach will be to model your domain data using standard Clojure data-structures. So almost exclusively using Clojure’s Persistent Collections.

Now we can discuss pros/cons of doing so, versus using some other approach, like classes and objects. One of the pros of this approach is you automatically get a bunch of functions for free, all the ones that are generic to the domain, but specific to the Clojure collections. So simply by having modeled my domain information inside a Clojure map, I can now perform a group-by on it, I can now compare two records from it for equality, I can now select a subset of some information in a record of it, I can now filter down records to some given predicate, I can dynamically add or remove information to it, etc.

No matter which way you lean on your favourite technique for modeling of domain information, discussing the pros/cons would be interesting. @Yehonathan_Sharvit is that something you’ve looked into?

No, polymorphism is a possible approach to what I’m describing, but it’s not what I’m describing. For example, inheritance is polymorphic as well, but it is similarly restricted to coupled data within the inheritance chain. An interface on the other hand can adapt to any data, there are no constraints. Similarly a function that takes data as input can also adapt to anything it receives, it can do so using multiple strategies, one of them could be using polymorphism of the type of arguments, or the number of arguments, but it can also do a case/switch, or it can be generic over some common data-structure, etc.

But the important point is that this flexibility is only possible if you decouple data from the operations over the data. When we say that operations over the data are not separate, we mean that the operations are constrained to operate only over the data they belong to. So when we say to separate code and data, we mean not constraining the operations to some specific data. That doesn’t mean the operation can magically “do the right thing” on all sorts of data, but that the constraint is gone. This can be argued to lead to systems that evolve more easily over time, since by removing the constraint, you’ve created more flexibility in what you can do.

That’s not necessarily true, but yes, you cannot give a function something it doesn’t know how to work with and expect logical results from it. But the function can easily do things like look for what it needs in several different places, and hopefully succeed. You can also have it return nil or throw an error when it is handed data that doesn’t make sense for the operation it’s meant to perform, so the function now works even on data it doesn’t understand: the “right thing” to do is to return an error or nil, prune the data, or just return it unchanged, etc. This means the function can actually be used on any arbitrary data.

You’ll need to back this up, OO style generally implies encapsulation, which means constraining the modification to the data the outside can do to only a small pre-existing set of methods.

didibus · February 14, 2021, 10:44pm

For #1, I don’t know. I’d need to try it on a sizable code base to really know. My gut feeling is that it isn’t worth it, for the reasons I mentioned last time: it just doesn’t fit the Java idioms, and nothing about the language prepares you for or helps you with this style. But it’s hard to say without trying it out. Using a lot of string keys to look things up, bypassing the compiler checks, scares me more in Java, because you have no tools to manage it: no REPL, no concise code, no destructuring, no spec, no docstrings; there’s less emphasis on tests, comments, and good naming; and it seems you could easily introduce mutation and unsafe data reads/writes if you’re not careful, etc.

For #2, there might be ways to build better fluent interfaces, but there’s not much available in Java to improve the ergonomics of the language itself. Annotations and annotation processors are one tool, and fluent-style interfaces are the other. Beyond that, some people implement source pre-processors like Lombok or do a lot of source code generation, but those are very difficult to build and maintain, and normally poorly integrated with Java’s development tools, so only the most popular ones, like Lombok, get support.

Richard_Heller · February 14, 2021, 11:25pm

There’s no way to remove that constraint. Let’s take everybody’s favorite domain to model, students. Say you have a function to find all the first grade students.

(defn list-first-graders [students] ... )

What is students? A list? A map? How do you know? What does each record look like? How do you determine which ones are first graders? What happens if the data passed in doesn’t fit the original model? Can it still determine which ones are first graders? What if a list of Beatles albums is given? Note that with classes, those last two don’t need to be checked for. Just sayin’.

In order for the function to determine what it’s supposed to, it has to be given data in a specific format. Yes, it should do error checking and reject anything that doesn’t fit. But it cannot be separated from the data.
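One way that error checking could look in Java for the `list-first-graders` example, as a sketch under an assumed shape: a student is a `Map<String, Object>` whose `grade` key holds an `Integer`. Records that don’t fit, including a Beatles album, are filtered out; throwing an `IllegalArgumentException` instead would be an equally valid design choice.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class Students {
    private Students() {}

    // Hypothetical shape: "grade" holds an Integer. Anything else (missing key,
    // wrong type, an album instead of a student) is skipped rather than guessed at.
    static List<Map<String, Object>> listFirstGraders(List<Map<String, Object>> students) {
        return students.stream()
                .filter(s -> Integer.valueOf(1).equals(s.get("grade")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> input = List.of(
            Map.of("name", "Ann", "grade", 1),
            Map.of("name", "Bo", "grade", 2),
            Map.of("title", "Abbey Road", "year", 1969));
        System.out.println(listFirstGraders(input).size()); // 1
    }
}
```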

What are Clojure maps? When the rubber meets the road, what are they? They’re encapsulated objects. Can you change their internals? Can you get at the buckets the entries are stored in? Or modify the hashing algorithm? Only if they’ve explicitly exposed it.

Let’s be honest, the primary reason that Clojure is a nicer language to work with than Common Lisp is because it uses encapsulated objects under the covers, which weren’t as much of a thing when CL was standardized. [Cue PSA voice] That’s what OOP can do for you.

ericnormand · February 14, 2021, 11:47pm

Yes, you can. And I think you might even find it more convenient than regular Java for a wide range of applications. The hard part, I think, is that it will take a lot of exploration and design work to make it practical. Clojure and its ecosystem have a ton of design work already.

My opinion is that you have to take your three points as general statements and free your mind from the specific ways Clojure implements them.

Let’s start with the second one: What is a “generic data structure”? Clojure uses maps, vectors, sets, keywords, etc. But it seems to me that the set of data structures that are optimal for use in Java would be different, considering the constraints of the type system and syntax.

One approach might be to start with classes implementing EDN, build something substantial in it, and work really hard to build in all the conveniences that make it nice to work in. One trick I remember from college (where we did a ton of Java) was to have different accessor methods for different types. You’d have a .getInt() and a .getString() and a .getObject(), etc. These turned static type checks into runtime checks without the need to cast everywhere. If you could add these to a Map class, that would make them more convenient to work with. I would call this “Clojure in Java”. I think this is possible but why not just use Clojure?
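A minimal sketch of that accessor trick, assuming a hypothetical `DataMap` wrapper around an unmodifiable `Map<String, Object>`. The typed getters replace a cast at every call site with a single runtime check that names the offending key.

```java
import java.util.Map;

// Hypothetical wrapper in the spirit of "Clojure in Java": typed getters instead of casts.
final class DataMap {
    private final Map<String, Object> entries;

    DataMap(Map<String, Object> entries) {
        this.entries = Map.copyOf(entries); // unmodifiable; rejects null keys and values
    }

    int getInt(String key) { return get(key, Integer.class); }

    String getString(String key) { return get(key, String.class); }

    Object getObject(String key) { return entries.get(key); }

    // One runtime type check, with an error message that names the key.
    private <T> T get(String key, Class<T> type) {
        Object value = entries.get(key);
        if (!type.isInstance(value)) {
            throw new IllegalStateException(
                key + " should be " + type.getSimpleName() + " but was " + value);
        }
        return type.cast(value);
    }

    public static void main(String[] args) {
        DataMap person = new DataMap(Map.of("name", "Ada", "age", 36));
        System.out.println(person.getInt("age") + 1); // 37
        System.out.println(person.getString("name")); // Ada
        try {
            person.getInt("name");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());       // name should be Integer but was Ada
        }
    }
}
```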

Another approach that I think would be worth exploring (but would probably be a ton of work) is to think hard about what small set of classes you could consider generic. Clojure uses maps for both entities and indexes. What if in Java you defined an Entity class? It has string keys and heterogeneous values. .equals() and .hashCode() are defined in the right way, and even a way to print and read it in could be defined. Indexes are much more like traditional hash maps with homogeneous keys and homogeneous values. You could make a class for them, too. The trick would be to find a set both ergonomic and powerful. You would then use Entities to implement your User and Document concepts.

I like this approach because it could actually take advantage of the type system instead of fighting it. Let’s call it “Generic data Java”.

Another idea is to think of “generic data structures” as a call to implement a powerful data model in memory. I would probably choose a relational model, or something like Datascript. Let’s call it “Relational Java”.
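As a very small taste of “Relational Java”, here is a sketch that treats a relation as an unmodifiable set of row maps with relational-algebra selection and projection. The class Rel and its methods are hypothetical; this is plain relational algebra, not Datascript’s API or query language.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helpers: a relation is an unmodifiable set of rows, each row an unmodifiable map.
final class Rel {
    private Rel() {}

    // Selection: keep only the rows that satisfy the predicate.
    static Set<Map<String, Object>> select(Set<Map<String, Object>> rel,
                                           Predicate<Map<String, Object>> p) {
        return rel.stream().filter(p).collect(Collectors.toUnmodifiableSet());
    }

    // Projection: keep only the named columns; duplicate rows collapse because it is a set.
    static Set<Map<String, Object>> project(Set<Map<String, Object>> rel, Set<String> cols) {
        return rel.stream()
                .map(row -> keepColumns(row, cols))
                .collect(Collectors.toUnmodifiableSet());
    }

    private static Map<String, Object> keepColumns(Map<String, Object> row, Set<String> cols) {
        Map<String, Object> out = new HashMap<>();
        for (String c : cols) {
            if (row.containsKey(c)) {
                out.put(c, row.get(c));
            }
        }
        return Map.copyOf(out);
    }
}
```

Because rows are generic maps, the same two operations work on any relation, whatever its columns.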

A final idea is to figure out a generic way of doing reflection that makes it powerful and general. Here’s an example. Suppose you defined an abstract class called DataRecord, and you made a regular POJO that extends it. On DataRecord, you have methods like .get(String) and .set(String, Object) that reflect on the concrete class to figure out its fields. You could also have something like Clojure’s keys and vals. The trick is that they work on a POJO! I would call this “Reflective POJOs”. This way, you’re able to write regular-looking Java, with new classes to represent entities, while still getting the “100 operations” to work on them.
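One possible shape for the DataRecord idea, sketched with java.lang.reflect. The methods get, keys, and vals and the example User class are assumptions for illustration; a set method is left out because the example fields are final. Note that getDeclaredFields() makes no ordering guarantee and only sees fields declared directly on the concrete class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class giving map-like, read-only access to any POJO that extends it.
abstract class DataRecord {

    // Like Clojure's (get m k), but reads a field of the concrete class by name.
    public Object get(String name) {
        try {
            Field f = getClass().getDeclaredField(name);
            f.setAccessible(true);
            return f.get(this);
        } catch (NoSuchFieldException | IllegalAccessException e) {
            throw new IllegalArgumentException("No readable field: " + name, e);
        }
    }

    // Like Clojure's keys: instance fields declared directly on the concrete class.
    public List<String> keys() {
        List<String> names = new ArrayList<>();
        for (Field f : getClass().getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers()) && !f.isSynthetic()) {
                names.add(f.getName());
            }
        }
        return names;
    }

    // Like Clojure's vals: values in the order this call's keys() returned them.
    public List<Object> vals() {
        List<Object> values = new ArrayList<>();
        for (String k : keys()) {
            values.add(get(k));
        }
        return values;
    }
}

// An ordinary POJO: no getters, yet get/keys/vals work on it.
class User extends DataRecord {
    private final String name;
    private final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}
```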

In all of these cases, I think the approaches should really stress-test the definition of Data-Oriented Programming. What does it mean for data structures to be generic? If these approaches don’t count as data-oriented, then I think the definition needs to state the principle in a way that clearly excludes them.

What does it mean to separate code from data?

The way I think of it, which may be different from the mainstream, is that you should separate domain-specific code from the data representation. Obviously, you’ll need some code that knows how to operate on the hash map object, but that code is generic, not domain-specific. Once you have it, you can program at the domain level, totally separated from the data structure. I think this is one of the hidden benefits of the entire paradigm: data modeling is quick and easy because you’re not stuck defining fields, constructors, getters and setters, equals, hashCode, etc. Clojure gives you all that for free.
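A minimal sketch of domain code kept apart from a generic representation, assuming users are plain unmodifiable Map&lt;String, Object&gt; values. The class UserLogic, its functions, and the keys "firstName", "lastName", and "age" are hypothetical. All the generic machinery (equality, hashing, printing, construction) comes from java.util.Map; the only code written is domain logic.

```java
import java.util.Map;

// Domain logic as static functions over a generic, unmodifiable Map.
// No fields, constructors, getters, equals, or hashCode were written for "user".
final class UserLogic {
    private UserLogic() {}

    static String fullName(Map<String, Object> user) {
        return user.get("firstName") + " " + user.get("lastName");
    }

    static boolean isAdult(Map<String, Object> user) {
        return (Integer) user.get("age") >= 18;
    }
}
```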

You see hints of this kind of generic data thinking in modern Java. I am quite impressed with the Java functional interfaces in java.util.function (Java Platform SE 8). They are totally domain-agnostic and hint that people on the Java team really get the idea of defining generic interfaces devoid of domain meaning. This is a far cry from what I remember of the Swing API, which defined different classes for each kind of event handler. Is ButtonDoubleClickHandler really so different from ButtonSingleClickHandler? Where’s the abstraction? Now you might be able to implement Consumer<SingleClickEvent> or some such.
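For instance, one generic interface can serve every kind of event handler. This sketch assumes two hypothetical event records, SingleClickEvent and DoubleClickEvent, plus a hypothetical ClickDemo class that records what ran; Consumer and its andThen method are the real java.util.function API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event types: only the data differs, not the handler interface.
record SingleClickEvent(int x, int y) {}
record DoubleClickEvent(int x, int y) {}

final class ClickDemo {
    private ClickDemo() {}

    static List<String> run() {
        List<String> log = new ArrayList<>();
        // One domain-agnostic interface serves both event kinds.
        Consumer<SingleClickEvent> onClick = e -> log.add("single at " + e.x() + "," + e.y());
        Consumer<DoubleClickEvent> onDoubleClick = e -> log.add("double at " + e.x() + "," + e.y());
        // Generic combinators come along for free, e.g. andThen to chain handlers.
        Consumer<SingleClickEvent> chained = onClick.andThen(e -> log.add("handled"));

        chained.accept(new SingleClickEvent(1, 2));
        onDoubleClick.accept(new DoubleClickEvent(3, 4));
        return log;
    }
}
```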

What does it mean to be immutable?

This is a big question. I’m not sure if strict, language-enforced immutability is required. That would rule out JavaScript. Immutability could just be the idea that you choose not to mutate, even if you could. That said, language-enforced immutability can really help, and Java has really good support for this. Just don’t implement any mutating methods on the generic data structures. Do copy-on-write only.
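Copy-on-write over the JDK’s unmodifiable maps can be sketched like this (Map.copyOf needs Java 10+). The helper names assoc and dissoc are borrowed from Clojure for illustration, and the class CopyOnWrite is hypothetical. Unlike Clojure’s persistent maps, there is no structural sharing: every update copies the whole map, which is fine for small entities but costs O(n) per change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write helpers over JDK unmodifiable maps.
// Each update copies the whole map (no structural sharing).
// Map.copyOf also rejects null keys and values.
final class CopyOnWrite {
    private CopyOnWrite() {}

    // Like Clojure's assoc: a new map with key bound to value.
    static <K, V> Map<K, V> assoc(Map<K, V> m, K key, V value) {
        Map<K, V> copy = new HashMap<>(m);
        copy.put(key, value);
        return Map.copyOf(copy);
    }

    // Like Clojure's dissoc: a new map without key.
    static <K, V> Map<K, V> dissoc(Map<K, V> m, K key) {
        Map<K, V> copy = new HashMap<>(m);
        copy.remove(key);
        return Map.copyOf(copy);
    }
}
```

Since the results are unmodifiable, any attempt to call put on them throws UnsupportedOperationException, so the only way to "change" data is to derive a new version.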

Okay, those are my ideas about how to implement it. I’ll post another one about the benefits.

ericnormand · February 14, 2021, 11:54pm

I forgot to answer these questions.

What Java library for persistent data structures do you recommend?

No idea. I’d probably skip it and use copy-on-write.

How do we parse a JSON string into a persistent data structure?

I’m not sure. I imagine the persistent-collection libraries have a way to construct their data structures from the java.util equivalents. You could just recurse down the parsed tree (visit it, in Java parlance) and construct persistent versions of each node.
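A sketch of that recursive walk, assuming a JSON parser has already produced a tree of java.util Maps, Lists, and scalars (many parsers can do this when asked for a plain Map). The class Freeze is hypothetical, and it swaps in JDK unmodifiable collections rather than a persistent-collection library; a library’s own constructors would replace the two wrapping calls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: recursively converts a mutable Map/List tree into unmodifiable collections.
final class Freeze {
    private Freeze() {}

    static Object freeze(Object node) {
        if (node instanceof Map<?, ?> map) {
            // Private copy, so nobody else holds a mutable reference.
            // LinkedHashMap keeps key order and allows JSON null values.
            Map<Object, Object> out = new LinkedHashMap<>();
            for (Map.Entry<?, ?> e : map.entrySet()) {
                out.put(e.getKey(), freeze(e.getValue()));
            }
            return Collections.unmodifiableMap(out);
        }
        if (node instanceof List<?> list) {
            List<Object> out = new ArrayList<>(list.size());
            for (Object item : list) {
                out.add(freeze(item));
            }
            return Collections.unmodifiableList(out);
        }
        return node; // strings, numbers, booleans, and null are already immutable
    }
}
```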

How do you communicate to the database with persistent data structures?

Something similar to clojure.java.jdbc (the old version). Just construct persistent maps from the java.sql objects.
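A minimal sketch of that approach using only java.sql: each row becomes an unmodifiable map from column label to value, in the spirit of clojure.java.jdbc’s row maps. The class Rows is hypothetical, and keys are plain strings rather than keywords. The caller is assumed to own and close the ResultSet, typically via try-with-resources around Statement.executeQuery.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: reads every remaining row of a ResultSet into unmodifiable maps.
// It does not close the ResultSet; the caller owns it.
final class Rows {
    private Rows() {}

    static List<Map<String, Object>> toMaps(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int n = md.getColumnCount();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            // LinkedHashMap keeps column order and allows SQL NULL values.
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 1; i <= n; i++) { // JDBC column indexes are 1-based
                row.put(md.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(Collections.unmodifiableMap(row));
        }
        return Collections.unmodifiableList(rows);
    }
}
```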

greinseth · February 15, 2021, 1:40pm

Records in Java will make the language much more ergonomic in terms of domain modelling (see “What’s New in Java 15” on Baeldung).

You get immutable data and value semantics with a compact declaration:

public record Person(String name, int age) {}

And I’m guessing that Java will eventually adopt C#’s deconstruction and with-expressions, so you could write updates like this:

var p1 = new Person("Adam", 42);
var p2 = p1 with { age = 43 };
var (name, age) = p2;

(source: “What's new in C# 9.0”, Microsoft Docs)
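Records became a standard feature in Java 16 (they were previews in Java 14 and 15), but Java has no with expression. A common workaround, sketched here, is a hand-written “wither” method; the name withAge is just a convention, not part of the records feature.

```java
// A Java 16+ record: final class, shallowly immutable components,
// with equals, hashCode, and toString generated from the components.
record Person(String name, int age) {

    // Hand-written "wither": instead of mutating, build a new record with one component changed.
    Person withAge(int newAge) {
        return new Person(name, newAge);
    }
}
```

The generated accessors name() and age() cover the read side; only the update needs the extra method.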

But what I find most interesting in Java’s future, which should benefit Clojure as well, is Project Valhalla, which will let us lay out arrays of records (as value objects) in contiguous memory, instead of falling back to primitive arrays when performance is needed. That’s a digression, though.

ericnormand · February 15, 2021, 3:04pm

I think this is a great question. @didibus has already done a great job of addressing it, but I’ll wax on about it a bit myself.

I think Object-Oriented Design (OOD) took a wrong turn somewhere. Instead of modeling the information system, its practitioners try to model the scenario. Let me give an example.

If an OOD practitioner were contracted by a supermarket to write software to manage their cash registers, they might create a Product class and a Shopper class. Where do you put the .buy() method? Do you put it on the Shopper, because the shopper is the subject of the verb in an English-language description? (“Eric buys a broccoli.”) Or do you put it on the Product, because the Shopper is calling the buy method on the Product? These are the kinds of questions that get asked. You’ll come up with something, and it can work.

But this totally misses the point. Instead of modeling the situation in the grocery store, your job is to model the information systems of the store. Those are the inventory, the sales receipts, the tax calculations, the accounting of money in each register. Somewhere along the way, OOD forgot they were building information systems and thought they were building something more like a simulation.

The primary benefit of DOP is that it makes it clear that you are processing information. You are not modeling the grocery store. You are fulfilling the data requirements of the grocery store. There is no need to map isA and hasA relationships from the real world into the computer. You need to keep track of how many apples you have, how fast they are selling, etc.
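To show what “processing information” can look like in code, here is a small sketch of the grocery example. The record SaleLine, the class Inventory, and the SKU strings are hypothetical; the point is that stock levels and sales figures are derived by plain functions from plain data, with no decision about where .buy() belongs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical data: a sale line is just data, with no behavior attached.
record SaleLine(String sku, int quantity) {}

final class Inventory {
    private Inventory() {}

    // Current stock + sales -> new stock. The inputs are not mutated.
    // An unknown SKU simply ends up with negative stock.
    static Map<String, Integer> applySales(Map<String, Integer> stock, List<SaleLine> sales) {
        Map<String, Integer> next = new HashMap<>(stock);
        for (SaleLine s : sales) {
            next.merge(s.sku(), -s.quantity(), Integer::sum);
        }
        return Map.copyOf(next);
    }

    // How fast things are selling: total units sold per SKU.
    static Map<String, Integer> unitsSold(List<SaleLine> sales) {
        Map<String, Integer> totals = new HashMap<>();
        for (SaleLine s : sales) {
            totals.merge(s.sku(), s.quantity(), Integer::sum);
        }
        return Map.copyOf(totals);
    }
}
```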

OOD made another wrong turn, which led them to focus on the wrong level of reuse. Back when OO was building its momentum in the industry, everyone sold it as the answer to the problem of reusability. Somehow, those classes you wrote would be reusable in a way a C program never could be.

I believe a high level of reuse is possible, but they did it wrong.

Java focused on reuse at the wrong level. They thought, “Every department in the company deals with people (Person class), so let’s have every department throw all of their methods into that one class.” They tried to make it reusable by piling on. And there is a little bit of reuse. The Accounting department might use 30% of the basic routines in the Person class. The HR department uses a different 30%, etc. But then they use the overlapping routines slightly differently, so they either fork them or build yet more code to work around the differences. The monstrosity that is the Person class could have been 9 different, much simpler, smaller classes, one for each department. But they wanted reuse!

Notice that this is a similar error to the first wrong turn. Every department deals with people, so let’s make a Person class to represent that. But your job as a programmer is not to build a single ontology that captures every aspect of the world. Your job is to make sure the right information gets from A to B. Each department has different data and processing requirements, so there’s no real overlap at the domain level.

It turns out that you can’t get real, good reuse by piling a bunch of specific use cases into the same class. You need to go down the layers to where there is a shared, general structure. Going down the stack usually means making something simpler and less specific to any particular situation. We see a ton of reuse in the Java standard library precisely because they didn’t have any specific use cases in mind. They just made general-purpose utilities.

When companies do recognize this, you get stuff like Google Guava. Highly reusable components. They are reusable because they contain no domain content. And their release as open-source libraries, useful to all, benefits Google because now there is a clear boundary between general-purpose and domain-specific constructs.

The trouble in OO languages is there is no clear way to differentiate layers. It’s classes all the way down. hasA relationships (object references) are sometimes peer relationships and sometimes subordinate relationships. But there’s no way to tell by looking. It’s easy, especially in the crunch of corporate deadlines, to lose track of layers. That is, if that was ever discussed in the first place.

I want to call out a word I’ve heard thrown about in OOD circles: collaborator. An object gets work done with the help of collaborator objects. It sounds so anthropomorphic and egalitarian. This idea has its merits (it encourages composition, for one), but it also encourages mixing levels. Is my InventoryManager really collaborating with a hash map? No. It’s implemented using a hash map. The relationship is clearly subordinate.

The second benefit of DOP is to clear out these ideas of “reuse at the top”. Reuse comes at the bottom. The more you can push down to the bottom, the better. A lot of problems are easier to solve down there. To illustrate, Clojure has pushed equality/hash code down there. It’s pushed all the getters and setters down there. Because we’ve got the right, reusable components, we can start modeling our data right away. With Clojure, you can start modeling your domain so quickly that it feels effortless. As @didibus was saying, there’s no need to create a new class.
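The effect of pushing equality down can be seen in Java, too. In this sketch (class names hypothetical), two maps with the same entries are equal and hash alike no matter how they were built, so they deduplicate in a HashSet, while a hand-written class without equals() and hashCode() falls back to identity. Java 16+ records also generate value-based equals() and hashCode().

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical hand-written class with no equals/hashCode: compares by identity.
final class PointClass {
    final int x;
    final int y;

    PointClass(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class EqualityDemo {
    private EqualityDemo() {}

    // The same point as a generic map, built two different ways.
    static Map<String, Object> pointA() {
        return Map.of("x", 1, "y", 2);
    }

    static Map<String, Object> pointB() {
        Map<String, Object> m = new HashMap<>();
        m.put("y", 2);
        m.put("x", 1);
        return m;
    }

    // Maps with equal entries collapse to one set element.
    static int distinctMaps() {
        Set<Map<String, Object>> s = new HashSet<>();
        s.add(pointA());
        s.add(pointB());
        return s.size();
    }

    // Two PointClass objects with the same fields stay distinct (identity equality).
    static int distinctPoints() {
        Set<PointClass> s = new HashSet<>();
        s.add(new PointClass(1, 2));
        s.add(new PointClass(1, 2));
        return s.size();
    }
}
```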

The third wrong turn was to focus on style over substance. OO talks more about the cleanliness of code than about whether the code expresses an appropriate model. I talked about this in my recent newsletter, so I won’t fully flesh it out here. To summarize: if there are a lot of classes, instead of asking “is this too many classes for our domain model?”, we ask “is this number of classes inconvenient for the programmer?” It seems, in OO land, once code is written, its relationship to the world is forgotten, and now it’s just about developer convenience.

The third benefit of DOP is that you either have the data you need or you don’t. Although I think there’s a lot of benefit to modeling well in the DOP paradigm, there are simply fewer mistakes you can make and still have functioning software. At least, I speculate and hope that is the case.