Congrats, @Yehonathan_Sharvit, on the current success of your book! You’ve created a phenomenon. I consistently see it on Manning’s top 10 bestsellers list. Here it is at #1 today (beating my book!):
Your ideas have really been getting to me. I’ve been thinking a lot about the advantages of Data-Oriented Programming (DOP), especially as it relates to Clojure. I wanted to share my thoughts somewhere:
Reduce boilerplate
A lot of the benefit of DOP is merely giving you the basics that you have to write yourself in Java:
- Getters and setters (get/assoc)
- Equality and hash code
- Iterators (seq)
- Serialize/deserialize
- Clone
- Constructors
I think these things are quite mundane but provide significant savings.
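Here is a rough sketch of what that list looks like in practice. The `user` map and its keys are made up for illustration, and EDN stands in for whatever serialization format you actually use:

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical entity: a plain map instead of a class with two fields.
;; The literal is the "constructor".
(def user {:name "Ada" :age 36})

;; Getter, and a "setter" that returns a new map (the original is untouched).
(get user :name)
(def older (assoc user :age 37))

;; Equality and hash code are value-based, regardless of key order.
(= user {:age 36 :name "Ada"})
(= (hash user) (hash {:age 36 :name "Ada"}))

;; Iteration: a map is a sequence of key/value entries.
(doseq [[k v] (seq user)]
  (println k "->" v))

;; Serialize to a string and read it back.
(def round-tripped (edn/read-string (pr-str user)))

;; "Clone" is a non-issue: values are immutable, so sharing them is safe.
```

None of this required writing a class, and the same functions work on any other map.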
Reflection
There are a lot of features that are easier to code in the general case. In DOP, that means we are coding at the level of the data structure. In Java, it would mean coding at the level of the class. However, in normal Java code, classes, which describe objects, are not first-class: you have to go through a complex reflection API. Because these features are hard to write in a general way, you end up writing them for each specific case. In DOP, we can write them once and use them for all the data structures they apply to.
A non-comprehensive list of things that we do in Clojure that would require reflection in Java:
- listing fields
- diffing
- merging two entities of the same type
I believe the fluidity with which we program using DOP is not well-understood. We program at the entity level and at the map level. We often move between them fluidly and don’t realize it.
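To make the three items above concrete, here is a minimal sketch. The two `user` maps are hypothetical; the functions are from `clojure.core` and `clojure.data`:

```clojure
(require '[clojure.data :as data])

;; Two hypothetical versions of the same entity.
(def old-user {:name "Ada" :age 36 :city "London"})
(def new-user {:name "Ada" :age 37 :city "London"})

;; Listing fields: no reflection API, just ask the map.
(def fields (set (keys old-user)))

;; Diffing: returns [only-in-first only-in-second in-both].
(def changes (data/diff old-user new-user))

;; Merging two entities of the same "type": later maps win on conflicts.
(def merged (merge old-user {:age 37 :email "ada@example.com"}))
```

The same three calls work unchanged on an order, an invoice, or any other map-shaped entity, which is the "write it once" point.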
Open world assumption
The open-world assumption is critical for making systems that are resilient to change. In Clojure, it means we assume there may be keys that any particular piece of code does not recognize, and that code should keep going anyway. Likewise, a missing key can often be given a default or some other workaround. This is vital for forward and backward compatibility.
This is very hard to do in Java. Your code won’t compile if you access a field that does not exist.
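A small sketch of the open-world style in Clojure. The `greeting` function and the keys it reads are hypothetical:

```clojure
;; Reads only the keys it knows about; :title gets a default when missing.
(defn greeting [{:keys [first-name title] :or {title "friend"}}]
  (str "Hello, " title " " first-name "!"))

;; A map with an extra key this code has never heard of still works.
(def newer-user {:first-name "Ada" :favorite-color "green"})

(greeting newer-user)
(greeting {:first-name "Ada" :title "Dr."})

;; Unknown keys also pass through updates untouched.
(def greeted (assoc newer-user :greeted? true))
```

Code written against an older shape of the data keeps running when a newer producer adds keys, and a newer consumer can fall back to a default when an older producer omits one.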
Less is needed
There’s a lot of stuff you just don’t need when doing DOP:
- Names for classes
- synchronized keyword (since everything is immutable)
- type hierarchy shenanigans
Standard API
Clojure gives us a huge library of operations over its data structures. These implement many common algorithms and they make working with data very nice. Imagine having to do a join between every combination of two classes out of ten possible classes. Each pair would have to be custom coded. In Clojure, there’s a function that just does it for sets of maps (clojure.set/join). Even if there weren’t, you could write it yourself very easily.
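For the join case, here is a minimal sketch with `clojure.set/join`. The customers and orders are invented for illustration; the join is a natural join on the keys the two relations share (here, `:customer-id`):

```clojure
(require '[clojure.set :as set])

;; Hypothetical relations: sets of maps.
(def customers #{{:customer-id 1 :name "Ada"}
                 {:customer-id 2 :name "Grace"}})

(def orders #{{:order-id 10 :customer-id 1 :total 25}
              {:order-id 11 :customer-id 2 :total 40}
              {:order-id 12 :customer-id 1 :total 5}})

;; One call, no per-pair code: each order is merged with its customer.
(def joined (set/join customers orders))
```

Nothing here knows what a customer or an order is, so the same call would join any other pair of relations that share a key.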
The other side is that you don’t have to learn new APIs. In Java, each library’s API contains a number of classes, each with custom methods with different semantics. The classic example is the Java Servlets API. Look at the Request and Response classes’ Javadocs. It’s amazing that Ring translates those into two hash maps with (I believe) zero information loss. All you have to do is learn the keys and value types to expect and you have everything you need.
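As a rough illustration, a Ring-style handler is just a function from a request map to a response map. This sketch needs no Ring dependency; the route and the response bodies are made up, and the request shows only a few of the keys a real Ring request carries:

```clojure
;; A handler: request map in, response map out.
(defn handler [request]
  (if (and (= :get (:request-method request))
           (= "/hello" (:uri request)))
    {:status 200
     :headers {"Content-Type" "text/plain"}
     :body "Hello!"}
    {:status 404
     :headers {"Content-Type" "text/plain"}
     :body "Not found"}))

;; Because requests are plain maps, you can build one by hand to try it out.
(def sample-request {:request-method :get
                     :uri "/hello"
                     :headers {"accept" "text/plain"}})

(handler sample-request)
```

Everything you already know about maps (`get`, `assoc`, `update`, `merge`) applies to the request and response directly.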
Modeling
Modeling is a complicated activity, but suffice it to say that Clojure’s data structures give you everything you need (basically, product and sum types).
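One possible way to read "product and sum types" in Clojure, sketched with hypothetical shapes: a map with several keys acts as a product, and a tag key (here `:shape`, my own naming) picks one alternative of a sum. A multimethod is one of several ways to dispatch on the tag:

```clojure
;; Sum type: a shape is either a circle or a rectangle, chosen by :shape.
;; Product type: each alternative is a map with its own set of fields.
(defmulti area :shape)

(defmethod area :circle [{:keys [radius]}]
  (* Math/PI radius radius))

(defmethod area :rectangle [{:keys [width height]}]
  (* width height))

(area {:shape :rectangle :width 2 :height 3})
(area {:shape :circle :radius 1})
```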
Data anyway
Some problems need to be moved into data anyway. For example, you might hard code the T-Shirt sizes into an Enum at first. But if they change every day, you have to move it into data.
DOP obviously starts with the data. However, a lot of stuff doesn’t need to be in data. If the T-shirt sizes aren’t changing, coding them as an Enum gives you a lot of benefits. For one, you get static type checks on the values. Java tends to prefer static encoding and moves toward data encoding only reluctantly.
DOP’s advantage there is only having one kind of encoding: first-class data. You learn it once and you won’t have to change it because it’s already the most dynamic possible. However, you lose the flexibility and power of having multiple ways to encode it.
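Here is a sketch of the T-shirt example encoded as data from the start. The size keywords and the `valid-size?` helper are hypothetical:

```clojure
;; The allowed sizes are an ordinary value, not a compiled-in Enum.
(def default-sizes #{:small :medium :large})

;; Validation is a runtime membership check against whatever set is current.
(defn valid-size? [sizes size]
  (contains? sizes size))

;; When the sizes change, you produce a new value; no code is recompiled.
(def todays-sizes (conj default-sizes :x-large))

(valid-size? default-sizes :x-large)
(valid-size? todays-sizes :x-large)
```

The trade-off from the paragraph above shows up here too: a typo like `:smal` is only caught when `valid-size?` runs, not by a compiler.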
Lots of problems in the modern world are better solved at the data level. For example, dealing with an unknown JSON API is much better if you don’t have to model it ahead of time. You want to do minimal translation of JSON into the equivalent data types of your language and explore it. Once you get a good idea of how to translate it into the entities you need to work with, the advantage of keeping it as data diminishes. However, you are probably going to convert it back to data anyway, so it may be good just to leave it as data. That way, you only have to learn one way to work, which is the most powerful one anyway.
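To illustrate the exploration step: assume some JSON library (not shown) has already parsed a response into maps, vectors, strings, and numbers. The payload below is invented; the point is that the usual core functions are enough to poke around in it:

```clojure
;; Hypothetical parsed JSON: string keys, nested maps and vectors.
(def response
  {"user"  {"id" 7 "login" "ada"}
   "repos" [{"name" "engine" "stars" 12}
            {"name" "notes"  "stars" 3}]})

;; What is in here?
(def top-level-keys (set (keys response)))

;; Reach into nested structure without defining any classes first.
(def login (get-in response ["user" "login"]))

;; Ordinary sequence functions work on the nested collections.
(def popular-repos
  (->> (get response "repos")
       (filter #(> (get % "stars") 5))
       (mapv #(get % "name"))))
```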
What do you think? I’m happy to discuss any of these points further.
Rock on!
Eric