Review: What is Data Oriented Programming?

To me, the requirement of data orientation is that the data has some kind of general algebra for accessing it. Relational algebra could be it. But a less formal algebra like what Clojure provides is possible, too. I would say the pattern matching in Erlang gives enough of a general algebra to make it at least possible. Whether people use it that way is another thing.

OOP is data oriented. It takes the data, along with the functions that manipulate that data, and balls them up into a single thing and calls it an object.

Objects are a direct, 1-to-1 mapping of the data they represent. FP, on the other hand, forces data to be genericized so that it can be used with existing functions.

When implementing an algorithm with OOP, you create objects that directly implement how the algorithm is written. For example, if you’re implementing a tree traversal algorithm, you first create a tree object and then traverse it. With FP, you first have to translate the concept of a tree to lists and maps so that you can use the existing functions to do the work.
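A rough sketch of that contrast, in Python purely for illustration. The `Tree` class, the `"value"`/`"children"` key names, and both `preorder` functions are made up for this example and are not taken from any library.

```python
# OOP flavour: a dedicated Tree type that owns its traversal.
class Tree:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def preorder(self):
        """Yield values depth-first, parent before children."""
        yield self.value
        for child in self.children:
            yield from child.preorder()


# Generic-data flavour: the same tree as nested dicts and lists,
# walked by a free function that knows only the (assumed) key names.
def preorder(node):
    """Yield values depth-first from a {"value": ..., "children": [...]} dict."""
    yield node["value"]
    for child in node.get("children", []):
        yield from preorder(child)


tree_object = Tree(1, [Tree(2, [Tree(4)]), Tree(3)])
tree_data = {"value": 1, "children": [
    {"value": 2, "children": [{"value": 4}]},
    {"value": 3},
]}

print(list(tree_object.preorder()))
print(list(preorder(tree_data)))
```

Both versions visit the nodes in the same order; the difference is whether the traversal belongs to a custom type or works on whatever nested maps and lists it is handed.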

Some people prefer one way, some prefer the other.

You make a good point, and just so you know, I don’t think that I, or really anyone, to the best of my understanding, has a clear grasp of this. These are deep issues that go into the very foundation of the motivations behind mathematical logic and, by extension, computing.

That said, to me, the deepest lesson Lisp has to teach is that computing, and therefore programming, not just Lisp, is about data by its very nature. So in that sense, all programming languages are trying to be (whether by design or not) data-oriented, but some languages deal with this reality better than others.

So when I think of a language that is data-oriented, I’m thinking of languages that are better than average on that count. That’s an intuitive thing as a practitioner. I think Clojure is the best language I’ve used for my data-oriented programming needs (which is why I’m here).

But I would argue that a good object-oriented programming language (like Smalltalk), in the hands of a well-developed object-oriented programmer, one who is very conservative about issues around state and careful to craft class systems that are general, is data-oriented as well.

In my estimation, that’s one of the other major weaknesses of object-oriented programming that’s present even in the best OO languages: it tends to encourage the development of more complex systems among less mature programmers, even when issues around state are handled well, because building with generality in mind is hard. This is the major weakness of type-oriented FP languages (in my mind) as well. These languages make generality a feature by encouraging polymorphism. But providing some idiomatic, practical, general structures, as Clojure, Perl, Ruby, etc. do, makes this mindset easier to benefit from and to adopt.

I used to tutor math quite a bit, and (to me) that’s the hardest thing for less mature mathematicians to learn also. Generality should always be a goal; that’s where the big insights and a tremendous amount of leverage come from. But it’s hard, and it gets harder the more general you get. Which is one of the reasons I tend to see programming as just another one of the mathematical arts.

So, back to the point. I’d describe Smalltalk as data-oriented because it’s a homoiconic (general), fully reified language that goes quite a bit further than average to encourage data-orientedness (if that’s a word). That said, to me, it’s a spectrum. I’d argue that Clojure is certainly among the best-in-class, if not the best-in-class, in that department. The weakness of Smalltalk’s attempts at data-orientedness is that pulling off its approach effectively requires more maturity as a developer (I recall Alan Kay making mention of some things along those lines himself), and kind of an “architect” mindset that became an epidemic in the Java, C#, C++ world (please excuse the sarcasm).

Languages that fail miserably with respect to data-orientation (in my experience) are more focused on control structures than data processing. So even when they claim to be “object-oriented”, they are often really much more classification-oriented, mechanism-oriented, and place-oriented.

As to the question, “what’s a general structure?”: as best I understand it, a reasonably well-informed definition, without claiming that this is a well-understood idea, might be a structure that can describe (or perhaps express) the most structure (again, a spectrum). So, for example, number is the most general structure, because everything can be described in terms of number. But mathematical sets are quite general as well, since they can describe numbers and can be used to describe other, more specific structures like graphs, algebras, etc.

I don’t often post to forums (perhaps this shows), but I’ve enjoyed this discussion very much.

Perhaps a case in point from the Lisp world would be the examples given in Chapter 2 of SICP, “Building Abstractions with Data”, which is often, more or less, stateless OOP. But it would be hard for me not to classify that approach as data-oriented.

I’d have to respectfully disagree. I think you’re committing an equivocation fallacy.

Whatever OOP is, it and what I’m trying to establish and tentatively calling Data Oriented are not the same style.

And since OOP is called Object Oriented already, I don’t see why we’d need to also refer to it as Data Oriented.

I feel this is part of it as well, because that’s the first benefit: you can use functions that are generic to the domain but common to the structure.

This is also one of the big differences from other styles that rely on custom data types for each domain entity.

I think there is another criterion though: having a flexible enough set of data structures that you can model your domain as closely as possible to its inherent structure.

What I mean by this is that applying the Data Oriented style means you fundamentally want your program’s domain model to map 1:1 onto your domain’s way of modeling it.

Relational modeling would fail this criterion, since it imposes a very strict structure in order to give you that generic algebra.

Clojure, on the other hand, gives you various ways to accommodate most domain models: hierarchical with maps, flattened with lists, relational with sets, graph with Datascript/Datomic (graph might be the only one that’s not included in core). Since real-world domains are pretty much never of a fixed, constant element size, all structures in Clojure are dynamic in size. And real-world domains are full of heterogeneous representations, so Clojure also has great support for this.

I do think Erlang definitely has a Data Oriented approach to some extent. It does use lists, maps, sets, and other general data-structures to represent domain data. So that would qualify as having a flexible domain modelling toolkit. And it has an algebra over them, with pattern matching, guard clauses, and also its data types are abstract.

Where I’m curious is in the actor model. The actor model forces a structure that might not be true of the real domain. What if some document is actually shared between accounting and marketing? In Erlang I’d have to pretend that they both operate on their own copy and synchronize their changes with each other.

The way I personally distill what I’m calling Data-Oriented is being as truthful as possible to the structure and operations of the real domain. When I think of data, I think of real-world information. How can we take the real-world information and model it in our programs “AS-IS”? Similarly, how do we manipulate data in the real world? Can we similarly manipulate data in our programs? In the real world, you manipulate data in a very generic way, and “you” are the one that defines the invariants of your particular usage of said data.

The programming language and the style of programming are not one and the same. A language can focus on providing tools tailored to helping the programmer achieve one style over another. Java is OOP because it focuses on providing supportive constructs and tooling that help you design a program in the OOP style.

I’m just emphasising this because, from that perspective, it is hard to argue that an OOP language is Data Oriented. That would be arguing that the language is more focused on providing language-level support for the Data Oriented style of programming over all others. If that were the case, the language would not claim to be focused on the Object Oriented style. You can of course try to make your language friendly to multiple paradigms.

Maybe one OOP language also has hints of Data Orientedness. Maybe the two styles overlap in places. Maybe some OOP languages also focus on providing a more multi-paradigm framework and have Data Orientedness as their second most prominent style. But I’d still regard them as separate styles, and unless a language is mislabeled as being OOP, that labeling would indicate that OOP is its primary focus, not data oriented or any other style.

Now, I’m also not saying that the Data Oriented style is the one true style and is the best style to use for all programs. And I’m not saying the OOP style is the worst. I’m only interested in distilling the various programming styles and their essence, so I can have a mental map of the landscape of program design.

I think in the context of Clojure, and reading the History of Clojure from Rich Hickey, I really feel like he was very focused on trying to build a programming language that facilitates taking the real world information as-is, and just plugging it in to the computer. What is associative is associative, what is nested is nested, what is flat is flat, what is ordered is ordered, what is relational is relational, what is shared is shared, what is independent is independent, what is contextual is contextual, etc.

This isn’t always the right approach. For example, if I were to make a game and the game data is laid out one way, but that’s not going to hit the CPU cache, I have to restructure it some other way, because my game has to hit 60fps. And from this is born the Data Oriented (for games) style.

This is all me trying to reverse engineer the “style” Clojure makes most appropriate, of course. It’s naively a reductionist exercise. All programs end up being unique in style, and I’m just trying to find generalities.

I really like this paragraph, because you bring new ideas of styles, classification-oriented, mechanism-oriented and place-oriented for me to ponder on and explore.

So I’ll end with some of the more concrete aspects I feel the Clojure Data Oriented style is about:

  • Default to value semantics: equal data is equal; the particular container type doesn’t matter, only the captured information does.
  • Manipulate data directly and validate invariants; don’t encode the invariants in the container type and the operations over the type. Having a generic data transformation algebra helps with this.
  • Use a structure that is similar to the real structure, i.e. there shouldn’t be any constraint on what data structures you are allowed to use to model your domain and its operations. Not everything has to be a fixed-size record, a tuple, an actor, an object, etc. This means pure and impure modeling should both be allowed, since some things in the real world do mutate. Care can be taken to make this concurrency-safe of course, like with the atom construct.
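To make the first two bullets concrete, here is a minimal sketch in Python rather than Clojure. The order shape and the `validate_order` helper are hypothetical, invented only for this illustration.

```python
# Value semantics: two maps built in different ways are equal when their
# contents are equal; identity and construction order do not matter.
order_a = {"id": 7, "items": ["tea", "cup"], "total": 12.5}
order_b = dict(total=12.5, id=7, items=["tea", "cup"])
same = order_a == order_b


def validate_order(order):
    """Check invariants on plain data and return a list of problems found.

    The rules live in an ordinary function, not in a custom container type.
    """
    errors = []
    if order.get("total", 0) < 0:
        errors.append("total must be non-negative")
    if not order.get("items"):
        errors.append("order needs at least one item")
    return errors


print(same)
print(validate_order(order_a))
print(validate_order({"id": 8, "items": [], "total": -1}))
```

The data stays a generic map that any map function can work on, and the invariants are checked where the application decides they matter.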

Just my 2 cents. Think of this as :didibus/data-oriented-style. Not to be confused with the same key name from other namespaces :stuck_out_tongue_closed_eyes:

I would respectfully disagree with your respectful disagreement. :slight_smile: The focus of OOP is on the data. It’s all about isolating changes and managing / controlling how data flows through the system. Do we need to call it Data Oriented? No, that would be redundant.

Why is it called Object Oriented? Because that’s how data is represented. Even the naming is all about the data. Functions are given a back seat to the data. The focus of FP is on the functions, with data being secondary.

How OOP goes about things may not be your cup of tea, which is fine, but its primary focus has always been on the data. The more a program focuses on data, the closer it gets to being OO.

I couldn’t have described the fundamental purpose of OOP any better if I tried. I’m actually getting a little teary eyed over here… Once it clicks how useful encapsulation and polymorphism really are, your journey to the dark side will be complete. :slight_smile:

This discussion has (naturally) tended to define DOP in contrast to FP and OOP, and where all those nuances and their respective algebras and powers lie. At the risk of overloading this topic further, though, I’d like to give some love to the data itself and re-emphasise the importance of support for namespaced keys in any “programming” that is really data oriented. It’s another string that is almost unique to Clojure’s bow in contemporary programming languages (I think?), albeit with prior RDF art. The benefits of this one feature are easily overlooked but turn up everywhere, from databases to APIs to UIs to DDD (IMO, where I’m beginning to think the term “bounded context” boils down to namespaces). In fact, this whole discussion has been about different people’s different interpretations of one insufficiently narrow term, :data-oriented-programming, so I’m delighted to see namespaced versions of the same appearing :slight_smile: . Naming a concept is a powerful thing, and labelling it with something usable context-free, with globally distinct and yet contextualised semantics, just by namespacing the key itself, and wielding the powers to merge cross-domain concepts that come from that, seems to me to be truly “data oriented”… or should that be “information oriented”…
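A small illustration of the merging point, sketched in Python with string keys standing in for Clojure’s namespaced keywords. The `billing/…` and `shipping/…` names and both helper functions are assumptions made up for this example.

```python
# Two domains each have an "id"; namespacing the key keeps them distinct.
billing = {"billing/id": "INV-9", "billing/amount": 40}
shipping = {"shipping/id": "PKG-3", "shipping/carrier": "post"}

# Merging cross-domain data loses nothing because no keys collide.
merged = {**billing, **shipping}


def namespace(key):
    """Return the part of a 'ns/name' key before the first slash."""
    return key.split("/", 1)[0]


def select_namespace(entity, ns):
    """Keep only the entries whose key belongs to the given namespace."""
    return {k: v for k, v in entity.items() if namespace(k) == ns}


# Without namespaces, the second "id" silently overwrites the first.
clash = {**{"id": "INV-9"}, **{"id": "PKG-3"}}

print(merged)
print(select_namespace(merged, "billing"))
print(clash)
```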

This talk does a much better job of explaining this power than I can, though his label for this is “data focused” :slight_smile:

Data-driven programming, as Eric S. Raymond defines it in his book “The Art of Unix Programming”:

“When doing data-driven programming, one clearly distinguishes code from the data structures on which it acts, and designs both so that one can make changes to the logic of the program by editing not the code but the data structure.”

This seems very close to how data-based DSLs work in Clojure.

I consider Data Driven to be another style altogether. In that style, you build a description of your operations represented as data, and have an interpreter for it that performs the computation defined by the data. Hiccup is an example of that style.
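As a hedged illustration of “a description as data plus an interpreter”, here is a tiny Hiccup-inspired renderer sketched in Python. It is not Hiccup’s actual implementation or API; the `render` function and the nested-list shape `[tag, optional-attrs, children...]` are assumptions chosen to mirror the idea.

```python
from html import escape


def render(node):
    """Interpret a nested-list markup description and return an HTML string."""
    if isinstance(node, str):
        return escape(node)
    tag, *rest = node
    attrs = {}
    # An optional dict right after the tag holds the attributes.
    if rest and isinstance(rest[0], dict):
        attrs, *rest = rest
    attr_text = "".join(f' {k}="{escape(str(v))}"' for k, v in attrs.items())
    body = "".join(render(child) for child in rest)
    return f"<{tag}{attr_text}>{body}</{tag}>"


# The "program" is just data; changing the page means editing this value.
page = ["div", {"class": "note"},
        ["h1", "Hello"],
        ["p", "Data ", ["em", "is"], " the program."]]

html = render(page)
print(html)
```

Changing the output here means editing `page`, not `render`, which is the point of Raymond’s definition quoted earlier in the thread.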

After a month of thinking, I came to the understanding that the fundamental characteristics of Data Oriented Programming are:

  1. Code and data are located in separate entities
  2. Data is immutable
  3. Data access is universal
  4. Data shape is flexible
  5. Data can be created via literals

It seems to me that 2,3,4 and 5 can be summarized in a short sentence:

Data is considered as a value.
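A minimal sketch of how those five characteristics might look outside Clojure, here in Python. The `author` entity and the `full_label` function are hypothetical, and `MappingProxyType` only gives a read-only view, which is a weaker guarantee than Clojure’s persistent maps.

```python
from types import MappingProxyType

# (5) Created via a literal, (2) wrapped in a read-only view.
author = MappingProxyType({"name": "Ada", "books": 2})


# (1) Code lives apart from the data; (3) access is by key, the same way
# for any map, with no per-entity getter methods.
def full_label(entity):
    return f'{entity["name"]} ({entity.get("books", 0)} books)'


# (2) "Changing" the data produces a new value; (4) the shape is flexible,
# so the new value can carry a field the original never declared.
updated = MappingProxyType({**author, "books": 3, "country": "UK"})

print(full_label(author))
print(full_label(updated))
```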

What do you guys think?

These seem like a good start to me.

I think for any style, it’s impossible to really define its essence. Think of music genres or architectural styles: you can explain the themes and ideas a bit, and some of the more iconic characteristics, yet you can never nail it down. The borders where one style begins and another ends often blur, and, as with music genres, swaths of subgenres appear within a genre and everything becomes ever so much more difficult to put in clear, unambiguous little categories.

So I think a good way to build an intuition for them, and to teach it to others, is by example. Like with music, listening to music in one style and in others helps you build that intuition. And the same goes for architecture: looking through photos of different styles is probably much better than reading their descriptions.

That’s why I’d say: produce examples of small programs in a data-oriented style, show the same programs in other styles, and then talk about some of the differences and how they relate to each style. That might include the use of different programming languages, since not all of them can properly demonstrate the style (like the choice of materials in architecture). That might teach the style more effectively, and it would be a great complement to your more definitional characteristics.

I’m late to this, but my 5c is…

Data Oriented programming is when “data is code”. That’s it.

FP, or not, doesn’t come into it. Neither does OO or not.

(For clarity, homoiconicity is more than this: it requires that “code is data” as well, i.e. the code is represented in the language’s own data literals.)

Data oriented design or programming usually involves two (or more) execution contexts.

To explain data-oriented design and how it applies to re-frame, I wrote this, which you might find interesting:
https://day8.github.io/re-frame/data-oriented-design/

I loved your ideas @didibus
It reminds me how important it is to maintain the balance between being too abstract or too concrete.

Two observations have been living in the back of my head with regard to FP vs OOP. I hope this makes sense, because this is just me thinking out loud.

  1. FP feels closer to the general input > process > output computing model, which is a simpler model to reason about. OOP, on the other hand, feels distant from that because you don’t really know what goes into a method for processing, due to data encapsulation.
  2. FP also feels closer, as far as pure functions are concerned, to the mathematical concept of functions, which I think provides simple & powerful means of composition.

Thank you guys for all your inputs.
I have officially started to write my book about Data Oriented Programming.
A few excerpts are published on my blog here.
The introduction summarises my understanding of what is Data Oriented Programming.
Feel free to disagree and share your thoughts.

New milestone on the way to sharing the benefits of Data-Oriented programming with the global developer community: my book is available for early access at manning.com.

I have created a #data-oriented-programming channel on Clojurians so that we can form a kind of work group about Data-Oriented programming.

The first task of our work group would be to create a Wikipedia article about “Data-Oriented programming”.

Congrats, @Yehonathan_Sharvit, on the current success of your book! You’ve created a phenomenon. I consistently see it on Manning’s top 10 bestsellers lists. Here it is at #1 today (beating my book!):

Your ideas have really been getting to me. I’ve been thinking a lot about the advantages of Data-Oriented Programming (DOP), especially as it relates to Clojure. I wanted to share my thoughts somewhere:

Reduce boilerplate

A lot of the benefit of DOP is merely giving you the basics that you have to write yourself in Java:

  • Getters and setters (get/assoc)
  • Equality and hash code
  • Iterators (seq)
  • Serialize/deserialize
  • Clone
  • Constructors

I think these things are quite mundane but provide significant savings.
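For readers who don’t use Clojure, the list above can be approximated with Python’s built-in dicts, as a rough sketch. The `user` map is made up, and the comments map each line to the Clojure operations named in the list only loosely.

```python
import copy
import json

user = {"name": "Kim", "roles": ["admin"]}        # constructor: a literal

name = user["name"]                                # getter (like get)
renamed = {**user, "name": "Lee"}                  # "setter" returning a new map (like assoc)
equal = user == {"roles": ["admin"], "name": "Kim"}  # equality by content
fields = sorted(user)                              # iteration over keys (like seq)
round_trip = json.loads(json.dumps(user))          # serialize / deserialize
clone = copy.deepcopy(user)                        # clone

print(name, renamed, equal, fields)
print(round_trip == user, clone == user, clone is not user)
```

None of these lines had to be written per entity type, which is the saving being described.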

Reflection

There are a lot of features that are easier to code in the general case. In DOP, that means we are coding at the level of the data structure. In Java, it would mean coding at the level of the class. However, in normal Java code, classes, which describe objects, are not first-class. You have to go through a complex reflection API. Because these features are hard to implement in a general way, you have to implement them for each specific case. In DOP, we can implement them once and use them for all data structures they apply to.

A non-comprehensive list of things that we do in Clojure that would require reflection in Java:

  • listing fields
  • diffing
  • merging two entities of the same type

I believe the fluidity with which we program using DOP is not well-understood. We program at the entity level and at the map level. We often move between them fluidly and don’t realize it.
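A sketch of the three operations from the list, written once against generic maps, in Python for illustration. The `diff` helper and the `old`/`new` records are hypothetical; nothing here is specific to any entity type.

```python
def diff(a, b):
    """Return (only in a, only in b, changed) for two flat maps."""
    only_a = {k: a[k] for k in a.keys() - b.keys()}
    only_b = {k: b[k] for k in b.keys() - a.keys()}
    changed = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
    return only_a, only_b, changed


old = {"id": 1, "email": "a@example.com", "plan": "free"}
new = {"id": 1, "email": "b@example.com", "active": True}

fields = sorted(old)                       # listing fields
only_old, only_new, changed = diff(old, new)  # diffing
merged = {**old, **new}                    # merging; later values win

print(fields)
print(only_old, only_new, changed)
print(merged)
```

The same `diff` works for users, orders, or any other flat map, with no reflection involved.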

Open world assumption

The open-world assumption is critical for making systems that are resilient to change. In Clojure, it means that we assume there may be keys that any particular piece of code does not recognize, and that it should keep going anyway. Likewise, a missing key can often be given a default or some other workaround. This is vital for forward and backward compatibility.

This is very hard to do in Java. Your code won’t compile if you access a field that does not exist.
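A minimal sketch of the open-world idea on plain maps, in Python. The `apply_discount` function and the order keys are invented for this example.

```python
def apply_discount(order):
    """Return a new order with 'total' reduced by an optional discount.

    Unknown keys pass through untouched, and a missing 'discount_rate'
    falls back to a default instead of failing.
    """
    rate = order.get("discount_rate", 0.0)
    return {**order, "total": order["total"] * (1 - rate)}


# An older producer that has never heard of discounts.
v1 = {"id": 1, "total": 100.0}
# A newer producer that sends a key this function does not know about.
v2 = {"id": 2, "total": 200.0, "discount_rate": 0.5, "gift_note": "Enjoy!"}

print(apply_discount(v1))
print(apply_discount(v2))
```

Both producers keep working with the same consumer, which is the forward and backward compatibility described above.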

Less is needed

There’s a lot of stuff you just don’t need when doing DOP:

  • Names for classes
  • synchronized keyword (since everything is immutable)
  • type hierarchy shenanigans

Standard API

Clojure gives us a huge library of operations over its data structures. These implement many common algorithms and they make working with data very nice. Imagine having to do a join between every combination of two classes out of ten possible classes. Each pair would have to be custom coded. In Clojure, there’s a function that just does it. Even if there weren’t, you could write it yourself very easily.
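To show how little code a generic join needs once everything is a map, here is a sketch in Python. The Clojure function alluded to is presumably `clojure.set/join`; the `join` below is a simplified stand-in written for this example, not a port of it, and the sample rows are made up.

```python
def join(left, right, key):
    """Inner join of two lists of maps on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Each matching pair is merged into one combined map.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


users = [{"user_id": 1, "name": "Ann"}, {"user_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 1},
    {"order_id": 12, "user_id": 3},
]

joined = join(users, orders, "user_id")
print(joined)
```

The function never mentions users or orders, so the same code joins any pair of "entity types" out of the ten.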

The other side is that you don’t have to learn new APIs. In Java, each library’s API contains a number of classes, each with custom methods with different semantics. The classic example is the Java Servlets API. Look at the Request and Response classes’ Javadocs. It’s amazing that Ring translates those into two hash maps with (I believe) zero information loss. All you have to do is learn the keys and value types to expect and you have everything you need.
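A Python analogy of the "request map in, response map out" shape. This is not Ring’s API: the key names below are borrowed from Ring’s conventions (`:request-method`, `:uri`, `:status`, `:headers`, `:body`) as plain strings, and the `handler` function is hypothetical.

```python
def handler(request):
    """Take a request map and return a response map; no framework classes."""
    if request["request-method"] == "get" and request["uri"] == "/ping":
        return {
            "status": 200,
            "headers": {"Content-Type": "text/plain"},
            "body": "pong",
        }
    return {"status": 404, "headers": {}, "body": "not found"}


ok = handler({"request-method": "get", "uri": "/ping", "headers": {}})
missing = handler({"request-method": "get", "uri": "/nope", "headers": {}})
print(ok)
print(missing)
```

Because both sides are plain maps, testing the handler needs no mock request or response objects, only literals.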

Modeling

Modeling is a complicated activity, but suffice it to say that Clojure’s data structures give you everything you need (basically, product and sum types).

Data anyway

Some problems need to be moved into data anyway. For example, you might hard code the T-Shirt sizes into an Enum at first. But if they change every day, you have to move it into data.

DOP obviously starts with the data. However, a lot of stuff doesn’t need to be in data. If the T-shirt sizes aren’t changing, coding them as an Enum gives you a lot of benefits. For one, you get static type checks on the values. Java’s tendency is to prefer static encoding and to move only reluctantly toward data encoding.

DOP’s advantage there is only having one kind of encoding: first-class data. You learn it once and you won’t have to change it because it’s already the most dynamic possible. However, you lose the flexibility and power of having multiple ways to encode it.
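The T-shirt trade-off can be sketched in Python, with the caveat that Python’s `Enum` is checked at run time, so it only approximates the static checks a Java enum gives. The `Size` enum, the JSON string, and both validation helpers are made up for this example.

```python
import json
from enum import Enum


# Static encoding: the allowed sizes are part of the code.
class Size(Enum):
    SMALL = "S"
    MEDIUM = "M"
    LARGE = "L"


def is_valid_static(code):
    """True if the code matches one of the sizes baked into the enum."""
    try:
        Size(code)
        return True
    except ValueError:
        return False


# Data encoding: the allowed sizes arrive as data (here a JSON string
# standing in for a config file or database row) and can change daily.
sizes = set(json.loads('["S", "M", "L", "XL"]'))


def is_valid_dynamic(code, allowed):
    """True if the code is in the currently loaded set of sizes."""
    return code in allowed


print(is_valid_static("XL"), is_valid_dynamic("XL", sizes))
```

Adding "XL" to the enum means a code change and a redeploy; adding it to the data does not, at the cost of losing the name-level checks.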

Lots of problems in the modern world are better solved at the data level. For example, dealing with an unknown JSON API is much better if you don’t have to model it ahead of time. You want to do minimal translation of JSON into the equivalent data types of your language and explore it. Once you get a good idea of how to translate it into the entities you need to work with, the advantage of keeping it as data diminishes. However, you are probably going to convert it back to data anyway, so it may be good just to leave it as data. That way, you only have to learn one way to work, which is the most powerful one anyway.
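A small sketch of exploring an unknown JSON payload without modeling it first, in Python. The payload and the `paths` helper are invented for illustration; only the standard `json` module is used.

```python
import json

# Stand-in for a response from an API whose schema we do not know yet.
payload = '{"user": {"id": 7, "tags": ["a", "b"]}, "meta": {"page": 1}}'
data = json.loads(payload)  # minimal translation into dicts and lists


def paths(value, prefix=()):
    """Yield (path, leaf) pairs for every leaf in nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from paths(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from paths(child, prefix + (index,))
    else:
        yield prefix, value


found = list(paths(data))
for path, leaf in found:
    print(path, leaf)
```

The generic walk reveals the shape of the payload before any entity classes exist, and it keeps working if the API adds fields later.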

What do you think? I’m happy to discuss any of these points further.

Rock on!
Eric
