Review: What is Data Oriented Programming?

didibus · June 13, 2020, 1:13am

What would not be data oriented then? I don’t know any language that doesn’t have some form of data-structure.

I think we have to go one level down and define some properties that the data-structures in a data oriented language must have. For example, your point on generality. Now re-reading, I see you meant using general structures, but I don’t know what that means. What’s a general structure? One that is popular? One that is used pervasively? I was thinking of it more in terms of general operations over the structure. That is, it doesn’t matter what the data represents: whether the map is a bank account, a user, a receipt, etc., I’ll still use the exact same functions to manipulate it.

In an OO language (though I’m not sure about Smalltalk), this generality of functions doesn’t exist. For example, just getting an element from the structure is a custom method (a so-called getter). So someone would use “getName” to get the name out of a User structure. In a data oriented language, you’d use the generic “getElementFromData” function, which returns the value at a particular key; it is agnostic of the fact that the data models a User.
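A minimal sketch of the contrast described above, written in Python purely for illustration. The `User` class, its `get_name` method, the `get_element` helper, and the sample records are all hypothetical; `get_element` stands in for the generic “getElementFromData” function.

```python
# Class-based style: each domain type exposes its own accessor.
class User:
    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name


# Data-oriented style: plain maps, and one accessor for everything.
user = {"name": "Ada", "email": "ada@example.com"}
receipt = {"name": "Order 17", "total": 42}


def get_element(data, key):
    """Return the value stored under key, whatever the map models."""
    return data.get(key)


# The same function works on a user and on a receipt.
names = [get_element(entity, "name") for entity in (user, receipt)]
```

The point of the sketch is only that `get_element` never needs to know which domain entity it is reading from.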

I’m not super familiar with Smalltalk. Are you saying Smalltalk would have had a generic getter that works to retrieve any element of any Object, no matter what the Object models in the domain?

ericnormand · June 13, 2020, 12:54pm

To me, the requirement of data orientation is that the data has some kind of general algebra for accessing it. Relational algebra could be it. But a less formal algebra like what Clojure provides is possible, too. I would say the pattern matching in Erlang gives enough of a general algebra to make it at least possible. Whether people use it that way is another thing.

Richard_Heller · June 13, 2020, 6:56pm

OOP is data oriented. It takes the data, along with the functions that manipulate that data, and balls them up into a single thing and calls it an object.

Objects are a direct, 1-to-1 mapping of the data they represent. FP, on the other hand, forces data to be genericized so that it can be used with existing functions.

When implementing an algorithm with OOP, you create objects that directly implement how the algorithm is written. For example, if you’re implementing a tree traversal algorithm you first create a tree object and then traverse it. With FP, you first have to translate the concept of a tree to lists and maps so that you can use the existing functions to do the work.
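For readers who want to see what “translating a tree to lists and maps” can look like, here is a small Python sketch. The node shape (a map with `value` and `children` keys) is an assumption made for this example, not something taken from the post.

```python
# A tree expressed with general structures: each node is a dict holding a
# value and a list of child nodes (hypothetical shape for this sketch).
tree = {
    "value": 1,
    "children": [
        {"value": 2, "children": [{"value": 4, "children": []}]},
        {"value": 3, "children": []},
    ],
}


def depth_first(node):
    """Yield node values in pre-order using only dict and list access."""
    yield node["value"]
    for child in node["children"]:
        yield from depth_first(child)


visited = list(depth_first(tree))
```

Whether this counts as extra translation work or as a direct description of the tree is exactly the point the thread is debating.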

Some people prefer one way, some prefer the other.

delonnewman · June 13, 2020, 9:16pm

You make a good point, and just so you know, I don’t think I, or really anyone, to my best understanding, has a clear understanding of this. These are deep issues that go into the very foundation of the motivations behind mathematical logic, and, by extension, computing.

That said, to me, the deepest lesson Lisp has to teach is that computing, and therefore programming, not just Lisp, is about data by its very nature. So in that sense, all programming languages are trying to be (whether by design or not) data-oriented, but some languages deal with this reality better than others.

So when I think of a language that is data-oriented, I’m thinking of languages that are better than average on that count. That’s an intuitive thing as a practitioner. I think Clojure is the best language I’ve used for my data-oriented programming needs (which is why I’m here).

But, I would argue that a good object-oriented programming language (like Smalltalk) in the hands of a well developed object-oriented programmer–one who is very conservative about issues around state–one who is careful to craft class systems that are general–is data-oriented as well.

In my estimation, that’s one of the other major weaknesses of object-oriented programming that’s even present in the best OO languages–that it tends to encourage the development of more complex systems for less mature programmers, even when issues around state are handled well, because building with generality in mind is hard. This is the major weakness of type-oriented FP languages (in my mind) as well. These languages make generality a feature by encouraging polymorphism. But, providing some idiomatic, practical, general structures as Clojure, Perl, Ruby, etc. do makes this mindset easier to benefit from and to adopt.

I used to tutor math quite a bit, and (to me) that’s the hardest thing for less mature mathematicians to learn also. Generality should always be a goal; that’s where the big insights and a tremendous amount of leverage come from. But it’s hard, and it gets harder the more general you get. Which is one of the reasons I tend to see programming as just another one of the mathematical arts.

So, back to the point. I’d describe Smalltalk as data-oriented because it’s a homoiconic (general), fully reified language that goes quite a bit further than average to encourage data-orientedness (if that’s a word). That said, to me, it’s a spectrum. I’d argue that Clojure is certainly among the best-in-class if not the best-in-class in that department, and the weakness of Smalltalk’s attempts at data-orientedness is that to pull off its approach effectively more maturity as a developer is required (I recall Alan Kay making mention of some things along those lines himself), and kind of an “architect” mindset that became an epidemic in the Java, C#, C++ world (please excuse the sarcasm).

Languages that fail miserably with respect to data-orientation (in my experience) are more focused on control structures than data processing. So even when they claim to be “object-oriented” they are often really much more classification-oriented, mechanism-oriented, and place-oriented.

As to the question, “what’s a general structure?”: as best I understand it, a reasonably well-informed definition, without claiming that this is a well understood idea, might be a structure that can describe (or perhaps express) the most structure (again a spectrum). So, for example, number is the most general structure, because everything can be described in terms of number. But mathematical sets are quite general as well, since they can describe numbers and can be used to describe other more specific structures like Graphs, Algebras, etc.

I don’t often post to forums (perhaps this shows), but I’ve enjoyed this discussion very much.

delonnewman · June 13, 2020, 9:23pm

Perhaps a case-in-point from the Lisp world would be the examples given in Chapter 2 of SICP “Building Abstractions with Data”, which is often, more-or-less, stateless OOP. But, it would be hard for me to not classify that approach as data-oriented.

didibus · June 14, 2020, 6:55pm

I’d have to respectfully disagree. I think you’re doing an equivocation fallacy.

Whatever OOP is and what I’m trying to establish and tentatively calling Data Oriented as a style are not the same style.

And since OOP is called Object Oriented already, I don’t see why we’d need to also refer to it as Data Oriented.

I feel this is part of it as well, because the first benefit is that you can use functions that are generic to the domain, but common to the structure.

This is also one of the big differences with other styles that rely on custom data types for each domain entity.

I think there is another criterion though, that of having a flexible enough set of data-structures that you can model your domain as closely as possible to its inherent structure.

What I mean by this is that, applying the Data Oriented style means that you fundamentally want your program’s domain model to reflect 1:1 with your domain’s way of modeling it.

Relational modeling would fail this criterion, since it imposes a very strict structure in order to give you that generic algebra.

Clojure, on the other hand, gives you various ways to accommodate most domain models: you have hierarchical with maps, flattened with lists, relational with sets, graph with Datascript/Datomic (graph might be the only one that’s not included in core). Since real world domains are pretty much never of a fixed constant element size, all structures in Clojure are dynamic in size. And real world domains are full of heterogeneous representations, so Clojure also has great support for this.

I do think Erlang definitely has a Data Oriented approach to some extent. It does use lists, maps, sets, and other general data-structures to represent domain data. So that would qualify as having a flexible domain modelling toolkit. And it has an algebra over them, with pattern matching, guard clauses, and also its data types are abstract.

Where I’m curious is in the actor model. The actor model forces a structure that might not be true of the real domain. What if some document is actually shared between accounting and marketing? In Erlang I’d have to pretend that they both operate on their own copy and synchronize their changes between each other.

The way I personally distill what I’m calling Data-Oriented would be in being as truthful to the structure and operations of the real domain. When I think of data, I think of real world information. How can we take the real world information, and model it in our programs “AS-IS”. Similarly, how do we manipulate data in the real world? Can we similarly manipulate data in our programs? In the real world, you manipulate data in a very generic way, and “you” are the one that defines the invariants of your particular usage of said data.

The programming language and the style of programming are not one and the same. A language can focus on providing tools tailored to helping the programmer achieve one style over another. Java is OOP because it focuses on providing supportive constructs and tooling that help you design a program in the OOP style.

I’m just emphasising this, because from that perspective, it is hard to argue that an OOP language is Data Oriented. That would be arguing that the language is more focused on providing language level support for the Data Oriented style of programming over all others. If that was the case, the language would not claim to be focused on the Object Oriented style. You can of course try to make your language friendly to multiple paradigms.

Maybe one OOP language also has hints of Data Orientedness. Maybe the two styles have overlaps in places. Maybe some OOP languages also focus on providing a more multi-paradigm framework and have Data Orientedness as their second most prominent style. But I’d still regard them as separate styles, and unless a language is mislabeled as being OOP, that labeling would indicate that OOP is its primary focus, not data oriented or any other style.

Now, I’m also not saying that the Data Oriented style is the one true style and is the best style to use for all programs. And I’m not saying the OOP style is the worst. I’m only interested in distilling the various programming styles and their essence, so I can have a mental map of the landscape of program design.

I think in the context of Clojure, and reading the History of Clojure from Rich Hickey, I really feel like he was very focused on trying to build a programming language that facilitates taking the real world information as-is, and just plugging it in to the computer. What is associative is associative, what is nested is nested, what is flat is flat, what is ordered is ordered, what is relational is relational, what is shared is shared, what is independent is independent, what is contextual is contextual, etc.

This isn’t always the right approach. For example, if I were to make a game, and the game data is laid out in a way that’s not going to hit the CPU cache, I have to restructure it some other way, because my game has to hit 60fps. And from this is born the Data Oriented (for games) style.

This is all me trying to reverse engineer the “style” Clojure makes most appropriate, of course. It’s naively a reductionist exercise. All programs end up being unique in style, and I’m just trying to find generalities.

I really like this paragraph, because you bring new ideas of styles, classification-oriented, mechanism-oriented and place-oriented for me to ponder on and explore.

So I’ll end with some of the more concrete aspects I feel the Clojure Data Oriented style is about:

Default to value semantics, equal data is equal, the particular container type doesn’t matter, only the captured information does
Manipulate data directly, validate invariants, don’t encode the invariants in the container type and operations over the type. Having a generic data transformation algebra helps for this.
Use a structure that is similar to the real structure, aka, there shouldn’t be any constraint on what data-structures you are allowed to use to model your domain and their operations. Not everything has to be a fixed sized record, a tuple, an actor, an object, etc. This means pure and impure modeling should be allowed as well, since some things in the real world do mutate. Care can be taken to make this concurrency safe of course, like with the atom construct.
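The first two points in the list above can be illustrated with a short Python sketch. The account map, `valid_account`, and `withdraw` are hypothetical names invented for this example; Python dicts are mutable, so the “new map instead of mutation” habit is a convention here rather than something the language enforces as Clojure does.

```python
# Value semantics: two separately built maps with the same content are equal.
a = {"owner": "Ada", "balance": 100}
b = dict(owner="Ada", balance=100)
same_value = a == b    # compares content
same_object = a is b   # compares identity


def valid_account(account):
    """The invariant lives in a plain function, not in a container type."""
    balance = account.get("balance")
    return isinstance(balance, int) and balance >= 0


def withdraw(account, amount):
    """Return a new map and validate it; the input map is left untouched."""
    updated = {**account, "balance": account["balance"] - amount}
    if not valid_account(updated):
        raise ValueError("balance must stay non-negative")
    return updated


after = withdraw(a, 30)
```

The data is manipulated with generic map operations, and the invariant is checked separately at the point where it matters.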

Just my 2 cents. Think of this as :didibus/data-oriented-style. Not to be confused with the same key name from other namespaces

Richard_Heller · June 14, 2020, 10:07pm

I would respectfully disagree with your respectful disagreement. The focus of OOP is on the data. It’s all about isolating changes and managing / controlling how data flows through the system. Do we need to call it Data Oriented? No, that would be redundant.

Why is it called Object Oriented? Because that’s how data is represented. Even the naming is all about the data. Functions are given a back seat to the data. The focus of FP is on the functions, with data being secondary.

How OOP goes about things may not be your cup of tea, which is fine, but its primary focus has always been on the data. The more a program focuses on data, the closer it gets to being OO.

I couldn’t have described the fundamental purpose of OOP any better if I tried. I’m actually getting a little teary eyed over here… Once it clicks how useful encapsulation and polymorphism really are, your journey to the dark side will be complete.

Anthony_Leonard · June 15, 2020, 1:18am

This discussion has (naturally) tended to define DOP in contrast to FP and OOP, and where all those nuances and their respective algebras and powers lie. At the risk of overloading this topic further, though, I’d like to give some love to the data itself, and re-emphasise the importance of support for namespaced keys in any “programming” that is really data oriented. It’s another string that is almost unique to Clojure’s bow in contemporary programming languages (I think?), albeit with prior RDF art. The benefits of this one feature are easily overlooked but turn up everywhere from databases to APIs to UIs to DDD (IMO, where I’m beginning to think the term “bounded context” boils down to namespaces). In fact this whole discussion has been about different people’s different interpretations of one insufficiently narrow term, :data-oriented-programming, so I’m delighted to see namespaced versions of the same appearing. Naming a concept is a powerful thing, and labelling it with something usable context free, with globally distinct and yet contextualised semantics by just namespacing the key itself, and wielding the powers to merge cross-domain concepts that come from that, seems to me to be truly “data oriented”… or should that be “information oriented”…
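A rough Python sketch of why namespaced keys make cross-domain merging safe. Python has no keyword type, so `"namespace/name"` strings stand in for Clojure keywords such as `:billing/id`; every key and value below is hypothetical.

```python
# Namespaced keys written as "namespace/name" strings.
billing = {"billing/id": "INV-9", "billing/total": 120}
shipping = {"shipping/id": "PKG-3", "shipping/carrier": "post"}

# Both domains have an "id", yet merging loses nothing because the
# namespace keeps the keys distinct.
order = {**billing, **shipping}

# Without namespaces the second "id" silently overwrites the first.
clash = {**{"id": "INV-9"}, **{"id": "PKG-3"}}
```

The merged `order` map keeps four distinct keys, while the un-namespaced merge ends up with a single `id`.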

This talk does a much better job of explaining this power than I can - though his label for this is “data focused”

jgomo3 · June 15, 2020, 7:35pm

Data driven programming, as Eric S. Raymond defines it in his book “The Art of Unix Programming”:

“When doing data-driven programming, one clearly distinguishes code from the data structures on which it acts, and designs both so that one can make changes to the logic of the program by editing not the code but the data structure.”
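One way to picture the quoted definition is a rule table that the code merely interprets. This is a hedged Python sketch; the rule fields `min_qty` and `discount` and the function `discount_for` are invented for the example and do not come from the book.

```python
# The pricing rules live in a data structure; the code only interprets it.
DISCOUNT_RULES = [
    {"min_qty": 100, "discount": 0.20},
    {"min_qty": 10, "discount": 0.10},
    {"min_qty": 0, "discount": 0.0},
]


def discount_for(quantity, rules=DISCOUNT_RULES):
    """Return the discount of the first rule whose threshold is met."""
    for rule in rules:
        if quantity >= rule["min_qty"]:
            return rule["discount"]
    return 0.0


# Changing the program's logic means editing the table, not the function.
holiday_rules = [{"min_qty": 0, "discount": 0.5}]
holiday_price_cut = discount_for(3, holiday_rules)
```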

mvarela · June 16, 2020, 6:02am

This seems very close to how data-based DSLs work in Clojure.

didibus · June 16, 2020, 8:56am

I consider Data Driven to be another style altogether. In that style, you build a description of your operations represented as data, and have an interpreter for it that performs the computation defined by the data. Hiccup is an example of that style.
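To make the “description as data plus an interpreter” idea concrete, here is a tiny Python sketch loosely inspired by Hiccup. It is not Hiccup's actual behaviour: the `[tag, attrs, *children]` shape with a mandatory attribute map is a simplification assumed for this example, and `render` is a hypothetical function.

```python
from html import escape


def render(node):
    """Interpret a nested list [tag, attrs, *children] as HTML text."""
    if isinstance(node, str):
        return escape(node)
    tag, attrs, *children = node
    attr_text = "".join(f' {k}="{escape(str(v))}"' for k, v in attrs.items())
    inner = "".join(render(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


# The page is plain data; render is the interpreter that gives it meaning.
page = ["div", {"class": "note"}, ["p", {}, "Hello & welcome"]]
html_text = render(page)
```

Because the page is ordinary data, it can be built, inspected, and transformed with the same generic functions as any other list or map before it is rendered.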

Yehonathan_Sharvit · July 29, 2020, 6:27pm

After a month of thinking, I came to the understanding that the fundamental characteristics of Data Oriented Programming are:

1. Code and data are located in separate entities
2. Data is immutable
3. Data access is universal
4. Data shape is flexible
5. Data can be created via literals

It seems to me that 2, 3, 4, and 5 can be summarized in a short sentence:

Data is considered as a value.
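A small Python sketch of these characteristics, offered as an approximation only. `types.MappingProxyType` gives a read-only view rather than the deep immutability of Clojure's persistent maps, and the `book` record, `with_field`, and `describe` are hypothetical names chosen for illustration.

```python
from types import MappingProxyType

# Data created via a literal, then wrapped in a read-only view.
book = MappingProxyType({"title": "Example Book", "year": 2020})


def with_field(data, key, value):
    """'Change' by building a new read-only map; the original is untouched."""
    return MappingProxyType({**data, key: value})


# Code is separate from data: this function works on any mapping,
# whatever keys it happens to carry (flexible shape, universal access).
def describe(data):
    return ", ".join(f"{k}={data[k]}" for k in sorted(data))


revised = with_field(book, "year", 2021)
```

Assigning to `book["year"]` raises a `TypeError`, which is the closest the sketch gets to treating the data as a value.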

What do you guys think?

didibus · July 29, 2020, 11:55pm

These seem like a good start to me.

I think for any style, it’s impossible to really define the essence of it. Think of music genres, or architectural styles: you can explain a bit the themes and ideas, some of the more iconic characteristics, yet you can never nail it down. The borders where one style begins and another ends often blur, and, as with music genres, swaths of subgenres appear within a genre and everything becomes ever so much more difficult to put in unambiguous clear little categories.

So I think a good way to build an intuition for them, and to teach it to others, is by example. Like with music, listening to music in one style and others helps you build that intuition. And same with architecture: looking through photos of different styles is probably much better than reading their descriptions.

That’s why I’d say you could produce examples of small programs in a data oriented style, show the same programs in other styles, and then talk about some of the differences and how they relate to each style. That might include the use of different programming languages, since not all of them can properly demonstrate the style (like choice of materials in architecture). That might be able to teach the style more effectively, and it would be a great complement to your more definitional characteristics.

wazound · July 30, 2020, 12:08pm

I’m late to this, but my 5c is …

Data Oriented programming is when “data is code”. That’s it.

FP, or not, doesn’t come into it. Neither does OO or not.

(For clarity, Homoiconicity is more than this - it requires that “code is data” as well. I.e. the code is represented in the language’s own data literals.)

Data oriented design or programming usually involves two (or more) execution contexts.

To explain data-oriented design and how it applies to re-frame, I wrote this, which you might find interesting:
https://day8.github.io/re-frame/data-oriented-design/

Yehonathan_Sharvit · July 31, 2020, 12:18pm

I loved your ideas @didibus
It reminds me how important it is to maintain the balance between being too abstract or too concrete.

ccidral · August 5, 2020, 1:20pm

Two observations have been living in the back of my head with regard to FP vs OOP. I hope they make sense because this is just me thinking out loud.

FP feels closer to the general input > process > output computing model, which is a simpler model to reason about. OOP on the other hand feels distant from that because you don’t really know what goes into a method for processing due to data encapsulation.
FP also feels closer to—as far as pure functions are concerned—the mathematical concept of functions, which I think provides simple & powerful means of composition.
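As a small illustration of both observations, here is a Python sketch in which every step is a pure input-to-output function and the steps are combined by composition. The `compose` helper and the three step functions are hypothetical and written only for this example.

```python
from functools import reduce


def compose(*functions):
    """Compose left to right: compose(f, g)(x) == g(f(x))."""
    return lambda value: reduce(lambda acc, fn: fn(acc), functions, value)


def strip_spaces(text):
    return text.strip()


def to_words(text):
    return text.split()


def count(items):
    return len(items)


# Each step sees exactly its input and returns exactly its output.
word_count = compose(strip_spaces, to_words, count)
result = word_count("  data is data  ")
```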

Yehonathan_Sharvit · September 27, 2020, 5:16am

Thank you guys for all your input.
I have officially started to write my book about Data Oriented Programming.
A few excerpts are published on my blog here.
The introduction summarises my understanding of what Data Oriented Programming is.
Feel free to disagree and share your thoughts.

Yehonathan_Sharvit · February 4, 2021, 3:24pm

A new milestone on the way to sharing the benefits of Data-Oriented programming with the global developer community: my book is available for early access at manning.com.

I have created a #data-oriented-programming channel on Clojurians so that we can form a kind of working group about Data-Oriented programming.

The first task of our work group would be to create a Wikipedia article about “Data-Oriented programming”.

ericnormand · April 1, 2021, 6:37pm

Congrats, @Yehonathan_Sharvit, on the current success of your book! You’ve created a phenomenon. I consistently see it on Manning’s top 10 bestsellers lists. Here it is at #1 today (beating my book!).

Your ideas have really been getting to me. I’ve been thinking a lot about the advantages of Data-Oriented Programming (DOP), especially as it relates to Clojure. I wanted to share my thoughts somewhere:

Reduce boilerplate

A lot of the benefit of DOP is merely giving you the basics that you have to write yourself in Java:

Getters and setters (get/assoc)
Equality and hash code
Iterators (seq)
Serialize/deserialize
Clone
Constructors

I think these things are quite mundane but provide significant savings.
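A rough Python analogue of several items in that list, using a plain dict in place of a Clojure map. The `account` record is hypothetical, and hashing is left out because Python dicts, unlike Clojure maps, are not hashable.

```python
import copy
import json

account = {"id": 7, "owner": "Ada", "tags": ["vip"]}

# Getter and "setter" (a new map rather than in-place mutation).
owner = account["owner"]
renamed = {**account, "owner": "Grace"}

# Equality, iteration, cloning and serialization need no per-type code.
equal = account == {"id": 7, "owner": "Ada", "tags": ["vip"]}
fields = list(account)
clone = copy.deepcopy(account)
round_trip = json.loads(json.dumps(account))
```

None of these lines had to be written specifically for the account entity, which is the saving being described.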

Reflection

There are a lot of features that are easier to code in the general case. In DOP, that means we are coding at the level of the data structure. In Java, it would mean coding at the level of the class. However, in normal Java code, Classes, which describe objects, are not first-class. You have to go through a complex reflection API. Because they’re hard to do in a general way, you have to do them for each specific case. In DOP, we can do them once and use them for all data structures they apply to.

A non-comprehensive list of things that we do in Clojure that would require reflection in Java:

listing fields
diffing
merging two entities of the same type
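The three operations listed above can each be written once over plain maps. This Python sketch uses hypothetical helper names (`list_fields`, `diff`, `merge`); the merge is only loosely modeled on Clojure's `merge`, where right-hand values win.

```python
def list_fields(entity):
    """Return the field names of any map, sorted for stable output."""
    return sorted(entity)


def diff(old, new):
    """Return keys removed, added, and changed between two maps."""
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def merge(left, right):
    """Combine two maps; values from the right-hand map win."""
    return {**left, **right}


v1 = {"name": "Ada", "email": "ada@example.com", "plan": "free"}
v2 = {"name": "Ada", "plan": "pro", "seats": 3}
changes = diff(v1, v2)
merged = merge(v1, v2)
```

The same three functions would work unchanged on orders, invoices, or any other entity represented as a map.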

I believe the fluidity with which we program using DOP is not well-understood. We program at the entity level and at the map level. We often move between them fluidly and don’t realize it.

Open world assumption

The open-world assumption is critical for making systems that are resilient to change. In Clojure, it means that we assume there may be keys that any particular piece of code does not recognize, and that the code should keep going anyway. Likewise, a missing key can often be given a default or some other workaround. This is vital for forward and backward compatibility.

This is very hard to do in Java. Your code won’t compile if you access a field that does not exist.
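A hedged Python sketch of code written under the open-world assumption. The key names (`customer/name`, `customer/country`, `order/gift-note`, `order/paid`) and both functions are invented for illustration.

```python
def shipping_label(order):
    """Use the keys we know, default a missing one, ignore the rest."""
    name = order["customer/name"]
    country = order.get("customer/country", "unknown")
    return f"{name} ({country})"


def mark_paid(order):
    """Add one key and pass every other key through untouched."""
    return {**order, "order/paid": True}


# A newer producer added "order/gift-note"; older code keeps working.
incoming = {"customer/name": "Ada", "order/gift-note": "Enjoy!"}
label = shipping_label(incoming)
paid = mark_paid(incoming)
```

Neither function breaks on the unknown key, and `mark_paid` preserves it for whatever downstream code does understand it.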

Less is needed

There’s a lot of stuff you just don’t need when doing DOP:

Names for classes
synchronized keyword (since everything is immutable)
type hierarchy shenanigans

Standard API

Clojure gives us a huge library of operations over its data structures. These implement many common algorithms and they make working with data very nice. Imagine having to do a join between every combination of two classes out of ten possible classes. Each pair would have to be custom coded. In Clojure, there’s a function that just does it. Even if there weren’t, you could write it yourself very easily.
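To show how little code a generic join needs once entities are plain maps, here is a Python sketch. The post does not name the Clojure function it has in mind (it is presumably `clojure.set/join`, which is an assumption on my part); the `join` below is a hypothetical single-key version, not a port of it.

```python
def join(left, right, key):
    """Natural join of two lists of maps on one shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[key], [])
    ]


users = [{"user_id": 1, "name": "Ada"}, {"user_id": 2, "name": "Grace"}]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "lamp"},
]
joined = join(users, orders, "user_id")
```

The same function joins any two collections of maps that share a key, so no per-pair code is required.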

The other side is that you don’t have to learn new APIs. In Java, each library’s API contains a number of classes, each with custom methods with different semantics. The classic example is the Java Servlets API. Look at the Request and Response classes’ Javadocs. It’s amazing that Ring translates those into two hash maps with (I believe) zero information loss. All you have to do is learn the keys and value types to expect and you have everything you need.
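The “request and response as two maps” idea can be sketched in Python as follows. The key names are loosely modeled on the ones Ring uses (request method, URI, headers, body, status), but this is not Ring's API; the `handler` function and the `x-name` header are hypothetical.

```python
def handler(request):
    """Take a request map and return a response map."""
    if request["request-method"] == "get" and request["uri"] == "/hello":
        name = request.get("headers", {}).get("x-name", "world")
        return {
            "status": 200,
            "headers": {"content-type": "text/plain"},
            "body": f"Hello, {name}!",
        }
    return {"status": 404, "headers": {}, "body": "Not found"}


response = handler({"request-method": "get", "uri": "/hello", "headers": {}})
```

Testing such a handler needs no server or mock objects, since both input and output are ordinary maps.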

Modeling

Modeling is a complicated activity, but suffice it to say that Clojure’s data structures give you everything you need (basically, product and sum types).

Data anyway

Some problems need to be moved into data anyway. For example, you might hard code the T-Shirt sizes into an Enum at first. But if they change every day, you have to move it into data.

DOP obviously starts with the data. However, a lot of stuff doesn’t need to be in data. If the T-shirt sizes aren’t changing, coding them as an Enum gives you a lot of benefits. For one, you get static type checks on the values. Java’s tendency is to prefer static encoding and to move only reluctantly toward data encoding.
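The two encodings of T-shirt sizes might look like this in Python. It is only a sketch: Python's `Enum` is checked at runtime rather than statically as in Java, and `Size`, `size_data`, and `is_valid_size` are hypothetical names.

```python
from enum import Enum


# Static encoding: sizes are fixed in code and referred to by name.
class Size(Enum):
    S = "S"
    M = "M"
    L = "L"


# Data encoding: sizes are loaded at runtime (a literal stands in for a
# file or database here) and can change without touching the code.
size_data = ["S", "M", "L", "XL"]


def is_valid_size(size, allowed):
    """Check a size against whatever list of sizes is current."""
    return size in allowed


enum_values = [member.value for member in Size]
```

Adding “XL” to the data version is an edit to data, while adding it to the enum is a code change.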

DOP’s advantage there is only having one kind of encoding: first-class data. You learn it once and you won’t have to change it because it’s already the most dynamic possible. However, you lose the flexibility and power of having multiple ways to encode it.

Lots of problems in the modern world are better solved at the data level. For example, dealing with an unknown JSON API is much better if you don’t have to model it ahead of time. You want to do minimal translation of JSON into the equivalent data types of your language and explore it. Once you get a good idea of how to translate it into the entities you need to work with, the advantage of keeping it as data diminishes. However, you are probably going to convert it back to data anyway, so it may be good just to leave it as data. That way, you only have to learn one way to work, which is the most powerful one anyway.
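As an illustration of exploring an unfamiliar JSON payload without modeling it first, here is a short Python sketch. The payload and the `paths` helper are made up for the example.

```python
import json

payload = '{"user": {"id": 7, "langs": ["clj", "py"]}, "beta": true}'
data = json.loads(payload)


def paths(value, prefix=()):
    """Yield (path, leaf) pairs for every leaf in nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from paths(child, prefix + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from paths(child, prefix + (index,))
    else:
        yield prefix, value


# A flat map from path to leaf value gives a quick picture of the shape.
shape = dict(paths(data))
```

No classes were declared up front; the generic traversal works on whatever the API happens to return.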

What do you think? I’m happy to discuss any of these points further.

Rock on!
Eric

system · October 1, 2021, 6:38am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.