Structuring a program that returns a lot of information in a nested Map

jayden · April 26, 2021, 4:48pm

Hi Everybody,

I’m in the process of writing two apps: one is supposed to transform textual input and simply output a large nested Map (just from that input), and the second will analyze the state of an application by executing API calls and again return a large nested Map.

I’m being vague in the description since I don’t want to focus on that specific use case. I’d like to kindly ask about structuring an application that returns a map of maps.

I tried to do something like this a year ago in Elixir, and to my surprise, if your functions “know” the internal structure of those data structures, you can easily get into trouble even in a functional programming language with immutable data structures.

I’m sure this is not a surprise for you, experienced developers: if you have a lot of functions that “understand” keys in nested maps, it’s very annoying to change or maintain anything later, since you’ll need a lot of time to understand the monstrosity you created a year ago. Not to mention that once you start changing things it’s basically refactoring, and a lot of the tests you wrote will also need to change, which means you won’t be able to use them to verify that you didn’t break something. I think the official term for this is spaghetti code.

The way I solved this problem in Elixir was to make sure that functions in one namespace (using Clojure terms) understand only keys in a Map that was also defined in that namespace. In other words, I minimized wiring between namespaces.

My first trial (v1.0) was the following:

You can find this and following code in: GitHub - jaydenmcconnell/data-structure-in-clojure

In each namespace the main function is simply called get to highlight that that’s the main one. When you call fruitbasket.analytics/get the function will get all data using simple functions in the same namespace or using get functions in child namespaces - for example by fruitbasket.analytics.fruit.apple/get.

To minimize wiring, get will call only the get functions in child namespaces. Nothing else in those child namespaces will be called.

fruitbasket.analytics.fruit.apple/get can be almost as complicated as fruitbasket.analytics/get since it can call get in its child namespaces. And so on…

(ns fruitbasket.analytics
  (:require [fruitbasket.analytics.fruit.apple :as apple]
            [fruitbasket.analytics.fruit.kiwi :as kiwi]
            [fruitbasket.analytics.fruit.pear :as pear]))

(defn get-something [data]
  (count data))

;; opts - idea to use "merge" taken from https://clojureverse.org/t/opts-concept-map-or-vector-with-additional-parameters-as-last-function-argument/7554
(def opts-default
  {:apple-color :red
   :kiwi-color :brown
   :pear-color :yellow})

(defn get [data & {:as opts}]
  (let [;; opts merging
        opts (merge opts-default opts)
        ;; apple functions REMOVES analyzed data, leaving just unanalysed
        {:keys [apple, unanalysed-data]} (apple/get data opts)
        ;; the other functions are simpler - they don't remove analyzed data
        kiwi (kiwi/get unanalysed-data opts)
        pear (pear/get unanalysed-data opts)
        ;; and something simple that is just in this namespace
        something (get-something unanalysed-data)]
    {:apple apple
     :kiwi kiwi
     :pear pear
     :something something}))

(ns fruitbasket.analytics.fruit.apple
  (:require [clojure.string :as str]))

(defn get [data opts]
  ;; just a silly example that some analytics functions removes
  ;;   analyzed data so subsequent analytics functions have easier job
  (let [[unanalysed-data, color] (if (str/includes? data "GreenApple")
                                   [(str/replace data "GreenApple" "") , :green]
                                   [data, (:apple-color opts)])]
    ;; returning result AND rest of data (unanalyzed data)
    {:unanalysed-data unanalysed-data
     :apple {:color color
             :count 3}}))

(ns fruitbasket.analytics.fruit.kiwi)

(defn get [data opts]
  ;; just stupid example
  {:color (:kiwi-color opts)
   :count (count data)})

(ns fruitbasket.analytics.fruit.pear)

(defn get [data opts]
  ;; just stupid example
  {:color (:pear-color opts)
   :count (count data)})

and run:

(require 'fruitbasket.analytics)
(fruitbasket.analytics/get "GreenApple BrownKiwi YellowPear")
;; =>
;;   {:apple {:color :green, :count 3}
;;    :kiwi {:color :brown, :count 21}
;;    :pear {:color :yellow, :count 21}
;;    :something 21}

With such encapsulation (in my opinion) it’s easy to build a complicated app that is gathering information that is then returned as one large nested map.

The only problem is that this works as a first version, but my plan for the next version is to limit the amount of gathered/analyzed data by specifying what you actually want.

My second trial (v1.1) was the following:

The idea is that all fields are added with assoc by add-missing-x functions. They can do something simple or call the get function in a child namespace (only the get function, nothing else).

(ns fruitbasket.analytics
  (:require [fruitbasket.analytics.fruit.apple :as apple]
            [fruitbasket.analytics.fruit.kiwi :as kiwi]
            [fruitbasket.analytics.fruit.pear :as pear]))

(defn add-missing-apple [structured data opts]
  ;; apple function is more complicated than other
  ;;   since it REMOVES analyzed data, leaving just unanalysed
  (let [{:keys [apple, unanalysed-data]} (apple/get data opts)]
    {:unanalysed-data unanalysed-data
     :structured (assoc structured :apple apple)}))

(defn add-missing-kiwi [structured data opts]
  (assoc structured
         :kiwi (kiwi/get data opts)))

(defn add-missing-pear [structured data opts]
  (assoc structured
         :pear (pear/get data opts)))

(defn add-missing-something [structured data]
  ;; and something simple that is just handled in this namespace
  (assoc structured
         :something (count data)))

;; opts - idea to use "merge" taken from https://clojureverse.org/t/opts-concept-map-or-vector-with-additional-parameters-as-last-function-argument/7554
(def opts-default
  {:apple-color :red
   :kiwi-color :brown
   :pear-color :yellow})

(defn get [data & {:as opts}]
  (let [;; opts merging
        opts (merge opts-default opts)
        ;; apple functions REMOVES analyzed data, leaving just unanalysed
        {:keys [structured, unanalysed-data]} (add-missing-apple {} data opts)]
    ;; the other functions are simpler - they don't remove analyzed data
    (-> structured
        (add-missing-kiwi unanalysed-data opts)
        ;; get this later:
        ;;   (add-missing-pear unanalysed-data opts)
        (add-missing-something unanalysed-data))))

(ns fruitbasket.analytics.fruit.apple
  (:require [clojure.string :as str]))

(defn get [data opts]
  ;; just a silly example that some analytics functions removes
  ;;   analyzed data so subsequent analytics functions have easier job
  (let [[unanalysed-data, color] (if (str/includes? data "GreenApple")
                                   [(str/replace data "GreenApple" "") , :green]
                                   [data, (:apple-color opts)])]
    ;; returning result AND rest of data (unanalyzed data)
    {:unanalysed-data unanalysed-data
     :apple {:color color
             :count 3}}))

(ns fruitbasket.analytics.fruit.kiwi)

(defn get [data opts]
  ;; just stupid example
  {:color (:kiwi-color opts)
   :count (count data)})

(ns fruitbasket.analytics.fruit.pear)

(defn get [data opts]
  ;; just stupid example
  {:color (:pear-color opts)
   :count (count data)})

and run:

(require 'fruitbasket.analytics)
(fruitbasket.analytics/get "GreenApple BrownKiwi YellowPear")
;; =>
;;   {:apple {:color :green, :count 3}
;;    :kiwi {:color :brown, :count 21}
;;    :something 21}

That gives me two benefits:

1. In the second version the main function might take an argument (for example a set #{:x :y :z}) that specifies what to get, and therefore which add-missing-x functions to call.
2. After the Map is created by the main get function, it is possible to call a specific add-missing-x to add a field that was not needed at first but is required now.

(fruitbasket.analytics/add-missing-pear
 {:apple {:color :green, :count 3}
  :kiwi {:color :brown, :count 21}
  :something 21}
 "abcdef" {})
;; =>
;;   {:apple {:color :green, :count 3}
;;    :kiwi {:color :brown, :count 21}
;;    :something 21
;;    :pear {:color nil, :count 6}}
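The first benefit (a set that selects which add-missing-x functions run) could be sketched roughly as below. This is only an illustration: the adders lookup table and get-selected are hypothetical names, the two add-missing functions are simplified stand-ins for the v1.1 ones, and the apple step is left out because it also returns unanalysed data and so does not share the same signature.

```clojure
;; Simplified stand-ins for the add-missing-x functions from v1.1.
(defn add-missing-kiwi [structured data opts]
  (assoc structured :kiwi {:color (:kiwi-color opts) :count (count data)}))

(defn add-missing-pear [structured data opts]
  (assoc structured :pear {:color (:pear-color opts) :count (count data)}))

;; Hypothetical lookup table: requested key -> function that adds it.
(def adders
  {:kiwi add-missing-kiwi
   :pear add-missing-pear})

(defn get-selected
  "Calls only the add-missing-x functions whose keys are in `wanted`."
  [data wanted opts]
  (reduce (fn [structured k]
            (if-let [add (adders k)]
              (add structured data opts)
              structured))          ; unknown keys are ignored in this sketch
          {}
          wanted))

(get-selected "abcdef" #{:kiwi} {:kiwi-color :brown})
;; => {:kiwi {:color :brown, :count 6}}
```

Adding a new field then means adding one function and one entry in the table; get-selected itself stays unchanged.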

I would like to kindly ask, what do you think?

Thank you very much for this community.

Kind regards, Jayden

joinr · April 26, 2021, 6:07pm

I would not overload clojure.core/get, and would instead name the operations for what you are doing specifically. In this case, it looks like you are parsing some input (a string), and accumulating structured data. You have several different parsing rules, as well as corresponding add-missing ones. I think you can accomplish something with similar intent (separation of concerns w.r.t. the parsing rules) with qualified keywords and multimethods. I left the unqualified keys in the resulting maps though; some would probably advocate that you qualify all the keywords, but dealer’s choice:

(ns fruitbasket.analytics
  (:require [clojure.string :as str]))

(defmulti parse (fn [data fruit & opts] fruit))

(defmethod parse :fruit/apple [data fruit opts]
  (let [[unanalysed-data color] (if (str/includes? data "GreenApple")
                                   [(str/replace data "GreenApple" "")  :green]
                                   [data (:apple-color opts)])]
    {:unanalysed-data unanalysed-data
     :apple {:color color :count 3}}))

(defmethod parse :fruit/kiwi [data fruit opts]
  {:color (:kiwi-color opts) :count (count data)})

(defmethod parse :fruit/pear [data fruit opts]
  {:color (:pear-color opts) :count (count data)})

(defmulti add-missing (fn [structure fruit data opts] fruit))

(defmethod add-missing :default [structured fruit  data opts]
  (assoc structured fruit (parse data fruit opts)))

(defmethod add-missing :fruit/apple [structured fruit data opts]
  (let [{:keys [apple, unanalysed-data]} (parse data fruit opts)]
    {:unanalysed-data unanalysed-data
     :structured (assoc structured :apple apple)}))

(defn add-missing-something [structured data]
  ;; and something simple that is just handled in this namespace
  (assoc structured :something (count data)))

;; opts - idea to use "merge" taken from
;; https://clojureverse.org/t/opts-concept-map-or-vector-with-additional-parameters-as-last-function-argument/7554
(def opts-default
  {:apple-color :red
   :kiwi-color :brown
   :pear-color :yellow})

(defmethod parse :fruit/analytics [data fruit & {:as opts}]
  (let [;; opts merging
        opts (merge opts-default opts)
        ;; apple functions REMOVES analyzed data, leaving just unanalysed
        {:keys [structured, unanalysed-data]} (add-missing {} :fruit/apple data opts)]
    ;; the other functions are simpler - they don't remove analyzed data
    (-> structured
        (add-missing :fruit/kiwi unanalysed-data opts)
        (add-missing :fruit/pear unanalysed-data opts)
        (add-missing-something unanalysed-data))))

;;(parse "GreenApple BrownKiwi YellowPear" :fruit/analytics)
;;{:apple {:color :green, :count 3},
;; :fruit/kiwi {:color :brown, :count 21},
;; :fruit/pear {:color :yellow, :count 21},
;; :something 21}

You can trivially disperse these rules as well, with a separate core namespace defining the multimethods and defaults, the other namespaces defining the implementations via defmethod, and then depending on them as needed from other namespaces. I probably wouldn’t go that far though.

Also, commas in clojure are treated as whitespace.
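A quick REPL check of the comma remark, using only core functions:

```clojure
;; Commas are read as whitespace, so these literals are identical to the reader.
(= [1, 2, 3] [1 2 3])                 ;; => true
(= {:a 1, :b 2} {:a 1 :b 2})          ;; => true

;; Destructuring with commas binds exactly the same names.
(let [{:keys [apple, kiwi]} {:apple 1 :kiwi 2}]
  [apple kiwi])                       ;; => [1 2]
```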

Feels like a trend toward a CLOS (Common Lisp Object System) style fusion of method combinations too (which I am not a fan of, but many are), with the interplay between parsing and appending-missing.

jayden · April 26, 2021, 7:07pm

Thank you @joinr

I know that Clojure doesn’t need the commas. But I do need them.

This is very cool.

I wasn’t really thinking about the parsing (something to figure out later), the parsing example was just something to do in that function. But I definitely love your example and I’ll try to use it.

I’ll check that one. I might learn something from it for this app.

Good point. Assuming that I call it get-data, that your parsing is combined with the structure I presented in the second example, and that your parsing rules are named differently than add-missing-x (since that name will be dedicated just to adding/updating key values in a given map), this is what the wiring would look like.

joinr · April 26, 2021, 7:20pm

At the end of the day, the scheme you propose will produce the same result. The question is how much namespace proliferation you need/want as opposed to just encoding the parsing “rules” in data (via the multimethod dispatch or other stuff). Clojure projects tend to either collate a lot of stuff into one namespace, but use data (like the qualified keywords, or even just simple keywords with descriptive names) to partition concerns, or follow a bit more spread-out protocol definition/implementation style. I would personally opt for additional namespaces if there was a lot of convoluted orthogonal parsing logic specific to that implementation. Enough to clear out into another logical area entirely. For a 3-line chunk of code (as with the notional example), I wouldn’t consider it worthwhile to go that route, as it would remind me of Java or other langs with an explosion of source files with tiny content. OTOH, if the implementation crept up toward several hundred lines or more, it would make sense to me to fork off the code into another actual namespace.

If you have a unified/shared get-data multimethod or protocol, the only difference in your proposed wiring would be the need for a common shared namespace the others (analytics, apples, kiwi) could depend on to bring in the multimethod (or protocols in other use cases) and define their implementations. You may have something like namespace.app.core with get-data defined as a multimethod, and then the other namespaces would require that and implement their defmethods (similar setup for protocols if you ever go down that route). You can also - if things are simple enough - do this with plain functions, although you run into the name collisions and redundancy by distributing stuff across multiple namespaces (multimethods and protocols eliminate that).
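A minimal sketch of that layout, assuming everything is evaluated in one REPL session. The fruitbasket.sketch.* namespace names, the two-argument get-data signature and the analyze function are made up for illustration and are not from the thread.

```clojure
;; Hypothetical shared namespace: it owns the multimethod and nothing else.
(ns fruitbasket.sketch.core)

(defmulti get-data
  "Dispatches on the first argument, a keyword naming what to analyze."
  (fn [kind _data] kind))

(defmethod get-data :default [kind _data]
  {:error (str "no get-data implementation for " kind)})

;; Each implementation namespace requires only the shared one.
(ns fruitbasket.sketch.apple
  (:require [fruitbasket.sketch.core :as core]))

(defmethod core/get-data :apple [_ data]
  {:count (count (re-seq #"Apple" data))})

(ns fruitbasket.sketch.kiwi
  (:require [fruitbasket.sketch.core :as core]))

(defmethod core/get-data :kiwi [_ data]
  {:count (count (re-seq #"Kiwi" data))})

;; The caller loads the implementation namespaces only for their side effect
;; of registering methods; it never calls anything in them directly.
(ns fruitbasket.sketch.analytics
  (:require [fruitbasket.sketch.core :as core]
            fruitbasket.sketch.apple
            fruitbasket.sketch.kiwi))

(defn analyze [data]
  {:apple (core/get-data :apple data)
   :kiwi  (core/get-data :kiwi data)})

(analyze "GreenApple BrownKiwi RedApple")
;; => {:apple {:count 2}, :kiwi {:count 1}}
```

Every implementation goes through the same get-data name, so there are no per-namespace function names to collide or to remember.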

There are other options too, like defining concrete types (e.g. using a protocol and implementing it with reify or deftype or defrecord) that correspond to similar parsing rules. It’s getting more into building an AST-like thing though, which I think is less useful here. You could also parse into maps with :types and then define another layer that understands how to incorporate those tokens to build up the structured data in a separate pass. There are multiple options to explore (probably a good project for learning).
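The last option (parse into maps tagged with a :type, then build the structure in a separate pass) might look something like this. It is a sketch under assumptions: tokenize, build and the regular expression are invented here, and real input would need real parsing rules.

```clojure
(require '[clojure.string :as str])

;; Pass 1: turn raw text into flat tokens, each tagged with a :type.
(defn tokenize [text]
  (for [[_ color fruit] (re-seq #"(Green|Red|Brown|Yellow)(Apple|Kiwi|Pear)" text)]
    {:type  (keyword (str/lower-case fruit))
     :color (keyword (str/lower-case color))}))

;; Pass 2: fold the tokens into the nested result. This layer knows about
;; token maps only, not about the text they came from.
(defn build [tokens]
  (reduce (fn [result {kind :type color :color}]
            (-> result
                (update-in [kind :count] (fnil inc 0))
                (update-in [kind :colors] (fnil conj []) color)))
          {}
          tokens))

(build (tokenize "GreenApple BrownKiwi RedApple"))
;; => {:apple {:count 2, :colors [:green :red]},
;;     :kiwi {:count 1, :colors [:brown]}}
```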

jayden · April 26, 2021, 7:54pm

So I didn’t want to get into details, but the first app will be parsing a lot of XML, JSON and unstructured text (logs) from embedded devices (kids call them IoT). Parsing is not really the problem - it’s more about evaluating and expressing the meaning, and that’s totally convoluted.

But the main mistake I made last year was adding stuff. I didn’t realize that the main thing (basket) has child structures (apples) and those have child structures (seeds) and those have child structures… and so on… and that one day I’ll want to know not just about parent systems but also about subsystems and a month later about their subsystems. And the easiest way I figured out in Elixir was new files => deeper namespaces, so I didn’t have to mess with the parent one that will not change anymore.

So yeah, sounds like Java (OOP), but I do want to present the result as a nice map with child maps and child maps, since this is what the system looks like - a lot of systems that have subsystems and they have subsystems, etc…

Cool. But that really reminds me of OOP ;-).

I did have that problem last year. I had one namespace called “global” (but you can really call it “utils” or “helpers”) where I tried to add stuff that was re-used in multiple namespaces.

I haven’t even read about those. I’m thinking that since this is my first real app I should take it easy and just make sure the implementation (naming, wiring) is unified since I’ll 100% need to refactor it once I get more experienced in Clojure.

This was a really great learning project in Elixir last year. And this year in Clojure I’ll go much deeper.

Thanks a lot @joinr

joinr · April 26, 2021, 8:06pm

jayden:

Cool. But that really reminds me of OOP ;-).

There’s no inheritance (technically there “can” be a pseudo-inheritance for dispatch values if you leverage something called hierarchies, but it’s not required out of the box, maybe not even used much in practice). No coupling of method and data. It’s a way to address some of the stuff OOP forces on you (e.g. polymorphism, multiple dispatch) while remaining function-oriented. No different than requiring clojure.string for string operations, or clojure.core/print-method (a multimethod) for printing stuff (or extending your own print methods for custom types).
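For reference, the hierarchy mechanism mentioned above works roughly like this. The keywords and the describe multimethod are invented for the example; derive, isa? and the dispatch behaviour are part of clojure.core.

```clojure
;; Dispatch values can be arranged in an ad hoc hierarchy with derive.
(derive ::green-apple ::apple)
(derive ::red-apple   ::apple)

;; Dispatch on the value under :kind.
(defmulti describe :kind)

;; One method on the parent keyword covers every derived child ...
(defmethod describe ::apple [fruit]
  (str "some apple: " (name (:kind fruit))))

;; ... unless a child has its own, more specific method.
(defmethod describe ::red-apple [_]
  "a red apple")

(describe {:kind ::green-apple})   ;; => "some apple: green-apple"
(describe {:kind ::red-apple})     ;; => "a red apple"
(isa? ::green-apple ::apple)       ;; => true
```

The relationship lives between the keywords, not between types or data, which is why it is only a pseudo-inheritance.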

jayden:

But the main mistake I made last year was adding stuff. I didn’t realize that the main thing (basket) has child structures (apples) and those have child structures (seeds) and those have child structures… and so on… and that one day I’ll want to know not just about parent systems but also about subsystems and a month later about their subsystems. And the easiest way I figured out in Elixir was new files => deeper namespaces, so I didn’t have to mess with the parent one that will not change anymore.

You may find that a relational model, or something like datalog might be useful here. It’s additional stuff to learn, but very flexible if you are dealing with richly nested, poorly understood systems with a lot of rules that you may “discover” over time. Also if you have to mess with “artisanal” data that somewhat conforms with a standard until it doesn’t…Clojure seems to excel at these tasks.
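To give a rough feel for the relational idea without any extra library, here is a sketch using the built-in clojure.set namespace rather than a datalog engine. The basket/apple/seed relations and their keys are made-up sample data.

```clojure
(require '[clojure.set :as set])

;; Flat relations (sets of maps) instead of one deeply nested map.
(def baskets #{{:basket-id 1 :owner "jayden"}})
(def apples  #{{:apple-id 10 :basket-id 1 :color :green}
               {:apple-id 11 :basket-id 1 :color :red}})
(def seeds   #{{:seed-id 100 :apple-id 10 :viable? true}
               {:seed-id 101 :apple-id 11 :viable? false}})

;; Natural joins on the shared keys (:apple-id, then :basket-id),
;; followed by a filter on the joined rows.
(def viable
  (set/select :viable? (set/join baskets (set/join apples seeds))))

;; Keep only the columns the question is about.
(set/project viable [:owner :color])
;; => #{{:owner "jayden", :color :green}}
```

A new level of subsystem becomes one more relation and one more join, and the existing relations do not need to change shape.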

Some libraries to explore:

You can probably get by with Clojure’s built-in stuff, but there may be some compelling solutions in the declarative/logic/rules-based approaches in those libs. If you can define grammars for the input, there’s instaparse, and even clojure.spec can be abused (via conform) for parsing fairly cleanly (see the clojure->org parser gist).
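As a small illustration of conform used for parsing, assuming Clojure 1.9 or later (where clojure.spec.alpha ships with the language). The ::color, ::fruit and ::basket specs are invented for the fruit example and say nothing about the linked gist.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::color #{:green :red :brown :yellow})
(s/def ::fruit #{:apple :kiwi :pear})

;; A basket is a flat sequence of color/fruit pairs.
(s/def ::basket (s/* (s/cat :color ::color :fruit ::fruit)))

;; conform labels each part using the names given to s/cat.
(s/conform ::basket [:green :apple :brown :kiwi])
;; => [{:color :green, :fruit :apple} {:color :brown, :fruit :kiwi}]

;; Input that does not match conforms to the invalid marker.
(s/invalid? (s/conform ::basket [:green :banana]))
;; => true
```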

I am sure the broader community has dealt with this stuff as well and has complementary or superior advice.

didibus · April 26, 2021, 10:33pm

I’m not sure I understand the challenge you’re describing. Starting with a simple way, what’s wrong with this:

(ns fruitbasket.analytics
  (:require [clojure.string :as str]))

(defn analyze-basket-for-apple
  [basket]
  {:color (if (str/includes? basket "GreenApple") :green :red)
   :count 3})

(defn analyze-basket-for-kiwi
  [basket]
  {:color :brown
   :count (count basket)})

(defn analyze-basket-for-pear
  [basket]
  {:color :yellow
   :count (count basket)})

(defn analyze-basket-for-something
  [basket]
  (count basket))

(defn analyze-basket
  [basket]
  {:apple (analyze-basket-for-apple basket)
   :kiwi (analyze-basket-for-kiwi basket)
   :pear (analyze-basket-for-pear basket)
   :something (analyze-basket-for-something basket)})

(analyze-basket "GreenApple BrownKiwi YellowPear")
;; =>
;;    {:apple {:color :green,
;;             :count 3},
;;     :kiwi {:color :brown,
;;            :count 31},
;;     :pear {:color :yellow,
;;            :count 31},
;;     :something 31}

jayden · April 27, 2021, 9:59am

Sounds good, thank you.

Thank you @joinr for all the links and interesting terms to google (datalog). I’ll definitely check it. But I also want to try to implement this without extra help from additional libraries. Based on my experience that’s the worst way when you have deadlines but the best way if you want to learn the language. And I also think that Clojure without libraries will be easier than Elixir without libraries since there is much more stuff in clojure.core than in built-in libraries in Elixir.

Thank you @didibus, you completely and correctly re-implemented my example ;-). The problem is that the example was really meant to show a few core features/rules and not the actual parsing. It was meant to show that I have an actual strategy for how functions call functions in child namespaces and that an add-missing-x function can add a missing key later.

didibus · April 27, 2021, 3:36pm

I guess I’m not sure why you’re doing things the way you did. Right now it seems to me you’ve over complicated things and what you’re doing is really non-idiomatic. That’s why I rewrote your example in a simpler and more idiomatic way. What about the strategy I used would not work for your real problem but does work for your example?

jayden · April 28, 2021, 2:00pm

It’s absolutely possible to use your example and have all functions in a single file and as joinr mentioned it’s not great to have a lot of files with just a few lines so your example is definitely better at the beginning.

The problem I faced before was adding more and more child structures with additional information so if I know I’ll scale it up, it’s better in my opinion to prepare for it and separate Baskets, Apples, Seeds… into their own namespaces.

I noticed that it’s normal for Clojure projects to have very large files (clojure.core has 8000 lines) but I’d much rather split it into several files.

didibus · April 28, 2021, 5:42pm

That choice is up to you, but still, if you do so why not keep it simple like this:

(ns fruitbasket.analytics.apple
  (:require [clojure.string :as str]))

(defn analyze-basket-for-apple
  [basket]
  {:color (if (str/includes? basket "GreenApple") :green :red)
   :count 3})

(ns fruitbasket.analytics.kiwi)

(defn analyze-basket-for-kiwi
  [basket]
  {:color :brown
   :count (count basket)})

(ns fruitbasket.analytics.pear)

(defn analyze-basket-for-pear
  [basket]
  {:color :yellow
   :count (count basket)})

(ns fruitbasket.analytics
  (:require [fruitbasket.analytics.apple :as apple]
            [fruitbasket.analytics.kiwi :as kiwi]
            [fruitbasket.analytics.pear :as pear]))

(defn analyze-basket-for-something
  [basket]
  (count basket))

(defn analyze-basket
  [basket]
  {:apple (apple/analyze-basket-for-apple basket)
   :kiwi (kiwi/analyze-basket-for-kiwi basket)
   :pear (pear/analyze-basket-for-pear basket)
   :something (analyze-basket-for-something basket)})

(analyze-basket "GreenApple BrownKiwi YellowPear")
;; =>
;;    {:apple {:color :green,
;;             :count 3},
;;     :kiwi {:color :brown,
;;            :count 31},
;;     :pear {:color :yellow,
;;            :count 31},
;;     :something 31}

In your v1.0, I didn’t really see what you were trying to achieve with that whole get pattern, it felt a little OO to me, and quite confusing to parse. The namespace doesn’t contain state, calling it just get definitely confused me for a bit, I had to figure out what namespace it was in to understand what it is trying to analyze out of the data.

In your v1.1, I also don’t really find a lot of value in your add-missing pattern. It too confused me for a bit, what is missing, what would these functions do, just felt overcomplicated to me.

I’d follow KISS instead, that’s my 2 cents. Unless you’ve got more complicated use cases in your real program that might warrant some of these patterns, I’m not seeing the need right now.

For example, you want to have the user pick-n-choose what to analyze?

(defn analyze-basket
  ([basket]
   (analyze-basket basket #{:apple :kiwi :pear}))
  ([basket fruits-to-analyze]
   (cond-> {:something (analyze-basket-for-something basket)}
     (fruits-to-analyze :apple)
     (assoc :apple (apple/analyze-basket-for-apple basket))
     (fruits-to-analyze :kiwi)
     (assoc :kiwi (kiwi/analyze-basket-for-kiwi basket))
     (fruits-to-analyze :pear)
     (assoc :pear (pear/analyze-basket-for-pear basket)))))

You want the user to analyze something more after calling the main analyze-basket?

(ns user
  (:require [fruitbasket.analytics :as basket]
            [fruitbasket.analytics.kiwi :as kiwi]))

(def basket "GreenApple BrownKiwi YellowPear")

(-> (basket/analyze-basket basket #{:apple})
    (assoc :kiwi (kiwi/analyze-basket-for-kiwi basket)))

Now, personally though, I think if you break out your namespace into multiple namespaces, I’d have the subnamespaces hold the logic of the subsystem.

So I’d do something more like the main fruitbasket.analytics has a bunch of analyze-basket-for-thing functions. And then if you wanted to analyze a thing further I’d have a subnamespace called fruitbasket.analytics.thing with a function called analyze-thing and some analyze-thing-for-xyz functions. So it would look like this instead:

(ns fruitbasket.analytics.apple
  (:require [clojure.string :as str]))

(defn analyze-for-color
  [analyzed-apple apple]
  (let [color (cond (str/includes? apple "Green") :green
                    (str/includes? apple "Red") :red)]
    (assoc analyzed-apple :color color)))

(defn analyze
  ([apple]
   (analyze apple #{:color}))
  ([apple analyze-for]
   (cond-> {}
     (analyze-for :color)
     (analyze-for-color apple))))

(ns fruitbasket.analytics.kiwi
  (:require [clojure.string :as str]))

(defn analyze-for-color
  [analyzed-kiwi kiwi]
  (let [color (cond (str/includes? kiwi "Yellow") :yellow
                    (str/includes? kiwi "Brown") :brown)]
    (assoc analyzed-kiwi :color color)))

(defn analyze
  ([kiwi]
   (analyze kiwi #{:color}))
  ([kiwi analyze-for]
   (cond-> {}
     (analyze-for :color)
     (analyze-for-color kiwi))))

(ns fruitbasket.analytics
  (:require [fruitbasket.analytics.apple :as apple]
            [fruitbasket.analytics.kiwi :as kiwi]))

(defn analyze-for-apple
  [analyzed-basket basket]
  (let [apples (map first (re-seq #"(GreenApple|RedApple)" basket))]
    (assoc analyzed-basket :apple
           (cond-> {:count (count apples)}
             (seq apples)
             (assoc :apples (map apple/analyze apples))))))

(defn analyze-for-kiwi
  [analyzed-basket basket]
  (let [kiwis (map first (re-seq #"(YellowKiwi|BrownKiwi)" basket))]
    (assoc analyzed-basket :kiwi
           (cond-> {:count (count kiwis)}
             (seq kiwis)
             (assoc :kiwis (map kiwi/analyze kiwis))))))

(defn analyze
  ([basket]
   (analyze basket #{:apple :kiwi}))
  ([basket analyze-for]
   (cond-> {}
     (analyze-for :apple)
     (analyze-for-apple basket)
     (analyze-for :kiwi)
     (analyze-for-kiwi basket))))

(ns user
  (:require [fruitbasket.analytics :as basket]))

(basket/analyze "GreenApple YellowKiwi RedApple")
;; => {:apple {:count 2, :apples ({:color :green} {:color :red})},
;;     :kiwi {:count 1, :kiwis ({:color :yellow})}}

Notice I changed the data structure of the analyzed basket. In my opinion it is better this way: not only does it let you structure your code more logically, it also makes more sense, since there are multiple apples in a basket, and if you want details about each one you need to drill down one more level.

Finally, you might be annoyed by the pattern of analyze-for-... and the need to keep adding more and more cases to cond, so if you want you can take @joinr’s advice, though I think that could be a refactor you do later as well, since the cond isn’t so bad in my opinion and is simpler overall in this case. But if you want, here it is with polymorphism added:
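One practical detail when drilling down into that shape: the per-fruit collections come from map, so they are lazy seqs, not vectors. The sketch below writes the result out by hand (the analyzed var is a stand-in for the real return value) to show what that means for get-in.

```clojure
;; Hand-written stand-in for the value returned by basket/analyze above;
;; (map identity ...) keeps the per-fruit collections as lazy seqs.
(def analyzed
  {:apple {:count 2 :apples (map identity [{:color :green} {:color :red}])}
   :kiwi  {:count 1 :kiwis  (map identity [{:color :yellow}])}})

;; Drilling down one level to the per-apple details:
(map :color (get-in analyzed [:apple :apples]))        ;; => (:green :red)

;; A lazy seq is not indexed, so a numeric path step finds nothing ...
(get-in analyzed [:apple :apples 0 :color])            ;; => nil

;; ... while nth works on any seq:
(:color (nth (get-in analyzed [:apple :apples]) 0))    ;; => :green

;; With a vector (for example from mapv) the numeric step works too:
(get-in (update-in analyzed [:apple :apples] vec)
        [:apple :apples 0 :color])                     ;; => :green
```

If callers are expected to address individual apples by position, building the collection with mapv instead of map would be one way to allow that.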

(ns fruitbasket.analytics.apple
  (:require [clojure.string :as str]))

(defmulti analyze-for (fn [key _ _] key))

(defmethod analyze-for :color
  [_ analyzed-apple apple]
  (let [color (cond (str/includes? apple "Green") :green
                    (str/includes? apple "Red") :red)]
    (assoc analyzed-apple :color color)))

(defn analyze
  ([apple]
   (analyze apple #{:color}))
  ([apple analyze-for-set]
   (reduce (fn [analyzed key]
             (analyze-for key analyzed apple))
           {}
           analyze-for-set)))

(ns fruitbasket.analytics.kiwi
  (:require [clojure.string :as str]))

(defmulti analyze-for (fn [key _ _] key))

(defmethod analyze-for :color
  [_ analyzed-kiwi kiwi]
  (let [color (cond (str/includes? kiwi "Yellow") :yellow
                    (str/includes? kiwi "Brown") :brown)]
    (assoc analyzed-kiwi :color color)))

(defn analyze
  ([kiwi]
   (analyze kiwi #{:color}))
  ([kiwi analyze-for-set]
   (reduce (fn [analyzed key]
             (analyze-for key analyzed kiwi))
           {}
           analyze-for-set)))

(ns fruitbasket.analytics
  (:require [fruitbasket.analytics.apple :as apple]
            [fruitbasket.analytics.kiwi :as kiwi]))

(defmulti analyze-for (fn [key _ _] key))

(defmethod analyze-for :apple
  [_ analyzed-basket basket]
  (let [apples (map first (re-seq #"(GreenApple|RedApple)" basket))]
    (assoc analyzed-basket :apple
           (cond-> {:count (count apples)}
             (seq apples)
             (assoc :apples (map apple/analyze apples))))))

(defmethod analyze-for :kiwi
  [_ analyzed-basket basket]
  (let [kiwis (map first (re-seq #"(YellowKiwi|BrownKiwi)" basket))]
    (assoc analyzed-basket :kiwi
           (cond-> {:count (count kiwis)}
             (seq kiwis)
             (assoc :kiwis (map kiwi/analyze kiwis))))))

(defn analyze
  ([basket]
   (analyze basket #{:apple :kiwi}))
  ([basket analyze-for-set]
   (reduce (fn [analyzed key]
             (analyze-for key analyzed basket))
           {}
           analyze-for-set)))

(ns user
  (:require [fruitbasket.analytics :as basket]))

(basket/analyze "GreenApple YellowKiwi RedApple")
;; => {:apple {:count 2, :apples ({:color :green} {:color :red})},
;;     :kiwi {:count 1, :kiwis ({:color :yellow})}}

jayden · April 30, 2021, 1:49pm

I like that idea, thank you ;-).

I’m not annoyed by the pattern; some kind of repetition will always be there. But I’ll give it a try in real code and see what happens. Thank you @didibus

marciol · May 1, 2021, 12:08am

Maybe Pathom can help you: Introduction | Pathom

jayden · May 6, 2021, 9:29pm

didibus:

(defn analyze-basket
  ([basket]
   (analyze-basket basket #{:apple :kiwi :pear}))
  ([basket fruits-to-analyze]
   (cond-> {:something (analyze-basket-for-something basket)}
     (fruits-to-analyze :apple)
     (assoc :apple (apple/analyze-basket-for-apple basket))
     (fruits-to-analyze :kiwi)
     (assoc :kiwi (kiwi/analyze-basket-for-kiwi basket))
     (fruits-to-analyze :pear)
     (assoc :pear (pear/analyze-basket-for-pear basket)))))

I had more time to play with it and this is the best solution @didibus . Thank you.

The only change I made is moving assoc from the main function (analyze-basket) to the analyze-basket-for-x functions. The reason is that some of those functions need to return unanalyzed data. In other words some analyze-basket-for-x functions remove data that they analyze to make it easier for subsequently called functions.

(ns app.eat.my.data)

(defn half-text [text]
  (subs text (quot (count text) 2)))

;; this is probably in a different namespace
(defn analyze-basket-for-apple
  [{:keys [unprocessed_text result]}]
  ;; half-text is just a stupid example of removing already processed input
  {:unprocessed_text (half-text unprocessed_text)
   :result (assoc result
                  :some 123
                  :info 456)})

;; this is probably in a different namespace
(defn analyze-basket-for-kiwi
  [{:keys [unprocessed_text result]}]
  {:unprocessed_text (half-text unprocessed_text)
   :result (assoc result
                  :for 789
                  :you 111)})

;; this is probably in a different namespace
(defn analyze-basket-for-pear
  [{:keys [unprocessed_text result]}]
  {:unprocessed_text (half-text unprocessed_text)
   :result (assoc result
                  :here 222
                  :andthere 333)})

(defn process_all
  ([text] (process_all text #{:apple :kiwi :pear}))
  ([text analyze_for]
   (cond->
    {:unprocessed_text text :result {}}
     (analyze_for :apple) (analyze-basket-for-apple)
     (analyze_for :kiwi) (analyze-basket-for-kiwi)
     (analyze_for :pear) (analyze-basket-for-pear))))

(process_all "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam erat justo, sollicitudin at luctus in")
;; => {:unprocessed_text " at luctus in", :result {:some 123, :info 456, :for 789, :you 111, :here 222, :andthere 333}}

I was thinking about making it even simpler by flattening the {:unprocessed_text x :result map} map into a single map with the result keys plus an :unprocessed_text key that can be (but does not have to be) dissoc’ed at the end.

But I like the clear separation of something that is read/analyzed (value in :unprocessed_text) and something where the result is saved to/added (map in :result key).
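For comparison, the flattened variant described above might look roughly like this. It is a sketch only: process-all-flat is a hypothetical name, and the two steps are cut-down copies of the functions shown earlier.

```clojure
(defn half-text [text]
  (subs text (quot (count text) 2)))

;; Flattened variant: result keys and :unprocessed_text live in one map.
(defn analyze-basket-for-apple [state]
  (-> state
      (update :unprocessed_text half-text)
      (assoc :some 123 :info 456)))

(defn analyze-basket-for-kiwi [state]
  (-> state
      (update :unprocessed_text half-text)
      (assoc :for 789 :you 111)))

(defn process-all-flat
  ([text] (process-all-flat text #{:apple :kiwi}))
  ([text analyze-for]
   (-> (cond-> {:unprocessed_text text}
         (analyze-for :apple) analyze-basket-for-apple
         (analyze-for :kiwi)  analyze-basket-for-kiwi)
       ;; drop the working key once every step has run
       (dissoc :unprocessed_text))))

(process-all-flat "abcdefgh")
;; => {:some 123, :info 456, :for 789, :you 111}
```

The cost of flattening is that a result key could collide with :unprocessed_text, and a step can no longer tell its input apart from earlier results just by looking at the map shape.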

This separation makes it easy to understand / maintain / test, since it is possible to mention in comments / docstrings that, for example, analyze-basket-for-pear is a pure function that uses only the value in :unprocessed_text.
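A sketch of what such a docstring and test could look like with clojure.test. The test name and the sample input are made up; the function body is the same as the one above.

```clojure
(require '[clojure.test :refer [deftest is run-tests]])

(defn half-text [text]
  (subs text (quot (count text) 2)))

(defn analyze-basket-for-pear
  "Pure: reads only :unprocessed_text and adds its keys to :result."
  [{:keys [unprocessed_text result]}]
  {:unprocessed_text (half-text unprocessed_text)
   :result (assoc result :here 222 :andthere 333)})

(deftest pear-is-pure
  (let [input {:unprocessed_text "abcd" :result {:some 123}}]
    ;; same input, same output
    (is (= (analyze-basket-for-pear input) (analyze-basket-for-pear input)))
    ;; earlier results are kept, new keys are added
    (is (= {:some 123 :here 222 :andthere 333}
           (:result (analyze-basket-for-pear input))))
    ;; the consumed half of the text is gone
    (is (= "cd" (:unprocessed_text (analyze-basket-for-pear input))))))

(run-tests)
```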

Btw. the combination of a set in analyze_for with (cond-> makes it really nice and simple. Thank you @didibus !
