Is this proper coding style for a Clojure function?

saramsey · January 22, 2021, 6:03am

I’m a beginner and I’m wondering how to write a function in the proper Clojure style. Somehow I ended up writing a function that is basically one let statement with a long chain of bindings and then a single expression. From a coding style standpoint, is there a better way to structure such a function?

(s/defn read-kg-json-file-and-construct-indexed-kg :- type-kg-with-indexes
  [json-kg-filename :- String]

  (let
      [json-kg (json/read-str (slurp json-kg-filename))
       build-map-keys (map :k (keys type-json-kg-build-map))
       edge-map-keys (map :k (keys type-json-kg-edge-map))
       convert-map-to-edge-record (make-convert-map-to-edge-record edge-map-keys)
       convert-map-to-build-record (make-convert-map-to-build-record build-map-keys)
       kg-nodes (into [] (map map-to-node-record (json-kg "nodes")))
       kg-build (convert-map-to-build-record (json-kg "build"))
       construct-node-index (make-func-construct-index kg-nodes)
       node-indexes-map (make-indexes-map node-indexes-functions construct-node-index)
       node-index-curie (node-indexes-map :id)
       kg-edges (json-edges-to-edge-record-vector (json-kg "edges") convert-map-to-edge-record node-index-curie)
       construct-edge-index (make-func-construct-index kg-edges)
       edge-indexes-map (make-indexes-map edge-indexes-functions construct-edge-index)
       kg-neighbors-list (map make-edge-map-from-vector
                              (map conj
                                   (map make-vector-from-edge-map kg-edges)
                                   (range (count kg-edges))))
       kg-neighbors-in (group-by :object kg-neighbors-list)
       kg-neighbors-out (group-by :subject kg-neighbors-list)
       neighbors  {:in kg-neighbors-in
                   :out kg-neighbors-out
                   :both (merge-with into kg-neighbors-out kg-neighbors-in)}]

  {:build kg-build
   :nodes kg-nodes
   :edges kg-edges
   :node-indexes-map node-indexes-map
   :edge-indexes-map edge-indexes-map
   :neighbors neighbors}
  ))

mvarela · January 22, 2021, 7:18am

I don’t see anything particularly egregious about it. It is verbose, but that does happen.
I myself my trade some vertical space for a less horizontal sprawl (e.g, where you have 3 nested maps. You could also think about using threading macros to remove some of the nesting), but other than that, it does look pretty good.

saramsey · January 22, 2021, 7:28am

Thank you, this kind of feedback is so helpful for a beginner. I will read up on threading macros.

PEZ · January 22, 2021, 7:57am

I agree. Since some of the values calculated early go into the result, this way of stepwise building the result is perfectly fine.

That said, I would probably try to break it apart into some separate functions that each get the ”same” map to work on as it accumulates the needed data. It’s easier to test small nicely named functions in the REPL than a long chain of let bindings.

A pattern I find myself often return to (no idea how idiomatic Clojure it is) is to start with a map and thread it through a pipeline that in turn sometimes consists of pipelines. Something. like so:

(defn- work-out-some-more-details [{:keys [input-3 result-1] :as wip}]
  (let [thing (some-fn result-1 input-3)]
    (-> wip
        (assoc :result-2 thing)
        (assoc :result-3 (take 2 (some-module-2/figure-this-out (cycle input-3)))))))

(defn- work-out-some-details [{:keys [input-2 thing-1] :as wip}]
  (-> wip
      (assoc :result-1 (some-module/cook-this input-2 thing-1))))

(defn work-it [{:keys [input-3] :as wip}]
  (let [thing (str input-3 "@" "something-else")]
    (-> wip
        (assoc :thing-1 thing)
        (work-out-some-details)
        (work-out-some-other-details)
        (dissoc :input-1 :input-3 :thing-4))))

(comment
  (work-it {:input-1 "something"
            :input-2 :foo
            :input-3 [2 4 6]})
  ;=> {:input-2 :foo
  ;    :result-1 ,,,
  ;    :result-2 ,,,
  ;    :result-3 ,,,}
  )

Sorry for my lack of imagination when it comes to naming things .

This builds more lines of code, than longer let chains do, but anyway, throwing it in here as an alternative recipe.

joinr · January 22, 2021, 8:07am

The goal I typically aim for is about 20 LOC per function. That seems to be enough pressure to golf the code down and keep the local mental context of a function simple enough to reason about. There are of course exceptions, but it’s been a pretty useful practice. It leads to refactoring “big” stuff into smaller composable functions. It looks like you already did that in this case and you follow the functional style pretty well.

If I had to golf this down a bit, I would look to extract any repeated noise or verbosity into functions or macros where appropriate. Typically functions. I would also reduce the verbosity of the descriptions from the make-this-into-that into this->that, and similarly for constructors, instead of make-a-new-thing into ->a-new-thing. Destructuring can help golf things down and may make it clearing what you’re doing. Also, there may be library functions like map-indexed that do what you want and let you refactor without having to write anything new.

There’s nothing generally wrong with sequential let bindings, although you can try to aim for a more “points free” style where you’re just composing functions (Haskellers do that a bit), but if the intermediate bindings are useful for readability then I’d keep them in. There’s also a style of using anonymous let bindings for imperative stuff, like (let [x 2 _ (println x)] x) which is sometimes more convenient than peppering do while achieving the same result.

So my limited translation is

(defn ->key-converter [m f]
  (->> m keys (map :k) f))

(s/defn read-kg-json-file-and-construct-indexed-kg :- type-kg-with-indexes
  [json-kg-filename :- String]
  (let
      [{:strs [build nodes edges]} (json/read-str (slurp json-kg-filename))
       map->build-record    (->key-converter type-json-kg-build-map ->map->build-record)
       kg-build             (map->build-record build)
       kg-nodes             (into [] (map map-to-node-record nodes))
       construct-node-index (->func-construct-index kg-nodes)
       node-indexes-map     (->indexes-map node-indexes-functions construct-node-index)
       map->edge-record     (->key-converter type-json-kg-edge-map ->map->edge-record)
       kg-edges (json-edges->edge-records edges map->edge-record (node-indexes-map :id))
       edge-indexes-map     (->indexes-map edge-indexes-functions (->func-construct-index kg-edges))
       kg-neighbors-list    (->> kg-edges
                                 (map-indexed (fn [idx edge-map]
                                                (conj (edge-map->vector edge-map) idx)))
                                 (map vector->edge-map))
       [kg-neighbors-in kg-neighbors-out] (map #(group-by % kg-neighbors-list) [:object :subject])]
  {:build kg-build
   :nodes kg-nodes
   :edges kg-edges
   :node-indexes-map node-indexes-map
   :edge-indexes-map edge-indexes-map
   :neighbors {:in kg-neighbors-in
               :out kg-neighbors-out
               :both (merge-with into kg-neighbors-out kg-neighbors-in)}}))

On second reading, it seems like you are using records (and apparently typing via the defn-spec stuff). I find that people coming from statically typed languages immediately reach for records and try to use types asap. I would question “why the need for records instead of plain maps at this juncture?” (they certainly have uses, but are less common than generic persistent hashmaps).

mvarela · January 22, 2021, 8:12am

One thing I like about the intermediate let bindings is that they come quite naturally when you’re building the function in the REPL. Once you’re done, you can of course refactor them into something more compact if it makes sense, but it does give you nice intermediate steps for debugging, etc.

saramsey · January 22, 2021, 5:23pm

Thank you. I’m going to read up on points-free style.

saramsey · January 22, 2021, 5:25pm

Thank you for the example code. Super helpful! I will study the arrow syntax; looks clean and concise.

saramsey · January 22, 2021, 5:26pm

Ah, I get it now! The arrow in map->build-record is a part of the variable name. That’s very useful.

joinr · January 26, 2021, 4:39pm

We tend to lift a lot into names, since Clojure (and lisps in general) is pretty permissive with symbol names given the lack of operators or reserved words. So it’s common to use question marks to indicated predicates, and the convention this->that to indicate transforms or ->thing for a constructor. They’re just names. Similarly, the macros ->, ->>, some->, some->>, as-> etc. are all just naming conventions designed to communicate what’s happening. They aren’t syntax per se (except in the sense that macros extend the language), so there’s no commonality between the -> used there and in the other symbols.

saramsey · January 26, 2021, 8:35pm

Awesome. This opens up all kinds of naming possibilities. Very eye-opening!

system · July 28, 2021, 8:35am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.