The haunting problem of collapsing rowspanned table cells

I will present a problem that haunted me for some time and made me reach for overly complex tools and solutions without really thinking it through.

The problem is to render an HTML table using hiccup syntax, with one reasonable extra requirement. Because the ordinary pattern (into [:table] (map row-renderer data)) offers no obvious way to meet that requirement, I ended up trying to solve the problem in an overly complex manner.

The example data is the population of the four largest cities in Sweden and Norway:

Country  City        Population
Sweden   Stockholm   2308143
Sweden   Gothenburg  1021831
Sweden   Malmö       669471
Sweden   Uppsala     294689
Norway   Oslo        1000467
Norway   Bergen      420000
Norway   Stavanger   222697
Norway   Trondheim   183378

Each row is a tuple of country, city and population.

The table is somewhat hard for a human reader to parse because of the duplication in the first column. A preferable rendering would be:

Country  City        Population
Sweden   Stockholm   2308143
         Gothenburg  1021831
         Malmö       669471
         Uppsala     294689
Norway   Oslo        1000467
         Bergen      420000
         Stavanger   222697
         Trondheim   183378

where the cells “Sweden” and “Norway” should have their rowspan set to 4. The renderer also has to omit the duplicated cells that are covered by the cell of the first occurrence.
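Expressed as hiccup data, the requirement means that only the first row of each country carries the country cell, with `{:rowspan 4}`, while the following rows contain just the city and population cells. A sketch of the target markup for the Swedish rows, assuming one `[:tr ...]` per data row (`expected-sweden-rows` is an illustrative name):

```clojure
;; Illustrative name: the hiccup we want for the Swedish rows.
;; Only the first row has a country cell; its rowspan covers the next three rows.
(def expected-sweden-rows
  [[:tr [:td {:rowspan 4} "Sweden"] [:td "Stockholm"] [:td "2308143"]]
   [:tr [:td "Gothenburg"] [:td "1021831"]]
   [:tr [:td "Malmö"] [:td "669471"]]
   [:tr [:td "Uppsala"] [:td "294689"]]])
```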

For simplicity, the original data is a well-behaved vector of vectors, where all rows have the same format:

(def data [["Sweden" "Stockholm"  "2308143"]
           ["Sweden" "Gothenburg" "1021831"]
           ["Sweden" "Malmö"       "669471"]
           ["Sweden" "Uppsala"     "294689"]
           ["Norway" "Oslo"       "1000467"]
           ["Norway" "Bergen"      "420000"]
           ["Norway" "Stavanger"   "222697"]
           ["Norway" "Trondheim"   "183378"]])

Within row-renderer, the common construct (into [:table] (map row-renderer data)) gives no apparent clue on how to “collapse” adjacent duplicate cells into a single cell whose rowspan covers the cells that should not be rendered.
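For reference, applying that construct directly looks like the sketch below (`naive-row-renderer` and `naive-table` are illustrative names). Since each call sees exactly one row, the result is the first, duplicated version of the table:

```clojure
(def data [["Sweden" "Stockholm"  "2308143"]
           ["Sweden" "Gothenburg" "1021831"]
           ["Sweden" "Malmö"       "669471"]
           ["Sweden" "Uppsala"     "294689"]
           ["Norway" "Oslo"       "1000467"]
           ["Norway" "Bergen"      "420000"]
           ["Norway" "Stavanger"   "222697"]
           ["Norway" "Trondheim"   "183378"]])

;; The naive renderer only receives one row, so it cannot know whether
;; its country cell should span several rows or be left out.
(defn naive-row-renderer [[country city population]]
  [:tr [:td country] [:td city] [:td population]])

;; Every row gets its own country cell, duplicates included.
(def naive-table
  (into [:table] (map naive-row-renderer data)))
```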

Some important insights
The map function gives row-renderer only one data item at a time. In this case, an individual row simply does not contain enough information for the function to return the required result.

Another important insight is that row-renderer would essentially have to “look ahead” in the whole data sequence to know which rowspan to use for the first cell of a group. It would also have to remember which rowspan was last set, and therefore which upcoming cells are already covered and should be omitted from the table altogether.

Sequential processing is of great importance in Clojure, but rendering this table in a single sequential pass over the data is complex, if at all possible, and not a goal in itself.

Calculate the context via index arithmetic on the dataset

My initial take on this problem was to give the row-renderer function the whole data set as a vector and map over the rows together with their indices, like so: (map-indexed (partial row-renderer data-set) data-set). The function now has access to the whole data set via index arithmetic, and the problem can be solved.

;; full-data-set is supplied by partial; idx and curr-row by map-indexed
(defn row-renderer [full-data-set idx curr-row])

In practice, each cell render still has to find out the current cell's role in the rendering process. Is this cell the first occurrence of its value in the column? Then it needs a rowspan covering all following rows with the same value. Does the cell in the row above contain the same data as the current one? Then don't render this cell, because somewhere above there is a cell whose rowspan covers it.

Being able to introspect the full data set from a given row or cell is helpful, but it leaves us with the frustrating task of implementing a non-intuitive dispatch for rendering individual cells and rows.
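A minimal sketch of this index-walking approach, assuming rows of the same country are adjacent. `count-run` is a hypothetical helper and `indexed-table` an illustrative name. The renderer looks back one row to decide whether its country cell is covered, and looks ahead to count the rowspan; hiccup ignores nil children, so the `when-not` simply leaves the cell out:

```clojure
(def data [["Sweden" "Stockholm"  "2308143"]
           ["Sweden" "Gothenburg" "1021831"]
           ["Sweden" "Malmö"       "669471"]
           ["Sweden" "Uppsala"     "294689"]
           ["Norway" "Oslo"       "1000467"]
           ["Norway" "Bergen"      "420000"]
           ["Norway" "Stavanger"   "222697"]
           ["Norway" "Trondheim"   "183378"]])

;; Hypothetical helper (the "look ahead"): count consecutive rows, starting
;; at idx, whose country equals the country of row idx.
(defn count-run [full-data-set idx]
  (let [country (first (nth full-data-set idx))]
    (count (take-while #(= country (first %)) (drop idx full-data-set)))))

;; full-data-set comes from partial; idx and the row from map-indexed.
(defn row-renderer [full-data-set idx [country city population]]
  ;; Look back one row: the same country means a rowspan above covers this cell.
  (let [covered? (and (pos? idx)
                      (= country (first (nth full-data-set (dec idx)))))]
    [:tr
     (when-not covered?
       [:td {:rowspan (count-run full-data-set idx)} country])
     [:td city]
     [:td population]]))

(def indexed-table
  (into [:table] (map-indexed (partial row-renderer data) data)))
```

Even in this small case, each cell's role depends on neighbouring rows rather than on the row being rendered.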

Apparently our table rendering needs more data at rendering time. It would be very helpful to know, for each cell, which rowspan to use. The default rowspan for a table cell is 1, and we can let a hint of 0 mean that the cell is omitted altogether. (This is only a convention for our hint data: in HTML itself, rowspan="0" means the cell spans all remaining rows of its table section, so the renderer must not emit it as-is.) Our problem would be solved if we added more data, like so:

Country  rowspan  City        Population
Sweden   4        Stockholm   2308143
Sweden   0        Gothenburg  1021831
Sweden   0        Malmö       669471
Sweden   0        Uppsala     294689
Norway   4        Oslo        1000467
Norway   0        Bergen      420000
Norway   0        Stavanger   222697
Norway   0        Trondheim   183378

I prefer not to change the format of the original data. Instead, let's feed the rowspan hints to row-renderer as a separate argument:

(map row-renderer data [4 0 0 0 4 0 0 0])

This gives row-renderer enough data to solve this particular problem in a reasonable way: map walks both collections in lockstep, passing one row and its matching hint to each call.

The calculation of the rowspan hints is left as an exercise to the reader. Given the hints, a row-renderer could be implemented as

(defn row-renderer [[country city population] rowspan-hint]
  [:tr
   ;; A zero hint means the country cell is covered by a rowspan above,
   ;; so emit nothing (hiccup skips nil) rather than rowspan="0".
   (when (pos? rowspan-hint)
     [:td {:rowspan rowspan-hint} country])
   [:td city]
   [:td population]])

The rendering of the table would be

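(defn table [data]
  (let [rowspan-hints (calculate-rowspan-hints data)]
    ;; let returns only its last form, so both sections go into one [:table];
    ;; header cells belong inside a [:tr]
    [:table
     [:thead [:tr [:th "Country"] [:th "City"] [:th "Population"]]]
     (into [:tbody] (map row-renderer data rowspan-hints))]))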
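For completeness, here is one possible answer to the exercise: a separate first pass that produces exactly the hints row-renderer expects. It is a sketch that assumes rows of the same country are adjacent, as they are in `data`, because `partition-by` only groups consecutive equal values:

```clojure
(def data [["Sweden" "Stockholm"  "2308143"]
           ["Sweden" "Gothenburg" "1021831"]
           ["Sweden" "Malmö"       "669471"]
           ["Sweden" "Uppsala"     "294689"]
           ["Norway" "Oslo"       "1000467"]
           ["Norway" "Bergen"      "420000"]
           ["Norway" "Stavanger"   "222697"]
           ["Norway" "Trondheim"   "183378"]])

;; Group adjacent rows by country, then emit the group size for the first
;; row of each group and 0 for every row that group's rowspan covers.
(defn calculate-rowspan-hints [data]
  (mapcat (fn [group]
            (cons (count group) (repeat (dec (count group)) 0)))
          (partition-by first data)))
```

Here `(calculate-rowspan-hints data)` returns `(4 0 0 0 4 0 0 0)`, the same hints that were written out by hand above.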

As you might have suspected, this problem haunted me for some time, and until I wrote this piece I was certain that the index-walking solution was necessary for even slightly more advanced table renderings.

More likely, more complex HTML table renderings are best solved with several passes over the data to be presented: first compute the layout hints, then render.