Clojure ML library - advice on model evaluation

Carsten_Behring · September 27, 2021, 2:39pm

I am working on a new Clojure library for machine learning (‘scicloj.ml’), based on a number of existing libraries (tech.ml, tech.ml.dataset, Smile).

One of the few areas where completely new code is needed is “model evaluation”, including cross-validation (eventually nested) in all its variations.

I would like to discuss some aspects of this with anybody interested and knowledgeable in the subject.

It is not easy to find inspiration for this, as the design of the model evaluation is purely functional (so very different from Python or R).

There is some code showing the current approach in action here: https://scicloj.github.io/scicloj.ml-tutorials/tune-titanic.html

Further explanations and tutorials are here:

I welcome any feedback.

Carsten_Behring · September 27, 2021, 3:01pm

The key question for me is whether the pseudocode below implements so-called nested cross-validation:

ps:    2 hyperparameter configs
folds: 6 folds

-> 12 model evaluations


for p in ps:
    metrics = []
    for fold in folds:
        model      = train(fold.train-data, p)
        prediction = predict(model, fold.test-data)
        metric     = calc-metric(prediction, fold.test-data)
        append metric to metrics
    metric-of-p = mean(metrics)

best-model = the p with the best metric-of-p
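For anyone who wants to run the loop above, here is a minimal sketch of it in plain Clojure. It is only an illustration under stated assumptions: `k-folds`, `train`, `predict`, `calc-metric`, `metric-of-p` and `best-config` are hypothetical names, not the scicloj.ml API, and the "model" is a toy that predicts the training mean scaled by the hyperparameter `p`. It uses a single level of folds, exactly as written above.

```clojure
(defn mean [xs] (/ (reduce + xs) (double (count xs))))

(defn k-folds
  "Hypothetical helper: row i goes to the test set of fold (mod i k)."
  [k rows]
  (for [fold (range k)]
    {:train (vec (keep-indexed (fn [i r] (when (not= fold (mod i k)) r)) rows))
     :test  (vec (keep-indexed (fn [i r] (when (= fold (mod i k)) r)) rows))}))

;; Toy model (an assumption for illustration): a constant prediction,
;; the training mean scaled by the hyperparameter p.
(defn train [train-data p] {:constant (* p (mean train-data))})

(defn predict [model test-data]
  (repeat (count test-data) (:constant model)))

(defn calc-metric
  "Mean absolute error between predictions and targets; lower is better."
  [predictions test-data]
  (mean (map (fn [y-hat y] (Math/abs (double (- y-hat y))))
             predictions test-data)))

(defn metric-of-p
  "Mean metric of config p over all folds (the inner loop of the pseudocode)."
  [folds p]
  (mean (for [{tr :train te :test} folds]
          (calc-metric (predict (train tr p) te) te))))

(defn best-config
  "The p with the lowest mean metric (the outer loop of the pseudocode)."
  [folds ps]
  (apply min-key (fn [p] (metric-of-p folds p)) ps))

;; 2 hyperparameter configs x 6 folds -> 12 model evaluations
(def data  (mapv double (range 1 13)))
(def folds (k-folds 6 data))
(def best  (best-config folds [0.5 1.0]))
```

With this toy data the unscaled mean (`p` = 1.0) has the lower average error, so `best` is 1.0. Passing the folds in as plain data keeps the evaluation purely functional.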

andycraig · October 5, 2021, 10:32am

Hi. Python ML libraries like scikit-learn take an OOP approach, but R is quite a functional language, so there might be some resources there that you can reference. R’s ‘tidymodels’ framework has the function nested_cv() for nested cross-validation. It’s implemented as a map of the inner CV function over the outer CV splits:

https://github.com/tidymodels/rsample/blob/7fcda23740a9f27b14b46c37d4df909877968b18/R/nest.R#L78

    
      
        warning(boot_msg, call. = FALSE)
      }

      inner_cl <- cl[["inside"]]
      if (!is_call(inner_cl))
        stop(
          "`inside` should be a expression such as `vfold()` or ",
          "bootstraps(times = 10)` instead of a existing object.",
          call. = FALSE
        )
      inside <- map(outside$splits, inside_resample, cl = inner_cl)

      out <- dplyr::mutate(outside, inner_resamples = inside)

      out <- add_class(out, cls = "nested_cv")

      attr(out, "outside") <- cl$outside
      attr(out, "inside") <- cl$inside

      out
    }
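The same "map the inner CV over the outer splits" structure can be sketched in plain Clojure. This is a rough illustration, not a port of rsample: `k-folds` and `nested-splits` are hypothetical names, and it assumes the inner resampling is applied to the training portion of each outer split, as nested cross-validation is usually described.

```clojure
(defn k-folds
  "Hypothetical helper: returns k maps {:train [...] :test [...]};
  row i goes to the test set of fold (mod i k)."
  [k rows]
  (for [fold (range k)]
    (let [test? (fn [idx] (= fold (mod idx k)))]
      {:train (vec (keep-indexed (fn [idx row] (when-not (test? idx) row)) rows))
       :test  (vec (keep-indexed (fn [idx row] (when (test? idx) row)) rows))})))

(defn nested-splits
  "Maps the inner split function over the outer splits. Each outer split
  gains :inner-resamples, built only from that split's :train rows."
  [outer-k inner-k rows]
  (map (fn [outer]
         (assoc outer :inner-resamples (k-folds inner-k (:train outer))))
       (k-folds outer-k rows)))

;; 12 rows, 3 outer folds, 2 inner folds per outer fold
(def splits (nested-splits 3 2 (vec (range 12))))
```

Each element of `splits` keeps its outer `:train` and `:test` rows and carries its own inner folds, so the outer test rows never appear in any inner resample.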

Carsten_Behring · October 5, 2021, 10:12pm

Thanks for the link.

I implemented one form of nested_cv here:

https://github.com/scicloj/scicloj.ml-tutorials/blob/train-test-result-change/src/scicloj/ml/nested_cv.clj

(ns scicloj.ml.nested-cv
  (:require [tablecloth.api :as tc]
            [scicloj.metamorph.ml :as ml]
            [scicloj.metamorph.ml.classification :as clf]
            [tech.v3.datatype :as dt]))


(defn nested-cv [data pipelines metric-fn loss-or-accuracy outer-k inner-k]
  ;;  https://www.youtube.com/watch?v=DuDtXtKNpZs
  (let [k-folds (tc/split->seq data :kfold {:k outer-k})]
    (for [{train :train test :test} k-folds]
      (let [inner-k-fold (tc/split->seq test :kfold {:k inner-k})
            evaluation (ml/evaluate-pipelines
                        pipelines
                        inner-k-fold
                        metric-fn
                        loss-or-accuracy)
            fit-ctx (-> evaluation first first :fit-ctx)
            best-pipe-fn (-> evaluation first first :pipe-fn)
            transform-ctx (best-pipe-fn

This file has been truncated.

It implements one form of nested CV.
It does not do model selection, but only estimates the accuracy.
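To illustrate what "only estimates the accuracy" can mean, here is a minimal sketch in plain Clojure. It is not the truncated file above and does not use the scicloj.ml API. All names (`k-folds`, `score`, `cv-loss`, `nested-cv-estimate`, `toy-fns`) are hypothetical, and the toy model (training mean scaled by `p`, scored by mean absolute error) is an assumption for illustration. The config chosen in each outer fold is discarded; only the averaged outer loss is returned.

```clojure
(defn mean [xs] (/ (reduce + xs) (double (count xs))))

(defn k-folds
  "Hypothetical helper: row i goes to the test set of fold (mod i k)."
  [k rows]
  (for [fold (range k)]
    {:train (vec (keep-indexed (fn [i r] (when (not= fold (mod i k)) r)) rows))
     :test  (vec (keep-indexed (fn [i r] (when (= fold (mod i k)) r)) rows))}))

(defn score
  "Trains on train-rows with config p and returns the loss on test-rows."
  [{:keys [train-fn loss-fn]} p train-rows test-rows]
  (loss-fn (train-fn train-rows p) test-rows))

(defn cv-loss
  "Plain k-fold cross-validated loss of config p on rows."
  [fns k p rows]
  (mean (for [{:keys [train test]} (k-folds k rows)]
          (score fns p train test))))

(defn nested-cv-estimate
  "For each outer fold: choose the best p by inner CV on the outer training
  rows, refit on those rows, and score on the outer test rows.
  Returns the mean outer loss; no model or config is selected."
  [fns ps outer-k inner-k rows]
  (mean (for [{:keys [train test]} (k-folds outer-k rows)]
          (let [best-p (apply min-key (fn [p] (cv-loss fns inner-k p train)) ps)]
            (score fns best-p train test)))))

;; Toy model and loss, assumptions for illustration only.
(def toy-fns
  {:train-fn (fn [rows p] (* p (mean rows)))
   :loss-fn  (fn [model rows]
               (mean (map (fn [y] (Math/abs (double (- model y)))) rows)))})

(def data (mapv double (range 1 13)))
(def estimate (nested-cv-estimate toy-fns [0.5 1.0] 3 2 data))
```

With a single candidate config there is nothing to tune in the inner loop, so the estimate reduces to ordinary k-fold cross-validation over the outer folds.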

system · April 6, 2022, 10:12am

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.