Clojure ML library - advice on model evaluation

I am working on a new Clojure library for machine learning (‘scicloj.ml’), built on several existing libraries (tech.ml, tech.ml.dataset, Smile).

One of the few areas where completely new code is needed is “model evaluation”, including cross-validation (eventually nested) in all its variations.

I would like to discuss some aspects of this with anybody interested and knowledgeable in the subject.

It is not easy to find inspiration for this, as the design of the model evaluation is purely functional (so very different than in Python or R).

Some code showing the current design in action is here: https://scicloj.github.io/scicloj.ml-tutorials/tune-titanic.html

Further explanations and tutorials are here:

I welcome any feedback.


The key question for me is whether the pseudocode below implements so-called nested cross-validation:

ps:    2 hyperparameter configs
folds: 6 folds

-> 12 model evaluations

for p in ps:
    metrics = []
    for fold in folds:
        model      = train(fold.train-data, p)
        prediction = predict(model, fold.test-data)
        metrics.append(calc-metric(prediction, fold.test-data))
    metric-of-p[p] = mean(metrics)

best-model = the p which has the best metric-of-p
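
To make the pseudocode above concrete, here is a minimal runnable sketch in Python. Everything in it is an assumption for illustration and not part of scicloj.ml: the toy data, the one-parameter "model" (a slope through the origin shrunk by a penalty p), the helper names `k_folds`, `train`, `predict`, `calc_metric` and `grid_search_cv`, and the use of mean squared error (lower is better) as the metric.

```python
from statistics import mean

def k_folds(n, k):
    """Return k (train_indices, test_indices) pairs over range(n)."""
    folds = []
    for i in range(k):
        test = list(range(i, n, k))
        train = [j for j in range(n) if j % k != i]
        folds.append((train, test))
    return folds

def train(xs, ys, p):
    """Toy model (assumption): slope through the origin, shrunk by penalty p."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + p)

def predict(w, xs):
    return [w * x for x in xs]

def calc_metric(pred, ys):
    """Mean squared error; lower is better."""
    return mean((a - b) ** 2 for a, b in zip(pred, ys))

def grid_search_cv(xs, ys, ps, k):
    """Mean metric per hyperparameter config over k folds, plus the best config."""
    folds = k_folds(len(xs), k)
    metric_of_p = {}
    for p in ps:
        metrics = []
        for tr, te in folds:
            model = train([xs[i] for i in tr], [ys[i] for i in tr], p)
            pred = predict(model, [xs[i] for i in te])
            metrics.append(calc_metric(pred, [ys[i] for i in te]))
        metric_of_p[p] = mean(metrics)
    best_p = min(metric_of_p, key=metric_of_p.get)
    return best_p, metric_of_p

# 2 hyperparameter configs x 6 folds = 12 model evaluations
xs = [float(i) for i in range(1, 13)]
ys = [2.0 * x for x in xs]
best_p, metric_of_p = grid_search_cv(xs, ys, ps=[0.0, 100.0], k=6)
```

With this toy data the unpenalised config fits the line exactly, so `best_p` comes out as `0.0`. Note that the same folds are used both to pick the config and to report its metric.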

Hi, Python ML libraries like scikit-learn take an OOP approach, but R is quite a functional language, so there might be some resources there that you can reference. R’s ‘tidymodels’ framework has the function nested_cv() for nested cross-validation. It’s implemented as a map of the inner CV function over the outer CV splits:
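
The "map the inner CV over the outer splits" idea can be sketched roughly like this in Python. This is not the tidymodels code; the toy data, the shrunken-slope model and the names `k_folds`, `fit`, `mse`, `inner_cv` and `nested_cv` are all assumptions made up for the illustration.

```python
from statistics import mean

def k_folds(indices, k):
    """Split a list of indices into k (train, test) pairs."""
    return [([j for n, j in enumerate(indices) if n % k != i],
             [j for n, j in enumerate(indices) if n % k == i])
            for i in range(k)]

def fit(xs, ys, idx, p):
    """Toy model (assumption): slope through the origin, shrunk by penalty p."""
    return sum(xs[i] * ys[i] for i in idx) / (sum(xs[i] ** 2 for i in idx) + p)

def mse(w, xs, ys, idx):
    """Mean squared error of slope w on the rows in idx; lower is better."""
    return mean((w * xs[i] - ys[i]) ** 2 for i in idx)

def inner_cv(xs, ys, train_idx, ps, k):
    """Pick the best p using only the outer training indices."""
    def score(p):
        return mean(mse(fit(xs, ys, tr, p), xs, ys, te)
                    for tr, te in k_folds(train_idx, k))
    return min(ps, key=score)

def nested_cv(xs, ys, ps, outer_k, inner_k):
    """Map inner_cv over the outer splits; score each winner on held-out data."""
    def evaluate(split):
        outer_train, outer_test = split
        best_p = inner_cv(xs, ys, outer_train, ps, inner_k)
        w = fit(xs, ys, outer_train, best_p)  # refit on the full outer training set
        return mse(w, xs, ys, outer_test)     # outer test rows were never used for tuning
    return list(map(evaluate, k_folds(list(range(len(xs))), outer_k)))

xs = [float(i) for i in range(1, 13)]
ys = [2.0 * x for x in xs]
outer_scores = nested_cv(xs, ys, ps=[0.0, 100.0], outer_k=3, inner_k=2)
estimate = mean(outer_scores)
```

The difference from a single loop over folds is that the hyperparameter choice happens inside each outer training set, and the outer test fold is only used to score the result. The output is one score per outer fold, i.e. an estimate of how well the whole tune-and-fit procedure performs, not one selected model.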


Thanks for the link.

I implemented one form of nested_cv here:

It does not do model selection; it only estimates the accuracy.
