On Mar. 22nd, 2025, we had the second meeting of scicloj-ai-meetups, a new Clojure dev group focusing on AI, continuing the earlier scicloj-llm-meetups group.
7 people attended the meeting.
This time, we focused on building a basic RAG system.
Brief summary
- We coded together.
- We followed a similar path to @Carsten_Behring’s Simple RAG tutorial (which was inspired by Nir Diamant’s Python tutorial).
- We spent some time computing, storing, and retrieving sentence embeddings, and then used the text chunks retrieved through them as context for LLM queries.
- We used langchain4j (Maven artifact here) and @wkok’s openai-clojure.
- (One notable related library that we did not use this time is ragtacts, which also uses langchain4j.)
- Eventually, we applied the RAG technique to @ztellman’s Elements of Clojure book.
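The retrieval step in the summary above can be sketched in plain Clojure, without any library. This is only an illustration of the idea: the three-dimensional vectors are made up by hand (real sentence embeddings come from the model and have hundreds of dimensions), and `cosine-similarity`, `top-k`, and `toy-store` are hypothetical names. As far as we understand, langchain4j's in-memory store also ranks entries by a cosine-based score, but its actual implementation is not shown here.

```clojure
(ns retrieval-sketch)

(defn dot
  "Dot product of two numeric vectors of equal length."
  [a b]
  (reduce + (map * a b)))

(defn norm
  "Euclidean length of a numeric vector."
  [a]
  (Math/sqrt (dot a a)))

(defn cosine-similarity
  "Cosine of the angle between two non-zero vectors of equal length."
  [a b]
  (/ (dot a b) (* (norm a) (norm b))))

(defn top-k
  "Returns the k entries of store most similar to query-vector, best first.
  Each returned entry gets a :score key holding its cosine similarity."
  [store query-vector k]
  (->> store
       (map (fn [{:keys [embedding] :as entry}]
              (assoc entry :score (cosine-similarity embedding query-vector))))
       (sort-by :score >)
       (take k)))

;; Hypothetical hand-made "embeddings", for illustration only.
(def toy-store
  [{:text "today is Saturday"    :embedding [0.9 0.1 0.0]}
   {:text "I'm learning Clojure" :embedding [0.0 0.2 0.9]}
   {:text "What time is it?"     :embedding [0.6 0.6 0.1]}])

;; The two entries closest to a made-up query vector.
(def toy-matches
  (top-k toy-store [1.0 0.2 0.0] 2))

(mapv :text toy-matches)
```

The texts of the best matches are what eventually get passed to the LLM as context; the vectors themselves are only used for ranking.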
Recording
The Scicloj-AI-meetups group
The scicloj-ai-meetups group is a new developer community focused on AI models, their underlying principles, and practical applications.
The group is organized by @stoica94 and @daslu. It emerged from community discussions about the need for more open collaboration and experimentation in AI, and it builds upon our previous initiatives like the scicloj-llm-meetups, with an expanded scope that explores diverse types of AI models beyond just LLMs.
While some content will be specific to the Clojure ecosystem, much of the material will be valuable and relevant to the broader AI and programming communities.
Next events
Follow our events on the Clojure Events Calendar Feed.
Code
To make the following work fully, you will need an API key. See openai-clojure’s instructions.
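A quick way to check the key before running the notebook might look like the sketch below. It assumes the key is exposed through the `OPENAI_API_KEY` environment variable, which is our understanding of openai-clojure's default; please confirm against its instructions. `api-key-status` is a hypothetical helper, not part of any library.

```clojure
(ns api-key-check
  (:require [clojure.string :as str]))

(defn api-key-status
  "Returns :present for a non-blank string and :missing otherwise.
  Hypothetical helper, not part of openai-clojure."
  [value]
  (if (and (string? value) (not (str/blank? value)))
    :present
    :missing))

;; Assumption: the key is provided as the OPENAI_API_KEY environment variable.
(api-key-status (System/getenv "OPENAI_API_KEY"))
```

Everything up to the final chat-completion call runs locally, so only that last step depends on the key.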
deps.edn
{:deps {org.clojure/clojure {:mvn/version "1.12.0"}
        org.scicloj/noj {:mvn/version "2-beta13"}
        dev.langchain4j/langchain4j {:mvn/version "1.0.0-beta2"}
        dev.langchain4j/langchain4j-open-ai {:mvn/version "1.0.0-beta2"}
        dev.langchain4j/langchain4j-embeddings-all-minilm-l6-v2 {:mvn/version "1.0.0-beta2"}
        dev.langchain4j/langchain4j-document-parser-apache-pdfbox {:mvn/version "1.0.0-beta2"}
        net.clojars.wkok/openai-clojure {:mvn/version "0.22.0"}}}
notebooks/rag.clj
(ns rag
  (:require [clojure.java.io :as io]
            [wkok.openai-clojure.api :as api]
            [clojure.string :as str])
  (:import [dev.langchain4j.store.embedding.inmemory InMemoryEmbeddingStore]
           [dev.langchain4j.model.embedding.onnx.allminilml6v2 AllMiniLmL6V2EmbeddingModel]
           [dev.langchain4j.data.segment TextSegment]
           [dev.langchain4j.data.document.parser.apache.pdfbox ApachePdfBoxDocumentParser]
           [dev.langchain4j.data.document.splitter DocumentSplitters]))

;; The embedding model turns text chunks into arrays of numbers.
(def embedding-model (AllMiniLmL6V2EmbeddingModel.))

;; The embedding store stores and retrieves arrays of numbers
;; (and their associated text chunks).
(def embedding-store-1 (InMemoryEmbeddingStore.))

;; Embed a few toy sentences and store each embedding with its text.
(doseq [text ["today is Saturday"
              "yesterday was Friday"
              "tomorrow is Sunday"
              "I'm learning Clojure"
              "What time is it?"]]
  (.add embedding-store-1
        (.content (.embed embedding-model text))
        text))

;; Retrieve the 4 stored sentences closest to a query sentence.
(let [matches (.findRelevant embedding-store-1
                             (.content (.embed embedding-model "Is it Tuesday?"))
                             4)]
  {:type (type matches)
   :count (count matches)
   :embedded (->> matches
                  (map (fn [match]
                         {:score (.score match)
                          :text (.embedded match)})))})

(def embedding-store-2 (InMemoryEmbeddingStore.))

;; Parse the book's PDF into a langchain4j document.
(def document
  (.parse (ApachePdfBoxDocumentParser.)
          (io/input-stream "data/elements_of_clojure.pdf")))

(type (.text document))

;; Split into segments of at most 1000 characters,
;; with up to 200 characters of overlap between them.
(def texts
  (.split (DocumentSplitters/recursive 1000 200)
          document))

(->> texts
     (map (fn [text]
            (.text text)))
     (take 5))

(count texts)

(last texts)

;; Embed all segments in one batch, then store each embedding with its segment.
(def embeddings
  (.embedAll embedding-model texts))

(run!
 (fn [[text-segment embedding]]
   (.add embedding-store-2 embedding text-segment))
 (map vector
      texts
      (.content embeddings)))

;; Retrieve the 4 book segments closest to a query.
(let [matches (.findRelevant embedding-store-2
                             (.content (.embed embedding-model "What planets do you know about?"))
                             4)]
  {:type (type matches)
   :count (count matches)
   :embedded (->> matches
                  (map (fn [match]
                         {:score (.score match)
                          :text (.embedded match)})))})

;; Retrieve 6 segments for the question, put them before the question
;; as context, and send the combined text to the LLM.
(let [question "Would you prefer indirection or abstraction, and why? Please give an example."
      matches (.findRelevant embedding-store-2
                             (.content (.embed embedding-model question))
                             6)
      content (str (->> matches
                        (map (fn [match]
                               (.text (.embedded match))))
                        (str/join "\n"))
                   "\n"
                   question)]
  (api/create-chat-completion
   {:model "gpt-4"
    :messages [{:role "user" :content content}]}))
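
The last step above builds the prompt inline and returns the whole response map. A minimal sketch of the same prompt assembly as a reusable function, together with pulling the answer text out of the response, could look as follows. `build-prompt`, `answer-text`, and `example-response` are hypothetical names, and the response shape `{:choices [{:message {:content ...}}]}` is an assumption about what the chat-completion call returns; no API call is made in this sketch.

```clojure
(ns prompt-sketch
  (:require [clojure.string :as str]))

(defn build-prompt
  "Joins the retrieved chunks with newlines and appends the question,
  mirroring how the notebook assembles the user message."
  [chunks question]
  (str (str/join "\n" chunks) "\n" question))

(defn answer-text
  "Extracts the first choice's message content from a response map.
  Assumes the shape {:choices [{:message {:content ...}}]}."
  [response]
  (-> response :choices first :message :content))

;; Hypothetical stand-in for a real response, for illustration only.
(def example-response
  {:choices [{:message {:role "assistant"
                        :content "An example answer."}}]})

(build-prompt ["chunk one" "chunk two"] "What is indirection?")

(answer-text example-response)
```

Keeping prompt assembly in its own function makes it easy to inspect exactly which context the model receives before spending an API call on it.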