Almost every example of semantic chunking I see uses Python.
Are there any Clojure or Java tools I can use for semantic chunking?
I see a few references to the Stanford NLP library, but these are 13 years old:
stanford-nlp
What is considered modern and current?
I don’t know the answer, but I’d first try searching for “NLP” on Search clojure libs on github and following every link that looks relevant, then following any relevant links in the READMEs of those projects.
Also note that you can use Python libraries from Clojure with libpython-clj (clj-python/libpython-clj on GitHub), which provides Python bindings for Clojure.
Amazing. I did not know that about the bindings.
Does the Java Stanford CoreNLP do what you need?
+1 for the CoreNLP wrapper
You can try this: NLP support with Huggingface tokenizers | djl. It can use some of the Hugging Face models, such as the BERT tokenizer.
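If it helps to see the general shape of semantic chunking on the JVM before picking a library, here is a minimal sketch that uses only the JDK. It splits text into sentences with `java.text.BreakIterator` and starts a new chunk whenever two adjacent sentences look dissimilar. Note the big assumption: the "embedding" here is just a bag-of-words count vector compared by cosine similarity, which is a stand-in for a real embedding model (for example one loaded through DJL or CoreNLP, as suggested above). The class name `SemanticChunker`, its methods, and the threshold value are all hypothetical names made up for this illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not part of any library mentioned in this thread.
final class SemanticChunker {

    // Split text into sentences with the JDK's locale-aware BreakIterator.
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Stand-in for an embedding (assumption): lowercase word counts.
    static Map<String, Integer> vector(String sentence) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : sentence.toLowerCase(Locale.ENGLISH).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity between two sparse count vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            na += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (int v : b.values()) {
            nb += v * v;
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    // Start a new chunk when adjacent sentences fall below the similarity threshold.
    static List<String> chunk(String text, double threshold) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Map<String, Integer> prev = null;
        for (String s : sentences(text)) {
            Map<String, Integer> v = vector(s);
            if (prev != null && cosine(prev, v) < threshold) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(s);
            prev = v;
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Cats chase mice. Cats chase birds. "
                + "Stocks fell sharply. Stocks fell again today.";
        // Prints one chunk per line; the threshold 0.3 is an arbitrary example value.
        chunk(text, 0.3).forEach(System.out::println);
    }
}
```

To move from this sketch to something useful, you would swap `vector` and `cosine` for sentence embeddings from whichever model you settle on. The split-on-similarity-drop loop in `chunk` stays the same.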
1 Like