String functions

I was looking at https://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/StringUtils.html and there are a lot of goodies in there that I could definitely have used at some point instead of rewriting them myself.

If I look at https://clojuredocs.org/clojure.string there are just a few functions in there… nothing like, say, core. Do you think it would make sense to have more?

clojure.string comes included with Clojure, which gets a new release about every year or so, and its developers are very conservative in what they choose to add. You could ask them, but my guess is that they would instead suggest creating a library separate from Clojure, or using Java interop to get all the Java goodies you want.
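
A minimal illustration of the interop route (JVM only, so not portable to ClojureScript): java.lang.String methods can be called directly even when clojure.string has no wrapper for them. The two methods below are just examples picked for illustration.

```clojure
;; Case-insensitive equality: no clojure.string counterpart, so call the Java method.
(.equalsIgnoreCase "Clojure" "CLOJURE")      ;=> true

;; Case-insensitive prefix comparison of a region:
;; (ignoreCase, offset in this string, other string, offset in other, length)
(.regionMatches "Clojure" true 0 "CLO" 0 3)  ;=> true
```

The same style works for org.apache.commons.lang3.StringUtils once commons-lang3 is on the classpath and imported, e.g. `(StringUtils/left "aaa" 7)`.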

That’s exactly what I thought. On the other hand, clojure.string is cross-platform, and string manipulation is darn important, so I was wondering whether this is an area that belongs in the core library…

Try https://github.com/funcool/cuerdas, a string library that can be used in both clj and cljs.

Whether something should or shouldn’t be in the core library is (legitimately) decided by the creators of Clojure. You can try to persuade them to change things if you wish. My semi-educated guess is that they will not be persuaded to change Clojure on this topic.

It probably doesn’t match StringUtils, but https://github.com/expez/superstring contains some goodies, and it works with both Clojure and ClojureScript.

Anybody can import a string library, but being in core means it is part of the language, which is a big plus for the readability of other people’s code.

  • The smaller the language core, the better.

  • Common libraries, such as string handling, would be better as an independent, official standard library.

  • Both .NET and the JVM split their platforms into separate parts, which are then developed iteratively, independently, concurrently, and at high speed.

Can you elaborate? Are you talking in terms of creating a common language and known semantics? Or in terms of discoverability and availability?

I think that having a set of “common idioms” is useful. If I see clojure.string/join in somebody else’s code, I know what it does; if I see weirdlib/join, I can only guess.

Plus, very often, string functions are (apparently…) so easy that you just write your own buggy version of (lefts "aaa" 7), which gets you the first 7 characters without an IndexOutOfBoundsException, so it appears everywhere with small differences.
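
For concreteness, here is one way the hypothetical lefts from the post above could be written. It is a sketch, not a library function: the name and the choice to return "" for negative n are assumptions, and those edge cases (negative n, nil input) are exactly where hand-rolled copies tend to differ.

```clojure
(defn lefts
  "Hypothetical helper (not in clojure.string): returns the first n
  characters of s, or all of s when it is shorter than n. Never throws
  when n exceeds the string length; a negative n yields \"\"."
  [^String s n]
  ;; Clamp n into [0, (count s)] before delegating to subs.
  (subs s 0 (max 0 (min n (count s)))))

(lefts "aaa" 7)     ;=> "aaa"
(lefts "clojure" 3) ;=> "clo"
```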

String manipulation is very important, and it’s used everywhere, so I think it would deserve more TLC :slight_smile:

I’ve been using Cuerdas for years now, at first mainly for CLJS, but now also on CLJ. Simple and pragmatic.

For a function to be included in clojure.string, I think there are two very good arguments: it meets an extremely common need, and a stdlib version can be made portable across dialects. In the words of Rich, “Clojure is a small language, and intends to remain so.” As you can see in this thread, there are plenty of add-on libs, and that is imo good.

When we last expanded clojure.string (in 1.8 via CLJ-1449), I reviewed a large corpus of Clojure code, looking for the most commonly used string interop functions, and narrowed it down to this set (index-of, last-index-of, starts-with?, ends-with?, and includes?), which were both very common and a good target for both CLJ and CLJS.
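
As a quick illustration of what that change bought, each of these portable functions replaces a JVM-only interop call such as `.startsWith`; the example strings are arbitrary.

```clojure
(require '[clojure.string :as str])

;; JVM-only interop, the kind of call the 1.8 additions replace:
(.startsWith "clojure" "clo")      ;=> true

;; Portable clojure.string equivalents (Clojure 1.8+ and ClojureScript):
(str/starts-with? "clojure" "clo") ;=> true
(str/ends-with? "clojure" "ure")   ;=> true
(str/includes? "clojure" "oju")    ;=> true
(str/index-of "clojure" \j)        ;=> 3  (value may be a char or a string)
(str/last-index-of "banana" "an")  ;=> 3
```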

My impression, based on requests and watching forums, is that this really did close 95% of the gap. There is always more gap, of course, but with diminishing returns. That said, if you find particular functions that are both commonly used and great targets for portable implementations, please file a request as a question at https://ask.clojure.org.

The one function I could make a performance argument for is something like non-blank?. blank? is almost always called inside a not, and I think a dedicated function could be more efficient.
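
A sketch of what such a function might look like. non-blank? does not exist in clojure.string; the name comes from the post and the implementation is an assumption. It scans until the first non-whitespace character. Whether it is measurably faster than `(not (str/blank? s))` would need benchmarking.

```clojure
(require '[clojure.string :as str])

;; Today's idiom: negate blank? (or use (complement str/blank?)).
(not (str/blank? "  x ")) ;=> true

(defn non-blank?
  "Hypothetical: true when s is non-nil and contains at least one
  non-whitespace character. Not part of clojure.string."
  [^CharSequence s]
  (if (nil? s)
    false
    (loop [i 0]
      (cond
        (= i (.length s)) false                         ; only whitespace seen
        (Character/isWhitespace (.charAt s i)) (recur (inc i))
        :else true))))                                  ; found a non-whitespace char

(non-blank? "  x ") ;=> true
(non-blank? "   ")  ;=> false
(non-blank? nil)    ;=> false
```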

Hi Alex,
I agree with non-blank?. My personal list of things I always use includes parsing numbers (e.g. a parse-number without interop) and operations from the beginning (and end, using negative offsets) of the string that do not throw exceptions on out-of-range indexes, such as (substring "abc" -1 -7), which would return “ab”.
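
The exact semantics of the example above are open to interpretation. One reading that reproduces `(substring "abc" -1 -7) ;=> "ab"` is sketched below, under three assumptions: negative offsets count back from the end, out-of-range offsets are clamped to the string bounds, and the two offsets may be given in either order. Both `substring` and `resolve-index` are hypothetical names.

```clojure
(defn- resolve-index
  "Hypothetical helper: maps a possibly negative offset onto [0, len].
  Negative offsets count back from the end; out-of-range values are clamped."
  [len i]
  (let [i (if (neg? i) (+ len i) i)]
    (max 0 (min len i))))

(defn substring
  "Hypothetical lenient substring: never throws on out-of-range offsets.
  Both offsets are resolved against the string length and then ordered,
  so (substring \"abc\" -1 -7) and (substring \"abc\" -7 -1) both give \"ab\"."
  [^String s a b]
  (let [len (count s)
        x   (resolve-index len a)
        y   (resolve-index len b)]
    (subs s (min x y) (max x y))))

(substring "abc" -1 -7) ;=> "ab"
(substring "abc" 1 100) ;=> "bc"
```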

The parsing functions do imo meet the criteria above, and there is a ticket for them with pretty good detail on the range of decisions to be made (more than you might think): https://clojure.atlassian.net/browse/CLJ-2451
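
For comparison, a short sketch of the two approaches. On the JVM, parsing usually goes through interop such as `Long/parseLong`, which throws on malformed input. Clojure 1.11 and newer ship `parse-long` and `parse-double` in clojure.core, which return nil instead; the last three lines assume Clojure 1.11+.

```clojure
;; Interop: throws NumberFormatException on malformed input (JVM only).
(Long/parseLong "42") ;=> 42
(try (Long/parseLong "4x2")
     (catch NumberFormatException _ :bad-input)) ;=> :bad-input

;; Clojure 1.11+: core functions that return nil on malformed input.
(parse-long "42")    ;=> 42
(parse-long "4x2")   ;=> nil
(parse-double "1.5") ;=> 1.5
```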

Negative offsets have been requested and declined in the past, so unlikely that’s going to be added.

Why is that? They are super handy for string manipulation. The same goes for string indexes that just don’t blow up, so you don’t have to compute offsets manually from the length.

A couple of reasons. A big one is that the clojure.string fns are designed primarily to leverage Java’s string functions for performance, and Java’s substring does not support negative offsets. Also, string manipulation has analogies to seq manipulation, and negative offsets are just not something we do in other places (for good perf reasons). These are better candidates for external libs.
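
To illustrate the Java constraint: clojure.core/subs is a thin wrapper over java.lang.String#substring, which rejects negative or out-of-range indexes. Any lenient or negative-offset behavior therefore has to be layered on top, which is the kind of thing an external lib can do.

```clojure
(subs "abc" 0 2) ;=> "ab"

;; A negative start index is rejected by String#substring.
;; (The JDK throws StringIndexOutOfBoundsException, a subclass of the one caught here.)
(try
  (subs "abc" -1)
  (catch IndexOutOfBoundsException _
    :out-of-bounds)) ;=> :out-of-bounds
```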