The C# macro system - anyone using both Clojure & C#?

DPiepgrass · July 1, 2021, 6:45pm

A few years ago I developed a modified syntax for C# called Enhanced C#, along with a Lisp-style macro system. It’s a long story, but basically I reformulated the C# language in a way that makes it more friendly to macros. For example, the language defines a new expression that looks like this:

identifier (expression1, expression2) { statements; }

This is used by the unless macro, for instance, to reverse the usual if statement logic:

define (unless ($cond) { $(..actions); })
{
	if (!$cond) {
		$(..actions);
	}
}

unless(authentication_failed) {
    Console.WriteLine($"Logged in as {user.Name}");
}

This compiles to plain C# code that one compiles as usual:

if (!authentication_failed) {
    Console.WriteLine($"Logged in as {user.Name}");
}

Another example of a macro-friendly syntax change is that method parameters don’t need to be variable declarations; they can be any expression whatsoever.

The newest feature, the “macro” macro, makes it much easier to use because you can write a macro in Enhanced C# and immediately call that macro in the same file (just like in all Lisp languages). Previously, writing new macros was a clumsier process that required a separate project to hold them.

Here’s an example using macro. This macro’s job is to convert a UTF-8 string into a byte array, e.g. var hello = stringToBytes("hi!") becomes var hello = new byte[] { (byte) 'h', (byte) 'i', (byte) '!' };

macro stringToBytes($str) {
    var s = (string) str.Value;
    var bytes = Encoding.UTF8.GetBytes(s).Select(
        b => quote((byte) $(LNode.Literal((char) b))));
    return quote(new byte[] { $(..bytes) });
}

This macro uses the quote macro to generate a syntax tree for the cast to (byte) and for the literal characters, created with the LNode.Literal method.

LNode is short for “Loyc node”, a reference to the underlying syntax tree, called a Loyc tree. A Loyc tree is a simple (ish) data structure inspired by the Lisp family of languages. Loyc trees can theoretically represent syntax in any language, and in fact the macro processor (called LeMP) is language-agnostic and currently supports three separate syntaxes: EC#, and LES versions 2 and 3. (To support additional languages I would need volunteers to help, as I’ve got my hands full with two other projects at the moment.) This is cool, but as far as I can tell, it is not possible for a multi-language preprocessor to support a hygienic macro system, so it doesn’t.

A typical use of Enhanced C# is generating sequences of similar but repetitive code. For example, last week I wrote this code to generate some methods:

define isUnsigned($T) {
    $T `staticMatches` uint || $T `staticMatches` ulong;
}

public partial struct JsonWriter {
    // This is a new macro, unreleased; it'll replace the old unroll macro
    ##unroll($T in (int, uint, long, ulong)) {
        public $T Sync(Symbol? name, $T savable) {
            _s.WriteProp(name == null ? "" : name.Name, (long) savable, !isUnsigned($T));
            return savable;
        }
    }
    ...
}

The generated methods look like this:

    public uint Sync(Symbol? name, uint savable) {
        _s.WriteProp(name == null ? "" : name.Name, (long) savable, !(true || false));
        return savable;
    }

Anyway, the reason I dropped by today is that I’ve never actually written a program in Lisp or Clojure; I did everything based on just reading about them. So I’ve been learning things like “controlling macro execution order is hard and not very intuitive”, e.g. I’ve been patching some built-in macros to reverse execution order, and made a macro called #preprocessChild that overrides execution order for specific parameter(s) of other macros.

There are a lot of built-in macros right now, but their naming conventions (and behavior conventions) are not settled and not entirely consistent, because I’ve been a bit indecisive over the past few years about what the rules should be.

So, I wonder if any Clojure developers are (1) experts in using macros and (2) might like to help design, implement or just play with macros in C#. If so, please head on over to the LeMP home page and look around, install the Visual Studio extension, try out the LeMPDemo.exe in the released zip file, or ask me any questions you have.

didibus · July 1, 2021, 6:56pm

FYI, Clojure can also run on the .NET run-time, no need for an enhanced C# when you can just use Clojure as-is :

And a Unity friendly compiler:

didibus · July 1, 2021, 7:07pm

More to your question though, you cannot control macroexpansion order in Clojure; it is always outside in.

So the macro wrapping other macros expands first, and so on.

Because of the Lisp syntax, this is pretty easy to follow visually, since everything is nested:

(foo (bar (baz 1 2 3)))

Assuming all of these are macros, foo will expand first, then bar, then baz.

An inner macro cannot override when it expands, so you cannot have bar expand before foo, that’s just not possible.

Now foo is in charge, and it could decide to have bar expanded before it does any of its own processing.
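
To make the ordering concrete, here is a minimal Clojure sketch. The macros foo, bar and foo-inner-first and the expansion-log atom are hypothetical names made up for illustration; each macro simply records when its expansion function runs.

```clojure
;; Hypothetical macros for illustration; each one logs when it is expanded.
(def expansion-log (atom []))

(defmacro bar [x]
  (swap! expansion-log conj :bar)
  `(inc ~x))

(defmacro foo [form]
  (swap! expansion-log conj :foo)
  `(* 2 ~form))

;; Default behavior: the outer macro expands first, and the inner one is
;; expanded only when the compiler reaches it inside foo's output.
(def outside-in-result (eval `(foo (bar 1))))
(def outside-in-order @expansion-log)   ; [:foo :bar]

;; A parent macro that chooses to expand its argument before doing its own work.
(defmacro foo-inner-first [form]
  (let [expanded (macroexpand form)]    ; runs bar's expansion right here
    (swap! expansion-log conj :foo-inner-first)
    `(* 2 ~expanded)))

(reset! expansion-log [])
(def inner-first-result (eval `(foo-inner-first (bar 1))))
(def inner-first-order @expansion-log)  ; [:bar :foo-inner-first]
```

Both calls evaluate to 4; only the order in which the macro functions run differs, and that order is decided by the outer macro, never by bar.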

DPiepgrass · July 1, 2021, 11:53pm

Thanks @didibus. It’s confusing to me that you can’t control macro execution order. Let’s consider a common thing you might do with LeMP: you use a macro to compute the name of an identifier… but then you want to append something to that identifier to get a final name:

concatId(computeName($T), Suffix)

What is the standard way in Clojure to make computeName run before concatId?

didibus · July 2, 2021, 5:48am

I’m not sure in your case if concatId is a macro or function, but I’ll explain with both.

If concatId is a function, what happens is that there are two passes. When the code is compiled, macros will be expanded outside in. So computeName($T), being the only macro, will run, returning either the code to compute the name or the computed name itself, whatever it does. Let’s say it returns a string:

// source
concatId(computeName($T), Suffix)
// At compile time macros evaluate top to bottom, left to right, outside in.
// Thus after macroexpansion we have:
concatId("some-t", Suffix)

Now at runtime when that code runs, it would just call the concatId function, where the "some-t" and the value of Suffix variable would be passed to it as arguments.

Now if concatId is itself a macro, what happens is a kind of recursive descent from the outermost macro until there are no more.

// source
concatId(computeName($T), Suffix)
// At compile time macros evaluate top to bottom, left to right, outside in.
// Because concatId is the outermost macro, it evaluates first giving us:
computeName($T) + Suffix
// But we are not done, now we macroexpand that returned form as well, starting like I said, top to bottom, left to right, outside in.
// In our case computeName($T) is the left most macro and so we evaluate it and get:
"some-t" + Suffix
// Now let's pretend Suffix was a macro as well, maybe it was Suffix($T), we apply the same evaluation rules for macros until there are no more, thus this time it becomes the next available macro in our ordering rule and so we get:
"some-t" + "-t-suffix"
// Now there are no more macros, so we are done.

And now at runtime "some-t" + "-t-suffix" will execute and return "some-t-t-suffix".

Hope that was clear.

Now as the user of these, you cannot force a different ordering, except by wrapping it all in an outermost macro which will reorder things. So for example, I can’t do:

concatId(expand(computeName($T)), Suffix)

Because in both cases of expand being a function or a macro, it will evaluate after concatId.

The only thing I could do is create a macro of my own that I wrap around it all. Let’s say I have expandFirstArg, which expands the first arg if it’s a macro:

// Now expandFirstArg would run first as it is outermost
expandFirstArg(concatId(computeName($T), Suffix))
// In doing so, it would macroexpand the first arg:
concatId("some-t", Suffix)
// And now this returned form would be further expanded:
"some-t" + Suffix

As you see, the only way to change the order of evaluation for macros or functions is to wrap the code in a macro which changes the evaluation order.
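
In actual Clojure, such a wrapper could look roughly like the sketch below. The names compute-name, concat-id and expand-first-arg are hypothetical stand-ins for the pseudocode above, and this assumes concat-id builds a symbol from its unevaluated arguments.

```clojure
;; Hypothetical stand-ins for computeName / concatId from the pseudocode above.
(defmacro compute-name [t]
  ;; Expands to a string literal at compile time.
  (str "some-" t))

(defmacro concat-id [prefix suffix]
  ;; Expands to a quoted symbol built from its *unevaluated* arguments.
  `'~(symbol (str prefix "-" suffix)))

(defmacro expand-first-arg [[op first-arg & more]]
  ;; Rebuild the wrapped call with its first argument already macroexpanded.
  `(~op ~(macroexpand first-arg) ~@more))

;; Without the wrapper, concat-id sees the raw list (compute-name t),
;; so the symbol's name is built from that list, not from "some-t".
(def unwrapped (concat-id (compute-name t) car))

;; With the wrapper, compute-name is expanded before concat-id runs.
(def wrapped (expand-first-arg (concat-id (compute-name t) car)))
;; wrapped is the symbol some-t-car
```

The wrapper has to be the outermost form; it works only because it gets to rewrite the call before concat-id is ever expanded.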

DPiepgrass · July 2, 2021, 6:31am

concatId concatenates identifiers. Identifiers are a compile-time concept, so it can only be a macro. But Clojure uses the word “symbol” for this concept instead, so you have my apology for not using Clojure terminology. Personally, I’m used to thinking of “identifiers” and “symbols” as two separate concepts because an “identifier” has a location (a line and column in a source file), while a “symbol” does not (at least, that’s how it is in other languages).

So concatId is used for combining two partial names into a complete variable or function name. In concatId(computeName($T), Suffix), running concatId first doesn’t make sense; it’s a type error, like saying “multiply the string by 3.14 and then convert it to floating-point” or “multiply (1 + 2) by 5 and then add (1 + 2)”.

There must be a common technique in Clojure for inverting macro evaluation order. It’s an operation too important for it not to be possible, but Google isn’t really helping me find it. Edit: Are you saying that macroexpand could be used for this? Could the concatId macro (if it were a Clojure macro) run macroexpand on its own arguments in order to, in effect, invert the macro execution order?

didibus · July 2, 2021, 7:29am

Hum, okay I’m not sure I understand what Suffix is, and what $T is referring to.

Are you saying that Suffix will resolve to a value at compile time, and so will $T?

I think I need to understand what’s the expected compile time result of:

concatId(computeName($T), Suffix)

What would you want to happen here and what would the result be? Both at compile time and then after at runtime?

Edit:

Hum, maybe what you want is to generate identifiers in the code, but compute their names entirely at compile time, for code-gen purposes? In Clojure I think you’d use eval for that. You can also call macroexpand-1, but it only expands the outermost macro call once and does not run the resulting code to compute a value, whereas eval will:

(defn compute-name
  [t]
  (str "some-" t))

(defn concat-sym
  [sym suffix]
  (symbol (str sym "-" suffix)))
     
(defmacro gen-constant-fn
  [fn-name constant-value]
  `(defn ~(eval fn-name)
     []
     ~constant-value))

(def ^:const t "ford")
(def ^:const suffix "car")

(gen-constant-fn
  (concat-sym
    (compute-name t)
    suffix)
  "vroom")

(some-ford-car)
;;=> "vroom"

This will generate at compile time a function with identifier name some-ford-car, where the name was computed at compile time, that when called at runtime with no arguments returns "vroom".

You can try running the code here: https://replit.com/@didibus/NeatFirstIde#main.clj

This still needs to happen inside the outer macro, eval isn’t some magic thing that runs first, so you need the outer macro to explicitly decide to eval something first.

That said, I can’t remember the last time I needed to reach for eval or macroexpand inside a macro.

I think generally people will do something more tailored to their needs in the macro and avoid having to use eval like so:

(defn compute-name
  [t]
  (str "some-" t))

(defn concat-sym
  [sym suffix]
  (symbol (str sym "-" suffix)))
     
(defmacro gen-constant-fn
  [id suffix constant-value]
  `(defn
     ~(concat-sym (compute-name id) suffix)
     []
     ~constant-value))

(gen-constant-fn
  "ford"
  "car"
  "vroom")

(some-ford-car)
;;=> "vroom"

joinr · July 2, 2021, 8:33am

“Identifiers are a compile-time concept, so it can only be a macro.”

In clojure, macros are just functions on symbolic expressions that transform the data and yield a result that is then passed to eval. So at “compile time,” or macro-expansion time, we have the ability to perform arbitrary transformations on the “source” or the symbolic expression, which is typically a list (in clojure we also have vectors, maps, and set literals that can also be viewed as symbolic expressions and operated on as data). We can have symbols at runtime, and pass symbolic forms around as normal data. This is exactly how the macro system operates. The net result is that during macroexpansion time, you can apply “any” arbitrary function to the input to transform it prior to eventual evaluation. Something like the aforementioned “unless” is a simple example where we can implement it as a simple transformation of the input expression:

;;just build a list manually from the unevaluated
;;input.
(defmacro unless [pred else then]
  (list 'if pred then else))

;;use quasiquoting shorthand to do it for us, which is akin to declaratively defining a list/map/set
;;that we can splice stuff into via ~ and ~@:

(defmacro unless [pred else then]
  `(if ~pred ~then ~else))

;;user=>(pprint (macroexpand-1 '(unless (odd? 3) :even :odd)))
;;(if (odd? 3) :odd :even)

“There must be a common technique in Clojure for inverting macro evaluation order.”

The fundamental technique is that s-expressions are prefix-notation. I think that is the fundamental piece you are missing. The order of evaluation is driven by effectively simplifying the input form until a result is yielded. You basically have an abstract syntax tree laid bare, and eval is just recursively depth-first walking the tree until it hits a leaf and applying clojure’s evaluation rules along the way as it folds results up (simplifying sub trees by evaluating leaves, applying the root “operator” to the children, over and over again).

Since the input conforms exclusively to prefix function/macro application, the order of evaluation is explicit and obvious.

If it’s a thing that evaluates to itself (an atom) like a number, literal, keyword, string, etc. then we return it.
If it’s a quoted thing (denoted by ’ or the explicit quote form), then we return the unevaluated thing (typically a symbol, or a literal list/vector/map/set of symbols).
If it’s a list, then we evaluate depending on what the first item is.
– If the first item is a special form (like if), then the forms built-in semantics apply (kind of like a macro, except we as users don’t have “direct” access to the implementation).
– If the item denotes a macro, then the unevaluated form (the entire input expression) is passed to the macro’s function for macro expansion, and the “expanded” or transformed result is passed to eval again.
– If the item is a function, then the rest of the items in the form are evaluated, and the evaluated function is applied to the evaluated arguments via apply.
– There are additional semantics for collection literals like vectors, maps, sets, which have the effect of evaluating their contents and retaining the original literal collection.

So that’s basically it. Any alteration of the order of evaluation has to work inside those semantics. All Lisp programs (including Clojure) are just recursive simplifications of s-expressions based on the semantics of the implementation’s eval and apply and some built-in “special forms” that form the unmodified basis of the language (if, def, loop, fn, and others in Clojure).

So the obvious way to control evaluation is as @didibus said, to do it inside a macro which can arbitrarily transform the input expression and guide what is sent to eval “downstream”. This means you can walk the entire input expression (e.g. a list), and transform it to your heart’s content using any clojure function you’d like. That includes selectively applying macroexpansions where you want to and modifying the expanded result. eval is even available at compile time, which is weird but potentially useful. So code walkers are pretty common in lisps. We have a pretty substantial one in clojure.core.async in the go macro that basically uses clojure.tools.analyzer to re-compile the input s-expression into drastically different asynchronous code definitions (state machines) during macro expansion time. Plenty of other examples abound, but they all fall under these semantics. Common Lisp has a bevy of additional built-in features like symbol macros, reader macros, and compiler macros as well (symbol macros are available but not idiomatic; reader macros can be coerced by messing with the internals of the reader, and compiler macros are an unexplored phenomenon, may be possible but unclear if useful).
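
As a small illustration of the building blocks a code walker has available, the sketch below compares one expansion step with a full recursive expansion, using the built-in when macro. The form itself is just an arbitrary example; clojure.walk/macroexpand-all is a simple walker that expands every list it finds, so treat it as a demonstration tool and not as a replacement for a real analyzer.

```clojure
(require '[clojure.walk :as walk])

;; An arbitrary nested form; the symbols a, b and c are never evaluated here.
(def form '(when a (when b c)))

;; One step: only the outermost macro call is expanded.
(def one-level (macroexpand-1 form))
;; (if a (do (when b c)))

;; Full walk: the inner `when` is expanded too, working from the outside in.
(def all-levels (walk/macroexpand-all form))
;; (if a (do (if b (do c))))
```

A macro can call any of these on its own arguments at expansion time and then keep transforming the result as ordinary list data.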

DPiepgrass · July 2, 2021, 4:34pm

There’s no need to explain how a macro system works, since I wrote one and have been using it for years. (edit: in fact, if you look at Loyc trees and ignore the part about Attributes, you’ll see that the concept is essentially the same as s-expressions, except that a list, which is called a “call”, cannot be empty since every list has a first element called a Target). I was just asking what techniques Clojure developers use (e.g. macroexpand), and how they would design a set of useful macros for C# if they had the opportunity.

@didibus’s comment that “I can’t remember the last time I needed to reach for eval or macroexpand inside a macro” may be important, suggesting something about how macros should be designed. I wonder what the design principle(s) might be.

joinr · July 2, 2021, 10:43pm

It’s impressive that you wrote a macro system and you have been using it for years. As far as clojure goes (as far as I can tell, Lisps in general), controlling the order of evaluation of child expressions is entirely dependent on the underlying semantics of eval, and the lisp’s macro-expansion system. As I mentioned, and @didibus mentioned, this means the parent has control of how the child transforms are applied before sending stuff off to eval for the next stage, to include rewriting children to have new/different macros that will be picked up by macroexpansion. Other tools, such as macrolet and symbol-macrolet, still just participate in/augment the existing machinery.

Using eval at compile-time is a code smell in my experience, although it’s plausible to concoct use cases where compile-time code generation depends on runtime context (I seemed to have reached for this erroneously early on in my career, so it would be interesting to see some industrial use cases). macrolet is an example where compile-time eval is used to generate the lexically scoped macro expanding functions at macroexpansion time. Some experimental lisps blur this by adding fexprs on top of vau, but that’s on the fringe of my PLT knowledge and seems very much a research subject (e.g. implementations are targets of research papers).

I don’t see anything obvious in your example of concatID that necessitates working outside the semantics of eval and normal macroexpansion:

(defmacro concatID [name suffix]
  `(symbol (str ~name '~suffix)))

(defmacro computeName [x]
  `(str "var" ~(name x)))

user=> (concatID (computeName hello) world)
varhelloworld

Ignoring the semantics of what concatID is actually intended to do in your example, the order of evaluation works out fine by way of normal macroexpansion rules. It would be interesting to see a concocted example on the clojure side where this is not the case and the distinction requires programmer intervention.

From a lay perspective, Loyc trees do seem like s-expressions with more steps. I am uncertain from PLT perspective what novelty they bring to the table, although the loyc concept sounds interesting if not ambitious. Sounds a lot like what graal/truffle is aiming for (that might be an interesting emergent test bed as well in addition to the clr).

didibus · July 3, 2021, 6:05am

Well in Lisp-land, there’s an adage that says you should avoid using macros as much as you can, precisely because they are hard to compose, since they don’t follow any consistent rule of evaluation.

As a user of a macro, unlike with a function, you never know what could happen, you have to understand the semantics of each macro you’re using, and they can all differ in fundamental ways, so there’s no guaranteed semantics you can rely on.

I’d say in general, you expect that a macro will not evaluate its arguments at compile-time, because most macros don’t, they simply rewrite one piece of code into another or generate code from the provided code.

So if you have a macro that performs compile-time eval, as a macro user, that’s a little surprising, and the macro would need to document that.

I think that’s why it’s more rare. When you start mixing compile and runtime in ways that aren’t interchangeable, things can get messy.

So in my prior example, what if someone did:

(gen-constant-fn
  (fetch-name-from-prod-db "foobar")
  (compute-suffix-from-remote-api "car"))

With most macros, you’d expect that the code you gave the macro will be evaluated at runtime, and that the macro will simply expand this code into a longer form or a different form. Here, if it ran at compile time, it would fail: my desktop doesn’t have access to prod-db or remote-api, and neither does my build server.

So as a user, I need to understand, oh, either I provide a value to this macro, or I can provide computations, but they must be runnable at compile-time.

Some macros definitely do this, and sometimes you need to; that’s the point, this is the behavior you want. I’m just pointing out these kinds of “gotchas”: each macro can have a very different set of rules about what can or can’t be done and how it does things, so how do you compose it with other macros or functions?

I don’t think there’s a solution, but I have long wondered about something like you suggested, like what if the user could override eval ordering of the macro, like:

(some-macro (compile-eval (compute-name T)) (compile-eval (+ 1 2 3)))

Where compile-eval would be a special form to the compiler that says evaluate those before the macroexpansion.

Edit:

Well, actually there is a kind of way to do that in Clojure, but it’s not well documented, and while it’s survived many years, I hear it’s not something the core team is fond of, and they might one day remove or modify it in breaking ways. There is a reader eval that allows you to evaluate things at read-time, which is before macroexpansion:

(concat-id #=(eval (compute-name t)) Suffix)

This would evaluate compute-name before concat-id. It’s considered bad practice, though I’m not sure why.
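
The effect can be seen without defining any macros by reading a string and inspecting the form the reader hands back. This is only a sketch: concat-id and Suffix are plain unresolved symbols here, and it assumes *read-eval* still has its default value of true.

```clojure
;; Read (but do not evaluate) a form that contains a reader-eval expression.
;; The #=(...) part is run by the reader itself, before any macro sees the form.
(def form
  (read-string "(concat-id #=(str \"some-\" \"t\") Suffix)"))

;; The reader has already replaced the #=(...) expression with its value:
;; form is the list (concat-id "some-t" Suffix)
```

So by the time a concat-id macro received its arguments, the first one would already be the string "some-t".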

didibus · July 3, 2021, 6:20am

My example above actually does. You want a macro that generates a function of a particular name. The name should be provided by the user of the macro, but the user of the macro wants the name computed through some function, I don’t know, maybe the name is based on some XML config file for example.
