Analyzing Java sources / .class files using javaparser or javap

didibus · September 7, 2021, 8:47pm

Yes, that’s what I was referring to. You can just create an instance of JavapTask like you can see in the code I linked. The Main class just provides a CLI wrapper, but everything is pure Java.

borkdude · September 7, 2021, 9:16pm

@didibus It seems to not support that use case very well. First of all, the JavapTask thing lives in an invisible module, and second, it prints to some stream; it doesn’t return “data”.

$ jshell
|  Welcome to JShell -- Version 11.0.8
|  For an introduction type: /help intro

jshell> import java.util.spi.ToolProvider;

jshell> import com.sun.tools.javap.JavapTask;
|  Error:
|  package com.sun.tools.javap is not visible
|    (package com.sun.tools.javap is declared in module jdk.jdeps, which is not in the module graph)
|  import com.sun.tools.javap.JavapTask;
|         ^-----------------^

Of course one can hack around this.
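One route that avoids importing the internal package is the public java.util.spi.ToolProvider SPI, which the JDK's javap registers itself with. Below is a minimal sketch, assuming a full JDK (not a stripped-down runtime) is running the code; the class and method names are made up for illustration. It still only captures what javap prints, so the result is text, not structured data.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

final class JavapInProcess {
    /** Runs javap in-process and returns everything it printed to its output stream. */
    static String disassemble(String... args) {
        // Look javap up by name through the public SPI instead of importing
        // com.sun.tools.javap directly. Assumes the runtime ships the jdk.jdeps module.
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not found; is this a full JDK?"));
        StringWriter outBuffer = new StringWriter();
        StringWriter errBuffer = new StringWriter();
        PrintWriter out = new PrintWriter(outBuffer);
        PrintWriter err = new PrintWriter(errBuffer);
        int exit = javap.run(out, err, args);
        out.flush();
        err.flush();
        if (exit != 0) {
            throw new IllegalStateException("javap exited with " + exit + ": " + errBuffer);
        }
        return outBuffer.toString();
    }

    public static void main(String[] args) {
        // Prints the public members of java.lang.String as javap text.
        System.out.println(disassemble("java.lang.String"));
    }
}
```

The same call accepts the usual javap flags and class names, so anything the command-line tool can print can be captured this way.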

ericdallo · September 7, 2021, 9:37pm

Maybe we should consider using both javaparser and javap? When the source code is available, javaparser can probably get better results without the source needing to be compiled, while javap seems to be the only option for .class files.

didibus · September 7, 2021, 10:06pm

Ya, that’s true, I think the underlying classes used by JavapTask do return “data” (they return Java objects), but those seem to be a lot more difficult to use, like you need to start to understand a lot of how Javap works. So parsing the stream might be easier, but I have not spent that much time with it, so maybe it’s not so hard.
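To give a rough idea of what parsing the stream could look like, here is a sketch that turns javap's default text output into method name, return type, and arity. The sample text is hand-written in the shape javap prints, and the MethodSig class and the regex are my own assumptions, not anything javap provides. It deliberately skips constructors and would miscount parameters with generic types that contain commas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavapOutputParser {
    /** Hypothetical data shape for one parsed method line. */
    static final class MethodSig {
        final String returnType;
        final String name;
        final int arity;

        MethodSig(String returnType, String name, int arity) {
            this.returnType = returnType;
            this.name = name;
            this.arity = arity;
        }

        @Override
        public String toString() {
            return returnType + " " + name + "/" + arity;
        }
    }

    // Matches indented member lines such as "  public int length();".
    // Constructors have no return type and therefore do not match.
    private static final Pattern METHOD_LINE = Pattern.compile(
            "^\\s+(?:(?:public|protected|private|static|final|abstract|native|synchronized|default)\\s+)*"
            + "(\\S+)\\s+(\\w+)\\((.*)\\)(?:\\s+throws\\s+.*)?;$");

    // Hand-written sample in the shape of javap output, for illustration only.
    static final String SAMPLE = String.join("\n",
            "Compiled from \"String.java\"",
            "public final class java.lang.String implements java.io.Serializable {",
            "  public java.lang.String();",
            "  public int length();",
            "  public java.lang.String substring(int, int);",
            "  public java.lang.String toUpperCase(java.util.Locale);",
            "  public java.lang.String toUpperCase();",
            "  public static java.lang.String valueOf(char[]);",
            "}");

    /** Parses every method line it recognises and ignores everything else. */
    static List<MethodSig> parse(String javapText) {
        List<MethodSig> result = new ArrayList<>();
        for (String line : javapText.split("\n")) {
            Matcher m = METHOD_LINE.matcher(line);
            if (m.matches()) {
                String params = m.group(3).trim();
                // Naive arity count: breaks on generic types containing commas.
                int arity = params.isEmpty() ? 0 : params.split(",").length;
                result.add(new MethodSig(m.group(1), m.group(2), arity));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (MethodSig sig : parse(SAMPLE)) {
            System.out.println(sig);
        }
    }
}
```

Name plus arity is about what a linter needs for arity errors, which is why the sketch stops there.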

In clojure-lsp? That would probably work better, I mean it depends how far you want to go to provide Java IDE features. If I write a new method on a source java file, do you want clj-kondo in some other Clojure file that imports my class to instantly see the arity error or method error? Do you want to be able to have clojure-lsp immediately jump-to the Java method or auto-complete? And similarly, if you jump-to a dependent .class file with no source, do you want to open it in VSCode decompiled, or auto-complete? Or also be able to have clj-kondo show arity errors, and all that. If so, I think both might be needed. Seems a lot of work though, but it would be super cool.

One thing though is that, from Clojure, from the REPL, you can’t use the Java source file. So in a typical Clojure app with some Java source, you’d modify the Java source, compile it, and then reload the REPL to pick up the newly compiled Java classes. You don’t get auto-reload on the Java source the way you do on the Clojure source.

So I think if I were to provide only one, I’d favor the one that works on compiled classes. Also, most Java IDEs will auto-compile the Java source as you type or save. So as long as you put that on the path to clj-kondo or clojure-lsp, it might be that you can start linting it much faster.

The other thing is, Javap being part of the JDK, it seems more likely to always support the latest and newest versions of Java. I don’t know how quick JavaParser is at keeping up.

ericdallo · September 7, 2021, 10:36pm

Yeah, that totally makes sense, thanks for the detailed explanation. Starting with javap seems like a good idea, as we can support a few important features and improve in the future.

colinfleming · September 8, 2021, 8:50am

If it were me, I’d also focus on compiled classes. I don’t have data to back this up, but I think that it’s a far more common use case in Clojure projects to want Java info for dependencies used via interop. Not as many projects have mixed Clojure/Java source. Working with byte code will also work for dependencies written in Scala, Kotlin or whatever. It might work for Java source in the same project with some kind of auto-compile, but that’s a janky solution and long term if you’re serious about this support I suspect you’ll end up with both.

For parsing class files, personally, I’d go straight for ASM. There’s an example of code doing this here, from an IntelliJ Clojure plugin which offers a lightweight interface to Java when used in JetBrains IDEs which don’t support Java, e.g. Webstorm. The code is in Kotlin and probably more complicated than you’d like since it provides a similar interface to the IntelliJ Java classes, but I think most of what you want is in there somewhere. There are plenty of other projects out there doing similar things with ASM.

I don’t know whether LSP plugins can interact with one another, but can you access functionality from a Java LSP plugin which has probably already done all this? Obviously users would have to have that installed and configured as well as the Clojure one, but that doesn’t sound too onerous for someone wanting this functionality.

didibus · September 8, 2021, 5:41pm

If ASM can do this, it could also be a good target, since it is included with Clojure. I’m not sure what advantages it would have over Javap, which is part of OpenJDK, but it might have an easier to use API.

borkdude · September 8, 2021, 5:53pm

On JDK11 how would you invoke javap in process? I don’t think it’s intended that way. So ASM seems better in this regard.

didibus · September 8, 2021, 6:55pm

I think you’re right, it seems that in JDK 11 it’s been hidden. Also, the JavaDoc says:

This is NOT part of any supported API. If you write code that depends on this, you do so at your own risk. This code and its internal interfaces are subject to change or deletion without notice.

So ASM might be preferred if you’re going to use it in process, and not through the javap command line tool.

borkdude · September 9, 2021, 9:41am

One drawback of the ASM / .class based approach is that it’s less accurate or more difficult to get locations. It’s pretty good for getting metadata (method names, etc) from .class files though.
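To make the location point concrete: what a .class file can carry is a per-method LineNumberTable (when compiled with debug info), which maps bytecode offsets to source lines. There are no columns, and nothing at all for fields or abstract methods. A small sketch, assuming a full JDK and using the public ToolProvider SPI with javap's -l flag; the class and method names are made up for illustration.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

final class ClassFileLocations {
    /** Returns the text of "javap -l className", including any line number tables. */
    static String lineInfo(String className) {
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not available; a full JDK is assumed"));
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        // -l prints the line number and local variable tables stored in the class file.
        int exit = javap.run(out, out, "-l", className);
        out.flush();
        if (exit != 0) {
            throw new IllegalStateException("javap exited with " + exit + ": " + buffer);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Each method with a body is followed by its line table, if the class has one.
        System.out.println(lineInfo("java.lang.String"));
    }
}
```

So a "jump to definition" built on this can at best land on the first line of a method body, and only if the source file is found some other way.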

PEZ · September 29, 2021, 6:34am

This. With the strong interop story that Clojure has, looking up docs and signatures, while using it, is super convenient.

As for the whole initiative. Love it! Let me throw in some inspiration:

colinfleming · September 30, 2021, 10:17am

I’ll see your inspiration, and I’ll raise you one. Bozhidar says:

There are two potential classes that implement the toUpperCase method. One is java.lang.String and the other is java.lang.Character. We cannot possibly know which one you are trying to evaluate here. … It’s not ideal, but it simply cannot be done in any other way.

Ummm… in the example, he’s literally calling it with a String receiver. Here’s Cursive:

I hear you say: but that’s a trivial example, and real code doesn’t look like that! Ok:

Cursive runs local type inference in the editor, mimicking what the Clojure compiler does. This allows you to have almost Java-level completions. The main issue is that in Clojure, when you’re typing out your code, the method comes before the receiver. No problem, you can do this:

Put the receiver first and use completion for your interop, and since Cursive knows the type of the receiver, you’ll only get completions relevant for that type. When you actually complete the method, Cursive will swap them around for you:

[Screenshot]

You don’t need to do that if your code is naturally structured in a way which allows the type to be determined:

Since the receiver comes first in the threading form, Cursive knows its type with no swapping required.

Additionally, note that neither List nor Iterator are imported in this ns, but Cursive knows their types due to the inference. If you want to know the inferred type at any time, you can just ask:

There’s plenty more, I could go on… this all works for Kotlin and Scala code too, you can rename Java/Kotlin/Scala methods from your Clojure code, finding usages of the JVM methods will find the usages in your Clojure code and vice versa, etc etc.

There’s lots that can be done in this space!

borkdude · September 30, 2021, 10:30am

Thanks for sharing Colin, that’s really cool. Clj-kondo has a basic form of type inference too that could be leveraged here. Still contemplating which route to go for Java bytecode/source analysis.

didibus · October 1, 2021, 2:26am

I also think just listing out the completion of all possible types grouped by each type would be great.

Like I’m smart enough to quickly find the section of the type I know I have. Bonus point if it’s ordered by child-most type.

One question: what if it came from a global or a function parameter that had a type hint, would completion also work then?

colinfleming · October 1, 2021, 2:41am

Yes, it does. The subs example above knows about the type because subs has a type hint saying what it will return. Local function arg hints and local binding hints also work, as well as various places where things are implicitly type hinted (e.g. this args in reify/deftype/extend-type method implementations).

didibus · October 1, 2021, 3:12am

Ok, maybe I’m pushing my luck, but how far does this inference go?

Example 1:

(defn hello
  []
  "hello")

(.toUpperCase (hello))

No type hint on hello, but local inference of hello should know the type from the literal and infer the return type of hello. Is that then able to be used external to the function by the call to .toUpperCase on (hello)?

Example 2:

(defn hello
  [name]
  (str "hello " name)) 

(.toUpperCase (hello "world"))

Would clojure.core functions, even though not type hinted, have their common types be known by Cursive magically (probably hard-coded somewhere)?

Example 3:

(defn make-info
  [^String name]
  {:name name})

(.toUpperCase (:name (make-info "world")))

This one might be a tougher one, but basically can it infer the type of values on maps? Like would it know this is a Variant [^Keyword :name ^String name] and then know that the Keyword fn :name returns the value of key?

colinfleming · October 1, 2021, 4:48am

No, this inference currently only does what Clojure itself does. I could potentially do more, but it would only be in order to suggest to the user where they might want to add further type hints. Cases like the function return types are relatively easy to implement if I decide to go that far. The map value one is probably going beyond what would be worth it, though.

Would clojure.core functions, even though not type hinted, have their common types be known by Cursive magically (probably hard-coded somewhere)?

Currently I’ve avoided magic hard-coding, but most core functions (e.g. str) are properly type hinted for interop purposes anyway.

PEZ · October 3, 2021, 7:23pm

So awesome!

bbatsov · October 7, 2021, 8:31am

Just to clarify what I meant - in CIDER’s case, as we’re 100% REPL-powered (no static analysis at all), we can’t know the type of the receiver unless it’s a literal or we have evaluated it. As type method hints come solely from resolving the method names without any context (we don’t send the whole expressions to the backend, just the method symbol) we’re forced to do some guesswork. We may consider adding some context down the road, at least for the trivial cases, but evaluating receivers is dangerous and potentially slow, so that’s definitely one limitation of our approach, at least for the Java interop. Clearly that’s not an issue for Clojure code, but on the Java front Cursive’s approach is way superior, that’s undeniable.
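To illustrate the guesswork, here is a rough sketch using plain Java reflection. It is not how CIDER is implemented; the class, the method name ownersOf, and the candidate list are made up for the example. Resolving by method name alone over a set of classes gives several owners, while knowing the receiver type narrows it to one.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

final class MethodNameLookup {
    /** Returns the candidate classes that have a public method with the given name. */
    static List<Class<?>> ownersOf(String methodName, List<Class<?>> candidates) {
        List<Class<?>> owners = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            for (Method m : candidate.getMethods()) {
                if (m.getName().equals(methodName)) {
                    owners.add(candidate);
                    break; // one match per class is enough
                }
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        // Without context: every loaded class is a candidate (three stand in for that here).
        List<Class<?>> everything = List.of(String.class, Character.class, Integer.class);
        System.out.println(ownersOf("toUpperCase", everything));

        // With a known receiver type there is only one place to look.
        System.out.println(ownersOf("toUpperCase", List.of(String.class)));
    }
}
```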
