Analyzing Java sources / .class files using javaparser or javap

@ericdallo and I are considering using javaparser (a library) in clj-kondo so we can better navigate to classes/methods and find arity errors when calling methods, etc. But this library is LGPL, while clj-kondo is EPL1.0. What should we watch out for?

Are you just planning to depend on it, or are you planning to link it into your shipping binaries?

In general, LGPL is written to allow linking into other projects without the “viral infection” that is the GPL.

When you distribute binaries which include an LGPL-licensed library, you’ll need to (i) provide source code for the library and (ii) package it in a way which makes it possible for the user to replace the library with a different or modified version or implementation.

Afaik (+ ianal) those are the only requirements such an arrangement would impose. Maybe worth noting explicitly: you don’t have to provide the source of the library when it is not being distributed with your binaries, i.e. when the user has to install or provide it themselves on their end.

Perhaps a GraalVM native image counts as static linking? I would say it’s a different flavor of a distributed uberjar (with the caveat that you can’t really easily inspect the contents).

I guess, if everything remains open source, then this is not an issue.

From https://fossa.com/blog/open-source-software-licenses-101-lgpl-license/:

for statically linked libraries, a distributor must offer access to not only the library’s source code, but other information or materials necessary to rebuild the program.

I’m not familiar with GraalVM architecture or build structure, but if I had to guess I’d strongly expect any single executable (as currently found in some babashka distributions) to be considered a statically linked artifact.

If I had to guess further, I’d expect, from the parts talking about “Corresponding Application Code” in [1], that making the whole application source available would satisfy the requirement of replaceability. (The LGPL FAQ currently says explicitly that object code is required [2], but I reckon its wording is not precise enough; at least I cannot derive a requirement for object code or other intermediate build artifacts from the license itself.)

One more thing to consider is that simply linking to resources which are not under your control might not be enough to satisfy licensing terms (see e.g. discussion under [3]).

Fwiw, hth 🙂

[1] https://www.gnu.org/licenses/lgpl-3.0.txt
[2] Frequently Asked Questions about the GNU Licenses - GNU Project - Free Software Foundation
[3] licensing - How long do I have to provide the source code for a LGPL-library? - Software Engineering Stack Exchange

In Anakondo, I’ve used javap, which comes with the JDK. It can list class methods and fields, and it can give you the line number with -l option as well.

Also, it seems JavaParser is available under the Apache License as well, at the user’s choice. Why not just use it under Apache?

JavaParser is available either under the terms of the LGPL License or the Apache License. You as the user are entitled to choose the terms under which adopt JavaParser.

Issue solved then, thanks!

Is javap also available programmatically?

Yes, it is a part of the standard openJDK library, here’s the source for it: https://github.com/unofficial-openjdk/openjdk/blob/jdk/jdk/src/jdk.jdeps/share/classes/com/sun/tools/javap/Main.java

It comes with Java, so you don’t need to depend on anything extra, it’s part of the JDK.

One big difference is that javap works on compiled class files, whereas javaparser seems to be a source parser lib. With javap you can look up method and field information for classes without having to pull down the source. But if you’re in a mixed Java/Clojure project, your own Java code would need to be compiled so that javap can access the compiled classes to grab the methods and fields from.

Also, for source file and line information to be available to javap, this debug information needs to be compiled into the class file. By default, javac will include the line numbers and source file name when compiling Java, so most Java libs should have it. It’s possible, though, to pass an option to javac so that it excludes all debug info from the compiled class file, including the line and source info, in which case javap won’t be able to find it.
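To make the debug-info point concrete, here is a small sketch (not taken from anyone’s actual tooling) that compiles a throwaway class twice with the JDK’s in-process compiler API, once with default options and once with `-g:none`, and checks whether the resulting class file mentions a `LineNumberTable` attribute. The names `DebugInfoCheck` and `Hello` are made up for illustration, the byte search is deliberately crude, and it assumes a full JDK (11 or later) rather than a JRE.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class DebugInfoCheck {
    /**
     * Compiles a tiny class with the given extra javac options and reports
     * whether the resulting .class file mentions a LineNumberTable attribute.
     */
    static boolean hasLineNumbers(String... extraJavacOptions) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("no system Java compiler; is this a JRE?");
        }
        try {
            // The temp directory is left behind; fine for a sketch.
            Path dir = Files.createTempDirectory("debug-info-check");
            Path source = dir.resolve("Hello.java");
            Files.writeString(source, "public class Hello { public int one() { return 1; } }\n");

            List<String> args = new ArrayList<>(List.of(extraJavacOptions));
            args.add("-d");
            args.add(dir.toString());
            args.add(source.toString());
            int status = javac.run(null, null, null, args.toArray(new String[0]));
            if (status != 0) {
                throw new IllegalStateException("javac exited with " + status);
            }

            // Attribute names are stored as plain strings in the constant pool,
            // so a text search over the bytes is enough for this check.
            byte[] classFile = Files.readAllBytes(dir.resolve("Hello.class"));
            return new String(classFile, StandardCharsets.ISO_8859_1).contains("LineNumberTable");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("default options: " + hasLineNumbers());
        System.out.println("-g:none:         " + hasLineNumbers("-g:none"));
    }
}
```

If a dependency was built the second way, neither javap nor any other class-file reader can recover line numbers from it.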

Personally, I think javap is a better fit for clj-kondo. It can tell you the arity and type of arguments to methods, it can list the public fields and methods on a class, and it can also show you the source file it was compiled from and the line number from it that maps to the method or field.

What do you need the line numbers for though? Are you planning on having clj-kondo return Java source line numbers for like jump to definition behavior that would jump to the Java source location?

What I mean is, is it possible to use javap via a programmatic interface, in process?
It seems like Apache Commons BCEL offers such a thing.

/cc @ericdallo

There’s also Procyon, which is available under Apache 2 and used by clj-java-decompiler.

Yes, that’s what I was referring to. You can just create an instance of JavapTask, as you can see in the code I linked. The Main class just provides a CLI wrapper, but everything is pure Java.

@didibus It seems to not support that use case very well. First of all, the JavapTask thing lives in an invisible module, and second, it prints to some stream; it doesn’t return “data”.

 $ jshell
|  Welcome to JShell -- Version 11.0.8
|  For an introduction type: /help intro

jshell> import java.util.spi.ToolProvider;

jshell> import com.sun.tools.javap.JavapTask;
|  Error:
|  package com.sun.tools.javap is not visible
|    (package com.sun.tools.javap is declared in module jdk.jdeps, which is not in the module graph)
|  import com.sun.tools.javap.JavapTask;
|         ^-----------------^

Of course one can hack around this.
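For what it’s worth, one possible way around the visibility problem is `java.util.spi.ToolProvider`, which has been public API since Java 9 and, as far as I know, exposes javap by name when the jdk.jdeps module is present in the runtime. A minimal sketch under that assumption (the class name `InProcessJavap` is made up, and this still only gives you printed text, not data):

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.spi.ToolProvider;

final class InProcessJavap {
    /** Runs javap inside this JVM and returns whatever it printed to its output stream. */
    static String javap(String... args) {
        // Assumption: the runtime includes the jdk.jdeps module, which registers "javap".
        ToolProvider tool = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap is not available in this runtime"));

        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int status = tool.run(outWriter, errWriter, args);
        outWriter.flush();
        errWriter.flush();

        if (status != 0) {
            throw new IllegalStateException("javap exited with " + status + ": " + err);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // -l asks javap for line number and local variable tables.
        System.out.println(javap("-l", "java.lang.String"));
    }
}
```

This won’t find the tool on a runtime image built without jdk.jdeps, so it is probably not an option inside a native image.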

Maybe we should consider using both javaparser and javap? When the source code is available, javaparser can probably get better results without the source needing to be compiled, while javap seems to be the only option for .class files.

Ya, that’s true. I think the underlying classes used by JavapTask do return “data” (they return Java objects), but those seem to be a lot more difficult to use; you’d need to understand a lot about how javap works. So parsing the stream might be easier, but I have not spent that much time with it, so maybe it’s not so hard.
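To give an idea of what parsing the stream could look like, here is a rough sketch that turns member lines in the shape javap prints into a map from method name to the set of arities seen. Everything here is hypothetical (the class `JavapOutputParser`, the helper names, the hard-coded sample lines), and it only handles plain member lines, not every corner of javap’s output:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class JavapOutputParser {
    /**
     * Counts the top-level parameters in a javap parameter list such as
     * "java.util.Map<K, V>, int"; commas inside <...> are ignored.
     */
    static int arity(String params) {
        if (params.isBlank()) {
            return 0;
        }
        int depth = 0;
        int count = 1;
        for (char c : params.toCharArray()) {
            if (c == '<') depth++;
            else if (c == '>') depth--;
            else if (c == ',' && depth == 0) count++;
        }
        return count;
    }

    /** Maps each method (or constructor) name to the arities of its overloads. */
    static Map<String, Set<Integer>> arities(String javapOutput) {
        Map<String, Set<Integer>> result = new LinkedHashMap<>();
        for (String raw : javapOutput.split("\\R")) {
            String line = raw.strip();
            int open = line.indexOf('(');
            int close = line.indexOf(')', open + 1);
            if (open < 0 || close < 0 || !line.endsWith(";")) {
                continue; // not a method or constructor line
            }
            // The name is the last space-separated token before the parenthesis.
            String beforeParen = line.substring(0, open);
            String name = beforeParen.substring(beforeParen.lastIndexOf(' ') + 1);
            result.computeIfAbsent(name, k -> new TreeSet<>())
                  .add(arity(line.substring(open + 1, close)));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written sample lines in javap's format, not captured output.
        String sample =
                "  public int length();\n"
              + "  public char charAt(int);\n"
              + "  public int indexOf(int);\n"
              + "  public int indexOf(java.lang.String, int);\n";
        System.out.println(arities(sample)); // prints {length=[0], charAt=[1], indexOf=[1, 2]}
    }
}
```

That is about as far as text scraping gets you cheaply; anything more structured is where a class-file library starts to look attractive.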

In clojure-lsp? That would probably work better, I mean it depends how far you want to go to provide Java IDE features. If I write a new method on a source java file, do you want clj-kondo in some other Clojure file that imports my class to instantly see the arity error or method error? Do you want to be able to have clojure-lsp immediately jump-to the Java method or auto-complete? And similarly, if you jump-to a dependent .class file with no source, do you want to open it in VSCode decompiled, or auto-complete? Or also be able to have clj-kondo show arity errors, and all that. If so, I think both might be needed. Seems a lot of work though, but it would be super cool.

One thing though is that, from Clojure, from the REPL, you can’t use the Java source file, so in a typical Clojure app with some Java source, you’d modify the Java source, compile it, and then reload the REPL to pick up the new compiled Java classes, you don’t get auto-reload on the Java source the way you do on the Clojure source.

So I think if I were to provide only one, I’d favor the one that works on compiled classes. Also, most Java IDEs will auto-compile the Java source as you type or save. So as long as you put that on the path to clj-kondo or clojure-lsp, it might be that you can start linting it much faster.

The other thing is that javap, being part of the JDK, seems more likely to always support the latest versions of Java. I don’t know how quick JavaParser is at keeping up.

Yeah, that totally makes sense, thanks for the detailed explanation. Starting with javap seems like a good idea, as we can support a few important features first and improve in the future.

If it were me, I’d also focus on compiled classes. I don’t have data to back this up, but I think that it’s a far more common use case in Clojure projects to want Java info for dependencies used via interop. Not as many projects have mixed Clojure/Java source. Working with byte code will also work for dependencies written in Scala, Kotlin or whatever. It might work for Java source in the same project with some kind of auto-compile, but that’s a janky solution and long term if you’re serious about this support I suspect you’ll end up with both.

For parsing class files, personally, I’d go straight for ASM. There’s an example of code doing this here, from an IntelliJ Clojure plugin which offers a lightweight interface to Java when used in JetBrains IDEs which don’t support Java, e.g. Webstorm. The code is in Kotlin and probably more complicated than you’d like since it provides a similar interface to the IntelliJ Java classes, but I think most of what you want is in there somewhere. There are plenty of other projects out there doing similar things with ASM.

I don’t know whether LSP plugins can interact with one another, but can you access functionality from a Java LSP plugin which has probably already done all this? Obviously users would have to have that installed and configured as well as the Clojure one, but that doesn’t sound too onerous for someone wanting this functionality.

If ASM can do this, it could also be a good target, since it is included with Clojure. I’m not sure what advantages it would have over javap, which is part of OpenJDK, but it might have an easier-to-use API.

On JDK11 how would you invoke javap in process? I don’t think it’s intended that way. So ASM seems better in this regard.

I think you’re right, it seems that in JDK 11 it may have been hidden. Also, the Javadoc says:

This is NOT part of any supported API. If you write code that depends on this, you do so at your own risk. This code and its internal interfaces are subject to change or deletion without notice.

So ASM might be preferred if you’re going to use it in process, and not through the javap command line tool.

One drawback of the ASM / .class based approach is that it’s less accurate or more difficult to get locations. It’s pretty good for getting metadata (method names, etc) from .class files though.
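To illustrate what “metadata from .class files” means in practice, here is a dependency-free sketch that reads method names and descriptors straight out of a class file and derives arity from the descriptor. To be clear, this is not ASM: with ASM you would use its ClassReader and a ClassVisitor instead of parsing by hand. The class `ClassFileMethods` and its helpers are made up for illustration, and the sketch ignores fields, access flags and everything inside attributes (which is where line numbers live).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class ClassFileMethods {
    /** Returns "name descriptor" for each method declared by the class, e.g. "length ()I". */
    static List<String> methods(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) throw new IOException("class file not found: " + resource);
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw.readAllBytes()));
            if (in.readInt() != 0xCAFEBABE) throw new IOException("not a class file");
            skip(in, 4);                              // minor_version, major_version
            String[] utf8 = readConstantPool(in);
            skip(in, 6);                              // access_flags, this_class, super_class
            skip(in, 2 * in.readUnsignedShort());     // interfaces
            readMembers(in, utf8, null);              // fields (ignored)
            List<String> result = new ArrayList<>();
            readMembers(in, utf8, result);            // methods
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of parameters in a method descriptor such as "(ILjava/lang/String;[J)V". */
    static int arity(String descriptor) {
        int count = 0;
        int i = 1;                                    // skip '('
        while (descriptor.charAt(i) != ')') {
            while (descriptor.charAt(i) == '[') i++;  // array dimensions
            if (descriptor.charAt(i) == 'L') i = descriptor.indexOf(';', i); // object type ends at ';'
            i++;
            count++;
        }
        return count;
    }

    private static String[] readConstantPool(DataInputStream in) throws IOException {
        int count = in.readUnsignedShort();
        String[] utf8 = new String[count];            // only Utf8 entries are kept
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;                     // Utf8
                case 3: case 4: skip(in, 4); break;                        // Integer, Float
                case 5: case 6: skip(in, 8); i++; break;                   // Long, Double use two slots
                case 7: case 8: case 16: case 19: case 20: skip(in, 2); break; // Class, String, MethodType, Module, Package
                case 15: skip(in, 3); break;                               // MethodHandle
                case 9: case 10: case 11: case 12: case 17: case 18: skip(in, 4); break; // member refs, NameAndType, (Invoke)Dynamic
                default: throw new IOException("unknown constant pool tag " + tag);
            }
        }
        return utf8;
    }

    private static void readMembers(DataInputStream in, String[] utf8, List<String> sink) throws IOException {
        int count = in.readUnsignedShort();
        for (int m = 0; m < count; m++) {
            skip(in, 2);                              // access_flags
            String name = utf8[in.readUnsignedShort()];
            String descriptor = utf8[in.readUnsignedShort()];
            int attributes = in.readUnsignedShort();
            for (int a = 0; a < attributes; a++) {
                skip(in, 2);                          // attribute_name_index
                skip(in, in.readInt());               // attribute_length bytes of payload
            }
            if (sink != null) sink.add(name + " " + descriptor);
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    public static void main(String[] args) {
        for (String method : methods(String.class)) {
            String descriptor = method.substring(method.indexOf(' ') + 1);
            System.out.println(method + "  arity=" + arity(descriptor));
        }
    }
}
```

Names and arities fall out easily this way; source locations do not, which matches the drawback mentioned above.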