Compile-time optimizations in Java
- Optimization done by javac
- What about AOT?
- Could javac handle those and other constructs better?
- I would like my code to be optimized at compile-time, what can I do?
- Could using another programming language help?
- Are there any downsides to optimizing the compiled bytecode?
- Are there some inherent limitations?
- What are the benefits?
- Why tools like ProGuard are not enough
Compile-time optimizations are well-known compiler features in languages like C and C++. In particular, through static analysis, the compiler can do a lot of transformations that produce faster or more efficient code.
In Java, the common knowledge is that there are no compile-time optimizations: the JIT optimizes the code at runtime, using information gathered during execution, such as which functions are called most often.
I am generally not a big fan of this approach.
JIT optimizations happen at runtime, so they are redone on the target device every time the program is executed, which seems like a waste of resources.
Also, compile-time optimizations can be used for creating smaller binaries. This reduces the amount of data to download during an update, and also how much data the CPU and other programs need to load.
Are the differences made by compile-time optimizations measurable? It depends on the context, but doing something would surely not hurt.
Optimization done by javac
The Java compiler javac generally does not optimize code, except in very specific situations. It can do constant folding, like replacing 3+5 with 8, but most importantly it can remove dead code in certain cases.
Consider the following program
class Foo {
    void bar1() { System.out.println("bar1!"); }
}

public class Main {
    public static final boolean useFoo = false;

    public static void main(String[] args) {
        if (useFoo) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
The generated code for the main function is
public static void main(java.lang.String[]);
0: getstatic #9 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #15 // String Foo not used
5: invokevirtual #17 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
and it can be observed that the whole branch has been deleted.
Unfortunately, this optimization is available only in some cases:
- The variable must be declared final
- The variable must not be a function parameter or return value
- The variable needs to be one of the eight primitive data types, not a class type or an array
Non-final variables do not lead to dead-code optimization
If useFoo is not final, one could expect javac not to perform such an optimization. Determining whether a member variable (static or not) is changed somewhere is not always trivial, while for immutable variables it is. Even for a private non-final variable it is hard to tell whether the value is ever changed, mainly because of reflection.
While this is understandable for global and member variables, unfortunately final is required even for local variables in order to remove dead code.
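As a small illustration of why a non-final field cannot be treated as a constant, the following sketch (the class names are invented for this example) flips a private static field of another class through reflection, behind the compiler's back:
import java.lang.reflect.Field;

class Config {
    // not final: javac cannot assume this value never changes
    private static boolean useFoo = false;

    static boolean useFoo() { return useFoo; }
}

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        // change the "constant" at runtime
        Field f = Config.class.getDeclaredField("useFoo");
        f.setAccessible(true);
        f.set(null, true);

        System.out.println(Config.useFoo()); // prints "true"
    }
}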
For example, in
class Foo {
    void bar1() { System.out.println("bar!"); }
}

class Main {
    public static void main(String[] args) {
        final boolean useFoo = false;
        if (useFoo) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
the branch with Foo is removed, while in
class Foo {
    void bar1() { System.out.println("bar!"); }
}

class Main {
    public static void main(String[] args) {
        boolean useFoo = false;
        if (useFoo) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
the branch with Foo is not removed.
Any indirection disables optimizations
Again, this can be tested easily with the following classes
class Foo {
    void bar1() { System.out.println("bar!"); }
}

public class Main1 {
    private static boolean useFoo() { return false; }

    public static void main(String[] args) {
        if (useFoo()) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}

public class Main2 {
    private static boolean useFoo = createFalse();

    private static boolean createFalse() { return false; }

    public static void main(String[] args) {
        if (useFoo) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}

class Main3 {
    private static void impl(final boolean b) {
        if (b) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }

    public static void main(String[] args) {
        impl(false);
    }
}
In all cases, it is possible to see that the branch creating the local Foo variable is not optimized away.
The reason is probably that javac does not do any inlining. In the first case, even though useFoo() trivially returns false, that information is not available when compiling if (useFoo()).
The same holds for the second example: a final variable needs to be initialized with a literal value for the optimization to apply. For parameters, when compiling private static void impl(final boolean b), javac does not take into account that the method is called in only one place, with the value false. The optimization would probably kick in for all three examples if javac inlined the code and replaced the parameter or function call with false.
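To make the point concrete, here is a sketch of what inlining Main3 by hand would look like (the class name is invented); once the constant is substituted, the condition is a final local initialized with a literal, so javac removes the dead branch just like in the earlier example:
class Foo {
    void bar1() { System.out.println("bar!"); }
}

class Main3Inlined {
    public static void main(String[] args) {
        // impl(false) inlined by hand: the parameter b becomes a constant
        final boolean b = false;
        if (b) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}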
The variable needs to be a primitive type
This is a corollary of the previous point because any class adds at least one indirection.
The reason why using Boolean instead of boolean makes such a big difference, even though only one letter changes, is easily explained by looking at the disassembled class file. Consider
class Main {
    public static void main(String[] args) {
        final Boolean useFoo = false;
        if (useFoo) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
Because of autoboxing and unboxing, this code is in fact equivalent to
class Main {
    public static void main(String[] args) {
        final Boolean useFoo = Boolean.valueOf(false);
        if (useFoo.booleanValue()) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
One indirection can be removed by replacing Boolean.valueOf(false) with Boolean.FALSE, but even if the compiler were able to inline Boolean.booleanValue, it would probably still not eliminate the branch where Foo is created.
This is because Boolean can (unfortunately) be modified with reflection at runtime to behave unexpectedly.
The following program prints "Boolean.FALSE is true!":
package main;

import java.lang.reflect.*;

public class Main {
    static {
        try {
            Field VALUE = Boolean.class.getDeclaredField("value");
            VALUE.setAccessible(true);
            VALUE.set(false, true);
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    public static void main(String[] args) {
        if (Boolean.FALSE) {
            System.out.println("Boolean.FALSE is true!");
        }
    }
}
Granted, this behavior is not portable, and the Java runtime also prints the following warnings:
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by main.Main to field java.lang.Boolean.value
WARNING: Please consider reporting this to the maintainers of main.Main
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
This program demonstrates that it is currently not possible to resolve at compile-time whether Boolean.FALSE is true or not, as the value is determined at runtime.
One could think that the indirection could be removed by comparing the addresses of objects instead of calling a member function, like in
class Main {
    public static void main(String[] args) {
        final Boolean useFoo = Boolean.FALSE;
        if (useFoo == Boolean.TRUE) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
but also in this case, javac does not optimize the code.
At first I thought the code was not optimized because the address of Boolean.TRUE is determined at runtime, and thus the comparison cannot be made at compile-time; the same would hold when comparing Boolean.FALSE == null. But then I noticed that Boolean.FALSE == Boolean.FALSE and Boolean.FALSE != Boolean.FALSE do not get optimized to true/false either, so javac probably does not attempt anything at all when it encounters an object.
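For reference, this is the kind of minimal test I mean (the class name is invented); compiling it and inspecting the result with javap -c shows that the comparison is still emitted and performed at runtime:
class BooleanFolding {
    public static void main(String[] args) {
        // One might expect javac to fold this condition to a constant false,
        // but as observed above the comparison stays in the bytecode.
        if (Boolean.FALSE != Boolean.FALSE) {
            System.out.println("never reached");
        } else {
            System.out.println("Boolean.FALSE equals itself");
        }
    }
}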
What about strings?
Strings are a special snowflake in Java: during compilation they are treated as literals, similar to primitive types, even though they are objects.
In fact, "a" != null gets optimized to true (and thus leads to dead-code elimination), and "a" != "b" also gets optimized. "a" == "a" gets optimized to true too, but since comparing strings with the same content by reference does not always yield the same result (it depends on how strings are interned), I'm unsure whether this comparison is trustworthy. Using .equals or .isEmpty adds an indirection, and in those cases there is no dead-code elimination.
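Here is a small sketch of both cases (the class name is invented); the first condition is folded by javac as described above, while the second involves a method call and stays in the bytecode:
class StringFolding {
    public static void main(String[] args) {
        // Comparison of two literals: folded to true, the else branch becomes dead code
        if ("a" != "b") {
            System.out.println("different literals");
        } else {
            System.out.println("dead branch");
        }

        // Method call: not folded, the test is performed at runtime
        if ("nofoo".isEmpty()) {
            System.out.println("empty");
        } else {
            System.out.println("not empty");
        }
    }
}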
What about enums?
Similarly to String, enums are special too. In particular, it is guaranteed that they can be compared by reference, contrary to other objects, String included.
enum UseFoo {
    Yes, No
}

class Main {
    public static void main(String[] args) {
        final UseFoo useFoo = UseFoo.No;
        if (useFoo == UseFoo.Yes) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
Again, by looking at the bytecode it is possible to see that the dead branch is not eliminated, and also to guess the reason: the enum values are initialized at runtime. They are not treated like String literals or primitive constants, but behave more like Boolean and other objects.
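To make the "initialized at runtime" point concrete, an enum like UseFoo is compiled to something roughly like the following class (a simplified sketch, not the exact code javac generates):
// Rough, simplified equivalent of: enum UseFoo { Yes, No }
final class UseFooDesugared {
    // The "constants" are static fields created in a static initializer,
    // so their identity only exists at runtime.
    public static final UseFooDesugared Yes;
    public static final UseFooDesugared No;

    static {
        Yes = new UseFooDesugared("Yes", 0);
        No = new UseFooDesugared("No", 1);
    }

    private final String name;
    private final int ordinal;

    private UseFooDesugared(String name, int ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }
}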
What about arrays of primitives?
Arrays are special too. They are objects, but they do not add indirections: it is possible to access the contained elements and query the length directly. Nevertheless, all those operations happen at runtime and do not lead to dead-code elimination.
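A short sketch of what I mean (the class name is invented); even though the array is final and its length is known when the class is compiled, the branch is not removed:
class ArrayLength {
    private static final boolean[] USE_FOO = new boolean[0];

    public static void main(String[] args) {
        // arraylength is evaluated at runtime, so the condition is not folded
        if (USE_FOO.length != 0) {
            System.out.println("Foo used");
        } else {
            System.out.println("Foo not used");
        }
    }
}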
What about AOT?
Ahead-of-Time compilation was introduced in Java 9 and makes it possible to compile Java classes to native code instead of relying only on bytecode. jaotc is an optimizing compiler, but I could find little information on how it works. Unfortunately, it has been deprecated and removed; it seems to have seen little use since its introduction.
But even if the tool were still available, it does not produce .class files, and thus no artifacts that can be used like normal Java libraries by others.
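For the record, if I remember correctly the workflow looked roughly like the following (the flags are taken from memory of the JEP 295 examples, so treat the exact invocation as an assumption):
jaotc --output libMain.so Main.class
java -XX:AOTLibrary=./libMain.so Main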
Could javac handle those and other constructs better?
Yes. Most compiler optimization techniques are not novel, and many can be found in the JIT and probably in AOT too.
I would like my code to be optimized at compile-time, what can I do?
Unless you are measuring, I do not believe that refactoring code for the sole purpose of "performance" is a good idea.
I would still recommend, for example, replacing Boolean with boolean, as it makes the program simpler to reason about (a Boolean could be null), but not if it makes the code more complex: boolean cannot be used with generics, so it might lead to code duplication or bigger refactorings, as sketched below.
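A small sketch of the generics limitation (the class names are invented for this example): a generic container can hold a Boolean but not a boolean, so avoiding the wrapper may force a specialized duplicate:
// Works with the wrapper type, at the cost of boxing and possible nulls
class Box<T> {
    private final T value;
    Box(T value) { this.value = value; }
    T get() { return value; }
}

// Avoiding Boolean means duplicating the container for the primitive type
class BooleanBox {
    private final boolean value;
    BooleanBox(boolean value) { this.value = value; }
    boolean get() { return value; }
}

class BoxDemo {
    public static void main(String[] args) {
        Box<Boolean> boxed = new Box<>(false); // Box<boolean> would not compile
        BooleanBox primitive = new BooleanBox(false);
        System.out.println(boxed.get() + " " + primitive.get());
    }
}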
The same holds for using final member variables. Immutable objects make it easier to reason about the flow of the program, especially in a multithreaded environment. For local variables and parameters, on the other hand, writing final does not necessarily make the code easier to read or clearer.
What one could do is use another compiler, or optimize the .class files after compilation.
ProGuard
ProGuard, which is part of the Android SDK, is a program that can shrink and optimize a .jar artifact, and thus can help in some of the scenarios described above.
Unless otherwise noted, ProGuard has been invoked similarly to proguard -dontobfuscate -libraryjars /usr/lib/jvm/java-11-openjdk-amd64/jmods/java.base.jmod -injars Main.jar -outjars Main.opt.jar -keep "public class Main { public static void main(java.lang.String[]);}", and the bytecode has been made readable with javap -c 'jar:file:Main.opt.jar!/Main.class'.
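The same options can also be kept in a configuration file and passed with proguard @main.pro (the file name is arbitrary, and I am assuming I remember the configuration-file syntax correctly); a sketch equivalent to the command line above:
-dontobfuscate
-libraryjars /usr/lib/jvm/java-11-openjdk-amd64/jmods/java.base.jmod
-injars Main.jar
-outjars Main.opt.jar
-keep public class Main {
    public static void main(java.lang.String[]);
}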
Non-final variables
The following snippet
class Foo {
    void bar1() { System.out.println("bar!"); }
}

public class Main {
    public static void main(String[] args) {
        boolean b = false;
        if (b) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
is optimized by ProGuard to
public static void main(java.lang.String[]);
Code:
0: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #1 // String Foo not used
5: invokevirtual #7 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
And it is trivial to see that there is no reference to any Foo inside main.
Object types (Boolean instead of boolean)
The following example
class Foo {
    void bar1() { System.out.println("bar!"); }
}

public class Main {
    public static void main(String[] args) {
        final Boolean b = false;
        if (b) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
is optimized to
public static void main(java.lang.String[]);
Code:
0: getstatic #10 // Field java/lang/Boolean.FALSE:Ljava/lang/Boolean;
3: dup
4: astore_0
5: invokevirtual #14 // Method java/lang/Boolean.booleanValue:()Z
8: ifeq 26
11: new #3 // class Foo
14: invokespecial #12 // Method Foo."<init>":()V
17: getstatic #11 // Field java/lang/System.out:Ljava/io/PrintStream;
20: ldc #2 // String bar!
22: invokevirtual #13 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
25: return
26: getstatic #11 // Field java/lang/System.out:Ljava/io/PrintStream;
29: ldc #1 // String Foo not used
31: invokevirtual #13 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
34: return
Boolean b = false; is "optimized" to Boolean b = Boolean.FALSE, probably thanks to inlining, but the test is still done at runtime.
By using the -optimizationpasses 2 flag (the default value is 1), the code is reduced to
public static void main(java.lang.String[]);
Code:
0: iconst_0
1: ifeq 19
4: new #3 // class Foo
7: invokespecial #10 // Method Foo."<init>":()V
10: getstatic #9 // Field java/lang/System.out:Ljava/io/PrintStream;
13: ldc #2 // String bar!
15: invokevirtual #11 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
18: return
19: getstatic #9 // Field java/lang/System.out:Ljava/io/PrintStream;
22: ldc #1 // String Foo not used
24: invokevirtual #11 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
27: return
There is no Boolean anymore, but a test is still done at runtime. With -optimizationpasses 3, finally, the generated code is optimized to
public static void main(java.lang.String[]);
Code:
0: getstatic #6 // Field java/lang/System.out:Ljava/io/PrintStream;
3: ldc #1 // String Foo not used
5: invokevirtual #7 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
8: return
Strings
The following example
class Foo {
    void bar1() { System.out.println("bar!"); }
}

public class Main {
    public static void main(String[] args) {
        if ("nofoo".isEmpty()) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
is compiled to
public static void main(java.lang.String[]);
Code:
0: ldc #3 // String nofoo
2: invokevirtual #15 // Method java/lang/String.isEmpty:()Z
5: ifeq 23
8: new #4 // class Foo
11: invokespecial #12 // Method Foo."<init>":()V
14: getstatic #11 // Field java/lang/System.out:Ljava/io/PrintStream;
17: ldc #2 // String bar!
19: invokevirtual #13 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
22: return
23: getstatic #11 // Field java/lang/System.out:Ljava/io/PrintStream;
26: ldc #1 // String Foo not used
28: invokevirtual #13 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
31: return
It is strange to observe that Foo.bar1 has been inlined, but String.isEmpty has not. If ProGuard had inlined String.isEmpty, it would have led to the desired dead-code elimination, similar to the example with Boolean.
Probably because of reflection this is not as easy as it sounds, or maybe the implementation of isEmpty is more complex than I think. Another possibility is that the method is implemented not in Java but in native code in the JRE, because even when using -optimizationpasses with greater values, it does not get inlined.
Enums
The output of
class Foo {
    void bar1() { System.out.println("bar!"); }
}

enum UseFoo {
    Yes, No
}

public class Main {
    public static void main(String[] args) {
        final UseFoo useFoo = UseFoo.No;
        if (useFoo == UseFoo.Yes) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
after compiling it with javac and optimizing it with proguard, is
public static void main(java.lang.String[]);
Code:
0: getstatic #7 // Field UseFoo.No$31391470:I
3: pop
4: getstatic #8 // Field UseFoo.Yes$31391470:I
7: pop
8: getstatic #9 // Field java/lang/System.out:Ljava/io/PrintStream;
11: ldc #1 // String Foo not used
13: invokevirtual #10 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
16: return
ProGuard can determine that the address of UseFoo.Yes is different from UseFoo.No. Strangely, the values are still loaded with getstatic and discarded immediately afterward, even when using -optimizationpasses with higher values. It seems to me that this code could be eliminated too.
Could using another programming language help?
ProGuard seems to help in some of the presented use cases, but not all of them.
It would be nicer if javac were able to accomplish this task alone, but currently it is not.
Once upon a time, there was gcj, which as far as I know was an optimizing compiler, but it has been discontinued and never gained much popularity.
Using an alternate language means using a different compiler, so I decided to run a couple of tests with Kotlin.
Kotlin and Strings
fun main(args: Array<String>) {
    val noFoo = "noFoo"
    if (noFoo.isEmpty())
        println("Hello, Foo!")
    else
        println("Hello, World!")
}
it is interesting to see that, compared to the equivalent Java program
class Main {
    public static void main(String[] args) {
        final String noFoo = "noFoo";
        if (noFoo.isEmpty()) {
            System.out.println("Hello, Foo!");
        } else {
            System.out.println("Hello, World!");
        }
    }
}
the call to String.isEmpty is transformed into a call to CharSequence.length and a comparison of the result with 0.
As the generated bytecode is different, I hoped that ProGuard would be able to inline it further and eliminate the dead branch.
public static final void main(java.lang.String[]);
Code:
0: aload_0
1: ldc #8 // String args
3: invokestatic #19 // Method kotlin/jvm/internal/Intrinsics.checkNotNullParameter:(Ljava/lang/Object;Ljava/lang/String;)V
6: ldc #9 // String noFoo
8: checkcast #13 // class java/lang/CharSequence
11: invokeinterface #20, 1 // InterfaceMethod java/lang/CharSequence.length:()I
16: ifne 23
19: iconst_1
20: goto 24
23: iconst_0
24: ifeq 36
27: getstatic #17 // Field java/lang/System.out:Ljava/io/PrintStream;
30: ldc #6 // String Hello, Foo!
32: invokevirtual #18 // Method java/io/PrintStream.println:(Ljava/lang/Object;)V
35: return
36: getstatic #17 // Field java/lang/System.out:Ljava/io/PrintStream;
39: ldc #7 // String Hello, World!
41: invokevirtual #18 // Method java/io/PrintStream.println:(Ljava/lang/Object;)V
44: return
}
Unfortunately, it does not.
Are there any downsides to optimizing the compiled bytecode?
Yes; as with most approaches, there are compromises.
For example, optimizing code means that some analysis and transformations are necessary. This, generally, means longer compile times.
Long build times are often problematic in bigger projects, especially while developing.
Another downside of an optimizing compiler is that small changes can have unpredictable effects on binary sizes.
If a value is changed from false to true, a class that was previously unused, and whose class file and transitive dependencies were therefore eliminated from the artifact, cannot be eliminated anymore, and it might take up a lot of space.
Note 📝 | Predicting the performance of a piece of code is difficult, even more so if the compiler might or might not inline function calls and perform other optimizations. In this case, as the JIT already optimizes at runtime, an optimizing compiler would not make it harder to understand the performance implications (it would probably make it easier, as it is at least possible to look at the generated code). |
Another disadvantage is that some optimizations done at compile time might prevent better optimizations at runtime. For example, the compiler could decide to inline a function that is used in only one place, but if the function is never called at runtime, the inlining could make the calling function slower, if the inlined function is big enough.
This issue can be avoided with profile-guided optimization, but it is not an approach without downsides.
For functions below a certain threshold of instructions this should not be an issue (for example replacing Boolean.valueOf(false) with Boolean.FALSE), and removing dead code inside a function should also not prevent any optimizations that can be done at runtime.
An optimizing compiler might also introduce bugs; at least the Eclipse compiler did it at least once, by optimizing too much code away.
Optimizations can also make debugging a more unpleasant experience. If variables are optimized out, it is not possible to inspect their values with a debugger. If the compiler detects unreachable code through static flow analysis and eliminates it, that code can no longer be reached by changing some values with the debugger during program execution.
For this reason, optimizing compilers often have a flag for enabling and disabling optimizations.
Are there some inherent limitations?
Yes. Changing the layout of a class, or removing unused classes and functions, is problematic. Reflection and introspection are the main reasons why such operations are generally not possible. While it is true that such techniques are rare in Java code, calling Java functions from other languages like C is done through reflection-like lookups by name, and it is the only official way to bind the JVM with languages outside of it.
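A small sketch of why this is fragile (the class and method names are invented): nothing in the code below refers to the target method in a way a compiler or shrinker can follow, so removing or renaming it would only fail at runtime. Bindings to other languages rely on the same kind of by-name lookup:
import java.lang.reflect.Method;

class Service {
    // Looks unused to static analysis, but is looked up by name below
    public static String ping() { return "pong"; }
}

class ReflectiveCall {
    public static void main(String[] args) throws Exception {
        Class<?> c = Class.forName("Service");
        Method m = c.getMethod("ping");
        System.out.println(m.invoke(null)); // prints "pong"
    }
}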
Note 📝 | In C and C++ too, the compiler generally does not modify the layout of a structure. |
While it is possible to define multiple classes in the same file, javac creates a separate .class file for every class. Thus it is not even true that classes defined in the same source file are necessarily packaged together.
In Java, compared to C, there is no linking phase where the compiled code is merged. The .class files are only packaged together or left on the file system, and loaded at runtime as if they were all independent shared libraries.
This removes many optimization opportunities. It is probably by design that ProGuard operates on .jar files and not on single .class files, as many optimizations work across classes, even if some make sense for single .class files too.
Warning ⚠️ | This is only true to a certain extent, because (surprisingly) javac does dead-code elimination based on static final primitives of other classes, defined in different files. |
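A sketch of this cross-file constant folding (the file and class names are invented): USE_FOO is a compile-time constant, so its value is copied into Main when it is compiled, and the dead branch can be removed even though the two classes live in different files:
// Config.java
class Config {
    static final boolean USE_FOO = false; // compile-time constant
}

// Main.java
class Main {
    public static void main(String[] args) {
        // The constant is inlined by javac, so this branch can be eliminated
        if (Config.USE_FOO) {
            System.out.println("Foo used");
        } else {
            System.out.println("Foo not used");
        }
    }
}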
What are the benefits?
I believe that if the javac compiler were a little bit more aggressive at optimizing code, or if ProGuard or something similar were part of the official JDK, there would be benefits for most applications.
Smaller binaries mean faster download times, less data to read from the disk, and less code for the JIT to analyze and optimize.
Even if this does not have a noticeable effect on the performance of the single program, it generally means that fewer resources are necessary for accomplishing the same task. Especially on embedded devices, or devices powered by a battery, this can make a bigger difference.
And even if 1 MB would not make any measurable difference on its own (except for the reduced size), if every application required less space, the savings would add up, leaving more room either for other programs or for user data.
The JIT is surely very good at optimizing code, but if the application runs only for a short time, like a couple of seconds, there is probably no way the JIT can optimize anything without introducing a much bigger overhead. For those types of programs, reducing the startup time and not spending any time in the JIT or GC is a very tangible advantage.
An interesting read can be found on the git mailing list. I do not know if they tried using ProGuard, but they definitely measured their refactorings and tried to take advantage of the (scarce) compile-time optimization opportunities.
In particular:
JGit struggles with not having an efficient way to represent a SHA-1. C can just say "unsigned char[20]" and have it inline into the container’s memory allocation. A byte[20] in Java will cost an additional 16 bytes of memory, and be slower to access because the bytes themselves are in a different area of memory from the container object. We try to work around it by converting from a byte[20] to 5 ints, but that costs us machine instructions.
Other "unorthodox" bits of advice can be found here.
I am not surprised that a tool like ProGuard is part of the Android SDK, especially as on some Android devices resources are still scarce. I am surprised that it is not part of any JDK.
Even if today's phones have at least a couple of GB of RAM and internal storage of 10 or more gigabytes, I still keep thinking of my 1 GB SD card holding a whole Debian system with running applications.
Why tools like ProGuard are not enough
Because, compared to the compiler, they are much less tested. Also, it is hard to convince all parties to use an external tool to transform the compiler-generated code; if the compiler did some of the work by itself, everyone would automatically benefit from it, without changing anything.
There is some work, at least for constant values, on enabling new optimizations at compile time, but it does not seem to be a high priority. Also, considering what ProGuard is able to do today, it does not seem to me that language modifications are necessary. Nevertheless, language modifications could be interesting: having something like constexpr in C++ to ensure that some computations are done at compile time would surely be a nice feature, and it would provide a whole new category of optimization opportunities.
Having written all that, the main advantage of tools like ProGuard is that any language that compiles to bytecode, such as Scala, Kotlin, Groovy, and many others, can take advantage of them.
Do you want to share your opinion? Or is there an error, some parts that are not clear enough?
You can contact me anytime.