The Java mascot Duke, by Joe Palrang, licensed under the New BSD license

Compile-time optimizations in Java


Categories: bytecode java
Keywords: bytecode compile-time constants global variable immutable java performance reflection static analysis

Compile-time optimizations are well-known compiler features in languages like C and C++. In particular, through static analysis, the compiler can do a lot of transformations that produce faster or more efficient code.

In Java, the common knowledge is that there are no compile-time optimizations. The JIT optimizes the code at runtime, using gathered information, such as which functions are called most often.

I am generally not a fan of this approach.

Optimizations done by the JIT happen at runtime, so they are repeated on the target device every time the program is executed. That seems like a waste of resources.

Also, compile-time optimizations can be used for creating smaller binaries. This reduces the amount of data to download during an update, and also how much data the CPU or other programs need to load.

Are the differences made by compile-time optimizations measurable? Maybe; it depends on the context, but doing something would surely not hurt.

Optimizations done by javac

The Java compiler javac generally does not optimize code, except in very specific situations.

It can do constant folding, like replacing 3+5 with 8, but, more importantly, it can remove dead code in certain cases.
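As a quick illustration (class name mine), both expressions below are compile-time constants, so the class file stores only the final values:

```java
public class ConstantFolding {
    // javac evaluates constant expressions at compile time: the class file
    // stores 8 and "ab" directly, not the operations that produce them.
    static final int SUM = 3 + 5;
    static final String AB = "a" + "b";

    public static void main(String[] args) {
        System.out.println(SUM);        // 8
        // Constant String expressions are interned, so AB refers to the
        // very same instance as the literal "ab".
        System.out.println(AB == "ab"); // true
    }
}
```
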

Consider the following program

class Foo{
    void bar1(){ System.out.println("bar1!");}
}


public class Main {
  public static final boolean useFoo = false;

  public static void main(String[] args) {
    if (useFoo) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

The generated code for the function main is

  public static void main(java.lang.String[]);
       0: getstatic     #9                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #15                 // String Foo not used
       5: invokevirtual #17                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return

and it can be observed that the whole branch has been deleted.

Unfortunately, this optimization is available only in some cases

  • The variable must be declared final

  • The variable must not be a function parameter or return value

  • The variable needs to be one of the eight primitive data types, not a class type, nor an array
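These requirements correspond to what the Java Language Specification calls a "constant variable". A small sketch (class and field names mine) of which declarations qualify:

```java
public class ConstantKinds {
    static final boolean A = false;      // primitive, final, literal: folded
    static final Boolean B = false;      // wrapper class: not folded
    static boolean c = false;            // not final: not folded
    static final int[] D = {0};          // array: not folded
    static final boolean E = computed(); // initializer is a call: not folded

    static boolean computed() { return false; }

    public static void main(String[] args) {
        // Only conditions on A can be removed as dead code by javac.
        System.out.println(A || B || c || D[0] != 0 || E);
    }
}
```
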

Non-final variables do not lead to dead-code optimization

If useFoo is not final, one should not expect javac to perform such an optimization. Determining whether a member variable (static or not) is changed somewhere is not always trivial, while for immutable variables it is. Even for a private non-final variable, it is hard to tell whether the value is ever changed; the straw-man argument would be reflection.
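The reflection argument is easy to demonstrate. In this hypothetical sketch (class and method names mine), code outside the declaring class flips a private static field, so the compiler cannot assume it stays false:

```java
import java.lang.reflect.Field;

class Flags {
    private static boolean useFoo = false; // non-final: may change at runtime
    static boolean get() { return useFoo; }
}

public class ReflectionWrite {
    public static void flip() {
        try {
            // Even a private static field can be modified from outside the
            // declaring class, so javac cannot treat it as a constant.
            Field f = Flags.class.getDeclaredField("useFoo");
            f.setAccessible(true);
            f.setBoolean(null, true);
        } catch (ReflectiveOperationException e) {
            throw new Error(e);
        }
    }

    public static void main(String[] args) {
        flip();
        System.out.println(Flags.get()); // true
    }
}
```
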

While this is understandable for global and member variables, final is unfortunately currently required even for local variables in order to remove dead code.

For example, in

class Foo{
    void bar1(){ System.out.println("bar!");}
}

class Main {
  public void main(String[] args) {
    final boolean useFoo = false;
    if (useFoo) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

the branch with Foo is removed, while

class Foo{
    void bar1(){ System.out.println("bar!");}
}

class Main {
  public void main(String[] args) {
    boolean useFoo = false;
    if (useFoo) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

the branch with Foo is generated.

Any indirection disables optimizations

Again, this can be tested easily with the following classes

class Foo{
    void bar1(){ System.out.println("bar!");}
}

public class Main1 {
  private static boolean useFoo(){return false;}
  public static void main(String[] args) {
    if (useFoo()) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

public class Main2 {
  private static boolean useFoo = createFalse();
  private static boolean createFalse(){return false;}
  public static void main(String[] args) {
    if (useFoo) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

class Main3 {
  private static void impl(final boolean b){
     if (b) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
  public static void main(String[] args) {
    impl(false);
  }
}

In all cases, it is possible to see that the branch with the creation of a local variable Foo is not optimized out.

The reason is probably that javac does not do any inlining.

In the first case, even if useFoo() trivially returns false, the information is not available when compiling if (useFoo()).

The same holds for the second example; thus a final variable of a primitive type needs to be initialized with a literal value.

For parameter types, when compiling private static void impl(final boolean b), javac does not acknowledge that it is called in only one place, with the value false. The optimization would probably kick in for all three examples if javac inlined the code and replaced the parameter/function call with false.
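To illustrate, here is a hypothetical, manually inlined version of Main3 (javac does not actually perform this transformation): with the parameter replaced by the literal argument, the condition becomes a constant local, and the earlier final-local-variable rule applies:

```java
class Foo {
    void bar1() { System.out.println("bar!"); }
}

// Hypothetical result of inlining impl(false) into main: the parameter b
// becomes a final local initialized with a literal, so javac removes the
// dead branch, exactly as in the earlier final-local-variable example.
class Main3Inlined {
    public static void main(String[] args) {
        final boolean b = false;
        if (b) {
            Foo f = new Foo();
            f.bar1();
        } else {
            System.out.println("Foo not used");
        }
    }
}
```
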

The variable needs to be a primitive type

This is a corollary of the previous point because any class adds at least one indirection.

The reason why using Boolean instead of boolean makes such a big difference, even if only one letter is changed, is easily explained by looking at the disassembled class file

class Main {
  public void main(String[] args) {
    final Boolean useFoo = false;
    if (useFoo) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

Because of autoboxing and unboxing, this code is in fact equivalent to

class Main {
  public void main(String[] args) {
    final Boolean useFoo = Boolean.valueOf(false);
    if (useFoo.booleanValue()) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

One indirection can be removed by replacing Boolean.valueOf(false) with Boolean.FALSE, but even if the compiler were able to inline Boolean.booleanValue, it would probably not eliminate the branch where Foo is created so easily.

This is because Boolean can (unfortunately) be modified with reflection at runtime to behave unexpectedly.

The following program prints "Boolean.FALSE is true!":

package main;
import java.lang.reflect.*;

public class Main {
  static {
    try {
      Field VALUE = Boolean.class.getDeclaredField("value");
      VALUE.setAccessible(true);
      VALUE.set(false, true); // autoboxing turns false into the cached Boolean.FALSE instance
    } catch (ReflectiveOperationException e) {
      throw new Error(e);
    }
  }

  public static void main(String[] args) {
    if(Boolean.FALSE){
        System.out.println("Boolean.FALSE is true!");
    }
  }
}

Granted, this behavior is not portable, the Java runtime also prints the following warnings:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by main.Main to field java.lang.Boolean.value
WARNING: Please consider reporting this to the maintainers of main.Main
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

This program demonstrates that it is (currently) not possible to resolve at compile time whether Boolean.FALSE is true or not, as the values are determined at runtime, unless one is able to do a "whole-program" analysis, i.e. analyze the main function and all used dependencies at once.

One could think that the compiler could remove the indirection by comparing the addresses of Objects instead of calling a member function, like

class Main {
  public void main(String[] args) {
    final Boolean useFoo = Boolean.FALSE;
    if (useFoo == Boolean.TRUE) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

but also in this case, javac does not optimize the code.

At first, I thought it would not optimize the code because the address of Boolean.TRUE is determined at runtime, and thus the comparison cannot be made at compile time. The same would hold if one tried to compare Boolean.FALSE == null. But then I noticed that Boolean.FALSE == Boolean.FALSE and Boolean.FALSE != Boolean.FALSE do not get optimized to true/false either; thus javac probably does not attempt to do anything when it encounters an Object.

What about strings?

Strings are a special snowflake in Java: during compilation they are treated as literals, similar to primitive types, even if they are Objects.

In fact "a" != null gets optimized to true (and thus leads to dead-code optimization), and "a" != "b" also gets optimized. "a" == "a" gets optimized to true too, but because comparing Strings with the same content by address does not always yield the same result, as it depends on how strings are interned, I’m unsure if this comparison is trustworthy. As calling .equals or .isEmpty adds an indirection, there is no dead-code elimination.
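These claims can also be checked at runtime; a small sketch (class name mine) contrasting literal comparisons with strings created at runtime:

```java
public class StringConsts {
    public static void main(String[] args) {
        // Comparisons between String literals are constant expressions,
        // so javac can resolve them at compile time:
        System.out.println("a" != "b");        // true
        System.out.println("a" == "a");        // true: literals are interned
        // Strings created at runtime are not interned automatically:
        String a = new String("a");
        System.out.println(a == "a");          // false: different instance
        System.out.println(a.intern() == "a"); // true: intern() yields the literal
    }
}
```
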

What about enums?

Like String, enums are special too. In particular, it is guaranteed that they can be compared by address, contrary to other objects, String included.

enum UseFoo {
  Yes, No
}

class Main {
  public void main(String[] args) {
    final UseFoo useFoo = UseFoo.No;
    if (useFoo == UseFoo.Yes) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

Again, by looking at the bytecode it is possible to see that the dead branch is not eliminated, and to guess a possible reason.

The values of the enum are initialized at runtime; they are not embedded like String objects or primitive literals, but behave more like Boolean and other objects.
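A simplified, hypothetical sketch of what javac generates for enum UseFoo { Yes, No } shows why: the constants are ordinary static fields assigned in a static initializer when the class is loaded, so they are not compile-time constants (class and field names below are mine; the real generated class also extends java.lang.Enum and has values()/valueOf()):

```java
// Roughly what "enum UseFoo { Yes, No }" desugars to (simplified sketch).
final class UseFooDesugared {
    // Assigned at class-initialization time: not constant variables.
    public static final UseFooDesugared Yes;
    public static final UseFooDesugared No;

    static {
        Yes = new UseFooDesugared("Yes", 0);
        No = new UseFooDesugared("No", 1);
    }

    private final String name;
    private final int ordinal;

    private UseFooDesugared(String name, int ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }

    public String name() { return name; }
    public int ordinal() { return ordinal; }
}
```
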

What about arrays of primitives?

Arrays are special too. They are objects but do not add indirections: it is possible to access the contained elements directly and to query the size. Nevertheless, all those operations are done at runtime and do not lead to dead-code optimization.
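For example, even a final empty array does not make its length a compile-time constant (class name mine):

```java
public class ArrayLen {
    static final int[] EMPTY = {};

    public static void main(String[] args) {
        // EMPTY is final, but EMPTY.length is still read at runtime with the
        // arraylength instruction; javac does not fold it to 0, so the first
        // branch survives in the bytecode even though it can never run.
        if (EMPTY.length > 0) {
            System.out.println("never reached");
        } else {
            System.out.println("array is empty");
        }
    }
}
```
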

What about AOT?

Ahead-of-Time compilation was introduced in Java 9 and makes it possible to compile Java code to native code instead of bytecode. jaotc is an optimizing compiler, but I could find little information on how it works. Unfortunately, it has been deprecated and removed; it seems that it saw little use since its introduction.

But even if the tool were still available, it does not produce .class files, and thus no artifacts that others can use like normal Java libraries.

Could javac handle those and other constructs better?

Yes. Most compiler optimization techniques are not novel, and many can be found in the JIT and probably in AOT too.

I would like my code to be optimized at compile-time, what can I do?

Unless you are measuring, I do not believe that refactoring code for the sole purpose of "performance" is a good idea.

I would still recommend, for example, replacing Boolean with boolean, as it makes the program simpler to reason about (a Boolean can be null), but not if it makes the code more complex (boolean cannot be used with generics, which might lead to code duplication or bigger refactorings).
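A minimal sketch (class and method names mine) of the pitfall that boolean avoids: unboxing a null Boolean throws a NullPointerException at runtime:

```java
public class UnboxingPitfall {
    static Boolean flag; // object reference: defaults to null, not false

    static String check() {
        try {
            if (flag) { // unboxing calls flag.booleanValue() on null
                return "true";
            }
            return "false";
        } catch (NullPointerException e) {
            return "null!";
        }
    }

    public static void main(String[] args) {
        System.out.println(check()); // null!
        flag = false;
        System.out.println(check()); // false
    }
}
```
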

The same holds for using final member variables. Immutable objects make it easier to reason about the flow of the program, especially in a multithreaded environment. For local variables and parameters, writing final does not necessarily make the code easier to read or clearer.

What one can do is use another compiler, or optimize the generated .class files.

ProGuard

ProGuard, which is part of the Android SDK, is a program that can minimize and optimize a .jar artifact, and thus can help in some of the scenarios described above.

Unless otherwise noted, ProGuard has been invoked similarly to proguard -dontobfuscate -libraryjars /usr/lib/jvm/java-11-openjdk-amd64/jmods/java.base.jmod -injars Main.jar -outjars Main.opt.jar -keep "public class Main { public static void main(java.lang.String[]);}", and the bytecode has been made readable with `javap -c 'jar:file:Main.opt.jar!/Main.class'`

Non-final variables

The following snippet

class Foo{
    void bar1(){ System.out.println("bar!");}
}

public class Main {
  public static void main(String[] args) {
    boolean b = false;
    if (b) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

gets optimized to

  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #1                  // String Foo not used
       5: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return

And it is trivial to see that there is no reference to any Foo inside main.

Object types (Boolean instead of boolean)

The following example

class Foo{
    void bar1(){ System.out.println("bar!");}
}

public class Main {
  public static void main(String[] args) {
    final Boolean b = false;
    if (b) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

is optimized to

  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #10                 // Field java/lang/Boolean.FALSE:Ljava/lang/Boolean;
       3: dup
       4: astore_0
       5: invokevirtual #14                 // Method java/lang/Boolean.booleanValue:()Z
       8: ifeq          26
      11: new           #3                  // class Foo
      14: invokespecial #12                 // Method Foo."<init>":()V
      17: getstatic     #11                 // Field java/lang/System.out:Ljava/io/PrintStream;
      20: ldc           #2                  // String bar!
      22: invokevirtual #13                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      25: return
      26: getstatic     #11                 // Field java/lang/System.out:Ljava/io/PrintStream;
      29: ldc           #1                  // String Foo not used
      31: invokevirtual #13                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      34: return

Boolean b = false; is "optimized" to Boolean b = Boolean.FALSE, probably thanks to inlining. But the test is still done at runtime.

By using -optimizationpasses 2 as an optimization flag (default value is 1), the code is reduced to

  public static void main(java.lang.String[]);
    Code:
       0: iconst_0
       1: ifeq          19
       4: new           #3                  // class Foo
       7: invokespecial #10                 // Method Foo."<init>":()V
      10: getstatic     #9                  // Field java/lang/System.out:Ljava/io/PrintStream;
      13: ldc           #2                  // String bar!
      15: invokevirtual #11                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      18: return
      19: getstatic     #9                  // Field java/lang/System.out:Ljava/io/PrintStream;
      22: ldc           #1                  // String Foo not used
      24: invokevirtual #11                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      27: return

There is no Boolean anymore, but a test is still done at runtime. With -optimizationpasses 3, the generated code is finally optimized to

  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #6                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #1                  // String Foo not used
       5: invokevirtual #7                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: return

Strings

The following example

class Foo{
    void bar1(){ System.out.println("bar!");}
}

public class Main {
  public static void main(String[] args) {
    if ("nofoo".isEmpty()) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

is compiled to

  public static void main(java.lang.String[]);
    Code:
       0: ldc           #3                  // String nofoo
       2: invokevirtual #15                 // Method java/lang/String.isEmpty:()Z
       5: ifeq          23
       8: new           #4                  // class Foo
      11: invokespecial #12                 // Method Foo."<init>":()V
      14: getstatic     #11                 // Field java/lang/System.out:Ljava/io/PrintStream;
      17: ldc           #2                  // String bar!
      19: invokevirtual #13                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      22: return
      23: getstatic     #11                 // Field java/lang/System.out:Ljava/io/PrintStream;
      26: ldc           #1                  // String Foo not used
      28: invokevirtual #13                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      31: return

It is strange to observe that Foo.bar1 has been inlined, but String.isEmpty has not. If ProGuard had been able to inline String.isEmpty, it would have led to the desired dead-code elimination, similar to the example with Boolean.

Probably because of reflection, this is not as easy as it sounds, or maybe the implementation of isEmpty is more complex than I think. Another possibility is that the function is not implemented in Java but in native code in the JRE, because even when using -optimizationpasses with greater values, it does not get inlined.

Enums

The output of

class Foo{
    void bar1(){ System.out.println("bar!");}
}

enum UseFoo {
  Yes, No
}

public class Main {
  public static void main(String[] args) {
    final UseFoo useFoo = UseFoo.No;
    if (useFoo == UseFoo.Yes) {
      Foo f = new Foo();
      f.bar1();
    } else {
      System.out.println("Foo not used");
    }
  }
}

after compiling it with javac and optimizing it with ProGuard, is

  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #7                  // Field UseFoo.No$31391470:I
       3: pop
       4: getstatic     #8                  // Field UseFoo.Yes$31391470:I
       7: pop
       8: getstatic     #9                  // Field java/lang/System.out:Ljava/io/PrintStream;
      11: ldc           #1                  // String Foo not used
      13: invokevirtual #10                 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      16: return

ProGuard can determine that the address of UseFoo.Yes is different from UseFoo.No. Strangely, the values are loaded with getstatic and discarded right afterward, even when using -optimizationpasses with higher values. It seems to me that this code could be eliminated too.

Could using another programming language help?

ProGuard seems to help in some of the presented use-cases, but not all of them.

It would be nicer if javac were able to accomplish this task alone, but currently it is not.

Once upon a time there was gcj, which as far as I know was an optimizing compiler, but it has been discontinued and never gained much popularity.

Using an alternate language means using a different compiler, so I decided to run a couple of tests with Kotlin.

Kotlin and Strings

fun main(args: Array<String>) {
    val noFoo = "noFoo";
    if(noFoo.isEmpty())
        println("Hello, Foo!");
    else
        println("Hello, World!");
}

it is interesting to compare the result with the equivalent Java program

class Main {
  public static void main(String[] args) {
    final String noFoo = "noFoo";
    if (noFoo.isEmpty()) {
      System.out.println("Hello, Foo!");
    } else {
      System.out.println("Hello, World!");
    }
  }
}

As the generated bytecode is different, I hoped that ProGuard would be able to inline it further and eliminate the dead branch.

  public static final void main(java.lang.String[]);
    Code:
       0: aload_0
       1: ldc           #8                  // String args
       3: invokestatic  #19                 // Method kotlin/jvm/internal/Intrinsics.checkNotNullParameter:(Ljava/lang/Object;Ljava/lang/String;)V
       6: ldc           #9                  // String noFoo
       8: checkcast     #13                 // class java/lang/CharSequence
      11: invokeinterface #20,  1           // InterfaceMethod java/lang/CharSequence.length:()I
      16: ifne          23
      19: iconst_1
      20: goto          24
      23: iconst_0
      24: ifeq          36
      27: getstatic     #17                 // Field java/lang/System.out:Ljava/io/PrintStream;
      30: ldc           #6                  // String Hello, Foo!
      32: invokevirtual #18                 // Method java/io/PrintStream.println:(Ljava/lang/Object;)V
      35: return
      36: getstatic     #17                 // Field java/lang/System.out:Ljava/io/PrintStream;
      39: ldc           #7                  // String Hello, World!
      41: invokevirtual #18                 // Method java/io/PrintStream.println:(Ljava/lang/Object;)V
      44: return
}

Unfortunately, it does not.

Are there any downsides to optimizing the compiled code?

Yes; as with most approaches, there are compromises.

For example, optimizing code means that some analysis and transformations are necessary. This, generally, means longer compile times.

Long build times are often problematic in bigger projects, especially while developing.

Another downside of an optimizing compiler is that small changes can have unpredictable effects on binary sizes.

If a value is changed from false to true, a class that was previously unused, and whose class file and transitive dependencies were therefore eliminated from the artifact, cannot be eliminated anymore, and they might take up a lot of space.

Note 📝
Predicting the performance of a piece of code is difficult, even more so if the compiler might or might not inline function calls and do other optimizations. In this case, as the JIT already does optimizations at runtime, an optimizing compiler would not make it harder to understand the performance implications (it would probably make it easier, as it is at least possible to look at the generated code).

Another disadvantage is that some optimizations done at compile time might prevent better optimizations at runtime. For example, the compiler could decide to inline a function that is used in only one place, but if the function is never called at runtime, the inlining could make the calling function slower if the inlined function is big enough.

This issue can be avoided with profile-guided optimization, but it is not an approach without downsides.

For functions below a certain threshold of instructions (like replacing Boolean.valueOf(false) with Boolean.FALSE), this should not be an issue, and removing dead code inside a function should also not prevent any optimizations that can be done at runtime.
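Replacing Boolean.valueOf(false) with Boolean.FALSE is a safe transformation because valueOf is documented to return the cached constants; this can be verified directly (class name mine):

```java
public class ValueOfCache {
    public static void main(String[] args) {
        // Boolean.valueOf is documented to return the cached TRUE/FALSE
        // constants, so a compiler can safely replace the call with a
        // direct field access.
        System.out.println(Boolean.valueOf(false) == Boolean.FALSE); // true
        System.out.println(Boolean.valueOf(true) == Boolean.TRUE);   // true
    }
}
```
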

An optimizing compiler might also introduce bugs; the Eclipse compiler did so at least once, by optimizing too much code away.

Optimizations can also make debugging a more unpleasant experience. If variables are optimized out, it is not possible to inspect their values with a debugger. If the compiler does static flow analysis, detects unreachable code, and eliminates it, that code cannot be reached anymore by changing some values during program execution with the debugger.

For this reason, optimizing compilers often have a flag for enabling and disabling optimizations.

Are there some inherent limitations?

Yes; changing the layout of a class, like removing unused classes or functions, is problematic. Reflection and introspection are the main reasons why such operations are not possible. While it is true that such techniques are rare in Java code, calling Java functions from other languages like C is done through reflection, and it is the only official way to bind different languages outside the JVM together.

Note 📝
Also in C++ the compiler generally does not modify the layout of a struct/class.

While it is possible to define multiple classes in the same file, javac creates a .class file for every class. Thus it is not even true that classes in the same file are necessarily packaged together.

In Java, compared to C, there is no linking phase where the compiled code is merged. The .class files are only packaged together, or left on the file system, and loaded at runtime as if they were all independent shared libraries.

This removes many optimization opportunities. It is probably by design that ProGuard operates on .jar files and not on single .class files, as many optimizations are done between classes, even if some make sense for single .class files too.

Warning ⚠️
This is only true to a certain extent, because (surprisingly) javac does dead-code elimination based on static final primitives of other classes, defined in different files.
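For example (class names mine), the value of a constant variable defined in one class is copied into the bytecode of its users, even across source files:

```java
// The two classes could live in different source files; javac still copies
// the value of Config.DEBUG into the bytecode of UsesConfig at compile time.
class Config {
    static final boolean DEBUG = false; // a constant variable
}

class UsesConfig {
    public static void main(String[] args) {
        // The if-branch is removed from UsesConfig.class even though the
        // constant is defined in another class. A known caveat: recompiling
        // only Config.java does not update the value baked into UsesConfig.
        if (Config.DEBUG) {
            System.out.println("debugging");
        }
        System.out.println("done");
    }
}
```
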

What are the benefits?

I believe that if the javac compiler were a little more aggressive at optimizing code, and if something like ProGuard were part of the official JDK, there would be benefits for most applications.

Smaller binaries mean faster download times, fewer data to read from the disk, and less code for the JIT to analyze and optimize.

Even if this does not have a noticeable effect on the performance of a single program, it generally means that fewer resources are necessary to accomplish the same task. Especially on embedded devices, or devices powered by a battery, this can make a bigger difference.

And even if 1MB would not make any measurable difference (except for the reduced size), if every application required less space, the savings could add up, leaving more room either for other programs or for user data.

The JIT is surely very good at optimizing code, but if the application does not have long runtimes, for example only a couple of seconds, there is probably no way the JIT can optimize anything without introducing a much bigger overhead. For those types of programs, reducing the startup time and not spending any time running the JIT or GC is a very tangible advantage.

An interesting read can be found on the git mailing list. I do not know if they tried using ProGuard, but they definitely measured the refactorings and tried to take advantage of the (scarce) compile-time optimization opportunities.

In particular:

JGit struggles with not having an efficient way to represent a SHA-1. C can just say "unsigned char[20]" and have it inline into the container’s memory allocation. A byte[20] in Java will cost an additional 16 bytes of memory, and be slower to access because the bytes themselves are in a different area of memory from the container object. We try to work around it by converting from a byte[20] to 5 ints, but that costs us machine instructions.

— Shawn O. Pearce

Other "unorthodox" bits of advice can be found here.

I do not find it strange that ProGuard is part of the Android SDK, especially as resources are still scarce on some Android devices.

Even if today’s phones have at least a couple of GB of RAM, and disks with 10 or more gigabytes, I still keep thinking about my 1GB SD card holding a whole Debian system with running applications.

Why tools like ProGuard are not enough

Because, compared to the compiler, they are much less tested. Also, it is hard to convince all parties to use an external tool to transform the compiler-generated code; if the compiler did some of the work by itself, everyone would automatically benefit from it, without changing anything.

There is some work, at least for constant values, on enabling new optimizations at compile time, but it does not seem to be a high priority. Also, considering what ProGuard is able to do today, it does not seem to me that language modifications are necessary. Nevertheless, language modifications can be interesting: for example, having something like constexpr in C++ to ensure that some computations are done at compile time would surely be a nice feature, and would provide a whole new category of optimization opportunities.

Having written all that, the main advantage of tools like ProGuard is that any language that compiles to bytecode (Scala, Kotlin, Groovy, and many others) can take advantage of them.


Do you want to share your opinion? Or is there an error, some parts that are not clear enough?

You can contact me here.