I would like to help improve the Haxe JVM's backends closure

sstewartgallus · May 19, 2020, 7:17pm

Hi

I saw some recent posts on Haxe’s implementation of closures on the
JVM and all I can say is yikes!

The old MethodHandle implementation was extremely broken. The new
typed functions implementation is better but could use invokedynamic
but this time properly so it works fast.

I have been working with MethodHandles on personal projects for a bit
and would love to help discuss this subject with the devs to help make
this area better.

It appears to me the old backend used MethodHandle directly to
represent functions and closures. This is a bad idea.

MethodHandles are not faster than regular method calls only more flexible.
NEVER call a MethodHandle directly unless they are static final values.

These limitations aren’t actually a big problem because you can make
these calls static final by dynamically spinning bytecode (note that
you can reuse the JVM’s LambdaMetafactory for this), by using the
invokedynamic call, or by attaching the MethodHandle to a
MutableCallsite.

The fastest simple way to represent a closure in Java is by using an
abstract base class like:

abstract class FunctionOfMyType {
abstract Blah apply(Foo foo, Gar gar, Bar bar);
}

Why MethodHandles are useful is because combined with invokedynamic
they are much more flexible.

In the blog post JVM Typed Functions - Haxe - The Cross-platform Toolkit you have an implementation like:

public Object invoke() {
    return this.invoke((Object)null);
}

public Object invoke(Object arg0) {
    return this.invoke(arg0, (Object)null);
}

public Object invoke(Object arg0, Object arg1) {
    return null;
}

The typed functions approach is actually a reasonable basis to go off of.
But you can do better.

You shouldn’t manually emit such adapter bytecode. Instead you should
use invokedynamic to generate the adapter at runtime. I would
recommend reusing a framework like dynalink.

My code is a slow WIP mess but you can see how to badly implement
GuardingTypeConverterFactory to convert between Java object’s and your
own representation.

https://github.com/sstewartgallus/jsystemf/blob/master/src/com/sstewartgallus/runtime/TermLinker.java

If you let indy generate such adapters automatically then you will
save on bytecode.

There are really 3 approaches here:

The existing approach
Using indy to generate the adapters dynamically.

public Object invoke() {
    return #indy "invoke" this;
}

public Object invoke(Object arg0) {
    return #indy "invoke" this, arg0;
}

public Object invoke(Object arg0, Object arg1) {
    return #indy "invoke" this, arg0, arg1;
}

Use indy for closure calls and then generate the callsite dynamicly.
This allows for your own callsite cache but is the least compatible
with Android.

Speaking about Android I head the other reason for using the new
approach was supporting older devices.

One approach you could do is dynamically spin an adapter
object. Calling a function dynamically would be something like:

    void mymethod() {
         // stuff...
         var y = ADAPTER.invoke(f, x);
         // stuff..
    }

    private static final Adapter ADAPTER = AdapterObject.getAdapter(AdapterObject.class);

abstract class Adapter extends AdapterObject {
    static Object invoke(FunctionSig sig, Object value);
}

Note that there is tradeoff for better inlining with seperate adapters
for each callsite but more bytecode.

But I would need a deeper understanding of Haxe’s needs. And again I
would really like to talk to the team.

Thanks
Steven Stewart-Gallus

Simn · May 19, 2020, 9:21pm

Thank you very much for wanting to get involved! I’m the author of the JVM target and I’m happy to admit that I had no clue about the JVM whatsoever when I started working on it. There’s a lot of good documentation on it, but invokedynamic in particular made me question my sanity.

Before I dive deeper into this, I would like to know in which way the approach you propose is actually superior. I can’t imagine it being faster given that we’re already making direct (virtual) calls in the current implementation, assuming good types. Could you go a bit more into detail about this?

I’ve documented my previous adventures with invokedynamic here: [boss battle] invokedynamic · Issue #12 · Simn/genjvm · GitHub Reading through that again, I still don’t understand who is responsible for actually finding the method that is supposed to be called.

Again, thanks for getting involved! The combined JVM knowledge of our team leaves a bit to be desired, so the advice of someone with more experience here is very valuable.

sstewartgallus · May 19, 2020, 10:05pm

Hi

So I’m not sure of Haxe’s exact needs but I think invokedynamic could help in a few ways.

Generate less bytecode. Behind the scenes methodhandles use a lambda calculus based interpreter. Methodhandles only actually spin bytecode after more than a few calls, I forget the details. Also methodhandles are generated lazily so no bytecode is generated if one of your functions is never called dynamically.
Generate more specific adapter stubs. Currently your dynamic calls are all or nothing. Even if you know some of your arguments will always be raw numbers they will be boxed.
Help with the optional arguments implementation. I’m not up to date on the details of optional arguments in Haxe but the literal intent of invokedynamic is letting you implement different calling conventions. Indy seems like a good place to insert optional arguments into a methodhandle call. This could avoid a boxing penalty for calls that pass in integers. Otherwise you need to pass in a flag or a null value.
Help implement dynamic Dynamic (Types) - Haxe - The Cross-platform Toolkit . I hope you can see how dynamic field/method lookup is an obvious fit here. Just remember to fallback to a less specialized implementation if the callsite becomes relinked too much and unstable.
All of the dynamic benefits could be used for generics too potentially. Java can’t do this because of legacy reasons but there’s no reason a new language on the JVM couldn’t do generic calls without boxing using indy. This is one of the things I want to explore with my own pet project.
Help implement faster pattern matching. Now this is a complicated topic because the JVM isn’t really that good with switches but usually you’d want to avoid a long chain of ifs/elses. What you want to do is depending on the pattern generate a function that dispatches to the right case and if possible minimise condition checks. This seems like a good use of invokedynamic to me.

Specifically,

var myTree = Node(Leaf("foo"), Node(Leaf("bar"), Leaf("foobar")));
var match = switch (myTree) {
    // matches any Leaf
    case Leaf(_): "0";
    // matches any Node that has r = Leaf
    case Node(_, Leaf(_)): "1";
    // matches any Node that has has
    // r = another Node, which has
    // l = Leaf("bar")
    case Node(_, Node(Leaf("bar"), _)): "2";
    // matches anything
    case _: "3";
  }

could become something like.

var myTree = new Node(new Leaf("foo"), new Node(new Leaf("bar"), new Leaf("foobar")));
var matcher = #indy getCases(myTree, "Leaf(_)", "Node(_, Left(_))", "Node(_, Node(Leaf(\"bar\", _))");
var match = switch (matcher.case) {
case 0 -> "0";
case 1 -> "1";
case 2 -> "2";
default -> "3";
}

Getting a field from a pattern match would involve an invokedynamic call against the returned matcher object. This is tricky to get right and I don’t remember all the details of how they planned to do this for Java on the project Amber mailing list. You might want to pick through the OpenJDK mailing lists and some of Brian Goetz notes on the community review.

Finding the right way to organize invokedynamic is tough. I just went with the approach that using jdk.dynalink seemed to favor of using a couple of class specific linkers
https://github.com/sstewartgallus/jsystemf/blob/master/src/com/sstewartgallus/runtime/TermLinker.java#L37

I think the long story short of indy is that it is definitely much more flexible than regular calls but not faster. If you can find a way to avoid indy without boxing than indy will typically be slower.

The vast majority of cases can be emulated with something like:

static class CallSiteHelper$0 {
     static final CallSite CALLSITE = getCallSite("foo", "blah", "bar");
}
static void myfun() {
     CallSiteHelper$0.CALLSITE.invoke("gar", "feh");
}

Mind you, that doesn’t allow for relinking as MutableCallSite does and is kind of wasteful because you need a helper class for lazy statics. Also generating bytecode is really annoying.

Using constantdynamic is a bit less wasteful but then you might was as well use indy. Speaking of which you might want to look into condy for some of your literals like regular expressions.

sstewartgallus · May 19, 2020, 10:45pm

Also I just remembered a few other tricks you can do after a walk.
One pain point on the JVM is list/array initialization. invokedynamic/ldc can let you generate list/array constructors very easily with code like identity(int[].class).asCollector();.
As well invokedynamic can be useful for extending Java array objects and strings with your own interfaces such as maybe used in Abstract - Haxe - The Cross-platform Toolkit .
You still need to lookup a stub in a ClassValue for Strings/array objects but indy will let you guard on the callsite.

Simn · May 20, 2020, 5:49am

Let’s try with a concrete example of what we actually have to support. Here’s a piece of code which probably represents the worse case, where we have a call-site that is outright Dynamic:

class Main {
	static function main() {
		// local function
		callMe(i -> trace(i));
		// local closure (captures x)
		var x = 12;
		callMe(i -> trace(i + x));
		// static method
		callMe(someStaticMethod);
		// instance method
		callMe(new Main(12).someInstanceMethod);
		// with optional argument
		callMe(someStaticMethodWithOptionalArgument);
		// with different type
		callMe(someStaticMethodWithDifferentType);
	}

	static function someStaticMethod(i:Int) {
		trace(i);
	}

	static function someStaticMethodWithOptionalArgument(i:Int, i2 = 12) {
		trace(i + i2);
	}

	static function someStaticMethodWithDifferentType(f:Float) {
		trace(f);
	}

	static function callMe(d:Dynamic) {
		d(1);
	}

	var i:Int;

	function new(i:Int) {
		this.i = i;
	}

	function someInstanceMethod(i:Int) {
		trace(i + this.i);
	}
}

We can save this as Main.hx, then compile with haxe -main Main --java main.jar -D jvm (or on latest development with haxe -main Main --jvm main.jar). Decompiling the result gives us this:

package haxe.root;

import haxe.Log;
import haxe.generated.Anon0;
import haxe.java.Init;
import haxe.jvm.Function;
import haxe.jvm.Jvm;
import haxe.jvm.Object;

public class Main extends Object {
    public int i;

    public static void main(String[] args) {
        Init.init();
        main();
    }

    public static void main() {
        callMe(Main.Closure_main_0.Main$Closure_main_0);
        int x = 12;
        callMe(new Main.Closure_main_1(x));
        callMe(Main.Main_someStaticMethod.someStaticMethod);
        callMe(new Main(12).new Main_someInstanceMethod(new Main(12)));
        callMe(Main.Main_someStaticMethodWithOptionalArgument.someStaticMethodWithOptionalArgument);
        callMe(Main.Main_someStaticMethodWithDifferentType.someStaticMethodWithDifferentType);
    }

    public static void someStaticMethod(int i) {
        Log.trace.invoke(i, new Anon0("Main", "source/Main.hx", 19, "someStaticMethod"));
    }

    public static void someStaticMethodWithOptionalArgument(int i, Integer i2) {
        int i2 = i2 == null ? 12 : Jvm.toInt(i2);
        Log.trace.invoke(i + i2, new Anon0("Main", "source/Main.hx", 23, "someStaticMethodWithOptionalArgument"));
    }

    public static void someStaticMethodWithDifferentType(double f) {
        Log.trace.invoke(f, new Anon0("Main", "source/Main.hx", 27, "someStaticMethodWithDifferentType"));
    }

    public static void callMe(java.lang.Object d) {
        ((Function)d).invoke(1);
    }

    public void someInstanceMethod(int i) {
        Log.trace.invoke(i + this.i, new Anon0("Main", "source/Main.hx", 41, "someInstanceMethod"));
    }

    public Main(int i) {
        this.i = i;
    }

    public static class Main_someStaticMethod extends Function {
        public static Main.Main_someStaticMethod someStaticMethod = new Main.Main_someStaticMethod();

        Main_someStaticMethod() {
        }

        public void invoke(int i) {
            Main.someStaticMethod(i);
        }
    }

    public static class Main_someStaticMethodWithDifferentType extends Function {
        public static Main.Main_someStaticMethodWithDifferentType someStaticMethodWithDifferentType = new Main.Main_someStaticMethodWithDifferentType();

        Main_someStaticMethodWithDifferentType() {
        }

        public void invoke(double f) {
            Main.someStaticMethodWithDifferentType(f);
        }
    }

    public final class Main_someInstanceMethod extends Function {
        public Main this;

        public Main_someInstanceMethod(Main var1) {
            thisx.this = var1;
            super();
        }

        public void invoke(int arg0) {
            thisx.this.someInstanceMethod(arg0);
        }

        public boolean equals(java.lang.Object other) {
            if (!(other instanceof Main.Main_someInstanceMethod)) {
                return false;
            } else {
                Main.Main_someInstanceMethod otherx = (Main.Main_someInstanceMethod)other;
                return thisx.this == otherx.this;
            }
        }
    }

    public static class Closure_main_0 extends Function {
        public static Main.Closure_main_0 Main$Closure_main_0 = new Main.Closure_main_0();

        Closure_main_0() {
        }

        public void invoke(java.lang.Object i) {
            Log.trace.invoke(i, new Anon0("Main", "source/Main.hx", 4, "main"));
        }
    }

    public static class Closure_main_1 extends Function {
        public int x;

        public Closure_main_1(int x) {
            this.x = x;
            super();
        }

        public void invoke(int i) {
            Log.trace.invoke(i + this.x, new Anon0("Main", "source/Main.hx", 7, "main"));
        }
    }

    public static class Main_someStaticMethodWithOptionalArgument extends Function {
        public static Main.Main_someStaticMethodWithOptionalArgument someStaticMethodWithOptionalArgument = new Main.Main_someStaticMethodWithOptionalArgument();

        Main_someStaticMethodWithOptionalArgument() {
        }

        public void invoke(int i, Integer i2) {
            Main.someStaticMethodWithOptionalArgument(i, i2);
        }
    }
}

This is fast and works well, though I’ll again admit that it’s not particularly elegant. What I struggle to understand is how using invokedynamic on the call-site improves anything. The other question is in which way the representation of function objects would change. Note that both the local closure and the someInstanceMethod case require context.

To be frank, it’s unlikely that we’ll change this implementation again, if for no other reason than the android problems. It would take some very convincing argument for me to consider this, even if we hide it behind something like -D jvm-version=8. However, I’m still very interested in how invokedynamic is supposed to be used in general, and maybe we can indeed utilize it in some way like you suggested.

Edit: I noticed in the bytecode that the call-site was boxing the int argument, which is unnecessary in many cases. I’ve just changed that in this commit. The instruction is now INVOKEVIRTUAL haxe/jvm/Function.invoke (I)V, which means that we get a direct call to the implementation in some of these examples, despite the call-site being typed as Dynamic.

sstewartgallus · May 20, 2020, 6:11pm

You should not need to change the representation of your functions much. It may be useful to spin bytecode that implements your function interface that directly calls into invokedynamic for it’s implementation on occasion.

Unboxing in more cases.
Less bytecode

Using invokedynamic on the callsite should allow unboxing in more cases. For example, as mentioned previously optional arguments would not need to be boxed with indy. As I understand it Function has a number of additional abstract methods specialising for common cases, (it would help if I could figure out where the class is located in the source.) So I’m not sure how it would handle a case like d(1, 1.4, 2, 8.5, “foo”, 4) .

Another possible advantage of invokedynamic is that generating excessive amounts of bytecode can be costly and using indy instead of extra bytecode wrappers can keep the bytecode count down.

As for Android support, that’s totally fair but I think it might be worth considering just treating Android as a separate target entirely. Android is kind of weird all around and is a bit of a special case.

Also why are the function objects static ? Shouldn’t they be static final? Also closure environments require final fields for correct semantics in concurrency.

Simn · May 21, 2020, 1:02pm

Ah yes, good call on making some of these final. I will look into that, thanks!

Simn · May 22, 2020, 8:31am

Done in [jvm] make more things `final` · HaxeFoundation/haxe@3f41cef · GitHub