The calling convention is a serious wtf. They're relying on store-load forwardin...

jjtheblunt · on Dec 29, 2019

I'd assert the calling convention is strange by design: there is the underlying reality that, to support actual closures and lambdas, as Go does, in the Lisp sense, not the fake Java sense, one can't use the C calling conventions. In particular, it's not true that a called function can expect to find bindings for its variables on a call stack, because of the upward funargs issue: some bound variables for a called function in the presence of true lambdas and thus closures will necessarily NOT be found on the C call stack, because of the dissociation of scope from liveness in the presence of lambda (anonymous functions).
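For readers unfamiliar with the upward funarg problem mentioned above, here is a minimal Go sketch of it. The names `newCounter` and `count` are made up for illustration. The point is only that `count` is still live after the frame that declared it has returned, so it cannot stay on that frame's stack.

```go
package main

import "fmt"

// newCounter (a hypothetical example function) returns a closure that
// captures the local variable count. Because the closure outlives this
// call, count cannot live in newCounter's stack frame; Go's escape
// analysis allocates it on the heap instead.
func newCounter() func() int {
	count := 0
	return func() int {
		count++
		return count
	}
}

func main() {
	next := newCounter() // newCounter has returned, but count is still live
	fmt.Println(next())  // 1
	fmt.Println(next())  // 2
}
```

Each call to `newCounter` gets its own `count`, so two counters created this way do not interfere with each other.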

pcwalton · on Dec 29, 2019

What you describe is a non-problem: you can trivially spill upvars to the stack on-demand, as most compilers do, while keeping formal parameters in registers. Java needs upvars to be final because it doesn't have the concept of "reference to local variable", but that's just a limitation of the JVM, and one easily solved in other runtimes that very much can pass arguments in registers (e.g. .NET).

tjalfi · on Dec 30, 2019

The Go developers have considered changing to a register-based calling convention[0][1].

I found these tickets a few weeks ago and they explained why the Go developers haven't yet made this change.

[0] https://github.com/golang/go/issues/18597

[1] https://github.com/golang/go/issues/27539

saagarjha · on Dec 30, 2019

Interestingly, one of the suggestions to deal with issues in panic backtraces due to this change is to use DWARF.

kjeetgill · on Dec 29, 2019

I'm not familiar with the issue: what makes Java's lambdas/closures fake? Is it that bound variables need to be effectively final?

hinkley · on Dec 29, 2019

I don’t know if they’ve done anything new, but as originally implemented, they were inner classes.

erik_seaberg · on Dec 29, 2019

The inner class gets copies of the variables, so imperative code that wants to reassign them isn't allowed because it probably won't do what you expected.

The goal is not to GC stack frames. But I'm not sure why they didn't create an inner class to hold the closed-over variables in non-final fields (moving them from the stack to the heap) for both the function and all closures it creates.
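A rough sketch of that idea, written by hand in Go rather than Java to match the language the thread started with: the closed-over variable is moved into a heap-allocated record that the enclosing function and every closure it creates all share. The type `env` and the function `makeAccumulator` are hypothetical names, and this is not a claim about how any particular compiler lays things out.

```go
package main

import "fmt"

// env is a hypothetical heap record holding the closed-over variables.
// It plays the role of the "inner class with non-final fields".
type env struct {
	total int
}

// makeAccumulator shares one env between itself and both closures,
// so an update made through one closure is visible through the other.
func makeAccumulator() (add func(int), get func() int) {
	e := &env{} // heap-allocated, because the closures outlive this call
	add = func(n int) { e.total += n }
	get = func() int { return e.total }
	return add, get
}

func main() {
	add, get := makeAccumulator()
	add(2)
	add(3)
	fmt.Println(get()) // 5
}
```

Go would let the closures capture a plain local `total` directly; the explicit `env` is there only to make the shared heap cell visible.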

(Obligatory "doctor, it hurts when I use mutable state!")

kjeetgill · on Dec 30, 2019

Ah, gotcha. Honestly, I always use this as an example of one of the subtle design points that I really appreciate Java for.

Nitpick, but saying copies in Java can get confusing. Both primitives and references are bound by value. I'm sure you know, but for others: no objects are copied.

I always found this limitation had reassuring regularity; it's the same way arguments are bound to function parameters (minus being final). Local variables being isolated from "other scopes" means that any interthread communication must be mediated through objects.
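The "references are bound by value" point can be illustrated with ordinary argument passing. This sketch uses Go pointers as a stand-in for Java references, which is an analogy rather than an exact match, and the names `box`, `rebind`, and `mutate` are made up for the example.

```go
package main

import "fmt"

type box struct{ value int }

// rebind reassigns its parameter. Only the callee's copy of the pointer
// changes; the caller's pointer still refers to the original box.
func rebind(b *box) {
	b = &box{value: 99}
}

// mutate writes through the pointer. No box is copied, so the caller
// observes the change.
func mutate(b *box) {
	b.value = 42
}

func main() {
	b := &box{value: 1}
	rebind(b)
	fmt.Println(b.value) // 1
	mutate(b)
	fmt.Println(b.value) // 42
}
```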

pjmlp · on Dec 30, 2019

They were never implemented that way; rather, they make use of the invokedynamic bytecode.

https://youtu.be/Uns1dm3Laq4

Android Java is the one making use of anonymous inner classes instead.

kjeetgill · on Dec 30, 2019

I believe they still are, with the caveat that the bytecode is built at runtime for lambdas, not at compile time like regular inner classes.

pjmlp · on Dec 30, 2019

Invokedynamic is not related at all to inner classes.

kjeetgill · on Dec 30, 2019

Maybe my memory is a little rusty or I glossed over a bit too much, but I was thinking of how HotSpot does lambdas from here[0]. It seems to use the invokedynamic bootstrap method to spin an inner class at runtime. To be fair, it's a HotSpot thing and not in the JVM spec.

[0]: https://github.com/frohoff/jdk8u-jdk/blob/master/src/share/c...

pjmlp · on Dec 30, 2019

Better check out Brian's talk.

Not really, because the class file with invokedynamic bytecodes is supposed to work across all JVM implementations.

kjeetgill · on Dec 30, 2019

I think we agree? The bytecode is transferable because the classfile only contains an invokedynamic that calls the LambdaMetafactory for bootstrapping. The LambdaMetafactory is provided by the runtime JVM itself, so that linkage doesn't introduce an implementation dependence.

HotSpot's implementation just happens to spin an inner class at runtime.

pjmlp · on Dec 30, 2019

Yes, we agree. I do concede that I wasn't fully correct.

paulddraper · on Dec 29, 2019

> Is it that bound variables need to be effectively final?

I believe this is it.

temac · on Dec 29, 2019

Even with store-load fw, you get a penalty (~3 cycle latency) over register accesses, no?

Jasper_ · on Dec 29, 2019

yeah, but it's cheaper than a full L1 hit, which is where it would go if not for that.

temac · on Dec 29, 2019

I was trying to cite a typical full L1 hit latency... I thought store-load fw simply avoids having to flush the complete write buffer before the access is even possible, which risks taking far more than ~3 cycles. Now maybe it can be faster in some cases than an L1 hit, I don't know.

Edit: it seems that store-load forwarding is actually slightly slower than L1: https://www.agner.org/optimize/blog/read.php?i=854#854

pcwalton · on Dec 29, 2019

I'm guessing that the reason was simply ease of porting 32-bit x86 assembly code to 64-bit.