The peer comment about static typing is correct, but there's more, err, flavor to be enjoyed in the challenges of JITing JS. Here are some deopt scenarios to keep you up at night.
To whet your appetite, what happens if you redefine Math.round? Java prohibits this for obvious reasons, but a JS joker may write:
Math.round = () => Infinity;
JS engines really do inline Math.round, and must be prepared for this nonsense.
It gets worse. Maybe you check for a property which is usually not found:
if (!val.uuid)
val.uuid = uuidgen();
Hidden classes can almost reduce this to a pointer comparison. But what if someone adds a uuid property on Object.prototype? Every check is busted! V8 handles this with "validity cells", and it's ugly: every object has to know about every object that "inherits" from it.
Now if you are a monster you may choose to write:
Object.defineProperty(Array.prototype, "42", {value: "lol"});
console.log([][42]); // yes it prints "lol"
Every array gets a default value for 42. Think about how you would JIT numeric code in such an environment...
They do, it's just not a priority. The PEP I linked (as well as most of the work of its author, Victor Stinner, in recent years) is motivated by JITing.
There's also the "frame evaluation API" PEP [1], whose purpose is to allow pluggable evaluators in CPython without forking the entire interpreter, like Unladen Swallow had to.
Python is orders of magnitude slower than JS, though. It has the same problems, but doesn't even try to solve most of them. PyPy would be a better VM to compare against.
Java deals with almost exactly the same issue. If you define an interface and a single implementing class, the code will be compiled to always call that class's method, and then deoptimized / recompiled if you load another class that implements that interface. JS could deal with these issues in a similar manner.
The Java JIT has static type information to work with, the JS JIT can only infer type information via heroic efforts. Static types do mean something, especially when working with primitives and other unboxed data types.
This. Javascript engines have to do stuff like https://mathiasbynens.be/notes/shapes-ics to assume many objects will have the same key names and types and create an optimized class layout, with some kind of expensive fallback if anyone ever randomly stores a key or value that violates the assumption. They can't even be sure you'll always be doing integer math (unless you resort to bit masking a la asm.js). Hinting (like __slots__ in Python) might have helped some.
This was one of the main motivations for writing dart. The V8 implementers were tired of having to write these kinds of hacks. Even though dart was initially a dynamically typed language, the shape of the objects was stable.
> The Java JIT has static type information to work with
I don't know that the Java JIT has type feedback. Regardless, Java being statically typed means Java code is structurally closer to what a JIT wants and can analyse; it's much more difficult (and thus rarer) to fuck around with types and objects generated on the fly at runtime. For instance, you're not going to add new attributes to an instance, whereas that's just Tuesday in JavaScript.
Java JITs definitely do have type feedback. They profile the types of objects at virtual call sites and do guarded inlining, as well as removing expensive interface casts, etc. The JVM doesn't really have a distinction between classes generated on the fly and those that were on disk, as they all go through a classloader anyway. JVMs, for the most part, blow their brains out at the end of the day and start from zero the next VM run.
Well, you're right that JS JITs can only infer type information via heroic efforts, but AFAIK the Java compiler throws away type information from the source code, which means that the JVM JIT needs to infer type information again from the bytecode.
Still, the JVM JIT is faster than JS for the reasons explained in the sibling comment of the parent one.
Java only throws generic type information away in the declaration of classes and methods. Code that accesses generic types is compiled with casts to the expected type: `List<String>.get(1)` will generate a cast to String, so nothing is actually missing in the generated code. It is only missing when you use reflection, e.g. to deserialize a List via Jackson by passing List<String>.class. That unfortunately won't work because the generic type parameter is not part of the declaration, only the generated code.
> but AFAIK the Java compiler throws away any type information from the source code
You're thinking about generics. .class files preserve a whole bunch of type information (I'm building a .class decompiler in my free time, and I'm looking at that very same data in my debugger ATM).
Type information is present in fields and class inheritance
For example, a class like this
`class Foo implements Bar<String>`
Retains the fact that the generic type is a String.
That information is completely lost at method invocation. So a method that takes a `Bar<String>` ultimately compiles to a method that takes a `Bar` and knows nothing of the String.
To get that generic information down you have to engage in some fun tricks using either the class or field method I mentioned earlier. (Usually you do this with a second type parameter where it matters).
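The "fun trick" being alluded to is often called a super type token. A minimal sketch (the `TypeToken` and `TokenDemo` names are mine, but the pattern is the one libraries like Gson and Jackson use): type arguments written in a class *declaration* survive erasure in the class file, so an anonymous subclass can smuggle a type argument somewhere reflection can see it.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Type arguments in a class's *declaration* (here, the anonymous
// subclass's superclass) survive erasure and are queryable via
// reflection, unlike type arguments at call sites.
abstract class TypeToken<T> {
    private final Type type;

    protected TypeToken() {
        // getGenericSuperclass() preserves the T fixed by the
        // anonymous subclass declaration below
        ParameterizedType superType =
            (ParameterizedType) getClass().getGenericSuperclass();
        this.type = superType.getActualTypeArguments()[0];
    }

    public Type getType() { return type; }
}

class TokenDemo {
    public static void main(String[] args) {
        // The trailing {} creates an anonymous subclass whose
        // declaration fixes T = List<String>, so reflection can
        // recover the full parameterized type.
        TypeToken<List<String>> token = new TypeToken<List<String>>() {};
        System.out.println(token.getType());
        // prints the captured type, e.g. java.util.List<java.lang.String>
    }
}
```

The same capture works through a field declaration, which is the other variant mentioned above.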
Java JITs don’t throw away type information. The bytecode carries strong types, though you have to do some work to get them out (some abstract interpretation). But that’s something you can do statically; no need for profiling or speculation.
Yeah but the thread is about types, not generic types specifically.
Also it’s not really true that generics are erased. There’s that horrible thing javac does so that the VM can support reflection for generics. I get your point though.
Do you mean the passing of classes as additional arguments? I haven't seen the horrible thing.
Overall I think C# got this one right. Generics are right there in the binaries.
bUt ActuALLy, I think the rightest right thing is to do specialization up to representation at link time (or compile time), when the whole program is available, a la MLton. Virgil does this. Of course this is not possible in a dynamic code loading environment, but I only have so many fucks to give in this life.
No, I meant that the Class format has metadata about what the generics were, so some java.lang.reflect thing can query what the generic type was. I don’t think it comes with soundness guarantees.
I agree C# got this right.
I agree that doing specialization up to link time is ideal from a certain standpoint.
All generics in Java are of type Object from the start, you can't call any methods on them not implemented by Object. So it is wrong to say they are erased, they were never there to start with.
You can add a constraint to a generic declaration, which is an upper (or lower) bound on the allowable type arguments.
e.g.
class A<Y extends X> {
    // in this scope, Y is known to be of at least type X,
    // so we can call methods on expressions of type Y
    // that belong to type X (and not just Object)
    Y m() { ... }
}
Type arguments are omitted from usage sites. The technique is literally called erasure in papers and documentation.
A<Foo> a = new A<Foo>();
Foo f = a.m(); // should return a Foo; the erased signature returns X,
               // and the compiler inserts a cast from X -> Foo
Generic code is slower in Java because of these extra casts. To get back (most of) the performance, the JVM has to inline enough methods to be able to track the types from start to finish. It can't always.
And that's why Java is slower than it could be. If I'm using a String[] array, it can't contain anything but String objects, so the JVM does not have to check the type every time I access that array. That's not true for ArrayList<String>, where the compiler must check the type of the object returned from `get(index)` (because it really is Object and can contain anything), but it could be true with better language design.
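A runnable sketch of where each check actually happens (the class name `ErasureCostDemo` is made up): an array rejects a bad element at store time, while an ArrayList<String> only fails at the hidden checkcast on the way out.

```java
import java.util.ArrayList;
import java.util.List;

// Arrays check element types at store time; a generic list only
// fails when the erased Object is cast back at the use site.
class ErasureCostDemo {
    public static void main(String[] args) {
        Object[] arr = new String[1];
        try {
            arr[0] = Integer.valueOf(42); // rejected immediately
        } catch (ArrayStoreException e) {
            System.out.println("array store checked: " + e);
        }

        List<String> list = new ArrayList<>();
        ((List) list).add(Integer.valueOf(42)); // raw type sneaks it in
        try {
            String s = list.get(0); // hidden checkcast fails here
        } catch (ClassCastException e) {
            System.out.println("cast checked at get: " + e);
        }
    }
}
```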
The Java standard library isn't particularly verbose or insane. It's the "enterprise java" world that deals with the sort of excesses you're thinking about.
It depends. For example, you can verbosely write a loop to copy data from one array to another, or you can use memcpy. The latter is typically implemented as hand-optimised assembly; it's a pretty small number of operations on x86_64. The former, maybe the compiler optimizes, maybe it doesn't. If it does, that's definitely more complexity in the compiler, and it slows compilation down.
In general, having some higher-level, well-optimized helpers can certainly reduce verbosity and increase speed. That said, some types of verbosity just make the programmer write what the compiler would translate to anyway. Or can end up being unnecessary - e.g. a strongly typed language with no inference certainly slows the programmer with no effect on the program (though maybe an effect on the type checker).
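Since the thread is about Java: the analogous helper there is System.arraycopy, which HotSpot treats as an intrinsic (essentially the memcpy of the Java world). A sketch of the two styles (the `CopyDemo` name is made up):

```java
import java.util.Arrays;

// The verbose loop vs. the optimized helper. HotSpot compiles
// System.arraycopy as an intrinsic, so the helper is shorter and
// typically at least as fast as the hand-rolled loop.
class CopyDemo {
    static int[] copyWithLoop(int[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    static int[] copyWithHelper(int[] src) {
        int[] dst = new int[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3, 4};
        System.out.println(Arrays.equals(copyWithLoop(src), copyWithHelper(src)));
        // prints: true
    }
}
```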
A fast memcpy used to be complicated to implement, but these days I've heard a simple
rep movsb
will do the trick. This assumes the source register (RSI) points to the source, the destination register (RDI) points to the destination, and the counter register (RCX) is set to the number of bytes to copy.
So for getters and setters, a JIT can inline the function call, and in the end you have a direct memory access.
In your example you have to cast because URLConnection could be an HttpURLConnection or a JarURLConnection.
But again, a good JIT would speculate that it is always an HttpURLConnection and deoptimize if not.
The JVM also has full information about the class hierarchy at runtime. If you load a class that overrides a function, it will deoptimize that function back to a virtual call, but otherwise the JVM will just treat it as a static function and inline it if necessary.
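A sketch of that scenario (the `Shape`/`Square` names are made up). While only one implementation of the interface is loaded, the call site is monomorphic and the JIT can inline it; loading a second implementer later invalidates that compiled code and triggers deoptimization:

```java
// While Square is the only loaded implementation of Shape, the
// JVM can treat area() as effectively static, inline it into the
// loop, and record a dependency; loading another implementer
// later invalidates this code and triggers deoptimization.
interface Shape {
    double area();
}

class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class DevirtDemo {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            // Monomorphic call site: with Square as the sole
            // implementer, the JIT inlines side * side right here.
            sum += s.area();
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Square(2), new Square(3) };
        System.out.println(total(shapes)); // prints: 13.0
    }
}
```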
The design of JavaScript is not very friendly to high performance. Even though Java is itself not easy to generate efficient code from, producing efficient code from JavaScript is far more difficult.
Then it’s problematic to be so JS and V8 centric. Also the post specifically talked about JS being faster than Python and slower than Java so that’s what this thread is about.