Well, the point is not in some novelty, but that it does a similar thing as an o...

kaba0 · on Nov 23, 2022

Java bytecode is an open standard with more completely independent implementations than wasm, but it is slightly higher level. Graal (truffle) uses ASTs mapping to Java language primitives as a basis (in a way sorta similar to wasm, as that is not really a byte code).

elcritch · on Nov 24, 2022

WASM also takes a whitelisting approach to security rather than a blacklisting like the JVM. Java applets had lots of security issues over the years, whereas WASM provides much better isolation. Overall JVM or CLR always seemed too heavyweight for many use cases.

still_grokking · on Nov 24, 2022

Java had an extremely advanced and strict sandbox for many years.

It offered of course also the possibility to blacklist everything by default.

The problem with such a granular sandbox is that it's too complex for most use cases.

When you need to whitelist any and every call to the outside world this becomes very tedious.

That's why Java now removes the sandbox…

https://inside.java/2021/04/23/security-and-sandboxing-post-...

Let's see how this works out for WASM.

I guess: Without resorting to using some language(s) with build-in support for capability security a whitelist based sandbox approach won't ever work without issues.

First of all you can forget about C/C++. Everything in this languages assumes free access to the system.

Rust does not have any support for capabilities either (yet, and for the years to come at least).

The best current WASM sandbox implementation, which is in WasmTime not Wasmer, is not more than a FS block. (And the memory safety guaranties you get in any VM language, of course).

https://docs.wasmtime.dev/security.html

I don't find any such feature mentioned in the Wasmer docs. It seems completely without any sandbox! (To stress it once more: All VM languages are memory safe. That's not sandboxing. All the "sandboxed by default" claims are misleading, at least).

https://github.com/wasmerio/wasmer/issues/221

That even people here on HN believe in the currently unfulfilled WASM security promises shows only how good the WASM marketing is. This should make you even more skeptical. One should never over promise on security! Java had to learn this the hard way over many years (as their sandbox had have holes in the past, and was also almost impossible to correctly configure).

syrusakbary · on Nov 24, 2022

> I don't find any such feature mentioned in the Wasmer docs. It seems completely without any sandbox!

That's not correct. Wasmer aims to be a fully sandboxed WebAssembly Runtime. Wasmer WASI implementation is also fully sandboxed.

still_grokking · on Nov 24, 2022

> Wasmer WASI implementation is also fully sandboxed.

Could you please link to some authoritative documentation for that?

As I said I don't see this in any docs or GitHub sources.

The point is: It's irrelevant what they "aim for". The only interesting thing at the moment is the status quo.

syrusakbary · on Nov 24, 2022

Sure thing. Check the WASI README (a section was just added to clarify your comment)

https://github.com/wasmerio/wasmer/blob/master/lib/wasi/READ...

still_grokking · on Nov 25, 2022

Thanks for the effort!

I still don't get it.

What does "the aim to be 'fully sandboxed'" mean in the light of "support [of] standard I/O, file I/O, filesystem manipulation, memory management, time, string, environment variables, program startup etc."?

What are "secure systemcalls"? What does this mean on the technical level?

How is the simple FS access filter (something you get form std. Unix FS access rights since 50 years) something special here, or even considered "fully sandboxed"?

(I get the broader idea of capabilities. But I would see this only as a minor implementation detail in the current state).

syrusakbary · on Nov 25, 2022

Basically, what we mean by sandboxed is that the application would not be able to do anything harmful to your system unless explicitly allowed (similar ethos as Deno, compared to NPM).

Let me explain a bit more how:

* stdio / file io: thanks to Wasmer VFS and the WASI mapped dirs, the application will only able to access to the directories explicitly defined by the user when running a program. Any file/dir access outside of the allowed directories will throw an error (both by design of WASI and by our implementation of it). Note: by default stdout/stderr will be piped but that can also be easily customized.

* Environment variables: no environment variables can be accessed unless explicitly specified to our CLI (or SDKs)

* Time: we consider time access (get) to be harmless (WASI don't allow setting the platform/OS time, so we don't have to worry about that)

> What are "secure systemcalls"? What does this mean on the technical level?

It means system calls that we consider harmless to execute. For example, getting the current time (unix timestamp) is considered harmless by our WASI implementation (it should be easy to even shield that, if needed), but accessing a file could be harmful (that's why we ask for permission first).

Of course, no software is free from bugs that could jailbreak the sandbox, specially after Spectre and Meltdown. So having something to be "fully sandboxed" is an infinite game.

But our aim is there, nonetheless :)

still_grokking · on Nov 25, 2022

Thanks for the reply! I appreciate that.

> Basically, what we mean by sandboxed is that the application would not be able to do anything harmful to your system unless explicitly allowed (similar ethos as Deno, compared to NPM).

So how is this different to the mentioned (obviously failed) JVM sandbox?

It's exactly the same concept as I see it. "Security" fully depends on (complex!) configuration.

What possibly could go wrong here…

> stdio / file io: thanks to Wasmer VFS and the WASI mapped dirs, the application will only able to access to the directories explicitly defined by the user when running a program. Any file/dir access outside of the allowed directories will throw an error (both by design of WASI and by our implementation of it). Note: by default stdout/stderr will be piped but that can also be easily customized.

So basically you've implemented file access rights.

With multiple additional layers of indirection and a lot of additional code. (TCB anybody?)

Sure, this looks very promising and innovative. What possibly could go wrong here?

How does this prevent unwanted information exfiltration given std. I/O and networking works?

Also, how does it prevent that a rouge / hacked app for example encrypts all files / databases it has regular access to?

> Environment variables: no environment variables can be accessed unless explicitly specified to our CLI (or SDKs)

> Time: we consider time access (get) to be harmless (WASI don't allow setting the platform/OS time, so we don't have to worry about that)

How is this different form std. Linux "capabilities", and the Linux syscall filter?

> Of course, no software is free from bugs that could jailbreak the sandbox, specially after Spectre and Meltdown. So having something to be "fully sandboxed" is an infinite game.

Getting the sandbox tight and close is not the problem imho.

This was done several times already (with mostly OK-ish results).

The problems start when you need to dig holes into it. (And you need holes, otherwise you're so "secure" that you can't do anything meaningful).

I still don't get how a new "whitelist only" sandbox will not end up in the same fundamental issue like the ones before it. When "security" is crucially depended on complex configuration it's very likely that there will be unintended holes because someone didn't get the config right. (Just think about why almost nobody is using SELinux even it gives you even better guaranties then WASM currently).

Java shows this problem also very well: It died on the client not because there were technical loopholes in the sandbox. There were almost none of this kind! The problem was the overly complex sandbox config (with to be honest additionally quite stupid defaults in the beginning) that nobody got right, and security issues popped up constantly therefore.

So you don't even need to think about Spectre and Meltdown kind of attack vectors.

After years (or even decades) there is still no guaranty that people get things like Linux "capabilities" or even std. FS access rights correct. Now the same people are expected to get some other features, that do basically the same, right?

I know already how this will end up… And this will blow up very painfully right into your face after you have spit so much promises about the superior security of your approach!

Don't be stupid, learn form the Java story. (And no, you're not anyhow better. If you think that, you lost already, and your ship is going to sink with a lot of skit for sure).

---

I think the only way to get capability security right, and especially usable by mere mortals in the first place is to make it a fundamental part of the languages used.

Rust is light years away form that, though… (And others don't even have the basic enabling features for that.)

The languages that tried something like that in the past are by now dead since a long time. (Like the E programming language)

Though I put high hopes into the upcoming Scala features in this regard.

But nevertheless those are way ahead of any other (real) language even there a usable system is also still years away. For reference:

https://docs.scala-lang.org/scala3/reference/experimental/cc...

(This will give you all the features of Rust plus true capabilities in the type system. But not before the next couple of years. But maybe someone starts to clone this feature set early so they won't be a decade behind when Scala finally ships this… ;-))

elcritch · on Nov 24, 2022

Your links show several aspect of the WASM sandboxing techniques. Don't mixup the core WASM and WASI. The FS sandboxing is actually part of WASI, as core WASM doesn't allow any FS access or system calls.

> whitelist any and every call

This is the default WASM model. It provides no outside calls by default. This works fine for the original web browser model where WASM is essentially just providing a way to run a function from JS with near native speed. It's extremely limited in usefulness but fast and secure.

WASI is an effort to provide a set of curated APIs to make it easier to use WASM for more use cases like networking or even printing to a console.

> First of all you can forget about C/C++. Everything in this languages assumes free access to the system.

C/C++ works fine. It essentially acts like a bare metal embedded device, but without access to hardware registers. You cant even print to console by default.

This does means most of the C standard library won't work, but C itself doesn't care. There are no system calls either, only whitelisted functions calls statically provided to the WASM code. Again C/C++ don't care, but it means you can only use code that uses limited subset of the standard library.

The JVM's bytecode design makes it hard to translate C to it, whereas WASM is designed for languages like C with arbitrary pointer arithmetic. WASM just turns point access into memory offsets, while the JVM or CLR assume a Java like language.

> I don't find any such feature mentioned in the Wasmer docs. It seems completely without any sandbox!

Browsers have been shipping WASM VMs for years now without disaster, so apparently they do know whats required for sandboxing. If you can prove otherwise and break out of WASM you could make a lot of money from Google's security bounties or just selling it on the darknet. ;)

I'm sure theres been security flaws just like in any sandbox technology, but generally WASM appears successful. Its been adopted outside browsers by security conscious companies like cloud-flare and fastly to provide sandboxing of user code, etc.

> (To stress it once more: All VM languages are memory safe. That's not sandboxing. All the "sandboxed by default" claims are misleading, at least).

You're correct that limiting memory access is only part of what a sandbox requires. Memory safety is a crucial piece of sandboxing, though many VMs do provide that. However most VMs also provide mechanisms to escape the memory sandbox though so they're not good at being sandboxes.

The other crucial part for sandboxing is control flow. The VM must prevent code from being able to hijack control flow, or to otherwise generate code to escape the sandbox. The links you provided discuss how WASM does this currently:

> All control transfers—direct and indirect branches, as well as direct and indirect calls—are to known and type-checked destinations, so it's not possible to accidentally call into the middle of a function or branch outside of a function.

What most of the hype around WASM is actually about is the ability to provide this sandbox isolation after translating WASM bytecode to native binary code.

Most JVM VMs also do JIT'ing at runtime, but the JVM has some runtime features and assumptions that make statically compiling upfront harder. WASM was designed to enable ahead of time compiling from the outset. That is a key difference between JVM and WASM.

For example here's a paper from Stanford & UC folks for a static verification engine for checking that binary executables translated from WASM to maintain the sandboxing: https://cseweb.ucsd.edu//~dstefan/pubs/johnson:2021:veriwasm...

> One should never over promise on security! Java had to learn this the hard way over many years (as their sandbox had have holes in the past, and was also almost impossible to correctly configure).

To reiterate, WASM incorporated these details from the outset, to the point that providing useable WASM VMs outside of browsers years to come about.

If you're talking about WASI, you have a bit more of a point. WASI is the result of carefully adding back in things like FS access using capabilities and other techniques to keep it sandboxed. This is harder, and introduces more surface area to violate WASM's sandboxing. However WASI seems well thought out so far.

kaba0 · on Nov 25, 2022

I really don’t want to sound argumentative just for the sake of it, but sandboxing through “not allowing any side effecting things” is quite.. trivial? Like, I have written a brainfuck interpreter and it can only do prints to stdout, so it is guaranteed safe, hell, it is quite good as a compile target!

There is nothing inherent in java bytecode that would make it unsafe, nor hard to AOT compile — runtime reflection is not part of the JVM spec itself, it is a separate API (akin to WASI) that provides this capability. So, a JVM interpreter can also do no harm without attaching the endpoints that can do harm obviously. Also, there used to be a project called gcj 20 years ago that managed to AOT compile java just fine, the only riddle here is the aforementioned reflection. Graal uses a closed-world assumption to solve it.

Your link is interesting though, will look into it.

still_grokking · on Nov 25, 2022

> Your link is interesting though, will look into it.

Looks much like the byte-code verifier that is std. part of the JVM.

The foundations on which this is built look quite solid though.

still_grokking · on Nov 25, 2022

### PART 1 (as the comment was too long for the system here) ###:

> The FS sandboxing is actually part of WASI

AFAIK it's part of some WASI implementations, and not a mandatory requirement by WASI.

But maybe I'm wrong here.

Could you point to some authoritative documents that bakes your claim about WASI?

> […] where WASM is essentially just providing a way to run a function from JS with near native speed.

Could you explain how you get "near native speed" in a managed VM? No VM overhead on WASM? What did all the other VMs wrong for so many years that everywhere else there are significant overheads and slowdowns compared to native code?

Could you point to some benchmarks that bake your claim?

> C/C++ works fine. It essentially acts like a bare metal embedded device, but without access to hardware registers. You cant even print to console by default.

Could you explain how some random C/C++ program can run in an environment without the possibility to do basically anything?

What happens for example when I print to the console form such program?

> This does means most of the C standard library won't work, but C itself doesn't care.

How does a C program run without C library?

I was never talking about the C/C++ languages as such.

I said: You can forget to cross compile C/C++. Because I assume that almost no code C/C++ code out there will work in an environment without any possibly to do basically anything.

But maybe I'm very wrong here. Could you explain how this works?

> There are no system calls either, only whitelisted functions calls statically provided to the WASM code.

Ah nice.

And those calls are automatically sandboxes, yes?

How is the sandbox configured?

How does it prevent anything bad form happening?

This would require call filters, that inspect the parameters, and then decide what to do on every call, right?

If so, what's about the overhead of such a "call firewall"? How does it affect performance?

Where do I find the documentation for this sandbox?

> The JVM's bytecode design makes it hard to translate C to it, whereas WASM is designed for languages like C with arbitrary pointer arithmetic. WASM just turns point access into memory offsets, while the JVM or CLR assume a Java like language.

How is this relevant?

If WASM is modeled after the C languages requirements, why are people saying that it would be easily usable for other, high-level languages?

Wouldn't that mean that I need to write a VM that runs in a VM?

How would such a construct affect performance and resource usage?

What additional sandboxing could this bring to the table?

Please keep in mind here that the VM in the VM needs to talk to the outside world to fulfill it's std. lib contracts. How does this work?

> Browsers have been shipping WASM VMs for years now without disaster, so apparently they do know whats required for sandboxing.

But we're not talking about an JS VM embedded WASM VM.

We're talking about a standalone WASM VM outside of the JS VM.

How does sandboxing work there? How are the WASI calls sandboxed? (See also all the remarks above).

Where can I find the documentation for that?

> If you can prove otherwise and break out of WASM you could make a lot of money from Google's security bounties or just selling it on the darknet. ;)

Could it be that you just again confused "sandboxing" with memory safety? (Which is, once again, a property every "managed" VM runtime provides).

> Its been adopted outside browsers by security conscious companies like cloud-flare and fastly to provide sandboxing of user code, etc.

Oh, more sandboxing.

Could you point to some documentation how this sandboxing works on the technical level?

AFAIK they just run the usual containers in HW hypervisor VMs, where you can than place your WASM runtime…

So you get in the end sandboxing form the std. hardware based VM. (Because nobody trusts containers for security. And then anyway, in such scenario the workload inside the container is irrelevant).

> Memory safety is a crucial piece of sandboxing, though many VMs do provide that.

Could you point to any VM that does not?

> However most VMs also provide mechanisms to escape the memory sandbox though so they're not good at being sandboxes.

How is WASI different here?

What's the magic that WASI has, but nobody else?

Do you think a sandbox like the one that the JVM just throw away could prevent memory access outside the JVM? How does the provided escape mechanism work actually on the JVM? Is this part of the JVM? Or would it require some "unsafe" access through unofficial APIs which are not part of the platform?

Could you explain how this looks like in other popular VMs? I'm not familiar with such.

> The other crucial part for sandboxing is control flow. The VM must prevent code from being able to hijack control flow, or to otherwise generate code to escape the sandbox.

Could you explain why this does not work reliably on the JVM? (Even there is by standard byte code verification in place, of course.)

But please keep in mind that there is no user-level way to generate native code on the fly on the JVM. Only a JIT, which is actually not part of the platform, can do this. So how do I hijack control flow on the JVM (or other popular VMs)? Could you explain this (under the assumption that there are no bugs in the implementation of course, as this would render any VM protections on any platform likely useless obviously).

> What most of the hype around WASM is actually about is the ability to provide this sandbox isolation after translating WASM bytecode to native binary code.

Even in the face of implementation bugs in the "translator"?

How does this magic work?

Why do they still add "bumper spaces" around the WASM memory then? Should be strictly unnecessary given the magic property of WASM that the AoT compiler or JIT output is still safe, or not?

> Most JVM VMs also do JIT'ing at runtime, but the JVM has some runtime features and assumptions that make statically compiling upfront harder. WASM was designed to enable ahead of time compiling from the outset. That is a key difference between JVM and WASM.

How would this work with say Java on WASM?

What's the magic here again that makes it hard to AoT compile Java for the the JVM platform but it's easy to compile it AoT to WASM? What do I miss?

Could it be that this feature is only relevant in case the language you're compiling is anyway one of the few that are AoT compiled and not run inside a VM by default?

But than again, how is WASM the superior VM to run all kinds of languages like advertised (given almost all languages besides a few exceptions are VM languages)?

> For example here's a paper from Stanford & UC folks for a static verification engine for checking that binary executables translated from WASM to maintain the sandboxing: https://cseweb.ucsd.edu//~dstefan/pubs/johnson:2021:veriwasm...

Interesting.

Is this VeriWasm widely used in industry? Who's using it in production?

(One could also ask who verified the verifier implementation, but that's always the question with SW verification; usually you can "just trust" multiple layers of insights. Otherwise SW verification would be useless; which it's not imho!)

At least the introduction of the paper answers a few of the previous questions. Let me cite:

> WebAssembly (Wasm) is a platform-independent bytecode that offers both good performance and runtime isolation. To implement isolation, the compiler inserts safety checks when it compiles Wasm to native machine code. While this approach is cheap, it also requires trust in the compiler’s correctness—trust that the compiler has inserted each necessary check, correctly formed, in each proper place. Unfortunately, subtle bugs in the Wasm compiler can break—and have broken isolation guarantees.

I'm not further commenting on this part. I think the statements stand on their own.

still_grokking · on Nov 25, 2022

### PART 2 (as the comment was too long for the system here) ###:

> > One should never over promise on security! Java had to learn this the hard way over many years (as their sandbox had have holes in the past, and was also almost impossible to correctly configure).

> To reiterate, WASM incorporated these details from the outset, to the point that providing useable WASM VMs outside of browsers years to come about.

Could it be that you just again confused "sandboxing" and memory safety?

The problem with the JVM sandbox (and most of the time other VM languages) isn't memory safety!

There have been such bugs here and there, sure. In the JS VMs, the WASM WMs (like pointed out in the above linked paper), and also the JVMs.

But those bugs are the usual implementation bugs.

Once again, this is not relevant to the questions here. Memory safety is not a "sandbox"!

> WASI is the result of carefully adding back in things like FS access using capabilities and other techniques to keep it sandboxed.

What "sandbox" are you talking about?

There is only one WASI implementation that offers a extremely simple FS access filter. (And Wasmer, which we're taking about here, does not even have that, as I see it).

Capability security? Where is this implemented?

One again: I don't talk about what people dreamed up! I'm asking for the status quo.

> This is harder, and introduces more surface area to violate WASM's sandboxing.

Yes, that's the point.

Having a cage that does not let anything through is easy. Just build massive walls without windows and doors. Nothing can get out. Nothing can get in…

But such a cage is almost useless.

So the interesting part starts when a thing in the cage needs to talk to the outside world!

"Solving" the "trivial" problem is indeed easy.

Now show me how you solve the real problem!

But to my knowledge there is nothing there. Nothing. (Besides promises and cool sounding plans for some distant future).

> However WASI seems well thought out so far.

Yes, "thought out". But not implemented!

BTW: They did not come up with anything new here.

Capability security is at least around 40 years old.

The point is: Nobody ever implemented this outside of academia.

We have just now the first steps in that direction.

But that are at best beta versions of tech form the future…

As it stands today: WASM does not have any meaningful sandbox. (Memory safety is not a sandbox! It's "just" a prerequisite to be able to implement a sandbox. But it's not one by itself, and a feature _every_ VM runtime provides).

Even containers will give you more sandboxing based on the (by now) rich features of Linux namespaces. (Still nobody considers this a serous sandbox on which you could bet your ass. That's why we run containers in VM; because that's the only—more or less—reliable sandbox we currently have; as our OSes and hardware still don't provide capability security features).

---

Don't get me wrong though.

I do think that the whole "linear memory" idea for memory safety is indeed a very good one.

It's so good, I think this should be implemented on the HW level! (Or at least in a first step on the OS level).

But "really safe memory cages" aren't sandboxes.

Without capability security on top this does not provide any meaningful improvement over what we have already since decades.

But capability security isn't something that you can just to piggyback after the fact. Everything, form the languages used and their libs, up through the OS and VM layers need to be explicitly build with this mechanism in mind. We're still light years away form having such infrastructure!

But what the WASM people present is some fantasy world where all this is already in place.

Given how aggressive those obvious lies are pushed I can't come to any other conclusion than that there is a lot of bad faith behind this campaigns. "Someone" tries to push by all means their VM tech into the market. Even they're leaking around 20 years behind the next competitor. This whole thing is a disgusting marketing war.

And to be honest I'm quite scared how effective this ultra aggressive marketing is even it's almost completely based on highly over promised stuff up to, in its extreme, blatant lies. I hope the person I'm talking to here isn't part of this marketing effort. And people reading this start to ask themself the "right questions" finally ;-).

kaba0 · on Nov 24, 2022

I do somewhat agree with the security aspects, though it seems to be in-process security measures are not the panacea.

> Overall JVM or CLR always seemed too heavyweight for many use cases.

The most common implementation may be; there are plenty of tiny jvms, some target embedded devices even.

oblio · on Nov 24, 2022

Yeah, but then you run into the problem of, "are those tiny JVMs":

1. Well supported by a company and a community

2. Are they free?

Especially 2, it sounds ridiculous but frankly even huge corps just use free stuff whenever they can.

still_grokking · on Nov 24, 2022

> […] jvms, some target embedded devices even.

Java was invented as IoT technology in the first place. Long before this acronym even existed.

It run on all the old 4-line monochrome text display dumb phones.

Java runs on SmartCards.

kaba0 · on Nov 24, 2022

Yeah, I know, but those are not the most streamlined ways of using Java nowadays, that’s why I didn’t highlight these parts as much (also, smartcards only run java ME, which is a subset of SE)

still_grokking · on Nov 25, 2022

Sure!

I wanted only to point out that using the word "even" in the cited part is kind of "funny". Because this was actually the originally intended usage of Java.

It was invented as safe replacement of C/C++ especially for embedded development. (But now, sure, it ended up on the really huge boxes in DCs).

elcritch · on Nov 24, 2022

Also most of them are also only support programming in Java. This always limited running arbitrary C code, or other languages like C++, Go, Nim, Rust, etc.

still_grokking · on Nov 25, 2022

The JVM is Turing complete. So you can run any language you please on it.

The question is—as always—how efficient this is.

But this goes also the other way around.

Even you can in theory run arbitrary languages on a WASM runtime the question how efficient this is stands also!

And in fact running anything else than "C languages" in WASM is at least as inefficient as running "C languages" on the JVM…

kaba0 · on Nov 25, 2022

> And in fact running anything else than "C languages" in WASM is at least as inefficient as running "C languages" on the JVM…

Hell, it is a really safe bet to say that running a C-like program on the JVM is much much faster than porting a whole runtime (which is very performance-oriented) on top of a runtime and make it run some code. Wasm simply can’t expose the optimizations runtimes routinely make use of (template interpreters for example))