It's like a bunch of people correctly predicted a few months ago that maybe this kind of attack surface shouldn't be added into the kernel: https://news.ycombinator.com/item?id=28355754
Well, one thing written to test-drive Rust for Linux was an in-kernel 9P server, which is probably the closest thing to ksmbd, the in-kernel SMB server, without being identical: https://lwn.net/Articles/907685/
A language doesn't need to be good to be popular for a long time. PHP was (is) the previous "shit but lowest common denominator that everyone uses" before JS took that mantle.
Also, "good" depends on the audience. The ability to cheaply hire a bunch of devs to whip up a CRUD app is a "good" quality of PHP/JS, even if it might not make the language "good" from an engineering perspective.
Ignore the JS bashing (it is deserved but irrelevant here).
I was referring to the part explaining the 2028-2040 METAL system where "we" gained 4% execution speed because we stuffed everything into an asm.js VM in the kernel that can skip syscall overhead. But it's a joke video anyway.
I’d guess it’d be slightly faster. The WASM or eBPF would add some overhead, however you save on context switching costs from kernel to user land. So for IO heavy stuff like SMB it’d probably perform well.
SMB is a complex network protocol; implementing it without a serious bug somewhere is a real challenge. Kerberos is another complicated protocol. You have user space FS modules, so perhaps it would have been better to make it one? But at the same time you have stuff like TLS and WireGuard in the kernel (SMB is still more complicated imho).
Linux being monolithic and all, I don't think there is a good answer. Hope Microkernels make a comeback.
So if it weren't in the kernel it would need to run as root, wouldn't it? That gives the attacker nearly the same rights but a much more convenient programming environment.
Binding to a low socket is one thing. But as a file server it needs to access many users' files. That can be done without being root, too. Reducing the attack surface on one end but weakening things a bit on the other. A net gain I might concede, but not completely black and white.
For an extremely limited number of users who rely on one specific use of this feature. It'll get fixed, quickly, and we'll move on.
Your reaction seems way overblown to me. Maybe you are sticking to your guns here. But to me, considering the real scope of damage & threat here, this seems unfortunate, sad, avoidable, and bad, but for a new implementation of a massive sprawling wild protocol, I'd far rather we try than just throw up our hands & let bravery fold. We should do hard things, even when we can't have absolute confidence in total success.
The main reason to use ksmbd is if you can't use GPLv3 Samba. Most PC SMB servers will still be using Samba instead of ksmbd for this reason. Ksmbd is mostly used on NAS boxes.
My main reason for wanting ksmbd is that it's tiny (a few hundred k I believe). The smallest Samba build I've seen is ~40MB, and not very portable at all. I pretty much had to use buildroot to make it work.
My use case is shipping minimal Linux kernels + initramfs that can be run with QEMU. I need file sharing and SMB is the most universal protocol. I can ship the entire kernel (~5MB) and QEMU (~15MB) in less space than Samba. I would love a minimal build.
> My main reason for wanting ksmbd is that it's tiny
That's a fine reason. Does it have to be in the kernel to be small though? From what I can see ksmbd is small because it delivers only a minimal SMB3, not because it's in-kernel. Why would a user space SMB3 be appreciably larger?
Also, the performance hopes for ksmbd don't appear valid either. By adopting io_uring and splice(2) Samba now outperforms[1] ksmbd by a wide margin. Those results are a year old and may be out of date, but still, I suspect we're getting so close to the limits of hardware that it doesn't matter.
Another argument for in-kernel is SMB Direct: SMB over RDMA. Yet here[2] we see io_uring is receiving the bits needed for that as well.
Finally, the license issue: I can't think of a reason a GPL2/Berkeley licensed SMB3 couldn't be in user space.
Where am I wrong? Is there a valid reason for this to be in kernel? I don't see one.
Edit: looks like the ksmbd SMB Direct work predates the io_uring RDMA capability by a few years, so at that time SMB Direct was a legitimate reason. On the other hand send(..., MSG_ZEROCOPY) predates ksmbd...
I think the biggest reason here, especially for GP, is that ksmbd already exists now (and has existed for a few years), while your tiny userspace SMB3 does not exist yet.
It just means that a better solution that fulfills all their needs doesn't exist yet, and they decided to take one trade-off rather than another. Maybe in that setup a possible RCE is not that dangerous.
My application is rather unique. It involves shipping VM images to end users. Which means the images can be downloaded many times. But the files shared over SMB would be the user's own files.
I don't have them hosted anywhere. I'd recommend playing with buildroot. It's very simple and mostly just works. Using the provided QEMU config yields a kernel that's about 5MB. I believe user space (a la BusyBox) was about another 5MB. Haven't done much tuning beyond that. Feel free to email me if you have questions.
It seems ksmbd developers are mostly interested in use cases like SMB Direct which benefits from in-kernel implementation, but most ksmbd users are, at this point, interested in avoiding GPLv3.
NFS has been in the kernel for ages and works roughly the same way, kernel driver with a userland component.
NFS has the advantage of having been in the kernel longer with most low hanging fruit security bugs fixed. ksmbd is brand new. Quite the expectation to have it be completely bug free.
edit: wait a minute, this was fixed with kernel 5.15.61
The criticism isn't that anyone expects bug free code, rather that introducing new remotely accessible attack surface to the kernel in 2022 when we know it's likely unsafe is silly. Building an SMB server in the kernel because "well, NFS was secure eventually" overlooks the fact that NFS shouldn't be in the kernel either.
Right, but that argument can be used to stop the inclusion of _any_ new large functionality in the kernel. Neither the Linux users nor the maintainers are currently interested in a feature-freeze of the kernel. If you want a micro-kernel, Linux was never the solution.
Yes, new features will have problems. They remain disabled in build configs of most distros at this point. Over time, more stability will encourage more consumers to enable it too.
This is not new for the Linux community. They will deal with this one like they have with other new features in the past.
> that argument can be used to stop the inclusion of any new large functionality in the kernel.
Sure, it could be, but there are shades of gray. Adding a full-fat SMB server to the kernel is not the same thing as suggesting that file systems should be 100% in user space. You and I both know that the threat posed by introducing a large amount of remotely reachable code is not at all the same as that posed by a new kernel filesystem.
The last few decades of arguing about micro vs monolithic have surely convinced everyone that there is a time and a place for both. Yes, embedding the server in the kernel gives us lower latency by eliding a context switch (a few hundred cycles) but this comes at a pretty high cost that we're going to be paying for years to come.
Call me overly dramatic, but ever since the NSO group stuff went public I've been a lot more risk averse when it comes to introducing remote attack surface because we know that these bugs are being used to kill people. Making kernel compromise harder means possibly saving someone's life. Do I think someone is going to die over a ksmbd bug? Probably not. Would I want to be the one that checked in a huge remotely accessible blob of code? No, so I think carefully about what code I put where to minimize the risk.
Look, I agree with you. Increased attack surface scares me too. I'm just saying that line of argument doesn't persuade the maintainers or the users. We need to find a better way to protect ourselves. And turning off build flags is one way to do that, one that the community has adopted already.
The other point I'll make is that other kernel features scare me far more than this SMB server. Think io_uring, eBPF, or similar systems. Their attack surfaces are far larger, and yet they have become mainstream. Unfortunately, the horse has already left the barn. We need to find better ways to secure our systems. Arguing for fewer features has been tried for decades, and hasn't helped. Not here in the kernel, not in the browser, not anywhere.
I wish the world was easier to secure, but it's not.
Right, but servers built into the kernel are the worst of all cases.
Not only do you have a service where (if not firewalled) anything can connect and try its luck sending shit at it, it is often integrated with many other systems inside the kernel, which increases the effort to rewrite any of that. Protocol clients in the kernel have far fewer problems: for one, you only connect to a defined endpoint, so to even get started an attacker would need to MITM you, and a client is usually a smaller codebase than the server.
It is also usually the kind of stuff where you want to add new features relatively often, and "upgrade the kernel to use this new server feature" is not a thing people like very much.
Providing interfaces to make userspace implementations faster has a far better payoff: a generic "make disk access and shoving stuff between disk and network fast" facility helps any file-serving daemon, not just SMB (a point which the original Samba proves, as with the new improvements it is currently faster than ksmbd).
How do people running Ceph and other exotic filesystems deal with performance? What performance is considered reasonable performance in your opinion? It might not align with others, most people don't push that crazy amounts of data. I know IBM went from in-kernel NFS to Ganesha for their Spectrum Scale product recently.
"Crazy amounts of data" isn't the main concern, it's latency. It's the people storing giant amounts of data who generally don't worry about that so much.
Ceph isn't a filesystem, it's a service layer (self-described "storage platform") that runs on top of some other unspecified filesystem. Think git-annex or hadoop, not ext4.
Anyway the way Ceph does that is replication, just like those other solutions. There may be 4 nodes with filesystems that contain that data, and Ceph is the veneer that lets you not have to worry about the implementation-detail of where it lives.
That's a valid observation. All of the old stuff has been battle tested and reviewed many times. Newer stuff is bound to have bugs that still have not been found. And even old stuff turns up surprises every now and then. For instance
I found a buffer overflow in the OpenSolaris code a few hours ago that originated in a commit made in 2007. It predates that Linux bug by at least a year.
It is amazing how many old bugs have survived to the present day. :/
Pity to see this downvoted, because it is a valid point. We are stuck in the past with these macro kernels, and this sort of thing is a direct consequence of using them. If anything, having a macro kernel makes it quite hard to shut down a module like this to determine whether it was well behaved with respect to memory, and because it is in the kernel, any kind of compromise immediately has far-reaching consequences, because it breaches all of the barriers in one fell swoop.
Use-after-free bugs are pretty easy to figure out on shutdown because you can free the remainder of your working memory and if it turns out that you have a use-after-free there likely will be a double free in there somewhere.
If not then the problem is more insidious but I've found plenty of use-after-free bugs that way.
Most exploited UAFs don't happen in common execution paths. They're often caused by weird races and error conditions that nobody even considered could happen. It's why things like production ASan are a lot less valuable than people would imagine: most reasonably well-tested software doesn't exhibit memory corruption when used normally. So, sure, your suggested technique could be a cool way to try to catch bugs that appear under normal execution, but it won't put much of a dent in the total number of bugs.
That may well be true, even so debugging a kernel module is quite a lot harder than a regular process so the fact that it is a kernel module by itself probably increases the chances of such bugs being present. Especially because you are not in control of the execution context.
Okay, so we move to microkernels, which murders everybody's performance because we still don't know how to work around context switching being expensive, and force everyone to do the constant additional effort to define exactly what components should have permission to access what parts of their system (unless you want everybody to use one-size-fits-all defaults and lose functionality, or just give everything more permission and lose the security benefit). Now in this hypothetical world, an SMB server has a vulnerability. Technically, the situation is much improved; in this situation an attacker only has access to whatever that component had access to. So, just all the files you shared, and (hopefully read-only) user information. And probably network access. Is that really a good cost/benefit trade?
Have you actually benchmarked a context switch on modern hardware? A full switch (including register spilling and page table swap) can be had in <150 cycles on even cheap, older Arm A-series cores like the Cortex A72. We're not still living in a world where a context switch forces you to flush the TLB; you literally just have to pay the cost for the trap, spill, page table swap, unspill, and return. This cost is even lower on modern ARM processors which support speculative exceptions, where you can perform the entire context switch speculatively.
If context switches are a performance problem you are probably pretty far down the optimization rabbit hole, but the articles you linked have nothing to do with context switches; they are about NUMA optimizations to sendfile on FreeBSD.
So you get massive overhead when you need to context switch between the network process, the I/O process, and the FS process just to pass some bytes around, where in a monolithic kernel that's just the cost of a few function calls.
Not only do we have QNX: most Android drivers since Android 8 are userspace, macOS is incrementally moving to userspace drivers (killing kernel drivers for anyone other than Apple), most newer Windows drivers since Vista are userspace, Fuchsia is microkernel-based and already has some production deployments, ...
> because we still don't know how to work around context switching being expensive
Context switching can be as expensive as you want, or almost as cheap as you want, depending on the limitations of the hardware. Back in 1992, on hardware very anemic by today's standards, you could do 200K slices per second; today, likely orders of magnitude more than that.
I suspect that lots of people - not necessarily you - that have very clear opinions on micro kernels and their capabilities have never actually used them.
Every vehicle you've driven that was made in the last 25 years has one or more instances of micro kernels in them. They are perfectly usable, and the performance hit isn't nearly what people think it is. What you give up in throughput you more than get back in deterministic behavior, reliability and improved latency.
Sorry, portable was a poor word here. Since I'm using Linux, being built in makes it easy for me to use. But if samba shipped as a small static executable that would be even easier.
But SMB in general is a complicated mess that only gets more complicated for backward compatibility's sake. A comprehensive implementation almost by definition has to be complex.
A small "local users only" implementation would certainly be welcome, but I don't see the benefit of keeping it in the kernel.
Looks like Ubuntu 20.04 already has it as a loadable module. However, it can be started only by a privileged user and it requires user space tools. So I would guess it's not running unless the sysadmin has actively configured it.
That's called a Chinese Room. :-) A human cannot write correct C. The theorem-prover-human system can. Unfortunately, the C apologists I observe around me are as opposed to formal methods as they are to non-C languages. It's some kind of bizarre cowboy thing.
I tend to agree. Really, the C standard needs to be yeeted away from WG14's grasp, because they're responsible for blocking safety-related improvements.
>For context Samba and NFS had historically been buggy or exploitable since the 90s.
Cool, then let's put it into the kernel. Another piece of buggy software? -> right into the kernel. Webserver? -> kernel... oh wait, we had that. Database? -> kernel.
I'm so glad ksmbd exists. It's such a lightweight, easy, simple way to interoperate. As someone who has been running OpenWRT and then other small embedded systems for a decade and a half: projects like Samba have been wonderful, but they are massive, everything-and-the-kitchen-sink-sized tools (even when heavily stripped down) required to interoperate with the rest of the computing world. Being able to have a small kernel module built in radically increases the number of systems that can benefit from common interoperation, and it greatly eases the difficulty by being a targeted, focused file-sharing implementation rather than the incredibly wide-ranging implementations we get in Samba.
It's definitely been a bit of a challenge to make ksmbd happen. I want to be able to acknowledge problems and validate the fear people have had. But ksmbd also feels like such a textbook example of Steve Yegge's thesis in Notes from the Mystery Machine Bus:
> Software engineering has its own political axis, ranging from conservative to liberal.
Starting from some definitions:
> So we'll start with an operational definition of conservatism, from Jost et al.: "We regard political conservatism as an ideological belief system that is significantly (but not completely) related to motivational concerns having to do with the psychological management of uncertainty and fear."
Today we see some validation of that fear. There are problems. It's certainly an inconvenience for those relying on public shares functioning securely. Thus far it's unclear how many people have been attacked via this, or what harm has been done: the scope of damage is unknown. But the conservative view is justified, in that there have been problems, and ksmbd is causing those relying on it to have to update, or risk being attacked.
Reciprocally though, I want to highlight how great this effort is. This is a novel implementation of a complex protocol that sits right at the hub of how systems work together. There's a progressive can-do-ism here that is absolutely excellent and worth cherishing. That there are problems along the road is absolutely something we should factor in, too; it is a concern. It's up to everyone to take score and decide their alignment, what to go for and what not to. Even though today is a "bad" day for this enhancement, even though the road forward for it hasn't been without difficulty, I don't feel like it dooms the whole enterprise. Trying, to me, feels so worthwhile. The scope of impact, the actual harm of trying, seems so mild, and the doomsaying and fearmongering seem so overblown, to me.
This should be in a Wasm sandbox. (if performance is so critical--otherwise keep it in userspace). Crazy we keep trading serious CVEs for a little perf.
The sandbox doesn't protect against corruption inside of linear memory, nor against exploits that take advantage of it to influence code execution paths, triggering calls that shouldn't have happened in the first place.
Great, the exploit cannot pwn the host; it can nevertheless trigger damaging behaviours.
To elaborate on the point: "Great, the Wasm SMB server didn't crash the kernel; it can ONLY exfiltrate every single file it has access to, which is every important file, because it is a file server. But don't worry, your /etc/shadow is safe!"
Wasm doesn't necessarily help if it's performance-critical, because it comes with its own overhead. And the kernel component of ksmbd interacts with other parts of the kernel, such as the VFS and sockets/RDMA. If you had to marshal all those objects into something safe before exposing them to a Wasm sandbox (e.g. replacing pointers with map keys), that'd increase the overhead further.