Linux Kernel Ksmbd Use-After-Free Remote Code Execution Vulnerability (zerodayinitiative.com)
147 points by choult on Dec 22, 2022 | hide | past | favorite | 104 comments


It's like a bunch of people correctly predicted a few months ago that maybe this kind of attack surface shouldn't be added into the kernel: https://news.ycombinator.com/item?id=28355754


It is almost as if Linux is trying to catch up to Windows in attack surface:

https://arstechnica.com/information-technology/2022/12/criti...


Similar incentives and assumptions lead to similar results.


Maybe if you want to do SMB in kernel space it should be written using the Rust bindings


Rust is driver only…for now


Well, one thing written to test-drive Rust for Linux was an in-kernel server for 9P, which is probably the closest thing to ksmbd, the in-kernel SMB server, without being identical: https://lwn.net/Articles/907685/


This is hilarious:

> Initially, he had started trying to replace the ksmbd server, but that turned out to not be an ideal project.

if only he had persevered...


Alternatively using a WASM engine or maybe the kernels built in eBPF.


The overhead would make it not worth it, you'd be better off just running it in userspace.

WASM has started to become the equivalent of "just rewrite it in Rust" around here...



A language doesn't need to be good to be popular for a long time. PHP was (is) the previous "shit, but the lowest common denominator that everyone uses" before JS took that mantle.

Also "good" depends on the audience. Ability to cheaply hire a bunch of devs to whip up CRUD app is "good" quality of PHP/JS even if it might not make the language "good" from engineering perspective.


Ignore the JS bashing (it is deserved but irrelevant here).

I was referring to the part explaining the 2028-2040 METAL system where "we" gained 4% execution speed because we stuffed all things into an asm.js VM in the kernel that can skip syscall overhead. But it's a joke video anyway.


I’d guess it’d be slightly faster. The WASM or eBPF would add some overhead, however you save on context switching costs from kernel to user land. So for IO heavy stuff like SMB it’d probably perform well.


Quite frankly any new kernel code that interacts with the network, or untrusted data in general, should use Rust to the greatest extent possible.


Why? Is it because you think Rust code in the kernel is memory safe?


It is definitely, indisputably memory safer.


SMB is a complex network protocol; implementing it without a serious bug somewhere is a real challenge. Kerberos is another complicated protocol. You have user space FS modules; perhaps it would have been better to make it one? But at the same time you have stuff like TLS and WireGuard in the kernel (SMB is still more complicated, imho).

Linux being monolithic and all, I don't think there is a good answer. I hope microkernels make a comeback.


I am not sure what was so bad about Samba. It was performing better than ksmbd in the sequential access patterns that it typically has:

https://samba.plus/blog/detail/ksmbd-a-new-in-kernel-smb-ser...


Samba also had many severe rce vulns over the years.


So if it weren't in the kernel it would need to run as root, wouldn't it? That gives the attacker nearly the same rights but a much more convenient programming environment.


With CAP_NET_BIND_SERVICE, it doesn't need to be run as root.


And without it, you can just bind the socket and then change UID


Binding to a low port is one thing. But as a file server it needs to access many users' files. That can be done without being root, too. Reducing the attack surface on one end but weakening things a bit on the other. A net gain, I might concede, but not completely black and white.


[flagged]


Sincerely, what the hell are you even talking about? You are literally commenting on a kernel RCE enabled by putting this stuff in the kernel!


It's obviously a GPT-3 generated comment.


If you're lucky.


For an extremely limited number of users who rely on one specific use of this feature. It'll get fixed, quickly, and we'll move on.

Your reaction seems way overblown to me. Maybe you are sticking to your guns here. But to me, considering the real scope of damage & threat here, this seems unfortunate, sad, avoidable, and bad, but for a new implementation of a massive sprawling wild protocol, I'd far rather we try than just throw up our hands & let bravery fold. We should do hard things, even when we can't have absolute confidence in total success.


So much downvoting! Make a case. Why do you all disagree so vehemently? What has you so very very concerned?


The kernel desperately needs a product manager, who says no.


Samba outperforms ksmbd anyway - https://samba.plus/blog/detail/ksmbd-a-new-in-kernel-smb-ser...

The main reason to use ksmbd is if you can't use GPLv3 Samba. Most PC SMB servers will still be using Samba instead of ksmbd for this reason. Ksmbd is mostly used on NAS boxes.


My main reason for wanting ksmbd is that it's tiny (a few hundred k I believe). The smallest Samba build I've seen is ~40MB, and not very portable at all. I pretty much had to use buildroot to make it work.

My use case is shipping minimal Linux kernels + initramfs that can be run with QEMU. I need file sharing and SMB is the most universal protocol. I can ship the entire kernel (~5MB) and QEMU (~15MB) in less space than Samba. I would love a minimal build.


> My main reason for wanting ksmbd is that it's tiny

That's a fine reason. Does it have to be in the kernel to be small though? From what I can see ksmbd is small because it delivers only a minimal SMB3, not because it's in-kernel. Why would a user space SMB3 be appreciably larger?

Also, the performance hopes for ksmbd don't appear valid either. By adopting io_uring and splice(2) Samba now outperforms[1] ksmbd by a wide margin. Those results are a year old and may be out of date, but still, I suspect we're getting so close to the limits of hardware that it doesn't matter.

Another argument for in-kernel is SMB Direct: SMB over RDMA. Yet here[2] we see io_uring is receiving the bits needed for that as well.

Finally, the license issue: I can't think of a reason a GPL2/Berkeley licensed SMB3 couldn't be in user space.

Where am I wrong? Is there a valid reason for this to be in kernel? I don't see one.

[1] https://samba.plus/blog/detail/ksmbd-a-new-in-kernel-smb-ser... [2] https://lwn.net/Articles/879724/

Edit: looks like the ksmbd SMB Direct work predates the io_uring RDMA capability by a few years, so at that time SMB Direct was a legitimate reason. On the other hand send(..., MSG_ZEROCOPY) predates ksmbd...


I think the biggest reason here, especially for GP, is that ksmbd already exists now (and has existed for a few years), while your tiny userspace SMB3 does not exist yet.


Ok, so now that we've exhausted the regression of invalid reasons we're left with what? Someone made it so in it goes?

Enjoy the CVEs I guess.


It just means that a better solution that fulfills all their needs doesn't exist yet, and they decided to take one trade-off rather than another. Maybe in that setup a possible RCE is not that dangerous.


Basically this. I would love a small userspace implementation. Even if it had worse performance, that's not the most important thing for me.


"I need to share files but can't spare 40MB of space for OS" is quite rare case tho


My application is rather unique. It involves shipping VM images to end users. Which means the images can be downloaded many times. But the files shared over SMB would be the user's own files.


Where can I get copies of your kernel builds and how can I run them? What's the tiniest possible one you've made?


I don't have them hosted anywhere. I'd recommend playing with buildroot. It's very simple and mostly just works. Using the provided QEMU config yields a kernel that's about 5MB. I believe user space (a la BusyBox) was about another 5MB. Haven't done much tuning beyond that. Feel free to email me if you have questions.


> The main reason to use ksmbd is if you can't use GPLv3 Samba

If that’s the case, why did they have to put it in the Kernel? Couldn’t it have just been userland?


It seems ksmbd developers are mostly interested in use cases like SMB Direct, which benefits from an in-kernel implementation, but most ksmbd users are, at this point, interested in avoiding GPLv3.


> Ksmbd is mostly used on NAS boxes.

Is it even? It was mainlined only last year; I don't think any embedded vendor has had time to ship it in a product.


Comments here are cute.

NFS has been in the kernel for ages and works roughly the same way, kernel driver with a userland component.

NFS has the advantage of having been in the kernel longer with most low hanging fruit security bugs fixed. ksmbd is brand new. Quite the expectation to have it be completely bug free.

edit: wait a minute, this was fixed with kernel 5.15.61

That's at least 20 releases ago.


The criticism isn't that anyone expects bug free code, rather that introducing new remotely accessible attack surface to the kernel in 2022 when we know it's likely unsafe is silly. Building an SMB server in the kernel because "well, NFS was secure eventually" overlooks the fact that NFS shouldn't be in the kernel either.


Right, but that argument can be used to stop the inclusion of _any_ new large functionality in the kernel. Neither the Linux users nor the maintainers are currently interested in a feature-freeze of the kernel. If you want a micro-kernel, Linux was never the solution.

Yes, new features will have problems. They remain disabled in build configs of most distros at this point. Over time, more stability will encourage more consumers to enable it too.

This is not new for the Linux community. They will deal with this one like they have with other new features in the past.


> that argument can be used to stop the inclusion of any new large functionality in the kernel.

Sure, it could be, but there are shades of gray. Arguing against adding a full-fat SMB server to the kernel is not the same thing as suggesting that file systems should be 100% in user space. You and I both know that the threat posed by introducing a large amount of remotely reachable code is not at all the same as that posed by a new kernel filesystem.

The last few decades of arguing about micro vs monolithic have surely convinced everyone that there is a time and a place for both. Yes, embedding the server in the kernel gives us lower latency by eliding a context switch (a few hundred cycles) but this comes at a pretty high cost that we're going to be paying for years to come.

Call me overly dramatic, but ever since the NSO group stuff went public I've been a lot more risk averse when it comes to introducing remote attack surface because we know that these bugs are being used to kill people. Making kernel compromise harder means possibly saving someone's life. Do I think someone is going to die over a ksmbd bug? Probably not. Would I want to be the one that checked in a huge remotely accessible blob of code? No, so I think carefully about what code I put where to minimize the risk.


Look, I agree with you. Increased attack surface scares me too. I'm just saying that line of argument doesn't persuade the maintainers or the users. We need to find a better way to protect ourselves. And turning off build flags is one way to do that, one that the community has adopted already.

The other point I'll make is that other kernel features scare me far more than this SMB server. Think io_uring, eBPF, or similar systems. Their attack surfaces are far larger, and yet they have become mainstream. Unfortunately, the horse has already left the barn. We need to find better ways to secure our systems. Arguing for fewer features has been tried for decades, and hasn't helped. Not here in the kernel, not in the browser, not anywhere.

I wish the world was easier to secure, but it's not.


Right, but servers built into the kernel are the worst of all cases.

Not only do you have a service where (if not firewalled) anything can connect and try its luck sending shit to it, it is often integrated with many other systems inside the kernel, which increases the effort to rewrite any of that. Protocol clients in the kernel have far fewer problems: for one, you only connect to a defined endpoint, so an attacker would need to MITM you just to get started, and the client is usually a smaller codebase than the server.

It is also usually stuff where you want to add new features relatively often, and "upgrade the kernel to use this new server feature" is not a thing people like very much.

Providing interfaces to make userspace implementations faster has a far better payoff; a generic "make disk access, and shoving stuff between disk and network, fast" facility will help any file-serving daemon, not just SMB (a point which the original Samba proves, as with the new improvements it is currently faster than ksmbd)


> rather that introducing new remotely accessible attack surface to the kernel in 2022 when we know it's likely unsafe is silly.

This is the worst possible take on this.

> Building an SMB server in the kernel because "well, NFS was secure eventually" overlooks the fact that NFS shouldn't be in the kernel either.

The way Linux works, NFS unfortunately has to be in the kernel to achieve reasonable performance.


How do people running Ceph and other exotic filesystems deal with performance? What is considered reasonable performance in your opinion? It might not align with others'; most people don't push that crazy an amount of data. I know IBM went from in-kernel NFS to Ganesha for their Spectrum Scale product recently.


Ceph/cephfs have kernel clients (and FUSE ones too), but no in-kernel server. The server is userspace.

It's easier to limit the client attack surface because, just to start attacking the client, you'd need to MITM the client-server traffic


"Crazy amounts of data" isn't the main concern, it's latency. It's the people storing giant amounts of data who generally don't worry about that so much.


We usually run those services with local nvme disks, they're not as portable but we get great performance.


Ceph isn't a filesystem, it's a service layer (self-described "storage platform") that runs on top of some other unspecified filesystem. Think git-annex or hadoop, not ext4.

Anyway the way Ceph does that is replication, just like those other solutions. There may be 4 nodes with filesystems that contain that data, and Ceph is the veneer that lets you not have to worry about the implementation-detail of where it lives.


Ceph actually does manage its own backing filesystem too these days, after the BlueStore migration a few years ago.


That's a valid observation. All of the old stuff has been battle tested and reviewed many times. Newer stuff is bound to have bugs that still have not been found. And even old stuff turns up surprises every now and then. For instance

https://nvd.nist.gov/vuln/detail/CVE-2021-27363

4300 affected kernel versions has to be a record of sorts.


I found a buffer overflow in the OpenSolaris code a few hours ago that originated in a commit made in 2007. It predates that Linux bug by at least a year.

It is amazing how many old bugs have survived to the present day. :/


>I found a buffer overflow in the OpenSolaris code a few hours ago that originated in a commit made in 2007.

That's because there is no OpenSolaris anymore....


Yet another vulnerability and exploit that just wouldn't be possible on a well-designed system, such as Genode[0] with seL4[1].

Monolithic UNIX clones are an anachronism we are well past the time to get rid of.

0. https://genode.org/

1. https://sel4.systems/


Pity to see this downvoted because it is a valid point. We are stuck in the past with these macro kernels and this sort of thing is a direct consequence of using them. If anything having a macro kernel makes it quite hard to shut down a module like this to determine if it was well behaved with respect to memory and because it is in the kernel any kind of compromise immediately has far reaching consequences because it breaches all of the barriers in one fell swoop.


> If anything having a macro kernel makes it quite hard to shut down a module like this to determine if it was well behaved with respect to memory

Unloading modules is generally possible, but I'm not quite following what shutting it down has to do with checking memory use?


Use-after-free bugs are pretty easy to figure out on shutdown, because you can free the remainder of your working memory, and if it turns out that you have a use-after-free, there will likely be a double free in there somewhere.

If not then the problem is more insidious but I've found plenty of use-after-free bugs that way.


Most exploited UAFs don't happen in common execution paths. They're often caused by weird races and error conditions that nobody considered could even happen. It's why things like production ASan are a lot less valuable than people would imagine: most reasonably well-tested software doesn't exhibit memory corruption when used normally. So, sure, your suggested technique could be a cool way to try and catch bugs that appear under normal execution, but it won't put much of a dent in the total number of bugs.


That may well be true, even so debugging a kernel module is quite a lot harder than a regular process so the fact that it is a kernel module by itself probably increases the chances of such bugs being present. Especially because you are not in control of the execution context.


Valgrind too - only applicable to userspace processes and it would have caught this UAF.


Okay, so we move to microkernels, which murders everybody's performance because we still don't know how to work around context switching being expensive, and force everyone to do the constant additional effort to define exactly what components should have permission to access what parts of their system (unless you want everybody to use one-size-fits-all defaults and lose functionality, or just give everything more permission and lose the security benefit). Now in this hypothetical world, an SMB server has a vulnerability. Technically, the situation is much improved; in this situation an attacker only has access to whatever that component had access to. So, just all the files you shared, and (hopefully read-only) user information. And probably network access. Is that really a good cost/benefit trade?


Have you actually benchmarked a context switch on modern hardware? A full switch (including register spilling and page table swap) can be had in <150 cycles on even cheap, older Arm A-series cores like the Cortex-A72. We're not still living in a world where a context switch forces you to flush the TLB; you literally just have to pay the cost for the trap, spill, page table swap, unspill, and return. This cost is even lower on modern Arm processors which support speculative exceptions, where you can perform the entire context switch speculatively.


Netflix definitely benched their context-switches and they determined it was expensive enough to be worth engineering out.

https://www.phoronix.com/news/Netflix-NUMA-FreeBSD-Optimized

https://2019.eurobsdcon.org/slides/NUMA%20Optimizations%20in...


If context switches are your performance problem, you are probably pretty far down the optimization rabbit hole, but the articles you linked have nothing to do with context switches; they are about NUMA optimizations to sendfile on FreeBSD.


The whole point of using sendfile(2) was to avoid context switches between reading data from disk and sending it via the network stack.


So you get massive overhead when you need to context switch between the network process, the I/O process, and the FS process just to pass some bytes, where in a monolithic kernel that's just the cost of a few function calls


Urban myths that keep being retold.

Not only do we have QNX: most Android drivers since Android 8 are userspace, macOS is incrementally moving to userspace drivers (killing kernel drivers for anyone other than Apple), most newer Windows drivers since Vista are userspace, Fuchsia is microkernel-based and already has some production deployments, ...

And guess what, they perform just fine.


What is fuchsia used for in production? I wasn’t aware that anything was running it yet.


Currently it has replaced the OS that powers Google's smart display devices:

https://arstechnica.com/gadgets/2022/08/googles-fuchsia-os-l...

With the Google Nest speaker on the horizon:

https://9to5google.com/2022/12/07/google-nest-audio-fuchsia-...


Probably rsyncing a "production" tarball to a backup server, just like all of RISC-V's current "production" usage.


> because we still don't know how to work around context switching being expensive

Context switching can be as expensive as you want, or almost as cheap as you want, depending on the limitations of the hardware. Back in 1992, on hardware very anemic by today's standards, you could do 200K slices per second; today likely orders of magnitude more than that.

I suspect that lots of people - not necessarily you - that have very clear opinions on micro kernels and their capabilities have never actually used them.


No, microkernels are only liked by people with limited imagination. They are perfect, yet unusable.


Every vehicle you've driven that was made in the last 25 years has one or more microkernels in it. They are perfectly usable, and the performance hit isn't nearly what people think it is. What you give up in throughput you more than get back in deterministic behavior, reliability, and improved latency.


BlackBerry (QNX), Nokia (Symbian), Mars rovers (VxWorks), Tru64 UNIX, Windows, and on and on... but yeah, they are unusable.


See also https://lwn.net/Articles/871866/.

I would love to see this implementation succeed (Samba is too big and not portable enough for my use case), but there have definitely been challenges.


Isn't in-kernel implementation necessarily more kernel specific? How is ksmbd more portable than Samba?


Sorry, portable was a poor word here. Since I'm using Linux, being built in makes it easy for me to use. But if Samba shipped as a small static executable, that would be even easier.


I guess you could shove it into a container.

But SMB in general is a complicated mess that only gets more complicated for backward compatibility's sake. A comprehensive implementation almost by definition has to be complex.

Small "local users only" implementation would certainly be welcome but I don't see the benefits of keeping it in kernel.


    --- a/fs/ksmbd/smb2pdu.c
    +++ b/fs/ksmbd/smb2pdu.c
    @@ -2044,6 +2044,7 @@ int smb2_tree_disconnect(struct ksmbd_wo
     
      ksmbd_close_tree_conn_fds(work);
      ksmbd_tree_conn_disconnect(sess, tcon);
    + work->tcon = NULL;
      return 0;
     }


The css doesn’t correctly overflow text on my phone meaning half the page is not rendered. Which kernel versions are vulnerable to this?


It seems to work for me if I rotate the phone to landscape mode.


It was merged in 5.15. Does any major distro configure this in?


Looks like even Ubuntu 20.04 already has it as a loadable module. However, it can be started only by a privileged user and it requires user space tools. So I would guess it's not running unless the sysadmin has actively configured it.

Even in Linux 6.1 it is marked experimental.


Humans cannot write correct C, full stop


What do you call seL4, the Compcert C compiler and the tools for writing provably correct C at frama-c.org?


That's called a Chinese Room. :-) A human cannot write correct C. The theorem-prover-human system can. Unfortunately, the C apologists I observe around me are as opposed to formal methods as they are non-C languages. It's some kind of bizarre cowboy thing.


I'd call that a straw man. Very little code is written that way, so it doesn't apply in practice, like... at all.

Also: unless you have very specific requirements there are probably easier languages+tools to write proven-correct programs in.


I tend to agree. Really, the C standard needs to be yeeted away from WG14's grasp, because they're responsible for blocking safety-related improvements.


And we need to stop pretending we do.


Do any shares have to be defined, or just the module enabled, for this to work?


Not my area, but it looks to me like you have to be able to mount a share, so it is only unauthenticated if you have public shares defined.


My first thought was "Oh! The Linux kernel is buggy," and then I read on to see it's SMB-related and a pretty new module, introduced around 2020.

For context, Samba and NFS have historically been buggy or exploitable since the 90s.


>For context Samba and NFS had historically been buggy or exploitable since the 90s.

Cool, then let's put it into the kernel. Another piece of buggy software? -> Right into the kernel. A webserver? -> Kernel... oh wait, we had that. A database? -> Kernel


I'm so glad ksmbd exists. It's such a lightweight, easy, simple way to interoperate. As someone who has been running OpenWRT & then other small embedded systems for a decade and a half, projects like Samba have been wonderful, but are massive everything-and-the-kitchen-sink (even when heavily stripped down) sized tools required to interoperate with the rest of the computing world. Being able to have a small kernel module built in radically increases the number of systems that can benefit from common interoperation, and it greatly eases the difficulty by being a targeted focused file-sharing implementation rather than the incredibly wide-ranging implementations we get in Samba.

It's definitely been a bit of a challenge to make ksmbd happen. I want to be able to acknowledge problems, validate the fear people have had. But also, ksmbd feels like such a textbook example of Steve Yegge's thesis in Notes from the Mystery Machine Bus:

> Software engineering has its own political axis, ranging from conservative to liberal.

Starting from some definitions:

> So we'll start with an operational definition of conservatism, from Jost et al.: "We regard political conservatism as an ideological belief system that is significantly (but not completely) related to motivational concerns having to do with the psychological management of uncertainty and fear."

Today we see some validation of fear. There are problems. It's certainly an inconvenience for those relying on public shares functioning securely. Thus far it's unclear how many people have been attacked via this, or what harm has been done: the scope of damage is unknown. But the conservative view is justified, in that there have been problems; ksmbd is causing those relying on it to have to update, or risk being attacked.

Reciprocally though, I want to highlight how great this effort is. This is a novel new implementation of a complex protocol, one that sits right at the hub of how systems can work together. There's a progressive can-do-ism here that is absolutely cherishable & excellent. That there are problems along the road is absolutely something we should factor in, too; it is a concern. It's up to everyone to keep score, and to decide their alignment, what to go for & what not to. Even though today is a "bad" day for this enhancement, even though the road forward for it hasn't been without difficulty, I don't feel like it dooms the whole enterprise. Trying, to me, feels so worthwhile. The scope of impact, the actual harm of trying, seems so mild, and the doomsaying & fearmongering seems so overblown, to me.


This should be in a Wasm sandbox. (if performance is so critical--otherwise keep it in userspace). Crazy we keep trading serious CVEs for a little perf.


The sandbox doesn't protect against corruption inside of linear memory, nor exploits that take advantage of it to influence code execution paths, triggering calls that shouldn't have happened in the first place.

Great, the exploit cannot pwn the host; it can nevertheless trigger damaging behaviours.


To elaborate on that point: "Great, the WASM SMB server didn't crash the kernel; it can ONLY exfiltrate every single file it has access to, which is every important file, because it is a file server. But don't worry, your /etc/shadow is safe!"

And of course obligatory XKCD: https://xkcd.com/1200/

I'm happy most HN users don't make anything security-important...


Wasm doesn't necessarily help if it's performance-critical, because it comes with its own overhead. And the kernel component of ksmbd interacts with other parts of the kernel, such as the VFS and sockets/RDMA. If you had to marshal all those objects into something safe before exposing them to a wasm sandbox (e.g. replacing pointers with map keys), that'd increase the overhead further.



