* Re: CVE-2025-21830: landlock: Handle weird files
From: Dave Chinner @ 2025-03-10 23:42 UTC
  To: Greg Kroah-Hartman
  Cc: Mickaël Salaün, cve, Günther Noack, linux-security-module, Kent Overstreet, linux-bcachefs, linux-fsdevel

[cc linux-fsdevel]

On Mon, Mar 10, 2025 at 03:36:04PM +0100, Greg Kroah-Hartman wrote:
> On Mon, Mar 10, 2025 at 01:00:50PM +0100, Mickaël Salaün wrote:
> > Hi Greg,
> >
> > FYI, I don't think this patch fixes a security issue. If attackers can corrupt a filesystem, then they should already be able to harm the whole system.
> >
> > The commit description might be a bit confusing, but from an access control point of view, the filesystem on which we spotted this issue (bcachefs) does not allow opening weird files (but they are still visible, hence this patch), and I guess it would be the same for other filesystems, right? I'm not sure how a weird file could be used by user space. See
> > https://lore.kernel.org/all/Zpc46HEacI%2Fwd7Rg@dread.disaster.area/
> >
> > The goal of this fix was mainly to not warn about a bcachefs issue (and avoid related syzkaller reports for Landlock), and to harden Landlock in case other filesystems have this kind of bug.
>
> It was issued a CVE because the reviewers thought that it was a way to circumvent the landlock permission checks, based on the changelog text (note, creating a "corrupted filesystem" is quite easy to get many Linux systems to auto-mount it, so those types of issues do get assigned CVEs.)

That's an argument straight from the security theatre.

> If you all do not think this meets the definition of a vulnerability as defined by CVE.org as:
>     An instance of one or more weaknesses in a Product that can be exploited, causing a negative impact to confidentiality, integrity, or availability; a set of conditions or behaviors that allows the violation of an explicit or implicit security policy.

Yes, so shall we follow this reasoning based on untrusted user auto-mounts of untrusted devices to its logical conclusion?

If an untrusted user is in control of the filesystem image, then they don't need to corrupt the filesystem image to subvert the system. They can just change the permissions on files, change ACLs, change security xattrs (selinux, landlock, smack, etc), replace the contents of file data (e.g. trojan executables), etc.

The filesystem will not flag *any* of these shenanigans as they don't involve actually corrupting the filesystem structure.

IOWs, the kernel filesystem code can function perfectly and bug free, yet the system can be silently compromised through the hole punched in the *implicitly trusted security information under user control* in the fs image.

This is a "trusted device contains trusted security information" model deficiency, not a filesystem implementation issue. The CVE-worthy issue here is that the security model is violated by the untrusted automounts, not by how the filesystem reacts to the security model violation that has already occurred.

Further, the kernel (and therefore the filesystem implementation) cannot prevent untrusted user device auto-mounts, so this must be considered a system level vulnerability that requires userspace policy and implementation changes to mitigate.

We've tried for years to get userspace to adopt a more security-aware model for untrusted devices, but have made pretty much no progress. Filesystem developers have ended up with their userspace filesystem packages shipping udisks rules to turn off automounting of those filesystem types for applications that use udisks for this stuff. That catches -some- of the automounting behaviour, but not all of it. And we can't do anything else without changes to the wider userspace/distro policies around user automounting of untrusted devices.

IOWs, to prevent these "corrupted filesystem causes issues" bugs from being considered security issues, we need userspace to stop violating the kernel trust model for persistent security information storage.

Greg, you have the ability to issue a CVE that will require downstream distros to fix userspace-based vulnerabilities if they want various certifications. You have the power to force downstream distros to -change their security model policies- for the wider good.

We could knock out this whole class of vulnerability in one CVE: issue a CVE considering the auto-mounting of untrusted filesystem images as a *critical system vulnerability*. This can only be solved by changing the distro policies and implementations that allow this dangerous behaviour to persist.

We've suggested many relatively user-friendly ways this can be handled in the past (e.g. device fingerprinting via libblkid (which it kinda already does) and prompting the user to allow/deny devices with an unknown fingerprint). The simplest policy fix is to simply disallow auto-mount of removable devices by default across the entire distro.

If distros want to close that kernel CVE then they have to, at minimum, turn off device auto-mount by default across the entire distro. At worst, this makes the reason you give for filesystem corruption issues being considered CVE-worthy go away completely. At best, we get full distro level integration of efficient, persistent untrusted device handling at the desktop interfaces. That would be a win for -everyone-, not just the distro people who have to handle kernel CVEs....

If we want filesystem corruption CVEs to be anything other than security theatre, then we should be using the kernel CVE powers for the reason they were obtained in the first place, i.e. to force downstream distros to address issues they would otherwise ignore, to help make our Linux systems more reliable and secure.

-Dave.
--
Dave Chinner
david@fromorbit.com
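The allow/deny-once policy suggested above, and the removable-device registry Dave mentions later in the thread, can be sketched in userspace roughly as follows. This is a minimal illustration only. The fingerprint is assumed to be a filesystem UUID string that something like libblkid has already probed. The trust_registry type and its functions are hypothetical and are not part of udisks, libblkid or any distro tool.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical trust decision recorded for a removable device. */
enum trust { TRUST_UNKNOWN, TRUST_ALLOWED, TRUST_DENIED };

#define MAX_DEVICES 64
#define FP_LEN 64

/* Hypothetical in-memory registry keyed by a device fingerprint
 * (assumed here to be a filesystem UUID string). A real tool would
 * persist this per user. */
struct trust_registry {
    size_t count;
    struct {
        char fingerprint[FP_LEN];
        enum trust decision;
    } entries[MAX_DEVICES];
};

/* Look up a fingerprint; devices never seen before are TRUST_UNKNOWN. */
enum trust registry_lookup(const struct trust_registry *reg, const char *fp)
{
    for (size_t i = 0; i < reg->count; i++)
        if (strcmp(reg->entries[i].fingerprint, fp) == 0)
            return reg->entries[i].decision;
    return TRUST_UNKNOWN;
}

/* Record or update the user's answer to the prompt.
 * Returns 0 on success, -1 if the registry is full or fp is too long. */
int registry_record(struct trust_registry *reg, const char *fp, enum trust d)
{
    if (strlen(fp) >= FP_LEN)
        return -1;
    for (size_t i = 0; i < reg->count; i++) {
        if (strcmp(reg->entries[i].fingerprint, fp) == 0) {
            reg->entries[i].decision = d;
            return 0;
        }
    }
    if (reg->count == MAX_DEVICES)
        return -1;
    strcpy(reg->entries[reg->count].fingerprint, fp);
    reg->entries[reg->count].decision = d;
    reg->count++;
    return 0;
}

/* Policy: only auto-mount devices the user has explicitly allowed.
 * Unknown devices go through a prompt (not shown). Denied devices are
 * never handed to the kernel filesystem parser. */
int may_automount(const struct trust_registry *reg, const char *fp)
{
    return registry_lookup(reg, fp) == TRUST_ALLOWED;
}
```

The prompt-once behaviour comes from calling registry_record() after the user answers. The choice of fingerprint is the real design decision. A filesystem UUID is itself stored in the image, so a malicious image can copy the UUID of a device the user already trusts.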
* Re: CVE-2025-21830: landlock: Handle weird files
From: Kent Overstreet @ 2025-03-11 2:09 UTC
  To: Dave Chinner
  Cc: Greg Kroah-Hartman, Mickaël Salaün, cve, Günther Noack, linux-security-module, linux-bcachefs, linux-fsdevel

On Tue, Mar 11, 2025 at 10:42:41AM +1100, Dave Chinner wrote:
> [...]
>
> If an untrusted user is in control of the filesystem image, then they don't need to corrupt the filesystem image to subvert the system. They can just change the permissions on files, change ACLs, change security xattrs (selinux, landlock, smack, etc), replace the contents of file data (e.g. trojan executables), etc.

If user mounts are enabled, that comes with UID mapping, and device nodes disabled - no?

Out of curiosity, what's keeping us from saying "user mounts are generally expected to be safe" for XFS?

Obviously, that does expose a massive attack surface, so saying that for a C codebase that wasn't initially designed for it has a high pucker factor.

But I've been impressed with syzbot's ability to find bugs, so, barring architectural issues which I assume you'd know about, it seems it's not nearly as crazy a thought as it used to be - for XFS, as you guys have been the most rigorous about hardening, so I expect that's about as good as it's going to get until we start rewriting our filesystems in Rust.
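Kent's question can be made concrete with mount(2). Dave's answer further down is that these protections are optional mount options under userspace control. In other words, the kernel does not force nosuid/nodev/noexec onto a mount: the mounting program has to ask for them.

The sketch below shows what a careful mount helper for untrusted media might request, and how a program can check afterwards whether a mount actually carries those flags. The helper names are hypothetical. mount(2), the MS_* flags and statvfs(3) with ST_NOSUID/ST_NODEV/ST_NOEXEC are standard Linux/glibc interfaces. UID mapping (idmapped mounts) is not shown.

```c
#define _GNU_SOURCE             /* exposes ST_NODEV / ST_NOEXEC in glibc */
#include <stddef.h>
#include <sys/mount.h>
#include <sys/statvfs.h>

/* Flags a (hypothetical) helper should request for media it does not
 * trust: ignore setuid/setgid bits and file capabilities, refuse to
 * open device nodes, and refuse to execute binaries from the image. */
unsigned long untrusted_mount_flags(void)
{
    return MS_NOSUID | MS_NODEV | MS_NOEXEC;
}

/* Mount an untrusted block device with the restrictive flags above.
 * Requires CAP_SYS_ADMIN. Returns 0 on success, -1 with errno set. */
int mount_untrusted(const char *dev, const char *target, const char *fstype)
{
    return mount(dev, target, fstype, untrusted_mount_flags(), NULL);
}

/* Report whether an existing mount carries nosuid, nodev and noexec.
 * Returns 1 if all three are set, 0 if any is missing, -1 on error. */
int mount_is_restricted(const char *path)
{
    struct statvfs st;
    if (statvfs(path, &st) != 0)
        return -1;
    unsigned long want = ST_NOSUID | ST_NODEV | ST_NOEXEC;
    return (st.f_flag & want) == want;
}
```

Whether automount tooling actually applies these flags is exactly the userspace policy question raised in this thread. The kernel side only honours what it is asked for.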
* Re: CVE-2025-21830: landlock: Handle weird files
From: Dave Chinner @ 2025-03-11 4:24 UTC
  To: Kent Overstreet
  Cc: Greg Kroah-Hartman, Mickaël Salaün, cve, Günther Noack, linux-security-module, linux-bcachefs, linux-fsdevel

On Mon, Mar 10, 2025 at 10:09:22PM -0400, Kent Overstreet wrote:
> On Tue, Mar 11, 2025 at 10:42:41AM +1100, Dave Chinner wrote:
> > [...]
> >
> > If an untrusted user is in control of the filesystem image, then they don't need to corrupt the filesystem image to subvert the system. They can just change the permissions on files, change ACLs, change security xattrs (selinux, landlock, smack, etc), replace the contents of file data (e.g. trojan executables), etc.
>
> If user mounts are enabled, that comes with UID mapping, and device nodes disabled - no?

Not necessarily. Those security mechanisms are all optional mount options under userspace control....

> Out of curiosity, what's keeping us from saying "user mounts are generally expected to be safe" for XFS?

What does "generally expected to be safe" actually mean?

If by "safe" you mean "won't crash the kernel if the structure has been tampered with in detectable ways", then we already largely tick that box. However, there are whole classes of DOS attacks that are very difficult to detect without rigorous, expensive runtime checking (e.g. loops in btree pointers).

Hence while we catch almost all the obvious out-of-bounds corruptions within an object, corruptions that can only be detected by examining a largely unbounded number of objects are not handled at all. I can corrupt a filesystem to induce an endless btree search loop like this pretty easily with a little bit of xfs_db magic. Yup, we even provide the tools to make doing stuff like this easy...

If by "safe" you mean "can detect all cases where a metadata field or file data has been tampered with", then XFS is completely unsafe and should not be used.

We can't detect that a malicious actor has changed something like a file permission field or the contents of a security xattr. To do that requires cryptographically secure signatures of metadata objects and file data. We do not have that sort of feature in the on-disk format. We expect users that need protection from such tampering will use an encrypted block device to prevent malicious actors from being able to mutate the filesystem structure in this way.

> Obviously, that does expose a massive attack surface, so saying that for a C codebase that wasn't initially designed for it has a high pucker factor.
>
> But I've been impressed with syzbot's ability to find bugs, so, barring architectural issues which I assume you'd know about, it seems it's not nearly as crazy a thought as it used to be - for XFS, as you guys have been the most rigorous about hardening, so I expect that's about as good as it's going to get until we start rewriting our filesystems in Rust.

The concerns I have about malicious actors are not mitigated by the language the filesystem is implemented in. They have everything to do with the fact that a filesystem like XFS or ext4 cannot detect someone changing permissions on a file to, say, add a setuid bit to the permissions field and then hide the modification by recalculating the correct CRC for the metadata block.

Solving that problem requires a fundamentally different fs/device trust model (i.e. the device is *never* trusted) and an on-disk format that is based around "trust nothing" rather than "trust everything".

-Dave.
--
Dave Chinner
david@fromorbit.com
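Dave's point about the CRC can be demonstrated in a few lines. A CRC detects accidental corruption, but because it uses no secret, anyone who can write the block can also recompute a matching checksum.

The sketch below uses a bitwise CRC32C, the Castagnoli polynomial used for XFS v5 metadata checksums. It runs over a made-up struct toy_inode. The structure layout and all function names are hypothetical and are not the XFS on-disk format.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk inode record; NOT the real XFS layout. */
struct toy_inode {
    uint32_t mode;   /* permission bits, e.g. 0755 or 04755 */
    uint32_t uid;
    uint32_t gid;
    uint32_t crc;    /* CRC32C over the fields above */
};

/* Bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
uint32_t crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* The checksum covers everything except the crc field itself. */
uint32_t inode_crc(const struct toy_inode *ino)
{
    return crc32c(ino, offsetof(struct toy_inode, crc));
}

/* All the filesystem can check at read time: does the CRC match? */
int inode_verify(const struct toy_inode *ino)
{
    return inode_crc(ino) == ino->crc;
}

/* What anyone with write access to the image can do: set the setuid
 * bit and simply recompute the checksum. No secret is needed. */
void tamper_setuid(struct toy_inode *ino)
{
    ino->mode |= 04000;
    ino->crc = inode_crc(ino);
}
```

A naive bit flip fails inode_verify(), but tamper_setuid() passes it. Catching the second case needs a keyed MAC or a signature whose key the attacker does not hold, which is the on-disk format change Dave describes.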
* Re: CVE-2025-21830: landlock: Handle weird files
From: Kent Overstreet @ 2025-03-11 10:50 UTC
  To: Dave Chinner
  Cc: Greg Kroah-Hartman, Mickaël Salaün, cve, Günther Noack, linux-security-module, linux-bcachefs, linux-fsdevel

On Tue, Mar 11, 2025 at 03:24:40PM +1100, Dave Chinner wrote:
> On Mon, Mar 10, 2025 at 10:09:22PM -0400, Kent Overstreet wrote:
> > On Tue, Mar 11, 2025 at 10:42:41AM +1100, Dave Chinner wrote:
> > If user mounts are enabled, that comes with UID mapping, and device nodes disabled - no?
>
> Not necessarily. Those security mechanisms are all optional mount options under userspace control....

Well, if someone's being an idiot, that's on them and not something I'm going to argue about :)

Uidmapping has been around for plenty long enough for userspace to start using it.

> > Out of curiosity, what's keeping us from saying "user mounts are generally expected to be safe" for XFS?
>
> What does "generally expected to be safe" actually mean?
>
> If by "safe" you mean "won't crash the kernel if the structure has been tampered with in detectable ways", then we already largely tick that box. However, there are whole classes of DOS attacks that are very difficult to detect without rigorous, expensive runtime checking (e.g. loops in btree pointers).

btree nodes don't change depth, so just recording the level of a node and validating it trivially defeats that. bcachefs has that in its on-disk format, but if you don't have that then that might be a problem - you'd at least need to know a priori the depth of the root node.

> Hence while we catch almost all the obvious out-of-bounds corruptions within an object, corruptions that can only be detected by examining a largely unbounded number of objects are not handled at all. I can corrupt a filesystem to induce an endless btree search loop like this pretty easily with a little bit of xfs_db magic. Yup, we even provide the tools to make doing stuff like this easy...

*nod*

In bcachefs, we right now have no way to cleanly detect "filesystem is actually full, disk accounting info is wrong", which means corruption there causes allocations to get stuck. That one is fixable, and I'm going to have to at some point since syzbot knows how to trigger it :)

> If by "safe" you mean "can detect all cases where a metadata field or file data has been tampered with", then XFS is completely unsafe and should not be used.
>
> We can't detect that a malicious actor has changed something like a file permission field or the contents of a security xattr. To do that requires cryptographically secure signatures of metadata objects and file data. We do not have that sort of feature in the on-disk format. We expect users that need protection from such tampering will use an encrypted block device to prevent malicious actors from being able to mutate the filesystem structure in this way.

Yeah, but that's the less interesting case to me. Not uninteresting, since "I don't fully trust my block device" is a real scenario with network attached storage.

But generally, the tampering would be done by the user that did the mount - so perhaps we need to find some new nudges to make uidmapping of user mounts required? That could be done in util-linux...
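Kent's suggestion is that each node's recorded level must be exactly one less than its parent's. This bounds any root-to-leaf descent to the root's depth, so a corrupted child pointer that points back up the tree is caught instead of looping forever.

Below is a minimal in-memory sketch. It assumes a toy node array indexed by block number. The structures and btree_lookup() are hypothetical and are not bcachefs or XFS code.

```c
#include <stdint.h>

#define MAX_KEYS 4
#define NR_BLOCKS 16

/* Hypothetical on-disk btree node; level 0 is a leaf. */
struct node {
    uint8_t  level;
    uint8_t  nr;                 /* number of keys in use */
    uint32_t keys[MAX_KEYS];     /* separator keys (interior) or keys (leaf) */
    uint32_t ptrs[MAX_KEYS];     /* child block numbers (interior only) */
};

enum { LOOKUP_FOUND = 0, LOOKUP_MISSING = 1, LOOKUP_CORRUPT = -1 };

/* Walk from the root to a leaf. Every step must land on a node whose
 * level is exactly parent_level - 1, so the walk takes at most
 * root_level steps even if the child pointers form a cycle. */
int btree_lookup(const struct node *disk, uint32_t root, uint32_t key)
{
    if (root >= NR_BLOCKS)
        return LOOKUP_CORRUPT;
    const struct node *n = &disk[root];

    while (n->level > 0) {
        if (n->nr == 0 || n->nr > MAX_KEYS)
            return LOOKUP_CORRUPT;
        /* Pick the last child whose separator key is <= key. */
        int i = 0;
        while (i + 1 < n->nr && n->keys[i + 1] <= key)
            i++;
        uint32_t child = n->ptrs[i];
        if (child >= NR_BLOCKS)
            return LOOKUP_CORRUPT;   /* out-of-bounds pointer */
        if (disk[child].level != n->level - 1)
            return LOOKUP_CORRUPT;   /* level mismatch: cycle or misdirected pointer */
        n = &disk[child];
    }
    if (n->nr > MAX_KEYS)
        return LOOKUP_CORRUPT;
    for (int i = 0; i < n->nr; i++)
        if (n->keys[i] == key)
            return LOOKUP_FOUND;
    return LOOKUP_MISSING;
}
```

This check only bounds root-to-leaf descent. It relies on every node recording its level, which is the on-disk format requirement Kent mentions. Without it, you need at least the root's depth known up front.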
* Unprivileged filesystem mounts
From: Demi Marie Obenour @ 2025-03-11 2:19 UTC
  To: david
  Cc: cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour

People have stuff to get done. If you disallow unprivileged filesystem mounts, they will just use sudo (or equivalent) instead. The problem is not that users are mounting untrusted filesystems. The problem is that mounting untrusted filesystems is unsafe.

Making untrusted filesystems safe to mount is the only solution that lets users do what they actually need to do. That means either actually fixing the filesystem code, or running it in a sufficiently tight sandbox that vulnerabilities in it are of too low importance to matter. libguestfs+FUSE is the most obvious way to do this, but the performance might not be enough for distros to turn it on.

For ext4 and F2FS, if there is a vulnerability that can be exploited by a malicious filesystem image, it is a verified boot bypass for Chrome OS and Android, respectively. Verified boot is a security boundary for both of them, so just forward syzbot reports to their respective security teams and let them do the jobs they are paid to do.
* Re: Unprivileged filesystem mounts
From: Dave Chinner @ 2025-03-11 5:57 UTC
  To: Demi Marie Obenour
  Cc: cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour

On Mon, Mar 10, 2025 at 10:19:57PM -0400, Demi Marie Obenour wrote:
> People have stuff to get done. If you disallow unprivileged filesystem mounts, they will just use sudo (or equivalent) instead.

I am not advocating that we disallow mounting of untrusted devices.

> The problem is not that users are mounting untrusted filesystems. The problem is that mounting untrusted filesystems is unsafe.
>
> Making untrusted filesystems safe to mount is the only solution that lets users do what they actually need to do. That means either actually fixing the filesystem code,

Yes, and the point I keep making is that we cannot provide that guarantee from the kernel for existing filesystems. We cannot detect all possible malicious tampering situations without cryptographically secure verification, and we can't generate full trust from nothing.

The typical desktop policy of "probe and automount any device that is plugged in" prevents the user from examining the device to determine if it contains what it is supposed to contain. The user is not given any opportunity to decide if trust is warranted before the kernel filesystem parser running in ring 0 is exposed to the malicious image.

That's the fundamental policy problem we need to address: the user and/or admin is not in control of their own security because application developers and/or distro maintainers have decided they should not have a choice.

In this situation, the choice of what to do *must* fall to the user, but the argument for "filesystem corruption is a CVE-worthy bug" is that the choice has been taken away from the user. That's what I'm saying needs to change - the choice needs to be returned to the user...

> or running it in a sufficiently tight sandbox that vulnerabilities in it are of too low importance to matter. libguestfs+FUSE is the most obvious way to do this, but the performance might not be enough for distros to turn it on.

Yes, I have advocated for that to be used for desktop mounts in the past. Similarly, I have also advocated for liblinux + FUSE to be used so that the kernel filesystem code is used but run from a userspace context where the kernel cannot be compromised.

I have also advocated for user removable devices to be encrypted by default. The act of the user unlocking the device automatically marks it as trusted because undetectable malicious tampering is highly unlikely.

I have also advocated for a device registry that records removable device signatures and whether the user trusted them or not, so that they only need to be prompted once for any given removable device they use.

There are *many* potential user-friendly solutions to the problem, but they -all- lie in the domain of userspace applications and/or policies. This is *not* a problem more or better code in the kernel can solve.

Kees and Co keep telling us we should be making changes that make entire classes of vulnerabilities harder (or completely impossible) to exploit. Yet every time we suggest that a more secure policy should be applied to automounting filesystems to prevent system compromise on device hotplug, nobody seems to be willing to put security first.

> For ext4 and F2FS, if there is a vulnerability that can be exploited by a malicious filesystem image, it is a verified boot bypass for Chrome OS and Android, respectively. Verified boot is a security boundary for both of them,

How does one maliciously corrupt the root filesystem on an Android phone? How many security boundaries have to be violated before an attacker can directly modify the physical storage underlying the read-only system partition?

Again, if the attacker has device modification capability, why would they bother trying to perform a complex filesystem corruption attack during boot when they can simply modify what runs on startup?

And if this is a real attack vector that Android must defend against, why isn't that device and filesystem image cryptographically signed and verified at boot time to prevent such attacks? That will prevent the entire class of malicious tampering exploits completely without having to care about undiscovered filesystem bugs - that's a much more robust solution from a verified boot and system security perspective...

> so just forward syzbot reports to their respective security teams and let them do the jobs they are paid to do.

Security teams don't fix "syzbot bugs"; they are typically the people that run syzbot instances. It's the developers who then have to triage and fix the issues that are found, so that's who the bug reports should go to (and do).

And just because syzbot finds an issue, that doesn't make it a security issue - all it is is another bug found by another automated test suite that needs fixing.

-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: Unprivileged filesystem mounts 2025-03-11 5:57 ` Dave Chinner @ 2025-03-11 11:01 ` Christian Brauner 2025-03-11 17:36 ` Al Viro 2025-03-11 17:54 ` Eric Biggers 2025-03-11 20:10 ` Demi Marie Obenour 2 siblings, 1 reply; 22+ messages in thread From: Christian Brauner @ 2025-03-11 11:01 UTC (permalink / raw) To: Dave Chinner Cc: Demi Marie Obenour, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour On Tue, Mar 11, 2025 at 04:57:54PM +1100, Dave Chinner wrote: > On Mon, Mar 10, 2025 at 10:19:57PM -0400, Demi Marie Obenour wrote: > > People have stuff to get done. If you disallow unprivileged filesystem > > mounts, they will just use sudo (or equivalent) instead. > > I am not advocating that we disallow mounting of untrusted devices. > > > The problem is > > not that users are mounting untrusted filesystems. The problem is that > > mounting untrusted filesystems is unsafe. > > > Making untrusted filesystems safe to mount is the only solution that > > lets users do what they actually need to do. That means either actually > > fixing the filesystem code, > > Yes, and the point I keep making is that we cannot provide that > guarantee from the kernel for existing filesystems. We cannot detect > all possible malicous tampering situations without cryptogrpahically > secure verification, and we can't generate full trust from nothing. > > The typical desktop policy of "probe and automount any device that > is plugged in" prevents the user from examining the device to > determine if it contains what it is supposed to contain. The user > is not given any opportunity to device if trust is warranted before > the kernel filesystem parser running in ring 0 is exposed to the > malicious image. 
> > That's the fundamental policy problem we need to address: the user > and/or admin is not in control of their own security because > application developers and/or distro maintainers have decided they > should not have a choice. > > In this situation, the choice of what to do *must* fall to the user, > but the argument for "filesystem corruption is a CVE-worthy bug" is > that the choice has been taken away from the user. That's what I'm > saying needs to change - the choice needs to be returned to the > user... > > > or running it in a sufficiently tight > > sandbox that vulnerabilities in it are of too low importance to matter. > > libguestfs+FUSE is the most obvious way to do this, but the performance > > might not be enough for distros to turn it on. > > Yes, I have advocated for that to be used for desktop mounts in the > past. Similarly, I have also advocated for liblinux + FUSE to be > used so that the kernel filesystem code is used but run from a > userspace context where the kernel cannot be compromised. > > I have also advocated for user removable devices to be encrypted by > default. The act of the user unlocking the device automatically > marks it as trusted because undetectable malicious tampering is > highly unlikely. > > I have also advocated for a device registry that records removable > device signatures and whether the user trusted them or not so that > they only need to be prompted once for any given removable device > they use. > > There are *many* potential user-friendly solutions to the problem, > but they -all- lie in the domain of userspace applications and/or > policies. This is *not* a problem more or better code in the kernel > can solve. Strongly agree. > > Kees and Co keep telling us we should be making changes that make it > harder (or compeltely prevent) entire classes of vulnerabilities > from being exploited. 
Yet every time we suggest that a more secure > policy should be applied to automounting filesystems to prevent > system compromise on device hotplug, nobody seems to be willing to > put security first. I agree with Dave here a lot. The case where arbitrary devices stuck into a laptop (e.g., USB sticks) are mounted isn't solved by making a filesystem mountable unprivileged. The mounted device cannot show up in the global mount namespace somewhere since the user doesn't own the initial mount+user namespace. So it's pointless. In other words, there's filesystem level checks and mount namespace based checks. Circumventing that restriction means that any user can just mount the device at any location in the global mount namespace and therefore simply overmount other stuff. The other thing is whether or not a filesystem is allowed to be mounted by an unprivileged user namespaces. That is not a policy decision the kernel can make, should make, or has to make. This is a road to security disaster. The new mount api has built-in delegation capabilities for exactly this reason and use-case so the kernel doesn't have to do that. Policy like that belongs into userspace. The new mount api makes it possible for userspace to correctly and safely delegate any filesystem mount to unprivileged users. It's e.g., heavily used by bpf to make bpffs and thus bpf usable by unprivileged userspace and containers. There's a generic API for this already that we presented on in [1] at LSFMM 2023. This has proper security policies in place when and how it is allowed even for a user not in a user namespace to mount an arbitrary filesystem (device or no device-based). 
NAME systemd-mountfsd.service, systemd-mountfsd - Disk Image File System Mount Service SYNOPSIS systemd-mountfsd.service /usr/lib/systemd/systemd-mountfsd DESCRIPTION systemd-mountfsd is a system service that dissects disk images, and returns mount file descriptors for the file systems contained therein to clients, via a Varlink IPC API. The disk images provided must contain a raw file system image or must follow the Discoverable Partitions Specification[1]. Before mounting any file systems authenticity of the disk image is established in one or a combination of the following ways: 1. If the disk image is located in a regular file in one of the directories /var/lib/machines/, /var/lib/portables/, /var/lib/extensions/, /var/lib/confexts/ or their counterparts in the /etc/, /run/, /usr/lib/ it is assumed to be trusted. 2. If the disk image contains a Verity enabled disk image, along with a signature partition with a key in the kernel keyring or in /etc/verity.d/ (and related directories) the disk image is considered trusted. This service provides one Varlink[2] service: io.systemd.MountFileSystem which accepts a file descriptor to a regular file or block device, and returns a number of file descriptors referring to an fsmount() file descriptor the client may then attach to a path of their choice. The returned mounts are automatically allowlisted in the per-user-namespace allowlist maintained by systemd-nsresourced.service(8). The file systems are automatically fsck(8)'ed before mounting. NOTES 1. Discoverable Partitions Specification https://uapi-group.org/specifications/specs/discoverable_partitions_specification/ 2. Varlink https://varlink.org/ This work has now also been expanded to cover plain directory trees and will be available in the next release. It is currently part of systemd but like with a lot of other such tools they are available standalone for non-systemd systems and if not that can be done. 
[1]: https://youtu.be/RbMhupT3Dk4?si=pIGH5XPPUJ0m6bi0
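The first trust rule in the man page excerpt is purely location-based. The Python sketch below models only that rule so its edge cases are easy to see. It is not systemd's code: `is_trusted_by_location` is a hypothetical name, the Verity-signature rule is not modelled, and reading the /etc, /run and /usr/lib "counterparts" as the same subdirectory names is an assumption.

```python
import os

# Trusted directories: /var/lib/<name>/ plus the /etc, /run and /usr/lib
# counterparts (assumed here to use the same <name>).
_NAMES = ("machines", "portables", "extensions", "confexts")
_PREFIXES = ("/var/lib", "/etc", "/run", "/usr/lib")
TRUSTED_DIRS = tuple(os.path.join(p, n) for p in _PREFIXES for n in _NAMES)


def is_trusted_by_location(image_path: str) -> bool:
    """Hypothetical sketch of the directory-based trust rule only."""
    if not os.path.isabs(image_path):
        return False  # relative paths depend on the caller's cwd: reject
    # normpath collapses "." and "..", so "/var/lib/machines/../../tmp/x"
    # cannot masquerade as a file under /var/lib/machines. It does NOT
    # resolve symlinks; a real service would also have to handle those
    # and confirm the target is a regular file.
    path = os.path.normpath(image_path)
    return any(path.startswith(d + "/") for d in TRUSTED_DIRS)
```

A rule like this is only as strong as the write permissions on those directories: anything an unprivileged user could place there would become trusted, so they need to stay writable only by root.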
* Re: Unprivileged filesystem mounts 2025-03-11 11:01 ` Christian Brauner @ 2025-03-11 17:36 ` Al Viro 2025-03-11 17:43 ` Kent Overstreet 0 siblings, 1 reply; 22+ messages in thread From: Al Viro @ 2025-03-11 17:36 UTC To: Christian Brauner Cc: Dave Chinner, Demi Marie Obenour, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour On Tue, Mar 11, 2025 at 12:01:48PM +0100, Christian Brauner wrote: > The case where arbitrary devices stuck into a laptop (e.g., USB sticks) > are mounted isn't solved by making a filesystem mountable unprivileged. > The mounted device cannot show up in the global mount namespace > somewhere since the user doesn't own the initial mount+user namespace. > So it's pointless. In other words, there's filesystem level checks and > mount namespace based checks. Circumventing that restriction means that > any user can just mount the device at any location in the global mount > namespace and therefore simply overmount other stuff. Note that "untrusted contents" is not the worst thing you can run into - it can be content changing behind your back. I seriously doubt that anyone fuzzes for that kind of crap (and no, it's not an invitation to start). I seriously doubt that there's any local filesystem that would be resilient to that...
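Al's point about contents changing behind your back is the classic check-then-use (double-fetch) problem. The toy Python sketch below is not filesystem code; `FlakyDevice`, `record`, and both parse functions are hypothetical. It shows why validating a value and then re-reading it from the device is unsafe, and why validating one private snapshot is the usual mitigation within a single parse. It does not make a whole filesystem resilient to a device that keeps changing over time.

```python
import struct


class FlakyDevice:
    """Toy device whose contents change after the first read, modelling
    storage (e.g. a network block device) modified behind the reader."""

    def __init__(self, first: bytes, later: bytes):
        self._reads = 0
        self._first, self._later = first, later

    def read_block(self) -> bytes:
        self._reads += 1
        return self._first if self._reads == 1 else self._later


BUF_SIZE = 16  # size of a fixed buffer the "parser" would copy into


def record(length: int) -> bytes:
    # 2-byte little-endian length header followed by zero padding.
    return struct.pack("<H", length) + bytes(62)


def parse_double_fetch(dev: FlakyDevice) -> int:
    # Check: read the length and validate it...
    (length,) = struct.unpack_from("<H", dev.read_block())
    if length > BUF_SIZE:
        raise ValueError("length too large")
    # ...use: re-read the device and trust the length again (the bug).
    (length,) = struct.unpack_from("<H", dev.read_block())
    return length  # a C parser might memcpy() this many bytes


def parse_snapshot(dev: FlakyDevice) -> int:
    # Read once into a private copy (a memcpy in C), then validate and
    # use only that copy.
    block = bytes(dev.read_block())
    (length,) = struct.unpack_from("<H", block)
    if length > BUF_SIZE:
        raise ValueError("length too large")
    return length
```

With `FlakyDevice(record(8), record(4096))`, the double-fetch parser passes its check and then returns 4096, while the snapshot parser returns 8.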
* Re: Unprivileged filesystem mounts 2025-03-11 17:36 ` Al Viro @ 2025-03-11 17:43 ` Kent Overstreet 0 siblings, 0 replies; 22+ messages in thread From: Kent Overstreet @ 2025-03-11 17:43 UTC To: Al Viro Cc: Christian Brauner, Dave Chinner, Demi Marie Obenour, cve, gnoack, gregkh, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour On Tue, Mar 11, 2025 at 05:36:00PM +0000, Al Viro wrote: > On Tue, Mar 11, 2025 at 12:01:48PM +0100, Christian Brauner wrote: > > > The case where arbitrary devices stuck into a laptop (e.g., USB sticks) > > are mounted isn't solved by making a filesystem mountable unprivileged. > > The mounted device cannot show up in the global mount namespace > > somewhere since the user doesn't own the initial mount+user namespace. > > So it's pointless. In other words, there's filesystem level checks and > > mount namespace based checks. Circumventing that restriction means that > > any user can just mount the device at any location in the global mount > > namespace and therefore simply overmount other stuff. > > Note that "untrusted contents" is not the worst thing you can run into - > it can be content changing behind your back. I seriously doubt that > anyone fuzzes for that kind of crap (and no, it's not an invitation to > start). I seriously doubt that there's any local filesystem that would > be resilient to that... Given network block devices (more common with cloud stuff these days), it's not a totally unreasonable thing to want to be secure against. I'd love to see someone attack bcachefs that way - in a few more years :)
* Re: Unprivileged filesystem mounts 2025-03-11 5:57 ` Dave Chinner 2025-03-11 11:01 ` Christian Brauner @ 2025-03-11 17:54 ` Eric Biggers 2025-03-11 20:10 ` Demi Marie Obenour 2 siblings, 0 replies; 22+ messages in thread From: Eric Biggers @ 2025-03-11 17:54 UTC To: Dave Chinner Cc: Demi Marie Obenour, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour On Tue, Mar 11, 2025 at 04:57:54PM +1100, Dave Chinner wrote: > And if this is a real attack vector that Android must defend against, > why isn't that device and filesystem image cryptographically signed > and verified at boot time to prevent such attacks? That will prevent > the entire class of malicious tampering exploits completely without > having to care about undiscovered filesystem bugs - that's a much > more robust solution from a verified boot and system security > perspective... That's exactly how it works. See https://source.android.com/docs/security/features/verifiedboot and https://source.android.com/docs/security/features/verifiedboot/dm-verity. - Eric
* Re: Unprivileged filesystem mounts 2025-03-11 5:57 ` Dave Chinner 2025-03-11 11:01 ` Christian Brauner 2025-03-11 17:54 ` Eric Biggers @ 2025-03-11 20:10 ` Demi Marie Obenour 2025-03-18 5:21 ` Dave Chinner 2025-03-18 22:11 ` Theodore Ts'o 2 siblings, 2 replies; 22+ messages in thread From: Demi Marie Obenour @ 2025-03-11 20:10 UTC To: Dave Chinner Cc: cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour On Tue, Mar 11, 2025 at 04:57:54PM +1100, Dave Chinner wrote: > On Mon, Mar 10, 2025 at 10:19:57PM -0400, Demi Marie Obenour wrote: > > People have stuff to get done. If you disallow unprivileged filesystem > > mounts, they will just use sudo (or equivalent) instead. > > I am not advocating that we disallow mounting of untrusted devices. > > > The problem is > > not that users are mounting untrusted filesystems. The problem is that > > mounting untrusted filesystems is unsafe. > > > Making untrusted filesystems safe to mount is the only solution that > > lets users do what they actually need to do. That means either actually > > fixing the filesystem code, > > Yes, and the point I keep making is that we cannot provide that > guarantee from the kernel for existing filesystems. We cannot detect > all possible malicious tampering situations without cryptographically > secure verification, and we can't generate full trust from nothing. Why is it not possible to provide that guarantee? I'm not concerned about infinite loops or deadlocks. Is there a reason it is not possible to prevent memory corruption? > The typical desktop policy of "probe and automount any device that > is plugged in" prevents the user from examining the device to > determine if it contains what it is supposed to contain.
The user > is not given any opportunity to decide if trust is warranted before > the kernel filesystem parser running in ring 0 is exposed to the > malicious image. > > That's the fundamental policy problem we need to address: the user > and/or admin is not in control of their own security because > application developers and/or distro maintainers have decided they > should not have a choice. > > In this situation, the choice of what to do *must* fall to the user, > but the argument for "filesystem corruption is a CVE-worthy bug" is > that the choice has been taken away from the user. That's what I'm > saying needs to change - the choice needs to be returned to the > user... I am 100% in favor of not automounting filesystems without user interaction, but that only means that an exploit will require user interaction. Users need to get things done, and if their task requires them to use a not-fully-trusted filesystem image, then that is what they will do, and they will typically do it in the most obvious way possible. That most obvious way needs to be a safe way, and it needs to have good enough performance that users don't go around looking for an unsafe way. > > or running it in a sufficiently tight > > sandbox that vulnerabilities in it are of too low importance to matter. > > libguestfs+FUSE is the most obvious way to do this, but the performance > > might not be enough for distros to turn it on. > > Yes, I have advocated for that to be used for desktop mounts in the > past. Similarly, I have also advocated for liblinux + FUSE to be > used so that the kernel filesystem code is used but run from a > userspace context where the kernel cannot be compromised. > > I have also advocated for user removable devices to be encrypted by > default. The act of the user unlocking the device automatically > marks it as trusted because undetectable malicious tampering is > highly unlikely. That is definitely a good idea.
> I have also advocated for a device registry that records removable > device signatures and whether the user trusted them or not so that > they only need to be prompted once for any given removable device > they use. > > There are *many* potential user-friendly solutions to the problem, > but they -all- lie in the domain of userspace applications and/or > policies. This is *not* a problem more or better code in the kernel > can solve. It is certainly possible to make a memory safe implementation of any filesystem. If the current implementation can't prevent memory corruption if a malicious filesystem is mounted, that is a characteristic of the implementation. > Kees and Co keep telling us we should be making changes that make it > harder (or completely prevent) entire classes of vulnerabilities > from being exploited. Yet every time we suggest that a more secure > policy should be applied to automounting filesystems to prevent > system compromise on device hotplug, nobody seems to be willing to > put security first. Not automounting filesystems on hotplug is a _part_ of the solution. It cannot be the _entire_ solution. Users sometimes need to be able to interact with untrusted filesystem images with a reasonable speed. > > For ext4 and F2FS, if there is a vulnerability that can be exploited by > > a malicious filesystem image, it is a verified boot bypass for Chrome OS > > and Android, respectively. Verified boot is a security boundary for > > both of them, > > How does one maliciously corrupt the root filesystem on an Android > phone? How many security boundaries have to be violated before > an attacker can directly modify the physical storage underlying the > read-only system partition? > > Again, if the attacker has device modification capability, why > would they bother trying to perform a complex filesystem > corruption attack during boot when they can simply modify what > runs on startup?
> > And if this is a real attack vector that Android must defend against, > why isn't that device and filesystem image cryptographically signed > and verified at boot time to prevent such attacks? That will prevent > the entire class of malicious tampering exploits completely without > having to care about undiscovered filesystem bugs - that's a much > more robust solution from a verified boot and system security > perspective... On both Android and ChromeOS, the root filesystem is a dm-verity volume, and the Merkle tree root hash is either signed or is part of the signed kernel image. The signed kernel image is itself verified by the bootloader. Therefore, the root filesystem cannot be tampered with. However, the root filesystem is not the only filesystem image that must be mounted. There is also a writable data volume, and that _cannot_ be signed because it contains user data. It is encrypted, but part of the threat model for both Android and ChromeOS is an attacker who has gained root or even kernel code execution and wants to retain their access across device reboots. They can't tamper with the kernel or root filesystem, and privileged userspace treats the data on the writable filesystem as untrusted. However, the attacker can replace the writable filesystem image with anything they want, so if they can craft an image that gains kernel code execution the next time the system boots, they have successfully obtained persistence. Also, at least Google Pixels support updating the OS via the bootloader. The bootloader checks that the image was signed by the OS vendor (generally, but not always, Google), and I believe it also checks for downgrade attacks. However, this means of updating the OS doesn't wipe user data. This means that if an attacker has gained code execution with root or even kernel privileges, updating the OS to a version that has patched the vulnerability the attacker used is supposed to revoke their access, even though the user data the attacker may have tampered with survives the update.
The same is true if the attacker used USB for their exploit and the reboot happens after the user has unplugged the USB device. Furthermore, on UEFI systems the EFI System Partition cannot be cryptographically protected as the firmware does not support this. > > so just forward syzbot reports to their respective > > security teams and let them do the jobs they are paid to do. > > Security teams don't fix "syzbot bugs"; they are typically the > people that run syzbot instances. It's the developers who then > have to triage and fix the issues that are found, so that's who the > bug reports should go to (and do). And just because syzbot finds an > issue, that doesn't make it a security issue - all it is is another > bug found by another automated test suite that needs fixing. Browser vendors consider many kinds of memory unsafety problems to be exploitable until and unless proven otherwise. My understanding is that experience has proven them to be correct in this regard. -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab
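The dm-verity arrangement Demi describes rests on one property: a single trusted root hash (signed, or embedded in the signed kernel image) pins every block of a read-only volume. The Python sketch below shows that property with a deliberately simplified binary Merkle tree over 4 KiB blocks. It is not dm-verity's on-disk format; the binary fan-out, the leaf/node prefixes, the lack of a salt, and hashing the whole image up front (dm-verity checks blocks lazily as they are read) are all simplifications. It also shows why the writable data volume cannot be covered this way: its contents change at runtime, so there is no fixed root to sign at build time.

```python
import hashlib
import hmac

BLOCK_SIZE = 4096


def _leaf(block: bytes) -> bytes:
    # 0x00 / 0x01 prefixes keep leaf and interior hashes distinct.
    return hashlib.sha256(b"\x00" + block).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    # Hash every data block, then combine digests pairwise up to one root.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    level = [_leaf(b) for b in blocks] or [_leaf(b"")]
    while len(level) > 1:
        nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd digest is carried up unchanged
        level = nxt
    return level[0]


def verify_image(data: bytes, trusted_root: bytes) -> bool:
    # trusted_root stands in for the signed (or kernel-embedded) root hash.
    return hmac.compare_digest(merkle_root(data), trusted_root)
```

Flipping a single bit anywhere in the image changes the root, so `verify_image` rejects it.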
* Re: Unprivileged filesystem mounts 2025-03-11 20:10 ` Demi Marie Obenour @ 2025-03-18 5:21 ` Dave Chinner 2025-03-19 14:55 ` Demi Marie Obenour 2025-03-18 22:11 ` Theodore Ts'o 1 sibling, 1 reply; 22+ messages in thread From: Dave Chinner @ 2025-03-18 5:21 UTC To: Demi Marie Obenour Cc: cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour On Tue, Mar 11, 2025 at 04:10:42PM -0400, Demi Marie Obenour wrote: > On Tue, Mar 11, 2025 at 04:57:54PM +1100, Dave Chinner wrote: > > On Mon, Mar 10, 2025 at 10:19:57PM -0400, Demi Marie Obenour wrote: > > > People have stuff to get done. If you disallow unprivileged filesystem > > > mounts, they will just use sudo (or equivalent) instead. > > > > I am not advocating that we disallow mounting of untrusted devices. > > > > > The problem is > > > not that users are mounting untrusted filesystems. The problem is that > > > mounting untrusted filesystems is unsafe. > > > > > Making untrusted filesystems safe to mount is the only solution that > > > lets users do what they actually need to do. That means either actually > > > fixing the filesystem code, > > > > Yes, and the point I keep making is that we cannot provide that > > guarantee from the kernel for existing filesystems. We cannot detect > > all possible malicious tampering situations without cryptographically > > secure verification, and we can't generate full trust from nothing. > > Why is it not possible to provide that guarantee? I'm not concerned > about infinite loops or deadlocks. Is there a reason it is not possible > to prevent memory corruption? You're asking me to prove that the on-disk filesystem format parsing implementation is 100% provably correct. Not only that, you're wanting me to say that journal replay copying incomplete, unverifiable structure fragments over the top of existing disk structures is 100% provably correct.
I am the person who architected the existing metadata validation infrastructure that XFS uses, and so I know its limitations in intimate detail. It is, by far, the closest thing we have to complete runtime metadata validation in any Linux filesystem (except maybe bcachefs), but it is nowhere near able to detect and prevent 100% of potential structure corruptions. It is *far from trivial* to validate all the weird corner cases that exist in the on-disk format that have evolved over the last 3 decades. For the first 15 years of development, almost zero thought was given to runtime validation of the on-disk format. People even fought against introducing it at all. And despite this, we still have to support the on-disk functionality those old, difficult to validate, persistent structures describe. [ And then there's some other random memory corruption bug in the code, and all bets are off... ] IOWs, no filesystem developer is ever going to give you a guarantee that a filesystem implementation is free from memory corruption bugs unless they've designed and implemented it from the ground up to be 100% safe from such issues. No such filesystem exists in the kernel, and it will probably be years before anything exists to fill that gap. > > The typical desktop policy of "probe and automount any device that > > is plugged in" prevents the user from examining the device to > > determine if it contains what it is supposed to contain. The user > > is not given any opportunity to decide if trust is warranted before > > the kernel filesystem parser running in ring 0 is exposed to the > > malicious image. > > > > That's the fundamental policy problem we need to address: the user > > and/or admin is not in control of their own security because > > application developers and/or distro maintainers have decided they > > should not have a choice.
> > > > In this situation, the choice of what to do *must* fall to the user, > > but the argument for "filesystem corruption is a CVE-worthy bug" is > > that the choice has been taken away from the user. That's what I'm > > saying needs to change - the choice needs to be returned to the > > user... > > I am 100% in favor of not automounting filesystems without user > interaction, but that only means that an exploit will require user > interaction. Users need to get things done, and if their task requires > them to use a not-fully-trusted filesystem image, then that is what they > will do, and they will typically do it in the most obvious way possible. > That most obvious way needs to be a safe way, and it needs to have good > enough performance that users don't go around looking for an unsafe way. Well, yes, that is obvious, and not a point of contention at all, as is evidenced by the list of solutions to this problem I outlined. > > > or running it in a sufficiently tight > > > sandbox that vulnerabilities in it are of too low importance to matter. > > > libguestfs+FUSE is the most obvious way to do this, but the performance > > > might not be enough for distros to turn it on. > > > > Yes, I have advocated for that to be used for desktop mounts in the > > past. Similarly, I have also advocated for liblinux + FUSE to be > > used so that the kernel filesystem code is used but run from a > > userspace context where the kernel cannot be compromised. > > > > I have also advocated for user removable devices to be encrypted by > > default. The act of the user unlocking the device automatically > > marks it as trusted because undetectable malicious tampering is > > highly unlikely. > > That is definitely a good idea. > > > I have also advocated for a device registry that records removable > > device signatures and whether the user trusted them or not so that > > they only need to be prompted once for any given removable device > > they use.
> > > > There are *many* potential user-friendly solutions to the problem, > > but they -all- lie in the domain of userspace applications and/or > > policies. This is *not* a problem more or better code in the kernel > > can solve. > > It is certainly possible to make a memory safe implementation of any > filesystem. Spoken like a True Expert. > If the current implementation can't prevent memory > corruption if a malicious filesystem is mounted, that is a > characteristic of the implementation. Ah, now I see what you are trying to do. You're building a strawman around memory corruption that you can use the argument "we need to reimplement everything in Rust" to knock down. Sorry, not playing that game. > However, the root filesystem is not the only filesystem image that must > be mounted. There is also a writable data volume, and that _cannot_ be > signed because it contains user data. It is encrypted, but part of the > threat model for both Android and ChromeOS is an attacker who has gained > root or even kernel code execution and wants to retain their access > across device reboots. They can't tamper with the kernel or root > filesystem, and privileged userspace treats the data on the writable > filesystem as untrusted. However, the attacker can replace the writable > filesystem image with anything they want, And therein lies the attack a filesystem implementation can't defend against: the attacker can rewrite the unencrypted block device to contain anything they want, and that will then pass verification on the next boot. Perhaps that's the class of storage attack you should seek to prevent, not try to slap bandaids over trust model violations or insinuate the only solution is to rewrite complex subsystems in Rust.... -Dave. -- Dave Chinner david@fromorbit.com
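Dave's description of runtime metadata validation can be illustrated with a toy block verifier. The Python sketch below uses a made-up layout (magic `TOYB`, a record count, and a checksum); it is not XFS's on-disk format, and zlib's CRC-32 stands in for the CRC32C that XFS v5 metadata uses. It shows the two layers such verifiers combine: a checksum catches accidental damage, but whoever crafts a malicious image can simply recompute it, so bounds checks on the decoded fields are what stop a bogus value from being trusted. A real filesystem needs many such checks per structure type, plus cross-structure consistency checks this sketch does not attempt.

```python
import struct
import zlib

BLOCK_SIZE = 512
MAGIC = b"TOYB"                # made-up magic number
HDR = struct.Struct("<4sHI")   # magic, record count, checksum (10 bytes)
CRC_OFF = 6                    # byte offset of the checksum field
REC_SIZE = 8
MAX_RECS = (BLOCK_SIZE - HDR.size) // REC_SIZE


class CorruptBlock(Exception):
    pass


def _checksum(block: bytes) -> int:
    # Checksum over the whole block with the checksum field zeroed.
    return zlib.crc32(block[:CRC_OFF] + bytes(4) + block[CRC_OFF + 4:])


def make_block(nrecs: int) -> bytes:
    # Build a block with a correct checksum (as a crafted image would have).
    buf = bytearray(BLOCK_SIZE)
    HDR.pack_into(buf, 0, MAGIC, nrecs, 0)
    HDR.pack_into(buf, 0, MAGIC, nrecs, _checksum(bytes(buf)))
    return bytes(buf)


def verify_block(block: bytes) -> int:
    """Return the record count only if every check passes."""
    if len(block) != BLOCK_SIZE:
        raise CorruptBlock("wrong block size")
    magic, nrecs, crc = HDR.unpack_from(block)
    if magic != MAGIC:
        raise CorruptBlock("bad magic")
    if crc != _checksum(block):
        raise CorruptBlock("bad checksum")
    # The checksum is valid, but the field itself may still be hostile.
    if nrecs > MAX_RECS:
        raise CorruptBlock("record count overflows block")
    return nrecs
```

`make_block(1000)` carries a perfectly valid checksum, yet `verify_block` still rejects it because 1000 records cannot fit in the block.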
* Re: Unprivileged filesystem mounts 2025-03-18 5:21 ` Dave Chinner @ 2025-03-19 14:55 ` Demi Marie Obenour 2025-03-19 16:59 ` Theodore Ts'o 0 siblings, 1 reply; 22+ messages in thread From: Demi Marie Obenour @ 2025-03-19 14:55 UTC To: Dave Chinner Cc: cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour On Tue, Mar 18, 2025 at 04:21:48PM +1100, Dave Chinner wrote: > On Tue, Mar 11, 2025 at 04:10:42PM -0400, Demi Marie Obenour wrote: > > On Tue, Mar 11, 2025 at 04:57:54PM +1100, Dave Chinner wrote: > > > On Mon, Mar 10, 2025 at 10:19:57PM -0400, Demi Marie Obenour wrote: > > > > People have stuff to get done. If you disallow unprivileged filesystem > > > > mounts, they will just use sudo (or equivalent) instead. > > > > > > I am not advocating that we disallow mounting of untrusted devices. > > > > > > > The problem is > > > > not that users are mounting untrusted filesystems. The problem is that > > > > mounting untrusted filesystems is unsafe. > > > > > > > Making untrusted filesystems safe to mount is the only solution that > > > > lets users do what they actually need to do. That means either actually > > > > fixing the filesystem code, > > > > > > Yes, and the point I keep making is that we cannot provide that > > > guarantee from the kernel for existing filesystems. We cannot detect > > > all possible malicious tampering situations without cryptographically > > > secure verification, and we can't generate full trust from nothing. > > > > Why is it not possible to provide that guarantee? I'm not concerned > > about infinite loops or deadlocks. Is there a reason it is not possible > > to prevent memory corruption? > > You're asking me to prove that the on-disk filesystem format parsing > implementation is 100% provably correct.
Not only that, you're > wanting me to say that journal replay copying incomplete, > unverifiable structure fragments over the top of existing disk > structures is 100% provably correct. > > I am the person who architected the existing metadata validation > infrastructure that XFS uses, and so I know its limitations in > intimate detail. It is, by far, the closest thing we have to > complete runtime metadata validation in any Linux filesystem > (except maybe bcachefs), but it is nowhere near able to detect and > prevent 100% of potential structure corruptions. > > It is *far from trivial* to validate all the weird corner cases that > exist in the on-disk format that have evolved over the last 3 > decades. For the first 15 years of development, almost zero thought > was given to runtime validation of the on-disk format. People even > fought against introducing it at all. And despite this, we still > have to support the on-disk functionality those old, difficult to > validate, persistent structures describe. > > [ And then there's some other random memory corruption bug in the > code, and all bets are off... ] > > IOWs, no filesystem developer is ever going to give you a guarantee > that a filesystem implementation is free from memory corruption bugs > unless they've designed and implemented it from the ground up to be > 100% safe from such issues. No such filesystem exists in the kernel, > and it will probably be years before anything exists to fill > that gap. That makes sense. > > > The typical desktop policy of "probe and automount any device that > > > is plugged in" prevents the user from examining the device to > > > determine if it contains what it is supposed to contain. The user > > > is not given any opportunity to decide if trust is warranted before > > > the kernel filesystem parser running in ring 0 is exposed to the > > > malicious image.
> > > > > That's the fundamental policy problem we need to address: the user > > > and/or admin is not in control of their own security because > > > application developers and/or distro maintainers have decided they > > > should not have a choice. > > > > > > In this situation, the choice of what to do *must* fall to the user, > > > but the argument for "filesystem corruption is a CVE-worthy bug" is > > > that the choice has been taken away from the user. That's what I'm > > > saying needs to change - the choice needs to be returned to the > > > user... > > > > I am 100% in favor of not automounting filesystems without user > > interaction, but that only means that an exploit will require user > > interaction. Users need to get things done, and if their task requires > > them to use a not-fully-trusted filesystem image, then that is what they > > will do, and they will typically do it in the most obvious way possible. > > That most obvious way needs to be a safe way, and it needs to have good > > enough performance that users don't go around looking for an unsafe way. > > Well, yes, that is obvious, and not a point of contention at all, > as is evidenced by the list of solutions to this problem I outlined. What kind of performance do the existing solutions (libguestfs, lklfuse) have? > > > > or running it in a sufficiently tight > > > > sandbox that vulnerabilities in it are of too low importance to matter. > > > > libguestfs+FUSE is the most obvious way to do this, but the performance > > > > might not be enough for distros to turn it on. > > > > > > Yes, I have advocated for that to be used for desktop mounts in the > > > past. Similarly, I have also advocated for liblinux + FUSE to be > > > used so that the kernel filesystem code is used but run from a > > > userspace context where the kernel cannot be compromised. > > > > > > I have also advocated for user removable devices to be encrypted by > > > default.
The act of the user unlocking the device automatically > > > marks it as trusted because undetectable malicious tampering is > > > highly unlikely. > > > > That is definitely a good idea. > > > > > I have also advocated for a device registry that records removable > > > device signatures and whether the user trusted them or not so that > > > they only need to be prompted once for any given removable device > > > they use. > > > > > > There are *many* potential user-friendly solutions to the problem, > > > but they -all- lie in the domain of userspace applications and/or > > > policies. This is *not* a problem more or better code in the kernel > > > can solve. > > > > It is certainly possible to make a memory safe implementation of any > > filesystem. > > Spoken like a True Expert. I am saying this in the sense of "it is possible to make a memory-safe implementation of *anything*, unless that thing exposes a memory-unsafe API". It's a generic statement about programs in general. It does not imply that doing so is practical. > > If the current implementation can't prevent memory > > corruption if a malicious filesystem is mounted, that is a > > characteristic of the implementation. > > Ah, now I see what you are trying to do. You're building a strawman > around memory corruption that you can use the argument "we need to > reimplement everything in Rust" to knock down. > > Sorry, not playing that game. There are other options, like "run the filesystem in a tightly sandboxed userspace process, especially compiled through WebAssembly". The difficulty is making them sufficiently performant for distributions to actually use them. > > However, the root filesystem is not the only filesystem image that must > > be mounted. There is also a writable data volume, and that _cannot_ be > > signed because it contains user data.
It is encrypted, but part of the > > threat model for both Android and ChromeOS is an attacker who has gained > > root or even kernel code execution and wants to retain their access > > across device reboots. They can't tamper with the kernel or root > > filesystem, and privileged userspace treats the data on the writable > > filesystem as untrusted. However, the attacker can replace the writable > > filesystem image with anything they want, > > And therein lies the attack a filesystem implementation can't defend > against: the attacker can rewrite the unencrypted block device to > contain anything they want, and that will then pass verification on > the next boot. Perhaps that's the class of storage attack you should > seek to prevent, not try to slap bandaids over trust model > violations or insinuate the only solution is to rewrite complex > subsystems in Rust.... The Chrome OS and Android threat models require that they remain secure no matter what the contents of the unsigned block device actually are, even if they are completely malicious. -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab
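Dave's suggested registry of removable-device trust decisions can be sketched in a few lines. Everything in the Python below (`DeviceRegistry`, `fingerprint`, the `ask_user` callback, and the choice of vendor, serial number, and filesystem UUID as identifiers) is hypothetical; no existing desktop component is implied. One caveat the sketch makes explicit: serial numbers and filesystem UUIDs are chosen by whoever made the device or image, so a malicious stick can copy a trusted device's identifiers. A registry like this reduces repeated prompts; it does not authenticate the medium, which is what encryption (also suggested in the thread) would add.

```python
import hashlib
from enum import Enum


class Decision(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"


class DeviceRegistry:
    """Hypothetical per-user store of trust decisions, so the user is
    prompted at most once per removable device."""

    def __init__(self, ask_user):
        self._decisions: dict[str, Decision] = {}
        self._ask_user = ask_user  # callback: fingerprint -> Decision

    @staticmethod
    def fingerprint(vendor: str, serial: str, fs_uuid: str) -> str:
        # All three values are attacker-controlled on a malicious device,
        # so this only avoids re-prompting; it is not authentication.
        ident = "\0".join((vendor, serial, fs_uuid)).encode()
        return hashlib.sha256(ident).hexdigest()

    def decide(self, fp: str) -> Decision:
        # Ask only the first time this fingerprint is seen.
        if fp not in self._decisions:
            self._decisions[fp] = self._ask_user(fp)
        return self._decisions[fp]
```

A caller would then route `Decision.UNTRUSTED` devices to a sandboxed mount path (such as the FUSE or VM approaches discussed in the thread) instead of the in-kernel parser.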
* Re: Unprivileged filesystem mounts 2025-03-19 14:55 ` Demi Marie Obenour @ 2025-03-19 16:59 ` Theodore Ts'o 2025-03-19 17:32 ` Demi Marie Obenour 0 siblings, 1 reply; 22+ messages in thread From: Theodore Ts'o @ 2025-03-19 16:59 UTC To: Demi Marie Obenour Cc: Dave Chinner, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour On Wed, Mar 19, 2025 at 10:55:39AM -0400, Demi Marie Obenour wrote: > What kind of performance do the existing solutions (libguestfs, lklfuse) > have? For most of the use cases that I'm aware of, which is to support occasional file transfers through crappy USB thumb drives (the kind which a nation state actor would scatter in the parking lot of their target), the performance doesn't really matter. Certainly these are the ones which apply for the Android and ChromeOS use cases. I suppose there is the use case of people who are running Adobe Lightroom Classic on their Macbook Air where they are using an external SSD because Apple's storage pricing is highway robbery, but (a) it's MacOS, not Linux, and (b) this is arguably a much smaller percentage of the use cases in terms of millions and millions of Android and ChromeOS users. Most of the more naive Mac users probably just pay $$$ to Apple and don't use external storage anyway. :-) > There are other options, like "run the filesystem in a tightly sandboxed > userspace process, especially compiled through WebAssembly". The > difficulty is making them sufficiently performant for distributions to > actually use them. I suspect that using a kernel file system running in a guest VM and then making it available via 9pfs would be far more performant than something involving FUSE.
I'll also note that since you are mentioning Chrome OS and Android a lot, there seems to be a lot of interest in using VMs as a security boundary (see CrosVM[1] which is a Rust-based VMM). So it's likely that this infrastructure would be available to you if you are doing work in this area. [1] https://github.com/google/crosvm Cheers, - Ted
* Re: Unprivileged filesystem mounts 2025-03-19 16:59 ` Theodore Ts'o @ 2025-03-19 17:32 ` Demi Marie Obenour 2025-03-19 20:11 ` Theodore Ts'o
From: Demi Marie Obenour @ 2025-03-19 17:32 UTC
To: Theodore Ts'o
Cc: Dave Chinner, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour

On Wed, Mar 19, 2025 at 12:59:31PM -0400, Theodore Ts'o wrote:
> On Wed, Mar 19, 2025 at 10:55:39AM -0400, Demi Marie Obenour wrote:
> > What kind of performance do the existing solutions (libguestfs, lklfuse) have?
>
> For most of the use cases that I'm aware of, which is to support occasional file transfers through crappy USB thumb drives (the kind a nation-state actor would scatter in the parking lot of their target), the performance doesn't really matter. Certainly these are the ones that apply to the Android and ChromeOS use cases.

Would this have sufficient performance for backups?

> I suppose there is the use case of people who are running Adobe Lightroom Classic on their MacBook Air with an external SSD because Apple's storage pricing is highway robbery, but (a) it's macOS, not Linux, and (b) this is arguably a much smaller percentage of the use cases compared to the millions and millions of Android and Chrome users. Most of the more naive Mac users probably just pay $$$ to Apple and don't use external storage anyway. :-)
>
> > There are other options, like "run the filesystem in a tightly sandboxed userspace process, especially compiled through WebAssembly". The difficulty is making them sufficiently performant for distributions to actually use them.
>
> I suspect that using a kernel file system running in a guest VM and then making it available via 9pfs would be far more performant than something involving FUSE. But the details would all be in the implementation, and the skill level of the engineer doing the work.

Why do you suspect this? I'm genuinely curious, especially because my understanding is that virtiofs (which uses the FUSE protocol internally) is considered faster than 9pfs.

> I'll also note that since you are mentioning ChromeOS and Android a lot, there seems to be a lot of interest in using VMs as a security boundary (see CrosVM[1], which is a Rust-based VMM). So it's likely that this infrastructure would be available to you if you are doing work in this area.
>
> [1] https://github.com/google/crosvm

The need to resort to virtualization as a security boundary makes me wonder if Linux is designed for outdated threat models and security paradigms. Sadly, changing the threat model would be extremely expensive today.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
* Re: Unprivileged filesystem mounts 2025-03-19 17:32 ` Demi Marie Obenour @ 2025-03-19 20:11 ` Theodore Ts'o
From: Theodore Ts'o @ 2025-03-19 20:11 UTC
To: Demi Marie Obenour
Cc: Dave Chinner, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour

On Wed, Mar 19, 2025 at 01:32:59PM -0400, Demi Marie Obenour wrote:
> > I suspect that using a kernel file system running in a guest VM and then making it available via 9pfs would be far more performant than something involving FUSE. But the details would all be in the implementation, and the skill level of the engineer doing the work.
>
> Why do you suspect this? I'm genuinely curious, especially because my understanding is that virtiofs (which uses the FUSE protocol internally) is considered faster than 9pfs.

I was saying that 9pfs is faster than FUSE. Yes, virtiofs would be faster than 9pfs. No question. However, given the virtiofs ring buffer interface, it might be harder to audit the virtiofs client implementation to make sure it is free of potential security exploits. It would be simpler to reassure folks that 9pfs is safe(tm).

> The need to resort to virtualization as a security boundary makes me wonder if Linux is designed for outdated threat models and security paradigms. Sadly, changing the threat model would be extremely expensive today.

I wouldn't say that it's specific to Linux; for many, MANY, MANY decades, the disk drive was considered to be within the Trusted Computing Base. This was true for Multics, VMS, Unix, and other operating systems that were certified under the Trusted Computer System Evaluation Criteria (aka the "Orange Book") at the B1 and B2 levels.

Ejecting the storage device so it is outside the TCB is a huge change in the threat model, especially given that for a long time people have made performance, including simultaneous modifications to the same file, the primary requirement for most file systems.

If we want to make a single, simple file system that is good enough for file exchange and backup, where we only need to optimize for sequential, single-threaded I/O on low-cost or moderate-cost flash devices, that's a much simpler sort of file system, and one that we could secure against this modified threat model.

However, given how stingy companies have always been about funding file system development (and these days, anything which isn't AI :-), I suspect a sandbox/VM approach is going to be much more cost-effective. But I'm happy to be proven wrong, if some company is willing to fund the effort --- let's see the names, and we can invite them into the relevant collaboration forums, such as the weekly ext4 video conference if it's appropriate.

However, just having security people kvetching on open source mailing lists, or raising syzbot bugs for threat models that the file system maintainers had never agreed to, and then trying to bully or shame volunteers into doing the work for free is, I would argue, not productive.

Cheers,

- Ted
* Re: Unprivileged filesystem mounts 2025-03-11 20:10 ` Demi Marie Obenour 2025-03-18 5:21 ` Dave Chinner @ 2025-03-18 22:11 ` Theodore Ts'o 2025-03-19 17:44 ` Demi Marie Obenour
From: Theodore Ts'o @ 2025-03-18 22:11 UTC
To: Demi Marie Obenour
Cc: Dave Chinner, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour

On Tue, Mar 11, 2025 at 04:10:42PM -0400, Demi Marie Obenour wrote:
>
> Why is it not possible to provide that guarantee? I'm not concerned about infinite loops or deadlocks. Is there a reason it is not possible to prevent memory corruption?

Companies and users are willing to pay to improve file system performance. (For example, we have been working for Cloud services that are interested in improving the performance of their first-party database products, using the fact that with cloud-emulated block devices we can guarantee that a 16k write won't be torn; this can result in significant database performance gains.)

However, I have *yet* to see any company willing to invest in hardening file systems against maliciously modified file system images. We can debate how much it might cost to harden a file system, but given how much companies are willing to pay --- zero --- it's mostly an academic question.

In addition, if someone made a file system which is guaranteed to be safe, but it had massive performance regressions relative to other file systems, it's unclear how many users or system administrators would use it. And we've seen that --- there are known mitigations for CPU cache attacks which are so expensive that companies or end users have chosen not to enable them. Yes, there are some security folks who believe that security is the most important thing, über alles. Unfortunately, those people tend not to be the ones writing the checks or authorizing hiring budgets.

That being said, if someone asked me if it was the best way to invest software development dollars, I'd say no. Don't get me wrong: if someone were to give me some minions tasked with hardening ext4, I know how I could keep them busy and productive. But a more cost-effective way of addressing the "untrusted file system problem" would be:

(a) Run a forced fsck to check the file system for inconsistencies before letting the file system be mounted.

(b) Mount the file system in a virtual machine, and then make it available to the host using something like 9pfs. 9pfs is a very simple file system which is easy to validate, and it's a strategy used by gVisor's file system Gofer.

These two approaches are complementary, with (a) being easier, and (b) probably a bit more robust from a security perspective but a bit more work; together they provide a layered approach.

> > In this situation, the choice of what to do *must* fall to the user, but the argument for "filesystem corruption is a CVE-worthy bug" is that the choice has been taken away from the user. That's what I'm saying needs to change - the choice needs to be returned to the user...

Users can always do stupid things. For example, they could download a random binary from the web, then execute it. We've seen very popular software which is installed via "curl <URL> | bash". Should we therefore call bash a CVE-worthy vulnerability?

Realistically, this is probably a far bigger vulnerability if we're talking about stupid user tricks. ("But.... but... but... users need to be able to install software" --- we can't stop them from piping the output of curl into bash.) Which is another reason why I don't really blame the VPs who are making funding decisions; it's not clear that funding file system security hardening is the best-ROI way to spend a company's dollars. Remember, Zuckerberg has been quoted as saying that he's laying off engineers so his company can buy more GPUs; we know that funding is not infinite. Every company is making ROI decisions; you might not agree with the decisions, but trust me, they're making them.

But if some company would like to invest software engineering effort in additional features or to perform security hardening, they should contact me, and I'd be happy to chat. We have weekly ext4 video conference calls, and I'm happy to collaborate with companies that have a business interest in seeing some feature get pursued. There *have* been some that are security related --- fscrypt and fsverity were both implemented for ext4 first, in support of Android and ChromeOS's security use cases. But in practice this has been the exception, and not the rule.

> Not automounting filesystems on hotplug is a _part_ of the solution. It cannot be the _entire_ solution. Users sometimes need to be able to interact with untrusted filesystem images with a reasonable speed.

Running fsck on a file system *before* automounting it would be a pretty decent start towards a solution. Is it perfect? No. But it would provide a huge amount of protection.

Note that this won't help if you have malicious hardware that *pretends* to be a USB storage device, but which doesn't behave like an honest storage device: for example, reading a particular sector returns one set of data at time T, and different data at time T+X, with no intervening writes. There is no real defense against this attack, since there is no way that you can authenticate the external storage device; you could have a registry of USB vendor and model IDs, but a device can always lie about its ID numbers.

If you are worried about this kind of attack, the only thing you can do is to prevent external USB devices from being attached. This *is* something that you can do with Chrome and Android enterprise security policies, and I've talked to a bank's senior IT leader who chose to put epoxy in their desktops, to mitigate against a whole *class* of USB security attacks.

Like everything else, security and usability and performance and costs are all engineering tradeoffs. So what works for one use case and threat model won't be optimal for another, just as fscrypt works well for Android and ChromeOS, but doesn't necessarily work well for other use cases (where I might recommend dm-crypt instead).

Cheers,

- Ted
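Ted's option (a), running a forced fsck before an untrusted file system is mounted, can be sketched as a small userspace gate. This is a hypothetical sketch, not part of any existing automounter: it assumes an ext4 image, the e2fsprogs e2fsck binary, and the exit-status values documented in e2fsck(8). The function names, device path, and mount point are illustrative only.

```python
import subprocess

# e2fsck exit-status values as documented in e2fsck(8); they are OR-ed bits.
FSCK_OK = 0
FSCK_ERRORS_CORRECTED = 1
FSCK_ERRORS_UNCORRECTED = 4
FSCK_OPERATIONAL_ERROR = 8


def fsck_command(device: str) -> list[str]:
    # -f: check even if the superblock claims the fs is clean (that flag is
    #     attacker-controlled on an untrusted image).
    # -n: open read-only and answer "no" to every repair question.
    return ["e2fsck", "-f", "-n", device]


def mount_command(device: str, mountpoint: str) -> list[str]:
    # Read-only, and refuse device nodes, setuid bits and executables.
    return ["mount", "-t", "ext4", "-o", "ro,nosuid,nodev,noexec",
            device, mountpoint]


def safe_to_mount(exit_status: int) -> bool:
    # With -n nothing is repaired, so any nonzero status means either an
    # inconsistency was found or the check could not complete.
    return exit_status == FSCK_OK


def check_then_mount(device: str, mountpoint: str, run=subprocess.run) -> bool:
    """Hypothetical policy hook: mount only if a forced read-only fsck is clean."""
    result = run(fsck_command(device))
    if not safe_to_mount(result.returncode):
        return False
    run(mount_command(device, mountpoint), check=True)
    return True
```

Treating any nonzero status as a refusal is deliberately conservative. As Ted points out, this gate does nothing against a device that returns different data after the check has passed.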
* Re: Unprivileged filesystem mounts 2025-03-18 22:11 ` Theodore Ts'o @ 2025-03-19 17:44 ` Demi Marie Obenour 2025-03-19 21:25 ` Theodore Ts'o
From: Demi Marie Obenour @ 2025-03-19 17:44 UTC
To: Theodore Ts'o
Cc: Dave Chinner, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour

On Tue, Mar 18, 2025 at 06:11:28PM -0400, Theodore Ts'o wrote:
> On Tue, Mar 11, 2025 at 04:10:42PM -0400, Demi Marie Obenour wrote:
> >
> > Why is it not possible to provide that guarantee? I'm not concerned about infinite loops or deadlocks. Is there a reason it is not possible to prevent memory corruption?
>
> Companies and users are willing to pay to improve file system performance. (For example, we have been working for Cloud services that are interested in improving the performance of their first-party database products, using the fact that with cloud-emulated block devices we can guarantee that a 16k write won't be torn; this can result in significant database performance gains.)
>
> However, I have *yet* to see any company willing to invest in hardening file systems against maliciously modified file system images. We can debate how much it might cost to harden a file system, but given how much companies are willing to pay --- zero --- it's mostly an academic question.

Google _ought_ to be willing to pay for ext4 and f2fs. Have you asked ChromeOS and Android security about this? Exploits involving malicious filesystem images are in scope for their bug bounty programs.

> In addition, if someone made a file system which is guaranteed to be safe, but it had massive performance regressions relative to other file systems, it's unclear how many users or system administrators would use it. And we've seen that --- there are known mitigations for CPU cache attacks which are so expensive that companies or end users have chosen not to enable them. Yes, there are some security folks who believe that security is the most important thing, über alles. Unfortunately, those people tend not to be the ones writing the checks or authorizing hiring budgets.
>
> That being said, if someone asked me if it was the best way to invest software development dollars, I'd say no. Don't get me wrong: if someone were to give me some minions tasked with hardening ext4, I know how I could keep them busy and productive. But a more cost-effective way of addressing the "untrusted file system problem" would be:
>
> (a) Run a forced fsck to check the file system for inconsistencies before letting the file system be mounted.
>
> (b) Mount the file system in a virtual machine, and then make it available to the host using something like 9pfs. 9pfs is a very simple file system which is easy to validate, and it's a strategy used by gVisor's file system Gofer.
>
> These two approaches are complementary, with (a) being easier, and (b) probably a bit more robust from a security perspective but a bit more work; together they provide a layered approach.

Definitely a good idea.

> > > In this situation, the choice of what to do *must* fall to the user, but the argument for "filesystem corruption is a CVE-worthy bug" is that the choice has been taken away from the user. That's what I'm saying needs to change - the choice needs to be returned to the user...
>
> Users can always do stupid things. For example, they could download a random binary from the web, then execute it. We've seen very popular software which is installed via "curl <URL> | bash". Should we therefore call bash a CVE-worthy vulnerability?
>
> Realistically, this is probably a far bigger vulnerability if we're talking about stupid user tricks. ("But.... but... but... users need to be able to install software" --- we can't stop them from piping the output of curl into bash.) Which is another reason why I don't really blame the VPs who are making funding decisions; it's not clear that funding file system security hardening is the best-ROI way to spend a company's dollars. Remember, Zuckerberg has been quoted as saying that he's laying off engineers so his company can buy more GPUs; we know that funding is not infinite. Every company is making ROI decisions; you might not agree with the decisions, but trust me, they're making them.
>
> But if some company would like to invest software engineering effort in additional features or to perform security hardening, they should contact me, and I'd be happy to chat. We have weekly ext4 video conference calls, and I'm happy to collaborate with companies that have a business interest in seeing some feature get pursued. There *have* been some that are security related --- fscrypt and fsverity were both implemented for ext4 first, in support of Android and ChromeOS's security use cases. But in practice this has been the exception, and not the rule.

Android and ChromeOS do _not_ allow you to run curl <URL> | bash, at least outside of a VM.

> > Not automounting filesystems on hotplug is a _part_ of the solution. It cannot be the _entire_ solution. Users sometimes need to be able to interact with untrusted filesystem images with a reasonable speed.
>
> Running fsck on a file system *before* automounting it would be a pretty decent start towards a solution. Is it perfect? No. But it would provide a huge amount of protection.
>
> Note that this won't help if you have malicious hardware that *pretends* to be a USB storage device, but which doesn't behave like an honest storage device: for example, reading a particular sector returns one set of data at time T, and different data at time T+X, with no intervening writes. There is no real defense against this attack, since there is no way that you can authenticate the external storage device; you could have a registry of USB vendor and model IDs, but a device can always lie about its ID numbers.

This attack can be defended against by sandboxing the filesystem driver and copying files to trusted storage before using them. You can authenticate devices based on what port they are plugged into, and Qubes OS is working on exactly that.

> If you are worried about this kind of attack, the only thing you can do is to prevent external USB devices from being attached. This *is* something that you can do with Chrome and Android enterprise security policies, and I've talked to a bank's senior IT leader who chose to put epoxy in their desktops, to mitigate against a whole *class* of USB security attacks.

Or you can disable your firmware's USB stack and ensure that USB devices are only attached to virtual machines. Dasharo allows the former, and Qubes OS allows the latter. (Disclaimer: I work on Qubes OS.)

> Like everything else, security and usability and performance and costs are all engineering tradeoffs. So what works for one use case and threat model won't be optimal for another, just as fscrypt works well for Android and ChromeOS, but doesn't necessarily work well for other use cases (where I might recommend dm-crypt instead).

Is the tradeoff fundamental, or is it a consequence of Linux being a monolithic kernel? If Linux were a microkernel and every filesystem driver ran as a userspace process with no access to anything but the device it is accessing, then there would be no tradeoff when it comes to filesystems: a compromised filesystem driver would have no more access than the device itself would, so compromising a filesystem driver would be of much less value to an attacker. There is still the problem that plug and play is incompatible with not trusting devices to identify themselves, but that's a different concern.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
* Re: Unprivileged filesystem mounts 2025-03-19 17:44 ` Demi Marie Obenour @ 2025-03-19 21:25 ` Theodore Ts'o 2025-03-20 6:26 ` Demi Marie Obenour
From: Theodore Ts'o @ 2025-03-19 21:25 UTC
To: Demi Marie Obenour
Cc: Dave Chinner, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour

On Wed, Mar 19, 2025 at 01:44:13PM -0400, Demi Marie Obenour wrote:
> > Note that this won't help if you have malicious hardware that *pretends* to be a USB storage device, but which doesn't behave like an honest storage device: for example, reading a particular sector returns one set of data at time T, and different data at time T+X, with no intervening writes. There is no real defense against this attack, since there is no way that you can authenticate the external storage device; you could have a registry of USB vendor and model IDs, but a device can always lie about its ID numbers.
>
> This attack can be defended against by sandboxing the filesystem driver and copying files to trusted storage before using them. You can authenticate devices based on what port they are plugged into, and Qubes OS is working on exactly that.

Copying files to trusted storage is not sufficient. The problem is that an untrustworthy storage device can still play games with metadata blocks. If you are willing to copy the entire storage device to trustworthy storage, run fsck on the file system, and then mount it, then *sure*, that would help. But if the storage device is very large or very slow, this might not be practical.

> > Like everything else, security and usability and performance and costs are all engineering tradeoffs....
>
> Is the tradeoff fundamental, or is it a consequence of Linux being a monolithic kernel? If Linux were a microkernel and every filesystem driver ran as a userspace process with no access to anything but the device it is accessing, then there would be no tradeoff when it comes to filesystems: a compromised filesystem driver would have no more access than the device itself would, so compromising a filesystem driver would be of much less value to an attacker. There is still the problem that plug and play is incompatible with not trusting devices to identify themselves, but that's a different concern.

Microkernels have historically been a performance disaster. Yes, you can invest a *vast* amount of effort into trying to make a microkernel OS more performant, but in the meantime, the competing monolithic kernel will have gotten even faster, or added more features, leaving the microkernel in the dust.

The effort needed to create a new file system from scratch, taking it all the way from the initial design, implementation, testing and performance tuning to something customers are comfortable depending on for enterprise workloads, is between 50 and 100 engineer-years. This estimate came from looking at the development effort needed for various file systems implemented on monolithic kernels, including Digital's AdvFS (part of Digital Unix and OSF/1), IBM's AIX, and Sun's ZFS, as well as GPFS from IBM (although that was a cluster file system, and the effort estimated from my talking to the engineering managers and tech leads was around 200 PYs).

I'm not sure how much harder it will be to make a performant file system which is suitable for enterprise workloads from a performance, feature, and stability perspective, *and* to make it secure against storage devices which are outside the TCB, *and* to make it work on a microkernel. But I'm going to guess it would inflate these effort estimates by at least 50%, if not more.

Of course, if we're just writing a super simple file system that is suitable for backups and file transfers, but not much else, that would probably take much less effort. But if we need to support file exchange with storage devices formatted with NTFS or HFS, those aren't simple file systems. So the VM sandbox approach might still be the better way to go.

Cheers,

- Ted
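Ted's point, that a device returning different data for the same sector defeats any check made against the live device, can be illustrated with a toy simulation. Everything in it is hypothetical: FlippingDevice stands in for malicious USB hardware, and looks_consistent() stands in for fsck using a made-up rule. It contrasts "check the device, then read it again" with "copy every sector once into trusted storage, then check and use only the copy".

```python
from typing import Optional

SECTOR_SIZE = 512


class FlippingDevice:
    """Hypothetical malicious device: the first read of a sector returns
    honest data, and every later read of that sector returns different
    data, with no intervening writes."""

    def __init__(self, honest: bytes, evil: bytes):
        self.honest = honest
        self.evil = evil
        self.read_counts: dict[int, int] = {}

    def read_sector(self, n: int) -> bytes:
        count = self.read_counts.get(n, 0)
        self.read_counts[n] = count + 1
        source = self.honest if count == 0 else self.evil
        return source[n * SECTOR_SIZE:(n + 1) * SECTOR_SIZE]


def looks_consistent(block: bytes) -> bool:
    # Stand-in for fsck: a made-up rule that metadata must start with b"GOOD".
    return block.startswith(b"GOOD")


def check_then_use_live(device: FlippingDevice) -> Optional[bytes]:
    # Vulnerable pattern: validate the device, then read it again to use it.
    if not looks_consistent(device.read_sector(0)):
        return None
    return device.read_sector(0)  # may differ from what was checked


def snapshot(device: FlippingDevice, nsectors: int) -> bytes:
    # Copy the whole device into trusted memory, reading each sector once.
    return b"".join(device.read_sector(n) for n in range(nsectors))


def check_then_use_snapshot(device: FlippingDevice, nsectors: int) -> Optional[bytes]:
    image = snapshot(device, nsectors)
    if not looks_consistent(image[:SECTOR_SIZE]):
        return None
    return image[:SECTOR_SIZE]  # exactly the bytes that passed the check
```

The live path hands back the "evil" sector even though the check passed, while the snapshot path can only ever return what was checked. As Ted notes, the cost of this approach is copying the entire device, which may be impractical if it is very large or very slow.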
* Re: Unprivileged filesystem mounts 2025-03-19 21:25 ` Theodore Ts'o @ 2025-03-20 6:26 ` Demi Marie Obenour 2025-03-20 16:00 ` Theodore Ts'o
From: Demi Marie Obenour @ 2025-03-20 6:26 UTC
To: Theodore Ts'o
Cc: Dave Chinner, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour

On Wed, Mar 19, 2025 at 05:25:17PM -0400, Theodore Ts'o wrote:
> On Wed, Mar 19, 2025 at 01:44:13PM -0400, Demi Marie Obenour wrote:
> > > Note that this won't help if you have malicious hardware that *pretends* to be a USB storage device, but which doesn't behave like an honest storage device: for example, reading a particular sector returns one set of data at time T, and different data at time T+X, with no intervening writes. There is no real defense against this attack, since there is no way that you can authenticate the external storage device; you could have a registry of USB vendor and model IDs, but a device can always lie about its ID numbers.
> >
> > This attack can be defended against by sandboxing the filesystem driver and copying files to trusted storage before using them. You can authenticate devices based on what port they are plugged into, and Qubes OS is working on exactly that.
>
> Copying files to trusted storage is not sufficient. The problem is that an untrustworthy storage device can still play games with metadata blocks. If you are willing to copy the entire storage device to trustworthy storage, run fsck on the file system, and then mount it, then *sure*, that would help. But if the storage device is very large or very slow, this might not be practical.

Copying files is not sufficient on its own. You need to _also_ sandbox the file system driver, which defeats the attack you mentioned above: the attacker can compromise the VM running the file system, but that doesn't give the attacker anything particularly useful.

> > > Like everything else, security and usability and performance and costs are all engineering tradeoffs....
> >
> > Is the tradeoff fundamental, or is it a consequence of Linux being a monolithic kernel? If Linux were a microkernel and every filesystem driver ran as a userspace process with no access to anything but the device it is accessing, then there would be no tradeoff when it comes to filesystems: a compromised filesystem driver would have no more access than the device itself would, so compromising a filesystem driver would be of much less value to an attacker. There is still the problem that plug and play is incompatible with not trusting devices to identify themselves, but that's a different concern.
>
> Microkernels have historically been a performance disaster. Yes, you can invest a *vast* amount of effort into trying to make a microkernel OS more performant, but in the meantime, the competing monolithic kernel will have gotten even faster, or added more features, leaving the microkernel in the dust.

The L4 family of microkernels, and especially seL4, show that microkernels do not need to be slow. I do agree that making a microkernel-based OS fast is hard, but on the other hand, running an entire Linux VM just to host a single application isn't exactly an efficient use of resources either. The latter is what systems like Kata Containers wind up doing.

> The effort needed to create a new file system from scratch, taking it all the way from the initial design, implementation, testing and performance tuning to something customers are comfortable depending on for enterprise workloads, is between 50 and 100 engineer-years. This estimate came from looking at the development effort needed for various file systems implemented on monolithic kernels, including Digital's AdvFS (part of Digital Unix and OSF/1), IBM's AIX, and Sun's ZFS, as well as GPFS from IBM (although that was a cluster file system, and the effort estimated from my talking to the engineering managers and tech leads was around 200 PYs).
>
> I'm not sure how much harder it will be to make a performant file system which is suitable for enterprise workloads from a performance, feature, and stability perspective, *and* to make it secure against storage devices which are outside the TCB, *and* to make it work on a microkernel. But I'm going to guess it would inflate these effort estimates by at least 50%, if not more.

My understanding is that being "secure against storage devices which are outside the TCB" mostly requires two things:

1. Either a programming language in which memory safety vulnerabilities are difficult to introduce by accident, or a sandbox that ensures that a compromised file system driver cannot do more than cause file system operations to return wrong results.

2. A way to kill a file system that is caught in an infinite loop, is eating too much memory, or is otherwise the victim of a denial of service attack, without crashing the whole system. This is not needed if denial of service attacks are outside of your threat model.

I'm not asking you (or anyone else) to write a filesystem driver that has no bugs in the face of arbitrarily corrupted input. I _expect_ that there will be bugs in this case. Right now, Linux kernel file systems are written in C and run in the kernel, which means that a bug can easily result in a complete system compromise.

> Of course, if we're just writing a super simple file system that is suitable for backups and file transfers, but not much else, that would probably take much less effort. But if we need to support file exchange with storage devices formatted with NTFS or HFS, those aren't simple file systems. So the VM sandbox approach might still be the better way to go.

Certainly the VM sandbox is the simplest approach in the short term.

P.S.: For all that I may disagree with you on a lot of things, I am very grateful for all the work you have put into making ext4 as solid a filesystem as it is, as well as for your other innovations (like creating /dev/{u,}random).

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
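Demi's second requirement is being able to kill a file system driver that loops forever or eats too much memory, without taking down the whole system. Once the driver runs as an ordinary process, that part is straightforward. The following is a minimal sketch, assuming the untrusted parser is some hypothetical userspace helper launched from an argv list. It applies RLIMIT_CPU and RLIMIT_AS in the child before exec and enforces a wall-clock timeout. It covers only resource containment. It does not provide the isolation from everything except the device that the thread is discussing; that would still need namespaces, seccomp, or a VM.

```python
import resource
import subprocess
import sys


def _apply_limits(cpu_seconds: int, mem_bytes: int):
    def apply() -> None:
        # Runs in the child between fork() and exec().
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    return apply


def run_untrusted_driver(argv, wall_seconds=5.0, cpu_seconds=5,
                         mem_bytes=512 * 1024 * 1024) -> str:
    """Run a hypothetical untrusted filesystem helper and report how it ended."""
    try:
        proc = subprocess.run(
            argv,
            preexec_fn=_apply_limits(cpu_seconds, mem_bytes),
            timeout=wall_seconds,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return "killed: wall-clock timeout"
    if proc.returncode < 0:
        # Negative return code: terminated by a signal (e.g. SIGXCPU).
        return f"killed: signal {-proc.returncode}"
    return "ok" if proc.returncode == 0 else f"failed: exit {proc.returncode}"


if __name__ == "__main__":
    # A stand-in "driver" stuck in an infinite loop is stopped after ~1 s of CPU.
    print(run_untrusted_driver([sys.executable, "-c", "while True: pass"],
                               cpu_seconds=1))
```

On Linux, exceeding RLIMIT_CPU delivers SIGXCPU, which shows up as a negative return code, while exceeding RLIMIT_AS makes allocations fail inside the helper. The Python documentation warns that preexec_fn is not safe in the presence of threads, so a threaded supervisor would need a different mechanism to set the limits.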
* Re: Unprivileged filesystem mounts 2025-03-20 6:26 ` Demi Marie Obenour @ 2025-03-20 16:00 ` Theodore Ts'o 0 siblings, 0 replies; 22+ messages in thread From: Theodore Ts'o @ 2025-03-20 16:00 UTC (permalink / raw) To: Demi Marie Obenour Cc: Dave Chinner, cve, gnoack, gregkh, kent.overstreet, linux-bcachefs, linux-fsdevel, linux-security-module, mic, Demi Marie Obenour On Thu, Mar 20, 2025 at 02:26:41AM -0400, Demi Marie Obenour wrote: > The L4 family of microkernels, and especially seL4, show that > microkernels do not need to be slow. With all due respect to folks who have wrked on L4 and its derivatives, L4 is a research prototype. The gap between a research prototype and something that can actually be used in wide variety of use cases, from smart watches, to mainframes, is... large. If some company is willing to fund such work, I'd be very interested to see what they can come up with. I will note that Google has tried dabbling in this space with Fuchsia, and getting to something that can actually be shipped in a product has been a very long road. To their credit, they have managed to do this for a version of Nest Hub, but most people would say that it is very far from being suitable for Android or Chrome OS, and supprting data center workloads was explicitly a non-goal by the Fuschia team. See [1] for more details. In 2018, it was reported that Google had over 100 engineers working on Fuchsia starting in 2016, with the hopes that it would be ready "in 5 years". Per [2], apparently in 2024 Fuschia "is not dead", but work has slowed and there aren't as many people working on it. (Disclosure: I work at Google but all of my recent knowledge about Fuchsia comes from news reports; the last time I talked to anyone on the Fuchsia team was well before COVID.) 
[1] https://www.bloomberg.com/news/articles/2018-07-19/google-team-is-said-to-plot-android-successor-draw-skepticism
[2] https://www.reddit.com/r/Fuchsia/comments/1g7x2vs/what_happened_to_fuchsia/

> I do agree that making a microkernel-based OS fast is hard, but on
> the other hand, running an entire Linux VM just to host a single
> application isn't exactly an efficient use of resources either.

Well, if you want to try to make a business case to VPs with estimates of how many engineers this would require, probably in a sustained effort taking at least 5 to 10 years, I cordially invite you to make the attempt. :-)

Given how cheap hardware has been getting, running multiple VMs on an Android phone or a ChromeOS laptop might not actually be that expensive, relative to the cost of the required number of software engineers for some of the alternatives we've discussed on this thread. There are ways that you can share the read-only text pages for the kernel, etc., to optimize the overhead of the VM, for example. It is also much easier to collaborate with SoC designers to create hardware optimizations for a VM abstraction, as compared to creating hardware optimizations for a software-level OS abstraction such as a container or microkernel task. So I don't think it's a safe assumption that VM overheads will always be unacceptable relative to the alternatives.

Cheers,

- Ted
* Re: CVE-2025-21830: landlock: Handle weird files
From: Greg Kroah-Hartman @ 2025-03-11 6:53 UTC
To: Dave Chinner
Cc: Mickaël Salaün, cve, Günther Noack, linux-security-module, Kent Overstreet, linux-bcachefs, linux-fsdevel

On Tue, Mar 11, 2025 at 10:42:41AM +1100, Dave Chinner wrote:
> Greg, you have the ability to issue a CVE that will require
> downstream distros to fix userspace-based vulnerabilities if they
> want various certifications. You have the power to force downstream
> distros to -change their security model policies- for the wider
> good.
>
> We could knock out this whole class of vulnerability in one CVE:
> issue a CVE considering the auto-mounting of untrusted filesystem
> images as a *critical system vulnerability*. This can only be solved
> by changing the distro policies and implementations that allow this
> dangerous behaviour to persist.

I wish we could do that, but remember, we cannot tell people how to use Linux. We have no "control" over that at all. All we can do is point out "here is a potential vulnerability; it might be applicable to you, or it might not, depending on your use case; it's up to you to figure it out". And we do that by issuing CVEs.

Heck, if we could dictate use, I would issue a "stop using panic on warn, you fools!" CVE right now, which would instantly get rid of a huge percentage of all kernel CVEs out there. Smart users of Linux do disable that, and so they are not vulnerable to those at all.
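The "panic on warn" behaviour Greg mentions is controlled by the kernel.panic_on_warn sysctl, which is exposed at /proc/sys/kernel/panic_on_warn. This is a minimal sketch that was not part of the thread. It shows one way to check the current setting from Python. The helper name `panic_on_warn_enabled` is hypothetical, and the code assumes a Linux system with procfs mounted.

```python
from pathlib import Path

# procfs location of the kernel.panic_on_warn sysctl on Linux.
PANIC_ON_WARN_PATH = Path("/proc/sys/kernel/panic_on_warn")


def panic_on_warn_enabled(path: Path = PANIC_ON_WARN_PATH) -> bool:
    """Return True if the sysctl file holds a non-zero value.

    0 means a WARN() is logged and the kernel keeps running;
    1 means the kernel calls panic() after printing the warning.
    """
    value = path.read_text().strip()
    return int(value) != 0


if __name__ == "__main__":
    try:
        enabled = panic_on_warn_enabled()
    except OSError as exc:
        # Not Linux, procfs not mounted, or the file is unreadable.
        print(f"could not read panic_on_warn: {exc}")
    else:
        print("kernel.panic_on_warn is", "enabled" if enabled else "disabled")
```

On a running system, an administrator would normally inspect or change the value with `sysctl kernel.panic_on_warn` rather than reading the file directly.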
Remember, we issue 11-13 CVEs a day on average; here are our most recent numbers:

=== CVEs Published in Last 6 Months ===
October 2024: 427 CVEs
November 2024: 280 CVEs
December 2024: 358 CVEs
January 2025: 234 CVEs
February 2025: 929 CVEs
March 2025: 56 CVEs

=== Overall Averages ===
Average CVEs per month: 415.99
Average CVEs per week: 95.64
Average CVEs per day: 13.66

So don't get all worried about individual CVEs, unless you all think they are not valid at all, in which case we are glad to revoke them.

> At worst, this makes the reason you give for filesystem corruption
> issues being considered CVE worthy go away completely.

Filesystem corruption or data loss is not considered a vulnerability by cve.org, so we do not track them at this point in time. However, other groups' requirements might require this in the future, so this might change (e.g. the CRA law in Europe).

thanks,

greg k-h
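This short Python sketch is a rough cross-check of the monthly table in Greg's message. It is our own arithmetic, not the script that produced his numbers (that script is not shown). It totals the listed rows and computes rates over the five complete months only. March 2025 is excluded because the message is dated 11 March 2025, so that row covers only part of the month.

```python
# Monthly kernel CVE counts, copied from the table in the message above.
MONTHLY_CVES = {
    "2024-10": 427,
    "2024-11": 280,
    "2024-12": 358,
    "2025-01": 234,
    "2025-02": 929,
    "2025-03": 56,  # partial month: the message is dated 2025-03-11
}

# Calendar days in each complete month listed (February 2025 has 28 days).
DAYS_IN_FULL_MONTH = {
    "2024-10": 31,
    "2024-11": 30,
    "2024-12": 31,
    "2025-01": 31,
    "2025-02": 28,
}


def summarize(counts: dict[str, int], days: dict[str, int]) -> dict[str, float]:
    """Total all rows, then compute rates over the complete months only."""
    full_months = [m for m in counts if m in days]
    full_total = sum(counts[m] for m in full_months)
    return {
        "total_listed": sum(counts.values()),
        "mean_per_full_month": full_total / len(full_months),
        "per_day_full_months": full_total / sum(days[m] for m in full_months),
    }


if __name__ == "__main__":
    for name, value in summarize(MONTHLY_CVES, DAYS_IN_FULL_MONTH).items():
        print(f"{name}: {value:.2f}")
```

Run as is, the script reports 2,284 CVEs across the six rows. Over October 2024 to February 2025 it gives about 445.6 per month and about 14.75 per day. These figures differ from the "Overall Averages" block, which was presumably computed over a different window than the six rows shown.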
Thread overview: 22+ messages
[not found] <2025030611-CVE-2025-21830-da64@gregkh>
[not found] <20250310.ooshu9Cha2oo@digikod.net>
[not found] <2025031034-savanna-debit-eb8e@gregkh>
2025-03-10 23:42 Dave Chinner: CVE-2025-21830: landlock: Handle weird files
2025-03-11 2:09 Kent Overstreet
2025-03-11 4:24 Dave Chinner
2025-03-11 10:50 Kent Overstreet
2025-03-11 2:19 Demi Marie Obenour: Unprivileged filesystem mounts
2025-03-11 5:57 Dave Chinner
2025-03-11 11:01 Christian Brauner
2025-03-11 17:36 Al Viro
2025-03-11 17:43 Kent Overstreet
2025-03-11 17:54 Eric Biggers
2025-03-11 20:10 Demi Marie Obenour
2025-03-18 5:21 Dave Chinner
2025-03-19 14:55 Demi Marie Obenour
2025-03-19 16:59 Theodore Ts'o
2025-03-19 17:32 Demi Marie Obenour
2025-03-19 20:11 Theodore Ts'o
2025-03-18 22:11 Theodore Ts'o
2025-03-19 17:44 Demi Marie Obenour
2025-03-19 21:25 Theodore Ts'o
2025-03-20 6:26 Demi Marie Obenour
2025-03-20 16:00 Theodore Ts'o
2025-03-11 6:53 Greg Kroah-Hartman: CVE-2025-21830: landlock: Handle weird files