From: Demi Marie Obenour <demiobenour@gmail.com>
To: Gao Xiang <hsiangkao@linux.alibaba.com>,
"Darrick J. Wong" <djwong@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
linux-fsdevel@vger.kernel.org,
Joanne Koong <joannelkoong@gmail.com>,
John Groves <John@groves.net>, Bernd Schubert <bernd@bsbernd.com>,
Amir Goldstein <amir73il@gmail.com>,
Luis Henriques <luis@igalia.com>,
Horst Birthelmer <horst@birthelmer.de>,
Gao Xiang <xiang@kernel.org>,
lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BPF TOPIC] Where is fuse going? API cleanup, restructuring and more
Date: Sun, 22 Mar 2026 01:13:21 -0400
Message-ID: <b30397b4-df0d-4f96-89a6-9c90aad10bd6@gmail.com>
In-Reply-To: <390cd031-742b-4f1b-99c4-8ee41a259744@linux.alibaba.com>
On 3/22/26 00:51, Gao Xiang wrote:
>
>
> On 2026/3/22 11:25, Demi Marie Obenour wrote:
>
> ...
>
>>>
>>> Technically speaking fuse4fs could just invoke e2fsck -fn before it
>>> starts up the rest of the libfuse initialization but who knows if that's
>>> an acceptable risk. Also unclear if you actually want -fy for that.
>>
>
> Let me try to reply the remaining part:
>
>> To me, the attacks mentioned above are all either user error,
>> or vulnerabilities in software accessing the filesystem. If one
>
> There are many consequences if users try to use potential inconsistent
> writable filesystems directly (without full fsck), what I can think
> out including but not limited to:
>
> - data loss (considering data block double free issue);
> - data theft (for example, users keep sensitive information in the
> workload in a high permission inode but it can be read with
> low permission malicious inode later);
> - data tamper (the same principle).
>
> All vulnerabilities above happen after users try to write the
> inconsistent filesystem, which is hard to prevent by on-disk
> design.
>
> But if users write with copy-on-write to another local consistent
> filesystem, all the vulnerabilities above won't exist.

That makes sense! Is this because the reads are at least
deterministic?

>> doesn't trust a filesystem image, then any data from the filesystem
>> can't be trusted either. The only exception is if one can verify
>
> I don't think trustiness is the core part of this whole topic,
> because Linux namespace & cgroup concepts are totally _invented_
> for untrusted or isolated workloads.
>
> If you untrust some workload, fine, isolate into another
> namespace: you cannot strictly trust anything.
>
> The kernel always has bugs, but is that the real main reason
> you never run untrusted workloads? I don't think so.

I always use VMs for untrusted workloads.

>> the data cryptographically, which is what fsverity is for.
>> If the filesystem is mounted r/o and the image doesn't change, one
>> could guarantee that accessing the filesystem will at least return
>> deterministic results even for corrupted images. That's something that
>> would need to be guaranteed by individual filesystem implementations,
>> though.
>
> I just want to say that the real problem with generic writable
> filesystems is that their on-disk design makes it difficult to
> prevent or detect harmful inconsistencies.
>
> First, the on-disk format includes redundant metadata and even
> malicious journal metadata (as I mentioned in previous emails).
> This makes it hard to determine whether the filesystem is
> inconsistent without performing a full disk scan, which takes
> much long time.
>
> Of course, you could mount severely inconsistent writable
> filesystems in read-only (RO) mode. However, they are still
> inconsistent by definition according to their formal on-disk
> specifications. Furthermore, the runtime kernel implementation
> mixes read-write and read-only logic within the same
> codebase, which complicates the practical consequences.
>
> Due to immutable filesystem designs, almost all typical severe
> inconsistencies cannot happen by design or be regard as harmful.
> I believe the core issue is not trustworthiness; even with
> an untrusted workload, you should be able to audit it easily.
> However, severely inconsistent writable filesystems make such
> auditability much harder.

That actually makes a lot of sense. I had not considered the journal,
which means one must modify the disk image just to mount it.

>> See the end of this email for a long note about what can and cannot
>> be guaranteed in the face of corrupt or malicious filesystem images.
>>
>>>> "that is not the case that we will handle with userspace FUSE
>>>> drivers, because the metadata is serious broken"), the only way to
>>>> resolve such attack vectors is to run
>>>>
>>>> the full-scan fsck consistency check and then mount "rw"
>>>>
>>>> or
>>>>
>>>> using the immutable filesystem like EROFS (so that there will not
>>>> be such inconsistency issues by design) and isolate the entire write
>>>> traffic with a full copy-on-write mechanism with OverlayFS for
>>>> example (IOWs, to make all write copy-on-write into another trusted
>>>> local filesystem).
>>>
>>> (Yeah, that's probably the only way to go for prepopulated images like
>>> root filesystems and container packages)
>>
>> Even an immutable filesystem can still be corrupt.
>>
>>>> I hope it's a valid case, and that can indeed happen if the arbitrary
>>>> generic filesystem can be mounted in "rw". And my immutable image
>>>> filesystem idea can help mitigate this too (just because the immutable
>>>> image won't be changed in any way, and all writes are always copy-up)
>>>
>>> That, we agree on :)
>>
>> Indeed, expecting writes to a corrupt filesystem to behave reasonably
>> is very foolish.
>>
>> Long note starts here: There is no *fundamental* reason that a crafted
>> filesystem image must be able to cause crashes, memory corruption, etc.
>
> I still think those kinds of security risks just of implementation
> bugs are the easiest part of the whole issue.
>
> Many linux kernel bugs can cause crashes, memory corruption, why
> crafted filesystems need to be specially considered?

In the past, filesystem implementations have often not focused on
this. The Linux Kernel CVE team does not issue CVEs for such bugs.

>> This applies even if the filesystem image may be written to while
>> mounted. It is always *possible* to write a filesystem such that
>> it never trusts anything it reads from disk and assumes each read
>> could return arbitrarily malicious results.
>
> Linux namespaces are invented for those kind of usage, the broken
> archive images return garbage data or even archive images can be
> changed randomly at runtime, what's the real impacts if they are
> isolated by the namespaces?

None! Regardless of whether one considers namespaces sufficient
for isolating malicious code, they can definitely isolate filesystem
operations very well.

>> Right now, many filesystem maintainers do not consider this to be a
>> priority. Even if they did, I don't think *anyone* (myself included)
>> could write a filesystem implementation in C that didn't have memory
>> corruption flaws. The only exceptions are if the filesystem is
>
> I think this is still falling into the aspect of implementation
> bugs, my question is simply: "why filesystem is special in this
> kind of area, there are many other kernel subsystems in C which
> can receive untrusted data, like TCP/IP stack", why filesystem
> is special for particular memory corruption flaws?

See above - the difference is that filesystems have historically
not been written with untrusted input in mind. This, of course,
can be changed.

> I really think different aspects are often mixed when this topic
> is mentioned, which makes the discussion getting more and more
> divergent.

I agree.

> If we talk about implementation bugs, I think filesystem is not
> special, but as I said, I think the main issue is the writable
> filesystem on-disk format design, due to the design, there are
> many severe consequences out of inconsistent filesystems.

It definitely makes things much harder, and dramatically increases
the attack surface.

Most uses I have (notably backups) have a hard requirement for writable
storage, and when they don't need it they can use dm-verity.

>> incredibly simple or formal methods are used, and neither is the
>> case for existing filesystems in the Linux kernel. By sandboxing a
>> filesystem, one ensures that an attacker who compromises a filesystem
>> implementation needs to find *another* exploit to compromise the
>> whole system.
>
> Yes, yet sandboxing is the one part, of course VM sandboxing
> is better than Linux namespace isolation, but VMs cost much.

I use a lot of VMs, but they indeed use significant resources. I hope
that at some point this can largely be solved with copy-on-write
VM forking.

> Other than sandboxing, I think auditability is important too,
> especially users provide sensitive data to new workloads.
>
> Of course, only dealing with trusted workloads is the best,
> out of question. But in the real world, we cannot always
> face complete trusted workloads. For untrusted workloads,
> we need to find reliable ways to audit them until they
> become trusted.
>
> Just like in the real world: accumulate credit, undergo
> audits, and eventually earn trust.
>
> Sorry about my English, but I hope I express my whole idea.
>
> Thanks,
> Gao Xiang

Don't worry about your English. It is completely understandable and
more than capable of getting your (very informative) points across.
--
Sincerely,
Demi Marie Obenour (she/her/hers)