public inbox for linux-fsdevel@vger.kernel.org
From: Demi Marie Obenour <demiobenour@gmail.com>
To: Gao Xiang <hsiangkao@linux.alibaba.com>,
	"Darrick J. Wong" <djwong@kernel.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>,
	linux-fsdevel@vger.kernel.org,
	Joanne Koong <joannelkoong@gmail.com>,
	John Groves <John@groves.net>, Bernd Schubert <bernd@bsbernd.com>,
	Amir Goldstein <amir73il@gmail.com>,
	Luis Henriques <luis@igalia.com>,
	Horst Birthelmer <horst@birthelmer.de>,
	Gao Xiang <xiang@kernel.org>,
	lsf-pc@lists.linux-foundation.org
Subject: Re: [LSF/MM/BPF TOPIC] Where is fuse going? API cleanup, restructuring and more
Date: Sun, 22 Mar 2026 01:13:21 -0400
Message-ID: <b30397b4-df0d-4f96-89a6-9c90aad10bd6@gmail.com>
In-Reply-To: <390cd031-742b-4f1b-99c4-8ee41a259744@linux.alibaba.com>



On 3/22/26 00:51, Gao Xiang wrote:
> 
> 
> On 2026/3/22 11:25, Demi Marie Obenour wrote:
> 
> ...
> 
>>>
>>> Technically speaking fuse4fs could just invoke e2fsck -fn before it
>>> starts up the rest of the libfuse initialization but who knows if that's
>>> an acceptable risk.  Also unclear if you actually want -fy for that.
>>
> 
> Let me try to reply to the remaining part:
> 
>> To me, the attacks mentioned above are all either user error,
>> or vulnerabilities in software accessing the filesystem.  If one
> 
> There are many consequences if users try to use potentially
> inconsistent writable filesystems directly (without a full fsck),
> including but not limited to:
> 
>   - data loss (consider the double-free of data blocks);
>   - data theft (for example, users keep sensitive information from
>        the workload in a high-permission inode, but it can later be
>        read through a low-permission malicious inode);
>   - data tampering (by the same principle).
> 
> All of the vulnerabilities above happen after users write to the
> inconsistent filesystem, which is hard to prevent through on-disk
> design.
> 
> But if users redirect all writes via copy-on-write to another,
> consistent local filesystem, none of the vulnerabilities above
> exist.

That makes sense!  Is this because the reads are at least
deterministic?

>> doesn't trust a filesystem image, then any data from the filesystem
>> can't be trusted either.  The only exception is if one can verify
> 
> I don't think trustworthiness is the core of this whole topic,
> because Linux namespaces & cgroups were _invented_ precisely
> for untrusted or isolated workloads.
> 
> If you don't trust some workload, fine, isolate it in another
> namespace: you cannot strictly trust anything.
> 
> The kernel always has bugs, but is that the real main reason
> you never run untrusted workloads? I don't think so.

I always use VMs for untrusted workloads.

>> the data cryptographically, which is what fsverity is for.
>> If the filesystem is mounted r/o and the image doesn't change, one
>> could guarantee that accessing the filesystem will at least return
>> deterministic results even for corrupted images.  That's something that
>> would need to be guaranteed by individual filesystem implementations,
>> though.
> 
> I just want to say that the real problem with generic writable
> filesystems is that their on-disk design makes it difficult to
> prevent or detect harmful inconsistencies.
> 
> First, the on-disk format includes redundant metadata and even
> malicious journal metadata (as I mentioned in previous emails).
> This makes it hard to determine whether the filesystem is
> inconsistent without performing a full disk scan, which takes
> a long time.
> 
> Of course, you could mount severely inconsistent writable
> filesystems in read-only (RO) mode.  However, they are still
> inconsistent by definition according to their formal on-disk
> specifications.  Furthermore, the runtime kernel implementation
> mixes read-write and read-only logic within the same
> codebase, which complicates the practical consequences.
> 
> With immutable filesystem designs, almost all typical severe
> inconsistencies either cannot happen by design or are not harmful.
> I believe the core issue is not trustworthiness; even with
> an untrusted workload, you should be able to audit it easily.
> However, severely inconsistent writable filesystems make such
> auditability much harder.

That actually makes a lot of sense.  I had not considered the journal,
which means one must modify the disk image just to mount it.

>> See the end of this email for a long note about what can and cannot
>> be guaranteed in the face of corrupt or malicious filesystem images.
>>
>>>> "that is not the case that we will handle with userspace FUSE
>>>> drivers, because the metadata is seriously broken"), the only way to
>>>> resolve such attack vectors is to run
>>>>
>>>> the full-scan fsck consistency check and then mount "rw"
>>>>
>>>> or
>>>>
>>>> using an immutable filesystem like EROFS (so that there will not
>>>> be such inconsistency issues by design) and isolate the entire write
>>>> traffic with a full copy-on-write mechanism with OverlayFS for
>>>> example (IOWs, to make all write copy-on-write into another trusted
>>>> local filesystem).
>>>
>>> (Yeah, that's probably the only way to go for prepopulated images like
>>> root filesystems and container packages)
>>
>> Even an immutable filesystem can still be corrupt.
>>
>>>> I hope it's a valid case, and that can indeed happen if an arbitrary
>>>> generic filesystem can be mounted in "rw".  And my immutable image
>>>> filesystem idea can help mitigate this too (just because the immutable
>>>> image won't be changed in any way, and all writes are always copy-up)
>>>
>>> That, we agree on :)
>>
>> Indeed, expecting writes to a corrupt filesystem to behave reasonably
>> is very foolish.
>>
>> Long note starts here: There is no *fundamental* reason that a crafted
>> filesystem image must be able to cause crashes, memory corruption, etc.
> 
> I still think those kinds of security risks, which are just
> implementation bugs, are the easiest part of the whole issue.
> 
> Many Linux kernel bugs can cause crashes or memory corruption; why
> do crafted filesystems need special consideration?

In the past, filesystem implementations have often not focused on
this.  The Linux Kernel CVE team does not issue CVEs for such bugs.

>> This applies even if the filesystem image may be written to while
>> mounted.  It is always *possible* to write a filesystem such that
>> it never trusts anything it reads from disk and assumes each read
>> could return arbitrarily malicious results.
> 
> Linux namespaces were invented for this kind of usage.  If broken
> archive images return garbage data, or archive images are even
> changed randomly at runtime, what is the real impact if they are
> isolated by namespaces?

None!  Regardless of whether one considers namespaces sufficient
for isolating malicious code, they can definitely isolate filesystem
operations very well.

>> Right now, many filesystem maintainers do not consider this to be a
>> priority.  Even if they did, I don't think *anyone* (myself included)
>> could write a filesystem implementation in C that didn't have memory
>> corruption flaws.  The only exceptions are if the filesystem is
> 
> I think this still falls under implementation bugs.  My question
> is simply: "why are filesystems special here?"  Many other kernel
> subsystems written in C receive untrusted data, like the TCP/IP
> stack, so why are filesystems special with respect to memory
> corruption flaws?

See above - the difference is that filesystems have historically
not been written with untrusted input in mind.  This, of course,
can be changed.

> I really think different aspects are often mixed when this topic
> comes up, which makes the discussion more and more divergent.

I agree.

> If we talk about implementation bugs, I think filesystems are not
> special.  But as I said, I think the main issue is the on-disk
> format design of writable filesystems: because of that design,
> inconsistent filesystems can have many severe consequences.

It definitely makes things much harder, and dramatically increases
the attack surface.

Most uses I have (notably backups) have a hard requirement for writable
storage, and when they don't need it they can use dm-verity.

>> incredibly simple or formal methods are used, and neither is the
>> case for existing filesystems in the Linux kernel.  By sandboxing a
>> filesystem, one ensures that an attacker who compromises a filesystem
>> implementation needs to find *another* exploit to compromise the
>> whole system.
> 
> Yes, but sandboxing is only one part.  Of course VM sandboxing
> is better than Linux namespace isolation, but VMs are costly.

I use a lot of VMs, but they indeed use significant resources.  I hope
that at some point this can largely be solved with copy-on-write
VM forking.

> Other than sandboxing, I think auditability is important too,
> especially when users provide sensitive data to new workloads.
> 
> Of course, dealing only with trusted workloads is best, without
> question.  But in the real world, we cannot always deal with
> completely trusted workloads.  For untrusted workloads,
> we need to find reliable ways to audit them until they
> become trusted.
> 
> Just like in the real world: accumulate credit, undergo
> audits, and eventually earn trust.
> 
> Sorry about my English, but I hope I have expressed my whole idea.
> 
> Thanks,
> Gao Xiang

Don't worry about your English.  It is completely understandable and
more than capable of getting your (very informative) points across.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

