Copy-on-read for untrusted image mounts, and differentiating between metadata and user data writes.

From: Demi Marie Obenour <demiobenour@gmail.com>
To: Theodore Tso <tytso@mit.edu>, Zw Tang <shicenci@gmail.com>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>,
	libaokun@linux.alibaba.com, jack@suse.cz, ojaswin@linux.ibm.com,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	yi.zhang@huawei.com
Subject: Copy-on-read for untrusted image mounts, and differentiating between metadata and user data writes.
Date: Wed, 29 Apr 2026 11:06:13 -0400
Message-ID: <f0960900-b98d-4db9-8e83-642bcf752ed5@gmail.com>
In-Reply-To: <4e76eb68-862d-4b9f-8242-e6aced2704ee@gmail.com>

On 4/25/26 14:00, Demi Marie Obenour wrote:
> On 4/21/26 08:20, Theodore Tso wrote:
>> On Tue, Apr 21, 2026 at 07:32:43PM +0800, Zw Tang wrote:
>>> This looks like an ext4 inline-data boundary/state inconsistency triggered
>>> while writing to an ext4 image crafted by syzkaller. The later
>>> KASAN: slab-use-after-free in rwsem_down_write_slowpath() appears to be a
>>> secondary effect after the primary ext4 BUG, likely during teardown/unlink
>>> after the filesystem failure.
>>
>> Writing to a mounted image is not something that we consider a valid
>> threat model.  If you can write to a mounted image, there are a
>> zillion different ways that you can crash the kernel, or you can
>> create a setuid shell, etc.
>>
>> The upstream syzkaller bot makes sure that CONFIG_BLK_DEV_WRITE_MOUNTED
>> is not defined to avoid syzkaller noise.
> 
> CONFIG_BLK_DEV_WRITE_MOUNTED only blocks writing via the specific block
> device that is mounted.  It doesn't block writing via other methods.
> If I recall correctly, its purpose was to prevent writing to the
> buffer cache used by the filesystem driver.
> 
> Changing block devices that are mounted is also reachable via USB.
> Yes, some distros may disable automount, but users who have stuff to
> get done will mount USB devices anyway.  Telling users "don't do this"
> very rarely works in practice.

So this gave me an idea that might work in practice, without requiring
any additional work from the ext4 (or XFS) developers.

My understanding is that:

1. e2fsck *is* intended to be secure against malicious filesystem
   images (though not TOCTOU).

2. Mounting with nosuid,noexec,nodev,nosymfollow can mitigate
   VFS-level attacks.

This means that the following should be a safe way to mount an
untrusted ext4 filesystem:

1. Copy it to trusted storage.
2. Run 'e2fsck -f -n' (or 'e2fsck -f -p') on the image.
3. Mount it with nosuid,noexec,nodev,nosymfollow.

The first step protects against TOCTOU, and the second protects
against metadata parsing attacks.
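
The three steps above can be sketched as a short script. This is only an
illustration: the device path, image path, and mount point are
hypothetical placeholders, and the "ro" and "loop" mount options are
assumptions added for mounting a read-only image file. The script only
builds and prints the commands; it does not run them.

```python
import shlex

# Mount options named in the message. "ro" and "loop" are added here as
# assumptions for mounting an image file read-only.
HARDENING_OPTS = ["nosuid", "noexec", "nodev", "nosymfollow"]


def build_commands(untrusted_dev, trusted_image, mountpoint):
    """Return the three commands as argv lists, in the order they must run."""
    opts = ",".join(["ro", "loop"] + HARDENING_OPTS)
    return [
        # 1. Copy the image to storage the attacker cannot modify (TOCTOU).
        ["cp", "--", untrusted_dev, trusted_image],
        # 2. Forced check of the private copy, answering "no" to all fixes.
        ["e2fsck", "-f", "-n", trusted_image],
        # 3. Mount the checked copy with the hardening options.
        ["mount", "-o", opts, trusted_image, mountpoint],
    ]


if __name__ == "__main__":
    # All three paths are hypothetical placeholders.
    for argv in build_commands("/dev/sdX1", "/var/tmp/untrusted.img",
                               "/mnt/untrusted"):
        print(shlex.join(argv))
```

A real wrapper would presumably stop before step 3 if e2fsck exits with a
nonzero status, since with -n any detected errors are left uncorrected.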

It's possible to optimize this by using a virtual block device that:

1. When data is read for the first time, copies it from the untrusted
   device to the trusted device.  Subsequent reads come from the
   trusted device.

2. When data is written, it is written to both the untrusted and
   trusted devices.
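
A minimal in-memory model of the two rules above, assuming blocks are
entries in a Python dict keyed by block number. The class and method
names are hypothetical and do not correspond to any existing kernel or
device-mapper interface.

```python
class CopyOnReadDevice:
    """Toy model of a copy-on-read virtual block device."""

    def __init__(self, untrusted):
        self.untrusted = untrusted  # block number -> bytes; attacker may change it
        self.trusted = {}           # private copy, filled lazily on first read

    def read(self, blk):
        # Rule 1: the first read copies the block to trusted storage.
        if blk not in self.trusted:
            self.trusted[blk] = self.untrusted[blk]
        # Every read, including the first, is served from the trusted copy.
        return self.trusted[blk]

    def write(self, blk, data):
        # Rule 2: writes go to both devices.
        self.trusted[blk] = data
        self.untrusted[blk] = data


if __name__ == "__main__":
    disk = {0: b"superblock", 1: b"inode table"}
    dev = CopyOnReadDevice(disk)
    first = dev.read(0)
    disk[0] = b"tampered"        # attacker rewrites the block after the read
    assert dev.read(0) == first  # the second read still sees the original
```

The point of the model is that a block can only be observed in one state:
once it has been read, later changes on the untrusted side are invisible.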

However, if one is mounting read-only, one can actually go further than
that for ext4.  ext4 doesn't care about file contents, only metadata.
So TOCTOU attacks on file contents can't affect it.

Furthermore, e2fsck must read all metadata in order to do its job.
After all, the safety of step 3 in the above procedure depends on step
2 doing exactly that.  If the filesystem is mounted with 'ro,noatime',
then (IIUC) the only change that the ext4 driver will need to make
to the filesystem is journal replay.  But e2fsck has to do that to
check the integrity of the filesystem.

This means that copy-on-read is only necessary while e2fsck is running.
Afterwards, it's no longer necessary to copy newly-accessed data to
the trusted device.  Erroring any writes to data that was not already
read will ensure that further metadata (that needs TOCTOU protection)
is not written out.
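
The two phases described above can be modelled in memory as follows.
This is a sketch under stated assumptions: freeze() is a hypothetical
call that whoever drives e2fsck would make once the check has finished,
and mirroring writes to already-read blocks after the freeze is a guess
at one reasonable behaviour, not something the text specifies.

```python
import errno


class CheckThenFreezeDevice:
    """Copy-on-read during the check, pass-through reads afterwards."""

    def __init__(self, untrusted):
        self.untrusted = untrusted  # block number -> bytes
        self.trusted = {}           # blocks read while the check was running
        self.frozen = False

    def freeze(self):
        # Hypothetical hook: call once e2fsck has finished.
        self.frozen = True

    def read(self, blk):
        if blk in self.trusted:
            return self.trusted[blk]    # metadata seen by the checker
        data = self.untrusted[blk]
        if not self.frozen:
            self.trusted[blk] = data    # phase 1: copy on first read
        return data                     # phase 2: file contents, not copied

    def write(self, blk, data):
        if self.frozen and blk not in self.trusted:
            # Refuse writes to blocks the checker never read.
            raise OSError(errno.EIO, "write to unchecked block refused")
        self.trusted[blk] = data
        self.untrusted[blk] = data


if __name__ == "__main__":
    disk = {0: b"metadata", 1: b"file contents"}
    dev = CheckThenFreezeDevice(disk)
    dev.read(0)                  # the checker reads the metadata block
    dev.freeze()
    disk[0] = b"tampered"
    assert dev.read(0) == b"metadata"
```

After the freeze the trusted store stops growing, so its size is bounded
by what the checker read rather than by the size of the image.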

This doesn't protect against TOCTOU attacks on applications, but for
at least some backup workloads that is not an issue.

With minimal cooperation from the filesystem, one can do even better.
If the device knew what was metadata and what was file contents,
it could store only metadata on trusted storage,
while file contents are allowed to be written to untrusted storage.
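
As a rough illustration of such a split, the model below assumes the
filesystem passes a hypothetical is_metadata flag with every write. No
such interface is claimed to exist; the flag and all names here are
assumptions made for the sketch.

```python
class SplitDevice:
    """Route metadata blocks to trusted storage, file contents elsewhere."""

    def __init__(self, trusted, untrusted):
        self.trusted = trusted        # block number -> bytes
        self.untrusted = untrusted    # block number -> bytes
        self.metadata_blocks = set()  # blocks currently holding metadata

    def write(self, blk, data, is_metadata):
        # is_metadata is the hypothetical hint supplied by the filesystem.
        if is_metadata:
            self.metadata_blocks.add(blk)
            self.trusted[blk] = data
        else:
            # The block may have been reused for file contents.
            self.metadata_blocks.discard(blk)
            self.trusted.pop(blk, None)
            self.untrusted[blk] = data

    def read(self, blk):
        if blk in self.metadata_blocks:
            return self.trusted[blk]
        return self.untrusted[blk]


if __name__ == "__main__":
    untrusted = {}
    dev = SplitDevice({}, untrusted)
    dev.write(0, b"inode table", is_metadata=True)
    dev.write(1, b"file contents", is_metadata=False)
    untrusted[0] = b"tampered"   # has no effect on metadata reads
    assert dev.read(0) == b"inode table"
```

The same routing decision would apply whether the two backing stores
differ in trust or only in speed and cost.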

This has many applications in the server world!  It's
quite possible that one wants to treat metadata and file contents
differently.  For instance, one might want to put the metadata on a
fast NVMe drive (or a RAID 1 of two or more such drives), while the
file contents are on a RAID 6 on HDDs.  This keeps metadata access
(lots of random reads and writes) fast, while bulk data access (likely
sequential and performance-insensitive) can go on cheap bulk storage.
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

Thread overview: 10+ messages
2026-04-21 11:32 [BUG] ext4: BUG_ON in ext4_write_inline_data (fs/ext4/inline.c:240) Zw Tang
2026-04-21 12:20 ` Theodore Tso
2026-04-25 18:00   ` Demi Marie Obenour
2026-04-26  3:22     ` Theodore Tso
2026-04-28 20:50       ` Demi Marie Obenour
2026-04-29  4:40         ` Theodore Tso
2026-04-29  5:32           ` Demi Marie Obenour
2026-04-27 11:17     ` Jan Kara
2026-04-29 15:06     ` Demi Marie Obenour [this message]
2026-04-21 12:25 ` Jan Kara