* [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices
@ 2026-06-02 10:10 Christian Brauner
2026-06-02 10:10 ` [PATCH RFC 1/8] fs, block: move blk_mode_t and fop_flags_t into <linux/types.h> Christian Brauner
` (9 more replies)
0 siblings, 10 replies; 21+ messages in thread
From: Christian Brauner @ 2026-06-02 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christian Brauner (Amutable)
Note, this is on the border between RFC/POC and so I haven't pushed this
through testing yet. But I don't want to waste more time on this before
showing it.
I surveyed various fs implementations because I want to give userspace
the ability to manage, in a centralized way, which devices can be onlined
without forcing every fs to care about this.
I realized that erofs allows sharing block devices between multiple
superblocks. Any freeze, thaw, removal, or sync on those devices is
not communicated to the superblocks using them, and our current
infrastructure is unable to deal with this.
This attempts to add the ability to go from a device number to all the
superblocks using that device, iterate through them one by one, and
perform actions on them. For most filesystems this is a 1:1 mapping, but
for erofs it's a 1:many mapping.
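As a rough illustration of the device-to-superblocks walk described above, here is a minimal userspace sketch. Every name in it (toy_sb, toy_map, toy_for_each_sb, toy_freeze) is hypothetical and exists only for this example; the series itself uses a hash table and reference counting rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct toy_sb {
	const char *name;
	int frozen;
};

/* One row per (device number, superblock) pair. */
struct toy_map {
	unsigned int dev;
	struct toy_sb *sb;
};

typedef int (*toy_action_t)(struct toy_sb *sb);

/* Apply @action to every superblock mapped to @dev; return the number visited. */
int toy_for_each_sb(const struct toy_map *map, size_t n,
		    unsigned int dev, toy_action_t action)
{
	int visited = 0;
	size_t i;

	assert(map || n == 0);
	for (i = 0; i < n; i++) {
		if (map[i].dev != dev)
			continue;
		action(map[i].sb);
		visited++;
	}
	return visited;
}

/* Example action: bump a nesting freeze counter. */
int toy_freeze(struct toy_sb *sb)
{
	sb->frozen++;
	return 0;
}
```

With one row for a device, the walk visits a single superblock (the 1:1 case); with two rows carrying the same device number, it visits both (the erofs-style 1:many case).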
This is not unreasonable infrastructure to support in my opinion. I
played around with some ideas for this and I want to send out an RFC to
gather some early input.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Christian Brauner (8):
fs, block: move blk_mode_t and fop_flags_t into <linux/types.h>
fs: add a global device to super block hash table
fs: refuse to claim any frozen block device
xfs: port to fs_bdev_file_open_by_path()
btrfs: open via dedicated fs bdev helpers
ext4: open via dedicated fs bdev helpers
erofs: open via dedicated fs bdev helpers
super: make fs_holder_ops private
fs/btrfs/dev-replace.c | 6 +-
fs/btrfs/ioctl.c | 4 +-
fs/btrfs/volumes.c | 26 ++-
fs/erofs/data.c | 6 +
fs/erofs/internal.h | 10 ++
fs/erofs/super.c | 66 +++++--
fs/erofs/zdata.c | 10 +-
fs/ext4/super.c | 12 +-
fs/super.c | 452 ++++++++++++++++++++++++++++++++---------------
fs/xfs/xfs_buf.c | 2 +-
fs/xfs/xfs_super.c | 10 +-
include/linux/blkdev.h | 9 -
include/linux/fs.h | 2 -
include/linux/fs/super.h | 7 +
include/linux/types.h | 2 +
15 files changed, 433 insertions(+), 191 deletions(-)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260602-work-super-bdev_holder_global-8cba5e52bed5
* [PATCH RFC 1/8] fs, block: move blk_mode_t and fop_flags_t into <linux/types.h>
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
@ 2026-06-02 10:10 ` Christian Brauner
2026-06-08 9:57 ` Jan Kara
2026-06-02 10:10 ` [PATCH RFC 2/8] fs: add a global device to super block hash table Christian Brauner
` (8 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Christian Brauner @ 2026-06-02 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christian Brauner (Amutable)
blk_mode_t and fop_flags_t are both plain 'unsigned int __bitwise' flag
typedefs, exactly like the gfp_t, slab_flags_t and fmode_t that already
live in <linux/types.h>. Move them there so they are available
everywhere without having to drag in a subsystem header.
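For readers unfamiliar with this kind of flag typedef, the following is a small userspace sketch of the pattern. It assumes a plain compiler, where the sparse annotations expand to nothing, and uses hypothetical toy_* names rather than the kernel's blk_mode_t or BLK_OPEN_* definitions.

```c
#include <assert.h>

/*
 * In the kernel these annotations are only meaningful to the sparse
 * checker. They are defined empty here (an assumption for this userspace
 * sketch) so the example builds with an ordinary compiler.
 */
#ifndef __bitwise
#define __bitwise
#endif
#ifndef __force
#define __force
#endif

/* Hypothetical stand-in for a flag typedef such as blk_mode_t. */
typedef unsigned int __bitwise toy_mode_t;

#define TOY_OPEN_READ	((__force toy_mode_t)(1 << 0))
#define TOY_OPEN_WRITE	((__force toy_mode_t)(1 << 1))

/* Return non-zero if @mode asks for write access. */
int toy_mode_writable(toy_mode_t mode)
{
	return (mode & TOY_OPEN_WRITE) != 0;
}

/* Tiny self-check showing how the flags combine. */
void toy_mode_selftest(void)
{
	assert(toy_mode_writable(TOY_OPEN_READ | TOY_OPEN_WRITE));
	assert(!toy_mode_writable(TOY_OPEN_READ));
}
```

Because the typedef is just an annotated unsigned int, it has no dependency on any subsystem structure, which is why it can sit in a generic header.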
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
include/linux/blkdev.h | 2 --
include/linux/fs.h | 2 --
include/linux/types.h | 2 ++
3 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 890128cdea1c..c8494d64a69d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -126,8 +126,6 @@ struct blk_integrity {
unsigned char pi_tuple_size;
};
-typedef unsigned int __bitwise blk_mode_t;
-
/* open for reading */
#define BLK_OPEN_READ ((__force blk_mode_t)(1 << 0))
/* open for writing */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 11559c513dfb..e9346be8470f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1921,8 +1921,6 @@ struct dir_context {
struct io_uring_cmd;
struct offset_ctx;
-typedef unsigned int __bitwise fop_flags_t;
-
struct file_operations {
struct module *owner;
fop_flags_t fop_flags;
diff --git a/include/linux/types.h b/include/linux/types.h
index 608050dbca6a..ef026585420b 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -163,6 +163,8 @@ typedef u32 dma_addr_t;
typedef unsigned int __bitwise gfp_t;
typedef unsigned int __bitwise slab_flags_t;
typedef unsigned int __bitwise fmode_t;
+typedef unsigned int __bitwise blk_mode_t;
+typedef unsigned int __bitwise fop_flags_t;
#ifdef CONFIG_PHYS_ADDR_T_64BIT
typedef u64 phys_addr_t;
--
2.47.3
* [PATCH RFC 2/8] fs: add a global device to super block hash table
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
2026-06-02 10:10 ` [PATCH RFC 1/8] fs, block: move blk_mode_t and fop_flags_t into <linux/types.h> Christian Brauner
@ 2026-06-02 10:10 ` Christian Brauner
2026-06-08 10:14 ` Jan Kara
2026-06-16 12:34 ` Christoph Hellwig
2026-06-02 10:10 ` [PATCH RFC 3/8] fs: refuse to claim any frozen block device Christian Brauner
` (7 subsequent siblings)
9 siblings, 2 replies; 21+ messages in thread
From: Christian Brauner @ 2026-06-02 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christian Brauner (Amutable)
fs_holder_ops recovers the owning superblock from bdev->bd_holder. That
forces the holder to be exactly one superblock and prevents several
superblocks from sharing one block device, which is what erofs does.
Introduce a global dev_t-keyed rhltable mapping each block device to the
superblock(s) using it. The holder argument becomes purely the block
layer's exclusivity token (a superblock, or a file_system_type for
shared devices) and is no longer needed by the fs specific callbacks.
Registration keeps one entry per (device, superblock). When a filesystem
claims a device it already uses (xfs with its log on the data device), no
second entry is added, so each superblock is acted on once.
Each table entry holds a passive reference (s_count) on its superblock,
so the struct stays valid for as long as the entry is reachable. The
callbacks look the device up in the table and act on every superblock
using it.
Unlinking an entry is deferred to the last unpin, so a cursor never
resumes from a removed node. After this it's possible to act on all
superblocks that share a given device.
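The two-counter lifetime rule described above (one entry per (device, superblock), a claim count, and an unlink deferred until the last pin is dropped) can be modelled in userspace roughly as follows. This is a single-threaded sketch under stated assumptions: a linked list stands in for the rhltable, plain ints stand in for refcount_t, there is no RCU, and all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of one (device, superblock) entry. */
struct toy_holder {
	unsigned int dev;
	const void *sb;
	int active;	/* open claims for (dev, sb) */
	int passive;	/* 1 while active > 0, plus one per cursor pin */
	struct toy_holder *next;
};

static struct toy_holder *toy_table;	/* stand-in for the hash table */

/* Drop one passive reference; unlink and free only on the last one. */
void toy_put(struct toy_holder *h)
{
	struct toy_holder **pp;

	assert(h->passive > 0);
	if (--h->passive)
		return;
	for (pp = &toy_table; *pp; pp = &(*pp)->next) {
		if (*pp == h) {
			*pp = h->next;
			break;
		}
	}
	free(h);
}

/* Claim @dev for @sb; a repeated claim only bumps the claim count. */
int toy_register(unsigned int dev, const void *sb)
{
	struct toy_holder *h;

	for (h = toy_table; h; h = h->next) {
		if (h->dev == dev && h->sb == sb && h->active > 0) {
			h->active++;
			return 0;
		}
	}
	h = malloc(sizeof(*h));
	if (!h)
		return -1;
	h->dev = dev;
	h->sb = sb;
	h->active = 1;
	h->passive = 1;
	h->next = toy_table;
	toy_table = h;
	return 0;
}

/* Release one claim; the last claim drops the bias reference. */
void toy_release(unsigned int dev, const void *sb)
{
	struct toy_holder *h;

	for (h = toy_table; h; h = h->next) {
		if (h->dev == dev && h->sb == sb && h->active > 0) {
			if (--h->active == 0)
				toy_put(h);
			return;
		}
	}
}

/* Pin the first entry for @dev so it stays linked while the caller works. */
struct toy_holder *toy_pin_first(unsigned int dev)
{
	struct toy_holder *h;

	for (h = toy_table; h; h = h->next) {
		if (h->dev == dev && h->passive > 0) {
			h->passive++;
			return h;
		}
	}
	return NULL;
}

/* Count the entries currently linked for @dev. */
int toy_count(unsigned int dev)
{
	struct toy_holder *h;
	int n = 0;

	for (h = toy_table; h; h = h->next)
		if (h->dev == dev)
			n++;
	return n;
}
```

In this model, releasing the last claim while a cursor still holds a pin leaves the entry linked; it disappears only when the cursor calls toy_put(), which is the property that lets an iterator resume from its current node.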
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
fs/super.c | 430 +++++++++++++++++++++++++++++++++--------------
include/linux/blkdev.h | 7 -
include/linux/fs/super.h | 7 +
3 files changed, 309 insertions(+), 135 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index 378e81efe643..e0174d5819a0 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -24,6 +24,7 @@
#include <linux/export.h>
#include <linux/slab.h>
#include <linux/blkdev.h>
+#include <linux/rhashtable.h>
#include <linux/mount.h>
#include <linux/security.h>
#include <linux/writeback.h> /* for the emergency remount stuff */
@@ -1411,186 +1412,234 @@ EXPORT_SYMBOL(sget_dev);
#ifdef CONFIG_BLOCK
/*
- * Lock the superblock that is holder of the bdev. Returns the superblock
- * pointer if we successfully locked the superblock and it is alive. Otherwise
- * we return NULL and just unlock bdev->bd_holder_lock.
- *
- * The function must be called with bdev->bd_holder_lock and releases it.
+ * Filesystems claim block devices through fs_bdev_file_open_by_{dev,path}(),
+ * which records a {dev_t -> super_block} entry in the global @fs_bdev_supers
+ * table. The fs_holder_ops callbacks resolve a device event to the
+ * superblock(s) using that device by looking it up there rather than reading
+ * bdev->bd_holder, so several superblocks may share one block device -- the
+ * holder is then only the block layer's exclusivity token.
*/
-static struct super_block *bdev_super_lock(struct block_device *bdev, bool excl)
- __releases(&bdev->bd_holder_lock)
+struct fs_bdev_holder {
+ dev_t dev; /* @fs_bdev_supers key */
+ struct super_block *sb;
+ refcount_t fs_bdev_passive; /* @fs_bdev_active>0 bias + cursor pins */
+ refcount_t fs_bdev_active; /* open claims for (dev, sb) */
+ struct rhlist_head node;
+ struct rcu_head rcu;
+};
+
+static struct rhltable fs_bdev_supers;
+static const struct rhashtable_params fs_bdev_params = {
+ .key_len = sizeof(dev_t),
+ .key_offset = offsetof(struct fs_bdev_holder, dev),
+ .head_offset = offsetof(struct fs_bdev_holder, node),
+};
+
+static int __init fs_bdev_supers_init(void)
{
- struct super_block *sb = bdev->bd_holder;
- bool locked;
+ if (rhltable_init(&fs_bdev_supers, &fs_bdev_params))
+ panic("VFS: Cannot initialise fs_bdev_supers\n");
+ return 0;
+}
+fs_initcall(fs_bdev_supers_init);
- lockdep_assert_held(&bdev->bd_holder_lock);
- lockdep_assert_not_held(&sb->s_umount);
- lockdep_assert_not_held(&bdev->bd_disk->open_mutex);
+static void fs_bdev_holder_put(struct fs_bdev_holder *h)
+{
+ /* Unlink only once unpinned, so a cursor never resumes from a removed node. */
+ if (refcount_dec_and_test(&h->fs_bdev_passive)) {
+ rhltable_remove(&fs_bdev_supers, &h->node, fs_bdev_params);
+ put_super(h->sb);
+ kfree_rcu(h, rcu);
+ }
+}
- /* Make sure sb doesn't go away from under us */
- spin_lock(&sb_lock);
- sb->s_count++;
- spin_unlock(&sb_lock);
+/*
+ * Walk the superblocks sharing a block device the way __iterate_supers() walks
+ * super_blocks: fs_bdev_first()/fs_bdev_next() return each entry with its node
+ * pinned (refcount) so the chain link survives the RCU drop and the sleeping
+ * work the callbacks do between iterations; fs_bdev_next() also unpins the
+ * previous entry. The entry's fs_bdev_passive ref keeps @h->sb valid; callers
+ * take s_active and/or super_lock_shared() as needed and skip dying superblocks.
+ * A shared per-entry list node can't replace this because mark_dead and sync
+ * are not mutually serialised.
+ */
+static struct fs_bdev_holder *fs_bdev_pin(struct rhlist_head *pos)
+{
+ struct fs_bdev_holder *h;
- mutex_unlock(&bdev->bd_holder_lock);
+ /* Caller holds rcu_read_lock(). */
+ for (; pos; pos = rcu_dereference_all(pos->next)) {
+ h = container_of(pos, struct fs_bdev_holder, node);
+ if (refcount_inc_not_zero(&h->fs_bdev_passive))
+ return h;
+ }
+ return NULL;
+}
- locked = super_lock(sb, excl);
+static struct fs_bdev_holder *fs_bdev_first(dev_t dev)
+{
+ struct fs_bdev_holder *h;
- /*
- * If the superblock wasn't already SB_DYING then we hold
- * s_umount and can safely drop our temporary reference.
- */
- put_super(sb);
+ rcu_read_lock();
+ h = fs_bdev_pin(rhltable_lookup(&fs_bdev_supers, &dev, fs_bdev_params));
+ rcu_read_unlock();
+ return h;
+}
- if (!locked)
- return NULL;
+static struct fs_bdev_holder *fs_bdev_next(struct fs_bdev_holder *prev)
+{
+ struct fs_bdev_holder *h;
- if (!sb->s_root || !(sb->s_flags & SB_ACTIVE)) {
- super_unlock(sb, excl);
- return NULL;
- }
+ rcu_read_lock();
+ h = fs_bdev_pin(rcu_dereference_all(prev->node.next));
+ rcu_read_unlock();
+
+ fs_bdev_holder_put(prev);
+ return h;
+}
- return sb;
+static int fs_super_freeze(struct super_block *sb)
+{
+ if (sb->s_op->freeze_super)
+ return sb->s_op->freeze_super(sb,
+ FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
+ return freeze_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
+}
+
+static int fs_super_thaw(struct super_block *sb)
+{
+ if (sb->s_op->thaw_super)
+ return sb->s_op->thaw_super(sb,
+ FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
+ return thaw_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
}
static void fs_bdev_mark_dead(struct block_device *bdev, bool surprise)
{
- struct super_block *sb;
+ struct fs_bdev_holder *h;
+ dev_t dev = bdev->bd_dev;
- sb = bdev_super_lock(bdev, false);
- if (!sb)
- return;
+ mutex_unlock(&bdev->bd_holder_lock);
- if (sb->s_op->remove_bdev) {
- int ret;
+ for (h = fs_bdev_first(dev); h; h = fs_bdev_next(h)) {
+ struct super_block *sb = h->sb;
- ret = sb->s_op->remove_bdev(sb, bdev);
- if (!ret) {
- super_unlock_shared(sb);
- return;
+ if (!super_lock_shared(sb))
+ continue;
+ if (sb->s_root && (sb->s_flags & SB_ACTIVE)) {
+ if (!sb->s_op->remove_bdev ||
+ sb->s_op->remove_bdev(sb, bdev)) {
+ if (!surprise)
+ sync_filesystem(sb);
+ shrink_dcache_sb(sb);
+ evict_inodes(sb);
+ if (sb->s_op->shutdown)
+ sb->s_op->shutdown(sb);
+ }
}
- /* Fallback to shutdown. */
+ super_unlock_shared(sb);
}
-
- if (!surprise)
- sync_filesystem(sb);
- shrink_dcache_sb(sb);
- evict_inodes(sb);
- if (sb->s_op->shutdown)
- sb->s_op->shutdown(sb);
-
- super_unlock_shared(sb);
}
static void fs_bdev_sync(struct block_device *bdev)
{
- struct super_block *sb;
+ struct fs_bdev_holder *h;
+ dev_t dev = bdev->bd_dev;
- sb = bdev_super_lock(bdev, false);
- if (!sb)
- return;
+ mutex_unlock(&bdev->bd_holder_lock);
- sync_filesystem(sb);
- super_unlock_shared(sb);
-}
+ for (h = fs_bdev_first(dev); h; h = fs_bdev_next(h)) {
+ struct super_block *sb = h->sb;
-static struct super_block *get_bdev_super(struct block_device *bdev)
-{
- bool active = false;
- struct super_block *sb;
-
- sb = bdev_super_lock(bdev, true);
- if (sb) {
- active = atomic_inc_not_zero(&sb->s_active);
- super_unlock_excl(sb);
+ if (!super_lock_shared(sb))
+ continue;
+ if (sb->s_root && (sb->s_flags & SB_ACTIVE))
+ sync_filesystem(sb);
+ super_unlock_shared(sb);
}
- if (!active)
- return NULL;
- return sb;
}
/**
- * fs_bdev_freeze - freeze owning filesystem of block device
+ * fs_bdev_freeze - freeze every superblock using a block device
* @bdev: block device
*
- * Freeze the filesystem that owns this block device if it is still
- * active.
- *
- * A filesystem that owns multiple block devices may be frozen from each
- * block device and won't be unfrozen until all block devices are
- * unfrozen. Each block device can only freeze the filesystem once as we
- * nest freezes for block devices in the block layer.
+ * Freeze each live superblock using @bdev. A superblock owning several block
+ * devices is frozen once per device and stays frozen until all are thawed; the
+ * block layer nests these freezes so the count stays balanced.
*
- * Return: If the freeze was successful zero is returned. If the freeze
- * failed a negative error code is returned.
+ * Return: 0, or the error from the one superblock on a single-fs device. When
+ * several superblocks share @bdev a per-superblock failure is swallowed
+ * (see below), but a sync_blockdev() failure is always reported.
*/
static int fs_bdev_freeze(struct block_device *bdev)
{
- struct super_block *sb;
- int error = 0;
+ dev_t dev = bdev->bd_dev;
+ struct fs_bdev_holder *h;
+ unsigned int count = 0;
+ int error = 0, err;
lockdep_assert_held(&bdev->bd_fsfreeze_mutex);
- sb = get_bdev_super(bdev);
- if (!sb)
- return -EINVAL;
+ mutex_unlock(&bdev->bd_holder_lock);
- if (sb->s_op->freeze_super)
- error = sb->s_op->freeze_super(sb,
- FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
- else
- error = freeze_super(sb,
- FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
+ for (h = fs_bdev_first(dev); h; h = fs_bdev_next(h)) {
+ if (!atomic_inc_not_zero(&h->sb->s_active))
+ continue;
+ err = fs_super_freeze(h->sb);
+ if (err && !error)
+ error = err;
+ deactivate_super(h->sb);
+ count++;
+ }
+
+ /*
+ * When several superblocks share the device, keep it frozen even if some
+ * of them failed to freeze and swallow the error: rolling the rest back
+ * via thaw_super() can fail too, so neither is a clear win. A single
+ * filesystem (count == 1) still reports its error.
+ */
+ if (error && count > 1)
+ error = 0;
if (!error)
error = sync_blockdev(bdev);
- deactivate_super(sb);
return error;
}
/**
- * fs_bdev_thaw - thaw owning filesystem of block device
+ * fs_bdev_thaw - thaw every superblock using a block device
* @bdev: block device
*
- * Thaw the filesystem that owns this block device.
+ * The counterpart to fs_bdev_freeze(): thaw each live superblock using @bdev.
+ * A zero return does not imply a superblock is fully unfrozen; it may have been
+ * frozen more than once (by the kernel or via another device).
*
- * A filesystem that owns multiple block devices may be frozen from each
- * block device and won't be unfrozen until all block devices are
- * unfrozen. Each block device can only freeze the filesystem once as we
- * nest freezes for block devices in the block layer.
- *
- * Return: If the thaw was successful zero is returned. If the thaw
- * failed a negative error code is returned. If this function
- * returns zero it doesn't mean that the filesystem is unfrozen
- * as it may have been frozen multiple times (kernel may hold a
- * freeze or might be frozen from other block devices).
+ * Return: 0, or the first error on a single-fs device; a shared device swallows
+ * per-superblock errors, as fs_bdev_freeze() does.
*/
static int fs_bdev_thaw(struct block_device *bdev)
{
- struct super_block *sb;
- int error;
+ dev_t dev = bdev->bd_dev;
+ struct fs_bdev_holder *h;
+ unsigned int count = 0;
+ int error = 0, err;
lockdep_assert_held(&bdev->bd_fsfreeze_mutex);
- /*
- * The block device may have been frozen before it was claimed by a
- * filesystem. Concurrently another process might try to mount that
- * frozen block device and has temporarily claimed the block device for
- * that purpose causing a concurrent fs_bdev_thaw() to end up here. The
- * mounter is already about to abort mounting because they still saw an
- * elevanted bdev->bd_fsfreeze_count so get_bdev_super() will return
- * NULL in that case.
- */
- sb = get_bdev_super(bdev);
- if (!sb)
- return -EINVAL;
+ mutex_unlock(&bdev->bd_holder_lock);
- if (sb->s_op->thaw_super)
- error = sb->s_op->thaw_super(sb,
- FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
- else
- error = thaw_super(sb,
- FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
- deactivate_super(sb);
+ for (h = fs_bdev_first(dev); h; h = fs_bdev_next(h)) {
+ if (!atomic_inc_not_zero(&h->sb->s_active))
+ continue;
+ err = fs_super_thaw(h->sb);
+ if (err && !error)
+ error = err;
+ deactivate_super(h->sb);
+ count++;
+ }
+
+ /* Shared device: swallow per-superblock errors, like fs_bdev_freeze(). */
+ if (error && count > 1)
+ error = 0;
return error;
}
@@ -1602,6 +1651,131 @@ const struct blk_holder_ops fs_holder_ops = {
};
EXPORT_SYMBOL_GPL(fs_holder_ops);
+static int fs_bdev_register(struct file *bdev_file, struct super_block *sb)
+{
+ dev_t dev = file_bdev(bdev_file)->bd_dev;
+ struct rhlist_head *list, *pos;
+ struct fs_bdev_holder *h;
+ int err;
+
+ /*
+ * A superblock may claim one device more than once (xfs with its log on
+ * the data device). Keep a single entry per (device, superblock) and
+ * count the claims in @fs_bdev_active; the entry lives until the last one
+ * is released.
+ */
+ scoped_guard(rcu) {
+ list = rhltable_lookup(&fs_bdev_supers, &dev, fs_bdev_params);
+ rhl_for_each_entry_rcu(h, pos, list, node)
+ if (h->sb == sb && refcount_inc_not_zero(&h->fs_bdev_active))
+ return 0;
+ }
+
+ h = kmalloc(sizeof(*h), GFP_KERNEL);
+ if (!h)
+ return -ENOMEM;
+ h->dev = dev;
+ h->sb = sb;
+ refcount_set(&h->fs_bdev_passive, 1);
+ refcount_set(&h->fs_bdev_active, 1);
+
+ err = rhltable_insert(&fs_bdev_supers, &h->node, fs_bdev_params);
+ if (err) {
+ kfree(h);
+ return err;
+ }
+
+ /* The sb->s_count ref keeps @h->sb valid for as long as the entry exists. */
+ spin_lock(&sb_lock);
+ sb->s_count++;
+ spin_unlock(&sb_lock);
+
+ return 0;
+}
+
+/**
+ * fs_bdev_file_open_by_dev - claim a block device on behalf of a superblock
+ * @dev: block device number
+ * @mode: open mode
+ * @holder: block-layer exclusivity token (a superblock, or the file_system_type
+ * when the device may be shared by several superblocks of that type)
+ * @sb: superblock to drive fs_holder_ops events for
+ *
+ * Open @dev with &fs_holder_ops and register that @sb uses it, so device
+ * removal/sync/freeze/thaw are propagated to @sb (and any other superblock
+ * sharing @dev). Must be paired with fs_bdev_file_release().
+ *
+ * Return: an opened block-device file or an ERR_PTR().
+ */
+struct file *fs_bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+ struct super_block *sb)
+{
+ struct file *bdev_file;
+ int err;
+
+ bdev_file = bdev_file_open_by_dev(dev, mode, holder, &fs_holder_ops);
+ if (IS_ERR(bdev_file))
+ return bdev_file;
+
+ err = fs_bdev_register(bdev_file, sb);
+ if (err) {
+ bdev_fput(bdev_file);
+ return ERR_PTR(err);
+ }
+ return bdev_file;
+}
+EXPORT_SYMBOL_GPL(fs_bdev_file_open_by_dev);
+
+struct file *fs_bdev_file_open_by_path(const char *path, blk_mode_t mode,
+ void *holder, struct super_block *sb)
+{
+ struct file *bdev_file;
+ int err;
+
+ bdev_file = bdev_file_open_by_path(path, mode, holder, &fs_holder_ops);
+ if (IS_ERR(bdev_file))
+ return bdev_file;
+
+ err = fs_bdev_register(bdev_file, sb);
+ if (err) {
+ bdev_fput(bdev_file);
+ return ERR_PTR(err);
+ }
+ return bdev_file;
+}
+EXPORT_SYMBOL_GPL(fs_bdev_file_open_by_path);
+
+/**
+ * fs_bdev_file_release - release a block device claimed for a superblock
+ * @bdev_file: file returned by fs_bdev_file_open_by_{dev,path}()
+ * @sb: superblock the device was claimed for
+ *
+ * Drop one claim on the {dev, @sb} entry; the last claim unregisters it (a
+ * pinning cursor defers the actual unlink). Then close the block device.
+ */
+void fs_bdev_file_release(struct file *bdev_file, struct super_block *sb)
+{
+ dev_t dev = file_bdev(bdev_file)->bd_dev;
+ struct fs_bdev_holder *h, *found = NULL;
+ struct rhlist_head *list, *pos;
+
+ rcu_read_lock();
+ list = rhltable_lookup(&fs_bdev_supers, &dev, fs_bdev_params);
+ rhl_for_each_entry_rcu(h, pos, list, node) {
+ if (h->sb != sb)
+ continue;
+ /* At most one entry per (dev, sb); the last claim drops the bias. */
+ if (refcount_dec_and_test(&h->fs_bdev_active))
+ found = h;
+ break;
+ }
+ rcu_read_unlock();
+ if (found)
+ fs_bdev_holder_put(found);
+ bdev_fput(bdev_file);
+}
+EXPORT_SYMBOL_GPL(fs_bdev_file_release);
+
int setup_bdev_super(struct super_block *sb, int sb_flags,
struct fs_context *fc)
{
@@ -1609,7 +1783,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
struct file *bdev_file;
struct block_device *bdev;
- bdev_file = bdev_file_open_by_dev(sb->s_dev, mode, sb, &fs_holder_ops);
+ bdev_file = fs_bdev_file_open_by_dev(sb->s_dev, mode, sb, sb);
if (IS_ERR(bdev_file)) {
if (fc)
errorf(fc, "%s: Can't open blockdev", fc->source);
@@ -1623,7 +1797,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
* writable from userspace even for a read-only block device.
*/
if ((mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) {
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, sb);
return -EACCES;
}
@@ -1634,7 +1808,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
if (atomic_read(&bdev->bd_fsfreeze_count) > 0) {
if (fc)
warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, sb);
return -EBUSY;
}
spin_lock(&sb_lock);
@@ -1725,7 +1899,7 @@ void kill_block_super(struct super_block *sb)
generic_shutdown_super(sb);
if (bdev) {
sync_blockdev(bdev);
- bdev_fput(sb->s_bdev_file);
+ fs_bdev_file_release(sb->s_bdev_file, sb);
}
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c8494d64a69d..43d37c02febf 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1760,13 +1760,6 @@ struct blk_holder_ops {
int (*thaw)(struct block_device *bdev);
};
-/*
- * For filesystems using @fs_holder_ops, the @holder argument passed to
- * helpers used to open and claim block devices via
- * bd_prepare_to_claim() must point to a superblock.
- */
-extern const struct blk_holder_ops fs_holder_ops;
-
/*
* Return the correct open flags for blkdev_get_by_* for super block flags
* as stored in sb->s_flags.
diff --git a/include/linux/fs/super.h b/include/linux/fs/super.h
index f21ffbb6dea5..721d842e3b24 100644
--- a/include/linux/fs/super.h
+++ b/include/linux/fs/super.h
@@ -235,4 +235,11 @@ int freeze_super(struct super_block *super, enum freeze_holder who,
int thaw_super(struct super_block *super, enum freeze_holder who,
const void *freeze_owner);
+struct file;
+struct file *fs_bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
+ struct super_block *sb);
+struct file *fs_bdev_file_open_by_path(const char *path, blk_mode_t mode,
+ void *holder, struct super_block *sb);
+void fs_bdev_file_release(struct file *bdev_file, struct super_block *sb);
+
#endif /* _LINUX_FS_SUPER_H */
--
2.47.3
* [PATCH RFC 3/8] fs: refuse to claim any frozen block device
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
2026-06-02 10:10 ` [PATCH RFC 1/8] fs, block: move blk_mode_t and fop_flags_t into <linux/types.h> Christian Brauner
2026-06-02 10:10 ` [PATCH RFC 2/8] fs: add a global device to super block hash table Christian Brauner
@ 2026-06-02 10:10 ` Christian Brauner
2026-06-08 10:01 ` Jan Kara
2026-06-02 10:10 ` [PATCH RFC 4/8] xfs: port to fs_bdev_file_open_by_path() Christian Brauner
` (6 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Christian Brauner @ 2026-06-02 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christian Brauner (Amutable)
setup_bdev_super() already refuses to bring a filesystem up on a frozen
block device but only for the primary device. Now that filesystems claim
every device through fs_bdev_file_open_by_{dev,path}(), do that check
once in the registration helper so it covers all of them.
Drop the now-redundant check from setup_bdev_super().
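To make the effect of moving the check concrete, here is a hedged userspace sketch in which every device claim goes through one helper that refuses frozen devices. The toy_* names and the registered counter are hypothetical; the publish-then-check order follows the comment added by this patch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model of a block device with a freeze counter. */
struct toy_bdev {
	int fsfreeze_count;
	int registered;		/* number of published claims */
};

/* Publish the claim first, then refuse it if the device is frozen. */
int toy_register_claim(struct toy_bdev *bdev)
{
	bdev->registered++;
	if (bdev->fsfreeze_count > 0) {
		bdev->registered--;	/* tear the entry down again */
		return -EBUSY;
	}
	return 0;
}

/* Claim every device of a filesystem; undo earlier claims on failure. */
int toy_claim_all(struct toy_bdev **devs, int n)
{
	int i, err;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		err = toy_register_claim(devs[i]);
		if (err) {
			while (i-- > 0)
				devs[i]->registered--;
			return err;
		}
	}
	return 0;
}
```

Because the check sits in the shared helper, a frozen secondary device (a log or realtime device, for example) fails the mount in the same way a frozen primary device does.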
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
fs/super.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index e0174d5819a0..cea743f699e4 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1690,6 +1690,17 @@ static int fs_bdev_register(struct file *bdev_file, struct super_block *sb)
sb->s_count++;
spin_unlock(&sb_lock);
+ /*
+ * Don't bring a filesystem up on a frozen device. The entry is already
+ * published, so a freeze either is seen here or finds it and waits in
+ * super_lock() until this mount is born or (on -EBUSY) dies. The mount
+ * aborts, so the entry is torn down without rebalancing @fs_bdev_active.
+ */
+ if (atomic_read(&file_bdev(bdev_file)->bd_fsfreeze_count) > 0) {
+ fs_bdev_holder_put(h);
+ return -EBUSY;
+ }
+
return 0;
}
@@ -1801,16 +1812,6 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
return -EACCES;
}
- /*
- * It is enough to check bdev was not frozen before we set
- * s_bdev as freezing will wait until SB_BORN is set.
- */
- if (atomic_read(&bdev->bd_fsfreeze_count) > 0) {
- if (fc)
- warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
- fs_bdev_file_release(bdev_file, sb);
- return -EBUSY;
- }
spin_lock(&sb_lock);
sb->s_bdev_file = bdev_file;
sb->s_bdev = bdev;
--
2.47.3
* [PATCH RFC 4/8] xfs: port to fs_bdev_file_open_by_path()
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
` (2 preceding siblings ...)
2026-06-02 10:10 ` [PATCH RFC 3/8] fs: refuse to claim any frozen block device Christian Brauner
@ 2026-06-02 10:10 ` Christian Brauner
2026-06-08 10:15 ` Jan Kara
2026-06-02 10:10 ` [PATCH RFC 5/8] btrfs: open via dedicated fs bdev helpers Christian Brauner
` (5 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Christian Brauner @ 2026-06-02 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christian Brauner (Amutable)
Route opens through fs_bdev_file_open_by_path() so each external device
is registered against mp->m_super, and convert the matching releases.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
fs/xfs/xfs_buf.c | 2 +-
fs/xfs/xfs_super.c | 10 +++++-----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 580d40a5ee57..3d3b29edb156 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1601,7 +1601,7 @@ xfs_free_buftarg(
fs_put_dax(btp->bt_daxdev, btp->bt_mount);
/* the main block device is closed by kill_block_super */
if (btp->bt_bdev != btp->bt_mount->m_super->s_bdev)
- bdev_fput(btp->bt_file);
+ fs_bdev_file_release(btp->bt_file, btp->bt_mount->m_super);
kfree(btp);
}
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index f8de44443e81..304667210695 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -400,8 +400,8 @@ xfs_blkdev_get(
blk_mode_t mode;
mode = sb_open_mode(mp->m_super->s_flags);
- *bdev_filep = bdev_file_open_by_path(name, mode,
- mp->m_super, &fs_holder_ops);
+ *bdev_filep = fs_bdev_file_open_by_path(name, mode,
+ mp->m_super, mp->m_super);
if (IS_ERR(*bdev_filep)) {
error = PTR_ERR(*bdev_filep);
*bdev_filep = NULL;
@@ -526,7 +526,7 @@ xfs_open_devices(
mp->m_logdev_targp = mp->m_ddev_targp;
/* Handle won't be used, drop it */
if (logdev_file)
- bdev_fput(logdev_file);
+ fs_bdev_file_release(logdev_file, mp->m_super);
}
return 0;
@@ -538,10 +538,10 @@ xfs_open_devices(
xfs_free_buftarg(mp->m_ddev_targp);
out_close_rtdev:
if (rtdev_file)
- bdev_fput(rtdev_file);
+ fs_bdev_file_release(rtdev_file, mp->m_super);
out_close_logdev:
if (logdev_file)
- bdev_fput(logdev_file);
+ fs_bdev_file_release(logdev_file, mp->m_super);
return error;
}
--
2.47.3
* [PATCH RFC 5/8] btrfs: open via dedicated fs bdev helpers
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
` (3 preceding siblings ...)
2026-06-02 10:10 ` [PATCH RFC 4/8] xfs: port to fs_bdev_file_open_by_path() Christian Brauner
@ 2026-06-02 10:10 ` Christian Brauner
2026-06-02 10:10 ` [PATCH RFC 6/8] ext4: " Christian Brauner
` (4 subsequent siblings)
9 siblings, 0 replies; 21+ messages in thread
From: Christian Brauner @ 2026-06-02 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christian Brauner (Amutable)
Route opens through fs_bdev_file_open_by_path() so each external device
is registered against the correct superblock, and convert the matching
releases.
The temporary identification opens that only read the superblock and close
again pass a NULL holder and are left untouched.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
fs/btrfs/dev-replace.c | 6 +++---
fs/btrfs/ioctl.c | 4 ++--
fs/btrfs/volumes.c | 26 +++++++++++++++++---------
3 files changed, 22 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/dev-replace.c b/fs/btrfs/dev-replace.c
index 8f8fa14886de..463155b0b1ff 100644
--- a/fs/btrfs/dev-replace.c
+++ b/fs/btrfs/dev-replace.c
@@ -247,8 +247,8 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
return -EINVAL;
}
- bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
- fs_info->sb, &fs_holder_ops);
+ bdev_file = fs_bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
+ fs_info->sb, fs_info->sb);
if (IS_ERR(bdev_file)) {
btrfs_err(fs_info, "target device %s is invalid!", device_path);
return PTR_ERR(bdev_file);
@@ -325,7 +325,7 @@ static int btrfs_init_dev_replace_tgtdev(struct btrfs_fs_info *fs_info,
return 0;
error:
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, fs_info->sb);
return ret;
}
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index b2e447f5005c..16afa71b98f2 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2579,7 +2579,7 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
err_drop:
mnt_drop_write_file(file);
if (bdev_file)
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, fs_info->sb);
out:
btrfs_put_dev_args_from_path(&args);
kfree(vol_args);
@@ -2630,7 +2630,7 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
mnt_drop_write_file(file);
if (bdev_file)
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, fs_info->sb);
out:
btrfs_put_dev_args_from_path(&args);
out_free:
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index a88e68f90564..6f7d7afb4d66 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -480,7 +480,12 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
struct block_device *bdev;
int ret;
- *bdev_file = bdev_file_open_by_path(device_path, flags, holder, &fs_holder_ops);
+ if (holder)
+ *bdev_file = fs_bdev_file_open_by_path(device_path, flags,
+ holder, holder);
+ else
+ *bdev_file = bdev_file_open_by_path(device_path, flags, NULL,
+ NULL);
if (IS_ERR(*bdev_file)) {
ret = PTR_ERR(*bdev_file);
@@ -495,7 +500,7 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
if (holder) {
ret = set_blocksize(*bdev_file, BTRFS_BDEV_BLOCKSIZE);
if (ret) {
- bdev_fput(*bdev_file);
+ fs_bdev_file_release(*bdev_file, holder);
goto error;
}
}
@@ -503,7 +508,10 @@ btrfs_get_bdev_and_sb(const char *device_path, blk_mode_t flags, void *holder,
*disk_super = btrfs_read_disk_super(bdev, 0, false);
if (IS_ERR(*disk_super)) {
ret = PTR_ERR(*disk_super);
- bdev_fput(*bdev_file);
+ if (holder)
+ fs_bdev_file_release(*bdev_file, holder);
+ else
+ bdev_fput(*bdev_file);
goto error;
}
@@ -727,7 +735,7 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
error_free_page:
btrfs_release_disk_super(disk_super);
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, holder);
return -EINVAL;
}
@@ -1082,7 +1090,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
continue;
if (device->bdev_file) {
- bdev_fput(device->bdev_file);
+ fs_bdev_file_release(device->bdev_file, fs_devices->fs_info->sb);
device->bdev = NULL;
device->bdev_file = NULL;
fs_devices->open_devices--;
@@ -1129,7 +1137,7 @@ static void btrfs_close_bdev(struct btrfs_device *device)
invalidate_bdev(device->bdev);
}
- bdev_fput(device->bdev_file);
+ fs_bdev_file_release(device->bdev_file, device->fs_info->sb);
}
static void btrfs_close_one_device(struct btrfs_device *device)
@@ -2820,8 +2828,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
if (sb_rdonly(sb) && !fs_devices->seeding)
return -EROFS;
- bdev_file = bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
- fs_info->sb, &fs_holder_ops);
+ bdev_file = fs_bdev_file_open_by_path(device_path, BLK_OPEN_WRITE,
+ fs_info->sb, fs_info->sb);
if (IS_ERR(bdev_file))
return PTR_ERR(bdev_file);
@@ -3045,7 +3053,7 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
error_free_device:
btrfs_free_device(device);
error:
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, fs_info->sb);
if (locked) {
mutex_unlock(&uuid_mutex);
up_write(&sb->s_umount);
--
2.47.3
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 6/8] ext4: open via dedicated fs bdev helpers
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
` (4 preceding siblings ...)
2026-06-02 10:10 ` [PATCH RFC 5/8] btrfs: open via dedicated fs bdev helpers Christian Brauner
@ 2026-06-02 10:10 ` Christian Brauner
2026-06-08 10:18 ` Jan Kara
2026-06-02 10:10 ` [PATCH RFC 7/8] erofs: " Christian Brauner
` (3 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Christian Brauner @ 2026-06-02 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christian Brauner (Amutable)
Route opens through fs_bdev_file_open_by_path() so each external device
is registered against the correct superblock, and convert the matching
releases.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
fs/ext4/super.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6a77db4d3124..8108d999008e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5793,7 +5793,7 @@ failed_mount8: __maybe_unused
brelse(sbi->s_sbh);
if (sbi->s_journal_bdev_file) {
invalidate_bdev(file_bdev(sbi->s_journal_bdev_file));
- bdev_fput(sbi->s_journal_bdev_file);
+ fs_bdev_file_release(sbi->s_journal_bdev_file, sb);
}
out_fail:
invalidate_bdev(sb->s_bdev);
@@ -5972,9 +5972,9 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
struct ext4_super_block *es;
int errno;
- bdev_file = bdev_file_open_by_dev(j_dev,
+ bdev_file = fs_bdev_file_open_by_dev(j_dev,
BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
- sb, &fs_holder_ops);
+ sb, sb);
if (IS_ERR(bdev_file)) {
ext4_msg(sb, KERN_ERR,
"failed to open journal device unknown-block(%u,%u) %ld",
@@ -6034,7 +6034,7 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
out_bh:
brelse(bh);
out_bdev:
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, sb);
return ERR_PTR(errno);
}
@@ -6073,7 +6073,7 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
out_journal:
ext4_journal_destroy(EXT4_SB(sb), journal);
out_bdev:
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, sb);
return ERR_PTR(errno);
}
@@ -7492,7 +7492,7 @@ static void ext4_kill_sb(struct super_block *sb)
kill_block_super(sb);
if (bdev_file)
- bdev_fput(bdev_file);
+ fs_bdev_file_release(bdev_file, sb);
}
static struct file_system_type ext4_fs_type = {
--
2.47.3
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 7/8] erofs: open via dedicated fs bdev helpers
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
` (5 preceding siblings ...)
2026-06-02 10:10 ` [PATCH RFC 6/8] ext4: " Christian Brauner
@ 2026-06-02 10:10 ` Christian Brauner
2026-06-02 16:25 ` Gao Xiang
2026-06-02 10:10 ` [PATCH RFC 8/8] super: make fs_holder_ops private Christian Brauner
` (2 subsequent siblings)
9 siblings, 1 reply; 21+ messages in thread
From: Christian Brauner @ 2026-06-02 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christian Brauner (Amutable)
Route opens through fs_bdev_file_open_by_path() so each external device
is registered against the correct superblock, and convert the matching
releases.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
fs/erofs/data.c | 6 +++++
fs/erofs/internal.h | 10 ++++++++
fs/erofs/super.c | 66 +++++++++++++++++++++++++++++++++++++++++++----------
fs/erofs/zdata.c | 10 +++++---
4 files changed, 77 insertions(+), 15 deletions(-)
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index 44da21c9d777..5220585293df 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -69,6 +69,9 @@ int erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb,
{
struct erofs_sb_info *sbi = EROFS_SB(sb);
+ if (erofs_is_shutdown(sb))
+ return -EIO;
+
buf->file = NULL;
if (in_metabox) {
if (unlikely(!sbi->metabox_inode))
@@ -236,6 +239,9 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
}
up_read(&devs->rwsem);
}
+ if (erofs_is_shutdown(sb) ||
+ (map->m_dif && READ_ONCE(map->m_dif->dead)))
+ return -EIO;
return 0;
}
diff --git a/fs/erofs/internal.h b/fs/erofs/internal.h
index 4792490161ec..ca1ed7ce3961 100644
--- a/fs/erofs/internal.h
+++ b/fs/erofs/internal.h
@@ -48,6 +48,7 @@ struct erofs_device_info {
erofs_blk_t blocks;
erofs_blk_t uniaddr;
+ bool dead; /* backing device gone; fence I/O */
};
enum {
@@ -104,6 +105,7 @@ struct erofs_xattr_prefix_item {
struct erofs_sb_info {
struct erofs_device_info dif0;
struct erofs_mount_opts opt; /* options */
+ unsigned long flags; /* see EROFS_SB_* */
#ifdef CONFIG_EROFS_FS_ZIP
/* list for all registered superblocks, mainly for shrinker */
struct list_head list;
@@ -195,6 +197,14 @@ static inline bool erofs_is_fscache_mode(struct super_block *sb)
!erofs_is_fileio_mode(EROFS_SB(sb)) && !sb->s_bdev;
}
+/* erofs_sb_info->flags */
+#define EROFS_SB_SHUTDOWN 0 /* primary device gone; fail all I/O */
+
+static inline bool erofs_is_shutdown(struct super_block *sb)
+{
+ return test_bit(EROFS_SB_SHUTDOWN, &EROFS_SB(sb)->flags);
+}
+
enum {
EROFS_ZIP_CACHE_DISABLED,
EROFS_ZIP_CACHE_READAHEAD,
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 802add6652fd..e03cb95be96b 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -153,8 +153,8 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
} else if (!sbi->devs->flatdev) {
file = erofs_is_fileio_mode(sbi) ?
filp_open(dif->path, O_RDONLY | O_LARGEFILE, 0) :
- bdev_file_open_by_path(dif->path,
- BLK_OPEN_READ, sb->s_type, NULL);
+ fs_bdev_file_open_by_path(dif->path,
+ BLK_OPEN_READ, sb->s_type, sb);
if (IS_ERR(file)) {
if (file == ERR_PTR(-ENOTBLK))
return -EINVAL;
@@ -843,11 +843,16 @@ static int erofs_fc_reconfigure(struct fs_context *fc)
static int erofs_release_device_info(int id, void *ptr, void *data)
{
+ struct super_block *sb = data;
struct erofs_device_info *dif = ptr;
fs_put_dax(dif->dax_dev, NULL);
- if (dif->file)
- fput(dif->file);
+ if (dif->file) {
+ if (S_ISBLK(file_inode(dif->file)->i_mode))
+ fs_bdev_file_release(dif->file, sb);
+ else
+ fput(dif->file);
+ }
erofs_fscache_unregister_cookie(dif->fscache);
dif->fscache = NULL;
kfree(dif->path);
@@ -855,18 +860,19 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
return 0;
}
-static void erofs_free_dev_context(struct erofs_dev_context *devs)
+static void erofs_free_dev_context(struct erofs_dev_context *devs,
+ struct super_block *sb)
{
if (!devs)
return;
- idr_for_each(&devs->tree, &erofs_release_device_info, NULL);
+ idr_for_each(&devs->tree, &erofs_release_device_info, sb);
idr_destroy(&devs->tree);
kfree(devs);
}
-static void erofs_sb_free(struct erofs_sb_info *sbi)
+static void erofs_sb_free(struct erofs_sb_info *sbi, struct super_block *sb)
{
- erofs_free_dev_context(sbi->devs);
+ erofs_free_dev_context(sbi->devs, sb);
kfree(sbi->fsid);
kfree_sensitive(sbi->domain_id);
if (sbi->dif0.file)
@@ -879,8 +885,13 @@ static void erofs_fc_free(struct fs_context *fc)
{
struct erofs_sb_info *sbi = fc->s_fs_info;
- if (sbi) /* free here if an error occurs before transferring to sb */
- erofs_sb_free(sbi);
+ /*
+ * Freed here only if an error occurs before the sb is set up; at that
+ * point no block-backed device has been claimed (that happens in
+ * fill_super), so the NULL sb never reaches fs_bdev_file_release().
+ */
+ if (sbi)
+ erofs_sb_free(sbi, NULL);
}
static const struct fs_context_operations erofs_context_ops = {
@@ -936,7 +947,7 @@ static void erofs_kill_sb(struct super_block *sb)
erofs_drop_internal_inodes(sbi);
fs_put_dax(sbi->dif0.dax_dev, NULL);
erofs_fscache_unregister_fs(sb);
- erofs_sb_free(sbi);
+ erofs_sb_free(sbi, sb);
sb->s_fs_info = NULL;
}
@@ -948,7 +959,7 @@ static void erofs_put_super(struct super_block *sb)
erofs_shrinker_unregister(sb);
erofs_xattr_prefixes_cleanup(sb);
erofs_drop_internal_inodes(sbi);
- erofs_free_dev_context(sbi->devs);
+ erofs_free_dev_context(sbi->devs, sb);
sbi->devs = NULL;
erofs_fscache_unregister_fs(sb);
}
@@ -1121,6 +1132,35 @@ static void erofs_evict_inode(struct inode *inode)
clear_inode(inode);
}
+/*
+ * A blob device may back several erofs superblocks; fence only the affected
+ * one and keep the rest of the mount alive. The primary device falls back to
+ * the generic teardown (return non-zero).
+ */
+static int erofs_remove_bdev(struct super_block *sb, struct block_device *bdev)
+{
+ struct erofs_dev_context *devs = EROFS_SB(sb)->devs;
+ struct erofs_device_info *dif;
+ int id;
+
+ if (bdev == sb->s_bdev)
+ return 1;
+
+ down_read(&devs->rwsem);
+ idr_for_each_entry(&devs->tree, dif, id) {
+ if (dif->file && S_ISBLK(file_inode(dif->file)->i_mode) &&
+ file_bdev(dif->file)->bd_dev == bdev->bd_dev)
+ WRITE_ONCE(dif->dead, true);
+ }
+ up_read(&devs->rwsem);
+ return 0;
+}
+
+static void erofs_shutdown(struct super_block *sb)
+{
+ set_bit(EROFS_SB_SHUTDOWN, &EROFS_SB(sb)->flags);
+}
+
const struct super_operations erofs_sops = {
.put_super = erofs_put_super,
.alloc_inode = erofs_alloc_inode,
@@ -1128,6 +1168,8 @@ const struct super_operations erofs_sops = {
.evict_inode = erofs_evict_inode,
.statfs = erofs_statfs,
.show_options = erofs_show_options,
+ .remove_bdev = erofs_remove_bdev,
+ .shutdown = erofs_shutdown,
};
module_init(erofs_module_init);
diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
index 43bb5a6a9924..89ae91935364 100644
--- a/fs/erofs/zdata.c
+++ b/fs/erofs/zdata.c
@@ -1697,11 +1697,15 @@ static void z_erofs_submit_queue(struct z_erofs_frontend *f,
continue;
}
- /* no device id here, thus it will always succeed */
mdev = (struct erofs_map_dev) {
.m_pa = round_down(pcl->pos, sb->s_blocksize),
};
- (void)erofs_map_dev(sb, &mdev);
+ if (erofs_map_dev(sb, &mdev)) {
+ /* the backing device is gone; fail the batch */
+ q[JQ_SUBMIT]->eio = true;
+ qtail[JQ_SUBMIT] = &pcl->next;
+ continue;
+ }
cur = mdev.m_pa;
end = round_up(cur + pcl->pageofs_in + pcl->pclustersize,
@@ -1785,7 +1789,7 @@ static void z_erofs_submit_queue(struct z_erofs_frontend *f,
* although background is preferred, no one is pending for submission.
* don't issue decompression but drop it directly instead.
*/
- if (!*force_fg && !nr_bios) {
+ if (!*force_fg && !nr_bios && !q[JQ_SUBMIT]->eio) {
kvfree(q[JQ_SUBMIT]);
return;
}
--
2.47.3
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH RFC 8/8] super: make fs_holder_ops private
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
` (6 preceding siblings ...)
2026-06-02 10:10 ` [PATCH RFC 7/8] erofs: " Christian Brauner
@ 2026-06-02 10:10 ` Christian Brauner
2026-06-08 10:18 ` Jan Kara
2026-06-02 16:12 ` [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Gao Xiang
2026-06-03 6:43 ` [syzbot ci] " syzbot ci
9 siblings, 1 reply; 21+ messages in thread
From: Christian Brauner @ 2026-06-02 10:10 UTC (permalink / raw)
To: Christoph Hellwig, Jan Kara
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christian Brauner (Amutable)
There's no need to expose it anymore.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
fs/super.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/super.c b/fs/super.c
index cea743f699e4..983c2fbf5202 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1643,13 +1643,12 @@ static int fs_bdev_thaw(struct block_device *bdev)
return error;
}
-const struct blk_holder_ops fs_holder_ops = {
+static const struct blk_holder_ops fs_holder_ops = {
.mark_dead = fs_bdev_mark_dead,
.sync = fs_bdev_sync,
.freeze = fs_bdev_freeze,
.thaw = fs_bdev_thaw,
};
-EXPORT_SYMBOL_GPL(fs_holder_ops);
static int fs_bdev_register(struct file *bdev_file, struct super_block *sb)
{
--
2.47.3
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
` (7 preceding siblings ...)
2026-06-02 10:10 ` [PATCH RFC 8/8] super: make fs_holder_ops private Christian Brauner
@ 2026-06-02 16:12 ` Gao Xiang
2026-06-03 6:43 ` [syzbot ci] " syzbot ci
9 siblings, 0 replies; 21+ messages in thread
From: Gao Xiang @ 2026-06-02 16:12 UTC (permalink / raw)
To: Christian Brauner
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christoph Hellwig, Jan Kara
Hi,
On 2026/6/2 18:10, Christian Brauner wrote:
> Note, this is on the border between RFC/POC and so I haven't pushed this
> through testing yet. But I don't want to waste more time on this before
> showing it.
>
> I surveyed various fs implementations because I want the ability to
> extend userspace the ability to manage what devices can be onlined in a
> centralized way without having to force every fs to care about this.
>
> I realized that erofs allows sharing block devices with multiple
> superblocks. Any freeze, thaw, removal, or sync on those devices will
> not be communicated to the superblocks using it and our current
> infrastructure is unable to deal with this.
>
> This attempts to add the ability to go from device number to all the
> superblocks using that device, iterate through them one-by-one and
> perform actions on them. For most fses this is a 1:1 mapping but for
> erofs it's a 1:many mapping.
>
> This is not unreasonable infrastructure to support in my opinion. I
> played around with some ideas for this and I want to send out an RFC to
> gather some early input.
Yes, just a side note: erofs applies an immutable model to each
filesystem rather than a writable one, so inode data (in devices or
files) can be shared among multiple filesystems without any reference
counting (in similar models, any write has to be COWed, for example
with overlayfs). Blob devices are therefore a 1:many shared mapping
by design.
One typical example is that we could convert each OCI tar layer
into an erofs blob, and use a metadata-only erofs to index these
converted erofs blobs so there is only one filesystem instead of
per-layer filesystems (this is called fsmerge in the containerd
implementation), while each converted erofs blob can still be shared
among different filesystems.
Another example is incremental diff updates: the primary device
can contain only the incremental data and refer to the base image for
the remaining data, and the base image can be shared too.
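To make the 1:many relationship concrete, here is a minimal user-space sketch, assuming a fixed-size table per device. Every name in it (`model_sb`, `model_dev`, `model_dev_register`, `model_dev_freeze`) is hypothetical and exists only for illustration; it is not the data structure used by the series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MODEL_MAX_USERS 8

struct model_sb {
	const char *name;
	int frozen;
};

struct model_dev {
	unsigned int devno;                      /* stands in for dev_t */
	struct model_sb *users[MODEL_MAX_USERS]; /* every superblock using the device */
	size_t nr_users;
};

/* Record that @sb uses @dev; returns 0 on success, -1 if the table is full. */
static int model_dev_register(struct model_dev *dev, struct model_sb *sb)
{
	assert(dev && sb);
	if (dev->nr_users == MODEL_MAX_USERS)
		return -1;
	dev->users[dev->nr_users++] = sb;
	return 0;
}

/* Propagate a freeze to each registered superblock; returns how many were visited. */
static size_t model_dev_freeze(struct model_dev *dev)
{
	size_t i;

	assert(dev);
	for (i = 0; i < dev->nr_users; i++)
		dev->users[i]->frozen = 1;
	return dev->nr_users;
}
```

Registering two superblocks against the same `model_dev` and then calling `model_dev_freeze()` marks both as frozen, which is the behaviour a shared blob device needs and a single-holder scheme cannot express.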
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 7/8] erofs: open via dedicated fs bdev helpers
2026-06-02 10:10 ` [PATCH RFC 7/8] erofs: " Christian Brauner
@ 2026-06-02 16:25 ` Gao Xiang
2026-06-03 13:42 ` Christian Brauner
0 siblings, 1 reply; 21+ messages in thread
From: Gao Xiang @ 2026-06-02 16:25 UTC (permalink / raw)
To: Christian Brauner
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christoph Hellwig, Jan Kara
On 2026/6/2 18:10, Christian Brauner wrote:
> Route opens through fs_bdev_file_open_by_path() so each external device
> is registered against the correct superblock, and convert the matching
> releases.
>
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
> ---
> fs/erofs/data.c | 6 +++++
> fs/erofs/internal.h | 10 ++++++++
> fs/erofs/super.c | 66 +++++++++++++++++++++++++++++++++++++++++++----------
> fs/erofs/zdata.c | 10 +++++---
> 4 files changed, 77 insertions(+), 15 deletions(-)
>
> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
> index 44da21c9d777..5220585293df 100644
> --- a/fs/erofs/data.c
> +++ b/fs/erofs/data.c
> @@ -69,6 +69,9 @@ int erofs_init_metabuf(struct erofs_buf *buf, struct super_block *sb,
> {
> struct erofs_sb_info *sbi = EROFS_SB(sb);
>
> + if (erofs_is_shutdown(sb))
> + return -EIO;
> +
> buf->file = NULL;
> if (in_metabox) {
> if (unlikely(!sbi->metabox_inode))
> @@ -236,6 +239,9 @@ int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
> }
> up_read(&devs->rwsem);
> }
> + if (erofs_is_shutdown(sb) ||
> + (map->m_dif && READ_ONCE(map->m_dif->dead)))
> + return -EIO;
Taking a quick look at the code, maybe we can just add
the SHUTDOWN status alone: I don't think removing an
individual blob device is useful for the typical image
use cases, so there is no need to add `dead` for each
individual extra device.
And then just bail out if erofs_is_shutdown() is true at
the very beginning of erofs_map_dev()?
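The shape of that suggestion, a single filesystem-wide flag checked before any per-device work, can be sketched in user space as below. The `model_*` types and `model_map_dev()` are hypothetical stand-ins and not the erofs code; the sketch only shows the ordering of the check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the superblock state and the mapping request. */
struct model_sb {
	bool shutdown;          /* set once the primary device is gone */
};

struct model_map {
	unsigned long long m_pa;
	int m_deviceid;
};

/* Fail fast when the filesystem is shut down, before any device lookup. */
static int model_map_dev(const struct model_sb *sb, struct model_map *map)
{
	assert(sb && map);
	if (sb->shutdown)
		return -EIO;
	/* The per-device lookup would go here; the model leaves @map untouched. */
	return 0;
}
```

With only one flag there is no per-device `dead` state to keep coherent, at the cost of not being able to fence a single extra device while keeping the rest of the mount alive.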
> return 0;
> }
>
...
> diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> index 43bb5a6a9924..89ae91935364 100644
> --- a/fs/erofs/zdata.c
> +++ b/fs/erofs/zdata.c
> @@ -1697,11 +1697,15 @@ static void z_erofs_submit_queue(struct z_erofs_frontend *f,
> continue;
> }
>
> - /* no device id here, thus it will always succeed */
> mdev = (struct erofs_map_dev) {
> .m_pa = round_down(pcl->pos, sb->s_blocksize),
> };
> - (void)erofs_map_dev(sb, &mdev);
> + if (erofs_map_dev(sb, &mdev)) {
> + /* the backing device is gone; fail the batch */
> + q[JQ_SUBMIT]->eio = true;
> + qtail[JQ_SUBMIT] = &pcl->next;
> + continue;
> + }
It needs some injection tests anyway.
May I ask if it's an urgent 7.2 work? If not, I could
make a preparation patch for the upcoming 7.2 cycle
to handle erofs_map_dev() failure here so you don't
need to bother with this in this patchset.
I will try to find more time to resolve the recent TODOs,
though I keep getting interrupted by other unrelated stuff.
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 21+ messages in thread
* [syzbot ci] Re: fs: support freeze/thaw/mark_dead/sync with shared devices
2026-06-02 10:10 [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Christian Brauner
` (8 preceding siblings ...)
2026-06-02 16:12 ` [PATCH RFC 0/8] fs: support freeze/thaw/mark_dead/sync with shared devices Gao Xiang
@ 2026-06-03 6:43 ` syzbot ci
9 siblings, 0 replies; 21+ messages in thread
From: syzbot ci @ 2026-06-03 6:43 UTC (permalink / raw)
To: axboe, brauner, cem, clm, dsterba, hch, jack, linux-block,
linux-btrfs, linux-erofs, linux-ext4, linux-fsdevel, linux-kernel,
linux-xfs, tytso, viro, xiang
Cc: syzbot, syzkaller-bugs
syzbot ci has tested the following series
[v1] fs: support freeze/thaw/mark_dead/sync with shared devices
https://lore.kernel.org/all/20260602-work-super-bdev_holder_global-v1-0-bb0fd82f3861@kernel.org
* [PATCH RFC 1/8] fs, block: move blk_mode_t and fop_flags_t into <linux/types.h>
* [PATCH RFC 2/8] fs: add a global device to super block hash table
* [PATCH RFC 3/8] fs: refuse to claim any frozen block device
* [PATCH RFC 4/8] xfs: port to fs_bdev_file_open_by_path()
* [PATCH RFC 5/8] btrfs: open via dedicated fs bdev helpers
* [PATCH RFC 6/8] ext4: open via dedicated fs bdev helpers
* [PATCH RFC 7/8] erofs: open via dedicated fs bdev helpers
* [PATCH RFC 8/8] super: make fs_holder_ops private
and found the following issue:
general protection fault in close_fs_devices
Full report is available here:
https://ci.syzbot.org/series/9511f00a-a3c2-44ab-9a0b-2d65de5bbd49
***
general protection fault in close_fs_devices
tree: bpf-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/bpf/bpf-next.git
base: 254f49634ee16a731174d2ae34bc50bd5f45e731
arch: amd64
compiler: Debian clang version 21.1.8 (++20251221033036+2078da43e25a-1~exp1~20251221153213.50), Debian LLD 21.1.8
config: https://ci.syzbot.org/builds/4af26755-5773-453e-807d-ee451d2fdec5/config
syz repro: https://ci.syzbot.org/findings/2d8d96f7-d133-47dc-b4ca-5c0c65e1b6c9/syz_repro
btrfs: Deprecated parameter 'usebackuproot'
BTRFS warning: 'usebackuproot' is deprecated, use 'rescue=usebackuproot' instead
BTRFS: device fsid ed167579-eb65-4e76-9a50-61ac97e9b59d devid 1281 transid 8 /dev/loop1 (7:1) scanned by syz.1.18 (5863)
Oops: general protection fault, probably for non-canonical address 0xdffffc00000000f8: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x00000000000007c0-0x00000000000007c7]
CPU: 1 UID: 0 PID: 5863 Comm: syz.1.18 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:btrfs_close_bdev fs/btrfs/volumes.c:1140 [inline]
RIP: 0010:btrfs_close_one_device fs/btrfs/volumes.c:1161 [inline]
RIP: 0010:close_fs_devices+0x47c/0x860 fs/btrfs/volumes.c:1204
Code: 3c 08 00 74 08 48 89 ef e8 b1 95 38 fe 48 8b 6d 00 b8 c0 07 00 00 48 01 c5 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 86 95 38 fe 48 8b 75 00 4c 89 ff e8
RSP: 0018:ffffc90004007a48 EFLAGS: 00010202
RAX: 00000000000000f8 RBX: 1ffff110368c440b RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00000000000007c0 R08: ffff8881b462206f R09: 1ffff110368c440d
R10: dffffc0000000000 R11: ffffed10368c440e R12: ffff8881b4622000
R13: ffff8881b4622068 R14: ffff8881b4622058 R15: ffff8881707b7a00
FS: 00007f849d6ce6c0(0000) GS:ffff8882a9292000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f849c786a00 CR3: 00000001bbbcc000 CR4: 00000000000006f0
Call Trace:
<TASK>
btrfs_close_devices+0xcd/0x570 fs/btrfs/volumes.c:1219
btrfs_free_fs_info+0x4f/0x360 fs/btrfs/disk-io.c:1205
deactivate_locked_super+0xbc/0x130 fs/super.c:477
btrfs_get_tree_super fs/btrfs/super.c:-1 [inline]
btrfs_get_tree_subvol fs/btrfs/super.c:2087 [inline]
btrfs_get_tree+0xca6/0x1910 fs/btrfs/super.c:2121
vfs_get_tree+0x92/0x2a0 fs/super.c:1928
fc_mount fs/namespace.c:1193 [inline]
do_new_mount_fc fs/namespace.c:3758 [inline]
do_new_mount+0x341/0xd30 fs/namespace.c:3834
do_mount fs/namespace.c:4167 [inline]
__do_sys_mount fs/namespace.c:4383 [inline]
__se_sys_mount+0x31d/0x420 fs/namespace.c:4360
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f849c79e0ca
Code: 48 c7 c2 e8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f849d6cde58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
RAX: ffffffffffffffda RBX: 00007f849d6cdee0 RCX: 00007f849c79e0ca
RDX: 00002000000055c0 RSI: 0000200000000340 RDI: 00007f849d6cdea0
RBP: 00002000000055c0 R08: 00007f849d6cdee0 R09: 0000000000000408
R10: 0000000000000408 R11: 0000000000000246 R12: 0000200000000340
R13: 00007f849d6cdea0 R14: 00000000000055f5 R15: 0000200000000380
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:btrfs_close_bdev fs/btrfs/volumes.c:1140 [inline]
RIP: 0010:btrfs_close_one_device fs/btrfs/volumes.c:1161 [inline]
RIP: 0010:close_fs_devices+0x47c/0x860 fs/btrfs/volumes.c:1204
Code: 3c 08 00 74 08 48 89 ef e8 b1 95 38 fe 48 8b 6d 00 b8 c0 07 00 00 48 01 c5 48 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 48 89 ef e8 86 95 38 fe 48 8b 75 00 4c 89 ff e8
RSP: 0018:ffffc90004007a48 EFLAGS: 00010202
RAX: 00000000000000f8 RBX: 1ffff110368c440b RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 00000000000007c0 R08: ffff8881b462206f R09: 1ffff110368c440d
R10: dffffc0000000000 R11: ffffed10368c440e R12: ffff8881b4622000
R13: ffff8881b4622068 R14: ffff8881b4622058 R15: ffff8881707b7a00
FS: 00007f849d6ce6c0(0000) GS:ffff8882a9292000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000557941c2b058 CR3: 00000001bbbcc000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
0: 3c 08 cmp $0x8,%al
2: 00 74 08 48 add %dh,0x48(%rax,%rcx,1)
6: 89 ef mov %ebp,%edi
8: e8 b1 95 38 fe call 0xfe3895be
d: 48 8b 6d 00 mov 0x0(%rbp),%rbp
11: b8 c0 07 00 00 mov $0x7c0,%eax
16: 48 01 c5 add %rax,%rbp
19: 48 89 e8 mov %rbp,%rax
1c: 48 c1 e8 03 shr $0x3,%rax
20: 48 b9 00 00 00 00 00 movabs $0xdffffc0000000000,%rcx
27: fc ff df
* 2a: 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1) <-- trapping instruction
2e: 74 08 je 0x38
30: 48 89 ef mov %rbp,%rdi
33: e8 86 95 38 fe call 0xfe3895be
38: 48 8b 75 00 mov 0x0(%rbp),%rsi
3c: 4c 89 ff mov %r15,%rdi
3f: e8 .byte 0xe8
***
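As a reading aid for the register dump above: the trapping instruction is the KASAN shadow-memory check, and the faulting address follows from the accessed address by a shift and an add. The sketch below reproduces that arithmetic, assuming the shift of 3 and the offset 0xdffffc0000000000 visible in the disassembly (`shr $0x3` and the `movabs` constant). The macro names mirror the kernel's x86-64 KASAN names, but the values here are taken from this report rather than from a kernel config.

```c
#include <assert.h>
#include <stdint.h>

/* Values read off the disassembly in this report (shr $0x3, movabs constant). */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL

/* Address of the shadow byte that KASAN inspects for @addr. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}
```

Feeding in RBP = 0x7c0 gives 0xdffffc00000000f8, the non-canonical address named in the oops line. An access at such a small address is consistent with a field being read at offset 0x7c0 through a NULL base pointer; the report alone does not establish which pointer was NULL.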
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot@syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.
To test a patch for this bug, please reply with `#syz test`
(should be on a separate line).
The patch should be attached to the email.
Note: arguments like custom git repos and branches are not supported.
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 7/8] erofs: open via dedicated fs bdev helpers
2026-06-02 16:25 ` Gao Xiang
@ 2026-06-03 13:42 ` Christian Brauner
2026-06-10 6:55 ` Gao Xiang
0 siblings, 1 reply; 21+ messages in thread
From: Christian Brauner @ 2026-06-03 13:42 UTC (permalink / raw)
To: Gao Xiang
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christoph Hellwig, Jan Kara
> May I ask if it's an urgent 7.2 work? If not, I could
No no, it's way too late for that this cycle.
> make a preparation patch for the upcoming 7.2 cycle
> to handle erofs_map_dev() failure here so you don't
> need to bother with this in this patchset.
Sounds good. I take it you can just do this yourself without me.
> I will seek more time to resolve the recent todos
Thanks!
> yet always intercepted by other unrelated stuffs.
:)
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH RFC 1/8] fs, block: move blk_mode_t and fop_flags_t into <linux/types.h>
2026-06-02 10:10 ` [PATCH RFC 1/8] fs, block: move blk_mode_t and fop_flags_t into <linux/types.h> Christian Brauner
@ 2026-06-08 9:57 ` Jan Kara
0 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2026-06-08 9:57 UTC (permalink / raw)
To: Christian Brauner
Cc: Christoph Hellwig, Jan Kara, Jens Axboe, Alexander Viro,
linux-block, linux-kernel, linux-fsdevel, Carlos Maiolino,
linux-xfs, Chris Mason, David Sterba, linux-btrfs,
Theodore Ts'o, linux-ext4, Gao Xiang, linux-erofs
On Tue 02-06-26 12:10:07, Christian Brauner wrote:
> blk_mode_t and fop_flags_t are both plain 'unsigned int __bitwise' flag
> typedefs, exactly like the gfp_t, slab_flags_t and fmode_t that already
> live in <linux/types.h>. Move them there so they are available
> everywhere without having to drag in a subsystem header.
>
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Makes sense. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
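For readers unfamiliar with `__bitwise` flag typedefs, here is a minimal user-space sketch of the pattern. It assumes the usual arrangement in which the annotations only carry meaning under sparse (`__CHECKER__`) and expand to nothing for a normal compiler. `BLK_OPEN_READ` is copied from the quoted hunk; the `BLK_OPEN_WRITE` value is taken from mainline as I remember it, and `mode_is_writable()` is a hypothetical helper added for illustration.

```c
#include <assert.h>

/* Outside sparse the annotations expand to nothing. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef unsigned int __bitwise blk_mode_t;

/* open for reading */
#define BLK_OPEN_READ  ((__force blk_mode_t)(1 << 0))
/* open for writing (value assumed, not shown in the quoted hunk) */
#define BLK_OPEN_WRITE ((__force blk_mode_t)(1 << 1))

/* Hypothetical helper: non-zero if @mode requests write access. */
static int mode_is_writable(blk_mode_t mode)
{
	return (mode & BLK_OPEN_WRITE) != 0;
}
```

Because the typedef is just an annotated `unsigned int`, it has no dependencies of its own, which is why it can sit in a generic header alongside `gfp_t` and `fmode_t`.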
> ---
> include/linux/blkdev.h | 2 --
> include/linux/fs.h | 2 --
> include/linux/types.h | 2 ++
> 3 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 890128cdea1c..c8494d64a69d 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -126,8 +126,6 @@ struct blk_integrity {
> unsigned char pi_tuple_size;
> };
>
> -typedef unsigned int __bitwise blk_mode_t;
> -
> /* open for reading */
> #define BLK_OPEN_READ ((__force blk_mode_t)(1 << 0))
> /* open for writing */
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 11559c513dfb..e9346be8470f 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1921,8 +1921,6 @@ struct dir_context {
> struct io_uring_cmd;
> struct offset_ctx;
>
> -typedef unsigned int __bitwise fop_flags_t;
> -
> struct file_operations {
> struct module *owner;
> fop_flags_t fop_flags;
> diff --git a/include/linux/types.h b/include/linux/types.h
> index 608050dbca6a..ef026585420b 100644
> --- a/include/linux/types.h
> +++ b/include/linux/types.h
> @@ -163,6 +163,8 @@ typedef u32 dma_addr_t;
> typedef unsigned int __bitwise gfp_t;
> typedef unsigned int __bitwise slab_flags_t;
> typedef unsigned int __bitwise fmode_t;
> +typedef unsigned int __bitwise blk_mode_t;
> +typedef unsigned int __bitwise fop_flags_t;
>
> #ifdef CONFIG_PHYS_ADDR_T_64BIT
> typedef u64 phys_addr_t;
>
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
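For reference, a minimal userspace sketch of what such a flag typedef amounts to. It assumes the usual convention that the sparse annotations compile away unless __CHECKER__ is defined. BLK_OPEN_READ is taken from the quoted hunk; the BLK_OPEN_WRITE bit position and mode_is_writable() are illustrative assumptions, not part of the patch.

```c
#include <stdbool.h>

/*
 * Without sparse (__CHECKER__ undefined) the annotations expand to nothing,
 * so a __bitwise typedef is a plain unsigned int as far as the compiler is
 * concerned. Only sparse enforces the type separation.
 */
#ifndef __bitwise
# ifdef __CHECKER__
#  define __bitwise __attribute__((bitwise))
# else
#  define __bitwise
# endif
#endif
#ifndef __force
# ifdef __CHECKER__
#  define __force __attribute__((force))
# else
#  define __force
# endif
#endif

typedef unsigned int __bitwise blk_mode_t;

/* open for reading (as in the quoted hunk) */
#define BLK_OPEN_READ	((__force blk_mode_t)(1 << 0))
/* open for writing (bit position assumed for this sketch) */
#define BLK_OPEN_WRITE	((__force blk_mode_t)(1 << 1))

/* Hypothetical helper: true if the mode asks for write access. */
static bool mode_is_writable(blk_mode_t mode)
{
	return (mode & BLK_OPEN_WRITE) != 0;
}
```

Since nothing in the typedef depends on block-layer or VFS definitions, it can sit in a header that has no subsystem dependencies, which is the point of the move.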
* Re: [PATCH RFC 3/8] fs: refuse to claim any frozen block device
2026-06-02 10:10 ` [PATCH RFC 3/8] fs: refuse to claim any frozen block device Christian Brauner
@ 2026-06-08 10:01 ` Jan Kara
0 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2026-06-08 10:01 UTC (permalink / raw)
To: Christian Brauner
Cc: Christoph Hellwig, Jan Kara, Jens Axboe, Alexander Viro,
linux-block, linux-kernel, linux-fsdevel, Carlos Maiolino,
linux-xfs, Chris Mason, David Sterba, linux-btrfs,
Theodore Ts'o, linux-ext4, Gao Xiang, linux-erofs
On Tue 02-06-26 12:10:09, Christian Brauner wrote:
> setup_bdev_super() already refuses to bring a filesystem up on a frozen
> block device but only for the primary device. Now that filesystems claim
> every device through fs_bdev_file_open_by_{dev,path}(), do that check
> once in the registration helper so it covers all of them.
>
> Drop the now-redundant check from setup_bdev_super().
>
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
> ---
> fs/super.c | 21 +++++++++++----------
> 1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index e0174d5819a0..cea743f699e4 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1690,6 +1690,17 @@ static int fs_bdev_register(struct file *bdev_file, struct super_block *sb)
> sb->s_count++;
> spin_unlock(&sb_lock);
>
> + /*
> + * Don't bring a filesystem up on a frozen device. The entry is already
> + * published, so a freeze either is seen here or finds it and waits in
> + * super_lock() until this mount is born or (on -EBUSY) dies. The mount
> + * aborts, so the entry is torn down without rebalancing @fs_bdev_active.
> + */
> + if (atomic_read(&file_bdev(bdev_file)->bd_fsfreeze_count) > 0) {
> + fs_bdev_holder_put(h);
> + return -EBUSY;
> + }
> +
> return 0;
> }
Shouldn't this check also cover the branch where we only increase the
refcount? Or is a filesystem whose superblock claims the bdev multiple
times, and can get frozen in between, too insane to care about?
Honza
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
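A rough userspace model of the question above, assuming a single (device, superblock) entry with a claim counter. struct model_bdev, struct model_entry and model_claim() are hypothetical names. The model is single-threaded and ignores the locking and publication ordering the real code depends on; it only shows the frozen check sitting on a common tail, so that a repeated claim is refused as well:

```c
#include <errno.h>

/* Stand-in for the block device: only the freeze count matters here. */
struct model_bdev {
	int fsfreeze_count;
};

/* Stand-in for one (device, superblock) entry; claims == 0 means unused. */
struct model_entry {
	unsigned int claims;
};

/*
 * Take one claim on the entry. Both the first claim and a repeated claim
 * run through the same frozen check, and a refused claim is rolled back.
 */
static int model_claim(const struct model_bdev *bdev, struct model_entry *e)
{
	e->claims++;	/* first claim publishes, later claims just count */

	if (bdev->fsfreeze_count > 0) {
		e->claims--;	/* undo: the device is frozen */
		return -EBUSY;
	}
	return 0;
}
```

Whether the real helper should behave this way is exactly the open question; the sketch only illustrates what "common for both branches" would mean.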
* Re: [PATCH RFC 2/8] fs: add a global device to super block hash table
2026-06-02 10:10 ` [PATCH RFC 2/8] fs: add a global device to super block hash table Christian Brauner
@ 2026-06-08 10:14 ` Jan Kara
2026-06-16 12:34 ` Christoph Hellwig
1 sibling, 0 replies; 21+ messages in thread
From: Jan Kara @ 2026-06-08 10:14 UTC (permalink / raw)
To: Christian Brauner
Cc: Christoph Hellwig, Jan Kara, Jens Axboe, Alexander Viro,
linux-block, linux-kernel, linux-fsdevel, Carlos Maiolino,
linux-xfs, Chris Mason, David Sterba, linux-btrfs,
Theodore Ts'o, linux-ext4, Gao Xiang, linux-erofs
On Tue 02-06-26 12:10:08, Christian Brauner wrote:
> fs_holder_ops recovers the owning superblock from bdev->bd_holder, which
> forces the holder to be exactly one superblock and prevents several
> superblocks from sharing one block device. That's what erofs is doing.
>
> Introduce a global dev_t-keyed rhltable mapping each block device to the
> superblock(s) using it. The holder argument becomes purely the block
> layer's exclusivity token (a superblock, or a file_system_type for
> shared devices) and is no longer needed by the fs specific callbacks.
>
> Registration keeps one entry per (device, superblock). When a filesystem
> claims a device it already uses (xfs with its log on the data device), no
> second entry is added, so each superblock is acted on once.
>
> Each table entry holds a passive reference (s_count) on its superblock,
> so the struct stays valid for as long as the entry is reachable. The
> callbacks look the device up in the table and act on every superblock
> using it:
>
> Unlinking an entry is deferred to the last unpin, so a cursor never
> resumes from a removed node. After this it's possible to act on all
> superblocks that share a given device.
>
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Looks good! One comment below:
> static void fs_bdev_mark_dead(struct block_device *bdev, bool surprise)
> {
> - struct super_block *sb;
> + struct fs_bdev_holder *h;
> + dev_t dev = bdev->bd_dev;
>
> - sb = bdev_super_lock(bdev, false);
> - if (!sb)
> - return;
> + mutex_unlock(&bdev->bd_holder_lock);
The moment we drop bd_holder_lock, nothing prevents the bdev owner from
changing. So we can end up missing the ->mark_dead callback of the new
holder, and similarly for all the other holder ops. I didn't find a
situation where it would actually matter, so I think we're fine, but it's a
potential catch. Anyway, feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
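To make the changed lookup concrete: the callbacks no longer derive one superblock from bd_holder but walk every entry registered for the device number. A minimal single-threaded model of that, with a fixed array standing in for the rhltable and all model_* names hypothetical. It has no locking or RCU, so it says nothing about the race mentioned above:

```c
#include <stddef.h>

#define MODEL_MAX_ENTRIES 8

/* One entry per (device, superblock) pair; claims counts repeated opens. */
struct model_holder {
	unsigned int dev;
	int sb_id;
	unsigned int claims;	/* 0 means the slot is free */
};

static struct model_holder model_table[MODEL_MAX_ENTRIES];

/* Register a claim; returns 0 on success, -1 if the table is full. */
static int model_table_add(unsigned int dev, int sb_id)
{
	struct model_holder *free_slot = NULL;

	for (size_t i = 0; i < MODEL_MAX_ENTRIES; i++) {
		struct model_holder *h = &model_table[i];

		if (h->claims && h->dev == dev && h->sb_id == sb_id) {
			h->claims++;	/* same pair again: no second entry */
			return 0;
		}
		if (!h->claims && !free_slot)
			free_slot = h;
	}
	if (!free_slot)
		return -1;

	free_slot->dev = dev;
	free_slot->sb_id = sb_id;
	free_slot->claims = 1;
	return 0;
}

/* Collect the superblocks using @dev; returns how many were found. */
static size_t model_lookup(unsigned int dev, int *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < MODEL_MAX_ENTRIES && n < max; i++)
		if (model_table[i].claims && model_table[i].dev == dev)
			out[n++] = model_table[i].sb_id;
	return n;
}
```

For most filesystems a lookup yields one superblock; for a shared device it yields several, and each is acted on once even if it claimed the device more than once.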
>
> - if (sb->s_op->remove_bdev) {
> - int ret;
> + for (h = fs_bdev_first(dev); h; h = fs_bdev_next(h)) {
> + struct super_block *sb = h->sb;
>
> - ret = sb->s_op->remove_bdev(sb, bdev);
> - if (!ret) {
> - super_unlock_shared(sb);
> - return;
> + if (!super_lock_shared(sb))
> + continue;
> + if (sb->s_root && (sb->s_flags & SB_ACTIVE)) {
> + if (!sb->s_op->remove_bdev ||
> + sb->s_op->remove_bdev(sb, bdev)) {
> + if (!surprise)
> + sync_filesystem(sb);
> + shrink_dcache_sb(sb);
> + evict_inodes(sb);
> + if (sb->s_op->shutdown)
> + sb->s_op->shutdown(sb);
> + }
> }
> - /* Fallback to shutdown. */
> + super_unlock_shared(sb);
> }
> -
> - if (!surprise)
> - sync_filesystem(sb);
> - shrink_dcache_sb(sb);
> - evict_inodes(sb);
> - if (sb->s_op->shutdown)
> - sb->s_op->shutdown(sb);
> -
> - super_unlock_shared(sb);
> }
>
> static void fs_bdev_sync(struct block_device *bdev)
> {
> - struct super_block *sb;
> + struct fs_bdev_holder *h;
> + dev_t dev = bdev->bd_dev;
>
> - sb = bdev_super_lock(bdev, false);
> - if (!sb)
> - return;
> + mutex_unlock(&bdev->bd_holder_lock);
>
> - sync_filesystem(sb);
> - super_unlock_shared(sb);
> -}
> + for (h = fs_bdev_first(dev); h; h = fs_bdev_next(h)) {
> + struct super_block *sb = h->sb;
>
> -static struct super_block *get_bdev_super(struct block_device *bdev)
> -{
> - bool active = false;
> - struct super_block *sb;
> -
> - sb = bdev_super_lock(bdev, true);
> - if (sb) {
> - active = atomic_inc_not_zero(&sb->s_active);
> - super_unlock_excl(sb);
> + if (!super_lock_shared(sb))
> + continue;
> + if (sb->s_root && (sb->s_flags & SB_ACTIVE))
> + sync_filesystem(sb);
> + super_unlock_shared(sb);
> }
> - if (!active)
> - return NULL;
> - return sb;
> }
>
> /**
> - * fs_bdev_freeze - freeze owning filesystem of block device
> + * fs_bdev_freeze - freeze every superblock using a block device
> * @bdev: block device
> *
> - * Freeze the filesystem that owns this block device if it is still
> - * active.
> - *
> - * A filesystem that owns multiple block devices may be frozen from each
> - * block device and won't be unfrozen until all block devices are
> - * unfrozen. Each block device can only freeze the filesystem once as we
> - * nest freezes for block devices in the block layer.
> + * Freeze each live superblock using @bdev. A superblock owning several block
> + * devices is frozen once per device and stays frozen until all are thawed; the
> + * block layer nests these freezes so the count stays balanced.
> *
> - * Return: If the freeze was successful zero is returned. If the freeze
> - * failed a negative error code is returned.
> + * Return: 0, or the error from the one superblock on a single-fs device. When
> + * several superblocks share @bdev a per-superblock failure is swallowed
> + * (see below), but a sync_blockdev() failure is always reported.
> */
> static int fs_bdev_freeze(struct block_device *bdev)
> {
> - struct super_block *sb;
> - int error = 0;
> + dev_t dev = bdev->bd_dev;
> + struct fs_bdev_holder *h;
> + unsigned int count = 0;
> + int error = 0, err;
>
> lockdep_assert_held(&bdev->bd_fsfreeze_mutex);
>
> - sb = get_bdev_super(bdev);
> - if (!sb)
> - return -EINVAL;
> + mutex_unlock(&bdev->bd_holder_lock);
>
> - if (sb->s_op->freeze_super)
> - error = sb->s_op->freeze_super(sb,
> - FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
> - else
> - error = freeze_super(sb,
> - FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
> + for (h = fs_bdev_first(dev); h; h = fs_bdev_next(h)) {
> + if (!atomic_inc_not_zero(&h->sb->s_active))
> + continue;
> + err = fs_super_freeze(h->sb);
> + if (err && !error)
> + error = err;
> + deactivate_super(h->sb);
> + count++;
> + }
> +
> + /*
> + * When several superblocks share the device, keep it frozen even if some
> + * of them failed to freeze and swallow the error: rolling the rest back
> + * via thaw_super() can fail too, so neither is a clear win. A single
> + * filesystem (count == 1) still reports its error.
> + */
> + if (error && count > 1)
> + error = 0;
> if (!error)
> error = sync_blockdev(bdev);
> - deactivate_super(sb);
> return error;
> }
>
> /**
> - * fs_bdev_thaw - thaw owning filesystem of block device
> + * fs_bdev_thaw - thaw every superblock using a block device
> * @bdev: block device
> *
> - * Thaw the filesystem that owns this block device.
> + * The counterpart to fs_bdev_freeze(): thaw each live superblock using @bdev.
> + * A zero return does not imply a superblock is fully unfrozen; it may have been
> + * frozen more than once (by the kernel or via another device).
> *
> - * A filesystem that owns multiple block devices may be frozen from each
> - * block device and won't be unfrozen until all block devices are
> - * unfrozen. Each block device can only freeze the filesystem once as we
> - * nest freezes for block devices in the block layer.
> - *
> - * Return: If the thaw was successful zero is returned. If the thaw
> - * failed a negative error code is returned. If this function
> - * returns zero it doesn't mean that the filesystem is unfrozen
> - * as it may have been frozen multiple times (kernel may hold a
> - * freeze or might be frozen from other block devices).
> + * Return: 0, or the first error on a single-fs device; a shared device swallows
> + * per-superblock errors, as fs_bdev_freeze() does.
> */
> static int fs_bdev_thaw(struct block_device *bdev)
> {
> - struct super_block *sb;
> - int error;
> + dev_t dev = bdev->bd_dev;
> + struct fs_bdev_holder *h;
> + unsigned int count = 0;
> + int error = 0, err;
>
> lockdep_assert_held(&bdev->bd_fsfreeze_mutex);
>
> - /*
> - * The block device may have been frozen before it was claimed by a
> - * filesystem. Concurrently another process might try to mount that
> - * frozen block device and has temporarily claimed the block device for
> - * that purpose causing a concurrent fs_bdev_thaw() to end up here. The
> - * mounter is already about to abort mounting because they still saw an
> - * elevanted bdev->bd_fsfreeze_count so get_bdev_super() will return
> - * NULL in that case.
> - */
> - sb = get_bdev_super(bdev);
> - if (!sb)
> - return -EINVAL;
> + mutex_unlock(&bdev->bd_holder_lock);
>
> - if (sb->s_op->thaw_super)
> - error = sb->s_op->thaw_super(sb,
> - FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
> - else
> - error = thaw_super(sb,
> - FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
> - deactivate_super(sb);
> + for (h = fs_bdev_first(dev); h; h = fs_bdev_next(h)) {
> + if (!atomic_inc_not_zero(&h->sb->s_active))
> + continue;
> + err = fs_super_thaw(h->sb);
> + if (err && !error)
> + error = err;
> + deactivate_super(h->sb);
> + count++;
> + }
> +
> + /* Shared device: swallow per-superblock errors, like fs_bdev_freeze(). */
> + if (error && count > 1)
> + error = 0;
> return error;
> }
>
> @@ -1602,6 +1651,131 @@ const struct blk_holder_ops fs_holder_ops = {
> };
> EXPORT_SYMBOL_GPL(fs_holder_ops);
>
> +static int fs_bdev_register(struct file *bdev_file, struct super_block *sb)
> +{
> + dev_t dev = file_bdev(bdev_file)->bd_dev;
> + struct rhlist_head *list, *pos;
> + struct fs_bdev_holder *h;
> + int err;
> +
> + /*
> + * A superblock may claim one device more than once (xfs with its log on
> + * the data device). Keep a single entry per (device, superblock) and
> + * count the claims in @fs_bdev_active; the entry lives until the last one
> + * is released.
> + */
> + scoped_guard(rcu) {
> + list = rhltable_lookup(&fs_bdev_supers, &dev, fs_bdev_params);
> + rhl_for_each_entry_rcu(h, pos, list, node)
> + if (h->sb == sb && refcount_inc_not_zero(&h->fs_bdev_active))
> + return 0;
> + }
> +
> + h = kmalloc(sizeof(*h), GFP_KERNEL);
> + if (!h)
> + return -ENOMEM;
> + h->dev = dev;
> + h->sb = sb;
> + refcount_set(&h->fs_bdev_passive, 1);
> + refcount_set(&h->fs_bdev_active, 1);
> +
> + err = rhltable_insert(&fs_bdev_supers, &h->node, fs_bdev_params);
> + if (err) {
> + kfree(h);
> + return err;
> + }
> +
> + /* The sb->s_count ref keeps @h->sb valid for as long as the entry exists. */
> + spin_lock(&sb_lock);
> + sb->s_count++;
> + spin_unlock(&sb_lock);
> +
> + return 0;
> +}
> +
> +/**
> + * fs_bdev_file_open_by_dev - claim a block device on behalf of a superblock
> + * @dev: block device number
> + * @mode: open mode
> + * @holder: block-layer exclusivity token (a superblock, or the file_system_type
> + * when the device may be shared by several superblocks of that type)
> + * @sb: superblock to drive fs_holder_ops events for
> + *
> + * Open @dev with &fs_holder_ops and register that @sb uses it, so device
> + * removal/sync/freeze/thaw are propagated to @sb (and any other superblock
> + * sharing @dev). Must be paired with fs_bdev_file_release().
> + *
> + * Return: an opened block-device file or an ERR_PTR().
> + */
> +struct file *fs_bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> + struct super_block *sb)
> +{
> + struct file *bdev_file;
> + int err;
> +
> + bdev_file = bdev_file_open_by_dev(dev, mode, holder, &fs_holder_ops);
> + if (IS_ERR(bdev_file))
> + return bdev_file;
> +
> + err = fs_bdev_register(bdev_file, sb);
> + if (err) {
> + bdev_fput(bdev_file);
> + return ERR_PTR(err);
> + }
> + return bdev_file;
> +}
> +EXPORT_SYMBOL_GPL(fs_bdev_file_open_by_dev);
> +
> +struct file *fs_bdev_file_open_by_path(const char *path, blk_mode_t mode,
> + void *holder, struct super_block *sb)
> +{
> + struct file *bdev_file;
> + int err;
> +
> + bdev_file = bdev_file_open_by_path(path, mode, holder, &fs_holder_ops);
> + if (IS_ERR(bdev_file))
> + return bdev_file;
> +
> + err = fs_bdev_register(bdev_file, sb);
> + if (err) {
> + bdev_fput(bdev_file);
> + return ERR_PTR(err);
> + }
> + return bdev_file;
> +}
> +EXPORT_SYMBOL_GPL(fs_bdev_file_open_by_path);
> +
> +/**
> + * fs_bdev_file_release - release a block device claimed for a superblock
> + * @bdev_file: file returned by fs_bdev_file_open_by_{dev,path}()
> + * @sb: superblock the device was claimed for
> + *
> + * Drop one claim on the {dev, @sb} entry; the last claim unregisters it (a
> + * pinning cursor defers the actual unlink). Then close the block device.
> + */
> +void fs_bdev_file_release(struct file *bdev_file, struct super_block *sb)
> +{
> + dev_t dev = file_bdev(bdev_file)->bd_dev;
> + struct fs_bdev_holder *h, *found = NULL;
> + struct rhlist_head *list, *pos;
> +
> + rcu_read_lock();
> + list = rhltable_lookup(&fs_bdev_supers, &dev, fs_bdev_params);
> + rhl_for_each_entry_rcu(h, pos, list, node) {
> + if (h->sb != sb)
> + continue;
> + /* At most one entry per (dev, sb); the last claim drops the bias. */
> + if (refcount_dec_and_test(&h->fs_bdev_active))
> + found = h;
> + break;
> + }
> + rcu_read_unlock();
> + if (found)
> + fs_bdev_holder_put(found);
> + bdev_fput(bdev_file);
> +}
> +EXPORT_SYMBOL_GPL(fs_bdev_file_release);
> +
> int setup_bdev_super(struct super_block *sb, int sb_flags,
> struct fs_context *fc)
> {
> @@ -1609,7 +1783,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
> struct file *bdev_file;
> struct block_device *bdev;
>
> - bdev_file = bdev_file_open_by_dev(sb->s_dev, mode, sb, &fs_holder_ops);
> + bdev_file = fs_bdev_file_open_by_dev(sb->s_dev, mode, sb, sb);
> if (IS_ERR(bdev_file)) {
> if (fc)
> errorf(fc, "%s: Can't open blockdev", fc->source);
> @@ -1623,7 +1797,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
> * writable from userspace even for a read-only block device.
> */
> if ((mode & BLK_OPEN_WRITE) && bdev_read_only(bdev)) {
> - bdev_fput(bdev_file);
> + fs_bdev_file_release(bdev_file, sb);
> return -EACCES;
> }
>
> @@ -1634,7 +1808,7 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
> if (atomic_read(&bdev->bd_fsfreeze_count) > 0) {
> if (fc)
> warnf(fc, "%pg: Can't mount, blockdev is frozen", bdev);
> - bdev_fput(bdev_file);
> + fs_bdev_file_release(bdev_file, sb);
> return -EBUSY;
> }
> spin_lock(&sb_lock);
> @@ -1725,7 +1899,7 @@ void kill_block_super(struct super_block *sb)
> generic_shutdown_super(sb);
> if (bdev) {
> sync_blockdev(bdev);
> - bdev_fput(sb->s_bdev_file);
> + fs_bdev_file_release(sb->s_bdev_file, sb);
> }
> }
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c8494d64a69d..43d37c02febf 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1760,13 +1760,6 @@ struct blk_holder_ops {
> int (*thaw)(struct block_device *bdev);
> };
>
> -/*
> - * For filesystems using @fs_holder_ops, the @holder argument passed to
> - * helpers used to open and claim block devices via
> - * bd_prepare_to_claim() must point to a superblock.
> - */
> -extern const struct blk_holder_ops fs_holder_ops;
> -
> /*
> * Return the correct open flags for blkdev_get_by_* for super block flags
> * as stored in sb->s_flags.
> diff --git a/include/linux/fs/super.h b/include/linux/fs/super.h
> index f21ffbb6dea5..721d842e3b24 100644
> --- a/include/linux/fs/super.h
> +++ b/include/linux/fs/super.h
> @@ -235,4 +235,11 @@ int freeze_super(struct super_block *super, enum freeze_holder who,
> int thaw_super(struct super_block *super, enum freeze_holder who,
> const void *freeze_owner);
>
> +struct file;
> +struct file *fs_bdev_file_open_by_dev(dev_t dev, blk_mode_t mode, void *holder,
> + struct super_block *sb);
> +struct file *fs_bdev_file_open_by_path(const char *path, blk_mode_t mode,
> + void *holder, struct super_block *sb);
> +void fs_bdev_file_release(struct file *bdev_file, struct super_block *sb);
> +
> #endif /* _LINUX_FS_SUPER_H */
>
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
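The error policy that the quoted fs_bdev_freeze() comment describes can be stated compactly. A minimal sketch, assuming the per-superblock results are already available as an array; model_freeze_result() and its parameters are hypothetical and only mirror the logic visible in the diff (first error wins, a shared device swallows it, a sync failure is always reported):

```c
#include <errno.h>
#include <stddef.h>

/*
 * @errs:     result of freezing each live superblock on the device
 * @count:    number of superblocks that were actually tried
 * @sync_err: result the final sync_blockdev() step would return
 */
static int model_freeze_result(const int *errs, size_t count, int sync_err)
{
	int error = 0;

	/* Remember only the first per-superblock failure. */
	for (size_t i = 0; i < count; i++)
		if (errs[i] && !error)
			error = errs[i];

	/* Shared device: keep it frozen and swallow the error. */
	if (error && count > 1)
		error = 0;

	/* The sync step runs only if no error is pending. */
	if (!error)
		error = sync_err;
	return error;
}
```

With one superblock the caller sees that superblock's error; with two or more it sees 0 unless the sync itself fails.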
* Re: [PATCH RFC 4/8] xfs: port to fs_bdev_file_open_by_path()
2026-06-02 10:10 ` [PATCH RFC 4/8] xfs: port to fs_bdev_file_open_by_path() Christian Brauner
@ 2026-06-08 10:15 ` Jan Kara
0 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2026-06-08 10:15 UTC (permalink / raw)
To: Christian Brauner
Cc: Christoph Hellwig, Jan Kara, Jens Axboe, Alexander Viro,
linux-block, linux-kernel, linux-fsdevel, Carlos Maiolino,
linux-xfs, Chris Mason, David Sterba, linux-btrfs,
Theodore Ts'o, linux-ext4, Gao Xiang, linux-erofs
On Tue 02-06-26 12:10:10, Christian Brauner wrote:
> Route opens through fs_bdev_file_open_by_path() so each external device
> is registered against mp->m_super, and convert the matching releases.
>
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/xfs/xfs_buf.c | 2 +-
> fs/xfs/xfs_super.c | 10 +++++-----
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 580d40a5ee57..3d3b29edb156 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1601,7 +1601,7 @@ xfs_free_buftarg(
> fs_put_dax(btp->bt_daxdev, btp->bt_mount);
> /* the main block device is closed by kill_block_super */
> if (btp->bt_bdev != btp->bt_mount->m_super->s_bdev)
> - bdev_fput(btp->bt_file);
> + fs_bdev_file_release(btp->bt_file, btp->bt_mount->m_super);
> kfree(btp);
> }
>
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index f8de44443e81..304667210695 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -400,8 +400,8 @@ xfs_blkdev_get(
> blk_mode_t mode;
>
> mode = sb_open_mode(mp->m_super->s_flags);
> - *bdev_filep = bdev_file_open_by_path(name, mode,
> - mp->m_super, &fs_holder_ops);
> + *bdev_filep = fs_bdev_file_open_by_path(name, mode,
> + mp->m_super, mp->m_super);
> if (IS_ERR(*bdev_filep)) {
> error = PTR_ERR(*bdev_filep);
> *bdev_filep = NULL;
> @@ -526,7 +526,7 @@ xfs_open_devices(
> mp->m_logdev_targp = mp->m_ddev_targp;
> /* Handle won't be used, drop it */
> if (logdev_file)
> - bdev_fput(logdev_file);
> + fs_bdev_file_release(logdev_file, mp->m_super);
> }
>
> return 0;
> @@ -538,10 +538,10 @@ xfs_open_devices(
> xfs_free_buftarg(mp->m_ddev_targp);
> out_close_rtdev:
> if (rtdev_file)
> - bdev_fput(rtdev_file);
> + fs_bdev_file_release(rtdev_file, mp->m_super);
> out_close_logdev:
> if (logdev_file)
> - bdev_fput(logdev_file);
> + fs_bdev_file_release(logdev_file, mp->m_super);
> return error;
> }
>
>
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH RFC 6/8] ext4: open via dedicated fs bdev helpers
2026-06-02 10:10 ` [PATCH RFC 6/8] ext4: " Christian Brauner
@ 2026-06-08 10:18 ` Jan Kara
0 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2026-06-08 10:18 UTC (permalink / raw)
To: Christian Brauner
Cc: Christoph Hellwig, Jan Kara, Jens Axboe, Alexander Viro,
linux-block, linux-kernel, linux-fsdevel, Carlos Maiolino,
linux-xfs, Chris Mason, David Sterba, linux-btrfs,
Theodore Ts'o, linux-ext4, Gao Xiang, linux-erofs
On Tue 02-06-26 12:10:12, Christian Brauner wrote:
> Route opens through fs_bdev_file_open_by_path() so each external device
> is registered against the correct superblock, and convert the matching
> releases.
>
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/ext4/super.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 6a77db4d3124..8108d999008e 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -5793,7 +5793,7 @@ failed_mount8: __maybe_unused
> brelse(sbi->s_sbh);
> if (sbi->s_journal_bdev_file) {
> invalidate_bdev(file_bdev(sbi->s_journal_bdev_file));
> - bdev_fput(sbi->s_journal_bdev_file);
> + fs_bdev_file_release(sbi->s_journal_bdev_file, sb);
> }
> out_fail:
> invalidate_bdev(sb->s_bdev);
> @@ -5972,9 +5972,9 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
> struct ext4_super_block *es;
> int errno;
>
> - bdev_file = bdev_file_open_by_dev(j_dev,
> + bdev_file = fs_bdev_file_open_by_dev(j_dev,
> BLK_OPEN_READ | BLK_OPEN_WRITE | BLK_OPEN_RESTRICT_WRITES,
> - sb, &fs_holder_ops);
> + sb, sb);
> if (IS_ERR(bdev_file)) {
> ext4_msg(sb, KERN_ERR,
> "failed to open journal device unknown-block(%u,%u) %ld",
> @@ -6034,7 +6034,7 @@ static struct file *ext4_get_journal_blkdev(struct super_block *sb,
> out_bh:
> brelse(bh);
> out_bdev:
> - bdev_fput(bdev_file);
> + fs_bdev_file_release(bdev_file, sb);
> return ERR_PTR(errno);
> }
>
> @@ -6073,7 +6073,7 @@ static journal_t *ext4_open_dev_journal(struct super_block *sb,
> out_journal:
> ext4_journal_destroy(EXT4_SB(sb), journal);
> out_bdev:
> - bdev_fput(bdev_file);
> + fs_bdev_file_release(bdev_file, sb);
> return ERR_PTR(errno);
> }
>
> @@ -7492,7 +7492,7 @@ static void ext4_kill_sb(struct super_block *sb)
> kill_block_super(sb);
>
> if (bdev_file)
> - bdev_fput(bdev_file);
> + fs_bdev_file_release(bdev_file, sb);
> }
>
> static struct file_system_type ext4_fs_type = {
>
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH RFC 8/8] super: make fs_holder_ops private
2026-06-02 10:10 ` [PATCH RFC 8/8] super: make fs_holder_ops private Christian Brauner
@ 2026-06-08 10:18 ` Jan Kara
0 siblings, 0 replies; 21+ messages in thread
From: Jan Kara @ 2026-06-08 10:18 UTC (permalink / raw)
To: Christian Brauner
Cc: Christoph Hellwig, Jan Kara, Jens Axboe, Alexander Viro,
linux-block, linux-kernel, linux-fsdevel, Carlos Maiolino,
linux-xfs, Chris Mason, David Sterba, linux-btrfs,
Theodore Ts'o, linux-ext4, Gao Xiang, linux-erofs
On Tue 02-06-26 12:10:14, Christian Brauner wrote:
> There's no need to expose it anymore.
>
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Looks good. Feel free to add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
> ---
> fs/super.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/fs/super.c b/fs/super.c
> index cea743f699e4..983c2fbf5202 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -1643,13 +1643,12 @@ static int fs_bdev_thaw(struct block_device *bdev)
> return error;
> }
>
> -const struct blk_holder_ops fs_holder_ops = {
> +static const struct blk_holder_ops fs_holder_ops = {
> .mark_dead = fs_bdev_mark_dead,
> .sync = fs_bdev_sync,
> .freeze = fs_bdev_freeze,
> .thaw = fs_bdev_thaw,
> };
> -EXPORT_SYMBOL_GPL(fs_holder_ops);
>
> static int fs_bdev_register(struct file *bdev_file, struct super_block *sb)
> {
>
> --
> 2.47.3
>
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
* Re: [PATCH RFC 7/8] erofs: open via dedicated fs bdev helpers
2026-06-03 13:42 ` Christian Brauner
@ 2026-06-10 6:55 ` Gao Xiang
0 siblings, 0 replies; 21+ messages in thread
From: Gao Xiang @ 2026-06-10 6:55 UTC (permalink / raw)
To: Christian Brauner
Cc: Jens Axboe, Alexander Viro, linux-block, linux-kernel,
linux-fsdevel, Carlos Maiolino, linux-xfs, Chris Mason,
David Sterba, linux-btrfs, Theodore Ts'o, linux-ext4,
Gao Xiang, linux-erofs, Christoph Hellwig, Jan Kara
Hi Christian,
On 2026/6/3 21:42, Christian Brauner wrote:
>> May I ask if it's an urgent 7.2 work? If not, I could
>
> No no, it's way too late for that this cycle.
>
>> make a preparation patch for the upcoming 7.2 cycle
>> to handle erofs_map_dev() failure here so you don't
>> need to bother with this in this patchset.
>
> Sounds good. I take it you can just do this yourself without me.
>
>> I will seek more time to resolve the recent todos
>
> Thanks!
>
>> yet always intercepted by other unrelated stuffs.
>
> :)
I removed the .shutdown() and .remove_bdev() implementations since I
don't think they are necessary for immutable filesystems, but I would
like to know your thoughts too. My overall comments are documented in
the commit message below:
From 933f6c6f2e704116d9a15815c880196bec7b9ee3 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Tue, 2 Jun 2026 12:10:13 +0200
Subject: [PATCH] erofs: open via dedicated fs bdev helpers
Route opens through fs_bdev_file_open_by_path() so each external device
is registered against the correct superblock, and convert the matching
releases.
Gao Xiang: I think typical immutable filesystems don't need .shutdown()
and .remove_bdev() for the following reasons:
- blk_mark_disk_dead() sets GD_DEAD in advance of fs_bdev_mark_dead()
so that the following bios will fail immediately; block_device
references are still valid so it seems overkill to handle dead
blockdevs in the deep filesystem I/O submission path.
- Immutable filesystems like EROFS don't have write paths and journals,
so they don't need to block writes (i.e., new dirty pages) or metadata
changes, or to abort journals.
- The comment above loop_change_fd() documents a valid read-only use
case we need to support anyway, but it calls disk_force_media_change(),
which later calls fs_bdev_mark_dead(): we don't want loop_change_fd()
to shut down the active filesystems and return -EIO unconditionally.
Currently I think the default behavior (shrink_dcache_sb + evict_inodes)
in fs_bdev_mark_dead() is enough for immutable filesystems; this is
documented in the commit message here for later reference.
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
---
fs/erofs/super.c | 35 +++++++++++++++++++++++------------
1 file changed, 23 insertions(+), 12 deletions(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index 802add6652fd..def9cbfbc9d8 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -153,8 +153,8 @@ static int erofs_init_device(struct erofs_buf *buf, struct super_block *sb,
} else if (!sbi->devs->flatdev) {
file = erofs_is_fileio_mode(sbi) ?
filp_open(dif->path, O_RDONLY | O_LARGEFILE, 0) :
- bdev_file_open_by_path(dif->path,
- BLK_OPEN_READ, sb->s_type, NULL);
+ fs_bdev_file_open_by_path(dif->path,
+ BLK_OPEN_READ, sb->s_type, sb);
if (IS_ERR(file)) {
if (file == ERR_PTR(-ENOTBLK))
return -EINVAL;
@@ -843,11 +843,16 @@ static int erofs_fc_reconfigure(struct fs_context *fc)
static int erofs_release_device_info(int id, void *ptr, void *data)
{
+ struct super_block *sb = data;
struct erofs_device_info *dif = ptr;
fs_put_dax(dif->dax_dev, NULL);
- if (dif->file)
- fput(dif->file);
+ if (dif->file) {
+ if (S_ISBLK(file_inode(dif->file)->i_mode))
+ fs_bdev_file_release(dif->file, sb);
+ else
+ fput(dif->file);
+ }
erofs_fscache_unregister_cookie(dif->fscache);
dif->fscache = NULL;
kfree(dif->path);
@@ -855,18 +860,19 @@ static int erofs_release_device_info(int id, void *ptr, void *data)
return 0;
}
-static void erofs_free_dev_context(struct erofs_dev_context *devs)
+static void erofs_free_dev_context(struct erofs_dev_context *devs,
+ struct super_block *sb)
{
if (!devs)
return;
- idr_for_each(&devs->tree, &erofs_release_device_info, NULL);
+ idr_for_each(&devs->tree, &erofs_release_device_info, sb);
idr_destroy(&devs->tree);
kfree(devs);
}
-static void erofs_sb_free(struct erofs_sb_info *sbi)
+static void erofs_sb_free(struct erofs_sb_info *sbi, struct super_block *sb)
{
- erofs_free_dev_context(sbi->devs);
+ erofs_free_dev_context(sbi->devs, sb);
kfree(sbi->fsid);
kfree_sensitive(sbi->domain_id);
if (sbi->dif0.file)
@@ -879,8 +885,13 @@ static void erofs_fc_free(struct fs_context *fc)
{
struct erofs_sb_info *sbi = fc->s_fs_info;
- if (sbi) /* free here if an error occurs before transferring to sb */
- erofs_sb_free(sbi);
+ /*
+ * Freed here only if an error occurs before the sb is set up; at that
+ * point no block-backed device has been claimed (that happens in
+ * fill_super), so the NULL sb never reaches fs_bdev_file_release().
+ */
+ if (sbi)
+ erofs_sb_free(sbi, NULL);
}
static const struct fs_context_operations erofs_context_ops = {
@@ -936,7 +947,7 @@ static void erofs_kill_sb(struct super_block *sb)
erofs_drop_internal_inodes(sbi);
fs_put_dax(sbi->dif0.dax_dev, NULL);
erofs_fscache_unregister_fs(sb);
- erofs_sb_free(sbi);
+ erofs_sb_free(sbi, sb);
sb->s_fs_info = NULL;
}
@@ -948,7 +959,7 @@ static void erofs_put_super(struct super_block *sb)
erofs_shrinker_unregister(sb);
erofs_xattr_prefixes_cleanup(sb);
erofs_drop_internal_inodes(sbi);
- erofs_free_dev_context(sbi->devs);
+ erofs_free_dev_context(sbi->devs, sb);
sbi->devs = NULL;
erofs_fscache_unregister_fs(sb);
}
--
2.43.5
* Re: [PATCH RFC 2/8] fs: add a global device to super block hash table
2026-06-02 10:10 ` [PATCH RFC 2/8] fs: add a global device to super block hash table Christian Brauner
2026-06-08 10:14 ` Jan Kara
@ 2026-06-16 12:34 ` Christoph Hellwig
1 sibling, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2026-06-16 12:34 UTC (permalink / raw)
To: Christian Brauner
Cc: Christoph Hellwig, Jan Kara, Jens Axboe, Alexander Viro,
linux-block, linux-kernel, linux-fsdevel, Carlos Maiolino,
linux-xfs, Chris Mason, David Sterba, linux-btrfs,
Theodore Ts'o, linux-ext4, Gao Xiang, linux-erofs
On Tue, Jun 02, 2026 at 12:10:08PM +0200, Christian Brauner wrote:
> fs_holder_ops recovers the owning superblock from bdev->bd_holder, which
> forces the holder to be exactly one superblock and prevents several
> superblocks from sharing one block device. That's what erofs is doing.
>
> Introduce a global dev_t-keyed rhltable mapping each block device to the
> superblock(s) using it. The holder argument becomes purely the block
> layer's exclusivity token (a superblock, or a file_system_type for
> shared devices) and is no longer needed by the fs specific callbacks.
Err, no. Block devices need to have a specific owner. If erofs wants
to share a device between superblocks, it needs to come up with an entity
that owns the block devices and is not a superblock.
IMHO sharing devices between superblocks is a bad idea. That ship
has sailed, but please keep it contained inside of erofs.
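
One possible reading of the suggestion above, as a minimal userspace sketch: a per-device owner object that is not a superblock holds the list of superblocks sharing the device, and device-level events such as freeze fan out through it. Every name here (toy_sb, toy_bdev_owner, toy_owner_attach and so on) is hypothetical and appears neither in this series nor in erofs. The fixed-size table is only a stand-in for a real lookup structure, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a superblock; not a kernel type. */
struct toy_sb {
	const char *name;
	int frozen;
	struct toy_sb *next;	/* next superblock sharing the same device */
};

/* Hypothetical single owner of one block device; not a superblock itself. */
struct toy_bdev_owner {
	unsigned int dev;	/* stands in for dev_t */
	int refs;		/* number of attached superblocks */
	struct toy_sb *sbs;	/* superblocks sharing the device */
};

#define TOY_MAX_OWNERS 8
static struct toy_bdev_owner *toy_owners[TOY_MAX_OWNERS];

/* Attach sb to the owner of dev, creating the owner on first use. */
static struct toy_bdev_owner *toy_owner_attach(unsigned int dev,
					       struct toy_sb *sb)
{
	struct toy_bdev_owner *o = NULL;
	int i, free_slot = -1;

	for (i = 0; i < TOY_MAX_OWNERS; i++) {
		if (toy_owners[i] && toy_owners[i]->dev == dev) {
			o = toy_owners[i];
			break;
		}
		if (!toy_owners[i] && free_slot < 0)
			free_slot = i;
	}
	if (!o) {
		if (free_slot < 0)
			return NULL;
		o = calloc(1, sizeof(*o));
		if (!o)
			return NULL;
		o->dev = dev;
		toy_owners[free_slot] = o;
	}
	sb->next = o->sbs;
	o->sbs = sb;
	o->refs++;
	return o;
}

/* Detach sb; the owner is freed once the last superblock is gone. */
static void toy_owner_detach(struct toy_bdev_owner *o, struct toy_sb *sb)
{
	struct toy_sb **p;
	int i;

	assert(o->refs > 0);
	for (p = &o->sbs; *p; p = &(*p)->next) {
		if (*p == sb) {
			*p = sb->next;
			sb->next = NULL;
			o->refs--;
			break;
		}
	}
	if (o->refs)
		return;
	for (i = 0; i < TOY_MAX_OWNERS; i++)
		if (toy_owners[i] == o)
			toy_owners[i] = NULL;
	free(o);
}

/* Fan a device-level freeze out to every attached superblock. */
static int toy_owner_freeze(struct toy_bdev_owner *o)
{
	struct toy_sb *sb;
	int n = 0;

	for (sb = o->sbs; sb; sb = sb->next) {
		sb->frozen = 1;
		n++;
	}
	return n;
}
```

In this shape the block layer would only ever see the owner object as the holder, while the 1:many relationship to superblocks stays private to the filesystem that wants it.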