Linux CIFS filesystem development
* [PATCH 0/8] super: retire sget(), convert iterators to RCU
@ 2026-05-26 15:09 Christian Brauner
  2026-05-26 15:09 ` [PATCH 1/8] ext4: convert extents KUnit test to sget_fc() Christian Brauner
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Christian Brauner @ 2026-05-26 15:09 UTC
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro,
	Christian Brauner (Amutable)

* Retire sget(): CIFS plus the two ext4 KUnit tests (extents-test,
  mballoc-test) were the last in-tree callers, and all three convert
  cleanly to sget_fc(). That lets sget() and its prototype come out,
  taking ~60 lines that only existed to be kept in lockstep with
  sget_fc() on every publish-path change.

* Walk @super_blocks and @type->fs_supers under RCU, pinned by
  refcount_inc_not_zero(&sb->s_count). iterate_supers(),
  iterate_supers_type(), user_get_super(), do_emergency_remount(),
  filesystems_freeze() and filesystems_thaw() no longer hold sb_lock
  across the cursor advance.

  The conversion goes in four small steps. Drop sb_lock from
  setup_bdev_super(): the {s_bdev_file, s_bdev, s_bdi,
  SB_I_STABLE_WRITES} tuple is publication of immutable state, and
  SB_BORN already gates every reader via super_wake()'s
  smp_store_release paired with super_flags()'s smp_load_acquire. Then
  convert sb->s_count to refcount_t -- mechanical, every increment is
  still under sb_lock. Then switch the write-side list/hlist ops to
  their _rcu variants; @super_blocks gets list_bidir_del_rcu() so the
  reverse-walking iterators (filesystems_freeze, do_emergency_remount)
  keep a valid ->prev on the unlinked entry, matching the canonical
  pattern in kernel/nstree.c. Finally, convert the iterators themselves:
  cursor advance via READ_ONCE / rcu_dereference, with the previous
  entry kept pinned via its s_count across the rcu_read_unlock ->
  callback -> rcu_read_lock cycle.
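
  A rough userspace model of that cursor discipline, as read from the
  description above rather than taken from patch 8. Everything prefixed
  toy_ is hypothetical: the RCU calls are no-op stand-ins, the list is
  singly linked and walked forward only, and a plain int replaces
  s_count, so this only shows the ordering of pin, unlock, callback,
  re-lock and advance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_sb {
	struct toy_sb *next;	/* singly linked for brevity */
	int count;		/* stands in for s_count; 0 means "going away" */
	int id;
};

/* No-op stand-ins for rcu_read_lock()/rcu_read_unlock(). */
static void toy_rcu_read_lock(void) {}
static void toy_rcu_read_unlock(void) {}

/* Models refcount_inc_not_zero(): refuse to pin an entry whose count is 0. */
static bool toy_pin(struct toy_sb *sb)
{
	if (sb->count == 0)
		return false;
	sb->count++;
	return true;
}

static void toy_unpin(struct toy_sb *sb)
{
	sb->count--;
}

/* Example callback: accumulate the ids of the visited entries. */
static void toy_sum_ids(struct toy_sb *sb, void *arg)
{
	*(int *)arg += sb->id;
}

static void toy_iterate(struct toy_sb *head,
			void (*f)(struct toy_sb *, void *), void *arg)
{
	struct toy_sb *sb, *prev = NULL;

	toy_rcu_read_lock();
	for (sb = head; sb; sb = sb->next) {
		if (!toy_pin(sb))
			continue;	/* dying entry: skip it, still under "RCU" */
		toy_rcu_read_unlock();

		if (prev)
			toy_unpin(prev);	/* cursor has moved past prev */
		f(sb, arg);			/* runs outside the read-side section */
		prev = sb;

		/* sb is still pinned here, so reading sb->next is safe. */
		toy_rcu_read_lock();
	}
	toy_rcu_read_unlock();
	if (prev)
		toy_unpin(prev);
}
```

  Calling toy_iterate() on a three-entry list whose middle entry has a
  zero count visits only the first and last entries and leaves every
  count as it found it.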

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
Christian Brauner (8):
      ext4: convert extents KUnit test to sget_fc()
      ext4: convert mballoc KUnit test to sget_fc()
      smb: client: convert cifs_smb3_do_mount() to sget_fc()
      fs: retire sget()
      super: drop sb_lock from setup_bdev_super() tuple publication
      super: convert sb->s_count to refcount_t
      super: switch list manipulation to _rcu primitives
      super: convert iterators to RCU readers + refcount_inc_not_zero

 fs/btrfs/super.c               |   2 +-
 fs/ext4/extents-test.c         |  22 +++++-
 fs/ext4/mballoc-test.c         |  17 ++++-
 fs/smb/client/cifsfs.c         |  43 ++++++-----
 fs/smb/client/cifsfs.h         |   3 +-
 fs/smb/client/cifsproto.h      |   3 +-
 fs/smb/client/connect.c        |   5 +-
 fs/smb/client/fs_context.c     |   2 +-
 fs/super.c                     | 167 ++++++++++++++---------------------------
 include/linux/fs.h             |   4 -
 include/linux/fs/super_types.h |   3 +-
 11 files changed, 127 insertions(+), 144 deletions(-)
---
base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
change-id: 20260526-work-sget-6bc80b96cba5



* [PATCH 1/8] ext4: convert extents KUnit test to sget_fc()
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
@ 2026-05-26 15:09 ` Christian Brauner
  2026-05-26 15:09 ` [PATCH 2/8] ext4: convert mballoc " Christian Brauner
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2026-05-26 15:09 UTC
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro,
	Christian Brauner (Amutable)

The extents KUnit test uses sget() to get an initialized superblock for
its fake file_system_type. sget() predates fs_context and we want to
retire it. Switch this caller over to sget_fc().

Add a no-op ext_init_fs_context() so fs_context_for_mount() has
something to call on the fake fs_type. ext_set() now takes a struct
fs_context * (still a no-op). extents_kunit_init() allocates the fc,
hands it to sget_fc() and drops the fc reference once the sb is
published. sget_fc() does not retain a pointer to it.

No functional change for the test.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 fs/ext4/extents-test.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents-test.c b/fs/ext4/extents-test.c
index 6b53a3f39fcd..bd7795a82607 100644
--- a/fs/ext4/extents-test.c
+++ b/fs/ext4/extents-test.c
@@ -37,6 +37,7 @@
 
 #include <kunit/test.h>
 #include <kunit/static_stub.h>
+#include <linux/fs_context.h>
 #include <linux/gfp_types.h>
 #include <linux/stddef.h>
 
@@ -130,14 +131,20 @@ static void ext_kill_sb(struct super_block *sb)
 	generic_shutdown_super(sb);
 }
 
-static int ext_set(struct super_block *sb, void *data)
+static int ext_init_fs_context(struct fs_context *fc)
+{
+	return 0;
+}
+
+static int ext_set(struct super_block *sb, struct fs_context *fc)
 {
 	return 0;
 }
 
 static struct file_system_type ext_fs_type = {
-	.name = "extents test",
-	.kill_sb = ext_kill_sb,
+	.name		 = "extents test",
+	.init_fs_context = ext_init_fs_context,
+	.kill_sb	 = ext_kill_sb,
 };
 
 static void extents_kunit_exit(struct kunit *test)
@@ -223,6 +230,7 @@ static int extents_kunit_init(struct kunit *test)
 	struct ext4_inode_info *ei;
 	struct inode *inode;
 	struct super_block *sb;
+	struct fs_context *fc;
 	struct ext4_sb_info *sbi = NULL;
 	struct kunit_ext_test_param *param =
 		(struct kunit_ext_test_param *)(test->param_value);
@@ -232,7 +240,13 @@ static int extents_kunit_init(struct kunit *test)
 	if (sbi == NULL)
 		return -ENOMEM;
 
-	sb = sget(&ext_fs_type, NULL, ext_set, 0, NULL);
+	fc = fs_context_for_mount(&ext_fs_type, 0);
+	if (IS_ERR(fc)) {
+		kfree(sbi);
+		return PTR_ERR(fc);
+	}
+	sb = sget_fc(fc, NULL, ext_set);
+	put_fs_context(fc);
 	if (IS_ERR(sb)) {
 		kfree(sbi);
 		return PTR_ERR(sb);

-- 
2.47.3



* [PATCH 2/8] ext4: convert mballoc KUnit test to sget_fc()
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
  2026-05-26 15:09 ` [PATCH 1/8] ext4: convert extents KUnit test to sget_fc() Christian Brauner
@ 2026-05-26 15:09 ` Christian Brauner
  2026-05-27  0:47   ` Theodore Tso
  2026-05-26 15:09 ` [PATCH 3/8] smb: client: convert cifs_smb3_do_mount() " Christian Brauner
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2026-05-26 15:09 UTC
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro,
	Christian Brauner (Amutable)

Same treatment as the extents KUnit test. The mballoc test uses sget()
as a thin "give me an initialized superblock" wrapper for a fake
file_system_type. Move it onto sget_fc() so sget() can go away.

Add a no-op mbt_init_fs_context() so fs_context_for_mount() has
something to call on the fake fs_type. mbt_set() now takes a struct
fs_context * (still a no-op). mbt_ext4_alloc_super_block() allocates
the fc, hands it to sget_fc() and drops the fc reference once the sb
is published.

No functional change.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 fs/ext4/mballoc-test.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
index 90ed505fa4b1..d90da44aadbd 100644
--- a/fs/ext4/mballoc-test.c
+++ b/fs/ext4/mballoc-test.c
@@ -5,6 +5,7 @@
 
 #include <kunit/test.h>
 #include <kunit/static_stub.h>
+#include <linux/fs_context.h>
 #include <linux/random.h>
 
 #include "ext4.h"
@@ -63,8 +64,14 @@ static void mbt_kill_sb(struct super_block *sb)
 	generic_shutdown_super(sb);
 }
 
+static int mbt_init_fs_context(struct fs_context *fc)
+{
+	return 0;
+}
+
 static struct file_system_type mbt_fs_type = {
 	.name			= "mballoc test",
+	.init_fs_context	= mbt_init_fs_context,
 	.kill_sb		= mbt_kill_sb,
 };
 
@@ -127,7 +134,7 @@ static void mbt_mb_release(struct super_block *sb)
 	kfree(sb->s_bdev);
 }
 
-static int mbt_set(struct super_block *sb, void *data)
+static int mbt_set(struct super_block *sb, struct fs_context *fc)
 {
 	return 0;
 }
@@ -136,13 +143,19 @@ static struct super_block *mbt_ext4_alloc_super_block(void)
 {
 	struct mbt_ext4_super_block *fsb;
 	struct super_block *sb;
+	struct fs_context *fc;
 	struct ext4_sb_info *sbi;
 
 	fsb = kzalloc_obj(*fsb);
 	if (fsb == NULL)
 		return NULL;
 
-	sb = sget(&mbt_fs_type, NULL, mbt_set, 0, NULL);
+	fc = fs_context_for_mount(&mbt_fs_type, 0);
+	if (IS_ERR(fc))
+		goto out;
+
+	sb = sget_fc(fc, NULL, mbt_set);
+	put_fs_context(fc);
 	if (IS_ERR(sb))
 		goto out;
 

-- 
2.47.3



* [PATCH 3/8] smb: client: convert cifs_smb3_do_mount() to sget_fc()
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
  2026-05-26 15:09 ` [PATCH 1/8] ext4: convert extents KUnit test to sget_fc() Christian Brauner
  2026-05-26 15:09 ` [PATCH 2/8] ext4: convert mballoc " Christian Brauner
@ 2026-05-26 15:09 ` Christian Brauner
  2026-05-26 15:09 ` [PATCH 4/8] fs: retire sget() Christian Brauner
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2026-05-26 15:09 UTC
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro,
	Christian Brauner (Amutable)

The CIFS mount path already runs through fs_context: smb3_get_tree()
calls smb3_get_tree_common() with a struct fs_context * in hand. But
the fc is dropped on the way to sget(). Plumb it through to sget_fc()
so the legacy sget() interface can go.

cifs_smb3_do_mount() now takes (struct fs_context *, struct
smb3_fs_context *). The old (fs_type, flags) pair is reconstructed
from fc->fs_type and fc->sb_flags. The flags argument was always
passed as 0 by the sole caller anyway. The cifs_dbg diagnostic now
prints fc->sb_flags directly.

cifs_match_super() and cifs_set_super() were the two void-data
callbacks for sget(). The match callback now takes
(struct super_block *, struct fs_context *) and reads struct
cifs_mnt_data out of fc->sget_key. The set callback is gone entirely:
sget_fc() pre-populates sb->s_fs_info from fc->s_fs_info before
invoking set() so set_anon_super_fc() (which just allocates an anon
bdev) is sufficient.

Before sget_fc() we stash cifs_sb in fc->s_fs_info, the per-mount data
in fc->sget_key and force fc->sb_flags to SB_NODIRATIME | SB_NOATIME
to reproduce the previous hard-coded behaviour (alloc_super() reads
fc->sb_flags). The original sb_flags is saved and restored around the
call so the rest of the mount path sees the same fc semantics as
before.
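
The save/override/restore dance can be modelled in a few lines of
userspace C. This is only a sketch: the toy_ types, the TOY_* flag
values and toy_sget_fc() are hypothetical stand-ins that mimic just the
sget_fc() behaviours this message relies on (flags are read from the
fc, s_fs_info is copied to the sb and cleared on publish).

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NOATIME	0x1u	/* illustrative values, not the kernel's SB_* bits */
#define TOY_NODIRATIME	0x2u
#define TOY_RDONLY	0x4u

struct toy_fc { unsigned int sb_flags; void *s_fs_info; };
struct toy_sb { unsigned int s_flags; void *s_fs_info; };

/* Toy model of the sget_fc() side effects described above. */
static void toy_sget_fc(struct toy_fc *fc, struct toy_sb *sb)
{
	sb->s_flags = fc->sb_flags;	/* the new sb takes its flags from the fc */
	sb->s_fs_info = fc->s_fs_info;	/* copied before set() would run */
	fc->s_fs_info = NULL;		/* cleared on successful publish */
}

static void toy_do_mount(struct toy_fc *fc, struct toy_sb *sb, void *fs_info)
{
	unsigned int saved = fc->sb_flags;

	fc->sb_flags = TOY_NODIRATIME | TOY_NOATIME;	/* forced for this call only */
	fc->s_fs_info = fs_info;
	toy_sget_fc(fc, sb);
	fc->s_fs_info = NULL;
	fc->sb_flags = saved;	/* later mount code sees the original flags */
}
```

With fc->sb_flags set to TOY_RDONLY beforehand, the toy sb ends up with
only the two forced bits while the fc still carries TOY_RDONLY
afterwards.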

mnt_data.flags keeps its historical value of 0 so the CIFS_MS_MASK
comparison in compare_mount_options() returns the same (always-equal)
result.

No functional change. With this in place sget() has no remaining CIFS
caller.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 fs/smb/client/cifsfs.c     | 43 ++++++++++++++++++++++++++-----------------
 fs/smb/client/cifsfs.h     |  3 ++-
 fs/smb/client/cifsproto.h  |  3 ++-
 fs/smb/client/connect.c    |  5 +++--
 fs/smb/client/fs_context.c |  2 +-
 5 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/fs/smb/client/cifsfs.c b/fs/smb/client/cifsfs.c
index 9f76b0347fa9..d5074e3fbb85 100644
--- a/fs/smb/client/cifsfs.c
+++ b/fs/smb/client/cifsfs.c
@@ -12,6 +12,7 @@
 
 #include <linux/module.h>
 #include <linux/fs.h>
+#include <linux/fs_context.h>
 #include <linux/filelock.h>
 #include <linux/mount.h>
 #include <linux/slab.h>
@@ -966,26 +967,19 @@ cifs_get_root(struct smb3_fs_context *ctx, struct super_block *sb)
 	return dentry;
 }
 
-static int cifs_set_super(struct super_block *sb, void *data)
-{
-	struct cifs_mnt_data *mnt_data = data;
-	sb->s_fs_info = mnt_data->cifs_sb;
-	return set_anon_super(sb, NULL);
-}
-
 struct dentry *
-cifs_smb3_do_mount(struct file_system_type *fs_type,
-	      int flags, struct smb3_fs_context *old_ctx)
+cifs_smb3_do_mount(struct fs_context *fc, struct smb3_fs_context *old_ctx)
 {
 	struct cifs_mnt_data mnt_data;
 	struct cifs_sb_info *cifs_sb;
 	struct super_block *sb;
 	struct dentry *root;
+	unsigned int saved_sb_flags;
 	int rc;
 
 	if (cifsFYI) {
-		cifs_dbg(FYI, "%s: devname=%s flags=0x%x\n", __func__,
-			 old_ctx->source, flags);
+		cifs_dbg(FYI, "%s: devname=%s sb_flags=0x%x\n", __func__,
+			 old_ctx->source, fc->sb_flags);
 	} else {
 		cifs_info("Attempting to mount %s\n", old_ctx->source);
 	}
@@ -1012,7 +1006,7 @@ cifs_smb3_do_mount(struct file_system_type *fs_type,
 
 	rc = cifs_mount(cifs_sb, cifs_sb->ctx);
 	if (rc) {
-		if (!(flags & SB_SILENT))
+		if (!(fc->sb_flags & SB_SILENT))
 			cifs_dbg(VFS, "cifs_mount failed w/return code = %d\n",
 				 rc);
 		root = ERR_PTR(rc);
@@ -1021,12 +1015,27 @@ cifs_smb3_do_mount(struct file_system_type *fs_type,
 
 	mnt_data.ctx = cifs_sb->ctx;
 	mnt_data.cifs_sb = cifs_sb;
-	mnt_data.flags = flags;
+	mnt_data.flags = 0;
 
-	/* BB should we make this contingent on mount parm? */
-	flags |= SB_NODIRATIME | SB_NOATIME;
-
-	sb = sget(fs_type, cifs_match_super, cifs_set_super, flags, &mnt_data);
+	/*
+	 * sb->s_flags is set from fc->sb_flags by alloc_super(). CIFS has
+	 * historically forced SB_NODIRATIME | SB_NOATIME on every mount and
+	 * ignored the caller-supplied SB_* flags. Preserve that behaviour by
+	 * overriding fc->sb_flags around the sget_fc() call.
+	 *
+	 * Hand cifs_sb to sget_fc() via fc->s_fs_info; sget_fc() copies it
+	 * onto sb->s_fs_info before running set() and clears fc->s_fs_info
+	 * on successful publish. Pass the rest of the per-mount context to
+	 * cifs_match_super() through fc->sget_key.
+	 */
+	saved_sb_flags = fc->sb_flags;
+	fc->sb_flags = SB_NODIRATIME | SB_NOATIME;
+	fc->s_fs_info = cifs_sb;
+	fc->sget_key = &mnt_data;
+	sb = sget_fc(fc, cifs_match_super, set_anon_super_fc);
+	fc->sget_key = NULL;
+	fc->s_fs_info = NULL;
+	fc->sb_flags = saved_sb_flags;
 	if (IS_ERR(sb)) {
 		cifs_umount(cifs_sb);
 		return ERR_CAST(sb);
diff --git a/fs/smb/client/cifsfs.h b/fs/smb/client/cifsfs.h
index c455b15f2778..0a93f48924a5 100644
--- a/fs/smb/client/cifsfs.h
+++ b/fs/smb/client/cifsfs.h
@@ -144,8 +144,9 @@ ssize_t cifs_file_copychunk_range(unsigned int xid, struct file *src_file,
 long cifs_ioctl(struct file *filep, unsigned int command, unsigned long arg);
 void cifs_setsize(struct inode *inode, loff_t offset);
 
+struct fs_context;
 struct smb3_fs_context;
-struct dentry *cifs_smb3_do_mount(struct file_system_type *fs_type, int flags,
+struct dentry *cifs_smb3_do_mount(struct fs_context *fc,
 				  struct smb3_fs_context *old_ctx);
 
 char *cifs_silly_fullpath(struct dentry *dentry);
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h
index 4a25afda9448..a39572cbaadb 100644
--- a/fs/smb/client/cifsproto.h
+++ b/fs/smb/client/cifsproto.h
@@ -19,6 +19,7 @@
 struct statfs;
 struct smb_rqst;
 struct smb3_fs_context;
+struct fs_context;
 
 /*
  *****************************************************************
@@ -236,7 +237,7 @@ void cifs_mount_put_conns(struct cifs_mount_ctx *mnt_ctx);
 int cifs_mount_get_session(struct cifs_mount_ctx *mnt_ctx);
 int cifs_is_path_remote(struct cifs_mount_ctx *mnt_ctx);
 int cifs_mount_get_tcon(struct cifs_mount_ctx *mnt_ctx);
-int cifs_match_super(struct super_block *sb, void *data);
+int cifs_match_super(struct super_block *sb, struct fs_context *fc);
 int cifs_mount(struct cifs_sb_info *cifs_sb, struct smb3_fs_context *ctx);
 void cifs_umount(struct cifs_sb_info *cifs_sb);
 void cifs_mark_open_files_invalid(struct cifs_tcon *tcon);
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index dcde25da468d..79762e6bbe50 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -6,6 +6,7 @@
  *
  */
 #include <linux/fs.h>
+#include <linux/fs_context.h>
 #include <linux/net.h>
 #include <linux/string.h>
 #include <linux/sched/mm.h>
@@ -2991,9 +2992,9 @@ static int match_prepath(struct super_block *sb,
 }
 
 int
-cifs_match_super(struct super_block *sb, void *data)
+cifs_match_super(struct super_block *sb, struct fs_context *fc)
 {
-	struct cifs_mnt_data *mnt_data = data;
+	struct cifs_mnt_data *mnt_data = fc->sget_key;
 	struct smb3_fs_context *ctx;
 	struct cifs_sb_info *cifs_sb;
 	struct TCP_Server_Info *tcp_srv;
diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
index b9544eb0381b..6aba4e1c9c27 100644
--- a/fs/smb/client/fs_context.c
+++ b/fs/smb/client/fs_context.c
@@ -920,7 +920,7 @@ static int smb3_get_tree_common(struct fs_context *fc)
 	struct dentry *root;
 	int rc = 0;
 
-	root = cifs_smb3_do_mount(fc->fs_type, 0, ctx);
+	root = cifs_smb3_do_mount(fc, ctx);
 	if (IS_ERR(root))
 		return PTR_ERR(root);
 

-- 
2.47.3



* [PATCH 4/8] fs: retire sget()
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
                   ` (2 preceding siblings ...)
  2026-05-26 15:09 ` [PATCH 3/8] smb: client: convert cifs_smb3_do_mount() " Christian Brauner
@ 2026-05-26 15:09 ` Christian Brauner
  2026-05-26 15:09 ` [PATCH 5/8] super: drop sb_lock from setup_bdev_super() tuple publication Christian Brauner
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2026-05-26 15:09 UTC
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro,
	Christian Brauner (Amutable)

sget() and sget_fc() have lived side by side as near-duplicate
find-or-create-and-publish helpers for the legacy and fs_context mount
APIs. The three remaining in-tree callers (CIFS plus the ext4 extents
and mballoc KUnit tests) have all been moved to sget_fc(). Nothing
calls sget() anymore.

Delete sget() from fs/super.c and the prototype in <linux/fs.h>.
Update the two comments that referred to "sget()" or "sget{_fc}()" to
just say "sget_fc()".

This removes ~60 lines of code that only existed to be kept in
lockstep with sget_fc() on every superblock publish-path change.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 fs/btrfs/super.c   |  2 +-
 fs/super.c         | 71 ++++--------------------------------------------------
 include/linux/fs.h |  4 ---
 3 files changed, 6 insertions(+), 71 deletions(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index b26aa9169e83..636154861d7c 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2052,7 +2052,7 @@ static int btrfs_get_tree_subvol(struct fs_context *fc)
 	 * then open_ctree will properly initialize the file system specific
 	 * settings later.  btrfs_init_fs_info initializes the static elements
 	 * of the fs_info (locks and such) to make cleanup easier if we find a
-	 * superblock with our given fs_devices later on at sget() time.
+	 * superblock with our given fs_devices later on at sget_fc() time.
 	 */
 	fs_info = kvzalloc_obj(struct btrfs_fs_info);
 	if (!fs_info)
diff --git a/fs/super.c b/fs/super.c
index 378e81efe643..5fe8cea9f8fe 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -328,7 +328,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	init_rwsem(&s->s_umount);
 	lockdep_set_class(&s->s_umount, &type->s_umount_key);
 	/*
-	 * sget() can have s_umount recursion.
+	 * sget_fc() can have s_umount recursion.
 	 *
 	 * When it cannot find a suitable sb, it allocates a new
 	 * one (this one), and tries again to find a suitable old
@@ -439,7 +439,7 @@ static void kill_super_notify(struct super_block *sb)
 
 	/*
 	 * Remove it from @fs_supers so it isn't found by new
-	 * sget{_fc}() walkers anymore. Any concurrent mounter still
+	 * sget_fc() walkers anymore. Any concurrent mounter still
 	 * managing to grab a temporary reference is guaranteed to
 	 * already see SB_DYING and will wait until we notify them about
 	 * SB_DEAD.
@@ -517,7 +517,7 @@ EXPORT_SYMBOL(deactivate_super);
  * @sb: superblock to acquire
  *
  * Acquire a temporary reference on a superblock and try to trade it for
- * an active reference. This is used in sget{_fc}() to wait for a
+ * an active reference. This is used in sget_fc() to wait for a
  * superblock to either become SB_BORN or for it to pass through
  * sb->kill() and be marked as SB_DEAD.
  *
@@ -673,11 +673,11 @@ void generic_shutdown_super(struct super_block *sb)
 	/*
 	 * Broadcast to everyone that grabbed a temporary reference to this
 	 * superblock before we removed it from @fs_supers that the superblock
-	 * is dying. Every walker of @fs_supers outside of sget{_fc}() will now
+	 * is dying. Every walker of @fs_supers outside of sget_fc() will now
 	 * discard this superblock and treat it as dead.
 	 *
 	 * We leave the superblock on @fs_supers so it can be found by
-	 * sget{_fc}() until we passed sb->kill_sb().
+	 * sget_fc() until we passed sb->kill_sb().
 	 */
 	super_wake(sb, SB_DYING);
 	super_unlock_excl(sb);
@@ -808,67 +808,6 @@ struct super_block *sget_fc(struct fs_context *fc,
 }
 EXPORT_SYMBOL(sget_fc);
 
-/**
- *	sget	-	find or create a superblock
- *	@type:	  filesystem type superblock should belong to
- *	@test:	  comparison callback
- *	@set:	  setup callback
- *	@flags:	  mount flags
- *	@data:	  argument to each of them
- */
-struct super_block *sget(struct file_system_type *type,
-			int (*test)(struct super_block *,void *),
-			int (*set)(struct super_block *,void *),
-			int flags,
-			void *data)
-{
-	struct user_namespace *user_ns = current_user_ns();
-	struct super_block *s = NULL;
-	struct super_block *old;
-	int err;
-
-retry:
-	spin_lock(&sb_lock);
-	if (test) {
-		hlist_for_each_entry(old, &type->fs_supers, s_instances) {
-			if (!test(old, data))
-				continue;
-			if (user_ns != old->s_user_ns) {
-				spin_unlock(&sb_lock);
-				destroy_unused_super(s);
-				return ERR_PTR(-EBUSY);
-			}
-			if (!grab_super(old))
-				goto retry;
-			destroy_unused_super(s);
-			return old;
-		}
-	}
-	if (!s) {
-		spin_unlock(&sb_lock);
-		s = alloc_super(type, flags, user_ns);
-		if (!s)
-			return ERR_PTR(-ENOMEM);
-		goto retry;
-	}
-
-	err = set(s, data);
-	if (err) {
-		spin_unlock(&sb_lock);
-		destroy_unused_super(s);
-		return ERR_PTR(err);
-	}
-	s->s_type = type;
-	strscpy(s->s_id, type->name, sizeof(s->s_id));
-	list_add_tail(&s->s_list, &super_blocks);
-	hlist_add_head(&s->s_instances, &type->fs_supers);
-	spin_unlock(&sb_lock);
-	get_filesystem(type);
-	shrinker_register(s->s_shrink);
-	return s;
-}
-EXPORT_SYMBOL(sget);
-
 void drop_super(struct super_block *sb)
 {
 	super_unlock_shared(sb);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 11559c513dfb..6dbe3218dc1e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2327,10 +2327,6 @@ void free_anon_bdev(dev_t);
 struct super_block *sget_fc(struct fs_context *fc,
 			    int (*test)(struct super_block *, struct fs_context *),
 			    int (*set)(struct super_block *, struct fs_context *));
-struct super_block *sget(struct file_system_type *type,
-			int (*test)(struct super_block *,void *),
-			int (*set)(struct super_block *,void *),
-			int flags, void *data);
 struct super_block *sget_dev(struct fs_context *fc, dev_t dev);
 
 /* Alas, no aliases. Too much hassle with bringing module.h everywhere */

-- 
2.47.3



* [PATCH 5/8] super: drop sb_lock from setup_bdev_super() tuple publication
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
                   ` (3 preceding siblings ...)
  2026-05-26 15:09 ` [PATCH 4/8] fs: retire sget() Christian Brauner
@ 2026-05-26 15:09 ` Christian Brauner
  2026-05-27 11:53   ` Christian Brauner
  2026-05-26 15:09 ` [PATCH 6/8] super: convert sb->s_count to refcount_t Christian Brauner
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2026-05-26 15:09 UTC
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro,
	Christian Brauner (Amutable)

The tuple {s_bdev_file, s_bdev, s_bdi, SB_I_STABLE_WRITES} written by
setup_bdev_super() is publication of immutable state, not list
integrity. The sb is already on @super_blocks and @fs_supers at this
point (sget_dev() -> sget_fc() put it there) but SB_BORN is unset, so
any iterator that calls super_lock() blocks on
wait_var_event(SB_BORN | SB_DYING).

The SUPER_ITER_UNLOCKED iterators (filesystems_freeze,
filesystems_thaw, do_emergency_remount) do not look at s_bdev, s_bdi
or s_iflags so they cannot observe a partial fill either.

When vfs_get_tree() later calls super_wake(sb, SB_BORN) it does

    smp_store_release(&sb->s_flags, sb->s_flags | SB_BORN)

and any reader gating on SB_BORN via super_flags() loads sb->s_flags
with smp_load_acquire(). The release/acquire pair orders the four
prior writes against the load of SB_BORN.
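
For readers who want the pairing spelled out, here is a userspace
analogue using C11 atomics. It is a sketch under assumptions: the toy_
names and the TOY_BORN bit value are made up, and
memory_order_release/memory_order_acquire stand in for the kernel's
smp_store_release()/smp_load_acquire().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_BORN 0x1u	/* illustrative bit, not the kernel's SB_BORN value */

struct toy_sb {
	const char *bdev_name;	/* stands in for s_bdev and friends */
	int stable_writes;	/* stands in for SB_I_STABLE_WRITES */
	atomic_uint flags;
};

/* Writer: fill the plain fields first, then publish with a release store. */
static void toy_publish(struct toy_sb *sb, const char *name, int stable)
{
	unsigned int old;

	sb->bdev_name = name;
	sb->stable_writes = stable;
	old = atomic_load_explicit(&sb->flags, memory_order_relaxed);
	atomic_store_explicit(&sb->flags, old | TOY_BORN, memory_order_release);
}

/* Reader: acquire-load the flags; the fields are valid only once BORN is seen. */
static const char *toy_read_name(struct toy_sb *sb)
{
	unsigned int f = atomic_load_explicit(&sb->flags, memory_order_acquire);

	if (!(f & TOY_BORN))
		return NULL;	/* not born yet: do not touch the fields */
	return sb->bdev_name;
}
```

A reader that sees TOY_BORN is guaranteed to see the earlier plain
stores, which is why no lock is needed around them.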

s_iflags is a shared field so use WRITE_ONCE() on the
read-modify-write to keep the compiler from tearing the store.
retire_super() is the only other writer of s_iflags and only runs
against an already-born sb under s_umount.

This drops one of the five sb_lock acquisitions in the mount path
with no behavioural change for any reader.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 fs/super.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 5fe8cea9f8fe..c451f689c7b3 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1576,13 +1576,16 @@ int setup_bdev_super(struct super_block *sb, int sb_flags,
 		bdev_fput(bdev_file);
 		return -EBUSY;
 	}
-	spin_lock(&sb_lock);
+	/*
+	 * Publish before SB_BORN is set. vfs_get_tree() later sets it via
+	 * super_wake()'s smp_store_release(); any iterator that observes
+	 * SB_BORN via super_flags()'s smp_load_acquire() sees these writes.
+	 */
 	sb->s_bdev_file = bdev_file;
 	sb->s_bdev = bdev;
 	sb->s_bdi = bdi_get(bdev->bd_disk->bdi);
 	if (bdev_stable_writes(bdev))
-		sb->s_iflags |= SB_I_STABLE_WRITES;
-	spin_unlock(&sb_lock);
+		WRITE_ONCE(sb->s_iflags, sb->s_iflags | SB_I_STABLE_WRITES);
 
 	snprintf(sb->s_id, sizeof(sb->s_id), "%pg", bdev);
 	shrinker_debugfs_rename(sb->s_shrink, "sb-%s:%s", sb->s_type->name,

-- 
2.47.3



* [PATCH 6/8] super: convert sb->s_count to refcount_t
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
                   ` (4 preceding siblings ...)
  2026-05-26 15:09 ` [PATCH 5/8] super: drop sb_lock from setup_bdev_super() tuple publication Christian Brauner
@ 2026-05-26 15:09 ` Christian Brauner
  2026-05-26 15:09 ` [PATCH 7/8] super: switch list manipulation to _rcu primitives Christian Brauner
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2026-05-26 15:09 UTC
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro,
	Christian Brauner (Amutable)

s_count is the temporary-reference count used to pin a superblock
across the spinlock-to-rwsem hop in every iterator and in
grab_super(). It's a plain int incremented and decremented only under
sb_lock.

Convert it to refcount_t. No semantic change yet: every increment
still happens with sb_lock held, so observation of a live ref is
still serialised by the lock. The increments use refcount_inc()
rather than refcount_inc_not_zero() because every callsite is still
looking at an sb known to be live under sb_lock.

This prepares the ground for switching iterators to RCU readers in a
later patch, at which point refcount_inc_not_zero() becomes the right
primitive at the lockless pin sites.
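
The difference between the two increment flavours can be shown with a
toy counter in userspace C11. This is a sketch, not the kernel's
refcount_t: the toy_ref names are hypothetical and there is no
saturation or overflow checking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for refcount_t. */
struct toy_ref { atomic_int count; };

/* Unconditional increment: only valid when the caller knows count > 0. */
static void toy_ref_inc(struct toy_ref *r)
{
	atomic_fetch_add_explicit(&r->count, 1, memory_order_relaxed);
}

/* Take a reference unless the count has already dropped to zero. */
static bool toy_ref_inc_not_zero(struct toy_ref *r)
{
	int old = atomic_load_explicit(&r->count, memory_order_relaxed);

	do {
		if (old == 0)
			return false;	/* object is on its way out */
	} while (!atomic_compare_exchange_weak_explicit(&r->count, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool toy_ref_dec_and_test(struct toy_ref *r)
{
	return atomic_fetch_sub_explicit(&r->count, 1, memory_order_acq_rel) == 1;
}
```

Under a lock that excludes the final put, toy_ref_inc() is enough. A
lockless reader may race with that final put, so it has to use
toy_ref_inc_not_zero() and skip the object when it returns false.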

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 fs/super.c                     | 14 +++++++-------
 include/linux/fs/super_types.h |  3 ++-
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index c451f689c7b3..2fa7023010ec 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -366,7 +366,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 	spin_lock_init(&s->s_inode_wblist_lock);
 	fserror_mount(s);
 
-	s->s_count = 1;
+	refcount_set(&s->s_count, 1);
 	atomic_set(&s->s_active, 1);
 	mutex_init(&s->s_vfs_rename_mutex);
 	lockdep_set_class(&s->s_vfs_rename_mutex, &type->s_vfs_rename_key);
@@ -406,7 +406,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
  */
 static void __put_super(struct super_block *s)
 {
-	if (!--s->s_count) {
+	if (refcount_dec_and_test(&s->s_count)) {
 		list_del_init(&s->s_list);
 		WARN_ON(s->s_dentry_lru.node);
 		WARN_ON(s->s_inode_lru.node);
@@ -528,7 +528,7 @@ static bool grab_super(struct super_block *sb)
 {
 	bool locked;
 
-	sb->s_count++;
+	refcount_inc(&sb->s_count);
 	spin_unlock(&sb_lock);
 	locked = super_lock_excl(sb);
 	if (locked) {
@@ -857,7 +857,7 @@ static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
 	     sb = next_super(sb, flags)) {
 		if (super_flags(sb, SB_DYING))
 			continue;
-		sb->s_count++;
+		refcount_inc(&sb->s_count);
 		spin_unlock(&sb_lock);
 
 		if (flags & SUPER_ITER_UNLOCKED) {
@@ -902,7 +902,7 @@ void iterate_supers_type(struct file_system_type *type,
 		if (super_flags(sb, SB_DYING))
 			continue;
 
-		sb->s_count++;
+		refcount_inc(&sb->s_count);
 		spin_unlock(&sb_lock);
 
 		locked = super_lock_shared(sb);
@@ -934,7 +934,7 @@ struct super_block *user_get_super(dev_t dev, bool excl)
 		if (sb->s_dev != dev)
 			continue;
 
-		sb->s_count++;
+		refcount_inc(&sb->s_count);
 		spin_unlock(&sb_lock);
 
 		locked = super_lock(sb, excl);
@@ -1368,7 +1368,7 @@ static struct super_block *bdev_super_lock(struct block_device *bdev, bool excl)
 
 	/* Make sure sb doesn't go away from under us */
 	spin_lock(&sb_lock);
-	sb->s_count++;
+	refcount_inc(&sb->s_count);
 	spin_unlock(&sb_lock);
 
 	mutex_unlock(&bdev->bd_holder_lock);
diff --git a/include/linux/fs/super_types.h b/include/linux/fs/super_types.h
index 383050e7fdf5..3a8cc0c723a8 100644
--- a/include/linux/fs/super_types.h
+++ b/include/linux/fs/super_types.h
@@ -11,6 +11,7 @@
 #include <linux/uidgid.h>
 #include <linux/uuid.h>
 #include <linux/percpu-rwsem.h>
+#include <linux/refcount.h>
 #include <linux/workqueue_types.h>
 #include <linux/quota.h>
 
@@ -145,7 +146,7 @@ struct super_block {
 	unsigned long				s_magic;
 	struct dentry				*s_root;
 	struct rw_semaphore			s_umount;
-	int					s_count;
+	refcount_t				s_count;
 	atomic_t				s_active;
 #ifdef CONFIG_SECURITY
 	void					*s_security;

-- 
2.47.3



* [PATCH 7/8] super: switch list manipulation to _rcu primitives
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
                   ` (5 preceding siblings ...)
  2026-05-26 15:09 ` [PATCH 6/8] super: convert sb->s_count to refcount_t Christian Brauner
@ 2026-05-26 15:09 ` Christian Brauner
  2026-05-26 15:09 ` [PATCH 8/8] super: convert iterators to RCU readers + refcount_inc_not_zero Christian Brauner
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2026-05-26 15:09 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro,
	Christian Brauner (Amutable)

Swap the list/hlist write-side operations on @super_blocks and
@fs_type->fs_supers over to their _rcu variants. All three call sites
still hold sb_lock; this is a purely mechanical change that
establishes the writer-side memory ordering lockless RCU readers can
rely on in the next patch.

The affected sites are sget_fc() (list_add_tail() and
hlist_add_head() at the publish step), __put_super()
(list_del_init() -> list_bidir_del_rcu() of s_list when the last
temporary reference is dropped) and kill_super_notify()
(hlist_del_init() -> hlist_del_rcu() of s_instances).

@super_blocks gets list_bidir_del_rcu() rather than list_del_rcu()
because the next patch walks the list backward for
filesystems_freeze() and do_emergency_remount(). list_del_rcu()
preserves the unlinked entry's ->next pointer but poisons ->prev with
LIST_POISON2, which would crash any concurrent reverse traversal that
landed on the just-unlinked entry between the SB_DYING check and the
cursor advance. list_bidir_del_rcu() preserves both ->next and
->prev so reverse traversal stays safe. See kernel/nstree.c for the
canonical bidirectional-RCU list pattern.
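
To illustrate the difference described above, here is a minimal
userspace sketch. It is not the kernel implementation: struct node,
POISON_PREV and the node_*() helpers are hypothetical stand-ins, and
the sketch omits the memory ordering that the real _rcu primitives
provide. It only shows which pointers survive on the unlinked entry.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's list helpers. */
struct node {
	struct node *next;
	struct node *prev;
};

/* Stand-in for a poison value such as LIST_POISON2. */
#define POISON_PREV ((struct node *)0x122)

static void node_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and poison ->prev on the removed node (forward walkers only). */
static void node_del_poison_prev(struct node *n)
{
	n->next->prev = n->prev;
	n->prev->next = n->next;
	n->prev = POISON_PREV;
}

/* Unlink but leave both ->next and ->prev intact on the removed node. */
static void node_del_keep_both(struct node *n)
{
	n->next->prev = n->prev;
	n->prev->next = n->next;
}

/*
 * Build head <-> a <-> b <-> c, unlink b, and report whether a walker
 * still standing on b could step to a live neighbour in both directions.
 */
int unlinked_entry_is_walkable(int keep_both)
{
	struct node head, a, b, c;

	node_init(&head);
	node_add_tail(&a, &head);
	node_add_tail(&b, &head);
	node_add_tail(&c, &head);

	if (keep_both)
		node_del_keep_both(&b);
	else
		node_del_poison_prev(&b);

	return b.next == &c && b.prev == &a;
}
```

With the poisoning variant the forward pointer on the unlinked entry is
still usable but the backward one is not, which is the case the reverse
iterators would trip over.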

Nothing relies on the reinitialised state that the "_init" variants
leave behind on these list nodes after removal, so dropping it is
fine. The entry is about to be freed via call_rcu(destroy_super_rcu)
(for s_list) or to disappear with the superblock (for s_instances,
once the list has done its job notifying SB_DEAD waiters).

Iterators keep using plain list_for_each_entry() and
hlist_for_each_entry() under sb_lock. Their conversion to lockless
RCU traversal with refcount_inc_not_zero() is the next patch.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 fs/super.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 2fa7023010ec..8c01b95be717 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -407,7 +407,7 @@ static struct super_block *alloc_super(struct file_system_type *type, int flags,
 static void __put_super(struct super_block *s)
 {
 	if (refcount_dec_and_test(&s->s_count)) {
-		list_del_init(&s->s_list);
+		list_bidir_del_rcu(&s->s_list);
 		WARN_ON(s->s_dentry_lru.node);
 		WARN_ON(s->s_inode_lru.node);
 		WARN_ON(s->s_mounts);
@@ -445,7 +445,7 @@ static void kill_super_notify(struct super_block *sb)
 	 * SB_DEAD.
 	 */
 	spin_lock(&sb_lock);
-	hlist_del_init(&sb->s_instances);
+	hlist_del_rcu(&sb->s_instances);
 	spin_unlock(&sb_lock);
 
 	/*
@@ -784,8 +784,8 @@ struct super_block *sget_fc(struct fs_context *fc,
 	 * It's in a nascent state and users should wait on SB_BORN or
 	 * SB_DYING to be set.
 	 */
-	list_add_tail(&s->s_list, &super_blocks);
-	hlist_add_head(&s->s_instances, &s->s_type->fs_supers);
+	list_add_tail_rcu(&s->s_list, &super_blocks);
+	hlist_add_head_rcu(&s->s_instances, &s->s_type->fs_supers);
 	spin_unlock(&sb_lock);
 	get_filesystem(s->s_type);
 	shrinker_register(s->s_shrink);

-- 
2.47.3



* [PATCH 8/8] super: convert iterators to RCU readers + refcount_inc_not_zero
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
                   ` (6 preceding siblings ...)
  2026-05-26 15:09 ` [PATCH 7/8] super: switch list manipulation to _rcu primitives Christian Brauner
@ 2026-05-26 15:09 ` Christian Brauner
  2026-05-27 11:54 ` [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
  2026-05-28 11:18 ` Jan Kara
  9 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2026-05-26 15:09 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro,
	Christian Brauner (Amutable)

Walk @super_blocks and @fs_supers under rcu_read_lock() and pin the
current entry with refcount_inc_not_zero() instead of holding sb_lock
across the cursor advance. sb_lock was only there to keep the
cursor's ->next / ->prev pointer from being mutated by concurrent
list_del / list_add. RCU semantics give us that guarantee directly:
list_bidir_del_rcu() preserves both ->next and ->prev on the
unlinked entry and list_add_tail_rcu() publishes new entries with
the release barrier set up by the previous patch.

The pattern at each iterator is:

    rcu_read_lock();
    list_for_each_entry_rcu(sb, ...) {
            if (SB_DYING)                             continue;
            if (!refcount_inc_not_zero(&sb->s_count)) continue;
            rcu_read_unlock();

            ...                       /* may sleep on s_umount */

            if (prev)
                    put_super(prev);
            prev = sb;
            rcu_read_lock();          /* prev pinned: prev->{next,prev} valid */
    }
    rcu_read_unlock();
    if (prev)
            put_super(prev);

While we hold a pin on @prev, __put_super() cannot reach the
refcount_dec_and_test() transition that drives list_bidir_del_rcu().
So @prev stays on the list and concurrent list_bidir_del_rcu() of
other entries keeps @prev->s_list.{next,prev} pointing at the still-
live neighbour (or the head sentinel). The cursor advance after
re-acquiring rcu_read_lock() is therefore always against a live
chain in whichever direction we're walking.
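
The "pin only if still live" step can be sketched in userspace with C11
atomics. This is an illustration under stated assumptions, not the
kernel's refcount_t: struct obj, obj_get_not_zero() and obj_put() are
hypothetical names, and saturation and overflow checks are left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object; "count" plays the role of sb->s_count. */
struct obj {
	atomic_uint count;
};

/* Take a reference only if the count has not already reached zero. */
static bool obj_get_not_zero(struct obj *o)
{
	unsigned int old = atomic_load(&o->count);

	while (old != 0) {
		/* On failure, "old" is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->count, 1) == 1;
}

/* Try to pin an object whose count starts at "initial". */
bool try_pin(unsigned int initial, unsigned int *after)
{
	struct obj o;
	bool pinned;

	atomic_init(&o.count, initial);
	pinned = obj_get_not_zero(&o);
	*after = atomic_load(&o.count);
	return pinned;
}

/* One owner ref plus one pin: only the second put is the final one. */
bool last_put_is_detected(void)
{
	struct obj o;

	atomic_init(&o.count, 1);
	if (!obj_get_not_zero(&o))
		return false;
	return !obj_put(&o) && obj_put(&o);
}
```

An entry whose count has already hit zero is skipped rather than
resurrected, and an entry that was pinned cannot hit zero until the
matching put.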

put_super() now appears in the middle of the loop where __put_super()
used to be called with sb_lock held. It briefly takes sb_lock for
the trailing-ref drop; in the common case dec_and_test() returns
false and the lock is held for only a handful of cycles.

first_super() and next_super() switch the forward arm to READ_ONCE()
on the head and cursor ->next pointers and the reverse arm to
rcu_dereference(list_bidir_prev_rcu(...)). The forward arm matches
the semantics of list_entry_rcu() used internally by
list_for_each_entry_rcu(); the reverse arm is the canonical
bidirectional-RCU traversal pattern (see kernel/nstree.c) and is
needed because filesystems_freeze() and do_emergency_remount() pass
SUPER_ITER_REVERSE.

iterate_supers_type() and user_get_super() get the same treatment.
user_get_super() simplifies further: on a lookup hit where super_lock()
succeeds we return with the pin; if super_lock() fails (the superblock
is SB_DYING) we put_super() and return NULL.

sget_fc() and grab_super() are not touched here.

Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
---
 fs/super.c | 71 +++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 38 insertions(+), 33 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 8c01b95be717..d9b1148f7030 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -831,17 +831,25 @@ enum super_iter_flags_t {
 
 static inline struct super_block *first_super(enum super_iter_flags_t flags)
 {
+	struct list_head *next;
+
 	if (flags & SUPER_ITER_REVERSE)
-		return list_last_entry(&super_blocks, struct super_block, s_list);
-	return list_first_entry(&super_blocks, struct super_block, s_list);
+		next = rcu_dereference(list_bidir_prev_rcu(&super_blocks));
+	else
+		next = READ_ONCE(super_blocks.next);
+	return list_entry(next, struct super_block, s_list);
 }
 
 static inline struct super_block *next_super(struct super_block *sb,
 					     enum super_iter_flags_t flags)
 {
+	struct list_head *next;
+
 	if (flags & SUPER_ITER_REVERSE)
-		return list_prev_entry(sb, s_list);
-	return list_next_entry(sb, s_list);
+		next = rcu_dereference(list_bidir_prev_rcu(&sb->s_list));
+	else
+		next = READ_ONCE(sb->s_list.next);
+	return list_entry(next, struct super_block, s_list);
 }
 
 static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
@@ -850,15 +858,15 @@ static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
 	struct super_block *sb, *p = NULL;
 	bool excl = flags & SUPER_ITER_EXCL;
 
-	guard(spinlock)(&sb_lock);
-
+	rcu_read_lock();
 	for (sb = first_super(flags);
 	     !list_entry_is_head(sb, &super_blocks, s_list);
 	     sb = next_super(sb, flags)) {
 		if (super_flags(sb, SB_DYING))
 			continue;
-		refcount_inc(&sb->s_count);
-		spin_unlock(&sb_lock);
+		if (!refcount_inc_not_zero(&sb->s_count))
+			continue;
+		rcu_read_unlock();
 
 		if (flags & SUPER_ITER_UNLOCKED) {
 			f(sb, arg);
@@ -867,13 +875,14 @@ static void __iterate_supers(void (*f)(struct super_block *, void *), void *arg,
 			super_unlock(sb, excl);
 		}
 
-		spin_lock(&sb_lock);
 		if (p)
-			__put_super(p);
+			put_super(p);
 		p = sb;
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 	if (p)
-		__put_super(p);
+		put_super(p);
 }
 
 void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
@@ -895,15 +904,15 @@ void iterate_supers_type(struct file_system_type *type,
 {
 	struct super_block *sb, *p = NULL;
 
-	spin_lock(&sb_lock);
-	hlist_for_each_entry(sb, &type->fs_supers, s_instances) {
+	rcu_read_lock();
+	hlist_for_each_entry_rcu(sb, &type->fs_supers, s_instances) {
 		bool locked;
 
 		if (super_flags(sb, SB_DYING))
 			continue;
-
-		refcount_inc(&sb->s_count);
-		spin_unlock(&sb_lock);
+		if (!refcount_inc_not_zero(&sb->s_count))
+			continue;
+		rcu_read_unlock();
 
 		locked = super_lock_shared(sb);
 		if (locked) {
@@ -911,14 +920,14 @@ void iterate_supers_type(struct file_system_type *type,
 			super_unlock_shared(sb);
 		}
 
-		spin_lock(&sb_lock);
 		if (p)
-			__put_super(p);
+			put_super(p);
 		p = sb;
+		rcu_read_lock();
 	}
+	rcu_read_unlock();
 	if (p)
-		__put_super(p);
-	spin_unlock(&sb_lock);
+		put_super(p);
 }
 
 EXPORT_SYMBOL(iterate_supers_type);
@@ -927,25 +936,21 @@ struct super_block *user_get_super(dev_t dev, bool excl)
 {
 	struct super_block *sb;
 
-	spin_lock(&sb_lock);
-	list_for_each_entry(sb, &super_blocks, s_list) {
-		bool locked;
-
+	rcu_read_lock();
+	list_for_each_entry_rcu(sb, &super_blocks, s_list) {
 		if (sb->s_dev != dev)
 			continue;
+		if (!refcount_inc_not_zero(&sb->s_count))
+			continue;
+		rcu_read_unlock();
 
-		refcount_inc(&sb->s_count);
-		spin_unlock(&sb_lock);
-
-		locked = super_lock(sb, excl);
-		if (locked)
+		if (super_lock(sb, excl))
 			return sb;
 
-		spin_lock(&sb_lock);
-		__put_super(sb);
-		break;
+		put_super(sb);
+		return NULL;
 	}
-	spin_unlock(&sb_lock);
+	rcu_read_unlock();
 	return NULL;
 }
 

-- 
2.47.3



* Re: [PATCH 2/8] ext4: convert mballoc KUnit test to sget_fc()
  2026-05-26 15:09 ` [PATCH 2/8] ext4: convert mballoc " Christian Brauner
@ 2026-05-27  0:47   ` Theodore Tso
  2026-05-28 12:02     ` Christian Brauner
  0 siblings, 1 reply; 15+ messages in thread
From: Theodore Tso @ 2026-05-27  0:47 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro

On Tue, May 26, 2026 at 05:09:04PM +0200, Christian Brauner wrote:
> Add a no-op mbt_init_fs_context() so fs_context_for_mount() has
> something to call on the fake fs_type....

I was trying to figure out what needed to be in an init_fs_context()
function, and I came across this in
Documentation/filesystems/mount_api.rst:

       const struct fs_context_operations *ops

     These are operations that can be done on a filesystem context (see
     below).  This must be set by the ->init_fs_context() file_system_type
     operation.    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

So is it safe to just have an init_fs_context() function which doesn't
do this?

> +static int mbt_init_fs_context(struct fs_context *fc)
> +{
> +	return 0;
> +}
> +

I see in fs/fs_context.c that in some places the code protects against
a NULL ops pointer:

        if (fc->need_free && fc->ops && fc->ops->free)
		fc->ops->free(fc);

But in other places, it doesn't and we'll end up dereferencing a null
pointer:

        if (fc->ops->parse_param) {
		ret = fc->ops->parse_param(fc, param);

	....

So it's unclear to me --- when is it safe (and not safe) to not bother
to fill in the ops pointer?

						- Ted


* Re: [PATCH 5/8] super: drop sb_lock from setup_bdev_super() tuple publication
  2026-05-26 15:09 ` [PATCH 5/8] super: drop sb_lock from setup_bdev_super() tuple publication Christian Brauner
@ 2026-05-27 11:53   ` Christian Brauner
  0 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2026-05-27 11:53 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro

>  	}
> -	spin_lock(&sb_lock);

Yeah, I failed to consider that we need to protect against a concurrent
sget_fc() call with a custom callback so we cannot reasonably drop this
lock.

> -	spin_unlock(&sb_lock);
> +		WRITE_ONCE(sb->s_iflags, sb->s_iflags | SB_I_STABLE_WRITES);


* Re: [PATCH 0/8] super: retire sget(), convert iterators to RCU
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
                   ` (7 preceding siblings ...)
  2026-05-26 15:09 ` [PATCH 8/8] super: convert iterators to RCU readers + refcount_inc_not_zero Christian Brauner
@ 2026-05-27 11:54 ` Christian Brauner
  2026-05-28 11:18 ` Jan Kara
  9 siblings, 0 replies; 15+ messages in thread
From: Christian Brauner @ 2026-05-27 11:54 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Theodore Ts'o, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro

On Tue, May 26, 2026 at 05:09:02PM +0200, Christian Brauner wrote:
> * retire sget(): CIFS plus the two ext4 KUnit tests (extents-test,
> 
> * Walk @super_blocks and @type->fs_supers under RCU, pinned by

Can't work as I originally envisioned.


* Re: [PATCH 0/8] super: retire sget(), convert iterators to RCU
  2026-05-26 15:09 [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
                   ` (8 preceding siblings ...)
  2026-05-27 11:54 ` [PATCH 0/8] super: retire sget(), convert iterators to RCU Christian Brauner
@ 2026-05-28 11:18 ` Jan Kara
  9 siblings, 0 replies; 15+ messages in thread
From: Jan Kara @ 2026-05-28 11:18 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Theodore Ts'o, Andreas Dilger, Jan Kara,
	Ritesh Harjani (IBM), linux-ext4, linux-cifs, Alexander Viro

On Tue 26-05-26 17:09:02, Christian Brauner wrote:
> * retire sget(): CIFS plus the two ext4 KUnit tests (extents-test,
>   mballoc-test) were the last in-tree callers, and all three convert
>   cleanly to sget_fc(). That lets sget() and its prototype come out,
>   taking ~60 lines that only existed to be kept in lockstep with
>   sget_fc() on every publish-path change.

This is definitely a good cleanup!

> * Walk @super_blocks and @type->fs_supers under RCU, pinned by
>   refcount_inc_not_zero(&sb->s_count). iterate_supers(),
>   iterate_supers_type(), user_get_super(), do_emergency_remount(),
>   filesystems_freeze() and filesystems_thaw() no longer hold sb_lock
>   across the cursor advance.
> 
>   The conversion goes in four small steps. Drop sb_lock from
>   setup_bdev_super(): the {s_bdev_file, s_bdev, s_bdi,
>   SB_I_STABLE_WRITES} tuple is publication of immutable state, and
>   SB_BORN already gates every reader via super_wake()'s
>   smp_store_release paired with super_flags()'s smp_load_acquire. Then
>   convert sb->s_count to refcount_t -- mechanical, every increment is
>   still under sb_lock. Then switch the write-side list/hlist ops to
>   their _rcu variants; @super_blocks gets list_bidir_del_rcu() so the
>   reverse-walking iterators (filesystems_freeze, do_emergency_remount)
>   keep a valid ->prev on the unlinked entry, matching the canonical
>   pattern in kernel/nstree.c. Finally, convert the iterators themselves:
>   cursor advance via READ_ONCE / rcu_dereference, with the previous
>   entry kept pinned via its s_count across the rcu_read_unlock ->
>   callback -> rcu_read_lock cycle.

So I guess the motivation for getting rid of sb_lock is some contention on
it you can observe? When exactly? It would be nice to mention the
motivation as a justification for the additional complexity...

								Honza

> 
> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
> ---
> Christian Brauner (8):
>       ext4: convert extents KUnit test to sget_fc()
>       ext4: convert mballoc KUnit test to sget_fc()
>       smb: client: convert cifs_smb3_do_mount() to sget_fc()
>       fs: retire sget()
>       super: drop sb_lock from setup_bdev_super() tuple publication
>       super: convert sb->s_count to refcount_t
>       super: switch list manipulation to _rcu primitives
>       super: convert iterators to RCU readers + refcount_inc_not_zero
> 
>  fs/btrfs/super.c               |   2 +-
>  fs/ext4/extents-test.c         |  22 +++++-
>  fs/ext4/mballoc-test.c         |  17 ++++-
>  fs/smb/client/cifsfs.c         |  43 ++++++-----
>  fs/smb/client/cifsfs.h         |   3 +-
>  fs/smb/client/cifsproto.h      |   3 +-
>  fs/smb/client/connect.c        |   5 +-
>  fs/smb/client/fs_context.c     |   2 +-
>  fs/super.c                     | 167 ++++++++++++++---------------------------
>  include/linux/fs.h             |   4 -
>  include/linux/fs/super_types.h |   3 +-
>  11 files changed, 127 insertions(+), 144 deletions(-)
> ---
> base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
> change-id: 20260526-work-sget-6bc80b96cba5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH 2/8] ext4: convert mballoc KUnit test to sget_fc()
  2026-05-27  0:47   ` Theodore Tso
@ 2026-05-28 12:02     ` Christian Brauner
  2026-06-03 13:52       ` Theodore Tso
  0 siblings, 1 reply; 15+ messages in thread
From: Christian Brauner @ 2026-05-28 12:02 UTC (permalink / raw)
  To: Theodore Tso
  Cc: linux-fsdevel, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro

On Tue, May 26, 2026 at 07:47:27PM -0500, Theodore Ts'o wrote:
> On Tue, May 26, 2026 at 05:09:04PM +0200, Christian Brauner wrote:
> > Add a no-op mbt_init_fs_context() so fs_context_for_mount() has
> > something to call on the fake fs_type....
> 
> I was trying to figure out what needed to be in an init_fs_context()
> function, and I came across this in
> Documentation/filesystems/mount_api.rst:
> 
>        const struct fs_context_operations *ops
> 
>      These are operations that can be done on a filesystem context (see
>      below).  This must be set by the ->init_fs_context() file_system_type
>      operation.    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> So is it safe to just have an init_fs_context() function which doesn't
> do this?


> 
> > +static int mbt_init_fs_context(struct fs_context *fc)
> > +{
> > +	return 0;
> > +}
> > +
> 
> I see in fs/fs_context.c that in some places the code protects against
> a NULL ops pointer:
> 
>         if (fc->need_free && fc->ops && fc->ops->free)
> 		fc->ops->free(fc);
> 
> But in other places, it doesn't and we'll end up dereferencing a null
> pointer:
> 
>         if (fc->ops->parse_param) {
> 		ret = fc->ops->parse_param(fc, param);
> 
> 	....
> 
> So it's unclear to me --- when is it safe (and not safe) to not bother
> to fill in the ops pointer?

Hey Ted!

In these two cases it's fine. Because you're just using the allocation
and deallocation functions to get a fs_context that's basically just an
empty vessel to get at a superblock via sget_fc() but you're not really
doing anything with it.

IOW, you can never end up in callchains that cause issues.
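
For illustration only, the two styles of ops access discussed above can
be sketched in userspace. This is not fs/fs_context.c: struct ctx,
struct ctx_ops, ctx_put() and ctx_parse_checked() are hypothetical
names, and the checked parse helper is an assumption about what a
defensive variant could look like, not something the kernel does.

```c
#include <stddef.h>

struct ctx;

/* Hypothetical operations table, loosely modelled on an ops pointer. */
struct ctx_ops {
	void (*free)(struct ctx *c);
	int (*parse)(struct ctx *c, int param);
};

struct ctx {
	const struct ctx_ops *ops;	/* may be NULL for an "empty vessel" */
	int freed;
};

/* Guarded style: tolerates a context whose ops were never filled in. */
void ctx_put(struct ctx *c)
{
	if (c->ops && c->ops->free)
		c->ops->free(c);
}

/* Defensive variant: report an error instead of dereferencing NULL ops. */
int ctx_parse_checked(struct ctx *c, int param)
{
	if (!c->ops || !c->ops->parse)
		return -1;
	return c->ops->parse(c, param);
}

/* Demo callbacks so the table below has something to point at. */
static void demo_free(struct ctx *c)
{
	c->freed = 1;
}

static int demo_parse(struct ctx *c, int param)
{
	(void)c;
	return param * 2;
}

const struct ctx_ops demo_ops = {
	.free = demo_free,
	.parse = demo_parse,
};
```

A context with ops left NULL is safe only on paths written in the
guarded style; any path that reads ops unconditionally needs the table
to be set first.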


* Re: [PATCH 2/8] ext4: convert mballoc KUnit test to sget_fc()
  2026-05-28 12:02     ` Christian Brauner
@ 2026-06-03 13:52       ` Theodore Tso
  0 siblings, 0 replies; 15+ messages in thread
From: Theodore Tso @ 2026-06-03 13:52 UTC (permalink / raw)
  To: Christian Brauner
  Cc: linux-fsdevel, Andreas Dilger, Jan Kara, Ritesh Harjani (IBM),
	linux-ext4, linux-cifs, Alexander Viro

On Thu, May 28, 2026 at 02:02:50PM +0200, Christian Brauner wrote:
> 
> In these two cases it's fine. Because you're just using the allocation
> and deallocation functions to get a fs_context that's basically just an
> empty vessel to get at a superblock via sget_fc() but you're not really
> doing anything with it.

If you're OK with it, I have no objections, but...

I'm sure it's fine today.  But is this something which is documented
to be fine in the future?  It just seems a little fragile and is
contrary to the documentation.

Thanks,

						- Ted

