linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v3 0/4] super: allow waiting without s_umount held
@ 2023-08-18 14:00 Christian Brauner
  2023-08-18 14:00 ` [PATCH v3 1/4] super: use locking helpers Christian Brauner
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Christian Brauner @ 2023-08-18 14:00 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig; +Cc: Al Viro, linux-fsdevel, Christian Brauner

Hey everyone,

This is an attempt to allow concurrent mounters and iterators to wait
on superblock state changes without having to hold s_umount. This is
made necessary by recent attempts to open block devices after superblock
creation and to fix deadlocks caused by blkdev_put() trying to acquire
s_umount while s_umount is already held.

This is on top of Jan's and Christoph's work in vfs.super. Obviously not
for v6.6. I hope I got it right but this is intricate. 

It reliably survives xfstests for btrfs, ext4, and xfs while
concurrently having 7 processes running ustat() hammering on
super_blocks and a while-true loop hammering on sget{_fc}() by trying
to mount a filesystem with an invalid superblock.
Even with an arbitrary 5s delay inserted after generic_shutdown_super()
and blkdev_put(), things work fine.
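
For reference, a minimal sketch of one such ustat() hammering process
(hypothetical test harness, not part of this series; it uses the raw
syscall since modern glibc no longer ships <ustat.h>, and the device
numbers are an assumption; point them at a mounted block device):

	/* ustat-hammer.c - walks super_blocks via ustat(2) in a loop */
	#include <sys/types.h>
	#include <sys/syscall.h>
	#include <sys/sysmacros.h>
	#include <unistd.h>

	int main(void)
	{
		char ubuf[64]; /* large enough for struct ustat */
		dev_t dev = makedev(8, 1); /* assumed: /dev/sda1 */

		for (;;)
			syscall(SYS_ustat, dev, ubuf);
		return 0;
	}

Running 7 instances of this next to the mount loop recreates the setup
described above.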

Thanks and don't hit me over the head with things.
Christian

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Changes in v3:
- Improve super_lock() description.
- Use full barrier in super_wake() to ensure that waitqueue_active()
  check works as expected.
- Simplify grab_super_dead().
- Link to v2: https://lore.kernel.org/r/20230818-vfs-super-fixes-v3-v2-0-cdab45934983@kernel.org

Changes in v2:
- Rename various functions according to Jan's and Willy's suggestions.
- Remove smp_wmb() as smp_store_release() is enough.
- Remove hlist_unhashed() checks now that we wait on SB_DYING.
- Link to v1: https://lore.kernel.org/r/20230817-vfs-super-fixes-v3-v1-0-06ddeca7059b@kernel.org

---
Christian Brauner (4):
      super: use locking helpers
      super: make locking naming consistent
      super: wait for nascent superblocks
      super: wait until we passed kill super

 fs/fs-writeback.c  |   4 +-
 fs/internal.h      |   2 +-
 fs/super.c         | 377 +++++++++++++++++++++++++++++++++++++++--------------
 include/linux/fs.h |   2 +
 4 files changed, 286 insertions(+), 99 deletions(-)
---
base-commit: f3aeab61fb15edef1e81828da8dbf0814541e49b
change-id: 20230816-vfs-super-fixes-v3-f2cff6192a50



* [PATCH v3 1/4] super: use locking helpers
  2023-08-18 14:00 [PATCH v3 0/4] super: allow waiting without s_umount held Christian Brauner
@ 2023-08-18 14:00 ` Christian Brauner
  2023-08-18 14:00 ` [PATCH v3 2/4] super: make locking naming consistent Christian Brauner
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2023-08-18 14:00 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig; +Cc: Al Viro, linux-fsdevel, Christian Brauner

Replace the open-coded {down,up}_{read,write}() calls with simple
wrappers. Follow-up patches will benefit from this as well.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/super.c | 126 ++++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 78 insertions(+), 48 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index c878e7373f93..b12e2f247e1e 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -50,6 +50,42 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = {
 	"sb_internal",
 };
 
+static inline void super_lock(struct super_block *sb, bool excl)
+{
+	if (excl)
+		down_write(&sb->s_umount);
+	else
+		down_read(&sb->s_umount);
+}
+
+static inline void super_unlock(struct super_block *sb, bool excl)
+{
+	if (excl)
+		up_write(&sb->s_umount);
+	else
+		up_read(&sb->s_umount);
+}
+
+static inline void super_lock_excl(struct super_block *sb)
+{
+	super_lock(sb, true);
+}
+
+static inline void super_lock_shared(struct super_block *sb)
+{
+	super_lock(sb, false);
+}
+
+static inline void super_unlock_excl(struct super_block *sb)
+{
+	super_unlock(sb, true);
+}
+
+static inline void super_unlock_shared(struct super_block *sb)
+{
+	super_unlock(sb, false);
+}
+
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
@@ -110,7 +146,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 		freed += sb->s_op->free_cached_objects(sb, sc);
 	}
 
-	up_read(&sb->s_umount);
+	super_unlock_shared(sb);
 	return freed;
 }
 
@@ -176,7 +212,7 @@ static void destroy_unused_super(struct super_block *s)
 {
 	if (!s)
 		return;
-	up_write(&s->s_umount);
+	super_unlock_excl(s);
 	list_lru_destroy(&s->s_dentry_lru);
 	list_lru_destroy(&s->s_inode_lru);
 	security_sb_free(s);
@@ -340,7 +376,7 @@ void deactivate_locked_super(struct super_block *s)
 		put_filesystem(fs);
 		put_super(s);
 	} else {
-		up_write(&s->s_umount);
+		super_unlock_excl(s);
 	}
 }
 
@@ -357,7 +393,7 @@ EXPORT_SYMBOL(deactivate_locked_super);
 void deactivate_super(struct super_block *s)
 {
 	if (!atomic_add_unless(&s->s_active, -1, 1)) {
-		down_write(&s->s_umount);
+		super_lock_excl(s);
 		deactivate_locked_super(s);
 	}
 }
@@ -381,12 +417,12 @@ static int grab_super(struct super_block *s) __releases(sb_lock)
 {
 	s->s_count++;
 	spin_unlock(&sb_lock);
-	down_write(&s->s_umount);
+	super_lock_excl(s);
 	if ((s->s_flags & SB_BORN) && atomic_inc_not_zero(&s->s_active)) {
 		put_super(s);
 		return 1;
 	}
-	up_write(&s->s_umount);
+	super_unlock_excl(s);
 	put_super(s);
 	return 0;
 }
@@ -414,7 +450,7 @@ bool trylock_super(struct super_block *sb)
 		if (!hlist_unhashed(&sb->s_instances) &&
 		    sb->s_root && (sb->s_flags & SB_BORN))
 			return true;
-		up_read(&sb->s_umount);
+		super_unlock_shared(sb);
 	}
 
 	return false;
@@ -439,13 +475,13 @@ bool trylock_super(struct super_block *sb)
 void retire_super(struct super_block *sb)
 {
 	WARN_ON(!sb->s_bdev);
-	down_write(&sb->s_umount);
+	super_lock_excl(sb);
 	if (sb->s_iflags & SB_I_PERSB_BDI) {
 		bdi_unregister(sb->s_bdi);
 		sb->s_iflags &= ~SB_I_PERSB_BDI;
 	}
 	sb->s_iflags |= SB_I_RETIRED;
-	up_write(&sb->s_umount);
+	super_unlock_excl(sb);
 }
 EXPORT_SYMBOL(retire_super);
 
@@ -521,7 +557,7 @@ void generic_shutdown_super(struct super_block *sb)
 	/* should be initialized for __put_super_and_need_restart() */
 	hlist_del_init(&sb->s_instances);
 	spin_unlock(&sb_lock);
-	up_write(&sb->s_umount);
+	super_unlock_excl(sb);
 	if (sb->s_bdi != &noop_backing_dev_info) {
 		if (sb->s_iflags & SB_I_PERSB_BDI)
 			bdi_unregister(sb->s_bdi);
@@ -685,7 +721,7 @@ EXPORT_SYMBOL(sget);
 
 void drop_super(struct super_block *sb)
 {
-	up_read(&sb->s_umount);
+	super_unlock_shared(sb);
 	put_super(sb);
 }
 
@@ -693,7 +729,7 @@ EXPORT_SYMBOL(drop_super);
 
 void drop_super_exclusive(struct super_block *sb)
 {
-	up_write(&sb->s_umount);
+	super_unlock_excl(sb);
 	put_super(sb);
 }
 EXPORT_SYMBOL(drop_super_exclusive);
@@ -739,10 +775,10 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
 		sb->s_count++;
 		spin_unlock(&sb_lock);
 
-		down_read(&sb->s_umount);
+		super_lock_shared(sb);
 		if (sb->s_root && (sb->s_flags & SB_BORN))
 			f(sb, arg);
-		up_read(&sb->s_umount);
+		super_unlock_shared(sb);
 
 		spin_lock(&sb_lock);
 		if (p)
@@ -773,10 +809,10 @@ void iterate_supers_type(struct file_system_type *type,
 		sb->s_count++;
 		spin_unlock(&sb_lock);
 
-		down_read(&sb->s_umount);
+		super_lock_shared(sb);
 		if (sb->s_root && (sb->s_flags & SB_BORN))
 			f(sb, arg);
-		up_read(&sb->s_umount);
+		super_unlock_shared(sb);
 
 		spin_lock(&sb_lock);
 		if (p)
@@ -813,7 +849,7 @@ struct super_block *get_active_super(struct block_device *bdev)
 		if (sb->s_bdev == bdev) {
 			if (!grab_super(sb))
 				goto restart;
-			up_write(&sb->s_umount);
+			super_unlock_excl(sb);
 			return sb;
 		}
 	}
@@ -833,17 +869,11 @@ struct super_block *user_get_super(dev_t dev, bool excl)
 		if (sb->s_dev ==  dev) {
 			sb->s_count++;
 			spin_unlock(&sb_lock);
-			if (excl)
-				down_write(&sb->s_umount);
-			else
-				down_read(&sb->s_umount);
+			super_lock(sb, excl);
 			/* still alive? */
 			if (sb->s_root && (sb->s_flags & SB_BORN))
 				return sb;
-			if (excl)
-				up_write(&sb->s_umount);
-			else
-				up_read(&sb->s_umount);
+			super_unlock(sb, excl);
 			/* nope, got unmounted */
 			spin_lock(&sb_lock);
 			__put_super(sb);
@@ -889,9 +919,9 @@ int reconfigure_super(struct fs_context *fc)
 
 	if (remount_ro) {
 		if (!hlist_empty(&sb->s_pins)) {
-			up_write(&sb->s_umount);
+			super_unlock_excl(sb);
 			group_pin_kill(&sb->s_pins);
-			down_write(&sb->s_umount);
+			super_lock_excl(sb);
 			if (!sb->s_root)
 				return 0;
 			if (sb->s_writers.frozen != SB_UNFROZEN)
@@ -954,7 +984,7 @@ int reconfigure_super(struct fs_context *fc)
 
 static void do_emergency_remount_callback(struct super_block *sb)
 {
-	down_write(&sb->s_umount);
+	super_lock_excl(sb);
 	if (sb->s_root && sb->s_bdev && (sb->s_flags & SB_BORN) &&
 	    !sb_rdonly(sb)) {
 		struct fs_context *fc;
@@ -967,7 +997,7 @@ static void do_emergency_remount_callback(struct super_block *sb)
 			put_fs_context(fc);
 		}
 	}
-	up_write(&sb->s_umount);
+	super_unlock_excl(sb);
 }
 
 static void do_emergency_remount(struct work_struct *work)
@@ -990,12 +1020,12 @@ void emergency_remount(void)
 
 static void do_thaw_all_callback(struct super_block *sb)
 {
-	down_write(&sb->s_umount);
+	super_lock_excl(sb);
 	if (sb->s_root && sb->s_flags & SB_BORN) {
 		emergency_thaw_bdev(sb);
 		thaw_super_locked(sb);
 	} else {
-		up_write(&sb->s_umount);
+		super_unlock_excl(sb);
 	}
 }
 
@@ -1182,10 +1212,10 @@ EXPORT_SYMBOL(get_tree_keyed);
  */
 static bool lock_active_super(struct super_block *sb)
 {
-	down_read(&sb->s_umount);
+	super_lock_shared(sb);
 	if (!sb->s_root ||
 	    (sb->s_flags & (SB_ACTIVE | SB_BORN)) != (SB_ACTIVE | SB_BORN)) {
-		up_read(&sb->s_umount);
+		super_unlock_shared(sb);
 		return false;
 	}
 	return true;
@@ -1208,7 +1238,7 @@ static void fs_bdev_mark_dead(struct block_device *bdev, bool surprise)
 	if (sb->s_op->shutdown)
 		sb->s_op->shutdown(sb);
 
-	up_read(&sb->s_umount);
+	super_unlock_shared(sb);
 }
 
 static void fs_bdev_sync(struct block_device *bdev)
@@ -1220,7 +1250,7 @@ static void fs_bdev_sync(struct block_device *bdev)
 	if (!lock_active_super(sb))
 		return;
 	sync_filesystem(sb);
-	up_read(&sb->s_umount);
+	super_unlock_shared(sb);
 }
 
 const struct blk_holder_ops fs_holder_ops = {
@@ -1342,9 +1372,9 @@ int get_tree_bdev(struct fs_context *fc,
 		 * bdev_mark_dead()). It is safe because we have active sb
 		 * reference and SB_BORN is not set yet.
 		 */
-		up_write(&s->s_umount);
+		super_unlock_excl(s);
 		error = setup_bdev_super(s, fc->sb_flags, fc);
-		down_write(&s->s_umount);
+		super_lock_excl(s);
 		if (!error)
 			error = fill_super(s, fc);
 		if (error) {
@@ -1394,9 +1424,9 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 		 * bdev_mark_dead()). It is safe because we have active sb
 		 * reference and SB_BORN is not set yet.
 		 */
-		up_write(&s->s_umount);
+		super_unlock_excl(s);
 		error = setup_bdev_super(s, flags, NULL);
-		down_write(&s->s_umount);
+		super_lock_excl(s);
 		if (!error)
 			error = fill_super(s, data, flags & SB_SILENT ? 1 : 0);
 		if (error) {
@@ -1685,29 +1715,29 @@ int freeze_super(struct super_block *sb)
 	int ret;
 
 	atomic_inc(&sb->s_active);
-	down_write(&sb->s_umount);
+	super_lock_excl(sb);
 	if (sb->s_writers.frozen != SB_UNFROZEN) {
 		deactivate_locked_super(sb);
 		return -EBUSY;
 	}
 
 	if (!(sb->s_flags & SB_BORN)) {
-		up_write(&sb->s_umount);
+		super_unlock_excl(sb);
 		return 0;	/* sic - it's "nothing to do" */
 	}
 
 	if (sb_rdonly(sb)) {
 		/* Nothing to do really... */
 		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
-		up_write(&sb->s_umount);
+		super_unlock_excl(sb);
 		return 0;
 	}
 
 	sb->s_writers.frozen = SB_FREEZE_WRITE;
 	/* Release s_umount to preserve sb_start_write -> s_umount ordering */
-	up_write(&sb->s_umount);
+	super_unlock_excl(sb);
 	sb_wait_write(sb, SB_FREEZE_WRITE);
-	down_write(&sb->s_umount);
+	super_lock_excl(sb);
 
 	/* Now we go and block page faults... */
 	sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
@@ -1743,7 +1773,7 @@ int freeze_super(struct super_block *sb)
 	 */
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 	lockdep_sb_freeze_release(sb);
-	up_write(&sb->s_umount);
+	super_unlock_excl(sb);
 	return 0;
 }
 EXPORT_SYMBOL(freeze_super);
@@ -1753,7 +1783,7 @@ static int thaw_super_locked(struct super_block *sb)
 	int error;
 
 	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) {
-		up_write(&sb->s_umount);
+		super_unlock_excl(sb);
 		return -EINVAL;
 	}
 
@@ -1770,7 +1800,7 @@ static int thaw_super_locked(struct super_block *sb)
 			printk(KERN_ERR
 				"VFS:Filesystem thaw failed\n");
 			lockdep_sb_freeze_release(sb);
-			up_write(&sb->s_umount);
+			super_unlock_excl(sb);
 			return error;
 		}
 	}
@@ -1790,7 +1820,7 @@ static int thaw_super_locked(struct super_block *sb)
  */
 int thaw_super(struct super_block *sb)
 {
-	down_write(&sb->s_umount);
+	super_lock_excl(sb);
 	return thaw_super_locked(sb);
 }
 EXPORT_SYMBOL(thaw_super);

-- 
2.34.1



* [PATCH v3 2/4] super: make locking naming consistent
  2023-08-18 14:00 [PATCH v3 0/4] super: allow waiting without s_umount held Christian Brauner
  2023-08-18 14:00 ` [PATCH v3 1/4] super: use locking helpers Christian Brauner
@ 2023-08-18 14:00 ` Christian Brauner
  2023-08-18 14:00 ` [PATCH v3 3/4] super: wait for nascent superblocks Christian Brauner
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2023-08-18 14:00 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig; +Cc: Al Viro, linux-fsdevel, Christian Brauner

Make the naming consistent with the earlier introduced
super_lock_{shared,excl}() helpers.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/fs-writeback.c |  4 ++--
 fs/internal.h     |  2 +-
 fs/super.c        | 28 ++++++++++++++--------------
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index aca4b4811394..969ce991b0b0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1953,9 +1953,9 @@ static long __writeback_inodes_wb(struct bdi_writeback *wb,
 		struct inode *inode = wb_inode(wb->b_io.prev);
 		struct super_block *sb = inode->i_sb;
 
-		if (!trylock_super(sb)) {
+		if (!super_trylock_shared(sb)) {
 			/*
-			 * trylock_super() may fail consistently due to
+			 * super_trylock_shared() may fail consistently due to
 			 * s_umount being grabbed by someone else. Don't use
 			 * requeue_io() to avoid busy retrying the inode/sb.
 			 */
diff --git a/fs/internal.h b/fs/internal.h
index b94290f61714..74d3b161dd2c 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -115,7 +115,7 @@ static inline void put_file_access(struct file *file)
  * super.c
  */
 extern int reconfigure_super(struct fs_context *);
-extern bool trylock_super(struct super_block *sb);
+extern bool super_trylock_shared(struct super_block *sb);
 struct super_block *user_get_super(dev_t, bool excl);
 void put_super(struct super_block *sb);
 extern bool mount_capable(struct fs_context *);
diff --git a/fs/super.c b/fs/super.c
index b12e2f247e1e..ba5d813c5804 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -112,7 +112,7 @@ static unsigned long super_cache_scan(struct shrinker *shrink,
 	if (!(sc->gfp_mask & __GFP_FS))
 		return SHRINK_STOP;
 
-	if (!trylock_super(sb))
+	if (!super_trylock_shared(sb))
 		return SHRINK_STOP;
 
 	if (sb->s_op->nr_cached_objects)
@@ -159,17 +159,17 @@ static unsigned long super_cache_count(struct shrinker *shrink,
 	sb = container_of(shrink, struct super_block, s_shrink);
 
 	/*
-	 * We don't call trylock_super() here as it is a scalability bottleneck,
-	 * so we're exposed to partial setup state. The shrinker rwsem does not
-	 * protect filesystem operations backing list_lru_shrink_count() or
-	 * s_op->nr_cached_objects(). Counts can change between
-	 * super_cache_count and super_cache_scan, so we really don't need locks
-	 * here.
+	 * We don't call super_trylock_shared() here as it is a scalability
+	 * bottleneck, so we're exposed to partial setup state. The shrinker
+	 * rwsem does not protect filesystem operations backing
+	 * list_lru_shrink_count() or s_op->nr_cached_objects(). Counts can
+	 * change between super_cache_count and super_cache_scan, so we really
+	 * don't need locks here.
 	 *
 	 * However, if we are currently mounting the superblock, the underlying
 	 * filesystem might be in a state of partial construction and hence it
-	 * is dangerous to access it.  trylock_super() uses a SB_BORN check to
-	 * avoid this situation, so do the same here. The memory barrier is
+	 * is dangerous to access it.  super_trylock_shared() uses a SB_BORN check
+	 * to avoid this situation, so do the same here. The memory barrier is
 	 * matched with the one in mount_fs() as we don't hold locks here.
 	 */
 	if (!(sb->s_flags & SB_BORN))
@@ -428,7 +428,7 @@ static int grab_super(struct super_block *s) __releases(sb_lock)
 }
 
 /*
- *	trylock_super - try to grab ->s_umount shared
+ *	super_trylock_shared - try to grab ->s_umount shared
  *	@sb: reference we are trying to grab
  *
  *	Try to prevent fs shutdown.  This is used in places where we
@@ -444,7 +444,7 @@ static int grab_super(struct super_block *s) __releases(sb_lock)
  *	of down_read().  There's a couple of places that are OK with that, but
  *	it's very much not a general-purpose interface.
  */
-bool trylock_super(struct super_block *sb)
+bool super_trylock_shared(struct super_block *sb)
 {
 	if (down_read_trylock(&sb->s_umount)) {
 		if (!hlist_unhashed(&sb->s_instances) &&
@@ -1210,7 +1210,7 @@ EXPORT_SYMBOL(get_tree_keyed);
  * and the place that clears the pointer to the superblock used by this function
  * before freeing the superblock.
  */
-static bool lock_active_super(struct super_block *sb)
+static bool super_lock_shared_active(struct super_block *sb)
 {
 	super_lock_shared(sb);
 	if (!sb->s_root ||
@@ -1228,7 +1228,7 @@ static void fs_bdev_mark_dead(struct block_device *bdev, bool surprise)
 	/* bd_holder_lock ensures that the sb isn't freed */
 	lockdep_assert_held(&bdev->bd_holder_lock);
 
-	if (!lock_active_super(sb))
+	if (!super_lock_shared_active(sb))
 		return;
 
 	if (!surprise)
@@ -1247,7 +1247,7 @@ static void fs_bdev_sync(struct block_device *bdev)
 
 	lockdep_assert_held(&bdev->bd_holder_lock);
 
-	if (!lock_active_super(sb))
+	if (!super_lock_shared_active(sb))
 		return;
 	sync_filesystem(sb);
 	super_unlock_shared(sb);

-- 
2.34.1



* [PATCH v3 3/4] super: wait for nascent superblocks
  2023-08-18 14:00 [PATCH v3 0/4] super: allow waiting without s_umount held Christian Brauner
  2023-08-18 14:00 ` [PATCH v3 1/4] super: use locking helpers Christian Brauner
  2023-08-18 14:00 ` [PATCH v3 2/4] super: make locking naming consistent Christian Brauner
@ 2023-08-18 14:00 ` Christian Brauner
  2023-08-18 14:57   ` Matthew Wilcox
  2023-08-21 15:52   ` Jan Kara
  2023-08-18 14:00 ` [PATCH v3 4/4] super: wait until we passed kill super Christian Brauner
  2023-08-22 12:36 ` [PATCH v3 0/4] super: allow waiting without s_umount held Christian Brauner
  4 siblings, 2 replies; 14+ messages in thread
From: Christian Brauner @ 2023-08-18 14:00 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig; +Cc: Al Viro, linux-fsdevel, Christian Brauner

Recent patches experiment with making it possible to allocate a new
superblock before opening the relevant block device. Naturally this has
intricate side-effects that we get to learn about while developing this.

Superblock allocators such as sget{_fc}() return with s_umount of the
new superblock held and lock ordering currently requires that block
level locks such as bdev_lock and open_mutex rank above s_umount.

Before aca740cecbe5 ("fs: open block device after superblock creation")
ordering was guaranteed to be correct as block devices were opened prior
to superblock allocation and thus s_umount wasn't held. But now s_umount
must be dropped before opening block devices to avoid locking
violations.
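
Schematically, the constraint described above is (informal sketch, not
code from this series):

	required order:  open_mutex / bdev_lock ---> s_umount
	                 (blkdev_put() from sb->kill_sb() may grab s_umount)

	sget{_fc}():     returns with s_umount held, so opening the bdev
	                 afterwards would take open_mutex under s_umount,
	                 inverting the order above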

This has consequences. The main one being that iterators over
@super_blocks and @fs_supers that grab a temporary reference to the
superblock can now also grab s_umount before the caller has managed to
open block devices and called fill_super(). So whereas before such
iterators or concurrent mounts would have simply slept on s_umount until
SB_BORN was set or the superblock was discarded due to initialization
failure, they can now needlessly spin through sget{_fc}().

If the caller is sleeping on bdev_lock or open_mutex, one caller waiting
on SB_BORN will always spin somewhere and potentially this can go on for
quite a while.

It should be possible to drop s_umount while allowing iterators to wait
on a nascent superblock to either be born or discarded. This patch
implements a wait_var_event() mechanism allowing iterators to sleep
until they are woken when the superblock is born or discarded.

This also allows us to avoid relooping through @fs_supers and
@super_blocks if a superblock isn't yet born or dying.
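
For readers unfamiliar with the primitive, the generic pairing used here
looks roughly like this (illustrative sketch; the actual condition
helpers in this series are wait_born() and, in the next patch,
wait_dead()):

	/* waiter side, e.g. super_lock() */
	wait_var_event(&sb->s_flags, wait_born(sb));

	/* waker side, super_wake() */
	sb->s_flags |= SB_BORN;	/* or SB_DYING */
	smp_mb();		/* order flag store vs waitqueue_active() */
	wake_up_var(&sb->s_flags);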

Link: aca740cecbe5 ("fs: open block device after superblock creation")
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/super.c         | 200 +++++++++++++++++++++++++++++++++++++++--------------
 include/linux/fs.h |   1 +
 2 files changed, 150 insertions(+), 51 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index ba5d813c5804..896f05f34377 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -50,7 +50,7 @@ static char *sb_writers_name[SB_FREEZE_LEVELS] = {
 	"sb_internal",
 };
 
-static inline void super_lock(struct super_block *sb, bool excl)
+static inline void __super_lock(struct super_block *sb, bool excl)
 {
 	if (excl)
 		down_write(&sb->s_umount);
@@ -66,14 +66,9 @@ static inline void super_unlock(struct super_block *sb, bool excl)
 		up_read(&sb->s_umount);
 }
 
-static inline void super_lock_excl(struct super_block *sb)
+static inline void __super_lock_excl(struct super_block *sb)
 {
-	super_lock(sb, true);
-}
-
-static inline void super_lock_shared(struct super_block *sb)
-{
-	super_lock(sb, false);
+	__super_lock(sb, true);
 }
 
 static inline void super_unlock_excl(struct super_block *sb)
@@ -86,6 +81,95 @@ static inline void super_unlock_shared(struct super_block *sb)
 	super_unlock(sb, false);
 }
 
+static inline bool wait_born(struct super_block *sb)
+{
+	unsigned int flags;
+
+	/*
+	 * Pairs with smp_mb() in super_wake() and ensures
+	 * that we see SB_BORN or SB_DYING after we're woken.
+	 */
+	flags = smp_load_acquire(&sb->s_flags);
+	return flags & (SB_BORN | SB_DYING);
+}
+
+/**
+ * super_lock - wait for superblock to become ready and lock it
+ * @sb: superblock to wait for
+ * @excl: whether exclusive access is required
+ *
+ * If the superblock has neither passed through vfs_get_tree() nor
+ * generic_shutdown_super() yet, wait for it to happen. Either superblock
+ * creation will succeed and SB_BORN is set by vfs_get_tree() or we're
+ * woken and we'll see SB_DYING.
+ *
+ * The caller must have acquired a temporary reference on @sb->s_count.
+ *
+ * Return: This returns true if SB_BORN was set, false if SB_DYING was
+ *         set. The function acquires s_umount and returns with it held.
+ */
+static __must_check bool super_lock(struct super_block *sb, bool excl)
+{
+
+	lockdep_assert_not_held(&sb->s_umount);
+
+relock:
+	__super_lock(sb, excl);
+
+	/*
+	 * Has gone through generic_shutdown_super() in the meantime.
+	 * @sb->s_root is NULL and @sb->s_active is 0. No one needs to
+	 * grab a reference to this. Tell them so.
+	 */
+	if (sb->s_flags & SB_DYING)
+		return false;
+
+	/* Has called ->get_tree() successfully. */
+	if (sb->s_flags & SB_BORN)
+		return true;
+
+	super_unlock(sb, excl);
+
+	/* wait until the superblock is ready or dying */
+	wait_var_event(&sb->s_flags, wait_born(sb));
+
+	/*
+	 * Neither SB_BORN nor SB_DYING are ever unset so we never loop.
+	 * Just reacquire @sb->s_umount for the caller.
+	 */
+	goto relock;
+}
+
+/* wait and acquire read-side of @sb->s_umount */
+static inline bool super_lock_shared(struct super_block *sb)
+{
+	return super_lock(sb, false);
+}
+
+/* wait and acquire write-side of @sb->s_umount */
+static inline bool super_lock_excl(struct super_block *sb)
+{
+	return super_lock(sb, true);
+}
+
+/* wake waiters */
+#define SUPER_WAKE_FLAGS (SB_BORN | SB_DYING)
+static void super_wake(struct super_block *sb, unsigned int flag)
+{
+	WARN_ON_ONCE((flag & ~SUPER_WAKE_FLAGS));
+	WARN_ON_ONCE(hweight32(flag & SUPER_WAKE_FLAGS) > 1);
+
+	/*
+	 * Pairs with smp_load_acquire() in super_lock() and
+	 * ensures that @flag is set before we wake anyone and ensures
+	 * that checking whether waitqueue is active isn't hoisted
+	 * before the store of @flag.
+	 */
+	sb->s_flags |= flag;
+	smp_mb();
+	wake_up_var(&sb->s_flags);
+}
+
 /*
  * One thing we have to be careful of with a per-sb shrinker is that we don't
  * drop the last active reference to the superblock from within the shrinker.
@@ -393,7 +477,7 @@ EXPORT_SYMBOL(deactivate_locked_super);
 void deactivate_super(struct super_block *s)
 {
 	if (!atomic_add_unless(&s->s_active, -1, 1)) {
-		super_lock_excl(s);
+		__super_lock_excl(s);
 		deactivate_locked_super(s);
 	}
 }
@@ -415,10 +499,12 @@ EXPORT_SYMBOL(deactivate_super);
  */
 static int grab_super(struct super_block *s) __releases(sb_lock)
 {
+	bool born;
+
 	s->s_count++;
 	spin_unlock(&sb_lock);
-	super_lock_excl(s);
-	if ((s->s_flags & SB_BORN) && atomic_inc_not_zero(&s->s_active)) {
+	born = super_lock_excl(s);
+	if (born && atomic_inc_not_zero(&s->s_active)) {
 		put_super(s);
 		return 1;
 	}
@@ -447,8 +533,8 @@ static int grab_super(struct super_block *s) __releases(sb_lock)
 bool super_trylock_shared(struct super_block *sb)
 {
 	if (down_read_trylock(&sb->s_umount)) {
-		if (!hlist_unhashed(&sb->s_instances) &&
-		    sb->s_root && (sb->s_flags & SB_BORN))
+		if (!(sb->s_flags & SB_DYING) && sb->s_root &&
+		    (sb->s_flags & SB_BORN))
 			return true;
 		super_unlock_shared(sb);
 	}
@@ -475,7 +561,7 @@ bool super_trylock_shared(struct super_block *sb)
 void retire_super(struct super_block *sb)
 {
 	WARN_ON(!sb->s_bdev);
-	super_lock_excl(sb);
+	__super_lock_excl(sb);
 	if (sb->s_iflags & SB_I_PERSB_BDI) {
 		bdi_unregister(sb->s_bdi);
 		sb->s_iflags &= ~SB_I_PERSB_BDI;
@@ -557,6 +643,13 @@ void generic_shutdown_super(struct super_block *sb)
 	/* should be initialized for __put_super_and_need_restart() */
 	hlist_del_init(&sb->s_instances);
 	spin_unlock(&sb_lock);
+	/*
+	 * Broadcast to everyone that grabbed a temporary reference to this
+	 * superblock before we removed it from @fs_supers that the superblock
+	 * is dying. Every walker of @fs_supers outside of sget{_fc}() will now
+	 * discard this superblock and treat it as dead.
+	 */
+	super_wake(sb, SB_DYING);
 	super_unlock_excl(sb);
 	if (sb->s_bdi != &noop_backing_dev_info) {
 		if (sb->s_iflags & SB_I_PERSB_BDI)
@@ -631,6 +724,11 @@ struct super_block *sget_fc(struct fs_context *fc,
 	s->s_type = fc->fs_type;
 	s->s_iflags |= fc->s_iflags;
 	strscpy(s->s_id, s->s_type->name, sizeof(s->s_id));
+	/*
+	 * Make the superblock visible on @super_blocks and @fs_supers.
+	 * It's in a nascent state and users should wait on SB_BORN or
+	 * SB_DYING to be set.
+	 */
 	list_add_tail(&s->s_list, &super_blocks);
 	hlist_add_head(&s->s_instances, &s->s_type->fs_supers);
 	spin_unlock(&sb_lock);
@@ -740,7 +838,8 @@ static void __iterate_supers(void (*f)(struct super_block *))
 
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
-		if (hlist_unhashed(&sb->s_instances))
+		/* Pairs with memory barrier in super_wake(). */
+		if (smp_load_acquire(&sb->s_flags) & SB_DYING)
 			continue;
 		sb->s_count++;
 		spin_unlock(&sb_lock);
@@ -770,13 +869,13 @@ void iterate_supers(void (*f)(struct super_block *, void *), void *arg)
 
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
-		if (hlist_unhashed(&sb->s_instances))
-			continue;
+		bool born;
+
 		sb->s_count++;
 		spin_unlock(&sb_lock);
 
-		super_lock_shared(sb);
-		if (sb->s_root && (sb->s_flags & SB_BORN))
+		born = super_lock_shared(sb);
+		if (born && sb->s_root)
 			f(sb, arg);
 		super_unlock_shared(sb);
 
@@ -806,11 +905,13 @@ void iterate_supers_type(struct file_system_type *type,
 
 	spin_lock(&sb_lock);
 	hlist_for_each_entry(sb, &type->fs_supers, s_instances) {
+		bool born;
+
 		sb->s_count++;
 		spin_unlock(&sb_lock);
 
-		super_lock_shared(sb);
-		if (sb->s_root && (sb->s_flags & SB_BORN))
+		born = super_lock_shared(sb);
+		if (born && sb->s_root)
 			f(sb, arg);
 		super_unlock_shared(sb);
 
@@ -841,14 +942,11 @@ struct super_block *get_active_super(struct block_device *bdev)
 	if (!bdev)
 		return NULL;
 
-restart:
 	spin_lock(&sb_lock);
 	list_for_each_entry(sb, &super_blocks, s_list) {
-		if (hlist_unhashed(&sb->s_instances))
-			continue;
 		if (sb->s_bdev == bdev) {
 			if (!grab_super(sb))
-				goto restart;
+				return NULL;
 			super_unlock_excl(sb);
 			return sb;
 		}
@@ -862,22 +960,21 @@ struct super_block *user_get_super(dev_t dev, bool excl)
 	struct super_block *sb;
 
 	spin_lock(&sb_lock);
-rescan:
 	list_for_each_entry(sb, &super_blocks, s_list) {
-		if (hlist_unhashed(&sb->s_instances))
-			continue;
 		if (sb->s_dev ==  dev) {
+			bool born;
+
 			sb->s_count++;
 			spin_unlock(&sb_lock);
-			super_lock(sb, excl);
 			/* still alive? */
-			if (sb->s_root && (sb->s_flags & SB_BORN))
+			born = super_lock(sb, excl);
+			if (born && sb->s_root)
 				return sb;
 			super_unlock(sb, excl);
 			/* nope, got unmounted */
 			spin_lock(&sb_lock);
 			__put_super(sb);
-			goto rescan;
+			break;
 		}
 	}
 	spin_unlock(&sb_lock);
@@ -921,7 +1018,7 @@ int reconfigure_super(struct fs_context *fc)
 		if (!hlist_empty(&sb->s_pins)) {
 			super_unlock_excl(sb);
 			group_pin_kill(&sb->s_pins);
-			super_lock_excl(sb);
+			__super_lock_excl(sb);
 			if (!sb->s_root)
 				return 0;
 			if (sb->s_writers.frozen != SB_UNFROZEN)
@@ -984,9 +1081,9 @@ int reconfigure_super(struct fs_context *fc)
 
 static void do_emergency_remount_callback(struct super_block *sb)
 {
-	super_lock_excl(sb);
-	if (sb->s_root && sb->s_bdev && (sb->s_flags & SB_BORN) &&
-	    !sb_rdonly(sb)) {
+	bool born = super_lock_excl(sb);
+
+	if (born && sb->s_root && sb->s_bdev && !sb_rdonly(sb)) {
 		struct fs_context *fc;
 
 		fc = fs_context_for_reconfigure(sb->s_root,
@@ -1020,8 +1117,9 @@ void emergency_remount(void)
 
 static void do_thaw_all_callback(struct super_block *sb)
 {
-	super_lock_excl(sb);
-	if (sb->s_root && sb->s_flags & SB_BORN) {
+	bool born = super_lock_excl(sb);
+
+	if (born && sb->s_root) {
 		emergency_thaw_bdev(sb);
 		thaw_super_locked(sb);
 	} else {
@@ -1212,9 +1310,9 @@ EXPORT_SYMBOL(get_tree_keyed);
  */
 static bool super_lock_shared_active(struct super_block *sb)
 {
-	super_lock_shared(sb);
-	if (!sb->s_root ||
-	    (sb->s_flags & (SB_ACTIVE | SB_BORN)) != (SB_ACTIVE | SB_BORN)) {
+	bool born = super_lock_shared(sb);
+
+	if (!born || !sb->s_root || !(sb->s_flags & SB_ACTIVE)) {
 		super_unlock_shared(sb);
 		return false;
 	}
@@ -1374,7 +1472,7 @@ int get_tree_bdev(struct fs_context *fc,
 		 */
 		super_unlock_excl(s);
 		error = setup_bdev_super(s, fc->sb_flags, fc);
-		super_lock_excl(s);
+		__super_lock_excl(s);
 		if (!error)
 			error = fill_super(s, fc);
 		if (error) {
@@ -1426,7 +1524,7 @@ struct dentry *mount_bdev(struct file_system_type *fs_type,
 		 */
 		super_unlock_excl(s);
 		error = setup_bdev_super(s, flags, NULL);
-		super_lock_excl(s);
+		__super_lock_excl(s);
 		if (!error)
 			error = fill_super(s, data, flags & SB_SILENT ? 1 : 0);
 		if (error) {
@@ -1566,13 +1664,13 @@ int vfs_get_tree(struct fs_context *fc)
 	WARN_ON(!sb->s_bdi);
 
 	/*
-	 * Write barrier is for super_cache_count(). We place it before setting
-	 * SB_BORN as the data dependency between the two functions is the
-	 * superblock structure contents that we just set up, not the SB_BORN
-	 * flag.
+	 * super_wake() contains a memory barrier which also takes care of
+	 * ordering for super_cache_count(). We place it before setting
+	 * SB_BORN as the data dependency between the two functions is
+	 * the superblock structure contents that we just set up, not
+	 * the SB_BORN flag.
 	 */
-	smp_wmb();
-	sb->s_flags |= SB_BORN;
+	super_wake(sb, SB_BORN);
 
 	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
 	if (unlikely(error)) {
@@ -1715,7 +1813,7 @@ int freeze_super(struct super_block *sb)
 	int ret;
 
 	atomic_inc(&sb->s_active);
-	super_lock_excl(sb);
+	__super_lock_excl(sb);
 	if (sb->s_writers.frozen != SB_UNFROZEN) {
 		deactivate_locked_super(sb);
 		return -EBUSY;
@@ -1737,7 +1835,7 @@ int freeze_super(struct super_block *sb)
 	/* Release s_umount to preserve sb_start_write -> s_umount ordering */
 	super_unlock_excl(sb);
 	sb_wait_write(sb, SB_FREEZE_WRITE);
-	super_lock_excl(sb);
+	__super_lock_excl(sb);
 
 	/* Now we go and block page faults... */
 	sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
@@ -1820,7 +1918,7 @@ static int thaw_super_locked(struct super_block *sb)
  */
 int thaw_super(struct super_block *sb)
 {
-	super_lock_excl(sb);
+	__super_lock_excl(sb);
 	return thaw_super_locked(sb);
 }
 EXPORT_SYMBOL(thaw_super);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 14b5777a24a0..173672645156 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1095,6 +1095,7 @@ extern int send_sigurg(struct fown_struct *fown);
 #define SB_LAZYTIME     BIT(25)	/* Update the on-disk [acm]times lazily */
 
 /* These sb flags are internal to the kernel */
+#define SB_DYING        BIT(24)
 #define SB_SUBMOUNT     BIT(26)
 #define SB_FORCE        BIT(27)
 #define SB_NOSEC        BIT(28)

-- 
2.34.1



* [PATCH v3 4/4] super: wait until we passed kill super
  2023-08-18 14:00 [PATCH v3 0/4] super: allow waiting without s_umount held Christian Brauner
                   ` (2 preceding siblings ...)
  2023-08-18 14:00 ` [PATCH v3 3/4] super: wait for nascent superblocks Christian Brauner
@ 2023-08-18 14:00 ` Christian Brauner
  2023-08-21 15:57   ` Jan Kara
  2023-08-22 12:36 ` [PATCH v3 0/4] super: allow waiting without s_umount held Christian Brauner
  4 siblings, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2023-08-18 14:00 UTC (permalink / raw)
  To: Jan Kara, Christoph Hellwig; +Cc: Al Viro, linux-fsdevel, Christian Brauner

Recent rework moved block device closing out of sb->put_super() and into
sb->kill_sb() to avoid deadlocks as s_umount is held in put_super() and
blkdev_put() can end up taking s_umount again.

That means we need to move the removal of the superblock from @fs_supers
out of generic_shutdown_super() and into deactivate_locked_super() to
ensure that concurrent mounters don't fail to open block devices that
are still in use because blkdev_put() in sb->kill_sb() hasn't been
called yet.

We can now do this as we can make iterators through @fs_supers and
@super_blocks wait without holding s_umount. Concurrent mounts will wait
until a dying superblock is fully dead, i.e. until sb->kill_sb() has been
called and SB_DEAD has been set. Concurrent iterators can already discard
any SB_DYING superblock.
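
Taken together with the previous patch, the superblock lifecycle seen by
concurrent sget{_fc}() callers is roughly (informal summary, not code
from this series):

	sget{_fc}()               adds sb to @super_blocks and @fs_supers
	vfs_get_tree()            super_wake(sb, SB_BORN)
	generic_shutdown_super()  super_wake(sb, SB_DYING); sb stays on
	                          @fs_supers
	deactivate_locked_super() sb->kill_sb(), unhash from @fs_supers,
	                          super_wake(sb, SB_DEAD)

A mounter that found the old superblock and saw SB_DYING waits in
grab_super_dead() for SB_DEAD and then retries sget{_fc}().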

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/super.c         | 71 ++++++++++++++++++++++++++++++++++++++++++++++++------
 include/linux/fs.h |  1 +
 2 files changed, 65 insertions(+), 7 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 896f05f34377..015e428818ce 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -153,7 +153,7 @@ static inline bool super_lock_excl(struct super_block *sb)
 }
 
 /* wake waiters */
-#define SUPER_WAKE_FLAGS (SB_BORN | SB_DYING)
+#define SUPER_WAKE_FLAGS (SB_BORN | SB_DYING | SB_DEAD)
 static void super_wake(struct super_block *sb, unsigned int flag)
 {
 	WARN_ON_ONCE((flag & ~SUPER_WAKE_FLAGS));
@@ -457,6 +457,25 @@ void deactivate_locked_super(struct super_block *s)
 		list_lru_destroy(&s->s_dentry_lru);
 		list_lru_destroy(&s->s_inode_lru);
 
+		/*
+		 * Remove it from @fs_supers so it isn't found by new
+		 * sget{_fc}() walkers anymore. Any concurrent mounter still
+		 * managing to grab a temporary reference is guaranteed to
+		 * already see SB_DYING and will wait until we notify them about
+		 * SB_DEAD.
+		 */
+		spin_lock(&sb_lock);
+		hlist_del_init(&s->s_instances);
+		spin_unlock(&sb_lock);
+
+		/*
+		 * Let concurrent mounts know that this thing is really dead.
+		 * We don't need @sb->s_umount here as every concurrent caller
+		 * will see SB_DYING and either discard the superblock or wait
+		 * for SB_DEAD.
+		 */
+		super_wake(s, SB_DEAD);
+
 		put_filesystem(fs);
 		put_super(s);
 	} else {
@@ -513,6 +532,45 @@ static int grab_super(struct super_block *s) __releases(sb_lock)
 	return 0;
 }
 
+static inline bool wait_dead(struct super_block *sb)
+{
+	unsigned int flags;
+
+	/*
+	 * Pairs with memory barrier in super_wake() and ensures
+	 * that we see SB_DEAD after we're woken.
+	 */
+	flags = smp_load_acquire(&sb->s_flags);
+	return flags & SB_DEAD;
+}
+
+/**
+ * grab_super_dead - acquire an active reference to a superblock
+ * @sb: superblock to acquire
+ *
+ * Acquire a temporary reference on a superblock and try to trade it for
+ * an active reference. This is used in sget{_fc}() to wait for a
+ * superblock to either become SB_BORN or for it to pass through
+ * sb->kill_sb() and be marked as SB_DEAD.
+ *
+ * Return: This returns true if an active reference could be acquired,
+ *         false if not.
+ */
+static bool grab_super_dead(struct super_block *sb)
+{
+
+	sb->s_count++;
+	if (grab_super(sb)) {
+		put_super(sb);
+		lockdep_assert_held(&sb->s_umount);
+		return true;
+	}
+	wait_var_event(&sb->s_flags, wait_dead(sb));
+	put_super(sb);
+	lockdep_assert_not_held(&sb->s_umount);
+	return false;
+}
+
 /*
  *	super_trylock_shared - try to grab ->s_umount shared
  *	@sb: reference we are trying to grab
@@ -639,15 +697,14 @@ void generic_shutdown_super(struct super_block *sb)
 			spin_unlock(&sb->s_inode_list_lock);
 		}
 	}
-	spin_lock(&sb_lock);
-	/* should be initialized for __put_super_and_need_restart() */
-	hlist_del_init(&sb->s_instances);
-	spin_unlock(&sb_lock);
 	/*
 	 * Broadcast to everyone that grabbed a temporary reference to this
 	 * superblock before we removed it from @fs_supers that the superblock
 	 * is dying. Every walker of @fs_supers outside of sget{_fc}() will now
 	 * discard this superblock and treat it as dead.
+	 *
+	 * We leave the superblock on @fs_supers so it can be found by
+	 * sget{_fc}() until we passed sb->kill_sb().
 	 */
 	super_wake(sb, SB_DYING);
 	super_unlock_excl(sb);
@@ -742,7 +799,7 @@ struct super_block *sget_fc(struct fs_context *fc,
 		destroy_unused_super(s);
 		return ERR_PTR(-EBUSY);
 	}
-	if (!grab_super(old))
+	if (!grab_super_dead(old))
 		goto retry;
 	destroy_unused_super(s);
 	return old;
@@ -786,7 +843,7 @@ struct super_block *sget(struct file_system_type *type,
 				destroy_unused_super(s);
 				return ERR_PTR(-EBUSY);
 			}
-			if (!grab_super(old))
+			if (!grab_super_dead(old))
 				goto retry;
 			destroy_unused_super(s);
 			return old;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 173672645156..a63da68305e9 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1095,6 +1095,7 @@ extern int send_sigurg(struct fown_struct *fown);
 #define SB_LAZYTIME     BIT(25)	/* Update the on-disk [acm]times lazily */
 
 /* These sb flags are internal to the kernel */
+#define SB_DEAD         BIT(21)
 #define SB_DYING        BIT(24)
 #define SB_SUBMOUNT     BIT(26)
 #define SB_FORCE        BIT(27)

-- 
2.34.1



* Re: [PATCH v3 3/4] super: wait for nascent superblocks
  2023-08-18 14:00 ` [PATCH v3 3/4] super: wait for nascent superblocks Christian Brauner
@ 2023-08-18 14:57   ` Matthew Wilcox
  2023-08-18 15:21     ` Christian Brauner
  2023-08-21 15:52   ` Jan Kara
  1 sibling, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2023-08-18 14:57 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel

On Fri, Aug 18, 2023 at 04:00:50PM +0200, Christian Brauner wrote:
> +/**
> + * super_lock - wait for superblock to become ready and lock it
> + * @sb: superblock to wait for
> + * @excl: whether exclusive access is required
> + *
> + * If the superblock has neither passed through vfs_get_tree() nor
> + * generic_shutdown_super() yet, wait for it to happen. Either superblock
> + * creation will succeed and SB_BORN is set by vfs_get_tree() or we're
> + * woken and we'll see SB_DYING.
> + *
> + * The caller must have acquired a temporary reference on @sb->s_count.
> + *
> + * Return: This returns true if SB_BORN was set, false if SB_DYING was
> + *         set. The function acquires s_umount and returns with it held.

I think this needs slightly stronger language to warn callers that
*even if it returns false*, it returns with the lock held!  I find
this surprising.  Usually returning failure means that the lock isn't
held, but looking at the callers shows that it wouldn't simplify them
to release the lock before returning failure.

Here's my suggestion:

/**
 * super_lock - Wait for superblock to become ready and lock it.
 * @sb: Superblock to wait for.
 * @excl: Whether exclusive access is required.
 *
 * If the superblock has neither passed through vfs_get_tree() nor
 * generic_shutdown_super() yet, wait for it to happen. Either superblock
 * creation will succeed and SB_BORN is set by vfs_get_tree() or we're
 * woken and we'll see SB_DYING.
 *
 * Context: May sleep.  The caller must have incremented @sb->s_count.
 * Return: s_umount is always acquired.  Returns true if the superblock
 * is active, false if the superblock is dying.
 */



* Re: [PATCH v3 3/4] super: wait for nascent superblocks
  2023-08-18 14:57   ` Matthew Wilcox
@ 2023-08-18 15:21     ` Christian Brauner
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2023-08-18 15:21 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel

> Here's my suggestion:
> 
> /**
>  * super_lock - Wait for superblock to become ready and lock it.
>  * @sb: Superblock to wait for.
>  * @excl: Whether exclusive access is required.
>  *
>  * If the superblock has neither passed through vfs_get_tree() nor
>  * generic_shutdown_super() yet, wait for it to happen. Either superblock
>  * creation will succeed and SB_BORN is set by vfs_get_tree() or we're
>  * woken and we'll see SB_DYING.
>  *
>  * Context: May sleep.  The caller must have incremented @sb->s_count.
>  * Return: s_umount is always acquired.  Returns true if the superblock
>  * is active, false if the superblock is dying.
>  */

Yeah, sure. Applied in-tree.


* Re: [PATCH v3 3/4] super: wait for nascent superblocks
  2023-08-18 14:00 ` [PATCH v3 3/4] super: wait for nascent superblocks Christian Brauner
  2023-08-18 14:57   ` Matthew Wilcox
@ 2023-08-21 15:52   ` Jan Kara
  2023-08-21 15:56     ` Christian Brauner
  2023-08-21 16:02     ` Jan Kara
  1 sibling, 2 replies; 14+ messages in thread
From: Jan Kara @ 2023-08-21 15:52 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel

On Fri 18-08-23 16:00:50, Christian Brauner wrote:
> Recent patches experiment with making it possible to allocate a new
> superblock before opening the relevant block device. Naturally this has
> intricate side-effects that we get to learn about while developing this.
> 
> Superblock allocators such as sget{_fc}() return with s_umount of the
> new superblock held and lock ordering currently requires that block
> level locks such as bdev_lock and open_mutex rank above s_umount.
> 
> Before aca740cecbe5 ("fs: open block device after superblock creation")
> ordering was guaranteed to be correct as block devices were opened prior
> to superblock allocation and thus s_umount wasn't held. But now s_umount
> must be dropped before opening block devices to avoid locking
> violations.
> 
> This has consequences. The main one being that iterators over
> @super_blocks and @fs_supers that grab a temporary reference to the
> superblock can now also grab s_umount before the caller has managed to
> open block devices and called fill_super(). So whereas before such
> iterators or concurrent mounts would have simply slept on s_umount until
> SB_BORN was set or the superblock was discard due to initalization
> failure they can now needlessly spin through sget{_fc}().
> 
> If the caller is sleeping on bdev_lock or open_mutex, one caller waiting
> on SB_BORN will always spin somewhere and potentially this can go on for
> quite a while.
> 
> It should be possible to drop s_umount while allowing iterators to wait
> on a nascent superblock to either be born or discarded. This patch
> implements a wait_var_event() mechanism allowing iterators to sleep
> until they are woken when the superblock is born or discarded.
> 
> This also allows us to avoid relooping through @fs_supers and
> @super_blocks if a superblock isn't yet born or dying.
> 
> Link: aca740cecbe5 ("fs: open block device after superblock creation")
> Signed-off-by: Christian Brauner <brauner@kernel.org>
...
> +/* wake waiters */
> +#define SUPER_WAKE_FLAGS (SB_BORN | SB_DYING)
> +static void super_wake(struct super_block *sb, unsigned int flag)
> +{
> +	WARN_ON_ONCE((flag & ~SUPER_WAKE_FLAGS));
> +	WARN_ON_ONCE(hweight32(flag & SUPER_WAKE_FLAGS) > 1);
> +
> +	/*
> +	 * Pairs with smp_load_acquire() in super_lock() and
> +	 * ensures that @flag is set before we wake anyone and ensures
> +	 * that checking whether waitqueue is active isn't hoisted
> +	 * before the store of @flag.
> +	 */
> +	sb->s_flags |= flag;
> +	smp_mb();
> +	wake_up_var(&sb->s_flags);

I think we misunderstood here. I believe we need:

	/*
	 * Pairs with smp_load_acquire() in super_lock() to make sure
	 * all initializations in the superblock are seen by the user
	 * seeing SB_BORN set.
	 */
	smp_store_release(&sb->s_flags, sb->s_flags | flag);
	/*
	 * Pairs with the barrier in prepare_to_wait_event() to make sure
	 * ___wait_var_event() either sees SB_BORN set or
	 * waitqueue_active() check in wake_up_var() sees the waiter
	 */
	smp_rmb();
	wake_up_var(&sb->s_flags);

or we need something equivalent with stronger barriers. Like:

	smp_wmb();
	sb->s_flags |= flag;
	smp_rmb();
	wake_up_var(&sb->s_flags);

> @@ -1715,7 +1813,7 @@ int freeze_super(struct super_block *sb)
>  	int ret;
>  
>  	atomic_inc(&sb->s_active);
> -	super_lock_excl(sb);
> +	__super_lock_excl(sb);
>  	if (sb->s_writers.frozen != SB_UNFROZEN) {
>  		deactivate_locked_super(sb);
>  		return -EBUSY;
> @@ -1737,7 +1835,7 @@ int freeze_super(struct super_block *sb)
>  	/* Release s_umount to preserve sb_start_write -> s_umount ordering */
>  	super_unlock_excl(sb);
>  	sb_wait_write(sb, SB_FREEZE_WRITE);
> -	super_lock_excl(sb);
> +	__super_lock_excl(sb);
>  
>  	/* Now we go and block page faults... */
>  	sb->s_writers.frozen = SB_FREEZE_PAGEFAULT;
> @@ -1820,7 +1918,7 @@ static int thaw_super_locked(struct super_block *sb)
>   */
>  int thaw_super(struct super_block *sb)
>  {
> -	super_lock_excl(sb);
> +	__super_lock_excl(sb);
>  	return thaw_super_locked(sb);
>  }

Maybe we can have in these places rather:

	if (!super_lock_excl(sb))
		WARN(1, "Dying superblock while freezing!");

So that we reduce the amount of __super_lock_excl() calls which are kind of
special. In these places we hold an active reference so practically this is
equivalent. Just a thought, pick whatever you find better, I don't have a
strong opinion but wanted to share this idea.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v3 3/4] super: wait for nascent superblocks
  2023-08-21 15:52   ` Jan Kara
@ 2023-08-21 15:56     ` Christian Brauner
  2023-08-21 16:04       ` Jan Kara
  2023-08-21 16:02     ` Jan Kara
  1 sibling, 1 reply; 14+ messages in thread
From: Christian Brauner @ 2023-08-21 15:56 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, Al Viro, linux-fsdevel

> I think we misunderstood here. I believe we need:
> 
> 	/*
> 	 * Pairs with smp_load_acquire() in super_lock() to make sure
> 	 * all initializations in the superblock are seen by the user
> 	 * seeing SB_BORN sent.
> 	 */
> 	smp_store_release(&sb->s_flags, sb->s_flags | flag);
> 	/*
> 	 * Pairs with the barrier in prepare_to_wait_event() to make sure
> 	 * ___wait_var_event() either sees SB_BORN set or
> 	 * waitqueue_active() check in wake_up_var() sees the waiter
> 	 */
> 	smp_rmb();
> 	wake_up_var(&sb->s_flags);

Oh right, sorry I missed this.

> Maybe we can have in these places rather:
> 
> 	if (!super_lock_excl(sb))
> 		WARN(1, "Dying superblock while freezing!");
> 
> So that we reduce the amount of __super_lock_excl() calls which are kind of
> special. In these places we hold active reference so practically this is
> equivalent. Just a though, pick whatever you find better, I don't have a
> strong opinion but wanted to share this idea.

Ok, will pick yours.

Do you want me to resend?


* Re: [PATCH v3 4/4] super: wait until we passed kill super
  2023-08-18 14:00 ` [PATCH v3 4/4] super: wait until we passed kill super Christian Brauner
@ 2023-08-21 15:57   ` Jan Kara
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2023-08-21 15:57 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel

On Fri 18-08-23 16:00:51, Christian Brauner wrote:
> Recent rework moved block device closing out of sb->put_super() and into
> sb->kill_sb() to avoid deadlocks as s_umount is held in put_super() and
> blkdev_put() can end up taking s_umount again.
> 
> That means we need to move the removal of the superblock from @fs_supers
> out of generic_shutdown_super() and into deactivate_locked_super() to
> ensure that concurrent mounters don't fail to open block devices that
> are still in use because blkdev_put() in sb->kill_sb() hasn't been
> called yet.
> 
> We can now do this as we can make iterators through @fs_supers and
> @super_blocks wait without holding s_umount. Concurrent mounts will wait
> until a dying superblock is fully dead, i.e. until sb->kill_sb() has been
> called and SB_DEAD has been set. Concurrent iterators can already discard
> any SB_DYING superblock.
> 
> Signed-off-by: Christian Brauner <brauner@kernel.org>

Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/super.c         | 71 ++++++++++++++++++++++++++++++++++++++++++++++++------
>  include/linux/fs.h |  1 +
>  2 files changed, 65 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/super.c b/fs/super.c
> index 896f05f34377..015e428818ce 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -153,7 +153,7 @@ static inline bool super_lock_excl(struct super_block *sb)
>  }
>  
>  /* wake waiters */
> -#define SUPER_WAKE_FLAGS (SB_BORN | SB_DYING)
> +#define SUPER_WAKE_FLAGS (SB_BORN | SB_DYING | SB_DEAD)
>  static void super_wake(struct super_block *sb, unsigned int flag)
>  {
>  	WARN_ON_ONCE((flag & ~SUPER_WAKE_FLAGS));
> @@ -457,6 +457,25 @@ void deactivate_locked_super(struct super_block *s)
>  		list_lru_destroy(&s->s_dentry_lru);
>  		list_lru_destroy(&s->s_inode_lru);
>  
> +		/*
> +		 * Remove it from @fs_supers so it isn't found by new
> +		 * sget{_fc}() walkers anymore. Any concurrent mounter still
> +		 * managing to grab a temporary reference is guaranteed to
> +		 * already see SB_DYING and will wait until we notify them about
> +		 * SB_DEAD.
> +		 */
> +		spin_lock(&sb_lock);
> +		hlist_del_init(&s->s_instances);
> +		spin_unlock(&sb_lock);
> +
> +		/*
> +		 * Let concurrent mounts know that this thing is really dead.
> +		 * We don't need @sb->s_umount here as every concurrent caller
> +		 * will see SB_DYING and either discard the superblock or wait
> +		 * for SB_DEAD.
> +		 */
> +		super_wake(s, SB_DEAD);
> +
>  		put_filesystem(fs);
>  		put_super(s);
>  	} else {
> @@ -513,6 +532,45 @@ static int grab_super(struct super_block *s) __releases(sb_lock)
>  	return 0;
>  }
>  
> +static inline bool wait_dead(struct super_block *sb)
> +{
> +	unsigned int flags;
> +
> +	/*
> +	 * Pairs with memory barrier in super_wake() and ensures
> +	 * that we see SB_DEAD after we're woken.
> +	 */
> +	flags = smp_load_acquire(&sb->s_flags);
> +	return flags & SB_DEAD;
> +}
> +
> +/**
> + * grab_super_dead - acquire an active reference to a superblock
> + * @sb: superblock to acquire
> + *
> + * Acquire a temporary reference on a superblock and try to trade it for
> + * an active reference. This is used in sget{_fc}() to wait for a
> + * superblock to either become SB_BORN or for it to pass through
> + * sb->kill_sb() and be marked as SB_DEAD.
> + *
> + * Return: This returns true if an active reference could be acquired,
> + *         false if not.
> + */
> +static bool grab_super_dead(struct super_block *sb)
> +{
> +
> +	sb->s_count++;
> +	if (grab_super(sb)) {
> +		put_super(sb);
> +		lockdep_assert_held(&sb->s_umount);
> +		return true;
> +	}
> +	wait_var_event(&sb->s_flags, wait_dead(sb));
> +	put_super(sb);
> +	lockdep_assert_not_held(&sb->s_umount);
> +	return false;
> +}
> +
>  /*
>   *	super_trylock_shared - try to grab ->s_umount shared
>   *	@sb: reference we are trying to grab
> @@ -639,15 +697,14 @@ void generic_shutdown_super(struct super_block *sb)
>  			spin_unlock(&sb->s_inode_list_lock);
>  		}
>  	}
> -	spin_lock(&sb_lock);
> -	/* should be initialized for __put_super_and_need_restart() */
> -	hlist_del_init(&sb->s_instances);
> -	spin_unlock(&sb_lock);
>  	/*
>  	 * Broadcast to everyone that grabbed a temporary reference to this
>  	 * superblock before we removed it from @fs_supers that the superblock
>  	 * is dying. Every walker of @fs_supers outside of sget{_fc}() will now
>  	 * discard this superblock and treat it as dead.
> +	 *
> +	 * We leave the superblock on @fs_supers so it can be found by
> +	 * sget{_fc}() until we passed sb->kill_sb().
>  	 */
>  	super_wake(sb, SB_DYING);
>  	super_unlock_excl(sb);
> @@ -742,7 +799,7 @@ struct super_block *sget_fc(struct fs_context *fc,
>  		destroy_unused_super(s);
>  		return ERR_PTR(-EBUSY);
>  	}
> -	if (!grab_super(old))
> +	if (!grab_super_dead(old))
>  		goto retry;
>  	destroy_unused_super(s);
>  	return old;
> @@ -786,7 +843,7 @@ struct super_block *sget(struct file_system_type *type,
>  				destroy_unused_super(s);
>  				return ERR_PTR(-EBUSY);
>  			}
> -			if (!grab_super(old))
> +			if (!grab_super_dead(old))
>  				goto retry;
>  			destroy_unused_super(s);
>  			return old;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 173672645156..a63da68305e9 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1095,6 +1095,7 @@ extern int send_sigurg(struct fown_struct *fown);
>  #define SB_LAZYTIME     BIT(25)	/* Update the on-disk [acm]times lazily */
>  
>  /* These sb flags are internal to the kernel */
> +#define SB_DEAD         BIT(21)
>  #define SB_DYING        BIT(24)
>  #define SB_SUBMOUNT     BIT(26)
>  #define SB_FORCE        BIT(27)
> 
> -- 
> 2.34.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


* Re: [PATCH v3 3/4] super: wait for nascent superblocks
  2023-08-21 15:52   ` Jan Kara
  2023-08-21 15:56     ` Christian Brauner
@ 2023-08-21 16:02     ` Jan Kara
  2023-08-21 16:08       ` Christian Brauner
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Kara @ 2023-08-21 16:02 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel

On Mon 21-08-23 17:52:37, Jan Kara wrote:
> On Fri 18-08-23 16:00:50, Christian Brauner wrote:
> > Recent patches experiment with making it possible to allocate a new
> > superblock before opening the relevant block device. Naturally this has
> > intricate side-effects that we get to learn about while developing this.
> > 
> > Superblock allocators such as sget{_fc}() return with s_umount of the
> > new superblock held and lock ordering currently requires that block
> > level locks such as bdev_lock and open_mutex rank above s_umount.
> > 
> > Before aca740cecbe5 ("fs: open block device after superblock creation")
> > ordering was guaranteed to be correct as block devices were opened prior
> > to superblock allocation and thus s_umount wasn't held. But now s_umount
> > must be dropped before opening block devices to avoid locking
> > violations.
> > 
> > This has consequences. The main one is that iterators over
> > @super_blocks and @fs_supers that grab a temporary reference to the
> > superblock can now also grab s_umount before the caller has managed to
> > open block devices and called fill_super(). So whereas before, such
> > iterators or concurrent mounts would have simply slept on s_umount until
> > SB_BORN was set or the superblock was discarded due to initialization
> > failure, they can now needlessly spin through sget{_fc}().
> > 
> > If the caller is sleeping on bdev_lock or open_mutex, one caller waiting
> > on SB_BORN will always spin somewhere, and this can potentially go on
> > for quite a while.
> > 
> > It should be possible to drop s_umount while allowing iterators to wait
> > on a nascent superblock to either be born or discarded. This patch
> > implements a wait_var_event() mechanism allowing iterators to sleep
> > until they are woken when the superblock is born or discarded.
> > 
> > This also allows us to avoid relooping through @fs_supers and
> > @super_blocks if a superblock isn't yet born or dying.
> > 
> > Link: aca740cecbe5 ("fs: open block device after superblock creation")
> > Signed-off-by: Christian Brauner <brauner@kernel.org>
> ...
> > +/* wake waiters */
> > +#define SUPER_WAKE_FLAGS (SB_BORN | SB_DYING)
> > +static void super_wake(struct super_block *sb, unsigned int flag)
> > +{
> > +	WARN_ON_ONCE((flag & ~SUPER_WAKE_FLAGS));
> > +	WARN_ON_ONCE(hweight32(flag & SUPER_WAKE_FLAGS) > 1);
> > +
> > +	/*
> > +	 * Pairs with smp_load_acquire() in super_lock() and
> > +	 * ensures that @flag is set before we wake anyone and ensures
> > +	 * that checking whether waitqueue is active isn't hoisted
> > +	 * before the store of @flag.
> > +	 */
> > +	sb->s_flags |= flag;
> > +	smp_mb();
> > +	wake_up_var(&sb->s_flags);
> 
> I think we misunderstood here. I believe we need:
> 
> 	/*
> 	 * Pairs with smp_load_acquire() in super_lock() to make sure
> 	 * all initializations in the superblock are seen by anyone
> 	 * who sees SB_BORN set.
> 	 */
> 	smp_store_release(&sb->s_flags, sb->s_flags | flag);
> 	/*
> 	 * Pairs with the barrier in prepare_to_wait_event() to make sure
> 	 * ___wait_var_event() either sees SB_BORN set or
> 	 * waitqueue_active() check in wake_up_var() sees the waiter
> 	 */
> 	smp_rmb();
> 	wake_up_var(&sb->s_flags);
> 
> or we need something equivalent with stronger barriers. Like:
> 
> 	smp_wmb();
> 	sb->s_flags |= flag;
> 	smp_rmb();
> 	wake_up_var(&sb->s_flags);

Ah, in both of the above cases we actually need smp_mb() (as you properly
chose) instead of the smp_rmb() I wrote above, because we need to
establish write-vs-read ordering.

BTW, if you pick one of these two schemes, feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

If you decide for something else, I'd like to see the result first...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread
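
For orientation, the acquire side that both of Jan's schemes pair with
is the waiter's predicate passed to wait_var_event(). A sketch of what
such a wait_born() helper might look like (its body is not quoted in
this thread, so the details are assumed):

	static bool wait_born(struct super_block *sb)
	{
		unsigned int flags;

		/*
		 * Pairs with the store-release/full barrier in super_wake()
		 * so that a waiter woken after SB_BORN or SB_DYING was set
		 * also sees all superblock initializations that happened
		 * before the flag was published.
		 */
		flags = smp_load_acquire(&sb->s_flags);
		return flags & (SB_BORN | SB_DYING);
	}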

* Re: [PATCH v3 3/4] super: wait for nascent superblocks
  2023-08-21 15:56     ` Christian Brauner
@ 2023-08-21 16:04       ` Jan Kara
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Kara @ 2023-08-21 16:04 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, Christoph Hellwig, Al Viro, linux-fsdevel

On Mon 21-08-23 17:56:07, Christian Brauner wrote:
> > I think we misunderstood here. I believe we need:
> > 
> > 	/*
> > 	 * Pairs with smp_load_acquire() in super_lock() to make sure
> > 	 * all initializations in the superblock are seen by the user
> > 	 * all initializations in the superblock are seen by anyone
> > 	 * who sees SB_BORN set.
> > 	smp_store_release(&sb->s_flags, sb->s_flags | flag);
> > 	/*
> > 	 * Pairs with the barrier in prepare_to_wait_event() to make sure
> > 	 * ___wait_var_event() either sees SB_BORN set or
> > 	 * waitqueue_active() check in wake_up_var() sees the waiter
> > 	 */
> > 	smp_rmb();
> > 	wake_up_var(&sb->s_flags);
> 
> Oh right, sorry I missed this.
> 
> > Maybe in these places we can instead have:
> > 
> > 	if (!super_lock_excl(sb))
> > 		WARN(1, "Dying superblock while freezing!");
> > 
> > So that we reduce the number of __super_lock_excl() calls, which are kind
> > of special. In these places we hold an active reference, so practically
> > this is equivalent. Just a thought, pick whatever you find better; I don't
> > have a strong opinion but wanted to share this idea.
> 
> Ok, will pick yours.
> 
> Do you want me to resend?

No need to resend as far as I'm concerned - and my suggestion was subtly
wrong as well - see my other email.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/4] super: wait for nascent superblocks
  2023-08-21 16:02     ` Jan Kara
@ 2023-08-21 16:08       ` Christian Brauner
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2023-08-21 16:08 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christoph Hellwig, Al Viro, linux-fsdevel

> Ah, in both of the above cases we actually need smp_mb() (as you properly

No worries, I got what you meant and that's what I did.

> 
> BTW, if you pick one of these two schemes, feel free to add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

Ok, thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread
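
Putting the two corrections above together, the settled super_wake()
presumably combines Jan's store-release with a full barrier before the
wakeup; a sketch reconstructed from the discussion, not quoted from the
applied commit:

	static void super_wake(struct super_block *sb, unsigned int flag)
	{
		WARN_ON_ONCE(flag & ~SUPER_WAKE_FLAGS);
		WARN_ON_ONCE(hweight32(flag & SUPER_WAKE_FLAGS) > 1);

		/*
		 * Pairs with smp_load_acquire() in super_lock() to make
		 * sure all initializations in the superblock are seen by
		 * anyone who sees the flag set.
		 */
		smp_store_release(&sb->s_flags, sb->s_flags | flag);
		/*
		 * Pairs with the barrier in prepare_to_wait_event() to make
		 * sure ___wait_var_event() either sees the flag set or that
		 * waitqueue_active() in wake_up_var() sees the waiter.
		 */
		smp_mb();
		wake_up_var(&sb->s_flags);
	}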

* Re: [PATCH v3 0/4] super: allow waiting without s_umount held
  2023-08-18 14:00 [PATCH v3 0/4] super: allow waiting without s_umount held Christian Brauner
                   ` (3 preceding siblings ...)
  2023-08-18 14:00 ` [PATCH v3 4/4] super: wait until we passed kill super Christian Brauner
@ 2023-08-22 12:36 ` Christian Brauner
  4 siblings, 0 replies; 14+ messages in thread
From: Christian Brauner @ 2023-08-22 12:36 UTC (permalink / raw)
  To: Christian Brauner, Jan Kara; +Cc: Al Viro, linux-fsdevel, Christoph Hellwig

On Fri, 18 Aug 2023 16:00:47 +0200, Christian Brauner wrote:
> Hey everyone,
> 
> This is an attempt to allow concurrent mounters and iterators to wait
> on superblock state changes without having to hold s_umount. This is
> made necessary by recent attempts to open block devices after superblock
> creation and fixing deadlocks due to blkdev_put() trying to acquire
> s_umount while s_umount is already held.
> 
> [...]

Already applied this but didn't send out a notification.

Jan's suggestion to log a warning when freeze/thaw is called on a dying
superblock is on top of the series as well (a rough sketch of the idea
follows the link below):

      super: use higher-level helper for {freeze,thaw}
      https://git.kernel.org/vfs/vfs/c/051178c366bb
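
A rough sketch of what that warning looks like at the top of the freeze
and thaw paths, based on Jan's suggestion earlier in the thread (the
commit above is the authoritative version):

	/* freeze_super(): an active reference is held here, so the
	 * superblock should not be dying; warn loudly if it is. */
	if (!super_lock_excl(sb))
		WARN(1, "Dying superblock while freezing!");

	/* thaw_super(): the same reasoning applies on the thaw side. */
	if (!super_lock_excl(sb))
		WARN(1, "Dying superblock while thawing!");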

---

Applied to the vfs.super branch of the vfs/vfs.git tree.
Patches in the vfs.super branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review of the original patch series, allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible, patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.super.fs_supers_lock

[1/4] super: use locking helpers
      https://git.kernel.org/vfs/vfs/c/0ed33598ddf3
[2/4] super: make locking naming consistent
      https://git.kernel.org/vfs/vfs/c/d8ce82efdece
[3/4] super: wait for nascent superblocks
      https://git.kernel.org/vfs/vfs/c/5e8749141521
[4/4] super: wait until we passed kill super
      https://git.kernel.org/vfs/vfs/c/2c18a63b760a

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-08-22 12:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-18 14:00 [PATCH v3 0/4] super: allow waiting without s_umount held Christian Brauner
2023-08-18 14:00 ` [PATCH v3 1/4] super: use locking helpers Christian Brauner
2023-08-18 14:00 ` [PATCH v3 2/4] super: make locking naming consistent Christian Brauner
2023-08-18 14:00 ` [PATCH v3 3/4] super: wait for nascent superblocks Christian Brauner
2023-08-18 14:57   ` Matthew Wilcox
2023-08-18 15:21     ` Christian Brauner
2023-08-21 15:52   ` Jan Kara
2023-08-21 15:56     ` Christian Brauner
2023-08-21 16:04       ` Jan Kara
2023-08-21 16:02     ` Jan Kara
2023-08-21 16:08       ` Christian Brauner
2023-08-18 14:00 ` [PATCH v3 4/4] super: wait until we passed kill super Christian Brauner
2023-08-21 15:57   ` Jan Kara
2023-08-22 12:36 ` [PATCH v3 0/4] super: allow waiting without s_umount held Christian Brauner

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).