linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* s_fs_info and ->kill_sb revisited v2
@ 2023-08-09 22:05 Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 01/13] xfs: reformat the xfs_fs_free prototype Christoph Hellwig
                   ` (12 more replies)
  0 siblings, 13 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

Hi all,

this series is against the VFS vfs.super branch does two slightly
related things:

 - move closing of the external devices in ext4 and xfs from ->put_super
   into ->kill_sb so that this isn't done under s_umount which creates
   lock ordere reversal
 - move freeing the private dta in s_fs_info into ->kill_sb for file systems
   that pass it in through the fs_context, as otherwise we could leak it
   before fill_super is called (this is something new on the vfs.super
   branch because of the changed place where blkdev_get is called)

Changes since v1:
 - keep the invalidate_bdev call in XFS and actually document it
 - minor whitespace fixes


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH 01/13] xfs: reformat the xfs_fs_free prototype
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-10  8:53   ` Christian Brauner
  2023-08-09 22:05 ` [PATCH 02/13] xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super Christoph Hellwig
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

The xfs_fs_free prototype formatting is a weird mix of the classic XFS
style and the Linux style.  Fix it up to be consistent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_super.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index bf6e0a261a49e6..0a294659c18972 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1933,7 +1933,8 @@ xfs_fs_reconfigure(
 	return 0;
 }
 
-static void xfs_fs_free(
+static void
+xfs_fs_free(
 	struct fs_context	*fc)
 {
 	struct xfs_mount	*mp = fc->s_fs_info;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 02/13] xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 01/13] xfs: reformat the xfs_fs_free prototype Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 03/13] xfs: free the xfs_mount in ->kill_sb Christoph Hellwig
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

->put_super is only called when sb->s_root is set, and thus when
fill_super succeeds.  Thus drop the NULL check that can't happen in
xfs_fs_put_super.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_super.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 0a294659c18972..128f4a2924d49c 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1132,10 +1132,6 @@ xfs_fs_put_super(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 
-	/* if ->fill_super failed, we have no mount to tear down */
-	if (!sb->s_fs_info)
-		return;
-
 	xfs_notice(mp, "Unmounting Filesystem %pU", &mp->m_sb.sb_uuid);
 	xfs_filestream_unmount(mp);
 	xfs_unmountfs(mp);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 03/13] xfs: free the xfs_mount in ->kill_sb
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 01/13] xfs: reformat the xfs_fs_free prototype Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 02/13] xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 04/13] xfs: remove xfs_blkdev_put Christoph Hellwig
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

As a rule of thumb everything allocated to the fs_context and moved into
the super_block should be freed by ->kill_sb so that the teardown
handling doesn't need to be duplicated between the fill_super error
path and put_super.  Implement a XFS-specific kill_sb method to do that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_super.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 128f4a2924d49c..d2f3ae6ba8938b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1143,9 +1143,6 @@ xfs_fs_put_super(
 	xfs_destroy_percpu_counters(mp);
 	xfs_destroy_mount_workqueues(mp);
 	xfs_close_devices(mp);
-
-	sb->s_fs_info = NULL;
-	xfs_mount_free(mp);
 }
 
 static long
@@ -1487,7 +1484,7 @@ xfs_fs_fill_super(
 
 	error = xfs_fs_validate_params(mp);
 	if (error)
-		goto out_free_names;
+		return error;
 
 	sb_min_blocksize(sb, BBSIZE);
 	sb->s_xattr = xfs_xattr_handlers;
@@ -1514,7 +1511,7 @@ xfs_fs_fill_super(
 
 	error = xfs_open_devices(mp);
 	if (error)
-		goto out_free_names;
+		return error;
 
 	error = xfs_init_mount_workqueues(mp);
 	if (error)
@@ -1734,9 +1731,6 @@ xfs_fs_fill_super(
 	xfs_destroy_mount_workqueues(mp);
  out_close_devices:
 	xfs_close_devices(mp);
- out_free_names:
-	sb->s_fs_info = NULL;
-	xfs_mount_free(mp);
 	return error;
 
  out_unmount:
@@ -1999,12 +1993,20 @@ static int xfs_init_fs_context(
 	return 0;
 }
 
+static void
+xfs_kill_sb(
+	struct super_block		*sb)
+{
+	kill_block_super(sb);
+	xfs_mount_free(XFS_M(sb));
+}
+
 static struct file_system_type xfs_fs_type = {
 	.owner			= THIS_MODULE,
 	.name			= "xfs",
 	.init_fs_context	= xfs_init_fs_context,
 	.parameters		= xfs_fs_parameters,
-	.kill_sb		= kill_block_super,
+	.kill_sb		= xfs_kill_sb,
 	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
 };
 MODULE_ALIAS_FS("xfs");
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 04/13] xfs: remove xfs_blkdev_put
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 03/13] xfs: free the xfs_mount in ->kill_sb Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 05/13] xfs: close the RT and log block devices in xfs_free_buftarg Christoph Hellwig
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

There isn't much use for this trivial wrapper, especially as the NULL
check is only needed in a single call site.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_super.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d2f3ae6ba8938b..fc21e543357ef5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -395,15 +395,6 @@ xfs_blkdev_get(
 	return error;
 }
 
-STATIC void
-xfs_blkdev_put(
-	struct xfs_mount	*mp,
-	struct block_device	*bdev)
-{
-	if (bdev)
-		blkdev_put(bdev, mp->m_super);
-}
-
 STATIC void
 xfs_close_devices(
 	struct xfs_mount	*mp)
@@ -412,13 +403,13 @@ xfs_close_devices(
 		struct block_device *logdev = mp->m_logdev_targp->bt_bdev;
 
 		xfs_free_buftarg(mp->m_logdev_targp);
-		xfs_blkdev_put(mp, logdev);
+		blkdev_put(logdev, mp->m_super);
 	}
 	if (mp->m_rtdev_targp) {
 		struct block_device *rtdev = mp->m_rtdev_targp->bt_bdev;
 
 		xfs_free_buftarg(mp->m_rtdev_targp);
-		xfs_blkdev_put(mp, rtdev);
+		blkdev_put(rtdev, mp->m_super);
 	}
 	xfs_free_buftarg(mp->m_ddev_targp);
 }
@@ -503,10 +494,11 @@ xfs_open_devices(
  out_free_ddev_targ:
 	xfs_free_buftarg(mp->m_ddev_targp);
  out_close_rtdev:
-	xfs_blkdev_put(mp, rtdev);
+	if (rtdev)
+		blkdev_put(rtdev, sb);
  out_close_logdev:
 	if (logdev && logdev != ddev)
-		xfs_blkdev_put(mp, logdev);
+		blkdev_put(logdev, sb);
 	goto out_relock;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 05/13] xfs: close the RT and log block devices in xfs_free_buftarg
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 04/13] xfs: remove xfs_blkdev_put Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 06/13] xfs: close the external block devices in xfs_mount_free Christoph Hellwig
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

Closing the block devices logically belongs into xfs_free_buftarg,  So
instead of open coding it in the caller move it there and add a check
for the s_bdev so that the main device isn't close as that's done by the
VFS helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_buf.c   |  5 +++++
 fs/xfs/xfs_super.c | 12 ++----------
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 15d1e5a7c2d340..e33eb17648dfed 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1938,6 +1938,8 @@ void
 xfs_free_buftarg(
 	struct xfs_buftarg	*btp)
 {
+	struct block_device	*bdev = btp->bt_bdev;
+
 	unregister_shrinker(&btp->bt_shrinker);
 	ASSERT(percpu_counter_sum(&btp->bt_io_count) == 0);
 	percpu_counter_destroy(&btp->bt_io_count);
@@ -1946,6 +1948,9 @@ xfs_free_buftarg(
 	blkdev_issue_flush(btp->bt_bdev);
 	invalidate_bdev(btp->bt_bdev);
 	fs_put_dax(btp->bt_daxdev, btp->bt_mount);
+	/* the main block device is closed by kill_block_super */
+	if (bdev != btp->bt_mount->m_super->s_bdev)
+		blkdev_put(bdev, btp->bt_mount->m_super);
 
 	kmem_free(btp);
 }
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index fc21e543357ef5..368c05a2dea5b9 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -399,18 +399,10 @@ STATIC void
 xfs_close_devices(
 	struct xfs_mount	*mp)
 {
-	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
-		struct block_device *logdev = mp->m_logdev_targp->bt_bdev;
-
+	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
 		xfs_free_buftarg(mp->m_logdev_targp);
-		blkdev_put(logdev, mp->m_super);
-	}
-	if (mp->m_rtdev_targp) {
-		struct block_device *rtdev = mp->m_rtdev_targp->bt_bdev;
-
+	if (mp->m_rtdev_targp)
 		xfs_free_buftarg(mp->m_rtdev_targp);
-		blkdev_put(rtdev, mp->m_super);
-	}
 	xfs_free_buftarg(mp->m_ddev_targp);
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 06/13] xfs: close the external block devices in xfs_mount_free
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 05/13] xfs: close the RT and log block devices in xfs_free_buftarg Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-09 22:34   ` Darrick J. Wong
  2023-08-09 22:05 ` [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev Christoph Hellwig
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

blkdev_put must not be called under sb->s_umount to avoid a lock order
reversal with disk->open_mutex.  Move closing the buftargs into ->kill_sb
to archive that.  Note that the flushing of the disk caches and
block device mapping invalidated needs to stay in ->put_super as the main
block device is closed in kill_block_super already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c   |  2 --
 fs/xfs/xfs_super.c | 36 ++++++++++++++++++++++++++----------
 2 files changed, 26 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e33eb17648dfed..3b903f6bce98d8 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1945,8 +1945,6 @@ xfs_free_buftarg(
 	percpu_counter_destroy(&btp->bt_io_count);
 	list_lru_destroy(&btp->bt_lru);
 
-	blkdev_issue_flush(btp->bt_bdev);
-	invalidate_bdev(btp->bt_bdev);
 	fs_put_dax(btp->bt_daxdev, btp->bt_mount);
 	/* the main block device is closed by kill_block_super */
 	if (bdev != btp->bt_mount->m_super->s_bdev)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 368c05a2dea5b9..4ae3b01ed038c7 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -396,14 +396,19 @@ xfs_blkdev_get(
 }
 
 STATIC void
-xfs_close_devices(
+xfs_shutdown_devices(
 	struct xfs_mount	*mp)
 {
-	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
-		xfs_free_buftarg(mp->m_logdev_targp);
-	if (mp->m_rtdev_targp)
-		xfs_free_buftarg(mp->m_rtdev_targp);
-	xfs_free_buftarg(mp->m_ddev_targp);
+	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
+		blkdev_issue_flush(mp->m_logdev_targp->bt_bdev);
+		invalidate_bdev(mp->m_logdev_targp->bt_bdev);
+	}
+	if (mp->m_rtdev_targp) {
+		blkdev_issue_flush(mp->m_rtdev_targp->bt_bdev);
+		invalidate_bdev(mp->m_rtdev_targp->bt_bdev);
+	}
+	blkdev_issue_flush(mp->m_ddev_targp->bt_bdev);
+	invalidate_bdev(mp->m_ddev_targp->bt_bdev);
 }
 
 /*
@@ -741,6 +746,17 @@ static void
 xfs_mount_free(
 	struct xfs_mount	*mp)
 {
+	/*
+	 * Free the buftargs here because blkdev_put needs to be called outside
+	 * of sb->s_umount, which is held around the call to ->put_super.
+	 */
+	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
+		xfs_free_buftarg(mp->m_logdev_targp);
+	if (mp->m_rtdev_targp)
+		xfs_free_buftarg(mp->m_rtdev_targp);
+	if (mp->m_ddev_targp)
+		xfs_free_buftarg(mp->m_ddev_targp);
+
 	kfree(mp->m_rtname);
 	kfree(mp->m_logname);
 	kmem_free(mp);
@@ -1126,7 +1142,7 @@ xfs_fs_put_super(
 	xfs_inodegc_free_percpu(mp);
 	xfs_destroy_percpu_counters(mp);
 	xfs_destroy_mount_workqueues(mp);
-	xfs_close_devices(mp);
+	xfs_shutdown_devices(mp);
 }
 
 static long
@@ -1499,7 +1515,7 @@ xfs_fs_fill_super(
 
 	error = xfs_init_mount_workqueues(mp);
 	if (error)
-		goto out_close_devices;
+		goto out_shutdown_devices;
 
 	error = xfs_init_percpu_counters(mp);
 	if (error)
@@ -1713,8 +1729,8 @@ xfs_fs_fill_super(
 	xfs_destroy_percpu_counters(mp);
  out_destroy_workqueues:
 	xfs_destroy_mount_workqueues(mp);
- out_close_devices:
-	xfs_close_devices(mp);
+ out_shutdown_devices:
+	xfs_shutdown_devices(mp);
 	return error;
 
  out_unmount:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 06/13] xfs: close the external block devices in xfs_mount_free Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-09 22:39   ` Darrick J. Wong
  2023-08-10 15:22   ` Matthew Wilcox
  2023-08-09 22:05 ` [PATCH 08/13] ext4: close the external journal device in ->kill_sb Christoph Hellwig
                   ` (5 subsequent siblings)
  12 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

Copy and paste the commit message from Darrick into a comment to explain
the seemly odd invalidate_bdev in xfs_shutdown_devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_super.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 4ae3b01ed038c7..c169beb0d8cab3 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -399,6 +399,32 @@ STATIC void
 xfs_shutdown_devices(
 	struct xfs_mount	*mp)
 {
+	/*
+	 * Udev is triggered whenever anyone closes a block device or unmounts
+	 * a file systemm on a block device.
+	 * The default udev rules invoke blkid to read the fs super and create
+	 * symlinks to the bdev under /dev/disk.  For this, it uses buffered
+	 * reads through the page cache.
+	 *
+	 * xfs_db also uses buffered reads to examine metadata.  There is no
+	 * coordination between xfs_db and udev, which means that they can run
+	 * concurrently.  Note there is no coordination between the kernel and
+	 * blkid either.
+	 *
+	 * On a system with 64k pages, the page cache can cache the superblock
+	 * and the root inode (and hence the root directory) with the same 64k
+	 * page.  If udev spawns blkid after the mkfs and the system is busy
+	 * enough that it is still running when xfs_db starts up, they'll both
+	 * read from the same page in the pagecache.
+	 *
+	 * The unmount writes updated inode metadata to disk directly.  The XFS
+	 * buffer cache does not use the bdev pagecache, nor does it invalidate
+	 * the pagecache on umount.  If the above scenario occurs, the pagecache
+	 * no longer reflects what's on disk, xfs_db reads the stale metadata,
+	 * and fails to find /a.  Most of the time this succeeds because closing
+	 * a bdev invalidates the page cache, but when processes race, everyone
+	 * loses.
+	 */
 	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
 		blkdev_issue_flush(mp->m_logdev_targp->bt_bdev);
 		invalidate_bdev(mp->m_logdev_targp->bt_bdev);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 08/13] ext4: close the external journal device in ->kill_sb
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 09/13] exfat: don't RCU-free the sbi Christoph Hellwig
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

blkdev_put must not be called under sb->s_umount to avoid a lock order
reversal with disk->open_mutex.  Move closing the external journal device
into ->kill_sb to archive that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/ext4/super.c | 50 ++++++++++++++++++++++++-------------------------
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e6384782b4d036..60d2815a0b7ea5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -93,6 +93,7 @@ static int ext4_get_tree(struct fs_context *fc);
 static int ext4_reconfigure(struct fs_context *fc);
 static void ext4_fc_free(struct fs_context *fc);
 static int ext4_init_fs_context(struct fs_context *fc);
+static void ext4_kill_sb(struct super_block *sb);
 static const struct fs_parameter_spec ext4_param_specs[];
 
 /*
@@ -135,7 +136,7 @@ static struct file_system_type ext2_fs_type = {
 	.name			= "ext2",
 	.init_fs_context	= ext4_init_fs_context,
 	.parameters		= ext4_param_specs,
-	.kill_sb		= kill_block_super,
+	.kill_sb		= ext4_kill_sb,
 	.fs_flags		= FS_REQUIRES_DEV,
 };
 MODULE_ALIAS_FS("ext2");
@@ -151,7 +152,7 @@ static struct file_system_type ext3_fs_type = {
 	.name			= "ext3",
 	.init_fs_context	= ext4_init_fs_context,
 	.parameters		= ext4_param_specs,
-	.kill_sb		= kill_block_super,
+	.kill_sb		= ext4_kill_sb,
 	.fs_flags		= FS_REQUIRES_DEV,
 };
 MODULE_ALIAS_FS("ext3");
@@ -1116,25 +1117,6 @@ static struct block_device *ext4_blkdev_get(dev_t dev, struct super_block *sb)
 	return NULL;
 }
 
-/*
- * Release the journal device
- */
-static void ext4_blkdev_remove(struct ext4_sb_info *sbi)
-{
-	struct block_device *bdev;
-	bdev = sbi->s_journal_bdev;
-	if (bdev) {
-		/*
-		 * Invalidate the journal device's buffers.  We don't want them
-		 * floating about in memory - the physical journal device may
-		 * hotswapped, and it breaks the `ro-after' testing code.
-		 */
-		invalidate_bdev(bdev);
-		blkdev_put(bdev, sbi->s_sb);
-		sbi->s_journal_bdev = NULL;
-	}
-}
-
 static inline struct inode *orphan_list_entry(struct list_head *l)
 {
 	return &list_entry(l, struct ext4_inode_info, i_orphan)->vfs_inode;
@@ -1330,8 +1312,13 @@ static void ext4_put_super(struct super_block *sb)
 	sync_blockdev(sb->s_bdev);
 	invalidate_bdev(sb->s_bdev);
 	if (sbi->s_journal_bdev) {
+		/*
+		 * Invalidate the journal device's buffers.  We don't want them
+		 * floating about in memory - the physical journal device may
+		 * hotswapped, and it breaks the `ro-after' testing code.
+		 */
 		sync_blockdev(sbi->s_journal_bdev);
-		ext4_blkdev_remove(sbi);
+		invalidate_bdev(sbi->s_journal_bdev);
 	}
 
 	ext4_xattr_destroy_cache(sbi->s_ea_inode_cache);
@@ -5654,9 +5641,11 @@ failed_mount9: __maybe_unused
 		kfree(get_qf_name(sb, sbi, i));
 #endif
 	fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy);
-	/* ext4_blkdev_remove() calls kill_bdev(), release bh before it. */
 	brelse(sbi->s_sbh);
-	ext4_blkdev_remove(sbi);
+	if (sbi->s_journal_bdev) {
+		invalidate_bdev(sbi->s_journal_bdev);
+		blkdev_put(sbi->s_journal_bdev, sb);
+	}
 out_fail:
 	invalidate_bdev(sb->s_bdev);
 	sb->s_fs_info = NULL;
@@ -7266,12 +7255,23 @@ static inline int ext3_feature_set_ok(struct super_block *sb)
 	return 1;
 }
 
+static void ext4_kill_sb(struct super_block *sb)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct block_device *journal_bdev = sbi ? sbi->s_journal_bdev : NULL;
+
+	kill_block_super(sb);
+
+	if (journal_bdev)
+		blkdev_put(journal_bdev, sb);
+}
+
 static struct file_system_type ext4_fs_type = {
 	.owner			= THIS_MODULE,
 	.name			= "ext4",
 	.init_fs_context	= ext4_init_fs_context,
 	.parameters		= ext4_param_specs,
-	.kill_sb		= kill_block_super,
+	.kill_sb		= ext4_kill_sb,
 	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
 };
 MODULE_ALIAS_FS("ext4");
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 09/13] exfat: don't RCU-free the sbi
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 08/13] ext4: close the external journal device in ->kill_sb Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-10 13:01   ` Namjae Jeon
  2023-08-09 22:05 ` [PATCH 10/13] exfat: free the sbi and iocharset in ->kill_sb Christoph Hellwig
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

There are no RCU critical sections for accessing any information in the
sbi, so drop the call_rcu indirection for freeing the sbi.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/exfat/exfat_fs.h |  2 --
 fs/exfat/super.c    | 15 ++++-----------
 2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
index 729ada9e26e82e..f55498e5c23d46 100644
--- a/fs/exfat/exfat_fs.h
+++ b/fs/exfat/exfat_fs.h
@@ -273,8 +273,6 @@ struct exfat_sb_info {
 
 	spinlock_t inode_hash_lock;
 	struct hlist_head inode_hashtable[EXFAT_HASH_SIZE];
-
-	struct rcu_head rcu;
 };
 
 #define EXFAT_CACHE_VALID	0
diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index 8c32460e031e80..3c6aec96d0dc85 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -31,16 +31,6 @@ static void exfat_free_iocharset(struct exfat_sb_info *sbi)
 		kfree(sbi->options.iocharset);
 }
 
-static void exfat_delayed_free(struct rcu_head *p)
-{
-	struct exfat_sb_info *sbi = container_of(p, struct exfat_sb_info, rcu);
-
-	unload_nls(sbi->nls_io);
-	exfat_free_iocharset(sbi);
-	exfat_free_upcase_table(sbi);
-	kfree(sbi);
-}
-
 static void exfat_put_super(struct super_block *sb)
 {
 	struct exfat_sb_info *sbi = EXFAT_SB(sb);
@@ -50,7 +40,10 @@ static void exfat_put_super(struct super_block *sb)
 	brelse(sbi->boot_bh);
 	mutex_unlock(&sbi->s_lock);
 
-	call_rcu(&sbi->rcu, exfat_delayed_free);
+	unload_nls(sbi->nls_io);
+	exfat_free_iocharset(sbi);
+	exfat_free_upcase_table(sbi);
+	kfree(sbi);
 }
 
 static int exfat_sync_fs(struct super_block *sb, int wait)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 10/13] exfat: free the sbi and iocharset in ->kill_sb
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 09/13] exfat: don't RCU-free the sbi Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-10 13:02   ` Namjae Jeon
  2023-08-09 22:05 ` [PATCH 11/13] ntfs3: rename put_ntfs ntfs3_free_sbi Christoph Hellwig
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

As a rule of thumb everything allocated to the fs_context and moved into
the super_block should be freed by ->kill_sb so that the teardown
handling doesn't need to be duplicated between the fill_super error
path and put_super.  Implement an exfat-specific kill_sb method to do
that and share the code with the mount contex free helper for the
mount error handling case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/exfat/super.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/exfat/super.c b/fs/exfat/super.c
index 3c6aec96d0dc85..85b04a4064af1e 100644
--- a/fs/exfat/super.c
+++ b/fs/exfat/super.c
@@ -41,9 +41,7 @@ static void exfat_put_super(struct super_block *sb)
 	mutex_unlock(&sbi->s_lock);
 
 	unload_nls(sbi->nls_io);
-	exfat_free_iocharset(sbi);
 	exfat_free_upcase_table(sbi);
-	kfree(sbi);
 }
 
 static int exfat_sync_fs(struct super_block *sb, int wait)
@@ -703,9 +701,6 @@ static int exfat_fill_super(struct super_block *sb, struct fs_context *fc)
 
 check_nls_io:
 	unload_nls(sbi->nls_io);
-	exfat_free_iocharset(sbi);
-	sb->s_fs_info = NULL;
-	kfree(sbi);
 	return err;
 }
 
@@ -714,14 +709,18 @@ static int exfat_get_tree(struct fs_context *fc)
 	return get_tree_bdev(fc, exfat_fill_super);
 }
 
+static void exfat_free_sbi(struct exfat_sb_info *sbi)
+{
+	exfat_free_iocharset(sbi);
+	kfree(sbi);
+}
+
 static void exfat_free(struct fs_context *fc)
 {
 	struct exfat_sb_info *sbi = fc->s_fs_info;
 
-	if (sbi) {
-		exfat_free_iocharset(sbi);
-		kfree(sbi);
-	}
+	if (sbi)
+		exfat_free_sbi(sbi);
 }
 
 static int exfat_reconfigure(struct fs_context *fc)
@@ -766,12 +765,21 @@ static int exfat_init_fs_context(struct fs_context *fc)
 	return 0;
 }
 
+static void exfat_kill_sb(struct super_block *sb)
+{
+	struct exfat_sb_info *sbi = sb->s_fs_info;
+
+	kill_block_super(sb);
+	if (sbi)
+		exfat_free_sbi(sbi);
+}
+
 static struct file_system_type exfat_fs_type = {
 	.owner			= THIS_MODULE,
 	.name			= "exfat",
 	.init_fs_context	= exfat_init_fs_context,
 	.parameters		= exfat_parameters,
-	.kill_sb		= kill_block_super,
+	.kill_sb		= exfat_kill_sb,
 	.fs_flags		= FS_REQUIRES_DEV,
 };
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 11/13] ntfs3: rename put_ntfs ntfs3_free_sbi
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (9 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 10/13] exfat: free the sbi and iocharset in ->kill_sb Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 12/13] ntfs3: don't call sync_blockdev in ntfs_put_super Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 13/13] ntfs3: free the sbi in ->kill_sb Christoph Hellwig
  12 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

put_ntfs is a rather unconventional name for a function that frees the
sbi and associated resources.  Give it a more descriptive name and drop
the duplicate name in the top of the function comment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
---
 fs/ntfs3/super.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index 1a02072b6b0e16..bb985d3756d949 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -569,9 +569,9 @@ static void init_once(void *foo)
 }
 
 /*
- * put_ntfs - Noinline to reduce binary size.
+ * Noinline to reduce binary size.
  */
-static noinline void put_ntfs(struct ntfs_sb_info *sbi)
+static noinline void ntfs3_free_sbi(struct ntfs_sb_info *sbi)
 {
 	kfree(sbi->new_rec);
 	kvfree(ntfs_put_shared(sbi->upcase));
@@ -627,7 +627,7 @@ static void ntfs_put_super(struct super_block *sb)
 	ntfs_set_state(sbi, NTFS_DIRTY_CLEAR);
 
 	put_mount_options(sbi->options);
-	put_ntfs(sbi);
+	ntfs3_free_sbi(sbi);
 	sb->s_fs_info = NULL;
 
 	sync_blockdev(sb->s_bdev);
@@ -1569,7 +1569,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
 	 * ntfs_fs_free will be called with fc->s_fs_info = NULL
 	 */
 	put_mount_options(sbi->options);
-	put_ntfs(sbi);
+	ntfs3_free_sbi(sbi);
 	sb->s_fs_info = NULL;
 	kfree(boot2);
 
@@ -1659,7 +1659,7 @@ static void ntfs_fs_free(struct fs_context *fc)
 	struct ntfs_sb_info *sbi = fc->s_fs_info;
 
 	if (sbi)
-		put_ntfs(sbi);
+		ntfs3_free_sbi(sbi);
 
 	if (opts)
 		put_mount_options(opts);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 12/13] ntfs3: don't call sync_blockdev in ntfs_put_super
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (10 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 11/13] ntfs3: rename put_ntfs ntfs3_free_sbi Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-08-09 22:05 ` [PATCH 13/13] ntfs3: free the sbi in ->kill_sb Christoph Hellwig
  12 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

kill_block_super will call sync_blockdev just a tad later already.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
---
 fs/ntfs3/super.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index bb985d3756d949..727138933a9324 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -629,8 +629,6 @@ static void ntfs_put_super(struct super_block *sb)
 	put_mount_options(sbi->options);
 	ntfs3_free_sbi(sbi);
 	sb->s_fs_info = NULL;
-
-	sync_blockdev(sb->s_bdev);
 }
 
 static int ntfs_statfs(struct dentry *dentry, struct kstatfs *buf)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 13/13] ntfs3: free the sbi in ->kill_sb
  2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
                   ` (11 preceding siblings ...)
  2023-08-09 22:05 ` [PATCH 12/13] ntfs3: don't call sync_blockdev in ntfs_put_super Christoph Hellwig
@ 2023-08-09 22:05 ` Christoph Hellwig
  2023-09-07 13:05   ` Guenter Roeck
  12 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-09 22:05 UTC (permalink / raw)
  To: Al Viro, Christian Brauner
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

As a rule of thumb everything allocated to the fs_context and moved into
the super_block should be freed by ->kill_sb so that the teardown
handling doesn't need to be duplicated between the fill_super error
path and put_super.  Implement an ntfs3-specific kill_sb method to do
that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
---
 fs/ntfs3/super.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index 727138933a9324..5fffddea554f18 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -625,10 +625,6 @@ static void ntfs_put_super(struct super_block *sb)
 
 	/* Mark rw ntfs as clear, if possible. */
 	ntfs_set_state(sbi, NTFS_DIRTY_CLEAR);
-
-	put_mount_options(sbi->options);
-	ntfs3_free_sbi(sbi);
-	sb->s_fs_info = NULL;
 }
 
 static int ntfs_statfs(struct dentry *dentry, struct kstatfs *buf)
@@ -1562,15 +1558,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc)
 put_inode_out:
 	iput(inode);
 out:
-	/*
-	 * Free resources here.
-	 * ntfs_fs_free will be called with fc->s_fs_info = NULL
-	 */
-	put_mount_options(sbi->options);
-	ntfs3_free_sbi(sbi);
-	sb->s_fs_info = NULL;
 	kfree(boot2);
-
 	return err;
 }
 
@@ -1726,13 +1714,24 @@ static int ntfs_init_fs_context(struct fs_context *fc)
 	return -ENOMEM;
 }
 
+static void ntfs3_kill_sb(struct super_block *sb)
+{
+	struct ntfs_sb_info *sbi = sb->s_fs_info;
+
+	kill_block_super(sb);
+
+	if (sbi->options)
+		put_mount_options(sbi->options);
+	ntfs3_free_sbi(sbi);
+}
+
 // clang-format off
 static struct file_system_type ntfs_fs_type = {
 	.owner			= THIS_MODULE,
 	.name			= "ntfs3",
 	.init_fs_context	= ntfs_init_fs_context,
 	.parameters		= ntfs_fs_parameters,
-	.kill_sb		= kill_block_super,
+	.kill_sb		= ntfs3_kill_sb,
 	.fs_flags		= FS_REQUIRES_DEV | FS_ALLOW_IDMAP,
 };
 // clang-format on
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 06/13] xfs: close the external block devices in xfs_mount_free
  2023-08-09 22:05 ` [PATCH 06/13] xfs: close the external block devices in xfs_mount_free Christoph Hellwig
@ 2023-08-09 22:34   ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2023-08-09 22:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Christian Brauner, Namjae Jeon, Sungjong Seo,
	Theodore Ts'o, Andreas Dilger, Konstantin Komarov,
	linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On Wed, Aug 09, 2023 at 03:05:38PM -0700, Christoph Hellwig wrote:
> blkdev_put must not be called under sb->s_umount to avoid a lock order
> reversal with disk->open_mutex.  Move closing the buftargs into ->kill_sb
> to archive that.  Note that the flushing of the disk caches and
> block device mapping invalidated needs to stay in ->put_super as the main
> block device is closed in kill_block_super already.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good!
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/xfs_buf.c   |  2 --
>  fs/xfs/xfs_super.c | 36 ++++++++++++++++++++++++++----------
>  2 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index e33eb17648dfed..3b903f6bce98d8 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -1945,8 +1945,6 @@ xfs_free_buftarg(
>  	percpu_counter_destroy(&btp->bt_io_count);
>  	list_lru_destroy(&btp->bt_lru);
>  
> -	blkdev_issue_flush(btp->bt_bdev);
> -	invalidate_bdev(btp->bt_bdev);
>  	fs_put_dax(btp->bt_daxdev, btp->bt_mount);
>  	/* the main block device is closed by kill_block_super */
>  	if (bdev != btp->bt_mount->m_super->s_bdev)
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 368c05a2dea5b9..4ae3b01ed038c7 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -396,14 +396,19 @@ xfs_blkdev_get(
>  }
>  
>  STATIC void
> -xfs_close_devices(
> +xfs_shutdown_devices(
>  	struct xfs_mount	*mp)
>  {
> -	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
> -		xfs_free_buftarg(mp->m_logdev_targp);
> -	if (mp->m_rtdev_targp)
> -		xfs_free_buftarg(mp->m_rtdev_targp);
> -	xfs_free_buftarg(mp->m_ddev_targp);
> +	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
> +		blkdev_issue_flush(mp->m_logdev_targp->bt_bdev);
> +		invalidate_bdev(mp->m_logdev_targp->bt_bdev);
> +	}
> +	if (mp->m_rtdev_targp) {
> +		blkdev_issue_flush(mp->m_rtdev_targp->bt_bdev);
> +		invalidate_bdev(mp->m_rtdev_targp->bt_bdev);
> +	}
> +	blkdev_issue_flush(mp->m_ddev_targp->bt_bdev);
> +	invalidate_bdev(mp->m_ddev_targp->bt_bdev);
>  }
>  
>  /*
> @@ -741,6 +746,17 @@ static void
>  xfs_mount_free(
>  	struct xfs_mount	*mp)
>  {
> +	/*
> +	 * Free the buftargs here because blkdev_put needs to be called outside
> +	 * of sb->s_umount, which is held around the call to ->put_super.
> +	 */
> +	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
> +		xfs_free_buftarg(mp->m_logdev_targp);
> +	if (mp->m_rtdev_targp)
> +		xfs_free_buftarg(mp->m_rtdev_targp);
> +	if (mp->m_ddev_targp)
> +		xfs_free_buftarg(mp->m_ddev_targp);
> +
>  	kfree(mp->m_rtname);
>  	kfree(mp->m_logname);
>  	kmem_free(mp);
> @@ -1126,7 +1142,7 @@ xfs_fs_put_super(
>  	xfs_inodegc_free_percpu(mp);
>  	xfs_destroy_percpu_counters(mp);
>  	xfs_destroy_mount_workqueues(mp);
> -	xfs_close_devices(mp);
> +	xfs_shutdown_devices(mp);
>  }
>  
>  static long
> @@ -1499,7 +1515,7 @@ xfs_fs_fill_super(
>  
>  	error = xfs_init_mount_workqueues(mp);
>  	if (error)
> -		goto out_close_devices;
> +		goto out_shutdown_devices;
>  
>  	error = xfs_init_percpu_counters(mp);
>  	if (error)
> @@ -1713,8 +1729,8 @@ xfs_fs_fill_super(
>  	xfs_destroy_percpu_counters(mp);
>   out_destroy_workqueues:
>  	xfs_destroy_mount_workqueues(mp);
> - out_close_devices:
> -	xfs_close_devices(mp);
> + out_shutdown_devices:
> +	xfs_shutdown_devices(mp);
>  	return error;
>  
>   out_unmount:
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev
  2023-08-09 22:05 ` [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev Christoph Hellwig
@ 2023-08-09 22:39   ` Darrick J. Wong
  2023-08-10  8:20     ` Christian Brauner
  2023-08-10 15:53     ` Christoph Hellwig
  2023-08-10 15:22   ` Matthew Wilcox
  1 sibling, 2 replies; 31+ messages in thread
From: Darrick J. Wong @ 2023-08-09 22:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Christian Brauner, Namjae Jeon, Sungjong Seo,
	Theodore Ts'o, Andreas Dilger, Konstantin Komarov,
	linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On Wed, Aug 09, 2023 at 03:05:39PM -0700, Christoph Hellwig wrote:
> Copy and paste the commit message from Darrick into a comment to explain
> the seemly odd invalidate_bdev in xfs_shutdown_devices.

      ^ seemingly?
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_super.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 4ae3b01ed038c7..c169beb0d8cab3 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -399,6 +399,32 @@ STATIC void
>  xfs_shutdown_devices(
>  	struct xfs_mount	*mp)
>  {
> +	/*
> +	 * Udev is triggered whenever anyone closes a block device or unmounts
> +	 * a file systemm on a block device.
> +	 * The default udev rules invoke blkid to read the fs super and create
> +	 * symlinks to the bdev under /dev/disk.  For this, it uses buffered
> +	 * reads through the page cache.
> +	 *
> +	 * xfs_db also uses buffered reads to examine metadata.  There is no
> +	 * coordination between xfs_db and udev, which means that they can run
> +	 * concurrently.  Note there is no coordination between the kernel and
> +	 * blkid either.
> +	 *
> +	 * On a system with 64k pages, the page cache can cache the superblock
> +	 * and the root inode (and hence the root directory) with the same 64k
> +	 * page.  If udev spawns blkid after the mkfs and the system is busy
> +	 * enough that it is still running when xfs_db starts up, they'll both
> +	 * read from the same page in the pagecache.
> +	 *
> +	 * The unmount writes updated inode metadata to disk directly.  The XFS
> +	 * buffer cache does not use the bdev pagecache, nor does it invalidate
> +	 * the pagecache on umount.  If the above scenario occurs, the pagecache

This sentence reads a little strangely, since "nor does it invalidate"
would seem to conflict with the invalidate_bdev call below.  I suggest
changing the verb a bit:

"The XFS buffer cache does not use the bdev pagecache, so it needs to
invalidate that pagecache on unmount."

With those two things changed,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> +	 * no longer reflects what's on disk, xfs_db reads the stale metadata,
> +	 * and fails to find /a.  Most of the time this succeeds because closing
> +	 * a bdev invalidates the page cache, but when processes race, everyone
> +	 * loses.
> +	 */
>  	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
>  		blkdev_issue_flush(mp->m_logdev_targp->bt_bdev);
>  		invalidate_bdev(mp->m_logdev_targp->bt_bdev);
> -- 
> 2.39.2
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev
  2023-08-09 22:39   ` Darrick J. Wong
@ 2023-08-10  8:20     ` Christian Brauner
  2023-08-10 15:53     ` Christoph Hellwig
  1 sibling, 0 replies; 31+ messages in thread
From: Christian Brauner @ 2023-08-10  8:20 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Al Viro, Namjae Jeon, Sungjong Seo,
	Theodore Ts'o, Andreas Dilger, Konstantin Komarov,
	linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On Wed, Aug 09, 2023 at 03:39:23PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 09, 2023 at 03:05:39PM -0700, Christoph Hellwig wrote:
> > Copy and paste the commit message from Darrick into a comment to explain
> > the seemly odd invalidate_bdev in xfs_shutdown_devices.
> 
>       ^ seemingly?
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/xfs_super.c | 26 ++++++++++++++++++++++++++
> >  1 file changed, 26 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index 4ae3b01ed038c7..c169beb0d8cab3 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -399,6 +399,32 @@ STATIC void
> >  xfs_shutdown_devices(
> >  	struct xfs_mount	*mp)
> >  {
> > +	/*
> > +	 * Udev is triggered whenever anyone closes a block device or unmounts
> > +	 * a file systemm on a block device.
> > +	 * The default udev rules invoke blkid to read the fs super and create
> > +	 * symlinks to the bdev under /dev/disk.  For this, it uses buffered
> > +	 * reads through the page cache.
> > +	 *
> > +	 * xfs_db also uses buffered reads to examine metadata.  There is no
> > +	 * coordination between xfs_db and udev, which means that they can run
> > +	 * concurrently.  Note there is no coordination between the kernel and
> > +	 * blkid either.
> > +	 *
> > +	 * On a system with 64k pages, the page cache can cache the superblock
> > +	 * and the root inode (and hence the root directory) with the same 64k
> > +	 * page.  If udev spawns blkid after the mkfs and the system is busy
> > +	 * enough that it is still running when xfs_db starts up, they'll both
> > +	 * read from the same page in the pagecache.
> > +	 *
> > +	 * The unmount writes updated inode metadata to disk directly.  The XFS
> > +	 * buffer cache does not use the bdev pagecache, nor does it invalidate
> > +	 * the pagecache on umount.  If the above scenario occurs, the pagecache
> 
> This sentence reads a little strangely, since "nor does it invalidate"
> would seem to conflict with the invalidate_bdev call below.  I suggest
> changing the verb a bit:
> 
> "The XFS buffer cache does not use the bdev pagecache, so it needs to
> invalidate that pagecache on unmount."
> 
> With those two things changed,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Fixed in-tree.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/13] xfs: reformat the xfs_fs_free prototype
  2023-08-09 22:05 ` [PATCH 01/13] xfs: reformat the xfs_fs_free prototype Christoph Hellwig
@ 2023-08-10  8:53   ` Christian Brauner
  2023-08-10 15:51     ` Christoph Hellwig
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Brauner @ 2023-08-10  8:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christian Brauner, Namjae Jeon, Sungjong Seo, Theodore Ts'o,
	Andreas Dilger, Konstantin Komarov, Darrick J. Wong,
	linux-fsdevel, linux-ext4, ntfs3, linux-xfs, Al Viro

On Wed, 09 Aug 2023 15:05:33 -0700, Christoph Hellwig wrote:
> The xfs_fs_free prototype formatting is a weird mix of the classic XFS
> style and the Linux style.  Fix it up to be consistent.
> 
> 

I've completely reshuffled that branch now:

(1) bd_super removal
(2) ->kill_sb() fixes
(3) setup_bdev_super() work (open devices after sb creation)
(4) fs_holder_ops rework
(5) exclusive sb creation

I think that's the most natural order. That involved quite a bit of
massaging, so please make sure that everything's in proper order. The
most sensitive and error prone part of the reordering is that before the
switch from fs_type to sb as holder took place prior to this series.
Whereas due to the reordering the switch now takes place after this
series. This is good though because we can't use setup_bdev_super()
correctly anyway before we have the ->kill_sb() changes done.

git rebase -i v6.5-rc1 -x "make fs/"

builds cleanly. Please double check!

---

Applied to the vfs.super branch of the vfs/vfs.git tree.
Patches in the vfs.super branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.super

[01/13] xfs: reformat the xfs_fs_free prototype
        https://git.kernel.org/vfs/vfs/c/183f60c58f51
[02/13] xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super
        https://git.kernel.org/vfs/vfs/c/94d1bb71d395
[03/13] xfs: free the xfs_mount in ->kill_sb
        https://git.kernel.org/vfs/vfs/c/b92fea73ce6a
[04/13] xfs: remove xfs_blkdev_put
        https://git.kernel.org/vfs/vfs/c/3b6c117834c2
[05/13] xfs: close the RT and log block devices in xfs_free_buftarg
        https://git.kernel.org/vfs/vfs/c/6d4e81f94e80
[06/13] xfs: close the external block devices in xfs_mount_free
        https://git.kernel.org/vfs/vfs/c/bfeb8750e6fe
[07/13] xfs: document the invalidate_bdev call in invalidate_bdev
        https://git.kernel.org/vfs/vfs/c/e4676171bad6
[08/13] ext4: close the external journal device in ->kill_sb
        https://git.kernel.org/vfs/vfs/c/e1a1b0fba97b
[09/13] exfat: don't RCU-free the sbi
        https://git.kernel.org/vfs/vfs/c/6a3f5dee46f6
[10/13] exfat: free the sbi and iocharset in ->kill_sb
        https://git.kernel.org/vfs/vfs/c/cc0313e8135e
[11/13] ntfs3: rename put_ntfs ntfs3_free_sbi
        https://git.kernel.org/vfs/vfs/c/34371bb44afc
[12/13] ntfs3: don't call sync_blockdev in ntfs_put_super
        https://git.kernel.org/vfs/vfs/c/91952e99005b
[13/13] ntfs3: free the sbi in ->kill_sb
        https://git.kernel.org/vfs/vfs/c/5fb25fde7dee

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 09/13] exfat: don't RCU-free the sbi
  2023-08-09 22:05 ` [PATCH 09/13] exfat: don't RCU-free the sbi Christoph Hellwig
@ 2023-08-10 13:01   ` Namjae Jeon
  0 siblings, 0 replies; 31+ messages in thread
From: Namjae Jeon @ 2023-08-10 13:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Christian Brauner, Sungjong Seo, Theodore Ts'o,
	Andreas Dilger, Konstantin Komarov, Darrick J. Wong,
	linux-fsdevel, linux-ext4, ntfs3, linux-xfs

2023-08-10 7:05 GMT+09:00, Christoph Hellwig <hch@lst.de>:
> There are no RCU critical sections for accessing any information in the
> sbi, so drop the call_rcu indirection for freeing the sbi.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>

Thanks!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 10/13] exfat: free the sbi and iocharset in ->kill_sb
  2023-08-09 22:05 ` [PATCH 10/13] exfat: free the sbi and iocharset in ->kill_sb Christoph Hellwig
@ 2023-08-10 13:02   ` Namjae Jeon
  0 siblings, 0 replies; 31+ messages in thread
From: Namjae Jeon @ 2023-08-10 13:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Christian Brauner, Sungjong Seo, Theodore Ts'o,
	Andreas Dilger, Konstantin Komarov, Darrick J. Wong,
	linux-fsdevel, linux-ext4, ntfs3, linux-xfs

2023-08-10 7:05 GMT+09:00, Christoph Hellwig <hch@lst.de>:
> As a rule of thumb everything allocated to the fs_context and moved into
> the super_block should be freed by ->kill_sb so that the teardown
> handling doesn't need to be duplicated between the fill_super error
> path and put_super.  Implement an exfat-specific kill_sb method to do
> that and share the code with the mount contex free helper for the
> mount error handling case.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>

Thanks!

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev
  2023-08-09 22:05 ` [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev Christoph Hellwig
  2023-08-09 22:39   ` Darrick J. Wong
@ 2023-08-10 15:22   ` Matthew Wilcox
  2023-08-10 15:52     ` Christoph Hellwig
  2023-08-10 15:57     ` Darrick J. Wong
  1 sibling, 2 replies; 31+ messages in thread
From: Matthew Wilcox @ 2023-08-10 15:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Christian Brauner, Namjae Jeon, Sungjong Seo,
	Theodore Ts'o, Andreas Dilger, Konstantin Komarov,
	Darrick J. Wong, linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On Wed, Aug 09, 2023 at 03:05:39PM -0700, Christoph Hellwig wrote:
> +	/*
> +	 * Udev is triggered whenever anyone closes a block device or unmounts
> +	 * a file systemm on a block device.
> +	 * The default udev rules invoke blkid to read the fs super and create
> +	 * symlinks to the bdev under /dev/disk.  For this, it uses buffered
> +	 * reads through the page cache.
> +	 *
> +	 * xfs_db also uses buffered reads to examine metadata.  There is no
> +	 * coordination between xfs_db and udev, which means that they can run
> +	 * concurrently.  Note there is no coordination between the kernel and
> +	 * blkid either.
> +	 *
> +	 * On a system with 64k pages, the page cache can cache the superblock
> +	 * and the root inode (and hence the root directory) with the same 64k
> +	 * page.  If udev spawns blkid after the mkfs and the system is busy
> +	 * enough that it is still running when xfs_db starts up, they'll both
> +	 * read from the same page in the pagecache.
> +	 *
> +	 * The unmount writes updated inode metadata to disk directly.  The XFS
> +	 * buffer cache does not use the bdev pagecache, nor does it invalidate
> +	 * the pagecache on umount.  If the above scenario occurs, the pagecache
> +	 * no longer reflects what's on disk, xfs_db reads the stale metadata,
> +	 * and fails to find /a.  Most of the time this succeeds because closing
> +	 * a bdev invalidates the page cache, but when processes race, everyone
> +	 * loses.
> +	 */
>  	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
>  		blkdev_issue_flush(mp->m_logdev_targp->bt_bdev);
>  		invalidate_bdev(mp->m_logdev_targp->bt_bdev);

While I have no complaints with this as a commit message, it's just too
verbose for an inline comment, IMO.  Something pithier and more generic
would seem appropriate.  How about:

	/*
	 * Prevent userspace (eg blkid or xfs_db) from seeing stale data.
	 * XFS is not coherent with the bdev's page cache.
	 */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/13] xfs: reformat the xfs_fs_free prototype
  2023-08-10  8:53   ` Christian Brauner
@ 2023-08-10 15:51     ` Christoph Hellwig
  2023-08-11  7:24       ` Christian Brauner
  0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-10 15:51 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Christoph Hellwig, Namjae Jeon, Sungjong Seo, Theodore Ts'o,
	Andreas Dilger, Konstantin Komarov, Darrick J. Wong,
	linux-fsdevel, linux-ext4, ntfs3, linux-xfs, Al Viro

I think the btrfs hunk (below) in "fs: use the super_block as holder when
mounting file systems" needs to be dropped, as we dropped the prep patch
that allows to use the sb as a holder for now.  I'll add it to my resend
of the btrfs conversion.

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f1dd172d8d5bd7..d58ace4c1d2962 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -69,8 +69,6 @@ static const struct super_operations btrfs_super_ops;
  * requested by subvol=/path. That way the callchain is straightforward and we
  * don't have to play tricks with the mount options and recursive calls to
  * btrfs_mount.
- *
- * The new btrfs_root_fs_type also servers as a tag for the bdev_holder.
  */
 static struct file_system_type btrfs_fs_type;
 static struct file_system_type btrfs_root_fs_type;
@@ -1515,7 +1513,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
 		snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev);
 		shrinker_debugfs_rename(&s->s_shrink, "sb-%s:%s", fs_type->name,
 					s->s_id);
-		btrfs_sb(s)->bdev_holder = fs_type;
+		fs_info->bdev_holder = s;
 		error = btrfs_fill_super(s, fs_devices, data);
 	}
 	if (!error)

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev
  2023-08-10 15:22   ` Matthew Wilcox
@ 2023-08-10 15:52     ` Christoph Hellwig
  2023-08-10 16:00       ` Darrick J. Wong
  2023-08-10 15:57     ` Darrick J. Wong
  1 sibling, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-10 15:52 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Al Viro, Christian Brauner, Namjae Jeon,
	Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs

On Thu, Aug 10, 2023 at 04:22:15PM +0100, Matthew Wilcox wrote:
> >  		blkdev_issue_flush(mp->m_logdev_targp->bt_bdev);
> >  		invalidate_bdev(mp->m_logdev_targp->bt_bdev);
> 
> While I have no complaints with this as a commit message, it's just too
> verbose for an inline comment, IMO.  Something pithier and more generic
> would seem appropriate.  How about:
> 
> 	/*
> 	 * Prevent userspace (eg blkid or xfs_db) from seeing stale data.
> 	 * XFS is not coherent with the bdev's page cache.
> 	 */

Well, this completely misses the point.  The point is that XFS should
never have to invalidate the page cache because it's not using it,
but it has to due to weird races.  I tried to condese the message but
I could not come up with a good one that's not losing information.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev
  2023-08-09 22:39   ` Darrick J. Wong
  2023-08-10  8:20     ` Christian Brauner
@ 2023-08-10 15:53     ` Christoph Hellwig
  1 sibling, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2023-08-10 15:53 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Al Viro, Christian Brauner, Namjae Jeon,
	Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On Wed, Aug 09, 2023 at 03:39:23PM -0700, Darrick J. Wong wrote:
> > +	 * read from the same page in the pagecache.
> > +	 *
> > +	 * The unmount writes updated inode metadata to disk directly.  The XFS
> > +	 * buffer cache does not use the bdev pagecache, nor does it invalidate
> > +	 * the pagecache on umount.  If the above scenario occurs, the pagecache
> 
> This sentence reads a little strangely, since "nor does it invalidate"
> would seem to conflict with the invalidate_bdev call below.  I suggest
> changing the verb a bit:
> 
> "The XFS buffer cache does not use the bdev pagecache, so it needs to
> invalidate that pagecache on unmount."

Agreed. I'll forward it to the original author of the sentence time
permitting :)


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev
  2023-08-10 15:22   ` Matthew Wilcox
  2023-08-10 15:52     ` Christoph Hellwig
@ 2023-08-10 15:57     ` Darrick J. Wong
  1 sibling, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2023-08-10 15:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Christoph Hellwig, Al Viro, Christian Brauner, Namjae Jeon,
	Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On Thu, Aug 10, 2023 at 04:22:15PM +0100, Matthew Wilcox wrote:
> On Wed, Aug 09, 2023 at 03:05:39PM -0700, Christoph Hellwig wrote:
> > +	/*
> > +	 * Udev is triggered whenever anyone closes a block device or unmounts
> > +	 * a file systemm on a block device.
> > +	 * The default udev rules invoke blkid to read the fs super and create
> > +	 * symlinks to the bdev under /dev/disk.  For this, it uses buffered
> > +	 * reads through the page cache.
> > +	 *
> > +	 * xfs_db also uses buffered reads to examine metadata.  There is no
> > +	 * coordination between xfs_db and udev, which means that they can run
> > +	 * concurrently.  Note there is no coordination between the kernel and
> > +	 * blkid either.
> > +	 *
> > +	 * On a system with 64k pages, the page cache can cache the superblock
> > +	 * and the root inode (and hence the root directory) with the same 64k
> > +	 * page.  If udev spawns blkid after the mkfs and the system is busy
> > +	 * enough that it is still running when xfs_db starts up, they'll both
> > +	 * read from the same page in the pagecache.
> > +	 *
> > +	 * The unmount writes updated inode metadata to disk directly.  The XFS
> > +	 * buffer cache does not use the bdev pagecache, nor does it invalidate
> > +	 * the pagecache on umount.  If the above scenario occurs, the pagecache
> > +	 * no longer reflects what's on disk, xfs_db reads the stale metadata,
> > +	 * and fails to find /a.  Most of the time this succeeds because closing
> > +	 * a bdev invalidates the page cache, but when processes race, everyone
> > +	 * loses.
> > +	 */
> >  	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) {
> >  		blkdev_issue_flush(mp->m_logdev_targp->bt_bdev);
> >  		invalidate_bdev(mp->m_logdev_targp->bt_bdev);
> 
> While I have no complaints with this as a commit message, it's just too
> verbose for an inline comment, IMO.  Something pithier and more generic
> would seem appropriate.  How about:
> 
> 	/*
> 	 * Prevent userspace (eg blkid or xfs_db) from seeing stale data.
> 	 * XFS is not coherent with the bdev's page cache.

"XFS' buffer cache is not coherent with the bdev's page cache."
?

--D

> 	 */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev
  2023-08-10 15:52     ` Christoph Hellwig
@ 2023-08-10 16:00       ` Darrick J. Wong
  0 siblings, 0 replies; 31+ messages in thread
From: Darrick J. Wong @ 2023-08-10 16:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Matthew Wilcox, Al Viro, Christian Brauner, Namjae Jeon,
	Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On Thu, Aug 10, 2023 at 05:52:25PM +0200, Christoph Hellwig wrote:
> On Thu, Aug 10, 2023 at 04:22:15PM +0100, Matthew Wilcox wrote:
> > >  		blkdev_issue_flush(mp->m_logdev_targp->bt_bdev);
> > >  		invalidate_bdev(mp->m_logdev_targp->bt_bdev);
> > 
> > While I have no complaints with this as a commit message, it's just too
> > verbose for an inline comment, IMO.  Something pithier and more generic
> > would seem appropriate.  How about:
> > 
> > 	/*
> > 	 * Prevent userspace (eg blkid or xfs_db) from seeing stale data.
> > 	 * XFS is not coherent with the bdev's page cache.
> > 	 */
> 
> Well, this completely misses the point.  The point is that XFS should
> never have to invalidate the page cache because it's not using it,
> but it has to due to weird races.  I tried to condese the message but
> I could not come up with a good one that's not losing information.

Agreed -- it took me a while to set up an arm64 box with just the right
debugging info to figure out why certain fstests were flaky.  I do think
it's useful (despite my other reply to willy) to retain the defect
details for hard-to-reproduce errors, and the only way to do that
without encountering the dead url problem is to dump it in a huge
commit message or a comment.

(Too bad there's no way to have a commit whose code comments reference
the commit id of that commit to say "Hey, you need to read this commit
before you touch this line"...)

--D

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 01/13] xfs: reformat the xfs_fs_free prototype
  2023-08-10 15:51     ` Christoph Hellwig
@ 2023-08-11  7:24       ` Christian Brauner
  0 siblings, 0 replies; 31+ messages in thread
From: Christian Brauner @ 2023-08-11  7:24 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Namjae Jeon, Sungjong Seo, Theodore Ts'o, Andreas Dilger,
	Konstantin Komarov, Darrick J. Wong, linux-fsdevel, linux-ext4,
	ntfs3, linux-xfs, Al Viro

On Thu, Aug 10, 2023 at 05:51:16PM +0200, Christoph Hellwig wrote:
> I think the btrfs hunk (below) in "fs: use the super_block as holder when
> mounting file systems" needs to be dropped, as we dropped the prep patch

Ah yes, thanks!

> that allows to use the sb as a holder for now.  I'll add it to my resend
> of the btrfs conversion.

Dropped.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 13/13] ntfs3: free the sbi in ->kill_sb
  2023-08-09 22:05 ` [PATCH 13/13] ntfs3: free the sbi in ->kill_sb Christoph Hellwig
@ 2023-09-07 13:05   ` Guenter Roeck
  2023-09-07 13:54     ` Christian Brauner
  0 siblings, 1 reply; 31+ messages in thread
From: Guenter Roeck @ 2023-09-07 13:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Al Viro, Christian Brauner, Namjae Jeon, Sungjong Seo,
	Theodore Ts'o, Andreas Dilger, Konstantin Komarov,
	Darrick J. Wong, linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On Wed, Aug 09, 2023 at 03:05:45PM -0700, Christoph Hellwig wrote:
> As a rule of thumb everything allocated to the fs_context and moved into
> the super_block should be freed by ->kill_sb so that the teardown
> handling doesn't need to be duplicated between the fill_super error
> path and put_super.  Implement an ntfs3-specific kill_sb method to do
> that.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Christian Brauner <brauner@kernel.org>

This patch results in:

[   10.136703] ------------[ cut here ]------------
ILLOPC: ffffffffbbcb48c0: 0f 0b
[   10.136841] VFS: Busy inodes after unmount of sdb (ntfs3)
[   10.138019] WARNING: CPU: 0 PID: 188 at fs/super.c:695 generic_shutdown_super+0x100/0x160
[   10.138241] Modules linked in:
[   10.138417] CPU: 0 PID: 188 Comm: umount Not tainted 6.5.0-12107-g7ba2090ca64e #1
[   10.138541] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[   10.138736] RIP: 0010:generic_shutdown_super+0x100/0x160
[   10.138857] Code: cc cc e8 b3 a1 f7 ff 48 8b bb 38 01 00 00 eb d9 48 8b 43 28 48 8d b3 20 06 00 00 48 c7 c7 48 f8 62 bd 48 8b 10 e8 d0 1a dd ff <0f> 0b 4c 8d a3 80 08 00 00 4c 89 e7 e8 
2f 51 06 01 48 8b 8b c0 08
[   10.139177] RSP: 0018:ffff9d8b004f3e60 EFLAGS: 00000282
[   10.139281] RAX: 0000000000000000 RBX: ffff979982dc1000 RCX: 0000000000000027
[   10.139378] RDX: ffff9799bec2c888 RSI: 0000000000000001 RDI: ffff9799bec2c880
[   10.139485] RBP: ffff979982dc18c0 R08: ffffffffbd956908 R09: 00000000ffffdfff
[   10.139584] R10: ffffffffbd876920 R11: ffffffffbd929cc8 R12: ffff979981418040
[   10.139681] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   10.139801] FS:  00007f6c00b1eb48(0000) GS:ffff9799bec00000(0000) knlGS:0000000000000000
[   10.139978] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.140096] CR2: 00007ffd8265f60f CR3: 0000000003268000 CR4: 00000000000406f0
[   10.140249] Call Trace:
[   10.140377]  <TASK>
[   10.140471]  ? __warn+0x7f/0x160
[   10.140561]  ? generic_shutdown_super+0x100/0x160
[   10.140636]  ? report_bug+0x199/0x1b0
[   10.140696]  ? prb_read_valid+0x16/0x20
[   10.140762]  ? handle_bug+0x3c/0x70
[   10.140828]  ? exc_invalid_op+0x18/0x70
[   10.140937]  ? asm_exc_invalid_op+0x1a/0x20
[   10.141036]  ? generic_shutdown_super+0x100/0x160
[   10.141129]  kill_block_super+0x16/0x40
[   10.141209]  ntfs3_kill_sb+0x13/0x50
[   10.141275]  deactivate_locked_super+0x30/0xa0
[   10.141344]  cleanup_mnt+0xfb/0x150
[   10.141414]  task_work_run+0x58/0xa0
[   10.141475]  exit_to_user_mode_prepare+0x108/0x110
[   10.141549]  syscall_exit_to_user_mode+0x21/0x50
[   10.141618]  do_syscall_64+0x4c/0x90
[   10.141677]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   10.141785] RIP: 0033:0x7f6c00aa421d
[   10.142030] Code: 05 48 89 c7 e8 d2 e8 ff ff 5a c3 31 f6 50 b8 a6 00 00 00 0f 05 48 89 c7 e8 be e8 ff ff 5a c3 48 63 f6 50 b8 a6 00 00 00 0f 05 <48> 89 c7 e8 a9 e8 ff ff 5a c3 49 89 ca 
50 48 63 ff 4d 63 c0 b8 2f
[   10.142241] RSP: 002b:00007ffd15c5ad90 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[   10.142349] RAX: 0000000000000000 RBX: 00007f6c00b1fdc0 RCX: 00007f6c00aa421d
[   10.142437] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f6c00b1f9e0
[   10.142524] RBP: 00007f6c00b1f9f0 R08: 00007f6c00b1f9f0 R09: 0000000000000012
[   10.142611] R10: 00007f6c00b1f9fc R11: 0000000000000246 R12: 00007f6c00b1fdc0
[   10.142698] R13: 00007f6c00b1f9e0 R14: 0000000000000000 R15: 00007ffd15c5aee8
[   10.142828]  </TASK>
[   10.142966] ---[ end trace 0000000000000000 ]---
[   10.151159] general protection fault, probably for non-canonical address 0xdead000000000125: 0000 [#1] PREEMPT SMP PTI
[   10.151393] CPU: 0 PID: 188 Comm: umount Tainted: G        W          6.5.0-12107-g7ba2090ca64e #1
[   10.151504] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[   10.151626] RIP: 0010:iput.part.0+0xbd/0x230
[   10.151687] Code: 05 c0 8b e5 01 48 85 c0 74 0c 48 8b 78 08 48 89 ee e8 f7 62 01 00 65 ff 0d 28 33 36 44 75 a5 0f 1f 44 00 00 eb 9e 48 8b 5d 28 <48> 8b 4b 30 a8 08 0f 85 d7 00 00 00 48 
8b 49 28 48 85 c9 0f 84 b6
[   10.151884] RSP: 0018:ffff9d8b004f3e80 EFLAGS: 00000246
[   10.151962] RAX: 0000000000000000 RBX: dead0000000000f5 RCX: ffff9799831c2458
[   10.152045] RDX: 0000000000000001 RSI: 00000000b39f7e05 RDI: ffff9799817a89d0
[   10.152126] RBP: ffff9799817a8948 R08: 00000000ffffffff R09: 00000000ffffffff
[   10.152206] R10: ffff9799817fc2c8 R11: 0000000000000000 R12: ffff9799817a89d0
[   10.152287] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   10.152367] FS:  00007f6c00b1eb48(0000) GS:ffff9799bec00000(0000) knlGS:0000000000000000
[   10.152459] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.152528] CR2: dead000000000125 CR3: 0000000003268000 CR4: 00000000000406f0
[   10.152609] Call Trace:
[   10.152644]  <TASK>
[   10.152678]  ? die_addr+0x32/0x90
[   10.152730]  ? exc_general_protection+0x1a8/0x3d0
[   10.152800]  ? asm_exc_general_protection+0x26/0x30
[   10.152869]  ? iput.part.0+0xbd/0x230
[   10.152940]  ntfs3_free_sbi+0x5d/0x110
[   10.153001]  deactivate_locked_super+0x30/0xa0
[   10.153078]  cleanup_mnt+0xfb/0x150
[   10.153132]  task_work_run+0x58/0xa0
[   10.153200]  exit_to_user_mode_prepare+0x108/0x110
[   10.153266]  syscall_exit_to_user_mode+0x21/0x50
[   10.153327]  do_syscall_64+0x4c/0x90
[   10.153384]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[   10.153449] RIP: 0033:0x7f6c00aa421d
[   10.153499] Code: 05 48 89 c7 e8 d2 e8 ff ff 5a c3 31 f6 50 b8 a6 00 00 00 0f 05 48 89 c7 e8 be e8 ff ff 5a c3 48 63 f6 50 b8 a6 00 00 00 0f 05 <48> 89 c7 e8 a9 e8 ff ff 5a c3 49 89 ca 
50 48 63 ff 4d 63 c0 b8 2f
[   10.153697] RSP: 002b:00007ffd15c5ad90 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[   10.153788] RAX: 0000000000000000 RBX: 00007f6c00b1fdc0 RCX: 00007f6c00aa421d
[   10.153869] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f6c00b1f9e0
[   10.153949] RBP: 00007f6c00b1f9f0 R08: 00007f6c00b1f9f0 R09: 0000000000000012
[   10.154029] R10: 00007f6c00b1f9fc R11: 0000000000000246 R12: 00007f6c00b1fdc0
[   10.154110] R13: 00007f6c00b1f9e0 R14: 0000000000000000 R15: 00007ffd15c5aee8
[   10.154199]  </TASK>
[   10.154238] Modules linked in:
[   10.154539] ---[ end trace 0000000000000000 ]---
[   10.154666] RIP: 0010:iput.part.0+0xbd/0x230
[   10.154731] Code: 05 c0 8b e5 01 48 85 c0 74 0c 48 8b 78 08 48 89 ee e8 f7 62 01 00 65 ff 0d 28 33 36 44 75 a5 0f 1f 44 00 00 eb 9e 48 8b 5d 28 <48> 8b 4b 30 a8 08 0f 85 d7 00 00 00 48 
8b 49 28 48 85 c9 0f 84 b6
[   10.154999] RSP: 0018:ffff9d8b004f3e80 EFLAGS: 00000246
[   10.155070] RAX: 0000000000000000 RBX: dead0000000000f5 RCX: ffff9799831c2458
[   10.155159] RDX: 0000000000000001 RSI: 00000000b39f7e05 RDI: ffff9799817a89d0
[   10.155261] RBP: ffff9799817a8948 R08: 00000000ffffffff R09: 00000000ffffffff
[   10.155342] R10: ffff9799817fc2c8 R11: 0000000000000000 R12: ffff9799817a89d0
[   10.155423] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   10.155503] FS:  00007f6c00b1eb48(0000) GS:ffff9799bec00000(0000) knlGS:0000000000000000
[   10.155594] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.155662] CR2: dead000000000125 CR3: 0000000003268000 CR4: 00000000000406f0

when trying to mount and unmount an ntfs3 file system.

Guenter

---
# bad: [7ba2090ca64ea1aa435744884124387db1fac70f] Merge tag 'ceph-for-6.6-rc1' of https://github.com/ceph/ceph-client
# good: [830b3c68c1fb1e9176028d02ef86f3cf76aa2476] Linux 6.1
git bisect start 'HEAD' 'v6.1'
# good: [5c7ecada25d2086aee607ff7deb69e77faa4aa92] Merge tag 'f2fs-for-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
git bisect good 5c7ecada25d2086aee607ff7deb69e77faa4aa92
# good: [6c1561fb900524c5bceb924071b3e9b8a67ff3da] Merge tag 'soc-dt-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect good 6c1561fb900524c5bceb924071b3e9b8a67ff3da
# bad: [68cf01760bc0891074e813b9bb06d2696cac1c01] Merge tag 'v6.6-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect bad 68cf01760bc0891074e813b9bb06d2696cac1c01
# good: [1c7873e3364570ec89343ff4877e0f27a7b21a61] mm: lock newly mapped VMA with corrected ordering
git bisect good 1c7873e3364570ec89343ff4877e0f27a7b21a61
# good: [b92e8f5472a28e311983f9f47e281e0adf56f10a] btrfs: print block group super and delalloc bytes when dumping space info
git bisect good b92e8f5472a28e311983f9f47e281e0adf56f10a
# bad: [330235e87410349042468b52baff02af7cb7d331] Merge tag 'acpi-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect bad 330235e87410349042468b52baff02af7cb7d331
# bad: [547635c6ac47c7556d6954935b189defe90422f7] Merge tag 'for-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
git bisect bad 547635c6ac47c7556d6954935b189defe90422f7
# good: [615e95831ec3d428cc554ac12e9439e2d66038d3] Merge tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
git bisect good 615e95831ec3d428cc554ac12e9439e2d66038d3
# bad: [475d4df82719225510625b4263baa1105665f4b3] Merge tag 'v6.6-vfs.fchmodat2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
git bisect bad 475d4df82719225510625b4263baa1105665f4b3
# good: [de16588a7737b12e63ec646d72b45befb2b1f8f7] Merge tag 'v6.6-vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
git bisect good de16588a7737b12e63ec646d72b45befb2b1f8f7
# bad: [8ffa54e3370c5a8b9538dbe4077fc9c4b5a08f45] xfs use fs_holder_ops for the log and RT devices
git bisect bad 8ffa54e3370c5a8b9538dbe4077fc9c4b5a08f45
# good: [4abc9a43d99ccab7bd71742b86d2f48d8be798c3] exfat: free the sbi and iocharset in ->kill_sb
git bisect good 4abc9a43d99ccab7bd71742b86d2f48d8be798c3
# bad: [4b41828be268544286794c18200e83861de3554e] ext4: make the IS_EXT2_SB/IS_EXT3_SB checks more robust
git bisect bad 4b41828be268544286794c18200e83861de3554e
# bad: [a4f64a300a299f884a1da55d99c97a87061201cd] ntfs3: free the sbi in ->kill_sb
git bisect bad a4f64a300a299f884a1da55d99c97a87061201cd
# good: [5f0fb2210bb34ecd3f7bfde0d8f0068b79b2e094] ntfs3: don't call sync_blockdev in ntfs_put_super
git bisect good 5f0fb2210bb34ecd3f7bfde0d8f0068b79b2e094
# first bad commit: [a4f64a300a299f884a1da55d99c97a87061201cd] ntfs3: free the sbi in ->kill_sb

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 13/13] ntfs3: free the sbi in ->kill_sb
  2023-09-07 13:05   ` Guenter Roeck
@ 2023-09-07 13:54     ` Christian Brauner
  2023-09-07 15:23       ` Guenter Roeck
  0 siblings, 1 reply; 31+ messages in thread
From: Christian Brauner @ 2023-09-07 13:54 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Christoph Hellwig, Al Viro, Namjae Jeon, Sungjong Seo,
	Theodore Ts'o, Andreas Dilger, Konstantin Komarov,
	Darrick J. Wong, linux-fsdevel, linux-ext4, ntfs3, linux-xfs

[-- Attachment #1: Type: text/plain, Size: 652 bytes --]

On Thu, Sep 07, 2023 at 06:05:40AM -0700, Guenter Roeck wrote:
> On Wed, Aug 09, 2023 at 03:05:45PM -0700, Christoph Hellwig wrote:
> > As a rule of thumb everything allocated to the fs_context and moved into
> > the super_block should be freed by ->kill_sb so that the teardown
> > handling doesn't need to be duplicated between the fill_super error
> > path and put_super.  Implement an ntfs3-specific kill_sb method to do
> > that.
> > 
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > Reviewed-by: Christian Brauner <brauner@kernel.org>
> 
> This patch results in:

The appended patch should fix this. Are you able to test it?
I will as well.

[-- Attachment #2: 0001-ntfs3-put-inodes-in-ntfs3_put_super.patch --]
[-- Type: text/x-diff, Size: 1894 bytes --]

From 55d5075cd668eda6a08aaf6569cbc556db8a952b Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Thu, 7 Sep 2023 15:52:28 +0200
Subject: [PATCH] ntfs3: put inodes in ntfs3_put_super()

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/ntfs3/super.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index 5fffddea554f..4c73afd935e7 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -571,12 +571,8 @@ static void init_once(void *foo)
 /*
  * Noinline to reduce binary size.
  */
-static noinline void ntfs3_free_sbi(struct ntfs_sb_info *sbi)
+static noinline void ntfs3_put_sbi(struct ntfs_sb_info *sbi)
 {
-	kfree(sbi->new_rec);
-	kvfree(ntfs_put_shared(sbi->upcase));
-	kfree(sbi->def_table);
-
 	wnd_close(&sbi->mft.bitmap);
 	wnd_close(&sbi->used.bitmap);
 
@@ -601,6 +597,13 @@ static noinline void ntfs3_free_sbi(struct ntfs_sb_info *sbi)
 	indx_clear(&sbi->security.index_sdh);
 	indx_clear(&sbi->reparse.index_r);
 	indx_clear(&sbi->objid.index_o);
+}
+
+static noinline void ntfs3_free_sbi(struct ntfs_sb_info *sbi)
+{
+	kfree(sbi->new_rec);
+	kvfree(ntfs_put_shared(sbi->upcase));
+	kfree(sbi->def_table);
 	kfree(sbi->compress.lznt);
 #ifdef CONFIG_NTFS3_LZX_XPRESS
 	xpress_free_decompressor(sbi->compress.xpress);
@@ -625,6 +628,7 @@ static void ntfs_put_super(struct super_block *sb)
 
 	/* Mark rw ntfs as clear, if possible. */
 	ntfs_set_state(sbi, NTFS_DIRTY_CLEAR);
+	ntfs3_put_sbi(sbi);
 }
 
 static int ntfs_statfs(struct dentry *dentry, struct kstatfs *buf)
@@ -1644,8 +1648,10 @@ static void ntfs_fs_free(struct fs_context *fc)
 	struct ntfs_mount_options *opts = fc->fs_private;
 	struct ntfs_sb_info *sbi = fc->s_fs_info;
 
-	if (sbi)
+	if (sbi) {
+		ntfs3_put_sbi(sbi);
 		ntfs3_free_sbi(sbi);
+	}
 
 	if (opts)
 		put_mount_options(opts);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 13/13] ntfs3: free the sbi in ->kill_sb
  2023-09-07 13:54     ` Christian Brauner
@ 2023-09-07 15:23       ` Guenter Roeck
  2023-09-07 15:49         ` Christian Brauner
  0 siblings, 1 reply; 31+ messages in thread
From: Guenter Roeck @ 2023-09-07 15:23 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Christoph Hellwig, Al Viro, Namjae Jeon, Sungjong Seo,
	Theodore Ts'o, Andreas Dilger, Konstantin Komarov,
	Darrick J. Wong, linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On 9/7/23 06:54, Christian Brauner wrote:
> On Thu, Sep 07, 2023 at 06:05:40AM -0700, Guenter Roeck wrote:
>> On Wed, Aug 09, 2023 at 03:05:45PM -0700, Christoph Hellwig wrote:
>>> As a rule of thumb everything allocated to the fs_context and moved into
>>> the super_block should be freed by ->kill_sb so that the teardown
>>> handling doesn't need to be duplicated between the fill_super error
>>> path and put_super.  Implement an ntfs3-specific kill_sb method to do
>>> that.
>>>
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> Reviewed-by: Christian Brauner <brauner@kernel.org>
>>
>> This patch results in:
> 
> The appended patch should fix this. Are you able to test it?
> I will as well.

Yes, this patch restores the previous behavior (no more backtrace or crash).

Tested-by: Guenter Roeck <linux@roeck-us.net>

I say "restore previous behavior" because my simple "recursive copy; partially
remove copied files" test still fails. That problem apparently existed since
ntfs3 has been introduced (I see it as far back as v5.15).

Thanks,
Guenter


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 13/13] ntfs3: free the sbi in ->kill_sb
  2023-09-07 15:23       ` Guenter Roeck
@ 2023-09-07 15:49         ` Christian Brauner
  0 siblings, 0 replies; 31+ messages in thread
From: Christian Brauner @ 2023-09-07 15:49 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Christoph Hellwig, Al Viro, Namjae Jeon, Sungjong Seo,
	Theodore Ts'o, Andreas Dilger, Konstantin Komarov,
	Darrick J. Wong, linux-fsdevel, linux-ext4, ntfs3, linux-xfs

On Thu, Sep 07, 2023 at 08:23:09AM -0700, Guenter Roeck wrote:
> On 9/7/23 06:54, Christian Brauner wrote:
> > On Thu, Sep 07, 2023 at 06:05:40AM -0700, Guenter Roeck wrote:
> > > On Wed, Aug 09, 2023 at 03:05:45PM -0700, Christoph Hellwig wrote:
> > > > As a rule of thumb everything allocated to the fs_context and moved into
> > > > the super_block should be freed by ->kill_sb so that the teardown
> > > > handling doesn't need to be duplicated between the fill_super error
> > > > path and put_super.  Implement an ntfs3-specific kill_sb method to do
> > > > that.
> > > > 
> > > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > > Reviewed-by: Christian Brauner <brauner@kernel.org>
> > > 
> > > This patch results in:
> > 
> > The appended patch should fix this. Are you able to test it?
> > I will as well.
> 
> Yes, this patch restores the previous behavior (no more backtrace or crash).

Great, I'll get this fixed in upstream.

> 
> Tested-by: Guenter Roeck <linux@roeck-us.net>
> 
> I say "restore previous behavior" because my simple "recursive copy; partially
> remove copied files" test still fails. That problem apparently existed since
> ntfs3 has been introduced (I see it as far back as v5.15).

I don't think anyone finds that surprising.

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2023-09-07 16:27 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-09 22:05 s_fs_info and ->kill_sb revisited v2 Christoph Hellwig
2023-08-09 22:05 ` [PATCH 01/13] xfs: reformat the xfs_fs_free prototype Christoph Hellwig
2023-08-10  8:53   ` Christian Brauner
2023-08-10 15:51     ` Christoph Hellwig
2023-08-11  7:24       ` Christian Brauner
2023-08-09 22:05 ` [PATCH 02/13] xfs: remove a superfluous s_fs_info NULL check in xfs_fs_put_super Christoph Hellwig
2023-08-09 22:05 ` [PATCH 03/13] xfs: free the xfs_mount in ->kill_sb Christoph Hellwig
2023-08-09 22:05 ` [PATCH 04/13] xfs: remove xfs_blkdev_put Christoph Hellwig
2023-08-09 22:05 ` [PATCH 05/13] xfs: close the RT and log block devices in xfs_free_buftarg Christoph Hellwig
2023-08-09 22:05 ` [PATCH 06/13] xfs: close the external block devices in xfs_mount_free Christoph Hellwig
2023-08-09 22:34   ` Darrick J. Wong
2023-08-09 22:05 ` [PATCH 07/13] xfs: document the invalidate_bdev call in invalidate_bdev Christoph Hellwig
2023-08-09 22:39   ` Darrick J. Wong
2023-08-10  8:20     ` Christian Brauner
2023-08-10 15:53     ` Christoph Hellwig
2023-08-10 15:22   ` Matthew Wilcox
2023-08-10 15:52     ` Christoph Hellwig
2023-08-10 16:00       ` Darrick J. Wong
2023-08-10 15:57     ` Darrick J. Wong
2023-08-09 22:05 ` [PATCH 08/13] ext4: close the external journal device in ->kill_sb Christoph Hellwig
2023-08-09 22:05 ` [PATCH 09/13] exfat: don't RCU-free the sbi Christoph Hellwig
2023-08-10 13:01   ` Namjae Jeon
2023-08-09 22:05 ` [PATCH 10/13] exfat: free the sbi and iocharset in ->kill_sb Christoph Hellwig
2023-08-10 13:02   ` Namjae Jeon
2023-08-09 22:05 ` [PATCH 11/13] ntfs3: rename put_ntfs ntfs3_free_sbi Christoph Hellwig
2023-08-09 22:05 ` [PATCH 12/13] ntfs3: don't call sync_blockdev in ntfs_put_super Christoph Hellwig
2023-08-09 22:05 ` [PATCH 13/13] ntfs3: free the sbi in ->kill_sb Christoph Hellwig
2023-09-07 13:05   ` Guenter Roeck
2023-09-07 13:54     ` Christian Brauner
2023-09-07 15:23       ` Guenter Roeck
2023-09-07 15:49         ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).