public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [BUG REPORT] next-20231102: generic/311 fails on XFS with external log
@ 2023-11-02 12:36 Chandan Babu R
  2023-11-02 14:54 ` Christian Brauner
  0 siblings, 1 reply; 5+ messages in thread
From: Chandan Babu R @ 2023-11-02 12:36 UTC (permalink / raw)
  To: brauner
  Cc: viro, axboe, linux-block, linux-fsdevel, djwong, linux-xfs,
	dchinner

Hi,

generic/311 consistently fails when executing on a kernel built from
next-20231102.

The following is the fstests config file that was used during testing.

export FSTYP=xfs

export TEST_DEV=/dev/loop0
export TEST_DIR=/mnt/test
export TEST_LOGDEV=/dev/loop2

export SCRATCH_DEV=/dev/loop1
export SCRATCH_MNT=/mnt/scratch
export SCRATCH_LOGDEV=/dev/loop3

export USE_EXTERNAL=yes

export MKFS_OPTIONS="-f -m crc=1,reflink=1,rmapbt=1, -i sparse=1 -lsize=1g"


The following is the contents obtained from 311.out.bad.

QA output created by 311
Running test 1 buffered, normal suspend
Random seed is 1
ee6103415276cde95544b11b2675f132
device-mapper: suspend ioctl on flakey-logtest  failed: Device or resource busy
Command failed.
failed to suspend flakey-logtest


Git bisect revealed the following to be the first bad commit,

abcb2b94cce4fb7a8f84278e8da4d726837439d1
Author:     Christian Brauner <brauner@kernel.org>
AuthorDate: Wed Sep 27 15:21:16 2023 +0200
Commit:     Christian Brauner <brauner@kernel.org>
CommitDate: Sat Oct 28 13:29:24 2023 +0200

bdev: implement freeze and thaw holder operations

The old method of implementing block device freeze and thaw operations
required us to rely on get_active_super() to walk the list of all
superblocks on the system to find any superblock that might use the
block device. This is wasteful and not very pleasant overall.

Now that we can finally go straight from block device to owning
superblock things become way simpler.

Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-5-599c19f4faac@kernel.org
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

--
Chandan

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG REPORT] next-20231102: generic/311 fails on XFS with external log
  2023-11-02 12:36 [BUG REPORT] next-20231102: generic/311 fails on XFS with external log Chandan Babu R
@ 2023-11-02 14:54 ` Christian Brauner
  2023-11-02 20:48   ` Dave Chinner
  2023-11-03  8:14   ` Christoph Hellwig
  0 siblings, 2 replies; 5+ messages in thread
From: Christian Brauner @ 2023-11-02 14:54 UTC (permalink / raw)
  To: Chandan Babu R, Jan Kara, Darrick J. Wong
  Cc: viro, axboe, linux-block, linux-fsdevel, djwong, linux-xfs,
	dchinner, Christoph Hellwig

On Thu, Nov 02, 2023 at 06:06:10PM +0530, Chandan Babu R wrote:
> Hi,
> 
> generic/311 consistently fails when executing on a kernel built from
> next-20231102.
> 
> The following is the fstests config file that was used during testing.
> 
> export FSTYP=xfs
> 
> export TEST_DEV=/dev/loop0
> export TEST_DIR=/mnt/test
> export TEST_LOGDEV=/dev/loop2
> 
> export SCRATCH_DEV=/dev/loop1
> export SCRATCH_MNT=/mnt/scratch
> export SCRATCH_LOGDEV=/dev/loop3

Thanks for the report. So dm flakey sets up:

/dev/dm-0 over /dev/loop0
/dev/dm-1 over /dev/loop2

and then we mount an xfs filesystem with:

/dev/loop2 as logdev and /dev/loop0 as the main device.

So on current kernels what happens is that if you freeze the main
device you end up:

bdev_freeze(dm-0)
-> get_super(dm-0) # finds xfs sb
   -> freeze_super(sb)

if you also freeze the log device afterwards via:

bdev_freeze(dm-1)
-> get_super(dm-1) # doesn't find xfs sb because freezing only works for
                   # main device

What's currently in -next allows you to roughly do the following:

bdev_freeze(dm-0)
-> fs_bdev_freeze(dm-0->sb)
   -> freeze_super(dm-0->sb) # returns 0

bdev_freeze(dm-1)
-> fs_bdev_freeze(dm-1->sb)
   -> freeze_super(dm-1->sb) # returns -EBUSY

So you'll see EBUSY because the superblock was already frozen when the
main block device was frozen. I was somewhat expecting that we may run
into such issues.

I think we just need to figure out what we want to do in cases the
superblock is frozen via multiple devices. It would probably be correct
to keep it frozen as long as any of the devices is frozen?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG REPORT] next-20231102: generic/311 fails on XFS with external log
  2023-11-02 14:54 ` Christian Brauner
@ 2023-11-02 20:48   ` Dave Chinner
  2023-11-03  8:14   ` Christoph Hellwig
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2023-11-02 20:48 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Chandan Babu R, Jan Kara, Darrick J. Wong, viro, axboe,
	linux-block, linux-fsdevel, linux-xfs, dchinner,
	Christoph Hellwig

On Thu, Nov 02, 2023 at 03:54:48PM +0100, Christian Brauner wrote:
> On Thu, Nov 02, 2023 at 06:06:10PM +0530, Chandan Babu R wrote:
> > Hi,
> > 
> > generic/311 consistently fails when executing on a kernel built from
> > next-20231102.
> > 
> > The following is the fstests config file that was used during testing.
> > 
> > export FSTYP=xfs
> > 
> > export TEST_DEV=/dev/loop0
> > export TEST_DIR=/mnt/test
> > export TEST_LOGDEV=/dev/loop2
> > 
> > export SCRATCH_DEV=/dev/loop1
> > export SCRATCH_MNT=/mnt/scratch
> > export SCRATCH_LOGDEV=/dev/loop3
> 
> Thanks for the report. So dm flakey sets up:
> 
> /dev/dm-0 over /dev/loop0
> /dev/dm-1 over /dev/loop2
> 
> and then we mount an xfs filesystem with:
> 
> /dev/loop2 as logdev and /dev/loop0 as the main device.
> 
> So on current kernels what happens is that if you freeze the main
> device you end up:
> 
> bdev_freeze(dm-0)
> -> get_super(dm-0) # finds xfs sb
>    -> freeze_super(sb)
> 
> if you also freeze the log device afterwards via:
> 
> bdev_freeze(dm-1)
> -> get_super(dm-1) # doesn't find xfs sb because freezing only works for
>                    # main device
> 
> What's currently in -next allows you to roughly do the following:
> 
> bdev_freeze(dm-0)
> -> fs_bdev_freeze(dm-0->sb)
>    -> freeze_super(dm-0->sb) # returns 0
> 
> bdev_freeze(dm-1)
> -> fs_bdev_freeze(dm-1->sb)
>    -> freeze_super(dm-1->sb) # returns -EBUSY
> 
> So you'll see EBUSY because the superblock was already frozen when the
> main block device was frozen. I was somewhat expecting that we may run
> into such issues.
> 
> I think we just need to figure out what we want to do in cases the
> superblock is frozen via multiple devices. It would probably be correct
> to keep it frozen as long as any of the devices is frozen?

So this series removed the blockdev freeze nesting code that dm
suspend/resume functionality used (i.e. it allowed concurrent bdev
freeze/thaw works and leaves the bdev frozen until the last thaw
occurs). Removing bd_fsfreeze_count essentially removed this nesting
ability.

IMO, bdev_freeze() should still nest freeze/thaw across all devices
in the filesystem like it used to on the main device. This implies
that freeze_super() needs to know that it is being called from
bdev_freeze() and needs a freeze counter to allow concurrent bdev
freezes and only thaw the fs when the last freeze goes away....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG REPORT] next-20231102: generic/311 fails on XFS with external log
  2023-11-02 14:54 ` Christian Brauner
  2023-11-02 20:48   ` Dave Chinner
@ 2023-11-03  8:14   ` Christoph Hellwig
  2023-11-03  8:35     ` Christian Brauner
  1 sibling, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2023-11-03  8:14 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Chandan Babu R, Jan Kara, Darrick J. Wong, viro, axboe,
	linux-block, linux-fsdevel, linux-xfs, dchinner,
	Christoph Hellwig

On Thu, Nov 02, 2023 at 03:54:48PM +0100, Christian Brauner wrote:
> So you'll see EBUSY because the superblock was already frozen when the
> main block device was frozen. I was somewhat expecting that we may run
> into such issues.
> 
> I think we just need to figure out what we want to do in cases the
> superblock is frozen via multiple devices. It would probably be correct
> to keep it frozen as long as any of the devices is frozen?

As dave pointed out I think we need to bring back / keep the freeze
count.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG REPORT] next-20231102: generic/311 fails on XFS with external log
  2023-11-03  8:14   ` Christoph Hellwig
@ 2023-11-03  8:35     ` Christian Brauner
  0 siblings, 0 replies; 5+ messages in thread
From: Christian Brauner @ 2023-11-03  8:35 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Chandan Babu R, Jan Kara, Darrick J. Wong, viro, axboe,
	linux-block, linux-fsdevel, linux-xfs, dchinner

On Fri, Nov 03, 2023 at 09:14:05AM +0100, Christoph Hellwig wrote:
> On Thu, Nov 02, 2023 at 03:54:48PM +0100, Christian Brauner wrote:
> > So you'll see EBUSY because the superblock was already frozen when the
> > main block device was frozen. I was somewhat expecting that we may run
> > into such issues.
> > 
> > I think we just need to figure out what we want to do in cases the
> > superblock is frozen via multiple devices. It would probably be correct
> > to keep it frozen as long as any of the devices is frozen?
> 
> As dave pointed out I think we need to bring back / keep the freeze
> count.

The freeze count never want away. IOW, for each block device we still
have bd_fsfreeze_count otherwise we couldn't nest per-block device. What
we need is a freeze counter in sb_writers so we can nest superblock
freezes. IOW, we need to count the number of block devices that
requested/caused the superblock to be frozen. I think we're all in
agreement though. All of our suggestions should be the same.
I'm currently testing:

From c1849037227e5801f0b5e8acfa05aa5d90f4c9e4 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner@kernel.org>
Date: Fri, 3 Nov 2023 08:38:49 +0100
Subject: [PATCH] [DRAFT] fs: handle freezing from multiple devices

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/super.c         | 44 +++++++++++++++++++++++++++++++++++++++-----
 include/linux/fs.h |  2 ++
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 176c55abd9de..882c79366c70 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1476,9 +1476,11 @@ static int fs_bdev_freeze(struct block_device *bdev)
 		return -EINVAL;
 
 	if (sb->s_op->freeze_super)
-		error = sb->s_op->freeze_super(sb, FREEZE_HOLDER_USERSPACE);
+		error = sb->s_op->freeze_super(sb,
+				FREEZE_HOLDER_BLOCK | FREEZE_HOLDER_USERSPACE);
 	else
-		error = freeze_super(sb, FREEZE_HOLDER_USERSPACE);
+		error = freeze_super(sb,
+				FREEZE_HOLDER_BLOCK | FREEZE_HOLDER_USERSPACE);
 	if (!error)
 		error = sync_blockdev(bdev);
 	deactivate_super(sb);
@@ -1497,9 +1499,11 @@ static int fs_bdev_thaw(struct block_device *bdev)
 		return -EINVAL;
 
 	if (sb->s_op->thaw_super)
-		error = sb->s_op->thaw_super(sb, FREEZE_HOLDER_USERSPACE);
+		error = sb->s_op->thaw_super(sb,
+				FREEZE_HOLDER_BLOCK | FREEZE_HOLDER_USERSPACE);
 	else
-		error = thaw_super(sb, FREEZE_HOLDER_USERSPACE);
+		error = thaw_super(sb,
+				FREEZE_HOLDER_BLOCK | FREEZE_HOLDER_USERSPACE);
 	deactivate_super(sb);
 	return error;
 }
@@ -1923,6 +1927,7 @@ static int wait_for_partially_frozen(struct super_block *sb)
  * @who should be:
  * * %FREEZE_HOLDER_USERSPACE if userspace wants to freeze the fs;
  * * %FREEZE_HOLDER_KERNEL if the kernel wants to freeze the fs.
+ * * %FREEZE_HOLDER_BLOCK if freeze originated from the block layer.
  *
  * The @who argument distinguishes between the kernel and userspace trying to
  * freeze the filesystem.  Although there cannot be multiple kernel freezes or
@@ -1958,18 +1963,33 @@ static int wait_for_partially_frozen(struct super_block *sb)
 int freeze_super(struct super_block *sb, enum freeze_holder who)
 {
 	int ret;
+	bool bdev_initiated;
 
 	if (!super_lock_excl(sb)) {
 		WARN_ON_ONCE("Dying superblock while freezing!");
 		return -EINVAL;
 	}
 	atomic_inc(&sb->s_active);
+	bdev_initiated = (who & FREEZE_HOLDER_BLOCK);
+	who &= ~FREEZE_HOLDER_BLOCK;
 
 retry:
 	if (sb->s_writers.frozen == SB_FREEZE_COMPLETE) {
+		ret = -EBUSY;
+
+		/*
+		 * This is a freeze request from another block device
+		 * associated with the same superblock.
+		 */
+		if (bdev_initiated) {
+			sb->s_writers.bdev_count++;
+			pr_info("Freeze initiated from %d block devices\n", sb->s_writers.bdev_count);
+			ret = 0;
+		}
+
 		if (sb->s_writers.freeze_holders & who) {
 			deactivate_locked_super(sb);
-			return -EBUSY;
+			return ret;
 		}
 
 		WARN_ON(sb->s_writers.freeze_holders == 0);
@@ -2002,6 +2022,8 @@ int freeze_super(struct super_block *sb, enum freeze_holder who)
 		/* Nothing to do really... */
 		sb->s_writers.freeze_holders |= who;
 		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+		if (bdev_initiated)
+			sb->s_writers.bdev_count++;
 		wake_up_var(&sb->s_writers.frozen);
 		super_unlock_excl(sb);
 		return 0;
@@ -2052,6 +2074,8 @@ int freeze_super(struct super_block *sb, enum freeze_holder who)
 	 */
 	sb->s_writers.freeze_holders |= who;
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
+	if (bdev_initiated)
+		sb->s_writers.bdev_count++;
 	wake_up_var(&sb->s_writers.frozen);
 	lockdep_sb_freeze_release(sb);
 	super_unlock_excl(sb);
@@ -2068,12 +2092,22 @@ EXPORT_SYMBOL(freeze_super);
 static int thaw_super_locked(struct super_block *sb, enum freeze_holder who)
 {
 	int error = -EINVAL;
+	bool bdev_initiated = (who & FREEZE_HOLDER_BLOCK);
+	who &= ~FREEZE_HOLDER_BLOCK;
 
 	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
 		goto out_unlock;
 	if (!(sb->s_writers.freeze_holders & who))
 		goto out_unlock;
 
+	if (bdev_initiated)
+		sb->s_writers.bdev_count--;
+	if (sb->s_writers.bdev_count) {
+		pr_info("Filesystems held frozen by %d block devices\n", sb->s_writers.bdev_count);
+		error = 0;
+		goto out_unlock;
+	}
+
 	/*
 	 * Freeze is shared with someone else.  Release our hold and drop the
 	 * active ref that freeze_super assigned to the freezer.
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 63ff88d20e46..edc9c071c199 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1186,6 +1186,7 @@ enum {
 struct sb_writers {
 	unsigned short			frozen;		/* Is sb frozen? */
 	unsigned short			freeze_holders;	/* Who froze fs? */
+	int				bdev_count;
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
 
@@ -2054,6 +2055,7 @@ extern loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
 enum freeze_holder {
 	FREEZE_HOLDER_KERNEL	= (1U << 0),
 	FREEZE_HOLDER_USERSPACE	= (1U << 1),
+	FREEZE_HOLDER_BLOCK	= (1U << 2),
 };
 
 struct super_operations {
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-11-03  8:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-02 12:36 [BUG REPORT] next-20231102: generic/311 fails on XFS with external log Chandan Babu R
2023-11-02 14:54 ` Christian Brauner
2023-11-02 20:48   ` Dave Chinner
2023-11-03  8:14   ` Christoph Hellwig
2023-11-03  8:35     ` Christian Brauner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox