* [BUG REPORT] next-20231102: generic/311 fails on XFS with external log @ 2023-11-02 12:36 Chandan Babu R 2023-11-02 14:54 ` Christian Brauner 0 siblings, 1 reply; 5+ messages in thread From: Chandan Babu R @ 2023-11-02 12:36 UTC (permalink / raw) To: brauner Cc: viro, axboe, linux-block, linux-fsdevel, djwong, linux-xfs, dchinner Hi, generic/311 consistently fails when executing on a kernel built from next-20231102. The following is the fstests config file that was used during testing. export FSTYP=xfs export TEST_DEV=/dev/loop0 export TEST_DIR=/mnt/test export TEST_LOGDEV=/dev/loop2 export SCRATCH_DEV=/dev/loop1 export SCRATCH_MNT=/mnt/scratch export SCRATCH_LOGDEV=/dev/loop3 export USE_EXTERNAL=yes export MKFS_OPTIONS="-f -m crc=1,reflink=1,rmapbt=1, -i sparse=1 -lsize=1g" The following is the contents obtained from 311.out.bad. QA output created by 311 Running test 1 buffered, normal suspend Random seed is 1 ee6103415276cde95544b11b2675f132 device-mapper: suspend ioctl on flakey-logtest failed: Device or resource busy Command failed. failed to suspend flakey-logtest Git bisect revealed the following to be the first bad commit, abcb2b94cce4fb7a8f84278e8da4d726837439d1 Author: Christian Brauner <brauner@kernel.org> AuthorDate: Wed Sep 27 15:21:16 2023 +0200 Commit: Christian Brauner <brauner@kernel.org> CommitDate: Sat Oct 28 13:29:24 2023 +0200 bdev: implement freeze and thaw holder operations The old method of implementing block device freeze and thaw operations required us to rely on get_active_super() to walk the list of all superblocks on the system to find any superblock that might use the block device. This is wasteful and not very pleasant overall. Now that we can finally go straight from block device to owning superblock things become way simpler. Link: https://lore.kernel.org/r/20231024-vfs-super-freeze-v2-5-599c19f4faac@kernel.org Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> -- Chandan ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG REPORT] next-20231102: generic/311 fails on XFS with external log 2023-11-02 12:36 [BUG REPORT] next-20231102: generic/311 fails on XFS with external log Chandan Babu R @ 2023-11-02 14:54 ` Christian Brauner 2023-11-02 20:48 ` Dave Chinner 2023-11-03 8:14 ` Christoph Hellwig 0 siblings, 2 replies; 5+ messages in thread From: Christian Brauner @ 2023-11-02 14:54 UTC (permalink / raw) To: Chandan Babu R, Jan Kara, Darrick J. Wong Cc: viro, axboe, linux-block, linux-fsdevel, djwong, linux-xfs, dchinner, Christoph Hellwig On Thu, Nov 02, 2023 at 06:06:10PM +0530, Chandan Babu R wrote: > Hi, > > generic/311 consistently fails when executing on a kernel built from > next-20231102. > > The following is the fstests config file that was used during testing. > > export FSTYP=xfs > > export TEST_DEV=/dev/loop0 > export TEST_DIR=/mnt/test > export TEST_LOGDEV=/dev/loop2 > > export SCRATCH_DEV=/dev/loop1 > export SCRATCH_MNT=/mnt/scratch > export SCRATCH_LOGDEV=/dev/loop3 Thanks for the report. So dm flakey sets up: /dev/dm-0 over /dev/loop0 /dev/dm-1 over /dev/loop2 and then we mount an xfs filesystem with: /dev/loop2 as logdev and /dev/loop0 as the main device. So on current kernels what happens is that if you freeze the main device you end up: bdev_freeze(dm-0) -> get_super(dm-0) # finds xfs sb -> freeze_super(sb) if you also freeze the log device afterwards via: bdev_freeze(dm-1) -> get_super(dm-1) # doesn't find xfs sb because freezing only works for # main device What's currently in -next allows you to roughly do the following: bdev_freeze(dm-0) -> fs_bdev_freeze(dm-0->sb) -> freeze_super(dm-0->sb) # returns 0 bdev_freeze(dm-1) -> fs_bdev_freeze(dm-1->sb) -> freeze_super(dm-1->sb) # returns -EBUSY So you'll see EBUSY because the superblock was already frozen when the main block device was frozen. I was somewhat expecting that we may run into such issues. I think we just need to figure out what we want to do in cases the superblock is frozen via multiple devices. It would probably be correct to keep it frozen as long as any of the devices is frozen? ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG REPORT] next-20231102: generic/311 fails on XFS with external log 2023-11-02 14:54 ` Christian Brauner @ 2023-11-02 20:48 ` Dave Chinner 2023-11-03 8:14 ` Christoph Hellwig 1 sibling, 0 replies; 5+ messages in thread From: Dave Chinner @ 2023-11-02 20:48 UTC (permalink / raw) To: Christian Brauner Cc: Chandan Babu R, Jan Kara, Darrick J. Wong, viro, axboe, linux-block, linux-fsdevel, linux-xfs, dchinner, Christoph Hellwig On Thu, Nov 02, 2023 at 03:54:48PM +0100, Christian Brauner wrote: > On Thu, Nov 02, 2023 at 06:06:10PM +0530, Chandan Babu R wrote: > > Hi, > > > > generic/311 consistently fails when executing on a kernel built from > > next-20231102. > > > > The following is the fstests config file that was used during testing. > > > > export FSTYP=xfs > > > > export TEST_DEV=/dev/loop0 > > export TEST_DIR=/mnt/test > > export TEST_LOGDEV=/dev/loop2 > > > > export SCRATCH_DEV=/dev/loop1 > > export SCRATCH_MNT=/mnt/scratch > > export SCRATCH_LOGDEV=/dev/loop3 > > Thanks for the report. So dm flakey sets up: > > /dev/dm-0 over /dev/loop0 > /dev/dm-1 over /dev/loop2 > > and then we mount an xfs filesystem with: > > /dev/loop2 as logdev and /dev/loop0 as the main device. > > So on current kernels what happens is that if you freeze the main > device you end up: > > bdev_freeze(dm-0) > -> get_super(dm-0) # finds xfs sb > -> freeze_super(sb) > > if you also freeze the log device afterwards via: > > bdev_freeze(dm-1) > -> get_super(dm-1) # doesn't find xfs sb because freezing only works for > # main device > > What's currently in -next allows you to roughly do the following: > > bdev_freeze(dm-0) > -> fs_bdev_freeze(dm-0->sb) > -> freeze_super(dm-0->sb) # returns 0 > > bdev_freeze(dm-1) > -> fs_bdev_freeze(dm-1->sb) > -> freeze_super(dm-1->sb) # returns -EBUSY > > So you'll see EBUSY because the superblock was already frozen when the > main block device was frozen. I was somewhat expecting that we may run > into such issues. > > I think we just need to figure out what we want to do in cases the > superblock is frozen via multiple devices. It would probably be correct > to keep it frozen as long as any of the devices is frozen? So this series removed the blockdev freeze nesting code that dm suspend/resume functionality used (i.e. it allowed concurrent bdev freeze/thaw works and leaves the bdev frozen until the last thaw occurs). Removing bd_fsfreeze_count essentially removed this nesting ability. IMO, bdev_freeze() should still nest freeze/thaw across all devices in the filesystem like it used to on the main device. This implies that freeze_super() needs to know that it is being called from bdev_freeze() and needs a freeze counter to allow concurrent bdev freezes and only thaw the fs when the last freeze goes away.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG REPORT] next-20231102: generic/311 fails on XFS with external log 2023-11-02 14:54 ` Christian Brauner 2023-11-02 20:48 ` Dave Chinner @ 2023-11-03 8:14 ` Christoph Hellwig 2023-11-03 8:35 ` Christian Brauner 1 sibling, 1 reply; 5+ messages in thread From: Christoph Hellwig @ 2023-11-03 8:14 UTC (permalink / raw) To: Christian Brauner Cc: Chandan Babu R, Jan Kara, Darrick J. Wong, viro, axboe, linux-block, linux-fsdevel, linux-xfs, dchinner, Christoph Hellwig On Thu, Nov 02, 2023 at 03:54:48PM +0100, Christian Brauner wrote: > So you'll see EBUSY because the superblock was already frozen when the > main block device was frozen. I was somewhat expecting that we may run > into such issues. > > I think we just need to figure out what we want to do in cases the > superblock is frozen via multiple devices. It would probably be correct > to keep it frozen as long as any of the devices is frozen? As dave pointed out I think we need to bring back / keep the freeze count. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [BUG REPORT] next-20231102: generic/311 fails on XFS with external log 2023-11-03 8:14 ` Christoph Hellwig @ 2023-11-03 8:35 ` Christian Brauner 0 siblings, 0 replies; 5+ messages in thread From: Christian Brauner @ 2023-11-03 8:35 UTC (permalink / raw) To: Christoph Hellwig Cc: Chandan Babu R, Jan Kara, Darrick J. Wong, viro, axboe, linux-block, linux-fsdevel, linux-xfs, dchinner On Fri, Nov 03, 2023 at 09:14:05AM +0100, Christoph Hellwig wrote: > On Thu, Nov 02, 2023 at 03:54:48PM +0100, Christian Brauner wrote: > > So you'll see EBUSY because the superblock was already frozen when the > > main block device was frozen. I was somewhat expecting that we may run > > into such issues. > > > > I think we just need to figure out what we want to do in cases the > > superblock is frozen via multiple devices. It would probably be correct > > to keep it frozen as long as any of the devices is frozen? > > As dave pointed out I think we need to bring back / keep the freeze > count. The freeze count never want away. IOW, for each block device we still have bd_fsfreeze_count otherwise we couldn't nest per-block device. What we need is a freeze counter in sb_writers so we can nest superblock freezes. IOW, we need to count the number of block devices that requested/caused the superblock to be frozen. I think we're all in agreement though. All of our suggestions should be the same. I'm currently testing: From c1849037227e5801f0b5e8acfa05aa5d90f4c9e4 Mon Sep 17 00:00:00 2001 From: Christian Brauner <brauner@kernel.org> Date: Fri, 3 Nov 2023 08:38:49 +0100 Subject: [PATCH] [DRAFT] fs: handle freezing from multiple devices Signed-off-by: Christian Brauner <brauner@kernel.org> --- fs/super.c | 44 +++++++++++++++++++++++++++++++++++++++----- include/linux/fs.h | 2 ++ 2 files changed, 41 insertions(+), 5 deletions(-) diff --git a/fs/super.c b/fs/super.c index 176c55abd9de..882c79366c70 100644 --- a/fs/super.c +++ b/fs/super.c @@ -1476,9 +1476,11 @@ static int fs_bdev_freeze(struct block_device *bdev) return -EINVAL; if (sb->s_op->freeze_super) - error = sb->s_op->freeze_super(sb, FREEZE_HOLDER_USERSPACE); + error = sb->s_op->freeze_super(sb, + FREEZE_HOLDER_BLOCK | FREEZE_HOLDER_USERSPACE); else - error = freeze_super(sb, FREEZE_HOLDER_USERSPACE); + error = freeze_super(sb, + FREEZE_HOLDER_BLOCK | FREEZE_HOLDER_USERSPACE); if (!error) error = sync_blockdev(bdev); deactivate_super(sb); @@ -1497,9 +1499,11 @@ static int fs_bdev_thaw(struct block_device *bdev) return -EINVAL; if (sb->s_op->thaw_super) - error = sb->s_op->thaw_super(sb, FREEZE_HOLDER_USERSPACE); + error = sb->s_op->thaw_super(sb, + FREEZE_HOLDER_BLOCK | FREEZE_HOLDER_USERSPACE); else - error = thaw_super(sb, FREEZE_HOLDER_USERSPACE); + error = thaw_super(sb, + FREEZE_HOLDER_BLOCK | FREEZE_HOLDER_USERSPACE); deactivate_super(sb); return error; } @@ -1923,6 +1927,7 @@ static int wait_for_partially_frozen(struct super_block *sb) * @who should be: * * %FREEZE_HOLDER_USERSPACE if userspace wants to freeze the fs; * * %FREEZE_HOLDER_KERNEL if the kernel wants to freeze the fs. + * * %FREEZE_HOLDER_BLOCK if freeze originated from the block layer. * * The @who argument distinguishes between the kernel and userspace trying to * freeze the filesystem. Although there cannot be multiple kernel freezes or @@ -1958,18 +1963,33 @@ static int wait_for_partially_frozen(struct super_block *sb) int freeze_super(struct super_block *sb, enum freeze_holder who) { int ret; + bool bdev_initiated; if (!super_lock_excl(sb)) { WARN_ON_ONCE("Dying superblock while freezing!"); return -EINVAL; } atomic_inc(&sb->s_active); + bdev_initiated = (who & FREEZE_HOLDER_BLOCK); + who &= ~FREEZE_HOLDER_BLOCK; retry: if (sb->s_writers.frozen == SB_FREEZE_COMPLETE) { + ret = -EBUSY; + + /* + * This is a freeze request from another block device + * associated with the same superblock. + */ + if (bdev_initiated) { + sb->s_writers.bdev_count++; + pr_info("Freeze initiated from %d block devices\n", sb->s_writers.bdev_count); + ret = 0; + } + if (sb->s_writers.freeze_holders & who) { deactivate_locked_super(sb); - return -EBUSY; + return ret; } WARN_ON(sb->s_writers.freeze_holders == 0); @@ -2002,6 +2022,8 @@ int freeze_super(struct super_block *sb, enum freeze_holder who) /* Nothing to do really... */ sb->s_writers.freeze_holders |= who; sb->s_writers.frozen = SB_FREEZE_COMPLETE; + if (bdev_initiated) + sb->s_writers.bdev_count++; wake_up_var(&sb->s_writers.frozen); super_unlock_excl(sb); return 0; @@ -2052,6 +2074,8 @@ int freeze_super(struct super_block *sb, enum freeze_holder who) */ sb->s_writers.freeze_holders |= who; sb->s_writers.frozen = SB_FREEZE_COMPLETE; + if (bdev_initiated) + sb->s_writers.bdev_count++; wake_up_var(&sb->s_writers.frozen); lockdep_sb_freeze_release(sb); super_unlock_excl(sb); @@ -2068,12 +2092,22 @@ EXPORT_SYMBOL(freeze_super); static int thaw_super_locked(struct super_block *sb, enum freeze_holder who) { int error = -EINVAL; + bool bdev_initiated = (who & FREEZE_HOLDER_BLOCK); + who &= ~FREEZE_HOLDER_BLOCK; if (sb->s_writers.frozen != SB_FREEZE_COMPLETE) goto out_unlock; if (!(sb->s_writers.freeze_holders & who)) goto out_unlock; + if (bdev_initiated) + sb->s_writers.bdev_count--; + if (sb->s_writers.bdev_count) { + pr_info("Filesystems held frozen by %d block devices\n", sb->s_writers.bdev_count); + error = 0; + goto out_unlock; + } + /* * Freeze is shared with someone else. Release our hold and drop the * active ref that freeze_super assigned to the freezer. diff --git a/include/linux/fs.h b/include/linux/fs.h index 63ff88d20e46..edc9c071c199 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1186,6 +1186,7 @@ enum { struct sb_writers { unsigned short frozen; /* Is sb frozen? */ unsigned short freeze_holders; /* Who froze fs? */ + int bdev_count; struct percpu_rw_semaphore rw_sem[SB_FREEZE_LEVELS]; }; @@ -2054,6 +2055,7 @@ extern loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos, enum freeze_holder { FREEZE_HOLDER_KERNEL = (1U << 0), FREEZE_HOLDER_USERSPACE = (1U << 1), + FREEZE_HOLDER_BLOCK = (1U << 2), }; struct super_operations { -- 2.34.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-11-03 8:36 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-11-02 12:36 [BUG REPORT] next-20231102: generic/311 fails on XFS with external log Chandan Babu R 2023-11-02 14:54 ` Christian Brauner 2023-11-02 20:48 ` Dave Chinner 2023-11-03 8:14 ` Christoph Hellwig 2023-11-03 8:35 ` Christian Brauner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox