From: "Darrick J. Wong" <djwong@kernel.org>
To: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: linux-xfs@vger.kernel.org, fstests@vger.kernel.org, hch@lst.de
Subject: Re: [bug report] xfs/806 sporadic failure
Date: Wed, 29 Apr 2026 08:58:13 -0700 [thread overview]
Message-ID: <20260429155813.GD7751@frogsfrogsfrogs> (raw)
In-Reply-To: <ae7-c9s0KVBRsSy3@shinmob>
On Mon, Apr 27, 2026 at 03:16:44PM +0900, Shin'ichiro Kawasaki wrote:
> Hello Darrick and all,
>
> Recently, I observed a sporadic failure of xfs/806. It does not fail every
> time, but it has failed in a steady manner over the last few months. I would
> like to ask if anyone has any insights about the failure.
>
> I started observing that failure in February. It was observed with various test
> target devices, zoned/non-zoned, null_blk/tcmu/HDD. The failure was not
> recreated easily. Recently, I found that I can recreate the failure in a
> stable manner using dm-crypt on my SMR HDDs and by repeating the test cases
> from xfs/803 to xfs/806.
>
> The failure can be recreated by running xfs/806 many times (> 200 times), but
> it seems to be recreated sooner by repeating xfs/803 to xfs/806.
>
> The failure is observed with recent xfs/for-next branch.
>
> On failure, the fstests console looks like this:
> ------------------------------------------------------------------------------
> xfs/803 12s ... 12s
> xfs/804 12s ... 12s
> xfs/805 9s ... 9s
> xfs/806 13s ... - output mismatch (see /home/kts/kernel-test-suite/src/xfstests/results//xfs/806.out.bad)
> --- tests/xfs/806.out 2026-03-26 22:40:32.089917897 +0900
> +++ /home/kts/kernel-test-suite/src/xfstests/results//xfs/806.out.bad 2026-04-27 09:21:27.164750487 +0900
> @@ -1,6 +1,7 @@
> QA output created by 806
> Info: TEST_DIR/806.mount: Checking and repairing per autofsck directive.
> -Info: TEST_DIR/806.mount: Disabling scrub per autofsck directive.
> +mount: /var/kts/test/806.mount: /var/kts/test/806.somefile is already mounted.
Huh. It's very strange that the fs is still mounted, yet the unmount
didn't print anything about that.
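One thing that might help narrow this down (purely a debugging suggestion on
my part, not an existing fstests helper) is to check whether the mountpoint is
still listed in /proc/self/mountinfo right after the test's unmount returns:

```shell
# Hypothetical debug helper, not part of fstests: check whether a path is
# still a mountpoint by scanning /proc/self/mountinfo (field 5 is the
# mount point).
is_mounted() {
    awk -v m="$1" '$5 == m { found = 1 } END { exit !found }' /proc/self/mountinfo
}

# Example: "/" is always a mountpoint, so this prints "still mounted".
if is_mounted /; then
    echo "still mounted"
fi
```

Sprinkling something like that between the unmount and the next mount in the
test might tell us whether the unmount ever completed in the failing runs.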
> +Info: TEST_DIR/806.mount: Checking and repairing per autofsck directive.
> Info: TEST_DIR/806.mount: Checking per autofsck directive.
> Info: TEST_DIR/806.mount: Optimizing per autofsck directive.
> ...
> (Run 'diff -u /home/kts/kernel-test-suite/src/xfstests/tests/xfs/806.out /home/kts/kernel-test-suite/src/xfstests/results//xfs/806.out.bad' to see the entire diff)
> Ran: xfs/803 xfs/804 xfs/805 xfs/806
> Failures: xfs/806
> ------------------------------------------------------------------------------
>
> As you can see, the mount command fails with the "/var/kts/test/806.somefile is
> already mounted." message. The test case mounts multiple times. The step at
> which the mount fails is not always the same, but the failure message is
> always the same.
>
> Dmesg does not show anything that looks suspicious to me [1].
>
> FYI, here I share the steps I use to recreate the failure. I use a QEMU test
> node with Fedora 43. SMR HDDs are exposed to the QEMU VM via PCI passthrough.
Fedora 43, so xfsprogs isn't new enough to have xfs_healer.
> # Set up dm-crypt on two SMR HDDs, /dev/sdc and /dev/sdd. Offset and size are
> # chosen to have 16 conventional zones and 144 sequential write required zones.
> $ sudo dd if=/dev/random of=/tmp/keyfile bs=1 count=256
> $ sudo cryptsetup open --batch-mode --type plain --cipher aes-cbc-essiv:sha256 --key-size 256 --key-file /tmp/keyfile --offset 382730240 --size 83886080 /dev/sdc test1
> $ sudo cryptsetup open --batch-mode --type plain --cipher aes-cbc-essiv:sha256 --key-size 256 --key-file /tmp/keyfile --offset 382730240 --size 83886080 /dev/sdd test2
>
> # prepare local.config
> $ cat ./local.config
> export TEST_DIR=/var/kts/test
> export TEST_DEV="/dev/mapper/test1"
> export KEEP_DMESG=yes
> export FSTYP=xfs
> export MKFS_OPTIONS=""
> export SCRATCH_MNT=/var/kts/scratch
> export SCRATCH_DEV="/dev/mapper/test2"
> export FSX_AVOID=-a
>
> # format TEST_DEV
> $ sudo mkfs.xfs /dev/mapper/test1
>
> # repeat test cases from xfs/803 to xfs/806 200 times.
> $ for ((i=0;i<200;i++)); do echo $i; if ! sudo ./check xfs/803 xfs/804 xfs/805 xfs/806; then break; fi; done
>
> With these steps, the failure is recreated at the 20th repeat at the earliest,
> or around the 120th repeat when it takes longer.
>
> Using this environment, I did some quick printk debugging, and found that
> do_umount() in fs/namespace.c returns -EBUSY in the failure case. But I'm not
> sure why it happens.
>
> Quote from fs/namespace.c:
> ------------------------------------------------------------------------------
> event++;
> if (flags & MNT_DETACH) {
> umount_tree(mnt, UMOUNT_PROPAGATE);
> retval = 0;
> } else {
> smp_mb(); // paired with __legitimize_mnt()
> shrink_submounts(mnt);
> retval = -EBUSY; <===================================== here
> if (!propagate_mount_busy(mnt, 2)) {
> umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
> retval = 0;
> }
> }
> ------------------------------------------------------------------------------
>
> Any advice would be welcome. If I can do anything in my test environment,
> please let me know.
Hrmm. Let me try this case (loop image on a test fs on a dm-crypted SMR HDD)
in the lab when I'm back from travelling tomorrow. I wonder if there's
some sort of delay in unmounting, though nothing stands out immediately.
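In the meantime, if you want more data from a failing run, a wrapper along
these lines (a sketch; debug_umount is a name I just made up, not an existing
fstests hook) could dump who still holds the mountpoint at the moment the
unmount fails:

```shell
# Sketch of a diagnostic unmount wrapper; debug_umount is a hypothetical
# name, not an existing fstests helper.
debug_umount() {
    local mnt="$1"
    if ! umount "$mnt"; then
        echo "umount $mnt failed; processes holding it:"
        # fuser -vm lists processes with open files on the mount;
        # swallow the error if fuser is unavailable.
        fuser -vm "$mnt" 2>&1 || true
        return 1
    fi
}
```

If the -EBUSY really does come from an elevated mount reference, this should
show which process (the loop device teardown, a scanner, whatever) is still
sitting on it.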
--D
>
> System maintenance is planned for my test machines in the first week of May.
> So, any action on my test environment may take some time.
>
>
> [1] dmesg
>
> [ 1966.727914] [ T137522] run fstests xfs/803 at 2026-04-27 13:47:54
> [ 1972.747786] [ T138553] XFS (dm-0): Unmounting Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1973.406021] [ T138576] XFS (dm-0): EXPERIMENTAL zoned RT device feature enabled. Use at your own risk!
> [ 1973.419142] [ T138576] XFS (dm-0): Mounting V5 Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1973.568485] [ T138576] XFS (dm-0): Ending clean mount
> [ 1973.570947] [ T138576] XFS (dm-0): limiting open zones to 36 due to total zone count (144)
> [ 1973.577722] [ T138576] XFS (dm-0): 144 zones of 65536 blocks (36 max open zones)
> [ 1973.937017] [ T138621] XFS (dm-0): Unmounting Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1975.775968] [ T137522] run fstests xfs/804 at 2026-04-27 13:48:03
> [ 1977.795592] [ T139088] XFS (dm-0): EXPERIMENTAL zoned RT device feature enabled. Use at your own risk!
> [ 1977.807396] [ T139088] XFS (dm-0): Mounting V5 Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1977.922463] [ T139088] XFS (dm-0): Ending clean mount
> [ 1977.925319] [ T139088] XFS (dm-0): limiting open zones to 36 due to total zone count (144)
> [ 1977.932679] [ T139088] XFS (dm-0): 144 zones of 65536 blocks (36 max open zones)
> [ 1978.971340] [ T139191] XFS (dm-0): Unmounting Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1980.413770] [ T139283] XFS (dm-0): EXPERIMENTAL zoned RT device feature enabled. Use at your own risk!
> [ 1980.426461] [ T139283] XFS (dm-0): Mounting V5 Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1980.550420] [ T139283] XFS (dm-0): Ending clean mount
> [ 1980.552682] [ T139283] XFS (dm-0): limiting open zones to 36 due to total zone count (144)
> [ 1980.558805] [ T139283] XFS (dm-0): 144 zones of 65536 blocks (36 max open zones)
> [ 1983.784203] [ T139478] XFS (dm-0): Unmounting Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1984.571368] [ T139670] XFS (dm-0): EXPERIMENTAL zoned RT device feature enabled. Use at your own risk!
> [ 1984.582316] [ T139670] XFS (dm-0): Mounting V5 Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1984.698207] [ T139670] XFS (dm-0): Ending clean mount
> [ 1984.700759] [ T139670] XFS (dm-0): limiting open zones to 36 due to total zone count (144)
> [ 1984.708996] [ T139670] XFS (dm-0): 144 zones of 65536 blocks (36 max open zones)
> [ 1985.237069] [ T137522] run fstests xfs/805 at 2026-04-27 13:48:13
> [ 1991.545779] [ T140080] XFS (dm-0): Unmounting Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1992.390589] [ T140272] XFS (dm-0): EXPERIMENTAL zoned RT device feature enabled. Use at your own risk!
> [ 1992.400273] [ T140272] XFS (dm-0): Mounting V5 Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 1992.510456] [ T140272] XFS (dm-0): Ending clean mount
> [ 1992.513019] [ T140272] XFS (dm-0): limiting open zones to 36 due to total zone count (144)
> [ 1992.519907] [ T140272] XFS (dm-0): 144 zones of 65536 blocks (36 max open zones)
> [ 1993.060516] [ T137522] run fstests xfs/806 at 2026-04-27 13:48:21
> [ 1996.241717] [ T140594] loop0: detected capacity change from 0 to 20971520
> [ 1996.286338] [ T140594] XFS (loop0): Mounting V5 Filesystem cdb36045-8330-48c7-b1a4-82c5d7ed21e1
> [ 1996.308568] [ T140594] XFS (loop0): Ending clean mount
> [ 1996.388493] [ T140613] XFS (loop0): Unmounting Filesystem cdb36045-8330-48c7-b1a4-82c5d7ed21e1
> [ 1996.599702] [ T140615] loop0: detected capacity change from 0 to 20971520
> [ 1996.638831] [ T140615] XFS (loop0): Mounting V5 Filesystem 3218c7d6-c4a4-4f5a-b9bc-c1ba7ada7e6d
> [ 1996.661288] [ T140615] XFS (loop0): Ending clean mount
> [ 1996.727650] [ T140630] XFS (loop0): Unmounting Filesystem 3218c7d6-c4a4-4f5a-b9bc-c1ba7ada7e6d
> [ 1996.927360] [ T140632] loop0: detected capacity change from 0 to 20971520
> [ 1996.965520] [ T140632] XFS (loop0): Mounting V5 Filesystem 15db07f7-722e-401e-8f22-856ceeedeed6
> [ 1996.983714] [ T140632] XFS (loop0): Ending clean mount
> [ 1997.058905] [ T140651] XFS (loop0): Unmounting Filesystem 15db07f7-722e-401e-8f22-856ceeedeed6
> [ 1997.256820] [ T140653] loop0: detected capacity change from 0 to 20971520
> [ 1997.294703] [ T140653] XFS (loop0): Mounting V5 Filesystem 98713c32-802b-4179-8ec6-fcc89968b76e
> [ 1997.312428] [ T140653] XFS (loop0): Ending clean mount
> [ 1997.385745] [ T140672] XFS (loop0): Unmounting Filesystem 98713c32-802b-4179-8ec6-fcc89968b76e
> [ 1997.574544] [ T140674] loop0: detected capacity change from 0 to 20971520
> [ 1997.616104] [ T140674] XFS (loop0): Mounting V5 Filesystem 45368b30-8413-48b4-becc-4e787f833e42
> [ 1997.638100] [ T140674] XFS (loop0): Ending clean mount
> [ 1997.916462] [ T140706] XFS (loop0): Unmounting Filesystem 45368b30-8413-48b4-becc-4e787f833e42
> [ 1998.076829] [ T140708] loop0: detected capacity change from 0 to 20971520
> [ 1998.113814] [ T140708] XFS (loop0): Mounting V5 Filesystem 2f115ed5-8d6d-4a03-97fb-4b2215947ec6
> [ 1998.132609] [ T140708] XFS (loop0): Ending clean mount
> [ 1998.225237] [ T140727] XFS (loop0): Unmounting Filesystem 2f115ed5-8d6d-4a03-97fb-4b2215947ec6
> [ 2000.147618] [ T140793] XFS (dm-0): Unmounting Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 2001.085155] [ T140984] XFS (dm-0): EXPERIMENTAL zoned RT device feature enabled. Use at your own risk!
> [ 2001.094211] [ T140984] XFS (dm-0): Mounting V5 Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
> [ 2001.212889] [ T140984] XFS (dm-0): Ending clean mount
> [ 2001.215370] [ T140984] XFS (dm-0): limiting open zones to 36 due to total zone count (144)
> [ 2001.221959] [ T140984] XFS (dm-0): 144 zones of 65536 blocks (36 max open zones)
> [ 2001.680831] [ T141027] XFS (dm-0): Unmounting Filesystem 4586471c-c0d6-4e61-b808-4ba3976a8732
>
Thread overview: 2+ messages
2026-04-27  6:16 [bug report] xfs/806 sporadic failure Shin'ichiro Kawasaki
2026-04-29 15:58 ` Darrick J. Wong [this message]