* rm hanging, v6.1.35
@ 2023-07-10 21:53 Chris Dunlop
2023-07-11 0:13 ` Chris Dunlop
2023-07-11 0:53 ` rm hanging, v6.1.35 Bagas Sanjaya
0 siblings, 2 replies; 15+ messages in thread
From: Chris Dunlop @ 2023-07-10 21:53 UTC (permalink / raw)
To: linux-xfs
Hi,
This box is newly booted into linux v6.1.35 (2 days ago), it was
previously running v5.15.118 without any problems (other than that fixed
by "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the
upgrade).
I have rm operations on two files that have been stuck for in excess of 22
hours and 18 hours respectively:
$ ps -opid,lstart,state,wchan=WCHAN-xxxxxxxxxxxxxxx,cmd -C rm
PID STARTED S WCHAN-xxxxxxxxxxxxxxx CMD
2379355 Mon Jul 10 09:07:57 2023 D vfs_unlink /bin/rm -rf /aaa/5539_tmp
2392421 Mon Jul 10 09:18:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
2485728 Mon Jul 10 09:28:57 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
2488254 Mon Jul 10 09:39:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
2491180 Mon Jul 10 09:49:58 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
3014914 Mon Jul 10 13:00:33 2023 D vfs_unlink /bin/rm -rf /bbb/5541_tmp
3095893 Mon Jul 10 13:11:03 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
3098809 Mon Jul 10 13:21:35 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
3101387 Mon Jul 10 13:32:06 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
3195017 Mon Jul 10 13:42:37 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
The "rm"s are run from a process that's obviously tried a few times to get
rid of these files.
There's nothing extraordinary about the files in terms of size:
$ ls -ltrn --full-time /aaa/5539_tmp /bbb/5541_tmp
-rw-rw-rw- 1 1482 1482 7870643 2023-07-10 06:07:58.684036505 +1000 /aaa/5539_tmp
-rw-rw-rw- 1 1482 1482 701240 2023-07-10 10:00:34.181064549 +1000 /bbb/5541_tmp
As hinted by the WCHAN in the ps output above, each "primary" rm (i.e. the
first one run on each file) stack trace looks like:
[<0>] vfs_unlink+0x48/0x270
[<0>] do_unlinkat+0x1f5/0x290
[<0>] __x64_sys_unlinkat+0x3b/0x60
[<0>] do_syscall_64+0x34/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
And each "secondary" rm (i.e. the subsequent ones on each file) stack
trace looks like:
[<0>] down_write_nested+0xdc/0x100
[<0>] do_unlinkat+0x10d/0x290
[<0>] __x64_sys_unlinkat+0x3b/0x60
[<0>] do_syscall_64+0x34/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Multiple kernel stack traces don't show vfs_unlink or anything related
that I can see, or anything else consistent or otherwise interesting. Most
cores are idle.
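The ps/WCHAN survey above generalises to a quick check for any stuck workload: list every task in uninterruptible sleep (state D) and dump its kernel stack. A sketch, not from the thread itself; reading /proc/&lt;pid&gt;/stack requires root:

```shell
# List all tasks in uninterruptible sleep (state D) with their wait
# channel, then dump each one's kernel stack trace.
ps -eo pid,state,wchan,cmd | awk '$2 == "D"' |
while read -r pid state rest; do
    echo "== pid $pid ($rest)"
    cat "/proc/$pid/stack" 2>/dev/null || echo "   (stack unreadable: run as root)"
done
```

On an idle machine this prints nothing; on a box in the state described above it would show each hung rm parked in vfs_unlink or down_write_nested.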
Each of /aaa and /bbb is a separate XFS filesystem:
$ xfs_info /aaa
meta-data=/dev/mapper/aaa isize=512 agcount=2, agsize=268434432 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=1 inobtcount=1
data = bsize=4096 blocks=536868864, imaxpct=5
= sunit=256 swidth=256 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=262143, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
$ xfs_info /bbb
meta-data=/dev/mapper/bbb isize=512 agcount=8, agsize=268434432 blks
= sectsz=4096 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=1
= reflink=1 bigtime=1 inobtcount=1
data = bsize=4096 blocks=1879047168, imaxpct=5
= sunit=256 swidth=256 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=521728, version=2
= sectsz=4096 sunit=1 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
There's plenty of free space at the fs level:
$ df -h /aaa /bbb
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/aaa 2.0T 551G 1.5T 27% /aaa
/dev/mapper/bbb 7.0T 3.6T 3.5T 52% /bbb
The fses are on sparse ceph/rbd volumes; the underlying storage tells me
they're 50-60% utilised:
aaa: provisioned="2048G" used="1015.9G"
bbb: provisioned="7168G" used="4925.3G"
Where to from here?
I'm guessing only a reboot is going to unstick this. Anything I should be
looking at before reverting to v5.15.118?
...subsequent to starting writing all this down I have another two sets of
rms stuck, again on unremarkable files, and on two more separate
filesystems.
...oh. And an 'ls' on those files is hanging. The reboot has become more
urgent.
Cheers,
Chris
^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: rm hanging, v6.1.35
  2023-07-10 21:53 rm hanging, v6.1.35 Chris Dunlop
@ 2023-07-11  0:13 ` Chris Dunlop
  2023-07-11  1:57   ` Chris Dunlop
  2023-07-11  0:53 ` rm hanging, v6.1.35 Bagas Sanjaya
  1 sibling, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2023-07-11  0:13 UTC (permalink / raw)
To: linux-xfs

On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
> Hi,
>
> This box is newly booted into linux v6.1.35 (2 days ago), it was
> previously running v5.15.118 without any problems (other than that
> fixed by "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason
> for the upgrade).
>
> I have rm operations on two files that have been stuck for in excess
> of 22 hours and 18 hours respectively:
...
> ...subsequent to starting writing all this down I have another two
> sets of rms stuck, again on unremarkable files, and on two more
> separate filesystems.
>
> ...oh. And an 'ls' on those files is hanging. The reboot has become
> more urgent.

FYI, it's not 'ls' that's hanging, it's bash, because I used a wildcard
on the command line. The bash stack:

$ cat /proc/24779/stack
[<0>] iterate_dir+0x3e/0x180
[<0>] __x64_sys_getdents64+0x71/0x100
[<0>] do_syscall_64+0x34/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

'lsof' shows me it's trying to read one of the directories holding the
file that one of the newer hanging "rm"s is trying to remove.

Cheers,

Chris

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: rm hanging, v6.1.35
  2023-07-11  0:13 ` Chris Dunlop
@ 2023-07-11  1:57   ` Chris Dunlop
  2023-07-11  3:10     ` Dave Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2023-07-11  1:57 UTC (permalink / raw)
To: linux-xfs

On Tue, Jul 11, 2023 at 10:13:31AM +1000, Chris Dunlop wrote:
> On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
>> Hi,
>>
>> This box is newly booted into linux v6.1.35 (2 days ago), it was
>> previously running v5.15.118 without any problems (other than that
>> fixed by "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason
>> for the upgrade).
>>
>> I have rm operations on two files that have been stuck for in excess
>> of 22 hours and 18 hours respectively:
> ...
>> ...subsequent to starting writing all this down I have another two
>> sets of rms stuck, again on unremarkable files, and on two more
>> separate filesystems.
>>
>> ...oh. And an 'ls' on those files is hanging. The reboot has become
>> more urgent.
>
> FYI, it's not 'ls' that's hanging, it's bash, because I used a
> wildcard on the command line. The bash stack:
>
> $ cat /proc/24779/stack
> [<0>] iterate_dir+0x3e/0x180
> [<0>] __x64_sys_getdents64+0x71/0x100
> [<0>] do_syscall_64+0x34/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> 'lsof' shows me it's trying to read one of the directories holding the
> file that one of the newer hanging "rm"s is trying to remove.

Ugh. It wasn't just the "rm"s and bash hanging (and as it turns out,
xfs_logprint), they were just obvious because that's what I was looking
at. It turns out there was a whole lot more hanging.

Full sysrq-w output at:

https://file.io/tg7F5OqIWo1B

Sorry if there's more I could be looking at, but I've gotta reboot this
thing NOW...

Cheers,

Chris

^ permalink raw reply	[flat|nested] 15+ messages in thread
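The "sysrq-w" dump linked above comes from the kernel's magic SysRq facility: the 'w' command writes the stacks of all blocked (uninterruptible) tasks into the kernel log. A minimal sketch of capturing such a dump, not from the thread itself; it requires root, and sysrq must be enabled via the kernel.sysrq sysctl:

```shell
# 'w' = dump tasks in uninterruptible (blocked) state to the kernel log.
# Fails harmlessly without root or with sysrq disabled.
echo w > /proc/sysrq-trigger 2>/dev/null || echo "sysrq unavailable (need root?)"

# The dump lands in the kernel ring buffer.
dmesg 2>/dev/null | tail -n 200
```

Redirecting the dump to a file (as Chris did) keeps the full task list for later analysis, since the ring buffer can wrap on a busy system.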
* Re: rm hanging, v6.1.35
  2023-07-11  1:57   ` Chris Dunlop
@ 2023-07-11  3:10     ` Dave Chinner
  2023-07-11  7:05       ` Chris Dunlop
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2023-07-11  3:10 UTC (permalink / raw)
To: Chris Dunlop; +Cc: linux-xfs

FYI, Chris: google is classifying your email as spam because it
doesn't have any dkim/dmarc authentication on it. You're lucky I
even got this, because I know that gmail mostly just drops such
email in a black hole now....

On Tue, Jul 11, 2023 at 11:57:16AM +1000, Chris Dunlop wrote:
> On Tue, Jul 11, 2023 at 10:13:31AM +1000, Chris Dunlop wrote:
> > On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
> > > Hi,
> > >
> > > This box is newly booted into linux v6.1.35 (2 days ago), it was
> > > previously running v5.15.118 without any problems (other than that
> > > fixed by "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason
> > > for the upgrade).
> > >
> > > I have rm operations on two files that have been stuck for in excess
> > > of 22 hours and 18 hours respectively:
> > ...
> > > ...subsequent to starting writing all this down I have another two
> > > sets of rms stuck, again on unremarkable files, and on two more
> > > separate filesystems.
> > >
> > > ...oh. And an 'ls' on those files is hanging. The reboot has become
> > > more urgent.
> >
> > FYI, it's not 'ls' that's hanging, it's bash, because I used a wildcard
> > on the command line. The bash stack:
> >
> > $ cat /proc/24779/stack
> > [<0>] iterate_dir+0x3e/0x180
> > [<0>] __x64_sys_getdents64+0x71/0x100
> > [<0>] do_syscall_64+0x34/0x80
> > [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> >
> > 'lsof' shows me it's trying to read one of the directories holding the
> > file that one of the newer hanging "rm"s is trying to remove.
>
> Ugh. It wasn't just the "rm"s and bash hanging (and as it turns out,
> xfs_logprint), they were just obvious because that's what I was looking
> at. It turns out there was a whole lot more hanging.
>
> Full sysrq-w output at:
>
> https://file.io/tg7F5OqIWo1B

Ok, you have XFS on dm-writecache on md raid 5 and everything is stuck
in either dm-writecache or md. For example, dm-writecache writeback is
stuck waiting on md:

Jul 11 11:23:15 b2 kernel: task:kworker/40:1 state:D stack:0 pid:1801085 ppid:2 flags:0x00004000
Jul 11 11:23:15 b2 kernel: Workqueue: writecache-writeback writecache_flush_work [dm_writecache]
Jul 11 11:23:15 b2 kernel: Call Trace:
Jul 11 11:23:15 b2 kernel:  <TASK>
Jul 11 11:23:15 b2 kernel:  __schedule+0x245/0x7a0
Jul 11 11:23:15 b2 kernel:  schedule+0x50/0xa0
Jul 11 11:23:15 b2 kernel:  raid5_get_active_stripe+0x1de/0x480 [raid456]
Jul 11 11:23:15 b2 kernel:  raid5_make_request+0x290/0xe10 [raid456]
Jul 11 11:23:15 b2 kernel:  md_handle_request+0x196/0x240 [md_mod]
Jul 11 11:23:15 b2 kernel:  __submit_bio+0x125/0x250
Jul 11 11:23:15 b2 kernel:  submit_bio_noacct_nocheck+0x141/0x370
Jul 11 11:23:15 b2 kernel:  dispatch_io+0x16f/0x2e0 [dm_mod]
Jul 11 11:23:15 b2 kernel:  dm_io+0xf3/0x200 [dm_mod]
Jul 11 11:23:15 b2 kernel:  ssd_commit_flushed+0x12c/0x1c0 [dm_writecache]
Jul 11 11:23:15 b2 kernel:  writecache_flush+0xde/0x2a0 [dm_writecache]
Jul 11 11:23:15 b2 kernel:  writecache_flush_work+0x1f/0x30 [dm_writecache]
Jul 11 11:23:15 b2 kernel:  process_one_work+0x296/0x500
Jul 11 11:23:15 b2 kernel:  worker_thread+0x2a/0x3b0
Jul 11 11:23:15 b2 kernel:  kthread+0xe6/0x110
Jul 11 11:23:15 b2 kernel:  ret_from_fork+0x1f/0x30
Jul 11 11:23:15 b2 kernel:  </TASK>

This task holds the wc_lock() while it is doing this writeback
flush.

All the XFS filesystems are stuck in similar ways, such as trying to
push the journal and getting stuck in IO submission in dm-writecache:

Jul 11 11:23:15 b2 kernel: Workqueue: xfs-cil/dm-128 xlog_cil_push_work [xfs]
Jul 11 11:23:15 b2 kernel: Call Trace:
Jul 11 11:23:15 b2 kernel:  <TASK>
Jul 11 11:23:15 b2 kernel:  __schedule+0x245/0x7a0
Jul 11 11:23:15 b2 kernel:  schedule+0x50/0xa0
Jul 11 11:23:15 b2 kernel:  schedule_preempt_disabled+0x14/0x20
Jul 11 11:23:15 b2 kernel:  __mutex_lock+0x72a/0xc80
Jul 11 11:23:15 b2 kernel:  writecache_map+0x2f/0x790 [dm_writecache]
Jul 11 11:23:15 b2 kernel:  __map_bio+0x46/0x1c0 [dm_mod]
Jul 11 11:23:15 b2 kernel:  __send_duplicate_bios+0x1d4/0x240 [dm_mod]
Jul 11 11:23:15 b2 kernel:  __send_empty_flush+0x95/0xd0 [dm_mod]
Jul 11 11:23:15 b2 kernel:  dm_submit_bio+0x3b5/0x5f0 [dm_mod]
Jul 11 11:23:15 b2 kernel:  __submit_bio+0x125/0x250
Jul 11 11:23:15 b2 kernel:  submit_bio_noacct_nocheck+0x141/0x370
Jul 11 11:23:15 b2 kernel:  xlog_state_release_iclog+0xfb/0x220 [xfs]
Jul 11 11:23:15 b2 kernel:  xlog_write_get_more_iclog_space+0x87/0x130 [xfs]
Jul 11 11:23:15 b2 kernel:  xlog_write+0x295/0x3d0 [xfs]
Jul 11 11:23:15 b2 kernel:  xlog_cil_push_work+0x5b5/0x760 [xfs]
Jul 11 11:23:15 b2 kernel:  process_one_work+0x296/0x500
Jul 11 11:23:15 b2 kernel:  worker_thread+0x2a/0x3b0
Jul 11 11:23:15 b2 kernel:  kthread+0xe6/0x110
Jul 11 11:23:15 b2 kernel:  ret_from_fork+0x1f/0x30

These are getting into dm-writecache, and the lock they are getting
stuck on is the wc_lock().

Directory reads are getting stuck in dm-write-cache:

Jul 11 11:23:16 b2 kernel: task:find state:D stack:0 pid:1832634 ppid:1832627 flags:0x00000000
Jul 11 11:23:16 b2 kernel: Call Trace:
Jul 11 11:23:16 b2 kernel:  <TASK>
Jul 11 11:23:16 b2 kernel:  __schedule+0x245/0x7a0
Jul 11 11:23:16 b2 kernel:  schedule+0x50/0xa0
Jul 11 11:23:16 b2 kernel:  schedule_preempt_disabled+0x14/0x20
Jul 11 11:23:16 b2 kernel:  __mutex_lock+0x72a/0xc80
Jul 11 11:23:16 b2 kernel:  writecache_map+0x2f/0x790 [dm_writecache]
Jul 11 11:23:16 b2 kernel:  __map_bio+0x46/0x1c0 [dm_mod]
Jul 11 11:23:16 b2 kernel:  dm_submit_bio+0x273/0x5f0 [dm_mod]
Jul 11 11:23:16 b2 kernel:  __submit_bio+0x125/0x250
Jul 11 11:23:16 b2 kernel:  submit_bio_noacct_nocheck+0x141/0x370
Jul 11 11:23:16 b2 kernel:  _xfs_buf_ioapply+0x224/0x3c0 [xfs]
Jul 11 11:23:16 b2 kernel:  __xfs_buf_submit+0x88/0x1c0 [xfs]
Jul 11 11:23:16 b2 kernel:  xfs_buf_read_map+0x1a2/0x340 [xfs]
Jul 11 11:23:16 b2 kernel:  xfs_buf_readahead_map+0x23/0x30 [xfs]
Jul 11 11:23:16 b2 kernel:  xfs_da_reada_buf+0x76/0x80 [xfs]
Jul 11 11:23:16 b2 kernel:  xfs_dir2_leaf_readbuf+0x269/0x350 [xfs]
Jul 11 11:23:16 b2 kernel:  xfs_dir2_leaf_getdents+0xd5/0x3b0 [xfs]
Jul 11 11:23:16 b2 kernel:  xfs_readdir+0xff/0x1d0 [xfs]
Jul 11 11:23:16 b2 kernel:  iterate_dir+0x13f/0x180
Jul 11 11:23:16 b2 kernel:  __x64_sys_getdents64+0x71/0x100
Jul 11 11:23:16 b2 kernel:  ? __x64_sys_getdents64+0x100/0x100
Jul 11 11:23:16 b2 kernel:  do_syscall_64+0x34/0x80
Jul 11 11:23:16 b2 kernel:  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Same lock.

File data reads are getting stuck the same way in dm-writecache.

File data writes (i.e. writeback) are getting stuck the same way in
dm-writecache.

Yup, everything is stuck on dm-writecache flushing to the underlying
RAID-5 device.

I don't see anything indicating a filesystem problem here. This looks
like a massively overloaded RAID 5 volume. i.e. the fast storage that
makes up the write cache has filled, and now everything is stuck
waiting for the fast cache to drain to the slow backing store before
new writes can be made to the fast cache. IOWs, everything is running
at RAID5 write speed and there's a huge backlog of data being written
to the RAID 5 volume(s) keeping them 100% busy.

If there is no IO going to the RAID 5 volumes, then you've got a RAID 5
or physical IO hang of some kind. But I can't find any indication of
that - everything looks like it's waiting on RAID 5 stripe population
to make write progress...

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: rm hanging, v6.1.35
  2023-07-11  3:10     ` Dave Chinner
@ 2023-07-11  7:05       ` Chris Dunlop
  2023-07-11 22:21         ` Dave Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Chris Dunlop @ 2023-07-11  7:05 UTC (permalink / raw)
To: Dave Chinner; +Cc: linux-xfs

On Tue, Jul 11, 2023 at 01:10:11PM +1000, Dave Chinner wrote:
> FYI, Chris: google is classifying your email as spam because it
> doesn't have any dkim/dmarc authentication on it. You're lucky I
> even got this, because I know that gmail mostly just drops such
> email in a black hole now....

Dkim hopefully fixed now.

More on the problem topic below, and apologies for the aside, but this
is my immediate concern: post the reboot into v5.15.118 I have one of
the filesystems failing to mount with:

Jul 11 16:13:10 b2 kernel: XFS (dm-146): Metadata CRC error detected at xfs_allocbt_read_verify+0x12/0xb0 [xfs], xfs_bnobt block 0x10a00f7f8
Jul 11 16:13:10 b2 kernel: XFS (dm-146): Unmount and run xfs_repair
Jul 11 16:13:10 b2 kernel: XFS (dm-146): First 128 bytes of corrupted metadata buffer:
Jul 11 16:13:10 b2 kernel: 00000000: 41 42 33 42 00 00 01 ab 05 0a 78 fe 02 c0 2b ff  AB3B......x...+.
Jul 11 16:13:10 b2 kernel: 00000010: 00 00 00 01 0a 00 f7 f8 00 00 00 26 00 3b 2d 40  ...........&.;-@
Jul 11 16:13:10 b2 kernel: 00000020: cf 42 ed 07 78 a1 4e ff 9e 20 e3 d9 6f fc 3e 30  .B..x.N.. ..o.>0
Jul 11 16:13:10 b2 kernel: 00000030: 00 00 00 02 80 b2 25 41 08 c7 6c 00 00 00 01 28  ......%A..l....(
Jul 11 16:13:10 b2 kernel: 00000040: 08 c7 6d 3e 00 00 00 c2 08 c7 6f 65 00 00 00 a7  ..m>......oe....
Jul 11 16:13:10 b2 kernel: 00000050: 08 c7 70 a5 00 00 00 3a 08 c7 71 00 00 00 00 c4  ..p....:..q.....
Jul 11 16:13:10 b2 kernel: 00000060: 08 c7 71 e8 00 00 00 18 08 c7 72 50 00 00 02 c3  ..q.......rP....
Jul 11 16:13:10 b2 kernel: 00000070: 08 c7 75 b1 00 00 02 4b 08 c7 78 db 00 00 02 3d  ..u....K..x....=
Jul 11 16:13:10 b2 kernel: XFS (dm-146): log mount/recovery failed: error -74
Jul 11 16:13:10 b2 kernel: XFS (dm-146): log mount failed

Then xfs_repair comes back with:

# xfs_repair /dev/vg-name/lv-name
Phase 1 - find and verify superblock...
        - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
        - zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

At this point is "xfs_repair -L" (and attendant potential data loss) my
only option?

> On Tue, Jul 11, 2023 at 11:57:16AM +1000, Chris Dunlop wrote:
>> On Tue, Jul 11, 2023 at 10:13:31AM +1000, Chris Dunlop wrote:
>>> On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
>>>> Hi,
>>>>
>>>> This box is newly booted into linux v6.1.35 (2 days ago), it was
>>>> previously running v5.15.118 without any problems (other than that
>>>> fixed by "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason
>>>> for the upgrade).
>>>>
>>>> I have rm operations on two files that have been stuck for in excess
>>>> of 22 hours and 18 hours respectively:
...
>>
>> Full sysrq-w output at:
>>
>> https://file.io/tg7F5OqIWo1B
>
> Ok, you have XFS on dm-writecache on md raid 5 and
> everything is stuck in either dm-writecache or md.

Yes, dm-writecache, with the cache on SSD md raid6, in front of the XFS
devices on lvm / ceph-rbd.

> This task holds the wc_lock() while it is doing this writeback
> flush.
>
> All the XFS filesystems are stuck in similar ways, such as trying to
> push the journal and getting stuck in IO submission in
> dm-writecache:
...
> These are getting into dm-writecache, and the lock they are getting
> stuck on is the wc_lock().
...
>
> Directory reads are getting stuck in dm-write-cache:
...
> Same lock.
>
> File data reads are getting stuck the same way in dm-writecache.
>
> File data writes (i.e. writeback) are getting stuck the same way
> in dm-writecache.
>
> Yup, everything is stuck on dm-writecache flushing to the underlying
> RAID-5 device.
>
> I don't see anything indicating a filesystem problem here. This
> looks like a massively overloaded RAID 5 volume. i.e. the fast
> storage that makes up the write cache has filled, and now everything
> is stuck waiting for the fast cache to drain to the slow backing
> store before new writes can be made to the fast cache. IOWs,
> everything is running at RAID5 write speed and there's a huge
> backlog of data being written to the RAID 5 volume(s) keeping them
> 100% busy.

This had never been an issue with v5.15. Is it possible the upgrade to
v6.1 had a hand in this or that's probably just coincidence?

In particular, could "5e672cd69f0a xfs: non-blocking inodegc pushes"
cause a significantly greater write load on the cache?

I note that there's one fs (separate to the corrupt one above) that's
still in mount recovery since the boot some hours ago. On past
experience that indicates the inodegc stuff is holding things up, i.e.
it would have been running prior to the reboot - or at least trying to.

> If there is no IO going to the RAID 5 volumes, then you've got a
> RAID 5 or physical IO hang of some kind. But I can't find any
> indication of that - everything looks like it's waiting on RAID 5
> stripe population to make write progress...

There's been no change to the cache device over the reboot, and it
currently looks busy, but it doesn't look completely swamped:

                    load %user %nice %sys %iow %stl %idle  dev  rrqm/s wrqm/s    r/s    w/s     rkB/s     wkB/s arq-sz aqu-sz await rwait wwait %util
2023-07-11-16:10:00 16.7   0.6   0.0  1.0 15.0  0.0  82.5 md10     0.0    0.0  172.5 1060.1  17131.66  27450.08  72.34   6.33  5.13  0.56   5.9  48.8
2023-07-11-16:11:00 17.2   0.8   0.0  1.1 15.6  0.0  81.7 md10     0.0    0.0  185.7 1151.4  18202.36  20681.51  58.16   8.18  6.12  0.53   7.0  44.0
2023-07-11-16:12:00 18.9   0.5   0.0  1.0 17.0  0.0  80.6 md10     0.0    0.0  179.9 1139.6  15617.51  26579.51  63.96   7.62  5.78  0.46   6.6  47.5
2023-07-11-16:13:00 18.7   0.6   0.0  1.0 15.4  0.0  82.1 md10     0.0    0.0  161.8 1399.7  15751.70  26273.20  53.83  11.47  7.35  0.51   8.1  47.4
2023-07-11-16:14:00 17.2   0.5   0.0  0.9 13.9  0.0  83.8 md10     0.0    0.0  190.2 1251.1  15207.77  20907.73  50.12  14.19  9.85  2.16  11.0  42.2
2023-07-11-16:15:00 17.0   0.6   0.0  1.0 13.8  0.0  83.7 md10     0.0    0.0  163.5 1571.2  14561.41  24508.45  45.05  14.38  8.29  0.53   9.1  46.7
2023-07-11-16:16:00 18.7   0.8   0.0  1.0 15.5  0.0  81.8 md10     0.0    0.0  164.6 1294.7  13256.49  20737.23  46.59  12.54  8.59  0.50   9.6  42.3
2023-07-11-16:17:00 18.9   0.6   0.0  1.0 14.0  0.0  83.4 md10     0.0    0.0  179.4 1026.7  14079.12  19095.93  55.01   9.12  7.56  0.91   8.7  44.8
2023-07-11-16:18:00 19.4   0.6   0.0  1.0 13.8  0.0  83.7 md10     0.0    0.0  151.6 1315.5  12935.14  23547.30  49.73  11.80  8.04  0.43   8.9  48.5
2023-07-11-16:19:00 19.6   0.6   0.0  1.0 11.9  0.0  85.7 md10     0.0    0.0  168.6 1895.9  14153.63  23810.50  36.78  16.75  8.11  0.77   8.8  48.5

It should be handling about the same load as prior to the reboot.

Cheers,

Chris

^ permalink raw reply	[flat|nested] 15+ messages in thread
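Device numbers like the table above can be sampled with sysstat's iostat. A sketch, not from the thread itself, assuming the sysstat package is installed; the device name md10 is this system's cache array, substitute your own:

```shell
# Extended per-device stats (r/s, w/s, await, aqu-sz, %util) for the
# cache array: 60-second intervals, 10 samples.
iostat -x -m md10 60 10
```

A sustained %util near 100% with long write waits on the cache array is the signature Dave describes below of the cache draining at backing-store speed.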
* Re: rm hanging, v6.1.35
  2023-07-11  7:05       ` Chris Dunlop
@ 2023-07-11 22:21         ` Dave Chinner
  2023-07-12  1:13           ` Chris Dunlop
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2023-07-11 22:21 UTC (permalink / raw)
To: Chris Dunlop; +Cc: linux-xfs

On Tue, Jul 11, 2023 at 05:05:30PM +1000, Chris Dunlop wrote:
> On Tue, Jul 11, 2023 at 01:10:11PM +1000, Dave Chinner wrote:
> > FYI, Chris: google is classifying your email as spam because it
> > doesn't have any dkim/dmarc authentication on it. You're lucky I
> > even got this, because I know that gmail mostly just drops such
> > email in a black hole now....
>
> Dkim hopefully fixed now.

Looks good.

> More on the problem topic below, and apologies for the aside, but this is my
> immediate concern: post the reboot into v5.15.118 I have one of the
> filesystems failing to mount with:
>
> Jul 11 16:13:10 b2 kernel: XFS (dm-146): Metadata CRC error detected at xfs_allocbt_read_verify+0x12/0xb0 [xfs], xfs_bnobt block 0x10a00f7f8
> Jul 11 16:13:10 b2 kernel: XFS (dm-146): Unmount and run xfs_repair
> Jul 11 16:13:10 b2 kernel: XFS (dm-146): First 128 bytes of corrupted metadata buffer:
> Jul 11 16:13:10 b2 kernel: 00000000: 41 42 33 42 00 00 01 ab 05 0a 78 fe 02 c0 2b ff  AB3B......x...+.
> Jul 11 16:13:10 b2 kernel: 00000010: 00 00 00 01 0a 00 f7 f8 00 00 00 26 00 3b 2d 40  ...........&.;-@
> Jul 11 16:13:10 b2 kernel: 00000020: cf 42 ed 07 78 a1 4e ff 9e 20 e3 d9 6f fc 3e 30  .B..x.N.. ..o.>0
> Jul 11 16:13:10 b2 kernel: 00000030: 00 00 00 02 80 b2 25 41 08 c7 6c 00 00 00 01 28  ......%A..l....(
> Jul 11 16:13:10 b2 kernel: 00000040: 08 c7 6d 3e 00 00 00 c2 08 c7 6f 65 00 00 00 a7  ..m>......oe....
> Jul 11 16:13:10 b2 kernel: 00000050: 08 c7 70 a5 00 00 00 3a 08 c7 71 00 00 00 00 c4  ..p....:..q.....
> Jul 11 16:13:10 b2 kernel: 00000060: 08 c7 71 e8 00 00 00 18 08 c7 72 50 00 00 02 c3  ..q.......rP....
> Jul 11 16:13:10 b2 kernel: 00000070: 08 c7 75 b1 00 00 02 4b 08 c7 78 db 00 00 02 3d  ..u....K..x....=
> Jul 11 16:13:10 b2 kernel: XFS (dm-146): log mount/recovery failed: error -74
> Jul 11 16:13:10 b2 kernel: XFS (dm-146): log mount failed

I would suggest that this indicates a torn write of some kind. Given
the state of the system when you rebooted, and all the RAID 5/6
writes in progress, this is entirely possible...

> Then xfs_repair comes back with:
>
> # xfs_repair /dev/vg-name/lv-name
> Phase 1 - find and verify superblock...
>         - reporting progress in intervals of 15 minutes
> Phase 2 - using internal log
>         - zero log...
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> be replayed.  Mount the filesystem to replay the log, and unmount it before
> re-running xfs_repair.  If you are unable to mount the filesystem, then use
> the -L option to destroy the log and attempt a repair.
> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.
>
> At this point is "xfs_repair -L" (and attendant potential data loss) my only
> option?

One option is to correct the CRC with xfs_db, then try to mount the
filesystem again, but that will simply allow recovery to try to modify
what is possibly a bad btree block and then have other things go really
wrong (e.g. cross-linked data, duplicate freespace) at a later point in
time. That's potentially far more damaging, and I wouldn't try it
without having a viable rollback strategy (e.g. full device snapshot
and/or copy)....

It's probably safest to just zero the log and run repair at this point.

> > This task holds the wc_lock() while it is doing this writeback
> > flush.
> >
> > All the XFS filesystems are stuck in similar ways, such as trying to
> > push the journal and getting stuck in IO submission in
> > dm-writecache:
> ...
> > These are getting into dm-writecache, and the lock they are getting
> > stuck on is the wc_lock().
> ...
> >
> > Directory reads are getting stuck in dm-write-cache:
> ...
> > Same lock.
> >
> > File data reads are getting stuck the same way in dm-writecache.
> >
> > File data writes (i.e. writeback) are getting stuck the same way
> > in dm-writecache.
> >
> > Yup, everything is stuck on dm-writecache flushing to the underlying
> > RAID-5 device.
> >
> > I don't see anything indicating a filesystem problem here. This
> > looks like a massively overloaded RAID 5 volume. i.e. the fast
> > storage that makes up the write cache has filled, and now everything
> > is stuck waiting for the fast cache to drain to the slow backing
> > store before new writes can be made to the fast cache. IOWs,
> > everything is running at RAID5 write speed and there's a huge
> > backlog of data being written to the RAID 5 volume(s) keeping them
> > 100% busy.
>
> This had never been an issue with v5.15. Is it possible the upgrade to v6.1
> had a hand in this or that's probably just coincidence?

It could be a dm-writecache or md raid regression, but it could be
just "luck".

> In particular, could "5e672cd69f0a xfs: non-blocking inodegc pushes" cause a
> significantly greater write load on the cache?

No.

> I note that there's one fs (separate to the corrupt one above) that's still
> in mount recovery since the boot some hours ago. On past experience that
> indicates the inodegc stuff is holding things up, i.e. it would have been
> running prior to the reboot - or at least trying to.

I only found one inodegc worker running in the trace above - it was
blocked on writecache doing a rmapbt block read removing extents. It
may have been a long running cleanup, but it may not have been.

As it is, mount taking a long time because there's an inode with a
huge number of extents that need freeing on an unlinked list is
*not* related to background inodegc in any way - this has -always-
happened with XFS; it's simply what unlinked inode recovery does.

i.e. background inodegc just moved the extent freeing from syscall
exit context to an async worker thread; the filesystem still has
exactly the same work to do. If the system goes down while that work
is in progress, log recovery has always finished off that cleanup
work...

> > If there is no IO going to the RAID 5 volumes, then you've got a
> > RAID 5 or physical IO hang of some kind. But I can't find any
> > indication of that - everything looks like it's waiting on RAID 5
> > stripe population to make write progress...
>
> There's been no change to the cache device over the reboot, and it
> currently looks busy, but it doesn't look completely swamped:
....

About 150 1MB sized reads, and about 1000-1500 much smaller
writes each second, with an average write wait of near 10ms.
Certainly not a fast device, and it's running at about 50%
utilisation with an average queue depth of 10-15 IOs.

> It should be handling about the same load as prior to the reboot.

If that's the typical sustained load, I wouldn't expect it to have
that much extra overhead given small writes on RAID 6 volumes have
the worst performance characteristics possible. If the write cache
starts flushing lots of small discontiguous writes, I'd expect to
see that device go to 100% utilisation and long write wait times for
extended periods...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 15+ messages in thread
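The recovery path suggested above -- rollback copy first, then zero the log -- might look like the following sketch. Not from the thread itself: the device and snapshot names are hypothetical, `xfs_repair -L` discards the unreplayed log (so recent metadata changes can be lost), and each step is wrapped so the sketch degrades gracefully when run anywhere other than the affected system:

```shell
DEV=/dev/vg-name/lv-name      # hypothetical LV; substitute the affected one

# Wrap each step: echo what would run, tolerate failure outside a real system.
run() { echo "+ $*"; "$@" 2>/dev/null || echo "  (failed or unavailable here)"; }

# 1. Rollback insurance first: an LVM snapshot (or a full block-level copy).
run lvcreate --snapshot --size 100G --name prerepair-snap "$DEV"

# 2. Zero the unreplayable log, then repair.
run xfs_repair -L "$DEV"

# 3. Mount and sanity-check before removing the snapshot.
run mount "$DEV" /mnt
```

The snapshot is what makes the `-L` gamble recoverable: if repair produces garbage, the pre-repair state can be restored and other options (e.g. the xfs_db CRC fix) tried instead.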
* Re: rm hanging, v6.1.35 2023-07-11 22:21 ` Dave Chinner @ 2023-07-12 1:13 ` Chris Dunlop 2023-07-12 1:42 ` Dave Chinner 0 siblings, 1 reply; 15+ messages in thread From: Chris Dunlop @ 2023-07-12 1:13 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Wed, Jul 12, 2023 at 08:21:11AM +1000, Dave Chinner wrote: > On Tue, Jul 11, 2023 at 05:05:30PM +1000, Chris Dunlop wrote: >> On Tue, Jul 11, 2023 at 01:10:11PM +1000, Dave Chinner wrote: > >> More on the problem topic below, and apologies for the aside, but >> this is my immediate concern: post the reboot into v5.15.118 I have >> one of the filesystems failing to mount with: ... > I would suggest that this indicates a torn write of some kind. Given > the state of the system when you rebooted, and all the RAID 5/6 > writes in progress, this is entirely possible... ... > It's probably safest to just zero the log and run repair at this > point. Thanks. I did that last night (after taking a snapshot) - 'xfs_repair -v' is still running but there's a LOT of nasty output so it's not looking good. ...oh. It finished whilst I've been writing this. If you're interested in the log: https://file.io/XOGokgxgttEX The directory structure looks sane, I'll start running checks on the data. >>> I don't see anything indicating a filesystem problem here. This >>> looks like a massively overloaded RAID 5 volume. i.e. the fast >>> storage that makes up the write cache has filled, and now everything >>> is stuck waiting for the fast cache to drain to the slow backing >>> store before new writes can be made to the fast cache. IOWs, >>> everything is running as RAID5 write speed and there's a huge >>> backlog of data being written to the RAID 5 volume(s) keeping them >>> 100% busy. Oddly, of the 56 similarly configured filesystems (writecache/lvm/rbd) on this box, with maybe 10 actively sinking writes at any time, that one above is the only one that had any trouble on reboot. 
If it had been a general problem w/ the cache device I would have thought more of the active writers would have similar issues. Maybe I just got lucky - and/or that demonstrates how hard xfs tries to keep your data sane. >> This had never been an issue with v5.15. Is it possible the upgrade >> to v6.1 had a hand in this or that's probably just coincidence? > > It could be a dm-writecache or md raid regression, but it could be > just "luck". Ugh. Sigh. I guess at some point I'm going to have to bite the bullet again, and next time watch the cache device like a hawk. I'll keep an eye out for dm-writecache and md raid problems and patches etc. so see what might come up. Is there anything else that occurs to you that I might monitor prior to and during any future recurrance of the problem? >> In particular, could "5e672cd69f0a xfs: non-blocking inodegc >> pushes" cause a significantly greater write load on the cache? > > No. > >> I note that there's one fs (separate to the corrupt one above) >> that's still in mount recovery since the boot some hours ago. On >> past experience that indicates the inodegc stuff is holding things >> up, i.e. it would have been running prior to the reboot - or at >> least trying to. > > I only found one inodegc working running in the trace above - it was > blocked on writecache doing a rmapbt block read removing extents. It > may have been a long running cleanup, but it may not have been. Probably was a cleanup: that's the reason for the update to v6.1, every now and again the box was running into the problem of getting blocked on the cleanup. The extent of the problem is significantly reduced by moving from one large fs where any extended cleanup would block everything, to multiple small-ish fs'es (500G-15T) where the blast radius of an extended cleanup is far more constrained. But the problem was still pretty annoying when it hits. Hmmm, maybe I can just carry a local backport of "non-blocking inodegc pushes" in my local v5.15. 
That would push back my need to move to v6.1. Or could / should it be considered for an official backport? Looks like it applies cleanly to current v5.15.120. > As it is, mount taking a long time because there's an inode with a > huge number of extents that need freeing on an unlinked list is > *not* related to background inodegc in any way - this has -always- > happened with XFS; it's simply what unlinked inode recovery does. > > i.e. background inodegc just moved the extent freeing from syscall > exit context to an async worker thread; the filesystem still has > exactly the same work to do. If the system goes down while that work > is in progress, log recovery has always finished off that cleanup > work... Gotcha. I'd mistakenly thought "non-blocking inodegc pushes" queued up the garbage collection for background processing. My further mistaken hand-wavy thought was that, if the processing that was previously foreground was now punted to background (perhaps with different priorities) maybe the background processing was simply way more efficient, enough to swamp the cache with metadata updates. But your further explanation, and actually reading the patch (should have done that first), show the gc was already queued; the update was to NOT wait for the queue to be flushed. Hmmm, then again... might it be possible that, without the patch, at some point after a large delete, further work was blocked whilst waiting for the queue to be flushed, limiting the total amount of work, but with the patch, the further work (e.g. more deletes) is able to be queued - possibly to the point of swamping the cache device? >> There's been no change to the cache device over the reboot, and it >> currently looks busy, but it doesn't look completely swamped: > .... > > About 150 1MB sized reads, and about 1000-1500 much smaller > writes each second, with an average write wait of near 10ms. 
> Certainly not a fast device, and it's running at about 50% > utilisation with an average queue depth of 10-15 IOs. That's hopefully the cache working as intended: sinking small continuous writes (network data uploads) and aggregating them into larger blocks to flush out to the bulk storage (i.e. reads from the cache to write to the bulk). >> It should be handling about the same load as prior to the reboot. > > If that's the typical sustained load, I wouldn't expect it to have > that much extra overhead given small writes on RAID 6 volumes have > the worst performance characteristics possible. If the write cache > starts flushing lots of small discontiguous writes, I'd expect to > see that device go to 100% utilisation and long write wait times for > extended periods... There shouldn't be many small discontiguous writes in the data: it's basically network uploads to sequential files in the 100s MB to multi GB range. But there's also a bunch of reflinking, not to mention occasionally removing highly reflinked multi-TB files, so these metadata updates might count as "lots of small discontiguous writes"? Thanks for your help, Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
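The "watch the cache device like a hawk" plan above could be sketched as a small shell monitor. This is an illustrative sketch only: "cachevol" is a hypothetical dm device name, and the dm-writecache status field layout assumed here should be verified against the device-mapper writecache documentation for the running kernel.

```shell
#!/bin/sh
# Sketch: log dm-writecache occupancy plus device utilisation once a minute.
# Field positions assume "dmsetup status" prints
# "<start> <len> writecache <errors> <total blocks> <free blocks>
# <writeback blocks> ..." -- check against the writecache docs in use.

# Percentage of cache blocks in use, given one dmsetup status line.
wc_pct_used() {
    set -- $1                 # word-split the status line into $1..$n
    total=$5 free=$6
    echo $(( (total - free) * 100 / total ))
}

monitor() {
    while :; do
        line=$(dmsetup status cachevol)
        echo "$(date -Is) writecache $(wc_pct_used "$line")% used"
        iostat -dxy 1 1       # one 1-second extended sample, devices only
        sleep 60
    done
}
```

Logged over time, a cache that pins near 100% used while the backing RAID sits at 100% utilisation would corroborate the overloaded-RAID theory.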
* Re: rm hanging, v6.1.35 2023-07-12 1:13 ` Chris Dunlop @ 2023-07-12 1:42 ` Dave Chinner 2023-07-12 2:17 ` Subject: v5.15 backport - 5e672cd69f0a xfs: non-blocking inodegc pushes Chris Dunlop 0 siblings, 1 reply; 15+ messages in thread From: Dave Chinner @ 2023-07-12 1:42 UTC (permalink / raw) To: Chris Dunlop; +Cc: linux-xfs On Wed, Jul 12, 2023 at 11:13:56AM +1000, Chris Dunlop wrote: > On Wed, Jul 12, 2023 at 08:21:11AM +1000, Dave Chinner wrote: > > On Tue, Jul 11, 2023 at 05:05:30PM +1000, Chris Dunlop wrote: > > > On Tue, Jul 11, 2023 at 01:10:11PM +1000, Dave Chinner wrote: > > > > > More on the problem topic below, and apologies for the aside, but > > > this is my immediate concern: post the reboot into v5.15.118 I have > > > one of the filesystems failing to mount with: > ... > > I would suggest that this indicates a torn write of some kind. Given > > the state of the system when you rebooted, and all the RAID 5/6 > > writes in progress, this is entirely possible... > ... > > It's probably safest to just zero the log and run repair at this > > point. > > Thanks. I did that last night (after taking a snapshot) - 'xfs_repair -v' is > still running but there's a LOT of nasty output so it's not looking good. > > ...oh. It finished whilst I've been writing this. If you're interested in > the log: > > https://file.io/XOGokgxgttEX Oh, there's *lots* of CRC errors. All through the free space and rmap btrees, but all in a similar range of LBAs. 
$ grep CRC ~/Downloads/xfs_repair.log |sort -k 9
Metadata CRC error detected at 0x5557765dfc6d, xfs_rmapbt block 0x10518c600/0x1000
Metadata CRC error detected at 0x5557765dfc6d, xfs_rmapbt block 0x10518c648/0x1000
Metadata CRC error detected at 0x55577660b07d, xfs_bnobt block 0x10518c650/0x1000
Metadata CRC error detected at 0x55577660b07d, xfs_cntbt block 0x10518c668/0x1000
Metadata CRC error detected at 0x55577660b07d, xfs_bnobt block 0x10518ce88/0x1000
Metadata CRC error detected at 0x5557765dfc6d, xfs_rmapbt block 0x10518d6c0/0x1000
Metadata CRC error detected at 0x55577660b07d, xfs_bnobt block 0x10518d6d0/0x1000
Metadata CRC error detected at 0x5557765dfc6d, xfs_rmapbt block 0x10518d6e0/0x1000
Metadata CRC error detected at 0x5557765dfc6d, xfs_rmapbt block 0x10518d828/0x1000
Metadata CRC error detected at 0x55577660b07d, xfs_bnobt block 0x10518d858/0x1000
.....
There's a whole cluster of CRC errors very close together (10 across 0x1200 sectors = 4608 sectors, ~2.25MB of disk space). This pattern of "bad CRC clusters" occurs more often than not as I look through the list of CRC errors. I suspect this indicates something more fundamental wrong with either the RAID volume or the dm-writecache on top of it... > The directory structure looks sane, I'll start running checks on the data. Given the amount of bad metadata, I wouldn't trust anything in that filesystem to be properly intact. > > > > I don't see anything indicating a filesystem problem here. This > > > > looks like a massively overloaded RAID 5 volume. i.e. the fast > > > > storage that makes up the write cache has filled, and now everything > > > > is stuck waiting for the fast cache to drain to the slow backing > > > > store before new writes can be made to the fast cache. IOWs, > > > > everything is running as RAID5 write speed and there's a huge > > > > backlog of data being written to the RAID 5 volume(s) keeping them > > > > 100% busy. 
> > Oddly, of the 56 similarly configured filesystems (writecache/lvm/rbd) on > this box, with maybe 10 actively sinking writes at any time, that one above > is the only one that had any trouble on reboot. If it had been a general > problem w/ the cache device I would have thought more of the active writers > would have similar issues. Maybe I just got lucky - and/or that demonstrates > how hard xfs tries to keep your data sane. I don't think it's XFS related at all.... > > > This had never been an issue with v5.15. Is it possible the upgrade > > > to v6.1 had a hand in this or that's probably just coincidence? > > > > It could be a dm-writecache or md raid regression, but it could be > > just "luck". > > Ugh. Sigh. I guess at some point I'm going to have to bite the bullet again, > and next time watch the cache device like a hawk. I'll keep an eye out for > dm-writecache and md raid problems and patches etc. to see what might come > up. > > Is there anything else that occurs to you that I might monitor prior to and > during any future recurrence of the problem? Not really. Whatever problem occurred that started things going bad was not evident from the information that was gathered. > > > In particular, could "5e672cd69f0a xfs: non-blocking inodegc pushes" > > > cause a significantly greater write load on the cache? > > > > No. > > > > > I note that there's one fs (separate to the corrupt one above) > > > that's still in mount recovery since the boot some hours ago. On > > > past experience that indicates the inodegc stuff is holding things > > > up, i.e. it would have been running prior to the reboot - or at > > > least trying to. > > > > I only found one inodegc worker running in the trace above - it was > > blocked on writecache doing a rmapbt block read removing extents. It > > may have been a long running cleanup, but it may not have been. 
> > Probably was a cleanup: that's the reason for the update to v6.1, every now > and again the box was running into the problem of getting blocked on the > cleanup. The extent of the problem is significantly reduced by moving from > one large fs where any extended cleanup would block everything, to multiple > small-ish fs'es (500G-15T) where the blast radius of an extended cleanup is > far more constrained. But the problem was still pretty annoying when it > hit. > > Hmmm, maybe I can just carry a local backport of "non-blocking inodegc > pushes" in my local v5.15. That would push back my need to move to v6.1. > > Or could / should it be considered for an official backport? Looks like it > applies cleanly to current v5.15.120. I thought that had already been done - there's supposed to be someone taking care of 5.15 LTS backports for XFS.... > > As it is, mount taking a long time because there's an inode with a > > huge number of extents that need freeing on an unlinked list is > > *not* related to background inodegc in any way - this has -always- > > happened with XFS; it's simply what unlinked inode recovery does. > > > > i.e. background inodegc just moved the extent freeing from syscall > > exit context to an async worker thread; the filesystem still has > > exactly the same work to do. If the system goes down while that work > > is in progress, log recovery has always finished off that cleanup > > work... > > Gotcha. I'd mistakenly thought "non-blocking inodegc pushes" queued up the > garbage collection for background processing. My further mistaken hand-wavy > thought was that, if the processing that was previously foreground was now > punted to background (perhaps with different priorities) maybe the > background processing was simply way more efficient, enough to swamp the > cache with metadata updates. Doubt it, the processing is exactly the same code, just done from a different task context. 
> But your further explanation, and actually reading the patch (should > have done that first), show the gc was already queued; the update was to NOT > wait for the queue to be flushed. > > Hmmm, then again... might it be possible that, without the patch, at some > point after a large delete, further work was blocked whilst waiting for the > queue to be flushed, limiting the total amount of work, but with the patch, > the further work (e.g. more deletes) is able to be queued - possibly to the > point of swamping the cache device? Unlinks don't generate a lot of IO. They do a lot of repeated operations to cached metadata, and the journal relogs all those blocks without triggering IO to the journal, too. > > > There's been no change to the cache device over the reboot, and it > > > currently looks busy, but it doesn't look completely swamped: > > .... > > > > About 150 1MB sized reads, and about 1000-1500 much smaller > > writes each second, with an average write wait of near 10ms. > > Certainly not a fast device, and it's running at about 50% > > utilisation with an average queue depth of 10-15 IOs. > > That's hopefully the cache working as intended: sinking small continuous > writes (network data uploads) and aggregating them into larger blocks to > flush out to the bulk storage (i.e. reads from the cache to write to the > bulk). > > > > It should be handling about the same load as prior to the reboot. > > > > If that's the typical sustained load, I wouldn't expect it to have > > that much extra overhead given small writes on RAID 6 volumes have > > the worst performance characteristics possible. If the write cache > > starts flushing lots of small discontiguous writes, I'd expect to > > see that device go to 100% utilisation and long write wait times for > > extended periods... > > There shouldn't be many small discontiguous writes in the data: it's > basically network uploads to sequential files in the 100s MB to multi GB > range. 
But there's also a bunch of reflinking, not to mention occasionally > removing highly reflinked multi-TB files, so these metadata updates might > count as "lots of small discontiguous writes"? Yeah, the metadata updates end up being small discontiguous writes.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 15+ messages in thread
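The "bad CRC clusters" pattern Dave describes above can be pulled out of a repair log mechanically. A sketch, assuming log lines of the form shown earlier ("Metadata CRC error detected at ..., xfs_rmapbt block 0x.../0x1000"); the 8192-sector gap threshold is an arbitrary illustrative choice, and printing addresses above 2^32 back as hex assumes gawk:

```shell
#!/bin/sh
# Sketch: group the block addresses from xfs_repair "Metadata CRC error"
# lines (read on stdin) into clusters of nearby sectors. Addresses within
# "gap" sectors of each other are treated as one damaged region.
crc_clusters() {
    sed -n 's/.*block \(0x[0-9a-fA-F]*\)\/.*/\1/p' \
      | xargs -r printf '%d\n' \
      | sort -n -u \
      | awk -v gap=8192 '
          NR == 1 { start = prev = $1; count = 1; next }
          {
              if ($1 - prev > gap) {
                  printf "cluster 0x%x-0x%x (%d blocks)\n", start, prev, count
                  start = $1; count = 0
              }
              prev = $1; count++
          }
          END { if (NR) printf "cluster 0x%x-0x%x (%d blocks)\n", start, prev, count }'
}
```

For example, `crc_clusters < xfs_repair.log` would print one line per damaged region with its block range and error count, making the "more often than not" clustering visible at a glance.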
* Subject: v5.15 backport - 5e672cd69f0a xfs: non-blocking inodegc pushes 2023-07-12 1:42 ` Dave Chinner @ 2023-07-12 2:17 ` Chris Dunlop 2023-07-12 9:26 ` Amir Goldstein 0 siblings, 1 reply; 15+ messages in thread From: Chris Dunlop @ 2023-07-12 2:17 UTC (permalink / raw) To: Leah Rumancik, Theodore Ts'o; +Cc: Dave Chinner, linux-xfs Request for backport to v5.15: 5e672cd69f0a xfs: non-blocking inodegc pushes Reference: https://lore.kernel.org/all/ZK4E%2FgGuaBu+qvKL@dread.disaster.area/ --------------------------------------------------------------------- From: Dave Chinner <david@fromorbit.com> To: Chris Dunlop <chris@onthe.net.au> Cc: linux-xfs@vger.kernel.org Subject: Re: rm hanging, v6.1.35 On Wed, Jul 12, 2023 at 11:13:56AM +1000, Chris Dunlop wrote: >> On Tue, Jul 11, 2023 at 05:05:30PM +1000, Chris Dunlop wrote: >>> In particular, could "5e672cd69f0a xfs: non-blocking inodegc pushes" >>> cause a significantly greater write load on the cache? ... > Or could / should it be considered for an official backport? Looks like it > applies cleanly to current v5.15.120. I thought that had already been done - there's supposed to be someone taking care of 5.15 LTS backports for XFS.... --------------------------------------------------------------------- Thanks, Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Subject: v5.15 backport - 5e672cd69f0a xfs: non-blocking inodegc pushes 2023-07-12 2:17 ` Subject: v5.15 backport - 5e672cd69f0a xfs: non-blocking inodegc pushes Chris Dunlop @ 2023-07-12 9:26 ` Amir Goldstein 2023-07-13 0:31 ` Chris Dunlop 0 siblings, 1 reply; 15+ messages in thread From: Amir Goldstein @ 2023-07-12 9:26 UTC (permalink / raw) To: Chris Dunlop Cc: Leah Rumancik, Theodore Ts'o, Dave Chinner, linux-xfs, Leah Rumancik, Darrick J. Wong, Chandan Babu R On Wed, Jul 12, 2023 at 5:37 AM Chris Dunlop <chris@onthe.net.au> wrote: > > Request for backport to v5.15: > > 5e672cd69f0a xfs: non-blocking inodegc pushes This is not the subject of above commit, it was the subject of the cover letter: https://www.spinics.net/lists/linux-xfs/msg61813.html containing the following upstream commits: 7cf2b0f9611b xfs: bound maximum wait time for inodegc work 5e672cd69f0a xfs: introduce xfs_inodegc_push() > > Reference: > > https://lore.kernel.org/all/ZK4E%2FgGuaBu+qvKL@dread.disaster.area/ > --------------------------------------------------------------------- > From: Dave Chinner <david@fromorbit.com> > To: Chris Dunlop <chris@onthe.net.au> > Cc: linux-xfs@vger.kernel.org > Subject: Re: rm hanging, v6.1.35 > > On Wed, Jul 12, 2023 at 11:13:56AM +1000, Chris Dunlop wrote: > >> On Tue, Jul 11, 2023 at 05:05:30PM +1000, Chris Dunlop wrote: > >>> In particular, could "5e672cd69f0a xfs: non-blocking inodegc pushes" > >>> cause a significantly greater write load on the cache? > ... > > Or could / should it be considered for an official backport? Looks like it > > applies cleanly to current v5.15.120. > > I thought that had already been done - there's supposed to be > someone taking care of 5.15 LTS backports for XFS.... Leah has already queued these two patches for 5.15 backport, but she is now on sick leave, so that was not done yet. 
We did however, identify a few more inodegc fixes from 6.4, which also fix a bug in one of the two commits above: 03e0add80f4c xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately Fixes: 7cf2b0f9611b ("xfs: bound maximum wait time for inodegc work") b37c4c8339cd xfs: check that per-cpu inodegc workers actually run on that cpu 2254a7396a0c xfs: fix xfs_inodegc_stop racing with mod_delayed_work Fixes: 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel") 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel") has already been applied to 5.15.y. stable tree rules require that the above fixes from 6.4 be applied to 6.1.y before 5.15.y, so I have already tested them and they are ready to be posted. I wanted to wait a bit after the 6.4.0 release to make sure that we did not pull the blanket too much to the other side, because as the reports from Chris demonstrate, the inodegc fixes had some unexpected outcomes. Anyway, it is 6.4.3 already and I haven't seen any shouts on the list, plus the 6.4 fixes look pretty safe, so I guess this is a good time for me to post the 6.1.y backports. w.r.t testing and posting the 5.15.y backports, we currently have a problem. Chandan said that he will not have time to fill in for Leah and I don't have a baseline established for 5.15 and am going on vacation next week anyway. So either Chandan can make an exception for this inodegc series, or it will have to wait for Leah to be back. Thanks, Amir. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Subject: v5.15 backport - 5e672cd69f0a xfs: non-blocking inodegc pushes 2023-07-12 9:26 ` Amir Goldstein @ 2023-07-13 0:31 ` Chris Dunlop 2023-07-13 0:57 ` Chris Dunlop 0 siblings, 1 reply; 15+ messages in thread From: Chris Dunlop @ 2023-07-13 0:31 UTC (permalink / raw) To: Amir Goldstein Cc: Leah Rumancik, Theodore Ts'o, Dave Chinner, linux-xfs, Leah Rumancik, Darrick J. Wong, Chandan Babu R On Wed, Jul 12, 2023 at 12:26:54PM +0300, Amir Goldstein wrote: > On Wed, Jul 12, 2023 at 5:37 AM Chris Dunlop <chris@onthe.net.au> wrote: >> >> Request for backport to v5.15: >> >> 5e672cd69f0a xfs: non-blocking inodegc pushes > > This is not the subject of above commit, it was the > subject of the cover letter: > https://www.spinics.net/lists/linux-xfs/msg61813.html > > containing the following upstream commits: > 7cf2b0f9611b xfs: bound maximum wait time for inodegc work > 5e672cd69f0a xfs: introduce xfs_inodegc_push() ... > Leah has already queued these two patches for 5.15 backport, > but she is now on sick leave, so that was not done yet. > > We did however, identify a few more inodegc fixes from 6.4, > which also fix a bug in one of the two commits above: > > 03e0add80f4c xfs: explicitly specify cpu when forcing inodegc delayed > work to run immediately > Fixes: 7cf2b0f9611b ("xfs: bound maximum wait time for inodegc work") > b37c4c8339cd xfs: check that per-cpu inodegc workers actually run on that cpu > 2254a7396a0c xfs: fix xfs_inodegc_stop racing with mod_delayed_work > Fixes: 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel") > > 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel") has already > been applied to 5.15.y. > > stable tree rules require that the above fixes from 6.4 be applied to 6.1.y > before 5.15.y, so I have already tested them and they are ready to be posted. 
> I wanted to wait a bit after the 6.4.0 release to make sure that we did not pull > the blanket too much to the other side, because as the reports from Chris > demonstrate, the inodegc fixes had some unexpected outcomes. Just to clarify: whilst I updated to 6.1 specifically to gain access to the non-blocking inodegc commits, there's no evidence those inodegc commits had anything at all to do with my issue on 6.1, and Dave's sense is that they probably weren't involved. That said, those "Fixes:" commits that haven't yet made it to 6.1 look at least a little suspicious to my naive eye. > Anyway, it is 6.4.3 already and I haven't seen any shouts on the list, > plus the 6.4 fixes look pretty safe, so I guess this is a good time for me > to post the 6.1.y backports. > > w.r.t testing and posting the 5.15.y backports, we currently have a problem. > Chandan said that he will not have time to fill in for Leah and I don't have > a baseline established for 5.15 and am going on vacation next week anyway. > > So either Chandan can make an exception for this inodegc series, or it will > have to wait for Leah to be back. I might gird my loins and try 6.1 again once those further fixes are in. Cheers, Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Subject: v5.15 backport - 5e672cd69f0a xfs: non-blocking inodegc pushes 2023-07-13 0:31 ` Chris Dunlop @ 2023-07-13 0:57 ` Chris Dunlop 0 siblings, 0 replies; 15+ messages in thread From: Chris Dunlop @ 2023-07-13 0:57 UTC (permalink / raw) To: Amir Goldstein Cc: Leah Rumancik, Theodore Ts'o, Dave Chinner, linux-xfs, Leah Rumancik, Darrick J. Wong, Chandan Babu R On Thu, Jul 13, 2023 at 10:31:04AM +1000, Chris Dunlop wrote: > On Wed, Jul 12, 2023 at 12:26:54PM +0300, Amir Goldstein wrote: >> On Wed, Jul 12, 2023 at 5:37 AM Chris Dunlop <chris@onthe.net.au> wrote: >>> >>> Request for backport to v5.15: >>> >>> 5e672cd69f0a xfs: non-blocking inodegc pushes >> >> This is not the subject of above commit, it was the >> subject of the cover letter: >> https://www.spinics.net/lists/linux-xfs/msg61813.html >> >> containing the following upstream commits: >> 7cf2b0f9611b xfs: bound maximum wait time for inodegc work >> 5e672cd69f0a xfs: introduce xfs_inodegc_push() > ... >> Leah has already queued these two patches for 5.15 backport, >> but she is now on sick leave, so that was not done yet. >> >> We did however, identify a few more inodegc fixes from 6.4, >> which also fix a bug in one of the two commits above: >> >> 03e0add80f4c xfs: explicitly specify cpu when forcing inodegc delayed >> work to run immediately >> Fixes: 7cf2b0f9611b ("xfs: bound maximum wait time for inodegc work") >> b37c4c8339cd xfs: check that per-cpu inodegc workers actually run on that cpu >> 2254a7396a0c xfs: fix xfs_inodegc_stop racing with mod_delayed_work >> Fixes: 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel") >> >> 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel") has already >> been applied to 5.15.y. >> >> stable tree rules require that the above fixes from 6.4 be applied to 6.1.y >> before 5.15.y, so I have already tested them and they are ready to be posted. 
>> I wanted to wait a bit after the 6.4.0 release to make sure that we did not pull >> the blanket too much to the other side, because as the reports from Chris >> demonstrate, the inodegc fixes had some unexpected outcomes. > > Just to clarify: whilst I updated to 6.1 specifically to gain access > to the non-blocking inodegc commits, there's no evidence those inodegc > commits had anything at all to do with my issue on 6.1, and Dave's > sense is that they probably weren't involved. > > That said, those "Fixes:" commits that haven't yet made it to 6.1 look > at least a little suspicious to my naive eye. Hmmm.... in particular, this fix: 2254a7396a0c xfs: fix xfs_inodegc_stop racing with mod_delayed_work Fixes: 6191cf3ad59f ("xfs: flush inodegc workqueue tasks before cancel") mentions: ---- For this to trip, we must have a thread draining the inodedgc workqueue and a second thread trying to queue inodegc work to that workqueue. This can happen if freezing or a ro remount race with reclaim poking our faux inodegc shrinker and another thread dropping an unlinked O_RDONLY file: ---- I'm indeed periodically freezing these filesystems to take snapshots. So that could possibly be the source of my problem, although my kernel log does not have the WARN_ON_ONCE() mentioned in the patch. Cheers, Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
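The race fixed by 2254a7396a0c needs a freeze (or ro remount) in flight, so in a periodic snapshot cycle like the one described above the exposure window is the span between freeze and thaw. A sketch of such a cycle, with hypothetical mount point and volume names; whether the real snapshots here are LVM or rbd snapshots is not stated, lvcreate is purely illustrative, and DRYRUN=1 echoes the privileged commands instead of running them:

```shell
#!/bin/sh
# Sketch: a freeze/snapshot/thaw cycle of the kind described above.
# All device and mount names are hypothetical.
run() {
    if [ -n "$DRYRUN" ]; then echo "+ $*"; else "$@"; fi
}

snapshot_fs() {
    fs=$1 vg=$2 lv=$3
    run fsfreeze -f "$fs"               # race window opens here...
    trap 'run fsfreeze -u "$fs"' EXIT   # always thaw, even if lvcreate fails
    run lvcreate -s -n "${lv}_snap" "$vg/$lv"
    run fsfreeze -u "$fs"               # ...and closes here
    trap - EXIT
}
```

The trap ensures the filesystem is thawed even if the snapshot command fails, keeping the window (and any writers stalled behind the freeze) as short as possible.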
* Re: rm hanging, v6.1.35 2023-07-10 21:53 rm hanging, v6.1.35 Chris Dunlop 2023-07-11 0:13 ` Chris Dunlop @ 2023-07-11 0:53 ` Bagas Sanjaya 2023-07-11 1:13 ` Chris Dunlop 2023-07-11 2:29 ` Dave Chinner 1 sibling, 2 replies; 15+ messages in thread From: Bagas Sanjaya @ 2023-07-11 0:53 UTC (permalink / raw) To: Chris Dunlop, Linux XFS, Dave Chinner, Darrick J. Wong Cc: Linux Stable, Linux Kernel Mailing List, Linux Regressions [-- Attachment #1: Type: text/plain, Size: 5291 bytes --] On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote: > Hi, > > This box is newly booted into linux v6.1.35 (2 days ago), it was previously > running v5.15.118 without any problems (other than that fixed by > "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the > upgrade). > > I have rm operations on two files that have been stuck for in excess of 22 > hours and 18 hours respectively: > > $ ps -opid,lstart,state,wchan=WCHAN-xxxxxxxxxxxxxxx,cmd -C rm > PID STARTED S WCHAN-xxxxxxxxxxxxxxx CMD > 2379355 Mon Jul 10 09:07:57 2023 D vfs_unlink /bin/rm -rf /aaa/5539_tmp > 2392421 Mon Jul 10 09:18:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp > 2485728 Mon Jul 10 09:28:57 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp > 2488254 Mon Jul 10 09:39:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp > 2491180 Mon Jul 10 09:49:58 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp > 3014914 Mon Jul 10 13:00:33 2023 D vfs_unlink /bin/rm -rf /bbb/5541_tmp > 3095893 Mon Jul 10 13:11:03 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp > 3098809 Mon Jul 10 13:21:35 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp > 3101387 Mon Jul 10 13:32:06 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp > 3195017 Mon Jul 10 13:42:37 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp > > The "rm"s are run from a process that's obviously tried a few times to get > rid of these files. 
> > There's nothing extraordinary about the files in terms of size: > > $ ls -ltrn --full-time /aaa/5539_tmp /bbb/5541_tmp > -rw-rw-rw- 1 1482 1482 7870643 2023-07-10 06:07:58.684036505 +1000 /aaa/5539_tmp > -rw-rw-rw- 1 1482 1482 701240 2023-07-10 10:00:34.181064549 +1000 /bbb/5541_tmp > > As hinted by the WCHAN in the ps output above, each "primary" rm (i.e. the > first one run on each file) stack trace looks like: > > [<0>] vfs_unlink+0x48/0x270 > [<0>] do_unlinkat+0x1f5/0x290 > [<0>] __x64_sys_unlinkat+0x3b/0x60 > [<0>] do_syscall_64+0x34/0x80 > [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > And each "secondary" rm (i.e. the subsequent ones on each file) stack trace > looks like: > > == blog-230710-xfs-rm-stuckd > [<0>] down_write_nested+0xdc/0x100 > [<0>] do_unlinkat+0x10d/0x290 > [<0>] __x64_sys_unlinkat+0x3b/0x60 > [<0>] do_syscall_64+0x34/0x80 > [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > Multiple kernel strack traces don't show vfs_unlink or anything related that > I can see, or anything else consistent or otherwise interesting. Most cores > are idle. 
> > Each of /aaa and /bbb are separate XFS filesystems: > > $ xfs_info /aaa > meta-data=/dev/mapper/aaa isize=512 agcount=2, agsize=268434432 blks > = sectsz=4096 attr=2, projid32bit=1 > = crc=1 finobt=1, sparse=1, rmapbt=1 > = reflink=1 bigtime=1 inobtcount=1 > data = bsize=4096 blocks=536868864, imaxpct=5 > = sunit=256 swidth=256 blks > naming =version 2 bsize=4096 ascii-ci=0, ftype=1 > log =internal log bsize=4096 blocks=262143, version=2 > = sectsz=4096 sunit=1 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > > $ xfs_info /bbb > meta-data=/dev/mapper/bbb isize=512 agcount=8, agsize=268434432 blks > = sectsz=4096 attr=2, projid32bit=1 > = crc=1 finobt=1, sparse=1, rmapbt=1 > = reflink=1 bigtime=1 inobtcount=1 > data = bsize=4096 blocks=1879047168, imaxpct=5 > = sunit=256 swidth=256 blks > naming =version 2 bsize=4096 ascii-ci=0, ftype=1 > log =internal log bsize=4096 blocks=521728, version=2 > = sectsz=4096 sunit=1 blks, lazy-count=1 > realtime =none extsz=4096 blocks=0, rtextents=0 > > There's plenty of free space at the fs level: > > $ df -h /aaa /bbb > Filesystem Size Used Avail Use% Mounted on > /dev/mapper/aaa 2.0T 551G 1.5T 27% /aaa > /dev/mapper/bbb 7.0T 3.6T 3.5T 52% /bbb > > The fses are on sparse ceph/rbd volumes, the underlying storage tells me > they're 50-60% utilised: > > aaa: provisioned="2048G" used="1015.9G" > bbb: provisioned="7168G" used="4925.3G" > > Where to from here? > > I'm guessing only a reboot is going to unstick this. Anything I should be > looking at before reverting to v5.15.118? > > ...subsequent to starting writing all this down I have another two sets of > rms stuck, again on unremarkable files, and on two more separate > filesystems. > > ...oh. And an 'ls' on those files is hanging. The reboot has become more > urgent. > Smells like regression resurfaced, right? I mean, does 5e672cd69f0a53 not completely fix your reported blocking regression earlier? I'm kinda confused... -- An old man doll... 
just what I always wanted! - Clara ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: rm hanging, v6.1.35 2023-07-11 0:53 ` rm hanging, v6.1.35 Bagas Sanjaya @ 2023-07-11 1:13 ` Chris Dunlop 2023-07-11 2:29 ` Dave Chinner 1 sibling, 0 replies; 15+ messages in thread From: Chris Dunlop @ 2023-07-11 1:13 UTC (permalink / raw) To: Bagas Sanjaya Cc: Linux XFS, Dave Chinner, Darrick J. Wong, Linux Stable, Linux Kernel Mailing List, Linux Regressions On Tue, Jul 11, 2023 at 07:53:35AM +0700, Bagas Sanjaya wrote: > On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote: >> Hi, >> >> This box is newly booted into linux v6.1.35 (2 days ago), it was previously >> running v5.15.118 without any problems (other than that fixed by >> "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the >> upgrade). >> >> I have rm operations on two files that have been stuck for in excess of 22 >> hours and 18 hours respectively: > > Smells like regression resurfaced, right? I mean, does 5e672cd69f0a53 not > completely fix your reported blocking regression earlier? > > I'm kinda confused... It looks like a completely different problem. I was wanting 5e672cd69f0a53 as a resolution to this: https://lore.kernel.org/all/20220510215411.GT1098723@dread.disaster.area/T/ In that case there were kernel stacks etc. showing inodegc stuff, and the problem eventually resolved itself once the massive amount of work had been processed (5e672cd69f0a53 puts that work into the background). In this case it looks like it's just a pure hang or deadlock - there's apparently nothing happening. Cheers, Chris ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: rm hanging, v6.1.35 2023-07-11 0:53 ` rm hanging, v6.1.35 Bagas Sanjaya 2023-07-11 1:13 ` Chris Dunlop @ 2023-07-11 2:29 ` Dave Chinner 1 sibling, 0 replies; 15+ messages in thread From: Dave Chinner @ 2023-07-11 2:29 UTC (permalink / raw) To: Bagas Sanjaya Cc: Chris Dunlop, Linux XFS, Dave Chinner, Darrick J. Wong, Linux Stable, Linux Kernel Mailing List, Linux Regressions On Tue, Jul 11, 2023 at 07:53:35AM +0700, Bagas Sanjaya wrote: > On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote: > > Hi, > > > > This box is newly booted into linux v6.1.35 (2 days ago), it was previously > > running v5.15.118 without any problems (other than that fixed by > > "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the > > upgrade). > > > > I have rm operations on two files that have been stuck for in excess of 22 > > hours and 18 hours respectively: > > > > $ ps -opid,lstart,state,wchan=WCHAN-xxxxxxxxxxxxxxx,cmd -C rm > > PID STARTED S WCHAN-xxxxxxxxxxxxxxx CMD > > 2379355 Mon Jul 10 09:07:57 2023 D vfs_unlink /bin/rm -rf /aaa/5539_tmp > > 2392421 Mon Jul 10 09:18:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp > > 2485728 Mon Jul 10 09:28:57 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp > > 2488254 Mon Jul 10 09:39:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp > > 2491180 Mon Jul 10 09:49:58 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp > > 3014914 Mon Jul 10 13:00:33 2023 D vfs_unlink /bin/rm -rf /bbb/5541_tmp > > 3095893 Mon Jul 10 13:11:03 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp > > 3098809 Mon Jul 10 13:21:35 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp > > 3101387 Mon Jul 10 13:32:06 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp > > 3195017 Mon Jul 10 13:42:37 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp > > > > The "rm"s are run from a process that's obviously tried a few times to get > > rid of these files. 
> >
> > There's nothing extraordinary about the files in terms of size:
> >
> > $ ls -ltrn --full-time /aaa/5539_tmp /bbb/5541_tmp
> > -rw-rw-rw- 1 1482 1482 7870643 2023-07-10 06:07:58.684036505 +1000 /aaa/5539_tmp
> > -rw-rw-rw- 1 1482 1482  701240 2023-07-10 10:00:34.181064549 +1000 /bbb/5541_tmp
> >
> > As hinted by the WCHAN in the ps output above, each "primary" rm (i.e. the
> > first one run on each file) stack trace looks like:
> >
> > [<0>] vfs_unlink+0x48/0x270
> > [<0>] do_unlinkat+0x1f5/0x290
> > [<0>] __x64_sys_unlinkat+0x3b/0x60
> > [<0>] do_syscall_64+0x34/0x80
> > [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

This looks to be stuck on the target inode lock (i.e. the locks for the
inodes at /aaa/5539_tmp and /bbb/5541_tmp). What's holding these inode
locks? This hasn't even got to XFS yet here, so there's something else
going on in the background.

Can you attach the full output of 'echo w > /proc/sysrq-trigger' and
'echo t > /proc/sysrq-trigger', please?

> > And each "secondary" rm (i.e. the subsequent ones on each file) stack trace
> > looks like:
> >
> > [<0>] down_write_nested+0xdc/0x100
> > [<0>] do_unlinkat+0x10d/0x290
> > [<0>] __x64_sys_unlinkat+0x3b/0x60
> > [<0>] do_syscall_64+0x34/0x80
> > [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

These are likely all stuck on the parent directory inode lock (i.e. /aaa
and /bbb).

> > Where to from here?
> >
> > I'm guessing only a reboot is going to unstick this. Anything I should be
> > looking at before reverting to v5.15.118?
> >
> > ...subsequent to starting writing all this down I have another two sets of
> > rms stuck, again on unremarkable files, and on two more separate
> > filesystems.

What does an "unremarkable file" look like? Is it a reflink copy of
something else, a hard link, a small/large regular data file or something
else?

> > ...oh. And an 'ls' on those files is hanging. The reboot has become more
> > urgent.
Yup, that's most likely getting stuck on the directory locks that the
unlinks are holding....

-Dave.

--
Dave Chinner
david@fromorbit.com
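As an aside, the blocked-task view in Chris's quoted ps output can also be
pulled straight from procfs. The following is a rough sketch assuming the
standard Linux /proc layout; the blocked_tasks() helper name and its output
format are illustrative only, not something from this thread:

```python
# Sketch: scan /proc for tasks in uninterruptible sleep (state "D") and
# report their wait channel and command line, roughly what the quoted
# "ps -opid,lstart,state,wchan,cmd" invocation shows.
import os

def blocked_tasks(proc="/proc"):
    """Return sorted (pid, wchan, cmdline) tuples for D-state tasks."""
    tasks = []
    for entry in os.listdir(proc):
        if not entry.isdigit():        # skip non-PID entries like 'meminfo'
            continue
        try:
            with open(f"{proc}/{entry}/stat") as f:
                # comm may contain spaces or ')', so split on the last ')'
                fields = f.read().rsplit(")", 1)[1].split()
            if fields[0] != "D":       # process state follows the comm field
                continue
            with open(f"{proc}/{entry}/wchan") as f:
                wchan = f.read().strip() or "-"
            with open(f"{proc}/{entry}/cmdline", "rb") as f:
                cmd = f.read().replace(b"\0", b" ").decode().strip()
            tasks.append((int(entry), wchan, cmd))
        except (OSError, IndexError):
            continue                   # task exited while we were scanning
    return sorted(tasks)
```

A wait channel of down_write_nested means the task is blocked taking a
rw_semaphore for write, which fits the reading above that the secondary rms
are queued on the parent directory's inode lock.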
end of thread, other threads:[~2023-07-13  0:58 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-10 21:53 rm hanging, v6.1.35 Chris Dunlop
2023-07-11  0:13 ` Chris Dunlop
2023-07-11  1:57   ` Chris Dunlop
2023-07-11  3:10     ` Dave Chinner
2023-07-11  7:05       ` Chris Dunlop
2023-07-11 22:21         ` Dave Chinner
2023-07-12  1:13           ` Chris Dunlop
2023-07-12  1:42             ` Dave Chinner
2023-07-12  2:17               ` Subject: v5.15 backport - 5e672cd69f0a xfs: non-blocking inodegc pushes Chris Dunlop
2023-07-12  9:26                 ` Amir Goldstein
2023-07-13  0:31                   ` Chris Dunlop
2023-07-13  0:57                     ` Chris Dunlop
2023-07-11  0:53 ` rm hanging, v6.1.35 Bagas Sanjaya
2023-07-11  1:13   ` Chris Dunlop
2023-07-11  2:29   ` Dave Chinner