From: Bagas Sanjaya <bagasdotme@gmail.com>
To: Chris Dunlop <chris@onthe.net.au>,
Linux XFS <linux-xfs@vger.kernel.org>,
Dave Chinner <dchinner@redhat.com>,
"Darrick J. Wong" <djwong@kernel.org>
Cc: Linux Stable <stable@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Linux Regressions <regressions@lists.linux.dev>
Subject: Re: rm hanging, v6.1.35
Date: Tue, 11 Jul 2023 07:53:35 +0700 [thread overview]
Message-ID: <ZKyoD7WDKfzsKAaT@debian.me> (raw)
In-Reply-To: <20230710215354.GA679018@onthe.net.au>
[-- Attachment #1: Type: text/plain, Size: 5291 bytes --]
On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
> Hi,
>
> This box is newly booted into linux v6.1.35 (2 days ago), it was previously
> running v5.15.118 without any problems (other than that fixed by
> "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the
> upgrade).
>
> I have rm operations on two files that have been stuck for in excess of 22
> hours and 18 hours respectively:
>
> $ ps -opid,lstart,state,wchan=WCHAN-xxxxxxxxxxxxxxx,cmd -C rm
> PID STARTED S WCHAN-xxxxxxxxxxxxxxx CMD
> 2379355 Mon Jul 10 09:07:57 2023 D vfs_unlink /bin/rm -rf /aaa/5539_tmp
> 2392421 Mon Jul 10 09:18:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> 2485728 Mon Jul 10 09:28:57 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> 2488254 Mon Jul 10 09:39:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> 2491180 Mon Jul 10 09:49:58 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> 3014914 Mon Jul 10 13:00:33 2023 D vfs_unlink /bin/rm -rf /bbb/5541_tmp
> 3095893 Mon Jul 10 13:11:03 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> 3098809 Mon Jul 10 13:21:35 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> 3101387 Mon Jul 10 13:32:06 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> 3195017 Mon Jul 10 13:42:37 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
>
> The "rm"s are run from a process that's obviously tried a few times to get
> rid of these files.
>
> There's nothing extraordinary about the files in terms of size:
>
> $ ls -ltrn --full-time /aaa/5539_tmp /bbb/5541_tmp
> -rw-rw-rw- 1 1482 1482 7870643 2023-07-10 06:07:58.684036505 +1000 /aaa/5539_tmp
> -rw-rw-rw- 1 1482 1482 701240 2023-07-10 10:00:34.181064549 +1000 /bbb/5541_tmp
>
> As hinted by the WCHAN in the ps output above, each "primary" rm (i.e. the
> first one run on each file) stack trace looks like:
>
> [<0>] vfs_unlink+0x48/0x270
> [<0>] do_unlinkat+0x1f5/0x290
> [<0>] __x64_sys_unlinkat+0x3b/0x60
> [<0>] do_syscall_64+0x34/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> And each "secondary" rm (i.e. the subsequent ones on each file) stack trace
> looks like:
>
> == blog-230710-xfs-rm-stuckd
> [<0>] down_write_nested+0xdc/0x100
> [<0>] do_unlinkat+0x10d/0x290
> [<0>] __x64_sys_unlinkat+0x3b/0x60
> [<0>] do_syscall_64+0x34/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Multiple kernel strack traces don't show vfs_unlink or anything related that
> I can see, or anything else consistent or otherwise interesting. Most cores
> are idle.
>
> Each of /aaa and /bbb are separate XFS filesystems:
>
> $ xfs_info /aaa
> meta-data=/dev/mapper/aaa isize=512 agcount=2, agsize=268434432 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=1 finobt=1, sparse=1, rmapbt=1
> = reflink=1 bigtime=1 inobtcount=1
> data = bsize=4096 blocks=536868864, imaxpct=5
> = sunit=256 swidth=256 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=262143, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> $ xfs_info /bbb
> meta-data=/dev/mapper/bbb isize=512 agcount=8, agsize=268434432 blks
> = sectsz=4096 attr=2, projid32bit=1
> = crc=1 finobt=1, sparse=1, rmapbt=1
> = reflink=1 bigtime=1 inobtcount=1
> data = bsize=4096 blocks=1879047168, imaxpct=5
> = sunit=256 swidth=256 blks
> naming =version 2 bsize=4096 ascii-ci=0, ftype=1
> log =internal log bsize=4096 blocks=521728, version=2
> = sectsz=4096 sunit=1 blks, lazy-count=1
> realtime =none extsz=4096 blocks=0, rtextents=0
>
> There's plenty of free space at the fs level:
>
> $ df -h /aaa /bbb
> Filesystem Size Used Avail Use% Mounted on
> /dev/mapper/aaa 2.0T 551G 1.5T 27% /aaa
> /dev/mapper/bbb 7.0T 3.6T 3.5T 52% /bbb
>
> The fses are on sparse ceph/rbd volumes, the underlying storage tells me
> they're 50-60% utilised:
>
> aaa: provisioned="2048G" used="1015.9G"
> bbb: provisioned="7168G" used="4925.3G"
>
> Where to from here?
>
> I'm guessing only a reboot is going to unstick this. Anything I should be
> looking at before reverting to v5.15.118?
>
> ...subsequent to starting writing all this down I have another two sets of
> rms stuck, again on unremarkable files, and on two more separate
> filesystems.
>
> ...oh. And an 'ls' on those files is hanging. The reboot has become more
> urgent.
>
Smells like regression resurfaced, right? I mean, does 5e672cd69f0a53 not
completely fix your reported blocking regression earlier?
I'm kinda confused...
--
An old man doll... just what I always wanted! - Clara
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
next prev parent reply other threads:[~2023-07-11 0:53 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-10 21:53 rm hanging, v6.1.35 Chris Dunlop
2023-07-11 0:13 ` Chris Dunlop
2023-07-11 1:57 ` Chris Dunlop
2023-07-11 3:10 ` Dave Chinner
2023-07-11 7:05 ` Chris Dunlop
2023-07-11 22:21 ` Dave Chinner
2023-07-12 1:13 ` Chris Dunlop
2023-07-12 1:42 ` Dave Chinner
2023-07-12 2:17 ` Subject: v5.15 backport - 5e672cd69f0a xfs: non-blocking inodegc pushes Chris Dunlop
2023-07-12 9:26 ` Amir Goldstein
2023-07-13 0:31 ` Chris Dunlop
2023-07-13 0:57 ` Chris Dunlop
2023-07-11 0:53 ` Bagas Sanjaya [this message]
2023-07-11 1:13 ` rm hanging, v6.1.35 Chris Dunlop
2023-07-11 2:29 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZKyoD7WDKfzsKAaT@debian.me \
--to=bagasdotme@gmail.com \
--cc=chris@onthe.net.au \
--cc=dchinner@redhat.com \
--cc=djwong@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=regressions@lists.linux.dev \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox