From: Bagas Sanjaya <bagasdotme@gmail.com>
To: Chris Dunlop <chris@onthe.net.au>,
	Linux XFS <linux-xfs@vger.kernel.org>,
	Dave Chinner <dchinner@redhat.com>,
	"Darrick J. Wong" <djwong@kernel.org>
Cc: Linux Stable <stable@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux Regressions <regressions@lists.linux.dev>
Subject: Re: rm hanging, v6.1.35
Date: Tue, 11 Jul 2023 07:53:35 +0700
Message-ID: <ZKyoD7WDKfzsKAaT@debian.me>
In-Reply-To: <20230710215354.GA679018@onthe.net.au>

On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
> Hi,
> 
> This box is newly booted into linux v6.1.35 (2 days ago), it was previously
> running v5.15.118 without any problems (other than that fixed by
> "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason for the
> upgrade).
> 
> I have rm operations on two files that have been stuck for in excess of 22
> hours and 18 hours respectively:
> 
> $ ps -opid,lstart,state,wchan=WCHAN-xxxxxxxxxxxxxxx,cmd -C rm
>     PID                  STARTED S WCHAN-xxxxxxxxxxxxxxx CMD
> 2379355 Mon Jul 10 09:07:57 2023 D vfs_unlink            /bin/rm -rf /aaa/5539_tmp
> 2392421 Mon Jul 10 09:18:27 2023 D down_write_nested     /bin/rm -rf /aaa/5539_tmp
> 2485728 Mon Jul 10 09:28:57 2023 D down_write_nested     /bin/rm -rf /aaa/5539_tmp
> 2488254 Mon Jul 10 09:39:27 2023 D down_write_nested     /bin/rm -rf /aaa/5539_tmp
> 2491180 Mon Jul 10 09:49:58 2023 D down_write_nested     /bin/rm -rf /aaa/5539_tmp
> 3014914 Mon Jul 10 13:00:33 2023 D vfs_unlink            /bin/rm -rf /bbb/5541_tmp
> 3095893 Mon Jul 10 13:11:03 2023 D down_write_nested     /bin/rm -rf /bbb/5541_tmp
> 3098809 Mon Jul 10 13:21:35 2023 D down_write_nested     /bin/rm -rf /bbb/5541_tmp
> 3101387 Mon Jul 10 13:32:06 2023 D down_write_nested     /bin/rm -rf /bbb/5541_tmp
> 3195017 Mon Jul 10 13:42:37 2023 D down_write_nested     /bin/rm -rf /bbb/5541_tmp
> 
> The "rm"s are run from a process that's obviously tried a few times to get
> rid of these files.
> 
> There's nothing extraordinary about the files in terms of size:
> 
> $ ls -ltrn --full-time /aaa/5539_tmp /bbb/5541_tmp
> -rw-rw-rw- 1 1482 1482 7870643 2023-07-10 06:07:58.684036505 +1000 /aaa/5539_tmp
> -rw-rw-rw- 1 1482 1482  701240 2023-07-10 10:00:34.181064549 +1000 /bbb/5541_tmp
> 
> As hinted by the WCHAN in the ps output above, each "primary" rm (i.e. the
> first one run on each file) stack trace looks like:
> 
> [<0>] vfs_unlink+0x48/0x270
> [<0>] do_unlinkat+0x1f5/0x290
> [<0>] __x64_sys_unlinkat+0x3b/0x60
> [<0>] do_syscall_64+0x34/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> And each "secondary" rm (i.e. the subsequent ones on each file) stack trace
> looks like:
> 
> [<0>] down_write_nested+0xdc/0x100
> [<0>] do_unlinkat+0x10d/0x290
> [<0>] __x64_sys_unlinkat+0x3b/0x60
> [<0>] do_syscall_64+0x34/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> Multiple kernel stack traces don't show vfs_unlink or anything related that
> I can see, or anything else consistent or otherwise interesting. Most cores
> are idle.
> 
> Each of /aaa and /bbb are separate XFS filesystems:
> 
> $ xfs_info /aaa
> meta-data=/dev/mapper/aaa        isize=512    agcount=2, agsize=268434432 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=1
>          =                       reflink=1    bigtime=1 inobtcount=1
> data     =                       bsize=4096   blocks=536868864, imaxpct=5
>          =                       sunit=256    swidth=256 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=262143, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> $ xfs_info /bbb
> meta-data=/dev/mapper/bbb        isize=512    agcount=8, agsize=268434432 blks
>          =                       sectsz=4096  attr=2, projid32bit=1
>          =                       crc=1        finobt=1, sparse=1, rmapbt=1
>          =                       reflink=1    bigtime=1 inobtcount=1
> data     =                       bsize=4096   blocks=1879047168, imaxpct=5
>          =                       sunit=256    swidth=256 blks
> naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
> log      =internal log           bsize=4096   blocks=521728, version=2
>          =                       sectsz=4096  sunit=1 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> There's plenty of free space at the fs level:
> 
> $ df -h /aaa /bbb
> Filesystem          Size  Used Avail Use% Mounted on
> /dev/mapper/aaa     2.0T  551G  1.5T  27% /aaa
> /dev/mapper/bbb     7.0T  3.6T  3.5T  52% /bbb
> 
> The fses are on sparse ceph/rbd volumes, the underlying storage tells me
> they're 50-60% utilised:
> 
> aaa: provisioned="2048G" used="1015.9G"
> bbb: provisioned="7168G" used="4925.3G"
> 
> Where to from here?
> 
> I'm guessing only a reboot is going to unstick this. Anything I should be
> looking at before reverting to v5.15.118?
> 
> ...subsequent to starting writing all this down I have another two sets of
> rms stuck, again on unremarkable files, and on two more separate
> filesystems.
> 
> ...oh. And an 'ls' on those files is hanging. The reboot has become more
> urgent.
> 

Smells like the regression has resurfaced, right? I mean, did 5e672cd69f0a53
not completely fix the blocking regression you reported earlier?

I'm kinda confused...
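
In case it helps to capture more state before the reboot, here is a rough
sketch for collecting the kernel stack of every task stuck in D state, in the
same "[<0>] ..." format as the traces quoted above. It assumes those traces
came from /proc/PID/stack (which needs root and a kernel built with
CONFIG_STACKTRACE). The function names filter_dstate and dump_dstate_stacks
and the --dump switch are made up for illustration only.

```shell
#!/bin/sh
# Sketch: list tasks in uninterruptible sleep (state D) and print their
# kernel stacks. Function names and the --dump switch are hypothetical.

# Read "PID STATE [COMM]" lines on stdin and print the PIDs in state D.
filter_dstate() {
    awk '$2 ~ /^D/ { print $1 }'
}

# For each D-state task, print a header line followed by its kernel stack.
dump_dstate_stacks() {
    ps -e -o pid= -o state= -o comm= | filter_dstate | while read -r pid; do
        printf '== pid %s (%s)\n' "$pid" "$(ps -o comm= -p "$pid" 2>/dev/null)"
        # The task may have exited meanwhile, or we may lack permission.
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}

# Only touch the live system when explicitly asked to.
if [ "${1:-}" = "--dump" ]; then
    dump_dstate_stacks
fi
```

Saved to a file and run as root with --dump, this prints one block per
blocked task. Another option is "echo w > /proc/sysrq-trigger", which asks the
kernel to log all blocked tasks to dmesg in one go, provided sysrq is enabled.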

-- 
An old man doll... just what I always wanted! - Clara
