From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 11 Jul 2023 12:29:43 +1000
From: Dave Chinner
To: Bagas Sanjaya
Cc: Chris Dunlop, Linux XFS, Dave Chinner, "Darrick J. Wong",
	Linux Stable, Linux Kernel Mailing List, Linux Regressions
Subject: Re: rm hanging, v6.1.35
References: <20230710215354.GA679018@onthe.net.au>

On Tue, Jul 11, 2023 at 07:53:35AM +0700, Bagas Sanjaya wrote:
> On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote:
> > Hi,
> > 
> > This box is newly booted into linux v6.1.35 (2 days ago); it was
> > previously running v5.15.118 without any problems (other than the
> > issue fixed by "5e672cd69f0a xfs: non-blocking inodegc pushes", the
> > reason for the upgrade).
> > 
> > I have rm operations on two files that have been stuck for in
> > excess of 22 hours and 18 hours respectively:
> > 
> > $ ps -opid,lstart,state,wchan=WCHAN-xxxxxxxxxxxxxxx,cmd -C rm
> >     PID                 STARTED S WCHAN-xxxxxxxxxxxxxxx CMD
> > 2379355 Mon Jul 10 09:07:57 2023 D vfs_unlink        /bin/rm -rf /aaa/5539_tmp
> > 2392421 Mon Jul 10 09:18:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> > 2485728 Mon Jul 10 09:28:57 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> > 2488254 Mon Jul 10 09:39:27 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> > 2491180 Mon Jul 10 09:49:58 2023 D down_write_nested /bin/rm -rf /aaa/5539_tmp
> > 3014914 Mon Jul 10 13:00:33 2023 D vfs_unlink        /bin/rm -rf /bbb/5541_tmp
> > 3095893 Mon Jul 10 13:11:03 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> > 3098809 Mon Jul 10 13:21:35 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> > 3101387 Mon Jul 10 13:32:06 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> > 3195017 Mon Jul 10 13:42:37 2023 D down_write_nested /bin/rm -rf /bbb/5541_tmp
> > 
> > The "rm"s are run from a process that's obviously tried a few times
> > to get rid of these files.
> > There's nothing extraordinary about the files in terms of size:
> > 
> > $ ls -ltrn --full-time /aaa/5539_tmp /bbb/5541_tmp
> > -rw-rw-rw- 1 1482 1482 7870643 2023-07-10 06:07:58.684036505 +1000 /aaa/5539_tmp
> > -rw-rw-rw- 1 1482 1482  701240 2023-07-10 10:00:34.181064549 +1000 /bbb/5541_tmp
> > 
> > As hinted by the WCHAN in the ps output above, each "primary" rm
> > (i.e. the first one run on each file) stack trace looks like:
> > 
> > [<0>] vfs_unlink+0x48/0x270
> > [<0>] do_unlinkat+0x1f5/0x290
> > [<0>] __x64_sys_unlinkat+0x3b/0x60
> > [<0>] do_syscall_64+0x34/0x80
> > [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

This looks to be stuck on the target inode lock (i.e. the locks for
the inodes at /aaa/5539_tmp and /bbb/5541_tmp). What's holding these
inode locks? This hasn't even got to XFS yet, so there's something
else going on in the background.

Can you attach the full output of 'echo w > /proc/sysrq-trigger' and
'echo t > /proc/sysrq-trigger', please?

> > And each "secondary" rm (i.e. the subsequent ones on each file)
> > stack trace looks like:
> > 
> > [<0>] down_write_nested+0xdc/0x100
> > [<0>] do_unlinkat+0x10d/0x290
> > [<0>] __x64_sys_unlinkat+0x3b/0x60
> > [<0>] do_syscall_64+0x34/0x80
> > [<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0

These are likely all stuck on the parent directory inode lock
(i.e. /aaa and /bbb).

> > Where to from here?
> > 
> > I'm guessing only a reboot is going to unstick this. Anything I
> > should be looking at before reverting to v5.15.118?
> > 
> > ...subsequent to starting writing all this down I have another two
> > sets of rms stuck, again on unremarkable files, and on two more
> > separate filesystems.

What does an "unremarkable file" look like? Is it a reflink copy of
something else, a hard link, a small/large regular data file, or
something else?

> > ...oh. And an 'ls' on those files is hanging. The reboot has become
> > more urgent.
Yup, that's most likely getting stuck on the directory locks that the
unlinks are holding....

-Dave.
-- 
Dave Chinner
david@fromorbit.com
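
[Editorial note: the inspection workflow discussed in this thread --
finding uninterruptible (D-state) tasks via their kernel wait channel,
then dumping their kernel stacks -- can be sketched roughly as below.
This is not from the original mail; the PID is a placeholder from
Chris's ps output, and the sysrq-trigger writes require root.]

```shell
#!/bin/sh
# 1. List D-state (uninterruptible sleep) tasks together with the
#    kernel symbol they are sleeping in (the WCHAN column), similar
#    to the ps invocation used in the report above:
ps -eo pid,state,wchan:30,cmd | awk '$2 == "D"'

# 2. Read one stuck task's kernel stack directly (PID is an example):
# cat /proc/2379355/stack

# 3. Dump stacks of all blocked tasks ('w') or of all tasks ('t') to
#    the kernel log, as requested in the thread, then read them back:
# echo w > /proc/sysrq-trigger
# echo t > /proc/sysrq-trigger
# dmesg | less
```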