From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6266AEB64DD for ; Tue, 11 Jul 2023 07:05:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229837AbjGKHFg (ORCPT ); Tue, 11 Jul 2023 03:05:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229558AbjGKHFf (ORCPT ); Tue, 11 Jul 2023 03:05:35 -0400 Received: from smtp2.onthe.net.au (smtp2.onthe.net.au [203.22.196.248]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 5C9D5B1 for ; Tue, 11 Jul 2023 00:05:33 -0700 (PDT) Received: from localhost (smtp2.private.onthe.net.au [10.200.63.13]) by smtp2.onthe.net.au (Postfix) with ESMTP id 3859F94E; Tue, 11 Jul 2023 17:05:31 +1000 (AEST) Received: from smtp2.onthe.net.au ([10.200.63.13]) by localhost (smtp.onthe.net.au [10.200.63.13]) (amavisd-new, port 10024) with ESMTP id hJd83ltTHRbc; Tue, 11 Jul 2023 17:05:31 +1000 (AEST) Received: from athena.private.onthe.net.au (chris-gw2-vpn.private.onthe.net.au [10.9.3.2]) by smtp2.onthe.net.au (Postfix) with ESMTP id 1805CCD; Tue, 11 Jul 2023 17:05:31 +1000 (AEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=onthe.net.au; s=default; t=1689059131; bh=mpeqNPFGHPh2conuTVh0mvFtTHX7vYT8xbaPx4xwXho=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=euHahkzZIO5DjbmA6nHNYISzsQf1NO5VTyK3IwZi3GapDNdUrrdaUEQ27ZQPnsOID LBcZqyXRxZidXV1dXs27coLaHpwYBVT9Ezw0YtzI0cr/piJWnr/NJCuTvg4xysTYQ0 lXm26G43KEnP7ZoxeH6KBeS62hfVN437weeEhnCfmfHA+81dFqFgVXKPZVoNY2Dsgt itZFvorNxRQYamVaf3zW88ZCkxKbqkgqrHKkrl8HS2wX2cnOOEMwjO53GK7nz5tbaV CDEwEqcT/VAOBkUXMtkn5QCb2Qt5ke4goLOqY3IVwQO59V78JJqtp9Cz3YVF3OkS1R nrlIUuX18RRzg== Received: by athena.private.onthe.net.au (Postfix, from userid 1026) id EE0B768061B; Tue, 11 Jul 2023 17:05:30 +1000 (AEST) Date: Tue, 11 Jul 2023 17:05:30 +1000 From: Chris Dunlop To: Dave Chinner Cc: linux-xfs@vger.kernel.org Subject: Re: rm hanging, v6.1.35 Message-ID: <20230711070530.GA761114@onthe.net.au> References: <20230710215354.GA679018@onthe.net.au> <20230711001331.GA683098@onthe.net.au> <20230711015716.GA687252@onthe.net.au> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org On Tue, Jul 11, 2023 at 01:10:11PM +1000, Dave Chinner wrote: > FYI, Chris: google is classifying your email as spam because it > doesn't have any dkim/dmarc authentication on it. You're lucky I > even got this, because I know that gmail mostly just drops such > email in a black hole now.... Dkim hopefully fixed now. More on the problem topic below, and apologies for the aside, but this is my immediate concern: post the reboot into v5.15.118 I have one of the filesystems failing to mount with: Jul 11 16:13:10 b2 kernel: XFS (dm-146): Metadata CRC error detected at xfs_allocbt_read_verify+0x12/0xb0 [xfs], xfs_bnobt block 0x10a00f7f8 Jul 11 16:13:10 b2 kernel: XFS (dm-146): Unmount and run xfs_repair Jul 11 16:13:10 b2 kernel: XFS (dm-146): First 128 bytes of corrupted metadata buffer: Jul 11 16:13:10 b2 kernel: 00000000: 41 42 33 42 00 00 01 ab 05 0a 78 fe 02 c0 2b ff AB3B......x...+. Jul 11 16:13:10 b2 kernel: 00000010: 00 00 00 01 0a 00 f7 f8 00 00 00 26 00 3b 2d 40 ...........&.;-@ Jul 11 16:13:10 b2 kernel: 00000020: cf 42 ed 07 78 a1 4e ff 9e 20 e3 d9 6f fc 3e 30 .B..x.N.. ..o.>0 Jul 11 16:13:10 b2 kernel: 00000030: 00 00 00 02 80 b2 25 41 08 c7 6c 00 00 00 01 28 ......%A..l....( Jul 11 16:13:10 b2 kernel: 00000040: 08 c7 6d 3e 00 00 00 c2 08 c7 6f 65 00 00 00 a7 ..m>......oe.... Jul 11 16:13:10 b2 kernel: 00000050: 08 c7 70 a5 00 00 00 3a 08 c7 71 00 00 00 00 c4 ..p....:..q..... Jul 11 16:13:10 b2 kernel: 00000060: 08 c7 71 e8 00 00 00 18 08 c7 72 50 00 00 02 c3 ..q.......rP.... Jul 11 16:13:10 b2 kernel: 00000070: 08 c7 75 b1 00 00 02 4b 08 c7 78 db 00 00 02 3d ..u....K..x....= Jul 11 16:13:10 b2 kernel: XFS (dm-146): log mount/recovery failed: error -74 Jul 11 16:13:10 b2 kernel: XFS (dm-146): log mount failed Then xfs_repair comes back with: # xfs_repair /dev/vg-name/lv-name Phase 1 - find and verify superblock... - reporting progress in intervals of 15 minutes Phase 2 - using internal log - zero log... ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this. At this point is "xfs_repair -L" (and attendant potential data loss) my only option? > On Tue, Jul 11, 2023 at 11:57:16AM +1000, Chris Dunlop wrote: >> On Tue, Jul 11, 2023 at 10:13:31AM +1000, Chris Dunlop wrote: >>> On Tue, Jul 11, 2023 at 07:53:54AM +1000, Chris Dunlop wrote: >>>> Hi, >>>> >>>> This box is newly booted into linux v6.1.35 (2 days ago), it was >>>> previously running v5.15.118 without any problems (other than that >>>> fixed by "5e672cd69f0a xfs: non-blocking inodegc pushes", the reason >>>> for the upgrade). >>>> >>>> I have rm operations on two files that have been stuck for in excess >>>> of 22 hours and 18 hours respectively: ... >> >> Full sysrq-w output at: >> >> https://file.io/tg7F5OqIWo1B > > Ok, you have XFS on dm-writecache on md raid 5 and > everything is stuck in either dm-writecache or md. Yes, dm-writecache, with the cache on SSD md raid6, in front of the XFS devices on lvm / ceph-rbd. > This task holds the wc_lock() while it is doing this writeback > flush. > > All the XFS filesystems are stuck in similar ways, such as trying to > push the journal and getting stuck in IO submission in > dm-writecache: ... > These are getting into dm-writecache, and the lock they are getting > stuck on is the wc_lock(). ... > > Directory reads are getting stuck in dm-write-cache: ... > Same lock. > > File data reads are getting stuck the same way in dm-writecache. > > File data writes (i.e. writeback) are getting stuck the same way > in dm-writecache. > > Yup, everything is stuck on dm-writecache flushing to the underlying > RAID-5 device. > > I don't see anything indicating a filesystem problem here. This > looks like a massively overloaded RAID 5 volume. i.e. the fast > storage that makes up the write cache has filled, and now everything > is stuck waiting for the fast cache to drain to the slow backing > store before new writes can be made to the fast cache. IOWs, > everything is running as RAID5 write speed and there's a huge > backlog of data being written to the RAID 5 volume(s) keeping them > 100% busy. This had never been an issue with v5.15. Is it possible the upgrade to v6.1 had a hand in this or that's probably just coincidence? In particular, could "5e672cd69f0a xfs: non-blocking inodegc pushes" cause a significantly greater write load on the cache? I note that there's one fs (separate to the corrupt one above) that's still in mount recovery since the boot some hours ago. On past experience that indicates the inodegc stuff is holding things up, i.e. it would have been running prior to the reboot - or at least trying to. > If there is no IO going to the RAID 5 volumes, then you've got a > RAID 5 or physical IO hang of some kind. But I can't find any > indication of that - everything looks like it's waiting on RAID 5 > stripe population to make write progress... There's been no change to the cache device over the reboot, and it currently looks busy, but it doesn't look completely swamped: load %user %nice %sys %iow %stl %idle dev rrqm/s wrqm/s r/s w/s rkB/s wkB/s arq-sz aqu-sz await rwait wwait %util 2023-07-11-16:10:00 16.7 0.6 0.0 1.0 15.0 0.0 82.5 md10 0.0 0.0 172.5 1060.1 17131.66 27450.08 72.34 6.33 5.13 0.56 5.9 48.8 2023-07-11-16:11:00 17.2 0.8 0.0 1.1 15.6 0.0 81.7 md10 0.0 0.0 185.7 1151.4 18202.36 20681.51 58.16 8.18 6.12 0.53 7.0 44.0 2023-07-11-16:12:00 18.9 0.5 0.0 1.0 17.0 0.0 80.6 md10 0.0 0.0 179.9 1139.6 15617.51 26579.51 63.96 7.62 5.78 0.46 6.6 47.5 2023-07-11-16:13:00 18.7 0.6 0.0 1.0 15.4 0.0 82.1 md10 0.0 0.0 161.8 1399.7 15751.70 26273.20 53.83 11.47 7.35 0.51 8.1 47.4 2023-07-11-16:14:00 17.2 0.5 0.0 0.9 13.9 0.0 83.8 md10 0.0 0.0 190.2 1251.1 15207.77 20907.73 50.12 14.19 9.85 2.16 11.0 42.2 2023-07-11-16:15:00 17.0 0.6 0.0 1.0 13.8 0.0 83.7 md10 0.0 0.0 163.5 1571.2 14561.41 24508.45 45.05 14.38 8.29 0.53 9.1 46.7 2023-07-11-16:16:00 18.7 0.8 0.0 1.0 15.5 0.0 81.8 md10 0.0 0.0 164.6 1294.7 13256.49 20737.23 46.59 12.54 8.59 0.50 9.6 42.3 2023-07-11-16:17:00 18.9 0.6 0.0 1.0 14.0 0.0 83.4 md10 0.0 0.0 179.4 1026.7 14079.12 19095.93 55.01 9.12 7.56 0.91 8.7 44.8 2023-07-11-16:18:00 19.4 0.6 0.0 1.0 13.8 0.0 83.7 md10 0.0 0.0 151.6 1315.5 12935.14 23547.30 49.73 11.80 8.04 0.43 8.9 48.5 2023-07-11-16:19:00 19.6 0.6 0.0 1.0 11.9 0.0 85.7 md10 0.0 0.0 168.6 1895.9 14153.63 23810.50 36.78 16.75 8.11 0.77 8.8 48.5 It should be handling about the same load as prior to the reboot. Cheers, Chris