Message-ID: <73c13ef8-8191-45e1-a110-2392feff96a4@gmail.com>
Date: Mon, 9 Mar 2026 11:35:27 +0000
Subject: Re: fscache/NFS re-export server lockup
From: Mike Owen
To: linux-nfs@vger.kernel.org, netfs@lists.linux.dev
Cc: Daire Byrne, dhowells@redhat.com, trondmy@hammerspace.com,
 okorniev@redhat.com, Mike Snitzer

For anyone following this thread: the solution was a broad set of
parameter tuning (mkfs/mount options, block device tuning, and VM
sysctls) combined with switching from ext4 to XFS. We also gained
sizeable performance jumps across the stack. FS-Cache itself was not at
fault here, despite what was previously implied.

Thanks to those who provided feedback.

-Mike

On 15/01/2026 21:53, Mike Snitzer wrote:
> On Thu, Jan 15, 2026 at 03:20:24PM +0000, Mike Owen wrote:
>> Hi Mike S,
>>
>> On 12/01/2026 15:16, Mike Owen wrote:
>>> Ah, this looks promising. Thanks for the info, Mike!
>>> Whilst we wait for the necessary NFS client fixes PR, I'll look to
>>> add the patch to 6.19-rc5 and report back if this fixes the issue
>>> we are seeing.
>>> I'll see what I can do internally to advocate Canonical absorbing
>>> it as well.
>>> ACK on my log wrapping. My bad.
>>> Thanks again!
>>> -Mike
>>>
>>> On Sat, 10 Jan 2026 at 09:48, Mike Snitzer wrote:
>>>>
>>>> Hi Mike,
>>>>
>>>> On Fri, Jan 09, 2026 at 09:45:47PM +0000, Mike Owen wrote:
>>>>> Hi Daire, thanks for the comments.
>>>>>
>>>>>> Can you stop the nfs server and is access to /var/cache/fscache
>>>>>> still blocked?
>>>>> As the machine is deadlocked, after reboot (so the nfs server is
>>>>> definitely stopped) the actual data is gone/corrupted.
>>>>>
>>>>>> And I presume there is definitely nothing else that might be
>>>>>> interacting with that /var/cache/fscache filesystem outside of
>>>>>> fscache or cachefilesd?
>>>>> Correct. The machine is dedicated to KNFSD caching duties.
>>>>>
>>>>>> Our /etc/cachefilesd.conf is pretty basic (brun 30%, bcull 10%,
>>>>>> bstop 3%).
>>>>> Similar settings here:
>>>>> brun 20%
>>>>> bcull 7%
>>>>> bstop 3%
>>>>> frun 20%
>>>>> fcull 7%
>>>>> fstop 3%
>>>>> Although I should note that the issue happens when only ~10-20%
>>>>> of the NVMe capacity is used, so culling has never had to run at
>>>>> this point.
>>>>>
>>>>> We did try running 6.17.0, but it made no difference. I see
>>>>> another thread of yours with Chuck, "refcount_t underflow
>>>>> (nfsd4_sequence_done?) with v6.18 re-export", which suggested
>>>>> commits to investigate, incl. cbfd91d22776 ("nfsd: never defer
>>>>> requests during idmap lookup"), as well as trying 6.18.4, so it's
>>>>> possible there is a cascading issue here and we are in need of
>>>>> some NFS patches.
>>>>>
>>>>> I'm hoping @dhowells might have some suggestions on how to
>>>>> further debug this issue, given the below stack we are seeing
>>>>> when it deadlocks?
>>>>>
>>>>> Thanks,
>>>>> -Mike
>>>>
>>>> This commit from Trond, which he'll be sending to Linus soon as
>>>> part of his 6.19-rc NFS client fixes pull request, should fix the
>>>> NFS re-export induced nfs_release_folio deadlock reflected in your
>>>> below stack trace:
>>>> https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=cce0be6eb4971456b703aaeafd571650d314bcca
>>>>
>>>> Here is more context for why I know that to be likely; it fixed my
>>>> nasty LOCALIO-based reexport deadlock situation too:
>>>> https://lore.kernel.org/linux-nfs/20260107160858.6847-1-snitzer@kernel.org/
>>>>
>>>> I'm doing my part to advocate that Red Hat (Olga cc'd) take this
>>>> fix into RHEL 10.2 (and backport as needed).
>>>>
>>>> Good luck getting Ubuntu to include this fix in a timely manner
>>>> (we'll all thank you for that if you can help shake the Canonical
>>>> tree).
>>>>
>>>> BTW, you'd do well to fix your editor/email so that it doesn't
>>>> line wrap when you share logs on Linux mailing lists:
>>>>
>> ...
>>
>> We deployed 6.19-rc5 + the kernel patch "NFS: Fix a deadlock
>> involving nfs_release_folio()", but unfortunately this has not fixed
>> the issue. The KNFSD server becomes wedged (as far as NFSD is
>> concerned; we can still log in) and we get the attached dmesg log. I
>> attempted an analysis/RCA of this circular dependency/deadlock issue
>> to try to assist in getting this resolved (see attached). Any other
>> patches to try?
>> -Mike
>
>> [Thu Jan 15 10:37:38 2026] md127: array will not be assembled in old kernels that lack configurable LBS support (<= 6.18)
>> [Thu Jan 15 10:37:38 2026] md127: detected capacity change from 0 to 29296345088
>> [Thu Jan 15 10:38:46 2026] EXT4-fs (md127): mounted filesystem cc12ea7f-34f9-445a-b8ad-a9f1a122c51d r/w with ordered data mode. Quota mode: none.
>> [Thu Jan 15 10:38:46 2026] netfs: FS-Cache loaded
>> [Thu Jan 15 10:38:46 2026] CacheFiles: Loaded
>> [Thu Jan 15 10:38:46 2026] netfs: Cache "mycache" added (type cachefiles)
>> [Thu Jan 15 10:38:46 2026] CacheFiles: File cache on mycache registered
>> [Thu Jan 15 10:39:07 2026] NFSD: Using nfsdcld client tracking operations.
>> [Thu Jan 15 10:39:07 2026] NFSD: no clients to reclaim, skipping NFSv4 grace period (net effffff9)
>> [Thu Jan 15 10:39:28 2026] audit: type=1400 audit(1768473568.836:130): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="rsyslogd" pid=8112 comm="apparmor_parser"
>> [Thu Jan 15 10:40:39 2026] hrtimer: interrupt took 10300 ns
>> [Thu Jan 15 10:42:35 2026] loop4: detected capacity change from 0 to 98480
>> [Thu Jan 15 10:42:35 2026] audit: type=1400 audit(1768473755.481:131): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/25935/usr/lib/snapd/snap-confine" pid=9736 comm="apparmor_parser"
>> [Thu Jan 15 10:42:35 2026] audit: type=1400 audit(1768473755.482:132): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/25935/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=9736 comm="apparmor_parser"
>> [Thu Jan 15 10:42:40 2026] loop5: detected capacity change from 0 to 8
>> [Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.184:133): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/snap/snapd/25935/usr/lib/snapd/snap-confine" pid=10068 comm="apparmor_parser"
>> [Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.194:134): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/snap/snapd/25935/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=10068 comm="apparmor_parser"
>> [Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.201:135): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.amazon-ssm-agent.amazon-ssm-agent" pid=10071 comm="apparmor_parser"
>> [Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.201:136): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.amazon-ssm-agent.ssm-cli" pid=10072 comm="apparmor_parser"
>> [Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.207:137): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap-update-ns.amazon-ssm-agent" pid=10070 comm="apparmor_parser"
>> [Thu Jan 15 11:59:09 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:09 2026] netfs: O-cookie c=0000df95 [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:09 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:09 2026] netfs: O-key=[32] 'f360d67b060000003414386507000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000e07f [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b0600000020301c6207000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000e1a1 [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000e01c [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000e14d [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b0600000071149ebc06000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000b381 [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b06000000bfd9ce3407000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000df8c [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b060000006b7a9b3a07000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b06000000317d6c3207000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000dbbc [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000b2c0 [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b0600000038c7f06f06000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b06000000e1483c3807000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b06000000e811572e07000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:11 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:11 2026] netfs: O-cookie c=0000dc24 [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:11 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:11 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:11 2026] netfs: O-cookie c=0000c542 [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:11 2026] netfs: O-key=[32] 'f360d67b06000000af19d78706000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:11 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:11 2026] netfs: O-key=[32] 'f360d67b060000003d594c7508000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 11:59:11 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> [Thu Jan 15 11:59:11 2026] netfs: O-cookie c=0000cea9 [fl=6124 na=1 nA=2 s=L]
>> [Thu Jan 15 11:59:11 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
>> [Thu Jan 15 11:59:11 2026] netfs: O-key=[32] 'f360d67b0600000016b7ae5907000000ffffffff000000000200c10901000000'
>> [Thu Jan 15 12:01:08 2026] INFO: task kcompactd0:170 blocked for more than 122 seconds.
>> [Thu Jan 15 12:01:08 2026] Tainted: G OE 6.19.0-rc5-knfsd #1
>> [Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Thu Jan 15 12:01:08 2026] task:kcompactd0 state:D stack:0 pid:170 tgid:170 ppid:2 task_flags:0x210040 flags:0x00080000
>> [Thu Jan 15 12:01:08 2026] Call Trace:
>> [Thu Jan 15 12:01:08 2026]
>> [Thu Jan 15 12:01:08 2026] __schedule+0x481/0x17d0
>> [Thu Jan 15 12:01:08 2026] ? __pfx_radix_tree_node_ctor+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] ? cgroup_writeback_by_id+0x4b/0x200
>> [Thu Jan 15 12:01:08 2026] ? xas_store+0x5b/0x7f0
>> [Thu Jan 15 12:01:08 2026] schedule+0x20/0xe0
>> [Thu Jan 15 12:01:08 2026] io_schedule+0x4c/0x80
>> [Thu Jan 15 12:01:08 2026] folio_wait_bit_common+0x133/0x310
>> [Thu Jan 15 12:01:08 2026] ? __pfx_wake_page_function+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] folio_wait_private_2+0x2c/0x60
>> [Thu Jan 15 12:01:08 2026] nfs_release_folio+0x61/0x130 [nfs]
>> [Thu Jan 15 12:01:08 2026] filemap_release_folio+0x68/0xa0
>> [Thu Jan 15 12:01:08 2026] __folio_split+0x178/0x8e0
>> [Thu Jan 15 12:01:08 2026] ? post_alloc_hook+0xc1/0x140
>> [Thu Jan 15 12:01:08 2026] __split_huge_page_to_list_to_order+0x2b/0xb0
>> [Thu Jan 15 12:01:08 2026] split_folio_to_list+0x10/0x20
>> [Thu Jan 15 12:01:08 2026] migrate_pages_batch+0x45d/0xea0
>> [Thu Jan 15 12:01:08 2026] ? __pfx_compaction_alloc+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] ? __pfx_compaction_free+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] ? asm_exc_xen_unknown_trap+0x1/0x20
>> [Thu Jan 15 12:01:08 2026] ? __pfx_compaction_free+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] migrate_pages+0xaef/0xd80
>> [Thu Jan 15 12:01:08 2026] ? __pfx_compaction_free+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] ? __pfx_compaction_alloc+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] compact_zone+0xb3f/0x1200
>> [Thu Jan 15 12:01:08 2026] ? psi_group_change+0x1f8/0x4c0
>> [Thu Jan 15 12:01:08 2026] ? kvm_sched_clock_read+0x11/0x20
>> [Thu Jan 15 12:01:08 2026] compact_node+0xaf/0x130
>> [Thu Jan 15 12:01:08 2026] kcompactd+0x374/0x4d0
>> [Thu Jan 15 12:01:08 2026] ? __pfx_autoremove_wake_function+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] ? __pfx_kcompactd+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] kthread+0xf9/0x210
>> [Thu Jan 15 12:01:08 2026] ? __pfx_kthread+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] ret_from_fork+0x25c/0x290
>> [Thu Jan 15 12:01:08 2026] ? __pfx_kthread+0x10/0x10
>> [Thu Jan 15 12:01:08 2026] ret_from_fork_asm+0x1a/0x30
>> [Thu Jan 15 12:01:08 2026]
>
> OK, my deadlock was blocked in wait_on_commit, whereas you've
> consistently shown folio_wait_private_2 in your stack traces (I
> originally missed that detail). So I'm not aware of what is different
> in your setup. The devil is in the details, but maybe it's your use
> of fscache?
>
>> # Analysis of KNFSD/CacheFiles Deadlock
>>
>> This document provides an analysis of a deadlock issue observed on
>> Ubuntu 24.04 KNFSD nodes running a custom Linux kernel 6.19-rc5. The
>> deadlock involves the NFS server (nfsd), the CacheFiles/netfs layer,
>> ext4 journaling, and memory compaction.
>>
>> ## Summary
>>
>> The system enters a deadlock state where:
>>
>> 1. Multiple **nfsd threads** are blocked waiting for ext4 inode rw-semaphores
>> 2. Those rw-semaphores are held by **kworker threads** (writers)
>> 3. The **jbd2 journal thread** for the cache filesystem is blocked waiting for updates
>> 4. **kcompactd** (memory compaction) is blocked waiting for an NFS folio to release fscache state
>> 5. The **fscache cookie state machine** is timing out
>
> David Howells may be able to weigh in from his fscache-specific
> vantage point by having a look at Trond's fix that I pointed to
> earlier?
>
> Mike
>
> ps. Solid effort with your below analysis, but it's quite
> fscache-specific, so I think it'll best help inform David:
>
>>
>> ## Environment
>>
>> - **Kernel**: Linux 6.19.0-rc5-knfsd (custom build)
>> - **Platform**: Amazon EC2 (24 CPUs, 192GB RAM)
>> - **Configuration**: 256 nfsd threads
>> - **Cache Filesystem**: ext4 on md127 RAID array
>>
>> ## Detailed Breakdown
>>
>> ### 1. Initial Symptom: Cookie State Timeouts (11:59:09)
>>
>> The first warning signs appear as fscache cookie state timeout errors:
>>
>> ```text
>> netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
>> ```
>>
>> These errors indicate that fscache operations are waiting for cookies
>> to transition to the correct state (`FSCACHE_COOKIE_STATE_ACTIVE`),
>> but the state transitions are stalled.
>>
>> ### 2. The Deadlock Chain (12:01:08)
>>
>> #### kcompactd0 (memory compaction daemon)
>>
>> ```text
>> folio_wait_private_2+0x2c/0x60
>> nfs_release_folio+0x61/0x130 [nfs]
>> filemap_release_folio+0x68/0xa0
>> __folio_split+0x178/0x8e0
>> ```
>>
>> The compaction daemon is trying to migrate/split NFS folios but is
>> blocked in `folio_wait_private_2()`. This waits for the `PG_fscache`
>> flag to clear, which happens when fscache completes its I/O
>> operations on the folio. However, fscache operations are stuck.
>>
>> #### jbd2/md127-8 (ext4 journal for cache filesystem)
>>
>> ```text
>> jbd2_journal_wait_updates+0x6e/0xe0
>> jbd2_journal_commit_transaction+0x26e/0x1730
>> ```
>>
>> The journal commit is waiting for outstanding updates to complete,
>> but those updates are blocked.
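[Editor's note: the wait-for relationships described in the quoted analysis (kcompactd waiting on fscache I/O, fscache I/O stuck behind kworkers, kworkers and the jbd2 journal waiting on each other) can be made explicit with a tiny wait-for-graph model. This is a simplified illustration: the node names and edges below are paraphrased from the analysis, not extracted from the logs.]

```python
# Simplified wait-for graph of the deadlock described in the analysis.
# Node names are illustrative stand-ins for the real kernel tasks.
WAITS_ON = {
    "nfsd":       "kworker",     # blocked on ext4 i_rwsem held by a kworker
    "kcompactd":  "fscache_io",  # folio_wait_private_2 (PG_fscache/private_2)
    "fscache_io": "kworker",     # cachefiles ops stuck behind cache writes
    "kworker":    "jbd2",        # cache writes wait for journal credits
    "jbd2":       "kworker",     # journal commit waits for those updates
}

def find_cycle(graph, start):
    """Follow single wait-for edges from `start`; return the cycle
    reached (as a list of nodes), or None if the walk terminates."""
    path = []
    node = start
    while node in graph:
        if node in path:
            return path[path.index(node):]
        path.append(node)
        node = graph[node]
    return None
```

In this model every task eventually reaches the kworker/jbd2 cycle, which matches the hung-task reports: no single task is at fault, but nothing can make progress.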
>>
>> #### nfsd threads (multiple patterns)
>>
>> **Pattern A** - Blocked on ext4 inode rwsem:
>>
>> ```text
>> rwsem_down_read_slowpath+0x278/0x500
>> down_read+0x41/0xb0
>> ext4_llseek+0xfc/0x120        <-- needs inode->i_rwsem for SEEK_DATA/SEEK_HOLE
>> vfs_llseek+0x1c/0x40
>> cachefiles_do_prepare_read    <-- checking what's cached
>> ```
>>
>> The kernel logs explicitly identify the blocking relationship:
>>
>> - `nfsd:7300 blocked on an rw-semaphore likely owned by task kworker/u96:8:37820`
>>
>> **Pattern B** - Blocked on jbd2 transaction:
>>
>> ```text
>> wait_transaction_locked+0x87/0xd0
>> add_transaction_credits+0x1e0/0x360
>> jbd2__journal_start
>> ext4_dirty_inode              <-- updating atime on cache file
>> cachefiles_read
>> ```
>>
>> ### 3. The Circular Dependency
>>
>> ```text
>> nfsd read request
>>   │
>>   ▼
>> nfs_readahead → netfs_readahead → cachefiles_prepare_read
>>   │
>>   ▼
>> ext4_llseek (needs i_rwsem READ lock)
>>   │
>>   ▼
>> BLOCKED: kworkers hold i_rwsem as WRITER
>>   │
>>   ├──────────────────────────────────┐
>>   ▼                                  ▼
>> Those kworkers are likely doing    jbd2 waiting for
>> cachefiles write operations        updates to complete
>>   │                                  │
>>   ▼                                  │
>> Waiting for journal ◄────────────────┘
>>   │
>>   ▼
>> Memory pressure triggers kcompactd,
>> which needs to release NFS folios
>>   │
>>   ▼
>> nfs_release_folio waits for fscache (PG_fscache/private_2)
>>   │
>>   ▼
>> fscache operations are stuck waiting
>> for the blocked operations above
>> ```
>>
>> ## Root Cause Analysis
>>
>> 1. **ext4_llseek holding i_rwsem**: When CacheFiles uses
>>    `SEEK_DATA`/`SEEK_HOLE` to check what's cached, it takes the
>>    inode's rw-semaphore. If writers (kworkers doing cache writes)
>>    hold this semaphore, readers (the prepare_read path) block.
>>
>> 2. **Journal contention**: The ext4 journal on the cache filesystem
>>    (md127) is under heavy load. With 256 nfsd threads all potentially
>>    doing cache operations, journal contention is severe.
>>
>> 3. **Memory pressure + folio release**: When memory compaction tries
>>    to free NFS folios that have active fscache state, it must wait
>>    for fscache to complete, but fscache is blocked on ext4.
>>
>> 4. **Cookie state machine stalls**: The fscache cookie state
>>    transitions are blocked because the underlying I/O can't complete,
>>    leading to the timeout messages.
>
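[Editor's note: root cause 1 concerns cachefiles probing cache-file extents with `SEEK_DATA`/`SEEK_HOLE` (in the kernel via `vfs_llseek`, which on ext4 takes `i_rwsem`). The probe itself can be sketched from user space for experimentation. This is a rough analogue assuming Linux (`os.SEEK_DATA`/`os.SEEK_HOLE`), not the cachefiles code:]

```python
import os

def probe_extent(path, offset):
    """User-space analogue of the SEEK_DATA/SEEK_HOLE probe that
    cachefiles uses to ask "what is actually present in the cache
    file here?". Returns (data_start, data_end) for the extent at or
    after `offset`, or None if there is no data beyond it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError:          # ENXIO: offset is at/past the last data
            return None
        end = os.lseek(fd, start, os.SEEK_HOLE)
        return (start, end)
    finally:
        os.close(fd)
```

Per the analysis, it is roughly this read-side call pattern that, under heavy cache-write load, queues up behind the i_rwsem writers in Pattern A above.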