public inbox for netfs@lists.linux.dev
From: Mike Owen <mjnowen@gmail.com>
To: Mike Snitzer <snitzer@kernel.org>
Cc: Daire Byrne <daire@dneg.com>,
	dhowells@redhat.com, linux-nfs@vger.kernel.org,
	netfs@lists.linux.dev, trondmy@hammerspace.com,
	okorniev@redhat.com
Subject: Re: fscache/NFS re-export server lockup
Date: Thu, 15 Jan 2026 15:20:24 +0000	[thread overview]
Message-ID: <c6dc98cc-df27-4a37-9b15-c4543e199ad0@gmail.com> (raw)
In-Reply-To: <CADFF-zf1qypwRDYkCPH9MFVLoPgsJxsNXZ7-qEbiMLo-L3Ldsg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3328 bytes --]

Hi Mike S,

On 12/01/2026 15:16, Mike Owen wrote:
> Ah, this looks promising. Thanks for the info, Mike!
> Whilst we wait for the necessary NFS client fixes pull request, I'll
> look to add the patch to 6.19-rc5 and report back on whether it fixes
> the issue we are seeing.
> I'll see what I can do internally to advocate Canonical absorbing it as well.
> ACK on my log wrapping. My bad.
> Thanks again!
> -Mike
> 
> On Sat, 10 Jan 2026 at 09:48, Mike Snitzer <snitzer@kernel.org> wrote:
>>
>> Hi Mike,
>>
>> On Fri, Jan 09, 2026 at 09:45:47PM +0000, Mike Owen wrote:
>>> Hi Daire, thanks for the comments.
>>>
>>>> Can you stop the nfs server and is access to /var/cache/fscache still blocked?
>>> As the machine is deadlocked we have to reboot (so the nfs server is
>>> definitely stopped), and after the reboot the actual data is gone/corrupted.
>>>
>>>> And I presume there is definitely nothing else that might be
>>>> interacting with that /var/cache/fscache filesystem outside of fscache
>>>> or cachefilesd?
>>> Correct. Machine is dedicated to KNFSD caching duties.
>>>
>>>> Our /etc/cachefilesd.conf is pretty basic (brun 30%, bcull 10%, bstop 3%).
>>> Similar settings here:
>>> brun 20%
>>> bcull 7%
>>> bstop 3%
>>> frun 20%
>>> fcull 7%
>>> fstop 3%
>>> Although I should note that the issue happens when only ~10-20% of the
>>> NVMe capacity is used, so culling has never had to run at this point.
>>>
>>> We did try running 6.17.0, but it made no difference. I see another
>>> thread of yours with Chuck, "refcount_t underflow (nfsd4_sequence_done?)
>>> with v6.18 re-export", which suggested commits to investigate, including
>>> cbfd91d22776 ("nfsd: never defer requests during idmap lookup"), as well
>>> as trying 6.18.4, so it's possible there is a cascading issue here and
>>> we are in need of some NFS patches.
>>>
>>> I'm hoping @dhowells might have some suggestions on how to further
>>> debug this issue, given the stack trace below that we see when it
>>> deadlocks.
>>>
>>> Thanks,
>>> -Mike
>>
>> This commit from Trond, which he'll be sending to Linus soon as part
>> of his 6.19-rc NFS client fixes pull request, should fix the NFS
>> re-export induced nfs_release_folio deadlock reflected in your below
>> stack trace:
>> https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=cce0be6eb4971456b703aaeafd571650d314bcca
>>
>> Here is more context for why I believe that to be likely: it fixed my
>> nasty LOCALIO-based reexport deadlock situation too:
>> https://lore.kernel.org/linux-nfs/20260107160858.6847-1-snitzer@kernel.org/
>>
>> I'm doing my part to advocate that Red Hat (Olga cc'd) take this
>> fix into RHEL 10.2 (and backport as needed).
>>
>> Good luck getting Ubuntu to include this fix in a timely manner (we'll
>> all thank you for that if you can help shake the Canonical tree).
>>
>> BTW, you'd do well to fix your editor/email so that it doesn't line
>> wrap when you share logs on Linux mailing lists:
>>
...

We deployed 6.19-rc5 plus the kernel patch "NFS: Fix a deadlock involving
nfs_release_folio()", but unfortunately it has not fixed the issue. The
KNFSD server still becomes wedged (as far as NFSD is concerned; we can
still log in), and we get the attached dmesg log. I attempted an
analysis/RCA of this circular dependency/deadlock to try and help get it
resolved (see attached). Are there any other patches we should try?
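One incidental observation from the timed-out cookies in the attached
dmesg: all of the O-key values share the same leading and trailing bytes
and differ only in the middle, which is consistent with them being many
different files on the same export. A throwaway Python sketch to check
this (keys copied from the log below; reading the shared bytes as an
export-wide identifier is my assumption, not something I've verified
against the NFS fscache key layout):

```python
# Compare a sample of the 32-byte O-key hex strings from the dmesg log.
import os.path

keys = [
    "f360d67b060000003414386507000000ffffffff000000000200c10901000000",
    "f360d67b0600000020301c6207000000ffffffff000000000200c10901000000",
    "f360d67b0600000071149ebc06000000ffffffff000000000200c10901000000",
    "f360d67b06000000bfd9ce3407000000ffffffff000000000200c10901000000",
]

# Longest common prefix across all keys.
prefix = os.path.commonprefix(keys)
# Longest common suffix: reverse the strings and reuse commonprefix.
suffix = os.path.commonprefix([k[::-1] for k in keys])[::-1]

print(len(prefix) // 2, "shared leading bytes:", prefix)  # 8 bytes
print(len(suffix) // 2, "shared trailing bytes:", suffix)
```

So the stuck cookies are not one hot file being hammered; they span a
spread of objects under a single volume, matching the "whole cache wedged"
symptom rather than a per-file problem.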
-Mike

[-- Attachment #2: dmesg.txt --]
[-- Type: text/plain, Size: 36995 bytes --]

[Thu Jan 15 10:37:38 2026] md127: array will not be assembled in old kernels that lack configurable LBS support (<= 6.18)
[Thu Jan 15 10:37:38 2026] md127: detected capacity change from 0 to 29296345088
[Thu Jan 15 10:38:46 2026] EXT4-fs (md127): mounted filesystem cc12ea7f-34f9-445a-b8ad-a9f1a122c51d r/w with ordered data mode. Quota mode: none.
[Thu Jan 15 10:38:46 2026] netfs: FS-Cache loaded
[Thu Jan 15 10:38:46 2026] CacheFiles: Loaded
[Thu Jan 15 10:38:46 2026] netfs: Cache "mycache" added (type cachefiles)
[Thu Jan 15 10:38:46 2026] CacheFiles: File cache on mycache registered
[Thu Jan 15 10:39:07 2026] NFSD: Using nfsdcld client tracking operations.
[Thu Jan 15 10:39:07 2026] NFSD: no clients to reclaim, skipping NFSv4 grace period (net effffff9)
[Thu Jan 15 10:39:28 2026] audit: type=1400 audit(1768473568.836:130): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="rsyslogd" pid=8112 comm="apparmor_parser"
[Thu Jan 15 10:40:39 2026] hrtimer: interrupt took 10300 ns
[Thu Jan 15 10:42:35 2026] loop4: detected capacity change from 0 to 98480
[Thu Jan 15 10:42:35 2026] audit: type=1400 audit(1768473755.481:131): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/25935/usr/lib/snapd/snap-confine" pid=9736 comm="apparmor_parser"
[Thu Jan 15 10:42:35 2026] audit: type=1400 audit(1768473755.482:132): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/25935/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=9736 comm="apparmor_parser"
[Thu Jan 15 10:42:40 2026] loop5: detected capacity change from 0 to 8
[Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.184:133): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/snap/snapd/25935/usr/lib/snapd/snap-confine" pid=10068 comm="apparmor_parser"
[Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.194:134): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/snap/snapd/25935/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=10068 comm="apparmor_parser"
[Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.201:135): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.amazon-ssm-agent.amazon-ssm-agent" pid=10071 comm="apparmor_parser"
[Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.201:136): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap.amazon-ssm-agent.ssm-cli" pid=10072 comm="apparmor_parser"
[Thu Jan 15 10:42:40 2026] audit: type=1400 audit(1768473760.207:137): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="snap-update-ns.amazon-ssm-agent" pid=10070 comm="apparmor_parser"
[Thu Jan 15 11:59:09 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:09 2026] netfs: O-cookie c=0000df95 [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:09 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:09 2026] netfs: O-key=[32] 'f360d67b060000003414386507000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000e07f [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b0600000020301c6207000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000e1a1 [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000e01c [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000e14d [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b0600000071149ebc06000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000b381 [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b06000000bfd9ce3407000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000df8c [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b060000006b7a9b3a07000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b06000000317d6c3207000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000dbbc [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:10 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:10 2026] netfs: O-cookie c=0000b2c0 [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:10 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b0600000038c7f06f06000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b06000000e1483c3807000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:10 2026] netfs: O-key=[32] 'f360d67b06000000e811572e07000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:11 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:11 2026] netfs: O-cookie c=0000dc24 [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:11 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:11 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:11 2026] netfs: O-cookie c=0000c542 [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:11 2026] netfs: O-key=[32] 'f360d67b06000000af19d78706000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:11 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:11 2026] netfs: O-key=[32] 'f360d67b060000003d594c7508000000ffffffff000000000200c10901000000'
[Thu Jan 15 11:59:11 2026] netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
[Thu Jan 15 11:59:11 2026] netfs: O-cookie c=0000cea9 [fl=6124 na=1 nA=2 s=L]
[Thu Jan 15 11:59:11 2026] netfs: O-cookie V=00000002 [Infs,3.0,2,,2109120a,7bd660f3,,,d0,100000,100000,927c0,927c0,927c0,927c0,1]
[Thu Jan 15 11:59:11 2026] netfs: O-key=[32] 'f360d67b0600000016b7ae5907000000ffffffff000000000200c10901000000'
[Thu Jan 15 12:01:08 2026] INFO: task kcompactd0:170 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:kcompactd0      state:D stack:0     pid:170   tgid:170   ppid:2      task_flags:0x210040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? __pfx_radix_tree_node_ctor+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ? cgroup_writeback_by_id+0x4b/0x200
[Thu Jan 15 12:01:08 2026]  ? xas_store+0x5b/0x7f0
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  io_schedule+0x4c/0x80
[Thu Jan 15 12:01:08 2026]  folio_wait_bit_common+0x133/0x310
[Thu Jan 15 12:01:08 2026]  ? __pfx_wake_page_function+0x10/0x10
[Thu Jan 15 12:01:08 2026]  folio_wait_private_2+0x2c/0x60
[Thu Jan 15 12:01:08 2026]  nfs_release_folio+0x61/0x130 [nfs]
[Thu Jan 15 12:01:08 2026]  filemap_release_folio+0x68/0xa0
[Thu Jan 15 12:01:08 2026]  __folio_split+0x178/0x8e0
[Thu Jan 15 12:01:08 2026]  ? post_alloc_hook+0xc1/0x140
[Thu Jan 15 12:01:08 2026]  __split_huge_page_to_list_to_order+0x2b/0xb0
[Thu Jan 15 12:01:08 2026]  split_folio_to_list+0x10/0x20
[Thu Jan 15 12:01:08 2026]  migrate_pages_batch+0x45d/0xea0
[Thu Jan 15 12:01:08 2026]  ? __pfx_compaction_alloc+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ? __pfx_compaction_free+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ? asm_exc_xen_unknown_trap+0x1/0x20
[Thu Jan 15 12:01:08 2026]  ? __pfx_compaction_free+0x10/0x10
[Thu Jan 15 12:01:08 2026]  migrate_pages+0xaef/0xd80
[Thu Jan 15 12:01:08 2026]  ? __pfx_compaction_free+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ? __pfx_compaction_alloc+0x10/0x10
[Thu Jan 15 12:01:08 2026]  compact_zone+0xb3f/0x1200
[Thu Jan 15 12:01:08 2026]  ? psi_group_change+0x1f8/0x4c0
[Thu Jan 15 12:01:08 2026]  ? kvm_sched_clock_read+0x11/0x20
[Thu Jan 15 12:01:08 2026]  compact_node+0xaf/0x130
[Thu Jan 15 12:01:08 2026]  kcompactd+0x374/0x4d0
[Thu Jan 15 12:01:08 2026]  ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ? __pfx_kcompactd+0x10/0x10
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task jbd2/md127-8:6833 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:jbd2/md127-8    state:D stack:0     pid:6833  tgid:6833  ppid:2      task_flags:0x240040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? ext4_process_freed_data+0x236/0x320
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  jbd2_journal_wait_updates+0x6e/0xe0
[Thu Jan 15 12:01:08 2026]  ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Jan 15 12:01:08 2026]  jbd2_journal_commit_transaction+0x26e/0x1730
[Thu Jan 15 12:01:08 2026]  ? psi_group_change+0x1f8/0x4c0
[Thu Jan 15 12:01:08 2026]  ? kvm_sched_clock_read+0x11/0x20
[Thu Jan 15 12:01:08 2026]  kjournald2+0xa5/0x240
[Thu Jan 15 12:01:08 2026]  ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ? __pfx_kjournald2+0x10/0x10
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7300 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:nfsd            state:D stack:0     pid:7300  tgid:7300  ppid:2      task_flags:0x240040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? __do_sys_getcwd+0x1ce/0x1e0
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  schedule_preempt_disabled+0xe/0x20
[Thu Jan 15 12:01:08 2026]  rwsem_down_read_slowpath+0x278/0x500
[Thu Jan 15 12:01:08 2026]  down_read+0x41/0xb0
[Thu Jan 15 12:01:08 2026]  ext4_llseek+0xfc/0x120
[Thu Jan 15 12:01:08 2026]  vfs_llseek+0x1c/0x40
[Thu Jan 15 12:01:08 2026]  cachefiles_do_prepare_read+0xb4/0x380 [cachefiles]
[Thu Jan 15 12:01:08 2026]  cachefiles_prepare_read+0x32/0x50 [cachefiles]
[Thu Jan 15 12:01:08 2026]  netfs_read_to_pagecache+0xc6/0x4d0 [netfs]
[Thu Jan 15 12:01:08 2026]  netfs_readahead+0x165/0x240 [netfs]
[Thu Jan 15 12:01:08 2026]  nfs_netfs_readahead+0x1f/0x40 [nfs]
[Thu Jan 15 12:01:08 2026]  nfs_readahead+0x14d/0x280 [nfs]
[Thu Jan 15 12:01:08 2026]  read_pages+0x84/0x240
[Thu Jan 15 12:01:08 2026]  page_cache_ra_unbounded+0x1ac/0x2a0
[Thu Jan 15 12:01:08 2026]  page_cache_sync_ra+0x243/0x250
[Thu Jan 15 12:01:08 2026]  filemap_get_pages+0x158/0x790
[Thu Jan 15 12:01:08 2026]  ? iput+0x16/0x290
[Thu Jan 15 12:01:08 2026]  ? __d_obtain_alias+0x7e/0x230
[Thu Jan 15 12:01:08 2026]  filemap_splice_read+0x15e/0x380
[Thu Jan 15 12:01:08 2026]  nfs_file_splice_read+0xba/0xf0 [nfs]
[Thu Jan 15 12:01:08 2026]  do_splice_read+0x63/0xe0
[Thu Jan 15 12:01:08 2026]  splice_direct_to_actor+0xb0/0x260
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_direct_splice_actor+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_splice_read+0x105/0x110 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_read+0xbc/0x1b0 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd3_proc_read+0x101/0x230 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_dispatch+0xd6/0x1f0 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process_common+0x4bf/0x6c0 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process+0x15b/0x260 [sunrpc]
[Thu Jan 15 12:01:08 2026]  svc_recv+0x7ae/0x960 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd+0x85/0xe0 [nfsd]
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7300 <reader> blocked on an rw-semaphore likely owned by task kworker/u96:8:37820 <writer>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7302 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:nfsd            state:D stack:0     pid:7302  tgid:7302  ppid:2      task_flags:0x240040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? __do_sys_getcwd+0x1ce/0x1e0
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  schedule_preempt_disabled+0xe/0x20
[Thu Jan 15 12:01:08 2026]  rwsem_down_read_slowpath+0x278/0x500
[Thu Jan 15 12:01:08 2026]  down_read+0x41/0xb0
[Thu Jan 15 12:01:08 2026]  ext4_llseek+0xfc/0x120
[Thu Jan 15 12:01:08 2026]  vfs_llseek+0x1c/0x40
[Thu Jan 15 12:01:08 2026]  cachefiles_do_prepare_read+0xb4/0x380 [cachefiles]
[Thu Jan 15 12:01:08 2026]  cachefiles_prepare_read+0x32/0x50 [cachefiles]
[Thu Jan 15 12:01:08 2026]  netfs_read_to_pagecache+0xc6/0x4d0 [netfs]
[Thu Jan 15 12:01:08 2026]  netfs_readahead+0x165/0x240 [netfs]
[Thu Jan 15 12:01:08 2026]  nfs_netfs_readahead+0x1f/0x40 [nfs]
[Thu Jan 15 12:01:08 2026]  nfs_readahead+0x14d/0x280 [nfs]
[Thu Jan 15 12:01:08 2026]  ? __pfx_lru_add+0x10/0x10
[Thu Jan 15 12:01:08 2026]  read_pages+0x84/0x240
[Thu Jan 15 12:01:08 2026]  page_cache_ra_order+0x261/0x430
[Thu Jan 15 12:01:08 2026]  page_cache_async_ra+0x193/0x210
[Thu Jan 15 12:01:08 2026]  filemap_readahead.isra.0+0x75/0xb0
[Thu Jan 15 12:01:08 2026]  filemap_get_pages+0x35b/0x790
[Thu Jan 15 12:01:08 2026]  ? iput+0x16/0x290
[Thu Jan 15 12:01:08 2026]  filemap_splice_read+0x15e/0x380
[Thu Jan 15 12:01:08 2026]  nfs_file_splice_read+0xba/0xf0 [nfs]
[Thu Jan 15 12:01:08 2026]  do_splice_read+0x63/0xe0
[Thu Jan 15 12:01:08 2026]  splice_direct_to_actor+0xb0/0x260
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_direct_splice_actor+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_splice_read+0x105/0x110 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_read+0xbc/0x1b0 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd3_proc_read+0x101/0x230 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_dispatch+0xd6/0x1f0 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process_common+0x4bf/0x6c0 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process+0x15b/0x260 [sunrpc]
[Thu Jan 15 12:01:08 2026]  svc_recv+0x7ae/0x960 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd+0x85/0xe0 [nfsd]
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7302 <reader> blocked on an rw-semaphore likely owned by task kworker/u96:2:28110 <writer>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7303 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:nfsd            state:D stack:0     pid:7303  tgid:7303  ppid:2      task_flags:0x240040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? blk_mq_flush_plug_list+0x62/0x1e0
[Thu Jan 15 12:01:08 2026]  ? __do_sys_getcwd+0x1ce/0x1e0
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  wait_transaction_locked+0x87/0xd0
[Thu Jan 15 12:01:08 2026]  ? __pfx_autoremove_wake_function+0x10/0x10
[Thu Jan 15 12:01:08 2026]  add_transaction_credits+0x1e0/0x360
[Thu Jan 15 12:01:08 2026]  ? iomap_dio_bio_iter+0x23a/0x4f0
[Thu Jan 15 12:01:08 2026]  start_this_handle+0xfe/0x520
[Thu Jan 15 12:01:08 2026]  jbd2__journal_start+0x107/0x1e0
[Thu Jan 15 12:01:08 2026]  __ext4_journal_start_sb+0x13b/0x1a0
[Thu Jan 15 12:01:08 2026]  ext4_dirty_inode+0x39/0x90
[Thu Jan 15 12:01:08 2026]  __mark_inode_dirty+0x60/0x350
[Thu Jan 15 12:01:08 2026]  generic_update_time+0x50/0x60
[Thu Jan 15 12:01:08 2026]  touch_atime+0xca/0xf0
[Thu Jan 15 12:01:08 2026]  ext4_file_read_iter+0xdf/0x200
[Thu Jan 15 12:01:08 2026]  vfs_iocb_iter_read+0xff/0x160
[Thu Jan 15 12:01:08 2026]  cachefiles_read+0x19e/0x400 [cachefiles]
[Thu Jan 15 12:01:08 2026]  netfs_read_to_pagecache+0x2ab/0x4d0 [netfs]
[Thu Jan 15 12:01:08 2026]  netfs_readahead+0x165/0x240 [netfs]
[Thu Jan 15 12:01:08 2026]  nfs_netfs_readahead+0x1f/0x40 [nfs]
[Thu Jan 15 12:01:08 2026]  nfs_readahead+0x14d/0x280 [nfs]
[Thu Jan 15 12:01:08 2026]  read_pages+0x84/0x240
[Thu Jan 15 12:01:08 2026]  page_cache_ra_unbounded+0x1ac/0x2a0
[Thu Jan 15 12:01:08 2026]  page_cache_sync_ra+0x243/0x250
[Thu Jan 15 12:01:08 2026]  filemap_get_pages+0x158/0x790
[Thu Jan 15 12:01:08 2026]  filemap_splice_read+0x15e/0x380
[Thu Jan 15 12:01:08 2026]  ? nfs_getattr+0x2b4/0x540 [nfs]
[Thu Jan 15 12:01:08 2026]  nfs_file_splice_read+0xba/0xf0 [nfs]
[Thu Jan 15 12:01:08 2026]  do_splice_read+0x63/0xe0
[Thu Jan 15 12:01:08 2026]  splice_direct_to_actor+0xb0/0x260
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_direct_splice_actor+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_splice_read+0x105/0x110 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_read+0xbc/0x1b0 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd3_proc_read+0x101/0x230 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_dispatch+0xd6/0x1f0 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process_common+0x4bf/0x6c0 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process+0x15b/0x260 [sunrpc]
[Thu Jan 15 12:01:08 2026]  svc_recv+0x7ae/0x960 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd+0x85/0xe0 [nfsd]
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7304 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:nfsd            state:D stack:0     pid:7304  tgid:7304  ppid:2      task_flags:0x240040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? __do_sys_getcwd+0x1ce/0x1e0
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  schedule_preempt_disabled+0xe/0x20
[Thu Jan 15 12:01:08 2026]  rwsem_down_read_slowpath+0x278/0x500
[Thu Jan 15 12:01:08 2026]  down_read+0x41/0xb0
[Thu Jan 15 12:01:08 2026]  ext4_llseek+0xfc/0x120
[Thu Jan 15 12:01:08 2026]  vfs_llseek+0x1c/0x40
[Thu Jan 15 12:01:08 2026]  cachefiles_do_prepare_read+0xb4/0x380 [cachefiles]
[Thu Jan 15 12:01:08 2026]  cachefiles_prepare_read+0x32/0x50 [cachefiles]
[Thu Jan 15 12:01:08 2026]  netfs_read_to_pagecache+0xc6/0x4d0 [netfs]
[Thu Jan 15 12:01:08 2026]  netfs_readahead+0x165/0x240 [netfs]
[Thu Jan 15 12:01:08 2026]  nfs_netfs_readahead+0x1f/0x40 [nfs]
[Thu Jan 15 12:01:08 2026]  nfs_readahead+0x14d/0x280 [nfs]
[Thu Jan 15 12:01:08 2026]  ? __pfx_lru_add+0x10/0x10
[Thu Jan 15 12:01:08 2026]  read_pages+0x84/0x240
[Thu Jan 15 12:01:08 2026]  page_cache_ra_order+0x261/0x430
[Thu Jan 15 12:01:08 2026]  page_cache_async_ra+0x193/0x210
[Thu Jan 15 12:01:08 2026]  filemap_readahead.isra.0+0x75/0xb0
[Thu Jan 15 12:01:08 2026]  filemap_get_pages+0x35b/0x790
[Thu Jan 15 12:01:08 2026]  ? iput+0x16/0x290
[Thu Jan 15 12:01:08 2026]  filemap_splice_read+0x15e/0x380
[Thu Jan 15 12:01:08 2026]  nfs_file_splice_read+0xba/0xf0 [nfs]
[Thu Jan 15 12:01:08 2026]  do_splice_read+0x63/0xe0
[Thu Jan 15 12:01:08 2026]  splice_direct_to_actor+0xb0/0x260
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_direct_splice_actor+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_splice_read+0x105/0x110 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_read+0xbc/0x1b0 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd3_proc_read+0x101/0x230 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_dispatch+0xd6/0x1f0 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process_common+0x4bf/0x6c0 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process+0x15b/0x260 [sunrpc]
[Thu Jan 15 12:01:08 2026]  svc_recv+0x7ae/0x960 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd+0x85/0xe0 [nfsd]
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7304 <reader> blocked on an rw-semaphore likely owned by task kworker/u96:3:37809 <writer>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7306 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:nfsd            state:D stack:0     pid:7306  tgid:7306  ppid:2      task_flags:0x240040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? __do_sys_getcwd+0x1ce/0x1e0
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  schedule_preempt_disabled+0xe/0x20
[Thu Jan 15 12:01:08 2026]  rwsem_down_read_slowpath+0x278/0x500
[Thu Jan 15 12:01:08 2026]  down_read+0x41/0xb0
[Thu Jan 15 12:01:08 2026]  ext4_llseek+0xfc/0x120
[Thu Jan 15 12:01:08 2026]  vfs_llseek+0x1c/0x40
[Thu Jan 15 12:01:08 2026]  cachefiles_do_prepare_read+0xb4/0x380 [cachefiles]
[Thu Jan 15 12:01:08 2026]  cachefiles_prepare_read+0x32/0x50 [cachefiles]
[Thu Jan 15 12:01:08 2026]  netfs_read_to_pagecache+0xc6/0x4d0 [netfs]
[Thu Jan 15 12:01:08 2026]  netfs_readahead+0x165/0x240 [netfs]
[Thu Jan 15 12:01:08 2026]  nfs_netfs_readahead+0x1f/0x40 [nfs]
[Thu Jan 15 12:01:08 2026]  nfs_readahead+0x14d/0x280 [nfs]
[Thu Jan 15 12:01:08 2026]  read_pages+0x84/0x240
[Thu Jan 15 12:01:08 2026]  page_cache_ra_order+0x2c7/0x430
[Thu Jan 15 12:01:08 2026]  page_cache_sync_ra+0x176/0x250
[Thu Jan 15 12:01:08 2026]  filemap_get_pages+0x158/0x790
[Thu Jan 15 12:01:08 2026]  ? iput+0x16/0x290
[Thu Jan 15 12:01:08 2026]  ? __d_obtain_alias+0x7e/0x230
[Thu Jan 15 12:01:08 2026]  filemap_splice_read+0x15e/0x380
[Thu Jan 15 12:01:08 2026]  nfs_file_splice_read+0xba/0xf0 [nfs]
[Thu Jan 15 12:01:08 2026]  do_splice_read+0x63/0xe0
[Thu Jan 15 12:01:08 2026]  splice_direct_to_actor+0xb0/0x260
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_direct_splice_actor+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_splice_read+0x105/0x110 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_read+0xbc/0x1b0 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd3_proc_read+0x101/0x230 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_dispatch+0xd6/0x1f0 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process_common+0x4bf/0x6c0 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process+0x15b/0x260 [sunrpc]
[Thu Jan 15 12:01:08 2026]  svc_recv+0x7ae/0x960 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd+0x85/0xe0 [nfsd]
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7306 <reader> blocked on an rw-semaphore likely owned by task kworker/u96:6:29817 <writer>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7307 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:nfsd            state:D stack:0     pid:7307  tgid:7307  ppid:2      task_flags:0x240040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? __do_sys_getcwd+0x1ce/0x1e0
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  schedule_preempt_disabled+0xe/0x20
[Thu Jan 15 12:01:08 2026]  rwsem_down_read_slowpath+0x278/0x500
[Thu Jan 15 12:01:08 2026]  down_read+0x41/0xb0
[Thu Jan 15 12:01:08 2026]  ext4_llseek+0xfc/0x120
[Thu Jan 15 12:01:08 2026]  vfs_llseek+0x1c/0x40
[Thu Jan 15 12:01:08 2026]  cachefiles_do_prepare_read+0xb4/0x380 [cachefiles]
[Thu Jan 15 12:01:08 2026]  cachefiles_prepare_read+0x32/0x50 [cachefiles]
[Thu Jan 15 12:01:08 2026]  netfs_read_to_pagecache+0xc6/0x4d0 [netfs]
[Thu Jan 15 12:01:08 2026]  netfs_readahead+0x165/0x240 [netfs]
[Thu Jan 15 12:01:08 2026]  nfs_netfs_readahead+0x1f/0x40 [nfs]
[Thu Jan 15 12:01:08 2026]  nfs_readahead+0x14d/0x280 [nfs]
[Thu Jan 15 12:01:08 2026]  read_pages+0x84/0x240
[Thu Jan 15 12:01:08 2026]  page_cache_ra_order+0x261/0x430
[Thu Jan 15 12:01:08 2026]  page_cache_sync_ra+0x176/0x250
[Thu Jan 15 12:01:08 2026]  filemap_get_pages+0x158/0x790
[Thu Jan 15 12:01:08 2026]  ? iput+0x16/0x290
[Thu Jan 15 12:01:08 2026]  ? __d_obtain_alias+0x7e/0x230
[Thu Jan 15 12:01:08 2026]  filemap_splice_read+0x15e/0x380
[Thu Jan 15 12:01:08 2026]  nfs_file_splice_read+0xba/0xf0 [nfs]
[Thu Jan 15 12:01:08 2026]  do_splice_read+0x63/0xe0
[Thu Jan 15 12:01:08 2026]  splice_direct_to_actor+0xb0/0x260
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_direct_splice_actor+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_splice_read+0x105/0x110 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_read+0xbc/0x1b0 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd3_proc_read+0x101/0x230 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_dispatch+0xd6/0x1f0 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process_common+0x4bf/0x6c0 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process+0x15b/0x260 [sunrpc]
[Thu Jan 15 12:01:08 2026]  svc_recv+0x7ae/0x960 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd+0x85/0xe0 [nfsd]
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7307 <reader> blocked on an rw-semaphore likely owned by task kworker/u96:80:41180 <writer>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7308 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:nfsd            state:D stack:0     pid:7308  tgid:7308  ppid:2      task_flags:0x240040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? kmalloc_reserve+0xa4/0x100
[Thu Jan 15 12:01:08 2026]  ? __do_sys_getcwd+0x1ce/0x1e0
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  schedule_preempt_disabled+0xe/0x20
[Thu Jan 15 12:01:08 2026]  rwsem_down_read_slowpath+0x278/0x500
[Thu Jan 15 12:01:08 2026]  down_read+0x41/0xb0
[Thu Jan 15 12:01:08 2026]  ext4_llseek+0xfc/0x120
[Thu Jan 15 12:01:08 2026]  vfs_llseek+0x1c/0x40
[Thu Jan 15 12:01:08 2026]  cachefiles_do_prepare_read+0xb4/0x380 [cachefiles]
[Thu Jan 15 12:01:08 2026]  cachefiles_prepare_read+0x32/0x50 [cachefiles]
[Thu Jan 15 12:01:08 2026]  netfs_read_to_pagecache+0xc6/0x4d0 [netfs]
[Thu Jan 15 12:01:08 2026]  netfs_readahead+0x165/0x240 [netfs]
[Thu Jan 15 12:01:08 2026]  nfs_netfs_readahead+0x1f/0x40 [nfs]
[Thu Jan 15 12:01:08 2026]  nfs_readahead+0x14d/0x280 [nfs]
[Thu Jan 15 12:01:08 2026]  read_pages+0x84/0x240
[Thu Jan 15 12:01:08 2026]  page_cache_ra_unbounded+0x203/0x2a0
[Thu Jan 15 12:01:08 2026]  page_cache_ra_order+0x3ef/0x430
[Thu Jan 15 12:01:08 2026]  page_cache_async_ra+0x193/0x210
[Thu Jan 15 12:01:08 2026]  filemap_readahead.isra.0+0x75/0xb0
[Thu Jan 15 12:01:08 2026]  filemap_get_pages+0x35b/0x790
[Thu Jan 15 12:01:08 2026]  ? iput+0x16/0x290
[Thu Jan 15 12:01:08 2026]  ? __d_obtain_alias+0x7e/0x230
[Thu Jan 15 12:01:08 2026]  ? nfs_attribute_cache_expired+0x39/0xb0 [nfs]
[Thu Jan 15 12:01:08 2026]  filemap_splice_read+0x15e/0x380
[Thu Jan 15 12:01:08 2026]  nfs_file_splice_read+0xba/0xf0 [nfs]
[Thu Jan 15 12:01:08 2026]  do_splice_read+0x63/0xe0
[Thu Jan 15 12:01:08 2026]  splice_direct_to_actor+0xb0/0x260
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_direct_splice_actor+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_splice_read+0x105/0x110 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_read+0xbc/0x1b0 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd3_proc_read+0x101/0x230 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_dispatch+0xd6/0x1f0 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process_common+0x4bf/0x6c0 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process+0x15b/0x260 [sunrpc]
[Thu Jan 15 12:01:08 2026]  svc_recv+0x7ae/0x960 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd+0x85/0xe0 [nfsd]
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7308 <reader> blocked on an rw-semaphore likely owned by task kworker/u96:118:41218 <writer>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7310 blocked for more than 122 seconds.
[Thu Jan 15 12:01:08 2026]       Tainted: G           OE       6.19.0-rc5-knfsd #1
[Thu Jan 15 12:01:08 2026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Jan 15 12:01:08 2026] task:nfsd            state:D stack:0     pid:7310  tgid:7310  ppid:2      task_flags:0x240040 flags:0x00080000
[Thu Jan 15 12:01:08 2026] Call Trace:
[Thu Jan 15 12:01:08 2026]  <TASK>
[Thu Jan 15 12:01:08 2026]  __schedule+0x481/0x17d0
[Thu Jan 15 12:01:08 2026]  ? __do_sys_getcwd+0x1ce/0x1e0
[Thu Jan 15 12:01:08 2026]  schedule+0x20/0xe0
[Thu Jan 15 12:01:08 2026]  schedule_preempt_disabled+0xe/0x20
[Thu Jan 15 12:01:08 2026]  rwsem_down_read_slowpath+0x278/0x500
[Thu Jan 15 12:01:08 2026]  down_read+0x41/0xb0
[Thu Jan 15 12:01:08 2026]  ext4_llseek+0xfc/0x120
[Thu Jan 15 12:01:08 2026]  vfs_llseek+0x1c/0x40
[Thu Jan 15 12:01:08 2026]  cachefiles_do_prepare_read+0xb4/0x380 [cachefiles]
[Thu Jan 15 12:01:08 2026]  cachefiles_prepare_read+0x32/0x50 [cachefiles]
[Thu Jan 15 12:01:08 2026]  netfs_read_to_pagecache+0xc6/0x4d0 [netfs]
[Thu Jan 15 12:01:08 2026]  netfs_readahead+0x165/0x240 [netfs]
[Thu Jan 15 12:01:08 2026]  nfs_netfs_readahead+0x1f/0x40 [nfs]
[Thu Jan 15 12:01:08 2026]  nfs_readahead+0x14d/0x280 [nfs]
[Thu Jan 15 12:01:08 2026]  read_pages+0x84/0x240
[Thu Jan 15 12:01:08 2026]  page_cache_ra_unbounded+0x1ac/0x2a0
[Thu Jan 15 12:01:08 2026]  page_cache_sync_ra+0x243/0x250
[Thu Jan 15 12:01:08 2026]  filemap_get_pages+0x158/0x790
[Thu Jan 15 12:01:08 2026]  ? iput+0x16/0x290
[Thu Jan 15 12:01:08 2026]  ? __d_obtain_alias+0x7e/0x230
[Thu Jan 15 12:01:08 2026]  filemap_splice_read+0x15e/0x380
[Thu Jan 15 12:01:08 2026]  nfs_file_splice_read+0xba/0xf0 [nfs]
[Thu Jan 15 12:01:08 2026]  do_splice_read+0x63/0xe0
[Thu Jan 15 12:01:08 2026]  splice_direct_to_actor+0xb0/0x260
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_direct_splice_actor+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_splice_read+0x105/0x110 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_read+0xbc/0x1b0 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd3_proc_read+0x101/0x230 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd_dispatch+0xd6/0x1f0 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process_common+0x4bf/0x6c0 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  svc_process+0x15b/0x260 [sunrpc]
[Thu Jan 15 12:01:08 2026]  svc_recv+0x7ae/0x960 [sunrpc]
[Thu Jan 15 12:01:08 2026]  ? __pfx_nfsd+0x10/0x10 [nfsd]
[Thu Jan 15 12:01:08 2026]  nfsd+0x85/0xe0 [nfsd]
[Thu Jan 15 12:01:08 2026]  kthread+0xf9/0x210
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork+0x25c/0x290
[Thu Jan 15 12:01:08 2026]  ? __pfx_kthread+0x10/0x10
[Thu Jan 15 12:01:08 2026]  ret_from_fork_asm+0x1a/0x30
[Thu Jan 15 12:01:08 2026]  </TASK>
[Thu Jan 15 12:01:08 2026] INFO: task nfsd:7310 <reader> blocked on an rw-semaphore likely owned by task kworker/u96:95:41195 <writer>
[Thu Jan 15 12:01:08 2026] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

[-- Attachment #3: knfsd-cachefiles-deadlock-analysis.md --]
[-- Type: text/plain, Size: 5230 bytes --]

# Analysis of KNFSD/CacheFiles Deadlock

This document provides an analysis of a deadlock issue observed on Ubuntu 24.04 KNFSD nodes
running a custom Linux kernel 6.19-rc5. The deadlock involves the NFS server (nfsd),
CacheFiles/netfs layer, ext4 journaling, and memory compaction.

## Summary

The system enters a deadlock state where:

1. Multiple **nfsd threads** are blocked waiting for ext4 inode rw-semaphores
2. Those rw-semaphores are held by **kworker threads** (writers)
3. The **jbd2 journal thread** for the cache filesystem is blocked waiting for updates
4. **kcompactd** (memory compaction) is blocked waiting for an NFS folio to release fscache state
5. The **fscache cookie state machine** is timing out

## Environment

- **Kernel**: Linux 6.19.0-rc5-knfsd (custom build)
- **Platform**: Amazon EC2 (24 CPUs, 192GB RAM)
- **Configuration**: 256 nfsd threads
- **Cache Filesystem**: ext4 on md127 RAID array

## Detailed Breakdown

### 1. Initial Symptom: Cookie State Timeouts (11:59:09)

The first warning signs appear as fscache cookie state timeout errors:

```text
netfs: fscache_begin_operation: cookie state change wait timed out: cookie->state=1 state=1
```

These errors indicate that fscache operations are waiting for cookies to transition to the
correct state (`FSCACHE_COOKIE_STATE_ACTIVE`), but the state transitions are stalled.

### 2. The Deadlock Chain (12:01:08)

#### kcompactd0 (memory compaction daemon)

```text
folio_wait_private_2+0x2c/0x60
nfs_release_folio+0x61/0x130 [nfs]
filemap_release_folio+0x68/0xa0
__folio_split+0x178/0x8e0
```

The compaction daemon is trying to migrate/split NFS folios but is blocked in
`folio_wait_private_2()`. This waits for the `PG_fscache` flag to clear, which happens when
fscache completes its I/O operations on the folio. However, fscache operations are stuck.

#### jbd2/md127-8 (ext4 journal for cache filesystem)

```text
jbd2_journal_wait_updates+0x6e/0xe0
jbd2_journal_commit_transaction+0x26e/0x1730
```

The journal commit is waiting for outstanding updates to complete, but those updates are
blocked.

#### nfsd threads (multiple patterns)

**Pattern A** - Blocked on ext4 inode rwsem:

```text
rwsem_down_read_slowpath+0x278/0x500
down_read+0x41/0xb0
ext4_llseek+0xfc/0x120            <-- needs inode->i_rwsem for SEEK_DATA/SEEK_HOLE
vfs_llseek+0x1c/0x40
cachefiles_do_prepare_read        <-- checking what's cached
```

The kernel logs explicitly identify the blocking relationship:

- `nfsd:7300 <reader> blocked on an rw-semaphore likely owned by task kworker/u96:8:37820 <writer>`

**Pattern B** - Blocked on jbd2 transaction:

```text
wait_transaction_locked+0x87/0xd0
add_transaction_credits+0x1e0/0x360
jbd2__journal_start
ext4_dirty_inode                  <-- updating atime on cache file
cachefiles_read
```

### 3. The Circular Dependency

```text
  nfsd read request
       │
       ▼
  nfs_readahead → netfs_readahead → cachefiles_prepare_read
       │
       ▼
  ext4_llseek (needs i_rwsem READ lock)
       │
       ▼
  BLOCKED: kworkers hold i_rwsem as WRITER
       │
       ├─────────────────────────────────────────────┐
       │                                             │
       ▼                                             ▼
  Those kworkers are likely doing              jbd2 waiting for
  cachefiles write operations                  updates to complete
       │                                             │
       ▼                                             │
  Waiting for journal                                │
       │                                             │
       └──────────────────►◄─────────────────────────┘
                           │
                           ▼
                    Memory pressure triggers
                    kcompactd which needs to
                    release NFS folios
                           │
                           ▼
                    nfs_release_folio waits for
                    fscache (PG_fscache/private_2)
                           │
                           ▼
                    fscache operations are stuck
                    waiting for the blocked operations above
```

## Root Cause Analysis

1. **ext4_llseek holding i_rwsem**: When CacheFiles uses `SEEK_DATA`/`SEEK_HOLE` to check
   what's cached, it takes the inode's rw-semaphore. If writers (kworkers doing cache writes)
   hold this semaphore, readers (the prepare_read path) block.

2. **Journal contention**: The ext4 journal on the cache filesystem (md127) is under heavy
   load. With 256 nfsd threads all potentially doing cache operations, journal contention
   is severe.

3. **Memory pressure + folio release**: When memory compaction tries to free NFS folios that
   have active fscache state, it must wait for fscache to complete—but fscache is blocked
   on ext4.

4. **Cookie state machine stalls**: The fscache cookie state transitions are blocked because
   the underlying I/O can't complete, leading to the timeout messages.

Thread overview: 8+ messages
2025-12-19 15:59 fscache/NFS re-export server lockup Mike Owen
2025-12-19 17:01 ` Daire Byrne
2026-01-09 21:45   ` Mike Owen
2026-01-10  9:48     ` Mike Snitzer
2026-01-12 15:16       ` Mike Owen
2026-01-15 15:20         ` Mike Owen [this message]
2026-01-15 21:53           ` Mike Snitzer
2026-03-09 11:35             ` Mike Owen
