public inbox for linux-nfs@vger.kernel.org
From: Mike Snitzer <snitzer@kernel.org>
To: Mike Owen <mjnowen@gmail.com>
Cc: Daire Byrne <daire@dneg.com>,
	dhowells@redhat.com, linux-nfs@vger.kernel.org,
	netfs@lists.linux.dev, trondmy@hammerspace.com,
	okorniev@redhat.com
Subject: Re: fscache/NFS re-export server lockup
Date: Sat, 10 Jan 2026 04:48:36 -0500	[thread overview]
Message-ID: <aWIgdDImfFg6fgxn@kernel.org>
In-Reply-To: <CADFF-zcFgycZ7c0KC_5eUafjvba_ZxhzED0a7yDR4oip4_KxbA@mail.gmail.com>

Hi Mike,

On Fri, Jan 09, 2026 at 09:45:47PM +0000, Mike Owen wrote:
> Hi Daire, thanks for the comments.
> 
> > Can you stop the nfs server and is access to /var/cache/fscache still blocked?
> As the machine is deadlocked, we can only reboot; after the reboot
> (so the nfs server is definitely stopped) the actual data is
> gone/corrupted.
> 
> > And I presume there is definitely nothing else that might be
> > interacting with that /var/cache/fscache filesystem outside of fscache
> > or cachefilesd?
> Correct. Machine is dedicated to KNFSD caching duties.
> 
> > Our /etc/cachefilesd.conf is pretty basic (brun 30%, bcull 10%, bstop 3%).
> Similar settings here:
> brun 20%
> bcull 7%
> bstop 3%
> frun 20%
> fcull 7%
> fstop 3%
> Although I should note that the issue happens when only ~10-20% of the
> NVMe capacity is used, so culling has never had to run at this point.
> 
> We did try running 6.17.0, but it made no difference. I see another
> thread of yours with Chuck, "refcount_t underflow (nfsd4_sequence_done?)
> with v6.18 re-export", which suggested commits to investigate, including
> cbfd91d22776 ("nfsd: never defer requests during idmap lookup"), as well
> as trying 6.18.4, so it's possible there is a cascading issue here and
> we are in need of some NFS patches.
> 
> I'm hoping @dhowells might have some suggestions on how to further
> debug this issue, given the stack below that we are seeing when it
> deadlocks.
> 
> Thanks,
> -Mike

This commit from Trond, which he'll be sending to Linus soon as part
of his 6.19-rc NFS client fixes pull request, should fix the NFS
re-export-induced nfs_release_folio deadlock shown in your stack
trace below:
https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=cce0be6eb4971456b703aaeafd571650d314bcca
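
If you want to test ahead of that landing in mainline, here is a rough
sketch of pulling the commit into whatever tree you build from
(untested here; the clone URL is my guess from the gitweb link above,
and the remote name is arbitrary):

  # add Trond's tree and cherry-pick the fix onto your build branch
  git remote add trondmy git://git.linux-nfs.org/projects/trondmy/linux-nfs.git
  git fetch trondmy
  git cherry-pick cce0be6eb4971456b703aaeafd571650d314bcca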

For more context on why I expect that to be the case, the same commit
fixed my nasty LOCALIO-based re-export deadlock too:
https://lore.kernel.org/linux-nfs/20260107160858.6847-1-snitzer@kernel.org/

I'm doing my part to advocate that Red Hat (Olga cc'd) take this
fix into RHEL 10.2 (and backport as needed).

Good luck getting Ubuntu to include this fix in a timely manner (we'll
all thank you for that if you can help shake the Canonical tree).

BTW, you'd do well to fix your editor/email client so that it doesn't
line-wrap logs when you share them on Linux mailing lists (your trace
arrived wrapped mid-line):

> 2025-12-03T15:57:25.438905+00:00 ip-172-23-113-43 kernel: INFO: task kcompactd0:171 blocked for more than 122 seconds.
> 2025-12-03T15:57:25.438921+00:00 ip-172-23-113-43 kernel: Tainted: G           OE      6.14.0-36-generic #36~24.04.1-Ubuntu
> 2025-12-03T15:57:25.438928+00:00 ip-172-23-113-43 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2025-12-03T15:57:25.439995+00:00 ip-172-23-113-43 kernel: task:kcompactd0      state:D stack:0     pid:171   tgid:171   ppid:2      task_flags:0x210040 flags:0x00004000
> 2025-12-03T15:57:25.440000+00:00 ip-172-23-113-43 kernel: Call Trace:
> 2025-12-03T15:57:25.440000+00:00 ip-172-23-113-43 kernel:  <TASK>
> 2025-12-03T15:57:25.440003+00:00 ip-172-23-113-43 kernel:  __schedule+0x2cf/0x640
> 2025-12-03T15:57:25.441017+00:00 ip-172-23-113-43 kernel:  schedule+0x29/0xd0
> 2025-12-03T15:57:25.441022+00:00 ip-172-23-113-43 kernel:  io_schedule+0x4c/0x80
> 2025-12-03T15:57:25.441023+00:00 ip-172-23-113-43 kernel:  folio_wait_bit_common+0x138/0x310
> 2025-12-03T15:57:25.441023+00:00 ip-172-23-113-43 kernel:  ? __pfx_wake_page_function+0x10/0x10
> 2025-12-03T15:57:25.441024+00:00 ip-172-23-113-43 kernel:  folio_wait_private_2+0x2c/0x60
> 2025-12-03T15:57:25.441025+00:00 ip-172-23-113-43 kernel:  nfs_release_folio+0xa0/0x120 [nfs]
> 2025-12-03T15:57:25.441032+00:00 ip-172-23-113-43 kernel:  filemap_release_folio+0x68/0xa0
> 2025-12-03T15:57:25.441033+00:00 ip-172-23-113-43 kernel:  split_huge_page_to_list_to_order+0x401/0x970
> 2025-12-03T15:57:25.441033+00:00 ip-172-23-113-43 kernel:  ? compaction_alloc_noprof+0x1c5/0x2f0
> 2025-12-03T15:57:25.441034+00:00 ip-172-23-113-43 kernel:  split_folio_to_list+0x22/0x70
> 2025-12-03T15:57:25.441035+00:00 ip-172-23-113-43 kernel:  migrate_pages_batch+0x2f2/0xa70
> 2025-12-03T15:57:25.441035+00:00 ip-172-23-113-43 kernel:  ? __pfx_compaction_free+0x10/0x10
> 2025-12-03T15:57:25.441038+00:00 ip-172-23-113-43 kernel:  ? __pfx_compaction_alloc+0x10/0x10
> 2025-12-03T15:57:25.441039+00:00 ip-172-23-113-43 kernel:  ? __mod_memcg_lruvec_state+0xf4/0x250
> 2025-12-03T15:57:25.441039+00:00 ip-172-23-113-43 kernel:  ? migrate_pages_batch+0x5e8/0xa70
> 2025-12-03T15:57:25.441040+00:00 ip-172-23-113-43 kernel:  migrate_pages_sync+0x84/0x1e0
> 2025-12-03T15:57:25.441040+00:00 ip-172-23-113-43 kernel:  ? __pfx_compaction_free+0x10/0x10
> 2025-12-03T15:57:25.441041+00:00 ip-172-23-113-43 kernel:  ? __pfx_compaction_alloc+0x10/0x10
> 2025-12-03T15:57:25.441044+00:00 ip-172-23-113-43 kernel:  migrate_pages+0x38f/0x4c0
> 2025-12-03T15:57:25.441047+00:00 ip-172-23-113-43 kernel:  ? __pfx_compaction_free+0x10/0x10
> 2025-12-03T15:57:25.441048+00:00 ip-172-23-113-43 kernel:  ? __pfx_compaction_alloc+0x10/0x10
> 2025-12-03T15:57:25.441048+00:00 ip-172-23-113-43 kernel:  compact_zone+0x385/0x700
> 2025-12-03T15:57:25.441049+00:00 ip-172-23-113-43 kernel:  ? isolate_migratepages_range+0xc1/0xf0
> 2025-12-03T15:57:25.441049+00:00 ip-172-23-113-43 kernel:  kcompactd_do_work+0xfc/0x240
> 2025-12-03T15:57:25.441050+00:00 ip-172-23-113-43 kernel:  kcompactd+0x43f/0x4a0
> 2025-12-03T15:57:25.441052+00:00 ip-172-23-113-43 kernel:  ? __pfx_autoremove_wake_function+0x10/0x10
> 2025-12-03T15:57:25.441053+00:00 ip-172-23-113-43 kernel:  ? __pfx_kcompactd+0x10/0x10
> 2025-12-03T15:57:25.441053+00:00 ip-172-23-113-43 kernel:  kthread+0xfe/0x230
> 2025-12-03T15:57:25.441054+00:00 ip-172-23-113-43 kernel:  ? __pfx_kthread+0x10/0x10
> 2025-12-03T15:57:25.441054+00:00 ip-172-23-113-43 kernel:  ret_from_fork+0x47/0x70
> 2025-12-03T15:57:25.441055+00:00 ip-172-23-113-43 kernel:  ? __pfx_kthread+0x10/0x10
> 2025-12-03T15:57:25.441057+00:00 ip-172-23-113-43 kernel:  ret_from_fork_asm+0x1a/0x30
> 2025-12-03T15:57:25.441058+00:00 ip-172-23-113-43 kernel:  </TASK>
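
In the meantime, if the box wedges again, the standard hung-task and
sysrq knobs should get you more complete state to post (a rough
sketch, run as root; the output filename is just an example):

  # don't let the hung task detector go quiet after a few reports
  echo -1 > /proc/sys/kernel/hung_task_warnings
  # dump all tasks stuck in uninterruptible (D) state to the kernel log
  echo w > /proc/sysrq-trigger
  # collect the result
  dmesg > /tmp/blocked-tasks.txt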
