From: Mike Snitzer <snitzer@kernel.org>
To: Mike Owen <mjnowen@gmail.com>
Cc: Daire Byrne <daire@dneg.com>,
	dhowells@redhat.com, linux-nfs@vger.kernel.org,
	netfs@lists.linux.dev, trondmy@hammerspace.com,
	okorniev@redhat.com
Subject: Re: fscache/NFS re-export server lockup
Date: Sat, 10 Jan 2026 04:48:36 -0500
Message-ID: <aWIgdDImfFg6fgxn@kernel.org>
In-Reply-To: <CADFF-zcFgycZ7c0KC_5eUafjvba_ZxhzED0a7yDR4oip4_KxbA@mail.gmail.com>

Hi Mike,

On Fri, Jan 09, 2026 at 09:45:47PM +0000, Mike Owen wrote:
> Hi Daire, thanks for the comments.
> 
> > Can you stop the nfs server and is access to /var/cache/fscache still blocked?
> As the machine is deadlocked, after reboot (so the nfs server is
> definitely stopped), the actual data is gone/corrupted.
> 
> > And I presume there is definitely nothing else that might be
> interacting with that /var/cache/fscache filesystem outside of fscache
> or cachefilesd?
> Correct. Machine is dedicated to KNFSD caching duties.
> 
> > Our /etc/cachefilesd.conf is pretty basic (brun 30%, bcull 10%, bstop 3%).
> Similar settings here:
> brun 20%
> bcull 7%
> bstop 3%
> frun 20%
> fcull 7%
> fstop 3%
> Although I should note that the issue happens when only ~10-20% of the
> NVMe capacity is used, so culling has never had to run at this point.
> 
> We did try running 6.17.0, but it made no difference. I see another thread
> of yours with Chuck: "refcount_t underflow (nfsd4_sequence_done?) with
> v6.18 re-export"
> and suggested commits to investigate, incl: cbfd91d22776 ("nfsd: never
> defer requests during idmap lookup") as well as try using 6.18.4, so
> it's possible there is a cascading issue here and we are in need of
> some NFS patches.
> 
> I'm hoping @dhowells might have some suggestions on how to further
> debug this issue, given the below stack we are seeing when it
> deadlocks?
> 
> Thanks,
> -Mike
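
To make the culling point above concrete, here is a minimal sketch,
assuming the cachefilesd.conf semantics where brun/bcull/bstop are
percentages of free blocks on the cache filesystem. The function name
and the returned state labels are hypothetical, chosen only for
illustration; the defaults are the values quoted above.

```python
def cull_state(free_pct, brun=20.0, bcull=7.0, bstop=3.0):
    """Classify free block space against cachefilesd-style limits.

    Illustrative only: the state labels are made up for this sketch.
    """
    if not (bstop < bcull < brun):
        raise ValueError("expected bstop < bcull < brun")
    if free_pct < bstop:
        return "no-alloc"    # below bstop: no further cache allocation
    if free_pct < bcull:
        return "culling"     # below bcull: culling is started
    if free_pct > brun:
        return "no-culling"  # above brun: culling is turned off
    return "between"         # between bcull and brun: depends on prior state


if __name__ == "__main__":
    # Roughly 80-90% free, assuming the cache is the only consumer of the
    # NVMe filesystem (an assumption, not stated in the thread).
    print(cull_state(85.0))
```

With only ~10-20% of the device in use, free space sits far above
brun, which matches the observation that culling never ran.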

This commit from Trond, which he'll be sending to Linus soon as part
of his 6.19-rc NFS client fixes pull request, should fix the NFS
re-export induced nfs_release_folio deadlock shown in the stack trace
you posted (quoted below):
https://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=cce0be6eb4971456b703aaeafd571650d314bcca

Here is more context on why I think that is likely: the same commit
also fixed my nasty LOCALIO-based re-export deadlock:
https://lore.kernel.org/linux-nfs/20260107160858.6847-1-snitzer@kernel.org/

I'm doing my part to advocate that Red Hat (Olga cc'd) take this
fix into RHEL 10.2 (and backport as needed).

Good luck getting Ubuntu to include this fix in a timely manner (we'll
all thank you for that if you can help shake the Canonical tree).

BTW, you'd do well to fix your editor/email so that it doesn't line
wrap when you share logs on Linux mailing lists:

> 2025-12-03T15:57:25.438905+00:00 ip-172-23-113-43 kernel: INFO: task
> kcompactd0:171 blocked for more than 122 seconds.
> 2025-12-03T15:57:25.438921+00:00 ip-172-23-113-43 kernel:
> Tainted: G           OE      6.14.0-36-generic #36~24.04.1-Ubuntu
> 2025-12-03T15:57:25.438928+00:00 ip-172-23-113-43 kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> 2025-12-03T15:57:25.439995+00:00 ip-172-23-113-43 kernel:
> task:kcompactd0      state:D stack:0     pid:171   tgid:171   ppid:2
>    task_flags:0x210040 flags:0x00004000
> 2025-12-03T15:57:25.440000+00:00 ip-172-23-113-43 kernel: Call Trace:
> 2025-12-03T15:57:25.440000+00:00 ip-172-23-113-43 kernel:  <TASK>
> 2025-12-03T15:57:25.440003+00:00 ip-172-23-113-43 kernel:
> __schedule+0x2cf/0x640
> 2025-12-03T15:57:25.441017+00:00 ip-172-23-113-43 kernel:  schedule+0x29/0xd0
> 2025-12-03T15:57:25.441022+00:00 ip-172-23-113-43 kernel:  io_schedule+0x4c/0x80
> 2025-12-03T15:57:25.441023+00:00 ip-172-23-113-43 kernel:
> folio_wait_bit_common+0x138/0x310
> 2025-12-03T15:57:25.441023+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_wake_page_function+0x10/0x10
> 2025-12-03T15:57:25.441024+00:00 ip-172-23-113-43 kernel:
> folio_wait_private_2+0x2c/0x60
> 2025-12-03T15:57:25.441025+00:00 ip-172-23-113-43 kernel:
> nfs_release_folio+0xa0/0x120 [nfs]
> 2025-12-03T15:57:25.441032+00:00 ip-172-23-113-43 kernel:
> filemap_release_folio+0x68/0xa0
> 2025-12-03T15:57:25.441033+00:00 ip-172-23-113-43 kernel:
> split_huge_page_to_list_to_order+0x401/0x970
> 2025-12-03T15:57:25.441033+00:00 ip-172-23-113-43 kernel:  ?
> compaction_alloc_noprof+0x1c5/0x2f0
> 2025-12-03T15:57:25.441034+00:00 ip-172-23-113-43 kernel:
> split_folio_to_list+0x22/0x70
> 2025-12-03T15:57:25.441035+00:00 ip-172-23-113-43 kernel:
> migrate_pages_batch+0x2f2/0xa70
> 2025-12-03T15:57:25.441035+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_compaction_free+0x10/0x10
> 2025-12-03T15:57:25.441038+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_compaction_alloc+0x10/0x10
> 2025-12-03T15:57:25.441039+00:00 ip-172-23-113-43 kernel:  ?
> __mod_memcg_lruvec_state+0xf4/0x250
> 2025-12-03T15:57:25.441039+00:00 ip-172-23-113-43 kernel:  ?
> migrate_pages_batch+0x5e8/0xa70
> 2025-12-03T15:57:25.441040+00:00 ip-172-23-113-43 kernel:
> migrate_pages_sync+0x84/0x1e0
> 2025-12-03T15:57:25.441040+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_compaction_free+0x10/0x10
> 2025-12-03T15:57:25.441041+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_compaction_alloc+0x10/0x10
> 2025-12-03T15:57:25.441044+00:00 ip-172-23-113-43 kernel:
> migrate_pages+0x38f/0x4c0
> 2025-12-03T15:57:25.441047+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_compaction_free+0x10/0x10
> 2025-12-03T15:57:25.441048+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_compaction_alloc+0x10/0x10
> 2025-12-03T15:57:25.441048+00:00 ip-172-23-113-43 kernel:
> compact_zone+0x385/0x700
> 2025-12-03T15:57:25.441049+00:00 ip-172-23-113-43 kernel:  ?
> isolate_migratepages_range+0xc1/0xf0
> 2025-12-03T15:57:25.441049+00:00 ip-172-23-113-43 kernel:
> kcompactd_do_work+0xfc/0x240
> 2025-12-03T15:57:25.441050+00:00 ip-172-23-113-43 kernel:  kcompactd+0x43f/0x4a0
> 2025-12-03T15:57:25.441052+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_autoremove_wake_function+0x10/0x10
> 2025-12-03T15:57:25.441053+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_kcompactd+0x10/0x10
> 2025-12-03T15:57:25.441053+00:00 ip-172-23-113-43 kernel:  kthread+0xfe/0x230
> 2025-12-03T15:57:25.441054+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_kthread+0x10/0x10
> 2025-12-03T15:57:25.441054+00:00 ip-172-23-113-43 kernel:
> ret_from_fork+0x47/0x70
> 2025-12-03T15:57:25.441055+00:00 ip-172-23-113-43 kernel:  ?
> __pfx_kthread+0x10/0x10
> 2025-12-03T15:57:25.441057+00:00 ip-172-23-113-43 kernel:
> ret_from_fork_asm+0x1a/0x30
> 2025-12-03T15:57:25.441058+00:00 ip-172-23-113-43 kernel:  </TASK>
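
If a log has already been wrapped like the one above, it can usually
be stitched back together before reposting. The following is a rough
sketch, assuming every log record starts with an ISO-8601 timestamp
and that wrapped fragments do not; the helper name unwrap_log is
hypothetical. Joining is lossy with respect to the original
whitespace (fragments are rejoined with a single space).

```python
import re

# A new record starts with a timestamp such as
# 2025-12-03T15:57:25.438905+00:00 followed by a space.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2} "
)


def unwrap_log(text):
    """Rejoin mail-wrapped log lines into one string per record."""
    records = []
    for raw in text.splitlines():
        # Drop a leading email quote marker ("> ") and surrounding spaces.
        line = re.sub(r"^>\s?", "", raw).strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)          # start of a new record
        else:
            records[-1] += " " + line     # continuation of the previous one
    return records


if __name__ == "__main__":
    sample = (
        "> 2025-12-03T15:57:25.440003+00:00 ip-172-23-113-43 kernel:\n"
        "> __schedule+0x2cf/0x640\n"
        "> 2025-12-03T15:57:25.441017+00:00 ip-172-23-113-43 kernel:  schedule+0x29/0xd0\n"
    )
    for record in unwrap_log(sample):
        print(record)
```

This only repairs text after the fact; configuring the mail client
not to wrap pasted logs remains the better fix.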
