public inbox for linux-nfs@vger.kernel.org
From: Mike Snitzer <snitzer@kernel.org>
To: Anna Schumaker <anna@kernel.org>,
	Trond Myklebust <trondmy@hammerspace.com>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>, NeilBrown <neil@brown.name>
Cc: linux-nfs@vger.kernel.org
Subject: [for-6.16-final PATCH 0/9] NFSD/NFS/LOCALIO: stable fixes and revert 6.16 LOCALIO changes
Date: Sun, 13 Jul 2025 23:13:50 -0400
Message-ID: <20250714031359.10192-1-snitzer@kernel.org>
In-Reply-To: <aG0pJXVtApZ9C5vy@kernel.org>

[Apologies for so many words...]

Hi,

I wanted to get this on all the NFS and NFSD maintainers' radar ASAP.

I realize the timing of this is not great due to how late we are in
the 6.16 release cycle (v6.16-rc7).  But I feel it prudent to make it
clear that the LOCALIO changes that went upstream during the 6.16 merge
window are unstable under load.  So this week we'll need to make a
call on how to handle this for v6.16 final.

And just FYI: I unfortunately don't have time this week to assist with
developing/testing a smaller fix to solve this situation.  The window
for extensive testing (by myself and others at Hammerspace) was late
last week.  At this point, given we are short on time, reverting is
the sane thing to do.

Also, the 6.16-rc7 release's LOCALIO changes put it on something of an
island relative to the more enterprise production kernels I am helping
to maintain (the RHEL10 kernel and Oracle's OCI kernel, which is
actually an Ubuntu kernel, both have NFS LOCALIO that is 6.14 based).

All that said:

The past few weeks I had to assist with an HPC benchmarking effort
that generates heavy load using the "MLPerf" benchmark suite. Testing
was done on 10 enterprise grade NVMe storage systems (each with 48
CPUs, and 8 NVMe devices) that depend on LOCALIO to "just work
_well_" to achieve a favorable score.  Unfortunately LOCALIO didn't,
so I got to reverting.  I started with this partial revert patch, but
it wasn't enough (it just made the problem harder to hit).  Labeling
that earlier revert proposal as "RFC" rather than "URGENT" was a mistake:
https://lore.kernel.org/linux-nfs/aG0pJXVtApZ9C5vy@kernel.org/
(which is very similar to patch 2 in this series)

It wasn't until I did a full revert of 6.16's LOCALIO changes that
LOCALIO stopped having resource leaks (nfsd_file in particular) that
prevented proper NFSD shutdown and made it impossible to unload nfsd.ko
(which I had to do a lot of while developing other NFS and NFSD
changes that were unrelated to LOCALIO).

Neil, I value the work you did to try to address the lingering
complaints about RCU related compiler errors in LOCALIO (but when you
posted your changes months ago I didn't have time to review, and then
they went upstream; so I assumed they were ready and made sure to
include them in Hammerspace's more recent kernels so that I could gain
"production" confidence in the changes even though I still hadn't had
time to review them properly.. ugh).  Glad "we" did this heavy load
testing because otherwise we'd be oblivious to the LOCALIO changes
merged for 6.16 causing a regression. (I'm sending this late on my
Sunday evening in the hope that you being in Australia enables us to
not lose a day of communication on this situation).

Patch 2 gets into how simple it is to trigger the nfsd_file leaks:
run fio, then shut down NFSD and remove the nfsd.ko module.

Regards,
Mike

Mike Snitzer (9):
  Revert "NFSD: Clean up kdoc for nfsd_open_local_fh()"
  Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer"
  Revert "nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()"
  Revert "nfs_localio: duplicate nfs_close_local_fh()"
  Revert "nfs_localio: simplify interface to nfsd for getting nfsd_file"
  Revert "nfs_localio: always hold nfsd net ref with nfsd_file ref"
  Revert "nfs_localio: use cmpxchg() to install new nfs_file_localio"
  nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local()
  nfs/localio: add localio_async_probe modparm

 fs/nfs/localio.c           | 64 ++++++++++++++++--------
 fs/nfs_common/nfslocalio.c | 99 +++++++++++++-------------------------
 fs/nfsd/filecache.c        | 34 ++-----------
 fs/nfsd/filecache.h        |  3 +-
 fs/nfsd/localio.c          | 44 ++---------------
 include/linux/nfslocalio.h | 26 +++++-----
 6 files changed, 100 insertions(+), 170 deletions(-)

-- 
2.44.0



Thread overview: 56+ messages
2025-05-09  0:46 [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers NeilBrown
2025-05-09  0:46 ` [PATCH 1/6] nfs_localio: use cmpxchg() to install new nfs_file_localio NeilBrown
2025-05-09  0:46 ` [PATCH 2/6] nfs_localio: always hold nfsd net ref with nfsd_file ref NeilBrown
2025-05-09  0:46 ` [PATCH 3/6] nfs_localio: simplify interface to nfsd for getting nfsd_file NeilBrown
2025-05-09  0:46 ` [PATCH 4/6] nfs_localio: duplicate nfs_close_local_fh() NeilBrown
2025-05-09  0:46 ` [PATCH 5/6] nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh() NeilBrown
2025-05-09  0:46 ` [PATCH 6/6] nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer NeilBrown
2025-05-09 11:03   ` kernel test robot
2025-07-08 14:20   ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
2025-07-14  3:13     ` Mike Snitzer [this message]
2025-07-14  3:13       ` [for-6.16-final PATCH 1/9] Revert "NFSD: Clean up kdoc for nfsd_open_local_fh()" Mike Snitzer
2025-07-14  3:13       ` [for-6.16-final PATCH 2/9] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" Mike Snitzer
2025-07-14  3:13       ` [for-6.16-final PATCH 3/9] Revert "nfs_localio: protect race between nfs_uuid_put() and nfs_close_local_fh()" Mike Snitzer
2025-07-14  3:13       ` [for-6.16-final PATCH 4/9] Revert "nfs_localio: duplicate nfs_close_local_fh()" Mike Snitzer
2025-07-14  3:13       ` [for-6.16-final PATCH 5/9] Revert "nfs_localio: simplify interface to nfsd for getting nfsd_file" Mike Snitzer
2025-07-14  3:13       ` [for-6.16-final PATCH 6/9] Revert "nfs_localio: always hold nfsd net ref with nfsd_file ref" Mike Snitzer
2025-07-14  3:13       ` [for-6.16-final PATCH 7/9] Revert "nfs_localio: use cmpxchg() to install new nfs_file_localio" Mike Snitzer
2025-07-14  3:13       ` [for-6.16-final PATCH 8/9] nfs/localio: avoid bouncing LOCALIO if nfs_client_is_local() Mike Snitzer
2025-07-14  4:19         ` NeilBrown
2025-07-14 14:37           ` Mike Snitzer
2025-07-14 12:23         ` Jeff Layton
2025-07-14  3:13       ` [for-6.16-final PATCH 9/9] nfs/localio: add localio_async_probe modparm Mike Snitzer
2025-07-14  4:23         ` NeilBrown
2025-07-14 12:28           ` Jeff Layton
2025-07-14 14:08             ` Mike Snitzer
2025-07-14  3:50     ` [RFC PATCH for 6.16-rcX] Revert "nfs_localio: change nfsd_file_put_local() to take a pointer to __rcu pointer" NeilBrown
2025-07-14 14:45       ` Mike Snitzer
2025-07-15 22:52     ` [PATCH 0/3] Fix localio hangs Trond Myklebust
2025-07-15 22:52       ` [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
2025-07-15 22:52       ` [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events Trond Myklebust
2025-07-15 22:52       ` [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file Trond Myklebust
2025-07-16  1:09       ` [PATCH 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed NeilBrown
2025-07-16  1:22       ` [PATCH 2/3] NFS/localio: nfs_uuid_put() fix the wait for file unlink events NeilBrown
2025-07-16  2:29         ` Trond Myklebust
2025-07-16  3:51           ` NeilBrown
2025-07-16  1:31       ` [PATCH 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file NeilBrown
2025-07-16  4:17         ` Trond Myklebust
2025-07-16  5:07           ` NeilBrown
2025-07-16 15:19             ` Trond Myklebust
2025-07-16 15:59       ` [PATCH v2 0/3] Fix localio hangs Trond Myklebust
2025-07-16 15:59         ` [PATCH v2 1/3] NFS/localio: nfs_close_local_fh() fix check for file closed Trond Myklebust
2025-07-16 15:59         ` [PATCH v2 2/3] NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh() Trond Myklebust
2025-07-16 15:59         ` [PATCH v2 3/3] NFS/localio: nfs_uuid_put() fix the wake up after unlinking the file Trond Myklebust
2025-07-16 22:09         ` [PATCH v2 0/3] Fix localio hangs NeilBrown
2025-07-16 23:27           ` Mike Snitzer
2025-07-18  0:18             ` NeilBrown
2025-05-09 16:01 ` [PATCH 0/6 v2] nfs_localio: fixes for races and errors from older compilers Chuck Lever
2025-05-09 21:02   ` Mike Snitzer
2025-05-10  0:16     ` Paul E. McKenney
2025-05-10  2:44       ` NeilBrown
2025-05-10  3:01   ` NeilBrown
2025-05-10 16:02     ` Chuck Lever
2025-05-10 19:57       ` Mike Snitzer
2025-05-16 15:33         ` Chuck Lever
2025-05-18 10:46           ` Pali Rohár
2025-05-19  3:49         ` NeilBrown
