Linux NFS development
From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>, NeilBrown <neil@brown.name>,
	 Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>,  Tom Talpey <tom@talpey.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	 Scott Mayhew <smayhew@redhat.com>,
	 Trond Myklebust <Trond.Myklebust@netapp.com>,
	 Andreas Gruenbacher <agruen@suse.de>,
	Mike Snitzer <snitzer@kernel.org>,
	 Rick Macklem <rmacklem@uoguelph.ca>
Cc: Chris Mason <clm@meta.com>,
	linux-nfs@vger.kernel.org,  linux-kernel@vger.kernel.org,
	Jeff Layton <jlayton@kernel.org>
Subject: [PATCH 03/10] nfsd: serialize nfsd4_end_grace() with atomic test-and-set
Date: Thu, 28 May 2026 17:55:14 -0400
Message-ID: <20260528-nfsd-fixes-v1-3-e78708eff77d@kernel.org>
In-Reply-To: <20260528-nfsd-fixes-v1-0-e78708eff77d@kernel.org>

From: Chris Mason <clm@meta.com>

nfsd4_end_grace() guards its drain path with a plain bool:

    if (nn->grace_ended)
            return;
    nn->grace_ended = true;

The read and the write are separate, non-atomic steps, and nothing
in struct nfsd_net serializes them.  At least two contexts can
reach this code with no lock held:

    laundromat path
      laundry_wq kworker
        nfs4_laundromat()
          nfsd4_end_grace()

    RECLAIM_COMPLETE path
      nfsd compound kthread
        nfsd4_reclaim_complete()
          inc_reclaim_complete()
            nfsd4_end_grace()

Both callers can observe grace_ended == false on different CPUs,
both store true, and both proceed into nfsd4_record_grace_done(),
which invokes the active client_tracking_ops->grace_done callback.
For tracking ops that drain reclaim_str_hashtbl (legacy_tracking_ops
via nfsd4_recdir_purge_old, and the cld v1+ ops via
nfsd4_cld_grace_done), grace_done calls nfs4_release_reclaim(),
which walks every bucket of reclaim_str_hashtbl with no lock and
calls nfs4_remove_reclaim_record() (list_del + kfree) on each
entry.  Two concurrent walkers corrupt the list and double-free
every nfs4_client_reclaim.  A concurrent nfsd4_find_reclaim_client()
iterating the same bucket reads through freed memory.
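
The interleaving described above can be replayed by hand in a small
userspace model.  This is only a sketch, not kernel code: struct
racy_net and the racy_* helpers are hypothetical stand-ins for
struct nfsd_net and the unpatched guard, and the "drain" is reduced
to a counter.  Both callers complete their read before either one
stores, so both enter the drain.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the grace_ended field in struct nfsd_net. */
struct racy_net {
	bool grace_ended;
	int drains;		/* how many callers entered the drain path */
};

/* Step 1 of the unpatched guard: the plain read. */
bool racy_check(const struct racy_net *nn)
{
	return nn->grace_ended;
}

/* Step 2 of the unpatched guard: the plain store, then the drain. */
void racy_set_and_drain(struct racy_net *nn)
{
	nn->grace_ended = true;
	nn->drains++;
}

/*
 * Replay the bad interleaving sequentially: both callers read the
 * flag before either performs its store.  Returns the number of
 * times the drain path was entered.
 */
int racy_interleaving(void)
{
	struct racy_net nn = { .grace_ended = false, .drains = 0 };
	bool a_saw_ended = racy_check(&nn);	/* laundromat caller */
	bool b_saw_ended = racy_check(&nn);	/* RECLAIM_COMPLETE caller */

	if (!a_saw_ended)
		racy_set_and_drain(&nn);
	if (!b_saw_ended)
		racy_set_and_drain(&nn);
	return nn.drains;
}
```

In this model racy_interleaving() reports two drains, which
corresponds to the two concurrent walkers of reclaim_str_hashtbl.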

A third call site exists in nfs4_state_start_net() on the
skip_grace startup path, but it runs under nfsd_mutex before any
client has connected and before the laundromat's first delayed
work fires, so it cannot race with the two callers above.

Fix by replacing the read/write pair with try_cmpxchg() so exactly
one caller transitions grace_ended from false to true and proceeds
into the drain; the loser returns immediately.  bool supports
1-byte cmpxchg on all supported architectures, and no lock
ordering changes are needed.
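
The test-and-set behaviour can be sketched in userspace with C11
atomics.  This is an analogue under stated assumptions, not the
kernel implementation: atomic_compare_exchange_strong() stands in
for try_cmpxchg() (both return whether the swap happened and write
the observed value back into the expected variable on failure), and
struct cas_net and the cas_* helpers are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct nfsd_net. */
struct cas_net {
	atomic_bool grace_ended;
	int drains;		/* how many callers entered the drain path */
};

/*
 * Userspace analogue of the patched guard.  Returns true if this
 * caller won the false -> true transition and ran the drain.
 */
bool cas_end_grace(struct cas_net *nn)
{
	bool expected = false;

	/* Read and write happen as one indivisible step. */
	if (!atomic_compare_exchange_strong(&nn->grace_ended, &expected, true))
		return false;	/* flag was already true; expected now holds true */

	nn->drains++;
	return true;
}

/* Two callers reach the guard; returns how many entered the drain. */
int cas_two_callers(void)
{
	struct cas_net nn = { .drains = 0 };

	atomic_init(&nn.grace_ended, false);
	cas_end_grace(&nn);	/* first caller wins the transition */
	cas_end_grace(&nn);	/* second caller loses and returns early */
	return nn.drains;
}
```

Because the compare and the store cannot be separated, there is no
window in which a second caller can still observe false, so
cas_two_callers() reports a single drain regardless of which caller
arrives first.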

Fixes: 362063a595be ("nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld")
Assisted-by: kres:claude-opus-4-7
Signed-off-by: Chris Mason <clm@meta.com>
---
 fs/nfsd/nfs4state.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index f4d12dbcf97b..dc4ac541436f 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7022,12 +7022,23 @@ nfsd4_renew(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 static void
 nfsd4_end_grace(struct nfsd_net *nn)
 {
-	/* do nothing if grace period already ended */
-	if (nn->grace_ended)
+	bool expected = false;
+
+	/*
+	 * nfsd4_end_grace() can be entered concurrently from the
+	 * laundromat workqueue and from an nfsd compound thread
+	 * handling RECLAIM_COMPLETE.  Without serialization, both
+	 * callers can observe grace_ended==false and proceed into
+	 * nfsd4_record_grace_done().  For tracking ops whose
+	 * grace_done drains reclaim_str_hashtbl, that results in
+	 * list corruption and a double free of every
+	 * nfs4_client_reclaim entry.  Use an atomic test-and-set so
+	 * exactly one caller proceeds.
+	 */
+	if (!try_cmpxchg(&nn->grace_ended, &expected, true))
 		return;
 
 	trace_nfsd_grace_complete(nn);
-	nn->grace_ended = true;
 	/*
 	 * If the server goes down again right now, an NFSv4
 	 * client will still be allowed to reclaim after it comes back up,

-- 
2.54.0

