From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <cel@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
NeilBrown <neil@brown.name>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Scott Mayhew <smayhew@redhat.com>,
Trond Myklebust <Trond.Myklebust@netapp.com>,
Andreas Gruenbacher <agruen@suse.de>,
Mike Snitzer <snitzer@kernel.org>,
Rick Macklem <rmacklem@uoguelph.ca>
Cc: Chris Mason <clm@meta.com>,
linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 03/10] nfsd: serialize nfsd4_end_grace() with atomic test-and-set
Date: Fri, 29 May 2026 13:02:54 -0400
Message-ID: <a1363f45dec225d8ba59fc2ae50b206ebdd5996f.camel@kernel.org>
In-Reply-To: <b51daabc-fb34-4cf2-a5e9-2c0e59e1d5c0@kernel.org>

On Fri, 2026-05-29 at 12:05 -0400, Chuck Lever wrote:
> On 5/29/26 11:57 AM, Jeff Layton wrote:
> > On Fri, 2026-05-29 at 11:38 -0400, Chuck Lever wrote:
> > >
> > > On Thu, May 28, 2026, at 5:55 PM, Jeff Layton wrote:
> > > > From: Chris Mason <clm@meta.com>
> > > >
> > > > nfsd4_end_grace() guards its drain path with a plain bool:
> > > >
> > > > >     if (nn->grace_ended)
> > > > >             return;
> > > > >     nn->grace_ended = true;
> > > >
> > > > The read and the write are independent, and nothing in struct
> > > > nfsd_net serializes them. At least two contexts can reach this
> > > > code with no lock held:
> > > >
> > > > >   laundromat path
> > > > >     laundry_wq kworker
> > > > >       nfs4_laundromat()
> > > > >         nfsd4_end_grace()
> > > > >
> > > > >   RECLAIM_COMPLETE path
> > > > >     nfsd compound kthread
> > > > >       nfsd4_reclaim_complete()
> > > > >         inc_reclaim_complete()
> > > > >           nfsd4_end_grace()
> > > >
> > > > Both callers can observe grace_ended == false on different CPUs,
> > > > both store true, and both proceed into nfsd4_record_grace_done(),
> > > > which invokes the active client_tracking_ops->grace_done callback.
> > > > For tracking ops that drain reclaim_str_hashtbl (legacy_tracking_ops
> > > > via nfsd4_recdir_purge_old, and the cld v1+ ops via
> > > > nfsd4_cld_grace_done), grace_done calls nfs4_release_reclaim(),
> > > > which walks every bucket of reclaim_str_hashtbl with no lock and
> > > > calls nfs4_remove_reclaim_record() (list_del + kfree) on each
> > > > entry. Two concurrent walkers corrupt the list and double-free
> > > > every nfs4_client_reclaim. A concurrent nfsd4_find_reclaim_client()
> > > > iterating the same bucket reads through freed memory.
> > > >
> > > > A third call site exists in nfs4_state_start_net() on the
> > > > skip_grace startup path, but it runs under nfsd_mutex before any
> > > > client has connected and before the laundromat's first delayed
> > > > work fires, so it cannot race with the two callers above.
> > > >
> > > > Fix by replacing the read/write pair with try_cmpxchg() so exactly
> > > > one caller transitions grace_ended from false to true and proceeds
> > > > into the drain; the loser returns immediately. bool supports
> > > > 1-byte cmpxchg on all supported architectures, and no lock
> > > > ordering changes are needed.
> > > >
> > > > Fixes: 362063a595be ("nfsd: keep a tally of RECLAIM_COMPLETE operations
> > > > when using nfsdcld")
> > > > Assisted-by: kres:claude-opus-4-7
> > > > Signed-off-by: Chris Mason <clm@meta.com>
> > > > ---
> > > > fs/nfsd/nfs4state.c | 17 ++++++++++++++---
> > > > 1 file changed, 14 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > index f4d12dbcf97b..dc4ac541436f 100644
> > > > --- a/fs/nfsd/nfs4state.c
> > > > +++ b/fs/nfsd/nfs4state.c
> > > > @@ -7022,12 +7022,23 @@ nfsd4_renew(struct svc_rqst *rqstp, struct
> > > > nfsd4_compound_state *cstate,
> > > > static void
> > > > nfsd4_end_grace(struct nfsd_net *nn)
> > > > {
> > > > - /* do nothing if grace period already ended */
> > > > - if (nn->grace_ended)
> > > > + bool expected = false;
> > > > +
> > > > + /*
> > > > + * nfsd4_end_grace() can be entered concurrently from the
> > > > + * laundromat workqueue and from an nfsd compound thread
> > > > + * handling RECLAIM_COMPLETE. Without serialization, both
> > > > + * callers can observe grace_ended==false and proceed into
> > > > + * nfsd4_record_grace_done(). For tracking ops whose
> > > > + * grace_done drains reclaim_str_hashtbl, that results in
> > > > + * list corruption and a double free of every
> > > > + * nfs4_client_reclaim entry. Use an atomic test-and-set so
> > > > + * exactly one caller proceeds.
> > > > + */
> > > > + if (!try_cmpxchg(&nn->grace_ended, &expected, true))
> > > > return;
> > > >
> > > > trace_nfsd_grace_complete(nn);
> > > > - nn->grace_ended = true;
> > > > /*
> > > > * If the server goes down again right now, an NFSv4
> > > > * client will still be allowed to reclaim after it comes back up,
> > > >
> > > > --
> > > > 2.54.0
> > >
> > > Seems like the usual idiom for something like this is an atomic
> > > bit op. Perhaps try_cmpxchg on a boolean variable is not going
> > > to behave as you expect on every hardware platform.
> >
> > We just need a single flag here though. try_cmpxchg() had better work
> > the same way on every platform or a lot of stuff is FUBAR. Where
> > wouldn't it?
>
> Codex suggests on Hexagon, cmpxchg grabs more than just that boolean.
>
Ok, I'm convinced. What I think I'll do is add a new unsigned long
flags field to struct nfsd_net, and we can convert most of the bools
in that struct to bits in it instead.
--
Jeff Layton <jlayton@kernel.org>