From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <cel@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>,
NeilBrown <neil@brown.name>,
Olga Kornievskaia <okorniev@redhat.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Scott Mayhew <smayhew@redhat.com>,
Trond Myklebust <Trond.Myklebust@netapp.com>,
Andreas Gruenbacher <agruen@suse.de>,
Mike Snitzer <snitzer@kernel.org>,
Rick Macklem <rmacklem@uoguelph.ca>
Cc: Chris Mason <clm@meta.com>,
linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 02/10] nfsd: drain callbacks and clear cl_cb_session
Date: Fri, 29 May 2026 13:31:46 -0400
Message-ID: <a7d2a218bb9cb500df1af89e1db39b69b16417af.camel@kernel.org>
In-Reply-To: <fc8740de-d9bd-4686-a30e-e0a6c1b7f351@app.fastmail.com>

On Fri, 2026-05-29 at 11:13 -0400, Chuck Lever wrote:
>
> On Thu, May 28, 2026, at 5:55 PM, Jeff Layton wrote:
> > From: Chris Mason <clm@meta.com>
> >
> > After a DESTROY_SESSION the per-session teardown path can free a
> > session while rpciod still holds an inflight callback rpc_task that
> > dereferences clp->cl_cb_session. nfsd4_probe_callback_sync() flushes
> > cl_callback_wq, but once nfsd4_run_cb_work() has called
> > rpc_call_async() the rpc_task lives on rpciod; flushing the workqueue
> > does not wait for it. After the flush returns,
> > nfsd4_destroy_session() proceeds through nfsd4_put_session_locked()
> > and free_session() kfree()s the slab while rpciod's
> > nfsd4_cb_sequence_done(), grab_slot(), and nfsd41_cb_release_slot()
> > are still dereferencing cb->cb_clp->cl_cb_session.
> >
> >   destroy path                          rpciod
> >   ------------                          ------
> >   unhash_session(ses)
> >   nfsd4_probe_callback_sync(clp)
> >     flush_workqueue(cl_callback_wq)
> >     /* returns; rpc_task still live */
> >   nfsd4_put_session_locked(ses)
> >     free_session(ses) -> kfree(ses)
> >                                         nfsd4_cb_sequence_done()
> >                                           reads cb_clp->cl_cb_session
> >                                           /* freed slab */
> >
> > A second window exists in nfsd4_process_cb_update(). When
> > __nfsd4_find_backchannel() returns NULL because unhash_session() has
> > already removed the destroyed session from cl_sessions,
> > setup_callback_client() takes the v4.1 early return
> >
> > if (!conn->cb_xprt || !ses)
> > return -EINVAL;
> >
> > so clp->cl_cb_session = ses never fires and the field retains a
> > pointer to the about-to-be-freed session. Symmetrically, if a later
> > probe finds a different session's backchannel conn and that
> > setup_callback_client() call fails, the error tail must still scrub
> > any previously published cl_cb_session.
> >
> > Fix by mirroring the two-stage drain that nfsd4_shutdown_callback()
> > already performs: call nfsd41_cb_inflight_wait_complete() in
> > nfsd4_probe_callback_sync() after flush_workqueue() so rpciod-side
> > nfsd41_cb_inflight_end() decrements are observed before the caller
> > releases the final session reference. The two direct callers,
> > nfsd4_destroy_session() and nfsd4_init_conn() (itself invoked from
> > nfsd4_create_session() and nfsd4_bind_conn_to_session()), run in
> > sleepable process context and tolerate the wait_var_event() sleep:
> >
> > nfsd4_destroy_session() (fs/nfsd/nfs4state.c):
> > unhash_session(ses);
> > spin_unlock(&nn->client_lock); /* spinlock dropped */
> > nfsd4_probe_callback_sync(ses->se_client);
> >
> > nfsd4_init_conn() (fs/nfsd/nfs4state.c):
> > acquires no locks in its body; calls nfsd4_hash_conn(),
> > nfsd4_register_conn(), then nfsd4_probe_callback_sync() --
> > entirely in sleepable process context.
> >
> > Also clear clp->cl_cb_session unconditionally on the
> > nfsd4_process_cb_update() error return so every
> > setup_callback_client() failure -- whether c is NULL or points at a
> > different session whose probe failed -- leaves the field NULL rather
> > than pointing at a session that may subsequently be freed.
> >
> > Fixes: dcbeaa68dbbd ("nfsd4: allow backchannel recovery")
> > Assisted-by: kres:claude-opus-4-7
> > Signed-off-by: Chris Mason <clm@meta.com>
> > ---
> > fs/nfsd/nfs4callback.c | 21 +++++++++++++++++----
> > 1 file changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> > index 1964a213f80e..1cf6b6100357 100644
> > --- a/fs/nfsd/nfs4callback.c
> > +++ b/fs/nfsd/nfs4callback.c
> > @@ -1205,9 +1205,8 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
> > } else {
> > if (!conn->cb_xprt || !ses)
> > return -EINVAL;
> > - clp->cl_cb_session = ses;
> > args.bc_xprt = conn->cb_xprt;
> > - args.prognumber = clp->cl_cb_session->se_cb_prog;
> > + args.prognumber = ses->se_cb_prog;
> > args.protocol = conn->cb_xprt->xpt_class->xcl_ident |
> > XPRT_TRANSPORT_BC;
> > args.authflavor = ses->se_cb_sec.flavor;
> > @@ -1225,8 +1224,10 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
> > return -ENOMEM;
> > }
> >
> > - if (clp->cl_minorversion != 0)
> > + if (clp->cl_minorversion != 0) {
> > clp->cl_cb_conn.cb_xprt = conn->cb_xprt;
> > + clp->cl_cb_session = ses;
> > + }
> > clp->cl_cb_client = client;
> > clp->cl_cb_cred = cred;
> > rcu_read_lock();
> > @@ -1299,6 +1300,7 @@ void nfsd4_probe_callback_sync(struct nfs4_client *clp)
> > {
> > nfsd4_probe_callback(clp);
> > flush_workqueue(clp->cl_callback_wq);
> > + nfsd41_cb_inflight_wait_complete(clp);
> > }
> >
> > void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *conn)
> > @@ -1679,7 +1681,17 @@ static struct nfsd4_conn * __nfsd4_find_backchannel(struct nfs4_client *clp)
> > * Note there isn't a lot of locking in this code; instead we depend on
> > * the fact that it is run from clp->cl_callback_wq, which won't run two
> > * work items at once. So, for example, clp->cl_callback_wq handles all
> > - * access of cl_cb_client and all calls to rpc_create or rpc_shutdown_client.
> > + * access of cl_cb_client and cl_cb_session, and all calls to rpc_create
> > + * or rpc_shutdown_client.
> > + *
> > + * rpciod-side readers of cl_cb_session (encode_cb_sequence4args(),
> > + * nfsd4_cb_sequence_done(), the cb-slot helpers, and the cb_sequence
> > + * tracepoints) run outside cl_callback_wq. The
> > + * nfsd41_cb_inflight_wait_complete() drain in nfsd4_probe_callback_sync()
> > + * waits until cl_cb_inflight reaches zero before the caller proceeds with
> > + * session teardown; any rpc_task that reads cl_cb_session must hold an
> > + * inflight pin (via nfsd41_cb_inflight_begin) for this fence to be
> > + * effective.
> > */
> > static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
> > {
> > @@ -1731,6 +1743,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
> > nfsd4_mark_cb_down(clp);
> > if (c)
> > svc_xprt_put(c->cn_xprt);
> > + clp->cl_cb_session = NULL;
> > return;
> > }
> > }
> >
> > --
> > 2.54.0
>
> Several NFSD callback done handlers retry indefinitely on
> NFS4ERR_DELAY via rpc_delay(), so a client that keeps
> replying DELAY leaves this per-client counter nonzero and
> blocks the foreground CREATE/BIND/DESTROY_SESSION request
> even though the callback no longer references the session
> being torn down.
>
> Although partly due to the way callbacks are structured
> currently, this patch potentially introduces a client-
> controlled DoS vector.
>
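To make the failure mode described above concrete, here is a toy userspace model of a counter-based drain. It is not kernel code: every name in it (model_cb_counter, model_cb_begin, model_cb_done, model_drain_would_block, the MODEL_* codes) is made up for illustration, and it only assumes what the review comment states, namely that a DELAY reply reschedules the callback while its inflight pin is kept.

```c
#include <stdbool.h>

/* Hypothetical reply codes; only the DELAY-versus-other split matters. */
enum model_status { MODEL_OK, MODEL_DELAY };

/* Hypothetical stand-in for per-client state; not struct nfs4_client. */
struct model_cb_counter {
	int cb_inflight;	/* models the per-client inflight counter */
};

/* A callback is queued: take an inflight pin. */
static void model_cb_begin(struct model_cb_counter *c)
{
	c->cb_inflight++;
}

/*
 * Done handler: a DELAY reply reschedules the same callback, so the pin
 * is kept; any other reply finishes the callback and drops the pin.
 * Returns true when the callback is finished.
 */
static bool model_cb_done(struct model_cb_counter *c, enum model_status status)
{
	if (status == MODEL_DELAY)
		return false;
	c->cb_inflight--;
	return true;
}

/* A drain that waits for zero cannot return while this is true. */
static bool model_drain_would_block(const struct model_cb_counter *c)
{
	return c->cb_inflight != 0;
}
```

In this model, a peer that answers MODEL_DELAY forever keeps model_drain_would_block() true forever, regardless of which session the pending callback actually refers to. That is the client-controlled stall being pointed out.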
Good point. I'm currently looking at reworking this so that each stage
of the callback state machine can tolerate a NULL cl_cb_session
pointer. If I can make that work, then we can fix the UAF without
blocking.

I'll definitely be sending a v2 of the series.
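
As a rough illustration of what "tolerate a NULL cl_cb_session" could look like at a single read site, here is a small userspace sketch using C11 atomics. It is only a model under stated assumptions: model_session, model_client, model_set_cb_session() and model_cb_get_prog() are hypothetical names, not the kernel structures or helpers, and it says nothing about how the eventual v2 will be written.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not struct nfsd4_session / struct nfs4_client. */
struct model_session {
	unsigned int cb_prog;
};

struct model_client {
	_Atomic(struct model_session *) cb_session;
};

/* Publish or clear the backchannel session (NULL means "none"). */
static void model_set_cb_session(struct model_client *clp,
				 struct model_session *ses)
{
	atomic_store(&clp->cb_session, ses);
}

/*
 * One callback stage: take a single snapshot of the pointer and bail out
 * if it is NULL, instead of dereferencing the client field repeatedly.
 * Returns false when there is no session to use.
 */
static bool model_cb_get_prog(struct model_client *clp, unsigned int *prog)
{
	struct model_session *ses = atomic_load(&clp->cb_session);

	if (!ses)
		return false;
	*prog = ses->cb_prog;
	return true;
}
```

The snapshot-and-check only covers the NULL case. It does not by itself keep the session alive between the load and the dereference, so some lifetime guarantee (a reference, or an equivalent mechanism) would still be needed on top of it.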
--
Jeff Layton <jlayton@kernel.org>