From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>, NeilBrown <neil@brown.name>,
	 Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>,  Tom Talpey <tom@talpey.com>,
	"J. Bruce Fields" <bfields@fieldses.org>,
	 Scott Mayhew <smayhew@redhat.com>,
	 Trond Myklebust <Trond.Myklebust@netapp.com>,
	 Andreas Gruenbacher <agruen@suse.de>,
	Mike Snitzer <snitzer@kernel.org>,
	 Rick Macklem <rmacklem@uoguelph.ca>
Cc: Chris Mason <clm@meta.com>,
	linux-nfs@vger.kernel.org,  linux-kernel@vger.kernel.org,
	Jeff Layton <jlayton@kernel.org>
Subject: [PATCH 02/10] nfsd: drain callbacks and clear cl_cb_session
Date: Thu, 28 May 2026 17:55:13 -0400
Message-ID: <20260528-nfsd-fixes-v1-2-e78708eff77d@kernel.org>
In-Reply-To: <20260528-nfsd-fixes-v1-0-e78708eff77d@kernel.org>

From: Chris Mason <clm@meta.com>

After a DESTROY_SESSION, the per-session teardown path can free a
session while rpciod still holds an inflight callback rpc_task that
dereferences clp->cl_cb_session.  nfsd4_probe_callback_sync() flushes
cl_callback_wq, but once nfsd4_run_cb_work() has called
rpc_call_async() the rpc_task lives on rpciod; flushing the workqueue
does not wait for it.  After the flush returns,
nfsd4_destroy_session() proceeds through nfsd4_put_session_locked()
and free_session() kfree()s the slab while rpciod's
nfsd4_cb_sequence_done(), grab_slot(), and nfsd41_cb_release_slot()
are still dereferencing cb->cb_clp->cl_cb_session.

    destroy path                       rpciod
    ------------                       ------
    unhash_session(ses)
    nfsd4_probe_callback_sync(clp)
      flush_workqueue(cl_callback_wq)
      /* returns; rpc_task still live */
    nfsd4_put_session_locked(ses)
    free_session(ses) -> kfree(ses)
                                       nfsd4_cb_sequence_done()
                                         reads cb_clp->cl_cb_session
                                         /* freed slab */

A second window exists in nfsd4_process_cb_update().  When
__nfsd4_find_backchannel() returns NULL because unhash_session() has
already removed the destroyed session from cl_sessions,
setup_callback_client() takes the v4.1 early return

    if (!conn->cb_xprt || !ses)
            return -EINVAL;

so the clp->cl_cb_session = ses assignment never runs and the field
retains a pointer to the about-to-be-freed session.  Symmetrically, if
a later probe finds a different session's backchannel conn and that
setup_callback_client() call fails, the error path must still clear
any previously published cl_cb_session.

Fix by mirroring the two-stage drain that nfsd4_shutdown_callback()
already performs: call nfsd41_cb_inflight_wait_complete() in
nfsd4_probe_callback_sync() after flush_workqueue() so rpciod-side
nfsd41_cb_inflight_end() decrements are observed before the caller
releases the final session reference.  The two direct callers,
nfsd4_destroy_session() and nfsd4_init_conn() (itself invoked from
nfsd4_create_session() and nfsd4_bind_conn_to_session()), run in
sleepable process context and tolerate the wait_var_event() sleep:

    nfsd4_destroy_session() (fs/nfsd/nfs4state.c):
      unhash_session(ses);
      spin_unlock(&nn->client_lock);   /* spinlock dropped */
      nfsd4_probe_callback_sync(ses->se_client);

    nfsd4_init_conn() (fs/nfsd/nfs4state.c):
      acquires no locks in its body; calls nfsd4_hash_conn(),
      nfsd4_register_conn(), then nfsd4_probe_callback_sync() --
      entirely in sleepable process context.
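
The two-stage drain can be illustrated outside the kernel.  What
follows is a minimal userspace sketch, assuming pthreads (build with
-pthread); it is not the nfsd implementation.  Every name in it
(struct client, inflight_begin(), inflight_end(),
inflight_wait_complete(), destroy_session_demo()) is a hypothetical
stand-in, and a mutex plus condition variable replaces the kernel's
wait_var_event() sleep.  The point it shows: the destroy path frees
the session only after the in-flight count has dropped to zero.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct session { int slot_seq; };

struct client {
	pthread_mutex_t lock;
	pthread_cond_t idle;        /* signalled when inflight hits zero */
	int inflight;               /* plays the role of cl_cb_inflight */
	struct session *cb_session; /* plays the role of cl_cb_session */
};

static void inflight_begin(struct client *c)
{
	pthread_mutex_lock(&c->lock);
	c->inflight++;
	pthread_mutex_unlock(&c->lock);
}

static void inflight_end(struct client *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->inflight > 0);
	if (--c->inflight == 0)
		pthread_cond_broadcast(&c->idle);
	pthread_mutex_unlock(&c->lock);
}

/* Second drain stage: sleep until every pinned task has finished. */
static void inflight_wait_complete(struct client *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->inflight != 0)
		pthread_cond_wait(&c->idle, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Models the rpciod side: touches the session, then drops its pin. */
static void *callback_done(void *arg)
{
	struct client *c = arg;

	c->cb_session->slot_seq++; /* safe only while the pin is held */
	inflight_end(c);
	return NULL;
}

/* Models the destroy path; returns slot_seq as seen just before free. */
static int destroy_session_demo(void)
{
	struct client c = { .inflight = 0 };
	pthread_t worker;
	int seen = -1;

	pthread_mutex_init(&c.lock, NULL);
	pthread_cond_init(&c.idle, NULL);
	c.cb_session = calloc(1, sizeof(*c.cb_session));
	if (!c.cb_session)
		goto out;

	inflight_begin(&c); /* pin taken before the task is handed off */
	if (pthread_create(&worker, NULL, callback_done, &c) != 0) {
		inflight_end(&c);
		free(c.cb_session);
		goto out;
	}

	inflight_wait_complete(&c); /* without this, the free below races */
	seen = c.cb_session->slot_seq;
	free(c.cb_session);
	c.cb_session = NULL;
	pthread_join(worker, NULL);
out:
	pthread_cond_destroy(&c.idle);
	pthread_mutex_destroy(&c.lock);
	return seen;
}
```

In this sketch the pin is taken before the worker is started, which
mirrors the requirement that any task reading the session pointer
must already hold an inflight pin for the wait to act as a fence.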

Also clear clp->cl_cb_session unconditionally on the
nfsd4_process_cb_update() error return so every
setup_callback_client() failure -- whether c is NULL or points at a
different session whose probe failed -- leaves the field NULL rather
than pointing at a session that may subsequently be freed.
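
The publish-late, clear-on-error pattern described above can be
sketched in isolation.  This is a simplified userspace model under
stated assumptions, not the patched kernel code: struct client,
struct session, create_transport(), setup_callback() and
process_update() are hypothetical names, and create_transport()
merely stands in for a fallible step such as rpc_create().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the nfsd structures. */
struct session { int cb_prog; };

struct client {
	struct session *cb_session; /* plays the role of cl_cb_session */
	int cb_prog;
};

/* Hypothetical fallible step standing in for rpc_create(). */
static int create_transport(int prog)
{
	return prog > 0 ? 0 : -ENOMEM;
}

/* Publish the session pointer only after every fallible step passed. */
static int setup_callback(struct client *clp, struct session *ses)
{
	int err;

	if (!ses)
		return -EINVAL; /* early return: nothing is published */

	err = create_transport(ses->cb_prog);
	if (err)
		return err;

	clp->cb_session = ses;
	clp->cb_prog = ses->cb_prog;
	return 0;
}

/* On any failure, clear whatever an earlier call had published. */
static int process_update(struct client *clp, struct session *ses)
{
	int err = setup_callback(clp, ses);

	if (err)
		clp->cb_session = NULL;
	return err;
}
```

With this shape, a failed update can never leave the client pointing
at a session that an earlier, successful update published and that
is now on its way to being freed.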

Fixes: dcbeaa68dbbd ("nfsd4: allow backchannel recovery")
Assisted-by: kres:claude-opus-4-7
Signed-off-by: Chris Mason <clm@meta.com>
---
 fs/nfsd/nfs4callback.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 1964a213f80e..1cf6b6100357 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1205,9 +1205,8 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
 	} else {
 		if (!conn->cb_xprt || !ses)
 			return -EINVAL;
-		clp->cl_cb_session = ses;
 		args.bc_xprt = conn->cb_xprt;
-		args.prognumber = clp->cl_cb_session->se_cb_prog;
+		args.prognumber = ses->se_cb_prog;
 		args.protocol = conn->cb_xprt->xpt_class->xcl_ident |
 				XPRT_TRANSPORT_BC;
 		args.authflavor = ses->se_cb_sec.flavor;
@@ -1225,8 +1224,10 @@ static int setup_callback_client(struct nfs4_client *clp, struct nfs4_cb_conn *c
 		return -ENOMEM;
 	}
 
-	if (clp->cl_minorversion != 0)
+	if (clp->cl_minorversion != 0) {
 		clp->cl_cb_conn.cb_xprt = conn->cb_xprt;
+		clp->cl_cb_session = ses;
+	}
 	clp->cl_cb_client = client;
 	clp->cl_cb_cred = cred;
 	rcu_read_lock();
@@ -1299,6 +1300,7 @@ void nfsd4_probe_callback_sync(struct nfs4_client *clp)
 {
 	nfsd4_probe_callback(clp);
 	flush_workqueue(clp->cl_callback_wq);
+	nfsd41_cb_inflight_wait_complete(clp);
 }
 
 void nfsd4_change_callback(struct nfs4_client *clp, struct nfs4_cb_conn *conn)
@@ -1679,7 +1681,17 @@ static struct nfsd4_conn * __nfsd4_find_backchannel(struct nfs4_client *clp)
  * Note there isn't a lot of locking in this code; instead we depend on
  * the fact that it is run from clp->cl_callback_wq, which won't run two
  * work items at once.  So, for example, clp->cl_callback_wq handles all
- * access of cl_cb_client and all calls to rpc_create or rpc_shutdown_client.
+ * access of cl_cb_client and cl_cb_session, and all calls to rpc_create
+ * or rpc_shutdown_client.
+ *
+ * rpciod-side readers of cl_cb_session (encode_cb_sequence4args(),
+ * nfsd4_cb_sequence_done(), the cb-slot helpers, and the cb_sequence
+ * tracepoints) run outside cl_callback_wq.  The
+ * nfsd41_cb_inflight_wait_complete() drain in nfsd4_probe_callback_sync()
+ * waits until cl_cb_inflight reaches zero before the caller proceeds with
+ * session teardown; any rpc_task that reads cl_cb_session must hold an
+ * inflight pin (via nfsd41_cb_inflight_begin) for this fence to be
+ * effective.
  */
 static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
 {
@@ -1731,6 +1743,7 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
 		nfsd4_mark_cb_down(clp);
 		if (c)
 			svc_xprt_put(c->cn_xprt);
+		clp->cl_cb_session = NULL;
 		return;
 	}
 }

-- 
2.54.0

