All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs
@ 2026-06-02 16:23 Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 1/9] nfsd: defer vfree of compound ops to fix rpc_status UAF Jeff Layton
                   ` (9 more replies)
  0 siblings, 10 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

Just some minor changes in this version, plus a cleanup patch from Al.

These are bugs that Claude classified as locally-triggerable. A couple
can be triggered by an unprivileged user, but the rest require admin
access.

The last 3 patches fix one bug. I originally had a more targeted fix
that kres generated, but I think it's better to simplify the filecache
disposal mechanism to get rid of the bug rather than add more
complexity.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Changes in v2:
- rework filecache patch to only take net ref at disposal time
- fix ordering of operations in nfsd4_release_compoundargs()
- add Al's patch to simplify nfsd_cross_mnt() cleanup
- Link to v1: https://lore.kernel.org/r/20260601-nfsd-testing-v1-0-d0f61e536df8@kernel.org

---
Al Viro (1):
      nfsd: unify cleanups in nfsd_cross_mnt() exits

Chris Mason (3):
      nfsd: hold rcu across localio cmpxchg retry
      nfs/localio: fix ref leak on nfs_uuid_add_file failure
      nfsd: guard nfsd_serv deref in nfsd_file_net_dispose

Jeff Layton (5):
      nfsd: defer vfree of compound ops to fix rpc_status UAF
      nfsd: widen nfsd_genl_rqstp address fields to sockaddr_storage
      nfsd: fix refcount leak in nfsd_file_lru_add on insertion failure
      nfsd: fix fcache_disposal UAF by inlining dispose state into nfsd_net
      nfsd: hold net namespace reference for delayed-dispose nfsd_files

 fs/nfs_common/nfslocalio.c |  14 ++++-
 fs/nfsd/filecache.c        | 130 ++++++++++++++++++++-------------------------
 fs/nfsd/filecache.h        |   3 +-
 fs/nfsd/localio.c          |  12 +++--
 fs/nfsd/netns.h            |   3 +-
 fs/nfsd/nfs4xdr.c          |   4 +-
 fs/nfsd/nfsctl.c           |  12 ++---
 fs/nfsd/vfs.c              |  17 +++---
 include/linux/nfslocalio.h |   9 +---
 9 files changed, 101 insertions(+), 103 deletions(-)
---
base-commit: e7ca66ba17f1b5e4ecbb29b9c3c4a31aa062bed0
change-id: 20260601-nfsd-testing-e3509d5e035e

Best regards,
-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/9] nfsd: defer vfree of compound ops to fix rpc_status UAF
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
@ 2026-06-02 16:23 ` Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 2/9] nfsd: hold rcu across localio cmpxchg retry Jeff Layton
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

The rpc_status netlink dumpit walks every in-flight svc_rqst under
rcu_read_lock and, for NFSv4 requests, reads opnums out of
args->ops[]. But args->ops is a separate vmalloc buffer freed
synchronously by vfree() in nfsd4_release_compoundargs() at the end
of every compound. The dumpit's rcu_read_lock pins the svc_rqst
struct itself (freed via kfree_rcu), but nothing defers the vfree
of the ops buffer across the RCU grace period. A concurrent compound
completion can therefore free the buffer while the dumpit is reading
it — a use-after-free on vmalloc memory.

The trailing seqcount recheck (smp_load_acquire of rq_status_counter)
cannot undo a load that already retired against freed memory.

Fix by replacing vfree(args->ops) with kvfree_rcu_mightsleep(), which
defers the free until after an RCU grace period. This makes the
existing rcu_read_lock in the dumpit sufficient to protect the read.
The tradeoff is that completed compound ops buffers (up to
200 * sizeof(struct nfsd4_op)) persist in memory slightly longer,
across one grace period, before being reclaimed.

Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4xdr.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 487a1f62ce15..6680e9e1e5b4 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -6686,8 +6686,10 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp)
 	struct nfsd4_compoundargs *args = rqstp->rq_argp;
 
 	if (args->ops != args->iops) {
-		vfree(args->ops);
+		void *old_ops = args->ops;
+
 		args->ops = args->iops;
+		kvfree_rcu_mightsleep(old_ops);
 	}
 	while (args->to_free) {
 		struct svcxdr_tmpbuf *tb = args->to_free;

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/9] nfsd: hold rcu across localio cmpxchg retry
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 1/9] nfsd: defer vfree of compound ops to fix rpc_status UAF Jeff Layton
@ 2026-06-02 16:23 ` Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 3/9] nfs/localio: fix ref leak on nfs_uuid_add_file failure Jeff Layton
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

From: Chris Mason <clm@meta.com>

nfsd_file objects are freed via call_rcu (filecache.c:296), and
nfsd_file_slab is created without SLAB_TYPESAFE_BY_RCU
(KMEM_CACHE(nfsd_file, 0) at filecache.c:789), so the slab page
backing a freed nfsd_file becomes freely reclaimable once the RCU
grace period elapses.

The again: retry block in nfsd_open_local_fh() loads a pointer with
cmpxchg and then calls nfsd_file_get(new) (which is
refcount_inc_not_zero) without holding rcu_read_lock. The sole caller
nfs_open_local_fh() drops rcu_read_lock before invoking this helper,
so no outer reader-side critical section covers the load.

    CPU 0 (nfsd_open_local_fh)        CPU 1 (nfsd_file_put_local)
    -----                             -----
    new = cmpxchg(pnf, NULL, ...)
                                      nf = xchg(pnf, NULL)
                                      nfsd_file_put(nf)
                                        last ref -> call_rcu()
                                      /* grace period elapses;
                                         slab page recycled */
    nfsd_file_get(new)
      refcount_inc_not_zero(&new->nf_ref)
      /* operates on recycled memory */

A non-zero word at the nf_ref offset of the recycled object makes the
refcount bump appear to succeed, and the caller then dereferences
new->nf_net and new->nf_file out of freed memory.

Fix by taking rcu_read_lock() immediately before the cmpxchg and
releasing it on all three exits of the if (new) block: the goto-again
retry, the lost-race cleanup path, and the install-succeeded path.
nfsd_file_put() and nfsd_net_put() stay outside the RCU section so
they remain free to block.

Fixes: e6f7e1487ab5 ("nfs_localio: simplify interface to nfsd for getting nfsd_file")
Assisted-by: kres:claude-opus-4-7
Signed-off-by: Chris Mason <clm@meta.com>
---
 fs/nfsd/localio.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index be710d809a3b..c3eb0557b3e1 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -97,11 +97,15 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
 		}
 		nfsd_file_get(localio);
 	again:
+		rcu_read_lock();
 		new = unrcu_pointer(cmpxchg(pnf, NULL, RCU_INITIALIZER(localio)));
 		if (new) {
 			/* Some other thread installed an nfsd_file */
-			if (nfsd_file_get(new) == NULL)
+			if (nfsd_file_get(new) == NULL) {
+				rcu_read_unlock();
 				goto again;
+			}
+			rcu_read_unlock();
 			/*
 			 * Drop the ref we were going to install (both file and
 			 * net) and the one we were going to return (only file).
@@ -110,6 +114,8 @@ nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
 			nfsd_net_put(net);
 			nfsd_file_put(localio);
 			localio = new;
+		} else {
+			rcu_read_unlock();
 		}
 	} else
 		nfsd_net_put(net);

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/9] nfs/localio: fix ref leak on nfs_uuid_add_file failure
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 1/9] nfsd: defer vfree of compound ops to fix rpc_status UAF Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 2/9] nfsd: hold rcu across localio cmpxchg retry Jeff Layton
@ 2026-06-02 16:23 ` Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 4/9] nfsd: guard nfsd_serv deref in nfsd_file_net_dispose Jeff Layton
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

From: Chris Mason <clm@meta.com>

When nfs_uuid_add_file() races with nfs_uuid_put() tearing down
uuid->net, it returns -ENXIO without publishing nfl->nfs_uuid via
rcu_assign_pointer().  nfs_open_local_fh() then enters its error
branch and only releases the slot pair plus its own entry-time net
ref, while the close path is a no-op:

    nfs_close_local_fh()
      nfs_uuid = rcu_dereference(nfl->nfs_uuid);
      if (!nfs_uuid) { rcu_read_unlock(); return; }  /* always */

nfsd_open_local_fh() returns localio holding a caller-owned +1
nfsd_file reference (from nfsd_file_get() after
nfsd_file_acquire_local()) and an entry-time nfsd_net reference
(from its first nfsd_net_try_get()) embedded as nf->nf_net.  Both
are leaked on the failure path, pinning one nfsd_file (and the
underlying struct file, dentry, inode) and one nfsd_net_ref per
occurrence, which blocks nfsd_net and netns teardown.

Fix by releasing the caller-owned file ref and its net ref through
the existing helper, using a stack-local RCU pointer so the helper
can xchg it out, then returning -ENXIO so callers do not
dereference a localio whose slot has been cleared:

    struct nfsd_file __rcu *tmp = RCU_INITIALIZER(localio);

    nfs_to_nfsd_file_put_local(pnf);
    nfs_to_nfsd_file_put_local(&tmp);
    localio = ERR_PTR(-ENXIO);

The trailing nfs_to_nfsd_net_put(net) continues to release the
outer net ref, so all three nfsd_net_try_get() increments are
balanced on the error branch.

Fixes: fdd015de7679 ("NFS/localio: nfs_uuid_put() fix races with nfs_open/close_local_fh()")
Assisted-by: kres:claude-opus-4-7
Signed-off-by: Chris Mason <clm@meta.com>
---
 fs/nfs_common/nfslocalio.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index dd715cdb6c04..a3c1c5c2764a 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -292,8 +292,20 @@ struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
 	localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt, cred,
 					     nfs_fh, pnf, fmode);
 	if (!IS_ERR(localio) && nfs_uuid_add_file(uuid, nfl) < 0) {
-		/* Delete the cached file when racing with nfs_uuid_put() */
+		/*
+		 * Delete the cached file when racing with nfs_uuid_put().
+		 * Since nfl->nfs_uuid was never published via
+		 * rcu_assign_pointer(), nfs_close_local_fh() will early-return
+		 * and cannot clean up after us.  Drop the slot pair, then drop
+		 * the caller-owned nfsd_file ref (+1) and the entry-time
+		 * nfsd_net ref carried via nf->nf_net, and return -ENXIO so
+		 * the caller never dereferences the now-cleared localio.
+		 */
+		struct nfsd_file __rcu *tmp = RCU_INITIALIZER(localio);
+
 		nfs_to_nfsd_file_put_local(pnf);
+		nfs_to_nfsd_file_put_local(&tmp);
+		localio = ERR_PTR(-ENXIO);
 	}
 	nfs_to_nfsd_net_put(net);
 

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/9] nfsd: guard nfsd_serv deref in nfsd_file_net_dispose
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
                   ` (2 preceding siblings ...)
  2026-06-02 16:23 ` [PATCH v2 3/9] nfs/localio: fix ref leak on nfs_uuid_add_file failure Jeff Layton
@ 2026-06-02 16:23 ` Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 5/9] nfsd: widen nfsd_genl_rqstp address fields to sockaddr_storage Jeff Layton
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

From: Chris Mason <clm@meta.com>

nfsd_file_net_dispose() is the consumer side of l->freeme: the nfsd
service thread loop calls it to drain entries that the filecache
garbage collector and shrinker append via
nfsd_file_dispose_list_delayed().  During per-net teardown,
nn->nfsd_serv is cleared before the filecache laundrette is shut
down, so the service thread can still run a dispose pass that finds
more than eight entries on l->freeme and dereferences a NULL
svc_serv:

    nfsd service thread loop
      nfsd_file_net_dispose(nn)
        if (!list_empty(&l->freeme)) {
            ...
            svc_wake_up(nn->nfsd_serv);   /* nn->nfsd_serv == NULL */
        }

The sibling helper nfsd_file_dispose_list_delayed() already documents
this ordering and caches nn->nfsd_serv into a local before testing it
for NULL.  nfsd_file_net_dispose() was introduced with the same raw
svc_wake_up(nn->nfsd_serv) call and never picked up the guard.

Fix by loading nn->nfsd_serv into a local svc_serv pointer and only
calling svc_wake_up() when it is non-NULL, matching the pattern in
nfsd_file_dispose_list_delayed().

Fixes: ffb402596147 ("nfsd: Don't leave work of closing files to a work queue")
Assisted-by: kres:claude-opus-4-7
Signed-off-by: Chris Mason <clm@meta.com>
---
 fs/nfsd/filecache.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2f0d4de779af..1e2e1f89216e 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -474,11 +474,20 @@ void nfsd_file_net_dispose(struct nfsd_net *nn)
 		for (i = 0; i < 8 && !list_empty(&l->freeme); i++)
 			list_move(l->freeme.next, &dispose);
 		spin_unlock(&l->lock);
-		if (!list_empty(&l->freeme))
-			/* Wake up another thread to share the work
+		if (!list_empty(&l->freeme)) {
+			/*
+			 * Wake up another thread to share the work
 			 * *before* doing any actual disposing.
+			 *
+			 * The filecache laundrette is shut down after
+			 * the nn->nfsd_serv pointer is cleared, but
+			 * before the svc_serv is freed.
 			 */
-			svc_wake_up(nn->nfsd_serv);
+			struct svc_serv *serv = nn->nfsd_serv;
+
+			if (serv)
+				svc_wake_up(serv);
+		}
 		nfsd_file_dispose_list(&dispose);
 	}
 }

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/9] nfsd: widen nfsd_genl_rqstp address fields to sockaddr_storage
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
                   ` (3 preceding siblings ...)
  2026-06-02 16:23 ` [PATCH v2 4/9] nfsd: guard nfsd_serv deref in nfsd_file_net_dispose Jeff Layton
@ 2026-06-02 16:23 ` Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 6/9] nfsd: fix refcount leak in nfsd_file_lru_add on insertion failure Jeff Layton
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

struct nfsd_genl_rqstp declares rq_daddr and rq_saddr as plain
"struct sockaddr" (16 bytes). When an IPv6 NFS client is connected,
nfsd_genl_rpc_status_compose_msg() casts these fields to
"struct sockaddr_in6 *" (28 bytes) and reads sin6_addr at offset 8..24,
which extends 8 bytes past the end of the 16-byte sockaddr field into
the adjacent rq_flags member. The 16-byte nla_put_in6_addr then ships 8
bytes of truncated IPv6 address followed by 8 bytes of rq_flags to
userspace via the NFSD_A_RPC_STATUS_SADDR6/DADDR6 netlink attributes.

This is reachable by any unprivileged process in the network namespace
because NFSD_CMD_RPC_STATUS_GET uses GENL_CMD_CAP_DUMP without
GENL_ADMIN_PERM.

Fix by widening rq_daddr and rq_saddr to struct sockaddr_storage so the
IPv6 casts operate within bounds, copying sizeof(struct sockaddr_storage)
bytes in the memcpy calls so the full address is captured, and
zero-initializing the genl_rqstp stack variable to prevent leaking
uninitialized tail bytes through netlink.

Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfsctl.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 92f65ca6f667..6fee49a7787f 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1414,8 +1414,8 @@ static int create_proc_exports_entry(void)
 unsigned int nfsd_net_id;
 
 struct nfsd_genl_rqstp {
-	struct sockaddr		rq_daddr;
-	struct sockaddr		rq_saddr;
+	struct sockaddr_storage	rq_daddr;
+	struct sockaddr_storage	rq_saddr;
 	unsigned long		rq_flags;
 	ktime_t			rq_stime;
 	__be32			rq_xid;
@@ -1450,7 +1450,7 @@ static int nfsd_genl_rpc_status_compose_msg(struct sk_buff *skb,
 			NFSD_A_RPC_STATUS_PAD))
 		return -ENOBUFS;
 
-	switch (genl_rqstp->rq_saddr.sa_family) {
+	switch (genl_rqstp->rq_saddr.ss_family) {
 	case AF_INET: {
 		const struct sockaddr_in *s_in, *d_in;
 
@@ -1527,7 +1527,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
 		list_for_each_entry_rcu(rqstp,
 				&nn->nfsd_serv->sv_pools[i].sp_all_threads,
 				rq_all) {
-			struct nfsd_genl_rqstp genl_rqstp;
+			struct nfsd_genl_rqstp genl_rqstp = {};
 			unsigned int status_counter;
 
 			if (rqstp_index++ < cb->args[1]) /* already consumed */
@@ -1551,9 +1551,9 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
 			genl_rqstp.rq_stime = rqstp->rq_stime;
 			genl_rqstp.rq_opcnt = 0;
 			memcpy(&genl_rqstp.rq_daddr, svc_daddr(rqstp),
-			       sizeof(struct sockaddr));
+			       sizeof(struct sockaddr_storage));
 			memcpy(&genl_rqstp.rq_saddr, svc_addr(rqstp),
-			       sizeof(struct sockaddr));
+			       sizeof(struct sockaddr_storage));
 
 #ifdef CONFIG_NFSD_V4
 			if (rqstp->rq_vers == NFS4_VERSION &&

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 6/9] nfsd: fix refcount leak in nfsd_file_lru_add on insertion failure
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
                   ` (4 preceding siblings ...)
  2026-06-02 16:23 ` [PATCH v2 5/9] nfsd: widen nfsd_genl_rqstp address fields to sockaddr_storage Jeff Layton
@ 2026-06-02 16:23 ` Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 7/9] nfsd: fix fcache_disposal UAF by inlining dispose state into nfsd_net Jeff Layton
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

nfsd_file_lru_add() unconditionally increments nf_ref before attempting
to insert the nfsd_file into the LRU via list_lru_add_obj(). If the
insertion fails (the item is already linked), the incremented reference
is never released, permanently inflating the refcount.

The LRU shrinker callback (nfsd_file_lru_cb) uses refcount_dec_if_one()
to reclaim entries, which requires nf_ref == 1. An inflated refcount
therefore blocks eviction of the affected file cache entry for the
lifetime of the nfsd instance.

While this failure path is currently unreachable -- the sole caller in
nfsd_file_do_acquire() operates on freshly-allocated objects that cannot
already be on the LRU -- it represents a latent bug that would become
exploitable if a future change adds another call site or alters the
PENDING protocol.

Fix this by:
 - Adding a compensating refcount_dec() on the failure path. Bare
   refcount_dec (rather than nfsd_file_put) is correct here because
   the caller in nfsd_file_do_acquire still holds its own construction
   reference, so the count goes from 2 back to 1 without risk of
   reaching zero.
 - Changing WARN_ON(1) to WARN_ON_ONCE(1) to prevent log flooding if
   this path is ever hit repeatedly.
 - Returning early on failure to skip the unnecessary call to
   nfsd_file_schedule_laundrette(), since no entry was added to the LRU.

Fixes: 56221b42d717 ("nfsd: filecache: don't repeatedly add/remove files on the lru list")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/filecache.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 1e2e1f89216e..d5b917e40d62 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -330,8 +330,11 @@ static void nfsd_file_lru_add(struct nfsd_file *nf)
 	refcount_inc(&nf->nf_ref);
 	if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru))
 		trace_nfsd_file_lru_add(nf);
-	else
-		WARN_ON(1);
+	else {
+		refcount_dec(&nf->nf_ref);
+		WARN_ON_ONCE(1);
+		return;
+	}
 	nfsd_file_schedule_laundrette();
 }
 

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 7/9] nfsd: fix fcache_disposal UAF by inlining dispose state into nfsd_net
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
                   ` (5 preceding siblings ...)
  2026-06-02 16:23 ` [PATCH v2 6/9] nfsd: fix refcount leak in nfsd_file_lru_add on insertion failure Jeff Layton
@ 2026-06-02 16:23 ` Jeff Layton
  2026-06-02 16:23 ` [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files Jeff Layton
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

nfsd_file_dispose_list_delayed() defers fput() to nfsd service threads
via a per-net freeme queue, preventing the shrinker and GC worker from
bearing the cost of closing files (see ffb402596147).  However, the
queue lives in a separately-allocated struct nfsd_fcache_disposal that
is freed by nfsd_free_fcache_disposal_net() during per-net teardown.
The global shrinker, laundrette, and fsnotify callbacks can still be
inside nfsd_file_dispose_list_delayed() dereferencing that pointer,
causing a use-after-free.

Inline the spinlock and freeme list directly into struct nfsd_net (as
fcache_dispose_lock and fcache_dispose_list), eliminating the separately
allocated struct nfsd_fcache_disposal entirely.  These fields now have
the same lifetime as the net namespace itself, so there is no dangling
pointer to chase.

nfsd_file_cache_start_net() now just initializes the inline fields and
cannot fail due to allocation.  nfsd_file_cache_shutdown_net() drains
the inline list directly instead of freeing a separate struct.  The
alloc/free helpers are removed.

Fixes: 1463b38e7cf3 ("NFSD: simplify per-net file cache management")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/filecache.c | 90 +++++++++++++----------------------------------------
 fs/nfsd/netns.h     |  3 +-
 2 files changed, 24 insertions(+), 69 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index d5b917e40d62..03f01a0beced 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -62,11 +62,6 @@ static DEFINE_PER_CPU(unsigned long, nfsd_file_releases);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_total_age);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_evictions);
 
-struct nfsd_fcache_disposal {
-	spinlock_t lock;
-	struct list_head freeme;
-};
-
 static struct kmem_cache		*nfsd_file_slab;
 static struct kmem_cache		*nfsd_file_mark_slab;
 static struct list_lru			nfsd_file_lru;
@@ -425,31 +420,26 @@ nfsd_file_dispose_list(struct list_head *dispose)
 }
 
 /**
- * nfsd_file_dispose_list_delayed - move list of dead files to net's freeme list
+ * nfsd_file_dispose_list_delayed - queue dead files for disposal by nfsd threads
  * @dispose: list of nfsd_files to be disposed
  *
- * Transfers each file to the "freeme" list for its nfsd_net, to eventually
- * be disposed of by the per-net garbage collector.
+ * Transfers each file to the per-net freeme list in its nfsd_net and wakes
+ * an nfsd thread to do the actual close.  This keeps the cost of fput()
+ * in the nfsd threads rather than in the shrinker or GC worker.
  */
 static void
 nfsd_file_dispose_list_delayed(struct list_head *dispose)
 {
-	while(!list_empty(dispose)) {
+	while (!list_empty(dispose)) {
 		struct nfsd_file *nf = list_first_entry(dispose,
 						struct nfsd_file, nf_gc);
 		struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
-		struct nfsd_fcache_disposal *l = nn->fcache_disposal;
 		struct svc_serv *serv;
 
-		spin_lock(&l->lock);
-		list_move_tail(&nf->nf_gc, &l->freeme);
-		spin_unlock(&l->lock);
+		spin_lock(&nn->fcache_dispose_lock);
+		list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
+		spin_unlock(&nn->fcache_dispose_lock);
 
-		/*
-		 * The filecache laundrette is shut down after the
-		 * nn->nfsd_serv pointer is cleared, but before the
-		 * svc_serv is freed.
-		 */
 		serv = nn->nfsd_serv;
 		if (serv)
 			svc_wake_up(serv);
@@ -467,25 +457,15 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
  */
 void nfsd_file_net_dispose(struct nfsd_net *nn)
 {
-	struct nfsd_fcache_disposal *l = nn->fcache_disposal;
-
-	if (!list_empty(&l->freeme)) {
+	if (!list_empty(&nn->fcache_dispose_list)) {
 		LIST_HEAD(dispose);
 		int i;
 
-		spin_lock(&l->lock);
-		for (i = 0; i < 8 && !list_empty(&l->freeme); i++)
-			list_move(l->freeme.next, &dispose);
-		spin_unlock(&l->lock);
-		if (!list_empty(&l->freeme)) {
-			/*
-			 * Wake up another thread to share the work
-			 * *before* doing any actual disposing.
-			 *
-			 * The filecache laundrette is shut down after
-			 * the nn->nfsd_serv pointer is cleared, but
-			 * before the svc_serv is freed.
-			 */
+		spin_lock(&nn->fcache_dispose_lock);
+		for (i = 0; i < 8 && !list_empty(&nn->fcache_dispose_list); i++)
+			list_move(nn->fcache_dispose_list.next, &dispose);
+		spin_unlock(&nn->fcache_dispose_lock);
+		if (!list_empty(&nn->fcache_dispose_list)) {
 			struct svc_serv *serv = nn->nfsd_serv;
 
 			if (serv)
@@ -701,11 +681,11 @@ nfsd_file_queue_for_close(struct inode *inode, struct list_head *dispose)
 }
 
 /**
- * nfsd_file_close_inode - attempt a delayed close of a nfsd_file
+ * nfsd_file_close_inode - attempt a deferred close of a nfsd_file
  * @inode: inode of the file to attempt to remove
  *
  * Close out any open nfsd_files that can be reaped for @inode. The
- * actual freeing is deferred to the dispose_list_delayed infrastructure.
+ * actual freeing is deferred to the nfsd service threads.
  *
  * This is used by the fsnotify callbacks and setlease notifier.
  */
@@ -990,42 +970,14 @@ __nfsd_file_cache_purge(struct net *net)
 	nfsd_file_dispose_list(&dispose);
 }
 
-static struct nfsd_fcache_disposal *
-nfsd_alloc_fcache_disposal(void)
-{
-	struct nfsd_fcache_disposal *l;
-
-	l = kmalloc_obj(*l);
-	if (!l)
-		return NULL;
-	spin_lock_init(&l->lock);
-	INIT_LIST_HEAD(&l->freeme);
-	return l;
-}
-
-static void
-nfsd_free_fcache_disposal(struct nfsd_fcache_disposal *l)
-{
-	nfsd_file_dispose_list(&l->freeme);
-	kfree(l);
-}
-
-static void
-nfsd_free_fcache_disposal_net(struct net *net)
-{
-	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
-	struct nfsd_fcache_disposal *l = nn->fcache_disposal;
-
-	nfsd_free_fcache_disposal(l);
-}
-
 int
 nfsd_file_cache_start_net(struct net *net)
 {
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
-	nn->fcache_disposal = nfsd_alloc_fcache_disposal();
-	return nn->fcache_disposal ? 0 : -ENOMEM;
+	spin_lock_init(&nn->fcache_dispose_lock);
+	INIT_LIST_HEAD(&nn->fcache_dispose_list);
+	return 0;
 }
 
 /**
@@ -1044,8 +996,10 @@ nfsd_file_cache_purge(struct net *net)
 void
 nfsd_file_cache_shutdown_net(struct net *net)
 {
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
 	nfsd_file_cache_purge(net);
-	nfsd_free_fcache_disposal_net(net);
+	nfsd_file_dispose_list(&nn->fcache_dispose_list);
 }
 
 void
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index f6b8b340bf8e..5c33c96da28e 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -216,7 +216,8 @@ struct nfsd_net {
 	/* utsname taken from the process that starts the server */
 	char			nfsd_name[UNX_MAXNODENAME+1];
 
-	struct nfsd_fcache_disposal *fcache_disposal;
+	spinlock_t		fcache_dispose_lock;
+	struct list_head	fcache_dispose_list;
 
 	siphash_key_t		siphash_key;
 

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
                   ` (6 preceding siblings ...)
  2026-06-02 16:23 ` [PATCH v2 7/9] nfsd: fix fcache_disposal UAF by inlining dispose state into nfsd_net Jeff Layton
@ 2026-06-02 16:23 ` Jeff Layton
  2026-06-03 17:33   ` Chuck Lever
  2026-06-02 16:23 ` [PATCH v2 9/9] nfsd: unify cleanups in nfsd_cross_mnt() exits Jeff Layton
  2026-06-03 20:30 ` [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Chuck Lever
  9 siblings, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

Take a net-namespace reference in nfsd_file_dispose_list_delayed()
(get_net) and release it in nfsd_file_free() (put_net), so that
nf_net is always valid for files that the GC or shrinker has isolated
from the hash table and LRU -- which __nfsd_file_cache_purge() cannot
see.

Without this, nf_net can dangle for in-flight files whose net namespace
is torn down concurrently, causing a use-after-free when
nfsd_file_dispose_list_delayed() calls net_generic(nf->nf_net, ...).

Only files entering the delayed-dispose path need the net reference;
files freed synchronously (non-GC stateful opens, purge, direct put)
are always freed while the net namespace is still alive, so they skip
get_net()/put_net() entirely.  A new NFSD_FILE_NET_HELD flag tracks
whether a given nfsd_file holds a net reference.

Because nfsd_file_free() may now call put_net(nf->nf_net), the old
nfsd_file_put_local() pattern of returning nf->nf_net after
nfsd_file_put() is unsafe -- put_net() could theoretically drop the
last net namespace reference, leaving the returned pointer stale.
Fix this by moving the nfsd_net_put() call into nfsd_file_put_local()
itself, before the nfsd_file_put() that may trigger nfsd_file_free().
The function now returns void and the caller no longer needs to handle
the net reference.

Fixes: 43fd953fa7e2 ("nfsd: simplify the delayed disposal list code")
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/filecache.c        | 34 ++++++++++++++++++++++++++--------
 fs/nfsd/filecache.h        |  3 ++-
 fs/nfsd/localio.c          |  4 ++--
 include/linux/nfslocalio.h |  9 ++-------
 4 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 03f01a0beced..957fe57be063 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -295,6 +295,9 @@ nfsd_file_free(struct nfsd_file *nf)
 	if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
 		return;
 
+	if (test_bit(NFSD_FILE_NET_HELD, &nf->nf_flags))
+		put_net(nf->nf_net);
+
 	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
 }
 
@@ -375,24 +378,28 @@ nfsd_file_put(struct nfsd_file *nf)
 }
 
 /**
- * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
+ * nfsd_file_put_local - put nfsd_file reference and release nfsd_net ref
  * @pnf: nfsd_file of which to put the reference
  *
- * First save the associated net to return to caller, then put
- * the reference of the nfsd_file.
+ * Drops both the nfsd_file reference and the associated nfsd_net
+ * reference.  The nfsd_net ref is released before the file ref so
+ * that put_net() inside nfsd_file_free() cannot drop the last net
+ * namespace reference while the caller still needs it.
  */
-struct net *
+void
 nfsd_file_put_local(struct nfsd_file __rcu **pnf)
 {
 	struct nfsd_file *nf;
-	struct net *net = NULL;
 
 	nf = unrcu_pointer(xchg(pnf, NULL));
 	if (nf) {
-		net = nf->nf_net;
+		struct net *net = nf->nf_net;
+
+		rcu_read_lock();
+		nfsd_net_put(net);
+		rcu_read_unlock();
 		nfsd_file_put(nf);
 	}
-	return net;
 }
 
 /**
@@ -433,9 +440,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
 	while (!list_empty(dispose)) {
 		struct nfsd_file *nf = list_first_entry(dispose,
 						struct nfsd_file, nf_gc);
-		struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
+		struct nfsd_net *nn;
 		struct svc_serv *serv;
 
+		/*
+		 * Pin the net namespace so nf_net stays valid while the
+		 * file sits on the per-net dispose list.  Callers (the GC
+		 * worker, shrinker, and fsnotify callbacks) always run
+		 * before nfsd_net_exit(), so nf_net is still live here.
+		 * The matching put_net() is in nfsd_file_free().
+		 */
+		get_net(nf->nf_net);
+		set_bit(NFSD_FILE_NET_HELD, &nf->nf_flags);
+
+		nn = net_generic(nf->nf_net, nfsd_net_id);
 		spin_lock(&nn->fcache_dispose_lock);
 		list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
 		spin_unlock(&nn->fcache_dispose_lock);
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 683b6437cacc..7ae3c0ea0a2a 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -45,6 +45,7 @@ struct nfsd_file {
 #define NFSD_FILE_REFERENCED	(2)
 #define NFSD_FILE_GC		(3)
 #define NFSD_FILE_RECENT	(4)
+#define NFSD_FILE_NET_HELD	(5)
 	unsigned long		nf_flags;
 	refcount_t		nf_ref;
 	unsigned char		nf_may;
@@ -66,7 +67,7 @@ void nfsd_file_cache_shutdown(void);
 int nfsd_file_cache_start_net(struct net *net);
 void nfsd_file_cache_shutdown_net(struct net *net);
 void nfsd_file_put(struct nfsd_file *nf);
-struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
+void nfsd_file_put_local(struct nfsd_file __rcu **nf);
 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
 struct file *nfsd_file_file(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index c3eb0557b3e1..e3295bae75a4 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -40,8 +40,8 @@
  * avoid all the NFS overhead with reads, writes and commits.
  *
  * On successful return, returned nfsd_file will have its nf_net member
- * set. Caller (NFS client) is responsible for calling nfsd_net_put and
- * nfsd_file_put (via nfs_to_nfsd_file_put_local).
+ * set. Caller (NFS client) is responsible for calling nfsd_file_put
+ * (via nfs_to_nfsd_file_put_local), which also releases the nfsd_net ref.
  */
 static struct nfsd_file *
 nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 3d91043254e6..7267a69092d1 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -62,7 +62,7 @@ struct nfsd_localio_operations {
 						const struct nfs_fh *,
 						struct nfsd_file __rcu **pnf,
 						const fmode_t);
-	struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
+	void (*nfsd_file_put_local)(struct nfsd_file __rcu **);
 	struct file *(*nfsd_file_file)(struct nfsd_file *);
 	void (*nfsd_file_dio_alignment)(struct nfsd_file *,
 					u32 *, u32 *, u32 *);
@@ -96,12 +96,7 @@ static inline void nfs_to_nfsd_file_put_local(struct nfsd_file __rcu **localio)
 	 * must prevent nfsd shutdown from completing as nfs_close_local_fh()
 	 * does by blocking the nfs_uuid from being finally put.
 	 */
-	struct net *net;
-
-	net = nfs_to->nfsd_file_put_local(localio);
-
-	if (net)
-		nfs_to_nfsd_net_put(net);
+	nfs_to->nfsd_file_put_local(localio);
 }
 
 #else   /* CONFIG_NFS_LOCALIO */

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 9/9] nfsd: unify cleanups in nfsd_cross_mnt() exits
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
                   ` (7 preceding siblings ...)
  2026-06-02 16:23 ` [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files Jeff Layton
@ 2026-06-02 16:23 ` Jeff Layton
  2026-06-03 20:30 ` [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Chuck Lever
  9 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-02 16:23 UTC (permalink / raw)
  To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer
  Cc: Al Viro, Chris Mason, linux-nfs, linux-kernel, Trond Myklebust,
	Jeff Layton

From: Al Viro <viro@zeniv.linux.org.uk>

Instead of having a separate path_put() on each failure exit, as well as
on the normal path, let's move all of those past the point where these
codepaths join.  We want to keep the ordering between path_put() and
exp_put(), so move that one as well.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/vfs.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 95ce15440492..cfac0cc4207c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -137,20 +137,19 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 		follow_flags = LOOKUP_AUTOMOUNT;
 
 	err = follow_down(&path, follow_flags);
-	if (err < 0) {
-		path_put(&path);
+	if (err < 0)
 		goto out;
-	}
+
 	if (path.mnt == exp->ex_path.mnt && path.dentry == dentry &&
 	    nfsd_mountpoint(dentry, exp) == 2) {
 		/* This is only a mountpoint in some other namespace */
-		path_put(&path);
 		goto out;
 	}
 
 	exp2 = rqst_exp_get_by_name(rqstp, &path);
 	if (IS_ERR(exp2)) {
 		err = PTR_ERR(exp2);
+		exp2 = NULL;
 		/*
 		 * We normally allow NFS clients to continue
 		 * "underneath" a mountpoint that is not exported.
@@ -160,10 +159,7 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 		 */
 		if (err == -ENOENT && !(exp->ex_flags & NFSEXP_V4ROOT))
 			err = 0;
-		path_put(&path);
-		goto out;
-	}
-	if (nfsd_v4client(rqstp) ||
+	} else if (nfsd_v4client(rqstp) ||
 		(exp->ex_flags & NFSEXP_CROSSMOUNT) || EX_NOHIDE(exp2)) {
 		/* successfully crossed mount point */
 		/*
@@ -177,9 +173,10 @@ nfsd_cross_mnt(struct svc_rqst *rqstp, struct dentry **dpp,
 		*expp = exp2;
 		exp2 = exp;
 	}
-	path_put(&path);
-	exp_put(exp2);
 out:
+	path_put(&path);
+	if (exp2)
+		exp_put(exp2);
 	return err;
 }
 

-- 
2.54.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files
  2026-06-02 16:23 ` [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files Jeff Layton
@ 2026-06-03 17:33   ` Chuck Lever
  2026-06-03 17:50     ` Jeff Layton
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever @ 2026-06-03 17:33 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Lorenzo Bianconi, Anna Schumaker, Trond Myklebust,
	Anna Schumaker, Mike Snitzer
  Cc: Alexander Viro, Chris Mason, linux-nfs, linux-kernel,
	Trond Myklebust



On Tue, Jun 2, 2026, at 9:23 AM, Jeff Layton wrote:
> Take a net-namespace reference in nfsd_file_dispose_list_delayed()
> (get_net) and release it in nfsd_file_free() (put_net), so that
> nf_net is always valid for files that the GC or shrinker has isolated
> from the hash table and LRU -- which __nfsd_file_cache_purge() cannot
> see.
>
> Without this, nf_net can dangle for in-flight files whose net namespace
> is torn down concurrently, causing a use-after-free when
> nfsd_file_dispose_list_delayed() calls net_generic(nf->nf_net, ...).
>
> Only files entering the delayed-dispose path need the net reference;
> files freed synchronously (non-GC stateful opens, purge, direct put)
> are always freed while the net namespace is still alive, so they skip
> get_net()/put_net() entirely.  A new NFSD_FILE_NET_HELD flag tracks
> whether a given nfsd_file holds a net reference.
>
> Because nfsd_file_free() may now call put_net(nf->nf_net), the old
> nfsd_file_put_local() pattern of returning nf->nf_net after
> nfsd_file_put() is unsafe -- put_net() could theoretically drop the
> last net namespace reference, leaving the returned pointer stale.
> Fix this by moving the nfsd_net_put() call into nfsd_file_put_local()
> itself, before the nfsd_file_put() that may trigger nfsd_file_free().
> The function now returns void and the caller no longer needs to handle
> the net reference.
>
> Fixes: 43fd953fa7e2 ("nfsd: simplify the delayed disposal list code")
> Assisted-by: Claude:claude-opus-4-6
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/nfsd/filecache.c        | 34 ++++++++++++++++++++++++++--------
>  fs/nfsd/filecache.h        |  3 ++-
>  fs/nfsd/localio.c          |  4 ++--
>  include/linux/nfslocalio.h |  9 ++-------
>  4 files changed, 32 insertions(+), 18 deletions(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 03f01a0beced..957fe57be063 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -295,6 +295,9 @@ nfsd_file_free(struct nfsd_file *nf)
>  	if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
>  		return;
> 
> +	if (test_bit(NFSD_FILE_NET_HELD, &nf->nf_flags))
> +		put_net(nf->nf_net);
> +
>  	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
>  }
> 
> @@ -375,24 +378,28 @@ nfsd_file_put(struct nfsd_file *nf)
>  }
> 
>  /**
> - * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
> + * nfsd_file_put_local - put nfsd_file reference and release nfsd_net ref
>   * @pnf: nfsd_file of which to put the reference
>   *
> - * First save the associated net to return to caller, then put
> - * the reference of the nfsd_file.
> + * Drops both the nfsd_file reference and the associated nfsd_net
> + * reference.  The nfsd_net ref is released before the file ref so
> + * that put_net() inside nfsd_file_free() cannot drop the last net
> + * namespace reference while the caller still needs it.
>   */
> -struct net *
> +void
>  nfsd_file_put_local(struct nfsd_file __rcu **pnf)
>  {
>  	struct nfsd_file *nf;
> -	struct net *net = NULL;
> 
>  	nf = unrcu_pointer(xchg(pnf, NULL));
>  	if (nf) {
> -		net = nf->nf_net;
> +		struct net *net = nf->nf_net;
> +
> +		rcu_read_lock();
> +		nfsd_net_put(net);
> +		rcu_read_unlock();
>  		nfsd_file_put(nf);
>  	}
> -	return net;
>  }
> 
>  /**
> @@ -433,9 +440,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
>  	while (!list_empty(dispose)) {
>  		struct nfsd_file *nf = list_first_entry(dispose,
>  						struct nfsd_file, nf_gc);
> -		struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
> +		struct nfsd_net *nn;
>  		struct svc_serv *serv;
> 
> +		/*
> +		 * Pin the net namespace so nf_net stays valid while the
> +		 * file sits on the per-net dispose list.  Callers (the GC
> +		 * worker, shrinker, and fsnotify callbacks) always run
> +		 * before nfsd_net_exit(), so nf_net is still live here.
> +		 * The matching put_net() is in nfsd_file_free().
> +		 */
> +		get_net(nf->nf_net);
> +		set_bit(NFSD_FILE_NET_HELD, &nf->nf_flags);
> +
> +		nn = net_generic(nf->nf_net, nfsd_net_id);
>  		spin_lock(&nn->fcache_dispose_lock);
>  		list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
>  		spin_unlock(&nn->fcache_dispose_lock);
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index 683b6437cacc..7ae3c0ea0a2a 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -45,6 +45,7 @@ struct nfsd_file {
>  #define NFSD_FILE_REFERENCED	(2)
>  #define NFSD_FILE_GC		(3)
>  #define NFSD_FILE_RECENT	(4)
> +#define NFSD_FILE_NET_HELD	(5)
>  	unsigned long		nf_flags;
>  	refcount_t		nf_ref;
>  	unsigned char		nf_may;
> @@ -66,7 +67,7 @@ void nfsd_file_cache_shutdown(void);
>  int nfsd_file_cache_start_net(struct net *net);
>  void nfsd_file_cache_shutdown_net(struct net *net);
>  void nfsd_file_put(struct nfsd_file *nf);
> -struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
> +void nfsd_file_put_local(struct nfsd_file __rcu **nf);
>  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
>  struct file *nfsd_file_file(struct nfsd_file *nf);
>  void nfsd_file_close_inode_sync(struct inode *inode);
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> index c3eb0557b3e1..e3295bae75a4 100644
> --- a/fs/nfsd/localio.c
> +++ b/fs/nfsd/localio.c
> @@ -40,8 +40,8 @@
>   * avoid all the NFS overhead with reads, writes and commits.
>   *
>   * On successful return, returned nfsd_file will have its nf_net member
> - * set. Caller (NFS client) is responsible for calling nfsd_net_put and
> - * nfsd_file_put (via nfs_to_nfsd_file_put_local).
> + * set. Caller (NFS client) is responsible for calling nfsd_file_put
> + * (via nfs_to_nfsd_file_put_local), which also releases the nfsd_net 
> ref.
>   */
>  static struct nfsd_file *
>  nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index 3d91043254e6..7267a69092d1 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -62,7 +62,7 @@ struct nfsd_localio_operations {
>  						const struct nfs_fh *,
>  						struct nfsd_file __rcu **pnf,
>  						const fmode_t);
> -	struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
> +	void (*nfsd_file_put_local)(struct nfsd_file __rcu **);
>  	struct file *(*nfsd_file_file)(struct nfsd_file *);
>  	void (*nfsd_file_dio_alignment)(struct nfsd_file *,
>  					u32 *, u32 *, u32 *);
> @@ -96,12 +96,7 @@ static inline void nfs_to_nfsd_file_put_local(struct 
> nfsd_file __rcu **localio)
>  	 * must prevent nfsd shutdown from completing as nfs_close_local_fh()
>  	 * does by blocking the nfs_uuid from being finally put.
>  	 */
> -	struct net *net;
> -
> -	net = nfs_to->nfsd_file_put_local(localio);
> -
> -	if (net)
> -		nfs_to_nfsd_net_put(net);
> +	nfs_to->nfsd_file_put_local(localio);
>  }
> 
>  #else   /* CONFIG_NFS_LOCALIO */
>
> -- 
> 2.54.0

It seems that all of the LLM reviewers have difficulty with this patch.
This is a consolidated review of the issue from Claude and Codex:

> The reordering in nfsd_file_put_local() -- nfsd_net_put() before
> nfsd_file_put() -- introduces a shutdown race.
>
> The nfsd_net_ref percpu refcount is taken only by LOCALIO
> (nfsd_open_local_fh() and nfs_open_local_fh()). The drain wait in
> nfsd_shutdown_net() (wait_for_completion(&nn->nfsd_net_free_done))
> is what holds off percpu_ref_exit() and nfsd_shutdown_generic() --
> and through the latter, nfsd_file_cache_shutdown(), which runs
> rcu_barrier() and then destroys nfsd_file_slab, nfsd_file_mark_slab,
> the fsnotify groups, and the rhltable.
>
> Per-I/O references are not covered by the nfs_uuid handshake. Each
> pgio call takes its own nfsd_file ref plus a paired nfsd_net ref
> (fs/nfs/pagelist.c, nfs_local_open_fh), stores it in iocb->localio,
> and releases it at completion through nfsd_file_put_local(). An
> iocb is not on nfs_uuid->files, so nfs_localio_invalidate_clients()
> does not wait for it; only the drain wait does. Meanwhile
> __nfsd_file_cache_purge() has already unhashed the nfsd_file but
> cannot free it (the iocb ref keeps refcount elevated in
> nfsd_file_cond_queue()).
>
> So with one I/O in flight when the server is stopped: the shutdown
> thread parks at the drain wait; the I/O completion thread enters
> nfsd_file_put_local() and drops the last nfsd_net ref, which runs
> complete() before nfsd_file_put() has executed. The shutdown thread
> then proceeds through nfsd_file_cache_shutdown() concurrently with
> the final nfsd_file_free(): the call_rcu() is queued after the
> rcu_barrier(), so nfsd_file_slab_free() does kmem_cache_free() into
> a destroyed cache, and nfsd_file_mark_put() runs against a destroyed
> fsnotify group. kmem_cache_destroy() also fires "objects remaining"
> because the nfsd_file is still allocated.
>
> The old ordering was the mechanism that prevented this: the caller
> held its paired nfsd_net ref across nfsd_file_put(), and percpu_ref
> guarantees the release callback runs only after every ref is
> dropped, so global teardown strictly followed the file free and the
> rcu_barrier() flushed its call_rcu().
>
> The hazard the commit message cites for the reorder cannot occur on
> this path: NFSD_FILE_NET_HELD is set only in
> nfsd_file_dispose_list_delayed(), reachable only through
> refcount_dec_if_one() in nfsd_file_lru_cb(), i.e. at refcount == 1.
> A file with an outstanding LOCALIO reference has refcount >= 2, so
> a file whose final put arrives via nfsd_file_put_local() never has
> NET_HELD set and its nfsd_file_free() never calls put_net().
>
> Suggest keeping the void API but restoring the put order:
>
>       nf = unrcu_pointer(xchg(pnf, NULL));
>       if (nf) {
>               struct net *net = nf->nf_net;
> 
>               nfsd_file_put(nf);
>               rcu_read_lock();
>               nfsd_net_put(net);
>               rcu_read_unlock();
>       }
>
> with the kdoc comment and the commit message paragraph about the
> old ordering being unsafe adjusted to match.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files
  2026-06-03 17:33   ` Chuck Lever
@ 2026-06-03 17:50     ` Jeff Layton
  2026-06-03 18:20       ` Chuck Lever
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff Layton @ 2026-06-03 17:50 UTC (permalink / raw)
  To: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Lorenzo Bianconi, Anna Schumaker, Trond Myklebust,
	Anna Schumaker, Mike Snitzer
  Cc: Alexander Viro, Chris Mason, linux-nfs, linux-kernel,
	Trond Myklebust

On Wed, 2026-06-03 at 10:33 -0700, Chuck Lever wrote:
> 
> On Tue, Jun 2, 2026, at 9:23 AM, Jeff Layton wrote:
> > Take a net-namespace reference in nfsd_file_dispose_list_delayed()
> > (get_net) and release it in nfsd_file_free() (put_net), so that
> > nf_net is always valid for files that the GC or shrinker has isolated
> > from the hash table and LRU -- which __nfsd_file_cache_purge() cannot
> > see.
> > 
> > Without this, nf_net can dangle for in-flight files whose net namespace
> > is torn down concurrently, causing a use-after-free when
> > nfsd_file_dispose_list_delayed() calls net_generic(nf->nf_net, ...).
> > 
> > Only files entering the delayed-dispose path need the net reference;
> > files freed synchronously (non-GC stateful opens, purge, direct put)
> > are always freed while the net namespace is still alive, so they skip
> > get_net()/put_net() entirely.  A new NFSD_FILE_NET_HELD flag tracks
> > whether a given nfsd_file holds a net reference.
> > 
> > Because nfsd_file_free() may now call put_net(nf->nf_net), the old
> > nfsd_file_put_local() pattern of returning nf->nf_net after
> > nfsd_file_put() is unsafe -- put_net() could theoretically drop the
> > last net namespace reference, leaving the returned pointer stale.
> > Fix this by moving the nfsd_net_put() call into nfsd_file_put_local()
> > itself, before the nfsd_file_put() that may trigger nfsd_file_free().
> > The function now returns void and the caller no longer needs to handle
> > the net reference.
> > 
> > Fixes: 43fd953fa7e2 ("nfsd: simplify the delayed disposal list code")
> > Assisted-by: Claude:claude-opus-4-6
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  fs/nfsd/filecache.c        | 34 ++++++++++++++++++++++++++--------
> >  fs/nfsd/filecache.h        |  3 ++-
> >  fs/nfsd/localio.c          |  4 ++--
> >  include/linux/nfslocalio.h |  9 ++-------
> >  4 files changed, 32 insertions(+), 18 deletions(-)
> > 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 03f01a0beced..957fe57be063 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -295,6 +295,9 @@ nfsd_file_free(struct nfsd_file *nf)
> >  	if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
> >  		return;
> > 
> > +	if (test_bit(NFSD_FILE_NET_HELD, &nf->nf_flags))
> > +		put_net(nf->nf_net);
> > +
> >  	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
> >  }
> > 
> > @@ -375,24 +378,28 @@ nfsd_file_put(struct nfsd_file *nf)
> >  }
> > 
> >  /**
> > - * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
> > + * nfsd_file_put_local - put nfsd_file reference and release nfsd_net ref
> >   * @pnf: nfsd_file of which to put the reference
> >   *
> > - * First save the associated net to return to caller, then put
> > - * the reference of the nfsd_file.
> > + * Drops both the nfsd_file reference and the associated nfsd_net
> > + * reference.  The nfsd_net ref is released before the file ref so
> > + * that put_net() inside nfsd_file_free() cannot drop the last net
> > + * namespace reference while the caller still needs it.
> >   */
> > -struct net *
> > +void
> >  nfsd_file_put_local(struct nfsd_file __rcu **pnf)
> >  {
> >  	struct nfsd_file *nf;
> > -	struct net *net = NULL;
> > 
> >  	nf = unrcu_pointer(xchg(pnf, NULL));
> >  	if (nf) {
> > -		net = nf->nf_net;
> > +		struct net *net = nf->nf_net;
> > +
> > +		rcu_read_lock();
> > +		nfsd_net_put(net);
> > +		rcu_read_unlock();
> >  		nfsd_file_put(nf);
> >  	}
> > -	return net;
> >  }
> > 
> >  /**
> > @@ -433,9 +440,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> >  	while (!list_empty(dispose)) {
> >  		struct nfsd_file *nf = list_first_entry(dispose,
> >  						struct nfsd_file, nf_gc);
> > -		struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
> > +		struct nfsd_net *nn;
> >  		struct svc_serv *serv;
> > 
> > +		/*
> > +		 * Pin the net namespace so nf_net stays valid while the
> > +		 * file sits on the per-net dispose list.  Callers (the GC
> > +		 * worker, shrinker, and fsnotify callbacks) always run
> > +		 * before nfsd_net_exit(), so nf_net is still live here.
> > +		 * The matching put_net() is in nfsd_file_free().
> > +		 */
> > +		get_net(nf->nf_net);
> > +		set_bit(NFSD_FILE_NET_HELD, &nf->nf_flags);
> > +
> > +		nn = net_generic(nf->nf_net, nfsd_net_id);
> >  		spin_lock(&nn->fcache_dispose_lock);
> >  		list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
> >  		spin_unlock(&nn->fcache_dispose_lock);
> > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > index 683b6437cacc..7ae3c0ea0a2a 100644
> > --- a/fs/nfsd/filecache.h
> > +++ b/fs/nfsd/filecache.h
> > @@ -45,6 +45,7 @@ struct nfsd_file {
> >  #define NFSD_FILE_REFERENCED	(2)
> >  #define NFSD_FILE_GC		(3)
> >  #define NFSD_FILE_RECENT	(4)
> > +#define NFSD_FILE_NET_HELD	(5)
> >  	unsigned long		nf_flags;
> >  	refcount_t		nf_ref;
> >  	unsigned char		nf_may;
> > @@ -66,7 +67,7 @@ void nfsd_file_cache_shutdown(void);
> >  int nfsd_file_cache_start_net(struct net *net);
> >  void nfsd_file_cache_shutdown_net(struct net *net);
> >  void nfsd_file_put(struct nfsd_file *nf);
> > -struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
> > +void nfsd_file_put_local(struct nfsd_file __rcu **nf);
> >  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> >  struct file *nfsd_file_file(struct nfsd_file *nf);
> >  void nfsd_file_close_inode_sync(struct inode *inode);
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > index c3eb0557b3e1..e3295bae75a4 100644
> > --- a/fs/nfsd/localio.c
> > +++ b/fs/nfsd/localio.c
> > @@ -40,8 +40,8 @@
> >   * avoid all the NFS overhead with reads, writes and commits.
> >   *
> >   * On successful return, returned nfsd_file will have its nf_net member
> > - * set. Caller (NFS client) is responsible for calling nfsd_net_put and
> > - * nfsd_file_put (via nfs_to_nfsd_file_put_local).
> > + * set. Caller (NFS client) is responsible for calling nfsd_file_put
> > + * (via nfs_to_nfsd_file_put_local), which also releases the nfsd_net 
> > ref.
> >   */
> >  static struct nfsd_file *
> >  nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
> > diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> > index 3d91043254e6..7267a69092d1 100644
> > --- a/include/linux/nfslocalio.h
> > +++ b/include/linux/nfslocalio.h
> > @@ -62,7 +62,7 @@ struct nfsd_localio_operations {
> >  						const struct nfs_fh *,
> >  						struct nfsd_file __rcu **pnf,
> >  						const fmode_t);
> > -	struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
> > +	void (*nfsd_file_put_local)(struct nfsd_file __rcu **);
> >  	struct file *(*nfsd_file_file)(struct nfsd_file *);
> >  	void (*nfsd_file_dio_alignment)(struct nfsd_file *,
> >  					u32 *, u32 *, u32 *);
> > @@ -96,12 +96,7 @@ static inline void nfs_to_nfsd_file_put_local(struct 
> > nfsd_file __rcu **localio)
> >  	 * must prevent nfsd shutdown from completing as nfs_close_local_fh()
> >  	 * does by blocking the nfs_uuid from being finally put.
> >  	 */
> > -	struct net *net;
> > -
> > -	net = nfs_to->nfsd_file_put_local(localio);
> > -
> > -	if (net)
> > -		nfs_to_nfsd_net_put(net);
> > +	nfs_to->nfsd_file_put_local(localio);
> >  }
> > 
> >  #else   /* CONFIG_NFS_LOCALIO */
> > 
> > -- 
> > 2.54.0
> 
> It seems that all of the LLM reviewers have difficulty with this patch.
> This is a consolidated review of the issue from Claude and Codex:
> 
> > The reordering in nfsd_file_put_local() -- nfsd_net_put() before
> > nfsd_file_put() -- introduces a shutdown race.
> > 
> > The nfsd_net_ref percpu refcount is taken only by LOCALIO
> > (nfsd_open_local_fh() and nfs_open_local_fh()). The drain wait in
> > nfsd_shutdown_net() (wait_for_completion(&nn->nfsd_net_free_done))
> > is what holds off percpu_ref_exit() and nfsd_shutdown_generic() --
> > and through the latter, nfsd_file_cache_shutdown(), which runs
> > rcu_barrier() and then destroys nfsd_file_slab, nfsd_file_mark_slab,
> > the fsnotify groups, and the rhltable.
> > 
> > Per-I/O references are not covered by the nfs_uuid handshake. Each
> > pgio call takes its own nfsd_file ref plus a paired nfsd_net ref
> > (fs/nfs/pagelist.c, nfs_local_open_fh), stores it in iocb->localio,
> > and releases it at completion through nfsd_file_put_local(). An
> > iocb is not on nfs_uuid->files, so nfs_localio_invalidate_clients()
> > does not wait for it; only the drain wait does. Meanwhile
> > __nfsd_file_cache_purge() has already unhashed the nfsd_file but
> > cannot free it (the iocb ref keeps refcount elevated in
> > nfsd_file_cond_queue()).
> > 
> > So with one I/O in flight when the server is stopped: the shutdown
> > thread parks at the drain wait; the I/O completion thread enters
> > nfsd_file_put_local() and drops the last nfsd_net ref, which runs
> > complete() before nfsd_file_put() has executed. The shutdown thread
> > then proceeds through nfsd_file_cache_shutdown() concurrently with
> > the final nfsd_file_free(): the call_rcu() is queued after the
> > rcu_barrier(), so nfsd_file_slab_free() does kmem_cache_free() into
> > a destroyed cache, and nfsd_file_mark_put() runs against a destroyed
> > fsnotify group. kmem_cache_destroy() also fires "objects remaining"
> > because the nfsd_file is still allocated.
> > 
> > The old ordering was the mechanism that prevented this: the caller
> > held its paired nfsd_net ref across nfsd_file_put(), and percpu_ref
> > guarantees the release callback runs only after every ref is
> > dropped, so global teardown strictly followed the file free and the
> > rcu_barrier() flushed its call_rcu().
> > 
> > The hazard the commit message cites for the reorder cannot occur on
> > this path: NFSD_FILE_NET_HELD is set only in
> > nfsd_file_dispose_list_delayed(), reachable only through
> > refcount_dec_if_one() in nfsd_file_lru_cb(), i.e. at refcount == 1.
> > A file with an outstanding LOCALIO reference has refcount >= 2, so
> > a file whose final put arrives via nfsd_file_put_local() never has
> > NET_HELD set and its nfsd_file_free() never calls put_net().
> > 
> > Suggest keeping the void API but restoring the put order:
> > 
> >       nf = unrcu_pointer(xchg(pnf, NULL));
> >       if (nf) {
> >               struct net *net = nf->nf_net;
> > 
> >               nfsd_file_put(nf);
> >               rcu_read_lock();
> >               nfsd_net_put(net);
> >               rcu_read_unlock();
> >       }
> > 
> > with the kdoc comment and the commit message paragraph about the
> > old ordering being unsafe adjusted to match.
> 


I had claude review this and it says:

● This is the same concern I just addressed for the previous patch's Finding 3, restated as a
  critical bug. The answer is the same: this is a false positive.

  The reviewer's scenario requires:

  1. The global shrinker unhashes and isolates an nfsd_file onto a local dispose list
  2. The net namespace teardown completes and struct net is freed
  3. The shrinker resumes and calls get_net() on freed memory

  Step 2 cannot happen before step 3. Here's why:

  The net namespace struct net is reference-counted. It isn't freed when nfsd_net_exit() runs --
  nfsd_net_exit() is a pernet subsys callback that runs during net namespace teardown, but the
  struct net itself stays alive until its own refcount drops to zero. The struct net outlives all
  pernet exit callbacks.

  More importantly, nfsd_file_dispose_list_delayed() is called from exactly three places:
  - nfsd_file_gc() -- the GC delayed work
  - nfsd_file_lru_scan() -- the shrinker
  - nfsd_file_close_inode() -- fsnotify

  All three isolate items and call nfsd_file_dispose_list_delayed() in a single, synchronous
  function call. There's no point where a local dispose list sits around waiting while the net
  namespace is torn down. The isolation and the get_net() happen in the same execution context.

  The shrinker and GC worker are global and remain registered until nfsd_file_cache_shutdown()
  (global teardown). Per-net teardown via nfsd_file_cache_shutdown_net() does not unregister them.
  But the files they process were allocated while the net namespace was alive, and nf->nf_net was
  set at allocation time in nfsd_file_alloc(). The struct net cannot have a zero refcount while nfsd
  files referencing it still exist in the LRU -- the nfsd server itself holds net references
  through its operation.

  The get_net() here is not protecting against the current dereference being unsafe. It's ensuring
  that nf_net remains valid for the arbitrarily long time the file sits on the per-net dispose list
  waiting for an nfsd thread to close it. That's exactly the right pattern.

  Verdict: False positive. The get_net() placement is correct and the struct net is guaranteed to be
  live at this point.

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files
  2026-06-03 17:50     ` Jeff Layton
@ 2026-06-03 18:20       ` Chuck Lever
  2026-06-03 19:15         ` Jeff Layton
  0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever @ 2026-06-03 18:20 UTC (permalink / raw)
  To: Jeff Layton, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Lorenzo Bianconi, Anna Schumaker, Trond Myklebust,
	Anna Schumaker, Mike Snitzer
  Cc: Alexander Viro, Chris Mason, linux-nfs, linux-kernel,
	Trond Myklebust



On Wed, Jun 3, 2026, at 10:50 AM, Jeff Layton wrote:
> On Wed, 2026-06-03 at 10:33 -0700, Chuck Lever wrote:
>> 
>> On Tue, Jun 2, 2026, at 9:23 AM, Jeff Layton wrote:
>> > Take a net-namespace reference in nfsd_file_dispose_list_delayed()
>> > (get_net) and release it in nfsd_file_free() (put_net), so that
>> > nf_net is always valid for files that the GC or shrinker has isolated
>> > from the hash table and LRU -- which __nfsd_file_cache_purge() cannot
>> > see.
>> > 
>> > Without this, nf_net can dangle for in-flight files whose net namespace
>> > is torn down concurrently, causing a use-after-free when
>> > nfsd_file_dispose_list_delayed() calls net_generic(nf->nf_net, ...).
>> > 
>> > Only files entering the delayed-dispose path need the net reference;
>> > files freed synchronously (non-GC stateful opens, purge, direct put)
>> > are always freed while the net namespace is still alive, so they skip
>> > get_net()/put_net() entirely.  A new NFSD_FILE_NET_HELD flag tracks
>> > whether a given nfsd_file holds a net reference.
>> > 
>> > Because nfsd_file_free() may now call put_net(nf->nf_net), the old
>> > nfsd_file_put_local() pattern of returning nf->nf_net after
>> > nfsd_file_put() is unsafe -- put_net() could theoretically drop the
>> > last net namespace reference, leaving the returned pointer stale.
>> > Fix this by moving the nfsd_net_put() call into nfsd_file_put_local()
>> > itself, before the nfsd_file_put() that may trigger nfsd_file_free().
>> > The function now returns void and the caller no longer needs to handle
>> > the net reference.
>> > 
>> > Fixes: 43fd953fa7e2 ("nfsd: simplify the delayed disposal list code")
>> > Assisted-by: Claude:claude-opus-4-6
>> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> > ---
>> >  fs/nfsd/filecache.c        | 34 ++++++++++++++++++++++++++--------
>> >  fs/nfsd/filecache.h        |  3 ++-
>> >  fs/nfsd/localio.c          |  4 ++--
>> >  include/linux/nfslocalio.h |  9 ++-------
>> >  4 files changed, 32 insertions(+), 18 deletions(-)
>> > 
>> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
>> > index 03f01a0beced..957fe57be063 100644
>> > --- a/fs/nfsd/filecache.c
>> > +++ b/fs/nfsd/filecache.c
>> > @@ -295,6 +295,9 @@ nfsd_file_free(struct nfsd_file *nf)
>> >  	if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
>> >  		return;
>> > 
>> > +	if (test_bit(NFSD_FILE_NET_HELD, &nf->nf_flags))
>> > +		put_net(nf->nf_net);
>> > +
>> >  	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
>> >  }
>> > 
>> > @@ -375,24 +378,28 @@ nfsd_file_put(struct nfsd_file *nf)
>> >  }
>> > 
>> >  /**
>> > - * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
>> > + * nfsd_file_put_local - put nfsd_file reference and release nfsd_net ref
>> >   * @pnf: nfsd_file of which to put the reference
>> >   *
>> > - * First save the associated net to return to caller, then put
>> > - * the reference of the nfsd_file.
>> > + * Drops both the nfsd_file reference and the associated nfsd_net
>> > + * reference.  The nfsd_net ref is released before the file ref so
>> > + * that put_net() inside nfsd_file_free() cannot drop the last net
>> > + * namespace reference while the caller still needs it.
>> >   */
>> > -struct net *
>> > +void
>> >  nfsd_file_put_local(struct nfsd_file __rcu **pnf)
>> >  {
>> >  	struct nfsd_file *nf;
>> > -	struct net *net = NULL;
>> > 
>> >  	nf = unrcu_pointer(xchg(pnf, NULL));
>> >  	if (nf) {
>> > -		net = nf->nf_net;
>> > +		struct net *net = nf->nf_net;
>> > +
>> > +		rcu_read_lock();
>> > +		nfsd_net_put(net);
>> > +		rcu_read_unlock();
>> >  		nfsd_file_put(nf);
>> >  	}
>> > -	return net;
>> >  }
>> > 
>> >  /**
>> > @@ -433,9 +440,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
>> >  	while (!list_empty(dispose)) {
>> >  		struct nfsd_file *nf = list_first_entry(dispose,
>> >  						struct nfsd_file, nf_gc);
>> > -		struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
>> > +		struct nfsd_net *nn;
>> >  		struct svc_serv *serv;
>> > 
>> > +		/*
>> > +		 * Pin the net namespace so nf_net stays valid while the
>> > +		 * file sits on the per-net dispose list.  Callers (the GC
>> > +		 * worker, shrinker, and fsnotify callbacks) always run
>> > +		 * before nfsd_net_exit(), so nf_net is still live here.
>> > +		 * The matching put_net() is in nfsd_file_free().
>> > +		 */
>> > +		get_net(nf->nf_net);
>> > +		set_bit(NFSD_FILE_NET_HELD, &nf->nf_flags);
>> > +
>> > +		nn = net_generic(nf->nf_net, nfsd_net_id);
>> >  		spin_lock(&nn->fcache_dispose_lock);
>> >  		list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
>> >  		spin_unlock(&nn->fcache_dispose_lock);
>> > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
>> > index 683b6437cacc..7ae3c0ea0a2a 100644
>> > --- a/fs/nfsd/filecache.h
>> > +++ b/fs/nfsd/filecache.h
>> > @@ -45,6 +45,7 @@ struct nfsd_file {
>> >  #define NFSD_FILE_REFERENCED	(2)
>> >  #define NFSD_FILE_GC		(3)
>> >  #define NFSD_FILE_RECENT	(4)
>> > +#define NFSD_FILE_NET_HELD	(5)
>> >  	unsigned long		nf_flags;
>> >  	refcount_t		nf_ref;
>> >  	unsigned char		nf_may;
>> > @@ -66,7 +67,7 @@ void nfsd_file_cache_shutdown(void);
>> >  int nfsd_file_cache_start_net(struct net *net);
>> >  void nfsd_file_cache_shutdown_net(struct net *net);
>> >  void nfsd_file_put(struct nfsd_file *nf);
>> > -struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
>> > +void nfsd_file_put_local(struct nfsd_file __rcu **nf);
>> >  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
>> >  struct file *nfsd_file_file(struct nfsd_file *nf);
>> >  void nfsd_file_close_inode_sync(struct inode *inode);
>> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
>> > index c3eb0557b3e1..e3295bae75a4 100644
>> > --- a/fs/nfsd/localio.c
>> > +++ b/fs/nfsd/localio.c
>> > @@ -40,8 +40,8 @@
>> >   * avoid all the NFS overhead with reads, writes and commits.
>> >   *
>> >   * On successful return, returned nfsd_file will have its nf_net member
>> > - * set. Caller (NFS client) is responsible for calling nfsd_net_put and
>> > - * nfsd_file_put (via nfs_to_nfsd_file_put_local).
>> > + * set. Caller (NFS client) is responsible for calling nfsd_file_put
>> > + * (via nfs_to_nfsd_file_put_local), which also releases the nfsd_net 
>> > ref.
>> >   */
>> >  static struct nfsd_file *
>> >  nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
>> > diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
>> > index 3d91043254e6..7267a69092d1 100644
>> > --- a/include/linux/nfslocalio.h
>> > +++ b/include/linux/nfslocalio.h
>> > @@ -62,7 +62,7 @@ struct nfsd_localio_operations {
>> >  						const struct nfs_fh *,
>> >  						struct nfsd_file __rcu **pnf,
>> >  						const fmode_t);
>> > -	struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
>> > +	void (*nfsd_file_put_local)(struct nfsd_file __rcu **);
>> >  	struct file *(*nfsd_file_file)(struct nfsd_file *);
>> >  	void (*nfsd_file_dio_alignment)(struct nfsd_file *,
>> >  					u32 *, u32 *, u32 *);
>> > @@ -96,12 +96,7 @@ static inline void nfs_to_nfsd_file_put_local(struct 
>> > nfsd_file __rcu **localio)
>> >  	 * must prevent nfsd shutdown from completing as nfs_close_local_fh()
>> >  	 * does by blocking the nfs_uuid from being finally put.
>> >  	 */
>> > -	struct net *net;
>> > -
>> > -	net = nfs_to->nfsd_file_put_local(localio);
>> > -
>> > -	if (net)
>> > -		nfs_to_nfsd_net_put(net);
>> > +	nfs_to->nfsd_file_put_local(localio);
>> >  }
>> > 
>> >  #else   /* CONFIG_NFS_LOCALIO */
>> > 
>> > -- 
>> > 2.54.0
>> 
>> It seems that all of the LLM reviewers have difficulty with this patch.
>> This is a consolidated review of the issue from Claude and Codex:
>> 
>> > The reordering in nfsd_file_put_local() -- nfsd_net_put() before
>> > nfsd_file_put() -- introduces a shutdown race.
>> > 
>> > The nfsd_net_ref percpu refcount is taken only by LOCALIO
>> > (nfsd_open_local_fh() and nfs_open_local_fh()). The drain wait in
>> > nfsd_shutdown_net() (wait_for_completion(&nn->nfsd_net_free_done))
>> > is what holds off percpu_ref_exit() and nfsd_shutdown_generic() --
>> > and through the latter, nfsd_file_cache_shutdown(), which runs
>> > rcu_barrier() and then destroys nfsd_file_slab, nfsd_file_mark_slab,
>> > the fsnotify groups, and the rhltable.
>> > 
>> > Per-I/O references are not covered by the nfs_uuid handshake. Each
>> > pgio call takes its own nfsd_file ref plus a paired nfsd_net ref
>> > (fs/nfs/pagelist.c, nfs_local_open_fh), stores it in iocb->localio,
>> > and releases it at completion through nfsd_file_put_local(). An
>> > iocb is not on nfs_uuid->files, so nfs_localio_invalidate_clients()
>> > does not wait for it; only the drain wait does. Meanwhile
>> > __nfsd_file_cache_purge() has already unhashed the nfsd_file but
>> > cannot free it (the iocb ref keeps refcount elevated in
>> > nfsd_file_cond_queue()).
>> > 
>> > So with one I/O in flight when the server is stopped: the shutdown
>> > thread parks at the drain wait; the I/O completion thread enters
>> > nfsd_file_put_local() and drops the last nfsd_net ref, which runs
>> > complete() before nfsd_file_put() has executed. The shutdown thread
>> > then proceeds through nfsd_file_cache_shutdown() concurrently with
>> > the final nfsd_file_free(): the call_rcu() is queued after the
>> > rcu_barrier(), so nfsd_file_slab_free() does kmem_cache_free() into
>> > a destroyed cache, and nfsd_file_mark_put() runs against a destroyed
>> > fsnotify group. kmem_cache_destroy() also fires "objects remaining"
>> > because the nfsd_file is still allocated.
>> > 
>> > The old ordering was the mechanism that prevented this: the caller
>> > held its paired nfsd_net ref across nfsd_file_put(), and percpu_ref
>> > guarantees the release callback runs only after every ref is
>> > dropped, so global teardown strictly followed the file free and the
>> > rcu_barrier() flushed its call_rcu().
>> > 
>> > The hazard the commit message cites for the reorder cannot occur on
>> > this path: NFSD_FILE_NET_HELD is set only in
>> > nfsd_file_dispose_list_delayed(), reachable only through
>> > refcount_dec_if_one() in nfsd_file_lru_cb(), i.e. at refcount == 1.
>> > A file with an outstanding LOCALIO reference has refcount >= 2, so
>> > a file whose final put arrives via nfsd_file_put_local() never has
>> > NET_HELD set and its nfsd_file_free() never calls put_net().
>> > 
>> > Suggest keeping the void API but restoring the put order:
>> > 
>> >       nf = unrcu_pointer(xchg(pnf, NULL));
>> >       if (nf) {
>> >               struct net *net = nf->nf_net;
>> > 
>> >               nfsd_file_put(nf);
>> >               rcu_read_lock();
>> >               nfsd_net_put(net);
>> >               rcu_read_unlock();
>> >       }
>> > 
>> > with the kdoc comment and the commit message paragraph about the
>> > old ordering being unsafe adjusted to match.
>> 
>
>
> I had claude review this and it says:
>
> ● This is the same concern I just addressed for the previous patch's 
> Finding 3, restated as a
>   critical bug. The answer is the same: this is a false positive.
>
>   The reviewer's scenario requires:
>
>   1. The global shrinker unhashes and isolates an nfsd_file onto a 
> local dispose list
>   2. The net namespace teardown completes and struct net is freed
>   3. The shrinker resumes and calls get_net() on freed memory

That rebuttal answers a different finding. The three-step scenario it
refutes is the one sashiko raised against the get_net() placement in
nfsd_file_dispose_list_delayed(). The review quoted above is about
the put order in nfsd_file_put_local() and does not involve struct
net's refcount at any step. Two new issues appear with 8/9:

1. Put order in nfsd_file_put_local()
  
The question is narrow: after an I/O completion thread executes
nfsd_net_put() -- dropping the last nfsd_net_ref and running
complete(&nn->nfsd_net_free_done) -- what prevents nfsd_shutdown_net()
from continuing through percpu_ref_exit() and nfsd_shutdown_generic()
into nfsd_file_cache_shutdown() before that same thread executes the
nfsd_file_put() on the next line?

In the current code the answer is the ref itself: the caller holds it
across nfsd_file_put(), and percpu_ref runs the release callback only
after every ref drops, so global teardown strictly follows the file
free and the rcu_barrier() in nfsd_file_cache_shutdown() flushes the
call_rcu() that nfsd_file_free() queued. The patch removes that
ordering and installs nothing in its place.

The nfs_uuid handshake does not cover this path: each pgio holds its
own nfsd_file + nfsd_net ref pair (fs/nfs/pagelist.c, stored in
iocb->localio), released at I/O completion through
nfsd_file_put_local(), and an iocb is not on nfs_uuid->files. The
purge has already unhashed the file but cannot free it
(nfsd_file_cond_queue() sees the elevated refcount), so the
completion thread's put is the final one and its nfsd_file_free()
races kmem_cache_destroy(nfsd_file_slab).

The stale-net hazard the commit message cites cannot occur on this
path: NFSD_FILE_NET_HELD is set only via refcount_dec_if_one() in
nfsd_file_lru_cb(), i.e. at refcount == 1, and a file with an
outstanding LOCALIO reference has refcount >= 2. So the fix is to
keep the void API but put the file ref first, then the net ref.

2. get_net() placement in nfsd_file_dispose_list_delayed()

The rebuttal's struct-net argument does not hold for the files in
question: they are no longer on the LRU (nfsd_file_lru_cb() unhashed
and isolated them onto the worker's private list), and an nfsd_file
holds no net reference -- that absence is what this patch is fixing.

The commit message itself states the premise: the purge cannot see
these files and nf_net can dangle. With a second nfsd-serving netns
keeping nfsd_users > 0, nothing quiesces the shrinker or the
laundrette during per-net teardown (cancel_delayed_work_sync() and
shrinker_free() run only in global shutdown).

A worker preempted between isolating a file and calling
dispose_list_delayed() can resume after cleanup_net() has freed the
namespace, at which point get_net(nf->nf_net), net_generic(), and
the fcache_dispose_lock all touch freed memory. The get_net() sits
at the consuming end of the window it is meant to close. v1 took
the reference in nfsd_file_alloc(), the one place guaranteed to run
while the net is alive; the v2 plan of taking it at alloc time for
GC-capable files only would address Al's cacheline concern without
reopening the window.


As a process note, I'll note that I haven't found any non-pre-
existing issues with the other patches on this series.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files
  2026-06-03 18:20       ` Chuck Lever
@ 2026-06-03 19:15         ` Jeff Layton
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff Layton @ 2026-06-03 19:15 UTC (permalink / raw)
  To: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
	Tom Talpey, Lorenzo Bianconi, Anna Schumaker, Trond Myklebust,
	Anna Schumaker, Mike Snitzer
  Cc: Alexander Viro, Chris Mason, linux-nfs, linux-kernel,
	Trond Myklebust

On Wed, 2026-06-03 at 11:20 -0700, Chuck Lever wrote:
> 
> On Wed, Jun 3, 2026, at 10:50 AM, Jeff Layton wrote:
> > On Wed, 2026-06-03 at 10:33 -0700, Chuck Lever wrote:
> > > 
> > > On Tue, Jun 2, 2026, at 9:23 AM, Jeff Layton wrote:
> > > > Take a net-namespace reference in nfsd_file_dispose_list_delayed()
> > > > (get_net) and release it in nfsd_file_free() (put_net), so that
> > > > nf_net is always valid for files that the GC or shrinker has isolated
> > > > from the hash table and LRU -- which __nfsd_file_cache_purge() cannot
> > > > see.
> > > > 
> > > > Without this, nf_net can dangle for in-flight files whose net namespace
> > > > is torn down concurrently, causing a use-after-free when
> > > > nfsd_file_dispose_list_delayed() calls net_generic(nf->nf_net, ...).
> > > > 
> > > > Only files entering the delayed-dispose path need the net reference;
> > > > files freed synchronously (non-GC stateful opens, purge, direct put)
> > > > are always freed while the net namespace is still alive, so they skip
> > > > get_net()/put_net() entirely.  A new NFSD_FILE_NET_HELD flag tracks
> > > > whether a given nfsd_file holds a net reference.
> > > > 
> > > > Because nfsd_file_free() may now call put_net(nf->nf_net), the old
> > > > nfsd_file_put_local() pattern of returning nf->nf_net after
> > > > nfsd_file_put() is unsafe -- put_net() could theoretically drop the
> > > > last net namespace reference, leaving the returned pointer stale.
> > > > Fix this by moving the nfsd_net_put() call into nfsd_file_put_local()
> > > > itself, before the nfsd_file_put() that may trigger nfsd_file_free().
> > > > The function now returns void and the caller no longer needs to handle
> > > > the net reference.
> > > > 
> > > > Fixes: 43fd953fa7e2 ("nfsd: simplify the delayed disposal list code")
> > > > Assisted-by: Claude:claude-opus-4-6
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  fs/nfsd/filecache.c        | 34 ++++++++++++++++++++++++++--------
> > > >  fs/nfsd/filecache.h        |  3 ++-
> > > >  fs/nfsd/localio.c          |  4 ++--
> > > >  include/linux/nfslocalio.h |  9 ++-------
> > > >  4 files changed, 32 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > index 03f01a0beced..957fe57be063 100644
> > > > --- a/fs/nfsd/filecache.c
> > > > +++ b/fs/nfsd/filecache.c
> > > > @@ -295,6 +295,9 @@ nfsd_file_free(struct nfsd_file *nf)
> > > >  	if (WARN_ON_ONCE(!list_empty(&nf->nf_lru)))
> > > >  		return;
> > > > 
> > > > +	if (test_bit(NFSD_FILE_NET_HELD, &nf->nf_flags))
> > > > +		put_net(nf->nf_net);
> > > > +
> > > >  	call_rcu(&nf->nf_rcu, nfsd_file_slab_free);
> > > >  }
> > > > 
> > > > @@ -375,24 +378,28 @@ nfsd_file_put(struct nfsd_file *nf)
> > > >  }
> > > > 
> > > >  /**
> > > > - * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
> > > > + * nfsd_file_put_local - put nfsd_file reference and release nfsd_net ref
> > > >   * @pnf: nfsd_file of which to put the reference
> > > >   *
> > > > - * First save the associated net to return to caller, then put
> > > > - * the reference of the nfsd_file.
> > > > + * Drops both the nfsd_file reference and the associated nfsd_net
> > > > + * reference.  The nfsd_net ref is released before the file ref so
> > > > + * that put_net() inside nfsd_file_free() cannot drop the last net
> > > > + * namespace reference while the caller still needs it.
> > > >   */
> > > > -struct net *
> > > > +void
> > > >  nfsd_file_put_local(struct nfsd_file __rcu **pnf)
> > > >  {
> > > >  	struct nfsd_file *nf;
> > > > -	struct net *net = NULL;
> > > > 
> > > >  	nf = unrcu_pointer(xchg(pnf, NULL));
> > > >  	if (nf) {
> > > > -		net = nf->nf_net;
> > > > +		struct net *net = nf->nf_net;
> > > > +
> > > > +		rcu_read_lock();
> > > > +		nfsd_net_put(net);
> > > > +		rcu_read_unlock();
> > > >  		nfsd_file_put(nf);
> > > >  	}
> > > > -	return net;
> > > >  }
> > > > 
> > > >  /**
> > > > @@ -433,9 +440,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> > > >  	while (!list_empty(dispose)) {
> > > >  		struct nfsd_file *nf = list_first_entry(dispose,
> > > >  						struct nfsd_file, nf_gc);
> > > > -		struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
> > > > +		struct nfsd_net *nn;
> > > >  		struct svc_serv *serv;
> > > > 
> > > > +		/*
> > > > +		 * Pin the net namespace so nf_net stays valid while the
> > > > +		 * file sits on the per-net dispose list.  Callers (the GC
> > > > +		 * worker, shrinker, and fsnotify callbacks) always run
> > > > +		 * before nfsd_net_exit(), so nf_net is still live here.
> > > > +		 * The matching put_net() is in nfsd_file_free().
> > > > +		 */
> > > > +		get_net(nf->nf_net);
> > > > +		set_bit(NFSD_FILE_NET_HELD, &nf->nf_flags);
> > > > +
> > > > +		nn = net_generic(nf->nf_net, nfsd_net_id);
> > > >  		spin_lock(&nn->fcache_dispose_lock);
> > > >  		list_move_tail(&nf->nf_gc, &nn->fcache_dispose_list);
> > > >  		spin_unlock(&nn->fcache_dispose_lock);
> > > > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > > > index 683b6437cacc..7ae3c0ea0a2a 100644
> > > > --- a/fs/nfsd/filecache.h
> > > > +++ b/fs/nfsd/filecache.h
> > > > @@ -45,6 +45,7 @@ struct nfsd_file {
> > > >  #define NFSD_FILE_REFERENCED	(2)
> > > >  #define NFSD_FILE_GC		(3)
> > > >  #define NFSD_FILE_RECENT	(4)
> > > > +#define NFSD_FILE_NET_HELD	(5)
> > > >  	unsigned long		nf_flags;
> > > >  	refcount_t		nf_ref;
> > > >  	unsigned char		nf_may;
> > > > @@ -66,7 +67,7 @@ void nfsd_file_cache_shutdown(void);
> > > >  int nfsd_file_cache_start_net(struct net *net);
> > > >  void nfsd_file_cache_shutdown_net(struct net *net);
> > > >  void nfsd_file_put(struct nfsd_file *nf);
> > > > -struct net *nfsd_file_put_local(struct nfsd_file __rcu **nf);
> > > > +void nfsd_file_put_local(struct nfsd_file __rcu **nf);
> > > >  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> > > >  struct file *nfsd_file_file(struct nfsd_file *nf);
> > > >  void nfsd_file_close_inode_sync(struct inode *inode);
> > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > index c3eb0557b3e1..e3295bae75a4 100644
> > > > --- a/fs/nfsd/localio.c
> > > > +++ b/fs/nfsd/localio.c
> > > > @@ -40,8 +40,8 @@
> > > >   * avoid all the NFS overhead with reads, writes and commits.
> > > >   *
> > > >   * On successful return, returned nfsd_file will have its nf_net member
> > > > - * set. Caller (NFS client) is responsible for calling nfsd_net_put and
> > > > - * nfsd_file_put (via nfs_to_nfsd_file_put_local).
> > > > + * set. Caller (NFS client) is responsible for calling nfsd_file_put
> > > > + * (via nfs_to_nfsd_file_put_local), which also releases the nfsd_net 
> > > > ref.
> > > >   */
> > > >  static struct nfsd_file *
> > > >  nfsd_open_local_fh(struct net *net, struct auth_domain *dom,
> > > > diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> > > > index 3d91043254e6..7267a69092d1 100644
> > > > --- a/include/linux/nfslocalio.h
> > > > +++ b/include/linux/nfslocalio.h
> > > > @@ -62,7 +62,7 @@ struct nfsd_localio_operations {
> > > >  						const struct nfs_fh *,
> > > >  						struct nfsd_file __rcu **pnf,
> > > >  						const fmode_t);
> > > > -	struct net *(*nfsd_file_put_local)(struct nfsd_file __rcu **);
> > > > +	void (*nfsd_file_put_local)(struct nfsd_file __rcu **);
> > > >  	struct file *(*nfsd_file_file)(struct nfsd_file *);
> > > >  	void (*nfsd_file_dio_alignment)(struct nfsd_file *,
> > > >  					u32 *, u32 *, u32 *);
> > > > @@ -96,12 +96,7 @@ static inline void nfs_to_nfsd_file_put_local(struct 
> > > > nfsd_file __rcu **localio)
> > > >  	 * must prevent nfsd shutdown from completing as nfs_close_local_fh()
> > > >  	 * does by blocking the nfs_uuid from being finally put.
> > > >  	 */
> > > > -	struct net *net;
> > > > -
> > > > -	net = nfs_to->nfsd_file_put_local(localio);
> > > > -
> > > > -	if (net)
> > > > -		nfs_to_nfsd_net_put(net);
> > > > +	nfs_to->nfsd_file_put_local(localio);
> > > >  }
> > > > 
> > > >  #else   /* CONFIG_NFS_LOCALIO */
> > > > 
> > > > -- 
> > > > 2.54.0
> > > 
> > > It seems that all of the LLM reviewers have difficulty with this patch.
> > > This is a consolidated review of the issue from Claude and Codex:
> > > 
> > > > The reordering in nfsd_file_put_local() -- nfsd_net_put() before
> > > > nfsd_file_put() -- introduces a shutdown race.
> > > > 
> > > > The nfsd_net_ref percpu refcount is taken only by LOCALIO
> > > > (nfsd_open_local_fh() and nfs_open_local_fh()). The drain wait in
> > > > nfsd_shutdown_net() (wait_for_completion(&nn->nfsd_net_free_done))
> > > > is what holds off percpu_ref_exit() and nfsd_shutdown_generic() --
> > > > and through the latter, nfsd_file_cache_shutdown(), which runs
> > > > rcu_barrier() and then destroys nfsd_file_slab, nfsd_file_mark_slab,
> > > > the fsnotify groups, and the rhltable.
> > > > 
> > > > Per-I/O references are not covered by the nfs_uuid handshake. Each
> > > > pgio call takes its own nfsd_file ref plus a paired nfsd_net ref
> > > > (fs/nfs/pagelist.c, nfs_local_open_fh), stores it in iocb->localio,
> > > > and releases it at completion through nfsd_file_put_local(). An
> > > > iocb is not on nfs_uuid->files, so nfs_localio_invalidate_clients()
> > > > does not wait for it; only the drain wait does. Meanwhile
> > > > __nfsd_file_cache_purge() has already unhashed the nfsd_file but
> > > > cannot free it (the iocb ref keeps refcount elevated in
> > > > nfsd_file_cond_queue()).
> > > > 
> > > > So with one I/O in flight when the server is stopped: the shutdown
> > > > thread parks at the drain wait; the I/O completion thread enters
> > > > nfsd_file_put_local() and drops the last nfsd_net ref, which runs
> > > > complete() before nfsd_file_put() has executed. The shutdown thread
> > > > then proceeds through nfsd_file_cache_shutdown() concurrently with
> > > > the final nfsd_file_free(): the call_rcu() is queued after the
> > > > rcu_barrier(), so nfsd_file_slab_free() does kmem_cache_free() into
> > > > a destroyed cache, and nfsd_file_mark_put() runs against a destroyed
> > > > fsnotify group. kmem_cache_destroy() also fires "objects remaining"
> > > > because the nfsd_file is still allocated.
> > > > 
> > > > The old ordering was the mechanism that prevented this: the caller
> > > > held its paired nfsd_net ref across nfsd_file_put(), and percpu_ref
> > > > guarantees the release callback runs only after every ref is
> > > > dropped, so global teardown strictly followed the file free and the
> > > > rcu_barrier() flushed its call_rcu().
> > > > 
> > > > The hazard the commit message cites for the reorder cannot occur on
> > > > this path: NFSD_FILE_NET_HELD is set only in
> > > > nfsd_file_dispose_list_delayed(), reachable only through
> > > > refcount_dec_if_one() in nfsd_file_lru_cb(), i.e. at refcount == 1.
> > > > A file with an outstanding LOCALIO reference has refcount >= 2, so
> > > > a file whose final put arrives via nfsd_file_put_local() never has
> > > > NET_HELD set and its nfsd_file_free() never calls put_net().
> > > > 
> > > > Suggest keeping the void API but restoring the put order:
> > > > 
> > > >       nf = unrcu_pointer(xchg(pnf, NULL));
> > > >       if (nf) {
> > > >               struct net *net = nf->nf_net;
> > > > 
> > > >               nfsd_file_put(nf);
> > > >               rcu_read_lock();
> > > >               nfsd_net_put(net);
> > > >               rcu_read_unlock();
> > > >       }
> > > > 
> > > > with the kdoc comment and the commit message paragraph about the
> > > > old ordering being unsafe adjusted to match.
> > > 
> > 
> > 
> > I had claude review this and it says:
> > 
> > ● This is the same concern I just addressed for the previous patch's 
> > Finding 3, restated as a
> >   critical bug. The answer is the same: this is a false positive.
> > 
> >   The reviewer's scenario requires:
> > 
> >   1. The global shrinker unhashes and isolates an nfsd_file onto a 
> > local dispose list
> >   2. The net namespace teardown completes and struct net is freed
> >   3. The shrinker resumes and calls get_net() on freed memory
> 
> That rebuttal answers a different finding. The three-step scenario it
> refutes is the one sashiko raised against the get_net() placement in
> nfsd_file_dispose_list_delayed(). The review quoted above is about
> the put order in nfsd_file_put_local() and does not involve struct
> net's refcount at any step. Two new issues appear with 8/9:
> 
> 1. Put order in nfsd_file_put_local()
>   
> The question is narrow: after an I/O completion thread executes
> nfsd_net_put() -- dropping the last nfsd_net_ref and running
> complete(&nn->nfsd_net_free_done) -- what prevents nfsd_shutdown_net()
> from continuing through percpu_ref_exit() and nfsd_shutdown_generic()
> into nfsd_file_cache_shutdown() before that same thread executes the
> nfsd_file_put() on the next line?
> 
> In the current code the answer is the ref itself: the caller holds it
> across nfsd_file_put(), and percpu_ref runs the release callback only
> after every ref drops, so global teardown strictly follows the file
> free and the rcu_barrier() in nfsd_file_cache_shutdown() flushes the
> call_rcu() that nfsd_file_free() queued. The patch removes that
> ordering and installs nothing in its place.
> 
> The nfs_uuid handshake does not cover this path: each pgio holds its
> own nfsd_file + nfsd_net ref pair (fs/nfs/pagelist.c, stored in
> iocb->localio), released at I/O completion through
> nfsd_file_put_local(), and an iocb is not on nfs_uuid->files. The
> purge has already unhashed the file but cannot free it
> (nfsd_file_cond_queue() sees the elevated refcount), so the
> completion thread's put is the final one and its nfsd_file_free()
> races kmem_cache_destroy(nfsd_file_slab).
> 
> The stale-net hazard the commit message cites cannot occur on this
> path: NFSD_FILE_NET_HELD is set only via refcount_dec_if_one() in
> nfsd_file_lru_cb(), i.e. at refcount == 1, and a file with an
> outstanding LOCALIO reference has refcount >= 2. So the fix is to
> keep the void API but put the file ref first, then the net ref.
> 
> 2. get_net() placement in nfsd_file_dispose_list_delayed()
> 
> The rebuttal's struct-net argument does not hold for the files in
> question: they are no longer on the LRU (nfsd_file_lru_cb() unhashed
> and isolated them onto the worker's private list), and an nfsd_file
> holds no net reference -- that absence is what this patch is fixing.
> 
> The commit message itself states the premise: the purge cannot see
> these files and nf_net can dangle. With a second nfsd-serving netns
> keeping nfsd_users > 0, nothing quiesces the shrinker or the
> laundrette during per-net teardown (cancel_delayed_work_sync() and
> shrinker_free() run only in global shutdown).
> 
> A worker preempted between isolating a file and calling
> dispose_list_delayed() can resume after cleanup_net() has freed the
> namespace, at which point get_net(nf->nf_net), net_generic(), and
> the fcache_dispose_lock all touch freed memory. The get_net() sits
> at the consuming end of the window it is meant to close. v1 took
> the reference in nfsd_file_alloc(), the one place guaranteed to run
> while the net is alive; the v2 plan of taking it at alloc time for
> GC-capable files only would address Al's cacheline concern without
> reopening the window.
> 
> 
> As a process note, I'll note that I haven't found any non-pre-
> existing issues with the other patches on this series.
> 

Ok, I'm convinced, and the fix looks fairly straightforward.

If you're ok with merging the rest of the set, I'll plan to resend this
one separately.

Thanks,
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs
  2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
                   ` (8 preceding siblings ...)
  2026-06-02 16:23 ` [PATCH v2 9/9] nfsd: unify cleanups in nfsd_cross_mnt() exits Jeff Layton
@ 2026-06-03 20:30 ` Chuck Lever
  9 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2026-06-03 20:30 UTC (permalink / raw)
  To: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
	Lorenzo Bianconi, Anna Schumaker, Trond Myklebust, Anna Schumaker,
	Mike Snitzer, Jeff Layton
  Cc: Chuck Lever, Al Viro, Chris Mason, linux-nfs, linux-kernel,
	Trond Myklebust

From: Chuck Lever <chuck.lever@oracle.com>

On Tue, 02 Jun 2026 12:23:12 -0400, Jeff Layton wrote:
> Just some minor changes in this version, plus a cleanup patch from Al.
> 
> These are bugs that Claude classified as locally-triggerable. A couple
> can be triggered by an unprivileged user, but the rest require admin
> access.
> 
> The last 3 patches fix one bug. I originally had a more targeted fix
> that kres generated, but I think it's better to simplify the filecache
> disposal mechanism to get rid of the bug rather than add more
> complexity.
> 
> [...]

Applied to nfsd-testing, thanks!

[1/9] nfsd: defer vfree of compound ops to fix rpc_status UAF
      commit: 45bdeda0ff0e26e43b5c84ead5a8859696df4a24
[2/9] nfsd: hold rcu across localio cmpxchg retry
      commit: 3132933172044d02951470c99c8cbbe54756ae45
[3/9] nfs/localio: fix ref leak on nfs_uuid_add_file failure
      (no commit info)
[4/9] nfsd: guard nfsd_serv deref in nfsd_file_net_dispose
      commit: a6dfbd5e70527b91d610bd4864d9de725b06c5ba
[5/9] nfsd: widen nfsd_genl_rqstp address fields to sockaddr_storage
      commit: a9a83f4a2b3d065f26efb7dd8153fecd55f10622
[6/9] nfsd: fix refcount leak in nfsd_file_lru_add on insertion failure
      commit: d72ae7cbbf14e2f0bc4bc5fecc06c12180fd5b66
[7/9] nfsd: fix fcache_disposal UAF by inlining dispose state into nfsd_net
      commit: fcafdda0423b27637a27594ec81b9b07ab6069e1
[9/9] nfsd: unify cleanups in nfsd_cross_mnt() exits
      commit: 3275806873389963d81e9ddd17d047e7c1812f3b

--
Chuck Lever <chuck.lever@oracle.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2026-06-03 20:30 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-02 16:23 [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Jeff Layton
2026-06-02 16:23 ` [PATCH v2 1/9] nfsd: defer vfree of compound ops to fix rpc_status UAF Jeff Layton
2026-06-02 16:23 ` [PATCH v2 2/9] nfsd: hold rcu across localio cmpxchg retry Jeff Layton
2026-06-02 16:23 ` [PATCH v2 3/9] nfs/localio: fix ref leak on nfs_uuid_add_file failure Jeff Layton
2026-06-02 16:23 ` [PATCH v2 4/9] nfsd: guard nfsd_serv deref in nfsd_file_net_dispose Jeff Layton
2026-06-02 16:23 ` [PATCH v2 5/9] nfsd: widen nfsd_genl_rqstp address fields to sockaddr_storage Jeff Layton
2026-06-02 16:23 ` [PATCH v2 6/9] nfsd: fix refcount leak in nfsd_file_lru_add on insertion failure Jeff Layton
2026-06-02 16:23 ` [PATCH v2 7/9] nfsd: fix fcache_disposal UAF by inlining dispose state into nfsd_net Jeff Layton
2026-06-02 16:23 ` [PATCH v2 8/9] nfsd: hold net namespace reference for delayed-dispose nfsd_files Jeff Layton
2026-06-03 17:33   ` Chuck Lever
2026-06-03 17:50     ` Jeff Layton
2026-06-03 18:20       ` Chuck Lever
2026-06-03 19:15         ` Jeff Layton
2026-06-02 16:23 ` [PATCH v2 9/9] nfsd: unify cleanups in nfsd_cross_mnt() exits Jeff Layton
2026-06-03 20:30 ` [PATCH v2 0/9] nfsd: fixes for locally-triggerable bugs Chuck Lever

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.