* [PATCH 01/19] nfs/localio: fix nfsd_file ref leak on nfs_local_doio() init failure
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 02/19] nfsd: clear opcnt on compound arg release to prevent OOB read Jeff Layton
` (17 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
Two early return paths in nfs_local_doio() fail to release the localio
(nfsd_file) reference passed in by the caller:
- When hdr->args.count is zero, the function returns 0 without calling
nfs_local_file_put().
- When nfs_local_iocb_init() fails (e.g. -ENOMEM from allocation or
-EOPNOTSUPP if the file lacks read_iter/write_iter), the function
returns the error without releasing localio or completing the hdr
lifecycle.
A leaked nfsd_file pins the associated net namespace reference,
blocking network namespace teardown, and holds a reference on the
exported filesystem, preventing unmount.
Fix the zero-count path by adding the missing nfs_local_file_put()
call. Fix the iocb init failure path by jumping to a new cleanup label
that releases localio, sets hdr->task.tk_status, and calls
nfs_local_hdr_release() -- matching the existing error handling pattern
for the post-iocb error path.
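
The cleanup pattern described above can be modeled in userspace. This is a minimal sketch, not the localio code: struct ref_model, ref_put(), do_io_model() and the fail_init flag are hypothetical stand-ins, and the model assumes the I/O completes synchronously. The point is only that every exit path drops the reference the caller handed in.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted nfsd_file. */
struct ref_model {
	int count;
};

static void ref_put(struct ref_model *ref)
{
	ref->count--;
}

/*
 * Model of a function that consumes the caller's reference.
 * 'fail_init' simulates an allocation failure during setup.
 * In this model the I/O is synchronous, so the success path
 * also drops the reference before returning.
 */
int do_io_model(struct ref_model *ref, size_t count, int fail_init)
{
	int status = 0;

	if (count == 0) {
		ref_put(ref);	/* zero-length request: nothing to do */
		return 0;
	}

	if (fail_init) {
		status = -ENOMEM;
		goto out_put;
	}

	/* ... the actual I/O would be issued here ... */
	ref_put(ref);
	return status;

out_put:
	ref_put(ref);
	return status;
}
```

Routing the failure through a label keeps the release in one place, so a later error path added before the I/O is issued can reuse it.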
Fixes: e77c464c31b3 ("nfs/nfsd: add "local io" support")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfs/localio.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index e55c5977fcc3..63cf6e2cc745 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -970,12 +970,16 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
struct nfs_local_kiocb *iocb;
int status = 0;
- if (!hdr->args.count)
+ if (!hdr->args.count) {
+ nfs_local_file_put(localio);
return 0;
+ }
iocb = nfs_local_iocb_init(hdr, localio);
- if (IS_ERR(iocb))
- return PTR_ERR(iocb);
+ if (IS_ERR(iocb)) {
+ status = PTR_ERR(iocb);
+ goto out_put_localio;
+ }
switch (hdr->rw_mode) {
case FMODE_READ:
@@ -996,6 +1000,12 @@ int nfs_local_doio(struct nfs_client *clp, struct nfsd_file *localio,
nfs_local_hdr_release(hdr, call_ops);
}
return status;
+
+out_put_localio:
+ nfs_local_file_put(localio);
+ hdr->task.tk_status = status;
+ nfs_local_hdr_release(hdr, call_ops);
+ return status;
}
static void
--
2.54.0
* [PATCH 02/19] nfsd: clear opcnt on compound arg release to prevent OOB read
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
2026-06-09 17:47 ` [PATCH 01/19] nfs/localio: fix nfsd_file ref leak on nfs_local_doio() init failure Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 03/19] nfsd: add missing read barrier to rpc_status_get dumpit seqcount retry Jeff Layton
` (16 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
nfsd4_release_compoundargs() resets args->ops to the inline iops[8]
array when the dynamically-allocated ops buffer is freed, but leaves
args->opcnt at its original value (which can be up to 200 for NFSv4.1+
compounds).
If rq_status_counter is stuck at an odd value (which can happen when
nfsd_dispatch() hits an error path after setting it odd), the RPC
status dumpit handler reads min(opcnt, 16) entries from args->ops[].
Since iops only has 8 elements and is the last field in struct
nfsd4_compoundargs, reading indices 8-15 accesses adjacent slab memory
and leaks it to userspace via netlink.
Zero opcnt unconditionally in nfsd4_release_compoundargs() so stale
compound metadata is never exposed through the status interface.
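
The stale-count hazard can be sketched with a simplified model. The struct layout and the helper names below are assumptions made for illustration; only the sizes (an 8-entry inline array and a reader bound of 16) come from the description above.

```c
#include <stdlib.h>

#define INLINE_OPS 8	/* size of the inline array, per the description */
#define DUMP_MAX   16	/* upper bound used by the status reader */

/* Hypothetical, simplified model of the compound args structure. */
struct args_model {
	unsigned int opcnt;
	int *ops;		/* points at iops or at a heap buffer */
	int iops[INLINE_OPS];
};

/* Release the heap buffer; 'clear_opcnt' selects fixed vs. unfixed behavior. */
void release_args(struct args_model *args, int clear_opcnt)
{
	if (args->ops != args->iops) {
		free(args->ops);
		args->ops = args->iops;
	}
	if (clear_opcnt)
		args->opcnt = 0;
}

/* Number of entries a reader would walk: min(opcnt, DUMP_MAX). */
unsigned int entries_to_dump(const struct args_model *args)
{
	return args->opcnt < DUMP_MAX ? args->opcnt : DUMP_MAX;
}

/* True if walking that many entries would run past the inline array. */
int dump_overruns_inline(const struct args_model *args)
{
	return args->ops == args->iops && entries_to_dump(args) > INLINE_OPS;
}
```

With clear_opcnt set to 0 and an original count of 200, the model reports 16 entries against an 8-entry array; with it set to 1, the reader sees zero entries.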
Fixes: bd9d6a3efa97 ("NFSD: add rpc_status netlink support")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4xdr.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index b9037d99b564..1e4a51926910 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -6440,6 +6440,7 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp)
args->ops = args->iops;
kvfree_rcu_mightsleep(old_ops);
}
+ args->opcnt = 0;
while (args->to_free) {
struct svcxdr_tmpbuf *tb = args->to_free;
args->to_free = tb->next;
--
2.54.0
* [PATCH 03/19] nfsd: add missing read barrier to rpc_status_get dumpit seqcount retry
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
2026-06-09 17:47 ` [PATCH 01/19] nfs/localio: fix nfsd_file ref leak on nfs_local_doio() init failure Jeff Layton
2026-06-09 17:47 ` [PATCH 02/19] nfsd: clear opcnt on compound arg release to prevent OOB read Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 04/19] nfsd: fix netlink dumpit error handling for rpc_status_get Jeff Layton
` (15 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
The hand-rolled seqcount-like protocol in nfsd_nl_rpc_status_get_dumpit()
is missing a read memory barrier (smp_rmb) before its second counter
check. The standard kernel read_seqcount_retry() includes smp_rmb()
to ensure that all data reads complete before the counter is re-checked.
Without this barrier, on weakly-ordered architectures (ARM, POWER),
the CPU may reorder field reads past the second counter check, making
the retry logic ineffective: it could observe a consistent counter pair
while reading fields that have been concurrently modified by the writer.
Add smp_rmb() before the second smp_load_acquire() to match the
barrier semantics of the standard seqcount read-side.
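
For readers unfamiliar with the read-side ordering requirement, here is a userspace sketch using C11 atomics. It is an analogy under stated assumptions, not the nfsd code: struct seq_record and both functions are hypothetical, a single writer is assumed, and the sketch uses the conventional seqcount meaning (odd means a write is in progress), which may differ from how rq_status_counter is interpreted. The acquire fence before the re-check plays the role that smp_rmb() plays in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical record protected by a sequence counter (odd = write in progress). */
struct seq_record {
	atomic_uint seq;
	atomic_int field_a;
	atomic_int field_b;
};

/* Single-writer update: make the counter odd, write, make it even again. */
void record_write(struct seq_record *r, int a, int b)
{
	unsigned int s = atomic_load_explicit(&r->seq, memory_order_relaxed);

	atomic_store_explicit(&r->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->field_a, a, memory_order_relaxed);
	atomic_store_explicit(&r->field_b, b, memory_order_relaxed);
	atomic_store_explicit(&r->seq, s + 2, memory_order_release);
}

/* Returns true and fills *a, *b only if a consistent snapshot was read. */
bool record_try_read(struct seq_record *r, int *a, int *b)
{
	unsigned int start = atomic_load_explicit(&r->seq, memory_order_acquire);

	if (start & 1)
		return false;	/* writer active */

	*a = atomic_load_explicit(&r->field_a, memory_order_relaxed);
	*b = atomic_load_explicit(&r->field_b, memory_order_relaxed);

	/* Order the field loads before the counter re-check. */
	atomic_thread_fence(memory_order_acquire);

	return atomic_load_explicit(&r->seq, memory_order_relaxed) == start;
}
```

Without the fence, nothing orders the two field loads before the final counter load, so the re-check could pass even though the fields were read after a writer had begun changing them.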
Fixes: ac18892ea3f7 ("NFSD: add rpc_status netlink support")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfsctl.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index c06d25c06f06..a4b5b1467fe2 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1576,9 +1576,11 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
#endif /* CONFIG_NFSD_V4 */
/*
- * Acquire rq_status_counter before reporting the rqst
- * fields to the user.
+ * Ensure all field reads complete before re-checking
+ * the status counter. Pairs with the smp_store_release
+ * in nfsd_dispatch to form a seq-lock like protocol.
*/
+ smp_rmb();
if (smp_load_acquire(&rqstp->rq_status_counter) !=
status_counter)
continue;
--
2.54.0
* [PATCH 04/19] nfsd: fix netlink dumpit error handling for rpc_status_get
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (2 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 03/19] nfsd: add missing read barrier to rpc_status_get dumpit seqcount retry Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 05/19] sunrpc: defer rq_argp and rq_resp free until after RCU grace period Jeff Layton
` (14 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
nfsd_genl_rpc_status_compose_msg() returns -ENOBUFS on nla_put failure
without calling genlmsg_cancel(), leaving a partial message in the skb.
The caller then propagates -ENOBUFS directly, which the netlink dump
infrastructure treats as a fatal error, aborting the entire dump.
The correct netlink dump convention is:
- Cancel any partial message with genlmsg_cancel()
- If prior messages were added to the skb (skb->len > 0), save the
current iterator position and return skb->len to paginate
- Only return a negative errno when no messages fit at all
Fix compose_msg to cancel the partial message on all nla_put failure
paths, and fix the caller to paginate when possible rather than
returning a fatal error.
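
The convention listed above can be sketched without any netlink machinery. Everything in this model is hypothetical (struct buf_model stands in for the skb, put_attr() for nla_put(), and *pos for the saved iterator state); it only illustrates the cancel-then-paginate control flow.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical fixed-size buffer standing in for an skb. */
struct buf_model {
	size_t len;
	size_t cap;
};

/* Append one attribute of 'size' bytes; nonzero on failure. */
static int put_attr(struct buf_model *b, size_t size)
{
	if (b->cap - b->len < size)
		return -1;
	b->len += size;
	return 0;
}

/* Compose one message of 'nattrs' attributes, rolling back on failure. */
static int compose_msg(struct buf_model *b, int nattrs, size_t attr_size)
{
	size_t start = b->len;	/* where this message begins */
	int i;

	for (i = 0; i < nattrs; i++)
		if (put_attr(b, attr_size))
			goto out_cancel;
	return 0;

out_cancel:
	b->len = start;		/* drop the partial message */
	return -ENOBUFS;
}

/*
 * Dump messages *pos .. total-1. Returns the number of bytes written
 * when at least one message fit (the caller resumes from *pos with a
 * fresh buffer), 0 when nothing was left to write, or a negative errno
 * when not even one message fits.
 */
int dump_model(struct buf_model *b, int *pos, int total,
	       int nattrs, size_t attr_size)
{
	while (*pos < total) {
		int ret = compose_msg(b, nattrs, attr_size);

		if (ret)
			return b->len ? (int)b->len : ret;
		(*pos)++;
	}
	return (int)b->len;
}
```

With a 100-byte buffer and 30-byte messages, the first call writes three messages and returns 90; a second call with an empty buffer picks up at the fourth message.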
Fixes: ac18892ea3f7 ("NFSD: add rpc_status netlink support")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfsctl.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index a4b5b1467fe2..ab10692ee937 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1452,7 +1452,7 @@ static int nfsd_genl_rpc_status_compose_msg(struct sk_buff *skb,
nla_put_s64(skb, NFSD_A_RPC_STATUS_SERVICE_TIME,
ktime_to_us(genl_rqstp->rq_stime),
NFSD_A_RPC_STATUS_PAD))
- return -ENOBUFS;
+ goto out_cancel;
switch (genl_rqstp->rq_saddr.ss_family) {
case AF_INET: {
@@ -1468,7 +1468,7 @@ static int nfsd_genl_rpc_status_compose_msg(struct sk_buff *skb,
s_in->sin_port) ||
nla_put_be16(skb, NFSD_A_RPC_STATUS_DPORT,
d_in->sin_port))
- return -ENOBUFS;
+ goto out_cancel;
break;
}
case AF_INET6: {
@@ -1484,7 +1484,7 @@ static int nfsd_genl_rpc_status_compose_msg(struct sk_buff *skb,
s_in->sin6_port) ||
nla_put_be16(skb, NFSD_A_RPC_STATUS_DPORT,
d_in->sin6_port))
- return -ENOBUFS;
+ goto out_cancel;
break;
}
}
@@ -1492,10 +1492,14 @@ static int nfsd_genl_rpc_status_compose_msg(struct sk_buff *skb,
for (i = 0; i < genl_rqstp->rq_opcnt; i++)
if (nla_put_u32(skb, NFSD_A_RPC_STATUS_COMPOUND_OPS,
genl_rqstp->rq_opnum[i]))
- return -ENOBUFS;
+ goto out_cancel;
genlmsg_end(skb, hdr);
return 0;
+
+out_cancel:
+ genlmsg_cancel(skb, hdr);
+ return -ENOBUFS;
}
/**
@@ -1587,8 +1591,14 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
ret = nfsd_genl_rpc_status_compose_msg(skb, cb,
&genl_rqstp);
- if (ret)
+ if (ret) {
+ if (skb->len) {
+ cb->args[0] = i;
+ cb->args[1] = rqstp_index - 1;
+ ret = skb->len;
+ }
goto out;
+ }
}
}
--
2.54.0
* [PATCH 05/19] sunrpc: defer rq_argp and rq_resp free until after RCU grace period
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (3 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 04/19] nfsd: fix netlink dumpit error handling for rpc_status_get Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 06/19] nfsd: check nfsd4_acl_to_attr() return value in nfsd4_create() Jeff Layton
` (13 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
svc_rqst_free() frees rqstp->rq_argp and rqstp->rq_resp synchronously
via kfree(), but defers the rqstp struct free via kfree_rcu(). After
svc_exit_thread() calls list_del_rcu() and svc_rqst_free(), there is
a window where RCU readers that started before list_del_rcu() can still
traverse the thread list and find the rqstp. These readers (e.g.
nfsd_nl_rpc_status_get_dumpit()) dereference rqstp->rq_argp, which has
already been freed — a use-after-free.
Fix this by moving the kfree of rq_argp and rq_resp into an explicit
call_rcu() callback alongside the struct free. Resources not accessed
by RCU readers (bvec, buffer pages, scratch folio, auth_data) remain
synchronously freed.
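
The shape of the fix can be illustrated with a toy deferral queue. This is not RCU: struct cb_head, defer_call() and grace_period_elapsed() are hypothetical stand-ins for rcu_head, call_rcu() and the end of a grace period, and the model is single-threaded. It shows only that the members and the containing object are released together, and only after the deferral point.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-in for the kernel's rcu_head. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical request object with two separately allocated members. */
struct rqst_model {
	void *argp;
	void *resp;
	struct cb_head head;
};

static struct cb_head *pending;	/* callbacks waiting for the "grace period" */
int objects_freed;		/* bookkeeping for the example */

/* Queue a callback instead of running it now. */
static void defer_call(struct cb_head *head, void (*func)(struct cb_head *))
{
	head->func = func;
	head->next = pending;
	pending = head;
}

/* Run every queued callback, as if all earlier readers had finished. */
void grace_period_elapsed(void)
{
	while (pending) {
		struct cb_head *head = pending;

		pending = head->next;	/* read before the callback frees it */
		head->func(head);
	}
}

/* Frees the members and the object together. */
static void rqst_free_cb(struct cb_head *head)
{
	struct rqst_model *rq = container_of_model(head, struct rqst_model, head);

	free(rq->resp);
	free(rq->argp);
	free(rq);
	objects_freed++;
}

void rqst_free(struct rqst_model *rq)
{
	defer_call(&rq->head, rqst_free_cb);
}
```

Between rqst_free() and grace_period_elapsed(), a reader that still holds a pointer to the object can follow argp safely, which is the property the synchronous kfree() calls broke.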
Fixes: 812443865c5f ("sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
net/sunrpc/svc.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 009373737ea9..a7d893bc2d73 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -716,6 +716,15 @@ svc_release_buffer(struct svc_rqst *rqstp)
}
}
+static void svc_rqst_free_rcu(struct rcu_head *head)
+{
+ struct svc_rqst *rqstp = container_of(head, struct svc_rqst, rq_rcu_head);
+
+ kfree(rqstp->rq_resp);
+ kfree(rqstp->rq_argp);
+ kfree(rqstp);
+}
+
static void
svc_rqst_free(struct svc_rqst *rqstp)
{
@@ -724,10 +733,8 @@ svc_rqst_free(struct svc_rqst *rqstp)
svc_release_buffer(rqstp);
if (rqstp->rq_scratch_folio)
folio_put(rqstp->rq_scratch_folio);
- kfree(rqstp->rq_resp);
- kfree(rqstp->rq_argp);
kfree(rqstp->rq_auth_data);
- kfree_rcu(rqstp, rq_rcu_head);
+ call_rcu(&rqstp->rq_rcu_head, svc_rqst_free_rcu);
}
static struct svc_rqst *
--
2.54.0
* [PATCH 06/19] nfsd: check nfsd4_acl_to_attr() return value in nfsd4_create()
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (4 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 05/19] sunrpc: defer rq_argp and rq_resp free until after RCU grace period Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 07/19] nfsd: add filehandle match check to nfsd4_delegreturn() Jeff Layton
` (12 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
nfsd4_create() stores the return value of nfsd4_acl_to_attr() in
status, but the switch(create->cr_type) block unconditionally
overwrites it in every branch. ACL translation errors are silently
discarded, and the CREATE proceeds without the requested ACL.
Add an early exit check after nfsd4_acl_to_attr(), matching the
pattern already used in nfsd4_setattr().
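
A reduced model of the control flow shows why the error disappears. The status codes and helper functions here are hypothetical; the only thing carried over from the description is that a later switch assigns status in every branch.

```c
/* Hypothetical status codes for the sketch. */
enum { ST_OK = 0, ST_BAD_ACL = 1, ST_CREATE_FAILED = 2 };

/* Stand-ins for ACL translation and object creation. */
static int translate_acl(int acl_valid)
{
	return acl_valid ? ST_OK : ST_BAD_ACL;
}

static int create_object(int type)
{
	return type >= 0 ? ST_OK : ST_CREATE_FAILED;
}

/* 'check_early' selects between the unfixed (0) and fixed (1) control flow. */
int create_model(int type, int has_acl, int acl_valid, int check_early)
{
	int status = ST_OK;

	if (has_acl) {
		status = translate_acl(acl_valid);
		if (check_early && status != ST_OK)
			return status;	/* fixed: report the ACL error */
	}

	/* Every branch assigns status, discarding any earlier value. */
	switch (type) {
	case 0:
		status = create_object(type);
		break;
	default:
		status = create_object(type);
		break;
	}
	return status;
}
```

With check_early set to 0, an invalid ACL still yields ST_OK because the switch overwrites the translation result.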
Fixes: 4c10614c7b47 ("NFSD: move setting of ACLs into nfsd_setattr()")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4proc.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 0c37d7c6d28c..69fee481581d 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -855,6 +855,8 @@ nfsd4_create(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
}
status = nfsd4_acl_to_attr(create->cr_type, create->cr_acl,
&attrs);
+ if (status != nfs_ok)
+ goto out_aftermask;
}
current->fs->umask = create->cr_umask;
switch (create->cr_type) {
--
2.54.0
* [PATCH 07/19] nfsd: add filehandle match check to nfsd4_delegreturn()
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (5 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 06/19] nfsd: check nfsd4_acl_to_attr() return value in nfsd4_create() Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 08/19] nfsd: validate nseconds in TIME_DELEG decode paths Jeff Layton
` (11 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
nfsd4_delegreturn() is the only stateful NFSv4 operation that does
not call nfs4_check_fh() to verify the delegation's file matches
cstate->current_fh. A client can DELEGRETURN with a mismatched
filehandle, destroying the correct delegation but waking the wrong
inode's waiters.
Add the missing nfs4_check_fh() call after the generation check.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4state.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index c88637406773..19aab4c52548 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -8079,6 +8079,10 @@ nfsd4_delegreturn(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (status)
goto put_stateid;
+ status = nfs4_check_fh(&cstate->current_fh, &dp->dl_stid);
+ if (status)
+ goto put_stateid;
+
trace_nfsd_deleg_return(stateid);
destroy_delegation(dp);
smp_mb__after_atomic();
--
2.54.0
* [PATCH 08/19] nfsd: validate nseconds in TIME_DELEG decode paths
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (6 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 07/19] nfsd: add filehandle match check to nfsd4_delegreturn() Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 09/19] nfsd: remove premature NFS4_OO_CONFIRMED in CLAIM_PREVIOUS path Jeff Layton
` (10 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
The xdrgen-based TIME_DELEG_ACCESS and TIME_DELEG_MODIFY decode arms
store a raw uint32_t nseconds directly into tv_nsec without enforcing
nseconds < NSEC_PER_SEC. The legacy nfsd4_decode_nfstime4 has this
check but the TIME_DELEG paths do not. A malformed timespec can
propagate through notify_change() to disk.
Add range checks in both nfs4xdr.c (SETATTR path) and
nfs4callback.c (CB_GETATTR path).
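
The range check itself is small enough to show in isolation. The struct names and NSEC_PER_SEC_MODEL below are assumptions for a standalone sketch; the condition mirrors the one described above.

```c
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC_MODEL 1000000000U

/* Hypothetical wire-format timestamp: seconds plus an unsigned nanosecond count. */
struct wire_time {
	int64_t seconds;
	uint32_t nseconds;
};

struct host_time {
	int64_t tv_sec;
	long tv_nsec;
};

/* Convert a decoded timestamp, rejecting out-of-range nanoseconds. */
bool wire_time_to_host(const struct wire_time *in, struct host_time *out)
{
	if (in->nseconds >= NSEC_PER_SEC_MODEL)
		return false;
	out->tv_sec = in->seconds;
	out->tv_nsec = (long)in->nseconds;
	return true;
}
```

A 32-bit unsigned field can carry values up to about 4.29 billion, so more than three quarters of its range is invalid as a nanosecond count and has to be rejected explicitly.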
Fixes: 6ae30d6eb26b ("nfsd: add support for delegated timestamps")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4callback.c | 4 ++++
fs/nfsd/nfs4xdr.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 1628bb9ef9dd..7c868afc329e 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -108,6 +108,8 @@ static int decode_cb_fattr4(struct xdr_stream *xdr, uint32_t *bitmap,
if (!xdrgen_decode_fattr4_time_deleg_access(xdr, &access))
return -EIO;
+ if (access.nseconds >= NSEC_PER_SEC)
+ return -EIO;
fattr->ncf_cb_atime.tv_sec = access.seconds;
fattr->ncf_cb_atime.tv_nsec = access.nseconds;
@@ -117,6 +119,8 @@ static int decode_cb_fattr4(struct xdr_stream *xdr, uint32_t *bitmap,
if (!xdrgen_decode_fattr4_time_deleg_modify(xdr, &modify))
return -EIO;
+ if (modify.nseconds >= NSEC_PER_SEC)
+ return -EIO;
fattr->ncf_cb_mtime.tv_sec = modify.seconds;
fattr->ncf_cb_mtime.tv_nsec = modify.nseconds;
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1e4a51926910..056a8df3fd50 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -637,6 +637,8 @@ nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen,
if (!xdrgen_decode_fattr4_time_deleg_access(argp->xdr, &access))
return nfserr_bad_xdr;
+ if (access.nseconds >= NSEC_PER_SEC)
+ return nfserr_inval;
iattr->ia_atime.tv_sec = access.seconds;
iattr->ia_atime.tv_nsec = access.nseconds;
iattr->ia_valid |= ATTR_ATIME | ATTR_ATIME_SET | ATTR_DELEG;
@@ -646,6 +648,8 @@ nfsd4_decode_fattr4(struct nfsd4_compoundargs *argp, u32 *bmval, u32 bmlen,
if (!xdrgen_decode_fattr4_time_deleg_modify(argp->xdr, &modify))
return nfserr_bad_xdr;
+ if (modify.nseconds >= NSEC_PER_SEC)
+ return nfserr_inval;
iattr->ia_mtime.tv_sec = modify.seconds;
iattr->ia_mtime.tv_nsec = modify.nseconds;
iattr->ia_ctime.tv_sec = modify.seconds;
--
2.54.0
* [PATCH 09/19] nfsd: remove premature NFS4_OO_CONFIRMED in CLAIM_PREVIOUS path
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (7 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 08/19] nfsd: validate nseconds in TIME_DELEG decode paths Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 10/19] nfsd: fix version mismatch loops in nfsd_acl_init_request() Jeff Layton
` (9 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
nfsd4_open() sets NFS4_OO_CONFIRMED on the openowner before calling
do_open_fhandle(), which can fail. If it fails, the openowner stays
permanently confirmed despite the OPEN failing. The correct
success-path setter already exists in init_open_stateid().
Remove the premature setter. NFSv4.1+ is unaffected as sessions
always confirm at creation time.
Fixes: a525825df152 ("[PATCH] nfsd4: handle replays of failed open reclaims")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4proc.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 69fee481581d..4fe46996c8ed 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -643,7 +643,6 @@ nfsd4_open(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
status = nfs4_check_open_reclaim(cstate->clp);
if (status)
goto out;
- open->op_openowner->oo_flags |= NFS4_OO_CONFIRMED;
reclaim = true;
fallthrough;
case NFS4_OPEN_CLAIM_FH:
--
2.54.0
* [PATCH 10/19] nfsd: fix version mismatch loops in nfsd_acl_init_request()
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (8 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 09/19] nfsd: remove premature NFS4_OO_CONFIRMED in CLAIM_PREVIOUS path Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 11/19] nfsd: fix FL_SLEEP being set unconditionally for all LOCK types Jeff Layton
` (8 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
The loops that compute the supported version range for PROG_MISMATCH
test nfsd_support_acl_version(rqstp->rq_vers) instead of
nfsd_support_acl_version(i), so every iteration fails and the
function returns rpc_prog_unavail instead of rpc_prog_mismatch.
Replace rqstp->rq_vers with the loop variable i, matching the
pattern used by the sibling nfsd_init_request() function.
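
The effect of probing the wrong variable is easy to reproduce in a standalone model. The version bounds, the support table and the use_loop_var switch are all hypothetical; the loop structure follows the diff below.

```c
#include <stdbool.h>

#define MIN_VERS 2
#define NR_VERS  4	/* valid versions are MIN_VERS .. NR_VERS - 1 */

/* Hypothetical support table: only version 3 is enabled. */
static bool version_supported(int vers)
{
	return vers == 3;
}

/*
 * Compute the lowest and highest supported versions for a mismatch
 * reply. 'requested' is the (unsupported) version the client asked
 * for. 'use_loop_var' selects the fixed behavior (probe i) or the
 * unfixed one (probe the requested version on every iteration).
 * Returns false if no supported version is found.
 */
bool supported_range(int requested, bool use_loop_var, int *lo, int *hi)
{
	int i;

	*lo = NR_VERS;
	for (i = MIN_VERS; i < NR_VERS; i++) {
		if (version_supported(use_loop_var ? i : requested)) {
			*lo = i;
			break;
		}
	}
	if (*lo == NR_VERS)
		return false;

	*hi = MIN_VERS;
	for (i = NR_VERS - 1; i >= MIN_VERS; i--) {
		if (version_supported(use_loop_var ? i : requested)) {
			*hi = i;
			break;
		}
	}
	return true;
}
```

This code is only reached when the requested version is unsupported, so probing that same version can never succeed and the range is never filled in.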
Fixes: e333f3bbefe3 ("nfsd: Allow containers to set supported nfs versions")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfssvc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index e45d46089959..d47451125761 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -815,7 +815,7 @@ nfsd_acl_init_request(struct svc_rqst *rqstp,
ret->mismatch.lovers = NFSD_ACL_NRVERS;
for (i = NFSD_ACL_MINVERS; i < NFSD_ACL_NRVERS; i++) {
- if (nfsd_support_acl_version(rqstp->rq_vers) &&
+ if (nfsd_support_acl_version(i) &&
nfsd_vers(nn, i, NFSD_TEST)) {
ret->mismatch.lovers = i;
break;
@@ -825,7 +825,7 @@ nfsd_acl_init_request(struct svc_rqst *rqstp,
return rpc_prog_unavail;
ret->mismatch.hivers = NFSD_ACL_MINVERS;
for (i = NFSD_ACL_NRVERS - 1; i >= NFSD_ACL_MINVERS; i--) {
- if (nfsd_support_acl_version(rqstp->rq_vers) &&
+ if (nfsd_support_acl_version(i) &&
nfsd_vers(nn, i, NFSD_TEST)) {
ret->mismatch.hivers = i;
break;
--
2.54.0
* [PATCH 11/19] nfsd: fix FL_SLEEP being set unconditionally for all LOCK types
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (9 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 10/19] nfsd: fix version mismatch loops in nfsd_acl_init_request() Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 12/19] nfsd: add fh_want_write() for early-verified SETATTR in nfsd_proc_setattr() Jeff Layton
` (7 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
The FL_SLEEP guard uses lk_type & (NFS4_READW_LT | NFS4_WRITEW_LT) which
computes lk_type & 7, non-zero for all valid lock types including
non-blocking ones. This was introduced by commit 7e64c5bc497c
("NLM/NFSD: Fix lock notifications for async-capable filesystems") when
refactoring from per-case switch arms.
Replace the bitmask test with explicit equality checks.
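
The arithmetic behind this can be checked directly. The enum below uses the nfs_lock_type4 values from the NFSv4 protocol (1 through 4); the two helper functions are hypothetical names for the unfixed and fixed tests.

```c
#include <stdbool.h>

/* Lock type values as defined by the NFSv4 protocol (nfs_lock_type4). */
enum lock_type {
	READ_LT   = 1,
	WRITE_LT  = 2,
	READW_LT  = 3,	/* blocking read */
	WRITEW_LT = 4,	/* blocking write */
};

/* Unfixed test: READW_LT | WRITEW_LT == 7, so any value 1..4 matches. */
bool is_blocking_bitmask(enum lock_type t)
{
	return (t & (READW_LT | WRITEW_LT)) != 0;
}

/* Fixed test: compare against the two blocking types explicitly. */
bool is_blocking_equality(enum lock_type t)
{
	return t == READW_LT || t == WRITEW_LT;
}
```

The lock types are consecutive enumeration values rather than independent flag bits, which is why a mask test cannot distinguish them.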
Fixes: 7e64c5bc497c ("NLM/NFSD: Fix lock notifications for async-capable filesystems")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4state.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 19aab4c52548..8c714001c116 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -8589,10 +8589,11 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
goto out;
}
- if (lock->lk_type & (NFS4_READW_LT | NFS4_WRITEW_LT) &&
- nfsd4_has_session(cstate) &&
- locks_can_async_lock(nf->nf_file->f_op))
- flags |= FL_SLEEP;
+ if ((lock->lk_type == NFS4_READW_LT ||
+ lock->lk_type == NFS4_WRITEW_LT) &&
+ nfsd4_has_session(cstate) &&
+ locks_can_async_lock(nf->nf_file->f_op))
+ flags |= FL_SLEEP;
nbl = find_or_allocate_block(lock_sop, &fp->fi_fhandle, nn);
if (!nbl) {
--
2.54.0
* [PATCH 12/19] nfsd: add fh_want_write() for early-verified SETATTR in nfsd_proc_setattr()
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (10 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 11/19] nfsd: fix FL_SLEEP being set unconditionally for all LOCK types Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 13/19] nfsd: fix clock domain mismatch in clients_still_reclaiming() Jeff Layton
` (6 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
The BOTH_TIME_SET branch calls fh_verify() early so setattr_prepare()
can inspect the dentry. This causes nfsd_setattr() to skip
fh_want_write(), so notify_change() runs without a mount write
reference.
Add the missing fh_want_write() call after the early fh_verify().
Fixes: cc265089ce1b ("nfsd: Disable NFSv2 timestamp workaround for NFSv3+")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfsproc.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 8873033d1e82..a73d5c259cd9 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -82,6 +82,7 @@ nfsd_proc_setattr(struct svc_rqst *rqstp)
.na_iattr = iap,
};
struct svc_fh *fhp;
+ int hosterr;
dprintk("nfsd: SETATTR %s, valid=%x, size=%ld\n",
SVCFH_fmt(&argp->fh),
@@ -117,6 +118,12 @@ nfsd_proc_setattr(struct svc_rqst *rqstp)
if (resp->status != nfs_ok)
goto out;
+ hosterr = fh_want_write(fhp);
+ if (hosterr) {
+ resp->status = nfserrno(hosterr);
+ goto out;
+ }
+
if (delta < 0)
delta = -delta;
if (delta < MAX_TOUCH_TIME_ERROR &&
--
2.54.0
* [PATCH 13/19] nfsd: fix clock domain mismatch in clients_still_reclaiming()
2026-06-09 17:47 [PATCH 00/19] nfsd: more bugfixes Jeff Layton
` (11 preceding siblings ...)
2026-06-09 17:47 ` [PATCH 12/19] nfsd: add fh_want_write() for early-verified SETATTR in nfsd_proc_setattr() Jeff Layton
@ 2026-06-09 17:47 ` Jeff Layton
2026-06-09 17:47 ` [PATCH 14/19] nfsd: use test_and_clear_bit for somebody_reclaimed to prevent lost update Jeff Layton
` (5 subsequent siblings)
18 siblings, 0 replies; 20+ messages in thread
From: Jeff Layton @ 2026-06-09 17:47 UTC
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
Christian Brauner, Benjamin Coddington, Donald Hunter,
Lorenzo Bianconi, Qi Zheng, Andrew Morton, Muchun Song
Cc: linux-nfs, linux-kernel, netdev, Jeff Layton
clients_still_reclaiming() computes a deadline from nn->boot_time
(CLOCK_REALTIME, ~1.7 billion) but compares it against
ktime_get_boottime_seconds() (CLOCK_BOOTTIME, seconds since boot).
The comparison is always false — it would take ~54 years of uptime
for BOOTTIME to exceed the REALTIME-derived deadline.
This means any client can hold the server in grace indefinitely by
sending CLAIM_PREVIOUS OPEN requests, blocking all non-reclaim
operations for all other clients.
Add boot_time_bt (CLOCK_BOOTTIME) alongside the existing boot_time
and use it for the deadline computation. boot_time (CLOCK_REALTIME)
is preserved for its cl_boot clientid-nonce role.
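
The mismatch is plain arithmetic, sketched here with hypothetical helpers. The sample values used in the note after the block (a wall-clock start of 1700000000, a lease of 90 seconds) are assumptions chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True while 'now' has not passed twice the lease period since 'start'.
 * The result is only meaningful if both come from the same clock.
 */
bool within_double_grace(int64_t start, int64_t now, int64_t lease)
{
	return now < start + 2 * lease;
}

/* Whole years of uptime before a seconds-since-boot clock reaches 'deadline'. */
int64_t years_until(int64_t deadline)
{
	const int64_t secs_per_year = 31557600;	/* 365.25 days */

	return deadline / secs_per_year;
}
```

Feeding a wall-clock start and an uptime-based now into within_double_grace() keeps it true for decades: years_until() on a deadline near 1.7 billion seconds gives 53 whole years (about 53.9), in line with the estimate above. With both values taken from the uptime clock, the window closes 2 * lease seconds after start.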
Fixes: 20b7d86f29d3 ("nfsd: use boottime for lease expiry calculation")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/netns.h | 1 +
fs/nfsd/nfs4state.c | 3 ++-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 5c33c96da28e..03724bef10a7 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -78,6 +78,7 @@ struct nfsd_net {
struct lock_manager nfsd4_manager;
unsigned long flags;
time64_t boot_time;
+ time64_t boot_time_bt; /* same instant in CLOCK_BOOTTIME */
struct dentry *nfsd_client_dir;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 8c714001c116..6e47330c6365 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6823,7 +6823,7 @@ bool nfsd4_force_end_grace(struct nfsd_net *nn)
*/
static bool clients_still_reclaiming(struct nfsd_net *nn)
{
- time64_t double_grace_period_end = nn->boot_time +
+ time64_t double_grace_period_end = nn->boot_time_bt +
2 * nn->nfsd4_lease;
if (test_bit(NFSD_NET_GRACE_END_FORCED, &nn->flags))
@@ -9198,6 +9198,7 @@ static int nfs4_state_create_net(struct net *net)
nn->conf_name_tree = RB_ROOT;
nn->unconf_name_tree = RB_ROOT;
nn->boot_time = ktime_get_real_seconds();
+ nn->boot_time_bt = ktime_get_boottime_seconds();
clear_bit(NFSD_NET_GRACE_ENDED, &nn->flags);
clear_bit(NFSD_NET_GRACE_END_FORCED, &nn->flags);
nn->nfsd4_manager.block_opens = true;
--
2.54.0
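As a rough illustration of the clock-domain mismatch described in patch 13,
the sketch below models the deadline test in plain userspace C. It assumes
the check has the form "now > start + 2 * lease"; the function name
double_grace_expired() and the sample values are hypothetical and are not
taken from the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the deadline test. "now_boottime" is seconds
 * since boot (CLOCK_BOOTTIME); "start_stamp" is whatever timestamp was
 * recorded when the per-namespace state was created.
 */
static bool double_grace_expired(int64_t now_boottime, int64_t start_stamp,
				 int64_t lease_secs)
{
	int64_t deadline = start_stamp + 2 * lease_secs;

	return now_boottime > deadline;
}
```

With a CLOCK_REALTIME-style stamp of 1700000000 and one hour of uptime,
double_grace_expired(3600, 1700000000, 90) is false and stays false for
decades. With a CLOCK_BOOTTIME-style stamp of 100, the same call returns
true because 3600 > 100 + 180.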
* [PATCH 14/19] nfsd: use test_and_clear_bit for somebody_reclaimed to prevent lost update
From: Jeff Layton @ 2026-06-09 17:47 UTC (permalink / raw)
clients_still_reclaiming() uses separate test_bit() and clear_bit()
calls on NFSD_NET_SOMEBODY_RECLAIMED. A concurrent set_bit() from
the OPEN or LOCK reclaim path arriving between the test and clear
is silently lost, causing the next laundromat tick to end grace
prematurely.
Replace with test_and_clear_bit() to make the read-and-clear atomic.
Fixes: 8c67a210c90c ("nfsd: convert nfsd_net boolean flags to unsigned long flags word")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4state.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 6e47330c6365..fddef6f8db7c 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -6837,9 +6837,8 @@ static bool clients_still_reclaiming(struct nfsd_net *nn)
if (atomic_read(&nn->nr_reclaim_complete) == size)
return false;
}
- if (!test_bit(NFSD_NET_SOMEBODY_RECLAIMED, &nn->flags))
+ if (!test_and_clear_bit(NFSD_NET_SOMEBODY_RECLAIMED, &nn->flags))
return false;
- clear_bit(NFSD_NET_SOMEBODY_RECLAIMED, &nn->flags);
/*
* If we've given them *two* lease times to reclaim, and they're
* still not done, give up:
--
2.54.0
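The lost update described in patch 14 can be replayed deterministically in
userspace with C11 atomics. This is only a single-threaded model of one
unlucky interleaving: RECLAIMED_BIT and all function names are hypothetical
stand-ins, and atomic_fetch_and() plays the role that test_and_clear_bit()
plays in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RECLAIMED_BIT (1UL << 0)	/* hypothetical flag bit */

/* Atomically clear the bit and report whether it was set beforehand. */
static bool test_and_clear(atomic_ulong *flags, unsigned long mask)
{
	return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}

/* Old pattern: separate test and clear, with a set landing in between. */
static bool racy_next_tick_sees_reclaim(void)
{
	atomic_ulong flags;
	bool seen;

	atomic_init(&flags, RECLAIMED_BIT);		/* earlier reclaim */
	seen = (atomic_load(&flags) & RECLAIMED_BIT) != 0; /* tick: test */
	atomic_fetch_or(&flags, RECLAIMED_BIT);		/* new reclaim arrives */
	if (seen)
		atomic_fetch_and(&flags, ~RECLAIMED_BIT); /* tick: clear */

	/* Next tick: the reclaim that arrived after the test is gone. */
	return (atomic_load(&flags) & RECLAIMED_BIT) != 0;
}

/* New pattern: the tick performs a single read-modify-write. */
static bool atomic_next_tick_sees_reclaim(void)
{
	atomic_ulong flags;

	atomic_init(&flags, RECLAIMED_BIT);		/* earlier reclaim */
	(void)test_and_clear(&flags, RECLAIMED_BIT);	/* tick */
	atomic_fetch_or(&flags, RECLAIMED_BIT);		/* new reclaim arrives */

	return (atomic_load(&flags) & RECLAIMED_BIT) != 0;
}
```

With one atomic operation there is no gap for a set to fall into: it lands
either before the operation (and is observed by the current tick) or after
it (and is kept for the next tick).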
* [PATCH 15/19] nfsd: reject reclaim LOCK after RECLAIM_COMPLETE
From: Jeff Layton @ 2026-06-09 17:47 UTC (permalink / raw)
nfsd4_lock() only checks the namespace-wide grace flag when deciding
whether to accept a reclaim LOCK. It does not check the per-client
NFSD4_CLIENT_RECLAIM_COMPLETE bit. An NFSv4.1+ client that has
already sent RECLAIM_COMPLETE can submit lk_reclaim=1 while grace is
still active (e.g. lockd holds the grace list open), and the server
accepts it instead of returning NFS4ERR_NO_GRACE as required by
RFC 8881 section 8.4.2.1.
The OPEN path already has the correct two-tier guard via
nfs4_check_open_reclaim(). Add the equivalent check to the LOCK path.
Fixes: 3b3e7b72239a ("nfsd: reject reclaim request when client has already sent RECLAIM_COMPLETE")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfs4state.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fddef6f8db7c..7d96bffd2fd5 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -8552,6 +8552,9 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
status = nfserr_no_grace;
if (!locks_in_grace(net) && lock->lk_reclaim)
goto out;
+ if (lock->lk_reclaim &&
+ test_bit(NFSD4_CLIENT_RECLAIM_COMPLETE, &cstate->clp->cl_flags))
+ goto out;
if (lock->lk_reclaim)
flags |= FL_RECLAIM;
--
2.54.0
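The two-tier guard that patch 15 adds to the LOCK path can be summarized as
a small decision function. This is a sketch under assumptions, not the
kernel code: the enum, the function name check_lock_grace() and the three
boolean inputs are hypothetical, and the first branch (non-reclaim locks
during grace) is assumed from the usual grace-period rules rather than
shown in the hunk above.

```c
#include <stdbool.h>

enum lock_status { LOCK_PROCEED, LOCK_ERR_GRACE, LOCK_ERR_NO_GRACE };

static enum lock_status check_lock_grace(bool in_grace, bool is_reclaim,
					 bool reclaim_complete)
{
	/* Assumed: ordinary locks must wait until grace is over. */
	if (in_grace && !is_reclaim)
		return LOCK_ERR_GRACE;
	/* Tier 1: reclaims are only valid while grace is active. */
	if (!in_grace && is_reclaim)
		return LOCK_ERR_NO_GRACE;
	/* Tier 2: a client that sent RECLAIM_COMPLETE may not reclaim. */
	if (is_reclaim && reclaim_complete)
		return LOCK_ERR_NO_GRACE;
	return LOCK_PROCEED;
}
```

The case the patch closes is check_lock_grace(true, true, true): grace is
still active namespace-wide, but this particular client has already
declared its reclaim finished, so the request is refused.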
* [PATCH 16/19] nfsd: validate sockaddr length per family in listener_set
From: Jeff Layton @ 2026-06-09 17:47 UTC (permalink / raw)
nfsd_sock_nl_policy declares NFSD_A_SOCK_ADDR as bare NLA_BINARY
with no minimum length. A CAP_NET_ADMIN caller can send a 16-byte
NFSD_A_SOCK_ADDR with sa_family=AF_INET6, causing a 12-byte OOB
read across three consumers (rpc_cmp_addr_port, svc_find_listener,
kernel_bind).
Tighten the policy to NLA_POLICY_MIN_LEN(16) and add per-family
length validation in both nlmsg_for_each_attr_type loops.
Fixes: 16a471177496 ("NFSD: add listener-{set,get} netlink command")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Documentation/netlink/specs/nfsd.yaml | 4 ++++
fs/nfsd/netlink.c | 2 +-
fs/nfsd/nfsctl.c | 30 ++++++++++++++++++++++++++++++
3 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/Documentation/netlink/specs/nfsd.yaml b/Documentation/netlink/specs/nfsd.yaml
index 8f36fadd68f7..9677ba19ffcd 100644
--- a/Documentation/netlink/specs/nfsd.yaml
+++ b/Documentation/netlink/specs/nfsd.yaml
@@ -156,6 +156,10 @@ attribute-sets:
-
name: addr
type: binary
+ # 16 == sizeof(struct sockaddr_in); AF_INET6 callers
+ # validate the full sockaddr_in6 length in nfsctl.c.
+ checks:
+ min-len: 16
-
name: transport-name
type: string
diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
index fbee3676d253..6570960034f1 100644
--- a/fs/nfsd/netlink.c
+++ b/fs/nfsd/netlink.c
@@ -37,7 +37,7 @@ const struct nla_policy nfsd_fslocations_nl_policy[NFSD_A_FSLOCATIONS_LOCATION +
};
const struct nla_policy nfsd_sock_nl_policy[NFSD_A_SOCK_TRANSPORT_NAME + 1] = {
- [NFSD_A_SOCK_ADDR] = { .type = NLA_BINARY, },
+ [NFSD_A_SOCK_ADDR] = NLA_POLICY_MIN_LEN(16),
[NFSD_A_SOCK_TRANSPORT_NAME] = { .type = NLA_NUL_STRING, },
};
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index ab10692ee937..f3b3154b16c5 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2016,6 +2016,21 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
xcl_name = nla_data(tb[NFSD_A_SOCK_TRANSPORT_NAME]);
sa = nla_data(tb[NFSD_A_SOCK_ADDR]);
+ switch (sa->sa_family) {
+ case AF_INET:
+ if (nla_len(tb[NFSD_A_SOCK_ADDR]) <
+ sizeof(struct sockaddr_in))
+ continue;
+ break;
+ case AF_INET6:
+ if (nla_len(tb[NFSD_A_SOCK_ADDR]) <
+ sizeof(struct sockaddr_in6))
+ continue;
+ break;
+ default:
+ continue;
+ }
+
/* Put back any matching sockets */
list_for_each_entry_safe(xprt, tmp, &permsocks, xpt_list) {
/* This shouldn't be possible */
@@ -2077,6 +2092,21 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
xcl_name = nla_data(tb[NFSD_A_SOCK_TRANSPORT_NAME]);
sa = nla_data(tb[NFSD_A_SOCK_ADDR]);
+ switch (sa->sa_family) {
+ case AF_INET:
+ if (nla_len(tb[NFSD_A_SOCK_ADDR]) <
+ sizeof(struct sockaddr_in))
+ continue;
+ break;
+ case AF_INET6:
+ if (nla_len(tb[NFSD_A_SOCK_ADDR]) <
+ sizeof(struct sockaddr_in6))
+ continue;
+ break;
+ default:
+ continue;
+ }
+
xprt = svc_find_listener(serv, xcl_name, net, sa);
if (xprt) {
if (delete)
--
2.54.0
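The per-family length validation from patch 16 has a direct userspace
analogue. The sketch below is a minimal version using the standard socket
headers; sockaddr_len_ok() is a hypothetical helper name, and the baseline
check against sizeof(struct sockaddr) stands in for the netlink policy's
minimum length.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Return true only if "len" bytes are enough for the claimed family. */
static bool sockaddr_len_ok(const void *buf, size_t len)
{
	sa_family_t family;

	/* Baseline: enough bytes to read the family field at all. */
	if (len < sizeof(struct sockaddr))
		return false;

	/* Copy the family out instead of casting an arbitrary buffer. */
	memcpy(&family,
	       (const char *)buf + offsetof(struct sockaddr, sa_family),
	       sizeof(family));

	switch (family) {
	case AF_INET:
		return len >= sizeof(struct sockaddr_in);
	case AF_INET6:
		return len >= sizeof(struct sockaddr_in6);
	default:
		return false;	/* unknown families are rejected */
	}
}
```

On Linux, struct sockaddr_in is 16 bytes and struct sockaddr_in6 is 28
bytes, which is where the 12-byte over-read in the commit message comes
from: a 16-byte buffer that claims AF_INET6 passes the baseline check but
fails the per-family one.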
* [PATCH 17/19] lockd, nfsd: RCU-protect nlmsvc_ops dispatch
From: Jeff Layton @ 2026-06-09 17:47 UTC (permalink / raw)
nlmsvc_ops is published by nfsd_lockd_init() and cleared by
nfsd_lockd_shutdown() with plain stores, while lockd dereferences
it unguarded from dispatch sites in fs/lockd/svcsubs.c. The pointer
targets nfsd's .rodata and the fopen/fclose callbacks live in nfsd's
.text, so a stale load after rmmod nfsd results in either a NULL
deref or a module-text use-after-free.
Declare nlmsvc_ops as __rcu, publish via rcu_assign_pointer(), clear
via RCU_INIT_POINTER() + synchronize_rcu(). Add a struct module
*owner field to nlmsvc_binding and pin the module across indirect
calls with try_module_get/module_put. When the binding is torn down,
fall back to fput() to avoid leaking struct file references.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/lockd/svc.c | 4 ++--
fs/lockd/svc4proc.c | 4 ++--
fs/lockd/svcproc.c | 4 ++--
fs/lockd/svcsubs.c | 52 +++++++++++++++++++++++++++++++++++++++-------
fs/nfsd/lockd.c | 6 ++++--
include/linux/lockd/bind.h | 12 ++++++++---
6 files changed, 64 insertions(+), 18 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 490551369ef2..ee90e743064a 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -47,7 +47,7 @@
static struct svc_program nlmsvc_program;
-const struct nlmsvc_binding *nlmsvc_ops;
+const struct nlmsvc_binding __rcu *nlmsvc_ops;
EXPORT_SYMBOL_GPL(nlmsvc_ops);
static DEFINE_MUTEX(nlmsvc_mutex);
@@ -142,7 +142,7 @@ lockd(void *vrqstp)
nlmsvc_retry_blocked(rqstp);
svc_recv(rqstp, 0);
}
- if (nlmsvc_ops)
+ if (rcu_access_pointer(nlmsvc_ops))
nlmsvc_invalidate_all();
nlm_shutdown_hosts();
cancel_delayed_work_sync(&ln->grace_period_end);
diff --git a/fs/lockd/svc4proc.c b/fs/lockd/svc4proc.c
index 78e675470c4b..080dffce9d8e 100644
--- a/fs/lockd/svc4proc.c
+++ b/fs/lockd/svc4proc.c
@@ -128,7 +128,7 @@ nlm4svc_lookup_host(struct svc_rqst *rqstp, string caller, bool monitored)
{
struct nlm_host *host;
- if (!nlmsvc_ops)
+ if (!rcu_access_pointer(nlmsvc_ops))
return NULL;
host = nlmsvc_lookup_host(rqstp, caller.data, caller.len);
if (!host)
@@ -894,7 +894,7 @@ static __be32 nlm4svc_proc_granted_res(struct svc_rqst *rqstp)
{
struct nlm4_res_wrapper *argp = rqstp->rq_argp;
- if (!nlmsvc_ops)
+ if (!rcu_access_pointer(nlmsvc_ops))
return rpc_success;
if (nlm4_netobj_to_cookie(&argp->cookie, &argp->xdrgen.cookie))
diff --git a/fs/lockd/svcproc.c b/fs/lockd/svcproc.c
index 4836887f11ef..dce6f6e3fd40 100644
--- a/fs/lockd/svcproc.c
+++ b/fs/lockd/svcproc.c
@@ -133,7 +133,7 @@ nlm3svc_lookup_host(struct svc_rqst *rqstp, string caller, bool monitored)
{
struct nlm_host *host;
- if (!nlmsvc_ops)
+ if (!rcu_access_pointer(nlmsvc_ops))
return NULL;
host = nlmsvc_lookup_host(rqstp, caller.data, caller.len);
if (!host)
@@ -923,7 +923,7 @@ static __be32 nlmsvc_proc_granted_res(struct svc_rqst *rqstp)
{
struct nlm_res_wrapper *argp = rqstp->rq_argp;
- if (!nlmsvc_ops)
+ if (!rcu_access_pointer(nlmsvc_ops))
return rpc_success;
if (nlm_netobj_to_cookie(&argp->cookie, &argp->xdrgen.cookie))
diff --git a/fs/lockd/svcsubs.c b/fs/lockd/svcsubs.c
index d7ada90dc048..e44eb20d3453 100644
--- a/fs/lockd/svcsubs.c
+++ b/fs/lockd/svcsubs.c
@@ -90,22 +90,35 @@ int lock_to_openmode(struct file_lock *lock)
static __be32 nlm_do_fopen(struct svc_rqst *rqstp,
struct nlm_file *file, int mode)
{
+ const struct nlmsvc_binding *ops;
__be32 nlmerr = nlm__int__failed;
__be32 deferred = 0;
int error;
int m;
+ rcu_read_lock();
+ ops = rcu_dereference(nlmsvc_ops);
+ if (!ops || !try_module_get(ops->owner)) {
+ rcu_read_unlock();
+ return nlm__int__failed;
+ }
+ rcu_read_unlock();
+
for (m = O_RDONLY; m <= O_WRONLY; m++) {
struct file **fp = &file->f_file[m];
if (mode != O_RDWR && mode != m)
continue;
- if (*fp)
+ if (*fp) {
+ module_put(ops->owner);
return nlm_granted;
+ }
- error = nlmsvc_ops->fopen(rqstp, &file->f_handle, fp, m);
- if (!error)
+ error = ops->fopen(rqstp, &file->f_handle, fp, m);
+ if (!error) {
+ module_put(ops->owner);
return nlm_granted;
+ }
dprintk("lockd: open failed (errno %d)\n", error);
switch (error) {
@@ -122,6 +135,7 @@ static __be32 nlm_do_fopen(struct svc_rqst *rqstp,
}
}
+ module_put(ops->owner);
return deferred ? deferred : nlmerr;
}
@@ -185,6 +199,33 @@ nlm_lookup_file(struct svc_rqst *rqstp, struct nlm_file **result,
goto out_unlock;
}
+/*
+ * Release the struct file references held by a nlm_file.
+ */
+static void nlm_release_files(struct nlm_file *file)
+{
+ const struct nlmsvc_binding *ops;
+ bool have_ops;
+
+ rcu_read_lock();
+ ops = rcu_dereference(nlmsvc_ops);
+ have_ops = ops && try_module_get(ops->owner);
+ rcu_read_unlock();
+
+ if (have_ops) {
+ if (file->f_file[O_RDONLY])
+ ops->fclose(file->f_file[O_RDONLY]);
+ if (file->f_file[O_WRONLY])
+ ops->fclose(file->f_file[O_WRONLY]);
+ module_put(ops->owner);
+ } else {
+ if (file->f_file[O_RDONLY])
+ fput(file->f_file[O_RDONLY]);
+ if (file->f_file[O_WRONLY])
+ fput(file->f_file[O_WRONLY]);
+ }
+}
+
/*
* Delete a file after having released all locks, blocks and shares
*/
@@ -194,10 +235,7 @@ nlm_delete_file(struct nlm_file *file)
nlm_debug_print_file("closing file", file);
if (!hlist_unhashed(&file->f_list)) {
hlist_del(&file->f_list);
- if (file->f_file[O_RDONLY])
- nlmsvc_ops->fclose(file->f_file[O_RDONLY]);
- if (file->f_file[O_WRONLY])
- nlmsvc_ops->fclose(file->f_file[O_WRONLY]);
+ nlm_release_files(file);
kfree(file);
} else {
printk(KERN_WARNING "lockd: attempt to release unknown file!\n");
diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c
index 6fe1325815e0..72a5b499839d 100644
--- a/fs/nfsd/lockd.c
+++ b/fs/nfsd/lockd.c
@@ -92,6 +92,7 @@ nlm_fclose(struct file *filp)
}
static const struct nlmsvc_binding nfsd_nlm_ops = {
+ .owner = THIS_MODULE,
.fopen = nlm_fopen, /* open file for locking */
.fclose = nlm_fclose, /* close file */
};
@@ -100,11 +101,12 @@ void
nfsd_lockd_init(void)
{
dprintk("nfsd: initializing lockd\n");
- nlmsvc_ops = &nfsd_nlm_ops;
+ rcu_assign_pointer(nlmsvc_ops, &nfsd_nlm_ops);
}
void
nfsd_lockd_shutdown(void)
{
- nlmsvc_ops = NULL;
+ RCU_INIT_POINTER(nlmsvc_ops, NULL);
+ synchronize_rcu();
}
diff --git a/include/linux/lockd/bind.h b/include/linux/lockd/bind.h
index b614e0deea72..db8207d4059f 100644
--- a/include/linux/lockd/bind.h
+++ b/include/linux/lockd/bind.h
@@ -16,17 +16,23 @@ struct svc_rqst;
struct rpc_task;
struct rpc_clnt;
struct super_block;
+struct module;
-/*
- * This is the set of functions for lockd->nfsd communication
+/**
+ * struct nlmsvc_binding - lockd -> nfsd callback table
+ * @owner: module that provides this binding.
+ * @fopen: open a file by NFS file handle on behalf of an NLM request.
+ * @fclose: close a file that was previously opened via @fopen.
+ * Implementations MUST be semantically equivalent to fput().
*/
struct nlmsvc_binding {
+ struct module *owner;
int (*fopen)(struct svc_rqst *rqstp, struct nfs_fh *f,
struct file **filp, int flags);
void (*fclose)(struct file *filp);
};
-extern const struct nlmsvc_binding *nlmsvc_ops;
+extern const struct nlmsvc_binding __rcu *nlmsvc_ops;
/*
* Similar to nfs_client_initdata, but without the NFS-specific
--
2.54.0
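The "pin the provider or fall back" shape of nlm_release_files() in patch
17 can be modelled in userspace C11. This sketch deliberately does not
implement RCU: the binding lives in static storage, so there is no
lifetime problem between loading the pointer and taking the reference,
which is the window the patch covers with rcu_read_lock(). All names
(struct binding, binding_tryget(), release_handle() and so on) are
hypothetical, and the reference count is only a stand-in for
try_module_get()/module_put().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct binding {
	atomic_int refs;		/* > 0 while the provider is alive */
	void (*fclose)(int *handle);
};

enum closer { CLOSED_BY_PROVIDER = 1, CLOSED_BY_FALLBACK = 2 };

static _Atomic(struct binding *) current_ops;	/* NULL when unpublished */

/* Take a reference only if the provider is still alive (refs > 0). */
static bool binding_tryget(struct binding *b)
{
	int old = atomic_load(&b->refs);

	while (old > 0) {
		if (atomic_compare_exchange_weak(&b->refs, &old, old + 1))
			return true;
	}
	return false;
}

static void binding_put(struct binding *b)
{
	atomic_fetch_sub(&b->refs, 1);
}

static void provider_fclose(int *handle) { *handle = CLOSED_BY_PROVIDER; }
static void fallback_close(int *handle) { *handle = CLOSED_BY_FALLBACK; }

/* Dispatch through the binding if it can be pinned, else fall back. */
static void release_handle(int *handle)
{
	struct binding *b = atomic_load(&current_ops);

	if (b && binding_tryget(b)) {
		b->fclose(handle);
		binding_put(b);
	} else {
		fallback_close(handle);
	}
}

static struct binding provider = { .fclose = provider_fclose };

static void provider_publish(void)
{
	atomic_store(&provider.refs, 1);	/* the provider's own reference */
	atomic_store(&current_ops, &provider);
}

static void provider_unpublish(void)
{
	atomic_store(&current_ops, (struct binding *)NULL);
	binding_put(&provider);			/* drop the provider's reference */
}
```

Before provider_publish() and after provider_unpublish(), release_handle()
takes the fallback path, which mirrors the fput() branch in the patch. The
fallback is only safe because the callback is documented as equivalent to
it.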
* [PATCH 18/19] nfsd: move nfsd_debugfs_init() after nfsd4_init_slabs() in init_nfsd()
From: Jeff Layton @ 2026-06-09 17:47 UTC (permalink / raw)
nfsd_debugfs_init() runs before nfsd4_init_slabs() in init_nfsd().
If the slab allocation fails, the bare "return retval" bypasses
nfsd_debugfs_exit(), leaving orphan debugfs files with stale fops
pointers into the freed module text.
Move nfsd_debugfs_init() to after the slab init succeeds, so the
early return has no debugfs state to clean up.
Fixes: 9fe5ea760e64 ("NFSD: Add /sys/kernel/debug/nfsd")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfsctl.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index f3b3154b16c5..b69e5f686e9d 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2557,11 +2557,12 @@ static int __init init_nfsd(void)
{
int retval;
- nfsd_debugfs_init();
-
retval = nfsd4_init_slabs();
if (retval)
return retval;
+
+ nfsd_debugfs_init();
+
retval = nfsd4_init_pnfs();
if (retval)
goto out_free_slabs;
--
2.54.0
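The ordering problem fixed in patch 18 is easy to reproduce in miniature.
In the sketch below, debugfs_live, fake_debugfs_init() and
fake_init_slabs() are hypothetical stand-ins for the real init steps; the
point is only that a step which needs cleanup should not run before a
step whose failure path returns without cleaning up.

```c
#include <errno.h>
#include <stdbool.h>

static int debugfs_live;	/* 1 while the stand-in debugfs state exists */

static void fake_debugfs_init(void)
{
	debugfs_live = 1;
}

static int fake_init_slabs(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Old order: debugfs first, then a bare return on slab failure. */
static int init_old_order(bool slabs_fail)
{
	int ret;

	fake_debugfs_init();
	ret = fake_init_slabs(slabs_fail);
	if (ret)
		return ret;	/* debugfs state is left behind */
	return 0;
}

/* New order: nothing needs cleanup when the slab step fails. */
static int init_new_order(bool slabs_fail)
{
	int ret;

	ret = fake_init_slabs(slabs_fail);
	if (ret)
		return ret;	/* nothing registered yet */
	fake_debugfs_init();
	return 0;
}
```

An alternative would be to keep the old order and turn the bare return
into a goto that undoes the debugfs setup; the patch takes the smaller
route of reordering.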
* [PATCH 19/19] nfsd: initialize DRC hash table before registering shrinker
From: Jeff Layton @ 2026-06-09 17:47 UTC (permalink / raw)
shrinker_register() precedes the INIT_LIST_HEAD loop and the
drc_hashsize store. On weakly-ordered architectures (arm64, ppc),
a shrinker scan can observe drc_hashsize before the bucket list
heads are initialized, causing a NULL deref in the DRC shrinker
callback.
Move bucket initialization and the drc_hashsize store before
shrinker_register() so the hash table is fully initialized before
it becomes visible to the shrinker.
Fixes: 8eea99a81c6f ("nfsd: dynamically allocate the nfsd-reply shrinker")
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfscache.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 154468ceccdc..18f8556d33dd 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -200,14 +200,14 @@ int nfsd_reply_cache_init(struct nfsd_net *nn)
nn->nfsd_reply_cache_shrinker->seeks = 1;
nn->nfsd_reply_cache_shrinker->private_data = nn;
- shrinker_register(nn->nfsd_reply_cache_shrinker);
-
for (i = 0; i < hashsize; i++) {
INIT_LIST_HEAD(&nn->drc_hashtbl[i].lru_head);
spin_lock_init(&nn->drc_hashtbl[i].cache_lock);
}
nn->drc_hashsize = hashsize;
+ shrinker_register(nn->nfsd_reply_cache_shrinker);
+
return 0;
out_shrinker:
kvfree(nn->drc_hashtbl);
--
2.54.0
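Patch 19 is an instance of the general "initialize, then publish" rule. The
userspace sketch below shows that rule with C11 atomics. It makes no claim
about how shrinker_register() orders memory internally: the explicit
release store is an assumption of this model, and struct cache, NBUCKETS
and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NBUCKETS 4	/* hypothetical table size */

struct bucket {
	struct bucket *next;	/* empty circular list points to itself */
};

struct cache {
	struct bucket tbl[NBUCKETS];
	unsigned int hashsize;
};

static _Atomic(struct cache *) registered;	/* what the scanner sees */

static void cache_init_then_register(struct cache *c)
{
	for (unsigned int i = 0; i < NBUCKETS; i++)
		c->tbl[i].next = &c->tbl[i];
	c->hashsize = NBUCKETS;

	/* Publish last; release ordering keeps the stores above before it. */
	atomic_store_explicit(&registered, c, memory_order_release);
}

/* Stand-in for the scan callback: returns buckets walked, or -1. */
static int scan_registered(void)
{
	struct cache *c = atomic_load_explicit(&registered,
					       memory_order_acquire);
	int walked = 0;

	if (!c)
		return -1;	/* nothing registered yet */

	for (unsigned int i = 0; i < c->hashsize; i++) {
		/* Would fault here if the list head were still NULL. */
		if (c->tbl[i].next->next)
			walked++;
	}
	return walked;
}
```

If the publish step ran before the loop, a scanner on another CPU could
see a non-zero hashsize together with list heads that are still zeroed,
which is the failure the commit message describes.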