* [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
@ 2026-02-24 19:33 Chuck Lever
2026-02-24 19:39 ` Jeff Layton
0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2026-02-24 19:33 UTC (permalink / raw)
To: Jeff Layton
Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
to prevent a heap overflow, but set rp_buflen to zero when the
encoded response exceeded the inline buffer. A retransmitted LOCK
reaching the replay path then produced only a status code with no
operation body, resulting in a malformed XDR response.
When the encoded response exceeds the 112-byte inline rp_ibuf, a
buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
remains zero, preserving the behavior from the capped-copy fix.
The buffer is freed when the stateowner is released or when a
subsequent operation's response fits in the inline buffer.
Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/nfs4state.c | 16 ++++++++++++++++
fs/nfsd/nfs4xdr.c | 23 ++++++++++++++++-------
fs/nfsd/state.h | 12 +++++++-----
3 files changed, 39 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ba49f49bb93b..b4d0e82b2690 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp)
}
}
+/**
+ * nfs4_replay_free_cache - release dynamically allocated replay buffer
+ * @rp: replay cache to reset
+ *
+ * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset
+ * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen.
+ */
+void nfs4_replay_free_cache(struct nfs4_replay *rp)
+{
+ if (rp->rp_buf != rp->rp_ibuf)
+ kfree(rp->rp_buf);
+ rp->rp_buf = rp->rp_ibuf;
+ rp->rp_buflen = 0;
+}
+
static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop)
{
+ nfs4_replay_free_cache(&sop->so_replay);
kfree(sop->so_owner.data);
sop->so_ops->so_free(sop);
}
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 690f7a3122ec..2a0946c630e1 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
so->so_replay.rp_status = op->status;
- if (len <= NFSD4_REPLAY_ISIZE) {
- so->so_replay.rp_buflen = len;
- read_bytes_from_xdr_buf(xdr->buf,
- op_status_offset + XDR_UNIT,
- so->so_replay.rp_buf, len);
- } else {
- so->so_replay.rp_buflen = 0;
+ if (len > NFSD4_REPLAY_ISIZE) {
+ char *buf = kmalloc(len, GFP_KERNEL);
+
+ nfs4_replay_free_cache(&so->so_replay);
+ if (buf) {
+ so->so_replay.rp_buf = buf;
+ } else {
+ /* rp_buflen already zeroed; skip caching */
+ goto status;
+ }
+ } else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) {
+ nfs4_replay_free_cache(&so->so_replay);
}
+ so->so_replay.rp_buflen = len;
+ read_bytes_from_xdr_buf(xdr->buf,
+ op_status_offset + XDR_UNIT,
+ so->so_replay.rp_buf, len);
}
status:
op->status = nfsd4_map_status(op->status,
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 3159c7b67f50..9b05462da4cc 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -554,10 +554,10 @@ struct nfs4_client_reclaim {
* ~32(deleg. ace) = 112 bytes
*
* Some responses can exceed this. A LOCK denial includes the conflicting
- * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses
- * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is
- * saved. Enlarging this constant increases the size of every
- * nfs4_stateowner.
+ * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a
+ * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If
+ * that allocation fails, only rp_status is saved. Enlarging this constant
+ * increases the size of every nfs4_stateowner.
*/
#define NFSD4_REPLAY_ISIZE 112
@@ -569,12 +569,14 @@ struct nfs4_client_reclaim {
struct nfs4_replay {
__be32 rp_status;
unsigned int rp_buflen;
- char *rp_buf;
+ char *rp_buf; /* rp_ibuf or kmalloc'd */
struct knfsd_fh rp_openfh;
int rp_locked;
char rp_ibuf[NFSD4_REPLAY_ISIZE];
};
+extern void nfs4_replay_free_cache(struct nfs4_replay *rp);
+
struct nfs4_stateowner;
struct nfs4_stateowner_operations {
--
2.53.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache 2026-02-24 19:33 [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache Chuck Lever @ 2026-02-24 19:39 ` Jeff Layton 2026-02-24 19:42 ` Chuck Lever 0 siblings, 1 reply; 6+ messages in thread From: Jeff Layton @ 2026-02-24 19:39 UTC (permalink / raw) To: Chuck Lever Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK > replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE > to prevent a heap overflow, but set rp_buflen to zero when the > encoded response exceeded the inline buffer. A retransmitted LOCK > reaching the replay path then produced only a status code with no > operation body, resulting in a malformed XDR response. > > When the encoded response exceeds the 112-byte inline rp_ibuf, a > buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen > remains zero, preserving the behavior from the capped-copy fix. > The buffer is freed when the stateowner is released or when a > subsequent operation's response fits in the inline buffer. > > Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache") > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > fs/nfsd/nfs4state.c | 16 ++++++++++++++++ > fs/nfsd/nfs4xdr.c | 23 ++++++++++++++++------- > fs/nfsd/state.h | 12 +++++++----- > 3 files changed, 39 insertions(+), 12 deletions(-) > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > index ba49f49bb93b..b4d0e82b2690 100644 > --- a/fs/nfsd/nfs4state.c > +++ b/fs/nfsd/nfs4state.c > @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp) > } > } > > +/** > + * nfs4_replay_free_cache - release dynamically allocated replay buffer > + * @rp: replay cache to reset > + * > + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset > + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen. > + */ > +void nfs4_replay_free_cache(struct nfs4_replay *rp) > +{ > + if (rp->rp_buf != rp->rp_ibuf) > + kfree(rp->rp_buf); > + rp->rp_buf = rp->rp_ibuf; > + rp->rp_buflen = 0; > +} > + > static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop) > { > + nfs4_replay_free_cache(&sop->so_replay); > kfree(sop->so_owner.data); > sop->so_ops->so_free(sop); > } > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > index 690f7a3122ec..2a0946c630e1 100644 > --- a/fs/nfsd/nfs4xdr.c > +++ b/fs/nfsd/nfs4xdr.c > @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) > int len = xdr->buf->len - (op_status_offset + XDR_UNIT); > > so->so_replay.rp_status = op->status; > - if (len <= NFSD4_REPLAY_ISIZE) { > - so->so_replay.rp_buflen = len; > - read_bytes_from_xdr_buf(xdr->buf, > - op_status_offset + XDR_UNIT, > - so->so_replay.rp_buf, len); > - } else { > - so->so_replay.rp_buflen = 0; > + if (len > NFSD4_REPLAY_ISIZE) { > + char *buf = kmalloc(len, GFP_KERNEL); > + > + nfs4_replay_free_cache(&so->so_replay); > + if (buf) { > + so->so_replay.rp_buf = buf; > + } else { > + /* rp_buflen already zeroed; skip caching */ > + goto status; > + } > + } else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) { > + nfs4_replay_free_cache(&so->so_replay); > } > + so->so_replay.rp_buflen = len; > + read_bytes_from_xdr_buf(xdr->buf, > + op_status_offset + XDR_UNIT, > + so->so_replay.rp_buf, len); > } > status: > op->status = nfsd4_map_status(op->status, > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > index 3159c7b67f50..9b05462da4cc 100644 > --- a/fs/nfsd/state.h > +++ b/fs/nfsd/state.h > @@ -554,10 +554,10 @@ struct nfs4_client_reclaim { > * ~32(deleg. ace) = 112 bytes > * > * Some responses can exceed this. A LOCK denial includes the conflicting > - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses > - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is > - * saved. Enlarging this constant increases the size of every > - * nfs4_stateowner. > + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a > + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If > + * that allocation fails, only rp_status is saved. Enlarging this constant > + * increases the size of every nfs4_stateowner. > */ > > #define NFSD4_REPLAY_ISIZE 112 > @@ -569,12 +569,14 @@ struct nfs4_client_reclaim { > struct nfs4_replay { > __be32 rp_status; > unsigned int rp_buflen; > - char *rp_buf; > + char *rp_buf; /* rp_ibuf or kmalloc'd */ > struct knfsd_fh rp_openfh; > int rp_locked; > char rp_ibuf[NFSD4_REPLAY_ISIZE]; > }; > > +extern void nfs4_replay_free_cache(struct nfs4_replay *rp); > + > struct nfs4_stateowner; > > struct nfs4_stateowner_operations { Certainly a reasonable approach if we care about full correctness when dealing with a large lockowner on NFSv4.0. Do we? -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache 2026-02-24 19:39 ` Jeff Layton @ 2026-02-24 19:42 ` Chuck Lever 2026-02-24 19:51 ` Jeff Layton 0 siblings, 1 reply; 6+ messages in thread From: Chuck Lever @ 2026-02-24 19:42 UTC (permalink / raw) To: Jeff Layton Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever On 2/24/26 2:39 PM, Jeff Layton wrote: > On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote: >> From: Chuck Lever <chuck.lever@oracle.com> >> >> Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK >> replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE >> to prevent a heap overflow, but set rp_buflen to zero when the >> encoded response exceeded the inline buffer. A retransmitted LOCK >> reaching the replay path then produced only a status code with no >> operation body, resulting in a malformed XDR response. >> >> When the encoded response exceeds the 112-byte inline rp_ibuf, a >> buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen >> remains zero, preserving the behavior from the capped-copy fix. >> The buffer is freed when the stateowner is released or when a >> subsequent operation's response fits in the inline buffer. >> >> Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache") >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >> --- >> fs/nfsd/nfs4state.c | 16 ++++++++++++++++ >> fs/nfsd/nfs4xdr.c | 23 ++++++++++++++++------- >> fs/nfsd/state.h | 12 +++++++----- >> 3 files changed, 39 insertions(+), 12 deletions(-) >> >> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >> index ba49f49bb93b..b4d0e82b2690 100644 >> --- a/fs/nfsd/nfs4state.c >> +++ b/fs/nfsd/nfs4state.c >> @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp) >> } >> } >> >> +/** >> + * nfs4_replay_free_cache - release dynamically allocated replay buffer >> + * @rp: replay cache to reset >> + * >> + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset >> + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen. >> + */ >> +void nfs4_replay_free_cache(struct nfs4_replay *rp) >> +{ >> + if (rp->rp_buf != rp->rp_ibuf) >> + kfree(rp->rp_buf); >> + rp->rp_buf = rp->rp_ibuf; >> + rp->rp_buflen = 0; >> +} >> + >> static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop) >> { >> + nfs4_replay_free_cache(&sop->so_replay); >> kfree(sop->so_owner.data); >> sop->so_ops->so_free(sop); >> } >> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c >> index 690f7a3122ec..2a0946c630e1 100644 >> --- a/fs/nfsd/nfs4xdr.c >> +++ b/fs/nfsd/nfs4xdr.c >> @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) >> int len = xdr->buf->len - (op_status_offset + XDR_UNIT); >> >> so->so_replay.rp_status = op->status; >> - if (len <= NFSD4_REPLAY_ISIZE) { >> - so->so_replay.rp_buflen = len; >> - read_bytes_from_xdr_buf(xdr->buf, >> - op_status_offset + XDR_UNIT, >> - so->so_replay.rp_buf, len); >> - } else { >> - so->so_replay.rp_buflen = 0; >> + if (len > NFSD4_REPLAY_ISIZE) { >> + char *buf = kmalloc(len, GFP_KERNEL); >> + >> + nfs4_replay_free_cache(&so->so_replay); >> + if (buf) { >> + so->so_replay.rp_buf = buf; >> + } else { >> + /* rp_buflen already zeroed; skip caching */ >> + goto status; >> + } >> + } else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) { >> + nfs4_replay_free_cache(&so->so_replay); >> } >> + so->so_replay.rp_buflen = len; >> + read_bytes_from_xdr_buf(xdr->buf, >> + op_status_offset + XDR_UNIT, >> + so->so_replay.rp_buf, len); >> } >> status: >> op->status = nfsd4_map_status(op->status, >> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h >> index 3159c7b67f50..9b05462da4cc 100644 >> --- a/fs/nfsd/state.h >> +++ b/fs/nfsd/state.h >> @@ -554,10 +554,10 @@ struct nfs4_client_reclaim { >> * ~32(deleg. ace) = 112 bytes >> * >> * Some responses can exceed this. A LOCK denial includes the conflicting >> - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses >> - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is >> - * saved. Enlarging this constant increases the size of every >> - * nfs4_stateowner. >> + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a >> + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If >> + * that allocation fails, only rp_status is saved. Enlarging this constant >> + * increases the size of every nfs4_stateowner. >> */ >> >> #define NFSD4_REPLAY_ISIZE 112 >> @@ -569,12 +569,14 @@ struct nfs4_client_reclaim { >> struct nfs4_replay { >> __be32 rp_status; >> unsigned int rp_buflen; >> - char *rp_buf; >> + char *rp_buf; /* rp_ibuf or kmalloc'd */ >> struct knfsd_fh rp_openfh; >> int rp_locked; >> char rp_ibuf[NFSD4_REPLAY_ISIZE]; >> }; >> >> +extern void nfs4_replay_free_cache(struct nfs4_replay *rp); >> + >> struct nfs4_stateowner; >> >> struct nfs4_stateowner_operations { > > > Certainly a reasonable approach if we care about full correctness when > dealing with a large lockowner on NFSv4.0. Do we? The idea would be to either: o Backport your fix and not this update, or o Squash these two together, and backport both Admittedly this is a narrow corner case for a minor version that is destined for the scrap heap. -- Chuck Lever ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache 2026-02-24 19:42 ` Chuck Lever @ 2026-02-24 19:51 ` Jeff Layton 2026-02-24 19:53 ` Chuck Lever 0 siblings, 1 reply; 6+ messages in thread From: Jeff Layton @ 2026-02-24 19:51 UTC (permalink / raw) To: Chuck Lever Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever On Tue, 2026-02-24 at 14:42 -0500, Chuck Lever wrote: > On 2/24/26 2:39 PM, Jeff Layton wrote: > > On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote: > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > > > Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK > > > replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE > > > to prevent a heap overflow, but set rp_buflen to zero when the > > > encoded response exceeded the inline buffer. A retransmitted LOCK > > > reaching the replay path then produced only a status code with no > > > operation body, resulting in a malformed XDR response. > > > > > > When the encoded response exceeds the 112-byte inline rp_ibuf, a > > > buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen > > > remains zero, preserving the behavior from the capped-copy fix. > > > The buffer is freed when the stateowner is released or when a > > > subsequent operation's response fits in the inline buffer. > > > > > > Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache") > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > > --- > > > fs/nfsd/nfs4state.c | 16 ++++++++++++++++ > > > fs/nfsd/nfs4xdr.c | 23 ++++++++++++++++------- > > > fs/nfsd/state.h | 12 +++++++----- > > > 3 files changed, 39 insertions(+), 12 deletions(-) > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > index ba49f49bb93b..b4d0e82b2690 100644 > > > --- a/fs/nfsd/nfs4state.c > > > +++ b/fs/nfsd/nfs4state.c > > > @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp) > > > } > > > } > > > > > > +/** > > > + * nfs4_replay_free_cache - release dynamically allocated replay buffer > > > + * @rp: replay cache to reset > > > + * > > > + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset > > > + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen. > > > + */ > > > +void nfs4_replay_free_cache(struct nfs4_replay *rp) > > > +{ > > > + if (rp->rp_buf != rp->rp_ibuf) > > > + kfree(rp->rp_buf); > > > + rp->rp_buf = rp->rp_ibuf; > > > + rp->rp_buflen = 0; > > > +} > > > + > > > static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop) > > > { > > > + nfs4_replay_free_cache(&sop->so_replay); > > > kfree(sop->so_owner.data); > > > sop->so_ops->so_free(sop); > > > } > > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > > > index 690f7a3122ec..2a0946c630e1 100644 > > > --- a/fs/nfsd/nfs4xdr.c > > > +++ b/fs/nfsd/nfs4xdr.c > > > @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) > > > int len = xdr->buf->len - (op_status_offset + XDR_UNIT); > > > > > > so->so_replay.rp_status = op->status; > > > - if (len <= NFSD4_REPLAY_ISIZE) { > > > - so->so_replay.rp_buflen = len; > > > - read_bytes_from_xdr_buf(xdr->buf, > > > - op_status_offset + XDR_UNIT, > > > - so->so_replay.rp_buf, len); > > > - } else { > > > - so->so_replay.rp_buflen = 0; > > > + if (len > NFSD4_REPLAY_ISIZE) { > > > + char *buf = kmalloc(len, GFP_KERNEL); > > > + > > > + nfs4_replay_free_cache(&so->so_replay); > > > + if (buf) { > > > + so->so_replay.rp_buf = buf; > > > + } else { > > > + /* rp_buflen already zeroed; skip caching */ > > > + goto status; > > > + } > > > + } else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) { > > > + nfs4_replay_free_cache(&so->so_replay); > > > } > > > + so->so_replay.rp_buflen = len; > > > + read_bytes_from_xdr_buf(xdr->buf, > > > + op_status_offset + XDR_UNIT, > > > + so->so_replay.rp_buf, len); > > > } > > > status: > > > op->status = nfsd4_map_status(op->status, > > > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > > > index 3159c7b67f50..9b05462da4cc 100644 > > > --- a/fs/nfsd/state.h > > > +++ b/fs/nfsd/state.h > > > @@ -554,10 +554,10 @@ struct nfs4_client_reclaim { > > > * ~32(deleg. ace) = 112 bytes > > > * > > > * Some responses can exceed this. A LOCK denial includes the conflicting > > > - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses > > > - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is > > > - * saved. Enlarging this constant increases the size of every > > > - * nfs4_stateowner. > > > + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a > > > + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If > > > + * that allocation fails, only rp_status is saved. Enlarging this constant > > > + * increases the size of every nfs4_stateowner. > > > */ > > > > > > #define NFSD4_REPLAY_ISIZE 112 > > > @@ -569,12 +569,14 @@ struct nfs4_client_reclaim { > > > struct nfs4_replay { > > > __be32 rp_status; > > > unsigned int rp_buflen; > > > - char *rp_buf; > > > + char *rp_buf; /* rp_ibuf or kmalloc'd */ > > > struct knfsd_fh rp_openfh; > > > int rp_locked; > > > char rp_ibuf[NFSD4_REPLAY_ISIZE]; > > > }; > > > > > > +extern void nfs4_replay_free_cache(struct nfs4_replay *rp); > > > + > > > struct nfs4_stateowner; > > > > > > struct nfs4_stateowner_operations { > > > > > > Certainly a reasonable approach if we care about full correctness when > > dealing with a large lockowner on NFSv4.0. Do we? > > The idea would be to either: > > o Backport your fix and not this update, or > o Squash these two together, and backport both > > Admittedly this is a narrow corner case for a minor version that is > destined for the scrap heap. > Right. I ask because I looked at this approach when I was fixing this, and decided it wasn't worthwhile. I certainly won't stand in your way if you decide you want to handle long lockowner blobs, but I doubt any legitimate user will ever care. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache 2026-02-24 19:51 ` Jeff Layton @ 2026-02-24 19:53 ` Chuck Lever 2026-02-24 19:59 ` Jeff Layton 0 siblings, 1 reply; 6+ messages in thread From: Chuck Lever @ 2026-02-24 19:53 UTC (permalink / raw) To: Jeff Layton Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever On 2/24/26 2:51 PM, Jeff Layton wrote: > On Tue, 2026-02-24 at 14:42 -0500, Chuck Lever wrote: >> On 2/24/26 2:39 PM, Jeff Layton wrote: >>> On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote: >>>> From: Chuck Lever <chuck.lever@oracle.com> >>>> >>>> Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK >>>> replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE >>>> to prevent a heap overflow, but set rp_buflen to zero when the >>>> encoded response exceeded the inline buffer. A retransmitted LOCK >>>> reaching the replay path then produced only a status code with no >>>> operation body, resulting in a malformed XDR response. >>>> >>>> When the encoded response exceeds the 112-byte inline rp_ibuf, a >>>> buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen >>>> remains zero, preserving the behavior from the capped-copy fix. >>>> The buffer is freed when the stateowner is released or when a >>>> subsequent operation's response fits in the inline buffer. >>>> >>>> Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache") >>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> >>>> --- >>>> fs/nfsd/nfs4state.c | 16 ++++++++++++++++ >>>> fs/nfsd/nfs4xdr.c | 23 ++++++++++++++++------- >>>> fs/nfsd/state.h | 12 +++++++----- >>>> 3 files changed, 39 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >>>> index ba49f49bb93b..b4d0e82b2690 100644 >>>> --- a/fs/nfsd/nfs4state.c >>>> +++ b/fs/nfsd/nfs4state.c >>>> @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp) >>>> } >>>> } >>>> >>>> +/** >>>> + * nfs4_replay_free_cache - release dynamically allocated replay buffer >>>> + * @rp: replay cache to reset >>>> + * >>>> + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset >>>> + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen. >>>> + */ >>>> +void nfs4_replay_free_cache(struct nfs4_replay *rp) >>>> +{ >>>> + if (rp->rp_buf != rp->rp_ibuf) >>>> + kfree(rp->rp_buf); >>>> + rp->rp_buf = rp->rp_ibuf; >>>> + rp->rp_buflen = 0; >>>> +} >>>> + >>>> static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop) >>>> { >>>> + nfs4_replay_free_cache(&sop->so_replay); >>>> kfree(sop->so_owner.data); >>>> sop->so_ops->so_free(sop); >>>> } >>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c >>>> index 690f7a3122ec..2a0946c630e1 100644 >>>> --- a/fs/nfsd/nfs4xdr.c >>>> +++ b/fs/nfsd/nfs4xdr.c >>>> @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) >>>> int len = xdr->buf->len - (op_status_offset + XDR_UNIT); >>>> >>>> so->so_replay.rp_status = op->status; >>>> - if (len <= NFSD4_REPLAY_ISIZE) { >>>> - so->so_replay.rp_buflen = len; >>>> - read_bytes_from_xdr_buf(xdr->buf, >>>> - op_status_offset + XDR_UNIT, >>>> - so->so_replay.rp_buf, len); >>>> - } else { >>>> - so->so_replay.rp_buflen = 0; >>>> + if (len > NFSD4_REPLAY_ISIZE) { >>>> + char *buf = kmalloc(len, GFP_KERNEL); >>>> + >>>> + nfs4_replay_free_cache(&so->so_replay); >>>> + if (buf) { >>>> + so->so_replay.rp_buf = buf; >>>> + } else { >>>> + /* rp_buflen already zeroed; skip caching */ >>>> + goto status; >>>> + } >>>> + } else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) { >>>> + nfs4_replay_free_cache(&so->so_replay); >>>> } >>>> + so->so_replay.rp_buflen = len; >>>> + read_bytes_from_xdr_buf(xdr->buf, >>>> + op_status_offset + XDR_UNIT, >>>> + so->so_replay.rp_buf, len); >>>> } >>>> status: >>>> op->status = nfsd4_map_status(op->status, >>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h >>>> index 3159c7b67f50..9b05462da4cc 100644 >>>> --- a/fs/nfsd/state.h >>>> +++ b/fs/nfsd/state.h >>>> @@ -554,10 +554,10 @@ struct nfs4_client_reclaim { >>>> * ~32(deleg. ace) = 112 bytes >>>> * >>>> * Some responses can exceed this. A LOCK denial includes the conflicting >>>> - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses >>>> - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is >>>> - * saved. Enlarging this constant increases the size of every >>>> - * nfs4_stateowner. >>>> + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a >>>> + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If >>>> + * that allocation fails, only rp_status is saved. Enlarging this constant >>>> + * increases the size of every nfs4_stateowner. >>>> */ >>>> >>>> #define NFSD4_REPLAY_ISIZE 112 >>>> @@ -569,12 +569,14 @@ struct nfs4_client_reclaim { >>>> struct nfs4_replay { >>>> __be32 rp_status; >>>> unsigned int rp_buflen; >>>> - char *rp_buf; >>>> + char *rp_buf; /* rp_ibuf or kmalloc'd */ >>>> struct knfsd_fh rp_openfh; >>>> int rp_locked; >>>> char rp_ibuf[NFSD4_REPLAY_ISIZE]; >>>> }; >>>> >>>> +extern void nfs4_replay_free_cache(struct nfs4_replay *rp); >>>> + >>>> struct nfs4_stateowner; >>>> >>>> struct nfs4_stateowner_operations { >>> >>> >>> Certainly a reasonable approach if we care about full correctness when >>> dealing with a large lockowner on NFSv4.0. Do we? >> >> The idea would be to either: >> >> o Backport your fix and not this update, or >> o Squash these two together, and backport both >> >> Admittedly this is a narrow corner case for a minor version that is >> destined for the scrap heap. >> > > Right. I ask because I looked at this approach when I was fixing this, > and decided it wasn't worthwhile. I certainly won't stand in your way > if you decide you want to handle long lockowner blobs, but I doubt any > legitimate user will ever care. I don't disagree at all. My concern is handling replay compliantly. Maybe there's another approach. -- Chuck Lever ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache 2026-02-24 19:53 ` Chuck Lever @ 2026-02-24 19:59 ` Jeff Layton 0 siblings, 0 replies; 6+ messages in thread From: Jeff Layton @ 2026-02-24 19:59 UTC (permalink / raw) To: Chuck Lever Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs, Chuck Lever On Tue, 2026-02-24 at 14:53 -0500, Chuck Lever wrote: > On 2/24/26 2:51 PM, Jeff Layton wrote: > > On Tue, 2026-02-24 at 14:42 -0500, Chuck Lever wrote: > > > On 2/24/26 2:39 PM, Jeff Layton wrote: > > > > On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote: > > > > > From: Chuck Lever <chuck.lever@oracle.com> > > > > > > > > > > Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK > > > > > replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE > > > > > to prevent a heap overflow, but set rp_buflen to zero when the > > > > > encoded response exceeded the inline buffer. A retransmitted LOCK > > > > > reaching the replay path then produced only a status code with no > > > > > operation body, resulting in a malformed XDR response. > > > > > > > > > > When the encoded response exceeds the 112-byte inline rp_ibuf, a > > > > > buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen > > > > > remains zero, preserving the behavior from the capped-copy fix. > > > > > The buffer is freed when the stateowner is released or when a > > > > > subsequent operation's response fits in the inline buffer. > > > > > > > > > > Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache") > > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > > > > > --- > > > > > fs/nfsd/nfs4state.c | 16 ++++++++++++++++ > > > > > fs/nfsd/nfs4xdr.c | 23 ++++++++++++++++------- > > > > > fs/nfsd/state.h | 12 +++++++----- > > > > > 3 files changed, 39 insertions(+), 12 deletions(-) > > > > > > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c > > > > > index ba49f49bb93b..b4d0e82b2690 100644 > > > > > --- a/fs/nfsd/nfs4state.c > > > > > +++ b/fs/nfsd/nfs4state.c > > > > > @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp) > > > > > } > > > > > } > > > > > > > > > > +/** > > > > > + * nfs4_replay_free_cache - release dynamically allocated replay buffer > > > > > + * @rp: replay cache to reset > > > > > + * > > > > > + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset > > > > > + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen. > > > > > + */ > > > > > +void nfs4_replay_free_cache(struct nfs4_replay *rp) > > > > > +{ > > > > > + if (rp->rp_buf != rp->rp_ibuf) > > > > > + kfree(rp->rp_buf); > > > > > + rp->rp_buf = rp->rp_ibuf; > > > > > + rp->rp_buflen = 0; > > > > > +} > > > > > + > > > > > static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop) > > > > > { > > > > > + nfs4_replay_free_cache(&sop->so_replay); > > > > > kfree(sop->so_owner.data); > > > > > sop->so_ops->so_free(sop); > > > > > } > > > > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c > > > > > index 690f7a3122ec..2a0946c630e1 100644 > > > > > --- a/fs/nfsd/nfs4xdr.c > > > > > +++ b/fs/nfsd/nfs4xdr.c > > > > > @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op) > > > > > int len = xdr->buf->len - (op_status_offset + XDR_UNIT); > > > > > > > > > > so->so_replay.rp_status = op->status; > > > > > - if (len <= NFSD4_REPLAY_ISIZE) { > > > > > - so->so_replay.rp_buflen = len; > > > > > - read_bytes_from_xdr_buf(xdr->buf, > > > > > - op_status_offset + XDR_UNIT, > > > > > - so->so_replay.rp_buf, len); > > > > > - } else { > > > > > - so->so_replay.rp_buflen = 0; > > > > > + if (len > NFSD4_REPLAY_ISIZE) { > > > > > + char *buf = kmalloc(len, GFP_KERNEL); > > > > > + > > > > > + nfs4_replay_free_cache(&so->so_replay); > > > > > + if (buf) { > > > > > + so->so_replay.rp_buf = buf; > > > > > + } else { > > > > > + /* rp_buflen already zeroed; skip caching */ > > > > > + goto status; > > > > > + } > > > > > + } else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) { > > > > > + nfs4_replay_free_cache(&so->so_replay); > > > > > } > > > > > + so->so_replay.rp_buflen = len; > > > > > + read_bytes_from_xdr_buf(xdr->buf, > > > > > + op_status_offset + XDR_UNIT, > > > > > + so->so_replay.rp_buf, len); > > > > > } > > > > > status: > > > > > op->status = nfsd4_map_status(op->status, > > > > > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h > > > > > index 3159c7b67f50..9b05462da4cc 100644 > > > > > --- a/fs/nfsd/state.h > > > > > +++ b/fs/nfsd/state.h > > > > > @@ -554,10 +554,10 @@ struct nfs4_client_reclaim { > > > > > * ~32(deleg. ace) = 112 bytes > > > > > * > > > > > * Some responses can exceed this. A LOCK denial includes the conflicting > > > > > - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses > > > > > - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is > > > > > - * saved. Enlarging this constant increases the size of every > > > > > - * nfs4_stateowner. > > > > > + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a > > > > > + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If > > > > > + * that allocation fails, only rp_status is saved. Enlarging this constant > > > > > + * increases the size of every nfs4_stateowner. > > > > > */ > > > > > > > > > > #define NFSD4_REPLAY_ISIZE 112 > > > > > @@ -569,12 +569,14 @@ struct nfs4_client_reclaim { > > > > > struct nfs4_replay { > > > > > __be32 rp_status; > > > > > unsigned int rp_buflen; > > > > > - char *rp_buf; > > > > > + char *rp_buf; /* rp_ibuf or kmalloc'd */ > > > > > struct knfsd_fh rp_openfh; > > > > > int rp_locked; > > > > > char rp_ibuf[NFSD4_REPLAY_ISIZE]; > > > > > }; > > > > > > > > > > +extern void nfs4_replay_free_cache(struct nfs4_replay *rp); > > > > > + > > > > > struct nfs4_stateowner; > > > > > > > > > > struct nfs4_stateowner_operations { > > > > > > > > > > > > Certainly a reasonable approach if we care about full correctness when > > > > dealing with a large lockowner on NFSv4.0. Do we? > > > > > > The idea would be to either: > > > > > > o Backport your fix and not this update, or > > > o Squash these two together, and backport both > > > > > > Admittedly this is a narrow corner case for a minor version that is > > > destined for the scrap heap. > > > > > > > Right. I ask because I looked at this approach when I was fixing this, > > and decided it wasn't worthwhile. I certainly won't stand in your way > > if you decide you want to handle long lockowner blobs, but I doubt any > > legitimate user will ever care. > > I don't disagree at all. My concern is handling replay compliantly. > Maybe there's another approach. > I think that the only other way is to grow NFSD4_REPLAY_ISIZE, and doing dynamic allocation is preferable to that, IMO. To be clear: I don't have a problem with your patch. It just didn't seem worthwhile to me. If you think it's worth fixing though, then go for it. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2026-02-24 19:59 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-24 19:33 [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache Chuck Lever 2026-02-24 19:39 ` Jeff Layton 2026-02-24 19:42 ` Chuck Lever 2026-02-24 19:51 ` Jeff Layton 2026-02-24 19:53 ` Chuck Lever 2026-02-24 19:59 ` Jeff Layton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox