public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
@ 2026-02-24 19:33 Chuck Lever
  2026-02-24 19:39 ` Jeff Layton
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2026-02-24 19:33 UTC (permalink / raw)
  To: Jeff Layton
  Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
to prevent a heap overflow, but set rp_buflen to zero when the
encoded response exceeded the inline buffer. A retransmitted LOCK
reaching the replay path then produced only a status code with no
operation body, resulting in a malformed XDR response.

When the encoded response exceeds the 112-byte inline rp_ibuf, a
buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
remains zero, preserving the behavior from the capped-copy fix.
The buffer is freed when the stateowner is released or when a
subsequent operation's response fits in the inline buffer.

Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4state.c | 16 ++++++++++++++++
 fs/nfsd/nfs4xdr.c   | 23 ++++++++++++++++-------
 fs/nfsd/state.h     | 12 +++++++-----
 3 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ba49f49bb93b..b4d0e82b2690 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp)
 	}
 }
 
+/**
+ * nfs4_replay_free_cache - release dynamically allocated replay buffer
+ * @rp: replay cache to reset
+ *
+ * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset
+ * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen.
+ */
+void nfs4_replay_free_cache(struct nfs4_replay *rp)
+{
+	if (rp->rp_buf != rp->rp_ibuf)
+		kfree(rp->rp_buf);
+	rp->rp_buf = rp->rp_ibuf;
+	rp->rp_buflen = 0;
+}
+
 static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop)
 {
+	nfs4_replay_free_cache(&sop->so_replay);
 	kfree(sop->so_owner.data);
 	sop->so_ops->so_free(sop);
 }
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 690f7a3122ec..2a0946c630e1 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
 		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
 
 		so->so_replay.rp_status = op->status;
-		if (len <= NFSD4_REPLAY_ISIZE) {
-			so->so_replay.rp_buflen = len;
-			read_bytes_from_xdr_buf(xdr->buf,
-						op_status_offset + XDR_UNIT,
-						so->so_replay.rp_buf, len);
-		} else {
-			so->so_replay.rp_buflen = 0;
+		if (len > NFSD4_REPLAY_ISIZE) {
+			char *buf = kmalloc(len, GFP_KERNEL);
+
+			nfs4_replay_free_cache(&so->so_replay);
+			if (buf) {
+				so->so_replay.rp_buf = buf;
+			} else {
+				/* rp_buflen already zeroed; skip caching */
+				goto status;
+			}
+		} else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) {
+			nfs4_replay_free_cache(&so->so_replay);
 		}
+		so->so_replay.rp_buflen = len;
+		read_bytes_from_xdr_buf(xdr->buf,
+					op_status_offset + XDR_UNIT,
+					so->so_replay.rp_buf, len);
 	}
 status:
 	op->status = nfsd4_map_status(op->status,
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 3159c7b67f50..9b05462da4cc 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -554,10 +554,10 @@ struct nfs4_client_reclaim {
  *   ~32(deleg. ace) = 112 bytes
  *
  * Some responses can exceed this. A LOCK denial includes the conflicting
- * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses
- * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is
- * saved. Enlarging this constant increases the size of every
- * nfs4_stateowner.
+ * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a
+ * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If
+ * that allocation fails, only rp_status is saved. Enlarging this constant
+ * increases the size of every nfs4_stateowner.
  */
 
 #define NFSD4_REPLAY_ISIZE       112 
@@ -569,12 +569,14 @@ struct nfs4_client_reclaim {
 struct nfs4_replay {
 	__be32			rp_status;
 	unsigned int		rp_buflen;
-	char			*rp_buf;
+	char			*rp_buf; /* rp_ibuf or kmalloc'd */
 	struct knfsd_fh		rp_openfh;
 	int			rp_locked;
 	char			rp_ibuf[NFSD4_REPLAY_ISIZE];
 };
 
+extern void nfs4_replay_free_cache(struct nfs4_replay *rp);
+
 struct nfs4_stateowner;
 
 struct nfs4_stateowner_operations {
-- 
2.53.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
  2026-02-24 19:33 [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache Chuck Lever
@ 2026-02-24 19:39 ` Jeff Layton
  2026-02-24 19:42   ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2026-02-24 19:39 UTC (permalink / raw)
  To: Chuck Lever
  Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
> replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
> to prevent a heap overflow, but set rp_buflen to zero when the
> encoded response exceeded the inline buffer. A retransmitted LOCK
> reaching the replay path then produced only a status code with no
> operation body, resulting in a malformed XDR response.
> 
> When the encoded response exceeds the 112-byte inline rp_ibuf, a
> buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
> remains zero, preserving the behavior from the capped-copy fix.
> The buffer is freed when the stateowner is released or when a
> subsequent operation's response fits in the inline buffer.
> 
> Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4state.c | 16 ++++++++++++++++
>  fs/nfsd/nfs4xdr.c   | 23 ++++++++++++++++-------
>  fs/nfsd/state.h     | 12 +++++++-----
>  3 files changed, 39 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index ba49f49bb93b..b4d0e82b2690 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp)
>  	}
>  }
>  
> +/**
> + * nfs4_replay_free_cache - release dynamically allocated replay buffer
> + * @rp: replay cache to reset
> + *
> + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset
> + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen.
> + */
> +void nfs4_replay_free_cache(struct nfs4_replay *rp)
> +{
> +	if (rp->rp_buf != rp->rp_ibuf)
> +		kfree(rp->rp_buf);
> +	rp->rp_buf = rp->rp_ibuf;
> +	rp->rp_buflen = 0;
> +}
> +
>  static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop)
>  {
> +	nfs4_replay_free_cache(&sop->so_replay);
>  	kfree(sop->so_owner.data);
>  	sop->so_ops->so_free(sop);
>  }
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 690f7a3122ec..2a0946c630e1 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>  		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
>  
>  		so->so_replay.rp_status = op->status;
> -		if (len <= NFSD4_REPLAY_ISIZE) {
> -			so->so_replay.rp_buflen = len;
> -			read_bytes_from_xdr_buf(xdr->buf,
> -						op_status_offset + XDR_UNIT,
> -						so->so_replay.rp_buf, len);
> -		} else {
> -			so->so_replay.rp_buflen = 0;
> +		if (len > NFSD4_REPLAY_ISIZE) {
> +			char *buf = kmalloc(len, GFP_KERNEL);
> +
> +			nfs4_replay_free_cache(&so->so_replay);
> +			if (buf) {
> +				so->so_replay.rp_buf = buf;
> +			} else {
> +				/* rp_buflen already zeroed; skip caching */
> +				goto status;
> +			}
> +		} else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) {
> +			nfs4_replay_free_cache(&so->so_replay);
>  		}
> +		so->so_replay.rp_buflen = len;
> +		read_bytes_from_xdr_buf(xdr->buf,
> +					op_status_offset + XDR_UNIT,
> +					so->so_replay.rp_buf, len);
>  	}
>  status:
>  	op->status = nfsd4_map_status(op->status,
> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> index 3159c7b67f50..9b05462da4cc 100644
> --- a/fs/nfsd/state.h
> +++ b/fs/nfsd/state.h
> @@ -554,10 +554,10 @@ struct nfs4_client_reclaim {
>   *   ~32(deleg. ace) = 112 bytes
>   *
>   * Some responses can exceed this. A LOCK denial includes the conflicting
> - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses
> - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is
> - * saved. Enlarging this constant increases the size of every
> - * nfs4_stateowner.
> + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a
> + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If
> + * that allocation fails, only rp_status is saved. Enlarging this constant
> + * increases the size of every nfs4_stateowner.
>   */
>  
>  #define NFSD4_REPLAY_ISIZE       112 
> @@ -569,12 +569,14 @@ struct nfs4_client_reclaim {
>  struct nfs4_replay {
>  	__be32			rp_status;
>  	unsigned int		rp_buflen;
> -	char			*rp_buf;
> +	char			*rp_buf; /* rp_ibuf or kmalloc'd */
>  	struct knfsd_fh		rp_openfh;
>  	int			rp_locked;
>  	char			rp_ibuf[NFSD4_REPLAY_ISIZE];
>  };
>  
> +extern void nfs4_replay_free_cache(struct nfs4_replay *rp);
> +
>  struct nfs4_stateowner;
>  
>  struct nfs4_stateowner_operations {


Certainly a reasonable approach if we care about full correctness when
dealing with a large lockowner on NFSv4.0. Do we?
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
  2026-02-24 19:39 ` Jeff Layton
@ 2026-02-24 19:42   ` Chuck Lever
  2026-02-24 19:51     ` Jeff Layton
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2026-02-24 19:42 UTC (permalink / raw)
  To: Jeff Layton
  Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

On 2/24/26 2:39 PM, Jeff Layton wrote:
> On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
>> replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
>> to prevent a heap overflow, but set rp_buflen to zero when the
>> encoded response exceeded the inline buffer. A retransmitted LOCK
>> reaching the replay path then produced only a status code with no
>> operation body, resulting in a malformed XDR response.
>>
>> When the encoded response exceeds the 112-byte inline rp_ibuf, a
>> buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
>> remains zero, preserving the behavior from the capped-copy fix.
>> The buffer is freed when the stateowner is released or when a
>> subsequent operation's response fits in the inline buffer.
>>
>> Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  fs/nfsd/nfs4state.c | 16 ++++++++++++++++
>>  fs/nfsd/nfs4xdr.c   | 23 ++++++++++++++++-------
>>  fs/nfsd/state.h     | 12 +++++++-----
>>  3 files changed, 39 insertions(+), 12 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index ba49f49bb93b..b4d0e82b2690 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp)
>>  	}
>>  }
>>  
>> +/**
>> + * nfs4_replay_free_cache - release dynamically allocated replay buffer
>> + * @rp: replay cache to reset
>> + *
>> + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset
>> + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen.
>> + */
>> +void nfs4_replay_free_cache(struct nfs4_replay *rp)
>> +{
>> +	if (rp->rp_buf != rp->rp_ibuf)
>> +		kfree(rp->rp_buf);
>> +	rp->rp_buf = rp->rp_ibuf;
>> +	rp->rp_buflen = 0;
>> +}
>> +
>>  static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop)
>>  {
>> +	nfs4_replay_free_cache(&sop->so_replay);
>>  	kfree(sop->so_owner.data);
>>  	sop->so_ops->so_free(sop);
>>  }
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 690f7a3122ec..2a0946c630e1 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>>  		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
>>  
>>  		so->so_replay.rp_status = op->status;
>> -		if (len <= NFSD4_REPLAY_ISIZE) {
>> -			so->so_replay.rp_buflen = len;
>> -			read_bytes_from_xdr_buf(xdr->buf,
>> -						op_status_offset + XDR_UNIT,
>> -						so->so_replay.rp_buf, len);
>> -		} else {
>> -			so->so_replay.rp_buflen = 0;
>> +		if (len > NFSD4_REPLAY_ISIZE) {
>> +			char *buf = kmalloc(len, GFP_KERNEL);
>> +
>> +			nfs4_replay_free_cache(&so->so_replay);
>> +			if (buf) {
>> +				so->so_replay.rp_buf = buf;
>> +			} else {
>> +				/* rp_buflen already zeroed; skip caching */
>> +				goto status;
>> +			}
>> +		} else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) {
>> +			nfs4_replay_free_cache(&so->so_replay);
>>  		}
>> +		so->so_replay.rp_buflen = len;
>> +		read_bytes_from_xdr_buf(xdr->buf,
>> +					op_status_offset + XDR_UNIT,
>> +					so->so_replay.rp_buf, len);
>>  	}
>>  status:
>>  	op->status = nfsd4_map_status(op->status,
>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>> index 3159c7b67f50..9b05462da4cc 100644
>> --- a/fs/nfsd/state.h
>> +++ b/fs/nfsd/state.h
>> @@ -554,10 +554,10 @@ struct nfs4_client_reclaim {
>>   *   ~32(deleg. ace) = 112 bytes
>>   *
>>   * Some responses can exceed this. A LOCK denial includes the conflicting
>> - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses
>> - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is
>> - * saved. Enlarging this constant increases the size of every
>> - * nfs4_stateowner.
>> + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a
>> + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If
>> + * that allocation fails, only rp_status is saved. Enlarging this constant
>> + * increases the size of every nfs4_stateowner.
>>   */
>>  
>>  #define NFSD4_REPLAY_ISIZE       112 
>> @@ -569,12 +569,14 @@ struct nfs4_client_reclaim {
>>  struct nfs4_replay {
>>  	__be32			rp_status;
>>  	unsigned int		rp_buflen;
>> -	char			*rp_buf;
>> +	char			*rp_buf; /* rp_ibuf or kmalloc'd */
>>  	struct knfsd_fh		rp_openfh;
>>  	int			rp_locked;
>>  	char			rp_ibuf[NFSD4_REPLAY_ISIZE];
>>  };
>>  
>> +extern void nfs4_replay_free_cache(struct nfs4_replay *rp);
>> +
>>  struct nfs4_stateowner;
>>  
>>  struct nfs4_stateowner_operations {
> 
> 
> Certainly a reasonable approach if we care about full correctness when
> dealing with a large lockowner on NFSv4.0. Do we?

The idea would be to either:

o Backport your fix and not this update, or
o Squash these two together, and backport both

Admittedly this is a narrow corner case for a minor version that is
destined for the scrap heap.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
  2026-02-24 19:42   ` Chuck Lever
@ 2026-02-24 19:51     ` Jeff Layton
  2026-02-24 19:53       ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2026-02-24 19:51 UTC (permalink / raw)
  To: Chuck Lever
  Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

On Tue, 2026-02-24 at 14:42 -0500, Chuck Lever wrote:
> On 2/24/26 2:39 PM, Jeff Layton wrote:
> > On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote:
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > > 
> > > Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
> > > replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
> > > to prevent a heap overflow, but set rp_buflen to zero when the
> > > encoded response exceeded the inline buffer. A retransmitted LOCK
> > > reaching the replay path then produced only a status code with no
> > > operation body, resulting in a malformed XDR response.
> > > 
> > > When the encoded response exceeds the 112-byte inline rp_ibuf, a
> > > buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
> > > remains zero, preserving the behavior from the capped-copy fix.
> > > The buffer is freed when the stateowner is released or when a
> > > subsequent operation's response fits in the inline buffer.
> > > 
> > > Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
> > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > ---
> > >  fs/nfsd/nfs4state.c | 16 ++++++++++++++++
> > >  fs/nfsd/nfs4xdr.c   | 23 ++++++++++++++++-------
> > >  fs/nfsd/state.h     | 12 +++++++-----
> > >  3 files changed, 39 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index ba49f49bb93b..b4d0e82b2690 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp)
> > >  	}
> > >  }
> > >  
> > > +/**
> > > + * nfs4_replay_free_cache - release dynamically allocated replay buffer
> > > + * @rp: replay cache to reset
> > > + *
> > > + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset
> > > + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen.
> > > + */
> > > +void nfs4_replay_free_cache(struct nfs4_replay *rp)
> > > +{
> > > +	if (rp->rp_buf != rp->rp_ibuf)
> > > +		kfree(rp->rp_buf);
> > > +	rp->rp_buf = rp->rp_ibuf;
> > > +	rp->rp_buflen = 0;
> > > +}
> > > +
> > >  static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop)
> > >  {
> > > +	nfs4_replay_free_cache(&sop->so_replay);
> > >  	kfree(sop->so_owner.data);
> > >  	sop->so_ops->so_free(sop);
> > >  }
> > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > > index 690f7a3122ec..2a0946c630e1 100644
> > > --- a/fs/nfsd/nfs4xdr.c
> > > +++ b/fs/nfsd/nfs4xdr.c
> > > @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> > >  		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
> > >  
> > >  		so->so_replay.rp_status = op->status;
> > > -		if (len <= NFSD4_REPLAY_ISIZE) {
> > > -			so->so_replay.rp_buflen = len;
> > > -			read_bytes_from_xdr_buf(xdr->buf,
> > > -						op_status_offset + XDR_UNIT,
> > > -						so->so_replay.rp_buf, len);
> > > -		} else {
> > > -			so->so_replay.rp_buflen = 0;
> > > +		if (len > NFSD4_REPLAY_ISIZE) {
> > > +			char *buf = kmalloc(len, GFP_KERNEL);
> > > +
> > > +			nfs4_replay_free_cache(&so->so_replay);
> > > +			if (buf) {
> > > +				so->so_replay.rp_buf = buf;
> > > +			} else {
> > > +				/* rp_buflen already zeroed; skip caching */
> > > +				goto status;
> > > +			}
> > > +		} else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) {
> > > +			nfs4_replay_free_cache(&so->so_replay);
> > >  		}
> > > +		so->so_replay.rp_buflen = len;
> > > +		read_bytes_from_xdr_buf(xdr->buf,
> > > +					op_status_offset + XDR_UNIT,
> > > +					so->so_replay.rp_buf, len);
> > >  	}
> > >  status:
> > >  	op->status = nfsd4_map_status(op->status,
> > > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > > index 3159c7b67f50..9b05462da4cc 100644
> > > --- a/fs/nfsd/state.h
> > > +++ b/fs/nfsd/state.h
> > > @@ -554,10 +554,10 @@ struct nfs4_client_reclaim {
> > >   *   ~32(deleg. ace) = 112 bytes
> > >   *
> > >   * Some responses can exceed this. A LOCK denial includes the conflicting
> > > - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses
> > > - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is
> > > - * saved. Enlarging this constant increases the size of every
> > > - * nfs4_stateowner.
> > > + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a
> > > + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If
> > > + * that allocation fails, only rp_status is saved. Enlarging this constant
> > > + * increases the size of every nfs4_stateowner.
> > >   */
> > >  
> > >  #define NFSD4_REPLAY_ISIZE       112 
> > > @@ -569,12 +569,14 @@ struct nfs4_client_reclaim {
> > >  struct nfs4_replay {
> > >  	__be32			rp_status;
> > >  	unsigned int		rp_buflen;
> > > -	char			*rp_buf;
> > > +	char			*rp_buf; /* rp_ibuf or kmalloc'd */
> > >  	struct knfsd_fh		rp_openfh;
> > >  	int			rp_locked;
> > >  	char			rp_ibuf[NFSD4_REPLAY_ISIZE];
> > >  };
> > >  
> > > +extern void nfs4_replay_free_cache(struct nfs4_replay *rp);
> > > +
> > >  struct nfs4_stateowner;
> > >  
> > >  struct nfs4_stateowner_operations {
> > 
> > 
> > Certainly a reasonable approach if we care about full correctness when
> > dealing with a large lockowner on NFSv4.0. Do we?
> 
> The idea would be to either:
> 
> o Backport your fix and not this update, or
> o Squash these two together, and backport both
> 
> Admittedly this is a narrow corner case for a minor version that is
> destined for the scrap heap.
> 

Right. I ask because I looked at this approach when I was fixing this,
and decided it wasn't worthwhile. I certainly won't stand in your way
if you decide you want to handle long lockowner blobs, but I doubt any
legitimate user will ever care.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
  2026-02-24 19:51     ` Jeff Layton
@ 2026-02-24 19:53       ` Chuck Lever
  2026-02-24 19:59         ` Jeff Layton
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2026-02-24 19:53 UTC (permalink / raw)
  To: Jeff Layton
  Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

On 2/24/26 2:51 PM, Jeff Layton wrote:
> On Tue, 2026-02-24 at 14:42 -0500, Chuck Lever wrote:
>> On 2/24/26 2:39 PM, Jeff Layton wrote:
>>> On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
>>>> replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
>>>> to prevent a heap overflow, but set rp_buflen to zero when the
>>>> encoded response exceeded the inline buffer. A retransmitted LOCK
>>>> reaching the replay path then produced only a status code with no
>>>> operation body, resulting in a malformed XDR response.
>>>>
>>>> When the encoded response exceeds the 112-byte inline rp_ibuf, a
>>>> buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
>>>> remains zero, preserving the behavior from the capped-copy fix.
>>>> The buffer is freed when the stateowner is released or when a
>>>> subsequent operation's response fits in the inline buffer.
>>>>
>>>> Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>  fs/nfsd/nfs4state.c | 16 ++++++++++++++++
>>>>  fs/nfsd/nfs4xdr.c   | 23 ++++++++++++++++-------
>>>>  fs/nfsd/state.h     | 12 +++++++-----
>>>>  3 files changed, 39 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index ba49f49bb93b..b4d0e82b2690 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp)
>>>>  	}
>>>>  }
>>>>  
>>>> +/**
>>>> + * nfs4_replay_free_cache - release dynamically allocated replay buffer
>>>> + * @rp: replay cache to reset
>>>> + *
>>>> + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset
>>>> + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen.
>>>> + */
>>>> +void nfs4_replay_free_cache(struct nfs4_replay *rp)
>>>> +{
>>>> +	if (rp->rp_buf != rp->rp_ibuf)
>>>> +		kfree(rp->rp_buf);
>>>> +	rp->rp_buf = rp->rp_ibuf;
>>>> +	rp->rp_buflen = 0;
>>>> +}
>>>> +
>>>>  static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop)
>>>>  {
>>>> +	nfs4_replay_free_cache(&sop->so_replay);
>>>>  	kfree(sop->so_owner.data);
>>>>  	sop->so_ops->so_free(sop);
>>>>  }
>>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>>>> index 690f7a3122ec..2a0946c630e1 100644
>>>> --- a/fs/nfsd/nfs4xdr.c
>>>> +++ b/fs/nfsd/nfs4xdr.c
>>>> @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>>>>  		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
>>>>  
>>>>  		so->so_replay.rp_status = op->status;
>>>> -		if (len <= NFSD4_REPLAY_ISIZE) {
>>>> -			so->so_replay.rp_buflen = len;
>>>> -			read_bytes_from_xdr_buf(xdr->buf,
>>>> -						op_status_offset + XDR_UNIT,
>>>> -						so->so_replay.rp_buf, len);
>>>> -		} else {
>>>> -			so->so_replay.rp_buflen = 0;
>>>> +		if (len > NFSD4_REPLAY_ISIZE) {
>>>> +			char *buf = kmalloc(len, GFP_KERNEL);
>>>> +
>>>> +			nfs4_replay_free_cache(&so->so_replay);
>>>> +			if (buf) {
>>>> +				so->so_replay.rp_buf = buf;
>>>> +			} else {
>>>> +				/* rp_buflen already zeroed; skip caching */
>>>> +				goto status;
>>>> +			}
>>>> +		} else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) {
>>>> +			nfs4_replay_free_cache(&so->so_replay);
>>>>  		}
>>>> +		so->so_replay.rp_buflen = len;
>>>> +		read_bytes_from_xdr_buf(xdr->buf,
>>>> +					op_status_offset + XDR_UNIT,
>>>> +					so->so_replay.rp_buf, len);
>>>>  	}
>>>>  status:
>>>>  	op->status = nfsd4_map_status(op->status,
>>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>>>> index 3159c7b67f50..9b05462da4cc 100644
>>>> --- a/fs/nfsd/state.h
>>>> +++ b/fs/nfsd/state.h
>>>> @@ -554,10 +554,10 @@ struct nfs4_client_reclaim {
>>>>   *   ~32(deleg. ace) = 112 bytes
>>>>   *
>>>>   * Some responses can exceed this. A LOCK denial includes the conflicting
>>>> - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses
>>>> - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is
>>>> - * saved. Enlarging this constant increases the size of every
>>>> - * nfs4_stateowner.
>>>> + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a
>>>> + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If
>>>> + * that allocation fails, only rp_status is saved. Enlarging this constant
>>>> + * increases the size of every nfs4_stateowner.
>>>>   */
>>>>  
>>>>  #define NFSD4_REPLAY_ISIZE       112 
>>>> @@ -569,12 +569,14 @@ struct nfs4_client_reclaim {
>>>>  struct nfs4_replay {
>>>>  	__be32			rp_status;
>>>>  	unsigned int		rp_buflen;
>>>> -	char			*rp_buf;
>>>> +	char			*rp_buf; /* rp_ibuf or kmalloc'd */
>>>>  	struct knfsd_fh		rp_openfh;
>>>>  	int			rp_locked;
>>>>  	char			rp_ibuf[NFSD4_REPLAY_ISIZE];
>>>>  };
>>>>  
>>>> +extern void nfs4_replay_free_cache(struct nfs4_replay *rp);
>>>> +
>>>>  struct nfs4_stateowner;
>>>>  
>>>>  struct nfs4_stateowner_operations {
>>>
>>>
>>> Certainly a reasonable approach if we care about full correctness when
>>> dealing with a large lockowner on NFSv4.0. Do we?
>>
>> The idea would be to either:
>>
>> o Backport your fix and not this update, or
>> o Squash these two together, and backport both
>>
>> Admittedly this is a narrow corner case for a minor version that is
>> destined for the scrap heap.
>>
> 
> Right. I ask because I looked at this approach when I was fixing this,
> and decided it wasn't worthwhile. I certainly won't stand in your way
> if you decide you want to handle long lockowner blobs, but I doubt any
> legitimate user will ever care.

I don't disagree at all. My concern is handling replay compliantly.
Maybe there's another approach.


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
  2026-02-24 19:53       ` Chuck Lever
@ 2026-02-24 19:59         ` Jeff Layton
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2026-02-24 19:59 UTC (permalink / raw)
  To: Chuck Lever
  Cc: NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey, linux-nfs,
	Chuck Lever

On Tue, 2026-02-24 at 14:53 -0500, Chuck Lever wrote:
> On 2/24/26 2:51 PM, Jeff Layton wrote:
> > On Tue, 2026-02-24 at 14:42 -0500, Chuck Lever wrote:
> > > On 2/24/26 2:39 PM, Jeff Layton wrote:
> > > > On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote:
> > > > > From: Chuck Lever <chuck.lever@oracle.com>
> > > > > 
> > > > > Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
> > > > > replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
> > > > > to prevent a heap overflow, but set rp_buflen to zero when the
> > > > > encoded response exceeded the inline buffer. A retransmitted LOCK
> > > > > reaching the replay path then produced only a status code with no
> > > > > operation body, resulting in a malformed XDR response.
> > > > > 
> > > > > When the encoded response exceeds the 112-byte inline rp_ibuf, a
> > > > > buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
> > > > > remains zero, preserving the behavior from the capped-copy fix.
> > > > > The buffer is freed when the stateowner is released or when a
> > > > > subsequent operation's response fits in the inline buffer.
> > > > > 
> > > > > Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
> > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > > > ---
> > > > >  fs/nfsd/nfs4state.c | 16 ++++++++++++++++
> > > > >  fs/nfsd/nfs4xdr.c   | 23 ++++++++++++++++-------
> > > > >  fs/nfsd/state.h     | 12 +++++++-----
> > > > >  3 files changed, 39 insertions(+), 12 deletions(-)
> > > > > 
> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > index ba49f49bb93b..b4d0e82b2690 100644
> > > > > --- a/fs/nfsd/nfs4state.c
> > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp)
> > > > >  	}
> > > > >  }
> > > > >  
> > > > > +/**
> > > > > + * nfs4_replay_free_cache - release dynamically allocated replay buffer
> > > > > + * @rp: replay cache to reset
> > > > > + *
> > > > > + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset
> > > > > + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen.
> > > > > + */
> > > > > +void nfs4_replay_free_cache(struct nfs4_replay *rp)
> > > > > +{
> > > > > +	if (rp->rp_buf != rp->rp_ibuf)
> > > > > +		kfree(rp->rp_buf);
> > > > > +	rp->rp_buf = rp->rp_ibuf;
> > > > > +	rp->rp_buflen = 0;
> > > > > +}
> > > > > +
> > > > >  static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop)
> > > > >  {
> > > > > +	nfs4_replay_free_cache(&sop->so_replay);
> > > > >  	kfree(sop->so_owner.data);
> > > > >  	sop->so_ops->so_free(sop);
> > > > >  }
> > > > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > > > > index 690f7a3122ec..2a0946c630e1 100644
> > > > > --- a/fs/nfsd/nfs4xdr.c
> > > > > +++ b/fs/nfsd/nfs4xdr.c
> > > > > @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> > > > >  		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
> > > > >  
> > > > >  		so->so_replay.rp_status = op->status;
> > > > > -		if (len <= NFSD4_REPLAY_ISIZE) {
> > > > > -			so->so_replay.rp_buflen = len;
> > > > > -			read_bytes_from_xdr_buf(xdr->buf,
> > > > > -						op_status_offset + XDR_UNIT,
> > > > > -						so->so_replay.rp_buf, len);
> > > > > -		} else {
> > > > > -			so->so_replay.rp_buflen = 0;
> > > > > +		if (len > NFSD4_REPLAY_ISIZE) {
> > > > > +			char *buf = kmalloc(len, GFP_KERNEL);
> > > > > +
> > > > > +			nfs4_replay_free_cache(&so->so_replay);
> > > > > +			if (buf) {
> > > > > +				so->so_replay.rp_buf = buf;
> > > > > +			} else {
> > > > > +				/* rp_buflen already zeroed; skip caching */
> > > > > +				goto status;
> > > > > +			}
> > > > > +		} else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) {
> > > > > +			nfs4_replay_free_cache(&so->so_replay);
> > > > >  		}
> > > > > +		so->so_replay.rp_buflen = len;
> > > > > +		read_bytes_from_xdr_buf(xdr->buf,
> > > > > +					op_status_offset + XDR_UNIT,
> > > > > +					so->so_replay.rp_buf, len);
> > > > >  	}
> > > > >  status:
> > > > >  	op->status = nfsd4_map_status(op->status,
> > > > > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > > > > index 3159c7b67f50..9b05462da4cc 100644
> > > > > --- a/fs/nfsd/state.h
> > > > > +++ b/fs/nfsd/state.h
> > > > > @@ -554,10 +554,10 @@ struct nfs4_client_reclaim {
> > > > >   *   ~32(deleg. ace) = 112 bytes
> > > > >   *
> > > > >   * Some responses can exceed this. A LOCK denial includes the conflicting
> > > > > - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses
> > > > > - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is
> > > > > - * saved. Enlarging this constant increases the size of every
> > > > > - * nfs4_stateowner.
> > > > > + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a
> > > > > + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If
> > > > > + * that allocation fails, only rp_status is saved. Enlarging this constant
> > > > > + * increases the size of every nfs4_stateowner.
> > > > >   */
> > > > >  
> > > > >  #define NFSD4_REPLAY_ISIZE       112 
> > > > > @@ -569,12 +569,14 @@ struct nfs4_client_reclaim {
> > > > >  struct nfs4_replay {
> > > > >  	__be32			rp_status;
> > > > >  	unsigned int		rp_buflen;
> > > > > -	char			*rp_buf;
> > > > > +	char			*rp_buf; /* rp_ibuf or kmalloc'd */
> > > > >  	struct knfsd_fh		rp_openfh;
> > > > >  	int			rp_locked;
> > > > >  	char			rp_ibuf[NFSD4_REPLAY_ISIZE];
> > > > >  };
> > > > >  
> > > > > +extern void nfs4_replay_free_cache(struct nfs4_replay *rp);
> > > > > +
> > > > >  struct nfs4_stateowner;
> > > > >  
> > > > >  struct nfs4_stateowner_operations {
> > > > 
> > > > 
> > > > Certainly a reasonable approach if we care about full correctness when
> > > > dealing with a large lockowner on NFSv4.0. Do we?
> > > 
> > > The idea would be to either:
> > > 
> > > o Backport your fix and not this update, or
> > > o Squash these two together, and backport both
> > > 
> > > Admittedly this is a narrow corner case for a minor version that is
> > > destined for the scrap heap.
> > > 
> > 
> > Right. I ask because I looked at this approach when I was fixing this,
> > and decided it wasn't worthwhile. I certainly won't stand in your way
> > if you decide you want to handle long lockowner blobs, but I doubt any
> > legitimate user will ever care.
> 
> I don't disagree at all. My concern is handling replay compliantly.
> Maybe there's another approach.
> 

I think that the only other way is to grow NFSD4_REPLAY_ISIZE, and
doing dynamic allocation is preferable to that, IMO.

To be clear: I don't have a problem with your patch. It just didn't
seem worthwhile to me. If you think it's worth fixing though, then go
for it.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-02-24 19:59 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-24 19:33 [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache Chuck Lever
2026-02-24 19:39 ` Jeff Layton
2026-02-24 19:42   ` Chuck Lever
2026-02-24 19:51     ` Jeff Layton
2026-02-24 19:53       ` Chuck Lever
2026-02-24 19:59         ` Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox