public inbox for linux-nfs@vger.kernel.org
From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <cel@kernel.org>
Cc: NeilBrown <neilb@ownmail.net>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
Date: Tue, 24 Feb 2026 14:59:17 -0500
Message-ID: <f16c1806a705e08252b1b39ea44b1de1e6be17d6.camel@kernel.org>
In-Reply-To: <3a700716-9b7d-4bc5-8d9b-24bcd585df26@kernel.org>

On Tue, 2026-02-24 at 14:53 -0500, Chuck Lever wrote:
> On 2/24/26 2:51 PM, Jeff Layton wrote:
> > On Tue, 2026-02-24 at 14:42 -0500, Chuck Lever wrote:
> > > On 2/24/26 2:39 PM, Jeff Layton wrote:
> > > > On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote:
> > > > > From: Chuck Lever <chuck.lever@oracle.com>
> > > > > 
> > > > > Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
> > > > > replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
> > > > > to prevent a heap overflow, but set rp_buflen to zero when the
> > > > > encoded response exceeded the inline buffer. A retransmitted LOCK
> > > > > reaching the replay path then produced only a status code with no
> > > > > operation body, resulting in a malformed XDR response.
> > > > > 
> > > > > When the encoded response exceeds the 112-byte inline rp_ibuf, a
> > > > > buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
> > > > > remains zero, preserving the behavior from the capped-copy fix.
> > > > > The buffer is freed when the stateowner is released or when a
> > > > > subsequent operation's response fits in the inline buffer.
> > > > > 
> > > > > Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
> > > > > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > > > > ---
> > > > >  fs/nfsd/nfs4state.c | 16 ++++++++++++++++
> > > > >  fs/nfsd/nfs4xdr.c   | 23 ++++++++++++++++-------
> > > > >  fs/nfsd/state.h     | 12 +++++++-----
> > > > >  3 files changed, 39 insertions(+), 12 deletions(-)
> > > > > 
> > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > index ba49f49bb93b..b4d0e82b2690 100644
> > > > > --- a/fs/nfsd/nfs4state.c
> > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp)
> > > > >  	}
> > > > >  }
> > > > >  
> > > > > +/**
> > > > > + * nfs4_replay_free_cache - release dynamically allocated replay buffer
> > > > > + * @rp: replay cache to reset
> > > > > + *
> > > > > + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset
> > > > > + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen.
> > > > > + */
> > > > > +void nfs4_replay_free_cache(struct nfs4_replay *rp)
> > > > > +{
> > > > > +	if (rp->rp_buf != rp->rp_ibuf)
> > > > > +		kfree(rp->rp_buf);
> > > > > +	rp->rp_buf = rp->rp_ibuf;
> > > > > +	rp->rp_buflen = 0;
> > > > > +}
> > > > > +
> > > > >  static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop)
> > > > >  {
> > > > > +	nfs4_replay_free_cache(&sop->so_replay);
> > > > >  	kfree(sop->so_owner.data);
> > > > >  	sop->so_ops->so_free(sop);
> > > > >  }
> > > > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > > > > index 690f7a3122ec..2a0946c630e1 100644
> > > > > --- a/fs/nfsd/nfs4xdr.c
> > > > > +++ b/fs/nfsd/nfs4xdr.c
> > > > > @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
> > > > >  		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
> > > > >  
> > > > >  		so->so_replay.rp_status = op->status;
> > > > > -		if (len <= NFSD4_REPLAY_ISIZE) {
> > > > > -			so->so_replay.rp_buflen = len;
> > > > > -			read_bytes_from_xdr_buf(xdr->buf,
> > > > > -						op_status_offset + XDR_UNIT,
> > > > > -						so->so_replay.rp_buf, len);
> > > > > -		} else {
> > > > > -			so->so_replay.rp_buflen = 0;
> > > > > +		if (len > NFSD4_REPLAY_ISIZE) {
> > > > > +			char *buf = kmalloc(len, GFP_KERNEL);
> > > > > +
> > > > > +			nfs4_replay_free_cache(&so->so_replay);
> > > > > +			if (buf) {
> > > > > +				so->so_replay.rp_buf = buf;
> > > > > +			} else {
> > > > > +				/* rp_buflen already zeroed; skip caching */
> > > > > +				goto status;
> > > > > +			}
> > > > > +		} else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) {
> > > > > +			nfs4_replay_free_cache(&so->so_replay);
> > > > >  		}
> > > > > +		so->so_replay.rp_buflen = len;
> > > > > +		read_bytes_from_xdr_buf(xdr->buf,
> > > > > +					op_status_offset + XDR_UNIT,
> > > > > +					so->so_replay.rp_buf, len);
> > > > >  	}
> > > > >  status:
> > > > >  	op->status = nfsd4_map_status(op->status,
> > > > > diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
> > > > > index 3159c7b67f50..9b05462da4cc 100644
> > > > > --- a/fs/nfsd/state.h
> > > > > +++ b/fs/nfsd/state.h
> > > > > @@ -554,10 +554,10 @@ struct nfs4_client_reclaim {
> > > > >   *   ~32(deleg. ace) = 112 bytes
> > > > >   *
> > > > >   * Some responses can exceed this. A LOCK denial includes the conflicting
> > > > > - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses
> > > > > - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is
> > > > > - * saved. Enlarging this constant increases the size of every
> > > > > - * nfs4_stateowner.
> > > > > + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a
> > > > > + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If
> > > > > + * that allocation fails, only rp_status is saved. Enlarging this constant
> > > > > + * increases the size of every nfs4_stateowner.
> > > > >   */
> > > > >  
> > > > >  #define NFSD4_REPLAY_ISIZE       112 
> > > > > @@ -569,12 +569,14 @@ struct nfs4_client_reclaim {
> > > > >  struct nfs4_replay {
> > > > >  	__be32			rp_status;
> > > > >  	unsigned int		rp_buflen;
> > > > > -	char			*rp_buf;
> > > > > +	char			*rp_buf; /* rp_ibuf or kmalloc'd */
> > > > >  	struct knfsd_fh		rp_openfh;
> > > > >  	int			rp_locked;
> > > > >  	char			rp_ibuf[NFSD4_REPLAY_ISIZE];
> > > > >  };
> > > > >  
> > > > > +extern void nfs4_replay_free_cache(struct nfs4_replay *rp);
> > > > > +
> > > > >  struct nfs4_stateowner;
> > > > >  
> > > > >  struct nfs4_stateowner_operations {
> > > > 
> > > > 
> > > > Certainly a reasonable approach if we care about full correctness when
> > > > dealing with a large lockowner on NFSv4.0. Do we?
> > > 
> > > The idea would be to either:
> > > 
> > > o Backport your fix and not this update, or
> > > o Squash these two together, and backport both
> > > 
> > > Admittedly this is a narrow corner case for a minor version that is
> > > destined for the scrap heap.
> > > 
> > 
> > Right. I ask because I looked at this approach when I was fixing this,
> > and decided it wasn't worthwhile. I certainly won't stand in your way
> > if you decide you want to handle long lockowner blobs, but I doubt any
> > legitimate user will ever care.
> 
> I don't disagree at all. My concern is handling replay compliantly.
> Maybe there's another approach.
> 

I think that the only other way is to grow NFSD4_REPLAY_ISIZE, and
doing dynamic allocation is preferable to that, IMO.

To be clear: I don't have a problem with your patch. It just didn't
seem worthwhile to me. If you think it's worth fixing though, then go
for it.
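
For readers following along, the inline-or-heap trade-off discussed above
can be sketched in userspace C. This is only an illustration, not the
kernel code: struct replay, replay_init(), replay_store() and
replay_free_cache() are hypothetical stand-ins for struct nfs4_replay and
the helpers in the patch, and malloc()/free() are assumed in place of
kmalloc()/kfree().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define REPLAY_ISIZE 112	/* mirrors NFSD4_REPLAY_ISIZE in the patch */

/* Hypothetical userspace stand-in for struct nfs4_replay. */
struct replay {
	unsigned int	buflen;
	char		*buf;	/* points at ibuf or a malloc'd buffer */
	char		ibuf[REPLAY_ISIZE];
};

/* Start with the inline buffer and nothing cached. */
static void replay_init(struct replay *rp)
{
	rp->buf = rp->ibuf;
	rp->buflen = 0;
}

/* Free a heap buffer if one is in use, then fall back to the inline one. */
static void replay_free_cache(struct replay *rp)
{
	if (rp->buf != rp->ibuf)
		free(rp->buf);
	rp->buf = rp->ibuf;
	rp->buflen = 0;
}

/*
 * Cache len bytes of response data. Oversized responses go to the heap;
 * small ones go back to the inline buffer. Returns 0 on success, or -1
 * if the allocation fails, in which case buflen stays zero.
 */
static int replay_store(struct replay *rp, const char *data, unsigned int len)
{
	char *heap = NULL;

	assert(data != NULL);

	if (len > REPLAY_ISIZE)
		heap = malloc(len);

	/* Drop whatever was cached before, heap or inline. */
	replay_free_cache(rp);

	if (len > REPLAY_ISIZE) {
		if (!heap)
			return -1;
		rp->buf = heap;
	}
	memcpy(rp->buf, data, len);
	rp->buflen = len;
	return 0;
}
```

The cost of the oversized case is paid only by the owner that hits it,
whereas growing REPLAY_ISIZE would enlarge every instance of the
structure.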
-- 
Jeff Layton <jlayton@kernel.org>
