From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id F300330F52B
	for ; Tue, 24 Feb 2026 19:53:21 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1771962802; cv=none;
	b=NR6hBBpUrUBPv1Jw7gysBwU3TlWN1+hIOe18bjtmEssxztDlwMxSKtzBk6FVqsUYOHaEMp+Y8cRKzZgA2vHx7zSeecu6p2qyu25QN4xoJGdiNkK9GHvcR5W9IgiBLSGINvJPEuD4ZhA4bot0CGZJz445EY8LwyYOmxsmHq2x9/I=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1771962802; c=relaxed/simple;
	bh=SpD/hzQjvMlJSfDGeKmu+6X7irJnJYPDaclSKcBu0rg=;
	h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From:In-Reply-To:Content-Type;
	b=rpLqmrfPpIC0+LIfNASCQKhE8ljZd22uByStm5ht9nAiadZkfLUEgl9ehNYvoiMt7O0J4AGBWm8rDP8bnxJVzupySzkQcj7RnXqn3EopdId37+uLiyVG27lPdBy+bHcYko7ZhJn0sUNXATtY6NllseZCDP4jxM3SFg6TtZRXVKc=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=W+8TYSXe; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="W+8TYSXe"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 06D13C116D0;
	Tue, 24 Feb 2026 19:53:20 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1771962801;
	bh=SpD/hzQjvMlJSfDGeKmu+6X7irJnJYPDaclSKcBu0rg=;
	h=Date:Subject:To:Cc:References:From:In-Reply-To:From;
	b=W+8TYSXemoNE9FB9SKmfda23Spw3kMRjkqrzsyW1D8ql2yXc3XPQeapJXsVsfevge
	 t+lA90JQkTu2Ua29kskXSAf7IHlMzk+6YMpc3yrUOLPvzfT3MIyQHHO7rwXOGz55rE
	 TpImKtOz6+SO3/OF5A/2rMeYn9fBOALYwvOeVzl4BU5d9JmVb+KmP0eVjA/G7m2qdi
	 +tC560/RCuMAdTQZ2q/tjdUeZ/93oUmzJt9miRxgn5ert9y5rpID2qBSudt8O73Xi7
	 OvWAb/IE9WGsV7SdDALmeGgHN7VI8lbJwYikn5H3KeI4sQczq7R7qtI/MCLX3IFmli
	 J9U/ng/zlgSGw==
Message-ID: <3a700716-9b7d-4bc5-8d9b-24bcd585df26@kernel.org>
Date: Tue, 24 Feb 2026 14:53:19 -0500
Precedence: bulk
X-Mailing-List: linux-nfs@vger.kernel.org
List-Id: 
List-Subscribe: 
List-Unsubscribe: 
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] nfsd: use dynamic allocation for oversized NFSv4.0 replay cache
To: Jeff Layton 
Cc: NeilBrown , Olga Kornievskaia , Dai Ngo , Tom Talpey ,
 linux-nfs@vger.kernel.org, Chuck Lever 
References: <20260224193354.57849-1-cel@kernel.org>
 <887c1ca78b34974160dee3ce7f25d6d077da93ab.camel@kernel.org>
 <20cd1871-581d-4fdd-b4fe-10684719c419@kernel.org>
 <5c6b6e52619caf720912639697af5b388a3ea79a.camel@kernel.org>
From: Chuck Lever 
Content-Language: en-US
Organization: kernel.org
In-Reply-To: <5c6b6e52619caf720912639697af5b388a3ea79a.camel@kernel.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 2/24/26 2:51 PM, Jeff Layton wrote:
> On Tue, 2026-02-24 at 14:42 -0500, Chuck Lever wrote:
>> On 2/24/26 2:39 PM, Jeff Layton wrote:
>>> On Tue, 2026-02-24 at 14:33 -0500, Chuck Lever wrote:
>>>> From: Chuck Lever 
>>>>
>>>> Commit 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK
>>>> replay cache") capped the replay cache copy at NFSD4_REPLAY_ISIZE
>>>> to prevent a heap overflow, but set rp_buflen to zero when the
>>>> encoded response exceeded the inline buffer. A retransmitted LOCK
>>>> reaching the replay path then produced only a status code with no
>>>> operation body, resulting in a malformed XDR response.
>>>>
>>>> When the encoded response exceeds the 112-byte inline rp_ibuf, a
>>>> buffer is kmalloc'd to hold it. If the allocation fails, rp_buflen
>>>> remains zero, preserving the behavior from the capped-copy fix.
>>>> The buffer is freed when the stateowner is released or when a
>>>> subsequent operation's response fits in the inline buffer.
>>>>
>>>> Fixes: 1e8e9913672a ("nfsd: fix heap overflow in NFSv4.0 LOCK replay cache")
>>>> Signed-off-by: Chuck Lever 
>>>> ---
>>>>  fs/nfsd/nfs4state.c | 16 ++++++++++++++++
>>>>  fs/nfsd/nfs4xdr.c   | 23 ++++++++++++++++-------
>>>>  fs/nfsd/state.h     | 12 +++++++-----
>>>>  3 files changed, 39 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index ba49f49bb93b..b4d0e82b2690 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -1496,8 +1496,24 @@ release_all_access(struct nfs4_ol_stateid *stp)
>>>>  	}
>>>>  }
>>>>  
>>>> +/**
>>>> + * nfs4_replay_free_cache - release dynamically allocated replay buffer
>>>> + * @rp: replay cache to reset
>>>> + *
>>>> + * If @rp->rp_buf points to a kmalloc'd buffer, free it and reset
>>>> + * rp_buf to the inline rp_ibuf. Always zeroes rp_buflen.
>>>> + */
>>>> +void nfs4_replay_free_cache(struct nfs4_replay *rp)
>>>> +{
>>>> +	if (rp->rp_buf != rp->rp_ibuf)
>>>> +		kfree(rp->rp_buf);
>>>> +	rp->rp_buf = rp->rp_ibuf;
>>>> +	rp->rp_buflen = 0;
>>>> +}
>>>> +
>>>>  static inline void nfs4_free_stateowner(struct nfs4_stateowner *sop)
>>>>  {
>>>> +	nfs4_replay_free_cache(&sop->so_replay);
>>>>  	kfree(sop->so_owner.data);
>>>>  	sop->so_ops->so_free(sop);
>>>>  }
>>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>>>> index 690f7a3122ec..2a0946c630e1 100644
>>>> --- a/fs/nfsd/nfs4xdr.c
>>>> +++ b/fs/nfsd/nfs4xdr.c
>>>> @@ -6282,14 +6282,23 @@ nfsd4_encode_operation(struct nfsd4_compoundres *resp, struct nfsd4_op *op)
>>>>  		int len = xdr->buf->len - (op_status_offset + XDR_UNIT);
>>>>  
>>>>  		so->so_replay.rp_status = op->status;
>>>> -		if (len <= NFSD4_REPLAY_ISIZE) {
>>>> -			so->so_replay.rp_buflen = len;
>>>> -			read_bytes_from_xdr_buf(xdr->buf,
>>>> -						op_status_offset + XDR_UNIT,
>>>> -						so->so_replay.rp_buf, len);
>>>> -		} else {
>>>> -			so->so_replay.rp_buflen = 0;
>>>> +		if (len > NFSD4_REPLAY_ISIZE) {
>>>> +			char *buf = kmalloc(len, GFP_KERNEL);
>>>> +
>>>> +			nfs4_replay_free_cache(&so->so_replay);
>>>> +			if (buf) {
>>>> +				so->so_replay.rp_buf = buf;
>>>> +			} else {
>>>> +				/* rp_buflen already zeroed; skip caching */
>>>> +				goto status;
>>>> +			}
>>>> +		} else if (so->so_replay.rp_buf != so->so_replay.rp_ibuf) {
>>>> +			nfs4_replay_free_cache(&so->so_replay);
>>>>  		}
>>>> +		so->so_replay.rp_buflen = len;
>>>> +		read_bytes_from_xdr_buf(xdr->buf,
>>>> +					op_status_offset + XDR_UNIT,
>>>> +					so->so_replay.rp_buf, len);
>>>>  	}
>>>>  status:
>>>>  	op->status = nfsd4_map_status(op->status,
>>>> diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
>>>> index 3159c7b67f50..9b05462da4cc 100644
>>>> --- a/fs/nfsd/state.h
>>>> +++ b/fs/nfsd/state.h
>>>> @@ -554,10 +554,10 @@ struct nfs4_client_reclaim {
>>>>   * ~32(deleg. ace) = 112 bytes
>>>>   *
>>>>   * Some responses can exceed this. A LOCK denial includes the conflicting
>>>> - * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). Responses
>>>> - * larger than REPLAY_ISIZE are not cached in rp_ibuf; only rp_status is
>>>> - * saved. Enlarging this constant increases the size of every
>>>> - * nfs4_stateowner.
>>>> + * lock owner, which can be up to 1024 bytes (NFS4_OPAQUE_LIMIT). When a
>>>> + * response exceeds REPLAY_ISIZE, a buffer is dynamically allocated. If
>>>> + * that allocation fails, only rp_status is saved. Enlarging this constant
>>>> + * increases the size of every nfs4_stateowner.
>>>>   */
>>>>  
>>>>  #define NFSD4_REPLAY_ISIZE       112
>>>> @@ -569,12 +569,14 @@ struct nfs4_client_reclaim {
>>>>  struct nfs4_replay {
>>>>  	__be32	rp_status;
>>>>  	unsigned int	rp_buflen;
>>>> -	char	*rp_buf;
>>>> +	char	*rp_buf;	/* rp_ibuf or kmalloc'd */
>>>>  	struct knfsd_fh	rp_openfh;
>>>>  	int	rp_locked;
>>>>  	char	rp_ibuf[NFSD4_REPLAY_ISIZE];
>>>>  };
>>>>  
>>>> +extern void nfs4_replay_free_cache(struct nfs4_replay *rp);
>>>> +
>>>>  struct nfs4_stateowner;
>>>>  
>>>>  struct nfs4_stateowner_operations {
>>>
>>>
>>> Certainly a reasonable approach if we care about full correctness when
>>> dealing with a large lockowner on NFSv4.0. Do we?
>>
>> The idea would be to either:
>>
>> o Backport your fix and not this update, or
>> o Squash these two together, and backport both
>>
>> Admittedly this is a narrow corner case for a minor version that is
>> destined for the scrap heap.
>>
>
> Right. I ask because I looked at this approach when I was fixing this,
> and decided it wasn't worthwhile. I certainly won't stand in your way
> if you decide you want to handle long lockowner blobs, but I doubt any
> legitimate user will ever care.

I don't disagree at all. My concern is handling replay compliantly.
Maybe there's another approach.

-- 
Chuck Lever
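[Archive note: the allocation policy in the patch quoted above can be modeled outside the kernel. The following user-space C sketch uses hypothetical names (`struct replay`, `replay_cache`, `replay_free_cache`), with `malloc`/`free` standing in for `kmalloc`/`kfree` and no locking; it mirrors the free-then-reallocate flow of `nfs4_replay_free_cache()` and the `nfsd4_encode_operation()` hunk, not the kernel code itself.]

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/* User-space model of the NFSv4.0 replay cache policy in the patch. */
#define REPLAY_ISIZE 112	/* models NFSD4_REPLAY_ISIZE */

struct replay {
	size_t buflen;
	char *buf;		/* points at ibuf or a malloc'd buffer */
	char ibuf[REPLAY_ISIZE];
};

static void replay_init(struct replay *rp)
{
	rp->buf = rp->ibuf;
	rp->buflen = 0;
}

/*
 * Mirrors nfs4_replay_free_cache(): free any heap buffer, fall back
 * to the inline buffer, and zero the cached length.
 */
static void replay_free_cache(struct replay *rp)
{
	if (rp->buf != rp->ibuf)
		free(rp->buf);
	rp->buf = rp->ibuf;
	rp->buflen = 0;
}

/*
 * Cache an encoded response of @len bytes. Oversized responses get a
 * heap buffer; on allocation failure, buflen stays zero so only the
 * status would be replayed (the patch's fallback behavior).
 */
static int replay_cache(struct replay *rp, const char *resp, size_t len)
{
	if (len > REPLAY_ISIZE) {
		char *buf = malloc(len);

		replay_free_cache(rp);
		if (!buf)
			return -1;	/* skip caching the body */
		rp->buf = buf;
	} else {
		/* Response fits inline; drop any previous heap buffer. */
		replay_free_cache(rp);
	}
	rp->buflen = len;
	memcpy(rp->buf, resp, len);
	return 0;
}
```

Note the ordering: the cache is freed *before* the allocation result is checked, so a failed `malloc` leaves `buflen` at zero rather than a stale length pointing at freed memory, matching the "rp_buflen already zeroed; skip caching" comment in the patch.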