Netdev List
 help / color / mirror / Atom feed
From: Steffen Klassert <steffen.klassert@secunet.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: David Miller <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	<netdev@vger.kernel.org>,
	Alessandro Schino <7991aleschino@gmail.com>
Subject: Re: [PATCH 5/9] esp: fix page frag reference leak on skb_to_sgvec failure
Date: Fri, 29 May 2026 10:27:14 +0200	[thread overview]
Message-ID: <ahlN4nG1GPi289rX@secunet.com> (raw)
In-Reply-To: <e7124e98-5633-4982-a850-8787c3949174@redhat.com>

On Fri, May 29, 2026 at 09:14:28AM +0200, Paolo Abeni wrote:
> On 5/29/26 7:52 AM, Steffen Klassert wrote:
> > Ccing the author of this patch.
> > 
> > On Thu, May 28, 2026 at 03:44:14PM +0200, Paolo Abeni wrote:
> >> On 5/27/26 10:41 AM, Steffen Klassert wrote:
> >>> From: e521588 <alessandro.schino@sbb.ch>
> >>>
> >>> In esp_output_tail(), when esp->inplace is false, the old skb page frags
> >>> are replaced with a new page from the xfrm page_frag cache. The source
> >>> scatterlist (sg) is built from the old frags before the replacement, and
> >>> esp_ssg_unref() is responsible for releasing the old page references
> >>> after the crypto operation completes.
> >>>
> >>> However, if the second skb_to_sgvec() call (which builds the destination
> >>> scatterlist from the new page) fails, the code jumps to error_free which
> >>> only calls kfree(tmp). The old page frag references captured in the
> >>> source scatterlist are never released:
> >>>
> >>>   1. sg[] is built from old frags via skb_to_sgvec() (no extra get_page)
> >>>   2. nr_frags is set to 1 and frag[0] is replaced with the new page
> >>>   3. Second skb_to_sgvec() fails -> goto error_free
> >>>   4. kfree(tmp) frees the sg[] memory but old frags are not unref'd
> >>>   5. kfree_skb() only releases frag[0] (the new page), not the old ones
> >>>
> >>> Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
> >>> unconditionally unrefs the source scatterlist frags without checking
> >>> req->src and req->dst, since those fields are not yet initialized by
> >>> aead_request_set_crypt() at the point of the error. Existing callers
> >>> pass false to preserve the original behavior.
> >>>
> >>> The same issue exists in both esp4 and esp6 as the code is identical.
> >>>
> >>> Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
> >>> Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
> >>>
> >>> Signed-off-by: Alessandro Schino <7991aleschino@gmail.com>
> >>> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> >>> ---
> >>>  net/ipv4/esp4.c | 12 +++++++-----
> >>>  net/ipv6/esp6.c | 12 +++++++-----
> >>>  2 files changed, 14 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
> >>> index 6a5febbdbee4..8314d7bddcb7 100644
> >>> --- a/net/ipv4/esp4.c
> >>> +++ b/net/ipv4/esp4.c
> >>> @@ -96,7 +96,7 @@ static inline struct scatterlist *esp_req_sg(struct crypto_aead *aead,
> >>>  			     __alignof__(struct scatterlist));
> >>>  }
> >>>  
> >>> -static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
> >>> +static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb, bool already_unref)
> >>>  {
> >>>  	struct crypto_aead *aead = x->data;
> >>>  	int extralen = 0;
> >>> @@ -113,7 +113,7 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
> >>>  	/* Unref skb_frag_pages in the src scatterlist if necessary.
> >>>  	 * Skip the first sg which comes from skb->data.
> >>>  	 */
> >>> -	if (req->src != req->dst)
> >>> +	if (already_unref || req->src != req->dst)
> >>>  		for (sg = sg_next(req->src); sg; sg = sg_next(sg))
> >>>  			skb_page_unref(page_to_netmem(sg_page(sg)),
> >>>  				       skb->pp_recycle);
> >>
> >> Sashiko points the above is still unsafe:
> > 
> > For the original patch submission, sashiko reported 'failed
> > to apply' because it was targeted to the ipsec tree. Now
> > with the full pull request it applied and it found issues,
> > but now it is too late for me to fix this.
> > 
> > Would it be possible to integrate subtrees of net and net-next
> > into the netdev CI? This would avoid such problems in future.
> > 
> >> ---
> >> When the new caller passes already_unref = true, can the loop still be
> >> reached safely?
> >> Looking at the allocation and request setup in esp_output_tail():
> >> 	tmp = esp_alloc_tmp(aead, esp->nfrags + 2, extralen);
> >> 	...
> >> 	req = esp_tmp_req(aead, iv);
> >> 	sg  = esp_req_sg(aead, req);
> >> esp_alloc_tmp() calls kmalloc(len, GFP_ATOMIC) — the buffer is not
> >> zeroed.
> >> esp_tmp_req() only does aead_request_set_tfm(req, aead), which assigns
> >> req->base.tfm. It does not touch req->src or req->dst.
> >> The local variable sg is filled in by skb_to_sgvec(skb, sg, ...) but
> >> nothing copies sg into req->src; that assignment is performed later by
> >> aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv).
> >> In the new error path the call sequence is:
> >> 	err = skb_to_sgvec(skb, dsg, ...);
> >> 	if (unlikely(err < 0)) {
> >> 		esp_ssg_unref(x, tmp, skb, true);   /* before set_crypt */
> >> 		goto error_free;
> >> 	}
> >> 	...
> >> 	aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv);
> >> Inside esp_ssg_unref() with already_unref = true, the predicate is
> >> short-circuited but the loop body still dereferences req->src:
> >> 	if (already_unref || req->src != req->dst)
> >> 		for (sg = sg_next(req->src); sg; sg = sg_next(sg))
> >> 			skb_page_unref(page_to_netmem(sg_page(sg)),
> >> 				       skb->pp_recycle);
> >> Since req->src has not been initialized at this point (the kmalloc
> >> buffer holds whatever was there before, and aead_request_set_crypt()
> >> has not yet run), sg_next(req->src) walks an arbitrary pointer,
> >> sg_page(sg) reads sg->page_link from arbitrary memory, and
> >> skb_page_unref(page_to_netmem(...), ...) then drops a refcount on a
> >> fabricated page.
> >> The commit message also seems to acknowledge this:
> >>     Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
> >>     unconditionally unrefs the source scatterlist frags without checking
> >>     req->src and req->dst, since those fields are not yet initialized by
> >>     aead_request_set_crypt() at the point of the error.
> >> If req->src/req->dst are not yet initialized, how can the loop
> >> sg = sg_next(req->src) be valid?  Should the helper instead walk the
> >> local sg returned by esp_req_sg(aead, req) (which was actually
> >> populated by the first skb_to_sgvec()), or should
> >> aead_request_set_crypt() be moved before the failure check so that
> >> req->src/req->dst are well defined?
> >> The same construct appears in net/ipv6/esp6.c:
> >> ---
> >>
> >> And the SoB is incorrect for this patch. Do you rebase your tree? (I'm
> >> not suggesting starting that otherwise!!!) If so you could possibly drop
> >> this patch ?!?
> > 
> > No, I never rebased the ipsec tree so far. I could revert it, or you can
> > do so if you don't accept the patch as is. Rebasing would be the last
> > resort as this creates problems for everyone who cloned my tree.
> 
> Sure, I just could not remember if xfrm tree was usually rebased or not.
> A revert or an incremental patch are just fine.

I'll revert it for now and resend the pull reuest.

  reply	other threads:[~2026-05-29  8:32 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-27  8:41 [PATCH 0/9] pull request (net): ipsec 2026-05-27 Steffen Klassert
2026-05-27  8:41 ` [PATCH 1/9] xfrm: route MIGRATE notifications to caller's netns Steffen Klassert
2026-05-27  8:41 ` [PATCH 2/9] xfrm: ipcomp: Free destination pages on acomp errors Steffen Klassert
2026-05-27  8:41 ` [PATCH 3/9] xfrm: Check for underflow in xfrm_state_mtu Steffen Klassert
2026-05-27  8:41 ` [PATCH 4/9] xfrm: ah: use skb_to_full_sk in async output callbacks Steffen Klassert
2026-05-27  8:41 ` [PATCH 5/9] esp: fix page frag reference leak on skb_to_sgvec failure Steffen Klassert
2026-05-28 13:44   ` Paolo Abeni
2026-05-29  5:52     ` Steffen Klassert
2026-05-29  7:14       ` Paolo Abeni
2026-05-29  8:27         ` Steffen Klassert [this message]
2026-05-29 18:14       ` Jakub Kicinski
2026-05-27  8:41 ` [PATCH 6/9] xfrm: esp: restore combined single-frag length gate Steffen Klassert
2026-05-27  8:41 ` [PATCH 7/9] xfrm: iptfs: reset runtime state when cloning SAs Steffen Klassert
2026-05-27  8:41 ` [PATCH 8/9] xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit Steffen Klassert
2026-05-27  8:41 ` [PATCH 9/9] xfrm: input: hold netns during deferred transport reinjection Steffen Klassert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ahlN4nG1GPi289rX@secunet.com \
    --to=steffen.klassert@secunet.com \
    --cc=7991aleschino@gmail.com \
    --cc=davem@davemloft.net \
    --cc=herbert@gondor.apana.org.au \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox