From: Steffen Klassert <steffen.klassert@secunet.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Herbert Xu <herbert@gondor.apana.org.au>,
<netdev@vger.kernel.org>,
Alessandro Schino <7991aleschino@gmail.com>
Subject: Re: [PATCH 5/9] esp: fix page frag reference leak on skb_to_sgvec failure
Date: Fri, 29 May 2026 07:52:07 +0200 [thread overview]
Message-ID: <ahkphzYYBeoCcbRI@secunet.com> (raw)
In-Reply-To: <0902bdb9-2746-45fd-ba4d-8e5b96617b8b@redhat.com>
Ccing the author of this patch.
On Thu, May 28, 2026 at 03:44:14PM +0200, Paolo Abeni wrote:
> On 5/27/26 10:41 AM, Steffen Klassert wrote:
> > From: e521588 <alessandro.schino@sbb.ch>
> >
> > In esp_output_tail(), when esp->inplace is false, the old skb page frags
> > are replaced with a new page from the xfrm page_frag cache. The source
> > scatterlist (sg) is built from the old frags before the replacement, and
> > esp_ssg_unref() is responsible for releasing the old page references
> > after the crypto operation completes.
> >
> > However, if the second skb_to_sgvec() call (which builds the destination
> > scatterlist from the new page) fails, the code jumps to error_free which
> > only calls kfree(tmp). The old page frag references captured in the
> > source scatterlist are never released:
> >
> > 1. sg[] is built from old frags via skb_to_sgvec() (no extra get_page)
> > 2. nr_frags is set to 1 and frag[0] is replaced with the new page
> > 3. Second skb_to_sgvec() fails -> goto error_free
> > 4. kfree(tmp) frees the sg[] memory but old frags are not unref'd
> > 5. kfree_skb() only releases frag[0] (the new page), not the old ones
> >
> > Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
> > unconditionally unrefs the source scatterlist frags without checking
> > req->src and req->dst, since those fields are not yet initialized by
> > aead_request_set_crypt() at the point of the error. Existing callers
> > pass false to preserve the original behavior.
> >
> > The same issue exists in both esp4 and esp6 as the code is identical.
> >
> > Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
> > Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
> >
> > Signed-off-by: Alessandro Schino <7991aleschino@gmail.com>
> > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> > ---
> > net/ipv4/esp4.c | 12 +++++++-----
> > net/ipv6/esp6.c | 12 +++++++-----
> > 2 files changed, 14 insertions(+), 10 deletions(-)
> >
> > diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
> > index 6a5febbdbee4..8314d7bddcb7 100644
> > --- a/net/ipv4/esp4.c
> > +++ b/net/ipv4/esp4.c
> > @@ -96,7 +96,7 @@ static inline struct scatterlist *esp_req_sg(struct crypto_aead *aead,
> > __alignof__(struct scatterlist));
> > }
> >
> > -static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
> > +static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb, bool already_unref)
> > {
> > struct crypto_aead *aead = x->data;
> > int extralen = 0;
> > @@ -113,7 +113,7 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
> > /* Unref skb_frag_pages in the src scatterlist if necessary.
> > * Skip the first sg which comes from skb->data.
> > */
> > - if (req->src != req->dst)
> > + if (already_unref || req->src != req->dst)
> > for (sg = sg_next(req->src); sg; sg = sg_next(sg))
> > skb_page_unref(page_to_netmem(sg_page(sg)),
> > skb->pp_recycle);
>
> Sashiko points the above is still unsafe:
For the original patch submission, sashiko reported 'failed
to apply' because it was targeted to the ipsec tree. Now
with the full pull request it applied and it found issues,
but now it is too late for me to fix this.
Would it be possible to integrate subtrees of net and net-next
into the netdev CI? This would avoid such problems in future.
> ---
> When the new caller passes already_unref = true, can the loop still be
> reached safely?
> Looking at the allocation and request setup in esp_output_tail():
> tmp = esp_alloc_tmp(aead, esp->nfrags + 2, extralen);
> ...
> req = esp_tmp_req(aead, iv);
> sg = esp_req_sg(aead, req);
> esp_alloc_tmp() calls kmalloc(len, GFP_ATOMIC) — the buffer is not
> zeroed.
> esp_tmp_req() only does aead_request_set_tfm(req, aead), which assigns
> req->base.tfm. It does not touch req->src or req->dst.
> The local variable sg is filled in by skb_to_sgvec(skb, sg, ...) but
> nothing copies sg into req->src; that assignment is performed later by
> aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv).
> In the new error path the call sequence is:
> err = skb_to_sgvec(skb, dsg, ...);
> if (unlikely(err < 0)) {
> esp_ssg_unref(x, tmp, skb, true); /* before set_crypt */
> goto error_free;
> }
> ...
> aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv);
> Inside esp_ssg_unref() with already_unref = true, the predicate is
> short-circuited but the loop body still dereferences req->src:
> if (already_unref || req->src != req->dst)
> for (sg = sg_next(req->src); sg; sg = sg_next(sg))
> skb_page_unref(page_to_netmem(sg_page(sg)),
> skb->pp_recycle);
> Since req->src has not been initialized at this point (the kmalloc
> buffer holds whatever was there before, and aead_request_set_crypt()
> has not yet run), sg_next(req->src) walks an arbitrary pointer,
> sg_page(sg) reads sg->page_link from arbitrary memory, and
> skb_page_unref(page_to_netmem(...), ...) then drops a refcount on a
> fabricated page.
> The commit message also seems to acknowledge this:
> Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
> unconditionally unrefs the source scatterlist frags without checking
> req->src and req->dst, since those fields are not yet initialized by
> aead_request_set_crypt() at the point of the error.
> If req->src/req->dst are not yet initialized, how can the loop
> sg = sg_next(req->src) be valid? Should the helper instead walk the
> local sg returned by esp_req_sg(aead, req) (which was actually
> populated by the first skb_to_sgvec()), or should
> aead_request_set_crypt() be moved before the failure check so that
> req->src/req->dst are well defined?
> The same construct appears in net/ipv6/esp6.c:
> ---
>
> And the SoB is incorrect for this patch. Do you rebase your tree? (I'm
> not suggesting starting that otherwise!!!) If so you could possibly drop
> this patch ?!?
No, I never rebased the ipsec tree so far. I could revert it, or you can
do so if you don't accept the patch as is. Rebasing would be the last
resort as this creates problems for everyone who cloned my tree.
next prev parent reply other threads:[~2026-05-29 5:52 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-27 8:41 [PATCH 0/9] pull request (net): ipsec 2026-05-27 Steffen Klassert
2026-05-27 8:41 ` [PATCH 1/9] xfrm: route MIGRATE notifications to caller's netns Steffen Klassert
2026-05-27 8:41 ` [PATCH 2/9] xfrm: ipcomp: Free destination pages on acomp errors Steffen Klassert
2026-05-27 8:41 ` [PATCH 3/9] xfrm: Check for underflow in xfrm_state_mtu Steffen Klassert
2026-05-27 8:41 ` [PATCH 4/9] xfrm: ah: use skb_to_full_sk in async output callbacks Steffen Klassert
2026-05-27 8:41 ` [PATCH 5/9] esp: fix page frag reference leak on skb_to_sgvec failure Steffen Klassert
2026-05-28 13:44 ` Paolo Abeni
2026-05-29 5:52 ` Steffen Klassert [this message]
2026-05-29 7:14 ` Paolo Abeni
2026-05-29 8:27 ` Steffen Klassert
2026-05-29 18:14 ` Jakub Kicinski
2026-05-27 8:41 ` [PATCH 6/9] xfrm: esp: restore combined single-frag length gate Steffen Klassert
2026-05-27 8:41 ` [PATCH 7/9] xfrm: iptfs: reset runtime state when cloning SAs Steffen Klassert
2026-05-27 8:41 ` [PATCH 8/9] xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit Steffen Klassert
2026-05-27 8:41 ` [PATCH 9/9] xfrm: input: hold netns during deferred transport reinjection Steffen Klassert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahkphzYYBeoCcbRI@secunet.com \
--to=steffen.klassert@secunet.com \
--cc=7991aleschino@gmail.com \
--cc=davem@davemloft.net \
--cc=herbert@gondor.apana.org.au \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox