From: Steffen Klassert <steffen.klassert@secunet.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: David Miller <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Herbert Xu <herbert@gondor.apana.org.au>,
<netdev@vger.kernel.org>,
Alessandro Schino <7991aleschino@gmail.com>
Subject: Re: [PATCH 5/9] esp: fix page frag reference leak on skb_to_sgvec failure
Date: Fri, 29 May 2026 10:27:14 +0200 [thread overview]
Message-ID: <ahlN4nG1GPi289rX@secunet.com> (raw)
In-Reply-To: <e7124e98-5633-4982-a850-8787c3949174@redhat.com>
On Fri, May 29, 2026 at 09:14:28AM +0200, Paolo Abeni wrote:
> On 5/29/26 7:52 AM, Steffen Klassert wrote:
> > Ccing the author of this patch.
> >
> > On Thu, May 28, 2026 at 03:44:14PM +0200, Paolo Abeni wrote:
> >> On 5/27/26 10:41 AM, Steffen Klassert wrote:
> >>> From: e521588 <alessandro.schino@sbb.ch>
> >>>
> >>> In esp_output_tail(), when esp->inplace is false, the old skb page frags
> >>> are replaced with a new page from the xfrm page_frag cache. The source
> >>> scatterlist (sg) is built from the old frags before the replacement, and
> >>> esp_ssg_unref() is responsible for releasing the old page references
> >>> after the crypto operation completes.
> >>>
> >>> However, if the second skb_to_sgvec() call (which builds the destination
> >>> scatterlist from the new page) fails, the code jumps to error_free which
> >>> only calls kfree(tmp). The old page frag references captured in the
> >>> source scatterlist are never released:
> >>>
> >>> 1. sg[] is built from old frags via skb_to_sgvec() (no extra get_page)
> >>> 2. nr_frags is set to 1 and frag[0] is replaced with the new page
> >>> 3. Second skb_to_sgvec() fails -> goto error_free
> >>> 4. kfree(tmp) frees the sg[] memory but old frags are not unref'd
> >>> 5. kfree_skb() only releases frag[0] (the new page), not the old ones
> >>>
> >>> Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
> >>> unconditionally unrefs the source scatterlist frags without checking
> >>> req->src and req->dst, since those fields are not yet initialized by
> >>> aead_request_set_crypt() at the point of the error. Existing callers
> >>> pass false to preserve the original behavior.
> >>>
> >>> The same issue exists in both esp4 and esp6 as the code is identical.
> >>>
> >>> Fixes: cac2661c53f3 ("esp4: Avoid skb_cow_data whenever possible")
> >>> Fixes: 03e2a30f6a27 ("esp6: Avoid skb_cow_data whenever possible")
> >>>
> >>> Signed-off-by: Alessandro Schino <7991aleschino@gmail.com>
> >>> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> >>> ---
> >>> net/ipv4/esp4.c | 12 +++++++-----
> >>> net/ipv6/esp6.c | 12 +++++++-----
> >>> 2 files changed, 14 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
> >>> index 6a5febbdbee4..8314d7bddcb7 100644
> >>> --- a/net/ipv4/esp4.c
> >>> +++ b/net/ipv4/esp4.c
> >>> @@ -96,7 +96,7 @@ static inline struct scatterlist *esp_req_sg(struct crypto_aead *aead,
> >>> __alignof__(struct scatterlist));
> >>> }
> >>>
> >>> -static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
> >>> +static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb, bool already_unref)
> >>> {
> >>> struct crypto_aead *aead = x->data;
> >>> int extralen = 0;
> >>> @@ -113,7 +113,7 @@ static void esp_ssg_unref(struct xfrm_state *x, void *tmp, struct sk_buff *skb)
> >>> /* Unref skb_frag_pages in the src scatterlist if necessary.
> >>> * Skip the first sg which comes from skb->data.
> >>> */
> >>> - if (req->src != req->dst)
> >>> + if (already_unref || req->src != req->dst)
> >>> for (sg = sg_next(req->src); sg; sg = sg_next(sg))
> >>> skb_page_unref(page_to_netmem(sg_page(sg)),
> >>> skb->pp_recycle);
> >>
> >> Sashiko points the above is still unsafe:
> >
> > For the original patch submission, sashiko reported 'failed
> > to apply' because it was targeted to the ipsec tree. Now
> > with the full pull request it applied and it found issues,
> > but now it is too late for me to fix this.
> >
> > Would it be possible to integrate subtrees of net and net-next
> > into the netdev CI? This would avoid such problems in future.
> >
> >> ---
> >> When the new caller passes already_unref = true, can the loop still be
> >> reached safely?
> >> Looking at the allocation and request setup in esp_output_tail():
> >> tmp = esp_alloc_tmp(aead, esp->nfrags + 2, extralen);
> >> ...
> >> req = esp_tmp_req(aead, iv);
> >> sg = esp_req_sg(aead, req);
> >> esp_alloc_tmp() calls kmalloc(len, GFP_ATOMIC) — the buffer is not
> >> zeroed.
> >> esp_tmp_req() only does aead_request_set_tfm(req, aead), which assigns
> >> req->base.tfm. It does not touch req->src or req->dst.
> >> The local variable sg is filled in by skb_to_sgvec(skb, sg, ...) but
> >> nothing copies sg into req->src; that assignment is performed later by
> >> aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv).
> >> In the new error path the call sequence is:
> >> err = skb_to_sgvec(skb, dsg, ...);
> >> if (unlikely(err < 0)) {
> >> esp_ssg_unref(x, tmp, skb, true); /* before set_crypt */
> >> goto error_free;
> >> }
> >> ...
> >> aead_request_set_crypt(req, sg, dsg, ivlen + esp->clen, iv);
> >> Inside esp_ssg_unref() with already_unref = true, the predicate is
> >> short-circuited but the loop body still dereferences req->src:
> >> if (already_unref || req->src != req->dst)
> >> for (sg = sg_next(req->src); sg; sg = sg_next(sg))
> >> skb_page_unref(page_to_netmem(sg_page(sg)),
> >> skb->pp_recycle);
> >> Since req->src has not been initialized at this point (the kmalloc
> >> buffer holds whatever was there before, and aead_request_set_crypt()
> >> has not yet run), sg_next(req->src) walks an arbitrary pointer,
> >> sg_page(sg) reads sg->page_link from arbitrary memory, and
> >> skb_page_unref(page_to_netmem(...), ...) then drops a refcount on a
> >> fabricated page.
> >> The commit message also seems to acknowledge this:
> >> Fix this by adding a bool parameter to esp_ssg_unref() that, when true,
> >> unconditionally unrefs the source scatterlist frags without checking
> >> req->src and req->dst, since those fields are not yet initialized by
> >> aead_request_set_crypt() at the point of the error.
> >> If req->src/req->dst are not yet initialized, how can the loop
> >> sg = sg_next(req->src) be valid? Should the helper instead walk the
> >> local sg returned by esp_req_sg(aead, req) (which was actually
> >> populated by the first skb_to_sgvec()), or should
> >> aead_request_set_crypt() be moved before the failure check so that
> >> req->src/req->dst are well defined?
> >> The same construct appears in net/ipv6/esp6.c:
> >> ---
> >>
> >> And the SoB is incorrect for this patch. Do you rebase your tree? (I'm
> >> not suggesting starting that otherwise!!!) If so you could possibly drop
> >> this patch ?!?
> >
> > No, I never rebased the ipsec tree so far. I could revert it, or you can
> > do so if you don't accept the patch as is. Rebasing would be the last
> > resort as this creates problems for everyone who cloned my tree.
>
> Sure, I just could not remember if xfrm tree was usually rebased or not.
> A revert or an incremental patch are just fine.
I'll revert it for now and resend the pull reuest.
next prev parent reply other threads:[~2026-05-29 8:32 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-27 8:41 [PATCH 0/9] pull request (net): ipsec 2026-05-27 Steffen Klassert
2026-05-27 8:41 ` [PATCH 1/9] xfrm: route MIGRATE notifications to caller's netns Steffen Klassert
2026-05-27 8:41 ` [PATCH 2/9] xfrm: ipcomp: Free destination pages on acomp errors Steffen Klassert
2026-05-27 8:41 ` [PATCH 3/9] xfrm: Check for underflow in xfrm_state_mtu Steffen Klassert
2026-05-27 8:41 ` [PATCH 4/9] xfrm: ah: use skb_to_full_sk in async output callbacks Steffen Klassert
2026-05-27 8:41 ` [PATCH 5/9] esp: fix page frag reference leak on skb_to_sgvec failure Steffen Klassert
2026-05-28 13:44 ` Paolo Abeni
2026-05-29 5:52 ` Steffen Klassert
2026-05-29 7:14 ` Paolo Abeni
2026-05-29 8:27 ` Steffen Klassert [this message]
2026-05-29 18:14 ` Jakub Kicinski
2026-05-27 8:41 ` [PATCH 6/9] xfrm: esp: restore combined single-frag length gate Steffen Klassert
2026-05-27 8:41 ` [PATCH 7/9] xfrm: iptfs: reset runtime state when cloning SAs Steffen Klassert
2026-05-27 8:41 ` [PATCH 8/9] xfrm: move policy_bydst RCU sync from per-netns .exit to .pre_exit Steffen Klassert
2026-05-27 8:41 ` [PATCH 9/9] xfrm: input: hold netns during deferred transport reinjection Steffen Klassert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ahlN4nG1GPi289rX@secunet.com \
--to=steffen.klassert@secunet.com \
--cc=7991aleschino@gmail.com \
--cc=davem@davemloft.net \
--cc=herbert@gondor.apana.org.au \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox