Netdev List
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Lorenz Brun <lorenz@monogon.tech>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>,
	Tony Nguyen <anthony.l.nguyen@intel.com>,
	Przemek Kitszel <przemyslaw.kitszel@intel.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>, <stable@vger.kernel.org>,
	<intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <bpf@vger.kernel.org>
Subject: Re: [PATCH] xsk: switch xdp_build_skb_from_zc() to napi_alloc_skb()
Date: Mon, 18 May 2026 15:29:00 +0200
Message-ID: <agsUHNss20SweH3a@boxer>
In-Reply-To: <CAJMi0nQN+XB14Z81=W2reEGnax526-MB=Armx+f_miWMWUmRFw@mail.gmail.com>

On Mon, May 18, 2026 at 02:57:55PM +0200, Lorenz Brun wrote:
> On Wed, 13 May 2026 at 17:21, Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
> >
> > From: Lorenz Brun <lorenz@monogon.tech>
> > Date: Tue, 12 May 2026 17:26:56 +0200
> >
> > > xdp_build_skb_from_zc() allocated xdp->frame_sz bytes from the per-cpu
> > > system_page_pool and built the skb head with napi_build_skb(). The
> > > latter places skb_shared_info at the tail of the buffer, but the
> > > helper sized the allocation as if the whole frame_sz were usable for
> > > data. Whenever the packet plus reserved headroom approached frame_sz,
> > > the head memcpy overran shinfo with packet content, corrupting
> > > ->flags (SKBFL_ZEROCOPY_ENABLE) and ->nr_frags, which then drove
> > > skb_copy_ubufs() off the end of frags[] on the RX path:
> > >
> > >   UBSAN: array-index-out-of-bounds in include/linux/skbuff.h:2541
> > >   index 113 is out of range for type 'skb_frag_t [17]'
> > >    skb_copy_ubufs+0x7da/0x960
> > >    ip_local_deliver_finish+0xcd/0x110
> > >    ice_napi_poll+0xe4/0x2a0 [ice]
> > >
> > > The overrun bytes come from the packet, so an on-wire sender can
> > > corrupt kernel memory remotely whenever the XDP program returns
> > > XDP_PASS.
> > >
> > > Rather than patch the sizing math, switch to the pattern used by other
> > > in-tree AF_XDP zero-copy drivers like mlx5 and i40e which use
> > > napi_alloc_skb() sized to the actual packet plus skb_put_data().
> > > This sizes the head exactly for the data being copied, drops the
> > > system_page_pool local_lock from this path, and removes the
> > > structural mismatch between frame_sz and the skb head buffer. Frags
> > > are allocated with alloc_page() per frag, matching the other drivers.
> >
> > I used napi_build_skb() + system page_pool to enable PP recycling
> > improving XSk XDP_PASS performance a lot.
> > Are you sure there's no other way to approach this?
> >
> > napi_alloc_skb() used in other drivers works, but it's a sorta old
> > approach which is way slower.
> >
> > System page_pools always allocate a full page, why can it create an skb
> > prone to overruns?
> >
> > >
> > > Fixes: 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Lorenz Brun <lorenz@monogon.tech>
> > Thanks,
> > Olek
> 
> Hi Olek
> 
> I looked at the code again. While your approach is indeed faster, it
> is only faster for traffic bypassing AF_XDP, which is generally not
> that relevant for performance.
> 
> More critically, it currently corrupts kernel memory and panics the
> kernel very quickly when running with frame-size set to 2048, 1500
> MTU, and passing received packets. To be honest, I'm not familiar
> enough with the XSK subsystem to know exactly what specific sizing
> assumption was violated here. By comparison, the approach taken by the
> other drivers is a lot more obviously correct and works perfectly.
> 
> If you want to preserve the current approach, I'm perfectly happy with
> that. However, I don't feel comfortable sending patches for it, as I
> don't understand exactly what the expectations of the various data
> blocks are.
> 
> AFAIK, reproduction should be fairly easy. You just need to run a TCP
> connection to the receiving node (which gets passed to the kernel)
> while receiving some UDP packets via AF_XDP at the same time. As
> mentioned, it also needs frame-size 2048 to reproduce quickly.
> 
> I checked if I could get you an easy reproducer, but xdp-tools is
> quite limited. If you want to keep your approach and can't reproduce
> the panic yourself, let me know and I can see if I can synthesize a
> minimal reproducer.

We now respect the tailroom in UMEM, which is supposed to address the
shinfo overwrite cases. Could you re-test this on your side with the
patchset linked below applied to your tree?

https://lore.kernel.org/bpf/20260402154958.562179-1-maciej.fijalkowski@intel.com/
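To make the sizing question concrete, here is a rough userspace sketch of
the arithmetic from the patch description: a head buffer of frame_sz bytes
with shinfo placed at its tail leaves less than frame_sz for headroom plus
data. The constants are assumptions for illustration, not values from this
thread: 64 stands in for SMP_CACHE_BYTES, 320 for
sizeof(struct skb_shared_info) (the real value depends on arch and config),
and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; the kernel's differ per arch/config. */
#define ASSUMED_CACHE_LINE   64u   /* stand-in for SMP_CACHE_BYTES */
#define ASSUMED_SHINFO_SIZE  320u  /* stand-in for sizeof(struct skb_shared_info) */

/* Round x up to a cache-line multiple, in the spirit of SKB_DATA_ALIGN(). */
static size_t data_align(size_t x)
{
	return (x + ASSUMED_CACHE_LINE - 1) & ~(size_t)(ASSUMED_CACHE_LINE - 1);
}

/*
 * Bytes of a buf_sz-byte head buffer left for headroom + packet data
 * once shinfo has been placed at the tail of the buffer.
 */
static size_t usable_head_bytes(size_t buf_sz)
{
	size_t shinfo = data_align(ASSUMED_SHINFO_SIZE);

	assert(buf_sz >= shinfo);
	return buf_sz - shinfo;
}

/*
 * Number of bytes by which a copy of len bytes, starting after headroom
 * bytes, would run into shinfo. Returns 0 when the copy fits.
 */
static size_t shinfo_overlap(size_t buf_sz, size_t headroom, size_t len)
{
	size_t usable = usable_head_bytes(buf_sz);
	size_t end = headroom + len;

	return end > usable ? end - usable : 0;
}
```

With these assumed numbers, shinfo_overlap(2048, 256, 1514) returns 42: a
full-size 1500 MTU Ethernet frame behind an assumed 256 bytes of headroom
would reach 42 bytes into shinfo in a 2048-byte buffer, while the same
frame in a 4096-byte buffer gives 0.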

> 
> Regards,
> Lorenz


Thread overview: 5+ messages
2026-05-12 15:26 [PATCH] xsk: switch xdp_build_skb_from_zc() to napi_alloc_skb() Lorenz Brun
2026-05-13 15:18 ` Alexander Lobakin
2026-05-18 12:57   ` Lorenz Brun
2026-05-18 13:29     ` Maciej Fijalkowski [this message]
2026-05-18 13:59       ` Alexander Lobakin
