Netdev List
From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	Lorenz Brun <lorenz@monogon.tech>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>,
	Przemek Kitszel <przemyslaw.kitszel@intel.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	Stanislav Fomichev <sdf@fomichev.me>, <stable@vger.kernel.org>,
	<intel-wired-lan@lists.osuosl.org>, <netdev@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <bpf@vger.kernel.org>
Subject: Re: [PATCH] xsk: switch xdp_build_skb_from_zc() to napi_alloc_skb()
Date: Mon, 18 May 2026 15:59:00 +0200
Message-ID: <744d9c62-a5e8-4702-bcdf-c9a8d31a026d@intel.com>
In-Reply-To: <agsUHNss20SweH3a@boxer>

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Mon, 18 May 2026 15:29:00 +0200

> On Mon, May 18, 2026 at 02:57:55PM +0200, Lorenz Brun wrote:
>> On Wed, 13 May 2026 at 17:21, Alexander Lobakin
>> <aleksander.lobakin@intel.com> wrote:
>>>
>>> From: Lorenz Brun <lorenz@monogon.tech>
>>> Date: Tue, 12 May 2026 17:26:56 +0200
>>>
>>>> xdp_build_skb_from_zc() allocated xdp->frame_sz bytes from the per-cpu
>>>> system_page_pool and built the skb head with napi_build_skb(). The
>>>> latter places skb_shared_info at the tail of the buffer, but the
>>>> helper sized the allocation as if the whole frame_sz were usable for
>>>> data. Whenever the packet plus reserved headroom approached frame_sz,
>>>> the head memcpy overran shinfo with packet content, corrupting
>>>> ->flags (SKBFL_ZEROCOPY_ENABLE) and ->nr_frags, which then drove
>>>> skb_copy_ubufs() off the end of frags[] on the RX path:
>>>>
>>>>   UBSAN: array-index-out-of-bounds in include/linux/skbuff.h:2541
>>>>   index 113 is out of range for type 'skb_frag_t [17]'
>>>>    skb_copy_ubufs+0x7da/0x960
>>>>    ip_local_deliver_finish+0xcd/0x110
>>>>    ice_napi_poll+0xe4/0x2a0 [ice]
>>>>
>>>> The overrun bytes come from the packet, so an on-wire sender can
>>>> corrupt kernel memory remotely whenever the XDP program returns
>>>> XDP_PASS.
>>>>
>>>> Rather than patch the sizing math, switch to the pattern used by other
>>>> in-tree AF_XDP zero-copy drivers like mlx5 and i40e which use
>>>> napi_alloc_skb() sized to the actual packet plus skb_put_data().
>>>> This sizes the head exactly for the data being copied, drops the
>>>> system_page_pool local_lock from this path, and removes the
>>>> structural mismatch between frame_sz and the skb head buffer. Frags
>>>> are allocated with alloc_page() per frag, matching the other drivers.
>>>
>>> I used napi_build_skb() + system page_pool to enable PP recycling
>>> improving XSk XDP_PASS performance a lot.
>>> Are you sure there's no other way to approach this?
>>>
>>> napi_alloc_skb() used in other drivers works, but it's sorta old
>>> approach which is way slower.
>>>
>>> System page_pools always allocate a full page, why can it create an skb
>>> prone to overruns?
>>>
>>>>
>>>> Fixes: 560d958c6c68 ("xsk: add generic XSk &xdp_buff -> skb conversion")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Lorenz Brun <lorenz@monogon.tech>
>>> Thanks,
>>> Olek
>>
>> Hi Olek
>>
>> I looked at the code again. While your approach is indeed faster, it
>> is only faster for traffic bypassing AF_XDP, which is generally not
>> that relevant for performance.
>>
>> More critically, it currently corrupts kernel memory and panics the
>> kernel very quickly when running with frame-size set to 2048, 1500
>> MTU, and passing received packets. To be honest, I'm not familiar
>> enough with the XSK subsystem to know exactly what specific sizing
>> assumption was violated here. By comparison, the approach taken by the
>> other drivers is a lot more obviously correct and works perfectly.
>>
>> If you want to preserve the current approach, I'm perfectly happy with
>> that. However, I don't feel comfortable sending patches for it, as I
>> don't understand exactly what the expectations of the various data
>> blocks are.
>>
>> AFAIK, reproduction should be fairly easy. You just need to run a TCP
>> connection to the receiving node (which gets passed to the kernel)
>> while receiving some UDP packets via AF_XDP at the same time. As
>> mentioned, it also needs frame-size 2048 to reproduce quickly.
>>
>> I checked if I could get you an easy reproducer, but xdp-tools is
>> quite limited. If you want to keep your approach and can't reproduce
>> the panic yourself, let me know and I can see if I can synthesize a
>> minimal reproducer.
> 
> We now respect the tailroom in UMEM which is supposed to address shinfo
> override cases. Could you re-test this on your side with cited patchset
> being present on your tree?
> 
> https://lore.kernel.org/bpf/20260402154958.562179-1-maciej.fijalkowski@intel.com/

Either way, and regardless of whether XSk XDP_PASS is
performance-critical or not, fixing an issue by replacing the
implementation with the one from some driver "because it works" is not
something I'd like to see.
If you have difficulty root-causing the actual problem, I can
take a look and fix it since it's my code.

But yeah, first make sure the series Maciej mentioned is present in your
tree.

Thanks,
Olek
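
For reference, the sizing concern from the quoted patch description can
be modelled in a few lines of userspace C. This is only a sketch of the
arithmetic as the description states it (shinfo placed at the tail of a
frame_sz-sized head), not the kernel code and not a confirmed root
cause. The helper names are hypothetical, and the 64-byte cache line,
the 320-byte skb_shared_info and the 256-byte headroom used below are
assumptions that depend on arch and config.

```c
#include <stddef.h>

/* Assumed cache line size; the kernel aligns the shinfo size to
 * SMP_CACHE_BYTES, which is arch-dependent. */
#define CACHE_LINE 64u

/* Round n up to a multiple of CACHE_LINE (models SKB_DATA_ALIGN()). */
static inline size_t data_align(size_t n)
{
	return (n + CACHE_LINE - 1) & ~(size_t)(CACHE_LINE - 1);
}

/* Bytes of a frame_sz-sized head buffer left for headroom + data once
 * the aligned shinfo is carved out of its tail. */
size_t usable_head_bytes(size_t frame_sz, size_t shinfo_sz)
{
	size_t tail = data_align(shinfo_sz);

	return frame_sz > tail ? frame_sz - tail : 0;
}

/* Number of bytes by which headroom + len would run into shinfo;
 * 0 means the copy fits. */
size_t head_overrun_bytes(size_t frame_sz, size_t headroom, size_t len,
			  size_t shinfo_sz)
{
	size_t usable = usable_head_bytes(frame_sz, shinfo_sz);
	size_t used = headroom + len;

	return used > usable ? used - usable : 0;
}
```

Under those assumed numbers, a 2048-byte frame leaves 1728 bytes ahead
of shinfo, so a full 1514-byte Ethernet frame behind 256 bytes of
headroom would spill 42 bytes into it, while the same packet in a
4096-byte frame fits. Whether the tailroom series linked above already
closes this gap is exactly what the re-test should tell.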
