All of lore.kernel.org
 help / color / mirror / Atom feed
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Fernando Fernandez Mancera <fmancera@suse.de>
Cc: <netdev@vger.kernel.org>, <csmate@nop.hu>,
	<kerneljasonxing@gmail.com>, <bjorn@kernel.org>,
	<sdf@fomichev.me>, <jonathan.lemon@gmail.com>,
	<bpf@vger.kernel.org>, <davem@davemloft.net>,
	<edumazet@google.com>, <kuba@kernel.org>, <pabeni@redhat.com>,
	<horms@kernel.org>
Subject: Re: [PATCH net v2] xsk: avoid data corruption on cq descriptor number
Date: Mon, 27 Oct 2025 13:32:00 +0100	[thread overview]
Message-ID: <aP9mQG4dRvPxUJhj@boxer> (raw)
In-Reply-To: <4723fa89-17d3-4204-b490-979df9182454@suse.de>

On Sat, Oct 25, 2025 at 08:18:17PM +0200, Fernando Fernandez Mancera wrote:
> 
> 
> On 10/24/25 7:16 PM, Maciej Fijalkowski wrote:
> > On Fri, Oct 24, 2025 at 12:40:49PM +0200, Fernando Fernandez Mancera wrote:
> > > Since commit 30f241fcf52a ("xsk: Fix immature cq descriptor
> > > production"), the descriptor number is stored in skb control block and
> > > xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto
> > > pool's completion queue.
> > > 
> > > skb control block shouldn't be used for this purpose as after transmit
> > > xsk doesn't have control over it and other subsystems could use it. This
> > > leads to the following kernel panic due to a NULL pointer dereference.
> > > 
> > >   BUG: kernel NULL pointer dereference, address: 0000000000000000
> > >   #PF: supervisor read access in kernel mode
> > >   #PF: error_code(0x0000) - not-present page
> > >   PGD 0 P4D 0
> > >   Oops: Oops: 0000 [#1] SMP NOPTI
> > >   CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 #1 PREEMPT(lazy)  Debian 6.16.12-1
> > >   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
> > >   RIP: 0010:xsk_destruct_skb+0xd0/0x180
> > >   [...]
> > >   Call Trace:
> > >    <IRQ>
> > >    ? napi_complete_done+0x7a/0x1a0
> > >    ip_rcv_core+0x1bb/0x340
> > >    ip_rcv+0x30/0x1f0
> > >    __netif_receive_skb_one_core+0x85/0xa0
> > >    process_backlog+0x87/0x130
> > >    __napi_poll+0x28/0x180
> > >    net_rx_action+0x339/0x420
> > >    handle_softirqs+0xdc/0x320
> > >    ? handle_edge_irq+0x90/0x1e0
> > >    do_softirq.part.0+0x3b/0x60
> > >    </IRQ>
> > >    <TASK>
> > >    __local_bh_enable_ip+0x60/0x70
> > >    __dev_direct_xmit+0x14e/0x1f0
> > >    __xsk_generic_xmit+0x482/0xb70
> > >    ? __remove_hrtimer+0x41/0xa0
> > >    ? __xsk_generic_xmit+0x51/0xb70
> > >    ? _raw_spin_unlock_irqrestore+0xe/0x40
> > >    xsk_sendmsg+0xda/0x1c0
> > >    __sys_sendto+0x1ee/0x200
> > >    __x64_sys_sendto+0x24/0x30
> > >    do_syscall_64+0x84/0x2f0
> > >    ? __pfx_pollwake+0x10/0x10
> > >    ? __rseq_handle_notify_resume+0xad/0x4c0
> > >    ? restore_fpregs_from_fpstate+0x3c/0x90
> > >    ? switch_fpu_return+0x5b/0xe0
> > >    ? do_syscall_64+0x204/0x2f0
> > >    ? do_syscall_64+0x204/0x2f0
> > >    ? do_syscall_64+0x204/0x2f0
> > >    entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >    </TASK>
> > >   [...]
> > >   Kernel panic - not syncing: Fatal exception in interrupt
> > >   Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > 
> > > The approach proposed stores the first address also in the xsk_addr_node
> > > along with the number of descriptors. The head xsk_addr_node is
> > > referenced in skb_shinfo(skb)->destructor_arg. The rest of the fragments
> > > store the address on the list.
> > > 
> > > This is less efficient as 4 bytes are wasted when storing each address.
> > 
> > Hi Fernando,
> > it's not about 4 bytes being wasted but rather memory allocation that you
> > introduce here. I tried hard to avoid hurting non-fragmented traffic,
> > below you can find impact reported by Jason from similar approach as
> > yours:
> > https://lore.kernel.org/bpf/CAL+tcoD=Gn6ZmJ+_Y48vPRyHVHmP-7irsx=fRVRnyhDrpTrEtQ@mail.gmail.com/
> > 
> > I assume this patch will yield a similar performance degradation...
> > 
> 
> It does, thank you for explaining Maciej. I have been thinking about
> possible solutions and remembered skb extensions. If I am not wrong, it
> shouldn't yield a performance degratation or at least it should be a much
> less severe one. Although, XDP_SOCKETS Kconfig would require "select
> SKB_EXTENSIONS".

SGTM. However I have not used this feature in the past, so I'm looking
forward for your implementation.

> 
> What do you think about this approach? I could draft a series for net-next..

That should be targetted to 'bpf' as a fix, still.

> I am just looking for different options other than using skb control block
> because I believe similar issues will arise in the future even if we fix the
> current one on ip_rcv..

Yes I agree. Heart says 'just fix the IP layer' but as you said, we never
know if other layer would wipe out the cb.

> 
> Thanks,
> Fernando.

      reply	other threads:[~2025-10-27 12:32 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-24 10:40 [PATCH net v2] xsk: avoid data corruption on cq descriptor number Fernando Fernandez Mancera
2025-10-24 17:16 ` Maciej Fijalkowski
2025-10-25 18:18   ` Fernando Fernandez Mancera
2025-10-27 12:32     ` Maciej Fijalkowski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aP9mQG4dRvPxUJhj@boxer \
    --to=maciej.fijalkowski@intel.com \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=csmate@nop.hu \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=fmancera@suse.de \
    --cc=horms@kernel.org \
    --cc=jonathan.lemon@gmail.com \
    --cc=kerneljasonxing@gmail.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sdf@fomichev.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.