From: Jesper Dangaard Brouer <hawk@kernel.org>
To: "Jason Xing" <kerneljasonxing@gmail.com>,
"Karlsson, Magnus" <magnus.karlsson@intel.com>,
"Björn Töpel" <bjorn@kernel.org>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com,
maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com,
sdf@fomichev.me, ast@kernel.org, daniel@iogearbox.net,
john.fastabend@gmail.com, horms@kernel.org,
andrew+netdev@lunn.ch, bpf@vger.kernel.org,
netdev@vger.kernel.org, Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next 2/2] xsk: support generic batch xmit in copy mode
Date: Thu, 21 Aug 2025 19:29:43 +0200 [thread overview]
Message-ID: <599598da-5453-4cd9-b19d-ca7935985030@kernel.org> (raw)
In-Reply-To: <CAL+tcoBWOUCd8f1Q6BYh+xuKs5=Qgr2oOBb9CLU_6BrasD0vfg@mail.gmail.com>
I need some help from Cc Magnus or Björn, to explain why you changes
fails in xsk_destruct_skb().
On 15/08/2025 08.44, Jason Xing wrote:
> On Tue, Aug 12, 2025 at 10:30 PM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>>
> ...
>>
>> But this also requires changing the SKB alloc function used by
>> xsk_build_skb(). As a seperate patch, I recommend that you change the
>> sock_alloc_send_skb() to instead use build_skb (or build_skb_around).
>> I expect this will be a large performance improvement on it's own.
>> Can I ask you to benchmark this change before the batch xmit change?
>>
>> Opinions needed from other maintainers please (I might be wrong!):
>> I don't think the socket level accounting done in sock_alloc_send_skb()
>> is correct/relevant for AF_XDP/XSK, because the "backpressure mechanism"
>> code comment above.
>
> Here I'm bringing back the last test you expected to know :)
>
> I use alloc_skb() to replace sock_alloc_send_skb() and introduce other
> minor changes, say, removing sock_wfree() from xsk_destruct_skb(). It
> turns out to be a stable 5% performance improvement on i40e driver.
> slight improvement on virtio_net. That's good news.
>
> Bad news is that the above logic has bugs like freeing skb in the napi
> poll causes accessing skb->sk in xsk_destruct_skb() which triggers a
> NULL pointer issue. How did I spot this one? I removed the BQL flow
> control and started two xdpsock on different queues, then I saw a
> panic[1]... To solve the problem like that, I'm afraid that we still
> need to charge a certain length value into sk_wmem_alloc so that
> sock_wfree(skb) can be the last one to free the socket finally.
>
> So this socket level accounting mechanism keeps its safety in the above case.
>
> IMHO, we can get rid of the limitation of sk_sndbuf but still use
> skb_set_owner_w() that charges the len of skb. If we stick to removing
> the whole accounting function, probably we have to adjust the position
> of xsk_cq_submit_locked(), but I reckon for now it's not practical...
>
> Any thoughts on this?
>
> [1]
> 997 [ 133.528449] RIP: 0010:xsk_destruct_skb+0x6a/0x90
> 998 [ 133.528920] Code: 8b 6c 02 28 48 8b 43 18 4c 8b a0 68 03 00 00
> 49 8d 9c 24 e8 00 00 00 48 89 df e8 f1 eb 06 00 48 89 c6 49 8b 84 24
> 88 00 00 00 <48> 8b 50 10 03 2a 48 8b 40 10 48 89 df 89 28 5b 5d
> 41 5c e9 6e ec
> 999 [ 133.530526] RSP: 0018:ffffae71c06a0d08 EFLAGS: 00010046
> 1000 [ 133.531005] RAX: 0000000000000000 RBX: ffff9f42c81c49e8 RCX:
> 00000000000002e7
> 1001 [ 133.531631] RDX: 0000000000000001 RSI: 0000000000000286 RDI:
> ffff9f42c81c49e8
> 1002 [ 133.532249] RBP: 0000000000000001 R08: 0000000000000008 R09:
> 00000000000000001003 [ 133.532867] R10: ffffffff978080c0 R11:
> ffffae71c06a0ff8 R12: ffff9f42c81c4900
> 1004 [ 133.533491] R13: ffffae71c06a0d88 R14: ffff9f42e0f1f900 R15:
> ffff9f42ce850d801005 [ 133.534123] FS: 0000000000000000(0000)
> GS:ffff9f5227655000(0000) knlGS:00000000000000001006 [ 133.534831]
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 1007 [ 133.535366] CR2: 0000000000000010 CR3: 000000011c820000 CR4:
> 00000000003506f0
> 1008 [ 133.536014] Call Trace:
> 1009 [ 133.536313] <IRQ>
> 1010 [ 133.536583] skb_release_head_state+0x20/0x90
> 1011 [ 133.537021] napi_consume_skb+0x42/0x120
> 1012 [ 133.537429] __free_old_xmit+0x76/0x170 [virtio_net]
> 1013 [ 133.537923] free_old_xmit+0x53/0xc0 [virtio_net]
> 1014 [ 133.538395] virtnet_poll+0xed/0x5d0 [virtio_net]
> 1015 [ 133.538867] ? blake2s_compress+0x52/0xa0
> 1016 [ 133.539286] __napi_poll+0x28/0x200
> 1017 [ 133.539668] net_rx_action+0x319/0x400
> 1018 [ 133.540068] ? sched_clock_cpu+0xb/0x190
> 1019 [ 133.540482] ? __run_timers+0x1d1/0x260
> 1020 [ 133.540906] ? __pfx_dl_task_timer+0x10/0x10
> 1021 [ 133.541349] ? lock_timer_base+0x72/0x90
> 1022 [ 133.541767] handle_softirqs+0xce/0x2e0
> 1023 [ 133.542178] __irq_exit_rcu+0xc6/0xf0
> 1024 [ 133.542575] common_interrupt+0x81/0xa0
>
> Thanks,
> Jason
next prev parent reply other threads:[~2025-08-21 17:29 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-11 13:12 [PATCH net-next 0/2] xsk: improvement performance in copy mode Jason Xing
2025-08-11 13:12 ` [PATCH net-next 1/2] xsk: introduce XDP_GENERIC_XMIT_BATCH setsockopt Jason Xing
2025-08-12 16:40 ` Maciej Fijalkowski
2025-08-12 23:46 ` Jason Xing
2025-08-11 13:12 ` [PATCH net-next 2/2] xsk: support generic batch xmit in copy mode Jason Xing
2025-08-12 14:30 ` Jesper Dangaard Brouer
2025-08-12 17:49 ` Maciej Fijalkowski
2025-08-13 1:02 ` Jason Xing
2025-08-13 13:06 ` Jason Xing
2025-08-15 16:40 ` Jesper Dangaard Brouer
2025-08-16 0:03 ` Jason Xing
2025-08-16 13:42 ` Jason Xing
2025-08-13 0:57 ` Jason Xing
2025-08-15 6:44 ` Jason Xing
2025-08-21 17:29 ` Jesper Dangaard Brouer [this message]
2025-08-22 1:13 ` Jason Xing
-- strict thread matches above, loose matches on Subject: below --
2025-08-17 3:05 [PATCH net-next 1/2] xsk: introduce XDP_GENERIC_XMIT_BATCH setsockopt kernel test robot
2025-08-18 6:20 ` Dan Carpenter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=599598da-5453-4cd9-b19d-ca7935985030@kernel.org \
--to=hawk@kernel.org \
--cc=andrew+netdev@lunn.ch \
--cc=ast@kernel.org \
--cc=bjorn@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=jonathan.lemon@gmail.com \
--cc=kerneljasonxing@gmail.com \
--cc=kernelxing@tencent.com \
--cc=kuba@kernel.org \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sdf@fomichev.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.