From: Paolo Abeni <pabeni@redhat.com>
To: Jason Xing <kerneljasonxing@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
bjorn@kernel.org, magnus.karlsson@intel.com,
maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com,
sdf@fomichev.me, ast@kernel.org, daniel@iogearbox.net,
hawk@kernel.org, john.fastabend@gmail.com, bpf@vger.kernel.org,
netdev@vger.kernel.org, Jason Xing <kernelxing@tencent.com>
Subject: Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path
Date: Thu, 27 Nov 2025 18:58:18 +0100
Message-ID: <f8d6dbe0-b213-4990-a8af-2f95d25d21be@redhat.com>
In-Reply-To: <CAL+tcoDdntkJ8SFaqjPvkJoCDwiitqsCNeFUq7CYa_fajPQL4A@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3317 bytes --]
On 11/27/25 1:49 PM, Jason Xing wrote:
> On Thu, Nov 27, 2025 at 8:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 11/25/25 12:57 PM, Jason Xing wrote:
>>> This patch also removes total ~4% consumption which can be observed
>>> by perf:
>>> |--2.97%--validate_xmit_skb
>>> | |
>>> | --1.76%--netif_skb_features
>>> | |
>>> | --0.65%--skb_network_protocol
>>> |
>>> |--1.06%--validate_xmit_xfrm
>>>
>>> The above result has been verified on different NICs, like I40E. I
>>> managed to see the number is going up by 4%.
>>
>> I must admit this delta is surprising, and does not fit my experience in
>> slightly different scenarios with the plain UDP TX path.
>
> My take is that when the path is extremely hot, even the mathematics
> calculation could cause unexpected overhead. You can see the pps is
> now over 2,000,000. The reason why I say this is because I've done a
> few similar tests to verify this thought.
Uhm... 2M is not that huge. Prior to the H/W vulnerability fallout
(Spectre and friends), reasonably good H/W (2016-era) could do ~2Mpps
with a single plain UDP socket.
Also validate_xmit_xfrm() should be basically a no-op; possibly some bad
luck with icache?
Could you please try the attached patch instead?
It should not be as good as skipping the whole validation, but it should
give some measurable gain.
>>> [1] - analysis of the validate_xmit_skb()
>>> 1. validate_xmit_unreadable_skb()
>>> xsk doesn't initialize skb->unreadable, so the function will not free
>>> the skb.
>>> 2. validate_xmit_vlan()
>>> xsk also doesn't initialize skb->vlan_all.
>>> 3. sk_validate_xmit_skb()
>>> skb from xsk_build_skb() doesn't have either sk_validate_xmit_skb or
>>> sk_state, so the skb will not be validated.
>>> 4. netif_needs_gso()
>>> af_xdp doesn't support gso/tso.
>>> 5. skb_needs_linearize() && __skb_linearize()
>>> skb doesn't have frag_list as always, so skb_has_frag_list() returns
>>> false. In copy mode, skb can put more data in the frags[] that can be
>>> found in xsk_build_skb_zerocopy().
>>
>> I'm not sure I parse this last sentence correctly, could you please
>> re-phrase?
>>
>> I read it as: the xsk xmit path could build an skb with nr_frags > 0.
>> That in turn will need validation from
>> validate_xmit_skb()/skb_needs_linearize() depending on the egress device
>> (lack of NETIF_F_SG), regardless of any other offload required.
>
> There are two paths where the allocation of frags happens:
> 1) xsk_build_skb() -> xsk_build_skb_zerocopy() -> skb_fill_page_desc()
> -> shinfo->frags[i]
> 2) xsk_build_skb() -> skb_add_rx_frag() -> ... -> shinfo->frags[i]
>
> Neither of them touches skb->frag_list, which means frag_list is NULL.
> IIUC, there is no place where frag_list is used (which I actually
> tested). We can see skb_needs_linearize() needs to check
> skb_has_frag_list() first, so it will not proceed after seeing it
> return false.
https://elixir.bootlin.com/linux/v6.18-rc7/source/include/linux/skbuff.h#L4322
	return skb_is_nonlinear(skb) &&
	       ((skb_has_frag_list(skb) && !(features & NETIF_F_FRAGLIST)) ||
		(skb_shinfo(skb)->nr_frags && !(features & NETIF_F_SG)));
can return true even if `!skb_has_frag_list(skb)`: the second clause
matches whenever nr_frags > 0 and the device lacks NETIF_F_SG.
I think you still need to call validate_xmit_skb().
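To make the point concrete, here is a minimal user-space sketch of the
predicate quoted above. It is not kernel code: struct model_skb, the
MODEL_F_* bits and the model_*() helpers are hypothetical stand-ins that I
made up for illustration, and the bit values are arbitrary. Only the shape
of the boolean expression is taken from skb_needs_linearize().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for NETIF_F_SG / NETIF_F_FRAGLIST (arbitrary values). */
#define MODEL_F_SG       (1u << 0)
#define MODEL_F_FRAGLIST (1u << 1)

/* Hypothetical, heavily reduced model of the skb fields the check looks at. */
struct model_skb {
	unsigned int data_len;  /* bytes held outside the linear area */
	unsigned int nr_frags;  /* models skb_shinfo(skb)->nr_frags */
	bool has_frag_list;     /* models skb_has_frag_list(skb) */
};

/* Models skb_is_nonlinear(): true when some data lives outside the head. */
static bool model_is_nonlinear(const struct model_skb *skb)
{
	return skb->data_len != 0;
}

/* Same boolean structure as the skb_needs_linearize() expression above. */
static bool model_needs_linearize(const struct model_skb *skb,
				  unsigned int features)
{
	return model_is_nonlinear(skb) &&
	       ((skb->has_frag_list && !(features & MODEL_F_FRAGLIST)) ||
		(skb->nr_frags && !(features & MODEL_F_SG)));
}
```

With an skb that carries one frag and no frag_list, the model returns true
when the feature mask lacks MODEL_F_SG and false once MODEL_F_SG is set,
which is the case the frag_list argument does not cover.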
/P
[-- Attachment #2: sec_path.patch --]
[-- Type: text/x-patch, Size: 383 bytes --]
diff --git a/net/core/dev.c b/net/core/dev.c
index 9094c0fb8c68..39516a5766e5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4030,7 +4030,8 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
 		}
 	}
-	skb = validate_xmit_xfrm(skb, features, again);
+	if (skb_sec_path(skb))
+		skb = validate_xmit_xfrm(skb, features, again);
 	return skb;