* [PATCH net-next v3] xsk: skip validating skb list in xmit path
@ 2025-11-25 11:57 Jason Xing
2025-11-27 12:02 ` Paolo Abeni
0 siblings, 1 reply; 7+ messages in thread
From: Jason Xing @ 2025-11-25 11:57 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
This patch does one thing: it removes validate_xmit_skb_list() for xsk.
For xsk in copy mode, there is no need to validate and check the skb in
validate_xmit_skb_list(), because xsk_build_skb() neither sets up nor
needs the state those checks look at. Xsk is only responsible for
delivering raw data from userspace to the driver, which is also how
zerocopy mode works.
The __dev_direct_xmit was taken out of af_packet in commit 865b03f21162
("dev: packet: make packet_direct_xmit a common function"). And a call
to validate_xmit_skb_list was added in commit 104ba78c9880 ("packet: on
direct_xmit, limit tso and csum to supported devices") to support TSO.
Since we don't support tso/vlan offloads in xsk_build_skb, we can remove
validate_xmit_skb_list for xsk. I put the full analysis at the end of
the commit log[1].
Skipping these checks speeds up transmission, especially in the extremely
hot path, say, over 2,000,000 pps. In that kind of workload, even trivial
arithmetic can add measurable overhead.
Performance-wise, I ran './xdpsock -i enp2s0f0np0 -t -S -s 64' on a
1Gb/sec ixgbe NIC to verify. Throughput stably goes up by 5.48%, as
shown below:
Before:
sock0@enp2s0f0np0:0 txonly xdp-skb
pps pkts 1.00
rx 0 0
tx 1,187,410 3,513,536
After:
sock0@enp2s0f0np0:0 txonly xdp-skb
pps pkts 1.00
rx 0 0
tx 1,252,590 2,459,456
This patch also removes total ~4% consumption which can be observed
by perf:
|--2.97%--validate_xmit_skb
| |
| --1.76%--netif_skb_features
| |
| --0.65%--skb_network_protocol
|
|--1.06%--validate_xmit_xfrm
The above result has been verified on different NICs, like I40E, where I
saw the number go up by around 4%.
[1] - analysis of the validate_xmit_skb()
1. validate_xmit_unreadable_skb()
xsk doesn't initialize skb->unreadable, so the function will not free
the skb.
2. validate_xmit_vlan()
xsk also doesn't initialize skb->vlan_all.
3. sk_validate_xmit_skb()
The skb from xsk_build_skb() has neither sk_validate_xmit_skb nor the
relevant sk_state set, so it will not be validated.
4. netif_needs_gso()
af_xdp doesn't support gso/tso.
5. skb_needs_linearize() && __skb_linearize()
The skb never has a frag_list, so skb_has_frag_list() returns false. In
copy mode, the skb can carry extra data in frags[], as can be seen in
xsk_build_skb_zerocopy().
6. CHECKSUM_PARTIAL
The skb doesn't set ip_summed, so we can skip this part as well.
7. validate_xmit_xfrm()
af_xdp has nothing to do with IPsec/XFRM, so we don't need this check
either.
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
V3
Link: https://lore.kernel.org/all/20250716122725.6088-1-kerneljasonxing@gmail.com/
1. add a full analysis about why we can remove validation in af_xdp
2. I didn't add Stan's acked-by since it has been a while.
V2
Link: https://lore.kernel.org/all/20250713025756.24601-1-kerneljasonxing@gmail.com/
1. avoid adding a new flag
2. add more descriptions from Stan
---
include/linux/netdevice.h | 30 ++++++++++++++++++++----------
net/core/dev.c | 6 ------
2 files changed, 20 insertions(+), 16 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e808071dbb7d..cafeb06b523d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3374,16 +3374,6 @@ static inline int dev_queue_xmit_accel(struct sk_buff *skb,
return __dev_queue_xmit(skb, sb_dev);
}
-static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
-{
- int ret;
-
- ret = __dev_direct_xmit(skb, queue_id);
- if (!dev_xmit_complete(ret))
- kfree_skb(skb);
- return ret;
-}
-
int register_netdevice(struct net_device *dev);
void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
void unregister_netdevice_many(struct list_head *head);
@@ -4343,6 +4333,26 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev,
return 0;
}
+static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
+{
+ struct net_device *dev = skb->dev;
+ struct sk_buff *orig_skb = skb;
+ bool again = false;
+ int ret;
+
+ skb = validate_xmit_skb_list(skb, dev, &again);
+ if (skb != orig_skb) {
+ dev_core_stats_tx_dropped_inc(dev);
+ kfree_skb_list(skb);
+ return NET_XMIT_DROP;
+ }
+
+ ret = __dev_direct_xmit(skb, queue_id);
+ if (!dev_xmit_complete(ret))
+ kfree_skb(skb);
+ return ret;
+}
+
bool dev_nit_active_rcu(const struct net_device *dev);
static inline bool dev_nit_active(const struct net_device *dev)
{
diff --git a/net/core/dev.c b/net/core/dev.c
index 69515edd17bc..82d5d098464f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4843,19 +4843,13 @@ EXPORT_SYMBOL(__dev_queue_xmit);
int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
{
struct net_device *dev = skb->dev;
- struct sk_buff *orig_skb = skb;
struct netdev_queue *txq;
int ret = NETDEV_TX_BUSY;
- bool again = false;
if (unlikely(!netif_running(dev) ||
!netif_carrier_ok(dev)))
goto drop;
- skb = validate_xmit_skb_list(skb, dev, &again);
- if (skb != orig_skb)
- goto drop;
-
skb_set_queue_mapping(skb, queue_id);
txq = skb_get_tx_queue(dev, skb);
--
2.41.3
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path
2025-11-25 11:57 [PATCH net-next v3] xsk: skip validating skb list in xmit path Jason Xing
@ 2025-11-27 12:02 ` Paolo Abeni
2025-11-27 12:49 ` Jason Xing
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2025-11-27 12:02 UTC (permalink / raw)
To: Jason Xing, davem, edumazet, kuba, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend
Cc: bpf, netdev, Jason Xing
On 11/25/25 12:57 PM, Jason Xing wrote:
> This patch also removes total ~4% consumption which can be observed
> by perf:
> |--2.97%--validate_xmit_skb
> | |
> | --1.76%--netif_skb_features
> | |
> | --0.65%--skb_network_protocol
> |
> |--1.06%--validate_xmit_xfrm
>
> The above result has been verified on different NICs, like I40E. I
> managed to see the number is going up by 4%.
I must admit this delta is surprising, and does not fit my experience in
slightly different scenarios with the plain UDP TX path.
> [1] - analysis of the validate_xmit_skb()
> 1. validate_xmit_unreadable_skb()
> xsk doesn't initialize skb->unreadable, so the function will not free
> the skb.
> 2. validate_xmit_vlan()
> xsk also doesn't initialize skb->vlan_all.
> 3. sk_validate_xmit_skb()
> skb from xsk_build_skb() doesn't have either sk_validate_xmit_skb or
> sk_state, so the skb will not be validated.
> 4. netif_needs_gso()
> af_xdp doesn't support gso/tso.
> 5. skb_needs_linearize() && __skb_linearize()
> skb doesn't have frag_list as always, so skb_has_frag_list() returns
> false. In copy mode, skb can put more data in the frags[] that can be
> found in xsk_build_skb_zerocopy().
I'm not sure I parse this last sentence correctly, could you please
re-phrase?
I read it as: the xsk xmit path could build an skb with nr_frags > 0.
That in turn will need validation from
validate_xmit_skb()/skb_needs_linearize() depending on the egress device
(lack of NETIF_F_SG), regardless of any other offload required.
/P
* Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path
2025-11-27 12:02 ` Paolo Abeni
@ 2025-11-27 12:49 ` Jason Xing
2025-11-27 17:58 ` Paolo Abeni
0 siblings, 1 reply; 7+ messages in thread
From: Jason Xing @ 2025-11-27 12:49 UTC (permalink / raw)
To: Paolo Abeni
Cc: davem, edumazet, kuba, bjorn, magnus.karlsson, maciej.fijalkowski,
jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
netdev, Jason Xing
On Thu, Nov 27, 2025 at 8:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 11/25/25 12:57 PM, Jason Xing wrote:
> > This patch also removes total ~4% consumption which can be observed
> > by perf:
> > |--2.97%--validate_xmit_skb
> > | |
> > | --1.76%--netif_skb_features
> > | |
> > | --0.65%--skb_network_protocol
> > |
> > |--1.06%--validate_xmit_xfrm
> >
> > The above result has been verified on different NICs, like I40E. I
> > managed to see the number is going up by 4%.
>
> I must admit this delta is surprising, and does not fit my experience in
> slightly different scenarios with the plain UDP TX path.
My take is that when the path is extremely hot, even simple arithmetic
can cause unexpected overhead. You can see the pps is now over
2,000,000. I say this because I've done a few similar tests to verify
this thought.
>
> > [1] - analysis of the validate_xmit_skb()
> > 1. validate_xmit_unreadable_skb()
> > xsk doesn't initialize skb->unreadable, so the function will not free
> > the skb.
> > 2. validate_xmit_vlan()
> > xsk also doesn't initialize skb->vlan_all.
> > 3. sk_validate_xmit_skb()
> > skb from xsk_build_skb() doesn't have either sk_validate_xmit_skb or
> > sk_state, so the skb will not be validated.
> > 4. netif_needs_gso()
> > af_xdp doesn't support gso/tso.
> > 5. skb_needs_linearize() && __skb_linearize()
> > skb doesn't have frag_list as always, so skb_has_frag_list() returns
> > false. In copy mode, skb can put more data in the frags[] that can be
> > found in xsk_build_skb_zerocopy().
>
> I'm not sure I parse this last sentence correctly, could you please
> re-phrase?
>
> I read it as: the xsk xmit path could build an skb with nr_frags > 0.
> That in turn will need validation from
> validate_xmit_skb()/skb_needs_linearize() depending on the egress device
> (lack of NETIF_F_SG), regardless of any other offload required.
There are two paths where the allocation of frags happen:
1) xsk_build_skb() -> xsk_build_skb_zerocopy() -> skb_fill_page_desc()
-> shinfo->frags[i]
2) xsk_build_skb() -> skb_add_rx_frag() -> ... -> shinfo->frags[i]
Neither of them touches skb->frag_list, which means frag_list is NULL.
IIUC, there is no place where frag_list is used (which I actually
tested). We can see skb_needs_linearize() checks skb_has_frag_list()
first, so it will not proceed after seeing it return false.
Does it make sense to you, I wonder?
Thanks,
Jason
* Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path
2025-11-27 12:49 ` Jason Xing
@ 2025-11-27 17:58 ` Paolo Abeni
2025-11-28 1:44 ` Jason Xing
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2025-11-27 17:58 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, bjorn, magnus.karlsson, maciej.fijalkowski,
jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
netdev, Jason Xing
[-- Attachment #1: Type: text/plain, Size: 3317 bytes --]
On 11/27/25 1:49 PM, Jason Xing wrote:
> On Thu, Nov 27, 2025 at 8:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 11/25/25 12:57 PM, Jason Xing wrote:
>>> This patch also removes total ~4% consumption which can be observed
>>> by perf:
>>> |--2.97%--validate_xmit_skb
>>> | |
>>> | --1.76%--netif_skb_features
>>> | |
>>> | --0.65%--skb_network_protocol
>>> |
>>> |--1.06%--validate_xmit_xfrm
>>>
>>> The above result has been verified on different NICs, like I40E. I
>>> managed to see the number is going up by 4%.
>>
>> I must admit this delta is surprising, and does not fit my experience in
>> slightly different scenarios with the plain UDP TX path.
>
> My take is that when the path is extremely hot, even the mathematics
> calculation could cause unexpected overhead. You can see the pps is
> now over 2,000,000. The reason why I say this is because I've done a
> few similar tests to verify this thought.
Uhm... 2M is not that huge. Prior to the H/W vulnerability fallout
(spectre and friends), reasonably good H/W (from ~2016) could do ~2Mpps
with a single plain UDP socket.
Also validate_xmit_xfrm() should be basically a no-op, possibly some bad
luck with icache?
Could you please try the attached patch instead?
Should not be as good as skipping the whole validation but should give
some measurable gain.
>>> [1] - analysis of the validate_xmit_skb()
>>> 1. validate_xmit_unreadable_skb()
>>> xsk doesn't initialize skb->unreadable, so the function will not free
>>> the skb.
>>> 2. validate_xmit_vlan()
>>> xsk also doesn't initialize skb->vlan_all.
>>> 3. sk_validate_xmit_skb()
>>> skb from xsk_build_skb() doesn't have either sk_validate_xmit_skb or
>>> sk_state, so the skb will not be validated.
>>> 4. netif_needs_gso()
>>> af_xdp doesn't support gso/tso.
>>> 5. skb_needs_linearize() && __skb_linearize()
>>> skb doesn't have frag_list as always, so skb_has_frag_list() returns
>>> false. In copy mode, skb can put more data in the frags[] that can be
>>> found in xsk_build_skb_zerocopy().
>>
>> I'm not sure I parse this last sentence correctly, could you please
>> re-phrase?
>>
>> I read it as: the xsk xmit path could build an skb with nr_frags > 0.
>> That in turn will need validation from
>> validate_xmit_skb()/skb_needs_linearize() depending on the egress device
>> (lack of NETIF_F_SG), regardless of any other offload required.
>
> There are two paths where the allocation of frags happen:
> 1) xsk_build_skb() -> xsk_build_skb_zerocopy() -> skb_fill_page_desc()
> -> shinfo->frags[i]
> 2) xsk_build_skb() -> skb_add_rx_frag() -> ... -> shinfo->frags[i]
>
> Neither of them touch skb->frag_list, which means frag_list is NULL.
> IIUC, there is no place where frag_list is used (which actually I
> tested). we can see skb_needs_linearize() needs to check
> skb_has_frag_list() first, so it will not proceed after seeing it
> return false.
https://elixir.bootlin.com/linux/v6.18-rc7/source/include/linux/skbuff.h#L4322
return skb_is_nonlinear(skb) &&
((skb_has_frag_list(skb) && !(features & NETIF_F_FRAGLIST)) ||
(skb_shinfo(skb)->nr_frags && !(features & NETIF_F_SG)));
can return true even if `!skb_has_frag_list(skb)`.
I think you still need to call validate_xmit_skb()
/P
[-- Attachment #2: sec_path.patch --]
[-- Type: text/x-patch, Size: 383 bytes --]
diff --git a/net/core/dev.c b/net/core/dev.c
index 9094c0fb8c68..39516a5766e5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4030,7 +4030,8 @@ static struct sk_buff *validate_xmit_skb(struct sk_buff *skb, struct net_device
}
}
- skb = validate_xmit_xfrm(skb, features, again);
+	if (skb_sec_path(skb))
+		skb = validate_xmit_xfrm(skb, features, again);
return skb;
* Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path
2025-11-27 17:58 ` Paolo Abeni
@ 2025-11-28 1:44 ` Jason Xing
2025-11-28 8:40 ` Paolo Abeni
0 siblings, 1 reply; 7+ messages in thread
From: Jason Xing @ 2025-11-28 1:44 UTC (permalink / raw)
To: Paolo Abeni
Cc: davem, edumazet, kuba, bjorn, magnus.karlsson, maciej.fijalkowski,
jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
netdev, Jason Xing
On Fri, Nov 28, 2025 at 1:58 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 11/27/25 1:49 PM, Jason Xing wrote:
> > On Thu, Nov 27, 2025 at 8:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 11/25/25 12:57 PM, Jason Xing wrote:
> >>> This patch also removes total ~4% consumption which can be observed
> >>> by perf:
> >>> |--2.97%--validate_xmit_skb
> >>> | |
> >>> | --1.76%--netif_skb_features
> >>> | |
> >>> | --0.65%--skb_network_protocol
> >>> |
> >>> |--1.06%--validate_xmit_xfrm
> >>>
> >>> The above result has been verified on different NICs, like I40E. I
> >>> managed to see the number is going up by 4%.
> >>
> >> I must admit this delta is surprising, and does not fit my experience in
> >> slightly different scenarios with the plain UDP TX path.
> >
> > My take is that when the path is extremely hot, even the mathematics
> > calculation could cause unexpected overhead. You can see the pps is
> > now over 2,000,000. The reason why I say this is because I've done a
> > few similar tests to verify this thought.
>
> Uhm... 2M is not that huge. Prior to the H/W vulnerability fallout
> (spectre and friends) reasonable good H/W (2016 old) could do ~2Mpps
> with a single plain UDP socket.
Interesting number that I'm not aware of. Thanks.
But for now it's really hard for xsk (in copy mode) to reach over 2M
pps even with some recent optimizations applied. I wonder how you test
UDP? Could you share the benchmark here?
IMHO, xsk should not be slower than a plain UDP socket, so I think
there is huge room for xsk to improve...
>
> Also validate_xmit_xfrm() should be basically a no-op, possibly some bad
> luck with icache?
Maybe. I strongly feel that I need to work on the layout of those structures.
>
> Could you please try the attached patch instead?
Yep, and I didn't manage to see any improvement.
>
> Should not be as good as skipping the whole validation but should give
> some measurable gain.
> >>> [1] - analysis of the validate_xmit_skb()
> >>> 1. validate_xmit_unreadable_skb()
> >>> xsk doesn't initialize skb->unreadable, so the function will not free
> >>> the skb.
> >>> 2. validate_xmit_vlan()
> >>> xsk also doesn't initialize skb->vlan_all.
> >>> 3. sk_validate_xmit_skb()
> >>> skb from xsk_build_skb() doesn't have either sk_validate_xmit_skb or
> >>> sk_state, so the skb will not be validated.
> >>> 4. netif_needs_gso()
> >>> af_xdp doesn't support gso/tso.
> >>> 5. skb_needs_linearize() && __skb_linearize()
> >>> skb doesn't have frag_list as always, so skb_has_frag_list() returns
> >>> false. In copy mode, skb can put more data in the frags[] that can be
> >>> found in xsk_build_skb_zerocopy().
> >>
> >> I'm not sure I parse this last sentence correctly, could you please
> >> re-phrase?
> >>
> >> I read it as: the xsk xmit path could build an skb with nr_frags > 0.
> >> That in turn will need validation from
> >> validate_xmit_skb()/skb_needs_linearize() depending on the egress device
> >> (lack of NETIF_F_SG), regardless of any other offload required.
> >
> > There are two paths where the allocation of frags happen:
> > 1) xsk_build_skb() -> xsk_build_skb_zerocopy() -> skb_fill_page_desc()
> > -> shinfo->frags[i]
> > 2) xsk_build_skb() -> skb_add_rx_frag() -> ... -> shinfo->frags[i]
> >
> > Neither of them touch skb->frag_list, which means frag_list is NULL.
> > IIUC, there is no place where frag_list is used (which actually I
> > tested). we can see skb_needs_linearize() needs to check
> > skb_has_frag_list() first, so it will not proceed after seeing it
> > return false.
> https://elixir.bootlin.com/linux/v6.18-rc7/source/include/linux/skbuff.h#L4322
>
> return skb_is_nonlinear(skb) &&
> ((skb_has_frag_list(skb) && !(features & NETIF_F_FRAGLIST)) ||
> (skb_shinfo(skb)->nr_frags && !(features & NETIF_F_SG)));
>
> can return true even if `!skb_has_frag_list(skb)`.
Oh well, indeed, I missed the nr_frags condition.
> I think you still need to call validate_xmit_skb()
I can simplify the whole logic into an xsk-only variant, keeping just
the linearization check, since that is the only branch xsk could run
into.
Thanks,
Jason
* Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path
2025-11-28 1:44 ` Jason Xing
@ 2025-11-28 8:40 ` Paolo Abeni
2025-11-28 12:59 ` Jason Xing
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Abeni @ 2025-11-28 8:40 UTC (permalink / raw)
To: Jason Xing, edumazet
Cc: davem, kuba, bjorn, magnus.karlsson, maciej.fijalkowski,
jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
netdev, Jason Xing
On 11/28/25 2:44 AM, Jason Xing wrote:
> On Fri, Nov 28, 2025 at 1:58 AM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 11/27/25 1:49 PM, Jason Xing wrote:
>>> On Thu, Nov 27, 2025 at 8:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>> On 11/25/25 12:57 PM, Jason Xing wrote:
>>>>> This patch also removes total ~4% consumption which can be observed
>>>>> by perf:
>>>>> |--2.97%--validate_xmit_skb
>>>>> | |
>>>>> | --1.76%--netif_skb_features
>>>>> | |
>>>>> | --0.65%--skb_network_protocol
>>>>> |
>>>>> |--1.06%--validate_xmit_xfrm
>>>>>
>>>>> The above result has been verified on different NICs, like I40E. I
>>>>> managed to see the number is going up by 4%.
>>>>
>>>> I must admit this delta is surprising, and does not fit my experience in
>>>> slightly different scenarios with the plain UDP TX path.
>>>
>>> My take is that when the path is extremely hot, even the mathematics
>>> calculation could cause unexpected overhead. You can see the pps is
>>> now over 2,000,000. The reason why I say this is because I've done a
>>> few similar tests to verify this thought.
>>
>> Uhm... 2M is not that huge. Prior to the H/W vulnerability fallout
>> (spectre and friends) reasonable good H/W (2016 old) could do ~2Mpps
>> with a single plain UDP socket.
>
> Interesting number that I'm not aware of. Thanks.
>
> But for now it's really hard for xsk (in copy mode) to reach over 2M
> pps even with some recent optimizations applied. I wonder how you test
> UDP? Could you share the benchmark here?
>
> IMHO, xsk should not be slower than a plain UDP socket. So I think it
> should be a huge room for xsk to improve...
I can agree with that. Do you have baseline UDP figures for your H/W?
>> Also validate_xmit_xfrm() should be basically a no-op, possibly some bad
>> luck with icache?
>
> Maybe. I strongly feel that I need to work on the layout of those structures.
>>
>> Could you please try the attached patch instead?
>
> Yep, and I didn't manage to see any improvement.
That is unexpected. At very least that 1% due to validate_xmit_xfrm()
should go away. Could you please share the exact perf command line you
are using? Sometimes I see weird artifacts in perf reports that go away
adding the ":ppp" modifier on the command line, i.e.:
perf record -ag cycles:ppp <workload>
>> I think you still need to call validate_xmit_skb()
>
> I can simplify the whole logic as much as possible that is only
> suitable for xsk: only keeping the linear check. That is the only
> place that xsk could run into.
What about checksum offload? If I read correctly xsk could build
CSUM_PARTIAL skbs, and they will need skb_csum_hwoffload_help().
Generally speaking if validate_xmit_skb() takes a relevant slice of time
for frequently generated traffic, I guess we should try to optimize it.
@Eric: if you have the data handy, do you see validate_xmit_skb() as a
relevant cost in your UDP xmit tests?
Thanks,
Paolo
* Re: [PATCH net-next v3] xsk: skip validating skb list in xmit path
2025-11-28 8:40 ` Paolo Abeni
@ 2025-11-28 12:59 ` Jason Xing
0 siblings, 0 replies; 7+ messages in thread
From: Jason Xing @ 2025-11-28 12:59 UTC (permalink / raw)
To: Paolo Abeni
Cc: edumazet, davem, kuba, bjorn, magnus.karlsson, maciej.fijalkowski,
jonathan.lemon, sdf, ast, daniel, hawk, john.fastabend, bpf,
netdev, Jason Xing
On Fri, Nov 28, 2025 at 4:40 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 11/28/25 2:44 AM, Jason Xing wrote:
> > On Fri, Nov 28, 2025 at 1:58 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >> On 11/27/25 1:49 PM, Jason Xing wrote:
> >>> On Thu, Nov 27, 2025 at 8:02 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>>> On 11/25/25 12:57 PM, Jason Xing wrote:
> >>>>> This patch also removes total ~4% consumption which can be observed
> >>>>> by perf:
> >>>>> |--2.97%--validate_xmit_skb
> >>>>> | |
> >>>>> | --1.76%--netif_skb_features
> >>>>> | |
> >>>>> | --0.65%--skb_network_protocol
> >>>>> |
> >>>>> |--1.06%--validate_xmit_xfrm
> >>>>>
> >>>>> The above result has been verified on different NICs, like I40E. I
> >>>>> managed to see the number is going up by 4%.
> >>>>
> >>>> I must admit this delta is surprising, and does not fit my experience in
> >>>> slightly different scenarios with the plain UDP TX path.
> >>>
> >>> My take is that when the path is extremely hot, even the mathematics
> >>> calculation could cause unexpected overhead. You can see the pps is
> >>> now over 2,000,000. The reason why I say this is because I've done a
> >>> few similar tests to verify this thought.
> >>
> >> Uhm... 2M is not that huge. Prior to the H/W vulnerability fallout
> >> (spectre and friends) reasonable good H/W (2016 old) could do ~2Mpps
> >> with a single plain UDP socket.
> >
> > Interesting number that I'm not aware of. Thanks.
> >
> > But for now it's really hard for xsk (in copy mode) to reach over 2M
> > pps even with some recent optimizations applied. I wonder how you test
> > UDP? Could you share the benchmark here?
> >
> > IMHO, xsk should not be slower than a plain UDP socket. So I think it
> > should be a huge room for xsk to improve...
>
> I can agree with that. Do you have baseline UDP figures for your H/W?
No, sorry. So I'm going to figure out how to test UDP the way xdpsock
does. I think netperf/iperf should be fine?
>
> >> Also validate_xmit_xfrm() should be basically a no-op, possibly some bad
> >> luck with icache?
> >
> > Maybe. I strongly feel that I need to work on the layout of those structures.
> >>
> >> Could you please try the attached patch instead?
> >
> > Yep, and I didn't manage to see any improvement.
>
> That is unexpected. At very least that 1% due to validate_xmit_xfrm()
Ah, I finally realize why you asked about xfrm. The perf graph I
provided in the log was generated on my VM a few months ago, while the
test I did today ran on a physical server. The one thing common to both
setups is validate_xmit_skb() introducing additional overhead.
> should go away. Could you please share the exact perf command line you
> are using? Sometimes I see weird artifacts in perf reports that go away
> adding the ":ppp" modifier on the command line, i.e.:
>
> perf record -ag cycles:ppp <workload>
I will try this one :)
>
> >> I think you still need to call validate_xmit_skb()
> >
> > I can simplify the whole logic as much as possible that is only
> > suitable for xsk: only keeping the linear check. That is the only
> > place that xsk could run into.
> What about checksum offload? If I read correctly xsk could build
> CSUM_PARTIAL skbs, and they will need skb_csum_hwoffload_help().
Thanks for the reminder. What you said pushed me to go through all the
details again as thoroughly as I could. Apparently I missed the
xsk_skb_metadata() function, as I have never used it before.
>
> Generally speaking if validate_xmit_skb() takes a relevant slice of time
> for frequently generated traffic, I guess we should try to optimize it.
I agree on this since I can definitely see the overhead through
perf[1] on every machine I own.
[1] perf record -g -p <pid> -- sleep 10
>
> @Eric: if you have the data handy, do you see validate_xmit_skb() as a
> relevant cost in your UDP xmit tests?
>
> Thanks,
>
> Paolo
>