* [PATCH net-next] xsk: skip validating skb list in xmit path
@ 2025-07-13 2:57 Jason Xing
2025-07-14 16:03 ` Stanislav Fomichev
0 siblings, 1 reply; 8+ messages in thread
From: Jason Xing @ 2025-07-13 2:57 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, joe, willemdebruijn.kernel
Cc: bpf, netdev, Jason Xing
From: Jason Xing <kernelxing@tencent.com>
For xsk, it's not needed to validate and check the skb in
validate_xmit_skb_list() in copy mode because xsk_build_skb() doesn't
and doesn't need to prepare those requisites to validate. Xsk is just
responsible for delivering raw data from userspace to the driver.
Skipping numerous checks somehow contributes to the transmission
especially in the extremely hot path.
Performance-wise, I used './xdpsock -i enp2s0f0np0 -t -S -s 64' to verify
the guess and then measured on the machine with ixgbe driver. It stably
goes up by 5.48%, which can be seen in the shown below:
Before:
sock0@enp2s0f0np0:0 txonly xdp-skb
pps pkts 1.00
rx 0 0
tx 1,187,410 3,513,536
After:
sock0@enp2s0f0np0:0 txonly xdp-skb
pps pkts 1.00
rx 0 0
tx 1,252,590 2,459,456
This patch also removes total ~4% consumption which can be observed
by perf:
|--2.97%--validate_xmit_skb
| |
| --1.76%--netif_skb_features
| |
| --0.65%--skb_network_protocol
|
|--1.06%--validate_xmit_xfrm
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
include/linux/netdevice.h | 4 ++--
net/core/dev.c | 10 ++++++----
net/xdp/xsk.c | 2 +-
3 files changed, 9 insertions(+), 7 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a80d21a14612..2df44c22406c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3351,7 +3351,7 @@ u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
struct net_device *sb_dev);
int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
-int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
+int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool validate);
static inline int dev_queue_xmit(struct sk_buff *skb)
{
@@ -3368,7 +3368,7 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
{
int ret;
- ret = __dev_direct_xmit(skb, queue_id);
+ ret = __dev_direct_xmit(skb, queue_id, true);
if (!dev_xmit_complete(ret))
kfree_skb(skb);
return ret;
diff --git a/net/core/dev.c b/net/core/dev.c
index e365b099484e..9fa805c26601 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4741,7 +4741,7 @@ int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
}
EXPORT_SYMBOL(__dev_queue_xmit);
-int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
+int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool validate)
{
struct net_device *dev = skb->dev;
struct sk_buff *orig_skb = skb;
@@ -4753,9 +4753,11 @@ int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
!netif_carrier_ok(dev)))
goto drop;
- skb = validate_xmit_skb_list(skb, dev, &again);
- if (skb != orig_skb)
- goto drop;
+ if (validate) {
+ skb = validate_xmit_skb_list(skb, dev, &again);
+ if (skb != orig_skb)
+ goto drop;
+ }
skb_set_queue_mapping(skb, queue_id);
txq = skb_get_tx_queue(dev, skb);
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 9c3acecc14b1..55278ad0a558 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -834,7 +834,7 @@ static int __xsk_generic_xmit(struct sock *sk)
continue;
}
- err = __dev_direct_xmit(skb, xs->queue_id);
+ err = __dev_direct_xmit(skb, xs->queue_id, false);
if (err == NETDEV_TX_BUSY) {
/* Tell user-space to retry the send */
xskq_cons_cancel_n(xs->tx, xsk_get_num_desc(skb));
--
2.41.3
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] xsk: skip validating skb list in xmit path
2025-07-13 2:57 [PATCH net-next] xsk: skip validating skb list in xmit path Jason Xing
@ 2025-07-14 16:03 ` Stanislav Fomichev
2025-07-14 23:53 ` Jason Xing
0 siblings, 1 reply; 8+ messages in thread
From: Stanislav Fomichev @ 2025-07-14 16:03 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
Jason Xing
On 07/13, Jason Xing wrote:
> From: Jason Xing <kernelxing@tencent.com>
>
> For xsk, it's not needed to validate and check the skb in
> validate_xmit_skb_list() in copy mode because xsk_build_skb() doesn't
> and doesn't need to prepare those requisites to validate. Xsk is just
> responsible for delivering raw data from userspace to the driver.
So the __dev_direct_xmit was taken out of af_packet in commit 865b03f21162
("dev: packet: make packet_direct_xmit a common function"). And a call
to validate_xmit_skb_list was added in 104ba78c9880 ("packet: on direct_xmit,
limit tso and csum to supported devices") to support TSO. Since we don't
support tso/vlan offloads in xsk_build_skb, removing validate_xmit_skb_list
seems fair.
Although, again, if you care about performance, why not use zerocopy
mode?
> Skipping numerous checks somehow contributes to the transmission
> especially in the extremely hot path.
>
> Performance-wise, I used './xdpsock -i enp2s0f0np0 -t -S -s 64' to verify
> the guess and then measured on the machine with ixgbe driver. It stably
> goes up by 5.48%, which can be seen in the shown below:
> Before:
> sock0@enp2s0f0np0:0 txonly xdp-skb
> pps pkts 1.00
> rx 0 0
> tx 1,187,410 3,513,536
> After:
> sock0@enp2s0f0np0:0 txonly xdp-skb
> pps pkts 1.00
> rx 0 0
> tx 1,252,590 2,459,456
>
> This patch also removes total ~4% consumption which can be observed
> by perf:
> |--2.97%--validate_xmit_skb
> | |
> | --1.76%--netif_skb_features
> | |
> | --0.65%--skb_network_protocol
> |
> |--1.06%--validate_xmit_xfrm
It is a bit surprising that mostly no-op validate_xmit_skb_list takes
4% of the cycles. netif_skb_features taking ~2%? Any idea why? Is
it unoptimized kernel? Which driver is it?
> Signed-off-by: Jason Xing <kernelxing@tencent.com>
> ---
> include/linux/netdevice.h | 4 ++--
> net/core/dev.c | 10 ++++++----
> net/xdp/xsk.c | 2 +-
> 3 files changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index a80d21a14612..2df44c22406c 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3351,7 +3351,7 @@ u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
> struct net_device *sb_dev);
>
> int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
> -int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
> +int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool validate);
>
> static inline int dev_queue_xmit(struct sk_buff *skb)
> {
> @@ -3368,7 +3368,7 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
> {
> int ret;
>
> - ret = __dev_direct_xmit(skb, queue_id);
> + ret = __dev_direct_xmit(skb, queue_id, true);
> if (!dev_xmit_complete(ret))
> kfree_skb(skb);
> return ret;
Implementation wise, will it be better if we move a call to validate_xmit_skb_list
from __dev_direct_xmit to dev_direct_xmit (and a few other callers of
__dev_direct_xmit)? This will avoid the new flag.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] xsk: skip validating skb list in xmit path
2025-07-14 16:03 ` Stanislav Fomichev
@ 2025-07-14 23:53 ` Jason Xing
2025-07-15 20:44 ` Stanislav Fomichev
2025-07-15 23:19 ` Jakub Kicinski
0 siblings, 2 replies; 8+ messages in thread
From: Jason Xing @ 2025-07-14 23:53 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
Jason Xing
On Tue, Jul 15, 2025 at 12:03 AM Stanislav Fomichev
<stfomichev@gmail.com> wrote:
>
> On 07/13, Jason Xing wrote:
> > From: Jason Xing <kernelxing@tencent.com>
> >
> > For xsk, it's not needed to validate and check the skb in
> > validate_xmit_skb_list() in copy mode because xsk_build_skb() doesn't
> > and doesn't need to prepare those requisites to validate. Xsk is just
> > responsible for delivering raw data from userspace to the driver.
>
> So the __dev_direct_xmit was taken out of af_packet in commit 865b03f21162
> ("dev: packet: make packet_direct_xmit a common function"). And a call
> to validate_xmit_skb_list was added in 104ba78c9880 ("packet: on direct_xmit,
> limit tso and csum to supported devices") to support TSO. Since we don't
> support tso/vlan offloads in xsk_build_skb, removing validate_xmit_skb_list
> seems fair.
Right, if you don't mind, I think I will copy&paste your words in the
next respin.
>
> Although, again, if you care about performance, why not use zerocopy
> mode?
I attached the performance impact because I'm working on the different
modes in xsk to see how it really behaves. You can take it as a kind
of investigation :)
I like zc mode, but the fact is that:
1) with ixgbe driver, my machine could totally lose connection as long
as the xsk tries to send packets. I'm still debugging it :(
2) some customers using virtio_net don't have a supported host, so
copy mode is the only one choice.
>
> > Skipping numerous checks somehow contributes to the transmission
> > especially in the extremely hot path.
> >
> > Performance-wise, I used './xdpsock -i enp2s0f0np0 -t -S -s 64' to verify
> > the guess and then measured on the machine with ixgbe driver. It stably
> > goes up by 5.48%, which can be seen in the shown below:
> > Before:
> > sock0@enp2s0f0np0:0 txonly xdp-skb
> > pps pkts 1.00
> > rx 0 0
> > tx 1,187,410 3,513,536
> > After:
> > sock0@enp2s0f0np0:0 txonly xdp-skb
> > pps pkts 1.00
> > rx 0 0
> > tx 1,252,590 2,459,456
> >
> > This patch also removes total ~4% consumption which can be observed
> > by perf:
> > |--2.97%--validate_xmit_skb
> > | |
> > | --1.76%--netif_skb_features
> > | |
> > | --0.65%--skb_network_protocol
> > |
> > |--1.06%--validate_xmit_xfrm
>
> It is a bit surprising that mostly no-op validate_xmit_skb_list takes
> 4% of the cycles. netif_skb_features taking ~2%? Any idea why? Is
> it unoptimized kernel? Which driver is it?
No idea on this one, sorry. I tested with different drivers (like
i40e) and it turned out to be nearly the same result.
One of my machines looks like this:
# lspci -vv | grep -i ether
02:00.0 Ethernet controller: Intel Corporation Ethernet Controller
10-Gigabit X540-AT2 (rev 01)
02:00.1 Ethernet controller: Intel Corporation Ethernet Controller
10-Gigabit X540-AT2 (rev 01)
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
BIOS Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
CPU @ 2.3GHz
BIOS CPU family: 179
CPU family: 6
Model: 63
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
Stepping: 2
CPU(s) scaling MHz: 96%
CPU max MHz: 3100.0000
CPU min MHz: 1200.0000
BogoMIPS: 4589.31
Flags: fpu vme de pse tsc msr pae mce cx8 apic
sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
ht tm pbe syscall nx pdpe1gb rdtscp lm constant_ts
c arch_perfmon pebs bts rep_good nopl
xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm p
cid dca sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault
epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow fl
expriority ept vpid ept_ad fsgsbase
tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc
cqm_occup_llc dtherm ida arat pln pts vnmi md_clear flush_l
1d
>
> > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > ---
> > include/linux/netdevice.h | 4 ++--
> > net/core/dev.c | 10 ++++++----
> > net/xdp/xsk.c | 2 +-
> > 3 files changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index a80d21a14612..2df44c22406c 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3351,7 +3351,7 @@ u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
> > struct net_device *sb_dev);
> >
> > int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
> > -int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
> > +int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool validate);
> >
> > static inline int dev_queue_xmit(struct sk_buff *skb)
> > {
> > @@ -3368,7 +3368,7 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
> > {
> > int ret;
> >
> > - ret = __dev_direct_xmit(skb, queue_id);
> > + ret = __dev_direct_xmit(skb, queue_id, true);
> > if (!dev_xmit_complete(ret))
> > kfree_skb(skb);
> > return ret;
>
> Implementation wise, will it be better if we move a call to validate_xmit_skb_list
> from __dev_direct_xmit to dev_direct_xmit (and a few other callers of
> __dev_direct_xmit)? This will avoid the new flag.
__dev_direct_xmit() helper was developed to serve the xsk type a few
years ago. For now it has only two callers. If we expect to avoid a
new parameter, we will also move the dev check[1] as below to the
callers of __dev_direct_xmit(). Then move validate_xmit_skb_list to
__dev_direct_xmit(). It's not that concise, I assume? I'm not sure if
I miss your point.
[1]
if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
goto drop;
Thanks,
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] xsk: skip validating skb list in xmit path
2025-07-14 23:53 ` Jason Xing
@ 2025-07-15 20:44 ` Stanislav Fomichev
2025-07-15 23:24 ` Jason Xing
2025-07-15 23:19 ` Jakub Kicinski
1 sibling, 1 reply; 8+ messages in thread
From: Stanislav Fomichev @ 2025-07-15 20:44 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
Jason Xing
On 07/15, Jason Xing wrote:
> On Tue, Jul 15, 2025 at 12:03 AM Stanislav Fomichev
> <stfomichev@gmail.com> wrote:
> >
> > On 07/13, Jason Xing wrote:
> > > From: Jason Xing <kernelxing@tencent.com>
> > >
> > > For xsk, it's not needed to validate and check the skb in
> > > validate_xmit_skb_list() in copy mode because xsk_build_skb() doesn't
> > > and doesn't need to prepare those requisites to validate. Xsk is just
> > > responsible for delivering raw data from userspace to the driver.
> >
> > So the __dev_direct_xmit was taken out of af_packet in commit 865b03f21162
> > ("dev: packet: make packet_direct_xmit a common function"). And a call
> > to validate_xmit_skb_list was added in 104ba78c9880 ("packet: on direct_xmit,
> > limit tso and csum to supported devices") to support TSO. Since we don't
> > support tso/vlan offloads in xsk_build_skb, removing validate_xmit_skb_list
> > seems fair.
>
> Right, if you don't mind, I think I will copy&paste your words in the
> next respin.
>
> >
> > Although, again, if you care about performance, why not use zerocopy
> > mode?
>
> I attached the performance impact because I'm working on the different
> modes in xsk to see how it really behaves. You can take it as a kind
> of investigation :)
>
> I like zc mode, but the fact is that:
> 1) with ixgbe driver, my machine could totally lose connection as long
> as the xsk tries to send packets. I'm still debugging it :(
> 2) some customers using virtio_net don't have a supported host, so
> copy mode is the only one choice.
>
> >
> > > Skipping numerous checks somehow contributes to the transmission
> > > especially in the extremely hot path.
> > >
> > > Performance-wise, I used './xdpsock -i enp2s0f0np0 -t -S -s 64' to verify
> > > the guess and then measured on the machine with ixgbe driver. It stably
> > > goes up by 5.48%, which can be seen in the shown below:
> > > Before:
> > > sock0@enp2s0f0np0:0 txonly xdp-skb
> > > pps pkts 1.00
> > > rx 0 0
> > > tx 1,187,410 3,513,536
> > > After:
> > > sock0@enp2s0f0np0:0 txonly xdp-skb
> > > pps pkts 1.00
> > > rx 0 0
> > > tx 1,252,590 2,459,456
> > >
> > > This patch also removes total ~4% consumption which can be observed
> > > by perf:
> > > |--2.97%--validate_xmit_skb
> > > | |
> > > | --1.76%--netif_skb_features
> > > | |
> > > | --0.65%--skb_network_protocol
> > > |
> > > |--1.06%--validate_xmit_xfrm
> >
> > It is a bit surprising that mostly no-op validate_xmit_skb_list takes
> > 4% of the cycles. netif_skb_features taking ~2%? Any idea why? Is
> > it unoptimized kernel? Which driver is it?
>
> No idea on this one, sorry. I tested with different drivers (like
> i40e) and it turned out to be nearly the same result.
I was trying to follow validate_xmit_skb_list, but too many things
happen in there. Although, without gso/vlan, most of these should
be no-op. Plus you have xfrm compiled in. So still surprising, let's see
if other people have any suggestions.
> One of my machines looks like this:
> # lspci -vv | grep -i ether
> 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller
> 10-Gigabit X540-AT2 (rev 01)
> 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller
> 10-Gigabit X540-AT2 (rev 01)
> # lscpu
> Architecture: x86_64
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 46 bits physical, 48 bits virtual
> Byte Order: Little Endian
> CPU(s): 48
> On-line CPU(s) list: 0-47
> Vendor ID: GenuineIntel
> BIOS Vendor ID: Intel(R) Corporation
> Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
> BIOS Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
> CPU @ 2.3GHz
> BIOS CPU family: 179
> CPU family: 6
> Model: 63
> Thread(s) per core: 2
> Core(s) per socket: 12
> Socket(s): 2
> Stepping: 2
> CPU(s) scaling MHz: 96%
> CPU max MHz: 3100.0000
> CPU min MHz: 1200.0000
> BogoMIPS: 4589.31
> Flags: fpu vme de pse tsc msr pae mce cx8 apic
> sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_ts
> c arch_perfmon pebs bts rep_good nopl
> xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm p
> cid dca sse4_1 sse4_2 x2apic movbe popcnt
> tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault
> epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow fl
> expriority ept vpid ept_ad fsgsbase
> tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc
> cqm_occup_llc dtherm ida arat pln pts vnmi md_clear flush_l
> 1d
>
> >
> > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > ---
> > > include/linux/netdevice.h | 4 ++--
> > > net/core/dev.c | 10 ++++++----
> > > net/xdp/xsk.c | 2 +-
> > > 3 files changed, 9 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index a80d21a14612..2df44c22406c 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -3351,7 +3351,7 @@ u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
> > > struct net_device *sb_dev);
> > >
> > > int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
> > > -int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
> > > +int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool validate);
> > >
> > > static inline int dev_queue_xmit(struct sk_buff *skb)
> > > {
> > > @@ -3368,7 +3368,7 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
> > > {
> > > int ret;
> > >
> > > - ret = __dev_direct_xmit(skb, queue_id);
> > > + ret = __dev_direct_xmit(skb, queue_id, true);
> > > if (!dev_xmit_complete(ret))
> > > kfree_skb(skb);
> > > return ret;
> >
> > Implementation wise, will it be better if we move a call to validate_xmit_skb_list
> > from __dev_direct_xmit to dev_direct_xmit (and a few other callers of
> > __dev_direct_xmit)? This will avoid the new flag.
>
> __dev_direct_xmit() helper was developed to serve the xsk type a few
> years ago. For now it has only two callers. If we expect to avoid a
> new parameter, we will also move the dev check[1] as below to the
> callers of __dev_direct_xmit(). Then move validate_xmit_skb_list to
> __dev_direct_xmit(). It's not that concise, I assume? I'm not sure if
> I miss your point.
>
> [1]
> if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
> goto drop;
We can keep the check in its original place. I don't think the order of
the checks matters? I was thinking something along these (untested)
lines. Avoids one conditional in each path (not that it matters)..
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e6131c529af4..36cdeef6a5e9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3367,8 +3367,15 @@ static inline int dev_queue_xmit_accel(struct sk_buff *skb,
static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
{
+ struct sk_buff *orig_skb = skb;
+ bool again = false;
int ret;
+ skb = validate_xmit_skb_list(skb, dev, &again);
+ if (skb != orig_skb) {
+ /* dropped_inc and kfree */
+ }
+
ret = __dev_direct_xmit(skb, queue_id);
if (!dev_xmit_complete(ret))
kfree_skb(skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 26253802f6cd..d3b9a75852fd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4744,19 +4744,13 @@ EXPORT_SYMBOL(__dev_queue_xmit);
int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
{
struct net_device *dev = skb->dev;
- struct sk_buff *orig_skb = skb;
struct netdev_queue *txq;
int ret = NETDEV_TX_BUSY;
- bool again = false;
if (unlikely(!netif_running(dev) ||
!netif_carrier_ok(dev)))
goto drop;
- skb = validate_xmit_skb_list(skb, dev, &again);
- if (skb != orig_skb)
- goto drop;
-
skb_set_queue_mapping(skb, queue_id);
txq = skb_get_tx_queue(dev, skb);
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] xsk: skip validating skb list in xmit path
2025-07-14 23:53 ` Jason Xing
2025-07-15 20:44 ` Stanislav Fomichev
@ 2025-07-15 23:19 ` Jakub Kicinski
2025-07-15 23:39 ` Jason Xing
1 sibling, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2025-07-15 23:19 UTC (permalink / raw)
To: Jason Xing
Cc: Stanislav Fomichev, davem, edumazet, pabeni, bjorn,
magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast,
daniel, hawk, john.fastabend, joe, willemdebruijn.kernel, bpf,
netdev, Jason Xing
On Tue, 15 Jul 2025 07:53:19 +0800 Jason Xing wrote:
> > Although, again, if you care about performance, why not use zerocopy
> > mode?
>
> I attached the performance impact because I'm working on the different
> modes in xsk to see how it really behaves. You can take it as a kind
> of investigation :)
How does the copy mode compare to a normal packet socket?
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] xsk: skip validating skb list in xmit path
2025-07-15 20:44 ` Stanislav Fomichev
@ 2025-07-15 23:24 ` Jason Xing
2025-07-16 21:40 ` Stanislav Fomichev
0 siblings, 1 reply; 8+ messages in thread
From: Jason Xing @ 2025-07-15 23:24 UTC (permalink / raw)
To: Stanislav Fomichev
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
Jason Xing
On Wed, Jul 16, 2025 at 4:44 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
>
> On 07/15, Jason Xing wrote:
> > On Tue, Jul 15, 2025 at 12:03 AM Stanislav Fomichev
> > <stfomichev@gmail.com> wrote:
> > >
> > > On 07/13, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@tencent.com>
> > > >
> > > > For xsk, it's not needed to validate and check the skb in
> > > > validate_xmit_skb_list() in copy mode because xsk_build_skb() doesn't
> > > > and doesn't need to prepare those requisites to validate. Xsk is just
> > > > responsible for delivering raw data from userspace to the driver.
> > >
> > > So the __dev_direct_xmit was taken out of af_packet in commit 865b03f21162
> > > ("dev: packet: make packet_direct_xmit a common function"). And a call
> > > to validate_xmit_skb_list was added in 104ba78c9880 ("packet: on direct_xmit,
> > > limit tso and csum to supported devices") to support TSO. Since we don't
> > > support tso/vlan offloads in xsk_build_skb, removing validate_xmit_skb_list
> > > seems fair.
> >
> > Right, if you don't mind, I think I will copy&paste your words in the
> > next respin.
> >
> > >
> > > Although, again, if you care about performance, why not use zerocopy
> > > mode?
> >
> > I attached the performance impact because I'm working on the different
> > modes in xsk to see how it really behaves. You can take it as a kind
> > of investigation :)
> >
> > I like zc mode, but the fact is that:
> > 1) with ixgbe driver, my machine could totally lose connection as long
> > as the xsk tries to send packets. I'm still debugging it :(
> > 2) some customers using virtio_net don't have a supported host, so
> > copy mode is the only one choice.
> >
> > >
> > > > Skipping numerous checks somehow contributes to the transmission
> > > > especially in the extremely hot path.
> > > >
> > > > Performance-wise, I used './xdpsock -i enp2s0f0np0 -t -S -s 64' to verify
> > > > the guess and then measured on the machine with ixgbe driver. It stably
> > > > goes up by 5.48%, which can be seen in the shown below:
> > > > Before:
> > > > sock0@enp2s0f0np0:0 txonly xdp-skb
> > > > pps pkts 1.00
> > > > rx 0 0
> > > > tx 1,187,410 3,513,536
> > > > After:
> > > > sock0@enp2s0f0np0:0 txonly xdp-skb
> > > > pps pkts 1.00
> > > > rx 0 0
> > > > tx 1,252,590 2,459,456
> > > >
> > > > This patch also removes total ~4% consumption which can be observed
> > > > by perf:
> > > > |--2.97%--validate_xmit_skb
> > > > | |
> > > > | --1.76%--netif_skb_features
> > > > | |
> > > > | --0.65%--skb_network_protocol
> > > > |
> > > > |--1.06%--validate_xmit_xfrm
> > >
> > > It is a bit surprising that mostly no-op validate_xmit_skb_list takes
> > > 4% of the cycles. netif_skb_features taking ~2%? Any idea why? Is
> > > it unoptimized kernel? Which driver is it?
> >
> > No idea on this one, sorry. I tested with different drivers (like
> > i40e) and it turned out to be nearly the same result.
>
> I was trying to follow validate_xmit_skb_list, but too many things
> happen in there. Although, without gso/vlan, most of these should
> be no-op. Plus you have xfrm compiled in. So still surprising, let's see
> if other people have any suggestions.
>
> > One of my machines looks like this:
> > # lspci -vv | grep -i ether
> > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller
> > 10-Gigabit X540-AT2 (rev 01)
> > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller
> > 10-Gigabit X540-AT2 (rev 01)
> > # lscpu
> > Architecture: x86_64
> > CPU op-mode(s): 32-bit, 64-bit
> > Address sizes: 46 bits physical, 48 bits virtual
> > Byte Order: Little Endian
> > CPU(s): 48
> > On-line CPU(s) list: 0-47
> > Vendor ID: GenuineIntel
> > BIOS Vendor ID: Intel(R) Corporation
> > Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
> > BIOS Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
> > CPU @ 2.3GHz
> > BIOS CPU family: 179
> > CPU family: 6
> > Model: 63
> > Thread(s) per core: 2
> > Core(s) per socket: 12
> > Socket(s): 2
> > Stepping: 2
> > CPU(s) scaling MHz: 96%
> > CPU max MHz: 3100.0000
> > CPU min MHz: 1200.0000
> > BogoMIPS: 4589.31
> > Flags: fpu vme de pse tsc msr pae mce cx8 apic
> > sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> > ht tm pbe syscall nx pdpe1gb rdtscp lm constant_ts
> > c arch_perfmon pebs bts rep_good nopl
> > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
> > ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm p
> > cid dca sse4_1 sse4_2 x2apic movbe popcnt
> > tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault
> > epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow fl
> > expriority ept vpid ept_ad fsgsbase
> > tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc
> > cqm_occup_llc dtherm ida arat pln pts vnmi md_clear flush_l
> > 1d
> >
> > >
> > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > ---
> > > > include/linux/netdevice.h | 4 ++--
> > > > net/core/dev.c | 10 ++++++----
> > > > net/xdp/xsk.c | 2 +-
> > > > 3 files changed, 9 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > index a80d21a14612..2df44c22406c 100644
> > > > --- a/include/linux/netdevice.h
> > > > +++ b/include/linux/netdevice.h
> > > > @@ -3351,7 +3351,7 @@ u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
> > > > struct net_device *sb_dev);
> > > >
> > > > int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
> > > > -int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
> > > > +int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool validate);
> > > >
> > > > static inline int dev_queue_xmit(struct sk_buff *skb)
> > > > {
> > > > @@ -3368,7 +3368,7 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
> > > > {
> > > > int ret;
> > > >
> > > > - ret = __dev_direct_xmit(skb, queue_id);
> > > > + ret = __dev_direct_xmit(skb, queue_id, true);
> > > > if (!dev_xmit_complete(ret))
> > > > kfree_skb(skb);
> > > > return ret;
> > >
> > > Implementation wise, will it be better if we move a call to validate_xmit_skb_list
> > > from __dev_direct_xmit to dev_direct_xmit (and a few other callers of
> > > __dev_direct_xmit)? This will avoid the new flag.
> >
> > __dev_direct_xmit() helper was developed to serve the xsk type a few
> > years ago. For now it has only two callers. If we expect to avoid a
> > new parameter, we will also move the dev check[1] as below to the
> > callers of __dev_direct_xmit(). Then move validate_xmit_skb_list to
> > __dev_direct_xmit(). It's not that concise, I assume? I'm not sure if
> > I miss your point.
> >
> > [1]
> > if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
> > goto drop;
>
> We can keep the check in its original place. I don't think the order of
> the checks matters? I was thinking something along these (untested)
> lines. Avoids one conditional in each path (not that it matters)..
Sorry, in my point of view, one additional flag is really not a big
deal even though it is on the hot path. A new flag doesn't cause any
side effects at all, does it? On the contrary, running on top of the
following code snippet, if the netdevice is down, dev_direct_xmit()
will take a lot of time to run validate_xmit_skb_list() before two
simple device if-statements.
Since the status of the patch was marked 'changed-requested', I will
send a v2 as you requested.
Thanks,
Jason
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index e6131c529af4..36cdeef6a5e9 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3367,8 +3367,15 @@ static inline int dev_queue_xmit_accel(struct sk_buff *skb,
>
> static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
> {
> + struct sk_buff *orig_skb = skb;
> + bool again = false;
> int ret;
>
> + skb = validate_xmit_skb_list(skb, dev, &again);
> + if (skb != orig_skb) {
> + /* dropped_inc and kfree */
> + }
> +
> ret = __dev_direct_xmit(skb, queue_id);
> if (!dev_xmit_complete(ret))
> kfree_skb(skb);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 26253802f6cd..d3b9a75852fd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4744,19 +4744,13 @@ EXPORT_SYMBOL(__dev_queue_xmit);
> int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
> {
> struct net_device *dev = skb->dev;
> - struct sk_buff *orig_skb = skb;
> struct netdev_queue *txq;
> int ret = NETDEV_TX_BUSY;
> - bool again = false;
>
> if (unlikely(!netif_running(dev) ||
> !netif_carrier_ok(dev)))
> goto drop;
>
> - skb = validate_xmit_skb_list(skb, dev, &again);
> - if (skb != orig_skb)
> - goto drop;
> -
> skb_set_queue_mapping(skb, queue_id);
> txq = skb_get_tx_queue(dev, skb);
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] xsk: skip validating skb list in xmit path
2025-07-15 23:19 ` Jakub Kicinski
@ 2025-07-15 23:39 ` Jason Xing
0 siblings, 0 replies; 8+ messages in thread
From: Jason Xing @ 2025-07-15 23:39 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Stanislav Fomichev, davem, edumazet, pabeni, bjorn,
magnus.karlsson, maciej.fijalkowski, jonathan.lemon, sdf, ast,
daniel, hawk, john.fastabend, joe, willemdebruijn.kernel, bpf,
netdev, Jason Xing
On Wed, Jul 16, 2025 at 7:19 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 15 Jul 2025 07:53:19 +0800 Jason Xing wrote:
> > > Although, again, if you care about performance, why not use zerocopy
> > > mode?
> >
> > I attached the performance impact because I'm working on the different
> > modes in xsk to see how it really behaves. You can take it as a kind
> > of investigation :)
>
> How does the copy mode compare to a normal packet socket?
We combine a TCP user-space stack that is not mature obviously with
this copy mode af_xdp in the virtual machine, seeing that in some
cases:
1) in the real workload the cpu% could be minimized a lot (almost
50%), which is one of factors we care about most of the time.
2) the throughput does not always outperform TCP in the kernel. So
far, only in request & response cases, we're able to see xsk ramp up
the transmission.
I'm focusing on the physical machine scenario in the meantime.
Everything went not well admittedly.
As I said, I'm still trying to find/test every possible feature in
xsk, hoping to find a final solution to deploy.
I'm also thinking if it's possible to 1) remove the sendto syscall
that is used to drive/notify the xsk, 2) prepare enough skbs
beforehand instead of allocating one and then freeing it over and over
again. Well, these so-called ideas are just out of thin air :p
Thanks,
Jason
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH net-next] xsk: skip validating skb list in xmit path
2025-07-15 23:24 ` Jason Xing
@ 2025-07-16 21:40 ` Stanislav Fomichev
0 siblings, 0 replies; 8+ messages in thread
From: Stanislav Fomichev @ 2025-07-16 21:40 UTC (permalink / raw)
To: Jason Xing
Cc: davem, edumazet, kuba, pabeni, bjorn, magnus.karlsson,
maciej.fijalkowski, jonathan.lemon, sdf, ast, daniel, hawk,
john.fastabend, joe, willemdebruijn.kernel, bpf, netdev,
Jason Xing
On 07/16, Jason Xing wrote:
> On Wed, Jul 16, 2025 at 4:44 AM Stanislav Fomichev <stfomichev@gmail.com> wrote:
> >
> > On 07/15, Jason Xing wrote:
> > > On Tue, Jul 15, 2025 at 12:03 AM Stanislav Fomichev
> > > <stfomichev@gmail.com> wrote:
> > > >
> > > > On 07/13, Jason Xing wrote:
> > > > > From: Jason Xing <kernelxing@tencent.com>
> > > > >
> > > > > For xsk, it's not needed to validate and check the skb in
> > > > > validate_xmit_skb_list() in copy mode because xsk_build_skb() doesn't
> > > > > and doesn't need to prepare those requisites to validate. Xsk is just
> > > > > responsible for delivering raw data from userspace to the driver.
> > > >
> > > > So the __dev_direct_xmit was taken out of af_packet in commit 865b03f21162
> > > > ("dev: packet: make packet_direct_xmit a common function"). And a call
> > > > to validate_xmit_skb_list was added in 104ba78c9880 ("packet: on direct_xmit,
> > > > limit tso and csum to supported devices") to support TSO. Since we don't
> > > > support tso/vlan offloads in xsk_build_skb, removing validate_xmit_skb_list
> > > > seems fair.
> > >
> > > Right, if you don't mind, I think I will copy&paste your words in the
> > > next respin.
> > >
> > > >
> > > > Although, again, if you care about performance, why not use zerocopy
> > > > mode?
> > >
> > > I attached the performance impact because I'm working on the different
> > > modes in xsk to see how it really behaves. You can take it as a kind
> > > of investigation :)
> > >
> > > I like zc mode, but the fact is that:
> > > 1) with ixgbe driver, my machine could totally lose connection as long
> > > as the xsk tries to send packets. I'm still debugging it :(
> > > 2) some customers using virtio_net don't have a supported host, so
> > > copy mode is the only one choice.
> > >
> > > >
> > > > > Skipping numerous checks somehow contributes to the transmission
> > > > > especially in the extremely hot path.
> > > > >
> > > > > Performance-wise, I used './xdpsock -i enp2s0f0np0 -t -S -s 64' to verify
> > > > > the guess and then measured on the machine with ixgbe driver. It stably
> > > > > goes up by 5.48%, which can be seen in the shown below:
> > > > > Before:
> > > > > sock0@enp2s0f0np0:0 txonly xdp-skb
> > > > > pps pkts 1.00
> > > > > rx 0 0
> > > > > tx 1,187,410 3,513,536
> > > > > After:
> > > > > sock0@enp2s0f0np0:0 txonly xdp-skb
> > > > > pps pkts 1.00
> > > > > rx 0 0
> > > > > tx 1,252,590 2,459,456
> > > > >
> > > > > This patch also removes total ~4% consumption which can be observed
> > > > > by perf:
> > > > > |--2.97%--validate_xmit_skb
> > > > > | |
> > > > > | --1.76%--netif_skb_features
> > > > > | |
> > > > > | --0.65%--skb_network_protocol
> > > > > |
> > > > > |--1.06%--validate_xmit_xfrm
> > > >
> > > > It is a bit surprising that mostly no-op validate_xmit_skb_list takes
> > > > 4% of the cycles. netif_skb_features taking ~2%? Any idea why? Is
> > > > it unoptimized kernel? Which driver is it?
> > >
> > > No idea on this one, sorry. I tested with different drivers (like
> > > i40e) and it turned out to be nearly the same result.
> >
> > I was trying to follow validate_xmit_skb_list, but too many things
> > happen in there. Although, without gso/vlan, most of these should
> > be no-op. Plus you have xfrm compiled in. So still surprising, let's see
> > if other people have any suggestions.
> >
> > > One of my machines looks like this:
> > > # lspci -vv | grep -i ether
> > > 02:00.0 Ethernet controller: Intel Corporation Ethernet Controller
> > > 10-Gigabit X540-AT2 (rev 01)
> > > 02:00.1 Ethernet controller: Intel Corporation Ethernet Controller
> > > 10-Gigabit X540-AT2 (rev 01)
> > > # lscpu
> > > Architecture: x86_64
> > > CPU op-mode(s): 32-bit, 64-bit
> > > Address sizes: 46 bits physical, 48 bits virtual
> > > Byte Order: Little Endian
> > > CPU(s): 48
> > > On-line CPU(s) list: 0-47
> > > Vendor ID: GenuineIntel
> > > BIOS Vendor ID: Intel(R) Corporation
> > > Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
> > > BIOS Model name: Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
> > > CPU @ 2.3GHz
> > > BIOS CPU family: 179
> > > CPU family: 6
> > > Model: 63
> > > Thread(s) per core: 2
> > > Core(s) per socket: 12
> > > Socket(s): 2
> > > Stepping: 2
> > > CPU(s) scaling MHz: 96%
> > > CPU max MHz: 3100.0000
> > > CPU min MHz: 1200.0000
> > > BogoMIPS: 4589.31
> > > Flags: fpu vme de pse tsc msr pae mce cx8 apic
> > > sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss
> > > ht tm pbe syscall nx pdpe1gb rdtscp lm constant_ts
> > > c arch_perfmon pebs bts rep_good nopl
> > > xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor
> > > ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm p
> > > cid dca sse4_1 sse4_2 x2apic movbe popcnt
> > > tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault
> > > epb intel_ppin ssbd ibrs ibpb stibp tpr_shadow fl
> > > expriority ept vpid ept_ad fsgsbase
> > > tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc
> > > cqm_occup_llc dtherm ida arat pln pts vnmi md_clear flush_l
> > > 1d
> > >
> > > >
> > > > > Signed-off-by: Jason Xing <kernelxing@tencent.com>
> > > > > ---
> > > > > include/linux/netdevice.h | 4 ++--
> > > > > net/core/dev.c | 10 ++++++----
> > > > > net/xdp/xsk.c | 2 +-
> > > > > 3 files changed, 9 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > > index a80d21a14612..2df44c22406c 100644
> > > > > --- a/include/linux/netdevice.h
> > > > > +++ b/include/linux/netdevice.h
> > > > > @@ -3351,7 +3351,7 @@ u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb,
> > > > > struct net_device *sb_dev);
> > > > >
> > > > > int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev);
> > > > > -int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id);
> > > > > +int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id, bool validate);
> > > > >
> > > > > static inline int dev_queue_xmit(struct sk_buff *skb)
> > > > > {
> > > > > @@ -3368,7 +3368,7 @@ static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
> > > > > {
> > > > > int ret;
> > > > >
> > > > > - ret = __dev_direct_xmit(skb, queue_id);
> > > > > + ret = __dev_direct_xmit(skb, queue_id, true);
> > > > > if (!dev_xmit_complete(ret))
> > > > > kfree_skb(skb);
> > > > > return ret;
> > > >
> > > > Implementation wise, will it be better if we move a call to validate_xmit_skb_list
> > > > from __dev_direct_xmit to dev_direct_xmit (and a few other callers of
> > > > __dev_direct_xmit)? This will avoid the new flag.
> > >
> > > __dev_direct_xmit() helper was developed to serve the xsk type a few
> > > years ago. For now it has only two callers. If we expect to avoid a
> > > new parameter, we will also move the dev check[1] as below to the
> > > callers of __dev_direct_xmit(). Then move validate_xmit_skb_list to
> > > __dev_direct_xmit(). It's not that concise, I assume? I'm not sure if
> > > I miss your point.
> > >
> > > [1]
> > > if (unlikely(!netif_running(dev) || !netif_carrier_ok(dev)))
> > > goto drop;
> >
> > We can keep the check in its original place. I don't think the order of
> > the checks matters? I was thinking something along these (untested)
> > lines. Avoids one conditional in each path (not that it matters)..
>
> Sorry, in my point of view, one additional flag is really not a big
> deal even though it is on the hot path. A new flag doesn't cause any
[..]
> side effects at all, does it? On the contrary, running on top of the
> following code snippet, if the netdevice is down, dev_direct_xmit()
> will take a lot of time to run validate_xmit_skb_list() before two
> simple device if-statements.
Not sure that's worth optimizing (unless someone actively complains). We want
to send the packets fast, not drop them fast :-)
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2025-07-16 21:40 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-13 2:57 [PATCH net-next] xsk: skip validating skb list in xmit path Jason Xing
2025-07-14 16:03 ` Stanislav Fomichev
2025-07-14 23:53 ` Jason Xing
2025-07-15 20:44 ` Stanislav Fomichev
2025-07-15 23:24 ` Jason Xing
2025-07-16 21:40 ` Stanislav Fomichev
2025-07-15 23:19 ` Jakub Kicinski
2025-07-15 23:39 ` Jason Xing
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).