From: Jason Xing <kerneljasonxing@gmail.com>
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com,
maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com,
sdf@fomichev.me, ast@kernel.org, daniel@iogearbox.net,
hawk@kernel.org, john.fastabend@gmail.com
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
Jason Xing <kernelxing@tencent.com>
Subject: [PATCH net-next v3] xsk: skip validating skb list in xmit path
Date: Tue, 25 Nov 2025 19:57:54 +0800
Message-ID: <20251125115754.46793-1-kerneljasonxing@gmail.com>
From: Jason Xing <kernelxing@tencent.com>
This patch does only one thing: it removes the validate_xmit_skb_list()
call from the xsk transmit path.
In copy mode, xsk has no need to validate and check the skb in
validate_xmit_skb_list(), because xsk_build_skb() neither prepares nor
needs to prepare the state that those checks inspect. Xsk is only
responsible for delivering raw data from userspace to the driver, which
is also how zerocopy mode works.
__dev_direct_xmit() was taken out of af_packet in commit 865b03f21162
("dev: packet: make packet_direct_xmit a common function"), and a call
to validate_xmit_skb_list() was added in commit 104ba78c9880 ("packet: on
direct_xmit, limit tso and csum to supported devices") to support TSO.
Since xsk_build_skb() doesn't support TSO/VLAN offloads, we can remove
validate_xmit_skb_list() for xsk. The full analysis is at the end of
the commit log[1].
Skipping these checks helps transmission, especially in an extremely
hot path running at, say, over 2,000,000 pps. In this kind of workload,
even trivial arithmetic adds measurable overhead.
Performance-wise, I ran './xdpsock -i enp2s0f0np0 -t -S -s 64' with the
ixgbe driver at 1Gb/sec to verify. The tx rate consistently goes up by
5.48%, as shown below:
Before:
 sock0@enp2s0f0np0:0 txonly xdp-skb
                   pps            pkts           1.00
rx                 0              0
tx                 1,187,410      3,513,536
After:
 sock0@enp2s0f0np0:0 txonly xdp-skb
                   pps            pkts           1.00
rx                 0              0
tx                 1,252,590      2,459,456
This patch also removes ~4% of overhead in total, which can be observed
with perf:
|--2.97%--validate_xmit_skb
|          |
|           --1.76%--netif_skb_features
|                     |
|                      --0.65%--skb_network_protocol
|
|--1.06%--validate_xmit_xfrm
The above result has been verified on different NICs, such as I40E,
where I saw the number go up by 4%.
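As a rough cross-check of the ixgbe figures above, the relative gain can be recomputed from the two tx rates, and the per-packet time budget at a given rate follows directly from the pps number. This is only a small user-space sketch; the helper names relative_gain_pct() and ns_per_packet() are made up for illustration and are not part of the patch or the kernel.

```c
#include <assert.h>

/* Hypothetical helper: relative change from `before` to `after`, in percent. */
static inline double relative_gain_pct(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Hypothetical helper: average time budget per packet, in nanoseconds. */
static inline double ns_per_packet(double pps)
{
	assert(pps > 0.0);
	return 1e9 / pps;
}
```

Calling relative_gain_pct(1187410, 1252590) gives roughly 5.5%, in line with the number quoted above, and ns_per_packet(2000000) gives a budget of 500 ns per packet, which is why even small per-packet checks show up in the profile.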
[1] - analysis of the validate_xmit_skb()
1. validate_xmit_unreadable_skb()
xsk doesn't initialize skb->unreadable, so the function will not free
the skb.
2. validate_xmit_vlan()
xsk also doesn't initialize skb->vlan_all.
3. sk_validate_xmit_skb()
skb from xsk_build_skb() doesn't have either sk_validate_xmit_skb or
sk_state, so the skb will not be validated.
4. netif_needs_gso()
af_xdp doesn't support gso/tso.
5. skb_needs_linearize() && __skb_linearize()
The skb never has a frag_list, so skb_has_frag_list() returns false.
In copy mode, the skb can carry additional data in frags[], as can be
seen in xsk_build_skb_zerocopy().
6. CHECKSUM_PARTIAL
skb doesn't have to set ip_summed, so we can skip this part as well.
7. validate_xmit_xfrm()
af_xdp has nothing to do with IPsec/XFRM, so we don't need this check
either.
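The seven points above can be summarized as a small user-space model: each check only has work to do when some piece of skb state is set, and the analysis argues that xsk_build_skb() sets none of them. The sketch below is not kernel code. struct model_skb, its fields, and the MODEL_CHECK_* flags are simplified stand-ins invented for illustration, and device feature bits are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the skb state named in the analysis. */
struct model_skb {
	bool unreadable;       /* 1. validate_xmit_unreadable_skb() */
	bool vlan_tag_present; /* 2. validate_xmit_vlan() */
	bool sk_has_validate;  /* 3. sk_validate_xmit_skb() */
	unsigned int gso_size; /* 4. netif_needs_gso() */
	bool has_frag_list;    /* 5. skb_needs_linearize() */
	bool csum_partial;     /* 6. CHECKSUM_PARTIAL */
	bool has_secpath;      /* 7. validate_xmit_xfrm() */
};

/* One bit per check from the list above (illustrative names). */
enum model_check {
	MODEL_CHECK_UNREADABLE = 1u << 0,
	MODEL_CHECK_VLAN       = 1u << 1,
	MODEL_CHECK_SK         = 1u << 2,
	MODEL_CHECK_GSO        = 1u << 3,
	MODEL_CHECK_LINEARIZE  = 1u << 4,
	MODEL_CHECK_CSUM       = 1u << 5,
	MODEL_CHECK_XFRM       = 1u << 6,
};

/* Returns a mask of the checks that would have real work to do for this skb. */
static inline unsigned int model_checks_with_work(const struct model_skb *skb)
{
	unsigned int mask = 0;

	assert(skb);
	if (skb->unreadable)
		mask |= MODEL_CHECK_UNREADABLE;
	if (skb->vlan_tag_present)
		mask |= MODEL_CHECK_VLAN;
	if (skb->sk_has_validate)
		mask |= MODEL_CHECK_SK;
	if (skb->gso_size)
		mask |= MODEL_CHECK_GSO;
	if (skb->has_frag_list)
		mask |= MODEL_CHECK_LINEARIZE;
	if (skb->csum_partial)
		mask |= MODEL_CHECK_CSUM;
	if (skb->has_secpath)
		mask |= MODEL_CHECK_XFRM;
	return mask;
}

/* Model of an skb as the analysis describes xsk_build_skb() output: nothing set. */
static inline struct model_skb model_xsk_skb(void)
{
	struct model_skb skb = {0};

	return skb;
}
```

In this model, model_checks_with_work() returns 0 for the xsk-style skb, while an skb with gso_size and csum_partial set (as a TSO sender would produce) reports the GSO and checksum checks as still needed.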
Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
V3
Link: https://lore.kernel.org/all/20250716122725.6088-1-kerneljasonxing@gmail.com/
1. add a full analysis about why we can remove validation in af_xdp
2. I didn't add Stan's acked-by since it has been a while.
V2
Link: https://lore.kernel.org/all/20250713025756.24601-1-kerneljasonxing@gmail.com/
1. avoid adding a new flag
2. add more descriptions from Stan
---
 include/linux/netdevice.h | 30 ++++++++++++++++++++----------
 net/core/dev.c            |  6 ------
 2 files changed, 20 insertions(+), 16 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e808071dbb7d..cafeb06b523d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3374,16 +3374,6 @@ static inline int dev_queue_xmit_accel(struct sk_buff *skb,
 	return __dev_queue_xmit(skb, sb_dev);
 }
 
-static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
-{
-	int ret;
-
-	ret = __dev_direct_xmit(skb, queue_id);
-	if (!dev_xmit_complete(ret))
-		kfree_skb(skb);
-	return ret;
-}
-
 int register_netdevice(struct net_device *dev);
 void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
 void unregister_netdevice_many(struct list_head *head);
@@ -4343,6 +4333,26 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev,
 	return 0;
 }
 
+static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
+{
+	struct net_device *dev = skb->dev;
+	struct sk_buff *orig_skb = skb;
+	bool again = false;
+	int ret;
+
+	skb = validate_xmit_skb_list(skb, dev, &again);
+	if (skb != orig_skb) {
+		dev_core_stats_tx_dropped_inc(dev);
+		kfree_skb_list(skb);
+		return NET_XMIT_DROP;
+	}
+
+	ret = __dev_direct_xmit(skb, queue_id);
+	if (!dev_xmit_complete(ret))
+		kfree_skb(skb);
+	return ret;
+}
+
 bool dev_nit_active_rcu(const struct net_device *dev);
 static inline bool dev_nit_active(const struct net_device *dev)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 69515edd17bc..82d5d098464f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4843,19 +4843,13 @@ EXPORT_SYMBOL(__dev_queue_xmit);
 int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
 {
 	struct net_device *dev = skb->dev;
-	struct sk_buff *orig_skb = skb;
 	struct netdev_queue *txq;
 	int ret = NETDEV_TX_BUSY;
-	bool again = false;
 
 	if (unlikely(!netif_running(dev) ||
 		     !netif_carrier_ok(dev)))
 		goto drop;
 
-	skb = validate_xmit_skb_list(skb, dev, &again);
-	if (skb != orig_skb)
-		goto drop;
-
 	skb_set_queue_mapping(skb, queue_id);
 	txq = skb_get_tx_queue(dev, skb);
 
--
2.41.3
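The control flow that the diff sets up can be illustrated with a small user-space mock: validation moves out of the inner transmit function and into the inline wrapper, so only callers that go through the wrapper pay for it. This sketch assumes, as the diff suggests, that xsk calls __dev_direct_xmit() directly while other users go through dev_direct_xmit(). All model_* names and the `valid`/`sent` fields are hypothetical and exist only for this illustration; none of them are kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: just enough state to trace the flow. */
struct model_skb {
	bool valid; /* would the mock validation accept it? */
	bool sent;  /* did it reach the mock transmit step? */
};

enum { MODEL_XMIT_OK = 0, MODEL_XMIT_DROP = 1 };

/* Counts how often the mock validation step runs. */
static int model_validate_calls;

/* Mock of the validation step: same pointer on success, NULL on rejection. */
static inline struct model_skb *model_validate(struct model_skb *skb)
{
	model_validate_calls++;
	return skb->valid ? skb : NULL;
}

/* Mirrors __dev_direct_xmit() after the patch: transmit without validating. */
static inline int model_direct_xmit_novalidate(struct model_skb *skb)
{
	assert(skb);
	skb->sent = true;
	return MODEL_XMIT_OK;
}

/* Mirrors dev_direct_xmit() after the patch: validate first, then transmit. */
static inline int model_direct_xmit(struct model_skb *skb)
{
	struct model_skb *orig = skb;

	skb = model_validate(skb);
	if (skb != orig)
		return MODEL_XMIT_DROP; /* validation replaced or rejected the skb */
	return model_direct_xmit_novalidate(skb);
}
```

In this mock, a caller of model_direct_xmit() still has rejected packets dropped before transmit, while a caller of model_direct_xmit_novalidate() never increments model_validate_calls, which is the saving the patch is after.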