netdev.vger.kernel.org archive mirror
From: Jason Xing <kerneljasonxing@gmail.com>
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, bjorn@kernel.org, magnus.karlsson@intel.com,
	maciej.fijalkowski@intel.com, jonathan.lemon@gmail.com,
	sdf@fomichev.me, ast@kernel.org, daniel@iogearbox.net,
	hawk@kernel.org, john.fastabend@gmail.com
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
	Jason Xing <kernelxing@tencent.com>
Subject: [PATCH net-next v3] xsk: skip validating skb list in xmit path
Date: Tue, 25 Nov 2025 19:57:54 +0800
Message-ID: <20251125115754.46793-1-kerneljasonxing@gmail.com>

From: Jason Xing <kernelxing@tencent.com>

This patch does one thing: it removes the validate_xmit_skb_list() call
from the xsk transmit path.

In copy mode, xsk does not need validate_xmit_skb_list() to validate and
check the skb, because xsk_build_skb() neither prepares nor needs to
prepare the state that validation inspects. Xsk is only responsible for
delivering raw data from userspace to the driver, which is also how
zerocopy mode works.

__dev_direct_xmit() was taken out of af_packet in commit 865b03f21162
("dev: packet: make packet_direct_xmit a common function"). A call to
validate_xmit_skb_list() was added in commit 104ba78c9880 ("packet: on
direct_xmit, limit tso and csum to supported devices") to support TSO.
Since xsk_build_skb() does not support TSO/VLAN offloads, the
validate_xmit_skb_list() call can be removed for xsk. The full analysis
is at the end of the commit log [1].

Skipping these checks helps transmission, especially in an extremely hot
path, say, over 2,000,000 pps. Under this kind of workload, even trivial
arithmetic adds measurable overhead.

Performance-wise, I ran './xdpsock -i enp2s0f0np0 -t -S -s 64' with the
ixgbe driver at 1Gb/sec to verify. Throughput consistently goes up by
5.48%, as shown below:
Before:
 sock0@enp2s0f0np0:0 txonly xdp-skb
                   pps            pkts           1.00
rx                 0              0
tx                 1,187,410      3,513,536
After:
 sock0@enp2s0f0np0:0 txonly xdp-skb
                   pps            pkts           1.00
rx                 0              0
tx                 1,252,590      2,459,456

The patch also removes about 4% of CPU consumption in total, as observed
with perf:
|--2.97%--validate_xmit_skb
|          |
|           --1.76%--netif_skb_features
|                     |
|                      --0.65%--skb_network_protocol
|
|--1.06%--validate_xmit_xfrm

The result has also been verified on other NICs, such as I40E, where the
pps number goes up by about 4%.

[1] - analysis of validate_xmit_skb()
1. validate_xmit_unreadable_skb()
   xsk never sets skb->unreadable, so this function will not free the
   skb.
2. validate_xmit_vlan()
   xsk never sets skb->vlan_all either.
3. sk_validate_xmit_skb()
   An skb from xsk_build_skb() has neither sk_validate_xmit_skb nor
   sk_state, so the skb will not be validated.
4. netif_needs_gso()
   af_xdp doesn't support GSO/TSO.
5. skb_needs_linearize() && __skb_linearize()
   The skb never has a frag_list, so skb_has_frag_list() returns false.
   In copy mode, the skb can carry additional data in frags[], as can be
   seen in xsk_build_skb_zerocopy().
6. CHECKSUM_PARTIAL
   xsk doesn't set ip_summed, so this part can be skipped as well.
7. validate_xmit_xfrm()
   af_xdp has nothing to do with IPsec/XFRM, so this check is not needed
   either.

Signed-off-by: Jason Xing <kernelxing@tencent.com>
---
V3
Link: https://lore.kernel.org/all/20250716122725.6088-1-kerneljasonxing@gmail.com/
1. add a full analysis about why we can remove validation in af_xdp
2. I didn't add Stan's acked-by since it has been a while.

V2
Link: https://lore.kernel.org/all/20250713025756.24601-1-kerneljasonxing@gmail.com/
1. avoid adding a new flag
2. add more descriptions from Stan
---
 include/linux/netdevice.h | 30 ++++++++++++++++++++----------
 net/core/dev.c            |  6 ------
 2 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e808071dbb7d..cafeb06b523d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3374,16 +3374,6 @@ static inline int dev_queue_xmit_accel(struct sk_buff *skb,
 	return __dev_queue_xmit(skb, sb_dev);
 }
 
-static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
-{
-	int ret;
-
-	ret = __dev_direct_xmit(skb, queue_id);
-	if (!dev_xmit_complete(ret))
-		kfree_skb(skb);
-	return ret;
-}
-
 int register_netdevice(struct net_device *dev);
 void unregister_netdevice_queue(struct net_device *dev, struct list_head *head);
 void unregister_netdevice_many(struct list_head *head);
@@ -4343,6 +4333,26 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev,
 	return 0;
 }
 
+static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
+{
+	struct net_device *dev = skb->dev;
+	struct sk_buff *orig_skb = skb;
+	bool again = false;
+	int ret;
+
+	skb = validate_xmit_skb_list(skb, dev, &again);
+	if (skb != orig_skb) {
+		dev_core_stats_tx_dropped_inc(dev);
+		kfree_skb_list(skb);
+		return NET_XMIT_DROP;
+	}
+
+	ret = __dev_direct_xmit(skb, queue_id);
+	if (!dev_xmit_complete(ret))
+		kfree_skb(skb);
+	return ret;
+}
+
 bool dev_nit_active_rcu(const struct net_device *dev);
 static inline bool dev_nit_active(const struct net_device *dev)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 69515edd17bc..82d5d098464f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4843,19 +4843,13 @@ EXPORT_SYMBOL(__dev_queue_xmit);
 int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
 {
 	struct net_device *dev = skb->dev;
-	struct sk_buff *orig_skb = skb;
 	struct netdev_queue *txq;
 	int ret = NETDEV_TX_BUSY;
-	bool again = false;
 
 	if (unlikely(!netif_running(dev) ||
 		     !netif_carrier_ok(dev)))
 		goto drop;
 
-	skb = validate_xmit_skb_list(skb, dev, &again);
-	if (skb != orig_skb)
-		goto drop;
-
 	skb_set_queue_mapping(skb, queue_id);
 	txq = skb_get_tx_queue(dev, skb);
 
-- 
2.41.3

