From: Jason Xing <kerneljasonxing@gmail.com>
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org
Cc: netdev@vger.kernel.org, Jason Xing <kerneljasonxing@gmail.com>
Subject: [PATCH net-next] net: add sysctl to toggle napi_consume_skb() alien skb defer
Date: Thu, 26 Mar 2026 22:42:49 +0800
Message-ID: <20260326144249.97213-1-kerneljasonxing@gmail.com>

Commit e20dfbad8aab ("net: fix napi_consume_skb() with alien skbs")
defers freeing of alien SKBs (alloc_cpu != current CPU) via
skb_attempt_defer_free() on the TX completion path. This reduces
cross-NUMA SLUB spinlock contention and improves multi-queue UDP
workloads.

However, this unconditionally impacts the napi_skb_cache fast recycle
path for single-flow / few-flow workloads (e.g. AF_XDP benchmarks[1]):
when the TX completion NAPI CPU differs from the SKB allocation CPU,
SKBs are deferred instead of being returned to the local napi_skb_cache,
forcing RX allocations back to the slow slab path.

Setting the existing net.core.skb_defer_max to 0 would disable this, but
it is a global switch that also disables the defer mechanism in the
TCP/UDP/MPTCP recvmsg paths, losing its SLUB locality benefits there.
Since AF_XDP can co-exist with other protocols, reusing
skb_defer_disable_key is not an option. Besides, with the defer path
disabled, TCP/UDP/MPTCP in process context would free skbs directly,
disabling and re-enabling bottom halves each time (in
kfree_skb_napi_cache()), which could affect others. So this patch
leaves that path untouched.

Add a dedicated sysctl, net.core.napi_consume_skb_defer, backed by a
static key to selectively control the alien skb defer feature. It is
enabled by default, so users can turn it off when that better fits
their workload.

This patch also avoids touching the local_bh_disable()/local_bh_enable()
pair (in kfree_skb_napi_cache()) to minimize the overhead.

[1]: taskset -c 0 ./xdpsock -i enp2s0f1 -q 1 -t -S -s 64
1) sysctl -w net.core.napi_consume_skb_defer=1 (the default)
 sock0@enp2s0f1:1 txonly xdp-skb
                   pps            pkts           1.00
rx                 0              0
tx                 1,851,950      20,397,952

2) sysctl -w net.core.napi_consume_skb_defer=0
 sock0@enp2s0f1:1 txonly xdp-skb
                   pps            pkts           1.00
rx                 0              0
tx                 1,985,067      25,530,432

In this AF_XDP scenario, disabling the defer improves TX throughput by
around 6.6%.

Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
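For reviewers, the decision this patch changes can be modeled in user
space. The sketch below is only an illustration under stated
assumptions: struct model_skb, enum free_path, model_napi_consume_skb()
and model_sysctl_write() are hypothetical names that do not exist in
the kernel, a plain bool stands in for the static key, and everything
napi_consume_skb() does outside the alien skb check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for napi_consume_skb_defer_key; default on. */
static bool napi_consume_skb_defer = true;

/* Hypothetical, reduced view of the skb state the check reads. */
struct model_skb {
	int alloc_cpu;	/* CPU that allocated the skb */
	bool shared;	/* stands in for skb_shared(skb) */
};

enum free_path {
	FREE_PATH_DEFER,	/* queued for the allocating CPU */
	FREE_PATH_LOCAL,	/* stays on the local free/recycle path */
};

/* Mirrors the condition in the skbuff.c hunk: defer only when the
 * knob is on, the skb is alien to this CPU, and it is not shared.
 */
static enum free_path model_napi_consume_skb(const struct model_skb *skb,
					     int this_cpu)
{
	if (napi_consume_skb_defer &&
	    skb->alloc_cpu != this_cpu && !skb->shared)
		return FREE_PATH_DEFER;

	return FREE_PATH_LOCAL;
}

/* Models a write to the sysctl: only 0 and 1 are accepted, matching
 * the SYSCTL_ZERO/SYSCTL_ONE bounds. Returns 0 on success, -1 if the
 * value is out of range (the knob is then left unchanged).
 */
static int model_sysctl_write(int val)
{
	if (val < 0 || val > 1)
		return -1;

	napi_consume_skb_defer = (val == 1);
	return 0;
}
```

With the knob at its default, an skb allocated on CPU 1 and completed
on CPU 0 takes FREE_PATH_DEFER. After model_sysctl_write(0) the same
skb takes FREE_PATH_LOCAL, which is the case the xdpsock numbers above
exercise. Local and shared skbs take FREE_PATH_LOCAL either way.
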
 net/core/net-sysfs.h       |  1 +
 net/core/skbuff.c          |  5 ++++-
 net/core/sysctl_net_core.c | 35 +++++++++++++++++++++++++++++++++++
 3 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/net/core/net-sysfs.h b/net/core/net-sysfs.h
index 38e2e3ffd0bd..a026f757867e 100644
--- a/net/core/net-sysfs.h
+++ b/net/core/net-sysfs.h
@@ -14,4 +14,5 @@ int netdev_change_owner(struct net_device *, const struct net *net_old,
 extern struct mutex rps_default_mask_mutex;
 
 DECLARE_STATIC_KEY_FALSE(skb_defer_disable_key);
+DECLARE_STATIC_KEY_TRUE(napi_consume_skb_defer_key);
 #endif
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3d6978dd0aa8..3db90a9aa61d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -94,6 +94,7 @@
 
 #include "dev.h"
 #include "devmem.h"
+#include "net-sysfs.h"
 #include "netmem_priv.h"
 #include "sock_destructor.h"
 
@@ -1519,7 +1520,8 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
 
 	DEBUG_NET_WARN_ON_ONCE(!in_softirq());
 
-	if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+	if (static_branch_likely(&napi_consume_skb_defer_key) &&
+	    skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
 		skb_release_head_state(skb);
 		return skb_attempt_defer_free(skb);
 	}
@@ -7257,6 +7259,7 @@ static void kfree_skb_napi_cache(struct sk_buff *skb)
 }
 
 DEFINE_STATIC_KEY_FALSE(skb_defer_disable_key);
+DEFINE_STATIC_KEY_TRUE(napi_consume_skb_defer_key);
 
 /**
  * skb_attempt_defer_free - queue skb for remote freeing
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b508618bfc12..33e8217ee1ce 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -372,6 +372,35 @@ static int proc_do_skb_defer_max(const struct ctl_table *table, int write,
 	return ret;
 }
 
+static int proc_do_napi_consume_skb_defer(const struct ctl_table *table,
+					  int write, void *buffer,
+					  size_t *lenp, loff_t *ppos)
+{
+	static DEFINE_MUTEX(napi_consume_skb_defer_mutex);
+	int val, ret;
+
+	mutex_lock(&napi_consume_skb_defer_mutex);
+
+	val = static_key_enabled(&napi_consume_skb_defer_key);
+	struct ctl_table tmp = {
+		.data	= &val,
+		.maxlen	= sizeof(val),
+		.extra1	= SYSCTL_ZERO,
+		.extra2	= SYSCTL_ONE,
+	};
+
+	ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+	if (!ret && write) {
+		if (val)
+			static_branch_enable(&napi_consume_skb_defer_key);
+		else
+			static_branch_disable(&napi_consume_skb_defer_key);
+	}
+
+	mutex_unlock(&napi_consume_skb_defer_mutex);
+	return ret;
+}
+
 #ifdef CONFIG_BPF_JIT
 static int proc_dointvec_minmax_bpf_enable(const struct ctl_table *table, int write,
 					   void *buffer, size_t *lenp,
@@ -676,6 +705,12 @@ static struct ctl_table net_core_table[] = {
 		.proc_handler	= proc_do_skb_defer_max,
 		.extra1		= SYSCTL_ZERO,
 	},
+	{
+		.procname	= "napi_consume_skb_defer",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_do_napi_consume_skb_defer,
+	},
 };
 
 static struct ctl_table netns_core_table[] = {
-- 
2.41.3
