* [PATCH net-next] net: add sysctl to toggle napi_consume_skb() alien skb defer
@ 2026-03-26 14:42 Jason Xing
2026-03-26 14:55 ` Eric Dumazet
2026-03-27 4:13 ` Stanislav Fomichev
0 siblings, 2 replies; 5+ messages in thread
From: Jason Xing @ 2026-03-26 14:42 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, horms; +Cc: netdev, Jason Xing
Commit e20dfbad8aab ("net: fix napi_consume_skb() with alien skbs")
defers freeing of alien SKBs (alloc_cpu != current cpu) via
skb_attempt_defer_free() on the TX completion path, reducing cross-NUMA
SLUB spinlock contention and improving multi-queue UDP workloads.
However, this unconditionally impacts the napi_skb_cache fast recycle
path for single-flow / few-flow workloads (e.g. AF_XDP benchmarks[1]):
when the TX completion NAPI CPU differs from the SKB allocation CPU,
SKBs are deferred instead of being returned to the local napi_skb_cache,
forcing RX allocations back to the slow slab path.
The existing net.core.skb_defer_max=0 could disable this, but it is a
global switch that also disables the defer mechanism in the
TCP/UDP/MPTCP recvmsg paths, losing the SLUB locality benefits there.
AF_XDP can co-exist with other protocols, which is why reusing
skb_defer_disable_key was abandoned. Besides, if the defer path is
disabled, TCP/UDP/MPTCP in process context will free skbs directly
through kfree_skb_napi_cache(), whose bottom-half disable/enable pair
could affect others. So that path is deliberately left untouched.
Add a dedicated sysctl net.core.napi_consume_skb_defer backed by a
static key to selectively control the alien skb defer feature. Let
users decide which is the best fit for their own requirements.
This patch also avoids touching the local_bh_disable()/local_bh_enable()
pair in kfree_skb_napi_cache() to minimize overhead.
[1]: taskset -c 0 ./xdpsock -i enp2s0f1 -q 1 -t -S -s 64

1) sysctl -w net.core.napi_consume_skb_defer=1 (the default)
 sock0@enp2s0f1:1 txonly xdp-skb
                pps            pkts           1.00
rx              0              0
tx              1,851,950      20,397,952

2) sysctl -w net.core.napi_consume_skb_defer=0
 sock0@enp2s0f1:1 txonly xdp-skb
                pps            pkts           1.00
rx              0              0
tx              1,985,067      25,530,432

For the AF_XDP scenario, this turns out to be around a 6.6% improvement.
Signed-off-by: Jason Xing <kerneljasonxing@gmail.com>
---
net/core/net-sysfs.h | 1 +
net/core/skbuff.c | 5 ++++-
net/core/sysctl_net_core.c | 35 +++++++++++++++++++++++++++++++++++
3 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/net/core/net-sysfs.h b/net/core/net-sysfs.h
index 38e2e3ffd0bd..a026f757867e 100644
--- a/net/core/net-sysfs.h
+++ b/net/core/net-sysfs.h
@@ -14,4 +14,5 @@ int netdev_change_owner(struct net_device *, const struct net *net_old,
extern struct mutex rps_default_mask_mutex;
DECLARE_STATIC_KEY_FALSE(skb_defer_disable_key);
+DECLARE_STATIC_KEY_TRUE(napi_consume_skb_defer_key);
#endif
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 3d6978dd0aa8..3db90a9aa61d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -94,6 +94,7 @@
#include "dev.h"
#include "devmem.h"
+#include "net-sysfs.h"
#include "netmem_priv.h"
#include "sock_destructor.h"
@@ -1519,7 +1520,8 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
DEBUG_NET_WARN_ON_ONCE(!in_softirq());
- if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+ if (static_branch_likely(&napi_consume_skb_defer_key) &&
+ skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
skb_release_head_state(skb);
return skb_attempt_defer_free(skb);
}
@@ -7257,6 +7259,7 @@ static void kfree_skb_napi_cache(struct sk_buff *skb)
}
DEFINE_STATIC_KEY_FALSE(skb_defer_disable_key);
+DEFINE_STATIC_KEY_TRUE(napi_consume_skb_defer_key);
/**
* skb_attempt_defer_free - queue skb for remote freeing
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b508618bfc12..33e8217ee1ce 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -372,6 +372,35 @@ static int proc_do_skb_defer_max(const struct ctl_table *table, int write,
return ret;
}
+static int proc_do_napi_consume_skb_defer(const struct ctl_table *table,
+ int write, void *buffer,
+ size_t *lenp, loff_t *ppos)
+{
+ static DEFINE_MUTEX(napi_consume_skb_defer_mutex);
+ int val, ret;
+
+ mutex_lock(&napi_consume_skb_defer_mutex);
+
+ val = static_key_enabled(&napi_consume_skb_defer_key);
+ struct ctl_table tmp = {
+ .data = &val,
+ .maxlen = sizeof(val),
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ };
+
+ ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
+ if (!ret && write) {
+ if (val)
+ static_branch_enable(&napi_consume_skb_defer_key);
+ else
+ static_branch_disable(&napi_consume_skb_defer_key);
+ }
+
+ mutex_unlock(&napi_consume_skb_defer_mutex);
+ return ret;
+}
+
#ifdef CONFIG_BPF_JIT
static int proc_dointvec_minmax_bpf_enable(const struct ctl_table *table, int write,
void *buffer, size_t *lenp,
@@ -676,6 +705,12 @@ static struct ctl_table net_core_table[] = {
.proc_handler = proc_do_skb_defer_max,
.extra1 = SYSCTL_ZERO,
},
+ {
+ .procname = "napi_consume_skb_defer",
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_do_napi_consume_skb_defer,
+ },
};
static struct ctl_table netns_core_table[] = {
--
2.41.3
^ permalink raw reply related [flat|nested] 5+ messages in thread

* Re: [PATCH net-next] net: add sysctl to toggle napi_consume_skb() alien skb defer
2026-03-26 14:42 [PATCH net-next] net: add sysctl to toggle napi_consume_skb() alien skb defer Jason Xing
@ 2026-03-26 14:55 ` Eric Dumazet
2026-03-26 16:21 ` Jason Xing
2026-03-27 4:13 ` Stanislav Fomichev
1 sibling, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2026-03-26 14:55 UTC (permalink / raw)
To: Jason Xing; +Cc: davem, kuba, pabeni, horms, netdev
On Thu, Mar 26, 2026 at 7:43 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> [...]
> +static int proc_do_napi_consume_skb_defer(const struct ctl_table *table,
> + int write, void *buffer,
> + size_t *lenp, loff_t *ppos)
> +{
> + static DEFINE_MUTEX(napi_consume_skb_defer_mutex);
> + int val, ret;
> +
> + mutex_lock(&napi_consume_skb_defer_mutex);
> +
> + val = static_key_enabled(&napi_consume_skb_defer_key);
> + struct ctl_table tmp = {
> + .data = &val,
> + .maxlen = sizeof(val),
> + .extra1 = SYSCTL_ZERO,
> + .extra2 = SYSCTL_ONE,
> + };
> +
> + ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
> + if (!ret && write) {
> + if (val)
> + static_branch_enable(&napi_consume_skb_defer_key);
> + else
> + static_branch_disable(&napi_consume_skb_defer_key);
> + }
> +
> + mutex_unlock(&napi_consume_skb_defer_mutex);
> + return ret;
> +}
> +
Seems a copy/paste of proc_do_static_key() ?
^ permalink raw reply [flat|nested] 5+ messages in thread

* Re: [PATCH net-next] net: add sysctl to toggle napi_consume_skb() alien skb defer
2026-03-26 14:55 ` Eric Dumazet
@ 2026-03-26 16:21 ` Jason Xing
0 siblings, 0 replies; 5+ messages in thread
From: Jason Xing @ 2026-03-26 16:21 UTC (permalink / raw)
To: Eric Dumazet; +Cc: davem, kuba, pabeni, horms, netdev
On Thu, Mar 26, 2026 at 10:55 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Mar 26, 2026 at 7:43 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > [...]
>
> Seems a copy/paste of proc_do_static_key() ?
Thanks, Eric. It is much simpler now:
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index b508618bfc12..a6a1b2c3f8e1 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -676,6 +676,16 @@ static struct ctl_table net_core_table[] = {
.proc_handler = proc_do_skb_defer_max,
.extra1 = SYSCTL_ZERO,
},
+ {
+ .procname = "napi_consume_skb_defer",
+ .data = &napi_consume_skb_defer_key.key,
+ .maxlen = sizeof(napi_consume_skb_defer_key),
+ .mode = 0644,
+ .proc_handler = proc_do_static_key,
+ },
};
I will post it after running more tests tomorrow morning :)
Thanks,
Jason
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next] net: add sysctl to toggle napi_consume_skb() alien skb defer
2026-03-26 14:42 [PATCH net-next] net: add sysctl to toggle napi_consume_skb() alien skb defer Jason Xing
2026-03-26 14:55 ` Eric Dumazet
@ 2026-03-27 4:13 ` Stanislav Fomichev
2026-03-27 8:34 ` Jason Xing
1 sibling, 1 reply; 5+ messages in thread
From: Stanislav Fomichev @ 2026-03-27 4:13 UTC (permalink / raw)
To: Jason Xing; +Cc: davem, edumazet, kuba, pabeni, horms, netdev
On 03/26, Jason Xing wrote:
> [...]
> Add a dedicated sysctl net.core.napi_consume_skb_defer backed by a
> static key to selectively control the alien skb defer feature. Let
> users decide which is the best fit for their own requirements.
For a non-zc path adding a userspace knob feels like too much. And there
is zero documentation for users to decide which mode to use.
^ permalink raw reply [flat|nested] 5+ messages in thread

* Re: [PATCH net-next] net: add sysctl to toggle napi_consume_skb() alien skb defer
2026-03-27 4:13 ` Stanislav Fomichev
@ 2026-03-27 8:34 ` Jason Xing
0 siblings, 0 replies; 5+ messages in thread
From: Jason Xing @ 2026-03-27 8:34 UTC (permalink / raw)
To: Stanislav Fomichev; +Cc: davem, edumazet, kuba, pabeni, horms, netdev
On Fri, Mar 27, 2026 at 12:13 PM Stanislav Fomichev
<stfomichev@gmail.com> wrote:
>
> On 03/26, Jason Xing wrote:
> > [...]
>
> For a non-zc path adding a userspace knob feels like too much. And there
> is zero documentation for users to decide which mode to use.
AF_XDP is only the case I use as an example. Reusing skb_defer_max is
neither flexible nor good enough, as mentioned in the commit message.
Please note that the knob applies to single- or few-flow workloads
regardless of protocol or tunnel type, and it brings back a 6.6%
improvement, which is not trivial/negligible.
You're right about the doc. I will add the following, thanks.
diff --git a/Documentation/admin-guide/sysctl/net.rst
b/Documentation/admin-guide/sysctl/net.rst
index 0724a793798f..42e06f93306f 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -368,6 +368,19 @@ by the cpu which allocated them.
Default: 128
+napi_consume_skb_defer
+----------------------
+When set to 1 (default), napi_consume_skb() defers freeing SKBs whose
+allocation CPU differs from the current CPU via skb_attempt_defer_free().
+This reduces cross-NUMA SLUB spinlock contention for multi-queue workloads.
+
+Setting this to 0 disables the defer path in napi_consume_skb() only,
+allowing SKBs to be returned to the local napi_skb_cache immediately.
+This can benefit single-flow or few-flow workloads (e.g. AF_XDP TX)
+where the defer detour hurts the fast recycle path.
+
+Default: 1
+
optmem_max
----------
Thanks,
Jason
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-03-27 8:35 UTC | newest]
Thread overview: 5+ messages
2026-03-26 14:42 [PATCH net-next] net: add sysctl to toggle napi_consume_skb() alien skb defer Jason Xing
2026-03-26 14:55 ` Eric Dumazet
2026-03-26 16:21 ` Jason Xing
2026-03-27 4:13 ` Stanislav Fomichev
2026-03-27 8:34 ` Jason Xing