From: Eric Dumazet <edumazet@google.com>
To: "David S . Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>,
Kuniyuki Iwashima <kuniyu@google.com>,
Willem de Bruijn <willemb@google.com>,
netdev@vger.kernel.org, eric.dumazet@gmail.com,
Eric Dumazet <edumazet@google.com>
Subject: [PATCH net-next 2/3] net: fix napi_consume_skb() with alien skbs
Date: Thu, 6 Nov 2025 20:29:34 +0000 [thread overview]
Message-ID: <20251106202935.1776179-3-edumazet@google.com> (raw)
In-Reply-To: <20251106202935.1776179-1-edumazet@google.com>
There is a lack of NUMA awareness and more generally lack
of slab caches affinity on TX completion path.
Modern drivers are using napi_consume_skb(), hoping to cache sk_buff
in per-cpu caches so that they can be recycled in RX path.
Only use this if the skb was allocated on the same cpu,
otherwise use skb_attempt_defer_free() so that the skb
is freed on the original cpu.
This removes contention on SLUB spinlocks and data structures.
After this patch, I get ~50% improvement for an UDP tx workload
on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).
80 Mpps -> 120 Mpps.
Profiling one of the 32 cpus servicing NIC interrupts :
Before:
mpstat -P 511 1 1
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: 511 0.00 0.00 0.00 0.00 0.00 98.00 0.00 0.00 0.00 2.00
31.01% ksoftirqd/511 [kernel.kallsyms] [k] queued_spin_lock_slowpath
12.45% swapper [kernel.kallsyms] [k] queued_spin_lock_slowpath
5.60% ksoftirqd/511 [kernel.kallsyms] [k] __slab_free
3.31% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_clean_buf_ring
3.27% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_splitq_clean_all
2.95% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_splitq_start
2.52% ksoftirqd/511 [kernel.kallsyms] [k] fq_dequeue
2.32% ksoftirqd/511 [kernel.kallsyms] [k] read_tsc
2.25% ksoftirqd/511 [kernel.kallsyms] [k] build_detached_freelist
2.15% ksoftirqd/511 [kernel.kallsyms] [k] kmem_cache_free
2.11% swapper [kernel.kallsyms] [k] __slab_free
2.06% ksoftirqd/511 [kernel.kallsyms] [k] idpf_features_check
2.01% ksoftirqd/511 [kernel.kallsyms] [k] idpf_tx_splitq_clean_hdr
1.97% ksoftirqd/511 [kernel.kallsyms] [k] skb_release_data
1.52% ksoftirqd/511 [kernel.kallsyms] [k] sock_wfree
1.34% swapper [kernel.kallsyms] [k] idpf_tx_clean_buf_ring
1.23% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_all
1.15% ksoftirqd/511 [kernel.kallsyms] [k] dma_unmap_page_attrs
1.11% swapper [kernel.kallsyms] [k] idpf_tx_splitq_start
1.03% swapper [kernel.kallsyms] [k] fq_dequeue
0.94% swapper [kernel.kallsyms] [k] kmem_cache_free
0.93% swapper [kernel.kallsyms] [k] read_tsc
0.81% ksoftirqd/511 [kernel.kallsyms] [k] napi_consume_skb
0.79% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_hdr
0.77% ksoftirqd/511 [kernel.kallsyms] [k] skb_free_head
0.76% swapper [kernel.kallsyms] [k] idpf_features_check
0.72% swapper [kernel.kallsyms] [k] skb_release_data
0.69% swapper [kernel.kallsyms] [k] build_detached_freelist
0.58% ksoftirqd/511 [kernel.kallsyms] [k] skb_release_head_state
0.56% ksoftirqd/511 [kernel.kallsyms] [k] __put_partials
0.55% ksoftirqd/511 [kernel.kallsyms] [k] kmem_cache_free_bulk
0.48% swapper [kernel.kallsyms] [k] sock_wfree
After:
mpstat -P 511 1 1
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
Average: 511 0.00 0.00 0.00 0.00 0.00 51.49 0.00 0.00 0.00 48.51
19.10% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_hdr
13.86% swapper [kernel.kallsyms] [k] idpf_tx_clean_buf_ring
10.80% swapper [kernel.kallsyms] [k] skb_attempt_defer_free
10.57% swapper [kernel.kallsyms] [k] idpf_tx_splitq_clean_all
7.18% swapper [kernel.kallsyms] [k] queued_spin_lock_slowpath
6.69% swapper [kernel.kallsyms] [k] sock_wfree
5.55% swapper [kernel.kallsyms] [k] dma_unmap_page_attrs
3.10% swapper [kernel.kallsyms] [k] fq_dequeue
3.00% swapper [kernel.kallsyms] [k] skb_release_head_state
2.73% swapper [kernel.kallsyms] [k] read_tsc
2.48% swapper [kernel.kallsyms] [k] idpf_tx_splitq_start
1.20% swapper [kernel.kallsyms] [k] idpf_features_check
1.13% swapper [kernel.kallsyms] [k] napi_consume_skb
0.93% swapper [kernel.kallsyms] [k] idpf_vport_splitq_napi_poll
0.64% swapper [kernel.kallsyms] [k] native_send_call_func_single_ipi
0.60% swapper [kernel.kallsyms] [k] acpi_processor_ffh_cstate_enter
0.53% swapper [kernel.kallsyms] [k] io_idle
0.43% swapper [kernel.kallsyms] [k] netif_skb_features
0.41% swapper [kernel.kallsyms] [k] __direct_call_cpuidle_state_enter2
0.40% swapper [kernel.kallsyms] [k] native_irq_return_iret
0.40% swapper [kernel.kallsyms] [k] idpf_tx_buf_hw_update
0.36% swapper [kernel.kallsyms] [k] sched_clock_noinstr
0.34% swapper [kernel.kallsyms] [k] handle_softirqs
0.32% swapper [kernel.kallsyms] [k] net_rx_action
0.32% swapper [kernel.kallsyms] [k] dql_completed
0.32% swapper [kernel.kallsyms] [k] validate_xmit_skb
0.31% swapper [kernel.kallsyms] [k] skb_network_protocol
0.29% swapper [kernel.kallsyms] [k] skb_csum_hwoffload_help
0.29% swapper [kernel.kallsyms] [k] x2apic_send_IPI
0.28% swapper [kernel.kallsyms] [k] ktime_get
0.24% swapper [kernel.kallsyms] [k] __qdisc_run
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/skbuff.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index eeddb9e737ff28e47c77739db7b25ea68e5aa735..7ac5f8aa1235a55db02b40b5a0f51bb3fa53fa03 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1476,6 +1476,11 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
DEBUG_NET_WARN_ON_ONCE(!in_softirq());
+ if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+ skb_release_head_state(skb);
+ return skb_attempt_defer_free(skb);
+ }
+
if (!skb_unref(skb))
return;
--
2.51.2.1041.gc1ab5b90ca-goog
next prev parent reply other threads:[~2025-11-06 20:29 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-06 20:29 [PATCH net-next 0/3] net: use skb_attempt_defer_free() in napi_consume_skb() Eric Dumazet
2025-11-06 20:29 ` [PATCH net-next 1/3] net: allow skb_release_head_state() to be called multiple times Eric Dumazet
2025-11-07 6:42 ` Kuniyuki Iwashima
2025-11-07 11:23 ` Toke Høiland-Jørgensen
2025-11-06 20:29 ` Eric Dumazet [this message]
2025-11-07 6:46 ` [PATCH net-next 2/3] net: fix napi_consume_skb() with alien skbs Kuniyuki Iwashima
2025-11-07 11:23 ` Toke Høiland-Jørgensen
2025-11-07 12:28 ` Eric Dumazet
2025-11-07 14:34 ` Toke Høiland-Jørgensen
2025-11-07 15:44 ` Jason Xing
2025-11-11 16:56 ` Aditya Garg
2025-11-11 17:17 ` Eric Dumazet
2025-11-12 5:14 ` Aditya Garg
2025-11-12 14:03 ` Jon Hunter
2025-11-12 14:08 ` Eric Dumazet
2025-11-12 15:26 ` Jon Hunter
2025-11-12 15:32 ` Eric Dumazet
2025-11-06 20:29 ` [PATCH net-next 3/3] net: increase skb_defer_max default to 128 Eric Dumazet
2025-11-07 6:47 ` Kuniyuki Iwashima
2025-11-07 11:23 ` Toke Høiland-Jørgensen
2025-11-07 15:37 ` Jason Xing
2025-11-07 15:47 ` Eric Dumazet
2025-11-07 15:49 ` Jason Xing
2025-11-07 16:00 ` Eric Dumazet
2025-11-07 16:03 ` Jason Xing
2025-11-07 16:08 ` Eric Dumazet
2025-11-07 16:20 ` Jason Xing
2025-11-07 16:26 ` Eric Dumazet
2025-11-07 16:59 ` Jason Xing
2025-11-08 3:10 ` [PATCH net-next 0/3] net: use skb_attempt_defer_free() in napi_consume_skb() patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251106202935.1776179-3-edumazet@google.com \
--to=edumazet@google.com \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=horms@kernel.org \
--cc=kuba@kernel.org \
--cc=kuniyu@google.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).