Netdev List
* [PATCH net-next] net: add prefetch() in skb_defer_free_flush()
@ 2025-11-06  8:55 Eric Dumazet
  2025-11-06  9:05 ` Paolo Abeni
  2025-11-08  3:10 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Dumazet @ 2025-11-06  8:55 UTC
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet

skb_defer_free_flush() is becoming more important these days.

Add a prefetch operation to reduce latency a bit on some
platforms like AMD EPYC 7B12.

On more recent cpus, a stall happens when reading skb_shinfo().
Avoiding it will require a more elaborate strategy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 537aa43edff0e4bfedb42593146cfdf7511d8c37..69515edd17bc6a157046f31b3dd343a59ae192ab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6782,6 +6782,7 @@ static void skb_defer_free_flush(void)
 		free_list = llist_del_all(&sdn->defer_list);
 
 		llist_for_each_entry_safe(skb, next, free_list, ll_node) {
+			prefetch(next);
 			napi_consume_skb(skb, 1);
 		}
 	}
-- 
2.51.2.1026.g39e6a42477-goog
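
The idea behind the one-line change can be tried outside the kernel. The
following is a minimal userspace sketch, not the kernel code: struct node,
build_list() and free_list_prefetch() are hypothetical names, and the
GCC/Clang builtin __builtin_prefetch() stands in for the kernel's prefetch().
It walks a singly linked list the way a "_safe" iterator does (the next
pointer is read before the current node is released) and hints the CPU to
start loading the next node while the current one is being freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb linked through an llist-style node. */
struct node {
	struct node *next;
	int payload;
};

/* Build a singly linked list of n nodes; returns the head (NULL if n == 0). */
static struct node *build_list(size_t n)
{
	struct node *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct node *node = malloc(sizeof(*node));

		assert(node != NULL);
		node->payload = (int)i;
		node->next = head;
		head = node;
	}
	return head;
}

/*
 * Free every node, hinting the CPU to fetch the next node's cache line
 * while the current node is still being processed.
 * Returns the number of nodes freed.
 */
static size_t free_list_prefetch(struct node *head)
{
	size_t freed = 0;
	struct node *node, *next;

	for (node = head; node != NULL; node = next) {
		next = node->next;        /* read before freeing, as a _safe iterator does */
		__builtin_prefetch(next); /* only a hint; a NULL address does not fault */
		free(node);
		freed++;
	}
	return freed;
}
```

Calling free_list_prefetch(build_list(1000)) frees all 1000 nodes. Whether the
hint helps depends on the CPU and on how much work is done per node, which
matches the changelog's caveat that the gain is platform dependent.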



* Re: [PATCH net-next] net: add prefetch() in skb_defer_free_flush()
  2025-11-06  8:55 [PATCH net-next] net: add prefetch() in skb_defer_free_flush() Eric Dumazet
@ 2025-11-06  9:05 ` Paolo Abeni
  2025-11-06  9:13   ` Eric Dumazet
  2025-11-08  3:10 ` patchwork-bot+netdevbpf
  1 sibling, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2025-11-06  9:05 UTC
  To: Eric Dumazet, David S . Miller, Jakub Kicinski
  Cc: Simon Horman, netdev, eric.dumazet

On 11/6/25 9:55 AM, Eric Dumazet wrote:
> skb_defer_free_flush() is becoming more important these days.
> 
> Add a prefetch operation to reduce latency a bit on some
> platforms like AMD EPYC 7B12.
> 
> On more recent cpus, a stall happens when reading skb_shinfo().
> Avoiding it will require a more elaborate strategy.

For my education, how do you catch such stalls? looking for specific
perf events? Or just based on cycles spent in a given function/chunk of
code?

> Signed-off-by: Eric Dumazet <edumazet@google.com>

Just to avoid doubts on my thoughts about this patch:

Acked-by: Paolo Abeni <pabeni@redhat.com>



* Re: [PATCH net-next] net: add prefetch() in skb_defer_free_flush()
  2025-11-06  9:05 ` Paolo Abeni
@ 2025-11-06  9:13   ` Eric Dumazet
  2025-11-06 15:04     ` Paolo Abeni
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2025-11-06  9:13 UTC
  To: Paolo Abeni
  Cc: David S . Miller, Jakub Kicinski, Simon Horman, netdev,
	eric.dumazet

On Thu, Nov 6, 2025 at 1:05 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 11/6/25 9:55 AM, Eric Dumazet wrote:
> > skb_defer_free_flush() is becoming more important these days.
> >
> > Add a prefetch operation to reduce latency a bit on some
> > platforms like AMD EPYC 7B12.
> >
> > On more recent cpus, a stall happens when reading skb_shinfo().
> > Avoiding it will require a more elaborate strategy.
>
> For my education, how do you catch such stalls? looking for specific
> perf events? Or just based on cycles spent in a given function/chunk of
> code?

In this case, I was focusing on a NIC driver handling both RX and TX
from a single cpu.

I am using "perf record -g -C one_of_the_hot_cpu sleep 5; perf report
--no-children"

I am working on an issue with napi_consume_skb() which has no NUMA awareness.

With the following WIP series, I can push 115 Mpps UDP packets
(instead of 80 Mpps) on IDPF.
I need more tests before pushing it for review, but the prefetch()
from skb_defer_free_flush() is a no-brainer.


git diff d24e4780d5783b8eecd33aab03bd4efd24703c65..
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5b4bc8b1c7d5674c19b64f8b15685d74632048fe..7ac5f8aa1235a55db02b40b5a0f51bb3fa53fa03 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1149,11 +1149,10 @@ void skb_release_head_state(struct sk_buff *skb)
                                skb);

 #endif
+               skb->destructor = NULL;
        }
-#if IS_ENABLED(CONFIG_NF_CONNTRACK)
-       nf_conntrack_put(skb_nfct(skb));
-#endif
-       skb_ext_put(skb);
+       nf_reset_ct(skb);
+       skb_ext_reset(skb);
 }

 /* Free everything but the sk_buff shell. */
@@ -1477,6 +1476,11 @@ void napi_consume_skb(struct sk_buff *skb, int budget)

        DEBUG_NET_WARN_ON_ONCE(!in_softirq());

+       if (skb->alloc_cpu != smp_processor_id() && !skb_shared(skb)) {
+               skb_release_head_state(skb);
+               return skb_attempt_defer_free(skb);
+       }
+
        if (!skb_unref(skb))
                return;



commit df7dacc619117ebab7ea330ccc6390618f04dff3
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Nov 5 17:02:20 2025 +0000

    net: fix napi_consume_skb() with alien skbs

    There is a lack of NUMA awareness and more generally lack
    of slab caches affinity on TX completion path.

    Modern drivers are using napi_consume_skb(), hoping to cache sk_buff
    in per-cpu caches so that they can be recycled in RX path.

    Only allow this if the skb was allocated on the same cpu,
    otherwise use skb_attempt_defer_free() so that the skb
    is freed on the original cpu.

    This removes contention on SLUB spinlocks and data structures.

    After this patch, I get a 40% improvement for a UDP tx workload
    on an AMD EPYC 9B45 (IDPF 200Gbit NIC with 32 TX queues).

    80 Mpps -> 115 Mpps.

    Signed-off-by: Eric Dumazet <edumazet@google.com>

commit 42593ad5f2bed6abd3a6cce3483e2980b114cbd9
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Nov 5 16:50:29 2025 +0000

    net: allow skb_release_head_state() to be called multiple times

    Currently, only skb dst is cleared (thanks to skb_dst_drop()).

    Make sure skb->destructor, conntrack and extensions are cleared.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
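
The routing decision described in the WIP changelog ("only allow this if the
skb was allocated on the same cpu, otherwise ... the skb is freed on the
original cpu") can be modelled in a few lines of userspace C. This is a
simplified sketch under stated assumptions, not the kernel implementation:
struct buf, struct cpu_state, NR_CPUS, push() and consume_buf() are
hypothetical names, the lists are plain pointers instead of lockless llists,
and the skb_shared() check and reference counting are left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

struct buf {
	struct buf *next;
	int alloc_cpu; /* CPU that allocated this buffer */
};

struct cpu_state {
	struct buf *local_cache; /* buffers kept for local recycling */
	struct buf *defer_list;  /* buffers handed back by other CPUs */
};

static struct cpu_state cpus[NR_CPUS];

/* Push a buffer on the front of a singly linked list. */
static void push(struct buf **list, struct buf *b)
{
	b->next = *list;
	*list = b;
}

/*
 * Consume a buffer on this_cpu.
 * An "alien" buffer (allocated elsewhere) is queued on its owner's
 * defer list so that the owner frees it; a local buffer goes to the
 * local cache. Returns 1 if the buffer was deferred, 0 otherwise.
 */
static int consume_buf(struct buf *b, int this_cpu)
{
	assert(b->alloc_cpu >= 0 && b->alloc_cpu < NR_CPUS);

	if (b->alloc_cpu != this_cpu) {
		push(&cpus[b->alloc_cpu].defer_list, b);
		return 1;
	}
	push(&cpus[this_cpu].local_cache, b);
	return 0;
}
```

With a buffer allocated on CPU 2 and consumed on CPU 0, consume_buf() returns
1 and the buffer ends up on cpus[2].defer_list. The owning CPU later drains
that list, which is the job skb_defer_free_flush() does in the kernel.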


* Re: [PATCH net-next] net: add prefetch() in skb_defer_free_flush()
  2025-11-06  9:13   ` Eric Dumazet
@ 2025-11-06 15:04     ` Paolo Abeni
  0 siblings, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2025-11-06 15:04 UTC
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Simon Horman, netdev,
	eric.dumazet

On 11/6/25 10:13 AM, Eric Dumazet wrote:
> On Thu, Nov 6, 2025 at 1:05 AM Paolo Abeni <pabeni@redhat.com> wrote:
>>
>> On 11/6/25 9:55 AM, Eric Dumazet wrote:
>>> skb_defer_free_flush() is becoming more important these days.
>>>
>>> Add a prefetch operation to reduce latency a bit on some
>>> platforms like AMD EPYC 7B12.
>>>
>>> On more recent cpus, a stall happens when reading skb_shinfo().
>>> Avoiding it will require a more elaborate strategy.
>>
>> For my education, how do you catch such stalls? looking for specific
>> perf events? Or just based on cycles spent in a given function/chunk of
>> code?
> 
> In this case, I was focusing on a NIC driver handling both RX and TX
> from a single cpu.
> 
> I am using "perf record -g -C one_of_the_hot_cpu sleep 5; perf report
> --no-children"
> 
> I am working on an issue with napi_consume_skb() which has no NUMA awareness.

Many thanks for sharing!

> With the following WIP series, I can push 115 Mpps UDP packets
> (instead of 80Mpps) on IDPF.
> I need more tests before pushing it for review, but the prefetch()
> from skb_defer_free_flush() is a no-brainer.

FWIW, the napi_consume_skb() change makes sense to me, looking forward to it!

/P



* Re: [PATCH net-next] net: add prefetch() in skb_defer_free_flush()
  2025-11-06  8:55 [PATCH net-next] net: add prefetch() in skb_defer_free_flush() Eric Dumazet
  2025-11-06  9:05 ` Paolo Abeni
@ 2025-11-08  3:10 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-11-08  3:10 UTC
  To: Eric Dumazet; +Cc: davem, kuba, pabeni, horms, netdev, eric.dumazet

Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu,  6 Nov 2025 08:55:00 +0000 you wrote:
> skb_defer_free_flush() is becoming more important these days.
> 
> Add a prefetch operation to reduce latency a bit on some
> platforms like AMD EPYC 7B12.
> 
> On more recent cpus, a stall happens when reading skb_shinfo().
> Avoiding it will require a more elaborate strategy.
> 
> [...]

Here is the summary with links:
  - [net-next] net: add prefetch() in skb_defer_free_flush()
    https://git.kernel.org/netdev/net-next/c/fd9557c3606b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


