* [PATCH v2] net: cache for same cpu skb_attempt_defer_free
From: Pavel Begunkov @ 2024-02-13 13:33 UTC (permalink / raw)
To: netdev, edumazet, davem, dsahern, pabeni, kuba; +Cc: Pavel Begunkov
Optimise skb_attempt_defer_free() executed by the CPU the skb was
allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can
disable softirqs and put the buffer into cpu local caches.
Trying it with a TCP CPU bound ping pong benchmark (i.e. netbench), it
showed a 1% throughput improvement (392.2 -> 396.4 Krps). Cross checking
with profiles, the total CPU share of skb_attempt_defer_free() dropped by
0.6%. Note, I'd expect the win doubled with rx only benchmarks, as the
optimisation is for the receive path, but the test spends >55% of CPU
doing writes.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
v2: remove in_hardirq()
net/core/skbuff.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9b790994da0c..f32f358ef1d8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -6947,6 +6947,20 @@ void __skb_ext_put(struct skb_ext *ext)
EXPORT_SYMBOL(__skb_ext_put);
#endif /* CONFIG_SKB_EXTENSIONS */
+static void kfree_skb_napi_cache(struct sk_buff *skb)
+{
+ /* if SKB is a clone, don't handle this case */
+ if (skb->fclone != SKB_FCLONE_UNAVAILABLE) {
+ __kfree_skb(skb);
+ return;
+ }
+
+ local_bh_disable();
+ skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
+ napi_skb_cache_put(skb);
+ local_bh_enable();
+}
+
/**
* skb_attempt_defer_free - queue skb for remote freeing
* @skb: buffer
@@ -6965,7 +6979,7 @@ void skb_attempt_defer_free(struct sk_buff *skb)
if (WARN_ON_ONCE(cpu >= nr_cpu_ids) ||
!cpu_online(cpu) ||
cpu == raw_smp_processor_id()) {
-nodefer: __kfree_skb(skb);
+nodefer: kfree_skb_napi_cache(skb);
return;
}
--
2.43.0
* Re: [PATCH v2] net: cache for same cpu skb_attempt_defer_free
From: Eric Dumazet @ 2024-02-13 13:53 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: netdev, davem, dsahern, pabeni, kuba
On Tue, Feb 13, 2024 at 2:42 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> Optimise skb_attempt_defer_free() executed by the CPU the skb was
> allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can
> disable softirqs and put the buffer into cpu local caches.
>
> Trying it with a TCP CPU bound ping pong benchmark (i.e. netbench), it
> showed a 1% throughput improvement (392.2 -> 396.4 Krps). Cross checking
> with profiles, the total CPU share of skb_attempt_defer_free() dropped by
> 0.6%. Note, I'd expect the win doubled with rx only benchmarks, as the
> optimisation is for the receive path, but the test spends >55% of CPU
> doing writes.
>
> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> ---
>
> v2: remove in_hardirq()
>
> net/core/skbuff.c | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 9b790994da0c..f32f358ef1d8 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -6947,6 +6947,20 @@ void __skb_ext_put(struct skb_ext *ext)
> EXPORT_SYMBOL(__skb_ext_put);
> #endif /* CONFIG_SKB_EXTENSIONS */
>
> +static void kfree_skb_napi_cache(struct sk_buff *skb)
> +{
> + /* if SKB is a clone, don't handle this case */
> + if (skb->fclone != SKB_FCLONE_UNAVAILABLE) {
> + __kfree_skb(skb);
> + return;
> + }
> +
> + local_bh_disable();
> + skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
I am trying to understand why we use false instead of true here ?
Or if you prefer:
local_bh_disable();
__napi_kfree_skb(skb, SKB_DROP_REASON_NOT_SPECIFIED);
local_bh_enable();
> + napi_skb_cache_put(skb);
> + local_bh_enable();
> +}
> +
> /**
> * skb_attempt_defer_free - queue skb for remote freeing
> * @skb: buffer
> @@ -6965,7 +6979,7 @@ void skb_attempt_defer_free(struct sk_buff *skb)
> if (WARN_ON_ONCE(cpu >= nr_cpu_ids) ||
> !cpu_online(cpu) ||
> cpu == raw_smp_processor_id()) {
> -nodefer: __kfree_skb(skb);
> +nodefer: kfree_skb_napi_cache(skb);
> return;
> }
>
> --
> 2.43.0
>
* Re: [PATCH v2] net: cache for same cpu skb_attempt_defer_free
From: Pavel Begunkov @ 2024-02-13 14:07 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netdev, davem, dsahern, pabeni, kuba
On 2/13/24 13:53, Eric Dumazet wrote:
> On Tue, Feb 13, 2024 at 2:42 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> Optimise skb_attempt_defer_free() executed by the CPU the skb was
>> allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can
>> disable softirqs and put the buffer into cpu local caches.
>>
>> Trying it with a TCP CPU bound ping pong benchmark (i.e. netbench), it
>> showed a 1% throughput improvement (392.2 -> 396.4 Krps). Cross checking
>> with profiles, the total CPU share of skb_attempt_defer_free() dropped by
>> 0.6%. Note, I'd expect the win doubled with rx only benchmarks, as the
>> optimisation is for the receive path, but the test spends >55% of CPU
>> doing writes.
>>
>> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
>> ---
>>
>> v2: remove in_hardirq()
>>
>> net/core/skbuff.c | 16 +++++++++++++++-
>> 1 file changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 9b790994da0c..f32f358ef1d8 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -6947,6 +6947,20 @@ void __skb_ext_put(struct skb_ext *ext)
>> EXPORT_SYMBOL(__skb_ext_put);
>> #endif /* CONFIG_SKB_EXTENSIONS */
>>
>> +static void kfree_skb_napi_cache(struct sk_buff *skb)
>> +{
>> + /* if SKB is a clone, don't handle this case */
>> + if (skb->fclone != SKB_FCLONE_UNAVAILABLE) {
>> + __kfree_skb(skb);
>> + return;
>> + }
>> +
>> + local_bh_disable();
>> + skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
>
> I am trying to understand why we use false instead of true here ?
> Or if you prefer:
> local_bh_disable();
> __napi_kfree_skb(skb, SKB_DROP_REASON_NOT_SPECIFIED);
> local_bh_enable();
Maybe it's my misunderstanding but disabled bh != "napi safe",
e.g. the napi_struct we're interested in might be scheduled for
another CPU. Which is also why "napi" prefix in percpu
napi_alloc_cache sounds a bit misleading to me.
The second reason is that it shouldn't change anything
performance-wise:
napi_pp_put_page(napi_safe) {
...
if (napi_safe || in_softirq()) { ... }
}
>> + napi_skb_cache_put(skb);
>> + local_bh_enable();
>> +}
>> +
>> /**
>> * skb_attempt_defer_free - queue skb for remote freeing
>> * @skb: buffer
>> @@ -6965,7 +6979,7 @@ void skb_attempt_defer_free(struct sk_buff *skb)
>> if (WARN_ON_ONCE(cpu >= nr_cpu_ids) ||
>> !cpu_online(cpu) ||
>> cpu == raw_smp_processor_id()) {
>> -nodefer: __kfree_skb(skb);
>> +nodefer: kfree_skb_napi_cache(skb);
>> return;
>> }
>>
>> --
>> 2.43.0
>>
--
Pavel Begunkov
* Re: [PATCH v2] net: cache for same cpu skb_attempt_defer_free
From: Eric Dumazet @ 2024-02-13 14:28 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: netdev, davem, dsahern, pabeni, kuba
On Tue, Feb 13, 2024 at 3:17 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 2/13/24 13:53, Eric Dumazet wrote:
> > On Tue, Feb 13, 2024 at 2:42 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >>
> >> Optimise skb_attempt_defer_free() executed by the CPU the skb was
> >> allocated on. Instead of __kfree_skb() -> kmem_cache_free() we can
> >> disable softirqs and put the buffer into cpu local caches.
> >>
> >> Trying it with a TCP CPU bound ping pong benchmark (i.e. netbench), it
> >> showed a 1% throughput improvement (392.2 -> 396.4 Krps). Cross checking
> >> with profiles, the total CPU share of skb_attempt_defer_free() dropped by
> >> 0.6%. Note, I'd expect the win doubled with rx only benchmarks, as the
> >> optimisation is for the receive path, but the test spends >55% of CPU
> >> doing writes.
> >>
> >> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
> >> ---
> >>
> >> v2: remove in_hardirq()
> >>
> >> net/core/skbuff.c | 16 +++++++++++++++-
> >> 1 file changed, 15 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >> index 9b790994da0c..f32f358ef1d8 100644
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -6947,6 +6947,20 @@ void __skb_ext_put(struct skb_ext *ext)
> >> EXPORT_SYMBOL(__skb_ext_put);
> >> #endif /* CONFIG_SKB_EXTENSIONS */
> >>
> >> +static void kfree_skb_napi_cache(struct sk_buff *skb)
> >> +{
> >> + /* if SKB is a clone, don't handle this case */
> >> + if (skb->fclone != SKB_FCLONE_UNAVAILABLE) {
> >> + __kfree_skb(skb);
> >> + return;
> >> + }
> >> +
> >> + local_bh_disable();
> >> + skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
> >
> > I am trying to understand why we use false instead of true here ?
> > Or if you prefer:
> > local_bh_disable();
> > __napi_kfree_skb(skb, SKB_DROP_REASON_NOT_SPECIFIED);
> > local_bh_enable();
>
> Maybe it's my misunderstanding but disabled bh != "napi safe",
> e.g. the napi_struct we're interested in might be scheduled for
> another CPU. Which is also why "napi" prefix in percpu
> napi_alloc_cache sounds a bit misleading to me.
Indeed, this is very misleading.
napi_skb_cache_put() & napi_skb_cache_get() should be renamed eventually.
Thanks.
* Re: [PATCH v2] net: cache for same cpu skb_attempt_defer_free
From: Jakub Kicinski @ 2024-02-14 3:13 UTC (permalink / raw)
To: Pavel Begunkov; +Cc: Eric Dumazet, netdev, davem, dsahern, pabeni
On Tue, 13 Feb 2024 14:07:24 +0000 Pavel Begunkov wrote:
> >> + local_bh_disable();
> >> + skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
> >
> > I am trying to understand why we use false instead of true here ?
> > Or if you prefer:
> > local_bh_disable();
> > __napi_kfree_skb(skb, SKB_DROP_REASON_NOT_SPECIFIED);
> > local_bh_enable();
FWIW I had the same reaction. napi_safe = false followed by
napi_skb_cache_put() looks sus. No argument that naming is bad,
not the first time it comes up :(
> Maybe it's my misunderstanding but disabled bh != "napi safe",
> e.g. the napi_struct we're interested in might be scheduled for
> another CPU. Which is also why "napi" prefix in percpu
> napi_alloc_cache sounds a bit misleading to me.
FWIW the skb recycling is called napi_* to hint to driver authors that
if they are in NAPI context this is a better function to call.
The connection to a particular NAPI instance matters only for the page
pool recycling, but that's handled. The conditions you actually
need to look out for are hardware IRQs and whatever async paths which
can trigger while NAPI is halfway through touching the cache of
the local CPU.
> The second reason is that it shouldn't change anything
> performance wise
>
> napi_pp_put_page(napi_safe) {
> ...
> if (napi_safe || in_softirq()) { ... }
> }
* Re: [PATCH v2] net: cache for same cpu skb_attempt_defer_free
From: Pavel Begunkov @ 2024-02-14 16:37 UTC (permalink / raw)
To: Jakub Kicinski; +Cc: Eric Dumazet, netdev, davem, dsahern, pabeni
On 2/14/24 03:13, Jakub Kicinski wrote:
> On Tue, 13 Feb 2024 14:07:24 +0000 Pavel Begunkov wrote:
>>>> + local_bh_disable();
>>>> + skb_release_all(skb, SKB_DROP_REASON_NOT_SPECIFIED, false);
>>>
>>> I am trying to understand why we use false instead of true here ?
>>> Or if you prefer:
>>> local_bh_disable();
>>> __napi_kfree_skb(skb, SKB_DROP_REASON_NOT_SPECIFIED);
>>> local_bh_enable();
>
> FWIW I had the same reaction. napi_safe = false followed by
> napi_skb_cache_put() looks sus. No argument that naming is bad,
> not the first time it comes up :(
>
>> Maybe it's my misunderstanding but disabled bh != "napi safe",
>> e.g. the napi_struct we're interested in might be scheduled for
>> another CPU. Which is also why "napi" prefix in percpu
>> napi_alloc_cache sounds a bit misleading to me.
>
> FWIW the skb recycling is called napi_* to hint to driver authors that
> if they are in NAPI context this is a better function to call.
Which is absolutely reasonable; napi_skb_cache_put(), on the
other hand, is rather internal and wouldn't be used by drivers
directly.
I guess I'll just do a little bit of renaming later hopefully after
this patch is taken in, unless there are other comments / objections.
> The connection to a particular NAPI instance matters only for the page
> pool recycling, but that's handled. The conditions you actually
> need to look out for are hardware IRQs and whatever async paths which
> can trigger while NAPI is halfway through touching the cache of
> the local CPU.
Yeah, the usual percpu (bh-protected) caching stuff and the like;
the question is rather about the semantics.
To draw some conclusion, do you suggest to change anything
in the patch?
>> The second reason is that it shouldn't change anything
>> performance wise
>>
>> napi_pp_put_page(napi_safe) {
>> ...
>> if (napi_safe || in_softirq()) { ... }
>> }
--
Pavel Begunkov