* [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
@ 2023-08-15 15:17 Jesper Dangaard Brouer
2023-08-15 15:53 ` Matthew Wilcox
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-15 15:17 UTC (permalink / raw)
To: netdev, vbabka
Cc: Jesper Dangaard Brouer, Eric Dumazet, David S. Miller,
Jakub Kicinski, Paolo Abeni, linux-mm, Andrew Morton, Mel Gorman,
Christoph Lameter, roman.gushchin, dsterba
Since the v6.5-rc1 MM-tree merge contains the new flag SLAB_NO_MERGE,
added in commit d0bf7d5759c1 ("mm/slab: introduce kmem_cache flag
SLAB_NO_MERGE"), now is the time to use this flag for networking, as
proposed earlier (see Link).

The SKB (sk_buff) kmem_cache slab is critical for network performance.
The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
performance by amortising the alloc/free cost.

For the bulk APIs to perform efficiently, slab fragmentation needs to
be low. Especially for the SLUB allocator, the efficiency of the bulk
free API depends on objects belonging to the same slab (page).

When running different network performance microbenchmarks, I started
to notice that performance was reduced (slightly) when machines had
longer uptimes. I believe the cause was that 'skbuff_head_cache' got
aliased/merged into the general slab for 256-byte sized objects (with
my kernel config, without CONFIG_HARDENED_USERCOPY).

For the SKB kmem_cache, the network stack has various other reasons
for not merging, but they vary depending on the kernel config (e.g.
CONFIG_HARDENED_USERCOPY). We want to explicitly set SLAB_NO_MERGE for
this kmem_cache to get the most out of the kmem_cache_{alloc,free}_bulk
APIs.

When CONFIG_SLUB_TINY is configured, the bulk APIs are essentially
disabled. Thus, for this case, drop the SLAB_NO_MERGE flag.
Link: https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
net/core/skbuff.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index a298992060e6..92aee3e0376a 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4750,12 +4750,23 @@ static void skb_extensions_init(void)
static void skb_extensions_init(void) {}
#endif
+/* The SKB kmem_cache slab is critical for network performance. Never
+ * merge/alias the slab with similar sized objects. This avoids fragmentation
+ * that hurts performance of kmem_cache_{alloc,free}_bulk APIs.
+ */
+#ifndef CONFIG_SLUB_TINY
+#define FLAG_SKB_NO_MERGE SLAB_NO_MERGE
+#else /* CONFIG_SLUB_TINY - simple loop in kmem_cache_alloc_bulk */
+#define FLAG_SKB_NO_MERGE 0
+#endif
+
void __init skb_init(void)
{
skbuff_cache = kmem_cache_create_usercopy("skbuff_head_cache",
sizeof(struct sk_buff),
0,
- SLAB_HWCACHE_ALIGN|SLAB_PANIC,
+ SLAB_HWCACHE_ALIGN|SLAB_PANIC|
+ FLAG_SKB_NO_MERGE,
offsetof(struct sk_buff, cb),
sizeof_field(struct sk_buff, cb),
NULL);
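
For context, here is a minimal, self-contained sketch of how a caller
can batch allocations with the kmem_cache_{alloc,free}_bulk APIs the
commit message refers to. The cache, helper name and bulk count are
made up for illustration; this is not the actual net/core/skbuff.c code.

#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/slab.h>

#define EXAMPLE_BULK_COUNT 16			/* illustrative batch size */

static struct kmem_cache *example_cache;	/* hypothetical cache */

static int example_bulk_cycle(void)
{
	void *objs[EXAMPLE_BULK_COUNT];
	int got;

	/* Returns the number of objects actually allocated, 0 on failure. */
	got = kmem_cache_alloc_bulk(example_cache, GFP_ATOMIC,
				    EXAMPLE_BULK_COUNT, objs);
	if (!got)
		return -ENOMEM;

	/* ... use the objects ... */

	/*
	 * Freeing as a batch is cheapest when the objects sit in the same
	 * slab (page), which is why merging/fragmentation matters here.
	 */
	kmem_cache_free_bulk(example_cache, got, objs);
	return 0;
}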
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
2023-08-15 15:17 [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache Jesper Dangaard Brouer
@ 2023-08-15 15:53 ` Matthew Wilcox
2023-08-18 12:32 ` Jesper Dangaard Brouer
2023-08-18 15:15 ` Vlastimil Babka
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2023-08-15 15:53 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: netdev, vbabka, Eric Dumazet, David S. Miller, Jakub Kicinski,
Paolo Abeni, linux-mm, Andrew Morton, Mel Gorman,
Christoph Lameter, roman.gushchin, dsterba
On Tue, Aug 15, 2023 at 05:17:36PM +0200, Jesper Dangaard Brouer wrote:
> For the bulk APIs to perform efficiently, slab fragmentation needs to
> be low. Especially for the SLUB allocator, the efficiency of the bulk
> free API depends on objects belonging to the same slab (page).
Hey Jesper,
You probably haven't seen this patch series from Vlastimil:
https://lore.kernel.org/linux-mm/20230810163627.6206-9-vbabka@suse.cz/
I wonder if you'd like to give it a try? It should provide some immunity
to this problem, and might even be faster than the current approach.
If it isn't, it'd be good to understand why, and if it could be improved.
No objection to this patch going in for now, of course.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
2023-08-15 15:53 ` Matthew Wilcox
@ 2023-08-18 12:32 ` Jesper Dangaard Brouer
2023-08-18 15:20 ` Vlastimil Babka
0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-18 12:32 UTC (permalink / raw)
To: Matthew Wilcox, Jesper Dangaard Brouer, Vlastimil Babka
Cc: brouer, netdev, vbabka, Eric Dumazet, David S. Miller,
Jakub Kicinski, Paolo Abeni, linux-mm, Andrew Morton, Mel Gorman,
Christoph Lameter, roman.gushchin, dsterba
On 15/08/2023 17.53, Matthew Wilcox wrote:
> On Tue, Aug 15, 2023 at 05:17:36PM +0200, Jesper Dangaard Brouer wrote:
>> For the bulk APIs to perform efficiently, slab fragmentation needs to
>> be low. Especially for the SLUB allocator, the efficiency of the bulk
>> free API depends on objects belonging to the same slab (page).
>
> Hey Jesper,
>
> You probably haven't seen this patch series from Vlastimil:
>
> https://lore.kernel.org/linux-mm/20230810163627.6206-9-vbabka@suse.cz/
>
> I wonder if you'd like to give it a try? It should provide some immunity
> to this problem, and might even be faster than the current approach.
> If it isn't, it'd be good to understand why, and if it could be improved.
I took a quick look at:
 - https://lore.kernel.org/linux-mm/20230810163627.6206-11-vbabka@suse.cz/#Z31mm:slub.c

To Vlastimil: sorry, but I don't think this approach with spin_lock
will be faster than SLUB's normal fast-path using this_cpu_cmpxchg.

My experience is that SLUB's this_cpu_cmpxchg trick is faster than a
spin_lock.
On my testlab CPU E5-1650 v4 @ 3.60GHz:
- spin_lock+unlock : 34 cycles(tsc) 9.485 ns
- this_cpu_cmpxchg : 5 cycles(tsc) 1.585 ns
- locked cmpxchg : 18 cycles(tsc) 5.006 ns
SLUB does use a cmpxchg_double, which I don't have a microbenchmark for.
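
(Illustrative only: a minimal kernel-module sketch of how such numbers
could be gathered with get_cycles(); this is not the actual benchmark
code behind the figures above, and the loop count is arbitrary.)

#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/spinlock.h>
#include <linux/timex.h>

#define BENCH_LOOPS 10000000UL	/* arbitrary iteration count */

static DEFINE_SPINLOCK(bench_lock);
static DEFINE_PER_CPU(unsigned long, bench_var);

static int __init bench_init(void)
{
	cycles_t start, stop;
	unsigned long i;

	/* Uncontended spin_lock + spin_unlock pair */
	start = get_cycles();
	for (i = 0; i < BENCH_LOOPS; i++) {
		spin_lock(&bench_lock);
		spin_unlock(&bench_lock);
	}
	stop = get_cycles();
	pr_info("spin_lock+unlock : %llu cycles per iteration\n",
		(unsigned long long)(stop - start) / BENCH_LOOPS);

	/* this_cpu_cmpxchg on a per-CPU variable */
	start = get_cycles();
	for (i = 0; i < BENCH_LOOPS; i++)
		this_cpu_cmpxchg(bench_var, i, i + 1);
	stop = get_cycles();
	pr_info("this_cpu_cmpxchg : %llu cycles per iteration\n",
		(unsigned long long)(stop - start) / BENCH_LOOPS);

	return 0;
}

static void __exit bench_exit(void)
{
}

module_init(bench_init);
module_exit(bench_exit);
MODULE_LICENSE("GPL");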
> No objection to this patch going in for now, of course.
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
2023-08-15 15:17 [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache Jesper Dangaard Brouer
2023-08-15 15:53 ` Matthew Wilcox
@ 2023-08-18 15:15 ` Vlastimil Babka
2023-08-18 16:26 ` Jakub Kicinski
` (2 subsequent siblings)
4 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2023-08-18 15:15 UTC (permalink / raw)
To: Jesper Dangaard Brouer, netdev
Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni,
linux-mm, Andrew Morton, Mel Gorman, Christoph Lameter,
roman.gushchin, dsterba
On 8/15/23 17:17, Jesper Dangaard Brouer wrote:
> Since the v6.5-rc1 MM-tree merge contains the new flag SLAB_NO_MERGE,
> added in commit d0bf7d5759c1 ("mm/slab: introduce kmem_cache flag
> SLAB_NO_MERGE"), now is the time to use this flag for networking, as
> proposed earlier (see Link).
>
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
>
> For the bulk APIs to perform efficiently, slab fragmentation needs to
> be low. Especially for the SLUB allocator, the efficiency of the bulk
> free API depends on objects belonging to the same slab (page).
>
> When running different network performance microbenchmarks, I started
> to notice that performance was reduced (slightly) when machines had
> longer uptimes. I believe the cause was that 'skbuff_head_cache' got
> aliased/merged into the general slab for 256-byte sized objects (with
> my kernel config, without CONFIG_HARDENED_USERCOPY).
>
> For the SKB kmem_cache, the network stack has various other reasons
> for not merging, but they vary depending on the kernel config (e.g.
> CONFIG_HARDENED_USERCOPY). We want to explicitly set SLAB_NO_MERGE for
> this kmem_cache to get the most out of the kmem_cache_{alloc,free}_bulk
> APIs.
>
> When CONFIG_SLUB_TINY is configured, the bulk APIs are essentially
> disabled. Thus, for this case, drop the SLAB_NO_MERGE flag.
>
> Link: https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/
> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> net/core/skbuff.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a298992060e6..92aee3e0376a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4750,12 +4750,23 @@ static void skb_extensions_init(void)
> static void skb_extensions_init(void) {}
> #endif
>
> +/* The SKB kmem_cache slab is critical for network performance. Never
> + * merge/alias the slab with similar sized objects. This avoids fragmentation
> + * that hurts performance of kmem_cache_{alloc,free}_bulk APIs.
> + */
> +#ifndef CONFIG_SLUB_TINY
> +#define FLAG_SKB_NO_MERGE SLAB_NO_MERGE
> +#else /* CONFIG_SLUB_TINY - simple loop in kmem_cache_alloc_bulk */
> +#define FLAG_SKB_NO_MERGE 0
> +#endif
> +
> void __init skb_init(void)
> {
> skbuff_cache = kmem_cache_create_usercopy("skbuff_head_cache",
> sizeof(struct sk_buff),
> 0,
> - SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> + SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> + FLAG_SKB_NO_MERGE,
> offsetof(struct sk_buff, cb),
> sizeof_field(struct sk_buff, cb),
> NULL);
>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
2023-08-18 12:32 ` Jesper Dangaard Brouer
@ 2023-08-18 15:20 ` Vlastimil Babka
0 siblings, 0 replies; 9+ messages in thread
From: Vlastimil Babka @ 2023-08-18 15:20 UTC (permalink / raw)
To: Jesper Dangaard Brouer, Matthew Wilcox, Jesper Dangaard Brouer
Cc: brouer, netdev, Eric Dumazet, David S. Miller, Jakub Kicinski,
Paolo Abeni, linux-mm, Andrew Morton, Mel Gorman,
Christoph Lameter, roman.gushchin, dsterba
On 8/18/23 14:32, Jesper Dangaard Brouer wrote:
>
>
> On 15/08/2023 17.53, Matthew Wilcox wrote:
>> On Tue, Aug 15, 2023 at 05:17:36PM +0200, Jesper Dangaard Brouer wrote:
>>> For the bulk APIs to perform efficiently, slab fragmentation needs to
>>> be low. Especially for the SLUB allocator, the efficiency of the bulk
>>> free API depends on objects belonging to the same slab (page).
>>
>> Hey Jesper,
>>
>> You probably haven't seen this patch series from Vlastimil:
>>
>> https://lore.kernel.org/linux-mm/20230810163627.6206-9-vbabka@suse.cz/
>>
>> I wonder if you'd like to give it a try? It should provide some immunity
>> to this problem, and might even be faster than the current approach.
>> If it isn't, it'd be good to understand why, and if it could be improved.
I didn't Cc Jesper on that yet, as the initial attempt was focused on
the maple tree nodes use case. But you'll notice that using the percpu
array requires the cache to be created with SLAB_NO_MERGE anyway, so
this patch would still be necessary :)
> I took a quick look at:
>  - https://lore.kernel.org/linux-mm/20230810163627.6206-11-vbabka@suse.cz/#Z31mm:slub.c
>
> To Vlastimil: sorry, but I don't think this approach with spin_lock
> will be faster than SLUB's normal fast-path using this_cpu_cmpxchg.
>
> My experience is that SLUB's this_cpu_cmpxchg trick is faster than a
> spin_lock.
>
> On my testlab CPU E5-1650 v4 @ 3.60GHz:
> - spin_lock+unlock : 34 cycles(tsc) 9.485 ns
> - this_cpu_cmpxchg : 5 cycles(tsc) 1.585 ns
> - locked cmpxchg : 18 cycles(tsc) 5.006 ns
Hm, that's an unexpected difference between spin_lock+unlock and the
locked cmpxchg, since AFAIK spin_lock is basically a locked cmpxchg and
unlock a simple write. I assume these measurements are on an
uncontended lock?
> SLUB does use a cmpxchg_double, which I don't have a microbenchmark for.
Yeah, it's possible the _double will be slower. The locking will have
to be considered more thoroughly for the percpu array.
>> No objection to this patch going in for now, of course.
>>
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
2023-08-15 15:17 [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache Jesper Dangaard Brouer
2023-08-15 15:53 ` Matthew Wilcox
2023-08-18 15:15 ` Vlastimil Babka
@ 2023-08-18 16:26 ` Jakub Kicinski
2023-08-18 19:59 ` Jesper Dangaard Brouer
2023-08-18 22:20 ` patchwork-bot+netdevbpf
2023-08-21 13:55 ` Alexander Lobakin
4 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2023-08-18 16:26 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: netdev, vbabka, Eric Dumazet, David S. Miller, Paolo Abeni,
linux-mm, Andrew Morton, Mel Gorman, Christoph Lameter,
roman.gushchin, dsterba
On Tue, 15 Aug 2023 17:17:36 +0200 Jesper Dangaard Brouer wrote:
> Subject: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
Has this been a problem "forever"?
We're getting late in the release cycle for non-regression & non-crash
changes, even if the regression potential is small...
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
2023-08-18 16:26 ` Jakub Kicinski
@ 2023-08-18 19:59 ` Jesper Dangaard Brouer
0 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2023-08-18 19:59 UTC (permalink / raw)
To: Jakub Kicinski, Jesper Dangaard Brouer
Cc: brouer, netdev, vbabka, Eric Dumazet, David S. Miller,
Paolo Abeni, linux-mm, Andrew Morton, Mel Gorman,
Christoph Lameter, roman.gushchin, dsterba
On 18/08/2023 18.26, Jakub Kicinski wrote:
> On Tue, 15 Aug 2023 17:17:36 +0200 Jesper Dangaard Brouer wrote:
>> Subject: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
>
> Has this been a problem "forever"?
> We're getting late in the release cycle for non-regression & non-crash
> changes, even if small regression potential..
>
This patch is not dangerous ;-)

The SKB slab/kmem_cache is already almost always "no_merge" (due to the
flags it uses), but certain kernel .config combinations can cause it to
be merged (with other 256-byte slabs). I happened to hit this config
combination in my testlab and noticed the slight performance impact.
If anything, this patch makes the behaviour more consistent.
--Jesper
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
2023-08-15 15:17 [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache Jesper Dangaard Brouer
` (2 preceding siblings ...)
2023-08-18 16:26 ` Jakub Kicinski
@ 2023-08-18 22:20 ` patchwork-bot+netdevbpf
2023-08-21 13:55 ` Alexander Lobakin
4 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-08-18 22:20 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: netdev, vbabka, eric.dumazet, davem, kuba, pabeni, linux-mm, akpm,
mgorman, cl, roman.gushchin, dsterba
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 15 Aug 2023 17:17:36 +0200 you wrote:
> Since the v6.5-rc1 MM-tree merge contains the new flag SLAB_NO_MERGE,
> added in commit d0bf7d5759c1 ("mm/slab: introduce kmem_cache flag
> SLAB_NO_MERGE"), now is the time to use this flag for networking, as
> proposed earlier (see Link).
>
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
>
> [...]
Here is the summary with links:
- [net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
https://git.kernel.org/netdev/net-next/c/0a0643164da4
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache
2023-08-15 15:17 [PATCH net] net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache Jesper Dangaard Brouer
` (3 preceding siblings ...)
2023-08-18 22:20 ` patchwork-bot+netdevbpf
@ 2023-08-21 13:55 ` Alexander Lobakin
4 siblings, 0 replies; 9+ messages in thread
From: Alexander Lobakin @ 2023-08-21 13:55 UTC (permalink / raw)
To: Jesper Dangaard Brouer, netdev, vbabka
Cc: Eric Dumazet, David S. Miller, Jakub Kicinski, Paolo Abeni,
linux-mm, Andrew Morton, Mel Gorman, Christoph Lameter,
roman.gushchin, dsterba
From: Jesper Dangaard Brouer <hawk@kernel.org>
Date: Tue, 15 Aug 2023 17:17:36 +0200
> Since the v6.5-rc1 MM-tree merge contains the new flag SLAB_NO_MERGE,
> added in commit d0bf7d5759c1 ("mm/slab: introduce kmem_cache flag
> SLAB_NO_MERGE"), now is the time to use this flag for networking, as
> proposed earlier (see Link).
>
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> The network stack uses the kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
>
> For the bulk APIs to perform efficiently, slab fragmentation needs to
> be low. Especially for the SLUB allocator, the efficiency of the bulk
> free API depends on objects belonging to the same slab (page).
>
> When running different network performance microbenchmarks, I started
> to notice that performance was reduced (slightly) when machines had
> longer uptimes. I believe the cause was that 'skbuff_head_cache' got
> aliased/merged into the general slab for 256-byte sized objects (with
> my kernel config, without CONFIG_HARDENED_USERCOPY).
>
> For the SKB kmem_cache, the network stack has various other reasons
> for not merging, but they vary depending on the kernel config (e.g.
> CONFIG_HARDENED_USERCOPY). We want to explicitly set SLAB_NO_MERGE for
> this kmem_cache to get the most out of the kmem_cache_{alloc,free}_bulk
> APIs.
>
> When CONFIG_SLUB_TINY is configured, the bulk APIs are essentially
> disabled. Thus, for this case, drop the SLAB_NO_MERGE flag.
>
> Link: https://lore.kernel.org/all/167396280045.539803.7540459812377220500.stgit@firesoul/
> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
> ---
> net/core/skbuff.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a298992060e6..92aee3e0376a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4750,12 +4750,23 @@ static void skb_extensions_init(void)
> static void skb_extensions_init(void) {}
> #endif
>
> +/* The SKB kmem_cache slab is critical for network performance. Never
> + * merge/alias the slab with similar sized objects. This avoids fragmentation
> + * that hurts performance of kmem_cache_{alloc,free}_bulk APIs.
> + */
> +#ifndef CONFIG_SLUB_TINY
> +#define FLAG_SKB_NO_MERGE SLAB_NO_MERGE
> +#else /* CONFIG_SLUB_TINY - simple loop in kmem_cache_alloc_bulk */
> +#define FLAG_SKB_NO_MERGE 0
> +#endif
> +
> void __init skb_init(void)
> {
> skbuff_cache = kmem_cache_create_usercopy("skbuff_head_cache",
> sizeof(struct sk_buff),
> 0,
> - SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> + SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> + FLAG_SKB_NO_MERGE,
That alignment tho xD
> offsetof(struct sk_buff, cb),
> sizeof_field(struct sk_buff, cb),
> NULL);
Thanks,
Olek
^ permalink raw reply [flat|nested] 9+ messages in thread