* [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
@ 2025-05-20 12:25 Usama Arif
2025-05-20 12:25 ` [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails Usama Arif
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: Usama Arif @ 2025-05-20 12:25 UTC (permalink / raw)
To: Andrew Morton, surenb
Cc: hannes, shakeel.butt, vlad.wing, linux-mm, kent.overstreet,
linux-kernel, kernel-team, Usama Arif
When memory allocation profiling is running on memory bound services,
allocations greater than order 0 for slab object extensions can fail,
e.g. for the zs_handle zswap slab, which needs 512 objsperslab x 16 bytes
per slabobj_ext (an order-1 allocation). Use kvcalloc to improve the chances
of the allocation succeeding.
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
---
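A quick sketch of the arithmetic behind the order-1 claim above (assuming the
16 bytes per slabobj_ext and 512 objects per slab quoted in the commit
message, and 4K pages):

    512 objs_per_slab * 16 bytes per slabobj_ext = 8192 bytes = 2 pages
    get_order(8192) == 1, so kcalloc_node() needs two physically contiguous
    pages, while kvcalloc_node() can fall back to vmalloc for such sizes.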
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c
index dc9e729e1d26..bf43c403ead2 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
gfp &= ~OBJCGS_CLEAR_MASK;
/* Prevent recursive extension vector allocation */
gfp |= __GFP_NO_OBJ_EXT;
- vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
+ vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
slab_nid(slab));
if (!vec) {
/* Mark vectors which failed to allocate */
--
2.47.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails
2025-05-20 12:25 [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously Usama Arif
@ 2025-05-20 12:25 ` Usama Arif
2025-05-20 13:34 ` Harry Yoo
2025-05-20 13:44 ` [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously Kent Overstreet
2025-05-20 14:01 ` Usama Arif
2 siblings, 1 reply; 26+ messages in thread
From: Usama Arif @ 2025-05-20 12:25 UTC (permalink / raw)
To: Andrew Morton, surenb
Cc: hannes, shakeel.butt, vlad.wing, linux-mm, kent.overstreet,
linux-kernel, kernel-team, Usama Arif
In memory bound systems, a large number of warnings from this
allocation failing repeatedly may mask real issues being reported
in dmesg while the system is under memory pressure. Change this to
WARN_ONCE.
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
---
mm/slub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c
index bf43c403ead2..97cb3d9e8d00 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2102,7 +2102,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
slab = virt_to_slab(p);
if (!slab_obj_exts(slab) &&
- WARN(alloc_slab_obj_exts(slab, s, flags, false),
+ WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
"%s, %s: Failed to create slab extension vector!\n",
__func__, s->name))
return NULL;
--
2.47.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails
2025-05-20 12:25 ` [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails Usama Arif
@ 2025-05-20 13:34 ` Harry Yoo
2025-05-20 13:42 ` Usama Arif
0 siblings, 1 reply; 26+ messages in thread
From: Harry Yoo @ 2025-05-20 13:34 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, surenb, hannes, shakeel.butt, vlad.wing, linux-mm,
kent.overstreet, linux-kernel, kernel-team, vbabka, cl, rientjes,
roman.gushchin
On Tue, May 20, 2025 at 01:25:47PM +0100, Usama Arif wrote:
> In memory bound systems, a large number of warnings for failing this
> allocation repeatedly may mask any real issues in the system
> during memory pressure being reported in dmesg. Change this to
> WARN_ONCE.
>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> ---
Hi,
Please Cc SLAB ALLOCATOR folks in MAINTAINERS on patches that touch
slab code ;)
> mm/slub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index bf43c403ead2..97cb3d9e8d00 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2102,7 +2102,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
>
> slab = virt_to_slab(p);
> if (!slab_obj_exts(slab) &&
> - WARN(alloc_slab_obj_exts(slab, s, flags, false),
> + WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
> "%s, %s: Failed to create slab extension vector!\n",
> __func__, s->name))
I think this should be pr_warn_once()?
I'm not sure why this was WARN() in the first place.
The coding style guide explicitly states that:
> Do not WARN lightly
> ===================
>
> WARN*() is intended for unexpected, this-should-never-happen situations.
> WARN*() macros are not to be used for anything that is expected to happen
> during normal operation. These are not pre- or post-condition asserts,
> for example. Again: WARN*() must not be used for a condition that is
> expected to trigger easily, for example, by user space actions.
> pr_warn_once() is a possible alternative, if you need to notify the user
> of a problem.
And failing to allocate the extension vector can happen during normal
operations.
panic_on_warn users will be unhappy if they notice their kernel panicked
just because it failed to allocate slab extension vectors, which is
a totally normal situation.
> return NULL;
> --
> 2.47.1
>
>
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails
2025-05-20 13:34 ` Harry Yoo
@ 2025-05-20 13:42 ` Usama Arif
2025-05-20 14:18 ` Shakeel Butt
0 siblings, 1 reply; 26+ messages in thread
From: Usama Arif @ 2025-05-20 13:42 UTC (permalink / raw)
To: Harry Yoo
Cc: Andrew Morton, surenb, hannes, shakeel.butt, vlad.wing, linux-mm,
kent.overstreet, linux-kernel, kernel-team, vbabka, cl, rientjes,
roman.gushchin
On 20/05/2025 14:34, Harry Yoo wrote:
> On Tue, May 20, 2025 at 01:25:47PM +0100, Usama Arif wrote:
>> In memory bound systems, a large number of warnings for failing this
>> allocation repeatedly may mask any real issues in the system
>> during memory pressure being reported in dmesg. Change this to
>> WARN_ONCE.
>>
>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
>> ---
>
> Hi,
>
> Please Cc SLAB ALLOCATOR folks in MAINTAINERS on patches that touch
> slab code ;)
>
Thanks for adding them to CC! I was just thinking of this as a memory
allocation profiling issue and added the maintainers for it,
but should have added slab maintainers as well.
>> mm/slub.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index bf43c403ead2..97cb3d9e8d00 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2102,7 +2102,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
>>
>> slab = virt_to_slab(p);
>> if (!slab_obj_exts(slab) &&
>> - WARN(alloc_slab_obj_exts(slab, s, flags, false),
>> + WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
>> "%s, %s: Failed to create slab extension vector!\n",
>> __func__, s->name))
>
> I think this should be pr_warn_once()?
> I'm not sure why this was WARN() in the first place.
>
Isn't WARN_ONCE the same as pr_warn_once, but requiring the condition
in the first arg to be true? We only want to warn if alloc_slab_obj_exts
returns non-zero, so WARN_ONCE should be ok?
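For reference, a rough sketch of the practical difference (paraphrased from
the definitions in include/asm-generic/bug.h and include/linux/printk.h, not
verbatim):

	/* evaluates cond, returns it, prints the message plus a backtrace the
	 * first time cond is true, and trips panic_on_warn */
	if (WARN_ONCE(cond, "msg"))
		do_something();

	/* only prints the message once; no backtrace, no panic_on_warn */
	if (cond)
		pr_warn_once("msg");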
> The coding style guide explicitly states that:
>> Do not WARN lightly
>> ===================
>>
>> WARN*() is intended for unexpected, this-should-never-happen situations.
>> WARN*() macros are not to be used for anything that is expected to happen
>> during normal operation. These are not pre- or post-condition asserts,
>> for example. Again: WARN*() must not be used for a condition that is
>> expected to trigger easily, for example, by user space actions.
>> pr_warn_once() is a possible alternative, if you need to notify the user
>> of a problem.
>
> And failing to allocate the extension vector can happen during normal
> operations.
>
> panic_on_warn users will be unhappy if they notice their kernel panicked
> just because their kernel failed to allocate slab extension vectors, which is
> a totally normal situtation.
>
>> return NULL;
>> --
>> 2.47.1
>>
>>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 12:25 [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously Usama Arif
2025-05-20 12:25 ` [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails Usama Arif
@ 2025-05-20 13:44 ` Kent Overstreet
2025-05-20 13:46 ` Usama Arif
2025-05-20 14:01 ` Usama Arif
2 siblings, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2025-05-20 13:44 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, surenb, hannes, shakeel.butt, vlad.wing, linux-mm,
linux-kernel, kernel-team
On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> When memory allocation profiling is running on memory bound services,
> allocations greater than order 0 for slab object extensions can fail,
> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> of the allocation being successful.
>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> ---
> mm/slub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index dc9e729e1d26..bf43c403ead2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> gfp &= ~OBJCGS_CLEAR_MASK;
> /* Prevent recursive extension vector allocation */
> gfp |= __GFP_NO_OBJ_EXT;
> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> slab_nid(slab));
And what's the latency going to be on a vmalloc() allocation when we're
low on memory?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 13:44 ` [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously Kent Overstreet
@ 2025-05-20 13:46 ` Usama Arif
2025-05-20 14:01 ` Kent Overstreet
2025-05-20 14:13 ` Usama Arif
0 siblings, 2 replies; 26+ messages in thread
From: Usama Arif @ 2025-05-20 13:46 UTC (permalink / raw)
To: Kent Overstreet
Cc: Andrew Morton, surenb, hannes, shakeel.butt, vlad.wing, linux-mm,
linux-kernel, kernel-team
On 20/05/2025 14:44, Kent Overstreet wrote:
> On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
>> When memory allocation profiling is running on memory bound services,
>> allocations greater than order 0 for slab object extensions can fail,
>> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
>> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
>> of the allocation being successful.
>>
>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
>> ---
>> mm/slub.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index dc9e729e1d26..bf43c403ead2 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>> gfp &= ~OBJCGS_CLEAR_MASK;
>> /* Prevent recursive extension vector allocation */
>> gfp |= __GFP_NO_OBJ_EXT;
>> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
>> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
>> slab_nid(slab));
>
> And what's the latency going to be on a vmalloc() allocation when we're
> low on memory?
Would it not be better to get the allocation slightly slower than to not get
it at all?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 13:46 ` Usama Arif
@ 2025-05-20 14:01 ` Kent Overstreet
2025-05-20 14:24 ` Shakeel Butt
2025-05-20 14:13 ` Usama Arif
1 sibling, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2025-05-20 14:01 UTC (permalink / raw)
To: Usama Arif
Cc: Andrew Morton, surenb, hannes, shakeel.butt, vlad.wing, linux-mm,
linux-kernel, kernel-team
On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
>
>
> On 20/05/2025 14:44, Kent Overstreet wrote:
> > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> >> When memory allocation profiling is running on memory bound services,
> >> allocations greater than order 0 for slab object extensions can fail,
> >> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> >> of the allocation being successful.
> >>
> >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> >> ---
> >> mm/slub.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index dc9e729e1d26..bf43c403ead2 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> >> gfp &= ~OBJCGS_CLEAR_MASK;
> >> /* Prevent recursive extension vector allocation */
> >> gfp |= __GFP_NO_OBJ_EXT;
> >> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> >> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> >> slab_nid(slab));
> >
> > And what's the latency going to be on a vmalloc() allocation when we're
> > low on memory?
>
> Would it not be better to get the allocation slighly slower than to not get
> it at all?
Our behaviour when thrashing sucks; we don't want to do anything to make
that worse.
There's also the fact that vmalloc doesn't correctly respect gfp flags,
so until that gets fixed this doesn't work at all.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 12:25 [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously Usama Arif
2025-05-20 12:25 ` [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails Usama Arif
2025-05-20 13:44 ` [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously Kent Overstreet
@ 2025-05-20 14:01 ` Usama Arif
2 siblings, 0 replies; 26+ messages in thread
From: Usama Arif @ 2025-05-20 14:01 UTC (permalink / raw)
To: Andrew Morton, surenb
Cc: hannes, shakeel.butt, vlad.wing, linux-mm, kent.overstreet,
linux-kernel, kernel-team
On 20/05/2025 13:25, Usama Arif wrote:
> When memory allocation profiling is running on memory bound services,
> allocations greater than order 0 for slab object extensions can fail,
> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> of the allocation being successful.
>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> ---
> mm/slub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index dc9e729e1d26..bf43c403ead2 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> gfp &= ~OBJCGS_CLEAR_MASK;
> /* Prevent recursive extension vector allocation */
> gfp |= __GFP_NO_OBJ_EXT;
> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> slab_nid(slab));
> if (!vec) {
> /* Mark vectors which failed to allocate */
I forgot the freeing part; this will need the below fixlet as well.
I will add it to the next revision if there is another one;
otherwise it can be folded into this one. Thanks!
commit fa48eab7faddfdb94faa80a1575ac1840919071e (HEAD -> prctl_huge_v3)
Author: Usama Arif <usamaarif642@gmail.com>
Date: Tue May 20 14:58:10 2025 +0100
mm: slub: change slab_obj_exts freeing to kvfree
This is to match the kvmalloc allocation.
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
diff --git a/mm/slub.c b/mm/slub.c
index 97cb3d9e8d00..2245e8d8fffb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2045,7 +2045,7 @@ static noinline void free_slab_obj_exts(struct slab *slab)
* the extension for obj_exts is expected to be NULL.
*/
mark_objexts_empty(obj_exts);
- kfree(obj_exts);
+ kvfree(obj_exts);
slab->obj_exts = 0;
}
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 13:46 ` Usama Arif
2025-05-20 14:01 ` Kent Overstreet
@ 2025-05-20 14:13 ` Usama Arif
2025-05-20 15:20 ` Suren Baghdasaryan
1 sibling, 1 reply; 26+ messages in thread
From: Usama Arif @ 2025-05-20 14:13 UTC (permalink / raw)
To: Kent Overstreet
Cc: Andrew Morton, surenb, hannes, shakeel.butt, vlad.wing, linux-mm,
linux-kernel, kernel-team
On 20/05/2025 14:46, Usama Arif wrote:
>
>
> On 20/05/2025 14:44, Kent Overstreet wrote:
>> On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
>>> When memory allocation profiling is running on memory bound services,
>>> allocations greater than order 0 for slab object extensions can fail,
>>> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
>>> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
>>> of the allocation being successful.
>>>
>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
>>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
>>> ---
>>> mm/slub.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index dc9e729e1d26..bf43c403ead2 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
>>> gfp &= ~OBJCGS_CLEAR_MASK;
>>> /* Prevent recursive extension vector allocation */
>>> gfp |= __GFP_NO_OBJ_EXT;
>>> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
>>> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
>>> slab_nid(slab));
>>
>> And what's the latency going to be on a vmalloc() allocation when we're
>> low on memory?
>
> Would it not be better to get the allocation slighly slower than to not get
> it at all?
Also, a majority of them are less than 1 page. kvmalloc of less than 1 page
falls back to kmalloc. So vmalloc will only be used for those greater than 1 page
in size, which are in the minority (e.g. zs_handle, request_sock_subflow_v6,
request_sock_subflow_v4...).
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails
2025-05-20 13:42 ` Usama Arif
@ 2025-05-20 14:18 ` Shakeel Butt
2025-05-20 15:14 ` Suren Baghdasaryan
2025-05-20 15:22 ` Usama Arif
0 siblings, 2 replies; 26+ messages in thread
From: Shakeel Butt @ 2025-05-20 14:18 UTC (permalink / raw)
To: Usama Arif
Cc: Harry Yoo, Andrew Morton, surenb, hannes, vlad.wing, linux-mm,
kent.overstreet, linux-kernel, kernel-team, vbabka, cl, rientjes,
roman.gushchin
On Tue, May 20, 2025 at 02:42:09PM +0100, Usama Arif wrote:
>
>
> On 20/05/2025 14:34, Harry Yoo wrote:
> > On Tue, May 20, 2025 at 01:25:47PM +0100, Usama Arif wrote:
> >> In memory bound systems, a large number of warnings for failing this
> >> allocation repeatedly may mask any real issues in the system
> >> during memory pressure being reported in dmesg. Change this to
> >> WARN_ONCE.
> >>
> >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> >> ---
> >
> > Hi,
> >
> > Please Cc SLAB ALLOCATOR folks in MAINTAINERS on patches that touch
> > slab code ;)
> >
>
> Thanks for adding them to CC! I was just thinking of this as a memory
> allocation profiling issue and added the maintainers for it,
> but should have added slab maintainers as well.
>
>
> >> mm/slub.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index bf43c403ead2..97cb3d9e8d00 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -2102,7 +2102,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> >>
> >> slab = virt_to_slab(p);
> >> if (!slab_obj_exts(slab) &&
> >> - WARN(alloc_slab_obj_exts(slab, s, flags, false),
> >> + WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
> >> "%s, %s: Failed to create slab extension vector!\n",
> >> __func__, s->name))
> >
> > I think this should be pr_warn_once()?
> > I'm not sure why this was WARN() in the first place.
> >
>
> Isn't WARN_ONCE the same as pr_warn_once but with needing the condition
> of the first arg to be true? We only want to warn if alloc_slab_obj_exts
> returns non-zero. So WARN_ONCE should be ok?
>
The difference is the impact on panic_on_warn users, which are mostly
testing bots. This warning is not actionable, so I agree with Harry to
convert this to pr_warn_once().
> > The coding style guide explicitly states that:
> >> Do not WARN lightly
> >> ===================
> >>
> >> WARN*() is intended for unexpected, this-should-never-happen situations.
> >> WARN*() macros are not to be used for anything that is expected to happen
> >> during normal operation. These are not pre- or post-condition asserts,
> >> for example. Again: WARN*() must not be used for a condition that is
> >> expected to trigger easily, for example, by user space actions.
> >> pr_warn_once() is a possible alternative, if you need to notify the user
> >> of a problem.
> >
> > And failing to allocate the extension vector can happen during normal
> > operations.
> >
> > panic_on_warn users will be unhappy if they notice their kernel panicked
> > just because their kernel failed to allocate slab extension vectors, which is
> > a totally normal situtation.
> >
> >> return NULL;
> >> --
> >> 2.47.1
> >>
> >>
> >
>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 14:01 ` Kent Overstreet
@ 2025-05-20 14:24 ` Shakeel Butt
2025-05-20 14:28 ` Kent Overstreet
0 siblings, 1 reply; 26+ messages in thread
From: Shakeel Butt @ 2025-05-20 14:24 UTC (permalink / raw)
To: Kent Overstreet
Cc: Usama Arif, Andrew Morton, surenb, hannes, vlad.wing, linux-mm,
linux-kernel, kernel-team
On Tue, May 20, 2025 at 10:01:27AM -0400, Kent Overstreet wrote:
> On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
> >
> >
> > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > >> When memory allocation profiling is running on memory bound services,
> > >> allocations greater than order 0 for slab object extensions can fail,
> > >> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > >> of the allocation being successful.
> > >>
> > >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > >> ---
> > >> mm/slub.c | 2 +-
> > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> diff --git a/mm/slub.c b/mm/slub.c
> > >> index dc9e729e1d26..bf43c403ead2 100644
> > >> --- a/mm/slub.c
> > >> +++ b/mm/slub.c
> > >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > >> gfp &= ~OBJCGS_CLEAR_MASK;
> > >> /* Prevent recursive extension vector allocation */
> > >> gfp |= __GFP_NO_OBJ_EXT;
> > >> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > >> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > >> slab_nid(slab));
> > >
> > > And what's the latency going to be on a vmalloc() allocation when we're
> > > low on memory?
> >
> > Would it not be better to get the allocation slighly slower than to not get
> > it at all?
>
> Our behaviour when thrashing sucks, we don't want to do anything to make
> that worse.
>
> There's also the fact that vmalloc doesn't correctly respect gfp flags,
> so until that gets fixed this doesn't work at all.
Which gfp flags is vmalloc not respecting today?
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 14:24 ` Shakeel Butt
@ 2025-05-20 14:28 ` Kent Overstreet
2025-05-20 17:44 ` Uladzislau Rezki
0 siblings, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2025-05-20 14:28 UTC (permalink / raw)
To: Shakeel Butt
Cc: Usama Arif, Andrew Morton, surenb, hannes, vlad.wing, linux-mm,
linux-kernel, kernel-team
On Tue, May 20, 2025 at 07:24:40AM -0700, Shakeel Butt wrote:
> On Tue, May 20, 2025 at 10:01:27AM -0400, Kent Overstreet wrote:
> > On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
> > >
> > >
> > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > > >> When memory allocation profiling is running on memory bound services,
> > > >> allocations greater than order 0 for slab object extensions can fail,
> > > >> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > > >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > > >> of the allocation being successful.
> > > >>
> > > >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > > >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > > >> ---
> > > >> mm/slub.c | 2 +-
> > > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/mm/slub.c b/mm/slub.c
> > > >> index dc9e729e1d26..bf43c403ead2 100644
> > > >> --- a/mm/slub.c
> > > >> +++ b/mm/slub.c
> > > >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > >> gfp &= ~OBJCGS_CLEAR_MASK;
> > > >> /* Prevent recursive extension vector allocation */
> > > >> gfp |= __GFP_NO_OBJ_EXT;
> > > >> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > >> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > >> slab_nid(slab));
> > > >
> > > > And what's the latency going to be on a vmalloc() allocation when we're
> > > > low on memory?
> > >
> > > Would it not be better to get the allocation slighly slower than to not get
> > > it at all?
> >
> > Our behaviour when thrashing sucks, we don't want to do anything to make
> > that worse.
> >
> > There's also the fact that vmalloc doesn't correctly respect gfp flags,
> > so until that gets fixed this doesn't work at all.
>
> Which gfp flags vmalloc is not respecting today?
GFP_NOWAIT.
As to why, you'd better ask Michal Hocko...
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails
2025-05-20 14:18 ` Shakeel Butt
@ 2025-05-20 15:14 ` Suren Baghdasaryan
2025-05-20 15:22 ` Usama Arif
1 sibling, 0 replies; 26+ messages in thread
From: Suren Baghdasaryan @ 2025-05-20 15:14 UTC (permalink / raw)
To: Shakeel Butt
Cc: Usama Arif, Harry Yoo, Andrew Morton, hannes, vlad.wing, linux-mm,
kent.overstreet, linux-kernel, kernel-team, vbabka, cl, rientjes,
roman.gushchin
On Tue, May 20, 2025 at 7:18 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Tue, May 20, 2025 at 02:42:09PM +0100, Usama Arif wrote:
> >
> >
> > On 20/05/2025 14:34, Harry Yoo wrote:
> > > On Tue, May 20, 2025 at 01:25:47PM +0100, Usama Arif wrote:
> > >> In memory bound systems, a large number of warnings for failing this
> > >> allocation repeatedly may mask any real issues in the system
> > >> during memory pressure being reported in dmesg. Change this to
> > >> WARN_ONCE.
> > >>
> > >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > >> ---
> > >
> > > Hi,
> > >
> > > Please Cc SLAB ALLOCATOR folks in MAINTAINERS on patches that touch
> > > slab code ;)
> > >
> >
> > Thanks for adding them to CC! I was just thinking of this as a memory
> > allocation profiling issue and added the maintainers for it,
> > but should have added slab maintainers as well.
> >
> >
> > >> mm/slub.c | 2 +-
> > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> diff --git a/mm/slub.c b/mm/slub.c
> > >> index bf43c403ead2..97cb3d9e8d00 100644
> > >> --- a/mm/slub.c
> > >> +++ b/mm/slub.c
> > >> @@ -2102,7 +2102,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> > >>
> > >> slab = virt_to_slab(p);
> > >> if (!slab_obj_exts(slab) &&
> > >> - WARN(alloc_slab_obj_exts(slab, s, flags, false),
> > >> + WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
> > >> "%s, %s: Failed to create slab extension vector!\n",
> > >> __func__, s->name))
> > >
> > > I think this should be pr_warn_once()?
> > > I'm not sure why this was WARN() in the first place.
> > >
> >
> > Isn't WARN_ONCE the same as pr_warn_once but with needing the condition
> > of the first arg to be true? We only want to warn if alloc_slab_obj_exts
> > returns non-zero. So WARN_ONCE should be ok?
> >
>
> The difference is the impact on panic_on_warn users which are mostly
> testing bots.
Another difference is that pr_warn() does not spit out the call stack.
> This warning is not actionable, so I agree with Harry to
> covert this to pr_warn_once().
Makes sense.
>
> > > The coding style guide explicitly states that:
> > >> Do not WARN lightly
> > >> ===================
> > >>
> > >> WARN*() is intended for unexpected, this-should-never-happen situations.
> > >> WARN*() macros are not to be used for anything that is expected to happen
> > >> during normal operation. These are not pre- or post-condition asserts,
> > >> for example. Again: WARN*() must not be used for a condition that is
> > >> expected to trigger easily, for example, by user space actions.
> > >> pr_warn_once() is a possible alternative, if you need to notify the user
> > >> of a problem.
> > >
> > > And failing to allocate the extension vector can happen during normal
> > > operations.
> > >
> > > panic_on_warn users will be unhappy if they notice their kernel panicked
> > > just because their kernel failed to allocate slab extension vectors, which is
> > > a totally normal situtation.
> > >
> > >> return NULL;
> > >> --
> > >> 2.47.1
> > >>
> > >>
> > >
> >
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 14:13 ` Usama Arif
@ 2025-05-20 15:20 ` Suren Baghdasaryan
2025-05-20 16:41 ` Kent Overstreet
2025-05-20 17:18 ` Johannes Weiner
0 siblings, 2 replies; 26+ messages in thread
From: Suren Baghdasaryan @ 2025-05-20 15:20 UTC (permalink / raw)
To: Usama Arif
Cc: Kent Overstreet, Andrew Morton, hannes, shakeel.butt, vlad.wing,
linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 7:13 AM Usama Arif <usamaarif642@gmail.com> wrote:
>
>
>
> On 20/05/2025 14:46, Usama Arif wrote:
> >
> >
> > On 20/05/2025 14:44, Kent Overstreet wrote:
> >> On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> >>> When memory allocation profiling is running on memory bound services,
> >>> allocations greater than order 0 for slab object extensions can fail,
> >>> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> >>> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> >>> of the allocation being successful.
> >>>
> >>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> >>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> >>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> >>> ---
> >>> mm/slub.c | 2 +-
> >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/mm/slub.c b/mm/slub.c
> >>> index dc9e729e1d26..bf43c403ead2 100644
> >>> --- a/mm/slub.c
> >>> +++ b/mm/slub.c
> >>> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> >>> gfp &= ~OBJCGS_CLEAR_MASK;
> >>> /* Prevent recursive extension vector allocation */
> >>> gfp |= __GFP_NO_OBJ_EXT;
> >>> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> >>> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> >>> slab_nid(slab));
> >>
> >> And what's the latency going to be on a vmalloc() allocation when we're
> >> low on memory?
> >
> > Would it not be better to get the allocation slighly slower than to not get
> > it at all?
>
> Also a majority of them are less than 1 page. kvmalloc of less than 1 page
> falls back to kmalloc. So vmalloc will only be on those greater than 1 page
> size, which are in the minority (for e.g. zs_handle, request_sock_subflow_v6,
> request_sock_subflow_v4...).
Not just the majority. For all of these kvmalloc allocations, kmalloc
will be tried first and vmalloc will be used only if the former
fails: https://elixir.bootlin.com/linux/v6.14.7/source/mm/util.c#L665
That's why I think this should not regress the normal case where slab has
enough space to satisfy the allocation.
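Roughly, the flow at that link looks like the following (paraphrased, not a
verbatim copy of mm/util.c; vmalloc_fallback is just shorthand for the
__vmalloc_node_range() path, not a real function):

	/* kvmalloc_node(size, flags, node), simplified: */
	ret = kmalloc_node(size, flags /* adjusted: no warnings or hard retries
					  for large sizes, since there is a
					  fallback */, node);
	if (ret || size <= PAGE_SIZE)
		return ret;	/* sub-page requests never reach vmalloc */
	/* only larger, GFP_KERNEL-compatible requests take the fallback */
	return vmalloc_fallback(size, flags, node);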
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails
2025-05-20 14:18 ` Shakeel Butt
2025-05-20 15:14 ` Suren Baghdasaryan
@ 2025-05-20 15:22 ` Usama Arif
2025-05-22 0:16 ` Harry Yoo
1 sibling, 1 reply; 26+ messages in thread
From: Usama Arif @ 2025-05-20 15:22 UTC (permalink / raw)
To: Shakeel Butt, surenb, Harry Yoo
Cc: Andrew Morton, hannes, vlad.wing, linux-mm, kent.overstreet,
linux-kernel, kernel-team, vbabka, cl, rientjes, roman.gushchin
On 20/05/2025 15:18, Shakeel Butt wrote:
> On Tue, May 20, 2025 at 02:42:09PM +0100, Usama Arif wrote:
>>
>>
>> On 20/05/2025 14:34, Harry Yoo wrote:
>>> On Tue, May 20, 2025 at 01:25:47PM +0100, Usama Arif wrote:
>>>> In memory bound systems, a large number of warnings for failing this
>>>> allocation repeatedly may mask any real issues in the system
>>>> during memory pressure being reported in dmesg. Change this to
>>>> WARN_ONCE.
>>>>
>>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>>>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
>>>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
>>>> ---
>>>
>>> Hi,
>>>
>>> Please Cc SLAB ALLOCATOR folks in MAINTAINERS on patches that touch
>>> slab code ;)
>>>
>>
>> Thanks for adding them to CC! I was just thinking of this as a memory
>> allocation profiling issue and added the maintainers for it,
>> but should have added slab maintainers as well.
>>
>>
>>>> mm/slub.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/slub.c b/mm/slub.c
>>>> index bf43c403ead2..97cb3d9e8d00 100644
>>>> --- a/mm/slub.c
>>>> +++ b/mm/slub.c
>>>> @@ -2102,7 +2102,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
>>>>
>>>> slab = virt_to_slab(p);
>>>> if (!slab_obj_exts(slab) &&
>>>> - WARN(alloc_slab_obj_exts(slab, s, flags, false),
>>>> + WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
>>>> "%s, %s: Failed to create slab extension vector!\n",
>>>> __func__, s->name))
>>>
>>> I think this should be pr_warn_once()?
>>> I'm not sure why this was WARN() in the first place.
>>>
>>
>> Isn't WARN_ONCE the same as pr_warn_once but with needing the condition
>> of the first arg to be true? We only want to warn if alloc_slab_obj_exts
>> returns non-zero. So WARN_ONCE should be ok?
>>
>
> The difference is the impact on panic_on_warn users which are mostly
> testing bots. This warning is not actionable, so I agree with Harry to
> covert this to pr_warn_once().
>
Sounds good! I will change it to the below for the next revision.
I will wait for the kvmalloc conversation to conclude before sending
it out.
diff --git a/mm/slub.c b/mm/slub.c
index 08804d2f2ead..ab0b7ee87159 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2101,11 +2101,13 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
return NULL;
slab = virt_to_slab(p);
- if (!slab_obj_exts(slab) &&
- WARN(alloc_slab_obj_exts(slab, s, flags, false),
- "%s, %s: Failed to create slab extension vector!\n",
- __func__, s->name))
- return NULL;
+ if (!slab_obj_exts(slab)) {
+ if (alloc_slab_obj_exts(slab, s, flags, false)) {
+ pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
+ __func__, s->name);
+ return NULL;
+ }
+ }
return slab_obj_exts(slab) + obj_to_index(s, slab, p);
}
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 15:20 ` Suren Baghdasaryan
@ 2025-05-20 16:41 ` Kent Overstreet
2025-05-20 17:20 ` Suren Baghdasaryan
2025-05-20 17:18 ` Johannes Weiner
1 sibling, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2025-05-20 16:41 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Usama Arif, Andrew Morton, hannes, shakeel.butt, vlad.wing,
linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 08:20:38AM -0700, Suren Baghdasaryan wrote:
> On Tue, May 20, 2025 at 7:13 AM Usama Arif <usamaarif642@gmail.com> wrote:
> >
> >
> >
> > On 20/05/2025 14:46, Usama Arif wrote:
> > >
> > >
> > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > >> On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > >>> When memory allocation profiling is running on memory bound services,
> > >>> allocations greater than order 0 for slab object extensions can fail,
> > >>> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > >>> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > >>> of the allocation being successful.
> > >>>
> > >>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > >>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > >>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > >>> ---
> > >>> mm/slub.c | 2 +-
> > >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/mm/slub.c b/mm/slub.c
> > >>> index dc9e729e1d26..bf43c403ead2 100644
> > >>> --- a/mm/slub.c
> > >>> +++ b/mm/slub.c
> > >>> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > >>> gfp &= ~OBJCGS_CLEAR_MASK;
> > >>> /* Prevent recursive extension vector allocation */
> > >>> gfp |= __GFP_NO_OBJ_EXT;
> > >>> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > >>> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > >>> slab_nid(slab));
> > >>
> > >> And what's the latency going to be on a vmalloc() allocation when we're
> > >> low on memory?
> > >
> > > Would it not be better to get the allocation slighly slower than to not get
> > > it at all?
> >
> > Also a majority of them are less than 1 page. kvmalloc of less than 1 page
> > falls back to kmalloc. So vmalloc will only be on those greater than 1 page
> > size, which are in the minority (for e.g. zs_handle, request_sock_subflow_v6,
> > request_sock_subflow_v4...).
>
> Not just the majority. For all of these kvmalloc allocations kmalloc
> will be tried first and vmalloc will be used only if the former
> failed: https://elixir.bootlin.com/linux/v6.14.7/source/mm/util.c#L665
> That's why I think this should not regress normal case when slab has
> enough space to satisfy the allocation.
And you really should consider just letting the extension vector
allocation fail if we're under that much memory pressure.
Failing allocations is an important mechanism for load shedding;
otherwise stuff just piles up - it's a big cause of our terrible
behaviour when we're thrashing.
It's equivalent to bufferbloat in the networking world.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 15:20 ` Suren Baghdasaryan
2025-05-20 16:41 ` Kent Overstreet
@ 2025-05-20 17:18 ` Johannes Weiner
1 sibling, 0 replies; 26+ messages in thread
From: Johannes Weiner @ 2025-05-20 17:18 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Usama Arif, Kent Overstreet, Andrew Morton, shakeel.butt,
vlad.wing, linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 08:20:38AM -0700, Suren Baghdasaryan wrote:
> On Tue, May 20, 2025 at 7:13 AM Usama Arif <usamaarif642@gmail.com> wrote:
> >
> >
> >
> > On 20/05/2025 14:46, Usama Arif wrote:
> > >
> > >
> > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > >> On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > >>> When memory allocation profiling is running on memory bound services,
> > >>> allocations greater than order 0 for slab object extensions can fail,
> > >>> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > >>> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > >>> of the allocation being successful.
> > >>>
> > >>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > >>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > >>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > >>> ---
> > >>> mm/slub.c | 2 +-
> > >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/mm/slub.c b/mm/slub.c
> > >>> index dc9e729e1d26..bf43c403ead2 100644
> > >>> --- a/mm/slub.c
> > >>> +++ b/mm/slub.c
> > >>> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > >>> gfp &= ~OBJCGS_CLEAR_MASK;
> > >>> /* Prevent recursive extension vector allocation */
> > >>> gfp |= __GFP_NO_OBJ_EXT;
> > >>> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > >>> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > >>> slab_nid(slab));
> > >>
> > >> And what's the latency going to be on a vmalloc() allocation when we're
> > >> low on memory?
> > >
> > > Would it not be better to get the allocation slighly slower than to not get
> > > it at all?
> >
> > Also a majority of them are less than 1 page. kvmalloc of less than 1 page
> > falls back to kmalloc. So vmalloc will only be on those greater than 1 page
> > size, which are in the minority (for e.g. zs_handle, request_sock_subflow_v6,
> > request_sock_subflow_v4...).
>
> Not just the majority. For all of these kvmalloc allocations kmalloc
> will be tried first and vmalloc will be used only if the former
> failed: https://elixir.bootlin.com/linux/v6.14.7/source/mm/util.c#L665
> That's why I think this should not regress normal case when slab has
> enough space to satisfy the allocation.
Alexei raised a good point offline that having slab enter vmalloc
messes with the whole slab re-entrancy and nmi safety he has been
pursuing for bpf/probing.
Add that to the other concerns around vmalloc, and I think we should
just drop that part.
IMO, the more important takeaway is that we accept that this
allocation is optimistic, and can and does fail in practice, even if
the slab allocation itself succeeded.
So it probably makes sense to 1) ax the warning entirely - since it's
not indicative of a bug. And 2) accept that the numbers can have a
fudge factor in practice, and mark line items in the report
correspondingly when they do.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 16:41 ` Kent Overstreet
@ 2025-05-20 17:20 ` Suren Baghdasaryan
2025-05-20 17:25 ` Kent Overstreet
0 siblings, 1 reply; 26+ messages in thread
From: Suren Baghdasaryan @ 2025-05-20 17:20 UTC (permalink / raw)
To: Kent Overstreet
Cc: Usama Arif, Andrew Morton, hannes, shakeel.butt, vlad.wing,
linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 9:41 AM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Tue, May 20, 2025 at 08:20:38AM -0700, Suren Baghdasaryan wrote:
> > On Tue, May 20, 2025 at 7:13 AM Usama Arif <usamaarif642@gmail.com> wrote:
> > >
> > >
> > >
> > > On 20/05/2025 14:46, Usama Arif wrote:
> > > >
> > > >
> > > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > >> On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > > >>> When memory allocation profiling is running on memory bound services,
> > > >>> allocations greater than order 0 for slab object extensions can fail,
> > > >>> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > > >>> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > > >>> of the allocation being successful.
> > > >>>
> > > >>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > > >>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > > >>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > > >>> ---
> > > >>> mm/slub.c | 2 +-
> > > >>> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >>>
> > > >>> diff --git a/mm/slub.c b/mm/slub.c
> > > >>> index dc9e729e1d26..bf43c403ead2 100644
> > > >>> --- a/mm/slub.c
> > > >>> +++ b/mm/slub.c
> > > >>> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > >>> gfp &= ~OBJCGS_CLEAR_MASK;
> > > >>> /* Prevent recursive extension vector allocation */
> > > >>> gfp |= __GFP_NO_OBJ_EXT;
> > > >>> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > >>> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > >>> slab_nid(slab));
> > > >>
> > > >> And what's the latency going to be on a vmalloc() allocation when we're
> > > >> low on memory?
> > > >
> > > > Would it not be better to get the allocation slighly slower than to not get
> > > > it at all?
> > >
> > > Also a majority of them are less than 1 page. kvmalloc of less than 1 page
> > > falls back to kmalloc. So vmalloc will only be on those greater than 1 page
> > > size, which are in the minority (for e.g. zs_handle, request_sock_subflow_v6,
> > > request_sock_subflow_v4...).
> >
> > Not just the majority. For all of these kvmalloc allocations kmalloc
> > will be tried first and vmalloc will be used only if the former
> > failed: https://elixir.bootlin.com/linux/v6.14.7/source/mm/util.c#L665
> > That's why I think this should not regress normal case when slab has
> > enough space to satisfy the allocation.
>
> And you really should consider just letting the extension vector
> allocation fail if we're under that much memory pressure.
I see your point. One case where we would want to use vmalloc is when the
allocation is sizable (multiple pages), so failing it does not mean a
critical memory pressure level yet. I don't think today's extension
vectors would be large enough to span multiple pages. That would
require a rather large obj_per_slab, and in most cases I think this
change would not affect current behavior: the allocations will be
smaller than PAGE_SIZE and kvmalloc will fail the same way kmalloc does.
I guess the question is whether we want to fail if the allocation size is
higher than PAGE_SIZE but less than PAGE_ALLOC_COSTLY_ORDER. Failing
that I think is reasonable, and I don't think any extension vector will
be large enough to reach PAGE_ALLOC_COSTLY_ORDER. So, I'm ok with
dropping this part of the patchset.
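(For scale, assuming the usual PAGE_ALLOC_COSTLY_ORDER of 3 and 4K pages:
8 * 4096 / 16 bytes per slabobj_ext = 2048 objects per slab before a vector
even reaches the costly-order threshold.)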
>
> Failing allocations is an important mechanism for load shedding,
> otherwise stuff just piles up - it's a big cause of our terrible
> behaviour when we're thrashing.
>
> It's equivalent to bufferbloat in the networking world.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 17:20 ` Suren Baghdasaryan
@ 2025-05-20 17:25 ` Kent Overstreet
0 siblings, 0 replies; 26+ messages in thread
From: Kent Overstreet @ 2025-05-20 17:25 UTC (permalink / raw)
To: Suren Baghdasaryan
Cc: Usama Arif, Andrew Morton, hannes, shakeel.butt, vlad.wing,
linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 10:20:23AM -0700, Suren Baghdasaryan wrote:
> On Tue, May 20, 2025 at 9:41 AM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> I see your point. One case we would want to use vmalloc is if the
> allocation is sizable (multiple pages), so failing it does not mean
> critical memory pressure level yet. I don't think today's extension
> vectors would be large enough to span multiple pages. That would
> require a rather large obj_per_slab and in most cases I think this
> change would not affect current behavior, the allocations will be
> smaller than PAGE_SIZE and kvmalloc will fail anyway.
> I guess the question is whether we want to fail if allocation size is
> higher than PAGE_SIZE but less than PAGE_ALLOC_COSTLY_ORDER. Failing
> that I think is reasonable and I don't think any extension vector will
> be large enough to reach PAGE_ALLOC_COSTLY_ORDER. So, I'm ok with
> dropping this part of the patchset.
Johannes raises more pertinent points, so I think vmalloc allocation is
out.
If it turns out that extension vector allocation failures do cause real
problems (not for memory allocation profiling, but maybe someone is
depending on memcg being precise), I think it would be acceptable to dip
into reserves for them, since they're a small fraction of the slabs
themselves.
With the caveat that we'd need to look at how the reserves are sized, to
make sure we're not running them empty and causing new problems
elsewhere.
But just failing the allocation should probably be the default approach
here.
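If that route were ever taken, a minimal sketch of the idea (an illustration
only, assuming __GFP_HIGH as the reserve-access knob, not a proposal for this
patch) would look something like:

	gfp |= __GFP_NO_OBJ_EXT;
	/* hypothetical: allow this small metadata allocation to use some of
	 * the reserves */
	vec = kcalloc_node(objects, sizeof(struct slabobj_ext),
			   gfp | __GFP_HIGH, slab_nid(slab));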
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 14:28 ` Kent Overstreet
@ 2025-05-20 17:44 ` Uladzislau Rezki
2025-05-20 17:47 ` Kent Overstreet
0 siblings, 1 reply; 26+ messages in thread
From: Uladzislau Rezki @ 2025-05-20 17:44 UTC (permalink / raw)
To: Kent Overstreet
Cc: Shakeel Butt, Usama Arif, Andrew Morton, surenb, hannes,
vlad.wing, linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 10:28:06AM -0400, Kent Overstreet wrote:
> On Tue, May 20, 2025 at 07:24:40AM -0700, Shakeel Butt wrote:
> > On Tue, May 20, 2025 at 10:01:27AM -0400, Kent Overstreet wrote:
> > > On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
> > > >
> > > >
> > > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > > > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > > > >> When memory allocation profiling is running on memory bound services,
> > > > >> allocations greater than order 0 for slab object extensions can fail,
> > > > >> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > > > >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > > > >> of the allocation being successful.
> > > > >>
> > > > >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > > > >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > > > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > > > >> ---
> > > > >> mm/slub.c | 2 +-
> > > > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > >>
> > > > >> diff --git a/mm/slub.c b/mm/slub.c
> > > > >> index dc9e729e1d26..bf43c403ead2 100644
> > > > >> --- a/mm/slub.c
> > > > >> +++ b/mm/slub.c
> > > > >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > > >> gfp &= ~OBJCGS_CLEAR_MASK;
> > > > >> /* Prevent recursive extension vector allocation */
> > > > >> gfp |= __GFP_NO_OBJ_EXT;
> > > > >> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > >> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > >> slab_nid(slab));
> > > > >
> > > > > And what's the latency going to be on a vmalloc() allocation when we're
> > > > > low on memory?
> > > >
> > > > Would it not be better to get the allocation slighly slower than to not get
> > > > it at all?
> > >
> > > Our behaviour when thrashing sucks, we don't want to do anything to make
> > > that worse.
> > >
> > > There's also the fact that vmalloc doesn't correctly respect gfp flags,
> > > so until that gets fixed this doesn't work at all.
> >
> > Which gfp flags vmalloc is not respecting today?
>
> GFP_NOWAIT.
>
> As to why, you'd better ask Michal Hocko...
>
It is mainly due to pte_alloc_one_kernel(), which uses GFP_KERNEL:
#define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
to get a new pte entry.
I think we can fix it. For example, we could pre-populate some region and
allocate from there for NOWAIT. But of course there can be other hidden problems.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 17:44 ` Uladzislau Rezki
@ 2025-05-20 17:47 ` Kent Overstreet
2025-05-20 17:57 ` Uladzislau Rezki
0 siblings, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2025-05-20 17:47 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Shakeel Butt, Usama Arif, Andrew Morton, surenb, hannes,
vlad.wing, linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 07:44:49PM +0200, Uladzislau Rezki wrote:
> On Tue, May 20, 2025 at 10:28:06AM -0400, Kent Overstreet wrote:
> > On Tue, May 20, 2025 at 07:24:40AM -0700, Shakeel Butt wrote:
> > > On Tue, May 20, 2025 at 10:01:27AM -0400, Kent Overstreet wrote:
> > > > On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
> > > > >
> > > > >
> > > > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > > > > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > > > > >> When memory allocation profiling is running on memory bound services,
> > > > > >> allocations greater than order 0 for slab object extensions can fail,
> > > > > >> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > > > > >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > > > > >> of the allocation being successful.
> > > > > >>
> > > > > >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > > > > >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > > > > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > > > > >> ---
> > > > > >> mm/slub.c | 2 +-
> > > > > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >>
> > > > > >> diff --git a/mm/slub.c b/mm/slub.c
> > > > > >> index dc9e729e1d26..bf43c403ead2 100644
> > > > > >> --- a/mm/slub.c
> > > > > >> +++ b/mm/slub.c
> > > > > >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > > > >> gfp &= ~OBJCGS_CLEAR_MASK;
> > > > > >> /* Prevent recursive extension vector allocation */
> > > > > >> gfp |= __GFP_NO_OBJ_EXT;
> > > > > >> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > >> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > >> slab_nid(slab));
> > > > > >
> > > > > > And what's the latency going to be on a vmalloc() allocation when we're
> > > > > > low on memory?
> > > > >
> > > > > Would it not be better to get the allocation slighly slower than to not get
> > > > > it at all?
> > > >
> > > > Our behaviour when thrashing sucks, we don't want to do anything to make
> > > > that worse.
> > > >
> > > > There's also the fact that vmalloc doesn't correctly respect gfp flags,
> > > > so until that gets fixed this doesn't work at all.
> > >
> > > Which gfp flags vmalloc is not respecting today?
> >
> > GFP_NOWAIT.
> >
> > As to why, you'd better ask Michal Hocko...
> >
> It is mainly due to pte_alloc_one_kernel(), it uses the GFP_KERNEL
>
> #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
>
> to get a new pte entry.
>
> I think we can fix it. For example if we populate some region and allocate
> there for NOWAIT. But there are of course can be other hidden problems.
No, the PF_MEMALLOC flags allow most of the gfp flags to be passed down to
the pte allocation.
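For context, a minimal sketch of the scoped gfp masking being referred to
(the memalloc_*_save/restore helpers from include/linux/sched/mm.h set a
PF_MEMALLOC_* flag on the task, so nested allocations, including the pte
allocations done by vmalloc, inherit the restriction):

	unsigned int noio_flags;

	noio_flags = memalloc_noio_save();	/* allocations in this scope
						   behave as GFP_NOIO */
	vec = vmalloc(size);			/* page table allocations see
						   the restriction too */
	memalloc_noio_restore(noio_flags);

(A scoped NOWAIT variant along these lines is what would be needed for the
GFP_NOWAIT case being discussed.)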
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 17:47 ` Kent Overstreet
@ 2025-05-20 17:57 ` Uladzislau Rezki
2025-05-20 17:58 ` Kent Overstreet
0 siblings, 1 reply; 26+ messages in thread
From: Uladzislau Rezki @ 2025-05-20 17:57 UTC (permalink / raw)
To: Kent Overstreet
Cc: Uladzislau Rezki, Shakeel Butt, Usama Arif, Andrew Morton, surenb,
hannes, vlad.wing, linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 01:47:54PM -0400, Kent Overstreet wrote:
> On Tue, May 20, 2025 at 07:44:49PM +0200, Uladzislau Rezki wrote:
> > On Tue, May 20, 2025 at 10:28:06AM -0400, Kent Overstreet wrote:
> > > On Tue, May 20, 2025 at 07:24:40AM -0700, Shakeel Butt wrote:
> > > > On Tue, May 20, 2025 at 10:01:27AM -0400, Kent Overstreet wrote:
> > > > > On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
> > > > > >
> > > > > >
> > > > > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > > > > > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > > > > > >> When memory allocation profiling is running on memory bound services,
> > > > > > >> allocations greater than order 0 for slab object extensions can fail,
> > > > > > >> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > > > > > >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > > > > > >> of the allocation being successful.
> > > > > > >>
> > > > > > >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > > > > > >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > > > > > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > > > > > >> ---
> > > > > > >> mm/slub.c | 2 +-
> > > > > > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > >>
> > > > > > >> diff --git a/mm/slub.c b/mm/slub.c
> > > > > > >> index dc9e729e1d26..bf43c403ead2 100644
> > > > > > >> --- a/mm/slub.c
> > > > > > >> +++ b/mm/slub.c
> > > > > > >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > > > > >> gfp &= ~OBJCGS_CLEAR_MASK;
> > > > > > >> /* Prevent recursive extension vector allocation */
> > > > > > >> gfp |= __GFP_NO_OBJ_EXT;
> > > > > > >> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > >> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > >> slab_nid(slab));
> > > > > > >
> > > > > > > And what's the latency going to be on a vmalloc() allocation when we're
> > > > > > > low on memory?
> > > > > >
> > > > > > Would it not be better to get the allocation slighly slower than to not get
> > > > > > it at all?
> > > > >
> > > > > Our behaviour when thrashing sucks, we don't want to do anything to make
> > > > > that worse.
> > > > >
> > > > > There's also the fact that vmalloc doesn't correctly respect gfp flags,
> > > > > so until that gets fixed this doesn't work at all.
> > > >
> > > > Which gfp flags vmalloc is not respecting today?
> > >
> > > GFP_NOWAIT.
> > >
> > > As to why, you'd better ask Michal Hocko...
> > >
> > It is mainly due to pte_alloc_one_kernel(), it uses the GFP_KERNEL
> >
> > #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> >
> > to get a new pte entry.
> >
> > I think we can fix it. For example if we populate some region and allocate
> > there for NOWAIT. But there are of course can be other hidden problems.
>
> No, PF_MEMALLOC flags allow for passing most of gfp flags for pte
> allocation.
>
It is hard-coded:
static inline pte_t *__pte_alloc_one_kernel_noprof(struct mm_struct *mm)
{
	struct ptdesc *ptdesc = pagetable_alloc_noprof(GFP_PGTABLE_KERNEL &
						       ~__GFP_HIGHMEM, 0);

	if (!ptdesc)
		return NULL;
	return ptdesc_address(ptdesc);
}
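To be concrete, a gfp-aware variant would have to look roughly like the sketch
below, where the caller's flags replace the hard-coded GFP_PGTABLE_KERNEL. The
function name is hypothetical; this is only an illustration of the plumbing,
not an existing or proposed interface:

static inline pte_t *pte_alloc_one_kernel_gfp(struct mm_struct *mm, gfp_t gfp)
{
	/* Same allocation as above, but honouring the caller's gfp. */
	struct ptdesc *ptdesc = pagetable_alloc_noprof((gfp | __GFP_ZERO) &
						       ~__GFP_HIGHMEM, 0);

	if (!ptdesc)
		return NULL;
	return ptdesc_address(ptdesc);
}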
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 17:57 ` Uladzislau Rezki
@ 2025-05-20 17:58 ` Kent Overstreet
2025-05-20 18:59 ` Uladzislau Rezki
0 siblings, 1 reply; 26+ messages in thread
From: Kent Overstreet @ 2025-05-20 17:58 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Shakeel Butt, Usama Arif, Andrew Morton, surenb, hannes,
vlad.wing, linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 07:57:10PM +0200, Uladzislau Rezki wrote:
> On Tue, May 20, 2025 at 01:47:54PM -0400, Kent Overstreet wrote:
> > On Tue, May 20, 2025 at 07:44:49PM +0200, Uladzislau Rezki wrote:
> > > On Tue, May 20, 2025 at 10:28:06AM -0400, Kent Overstreet wrote:
> > > > On Tue, May 20, 2025 at 07:24:40AM -0700, Shakeel Butt wrote:
> > > > > On Tue, May 20, 2025 at 10:01:27AM -0400, Kent Overstreet wrote:
> > > > > > On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > > > > > > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > > > > > > >> When memory allocation profiling is running on memory bound services,
> > > > > > > >> allocations greater than order 0 for slab object extensions can fail,
> > > > > > > >> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > > > > > > >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > > > > > > >> of the allocation being successful.
> > > > > > > >>
> > > > > > > >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > > > > > > >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > > > > > > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > > > > > > >> ---
> > > > > > > >> mm/slub.c | 2 +-
> > > > > > > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > >>
> > > > > > > >> diff --git a/mm/slub.c b/mm/slub.c
> > > > > > > >> index dc9e729e1d26..bf43c403ead2 100644
> > > > > > > >> --- a/mm/slub.c
> > > > > > > >> +++ b/mm/slub.c
> > > > > > > >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > > > > > >> gfp &= ~OBJCGS_CLEAR_MASK;
> > > > > > > >> /* Prevent recursive extension vector allocation */
> > > > > > > >> gfp |= __GFP_NO_OBJ_EXT;
> > > > > > > >> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > > >> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > > >> slab_nid(slab));
> > > > > > > >
> > > > > > > > And what's the latency going to be on a vmalloc() allocation when we're
> > > > > > > > low on memory?
> > > > > > >
> > > > > > > Would it not be better to get the allocation slighly slower than to not get
> > > > > > > it at all?
> > > > > >
> > > > > > Our behaviour when thrashing sucks, we don't want to do anything to make
> > > > > > that worse.
> > > > > >
> > > > > > There's also the fact that vmalloc doesn't correctly respect gfp flags,
> > > > > > so until that gets fixed this doesn't work at all.
> > > > >
> > > > > Which gfp flags vmalloc is not respecting today?
> > > >
> > > > GFP_NOWAIT.
> > > >
> > > > As to why, you'd better ask Michal Hocko...
> > > >
> > > It is mainly due to pte_alloc_one_kernel(), it uses the GFP_KERNEL
> > >
> > > #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > >
> > > to get a new pte entry.
> > >
> > > I think we can fix it. For example if we populate some region and allocate
> > > there for NOWAIT. But there are of course can be other hidden problems.
> >
> > No, PF_MEMALLOC flags allow for passing most of gfp flags for pte
> > allocation.
> >
> It is hard-coded:
>
> static inline pte_t *__pte_alloc_one_kernel_noprof(struct mm_struct *mm)
> {
> struct ptdesc *ptdesc = pagetable_alloc_noprof(GFP_PGTABLE_KERNEL &
> ~__GFP_HIGHMEM, 0);
>
> if (!ptdesc)
> return NULL;
> return ptdesc_address(ptdesc);
> }
I suggest you read the code around PF_MEMALLOC flags.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously
2025-05-20 17:58 ` Kent Overstreet
@ 2025-05-20 18:59 ` Uladzislau Rezki
0 siblings, 0 replies; 26+ messages in thread
From: Uladzislau Rezki @ 2025-05-20 18:59 UTC (permalink / raw)
To: Kent Overstreet
Cc: Uladzislau Rezki, Shakeel Butt, Usama Arif, Andrew Morton, surenb,
hannes, vlad.wing, linux-mm, linux-kernel, kernel-team
On Tue, May 20, 2025 at 01:58:25PM -0400, Kent Overstreet wrote:
> On Tue, May 20, 2025 at 07:57:10PM +0200, Uladzislau Rezki wrote:
> > On Tue, May 20, 2025 at 01:47:54PM -0400, Kent Overstreet wrote:
> > > On Tue, May 20, 2025 at 07:44:49PM +0200, Uladzislau Rezki wrote:
> > > > On Tue, May 20, 2025 at 10:28:06AM -0400, Kent Overstreet wrote:
> > > > > On Tue, May 20, 2025 at 07:24:40AM -0700, Shakeel Butt wrote:
> > > > > > On Tue, May 20, 2025 at 10:01:27AM -0400, Kent Overstreet wrote:
> > > > > > > On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > > > > > > > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > > > > > > > >> When memory allocation profiling is running on memory bound services,
> > > > > > > > >> allocations greater than order 0 for slab object extensions can fail,
> > > > > > > > >> for e.g. zs_handle zswap slab which will be 512 objsperslab x 16 bytes
> > > > > > > > >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > > > > > > > >> of the allocation being successful.
> > > > > > > > >>
> > > > > > > > >> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > > > > > > > >> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> > > > > > > > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > > > > > > > >> ---
> > > > > > > > >> mm/slub.c | 2 +-
> > > > > > > > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >>
> > > > > > > > >> diff --git a/mm/slub.c b/mm/slub.c
> > > > > > > > >> index dc9e729e1d26..bf43c403ead2 100644
> > > > > > > > >> --- a/mm/slub.c
> > > > > > > > >> +++ b/mm/slub.c
> > > > > > > > >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > > > > > > >> gfp &= ~OBJCGS_CLEAR_MASK;
> > > > > > > > >> /* Prevent recursive extension vector allocation */
> > > > > > > > >> gfp |= __GFP_NO_OBJ_EXT;
> > > > > > > > >> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > > > >> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > > > >> slab_nid(slab));
> > > > > > > > >
> > > > > > > > > And what's the latency going to be on a vmalloc() allocation when we're
> > > > > > > > > low on memory?
> > > > > > > >
> > > > > > > > Would it not be better to get the allocation slighly slower than to not get
> > > > > > > > it at all?
> > > > > > >
> > > > > > > Our behaviour when thrashing sucks, we don't want to do anything to make
> > > > > > > that worse.
> > > > > > >
> > > > > > > There's also the fact that vmalloc doesn't correctly respect gfp flags,
> > > > > > > so until that gets fixed this doesn't work at all.
> > > > > >
> > > > > > Which gfp flags vmalloc is not respecting today?
> > > > >
> > > > > GFP_NOWAIT.
> > > > >
> > > > > As to why, you'd better ask Michal Hocko...
> > > > >
> > > > It is mainly due to pte_alloc_one_kernel(), it uses the GFP_KERNEL
> > > >
> > > > #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > > >
> > > > to get a new pte entry.
> > > >
> > > > I think we can fix it. For example if we populate some region and allocate
> > > > there for NOWAIT. But there are of course can be other hidden problems.
> > >
> > > No, PF_MEMALLOC flags allow for passing most of gfp flags for pte
> > > allocation.
> > >
> > It is hard-coded:
> >
> > static inline pte_t *__pte_alloc_one_kernel_noprof(struct mm_struct *mm)
> > {
> > struct ptdesc *ptdesc = pagetable_alloc_noprof(GFP_PGTABLE_KERNEL &
> > ~__GFP_HIGHMEM, 0);
> >
> > if (!ptdesc)
> > return NULL;
> > return ptdesc_address(ptdesc);
> > }
>
> I suggest you read the code around PF_MEMALLOC flags.
>
Wrapping the allocation context with PF_MEMALLOC, to prevent entering direct
reclaim and sleeping, looks like another approach I can think about. One
concern is depleting the memory reserves. Populating the PTEs, which I
mentioned above, would not require this, but I tend to say it is an ugly
approach.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails
2025-05-20 15:22 ` Usama Arif
@ 2025-05-22 0:16 ` Harry Yoo
2025-05-22 12:42 ` Usama Arif
0 siblings, 1 reply; 26+ messages in thread
From: Harry Yoo @ 2025-05-22 0:16 UTC (permalink / raw)
To: Usama Arif
Cc: Shakeel Butt, surenb, Andrew Morton, hannes, vlad.wing, linux-mm,
kent.overstreet, linux-kernel, kernel-team, vbabka, cl, rientjes,
roman.gushchin
On Tue, May 20, 2025 at 04:22:16PM +0100, Usama Arif wrote:
>
>
> On 20/05/2025 15:18, Shakeel Butt wrote:
> > On Tue, May 20, 2025 at 02:42:09PM +0100, Usama Arif wrote:
> >>
> >>
> >> On 20/05/2025 14:34, Harry Yoo wrote:
> >>> On Tue, May 20, 2025 at 01:25:47PM +0100, Usama Arif wrote:
> >>>> In memory bound systems, a large number of warnings for failing this
> >>>> allocation repeatedly may mask any real issues in the system
> >>>> during memory pressure being reported in dmesg. Change this to
> >>>> WARN_ONCE.
> >>>>
> >>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> >>>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
> >>>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> >>>> ---
> >>>
> >>> Hi,
> >>>
> >>> Please Cc SLAB ALLOCATOR folks in MAINTAINERS on patches that touch
> >>> slab code ;)
> >>>
> >>
> >> Thanks for adding them to CC! I was just thinking of this as a memory
> >> allocation profiling issue and added the maintainers for it,
> >> but should have added slab maintainers as well.
> >>
> >>
> >>>> mm/slub.c | 2 +-
> >>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/mm/slub.c b/mm/slub.c
> >>>> index bf43c403ead2..97cb3d9e8d00 100644
> >>>> --- a/mm/slub.c
> >>>> +++ b/mm/slub.c
> >>>> @@ -2102,7 +2102,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> >>>>
> >>>> slab = virt_to_slab(p);
> >>>> if (!slab_obj_exts(slab) &&
> >>>> - WARN(alloc_slab_obj_exts(slab, s, flags, false),
> >>>> + WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
> >>>> "%s, %s: Failed to create slab extension vector!\n",
> >>>> __func__, s->name))
> >>>
> >>> I think this should be pr_warn_once()?
> >>> I'm not sure why this was WARN() in the first place.
> >>>
> >>
> >> Isn't WARN_ONCE the same as pr_warn_once but with needing the condition
> >> of the first arg to be true? We only want to warn if alloc_slab_obj_exts
> >> returns non-zero. So WARN_ONCE should be ok?
> >>
> >
> > The difference is the impact on panic_on_warn users which are mostly
> > testing bots. This warning is not actionable, so I agree with Harry to
> > covert this to pr_warn_once().
> >
>
> Sounds good! Will change it to below for the next revision.
> Will wait for the kvmalloc conversation to conclude before sending
> the next revision.
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 08804d2f2ead..ab0b7ee87159 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2101,11 +2101,13 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> return NULL;
>
> slab = virt_to_slab(p);
> - if (!slab_obj_exts(slab) &&
> - WARN(alloc_slab_obj_exts(slab, s, flags, false),
> - "%s, %s: Failed to create slab extension vector!\n",
> - __func__, s->name))
> - return NULL;
> + if (!slab_obj_exts(slab)) {
> + if(alloc_slab_obj_exts(slab, s, flags, false))
> + pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
> + __func__, s->name);
> + else
> + return NULL;
Returning NULL when alloc_slab_obj_exts() succeeds doesn't make sense.
I think you meant something like this?
	if (!slab_obj_exts(slab) &&
	    alloc_slab_obj_exts(slab, s, flags, false)) {
		pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
			     __func__, s->name);
		return NULL;
	}
> + }
>
> return slab_obj_exts(slab) + obj_to_index(s, slab, p);
> }
>
--
Cheers,
Harry / Hyeonggon
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails
2025-05-22 0:16 ` Harry Yoo
@ 2025-05-22 12:42 ` Usama Arif
0 siblings, 0 replies; 26+ messages in thread
From: Usama Arif @ 2025-05-22 12:42 UTC (permalink / raw)
To: Harry Yoo
Cc: Shakeel Butt, surenb, Andrew Morton, hannes, vlad.wing, linux-mm,
kent.overstreet, linux-kernel, kernel-team, vbabka, cl, rientjes,
roman.gushchin
On 22/05/2025 01:16, Harry Yoo wrote:
> On Tue, May 20, 2025 at 04:22:16PM +0100, Usama Arif wrote:
>>
>>
>> On 20/05/2025 15:18, Shakeel Butt wrote:
>>> On Tue, May 20, 2025 at 02:42:09PM +0100, Usama Arif wrote:
>>>>
>>>>
>>>> On 20/05/2025 14:34, Harry Yoo wrote:
>>>>> On Tue, May 20, 2025 at 01:25:47PM +0100, Usama Arif wrote:
>>>>>> In memory bound systems, a large number of warnings for failing this
>>>>>> allocation repeatedly may mask any real issues in the system
>>>>>> during memory pressure being reported in dmesg. Change this to
>>>>>> WARN_ONCE.
>>>>>>
>>>>>> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
>>>>>> Reported-by: Vlad Poenaru <vlad.wing@gmail.com>
>>>>>> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
>>>>>> ---
>>>>>
>>>>> Hi,
>>>>>
>>>>> Please Cc SLAB ALLOCATOR folks in MAINTAINERS on patches that touch
>>>>> slab code ;)
>>>>>
>>>>
>>>> Thanks for adding them to CC! I was just thinking of this as a memory
>>>> allocation profiling issue and added the maintainers for it,
>>>> but should have added slab maintainers as well.
>>>>
>>>>
>>>>>> mm/slub.c | 2 +-
>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/mm/slub.c b/mm/slub.c
>>>>>> index bf43c403ead2..97cb3d9e8d00 100644
>>>>>> --- a/mm/slub.c
>>>>>> +++ b/mm/slub.c
>>>>>> @@ -2102,7 +2102,7 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
>>>>>>
>>>>>> slab = virt_to_slab(p);
>>>>>> if (!slab_obj_exts(slab) &&
>>>>>> - WARN(alloc_slab_obj_exts(slab, s, flags, false),
>>>>>> + WARN_ONCE(alloc_slab_obj_exts(slab, s, flags, false),
>>>>>> "%s, %s: Failed to create slab extension vector!\n",
>>>>>> __func__, s->name))
>>>>>
>>>>> I think this should be pr_warn_once()?
>>>>> I'm not sure why this was WARN() in the first place.
>>>>>
>>>>
>>>> Isn't WARN_ONCE the same as pr_warn_once but with needing the condition
>>>> of the first arg to be true? We only want to warn if alloc_slab_obj_exts
>>>> returns non-zero. So WARN_ONCE should be ok?
>>>>
>>>
>>> The difference is the impact on panic_on_warn users which are mostly
>>> testing bots. This warning is not actionable, so I agree with Harry to
>>> covert this to pr_warn_once().
>>>
>>
>> Sounds good! Will change it to below for the next revision.
>> Will wait for the kvmalloc conversation to conclude before sending
>> the next revision.
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 08804d2f2ead..ab0b7ee87159 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2101,11 +2101,13 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
>> return NULL;
>>
>> slab = virt_to_slab(p);
>> - if (!slab_obj_exts(slab) &&
>> - WARN(alloc_slab_obj_exts(slab, s, flags, false),
>> - "%s, %s: Failed to create slab extension vector!\n",
>> - __func__, s->name))
>> - return NULL;
>> + if (!slab_obj_exts(slab)) {
>> + if(alloc_slab_obj_exts(slab, s, flags, false))
>> + pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
>> + __func__, s->name);
>> + else
>> + return NULL;
>
> Returning NULL when alloc_slab_obj_exts() succeeds doesn't make sense.
> I think you meant something like this?
>
> if (!slab_obj_exts(slab) &&
> alloc_slab_obj_exts(slab, s, flags, false)) {
> pr_warn_once("%s, %s: Failed to create slab extension vector!\n",
> __func__, s->name);
> return NULL;
> }
>
Yes, thank you!
>> + }
>>
>> return slab_obj_exts(slab) + obj_to_index(s, slab, p);
>> }
>>
>
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2025-05-22 12:42 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-20 12:25 [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously Usama Arif
2025-05-20 12:25 ` [PATCH 2/2] mm: slub: only warn once when allocating slab obj extensions fails Usama Arif
2025-05-20 13:34 ` Harry Yoo
2025-05-20 13:42 ` Usama Arif
2025-05-20 14:18 ` Shakeel Butt
2025-05-20 15:14 ` Suren Baghdasaryan
2025-05-20 15:22 ` Usama Arif
2025-05-22 0:16 ` Harry Yoo
2025-05-22 12:42 ` Usama Arif
2025-05-20 13:44 ` [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously Kent Overstreet
2025-05-20 13:46 ` Usama Arif
2025-05-20 14:01 ` Kent Overstreet
2025-05-20 14:24 ` Shakeel Butt
2025-05-20 14:28 ` Kent Overstreet
2025-05-20 17:44 ` Uladzislau Rezki
2025-05-20 17:47 ` Kent Overstreet
2025-05-20 17:57 ` Uladzislau Rezki
2025-05-20 17:58 ` Kent Overstreet
2025-05-20 18:59 ` Uladzislau Rezki
2025-05-20 14:13 ` Usama Arif
2025-05-20 15:20 ` Suren Baghdasaryan
2025-05-20 16:41 ` Kent Overstreet
2025-05-20 17:20 ` Suren Baghdasaryan
2025-05-20 17:25 ` Kent Overstreet
2025-05-20 17:18 ` Johannes Weiner
2025-05-20 14:01 ` Usama Arif
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).