Re: [PATCH] mm/slub: Reduce memory consumption in extreme scenarios

From: "chenjun (AM)" <chenjun102@huawei.com>
To: Vlastimil Babka <vbabka@suse.cz>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"cl@linux.com" <cl@linux.com>,
	"penberg@kernel.org" <penberg@kernel.org>,
	"rientjes@google.com" <rientjes@google.com>,
	"iamjoonsoo.kim@lge.com" <iamjoonsoo.kim@lge.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: "xuqiang (M)" <xuqiang36@huawei.com>,
	"Wangkefeng (OS Kernel Lab)" <wangkefeng.wang@huawei.com>
Subject: Re: [PATCH] mm/slub: Reduce memory consumption in extreme scenarios
Date: Fri, 17 Mar 2023 11:32:15 +0000
Message-ID: <344c7521d72e4107b451c19b329e9864@huawei.com>
In-Reply-To: 0cad1ff3-8339-a3eb-fc36-c8bda1392451@suse.cz

On 2023/3/14 22:41, Vlastimil Babka wrote:
> 
> On 3/14/23 13:34, Chen Jun wrote:
>> When kmalloc_node() is called without __GFP_THISNODE and the target node
>> lacks sufficient memory, SLUB allocates a new folio from a node other
>> than the requested one, instead of taking a partial slab from that node.
>>
>> However, since the allocated folio does not belong to the requested
>> node, it is deactivated and added to the partial slab list of the node
>> it belongs to.
>>
>> This behavior can result in excessive memory usage when the requested
>> node has insufficient memory, as SLUB will repeatedly allocate folios
>> from other nodes without reusing the previously allocated ones.
>>
>> To prevent this waste, when (node != NUMA_NO_NODE) &&
>> !(gfpflags & __GFP_THISNODE):
>> 1) try to get a partial slab from the target node with __GFP_THISNODE.
>> 2) if 1) fails, try to allocate a new slab from the target node with
>>     __GFP_THISNODE.
>> 3) if 2) fails, retry 1) and 2) without the __GFP_THISNODE constraint.
>>
>> When node == NUMA_NO_NODE || (gfpflags & __GFP_THISNODE), the behavior
>> remains unchanged.
>>
>> Test on QEMU with 4 NUMA nodes, each with 1G of memory: a test module
>> calls kmalloc_node(196, GFP_KERNEL, 3) a total of (4 * 1024 + 4) * 1024 times.
>>
>> Before this patch, cat /proc/slabinfo shows:
>> kmalloc-256       4200530 13519712    256   32    2 : tunables..
>>
>> After this patch,
>> cat /proc/slabinfo shows:
>> kmalloc-256       4200558 4200768    256   32    2 : tunables..
>>
>> Signed-off-by: Chen Jun <chenjun102@huawei.com>
>> ---
>>   mm/slub.c | 22 +++++++++++++++++++---
>>   1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 39327e98fce3..32e436957e03 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2384,7 +2384,7 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
>>   		searchnode = numa_mem_id();
>>   
>>   	object = get_partial_node(s, get_node(s, searchnode), pc);
>> -	if (object || node != NUMA_NO_NODE)
>> +	if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
>>   		return object;
>>   
>>   	return get_any_partial(s, pc);
>> @@ -3069,6 +3069,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>   	struct slab *slab;
>>   	unsigned long flags;
>>   	struct partial_context pc;
>> +	bool try_thisnode = true;
>>   
>>   	stat(s, ALLOC_SLOWPATH);
>>   
>> @@ -3181,8 +3182,18 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>   	}
>>   
>>   new_objects:
>> -
>>   	pc.flags = gfpflags;
>> +
>> +	/*
>> +	 * When (node != NUMA_NO_NODE) && !(gfpflags & __GFP_THISNODE):
>> +	 * 1) try to get a partial slab from target node with __GFP_THISNODE.
>> +	 * 2) if 1) failed, try to allocate a new slab from target node with
>> +	 *    __GFP_THISNODE.
>> +	 * 3) if 2) failed, retry 1) and 2) without __GFP_THISNODE constraint.
>> +	 */
>> +	if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode)
>> +		pc.flags |= __GFP_THISNODE;
> 
> Hmm, I think we should also remove the possibility of direct reclaim from
> attempt 2). In your QEMU test it should make no difference, as the test
> fills everything with kernel memory that is not reclaimable. But in
> practice the target node might be filled with user memory, and I think it's
> better to quickly allocate on a different node than to spend time in direct
> reclaim. So the following should work, I think?
>
> pc.flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_THISNODE;
> 

Hmm, should it be:

pc.flags |= GFP_NOWAIT | __GFP_NOWARN | __GFP_THISNODE;
         ^
>> +
>>   	pc.slab = &slab;
>>   	pc.orig_size = orig_size;
>>   	freelist = get_partial(s, node, &pc);
>> @@ -3190,10 +3201,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>   		goto check_new_slab;
>>   
>>   	slub_put_cpu_ptr(s->cpu_slab);
>> -	slab = new_slab(s, gfpflags, node);
>> +	slab = new_slab(s, pc.flags, node);
>>   	c = slub_get_cpu_ptr(s->cpu_slab);
>>   
>>   	if (unlikely(!slab)) {
>> +		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode) {
>> +			try_thisnode = false;
>> +			goto new_objects;
>> +		}
>> +
>>   		slab_out_of_memory(s, gfpflags, node);
>>   		return NULL;
>>   	}
> 
>

Thread overview: 10+ messages
2023-03-14 12:34 [PATCH] mm/slub: Reduce memory consumption in extreme scenarios Chen Jun
2023-03-14 14:41 ` Vlastimil Babka
2023-03-17 11:32   ` chenjun (AM) [this message]
2023-03-17 12:06     ` Vlastimil Babka
2023-03-19  7:22       ` chenjun (AM)
2023-03-20  8:05         ` Vlastimil Babka
2023-03-20  9:12           ` Mike Rapoport
2023-03-21  9:30             ` chenjun (AM)
2023-03-29  8:41               ` Vlastimil Babka
2023-03-21  9:41             ` Vlastimil Babka
