linuxppc-dev.lists.ozlabs.org archive mirror
From: Wanpeng Li <liwanp@linux.vnet.ibm.com>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: cl@linux-foundation.org, nacc@linux.vnet.ibm.com,
	penberg@kernel.org, linux-mm@kvack.org, paulus@samba.org,
	Anton Blanchard <anton@samba.org>,
	mpm@selenic.com, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
Date: Tue, 7 Jan 2014 17:52:31 +0800
Message-ID: <20140107095231.GB14040@hacker.(null)>
In-Reply-To: <20140107074136.GA4011@lge.com>

On Tue, Jan 07, 2014 at 04:41:36PM +0900, Joonsoo Kim wrote:
>On Tue, Jan 07, 2014 at 01:21:00PM +1100, Anton Blanchard wrote:
>> 
>> We noticed a huge amount of slab memory consumed on a large ppc64 box:
>> 
>> Slab:            2094336 kB
>> 
>> Almost 2GB. This box is not balanced and some nodes do not have local
>> memory, causing slub to be very inefficient in its slab usage.
>> 
>> Each time we call kmem_cache_alloc_node slub checks the per cpu slab,
>> sees it isn't node local, deactivates it and tries to allocate a new
>> slab. On empty nodes we will allocate a new remote slab and use the
>> first slot, but as explained above when we get called a second time
>> we will just deactivate that slab and retry.
>> 
>> As such we end up only using 1 entry in each slab:
>> 
>> slab                    mem  objects
>>                        used   active
>> ------------------------------------
>> kmalloc-16384       1404 MB    4.90%
>> task_struct          668 MB    2.90%
>> kmalloc-128          193 MB    3.61%
>> kmalloc-192          152 MB    5.23%
>> kmalloc-8192          72 MB   23.40%
>> kmalloc-16            64 MB    7.43%
>> kmalloc-512           33 MB   22.41%
>> 
>> The patch below checks that a node is not empty before deactivating a
>> slab and trying to allocate it again. With this patch applied we now
>> use about 352MB:
>> 
>> Slab:             360192 kB
>> 
>> And our efficiency is much better:
>> 
>> slab                    mem  objects
>>                        used   active
>> ------------------------------------
>> kmalloc-16384         92 MB   74.27%
>> task_struct           23 MB   83.46%
>> idr_layer_cache       18 MB  100.00%
>> pgtable-2^12          17 MB  100.00%
>> kmalloc-65536         15 MB  100.00%
>> inode_cache           14 MB  100.00%
>> kmalloc-256           14 MB   97.81%
>> kmalloc-8192          14 MB   85.71%
>> 
>> Signed-off-by: Anton Blanchard <anton@samba.org>
>> ---
>> 
>> Thoughts? It seems like we could hit a similar situation if a machine
>> is balanced but we run out of memory on a single node.
>> 
>> Index: b/mm/slub.c
>> ===================================================================
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2278,10 +2278,17 @@ redo:
>>  
>>  	if (unlikely(!node_match(page, node))) {
>>  		stat(s, ALLOC_NODE_MISMATCH);
>> -		deactivate_slab(s, page, c->freelist);
>> -		c->page = NULL;
>> -		c->freelist = NULL;
>> -		goto new_slab;
>> +
>> +		/*
>> +		 * If the node contains no memory there is no point in trying
>> +		 * to allocate a new node local slab
>> +		 */
>> +		if (node_spanned_pages(node)) {
>> +			deactivate_slab(s, page, c->freelist);
>> +			c->page = NULL;
>> +			c->freelist = NULL;
>> +			goto new_slab;
>> +		}
>>  	}
>>  
>>  	/*
>
>Hello,
>
>I think we need more effort to solve the unbalanced node problem.
>
>With this patch, even if the node of the current cpu slab is not favorable
>to the unbalanced node, the allocation would proceed and we would get
>unintended memory.
>
>And there is one more problem. Even if we have some partial slabs on a
>compatible node, we would allocate a new slab, because get_partial() cannot
>handle this unbalanced node case.
>
>To fix this correctly, how about the following patch?
>
>Thanks.
>
>------------->8--------------------
>diff --git a/mm/slub.c b/mm/slub.c
>index c3eb3d3..a1f6dfa 100644
>--- a/mm/slub.c
>+++ b/mm/slub.c
>@@ -1672,7 +1672,19 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> {
>        void *object;
>        int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
>+       struct zonelist *zonelist;
>+       struct zoneref *z;
>+       struct zone *zone;
>+       enum zone_type high_zoneidx = gfp_zone(flags);
>
>+       if (!node_present_pages(searchnode)) {
>+               zonelist = node_zonelist(searchnode, flags);
>+               for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
>+                       searchnode = zone_to_nid(zone);
>+                       if (node_present_pages(searchnode))
>+                               break;
>+               }
>+       }

Why change searchnode here instead of relying on the fallback over
zones/nodes in get_any_partial() to find partial slabs?

Regards,
Wanpeng Li 

>        object = get_partial_node(s, get_node(s, searchnode), c, flags);
>        if (object || node != NUMA_NO_NODE)
>                return object;
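
The two approaches being compared can be sketched in a small user-space
model. This is only an illustration under stated assumptions: toy_node,
toy_system, pick_searchnode(), partial_via_redirect() and
partial_via_fallback_walk() are hypothetical names, the fallback table is
a stand-in for a zonelist, and locking, gfp flags and cpuset checks are
ignored. It is not the kernel implementation.

```c
#include <assert.h>

#define MAX_NODES 4
#define NO_NODE   (-1)  /* stand-in for NUMA_NO_NODE and list terminator */

/* Hypothetical toy node, not the kernel's struct kmem_cache_node. */
struct toy_node {
    unsigned long present_pages; /* 0 models a memoryless node */
    int nr_partial;              /* partial slabs queued on this node */
};

struct toy_system {
    struct toy_node nodes[MAX_NODES];
    /* fallback[n]: nodes to try for node n, in order, NO_NODE terminated */
    int fallback[MAX_NODES][MAX_NODES + 1];
};

/* Redirect idea from the quoted patch: pick the first node with memory. */
static int pick_searchnode(const struct toy_system *sys, int node)
{
    assert(node >= 0 && node < MAX_NODES);
    if (sys->nodes[node].present_pages)
        return node;
    for (int i = 0; i <= MAX_NODES; i++) {
        int cand = sys->fallback[node][i];
        if (cand == NO_NODE)
            break;
        if (sys->nodes[cand].present_pages)
            return cand;
    }
    return node; /* nothing better found; keep the requested node */
}

/* Search exactly one node after redirecting. Returns the node used. */
static int partial_via_redirect(const struct toy_system *sys, int node)
{
    int searchnode = pick_searchnode(sys, node);
    return sys->nodes[searchnode].nr_partial > 0 ? searchnode : NO_NODE;
}

/* Try the requested node, then every fallback node that has partial slabs. */
static int partial_via_fallback_walk(const struct toy_system *sys, int node)
{
    assert(node >= 0 && node < MAX_NODES);
    if (sys->nodes[node].nr_partial > 0)
        return node;
    for (int i = 0; i <= MAX_NODES; i++) {
        int cand = sys->fallback[node][i];
        if (cand == NO_NODE)
            break;
        if (sys->nodes[cand].nr_partial > 0)
            return cand;
    }
    return NO_NODE;
}

/* Example layout: node 0 is memoryless, node 1 has memory but no partials. */
static struct toy_system example_system(void)
{
    struct toy_system sys = {
        .nodes = {
            { .present_pages = 0,    .nr_partial = 0 },
            { .present_pages = 1024, .nr_partial = 0 },
            { .present_pages = 1024, .nr_partial = 3 },
            { .present_pages = 0,    .nr_partial = 0 },
        },
        .fallback = {
            { 1, 2, NO_NODE },
            { 2, NO_NODE },
            { 1, NO_NODE },
            { 2, 1, NO_NODE },
        },
    };
    return sys;
}
```

In this layout a request for memoryless node 0 finds nothing with the
redirect, because the first fallback node with memory (node 1) has no
partial slabs, while the walk continues to node 2. Whether the kernel
reaches its own fallback walk for an explicitly requested node is a
separate point: the quoted context returns from get_partial() as soon as
node != NUMA_NO_NODE.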
>
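
The slab usage pattern described in the original report (one object used
per slab on a memoryless node) can be reproduced with a toy counter
model. This is a simplified sketch, not kernel code: toy_result,
simulate_empty_node() and active_percent() are hypothetical names, every
allocation is assumed to target a node without memory, and a deactivated
slab is assumed never to be reused.

```c
#include <assert.h>

/* Hypothetical counters for the toy model; not kernel data structures. */
struct toy_result {
    unsigned long slabs_allocated;
    unsigned long objects_allocated;
};

/*
 * Model nr_allocs node-specific allocations aimed at a memoryless node.
 * patched == 0: the per-cpu slab is deactivated on every node mismatch.
 * patched != 0: the mismatch is tolerated because the node has no memory.
 */
static struct toy_result simulate_empty_node(unsigned long nr_allocs,
                                             unsigned long objs_per_slab,
                                             int patched)
{
    struct toy_result r = { 0, 0 };
    unsigned long free_in_cpu_slab = 0; /* free objects in the per-cpu slab */

    assert(objs_per_slab > 0);

    for (unsigned long i = 0; i < nr_allocs; i++) {
        /* The per-cpu slab is always remote here, so the node check fails. */
        if (!patched)
            free_in_cpu_slab = 0; /* slab is dropped from the cpu */

        if (free_in_cpu_slab == 0) {
            r.slabs_allocated++;  /* a new remote slab is allocated */
            free_in_cpu_slab = objs_per_slab;
        }
        free_in_cpu_slab--;       /* hand out one object */
        r.objects_allocated++;
    }
    return r;
}

/* Integer percentage of slab objects in use, like the "objects active" column. */
static unsigned long active_percent(struct toy_result r,
                                    unsigned long objs_per_slab)
{
    if (r.slabs_allocated == 0)
        return 0;
    return r.objects_allocated * 100 / (r.slabs_allocated * objs_per_slab);
}
```

With 1000 allocations and 8 objects per slab, the unpatched model
allocates one slab per object, while the patched model fills each slab
before allocating the next one.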

