linux-mm.kvack.org archive mirror
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>,
	Han Pingtian <hanpt@linux.vnet.ibm.com>,
	penberg@kernel.org, linux-mm@kvack.org, paulus@samba.org,
	Anton Blanchard <anton@samba.org>,
	mpm@selenic.com, Christoph Lameter <cl@linux.com>,
	linuxppc-dev@lists.ozlabs.org,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>
Subject: Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
Date: Fri, 7 Feb 2014 17:03:09 +0900
Message-ID: <20140207080309.GA29393@lge.com>
In-Reply-To: <20140206192812.GC7845@linux.vnet.ibm.com>

On Thu, Feb 06, 2014 at 11:28:12AM -0800, Nishanth Aravamudan wrote:
> On 06.02.2014 [10:59:55 -0800], Nishanth Aravamudan wrote:
> > On 06.02.2014 [17:04:18 +0900], Joonsoo Kim wrote:
> > > On Wed, Feb 05, 2014 at 06:07:57PM -0800, Nishanth Aravamudan wrote:
> > > > On 24.01.2014 [16:25:58 -0800], David Rientjes wrote:
> > > > > On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
> > > > > 
> > > > > > Thank you for clarifying and providing a test patch. I ran with this on
> > > > > > the system showing the original problem, configured to have 15GB of
> > > > > > memory.
> > > > > > 
> > > > > > With your patch after boot:
> > > > > > 
> > > > > > MemTotal:       15604736 kB
> > > > > > MemFree:         8768192 kB
> > > > > > Slab:            3882560 kB
> > > > > > SReclaimable:     105408 kB
> > > > > > SUnreclaim:      3777152 kB
> > > > > > 
> > > > > > With Anton's patch after boot:
> > > > > > 
> > > > > > MemTotal:       15604736 kB
> > > > > > MemFree:        11195008 kB
> > > > > > Slab:            1427968 kB
> > > > > > SReclaimable:     109184 kB
> > > > > > SUnreclaim:      1318784 kB
> > > > > > 
> > > > > > 
> > > > > > I know that's fairly unscientific, but the numbers are reproducible. 
> > > > > > 
> > > > > 
> > > > > I don't think the goal of the discussion is to reduce the amount of slab 
> > > > > allocated, but rather get the most local slab memory possible by use of 
> > > > > kmalloc_node().  When a memoryless node is being passed to kmalloc_node(), 
> > > > > which is probably cpu_to_node() for a cpu bound to a node without memory, 
> > > > > my patch is allocating it on the most local node; Anton's patch is 
> > > > > allocating it on whatever happened to be the cpu slab.
> > > > > 
> > > > > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > > > > --- a/mm/slub.c
> > > > > > > +++ b/mm/slub.c
> > > > > > > @@ -2278,10 +2278,14 @@ redo:
> > > > > > > 
> > > > > > >  	if (unlikely(!node_match(page, node))) {
> > > > > > >  		stat(s, ALLOC_NODE_MISMATCH);
> > > > > > > -		deactivate_slab(s, page, c->freelist);
> > > > > > > -		c->page = NULL;
> > > > > > > -		c->freelist = NULL;
> > > > > > > -		goto new_slab;
> > > > > > > +		if (unlikely(!node_present_pages(node)))
> > > > > > > +			node = numa_mem_id();
> > > > > > > +		if (!node_match(page, node)) {
> > > > > > > +			deactivate_slab(s, page, c->freelist);
> > > > > > > +			c->page = NULL;
> > > > > > > +			c->freelist = NULL;
> > > > > > > +			goto new_slab;
> > > > > > > +		}
> > > > > > 
> > > > > > Semantically, and please correct me if I'm wrong, this patch is saying
> > > > > > if we have a memoryless node, we expect the page's locality to be that
> > > > > > of numa_mem_id(), and we still deactivate the slab if that isn't true.
> > > > > > Just wanting to make sure I understand the intent.
> > > > > > 
> > > > > 
> > > > > Yeah, the default policy should be to fall back to local memory if the node
> > > > > passed is memoryless.
> > > > > 
> > > > > > What I find odd is that there are only 2 nodes on this system, node 0
> > > > > > (empty) and node 1. So won't numa_mem_id() always be 1? And every page
> > > > > > should be coming from node 1 (thus node_match() should always be true?)
> > > > > > 
> > > > > 
> > > > > The nice thing about slub is its debugging ability: what does
> > > > > /sys/kernel/slab/<cache>/objects show when comparing the two
> > > > > patches?
> > > > 
> > > > Ok, I finally got around to writing a script that compares the objects
> > > > output from both kernels.
> > > > 
> > > > log1 is with CONFIG_HAVE_MEMORYLESS_NODES on, my kthread locality patch
> > > > and Joonsoo's patch.
> > > > 
> > > > log2 is with CONFIG_HAVE_MEMORYLESS_NODES on, my kthread locality patch
> > > > and Anton's patch.
> > > > 
> > > > slab                           objects    objects   percent
> > > >                                log1       log2      change
> > > > -----------------------------------------------------------
> > > > :t-0000104                     71190      85680      20.353982 %
> > > > UDP                            4352       3392       22.058824 %
> > > > inode_cache                    54302      41923      22.796582 %
> > > > fscache_cookie_jar             3276       2457       25.000000 %
> > > > :t-0000896                     438        292        33.333333 %
> > > > :t-0000080                     310401     195323     37.073978 %
> > > > ext4_inode_cache               335        201        40.000000 %
> > > > :t-0000192                     89408      128898     44.168307 %
> > > > :t-0000184                     151300     81880      45.882353 %
> > > > :t-0000512                     49698      73648      48.191074 %
> > > > :at-0000192                    242867     120948     50.199904 %
> > > > xfs_inode                      34350      15221      55.688501 %
> > > > :t-0016384                     11005      17257      56.810541 %
> > > > proc_inode_cache               103868     34717      66.575846 %
> > > > tw_sock_TCP                    768        256        66.666667 %
> > > > :t-0004096                     15240      25672      68.451444 %
> > > > nfs_inode_cache                1008       315        68.750000 %
> > > > :t-0001024                     14528      24720      70.154185 %
> > > > :t-0032768                     655        1312       100.305344%
> > > > :t-0002048                     14242      30720      115.700042%
> > > > :t-0000640                     1020       2550       150.000000%
> > > > :t-0008192                     10005      27905      178.910545%
> > > > 
> > > > FWIW, the configuration of this LPAR has changed slightly. It is now configured
> > > > for a maximum of 400 CPUs, of which 200 are present. The result is that even with
> > > > Joonsoo's patch (log1 above), we OOM pretty easily and Anton's slab usage
> > > > script reports:
> > > > 
> > > > slab                                   mem     objs    slabs
> > > >                                       used   active   active
> > > > ------------------------------------------------------------
> > > > kmalloc-512                        1182 MB    2.03%  100.00%
> > > > kmalloc-192                        1182 MB    1.38%  100.00%
> > > > kmalloc-16384                       966 MB   17.66%  100.00%
> > > > kmalloc-4096                        353 MB   15.92%  100.00%
> > > > kmalloc-8192                        259 MB   27.28%  100.00%
> > > > kmalloc-32768                       207 MB    9.86%  100.00%
> > > > 
> > > > In comparison (log2 above):
> > > > 
> > > > slab                                   mem     objs    slabs
> > > >                                       used   active   active
> > > > ------------------------------------------------------------
> > > > kmalloc-16384                       273 MB   98.76%  100.00%
> > > > kmalloc-8192                        225 MB   98.67%  100.00%
> > > > pgtable-2^11                        114 MB  100.00%  100.00%
> > > > pgtable-2^12                        109 MB  100.00%  100.00%
> > > > kmalloc-4096                        104 MB   98.59%  100.00%
> > > > 
> > > > I appreciate all the help so far. If anyone has any ideas on how best to
> > > > proceed, or on what they'd like debugged further, I'm happy to get
> > > > this fixed. We're hitting this on a couple of different systems and I'd
> > > > like to find a good resolution to the problem.
> > > 
> > > Hello,
> > > 
> > > I don't have a memoryless system, so I need your help to debug this. :)
> > > First, please send me the node information for your system.
> > 
> > [    0.000000] Node 0 Memory:
> > [    0.000000] Node 1 Memory: 0x0-0x200000000
> > 
> > [    0.000000] On node 0 totalpages: 0
> > [    0.000000] On node 1 totalpages: 131072
> > [    0.000000]   DMA zone: 112 pages used for memmap
> > [    0.000000]   DMA zone: 0 pages reserved
> > [    0.000000]   DMA zone: 131072 pages, LIFO batch:1
> > 
> > [    0.638391] Node 0 CPUs: 0-199
> > [    0.638394] Node 1 CPUs:
> > 
> > Do you need anything else?
> > 
> > > I'm preparing three more patches that are nearly the same as the previous
> > > patch but take a slightly different approach. Could you test them on your
> > > system? I will send them soon.
> > 
> > Test results are in the attached tarball [1].
> > 
> > > And I think the same problem exists if CONFIG_SLAB is enabled. Could you
> > > confirm that?
> > 
> > I will test and let you know.
> 
> Ok, with your patches applied and CONFIG_SLAB enabled:
> 
> MemTotal:        8264640 kB
> MemFree:         7119680 kB
> Slab:             207232 kB
> SReclaimable:      32896 kB
> SUnreclaim:       174336 kB
> 
> For reference, same kernel with CONFIG_SLUB:
> 
> MemTotal:        8264640 kB
> MemFree:         4264000 kB
> Slab:            3065408 kB
> SReclaimable:     104704 kB
> SUnreclaim:      2960704 kB
> 


Hello,

First of all, thanks for testing!

My patches only affect CONFIG_SLUB; the request to test with CONFIG_SLAB was
just for reference. It seems that my patches don't have any effect in your
case. Could you check that numa_mem_id() and get_numa_mem() return the
expected values? I think that numa_mem_id() for all cpus and get_numa_mem()
for all nodes should return 1 on your system.

I will investigate further on my side.

Thanks!


