From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>,
mpm@selenic.com, penberg@kernel.org, linux-mm@kvack.org,
paulus@samba.org, Anton Blanchard <anton@samba.org>,
David Rientjes <rientjes@google.com>,
Christoph Lameter <cl@linux.com>,
linuxppc-dev@lists.ozlabs.org,
Wanpeng Li <liwanp@linux.vnet.ibm.com>
Subject: Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
Date: Thu, 6 Feb 2014 17:04:18 +0900
Message-ID: <20140206080418.GA19913@lge.com>
In-Reply-To: <20140206020757.GC5433@linux.vnet.ibm.com>
On Wed, Feb 05, 2014 at 06:07:57PM -0800, Nishanth Aravamudan wrote:
> On 24.01.2014 [16:25:58 -0800], David Rientjes wrote:
> > On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
> >
> > > Thank you for clarifying and providing a test patch. I ran with this on
> > > the system showing the original problem, configured to have 15GB of
> > > memory.
> > >
> > > With your patch after boot:
> > >
> > > MemTotal: 15604736 kB
> > > MemFree: 8768192 kB
> > > Slab: 3882560 kB
> > > SReclaimable: 105408 kB
> > > SUnreclaim: 3777152 kB
> > >
> > > With Anton's patch after boot:
> > >
> > > MemTotal: 15604736 kB
> > > MemFree: 11195008 kB
> > > Slab: 1427968 kB
> > > SReclaimable: 109184 kB
> > > SUnreclaim: 1318784 kB
> > >
> > >
> > > I know that's fairly unscientific, but the numbers are reproducible.
> > >
> >
> > I don't think the goal of the discussion is to reduce the amount of slab
> > allocated, but rather get the most local slab memory possible by use of
> > kmalloc_node(). When a memoryless node is being passed to kmalloc_node(),
> > which is probably cpu_to_node() for a cpu bound to a node without memory,
> > my patch is allocating it on the most local node; Anton's patch is
> > allocating it on whatever happened to be the cpu slab.
> >
> > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > --- a/mm/slub.c
> > > > +++ b/mm/slub.c
> > > > @@ -2278,10 +2278,14 @@ redo:
> > > >
> > > >  	if (unlikely(!node_match(page, node))) {
> > > >  		stat(s, ALLOC_NODE_MISMATCH);
> > > > -		deactivate_slab(s, page, c->freelist);
> > > > -		c->page = NULL;
> > > > -		c->freelist = NULL;
> > > > -		goto new_slab;
> > > > +		if (unlikely(!node_present_pages(node)))
> > > > +			node = numa_mem_id();
> > > > +		if (!node_match(page, node)) {
> > > > +			deactivate_slab(s, page, c->freelist);
> > > > +			c->page = NULL;
> > > > +			c->freelist = NULL;
> > > > +			goto new_slab;
> > > > +		}
> > >
> > > Semantically, and please correct me if I'm wrong, this patch is saying
> > > if we have a memoryless node, we expect the page's locality to be that
> > > of numa_mem_id(), and we still deactivate the slab if that isn't true.
> > > Just wanting to make sure I understand the intent.
> > >
> >
> > Yeah, the default policy should be to fallback to local memory if the node
> > passed is memoryless.
> >
> > > What I find odd is that there are only 2 nodes on this system, node 0
> > > (empty) and node 1. So won't numa_mem_id() always be 1? And every page
> > > should be coming from node 1 (thus node_match() should always be true?)
> > >
> >
> > The nice thing about slub is its debugging ability, what is
> > /sys/kernel/slab/cache/objects showing in comparison between the two
> > patches?
>
> Ok, I finally got around to writing a script that compares the objects
> output from both kernels.
>
> log1 is with CONFIG_HAVE_MEMORYLESS_NODES on, my kthread locality patch
> and Joonsoo's patch.
>
> log2 is with CONFIG_HAVE_MEMORYLESS_NODES on, my kthread locality patch
> and Anton's patch.
>
> slab                 objects   objects      percent
>                         log1      log2       change
> -----------------------------------------------------------
> :t-0000104             71190     85680    20.353982 %
> UDP                     4352      3392    22.058824 %
> inode_cache            54302     41923    22.796582 %
> fscache_cookie_jar      3276      2457    25.000000 %
> :t-0000896               438       292    33.333333 %
> :t-0000080            310401    195323    37.073978 %
> ext4_inode_cache         335       201    40.000000 %
> :t-0000192             89408    128898    44.168307 %
> :t-0000184            151300     81880    45.882353 %
> :t-0000512             49698     73648    48.191074 %
> :at-0000192           242867    120948    50.199904 %
> xfs_inode              34350     15221    55.688501 %
> :t-0016384             11005     17257    56.810541 %
> proc_inode_cache      103868     34717    66.575846 %
> tw_sock_TCP              768       256    66.666667 %
> :t-0004096             15240     25672    68.451444 %
> nfs_inode_cache         1008       315    68.750000 %
> :t-0001024             14528     24720    70.154185 %
> :t-0032768               655      1312   100.305344 %
> :t-0002048             14242     30720   115.700042 %
> :t-0000640              1020      2550   150.000000 %
> :t-0008192             10005     27905   178.910545 %
>
> FWIW, the configuration of this LPAR has slightly changed. It is now configured
> for maximally 400 CPUs, of which 200 are present. The result is that even with
> Joonsoo's patch (log1 above), we OOM pretty easily and Anton's slab usage
> script reports:
>
> slab                   mem      objs     slabs
>                       used    active    active
> ------------------------------------------------------------
> kmalloc-512        1182 MB     2.03%   100.00%
> kmalloc-192        1182 MB     1.38%   100.00%
> kmalloc-16384       966 MB    17.66%   100.00%
> kmalloc-4096        353 MB    15.92%   100.00%
> kmalloc-8192        259 MB    27.28%   100.00%
> kmalloc-32768       207 MB     9.86%   100.00%
>
> In comparison (log2 above):
>
> slab                   mem      objs     slabs
>                       used    active    active
> ------------------------------------------------------------
> kmalloc-16384       273 MB    98.76%   100.00%
> kmalloc-8192        225 MB    98.67%   100.00%
> pgtable-2^11        114 MB   100.00%   100.00%
> pgtable-2^12        109 MB   100.00%   100.00%
> kmalloc-4096        104 MB    98.59%   100.00%
>
> I appreciate all the help so far, if anyone has any ideas how best to
> proceed further, or what they'd like debugged more, I'm happy to get
> this fixed. We're hitting this on a couple of different systems and I'd
> like to find a good resolution to the problem.
Hello,
I don't have a memoryless system, so I need your help to debug this. :)
First, please let me know the node information on your system.
I'm preparing three more patches that are nearly the same as the previous
patch but take a slightly different approach. Could you test them on your
system? I will send them soon.
I also think the same problem exists if CONFIG_SLAB is enabled. Could you
confirm that?
Could you also confirm that numa_mem_id() is set properly on your system,
and that the node_present_pages() test works properly?
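For reference, the check in question can be modelled outside the kernel. The following is a minimal userspace sketch, not kernel code: struct topo, node_has_memory(), effective_node() and must_deactivate() are hypothetical names, with local_mem_node standing in for numa_mem_id() and present_pages[] standing in for node_present_pages(). It follows the logic of the hunk quoted above, where a memoryless target node is replaced by the local memory node before the cpu slab's node is compared against it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_NODE (-1)	/* stand-in for NUMA_NO_NODE: caller has no preference */

/* Hypothetical userspace model of the topology; not a kernel structure. */
struct topo {
	size_t nr_nodes;
	const unsigned long *present_pages;	/* pages present on each node */
	int local_mem_node;			/* stand-in for numa_mem_id() */
};

/* Stand-in for a node_present_pages() test: does this node have memory? */
static bool node_has_memory(const struct topo *t, int node)
{
	assert(node >= 0 && (size_t)node < t->nr_nodes);
	return t->present_pages[node] != 0;
}

/* Node an allocation should target once memoryless nodes are redirected. */
static int effective_node(const struct topo *t, int requested)
{
	if (!node_has_memory(t, requested))
		return t->local_mem_node;
	return requested;
}

/*
 * Model of the patched mismatch path: the cpu slab is only thrown away
 * when its page lives on a node other than the effective target node.
 */
static bool must_deactivate(const struct topo *t, int page_node, int requested)
{
	if (requested == NO_NODE || page_node == requested)
		return false;	/* the first node match already succeeded */
	return page_node != effective_node(t, requested);
}
```

On the two-node layout from the quoted report (node 0 empty, node 1 populated, local memory node 1), must_deactivate(&t, 1, 0) is false in this model, so a cpu slab from node 1 is kept even when node 0 is requested.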
With my patches applied, could you also send me more of the slub statistics?
For this, you need to enable CONFIG_SLUB_STATS. Then please send me all the
per-cache stat files under /sys/kernel/slab/.
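A small helper along the following lines could collect one counter. This is a sketch under assumptions: it expects the per-cache files under /sys/kernel/slab/<cache>/ (the directory already used for the objects file earlier in this thread), and it assumes each stat file holds a total followed by per-CPU values such as "1234 C0=1000 C1=234". The function names parse_stat_total() and read_slub_stat() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the leading total from a stat line such as "1234 C0=1000 C1=234".
 * Returns 0 on success, -1 if the line does not start with a number.
 */
static int parse_stat_total(const char *line, unsigned long *total)
{
	char *end;
	unsigned long v;

	assert(line != NULL && total != NULL);
	v = strtoul(line, &end, 10);
	if (end == line)
		return -1;	/* no digits were consumed */
	*total = v;
	return 0;
}

/*
 * Read one counter, e.g. read_slub_stat("kmalloc-512", "alloc_node_mismatch", &v).
 * Returns 0 on success, -1 if the file is missing or unreadable
 * (for instance when CONFIG_SLUB_STATS is not enabled).
 */
static int read_slub_stat(const char *cache, const char *stat,
			  unsigned long *total)
{
	char path[256];
	char line[4096];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "/sys/kernel/slab/%s/%s", cache, stat);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path would have been truncated */

	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	if (fgets(line, sizeof(line), f) == NULL) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_stat_total(line, total);
}
```

Calling read_slub_stat() for counters such as alloc_node_mismatch on both kernels would give directly comparable numbers for each cache.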
Sorry for so many requests.
If it bothers you too much, please ignore it :)
Thanks.