From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: David Rientjes <rientjes@google.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>,
penberg@kernel.org, linux-mm@kvack.org, paulus@samba.org,
Anton Blanchard <anton@samba.org>,
mpm@selenic.com, Christoph Lameter <cl@linux.com>,
linuxppc-dev@lists.ozlabs.org,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Wanpeng Li <liwanp@linux.vnet.ibm.com>
Subject: Re: [PATCH] slub: Don't throw away partial remote slabs if there is no local memory
Date: Wed, 5 Feb 2014 18:07:57 -0800
Message-ID: <20140206020757.GC5433@linux.vnet.ibm.com>
In-Reply-To: <alpine.DEB.2.02.1401241618500.20466@chino.kir.corp.google.com>

On 24.01.2014 [16:25:58 -0800], David Rientjes wrote:
> On Fri, 24 Jan 2014, Nishanth Aravamudan wrote:
>
> > Thank you for clarifying and providing a test patch. I ran with this on
> > the system showing the original problem, configured to have 15GB of
> > memory.
> >
> > With your patch after boot:
> >
> > MemTotal:       15604736 kB
> > MemFree:         8768192 kB
> > Slab:            3882560 kB
> > SReclaimable:     105408 kB
> > SUnreclaim:      3777152 kB
> >
> > With Anton's patch after boot:
> >
> > MemTotal:       15604736 kB
> > MemFree:        11195008 kB
> > Slab:            1427968 kB
> > SReclaimable:     109184 kB
> > SUnreclaim:      1318784 kB
> >
> >
> > I know that's fairly unscientific, but the numbers are reproducible.
> >
>
> I don't think the goal of the discussion is to reduce the amount of slab
> allocated, but rather get the most local slab memory possible by use of
> kmalloc_node(). When a memoryless node is being passed to kmalloc_node(),
> which is probably cpu_to_node() for a cpu bound to a node without memory,
> my patch is allocating it on the most local node; Anton's patch is
> allocating it on whatever happened to be the cpu slab.
>
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2278,10 +2278,14 @@ redo:
> > >
> > > >      if (unlikely(!node_match(page, node))) {
> > > >          stat(s, ALLOC_NODE_MISMATCH);
> > > > -        deactivate_slab(s, page, c->freelist);
> > > > -        c->page = NULL;
> > > > -        c->freelist = NULL;
> > > > -        goto new_slab;
> > > > +        if (unlikely(!node_present_pages(node)))
> > > > +            node = numa_mem_id();
> > > > +        if (!node_match(page, node)) {
> > > > +            deactivate_slab(s, page, c->freelist);
> > > > +            c->page = NULL;
> > > > +            c->freelist = NULL;
> > > > +            goto new_slab;
> > > > +        }
> >
> > Semantically, and please correct me if I'm wrong, this patch is saying
> > if we have a memoryless node, we expect the page's locality to be that
> > of numa_mem_id(), and we still deactivate the slab if that isn't true.
> > Just wanting to make sure I understand the intent.
> >
>
> Yeah, the default policy should be to fallback to local memory if the node
> passed is memoryless.
>
> > What I find odd is that there are only 2 nodes on this system, node 0
> > (empty) and node 1. So won't numa_mem_id() always be 1? And every page
> > should be coming from node 1 (thus node_match() should always be true?)
> >
>
> The nice thing about slub is its debugging ability, what is
> /sys/kernel/slab/cache/objects showing in comparison between the two
> patches?

OK, I finally got around to writing a script that compares the objects
output from both kernels.

log1 is with CONFIG_HAVE_MEMORYLESS_NODES on, my kthread locality patch
and Joonsoo's patch.

log2 is with CONFIG_HAVE_MEMORYLESS_NODES on, my kthread locality patch
and Anton's patch.

slab                       objects   objects       percent
                              log1      log2        change
------------------------------------------------------------
:t-0000104                   71190     85680     20.353982 %
UDP                           4352      3392     22.058824 %
inode_cache                  54302     41923     22.796582 %
fscache_cookie_jar            3276      2457     25.000000 %
:t-0000896                     438       292     33.333333 %
:t-0000080                  310401    195323     37.073978 %
ext4_inode_cache               335       201     40.000000 %
:t-0000192                   89408    128898     44.168307 %
:t-0000184                  151300     81880     45.882353 %
:t-0000512                   49698     73648     48.191074 %
:at-0000192                 242867    120948     50.199904 %
xfs_inode                    34350     15221     55.688501 %
:t-0016384                   11005     17257     56.810541 %
proc_inode_cache            103868     34717     66.575846 %
tw_sock_TCP                    768       256     66.666667 %
:t-0004096                   15240     25672     68.451444 %
nfs_inode_cache               1008       315     68.750000 %
:t-0001024                   14528     24720     70.154185 %
:t-0032768                     655      1312    100.305344 %
:t-0002048                   14242     30720    115.700042 %
:t-0000640                    1020      2550    150.000000 %
:t-0008192                   10005     27905    178.910545 %

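For anyone who wants to reproduce the comparison, here is a minimal sketch of the calculation. It is not the script used to generate the table above. It assumes each log has already been reduced to a mapping from cache name to object count, the function names are made up for illustration, and the 20% cutoff is only a guess based on the smallest change listed. The "percent change" column appears to be the absolute difference relative to log1, which is what the sketch computes.

```python
def percent_change(before, after):
    """Absolute change from `before` to `after`, as a percentage of `before`."""
    return abs(after - before) / before * 100.0


def compare_logs(log1, log2, threshold=20.0):
    """Return (name, log1_objects, log2_objects, change) rows, smallest change first.

    log1 and log2 are hypothetical dicts of cache name -> object count.
    Only caches present in both logs are compared; the threshold is an
    assumed cutoff, not something taken from the original script.
    """
    rows = []
    for name in sorted(log1.keys() & log2.keys()):
        before, after = log1[name], log2[name]
        if before == 0:
            continue  # avoid dividing by zero for empty caches
        change = percent_change(before, after)
        if change >= threshold:
            rows.append((name, before, after, change))
    rows.sort(key=lambda row: row[3])
    return rows


if __name__ == "__main__":
    # Two entries copied from the table above.
    log1 = {":t-0000104": 71190, "UDP": 4352}
    log2 = {":t-0000104": 85680, "UDP": 3392}
    for name, before, after, change in compare_logs(log1, log2):
        print(f"{name:<24}{before:>10}{after:>10}{change:>14.6f} %")
```
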
FWIW, the configuration of this LPAR has changed slightly. It is now
configured for a maximum of 400 CPUs, of which 200 are present. The
result is that even with Joonsoo's patch (log1 above), we OOM pretty
easily, and Anton's slab usage script reports:

slab                    mem      objs     slabs
                       used    active    active
------------------------------------------------------------
kmalloc-512         1182 MB     2.03%   100.00%
kmalloc-192         1182 MB     1.38%   100.00%
kmalloc-16384        966 MB    17.66%   100.00%
kmalloc-4096         353 MB    15.92%   100.00%
kmalloc-8192         259 MB    27.28%   100.00%
kmalloc-32768        207 MB     9.86%   100.00%

In comparison, with Anton's patch (log2 above):

slab                    mem      objs     slabs
                       used    active    active
------------------------------------------------------------
kmalloc-16384        273 MB    98.76%   100.00%
kmalloc-8192         225 MB    98.67%   100.00%
pgtable-2^11         114 MB   100.00%   100.00%
pgtable-2^12         109 MB   100.00%   100.00%
kmalloc-4096         104 MB    98.59%   100.00%

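As a rough illustration of how an "objs active" figure like the ones above can be derived (Anton's script is not shown in this thread, so this is an assumption about the approach, not his implementation): the SLUB sysfs files /sys/kernel/slab/<cache>/objects and /sys/kernel/slab/<cache>/total_objects each hold a total followed by per-node counts such as "N1=71190". The ratio of the two totals gives the fraction of allocated object slots that are actually in use. The function names below are hypothetical.

```python
def parse_slab_counter(text):
    """Parse a SLUB sysfs counter such as '71190 N1=71190'.

    Returns (total, {node_id: count}). The per-node fields are optional.
    """
    fields = text.split()
    total = int(fields[0])
    per_node = {}
    for field in fields[1:]:
        node, _, count = field.partition("=")
        per_node[int(node[1:])] = int(count)  # "N1" -> node 1
    return total, per_node


def object_utilization(objects_text, total_objects_text):
    """Percentage of allocated object slots that are in use."""
    used, _ = parse_slab_counter(objects_text)
    capacity, _ = parse_slab_counter(total_objects_text)
    if capacity == 0:
        return 0.0
    return 100.0 * used / capacity
```

On a live system the two strings would come from reading the `objects` and `total_objects` files of one cache directory. A low result with every slab still counted as active corresponds to the log1 pattern above, where kmalloc-512 holds 1182 MB with only 2.03% of its objects in use.
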
I appreciate all the help so far. If anyone has ideas on how best to
proceed, or would like something debugged further, I'm happy to help get
this fixed. We're hitting this on a couple of different systems, and I'd
like to find a good resolution to the problem.

Thanks,
Nish