From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Christoph Lameter <cl@linux.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>,
Nishanth Aravamudan <nacc@linux.vnet.ibm.com>,
Matt Mackall <mpm@selenic.com>, Pekka Enberg <penberg@kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Paul Mackerras <paulus@samba.org>,
Anton Blanchard <anton@samba.org>,
David Rientjes <rientjes@google.com>,
linuxppc-dev@lists.ozlabs.org,
Wanpeng Li <liwanp@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node
Date: Mon, 24 Feb 2014 14:08:51 +0900 [thread overview]
Message-ID: <20140224050851.GB14814@lge.com> (raw)
In-Reply-To: <alpine.DEB.2.10.1402181033480.28964@nuc>
On Tue, Feb 18, 2014 at 10:38:01AM -0600, Christoph Lameter wrote:
> On Mon, 17 Feb 2014, Joonsoo Kim wrote:
>
> > On Wed, Feb 12, 2014 at 04:16:11PM -0600, Christoph Lameter wrote:
> > > Here is another patch with some fixes. The additional logic is only
> > > compiled in if CONFIG_HAVE_MEMORYLESS_NODES is set.
> > >
> > > Subject: slub: Memoryless node support
> > >
> > > Support memoryless nodes by tracking which allocations are failing.
> >
> > I still don't understand why this tracking is needed.
>
> Its an optimization to avoid calling the page allocator to figure out if
> there is memory available on a particular node.
>
> > All we need for allcation targeted to memoryless node is to fallback proper
> > node, that it, numa_mem_id() node of targeted node. My previous patch
> > implements it and use proper fallback node on every allocation code path.
> > Why this tracking is needed? Please elaborate more on this.
>
> Its too slow to do that on every alloc. One needs to be able to satisfy
> most allocations without switching percpu slabs for optimal performance.
I don't think that we need to switch percpu slabs on every alloc.
Allocation targeted to specific node is rare. And most of these allocations
may be targeted to either numa_node_id() or numa_mem_id(). My patch considers
these cases, so most of allocations are processed by percpu slabs. There is
no suboptimal performance.
>
> > > Allocations targeted to the nodes without memory fall back to the
> > > current available per cpu objects and if that is not available will
> > > create a new slab using the page allocator to fallback from the
> > > memoryless node to some other node.
>
> And what about the next alloc? Assuem there are N allocs from a memoryless
> node this means we push back the partial slab on each alloc and then fall
> back?
>
> > > {
> > > void *object;
> > > - int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
> > > + int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
> > >
> > > object = get_partial_node(s, get_node(s, searchnode), c, flags);
> > > if (object || node != NUMA_NO_NODE)
> >
> > This isn't enough.
> > Consider that allcation targeted to memoryless node.
>
> It will not common get there because of the tracking. Instead a per cpu
> object will be used.
> > get_partial_node() always fails even if there are some partial slab on
> > memoryless node's neareast node.
>
> Correct and that leads to a page allocator action whereupon the node will
> be marked as empty.
Why do we need to request to a page allocator if there is partial slab?
Checking whether node is memoryless or not is really easy, so we don't need
to skip this. To skip this is suboptimal solution.
> > We should fallback to some proper node in this case, since there is no slab
> > on memoryless node.
>
> NUMA is about optimization of memory allocations. It is often *not* about
> correctness but heuristics are used in many cases. F.e. see the zone
> reclaim logic, zone reclaim mode, fallback scenarios in the page allocator
> etc etc.
Okay. But, 'do our best' is preferable to me.
Thanks.
WARNING: multiple messages have this Message-ID (diff)
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Christoph Lameter <cl@linux.com>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>,
David Rientjes <rientjes@google.com>,
Han Pingtian <hanpt@linux.vnet.ibm.com>,
Pekka Enberg <penberg@kernel.org>,
Linux Memory Management List <linux-mm@kvack.org>,
Paul Mackerras <paulus@samba.org>,
Anton Blanchard <anton@samba.org>, Matt Mackall <mpm@selenic.com>,
linuxppc-dev@lists.ozlabs.org,
Wanpeng Li <liwanp@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node
Date: Mon, 24 Feb 2014 14:08:51 +0900 [thread overview]
Message-ID: <20140224050851.GB14814@lge.com> (raw)
In-Reply-To: <alpine.DEB.2.10.1402181033480.28964@nuc>
On Tue, Feb 18, 2014 at 10:38:01AM -0600, Christoph Lameter wrote:
> On Mon, 17 Feb 2014, Joonsoo Kim wrote:
>
> > On Wed, Feb 12, 2014 at 04:16:11PM -0600, Christoph Lameter wrote:
> > > Here is another patch with some fixes. The additional logic is only
> > > compiled in if CONFIG_HAVE_MEMORYLESS_NODES is set.
> > >
> > > Subject: slub: Memoryless node support
> > >
> > > Support memoryless nodes by tracking which allocations are failing.
> >
> > I still don't understand why this tracking is needed.
>
> Its an optimization to avoid calling the page allocator to figure out if
> there is memory available on a particular node.
>
> > All we need for allcation targeted to memoryless node is to fallback proper
> > node, that it, numa_mem_id() node of targeted node. My previous patch
> > implements it and use proper fallback node on every allocation code path.
> > Why this tracking is needed? Please elaborate more on this.
>
> Its too slow to do that on every alloc. One needs to be able to satisfy
> most allocations without switching percpu slabs for optimal performance.
I don't think that we need to switch percpu slabs on every alloc.
Allocation targeted to specific node is rare. And most of these allocations
may be targeted to either numa_node_id() or numa_mem_id(). My patch considers
these cases, so most of allocations are processed by percpu slabs. There is
no suboptimal performance.
>
> > > Allocations targeted to the nodes without memory fall back to the
> > > current available per cpu objects and if that is not available will
> > > create a new slab using the page allocator to fallback from the
> > > memoryless node to some other node.
>
> And what about the next alloc? Assuem there are N allocs from a memoryless
> node this means we push back the partial slab on each alloc and then fall
> back?
>
> > > {
> > > void *object;
> > > - int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
> > > + int searchnode = (node == NUMA_NO_NODE) ? numa_mem_id() : node;
> > >
> > > object = get_partial_node(s, get_node(s, searchnode), c, flags);
> > > if (object || node != NUMA_NO_NODE)
> >
> > This isn't enough.
> > Consider that allcation targeted to memoryless node.
>
> It will not common get there because of the tracking. Instead a per cpu
> object will be used.
> > get_partial_node() always fails even if there are some partial slab on
> > memoryless node's neareast node.
>
> Correct and that leads to a page allocator action whereupon the node will
> be marked as empty.
Why do we need to request to a page allocator if there is partial slab?
Checking whether node is memoryless or not is really easy, so we don't need
to skip this. To skip this is suboptimal solution.
> > We should fallback to some proper node in this case, since there is no slab
> > on memoryless node.
>
> NUMA is about optimization of memory allocations. It is often *not* about
> correctness but heuristics are used in many cases. F.e. see the zone
> reclaim logic, zone reclaim mode, fallback scenarios in the page allocator
> etc etc.
Okay. But, 'do our best' is preferable to me.
Thanks.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2014-02-24 5:08 UTC|newest]
Thread overview: 229+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-07 2:21 [PATCH] slub: Don't throw away partial remote slabs if there is no local memory Anton Blanchard
2014-01-07 2:21 ` Anton Blanchard
2014-01-07 4:19 ` Wanpeng Li
2014-01-07 4:19 ` Wanpeng Li
2014-01-08 14:17 ` Anton Blanchard
2014-01-08 14:17 ` Anton Blanchard
2014-01-07 4:19 ` Wanpeng Li
2014-01-07 4:19 ` Wanpeng Li
2014-01-07 6:49 ` Andi Kleen
2014-01-07 6:49 ` Andi Kleen
2014-01-08 14:03 ` Anton Blanchard
2014-01-08 14:03 ` Anton Blanchard
2014-01-07 7:41 ` Joonsoo Kim
2014-01-07 7:41 ` Joonsoo Kim
2014-01-07 8:48 ` Wanpeng Li
2014-01-07 8:48 ` Wanpeng Li
2014-01-07 8:48 ` Wanpeng Li
2014-01-07 9:10 ` Joonsoo Kim
2014-01-07 9:10 ` Joonsoo Kim
2014-01-07 9:21 ` Wanpeng Li
2014-01-07 9:31 ` Joonsoo Kim
2014-01-07 9:31 ` Joonsoo Kim
2014-01-07 9:49 ` Wanpeng Li
2014-01-07 9:49 ` Wanpeng Li
2014-01-07 9:49 ` Wanpeng Li
2014-01-07 9:49 ` Wanpeng Li
2014-01-07 9:21 ` Wanpeng Li
2014-01-07 9:21 ` Wanpeng Li
2014-01-07 9:21 ` Wanpeng Li
2014-01-07 8:48 ` Wanpeng Li
2014-01-07 9:52 ` Wanpeng Li
2014-01-09 0:20 ` Joonsoo Kim
2014-01-09 0:20 ` Joonsoo Kim
2014-01-07 9:52 ` Wanpeng Li
2014-01-07 9:52 ` Wanpeng Li
2014-01-07 9:52 ` Wanpeng Li
2014-01-20 9:10 ` Wanpeng Li
2014-01-20 9:10 ` Wanpeng Li
2014-01-20 9:10 ` Wanpeng Li
2014-01-20 9:10 ` Wanpeng Li
[not found] ` <52dce7fe.e5e6420a.5ff6.ffff84a0SMTPIN_ADDED_BROKEN@mx.google.com>
2014-01-20 22:13 ` Christoph Lameter
2014-01-20 22:13 ` Christoph Lameter
2014-01-21 2:20 ` Wanpeng Li
2014-01-21 2:20 ` Wanpeng Li
2014-01-21 2:20 ` Wanpeng Li
2014-01-21 2:20 ` Wanpeng Li
2014-01-24 3:09 ` Wanpeng Li
2014-01-24 3:09 ` Wanpeng Li
2014-01-24 3:09 ` Wanpeng Li
2014-01-24 3:14 ` Wanpeng Li
2014-01-24 3:14 ` Wanpeng Li
2014-01-24 3:14 ` Wanpeng Li
2014-01-24 3:14 ` Wanpeng Li
[not found] ` <52e1da8f.86f7440a.120f.25f3SMTPIN_ADDED_BROKEN@mx.google.com>
2014-01-24 15:50 ` Christoph Lameter
2014-01-24 15:50 ` Christoph Lameter
2014-01-24 21:03 ` David Rientjes
2014-01-24 21:03 ` David Rientjes
2014-01-24 22:19 ` Nishanth Aravamudan
2014-01-24 22:19 ` Nishanth Aravamudan
2014-01-24 23:29 ` Nishanth Aravamudan
2014-01-24 23:29 ` Nishanth Aravamudan
2014-01-24 23:49 ` David Rientjes
2014-01-24 23:49 ` David Rientjes
2014-01-25 0:16 ` Nishanth Aravamudan
2014-01-25 0:16 ` Nishanth Aravamudan
2014-01-25 0:25 ` David Rientjes
2014-01-25 0:25 ` David Rientjes
2014-01-25 1:10 ` Nishanth Aravamudan
2014-01-25 1:10 ` Nishanth Aravamudan
2014-01-27 5:58 ` Joonsoo Kim
2014-01-27 5:58 ` Joonsoo Kim
2014-01-28 18:29 ` Nishanth Aravamudan
2014-01-28 18:29 ` Nishanth Aravamudan
2014-01-29 15:54 ` Christoph Lameter
2014-01-29 15:54 ` Christoph Lameter
2014-01-29 22:36 ` Nishanth Aravamudan
2014-01-29 22:36 ` Nishanth Aravamudan
2014-01-30 16:26 ` Christoph Lameter
2014-01-30 16:26 ` Christoph Lameter
2014-02-03 23:00 ` Nishanth Aravamudan
2014-02-03 23:00 ` Nishanth Aravamudan
2014-02-04 3:38 ` Christoph Lameter
2014-02-04 3:38 ` Christoph Lameter
2014-02-04 7:26 ` Nishanth Aravamudan
2014-02-04 7:26 ` Nishanth Aravamudan
2014-02-04 20:39 ` Christoph Lameter
2014-02-04 20:39 ` Christoph Lameter
2014-02-05 0:13 ` Nishanth Aravamudan
2014-02-05 0:13 ` Nishanth Aravamudan
2014-02-05 19:28 ` Christoph Lameter
2014-02-05 19:28 ` Christoph Lameter
2014-02-06 2:08 ` Nishanth Aravamudan
2014-02-06 2:08 ` Nishanth Aravamudan
2014-02-06 17:25 ` Christoph Lameter
2014-02-06 17:25 ` Christoph Lameter
2014-01-27 16:18 ` Christoph Lameter
2014-01-27 16:18 ` Christoph Lameter
2014-02-06 2:07 ` Nishanth Aravamudan
2014-02-06 2:07 ` Nishanth Aravamudan
2014-02-06 8:04 ` Joonsoo Kim
2014-02-06 8:04 ` Joonsoo Kim
[not found] ` <20140206185955.GA7845@linux.vnet.ibm.com>
2014-02-06 19:28 ` Nishanth Aravamudan
2014-02-06 19:28 ` Nishanth Aravamudan
2014-02-07 8:03 ` Joonsoo Kim
2014-02-07 8:03 ` Joonsoo Kim
2014-02-06 8:07 ` [RFC PATCH 1/3] slub: search partial list on numa_mem_id(), instead of numa_node_id() Joonsoo Kim
2014-02-06 8:07 ` Joonsoo Kim
2014-02-06 8:07 ` [RFC PATCH 2/3] topology: support node_numa_mem() for determining the fallback node Joonsoo Kim
2014-02-06 8:07 ` Joonsoo Kim
2014-02-06 8:52 ` David Rientjes
2014-02-06 8:52 ` David Rientjes
2014-02-06 10:29 ` Joonsoo Kim
2014-02-06 10:29 ` Joonsoo Kim
2014-02-06 19:11 ` Nishanth Aravamudan
2014-02-06 19:11 ` Nishanth Aravamudan
2014-02-07 5:42 ` Joonsoo Kim
2014-02-07 5:42 ` Joonsoo Kim
2014-02-06 20:52 ` David Rientjes
2014-02-06 20:52 ` David Rientjes
2014-02-07 5:48 ` Joonsoo Kim
2014-02-07 5:48 ` Joonsoo Kim
2014-02-07 17:53 ` Christoph Lameter
2014-02-07 17:53 ` Christoph Lameter
2014-02-07 18:51 ` Christoph Lameter
2014-02-07 18:51 ` Christoph Lameter
2014-02-07 21:38 ` Nishanth Aravamudan
2014-02-07 21:38 ` Nishanth Aravamudan
2014-02-10 1:15 ` Joonsoo Kim
2014-02-10 1:15 ` Joonsoo Kim
2014-02-10 1:29 ` Joonsoo Kim
2014-02-10 1:29 ` Joonsoo Kim
2014-02-11 18:45 ` Christoph Lameter
2014-02-11 18:45 ` Christoph Lameter
2014-02-10 19:13 ` Nishanth Aravamudan
2014-02-10 19:13 ` Nishanth Aravamudan
2014-02-11 7:42 ` Joonsoo Kim
2014-02-11 7:42 ` Joonsoo Kim
2014-02-12 22:16 ` Christoph Lameter
2014-02-12 22:16 ` Christoph Lameter
2014-02-13 3:53 ` Nishanth Aravamudan
2014-02-13 3:53 ` Nishanth Aravamudan
2014-02-17 6:52 ` Joonsoo Kim
2014-02-17 6:52 ` Joonsoo Kim
2014-02-18 16:38 ` Christoph Lameter
2014-02-18 16:38 ` Christoph Lameter
2014-02-19 22:04 ` David Rientjes
2014-02-19 22:04 ` David Rientjes
2014-02-20 16:02 ` Christoph Lameter
2014-02-20 16:02 ` Christoph Lameter
2014-02-24 5:08 ` Joonsoo Kim [this message]
2014-02-24 5:08 ` Joonsoo Kim
2014-02-24 19:54 ` Christoph Lameter
2014-02-24 19:54 ` Christoph Lameter
2014-03-13 16:51 ` Nishanth Aravamudan
2014-03-13 16:51 ` Nishanth Aravamudan
2014-02-18 17:22 ` Nishanth Aravamudan
2014-02-18 17:22 ` Nishanth Aravamudan
2014-02-13 6:51 ` Nishanth Aravamudan
2014-02-13 6:51 ` Nishanth Aravamudan
2014-02-17 7:00 ` Joonsoo Kim
2014-02-17 7:00 ` Joonsoo Kim
2014-02-18 16:57 ` Christoph Lameter
2014-02-18 16:57 ` Christoph Lameter
2014-02-18 17:28 ` Nishanth Aravamudan
2014-02-18 17:28 ` Nishanth Aravamudan
2014-02-18 19:58 ` Christoph Lameter
2014-02-18 19:58 ` Christoph Lameter
2014-02-18 21:09 ` Nishanth Aravamudan
2014-02-18 21:09 ` Nishanth Aravamudan
2014-02-18 21:49 ` Christoph Lameter
2014-02-18 21:49 ` Christoph Lameter
2014-02-18 22:22 ` Nishanth Aravamudan
2014-02-18 22:22 ` Nishanth Aravamudan
2014-02-19 16:11 ` Christoph Lameter
2014-02-19 16:11 ` Christoph Lameter
2014-02-19 22:03 ` David Rientjes
2014-02-19 22:03 ` David Rientjes
2014-02-08 9:57 ` David Rientjes
2014-02-08 9:57 ` David Rientjes
2014-02-10 1:09 ` Joonsoo Kim
2014-02-10 1:09 ` Joonsoo Kim
2014-07-22 1:03 ` Nishanth Aravamudan
2014-07-22 1:03 ` Nishanth Aravamudan
2014-07-22 1:16 ` David Rientjes
2014-07-22 1:16 ` David Rientjes
2014-07-22 21:43 ` Nishanth Aravamudan
2014-07-22 21:43 ` Nishanth Aravamudan
2014-07-22 21:49 ` Tejun Heo
2014-07-22 21:49 ` Tejun Heo
2014-07-22 23:47 ` Nishanth Aravamudan
2014-07-22 23:47 ` Nishanth Aravamudan
2014-07-23 0:43 ` David Rientjes
2014-07-23 0:43 ` David Rientjes
2014-02-06 8:07 ` [RFC PATCH 3/3] slub: fallback to get_numa_mem() node if we want to allocate on memoryless node Joonsoo Kim
2014-02-06 8:07 ` Joonsoo Kim
2014-02-06 17:30 ` Christoph Lameter
2014-02-06 17:30 ` Christoph Lameter
2014-02-07 5:41 ` Joonsoo Kim
2014-02-07 5:41 ` Joonsoo Kim
2014-02-07 17:49 ` Christoph Lameter
2014-02-07 17:49 ` Christoph Lameter
2014-02-10 1:22 ` Joonsoo Kim
2014-02-10 1:22 ` Joonsoo Kim
2014-02-06 8:37 ` [RFC PATCH 1/3] slub: search partial list on numa_mem_id(), instead of numa_node_id() David Rientjes
2014-02-06 8:37 ` David Rientjes
2014-02-06 17:31 ` Christoph Lameter
2014-02-06 17:31 ` Christoph Lameter
2014-02-06 17:26 ` Christoph Lameter
2014-02-06 17:26 ` Christoph Lameter
2014-05-16 23:37 ` Nishanth Aravamudan
2014-05-16 23:37 ` Nishanth Aravamudan
2014-05-19 2:41 ` Joonsoo Kim
2014-05-19 2:41 ` Joonsoo Kim
2014-06-05 0:13 ` [RESEND PATCH] " David Rientjes
2014-06-05 0:13 ` David Rientjes
2014-06-05 0:13 ` David Rientjes
2014-01-27 16:24 ` [PATCH] slub: Don't throw away partial remote slabs if there is no local memory Christoph Lameter
2014-01-27 16:24 ` Christoph Lameter
2014-01-27 16:16 ` Christoph Lameter
2014-01-27 16:16 ` Christoph Lameter
2014-01-24 3:09 ` Wanpeng Li
2014-01-07 9:42 ` David Laight
2014-01-07 9:42 ` David Laight
2014-01-08 14:14 ` Anton Blanchard
2014-01-08 14:14 ` Anton Blanchard
2014-01-07 10:28 ` Wanpeng Li
2014-01-07 10:28 ` Wanpeng Li
2014-01-07 10:28 ` Wanpeng Li
2014-01-07 10:28 ` Wanpeng Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140224050851.GB14814@lge.com \
--to=iamjoonsoo.kim@lge.com \
--cc=anton@samba.org \
--cc=cl@linux.com \
--cc=hanpt@linux.vnet.ibm.com \
--cc=linux-mm@kvack.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=liwanp@linux.vnet.ibm.com \
--cc=mpm@selenic.com \
--cc=nacc@linux.vnet.ibm.com \
--cc=paulus@samba.org \
--cc=penberg@kernel.org \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.