linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>,
	Matt Mackall <mpm@selenic.com>,
	David Rientjes <rientjes@google.com>,
	Pekka Enberg <penberg@kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Paul Mackerras <paulus@samba.org>, Tejun Heo <tj@kernel.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	linuxppc-dev@lists.ozlabs.org, Christoph Lameter <cl@linux.com>,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	Anton Blanchard <anton@samba.org>
Subject: Re: [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node
Date: Tue, 9 Sep 2014 17:47:23 -0700	[thread overview]
Message-ID: <20140910004723.GH22906@linux.vnet.ibm.com> (raw)
In-Reply-To: <20140909171115.75c7702c37dfb23b9e053636@linux-foundation.org>

On 09.09.2014 [17:11:15 -0700], Andrew Morton wrote:
> On Tue, 9 Sep 2014 12:03:27 -0700 Nishanth Aravamudan <nacc@linux.vnet.ibm.com> wrote:
> 
> > From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> > We need to determine the fallback node in slub allocator if the
> > allocation target node is memoryless node. Without it, the SLUB wrongly
> > select the node which has no memory and can't use a partial slab,
> > because of node mismatch. Introduced function, node_to_mem_node(X), will
> > return a node Y with memory that has the nearest distance. If X is
> > memoryless node, it will return nearest distance node, but, if X is
> > normal node, it will return itself.
> > 
> > We will use this function in following patch to determine the fallback
> > node.
> > 
> > ...
> >
> > --- a/include/linux/topology.h
> > +++ b/include/linux/topology.h
> > @@ -119,11 +119,20 @@ static inline int numa_node_id(void)
> >   * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
> 
> This comment could be updated.

Will do, do you prefer a follow-on patch or one that replaces this one?

> >   */
> >  DECLARE_PER_CPU(int, _numa_mem_);
> > +extern int _node_numa_mem_[MAX_NUMNODES];
> >  
> >  #ifndef set_numa_mem
> >  static inline void set_numa_mem(int node)
> >  {
> >  	this_cpu_write(_numa_mem_, node);
> > +	_node_numa_mem_[numa_node_id()] = node;
> > +}
> > +#endif
> > +
> > +#ifndef node_to_mem_node
> > +static inline int node_to_mem_node(int node)
> > +{
> > +	return _node_numa_mem_[node];
> >  }
> 
> A wee bit of documentation wouldn't hurt.
> 
> How does node_to_mem_node(numa_node_id()) differ from numa_mem_id()? 
> If I'm reading things correctly, they should both always return the
> same thing.  If so, do we need both?

That seems correct to me. The nearest memory node of this cpu's NUMA
node (node_to_mem_node(numa_node_id()) is always equal to the nearest
memory node (numa_mem_id()).

> Will node_to_mem_node() ever actually be called with a node !=
> numa_node_id()?

Well, it's a layering problem. The eventual callers of
node_to_mem_node() only have the requested NUMA node (if any) available.
I think because get_partial() __slab_alloc() allow for allocations for
any node, and that's where we see the slab deactivation issues, we need
to support this in the API.

In practice, it's probably that the node parameter is often
numa_node_id(), but we can't be sure of that in these call-paths,
afaict.
 
> >  #endif
> >  
> > @@ -146,6 +155,7 @@ static inline int cpu_to_mem(int cpu)
> >  static inline void set_cpu_numa_mem(int cpu, int node)
> >  {
> >  	per_cpu(_numa_mem_, cpu) = node;
> > +	_node_numa_mem_[cpu_to_node(cpu)] = node;
> >  }
> >  #endif
> >  
> > @@ -159,6 +169,13 @@ static inline int numa_mem_id(void)
> >  }
> >  #endif
> >  
> > +#ifndef node_to_mem_node
> > +static inline int node_to_mem_node(int node)
> > +{
> > +	return node;
> > +}
> > +#endif
> > +
> >  #ifndef cpu_to_mem
> >  static inline int cpu_to_mem(int cpu)
> >  {
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 18cee0d4c8a2..0883c42936d4 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -85,6 +85,7 @@ EXPORT_PER_CPU_SYMBOL(numa_node);
> >   */
> >  DEFINE_PER_CPU(int, _numa_mem_);		/* Kernel "local memory" node */
> >  EXPORT_PER_CPU_SYMBOL(_numa_mem_);
> > +int _node_numa_mem_[MAX_NUMNODES];
> 
> How does this get updated as CPUs, memory and nodes are hot-added and
> removed?

As CPUs are added, the architecture code in the CPU bringup will update
the NUMA topology. Memory and node hotplug are still open issues, I
mentioned the former in the cover letter. I should have mentioned it in
this commit message as well.

I do notice that Lee's commit message from 7aac78988551 ("numa:
introduce numa_mem_id()- effective local memory node id"):

"Generic initialization of 'numa_mem' occurs in __build_all_zonelists().
This will initialize the boot cpu at boot time, and all cpus on change
of numa_zonelist_order, or when node or memory hot-plug requires
zonelist rebuild.  Archs that support memoryless nodes will need to
initialize 'numa_mem' for secondary cpus as they're brought on-line."

And since we update the _node_numa_mem_ value on set_cpu_numa_mem()
calls, which were already needed for numa_mem_id(), we might be covered.
Testing these cases (hotplug) is next in my plans.

Thanks,
Nish

  reply	other threads:[~2014-09-10  0:47 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-09 19:01 [PATCH 0/3] Improve slab consumption with memoryless nodes Nishanth Aravamudan
2014-09-09 19:03 ` [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node Nishanth Aravamudan
2014-09-09 19:05   ` [PATCH 2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node Nishanth Aravamudan
2014-09-09 19:06     ` [PATCH 3/3] Partial revert of 81c98869faa5 ("kthread: ensure locality of task_struct allocations") Nishanth Aravamudan
2014-09-10  0:11     ` [PATCH 2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node Andrew Morton
2014-09-10  0:55       ` Nishanth Aravamudan
2014-09-10  0:11   ` [PATCH v3] topology: add support for node_to_mem_node() to determine the fallback node Andrew Morton
2014-09-10  0:47     ` Nishanth Aravamudan [this message]
2014-09-10 19:06       ` Andrew Morton
2014-09-10 22:49         ` Nishanth Aravamudan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140910004723.GH22906@linux.vnet.ibm.com \
    --to=nacc@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=anton@samba.org \
    --cc=cl@linux.com \
    --cc=hanpt@linux.vnet.ibm.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=liwanp@linux.vnet.ibm.com \
    --cc=mpm@selenic.com \
    --cc=paulus@samba.org \
    --cc=penberg@kernel.org \
    --cc=rientjes@google.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).