linuxppc-dev.lists.ozlabs.org archive mirror
From: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
To: David Rientjes <rientjes@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Anton Blanchard <anton@samba.org>,
	Peter Zijlstra <peterz@infradead.org>,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/2] powerpc/numa: fix cpu_to_node() usage during boot
Date: Fri, 10 Jul 2015 09:25:31 -0700
Message-ID: <20150710162531.GE44862@linux.vnet.ibm.com>
In-Reply-To: <alpine.DEB.2.10.1507081811280.16585@chino.kir.corp.google.com>

On 08.07.2015 [18:22:09 -0700], David Rientjes wrote:
> On Thu, 2 Jul 2015, Nishanth Aravamudan wrote:
> 
> > Much like on x86, now that powerpc is using USE_PERCPU_NUMA_NODE_ID, we
> > have an ordering issue during boot with early calls to cpu_to_node().
> > The value returned by those calls now depends on the per-CPU area being
> > set up, but that is not guaranteed to be the case during boot. Instead,
> > we need to add an early_cpu_to_node() which doesn't use the per-CPU area
> > and call that from certain spots that are known to invoke cpu_to_node()
> > before the per-CPU areas are configured.
> > 
> > On an example 2-node NUMA system with the following topology:
> > 
> > available: 2 nodes (0-1)
> > node 0 cpus: 0 1 2 3
> > node 0 size: 2029 MB
> > node 0 free: 1753 MB
> > node 1 cpus: 4 5 6 7
> > node 1 size: 2045 MB
> > node 1 free: 1945 MB
> > node distances:
> > node   0   1 
> >   0:  10  40 
> >   1:  40  10 
> > 
> > we currently emit at boot:
> > 
> > [    0.000000] pcpu-alloc: [0] 0 1 2 3 [0] 4 5 6 7 
> > 
> > After this commit, we correctly emit:
> > 
> > [    0.000000] pcpu-alloc: [0] 0 1 2 3 [1] 4 5 6 7 
> > 
> > Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
> > 
> > diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> > index 5f1048e..f2c4c89 100644
> > --- a/arch/powerpc/include/asm/topology.h
> > +++ b/arch/powerpc/include/asm/topology.h
> > @@ -39,6 +39,8 @@ static inline int pcibus_to_node(struct pci_bus *bus)
> >  extern int __node_distance(int, int);
> >  #define node_distance(a, b) __node_distance(a, b)
> >  
> > +extern int early_cpu_to_node(int);
> > +
> >  extern void __init dump_numa_cpu_topology(void);
> >  
> >  extern int sysfs_add_device_to_node(struct device *dev, int nid);
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index c69671c..23a2cf3 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -715,8 +715,8 @@ void __init setup_arch(char **cmdline_p)
> >  
> >  static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size, size_t align)
> >  {
> > -	return __alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu)), size, align,
> > -				    __pa(MAX_DMA_ADDRESS));
> > +	return __alloc_bootmem_node(NODE_DATA(early_cpu_to_node(cpu)), size,
> > +				    align, __pa(MAX_DMA_ADDRESS));
> >  }
> >  
> >  static void __init pcpu_fc_free(void *ptr, size_t size)
> > @@ -726,7 +726,7 @@ static void __init pcpu_fc_free(void *ptr, size_t size)
> >  
> >  static int pcpu_cpu_distance(unsigned int from, unsigned int to)
> >  {
> > -	if (cpu_to_node(from) == cpu_to_node(to))
> > +	if (early_cpu_to_node(from) == early_cpu_to_node(to))
> >  		return LOCAL_DISTANCE;
> >  	else
> >  		return REMOTE_DISTANCE;
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index 5e80621..9ffabf4 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -157,6 +157,11 @@ static void map_cpu_to_node(int cpu, int node)
> >  		cpumask_set_cpu(cpu, node_to_cpumask_map[node]);
> >  }
> >  
> > +int early_cpu_to_node(int cpu)
> > +{
> > +	return numa_cpu_lookup_table[cpu];
> > +}
> > +
> >  #if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_PPC_SPLPAR)
> >  static void unmap_cpu_from_node(unsigned long cpu)
> >  {
> > 
> > 
> 
> early_cpu_to_node() looks like it's begging to be __init since we 
> shouldn't have a need to reference numa_cpu_lookup_table after boot and 
> that appears like it can be done if pcpu_cpu_distance() is made __init in 
> this patch and smp_prepare_boot_cpu() is made __init in the next patch.  
> So I think this is fine, but those functions and things like 
> reset_numa_cpu_lookup_table() should be in init.text.

Yep, that makes total sense!

> After the percpu areas are initialized and cpu_to_node() is correct, it 
> would be really nice to be able to make numa_cpu_lookup_table[] be 
> __initdata since it shouldn't be necessary anymore.  That probably has cpu 
> callbacks that need to be modified to no longer look at 
> numa_cpu_lookup_table[] or pass the value in, but it would make it much 
> cleaner.  Then nobody will have to worry about figuring out whether 
> early_cpu_to_node() or cpu_to_node() is the right one to call.

When I worked on the original pcpu patches for power, I wanted to do
this, but got myself confused and never came back to it. Thank you for
suggesting it and I'll work on it soon.
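
For anyone following along, the ordering problem can be modeled in
userspace. This is only a sketch, not kernel code: every model_* name is
hypothetical, and it assumes the per-CPU value reads as 0 until the
per-CPU areas are set up, which is consistent with the boot output quoted
above.

```c
#include <assert.h>
#include <stdio.h>

#define NR_CPUS 8

/*
 * Early table, filled from the firmware topology before per-CPU areas
 * exist. Mirrors the 2-node example: CPUs 0-3 on node 0, CPUs 4-7 on
 * node 1. (Hypothetical stand-in for numa_cpu_lookup_table.)
 */
static const int model_lookup_table[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/*
 * Stand-in for the per-CPU NUMA node variable. ASSUMPTION: it reads as 0
 * until the per-CPU areas have been set up.
 */
static int model_percpu_node[NR_CPUS];

/* Models early_cpu_to_node(): reads the table, valid at any point in boot. */
static int model_early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return model_lookup_table[cpu];
}

/* Models cpu_to_node() under USE_PERCPU_NUMA_NODE_ID: reads per-CPU data. */
static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return model_percpu_node[cpu];
}

/* Models the point in boot where the per-CPU values become valid. */
static void model_setup_per_cpu_areas(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		model_percpu_node[cpu] = model_lookup_table[cpu];
}

/* Formats a pcpu-alloc style line using whichever lookup is passed in. */
static void model_pcpu_alloc_line(int (*to_node)(int), char *buf, size_t len)
{
	snprintf(buf, len, "[%d] 0 1 2 3 [%d] 4 5 6 7",
		 to_node(0), to_node(4));
}
```

In this model, formatting the line with model_cpu_to_node() before
model_setup_per_cpu_areas() puts both groups on node 0, while
model_early_cpu_to_node() gives node 1 for CPUs 4-7 at any time. Once
the setup step has run, the two lookups agree, which is why the early
table could in principle be dropped after boot.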

-Nish
