linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: NUMA topology question wrt. d4edc5b6
       [not found] <20140521200451.GB5755@linux.vnet.ibm.com>
@ 2014-05-22 20:48 ` Srivatsa S. Bhat
  2014-05-28 20:37   ` Nishanth Aravamudan
  2014-06-09 21:38   ` David Rientjes
  0 siblings, 2 replies; 4+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-22 20:48 UTC (permalink / raw)
  To: Nishanth Aravamudan
  Cc: benh, Srikar Dronamraju, nfont, Aneesh Kumar K.V, Cody P Schafer,
	Anton Blanchard, Dave Hansen, linuxppc-dev@lists.ozlabs.org list,
	Linux MM


[ Adding a few more CC's ]

On 05/22/2014 01:34 AM, Nishanth Aravamudan wrote:
> Hi Srivatsa,
> 
> After d4edc5b6 ("powerpc: Fix the setup of CPU-to-Node mappings during
> CPU online"), cpu_to_node() looks like:
> 
> static inline int cpu_to_node(int cpu)
> {
>         int nid;
> 
>         nid = numa_cpu_lookup_table[cpu];
> 
>         /*
>          * During early boot, the numa-cpu lookup table might not have been
>          * setup for all CPUs yet. In such cases, default to node 0.
>          */
>         return (nid < 0) ? 0 : nid;
> }
> 
> However, I'm curious if this is correct in all cases. I have seen
> several LPARs that do not have any CPUs on node 0. In fact, because node
> 0 is statically set online in the initialization of the N_ONLINE
> nodemask, 0 is always present to Linux, whether it is present on the
> system. I'm not sure what the best thing to do here is, but I'm curious
> if you have any ideas? I would like to remove the static initialization
> of node 0, as it's confusing to users to see an empty node (particularly
> when it's completely separate in the numbering from other nodes), but
> we trip a panic (refer to:
> http://www.spinics.net/lists/linux-mm/msg73321.html).
> 

Ah, I see. I didn't have any particular reason to default it to zero.
I just did that because the existing code before this patch did the same
thing. (numa_cpu_lookup_table[] is a global array, so it will be initialized
with zeros. So if we access it before populating it via numa_setup_cpu(),
it would return 0. So I retained that behaviour with the above conditional).

Will something like the below [totally untested] patch solve the boot-panic?
I understand that as of today first_online_node will still pick 0 since
N_ONLINE is initialized statically, but with your proposed change to that
init code, I guess the following patch should avoid the boot panic.

[ But note that first_online_node is hard-coded to 0, if MAX_NUMNODES is = 1.
So we'll have to fix that if powerpc can have a single node system whose node
is numbered something other than 0. Can that happen as well? ]


And regarding your question about what is the best way to fix this whole Linux
MM's assumption about node0, I'm not really sure.. since I am not really aware
of the extent to which the MM subsystem is intertwined with this assumption
and what it would take to cure that :-(

Regards,
Srivatsa S. Bhat


diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index c920215..58e6469 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -18,6 +18,7 @@ struct device_node;
  */
 #define RECLAIM_DISTANCE 10
 
+#include <linux/nodemask.h>
 #include <asm/mmzone.h>
 
 static inline int cpu_to_node(int cpu)
@@ -30,7 +31,7 @@ static inline int cpu_to_node(int cpu)
 	 * During early boot, the numa-cpu lookup table might not have been
 	 * setup for all CPUs yet. In such cases, default to node 0.
 	 */
-	return (nid < 0) ? 0 : nid;
+	return (nid < 0) ? first_online_node : nid;
 }
 
 #define parent_node(node)	(node)


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: NUMA topology question wrt. d4edc5b6
  2014-05-22 20:48 ` NUMA topology question wrt. d4edc5b6 Srivatsa S. Bhat
@ 2014-05-28 20:37   ` Nishanth Aravamudan
  2014-06-09 21:38   ` David Rientjes
  1 sibling, 0 replies; 4+ messages in thread
From: Nishanth Aravamudan @ 2014-05-28 20:37 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: benh, Srikar Dronamraju, nfont, Aneesh Kumar K.V, Cody P Schafer,
	Anton Blanchard, Dave Hansen, linuxppc-dev@lists.ozlabs.org list,
	Linux MM

On 23.05.2014 [02:18:05 +0530], Srivatsa S. Bhat wrote:
> 
> [ Adding a few more CC's ]
> 
> On 05/22/2014 01:34 AM, Nishanth Aravamudan wrote:
> > Hi Srivatsa,
> > 
> > After d4edc5b6 ("powerpc: Fix the setup of CPU-to-Node mappings during
> > CPU online"), cpu_to_node() looks like:
> > 
> > static inline int cpu_to_node(int cpu)
> > {
> >         int nid;
> > 
> >         nid = numa_cpu_lookup_table[cpu];
> > 
> >         /*
> >          * During early boot, the numa-cpu lookup table might not have been
> >          * setup for all CPUs yet. In such cases, default to node 0.
> >          */
> >         return (nid < 0) ? 0 : nid;
> > }
> > 
> > However, I'm curious if this is correct in all cases. I have seen
> > several LPARs that do not have any CPUs on node 0. In fact, because node
> > 0 is statically set online in the initialization of the N_ONLINE
> > nodemask, 0 is always present to Linux, whether it is present on the
> > system. I'm not sure what the best thing to do here is, but I'm curious
> > if you have any ideas? I would like to remove the static initialization
> > of node 0, as it's confusing to users to see an empty node (particularly
> > when it's completely separate in the numbering from other nodes), but
> > we trip a panic (refer to:
> > http://www.spinics.net/lists/linux-mm/msg73321.html).
> > 
> 
> Ah, I see. I didn't have any particular reason to default it to zero.
> I just did that because the existing code before this patch did the same
> thing. (numa_cpu_lookup_table[] is a global array, so it will be initialized
> with zeros. So if we access it before populating it via numa_setup_cpu(),
> it would return 0. So I retained that behaviour with the above conditional).

Ok, that seems reasonable to me (keeping the behavior the same as it was
before).

> Will something like the below [totally untested] patch solve the boot-panic?
> I understand that as of today first_online_node will still pick 0 since
> N_ONLINE is initialized statically, but with your proposed change to that
> init code, I guess the following patch should avoid the boot panic.
> 
> [ But note that first_online_node is hard-coded to 0, if MAX_NUMNODES is = 1.
> So we'll have to fix that if powerpc can have a single node system whose node
> is numbered something other than 0. Can that happen as well? ]

I think all single-node systems are only Node 0, but I'm not 100% on
that.

> And regarding your question about what is the best way to fix this
> whole Linux MM's assumption about node0, I'm not really sure.. since I
> am not really aware of the extent to which the MM subsystem is
> intertwined with this assumption and what it would take to cure that
> :-(

Well, at this point, it might be fine to just leave it alone, as it
seems to be more trouble than it's worth -- and really the only
confusion is on those LPARs where there really isn't a Node 0. I'll take
another look later this week.

Thanks,
Nish

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: NUMA topology question wrt. d4edc5b6
  2014-05-22 20:48 ` NUMA topology question wrt. d4edc5b6 Srivatsa S. Bhat
  2014-05-28 20:37   ` Nishanth Aravamudan
@ 2014-06-09 21:38   ` David Rientjes
  2014-06-10 23:30     ` Nishanth Aravamudan
  1 sibling, 1 reply; 4+ messages in thread
From: David Rientjes @ 2014-06-09 21:38 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: Nishanth Aravamudan, benh, Srikar Dronamraju, nfont,
	Aneesh Kumar K.V, Cody P Schafer, Anton Blanchard, Dave Hansen,
	linuxppc-dev@lists.ozlabs.org list, Linux MM

On Fri, 23 May 2014, Srivatsa S. Bhat wrote:

> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index c920215..58e6469 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -18,6 +18,7 @@ struct device_node;
>   */
>  #define RECLAIM_DISTANCE 10
>  
> +#include <linux/nodemask.h>
>  #include <asm/mmzone.h>
>  
>  static inline int cpu_to_node(int cpu)
> @@ -30,7 +31,7 @@ static inline int cpu_to_node(int cpu)
>  	 * During early boot, the numa-cpu lookup table might not have been
>  	 * setup for all CPUs yet. In such cases, default to node 0.
>  	 */
> -	return (nid < 0) ? 0 : nid;
> +	return (nid < 0) ? first_online_node : nid;
>  }
>  
>  #define parent_node(node)	(node)

I wonder what would happen on ppc if we just returned NUMA_NO_NODE here 
for cpus that have not been mapped (they shouldn't even be possible).  
This would at least allow callers that do
kmalloc_node(..., cpu_to_node(cpu)) to be allocated on the local cpu 
rather than on a perhaps offline or remote node 0.

It would seem better to catch callers that do 
cpu_to_node(<not-possible-cpu>) rather than blindly return an online node.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: NUMA topology question wrt. d4edc5b6
  2014-06-09 21:38   ` David Rientjes
@ 2014-06-10 23:30     ` Nishanth Aravamudan
  0 siblings, 0 replies; 4+ messages in thread
From: Nishanth Aravamudan @ 2014-06-10 23:30 UTC (permalink / raw)
  To: David Rientjes
  Cc: Srivatsa S. Bhat, benh, Srikar Dronamraju, nfont,
	Aneesh Kumar K.V, Cody P Schafer, Anton Blanchard, Dave Hansen,
	linuxppc-dev@lists.ozlabs.org list, Linux MM

On 09.06.2014 [14:38:26 -0700], David Rientjes wrote:
> On Fri, 23 May 2014, Srivatsa S. Bhat wrote:
> 
> > diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> > index c920215..58e6469 100644
> > --- a/arch/powerpc/include/asm/topology.h
> > +++ b/arch/powerpc/include/asm/topology.h
> > @@ -18,6 +18,7 @@ struct device_node;
> >   */
> >  #define RECLAIM_DISTANCE 10
> >  
> > +#include <linux/nodemask.h>
> >  #include <asm/mmzone.h>
> >  
> >  static inline int cpu_to_node(int cpu)
> > @@ -30,7 +31,7 @@ static inline int cpu_to_node(int cpu)
> >  	 * During early boot, the numa-cpu lookup table might not have been
> >  	 * setup for all CPUs yet. In such cases, default to node 0.
> >  	 */
> > -	return (nid < 0) ? 0 : nid;
> > +	return (nid < 0) ? first_online_node : nid;
> >  }
> >  
> >  #define parent_node(node)	(node)
> 
> I wonder what would happen on ppc if we just returned NUMA_NO_NODE here 
> for cpus that have not been mapped (they shouldn't even be possible).  

Well, with my patch (Ben sent it to Linus in the last pull request, I
think), powerpc uses the generic per-cpu stuff, so this function is
gone. Dunno if it makes sense to initialize the per-cpu data to
NUMA_NO_NODE (rather than 0?).

For powerpc, it's a timing thing. We can call cpu_to_node() quite early,
and we may not have set up the mapping information yet.

> This would at least allow callers that do
> kmalloc_node(..., cpu_to_node(cpu)) to be allocated on the local cpu 
> rather than on a perhaps offline or remote node 0.
> 
> It would seem better to catch callers that do 
> cpu_to_node(<not-possible-cpu>) rather than blindly return an online node.

Agreed, but I've not seen such a case.

Thanks,
Nish

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2014-06-10 23:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20140521200451.GB5755@linux.vnet.ibm.com>
2014-05-22 20:48 ` NUMA topology question wrt. d4edc5b6 Srivatsa S. Bhat
2014-05-28 20:37   ` Nishanth Aravamudan
2014-06-09 21:38   ` David Rientjes
2014-06-10 23:30     ` Nishanth Aravamudan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).