* Re: NUMA topology question wrt. d4edc5b6
[not found] <20140521200451.GB5755@linux.vnet.ibm.com>
@ 2014-05-22 20:48 ` Srivatsa S. Bhat
2014-05-28 20:37 ` Nishanth Aravamudan
2014-06-09 21:38 ` David Rientjes
0 siblings, 2 replies; 4+ messages in thread
From: Srivatsa S. Bhat @ 2014-05-22 20:48 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: benh, Srikar Dronamraju, nfont, Aneesh Kumar K.V, Cody P Schafer,
Anton Blanchard, Dave Hansen, linuxppc-dev@lists.ozlabs.org list,
Linux MM
[ Adding a few more CC's ]
On 05/22/2014 01:34 AM, Nishanth Aravamudan wrote:
> Hi Srivatsa,
>
> After d4edc5b6 ("powerpc: Fix the setup of CPU-to-Node mappings during
> CPU online"), cpu_to_node() looks like:
>
> static inline int cpu_to_node(int cpu)
> {
> 	int nid;
>
> 	nid = numa_cpu_lookup_table[cpu];
>
> 	/*
> 	 * During early boot, the numa-cpu lookup table might not have been
> 	 * setup for all CPUs yet. In such cases, default to node 0.
> 	 */
> 	return (nid < 0) ? 0 : nid;
> }
>
> However, I'm curious if this is correct in all cases. I have seen
> several LPARs that do not have any CPUs on node 0. In fact, because node
> 0 is statically set online in the initialization of the N_ONLINE
> nodemask, node 0 is always present to Linux, whether or not it is on the
> system. I'm not sure what the best thing to do here is, but I'm curious
> if you have any ideas? I would like to remove the static initialization
> of node 0, as it's confusing to users to see an empty node (particularly
> when it's completely separate in the numbering from other nodes), but
> we trip a panic (refer to:
> http://www.spinics.net/lists/linux-mm/msg73321.html).
>
Ah, I see. I didn't have any particular reason to default it to zero; I just
did that because the existing code before this patch behaved the same way.
(numa_cpu_lookup_table[] is a global array, so it starts out zero-filled; any
access before it is populated via numa_setup_cpu() therefore returns 0. I
retained that behaviour with the above conditional.)
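To illustrate (a simplified sketch, not the actual arch/powerpc/mm/numa.c
code; the init helper's name below is made up for illustration):

	/*
	 * File-scope array => zero-filled by default, so an entry that
	 * numa_setup_cpu() has not touched yet silently reads as node 0.
	 */
	int numa_cpu_lookup_table[NR_CPUS];

	/*
	 * Hypothetical early-init helper: pre-filling the table with -1
	 * ("unmapped") is what lets the (nid < 0) check in cpu_to_node()
	 * tell an unset entry apart from a CPU that really is on node 0.
	 */
	static void __init mark_cpu_node_mappings_unset(void)
	{
		unsigned int cpu;

		for_each_possible_cpu(cpu)
			numa_cpu_lookup_table[cpu] = -1;
	}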
Will something like the below [totally untested] patch solve the boot-panic?
I understand that as of today first_online_node will still pick 0 since
N_ONLINE is initialized statically, but with your proposed change to that
init code, I guess the following patch should avoid the boot panic.
[ But note that first_online_node is hard-coded to 0 if MAX_NUMNODES == 1, so
we'll have to fix that too if powerpc can have a single-node system whose node
is numbered something other than 0. Can that happen as well? ]
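[ For reference, the pieces involved are roughly the following, abbreviated
from mm/page_alloc.c and include/linux/nodemask.h of this era (exact details
may differ by kernel version):

	/*
	 * mm/page_alloc.c: node 0 is marked online unconditionally, which is
	 * why an LPAR with nothing on node 0 still shows an (empty) node 0.
	 */
	nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
		[N_POSSIBLE]	= NODE_MASK_ALL,
		[N_ONLINE]	= { { [0] = 1UL } },
		/* ... */
	};

	/*
	 * include/linux/nodemask.h: first_online_node does a real lookup
	 * only on multi-node configs; otherwise it is hard-coded to 0.
	 */
	#if MAX_NUMNODES > 1
	#define first_online_node	first_node(node_states[N_ONLINE])
	#else
	#define first_online_node	0
	#endif
]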
As for your question about the best way to fix the Linux MM's broader
assumption about node 0, I'm not really sure... I'm not aware of the extent to
which the MM subsystem is intertwined with that assumption, or of what it
would take to cure it :-(
Regards,
Srivatsa S. Bhat
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index c920215..58e6469 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -18,6 +18,7 @@ struct device_node;
  */
 #define RECLAIM_DISTANCE 10
 
+#include <linux/nodemask.h>
 #include <asm/mmzone.h>
 
 static inline int cpu_to_node(int cpu)
@@ -30,7 +31,7 @@ static inline int cpu_to_node(int cpu)
 	 * During early boot, the numa-cpu lookup table might not have been
 	 * setup for all CPUs yet. In such cases, default to node 0.
 	 */
-	return (nid < 0) ? 0 : nid;
+	return (nid < 0) ? first_online_node : nid;
 }
 
 #define parent_node(node) (node)
* Re: NUMA topology question wrt. d4edc5b6
2014-05-22 20:48 ` NUMA topology question wrt. d4edc5b6 Srivatsa S. Bhat
@ 2014-05-28 20:37 ` Nishanth Aravamudan
2014-06-09 21:38 ` David Rientjes
1 sibling, 0 replies; 4+ messages in thread
From: Nishanth Aravamudan @ 2014-05-28 20:37 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: benh, Srikar Dronamraju, nfont, Aneesh Kumar K.V, Cody P Schafer,
Anton Blanchard, Dave Hansen, linuxppc-dev@lists.ozlabs.org list,
Linux MM
On 23.05.2014 [02:18:05 +0530], Srivatsa S. Bhat wrote:
>
> [ Adding a few more CC's ]
>
> On 05/22/2014 01:34 AM, Nishanth Aravamudan wrote:
> > Hi Srivatsa,
> >
> > After d4edc5b6 ("powerpc: Fix the setup of CPU-to-Node mappings during
> > CPU online"), cpu_to_node() looks like:
> >
> > static inline int cpu_to_node(int cpu)
> > {
> > 	int nid;
> >
> > 	nid = numa_cpu_lookup_table[cpu];
> >
> > 	/*
> > 	 * During early boot, the numa-cpu lookup table might not have been
> > 	 * setup for all CPUs yet. In such cases, default to node 0.
> > 	 */
> > 	return (nid < 0) ? 0 : nid;
> > }
> >
> > However, I'm curious if this is correct in all cases. I have seen
> > several LPARs that do not have any CPUs on node 0. In fact, because node
> > 0 is statically set online in the initialization of the N_ONLINE
> > nodemask, node 0 is always present to Linux, whether or not it is on the
> > system. I'm not sure what the best thing to do here is, but I'm curious
> > if you have any ideas? I would like to remove the static initialization
> > of node 0, as it's confusing to users to see an empty node (particularly
> > when it's completely separate in the numbering from other nodes), but
> > we trip a panic (refer to:
> > http://www.spinics.net/lists/linux-mm/msg73321.html).
> >
>
> Ah, I see. I didn't have any particular reason to default it to zero; I just
> did that because the existing code before this patch behaved the same way.
> (numa_cpu_lookup_table[] is a global array, so it starts out zero-filled; any
> access before it is populated via numa_setup_cpu() therefore returns 0. I
> retained that behaviour with the above conditional.)
Ok, that seems reasonable to me (keeping the behavior the same as it was
before).
> Will something like the below [totally untested] patch solve the boot-panic?
> I understand that as of today first_online_node will still pick 0 since
> N_ONLINE is initialized statically, but with your proposed change to that
> init code, I guess the following patch should avoid the boot panic.
>
> [ But note that first_online_node is hard-coded to 0 if MAX_NUMNODES == 1, so
> we'll have to fix that too if powerpc can have a single-node system whose node
> is numbered something other than 0. Can that happen as well? ]
I think all single-node systems use only node 0, but I'm not 100% sure of
that.
> As for your question about the best way to fix the Linux MM's broader
> assumption about node 0, I'm not really sure... I'm not aware of the extent
> to which the MM subsystem is intertwined with that assumption, or of what it
> would take to cure it :-(
Well, at this point, it might be fine to just leave it alone, as it
seems to be more trouble than it's worth -- and really the only
confusion is on those LPARs where there really isn't a Node 0. I'll take
another look later this week.
Thanks,
Nish
* Re: NUMA topology question wrt. d4edc5b6
2014-05-22 20:48 ` NUMA topology question wrt. d4edc5b6 Srivatsa S. Bhat
2014-05-28 20:37 ` Nishanth Aravamudan
@ 2014-06-09 21:38 ` David Rientjes
2014-06-10 23:30 ` Nishanth Aravamudan
1 sibling, 1 reply; 4+ messages in thread
From: David Rientjes @ 2014-06-09 21:38 UTC (permalink / raw)
To: Srivatsa S. Bhat
Cc: Nishanth Aravamudan, benh, Srikar Dronamraju, nfont,
Aneesh Kumar K.V, Cody P Schafer, Anton Blanchard, Dave Hansen,
linuxppc-dev@lists.ozlabs.org list, Linux MM
On Fri, 23 May 2014, Srivatsa S. Bhat wrote:
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index c920215..58e6469 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -18,6 +18,7 @@ struct device_node;
>   */
>  #define RECLAIM_DISTANCE 10
> 
> +#include <linux/nodemask.h>
>  #include <asm/mmzone.h>
> 
>  static inline int cpu_to_node(int cpu)
> @@ -30,7 +31,7 @@ static inline int cpu_to_node(int cpu)
>  	 * During early boot, the numa-cpu lookup table might not have been
>  	 * setup for all CPUs yet. In such cases, default to node 0.
>  	 */
> -	return (nid < 0) ? 0 : nid;
> +	return (nid < 0) ? first_online_node : nid;
>  }
> 
>  #define parent_node(node) (node)
I wonder what would happen on ppc if we just returned NUMA_NO_NODE here
for cpus that have not been mapped (they shouldn't even be possible).
This would at least allow callers that do
kmalloc_node(..., cpu_to_node(cpu)) to be allocated on the local cpu
rather than on a perhaps offline or remote node 0.
It would seem better to catch callers that do
cpu_to_node(<not-possible-cpu>) rather than blindly return an online node.
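Concretely, something like this (an illustrative, untested sketch reusing the
names from the powerpc code quoted above):

	static inline int cpu_to_node(int cpu)
	{
		int nid = numa_cpu_lookup_table[cpu];

		/*
		 * Report "no node" for CPUs that were never mapped, instead
		 * of silently pointing them at node 0 or first_online_node.
		 */
		return (nid < 0) ? NUMA_NO_NODE : nid;
	}

kmalloc_node(size, flags, NUMA_NO_NODE) then behaves like plain kmalloc() and
falls back to the node of the CPU doing the allocation.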
* Re: NUMA topology question wrt. d4edc5b6
2014-06-09 21:38 ` David Rientjes
@ 2014-06-10 23:30 ` Nishanth Aravamudan
0 siblings, 0 replies; 4+ messages in thread
From: Nishanth Aravamudan @ 2014-06-10 23:30 UTC (permalink / raw)
To: David Rientjes
Cc: Srivatsa S. Bhat, benh, Srikar Dronamraju, nfont,
Aneesh Kumar K.V, Cody P Schafer, Anton Blanchard, Dave Hansen,
linuxppc-dev@lists.ozlabs.org list, Linux MM
On 09.06.2014 [14:38:26 -0700], David Rientjes wrote:
> On Fri, 23 May 2014, Srivatsa S. Bhat wrote:
>
> > diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> > index c920215..58e6469 100644
> > --- a/arch/powerpc/include/asm/topology.h
> > +++ b/arch/powerpc/include/asm/topology.h
> > @@ -18,6 +18,7 @@ struct device_node;
> >   */
> >  #define RECLAIM_DISTANCE 10
> > 
> > +#include <linux/nodemask.h>
> >  #include <asm/mmzone.h>
> > 
> >  static inline int cpu_to_node(int cpu)
> > @@ -30,7 +31,7 @@ static inline int cpu_to_node(int cpu)
> >  	 * During early boot, the numa-cpu lookup table might not have been
> >  	 * setup for all CPUs yet. In such cases, default to node 0.
> >  	 */
> > -	return (nid < 0) ? 0 : nid;
> > +	return (nid < 0) ? first_online_node : nid;
> >  }
> > 
> >  #define parent_node(node) (node)
>
> I wonder what would happen on ppc if we just returned NUMA_NO_NODE here
> for cpus that have not been mapped (they shouldn't even be possible).
Well, with my patch (Ben sent it to Linus in the last pull request, I think),
powerpc uses the generic per-cpu NUMA-node code, so this function is gone. I'm
not sure whether it makes sense to initialize the per-cpu data to NUMA_NO_NODE
(rather than to 0).
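(For reference, the generic per-cpu implementation in include/linux/topology.h
looks roughly like this under CONFIG_USE_PERCPU_NUMA_NODE_ID, as an
abbreviated sketch:

	DECLARE_PER_CPU(int, numa_node);

	static inline int cpu_to_node(int cpu)
	{
		return per_cpu(numa_node, cpu);
	}

	static inline void set_cpu_numa_node(int cpu, int node)
	{
		per_cpu(numa_node, cpu) = node;
	}

so initializing to NUMA_NO_NODE would just mean seeding that per-cpu variable
with NUMA_NO_NODE until the real mapping is known.)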
For powerpc, it's a timing thing. We can call cpu_to_node() quite early,
and we may not have set up the mapping information yet.
> This would at least allow callers that do
> kmalloc_node(..., cpu_to_node(cpu)) to be allocated on the local cpu
> rather than on a perhaps offline or remote node 0.
>
> It would seem better to catch callers that do
> cpu_to_node(<not-possible-cpu>) rather than blindly return an online node.
Agreed, but I've not seen such a case.
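(If we ever wanted to catch such callers, a debug-only check along these
lines, purely hypothetical and untested, would do it:

	static inline int cpu_to_node(int cpu)
	{
		/* Hypothetical debug aid: flag lookups for impossible CPUs. */
		WARN_ON_ONCE(!cpu_possible(cpu));

		return per_cpu(numa_node, cpu);
	}
)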
Thanks,
Nish