* Node 0 not necessary for powerpc?
@ 2014-03-11 19:56 Nishanth Aravamudan
2014-03-12 2:02 ` David Rientjes
2014-03-12 13:41 ` Christoph Lameter
0 siblings, 2 replies; 14+ messages in thread
From: Nishanth Aravamudan @ 2014-03-11 19:56 UTC (permalink / raw)
To: linux-mm; +Cc: cl, linuxppc-dev, anton, rientjes
I have a P7 system that has no node0, but a node0 shows up in numactl
--hardware, which has no cpus and no memory (and no PCI devices):
numactl --hardware
available: 4 nodes (0-3)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 1 size: 0 MB
node 1 free: 0 MB
node 2 cpus:
node 2 size: 7935 MB
node 2 free: 7716 MB
node 3 cpus:
node 3 size: 8395 MB
node 3 free: 8015 MB
node distances:
node 0 1 2 3
0: 10 20 10 20
1: 20 10 20 20
2: 10 20 10 20
3: 20 20 20 10
This is because we statically initialize N_ONLINE to be [0] in
mm/page_alloc.c:
[N_ONLINE] = { { [0] = 1UL } },
I'm not sure what the architectural requirements are here, but at least
on this test system, removing this initialization, it boots fine and is
running. I've not yet tried stress tests, but it's survived the
beginnings of kernbench so far.
numactl --hardware
available: 3 nodes (1-3)
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
node 1 size: 0 MB
node 1 free: 0 MB
node 2 cpus:
node 2 size: 7935 MB
node 2 free: 7479 MB
node 3 cpus:
node 3 size: 8396 MB
node 3 free: 8375 MB
node distances:
node 1 2 3
1: 10 20 20
2: 20 10 20
3: 20 20 10
Perhaps we could put in an ARCH_DOES_NOT_NEED_NODE0 and only define it on
powerpc for now, conditionalizing the above initialization on that?
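As a rough illustration of that idea, here is a minimal user-space sketch,
not the actual mm/page_alloc.c code. ARCH_DOES_NOT_NEED_NODE0 is the
hypothetical symbol proposed above; MAX_NUMNODES is shrunk to 16 and the
nodemask type and helpers are simplified stand-ins for the kernel's own.

```c
#include <assert.h>
#include <limits.h>

/* Simplified stand-ins for the kernel's nodemask machinery (assumptions). */
#define MAX_NUMNODES 16
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

typedef struct {
	unsigned long bits[BITS_TO_LONGS(MAX_NUMNODES)];
} nodemask_t;

enum node_states { N_POSSIBLE, N_ONLINE, NR_NODE_STATES };

/*
 * Node 0 is marked online at build time only when the (hypothetical)
 * ARCH_DOES_NOT_NEED_NODE0 symbol is not defined.
 */
static nodemask_t node_states[NR_NODE_STATES] = {
	[N_POSSIBLE] = { { [0] = (1UL << MAX_NUMNODES) - 1 } },
#ifndef ARCH_DOES_NOT_NEED_NODE0
	[N_ONLINE] = { { [0] = 1UL } },
#endif
};

/* Return 1 if the bit for nid is set in the N_ONLINE mask, else 0. */
static int node_online(unsigned int nid)
{
	assert(nid < MAX_NUMNODES);
	return (int)((node_states[N_ONLINE].bits[nid / BITS_PER_LONG]
		      >> (nid % BITS_PER_LONG)) & 1UL);
}

/* Set the bit for nid, as topology discovery would do for a real node. */
static void node_set_online(unsigned int nid)
{
	assert(nid < MAX_NUMNODES);
	node_states[N_ONLINE].bits[nid / BITS_PER_LONG] |=
		1UL << (nid % BITS_PER_LONG);
}
```

Built without the symbol, node 0 reports online before any topology code
has run, which matches the phantom node in the first numactl output. Built
with -DARCH_DOES_NOT_NEED_NODE0, only nodes passed to node_set_online()
show up.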
Thanks,
Nish
* Re: Node 0 not necessary for powerpc?
2014-03-11 19:56 Node 0 not necessary for powerpc? Nishanth Aravamudan
@ 2014-03-12 2:02 ` David Rientjes
2014-03-13 16:48 ` Nishanth Aravamudan
2014-03-12 13:41 ` Christoph Lameter
1 sibling, 1 reply; 14+ messages in thread
From: David Rientjes @ 2014-03-12 2:02 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: linux-mm, cl, linuxppc-dev, anton
On Tue, 11 Mar 2014, Nishanth Aravamudan wrote:
> I have a P7 system that has no node0, but a node0 shows up in numactl
> --hardware, which has no cpus and no memory (and no PCI devices):
>
> numactl --hardware
> available: 4 nodes (0-3)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
> node 1 size: 0 MB
> node 1 free: 0 MB
> node 2 cpus:
> node 2 size: 7935 MB
> node 2 free: 7716 MB
> node 3 cpus:
> node 3 size: 8395 MB
> node 3 free: 8015 MB
> node distances:
> node 0 1 2 3
> 0: 10 20 10 20
> 1: 20 10 20 20
> 2: 10 20 10 20
> 3: 20 20 20 10
>
> This is because we statically initialize N_ONLINE to be [0] in
> mm/page_alloc.c:
>
> [N_ONLINE] = { { [0] = 1UL } },
>
> I'm not sure what the architectural requirements are here, but at least
> on this test system, removing this initialization, it boots fine and is
> running. I've not yet tried stress tests, but it's survived the
> beginnings of kernbench so far.
>
> numactl --hardware
> available: 3 nodes (1-3)
> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
> node 1 size: 0 MB
> node 1 free: 0 MB
> node 2 cpus:
> node 2 size: 7935 MB
> node 2 free: 7479 MB
> node 3 cpus:
> node 3 size: 8396 MB
> node 3 free: 8375 MB
> node distances:
> node 1 2 3
> 1: 10 20 20
> 2: 20 10 20
> 3: 20 20 10
>
> Perhaps we could put in a ARCH_DOES_NOT_NEED_NODE0 and only define it on
> powerpc for now, conditionalizing the above initialization on that?
>
I don't know if anything has recently changed in the past year or so, but
I've booted x86 machines with a hacked BIOS so that all memory on node 0
is hotpluggable and offline, so I believe this is possible on x86 as well.
* Re: Node 0 not necessary for powerpc?
2014-03-11 19:56 Node 0 not necessary for powerpc? Nishanth Aravamudan
2014-03-12 2:02 ` David Rientjes
@ 2014-03-12 13:41 ` Christoph Lameter
2014-03-13 16:49 ` Nishanth Aravamudan
1 sibling, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2014-03-12 13:41 UTC (permalink / raw)
To: Nishanth Aravamudan; +Cc: linux-mm, linuxppc-dev, anton, rientjes
On Tue, 11 Mar 2014, Nishanth Aravamudan wrote:
> I have a P7 system that has no node0, but a node0 shows up in numactl
> --hardware, which has no cpus and no memory (and no PCI devices):
Well as you see from the code there has been so far the assumption that
node 0 has memory. I have never run a machine that has no node 0 memory.
* Re: Node 0 not necessary for powerpc?
2014-03-12 2:02 ` David Rientjes
@ 2014-03-13 16:48 ` Nishanth Aravamudan
0 siblings, 0 replies; 14+ messages in thread
From: Nishanth Aravamudan @ 2014-03-13 16:48 UTC (permalink / raw)
To: David Rientjes; +Cc: linux-mm, cl, linuxppc-dev, anton
On 11.03.2014 [19:02:17 -0700], David Rientjes wrote:
> On Tue, 11 Mar 2014, Nishanth Aravamudan wrote:
>
> > I have a P7 system that has no node0, but a node0 shows up in numactl
> > --hardware, which has no cpus and no memory (and no PCI devices):
> >
> > numactl --hardware
> > available: 4 nodes (0-3)
> > node 0 cpus:
> > node 0 size: 0 MB
> > node 0 free: 0 MB
> > node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
> > node 1 size: 0 MB
> > node 1 free: 0 MB
> > node 2 cpus:
> > node 2 size: 7935 MB
> > node 2 free: 7716 MB
> > node 3 cpus:
> > node 3 size: 8395 MB
> > node 3 free: 8015 MB
> > node distances:
> > node 0 1 2 3
> > 0: 10 20 10 20
> > 1: 20 10 20 20
> > 2: 10 20 10 20
> > 3: 20 20 20 10
> >
> > This is because we statically initialize N_ONLINE to be [0] in
> > mm/page_alloc.c:
> >
> > [N_ONLINE] = { { [0] = 1UL } },
> >
> > I'm not sure what the architectural requirements are here, but at least
> > on this test system, removing this initialization, it boots fine and is
> > running. I've not yet tried stress tests, but it's survived the
> > beginnings of kernbench so far.
> >
> > numactl --hardware
> > available: 3 nodes (1-3)
> > node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11
> > node 1 size: 0 MB
> > node 1 free: 0 MB
> > node 2 cpus:
> > node 2 size: 7935 MB
> > node 2 free: 7479 MB
> > node 3 cpus:
> > node 3 size: 8396 MB
> > node 3 free: 8375 MB
> > node distances:
> > node 1 2 3
> > 1: 10 20 20
> > 2: 20 10 20
> > 3: 20 20 10
> >
> > Perhaps we could put in a ARCH_DOES_NOT_NEED_NODE0 and only define it on
> > powerpc for now, conditionalizing the above initialization on that?
> >
>
> I don't know if anything has recently changed in the past year or so, but
> I've booted x86 machines with a hacked BIOS so that all memory on node 0
> is hotpluggable and offline, so I believe this is possible on x86 as well.
Good to know, thanks! This is also certainly not very common on powerpc,
but it is possible -- and the topology ends up being inaccurate because
of the static initialization.
Thanks,
Nish
* Re: Node 0 not necessary for powerpc?
2014-03-12 13:41 ` Christoph Lameter
@ 2014-03-13 16:49 ` Nishanth Aravamudan
2014-05-19 18:24 ` Nishanth Aravamudan
0 siblings, 1 reply; 14+ messages in thread
From: Nishanth Aravamudan @ 2014-03-13 16:49 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, linuxppc-dev, anton, rientjes
On 12.03.2014 [08:41:40 -0500], Christoph Lameter wrote:
> On Tue, 11 Mar 2014, Nishanth Aravamudan wrote:
> > I have a P7 system that has no node0, but a node0 shows up in numactl
> > --hardware, which has no cpus and no memory (and no PCI devices):
>
> Well as you see from the code there has been so far the assumption that
> node 0 has memory. I have never run a machine that has no node 0 memory.
Do you mean beyond the initialization? I didn't see anything obvious so
far in the code itself that assumes a given node has memory (in the
sense of the nid). What are your thoughts about how best to support
this?
Thanks,
Nish
* Re: Node 0 not necessary for powerpc?
2014-03-13 16:49 ` Nishanth Aravamudan
@ 2014-05-19 18:24 ` Nishanth Aravamudan
2014-05-21 14:16 ` Christoph Lameter
0 siblings, 1 reply; 14+ messages in thread
From: Nishanth Aravamudan @ 2014-05-19 18:24 UTC (permalink / raw)
To: Christoph Lameter; +Cc: linux-mm, linuxppc-dev, anton, rientjes
On 13.03.2014 [09:49:49 -0700], Nishanth Aravamudan wrote:
> On 12.03.2014 [08:41:40 -0500], Christoph Lameter wrote:
> > On Tue, 11 Mar 2014, Nishanth Aravamudan wrote:
> > > I have a P7 system that has no node0, but a node0 shows up in numactl
> > > --hardware, which has no cpus and no memory (and no PCI devices):
> >
> > Well as you see from the code there has been so far the assumption that
> > node 0 has memory. I have never run a machine that has no node 0 memory.
>
> Do you mean beyond the initialization? I didn't see anything obvious so
> far in the code itself that assumes a given node has memory (in the
> sense of the nid). What are your thoughts about how best to support
> this?
Ah, I found one path that is problematic on powerpc:
I'm seeing a panic at boot with this change on an LPAR which actually
has no Node 0. Here's what I think is happening:
start_kernel
...
-> setup_per_cpu_areas
-> pcpu_embed_first_chunk
-> pcpu_fc_alloc
-> ___alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu)), ...
-> smp_prepare_boot_cpu
-> set_numa_node(boot_cpuid)
So we panic on the NODE_DATA call. It seems that ia64, at least, uses
pcpu_alloc_first_chunk rather than embed. x86 has some code to handle
early calls of cpu_to_node (early_cpu_to_node) and sets the mapping for
all CPUs in setup_per_cpu_areas().
Thoughts? Does that mean we need something similar to x86 for powerpc?
Thanks,
Nish
* Re: Node 0 not necessary for powerpc?
2014-05-19 18:24 ` Nishanth Aravamudan
@ 2014-05-21 14:16 ` Christoph Lameter
2014-05-21 18:58 ` Tejun Heo
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Lameter @ 2014-05-21 14:16 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Tejun Heo, linux-mm, tony.luck, anton, David Rientjes,
linuxppc-dev
On Mon, 19 May 2014, Nishanth Aravamudan wrote:
> I'm seeing a panic at boot with this change on an LPAR which actually
> has no Node 0. Here's what I think is happening:
>
> start_kernel
> ...
> -> setup_per_cpu_areas
> -> pcpu_embed_first_chunk
> -> pcpu_fc_alloc
> -> ___alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu), ...
> -> smp_prepare_boot_cpu
> -> set_numa_node(boot_cpuid)
>
> So we panic on the NODE_DATA call. It seems that ia64, at least, uses
> pcpu_alloc_first_chunk rather than embed. x86 has some code to handle
> early calls of cpu_to_node (early_cpu_to_node) and sets the mapping for
> all CPUs in setup_per_cpu_areas().
Maybe we can switch ia64 to embed? Tejun: Why are there these
dependencies?
> Thoughts? Does that mean we need something similar to x86 for powerpc?
Tejun is the expert in this area. CCing him.
* Re: Node 0 not necessary for powerpc?
2014-05-21 14:16 ` Christoph Lameter
@ 2014-05-21 18:58 ` Tejun Heo
2014-05-21 19:57 ` Nishanth Aravamudan
2014-06-19 17:14 ` Nishanth Aravamudan
0 siblings, 2 replies; 14+ messages in thread
From: Tejun Heo @ 2014-05-21 18:58 UTC (permalink / raw)
To: Christoph Lameter
Cc: tony.luck, Nishanth Aravamudan, linux-mm, anton, David Rientjes,
linuxppc-dev
Hello,
On Wed, May 21, 2014 at 09:16:27AM -0500, Christoph Lameter wrote:
> On Mon, 19 May 2014, Nishanth Aravamudan wrote:
> > I'm seeing a panic at boot with this change on an LPAR which actually
> > has no Node 0. Here's what I think is happening:
> >
> > start_kernel
> > ...
> > -> setup_per_cpu_areas
> > -> pcpu_embed_first_chunk
> > -> pcpu_fc_alloc
> > -> ___alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu), ...
> > -> smp_prepare_boot_cpu
> > -> set_numa_node(boot_cpuid)
> >
> > So we panic on the NODE_DATA call. It seems that ia64, at least, uses
> > pcpu_alloc_first_chunk rather than embed. x86 has some code to handle
> > early calls of cpu_to_node (early_cpu_to_node) and sets the mapping for
> > all CPUs in setup_per_cpu_areas().
>
> > Maybe we can switch ia64 to embed? Tejun: Why are there these
> dependencies?
>
> > Thoughts? Does that mean we need something similar to x86 for powerpc?
I'm missing context to properly understand what's going on but the
specific allocator in use shouldn't matter. e.g. x86 can use both
embed and page allocators. If the problem is that the arch is
accessing percpu memory before percpu allocator is initialized and the
problem was masked before somehow, the right thing to do would be
removing those premature percpu accesses. If early percpu variables
are really necessary, doing similar early_percpu thing as in x86 would
be necessary.
Thanks.
--
tejun
* Re: Node 0 not necessary for powerpc?
2014-05-21 18:58 ` Tejun Heo
@ 2014-05-21 19:57 ` Nishanth Aravamudan
2014-06-09 21:47 ` David Rientjes
2014-06-19 17:14 ` Nishanth Aravamudan
1 sibling, 1 reply; 14+ messages in thread
From: Nishanth Aravamudan @ 2014-05-21 19:57 UTC (permalink / raw)
To: Tejun Heo
Cc: tony.luck, linux-mm, anton, David Rientjes, Christoph Lameter,
linuxppc-dev
Hi Tejun,
On 21.05.2014 [14:58:12 -0400], Tejun Heo wrote:
> Hello,
>
> On Wed, May 21, 2014 at 09:16:27AM -0500, Christoph Lameter wrote:
> > On Mon, 19 May 2014, Nishanth Aravamudan wrote:
> > > I'm seeing a panic at boot with this change on an LPAR which actually
> > > has no Node 0. Here's what I think is happening:
> > >
> > > start_kernel
> > > ...
> > > -> setup_per_cpu_areas
> > > -> pcpu_embed_first_chunk
> > > -> pcpu_fc_alloc
> > > -> ___alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu), ...
> > > -> smp_prepare_boot_cpu
> > > -> set_numa_node(boot_cpuid)
> > >
> > > So we panic on the NODE_DATA call. It seems that ia64, at least, uses
> > > pcpu_alloc_first_chunk rather than embed. x86 has some code to handle
> > > early calls of cpu_to_node (early_cpu_to_node) and sets the mapping for
> > > all CPUs in setup_per_cpu_areas().
> >
> > > Maybe we can switch ia64 to embed? Tejun: Why are there these
> > dependencies?
> >
> > > Thoughts? Does that mean we need something similar to x86 for powerpc?
>
> I'm missing context to properly understand what's going on but the
> specific allocator in use shouldn't matter. e.g. x86 can use both
> embed and page allocators. If the problem is that the arch is
> accessing percpu memory before percpu allocator is initialized and the
> problem was masked before somehow, the right thing to do would be
> removing those premature percpu accesses. If early percpu variables
> are really necessary, doing similar early_percpu thing as in x86 would
> be necessary.
For context: I was looking at why N_ONLINE was statically setting Node 0
to be online, whether or not the topology is that way -- I've been
getting several bugs lately where Node 0 is online, but has no CPUs and
no memory on it, on powerpc.
On powerpc, setup_per_cpu_areas calls into ___alloc_bootmem_node using
NODE_DATA(cpu_to_node(cpu)).
Currently, cpu_to_node() in arch/powerpc/include/asm/topology.h does:
/*
* During early boot, the numa-cpu lookup table might not have been
* setup for all CPUs yet. In such cases, default to node 0.
*/
return (nid < 0) ? 0 : nid;
And so early at boot, if node 0 is not present, we end up accessing an
uninitialized NODE_DATA(). So this seems buggy (I'll contact the powerpc
developers separately on that).
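To make the failure mode concrete, here is a minimal user-space sketch of
it, assuming a toy four-CPU system whose only real nodes are 1-3. The
lookup table, node_data array and NODE_DATA macro are simplified stand-ins
for the powerpc ones, not copies of them; only the final return statement
mirrors the snippet quoted above.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4
#define MAX_NUMNODES 4

/* Toy stand-in for the kernel's per-node data structure. */
struct pglist_data {
	int node_id;
};

/* -1 means the CPU's node has not been recorded yet (early boot). */
static int numa_cpu_lookup_table[NR_CPUS] = { -1, -1, -1, -1 };

/* Only nodes 1-3 exist on this toy system, so slot 0 is never filled in. */
static struct pglist_data node1 = { 1 }, node2 = { 2 }, node3 = { 3 };
static struct pglist_data *node_data[MAX_NUMNODES] = {
	NULL, &node1, &node2, &node3
};

#define NODE_DATA(nid) (node_data[(nid)])

/* Same fallback as the quoted snippet: unknown CPUs are reported on node 0. */
static int cpu_to_node(int cpu)
{
	int nid;

	assert(cpu >= 0 && cpu < NR_CPUS);
	nid = numa_cpu_lookup_table[cpu];
	return (nid < 0) ? 0 : nid;
}
```

Before the table is populated, NODE_DATA(cpu_to_node(cpu)) evaluates to
NODE_DATA(0), which is NULL here, and any caller that dereferences it
faults. Once numa_cpu_lookup_table[cpu] holds a real node, the same
expression returns a valid pointer.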
I recently submitted patches to have powerpc turn on
USE_PERCPU_NUMA_NODE_ID and HAVE_MEMORYLESS_NODES. But then, cpu_to_node
will be accessing percpu data in setup_per_cpu_areas, which seems like a
no-no. And more specifically, since we haven't yet run
smp_prepare_boot_cpu() at this point, cpu_to_node has not yet been
initialized to provide a sane value.
Thanks,
Nish
* Re: Node 0 not necessary for powerpc?
2014-05-21 19:57 ` Nishanth Aravamudan
@ 2014-06-09 21:47 ` David Rientjes
2014-06-10 23:31 ` Nishanth Aravamudan
0 siblings, 1 reply; 14+ messages in thread
From: David Rientjes @ 2014-06-09 21:47 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: Tejun Heo, linux-mm, tony.luck, anton, Christoph Lameter,
linuxppc-dev
On Wed, 21 May 2014, Nishanth Aravamudan wrote:
> For context: I was looking at why N_ONLINE was statically setting Node 0
> to be online, whether or not the topology is that way -- I've been
> getting several bugs lately where Node 0 is online, but has no CPUs and
> no memory on it, on powerpc.
>
> On powerpc, setup_per_cpu_areas calls into ___alloc_bootmem_node using
> NODE_DATA(cpu_to_node(cpu)).
>
> Currently, cpu_to_node() in arch/powerpc/include/asm/topology.h does:
>
> /*
> * During early boot, the numa-cpu lookup table might not have been
> * setup for all CPUs yet. In such cases, default to node 0.
> */
> return (nid < 0) ? 0 : nid;
>
> And so early at boot, if node 0 is not present, we end up accessing an
> uninitialized NODE_DATA(). So this seems buggy (I'll contact the powerpc
> developers separately on that).
>
I think what this really wants to do is NODE_DATA(cpu_to_mem(cpu)) and I
thought ppc had the cpu-to-local-memory-node mappings correct?
* Re: Node 0 not necessary for powerpc?
2014-06-09 21:47 ` David Rientjes
@ 2014-06-10 23:31 ` Nishanth Aravamudan
2014-06-19 14:59 ` Tejun Heo
0 siblings, 1 reply; 14+ messages in thread
From: Nishanth Aravamudan @ 2014-06-10 23:31 UTC (permalink / raw)
To: David Rientjes
Cc: Tejun Heo, linux-mm, tony.luck, anton, Christoph Lameter,
linuxppc-dev
On 09.06.2014 [14:47:57 -0700], David Rientjes wrote:
> On Wed, 21 May 2014, Nishanth Aravamudan wrote:
>
> > For context: I was looking at why N_ONLINE was statically setting Node 0
> > to be online, whether or not the topology is that way -- I've been
> > getting several bugs lately where Node 0 is online, but has no CPUs and
> > no memory on it, on powerpc.
> >
> > On powerpc, setup_per_cpu_areas calls into ___alloc_bootmem_node using
> > NODE_DATA(cpu_to_node(cpu)).
> >
> > Currently, cpu_to_node() in arch/powerpc/include/asm/topology.h does:
> >
> > /*
> > * During early boot, the numa-cpu lookup table might not have been
> > * setup for all CPUs yet. In such cases, default to node 0.
> > */
> > return (nid < 0) ? 0 : nid;
> >
> > And so early at boot, if node 0 is not present, we end up accessing an
> > uninitialized NODE_DATA(). So this seems buggy (I'll contact the powerpc
> > developers separately on that).
> >
>
> I think what this really wants to do is NODE_DATA(cpu_to_mem(cpu)) and I
> thought ppc had the cpu-to-local-memory-node mappings correct?
Except cpu_to_mem relies on the mapping being defined, but early in
boot, specifically, it isn't yet (at least not necessarily).
-Nish
* Re: Node 0 not necessary for powerpc?
2014-06-10 23:31 ` Nishanth Aravamudan
@ 2014-06-19 14:59 ` Tejun Heo
2014-06-19 17:40 ` Nishanth Aravamudan
0 siblings, 1 reply; 14+ messages in thread
From: Tejun Heo @ 2014-06-19 14:59 UTC (permalink / raw)
To: Nishanth Aravamudan
Cc: tony.luck, linux-mm, anton, David Rientjes, Christoph Lameter,
linuxppc-dev
On Tue, Jun 10, 2014 at 04:31:57PM -0700, Nishanth Aravamudan wrote:
> > I think what this really wants to do is NODE_DATA(cpu_to_mem(cpu)) and I
> > thought ppc had the cpu-to-local-memory-node mappings correct?
>
> Except cpu_to_mem relies on the mapping being defined, but early in
> boot, specifically, it isn't yet (at least not necessarily).
Can't ppc NODE_DATA simply return dummy generic node_data during early
boot? Populating it with just enough to make early boot work
shouldn't be too hard, right?
Thanks.
--
tejun
* Re: Node 0 not necessary for powerpc?
2014-05-21 18:58 ` Tejun Heo
2014-05-21 19:57 ` Nishanth Aravamudan
@ 2014-06-19 17:14 ` Nishanth Aravamudan
1 sibling, 0 replies; 14+ messages in thread
From: Nishanth Aravamudan @ 2014-06-19 17:14 UTC (permalink / raw)
To: Tejun Heo
Cc: tony.luck, linux-mm, anton, David Rientjes, Christoph Lameter,
linuxppc-dev
On 21.05.2014 [14:58:12 -0400], Tejun Heo wrote:
> Hello,
>
> On Wed, May 21, 2014 at 09:16:27AM -0500, Christoph Lameter wrote:
> > On Mon, 19 May 2014, Nishanth Aravamudan wrote:
> > > I'm seeing a panic at boot with this change on an LPAR which actually
> > > has no Node 0. Here's what I think is happening:
> > >
> > > start_kernel
> > > ...
> > > -> setup_per_cpu_areas
> > > -> pcpu_embed_first_chunk
> > > -> pcpu_fc_alloc
> > > -> ___alloc_bootmem_node(NODE_DATA(cpu_to_node(cpu), ...
> > > -> smp_prepare_boot_cpu
> > > -> set_numa_node(boot_cpuid)
> > >
> > > So we panic on the NODE_DATA call. It seems that ia64, at least, uses
> > > pcpu_alloc_first_chunk rather than embed. x86 has some code to handle
> > > early calls of cpu_to_node (early_cpu_to_node) and sets the mapping for
> > > all CPUs in setup_per_cpu_areas().
> >
> > > Maybe we can switch ia64 to embed? Tejun: Why are there these
> > dependencies?
> >
> > > Thoughts? Does that mean we need something similar to x86 for powerpc?
>
> I'm missing context to properly understand what's going on but the
> specific allocator in use shouldn't matter. e.g. x86 can use both
> embed and page allocators. If the problem is that the arch is
> accessing percpu memory before percpu allocator is initialized and the
> problem was masked before somehow, the right thing to do would be
> removing those premature percpu accesses. If early percpu variables
> are really necessary, doing similar early_percpu thing as in x86 would
> be necessary.
The early access is in the arch's pcpu_alloc_bootmem. On x86, rather
than using NODE_DATA(cpu_to_node), it uses (in pcpu_alloc_bootmem),
early_cpu_to_node(cpu) with their custom logic.
The issue is that cpu_to_node, if USE_PERCPU_NUMA_NODE_ID is defined
(which it is for NUMA powerpc, x86, ia64), reads the node ID from the
percpu area, and that data isn't initialized yet at this point.
So I guess powerpc needs the same treatment as x86.
Thanks,
Nish
* Re: Node 0 not necessary for powerpc?
2014-06-19 14:59 ` Tejun Heo
@ 2014-06-19 17:40 ` Nishanth Aravamudan
0 siblings, 0 replies; 14+ messages in thread
From: Nishanth Aravamudan @ 2014-06-19 17:40 UTC (permalink / raw)
To: Tejun Heo
Cc: tony.luck, linux-mm, anton, David Rientjes, Christoph Lameter,
linuxppc-dev
On 19.06.2014 [10:59:50 -0400], Tejun Heo wrote:
> On Tue, Jun 10, 2014 at 04:31:57PM -0700, Nishanth Aravamudan wrote:
> > > I think what this really wants to do is NODE_DATA(cpu_to_mem(cpu)) and I
> > > thought ppc had the cpu-to-local-memory-node mappings correct?
> >
> > Except cpu_to_mem relies on the mapping being defined, but early in
> > boot, specifically, it isn't yet (at least not necessarily).
>
> Can't ppc NODE_DATA simply return dummy generic node_data during early
> boot? Populating it with just enough to make early boot work
> shouldn't be too hard, right?
So the problem is this: whether we use cpu_to_mem() or cpu_to_node()
here, neither is set up yet because of the ordering between percpu setup
and the actual writing of the percpu data (that is, actually storing
which node/local memory is associated with a given CPU).
The NODE_DATA is all correct, but since we are calling cpu_to_{mem,node}
before it really holds valid data, it falsely says 0, which is not
necessarily even an online node.
So, I think we need to do the same thing as x86 and have an early
mapping setup and configured before the percpu areas are.
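Roughly, the ordering I have in mind looks like the user-space sketch
below. It is only a model: early_cpu_to_node() is the name x86 uses, but
early_node_map, early_set_cpu_node() and publish_percpu_node_ids() are
hypothetical names made up for this example, and the "percpu" storage is
just a plain array.

```c
#include <assert.h>

#define NR_CPUS 4
#define NUMA_NO_NODE (-1)

/* Early mapping: a plain static array, usable before percpu setup. */
static int early_node_map[NR_CPUS] = {
	NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE, NUMA_NO_NODE
};

/* Stand-in for the percpu numa_node variable; reads as 0 until written. */
static int percpu_numa_node[NR_CPUS];

/* Record a CPU's node while parsing the firmware topology (hypothetical). */
static void early_set_cpu_node(int cpu, int nid)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	early_node_map[cpu] = nid;
}

/* Safe to call before the percpu areas exist. */
static int early_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return early_node_map[cpu];
}

/* Models the percpu-backed lookup: bogus (0) until the data is published. */
static int cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return percpu_numa_node[cpu];
}

/* After percpu setup, copy the early mapping into the percpu storage. */
static void publish_percpu_node_ids(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (early_node_map[cpu] != NUMA_NO_NODE)
			percpu_numa_node[cpu] = early_node_map[cpu];
}
```

The percpu first-chunk allocation would then use early_cpu_to_node(cpu)
to pick the node, and cpu_to_node() only becomes trustworthy after the
publish step.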
Thanks,
Nish