From: thunder.leizhen@huawei.com (Leizhen (ThunderTown))
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES
Date: Wed, 8 Jun 2016 10:16:36 +0800
Message-ID: <57578004.4050203@huawei.com>
In-Reply-To: <CAFpQJXUe_eBBR5dBXOGXMCKmbd3n3TUYzFpDNewjVom-iu+3yg@mail.gmail.com>



On 2016/6/7 22:01, Ganapatrao Kulkarni wrote:
> On Tue, Jun 7, 2016 at 6:27 PM, Leizhen (ThunderTown)
> <thunder.leizhen@huawei.com> wrote:
>>
>>
>> On 2016/6/7 16:31, Ganapatrao Kulkarni wrote:
>>> On Tue, Jun 7, 2016 at 1:38 PM, Zhen Lei <thunder.leizhen@huawei.com> wrote:
>>>> Some numa nodes may have no memory. For example:
>>>> 1. cpu0 on node0
>>>> 2. cpu1 on node1
>>>> 3. device0 takes the same time to access memory on node0 and node1.
>>>
>>> i am wondering, if access to both nodes is same, then why you need numa.
>>> the example you are quoting is against the basic principle of "numa"
>>> what is device0 here? cpu?
>> The device0 can also be a cpu. I drew a simple diagram:
>>
>>   cpu0     cpu1        cpu2/device0
>>     |        |              |
>>     |        |              |
>>    DDR0     DDR1    No DIMM slots or no DIMM plugged
>>  (node0)  (node1)         (node2)
>>
> 
> thanks for the clarification. your example is for 3 node system, where
> third node is memory less node.
> do you see any issue in supporting this topology with existing code?
If HAVE_MEMORYLESS_NODES is enabled, the kernel will pick the nearest node
with memory for the CPUs on a memoryless node.

For example, in include/linux/topology.h:
#ifdef CONFIG_HAVE_MEMORYLESS_NODES
...
static inline int cpu_to_mem(int cpu)
{
	return per_cpu(_numa_mem_, cpu);
}
...
#else
...
static inline int cpu_to_mem(int cpu)
{
	return cpu_to_node(cpu);
}
...
#endif
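
A rough user-space sketch of the difference between the two variants, using the
3-node topology from the diagram above. Everything here is an assumption made
for illustration: the distance table, the numa_mem[] array standing in for the
per-cpu _numa_mem_ variable, and the helpers nearest_mem_node() and
init_numa_mem(). How the kernel actually fills in _numa_mem_ is not shown in
this thread; the "nearest node by distance" rule below is only a model of it.

```c
#include <assert.h>

#define NR_CPUS  3
#define NR_NODES 3

/* Assumed topology: one CPU per node, node2 has no memory. */
static const int cpu_node[NR_CPUS]      = { 0, 1, 2 };
static const int node_has_mem[NR_NODES] = { 1, 1, 0 };

/* Assumed distance table: node2 is closer to node1 than to node0. */
static const int distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 50 },
	{ 20, 10, 15 },
	{ 50, 15, 10 },
};

/* Stand-in for the per-cpu _numa_mem_ variable. */
static int numa_mem[NR_CPUS];

/* The node a CPU belongs to, whether or not that node has memory. */
static int cpu_to_node(int cpu)
{
	return cpu_node[cpu];
}

/* Nearest node that has memory; this may be nid itself. */
static int nearest_mem_node(int nid)
{
	int i, best = -1;

	assert(nid >= 0 && nid < NR_NODES);
	for (i = 0; i < NR_NODES; i++) {
		if (!node_has_mem[i])
			continue;
		if (best < 0 || distance[nid][i] < distance[nid][best])
			best = i;
	}
	return best;
}

/* Fill in the memory node of every CPU once, before it is queried. */
static void init_numa_mem(void)
{
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		numa_mem[cpu] = nearest_mem_node(cpu_to_node(cpu));
}

/* The node a CPU should allocate memory from. */
static int cpu_to_mem(int cpu)
{
	return numa_mem[cpu];
}
```

After init_numa_mem(), cpu_to_node(2) still reports node2, while cpu_to_mem(2)
reports node1, the nearest node that has memory in this assumed table. For cpu0
and cpu1 the two functions agree.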

> I think, this use case should be supported with present code.
> 
>>>>
>>>> So, we cannot simply assign device0 to node0 or node1, but we can
>>>> define a node2 whose distances to node0 and node1 are the same.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>> ---
>>>>  arch/arm64/Kconfig      |  4 ++++
>>>>  arch/arm64/kernel/smp.c |  1 +
>>>>  arch/arm64/mm/numa.c    | 43 +++++++++++++++++++++++++++++++++++++++++--
>>>>  3 files changed, 46 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index 05c1bf1..5904a62 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -581,6 +581,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>>>         def_bool y
>>>>         depends on NUMA
>>>>
>>>> +config HAVE_MEMORYLESS_NODES
>>>> +       def_bool y
>>>> +       depends on NUMA
>>>> +
>>>>  source kernel/Kconfig.preempt
>>>>  source kernel/Kconfig.hz
>>>>
>>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>>> index d099306..9e15297 100644
>>>> --- a/arch/arm64/kernel/smp.c
>>>> +++ b/arch/arm64/kernel/smp.c
>>>> @@ -620,6 +620,7 @@ static void __init of_parse_and_init_cpus(void)
>>>>                         }
>>>>
>>>>                         bootcpu_valid = true;
>>>> +                       early_map_cpu_to_node(0, of_node_to_nid(dn));
>>>>
>>>>                         /*
>>>>                          * cpu_logical_map has already been
>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>> index df5c842..d73b0a0 100644
>>>> --- a/arch/arm64/mm/numa.c
>>>> +++ b/arch/arm64/mm/numa.c
>>>> @@ -128,6 +128,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>>>                 nid = 0;
>>>>
>>>>         cpu_to_node_map[cpu] = nid;
>>>> +
>>>> +       /*
>>>> +        * We should set the numa node of cpu0 as soon as possible, because it
>>>> +        * has already been set up online before. cpu_to_node(0) will soon be
>>>> +        * called.
>>>> +        */
>>>> +       if (!cpu)
>>>> +               set_cpu_numa_node(cpu, nid);
>>>>  }
>>>>
>>>>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
>>>> @@ -215,6 +223,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>>>         return ret;
>>>>  }
>>>>
>>>> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
>>>> +{
>>>> +       int i, best_nid, distance;
>>>> +       u64 pa;
>>>> +       DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>>>> +
>>>> +       bitmap_zero(nodes_map, MAX_NUMNODES);
>>>> +       bitmap_set(nodes_map, nid, 1);
>>>> +
>>>> +find_nearest_node:
>>>> +       best_nid = NUMA_NO_NODE;
>>>> +       distance = INT_MAX;
>>>> +
>>>> +       for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
>>>> +               if (numa_distance[nid][i] < distance) {
>>>> +                       best_nid = i;
>>>> +                       distance = numa_distance[nid][i];
>>>> +               }
>>>> +
>>>> +       pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
>>>> +       if (!pa) {
>>>> +               BUG_ON(best_nid == NUMA_NO_NODE);
>>>> +               bitmap_set(nodes_map, best_nid, 1);
>>>> +               goto find_nearest_node;
>>>> +       }
>>>> +
>>>> +       return pa;
>>>> +}
>>>> +
> 
> why do we need this function in arch specific code.
I also considered putting this code (including the HAVE_SETUP_PER_CPU_AREA part) into
drivers/of/of_numa.c, but that would make ACPI NUMA depend on OF NUMA.

> dont you think common code will take care of this? when you define
> HAVE_MEMORYLESS_NODES

I searched for CONFIG_HAVE_MEMORYLESS_NODES in *.c, but did not find any relevant code.
So other architectures may be missing this as well.

> 
>>>>  /**
>>>>   * Initialize NODE_DATA for a node on the local memory
>>>>   */
>>>> @@ -228,7 +265,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>>>         pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>>>>                 nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
>>>>
>>>> -       nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> 
> this function try to allocate from a nid, if fails, it allocates from
> node 0(local node).
> this is ok for memory less node i guess.
Yes, the function is OK, but the performance is not.

Suppose there are 3 nodes:
1. cpu0 on node0, cpu1 on node1, cpu2 on node2.
2. cpu2 takes 1us to access memory on node1, but 5us to access memory on node0.
   That is, distance[2,1] is shorter than distance[2,0].
3. And node2 is a memoryless node.

So if NODE_DATA(2) is allocated from node0, accessing it will take longer than if it were
allocated from node1, and NODE_DATA is accessed at run time.
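
To make the fallback order concrete, here is a small user-space model of the
"try the nearest remaining node" loop. It is only a sketch: the distance table,
the node_free[] byte counters and alloc_on_node() are made-up stand-ins for
numa_distance and the memblock allocator, and alloc_from_nearest_node() is a
hypothetical name. Unlike the function in the patch, it returns a sentinel
instead of calling BUG_ON() when every node has failed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES 3
#define NO_NODE  (-1)

/* Assumed distances matching the example: node2 is nearer to node1. */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 50 },
	{ 20, 10, 15 },
	{ 50, 15, 10 },
};

/* Assumed free bytes per node; node2 is memoryless. */
static size_t node_free[NR_NODES] = { 4096, 4096, 0 };

/* Toy stand-in for a node-local allocator: true on success. */
static bool alloc_on_node(int nid, size_t size)
{
	if (node_free[nid] < size)
		return false;
	node_free[nid] -= size;
	return true;
}

/*
 * Try the other nodes in order of increasing distance from nid.
 * Returns the node that satisfied the request, or NO_NODE if none could.
 */
static int alloc_from_nearest_node(int nid, size_t size)
{
	bool tried[NR_NODES] = { false };
	int i, best, best_dist;

	assert(nid >= 0 && nid < NR_NODES);
	tried[nid] = true;	/* the caller already failed on the local node */

	for (;;) {
		best = NO_NODE;
		best_dist = INT_MAX;

		/* Pick the closest node that has not been tried yet. */
		for (i = 0; i < NR_NODES; i++) {
			if (!tried[i] && node_distance[nid][i] < best_dist) {
				best = i;
				best_dist = node_distance[nid][i];
			}
		}

		if (best == NO_NODE)
			return NO_NODE;	/* every node has been tried */
		if (alloc_on_node(best, size))
			return best;

		tried[best] = true;	/* exclude it and look again */
	}
}
```

With these assumed numbers, a 1024-byte request for node2 is served from node1.
node0 is used only once node1 can no longer satisfy the request.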

> 
>>>> +       nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>>>> +       if (!nd_pa)
>>>> +               nd_pa = alloc_node_data_from_nearest_node(nid, nd_size);
>>>>         nd = __va(nd_pa);
>>>>
>>>>         /* report and initialize */
>>>> @@ -238,7 +277,7 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>>>         if (tnid != nid)
>>>>                 pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
>>>>
>>>> -       node_data[nid] = nd;
>>>> +       NODE_DATA(nid) = nd;
>>>>         memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>>>>         NODE_DATA(nid)->node_id = nid;
>>>>         NODE_DATA(nid)->node_start_pfn = start_pfn;
>>>> --
>>>> 2.5.0
>>>>
>>>>
>>> Ganapat
>>>>
>>>> _______________________________________________
>>>> linux-arm-kernel mailing list
>>>> linux-arm-kernel at lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>
>>> .
>>>
>>
> 
> .
> 
