* [PATCH 0/5] Make cpuid <-> nodeid mapping persistent.
From: Tang Chen @ 2015-07-01 4:45 UTC
To: tj, mingo, akpm, rjw, hpa, laijs, yasu.isimatu, isimatu.yasuaki,
kamezawa.hiroyu, izumi.taku, gongzhaogang
Cc: tangchen, x86, linux-acpi
[Problem]
cpuid <-> nodeid mapping is first established at boot time, and workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.
When a node is onlined or offlined, the cpuid <-> nodeid mapping is established or
destroyed, which means the cpuid <-> nodeid mapping changes when node hotplug
happens. But workqueue does not update wq_numa_possible_cpumask.
So here is the problem:
Assume we have the following cpuid <-> nodeid in the beginning:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119
and we hot-remove node2 and node3, it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
and we hot-add node4 and node5, it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119
But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and so on.
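The scenario above can be replayed in a small user-space model. This is only a sketch under stated assumptions: the table names, map_range(), hot_remove_node() and the other helpers are hypothetical and are not kernel code; cached_cpu_to_node merely stands in for the boot-time snapshot that wq_numa_possible_cpumask represents.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NR_CPUS 120
#define DEMO_NO_NODE (-1)

/* Live cpu -> node table; rewritten on node hotplug (pre-patch behaviour). */
static int live_cpu_to_node[DEMO_NR_CPUS];
/* Boot-time snapshot; models a cache that is never refreshed. */
static int cached_cpu_to_node[DEMO_NR_CPUS];

/* Hypothetical helper: assign an inclusive cpu range to a node. */
static void map_range(int first, int last, int node)
{
	for (int cpu = first; cpu <= last; cpu++)
		live_cpu_to_node[cpu] = node;
}

/* Build the initial mapping from the first table and snapshot it. */
static void boot_and_snapshot(void)
{
	map_range(0, 14, 0);   map_range(60, 74, 0);
	map_range(15, 29, 1);  map_range(75, 89, 1);
	map_range(30, 44, 2);  map_range(90, 104, 2);
	map_range(45, 59, 3);  map_range(105, 119, 3);
	memcpy(cached_cpu_to_node, live_cpu_to_node, sizeof(live_cpu_to_node));
}

/* Hot-remove destroys the mapping of every cpu on that node. */
static void hot_remove_node(int node)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		if (live_cpu_to_node[cpu] == node)
			live_cpu_to_node[cpu] = DEMO_NO_NODE;
}

/* Remove node2 and node3, then add node4 and node5 as in the last table. */
static void replay_hotplug(void)
{
	hot_remove_node(2);
	hot_remove_node(3);
	map_range(30, 59, 4);
	map_range(90, 119, 5);
}
```

After boot_and_snapshot() and replay_hotplug(), the live table says cpu30 is on node 4 while the snapshot still says node 2, which is the stale state described above.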
When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.
static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}
Since wq_numa_possible_cpumask is not updated, pool->node could be set to an offline node,
which leads to memory allocation failure:
SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
node 0: slabs: 6172, objs: 259224, free: 245741
node 1: slabs: 3261, objs: 136962, free: 127656
It happens here:
create_worker(struct worker_pool *pool)
|--> worker = alloc_worker(pool->node);
static struct worker *alloc_worker(int node)
{
struct worker *worker;
worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.
......
return worker;
}
[Solution]
To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it invariable. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.
apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:
1. Enable apic registration flow to handle both enabled and disabled cpus.
This is done by introducing an extra parameter to generic_processor_info to let the
caller control if disabled cpus are ignored.
2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
registering local apic. Store the mapping in the array introduced above.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
This is also done by introducing an extra parameter to these APIs to let the caller
control if disabled cpus are ignored.
4. Establish all possible cpuid <-> nodeid mapping.
This is done via an additional acpi namespace walk for processors.
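The dependency stated above (cpuid <-> nodeid is derived from cpuid <-> apicid and apicid <-> nodeid) can be sketched as a simple table composition. This is a user-space illustration only; every name and every apicid value below is made up for the example and does not come from the patches.

```c
#include <assert.h>

#define DEMO_NR_CPUS    4
#define DEMO_MAX_APICID 16
#define DEMO_NO_NODE    (-1)

/* Hypothetical persistent cpuid -> apicid table, covering disabled cpus too. */
static const int demo_cpu_to_apicid[DEMO_NR_CPUS] = { 0, 2, 8, 10 };
/* Hypothetical apicid -> nodeid table, as firmware affinity data would give. */
static int demo_apicid_to_node[DEMO_MAX_APICID];
/* Result: cpuid -> nodeid, built once and never changed afterwards. */
static int demo_cpu_to_node[DEMO_NR_CPUS];

static void demo_build_cpu_to_node(void)
{
	for (int i = 0; i < DEMO_MAX_APICID; i++)
		demo_apicid_to_node[i] = DEMO_NO_NODE;

	/* Assumed affinity: apicids 0 and 2 on node 0, 8 and 10 on node 1. */
	demo_apicid_to_node[0] = 0;
	demo_apicid_to_node[2] = 0;
	demo_apicid_to_node[8] = 1;
	demo_apicid_to_node[10] = 1;

	/* Compose the two mappings for every possible cpu. */
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		demo_cpu_to_node[cpu] =
			demo_apicid_to_node[demo_cpu_to_apicid[cpu]];
}
```

Because both input tables cover every possible cpu, the composed table does not need to change when a node is later removed or added.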
For previous discussion, please refer to:
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989
https://lkml.org/lkml/2015/5/14/244
Gu Zheng (5):
x86, gfp: Cache best near node for memory allocation.
x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
boot time.
x86, acpi, cpu-hotplug: Introduce apicid_to_cpuid[] array to store
persistent cpuid <-> apicid mapping.
x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
booting.
arch/ia64/kernel/acpi.c | 2 +-
arch/x86/include/asm/mpspec.h | 1 +
arch/x86/include/asm/topology.h | 2 +
arch/x86/kernel/acpi/boot.c | 8 +--
arch/x86/kernel/apic/apic.c | 71 ++++++++++++++++++++---
arch/x86/mm/numa.c | 57 ++++++++++++-------
drivers/acpi/acpi_processor.c | 5 +-
drivers/acpi/bus.c | 3 +
drivers/acpi/processor_core.c | 122 +++++++++++++++++++++++++++++++++-------
include/linux/acpi.h | 2 +
include/linux/gfp.h | 12 +++-
11 files changed, 227 insertions(+), 58 deletions(-)
--
1.9.3
* [PATCH 1/5] x86, gfp: Cache best near node for memory allocation.
From: Tang Chen @ 2015-07-01 4:45 UTC
To: tj, mingo, akpm, rjw, hpa, laijs, yasu.isimatu, isimatu.yasuaki,
kamezawa.hiroyu, izumi.taku, gongzhaogang
Cc: tangchen, x86, linux-acpi, Gu Zheng
From: Gu Zheng <guz.fnst@cn.fujitsu.com>
In current code, all possible cpus are mapped to the best near online
node if the node they reside in is offline in init_cpu_to_node().
init_cpu_to_node()
{
......
for_each_possible_cpu(cpu) {
......
if (!node_online(node))
node = find_near_online_node(node);
numa_set_node(cpu, node);
}
}
This is done to prevent memory allocation failure if the cpu is
online but there is no memory on that node.
But since the cpuid <-> nodeid mapping will be fixed after this patch set, doing
so in the initialization phase no longer makes sense. The best near online
node for each cpu should be cached somewhere.
In this patch, a per-cpu cache named x86_cpu_to_near_online_node is
introduced to store this information, and it is used when the requested
node is offline in alloc_pages_node() and alloc_pages_exact_node().
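The fallback described above can be modelled in user space as follows. This is a sketch, not the patch itself: the distance matrix, the online flags and the demo_* names are assumptions for illustration, and the result is computed on demand here rather than cached per cpu as the patch does.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_NR_NODES 4

/* Assumed SLIT-style distances: 10 is local, larger is farther. */
static const int demo_distance[DEMO_NR_NODES][DEMO_NR_NODES] = {
	{ 10, 20, 30, 30 },
	{ 20, 10, 30, 30 },
	{ 30, 30, 10, 20 },
	{ 30, 30, 20, 10 },
};

/* Assumed state: node 2 has been taken offline. */
static bool demo_node_online[DEMO_NR_NODES] = { true, true, false, true };

/* Same search as find_near_online_node(): strict '<' keeps the first minimum. */
static int demo_find_near_online_node(int node)
{
	int best_node = -1;
	int min_val = INT_MAX;

	for (int n = 0; n < DEMO_NR_NODES; n++) {
		if (!demo_node_online[n])
			continue;
		if (demo_distance[node][n] < min_val) {
			min_val = demo_distance[node][n];
			best_node = n;
		}
	}
	return best_node;
}

/* Pick the node to allocate from: the requested one, or its nearest online node. */
static int demo_pick_alloc_node(int nid)
{
	if (demo_node_online[nid])
		return nid;
	return demo_find_near_online_node(nid);
}
```

With node 2 offline, a request for node 2 is redirected to node 3 (distance 20), while a request for an online node is left untouched.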
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/include/asm/topology.h | 2 ++
arch/x86/mm/numa.c | 57 ++++++++++++++++++++++++++---------------
include/linux/gfp.h | 12 ++++++++-
3 files changed, 50 insertions(+), 21 deletions(-)
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 0fb4648..e3e22b2 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -82,6 +82,8 @@ static inline const struct cpumask *cpumask_of_node(int node)
}
#endif
+extern int get_near_online_node(int node);
+
extern void setup_node_to_cpumask_map(void);
/*
diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
index 4053bb5..13bd0d7 100644
--- a/arch/x86/mm/numa.c
+++ b/arch/x86/mm/numa.c
@@ -69,6 +69,7 @@ int numa_cpu_node(int cpu)
return NUMA_NO_NODE;
}
+cpumask_t node_to_cpuid_mask_map[MAX_NUMNODES];
cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
EXPORT_SYMBOL(node_to_cpumask_map);
@@ -78,6 +79,31 @@ EXPORT_SYMBOL(node_to_cpumask_map);
DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
+/*
+ * Map cpu index to the best near online node. The best near online node
+ * is the backup node for memory allocation on offline node.
+ */
+DEFINE_PER_CPU(int, x86_cpu_to_near_online_node);
+EXPORT_PER_CPU_SYMBOL(x86_cpu_to_near_online_node);
+
+static int find_near_online_node(int node)
+{
+ int n, val;
+ int min_val = INT_MAX;
+ int best_node = -1;
+
+ for_each_online_node(n) {
+ val = node_distance(node, n);
+
+ if (val < min_val) {
+ min_val = val;
+ best_node = n;
+ }
+ }
+
+ return best_node;
+}
+
void numa_set_node(int cpu, int node)
{
int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
@@ -95,7 +121,11 @@ void numa_set_node(int cpu, int node)
return;
}
#endif
+
+ per_cpu(x86_cpu_to_near_online_node, cpu) =
+ find_near_online_node(numa_cpu_node(cpu));
per_cpu(x86_cpu_to_node_map, cpu) = node;
+ cpumask_set_cpu(cpu, &node_to_cpuid_mask_map[numa_cpu_node(cpu)]);
set_cpu_numa_node(cpu, node);
}
@@ -105,6 +135,13 @@ void numa_clear_node(int cpu)
numa_set_node(cpu, NUMA_NO_NODE);
}
+int get_near_online_node(int node)
+{
+ return per_cpu(x86_cpu_to_near_online_node,
+ cpumask_first(&node_to_cpuid_mask_map[node]));
+}
+EXPORT_SYMBOL(get_near_online_node);
+
/*
* Allocate node_to_cpumask_map based on number of available nodes
* Requires node_possible_map to be valid.
@@ -702,24 +739,6 @@ void __init x86_numa_init(void)
numa_init(dummy_numa_init);
}
-static __init int find_near_online_node(int node)
-{
- int n, val;
- int min_val = INT_MAX;
- int best_node = -1;
-
- for_each_online_node(n) {
- val = node_distance(node, n);
-
- if (val < min_val) {
- min_val = val;
- best_node = n;
- }
- }
-
- return best_node;
-}
-
/*
* Setup early cpu_to_node.
*
@@ -746,8 +765,6 @@ void __init init_cpu_to_node(void)
if (node == NUMA_NO_NODE)
continue;
- if (!node_online(node))
- node = find_near_online_node(node);
numa_set_node(cpu, node);
}
}
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 6ba7cf2..4a18b21 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -307,13 +307,23 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
if (nid < 0)
nid = numa_node_id();
+#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
+ if (!node_online(nid))
+ nid = get_near_online_node(nid);
+#endif
+
return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}
static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
unsigned int order)
{
- VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+ VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
+
+#if IS_ENABLED(CONFIG_X86) && IS_ENABLED(CONFIG_NUMA)
+ if (!node_online(nid))
+ nid = get_near_online_node(nid);
+#endif
return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
}
--
1.9.3
* [PATCH 2/5] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time.
From: Tang Chen @ 2015-07-01 4:45 UTC
To: tj, mingo, akpm, rjw, hpa, laijs, yasu.isimatu, isimatu.yasuaki,
kamezawa.hiroyu, izumi.taku, gongzhaogang
Cc: tangchen, x86, linux-acpi, Gu Zheng
From: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Problem]
cpuid <-> nodeid mapping is first established at boot time, and workqueue caches
the mapping in wq_numa_possible_cpumask in wq_numa_init() at boot time.
When a node is onlined or offlined, the cpuid <-> nodeid mapping is established or
destroyed, which means the cpuid <-> nodeid mapping changes when node hotplug
happens. But workqueue does not update wq_numa_possible_cpumask.
So here is the problem:
Assume we have the following cpuid <-> nodeid in the beginning:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119
and we hot-remove node2 and node3, it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
and we hot-add node4 and node5, it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119
But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and so on.
When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be mapped to that node. And memory used by this workqueue will
also be allocated on that node.
static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}
Since wq_numa_possible_cpumask is not updated, pool->node could be set to an offline node,
which leads to memory allocation failure:
SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
node 0: slabs: 6172, objs: 259224, free: 245741
node 1: slabs: 3261, objs: 136962, free: 127656
It happens here:
create_worker(struct worker_pool *pool)
|--> worker = alloc_worker(pool->node);
static struct worker *alloc_worker(int node)
{
struct worker *worker;
worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.
......
return worker;
}
[Solution]
To fix this problem, we establish cpuid <-> nodeid mapping for all the possible
cpus at boot time, and make it invariable. And according to init_cpu_to_node(),
cpuid <-> nodeid mapping is based on apicid <-> nodeid mapping and cpuid <-> apicid
mapping. So the key point is obtaining all cpus' apicid.
apicid can be obtained by _MAT (Multiple APIC Table Entry) method or found in
MADT (Multiple APIC Description Table). So we finish the job in the following steps:
1. Enable apic registration flow to handle both enabled and disabled cpus.
This is done by introducing an extra parameter to generic_processor_info to let the
caller control if disabled cpus are ignored.
2. Introduce a new array storing all possible cpuid <-> apicid mapping. And also modify
the way cpuid is calculated. Establish all possible cpuid <-> apicid mapping when
registering local apic. Store the mapping in the array introduced above.
3. Enable _MAT and MADT related APIs to return non-present or disabled cpus' apicid.
This is also done by introducing an extra parameter to these APIs to let the caller
control if disabled cpus are ignored.
4. Establish all possible cpuid <-> nodeid mapping.
This is done via an additional acpi namespace walk for processors.
This patch implements step 1.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/kernel/apic/apic.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index dcb5285..a9c9830 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1977,7 +1977,7 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
}
-int generic_processor_info(int apicid, int version)
+static int __generic_processor_info(int apicid, int version, bool enabled)
{
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2011,7 +2011,8 @@ int generic_processor_info(int apicid, int version)
" Processor %d/0x%x ignored.\n",
thiscpu, apicid);
- disabled_cpus++;
+ if (enabled)
+ disabled_cpus++;
return -ENODEV;
}
@@ -2028,7 +2029,8 @@ int generic_processor_info(int apicid, int version)
" reached. Keeping one slot for boot cpu."
" Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
- disabled_cpus++;
+ if (enabled)
+ disabled_cpus++;
return -ENODEV;
}
@@ -2039,11 +2041,14 @@ int generic_processor_info(int apicid, int version)
"ACPI: NR_CPUS/possible_cpus limit of %i reached."
" Processor %d/0x%x ignored.\n", max, thiscpu, apicid);
- disabled_cpus++;
+ if (enabled)
+ disabled_cpus++;
return -EINVAL;
}
- num_processors++;
+ if (enabled)
+ num_processors++;
+
if (apicid == boot_cpu_physical_apicid) {
/*
* x86_bios_cpu_apicid is required to have processors listed
@@ -2071,7 +2076,8 @@ int generic_processor_info(int apicid, int version)
apic_version[boot_cpu_physical_apicid], cpu, version);
}
- physid_set(apicid, phys_cpu_present_map);
+ if (enabled)
+ physid_set(apicid, phys_cpu_present_map);
if (apicid > max_physical_apicid)
max_physical_apicid = apicid;
@@ -2084,11 +2090,17 @@ int generic_processor_info(int apicid, int version)
apic->x86_32_early_logical_apicid(cpu);
#endif
set_cpu_possible(cpu, true);
- set_cpu_present(cpu, true);
+ if (enabled)
+ set_cpu_present(cpu, true);
return cpu;
}
+int generic_processor_info(int apicid, int version)
+{
+ return __generic_processor_info(apicid, version, true);
+}
+
int hard_smp_processor_id(void)
{
return read_apic_id();
--
1.9.3
* [PATCH 3/5] x86, acpi, cpu-hotplug: Introduce apicid_to_cpuid[] array to store persistent cpuid <-> apicid mapping.
From: Tang Chen @ 2015-07-01 4:45 UTC
To: tj, mingo, akpm, rjw, hpa, laijs, yasu.isimatu, isimatu.yasuaki,
kamezawa.hiroyu, izumi.taku, gongzhaogang
Cc: tangchen, x86, linux-acpi, Gu Zheng
From: Gu Zheng <guz.fnst@cn.fujitsu.com>
In this patch, we introduce a new static array named apicid_to_cpuid[],
which is large enough to store info for all possible cpus.
And then, we modify the cpuid calculation. In generic_processor_info(),
it simply finds the next unused cpuid. And it is also why the cpuid <-> nodeid
mapping changes with node hotplug.
After this patch, we find the next unused cpuid, map it to an apicid,
and store the mapping in apicid_to_cpuid[], so that cpuid <-> apicid
mapping will be persistent.
And finally we will use this array to make cpuid <-> nodeid persistent.
Currently, cpuid <-> apicid mapping is established at local apic registration time,
but non-present or disabled cpus are ignored.
In this patch, we establish all possible cpuid <-> apicid mapping when
registering local apic.
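The idea of a persistent cpuid <-> apicid mapping can be sketched in user space like this. It is a simplified model under stated assumptions: demo_slot_apicid, demo_nr_assigned and demo_get_cpuid() are hypothetical names, the boot cpu is not special-cased, and an existing mapping is looked up before a new slot is taken, which is a simplification and not a copy of get_cpuid() in the diff below.

```c
#include <assert.h>

#define DEMO_NR_CPUS 8

/* slot index == cpuid; the value is the apicid bound to it, forever. */
static int demo_slot_apicid[DEMO_NR_CPUS];
/* Number of cpuids handed out so far; entries are never released. */
static int demo_nr_assigned;

/*
 * Return the cpuid bound to @apicid, binding the next free cpuid on first
 * use. Returns -1 when no cpuid is left.
 */
static int demo_get_cpuid(int apicid)
{
	for (int i = 0; i < demo_nr_assigned; i++)
		if (demo_slot_apicid[i] == apicid)
			return i;	/* already bound: reuse the same cpuid */

	if (demo_nr_assigned >= DEMO_NR_CPUS)
		return -1;

	demo_slot_apicid[demo_nr_assigned] = apicid;
	return demo_nr_assigned++;
}
```

Since a binding is never cleared, a cpu that is hot-removed and later re-added with the same apicid gets the same cpuid back, instead of whatever cpuid happens to be free at that time.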
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/x86/include/asm/mpspec.h | 1 +
arch/x86/kernel/acpi/boot.c | 6 ++----
arch/x86/kernel/apic/apic.c | 47 ++++++++++++++++++++++++++++++++++++++++---
3 files changed, 47 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/mpspec.h b/arch/x86/include/asm/mpspec.h
index b07233b..db902d8 100644
--- a/arch/x86/include/asm/mpspec.h
+++ b/arch/x86/include/asm/mpspec.h
@@ -86,6 +86,7 @@ static inline void early_reserve_e820_mpc_new(void) { }
#endif
int generic_processor_info(int apicid, int version);
+int __generic_processor_info(int apicid, int version, bool enabled);
#define PHYSID_ARRAY_SIZE BITS_TO_LONGS(MAX_LOCAL_APIC)
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index e49ee24..bcc85b2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -174,15 +174,13 @@ static int acpi_register_lapic(int id, u8 enabled)
return -EINVAL;
}
- if (!enabled) {
+ if (!enabled)
++disabled_cpus;
- return -EINVAL;
- }
if (boot_cpu_physical_apicid != -1U)
ver = apic_version[boot_cpu_physical_apicid];
- return generic_processor_info(id, ver);
+ return __generic_processor_info(id, ver, enabled);
}
static int __init
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index a9c9830..c744ffb 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1977,7 +1977,38 @@ void disconnect_bsp_APIC(int virt_wire_setup)
apic_write(APIC_LVT1, value);
}
-static int __generic_processor_info(int apicid, int version, bool enabled)
+/*
+ * Logic cpu number(cpuid) to local APIC id persistent mappings.
+ * Do not clear the mapping even if cpu is hot-removed.
+ */
+static int apicid_to_cpuid[] = {
+ [0 ... NR_CPUS - 1] = -1,
+};
+
+/*
+ * Internal cpu id bits, set the bit once cpu present, and never clear it.
+ */
+static cpumask_t cpuid_mask = CPU_MASK_NONE;
+
+static int get_cpuid(int apicid)
+{
+ int free_id, i;
+
+ free_id = cpumask_next_zero(-1, &cpuid_mask);
+ if (free_id >= nr_cpu_ids)
+ return -1;
+
+ for (i = 0; i < free_id; i++)
+ if (apicid_to_cpuid[i] == apicid)
+ return i;
+
+ apicid_to_cpuid[free_id] = apicid;
+ cpumask_set_cpu(free_id, &cpuid_mask);
+
+ return free_id;
+}
+
+int __generic_processor_info(int apicid, int version, bool enabled)
{
int cpu, max = nr_cpu_ids;
bool boot_cpu_detected = physid_isset(boot_cpu_physical_apicid,
@@ -2058,8 +2089,18 @@ static int __generic_processor_info(int apicid, int version, bool enabled)
* for BSP.
*/
cpu = 0;
- } else
- cpu = cpumask_next_zero(-1, cpu_present_mask);
+ } else {
+ cpu = get_cpuid(apicid);
+ if (cpu < 0) {
+ int thiscpu = max + disabled_cpus;
+
+ pr_warning(" Processor %d/0x%x ignored.\n",
+ thiscpu, apicid);
+ if (enabled)
+ disabled_cpus++;
+ return -EINVAL;
+ }
+ }
/*
* Validate version
--
1.9.3
* [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
From: Tang Chen @ 2015-07-01 4:45 UTC
To: tj, mingo, akpm, rjw, hpa, laijs, yasu.isimatu, isimatu.yasuaki,
kamezawa.hiroyu, izumi.taku, gongzhaogang
Cc: tangchen, x86, linux-acpi, Gu Zheng
From: Gu Zheng <guz.fnst@cn.fujitsu.com>
All processors' apicids can be obtained by _MAT method or from MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.
After this patch, a new parameter is added to the MADT APIs so that the caller
can control whether disabled processors are ignored.
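The effect of the new parameter can be illustrated with a small user-space lookup over a made-up table. The struct, the table contents and demo_lookup_apicid() are assumptions for this sketch; only the meaning of the ignore_disabled flag follows the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ENABLED         0x1u
#define DEMO_PHYS_ID_INVALID (-1)

/* Hypothetical, simplified stand-in for a local APIC entry in the MADT. */
struct demo_lapic_entry {
	unsigned int processor_id;	/* ACPI processor id */
	int apic_id;			/* local APIC id */
	unsigned int flags;		/* DEMO_ENABLED or 0 */
};

/* Assumed table: processor 2 is listed but disabled (hot-addable later). */
static const struct demo_lapic_entry demo_madt[] = {
	{ 0, 0x00, DEMO_ENABLED },
	{ 1, 0x02, DEMO_ENABLED },
	{ 2, 0x10, 0 },
};

/*
 * Look up the apicid for @acpi_id. With @ignore_disabled set, disabled
 * entries are skipped (the old behaviour); with it clear, their apicid
 * is returned as well.
 */
static int demo_lookup_apicid(unsigned int acpi_id, bool ignore_disabled)
{
	for (size_t i = 0; i < sizeof(demo_madt) / sizeof(demo_madt[0]); i++) {
		const struct demo_lapic_entry *e = &demo_madt[i];

		if (ignore_disabled && !(e->flags & DEMO_ENABLED))
			continue;
		if (e->processor_id == acpi_id)
			return e->apic_id;
	}
	return DEMO_PHYS_ID_INVALID;
}
```

Existing callers keep the old result by passing true, while a boot-time walk that needs the apicid of every possible processor would pass false.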
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
drivers/acpi/acpi_processor.c | 5 +++-
drivers/acpi/processor_core.c | 57 +++++++++++++++++++++++++++----------------
2 files changed, 40 insertions(+), 22 deletions(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 92a5f73..338c71a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -282,8 +282,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
* Extra Processor objects may be enumerated on MP systems with
* less than the max # of CPUs. They should be ignored _iff
* they are physically not present.
+ *
+ * NOTE: Even if the processor has a cpuid, it may not present because
+ * cpuid <-> apicid mapping is persistent now.
*/
- if (invalid_logical_cpuid(pr->id)) {
+ if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
}
static int map_lapic_id(struct acpi_subtable_header *entry,
- u32 acpi_id, phys_cpuid_t *apic_id)
+ u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
{
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
- if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
}
static int map_x2apic_id(struct acpi_subtable_header *entry,
- int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+ int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+ bool ignore_disabled)
{
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
- if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
}
static int map_lsapic_id(struct acpi_subtable_header *entry,
- int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+ int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+ bool ignore_disabled)
{
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
- if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
if (device_declaration) {
@@ -87,12 +89,13 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
* Retrieve the ARM CPU physical identifier (MPIDR)
*/
static int map_gicc_mpidr(struct acpi_subtable_header *entry,
- int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
+ int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr,
+ bool ignore_disabled)
{
struct acpi_madt_generic_interrupt *gicc =
container_of(entry, struct acpi_madt_generic_interrupt, header);
- if (!(gicc->flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(gicc->flags & ACPI_MADT_ENABLED))
return -ENODEV;
/* device_declaration means Device object in DSDT, in the
@@ -108,7 +111,7 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
return -EINVAL;
}
-static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
+static phys_cpuid_t map_madt_entry(int type, u32 acpi_id, bool ignore_disabled)
{
unsigned long madt_end, entry;
phys_cpuid_t phys_id = PHYS_CPUID_INVALID; /* CPU hardware ID */
@@ -128,16 +131,20 @@ static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
struct acpi_subtable_header *header =
(struct acpi_subtable_header *)entry;
if (header->type == ACPI_MADT_TYPE_LOCAL_APIC) {
- if (!map_lapic_id(header, acpi_id, &phys_id))
+ if (!map_lapic_id(header, acpi_id, &phys_id,
+ ignore_disabled))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC) {
- if (!map_x2apic_id(header, type, acpi_id, &phys_id))
+ if (!map_x2apic_id(header, type, acpi_id, &phys_id,
+ ignore_disabled))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC) {
- if (!map_lsapic_id(header, type, acpi_id, &phys_id))
+ if (!map_lsapic_id(header, type, acpi_id, &phys_id,
+ ignore_disabled))
break;
} else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT) {
- if (!map_gicc_mpidr(header, type, acpi_id, &phys_id))
+ if (!map_gicc_mpidr(header, type, acpi_id, &phys_id,
+ ignore_disabled))
break;
}
entry += header->length;
@@ -145,7 +152,8 @@ static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
return phys_id;
}
-static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id,
+ bool ignore_disabled)
{
struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
union acpi_object *obj;
@@ -166,30 +174,37 @@ static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
header = (struct acpi_subtable_header *)obj->buffer.pointer;
if (header->type == ACPI_MADT_TYPE_LOCAL_APIC)
- map_lapic_id(header, acpi_id, &phys_id);
+ map_lapic_id(header, acpi_id, &phys_id, ignore_disabled);
else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC)
- map_lsapic_id(header, type, acpi_id, &phys_id);
+ map_lsapic_id(header, type, acpi_id, &phys_id, ignore_disabled);
else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC)
- map_x2apic_id(header, type, acpi_id, &phys_id);
+ map_x2apic_id(header, type, acpi_id, &phys_id, ignore_disabled);
else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT)
- map_gicc_mpidr(header, type, acpi_id, &phys_id);
+ map_gicc_mpidr(header, type, acpi_id, &phys_id,
+ ignore_disabled);
exit:
kfree(buffer.pointer);
return phys_id;
}
-phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t __acpi_get_phys_id(acpi_handle handle, int type,
+ u32 acpi_id, bool ignore_disabled)
{
phys_cpuid_t phys_id;
- phys_id = map_mat_entry(handle, type, acpi_id);
+ phys_id = map_mat_entry(handle, type, acpi_id, ignore_disabled);
if (invalid_phys_cpuid(phys_id))
- phys_id = map_madt_entry(type, acpi_id);
+ phys_id = map_madt_entry(type, acpi_id, ignore_disabled);
return phys_id;
}
+phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+{
+ return __acpi_get_phys_id(handle, type, acpi_id, true);
+}
+
int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id)
{
#ifdef CONFIG_SMP
--
1.9.3
* [PATCH 5/5] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting.
From: Tang Chen @ 2015-07-01 4:45 UTC
To: tj, mingo, akpm, rjw, hpa, laijs, yasu.isimatu, isimatu.yasuaki,
kamezawa.hiroyu, izumi.taku, gongzhaogang
Cc: tangchen, x86, linux-acpi, Gu Zheng
From: Gu Zheng <guz.fnst@cn.fujitsu.com>
This patch sets the persistent cpuid <-> nodeid mapping for all enabled and disabled
processors at boot time via an additional acpi namespace walk for processors.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
arch/ia64/kernel/acpi.c | 2 +-
arch/x86/kernel/acpi/boot.c | 2 +-
drivers/acpi/bus.c | 3 ++
drivers/acpi/processor_core.c | 65 +++++++++++++++++++++++++++++++++++++++++++
include/linux/acpi.h | 2 ++
5 files changed, 72 insertions(+), 2 deletions(-)
diff --git a/arch/ia64/kernel/acpi.c b/arch/ia64/kernel/acpi.c
index b1698bc..7db5563 100644
--- a/arch/ia64/kernel/acpi.c
+++ b/arch/ia64/kernel/acpi.c
@@ -796,7 +796,7 @@ int acpi_isa_irq_to_gsi(unsigned isa_irq, u32 *gsi)
* ACPI based hotplug CPU support
*/
#ifdef CONFIG_ACPI_HOTPLUG_CPU
-static int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+int acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
{
#ifdef CONFIG_ACPI_NUMA
/*
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index bcc85b2..b9a1aa1 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -695,7 +695,7 @@ static void __init acpi_set_irq_model_ioapic(void)
#ifdef CONFIG_ACPI_HOTPLUG_CPU
#include <acpi/processor.h>
-static void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
+void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid)
{
#ifdef CONFIG_ACPI_NUMA
int nid;
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 513e7230e..fd03885 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -700,6 +700,9 @@ static int __init acpi_init(void)
acpi_debugfs_init();
acpi_sleep_proc_init();
acpi_wakeup_device_init();
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+ acpi_set_processor_mapping();
+#endif
return 0;
}
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 824b98b..45580ff 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -261,6 +261,71 @@ int acpi_get_cpuid(acpi_handle handle, int type, u32 acpi_id)
}
EXPORT_SYMBOL_GPL(acpi_get_cpuid);
+#ifdef CONFIG_ACPI_HOTPLUG_CPU
+static bool map_processor(acpi_handle handle, int *phys_id, int *cpuid)
+{
+ int type;
+ u32 acpi_id;
+ acpi_status status;
+ acpi_object_type acpi_type;
+ unsigned long long tmp;
+ union acpi_object object = { 0 };
+ struct acpi_buffer buffer = { sizeof(union acpi_object), &object };
+
+ status = acpi_get_type(handle, &acpi_type);
+ if (ACPI_FAILURE(status))
+ return false;
+
+ switch (acpi_type) {
+ case ACPI_TYPE_PROCESSOR:
+ status = acpi_evaluate_object(handle, NULL, NULL, &buffer);
+ if (ACPI_FAILURE(status))
+ return false;
+ acpi_id = object.processor.proc_id;
+ break;
+ case ACPI_TYPE_DEVICE:
+ status = acpi_evaluate_integer(handle, "_UID", NULL, &tmp);
+ if (ACPI_FAILURE(status))
+ return false;
+ acpi_id = tmp;
+ break;
+ default:
+ return false;
+ }
+
+ type = (acpi_type == ACPI_TYPE_DEVICE) ? 1 : 0;
+
+ *phys_id = __acpi_get_phys_id(handle, type, acpi_id, false);
+ *cpuid = acpi_map_cpuid(*phys_id, acpi_id);
+ if (*cpuid == -1)
+ return false;
+
+ return true;
+}
+
+static acpi_status __init
+set_processor_node_mapping(acpi_handle handle, u32 lvl, void *context,
+ void **rv)
+{
+ u32 apic_id;
+ int cpu_id;
+
+ if (!map_processor(handle, &apic_id, &cpu_id))
+ return AE_ERROR;
+
+ acpi_map_cpu2node(handle, cpu_id, apic_id);
+ return AE_OK;
+}
+
+void __init acpi_set_processor_mapping(void)
+{
+ /* Set persistent cpu <-> node mapping for all processors. */
+ acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
+ ACPI_UINT32_MAX, set_processor_node_mapping,
+ NULL, NULL, NULL);
+}
+#endif
+
#ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
static int get_ioapic_id(struct acpi_subtable_header *entry, u32 gsi_base,
u64 *phys_addr, int *ioapic_id)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 1618cdf..fe3bd4b 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -172,6 +172,8 @@ static inline bool invalid_phys_cpuid(phys_cpuid_t phys_id)
/* Arch dependent functions for cpu hotplug support */
int acpi_map_cpu(acpi_handle handle, phys_cpuid_t physid, int *pcpu);
int acpi_unmap_cpu(int cpu);
+void acpi_map_cpu2node(acpi_handle handle, int cpu, int physid);
+void __init acpi_set_processor_mapping(void);
#endif /* CONFIG_ACPI_HOTPLUG_CPU */
#ifdef CONFIG_ACPI_HOTPLUG_IOAPIC
--
1.9.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 0/5] Make cpuid <-> nodeid mapping persistent.
@ 2015-07-07 9:30 Tang Chen
2015-07-07 9:30 ` [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Tang Chen
0 siblings, 1 reply; 9+ messages in thread
From: Tang Chen @ 2015-07-07 9:30 UTC (permalink / raw)
To: tj, mingo, akpm, rjw, hpa, laijs, yasu.isimatu, isimatu.yasuaki,
kamezawa.hiroyu, izumi.taku, gongzhaogang, qiaonuohan
Cc: tangchen, x86, linux-acpi, linux-kernel, linux-mm
[Problem]
The cpuid <-> nodeid mapping is first established at boot time, and workqueue caches
it in wq_numa_possible_cpumask in wq_numa_init(), also at boot time.
When a node goes online or offline, the cpuid <-> nodeid mapping is established or
destroyed, which means the mapping changes whenever node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.
So here is the problem:
Assume we have the following cpuid <-> nodeid in the beginning:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119
After we hot-remove node2 and node3, it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
Then we hot-add node4 and node5, and it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119
But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the same goes
for the other cpus that moved to node4 and node5.
When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be set to that node, and memory used by this workqueue will
also be allocated on that node.
static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}
Since wq_numa_possible_cpumask is not updated, pool->node could be set to an offline node,
which leads to memory allocation failure:
SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
node 0: slabs: 6172, objs: 259224, free: 245741
node 1: slabs: 3261, objs: 136962, free: 127656
It happens here:
create_worker(struct worker_pool *pool)
|--> worker = alloc_worker(pool->node);
static struct worker *alloc_worker(int node)
{
struct worker *worker;
worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.
......
return worker;
}
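As a rough userspace illustration of the failure mode above (not kernel code),
the sketch below keeps a per-node cpu mask that is snapshotted once, the way
wq_numa_possible_cpumask is cached, and picks a node with a subset test similar
to the one in get_unbound_pool(). The names numa_state, pick_pool_node() and
pool_node_is_stale() are hypothetical, and a 64-bit integer stands in for a
real cpumask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 8

/* Hypothetical stand-in for a cpumask: one bit per cpu, up to 64 cpus. */
typedef uint64_t mask_t;

struct numa_state {
	mask_t cached_mask[MAX_NODES];	/* per-node snapshot, taken once at "boot" */
	bool node_online[MAX_NODES];	/* kept up to date on node hotplug */
};

/* True if every cpu in a is also in b. */
bool mask_subset(mask_t a, mask_t b)
{
	return (a & ~b) == 0;
}

/* Pick the first node whose cached mask contains pool_mask, or -1. */
int pick_pool_node(const struct numa_state *s, mask_t pool_mask)
{
	assert(pool_mask != 0);
	for (int node = 0; node < MAX_NODES; node++)
		if (mask_subset(pool_mask, s->cached_mask[node]))
			return node;
	return -1;
}

/* True if the node chosen from the snapshot is offline, i.e. the cache is stale. */
bool pool_node_is_stale(const struct numa_state *s, mask_t pool_mask)
{
	int node = pick_pool_node(s, pool_mask);

	return node >= 0 && !s->node_online[node];
}
```

If node 2 is marked offline after the snapshot was taken, pick_pool_node()
still returns 2 for a pool whose cpus used to live there, which is the
situation that ends in the kzalloc_node() failure shown above.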
[Solution]
To fix this problem, we establish the cpuid <-> nodeid mapping for all possible
cpus at boot time, and make it invariable. According to init_cpu_to_node(), the
cpuid <-> nodeid mapping is based on the apicid <-> nodeid mapping and the
cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicids.
An apicid can be obtained via the _MAT (Multiple APIC Table Entry) method or found in
the MADT (Multiple APIC Description Table). So we finish the job in the following steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
This is done by introducing an extra parameter to generic_processor_info() to let the
caller control whether disabled cpus are ignored.
2. Introduce a new array storing all possible cpuid <-> apicid mappings, and modify
the way cpuid is calculated. Establish all possible cpuid <-> apicid mappings when
registering the local apic, and store them in the array introduced above.
3. Enable the _MAT and MADT related APIs to return the apicids of non-present or disabled cpus.
This is also done by introducing an extra parameter to these APIs to let the caller
control whether disabled cpus are ignored.
4. Establish all possible cpuid <-> nodeid mappings.
This is done via an additional ACPI namespace walk for processors.
For previous discussion, please refer to:
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989
https://lkml.org/lkml/2015/5/14/244
Gu Zheng (5):
x86, gfp: Cache best near node for memory allocation.
x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
boot time.
x86, acpi, cpu-hotplug: Introduce apicid_to_cpuid[] array to store
persistent cpuid <-> apicid mapping.
x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
booting.
arch/ia64/kernel/acpi.c | 2 +-
arch/x86/include/asm/mpspec.h | 1 +
arch/x86/include/asm/topology.h | 2 +
arch/x86/kernel/acpi/boot.c | 8 +--
arch/x86/kernel/apic/apic.c | 71 ++++++++++++++++++++---
arch/x86/mm/numa.c | 57 ++++++++++++-------
drivers/acpi/acpi_processor.c | 5 +-
drivers/acpi/bus.c | 3 +
drivers/acpi/processor_core.c | 122 +++++++++++++++++++++++++++++++++-------
include/linux/acpi.h | 2 +
include/linux/gfp.h | 12 +++-
11 files changed, 227 insertions(+), 58 deletions(-)
--
1.9.3
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
2015-07-07 9:30 [PATCH 0/5] Make cpuid <-> nodeid mapping persistent Tang Chen
@ 2015-07-07 9:30 ` Tang Chen
2015-07-15 22:06 ` Tejun Heo
0 siblings, 1 reply; 9+ messages in thread
From: Tang Chen @ 2015-07-07 9:30 UTC (permalink / raw)
To: tj, mingo, akpm, rjw, hpa, laijs, yasu.isimatu, isimatu.yasuaki,
kamezawa.hiroyu, izumi.taku, gongzhaogang, qiaonuohan
Cc: tangchen, x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng
From: Gu Zheng <guz.fnst@cn.fujitsu.com>
All processors' apicids can be obtained via the _MAT method or from the MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.
This patch adds a new parameter to the MADT APIs so that the caller
can control whether disabled processors are ignored.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
drivers/acpi/acpi_processor.c | 5 +++-
drivers/acpi/processor_core.c | 57 +++++++++++++++++++++++++++----------------
2 files changed, 40 insertions(+), 22 deletions(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 92a5f73..338c71a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -282,8 +282,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
* Extra Processor objects may be enumerated on MP systems with
* less than the max # of CPUs. They should be ignored _iff
* they are physically not present.
+ *
+ * NOTE: Even if the processor has a cpuid, it may not present because
+ * cpuid <-> apicid mapping is persistent now.
*/
- if (invalid_logical_cpuid(pr->id)) {
+ if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
}
static int map_lapic_id(struct acpi_subtable_header *entry,
- u32 acpi_id, phys_cpuid_t *apic_id)
+ u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
{
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
- if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
}
static int map_x2apic_id(struct acpi_subtable_header *entry,
- int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+ int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+ bool ignore_disabled)
{
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
- if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
}
static int map_lsapic_id(struct acpi_subtable_header *entry,
- int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+ int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+ bool ignore_disabled)
{
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
- if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
if (device_declaration) {
@@ -87,12 +89,13 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
* Retrieve the ARM CPU physical identifier (MPIDR)
*/
static int map_gicc_mpidr(struct acpi_subtable_header *entry,
- int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
+ int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr,
+ bool ignore_disabled)
{
struct acpi_madt_generic_interrupt *gicc =
container_of(entry, struct acpi_madt_generic_interrupt, header);
- if (!(gicc->flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(gicc->flags & ACPI_MADT_ENABLED))
return -ENODEV;
/* device_declaration means Device object in DSDT, in the
@@ -108,7 +111,7 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
return -EINVAL;
}
-static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
+static phys_cpuid_t map_madt_entry(int type, u32 acpi_id, bool ignore_disabled)
{
unsigned long madt_end, entry;
phys_cpuid_t phys_id = PHYS_CPUID_INVALID; /* CPU hardware ID */
@@ -128,16 +131,20 @@ static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
struct acpi_subtable_header *header =
(struct acpi_subtable_header *)entry;
if (header->type == ACPI_MADT_TYPE_LOCAL_APIC) {
- if (!map_lapic_id(header, acpi_id, &phys_id))
+ if (!map_lapic_id(header, acpi_id, &phys_id,
+ ignore_disabled))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC) {
- if (!map_x2apic_id(header, type, acpi_id, &phys_id))
+ if (!map_x2apic_id(header, type, acpi_id, &phys_id,
+ ignore_disabled))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC) {
- if (!map_lsapic_id(header, type, acpi_id, &phys_id))
+ if (!map_lsapic_id(header, type, acpi_id, &phys_id,
+ ignore_disabled))
break;
} else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT) {
- if (!map_gicc_mpidr(header, type, acpi_id, &phys_id))
+ if (!map_gicc_mpidr(header, type, acpi_id, &phys_id,
+ ignore_disabled))
break;
}
entry += header->length;
@@ -145,7 +152,8 @@ static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
return phys_id;
}
-static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id,
+ bool ignore_disabled)
{
struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
union acpi_object *obj;
@@ -166,30 +174,37 @@ static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
header = (struct acpi_subtable_header *)obj->buffer.pointer;
if (header->type == ACPI_MADT_TYPE_LOCAL_APIC)
- map_lapic_id(header, acpi_id, &phys_id);
+ map_lapic_id(header, acpi_id, &phys_id, ignore_disabled);
else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC)
- map_lsapic_id(header, type, acpi_id, &phys_id);
+ map_lsapic_id(header, type, acpi_id, &phys_id, ignore_disabled);
else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC)
- map_x2apic_id(header, type, acpi_id, &phys_id);
+ map_x2apic_id(header, type, acpi_id, &phys_id, ignore_disabled);
else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT)
- map_gicc_mpidr(header, type, acpi_id, &phys_id);
+ map_gicc_mpidr(header, type, acpi_id, &phys_id,
+ ignore_disabled);
exit:
kfree(buffer.pointer);
return phys_id;
}
-phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t __acpi_get_phys_id(acpi_handle handle, int type,
+ u32 acpi_id, bool ignore_disabled)
{
phys_cpuid_t phys_id;
- phys_id = map_mat_entry(handle, type, acpi_id);
+ phys_id = map_mat_entry(handle, type, acpi_id, ignore_disabled);
if (invalid_phys_cpuid(phys_id))
- phys_id = map_madt_entry(type, acpi_id);
+ phys_id = map_madt_entry(type, acpi_id, ignore_disabled);
return phys_id;
}
+phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+{
+ return __acpi_get_phys_id(handle, type, acpi_id, true);
+}
+
int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id)
{
#ifdef CONFIG_SMP
--
1.9.3
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
2015-07-07 9:30 ` [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Tang Chen
@ 2015-07-15 22:06 ` Tejun Heo
0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2015-07-15 22:06 UTC (permalink / raw)
To: Tang Chen
Cc: mingo, akpm, rjw, hpa, laijs, yasu.isimatu, isimatu.yasuaki,
kamezawa.hiroyu, izumi.taku, gongzhaogang, qiaonuohan, x86,
linux-acpi, linux-kernel, linux-mm, Gu Zheng
Hello,
On Tue, Jul 07, 2015 at 05:30:24PM +0800, Tang Chen wrote:
> From: Gu Zheng <guz.fnst@cn.fujitsu.com>
>
> All processors' apicids can be obtained by _MAT method or from MADT in ACPI.
> The current code ignores disabled processors and returns -ENODEV.
>
> After this patch, a new parameter will be added to MADT APIs so that caller
> is able to control if disabled processors are ignored.
This describes what the patch does but doesn't really explain what the
patch is trying to achieve.
> @@ -282,8 +282,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
> * Extra Processor objects may be enumerated on MP systems with
> * less than the max # of CPUs. They should be ignored _iff
> * they are physically not present.
> + *
> + * NOTE: Even if the processor has a cpuid, it may not present because
^
be
> + * cpuid <-> apicid mapping is persistent now.
Saying "now" is kinda weird as this is how the code is gonna be
forever.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH v4 0/5] Make cpuid <-> nodeid mapping persistent.
@ 2016-01-07 4:20 Tang Chen
2016-01-07 4:20 ` [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Tang Chen
0 siblings, 1 reply; 9+ messages in thread
From: Tang Chen @ 2016-01-07 4:20 UTC (permalink / raw)
To: cl, tj, jiang.liu, mika.j.penttila, mingo, akpm, rjw, hpa,
yasu.isimatu, isimatu.yasuaki, kamezawa.hiroyu, izumi.taku,
gongzhaogang
Cc: tangchen, x86, linux-acpi, linux-kernel, linux-mm
[Problem]
The cpuid <-> nodeid mapping is first established at boot time, and workqueue caches
it in wq_numa_possible_cpumask in wq_numa_init(), also at boot time.
When a node goes online or offline, the cpuid <-> nodeid mapping is established or
destroyed, which means the mapping changes whenever node hotplug happens. But
workqueue does not update wq_numa_possible_cpumask.
So here is the problem:
Assume we have the following cpuid <-> nodeid in the beginning:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 2 | 30-44, 90-104
node 3 | 45-59, 105-119
After we hot-remove node2 and node3, it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
Then we hot-add node4 and node5, and it becomes:
Node | CPU
------------------------
node 0 | 0-14, 60-74
node 1 | 15-29, 75-89
node 4 | 30-59
node 5 | 90-119
But in wq_numa_possible_cpumask, cpu30 is still mapped to node2, and the same goes
for the other cpus that moved to node4 and node5.
When a pool workqueue is initialized, if its cpumask belongs to a node, its
pool->node will be set to that node, and memory used by this workqueue will
also be allocated on that node.
static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs){
...
/* if cpumask is contained inside a NUMA node, we belong to that node */
if (wq_numa_enabled) {
for_each_node(node) {
if (cpumask_subset(pool->attrs->cpumask,
wq_numa_possible_cpumask[node])) {
pool->node = node;
break;
}
}
}
Since wq_numa_possible_cpumask is not updated, pool->node could be set to an offline node,
which leads to memory allocation failure:
SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0
node 0: slabs: 6172, objs: 259224, free: 245741
node 1: slabs: 3261, objs: 136962, free: 127656
It happens here:
create_worker(struct worker_pool *pool)
|--> worker = alloc_worker(pool->node);
static struct worker *alloc_worker(int node)
{
struct worker *worker;
worker = kzalloc_node(sizeof(*worker), GFP_KERNEL, node); --> Here, using the wrong node.
......
return worker;
}
[Solution]
There are four mappings in the kernel:
1. nodeid (logical node id) <-> pxm
2. apicid (physical cpu id) <-> nodeid
3. cpuid (logical cpu id) <-> apicid
4. cpuid (logical cpu id) <-> nodeid
1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the nodeid <-> pxm
mapping is set up at boot time. This mapping is persistent and won't change.
2. The apicid <-> nodeid mapping is set up using the info in 1. The mapping is set up at boot
time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
persistent.
3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time. cpuids are
allocated lowest id first, released at CPU hotremove time, and reused for other
hotadded CPUs. So this mapping is not persistent.
4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd time, and
cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.
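To make point 3 concrete, here is a small userspace sketch (an illustration
only, not the kernel's allocator) contrasting "lowest free cpuid first"
allocation with an allocation that remembers earlier cpuid <-> apicid bindings.
All names (cpu_map, alloc_lowest_free(), alloc_persistent(), hot_remove()) and
the NR_CPUS value are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 8
#define APICID_NONE (-1)

struct cpu_map {
	int apicid_of[NR_CPUS];	/* cpuid -> apicid, APICID_NONE if never bound */
	bool present[NR_CPUS];	/* cpuid currently in use */
};

void cpu_map_init(struct cpu_map *m)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		m->apicid_of[cpu] = APICID_NONE;
		m->present[cpu] = false;
	}
}

/* Non-persistent: take the lowest cpuid not in use and overwrite its binding. */
int alloc_lowest_free(struct cpu_map *m, int apicid)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!m->present[cpu]) {
			m->apicid_of[cpu] = apicid;
			m->present[cpu] = true;
			return cpu;
		}
	}
	return -1;
}

/* Persistent: reuse the cpuid already bound to this apicid, else bind a new one. */
int alloc_persistent(struct cpu_map *m, int apicid)
{
	int first_unbound = -1;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (m->apicid_of[cpu] == apicid) {
			m->present[cpu] = true;
			return cpu;
		}
		if (first_unbound < 0 && m->apicid_of[cpu] == APICID_NONE)
			first_unbound = cpu;
	}
	if (first_unbound >= 0) {
		m->apicid_of[first_unbound] = apicid;
		m->present[first_unbound] = true;
	}
	return first_unbound;
}

/* Hot-remove only clears presence; the apicid binding is left in place. */
void hot_remove(struct cpu_map *m, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	m->present[cpu] = false;
}
```

With alloc_lowest_free(), removing cpu 0 and then adding a CPU with a different
apicid hands cpu 0 to that new CPU, so anything cached per cpuid now refers to
different hardware. With alloc_persistent(), the new CPU gets a fresh cpuid and
the old apicid gets cpu 0 back if it returns.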
To fix this problem, we establish the cpuid <-> nodeid mapping for all possible
cpus at boot time, and make it persistent. According to init_cpu_to_node(), the
cpuid <-> nodeid mapping is based on the apicid <-> nodeid mapping and the
cpuid <-> apicid mapping. So the key point is obtaining all cpus' apicids.
An apicid can be obtained via the _MAT (Multiple APIC Table Entry) method or found in
the MADT (Multiple APIC Description Table). So we finish the job in the following steps:
1. Enable the apic registration flow to handle both enabled and disabled cpus.
This is done by introducing an extra parameter to generic_processor_info() to let the
caller control whether disabled cpus are ignored.
2. Introduce a new array storing all possible cpuid <-> apicid mappings, and modify
the way cpuid is calculated. Establish all possible cpuid <-> apicid mappings when
registering the local apic, and store them in this array.
3. Enable the _MAT and MADT related APIs to return the apicids of non-present or disabled cpus.
This is also done by introducing an extra parameter to these APIs to let the caller
control whether disabled cpus are ignored.
4. Establish all possible cpuid <-> nodeid mappings.
This is done via an additional ACPI namespace walk for processors.
For previous discussion, please refer to:
https://lkml.org/lkml/2015/2/27/145
https://lkml.org/lkml/2015/3/25/989
https://lkml.org/lkml/2015/5/14/244
https://lkml.org/lkml/2015/7/7/200
https://lkml.org/lkml/2015/9/27/209
Change log v3 -> v4:
1. Fix the kernel panic at boot time. The cause is that I tried to build zonelists
before per cpu areas were initialized.
Change log v2 -> v3:
1. Online memory-less nodes at boot time to map cpus of memory-less nodes.
2. Build zonelists for memory-less nodes so that memory allocator will fall
back to proper nodes automatically.
Change log v1 -> v2:
1. Split code movement and actual changes. Add patch 1.
2. Synchronize best near online node record when node hotplug happens. In patch 2.
3. Fix some comments.
Gu Zheng (4):
x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at
boot time.
x86, acpi, cpu-hotplug: Introduce cpuid_to_apicid[] array to store
persistent cpuid <-> apicid mapping.
x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when
booting.
Tang Chen (1):
x86, memhp, numa: Online memory-less nodes at boot time.
arch/ia64/kernel/acpi.c | 2 +-
arch/x86/include/asm/mpspec.h | 1 +
arch/x86/kernel/acpi/boot.c | 8 ++-
arch/x86/kernel/apic/apic.c | 85 +++++++++++++++++++++++++----
arch/x86/mm/numa.c | 27 +++++-----
drivers/acpi/acpi_processor.c | 5 +-
drivers/acpi/bus.c | 3 ++
drivers/acpi/processor_core.c | 122 ++++++++++++++++++++++++++++++++++--------
include/linux/acpi.h | 2 +
include/linux/mmzone.h | 1 +
mm/page_alloc.c | 2 +-
11 files changed, 206 insertions(+), 52 deletions(-)
--
1.9.3
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid.
2016-01-07 4:20 [PATCH v4 0/5] Make cpuid <-> nodeid mapping persistent Tang Chen
@ 2016-01-07 4:20 ` Tang Chen
0 siblings, 0 replies; 9+ messages in thread
From: Tang Chen @ 2016-01-07 4:20 UTC (permalink / raw)
To: cl, tj, jiang.liu, mika.j.penttila, mingo, akpm, rjw, hpa,
yasu.isimatu, isimatu.yasuaki, kamezawa.hiroyu, izumi.taku,
gongzhaogang
Cc: tangchen, x86, linux-acpi, linux-kernel, linux-mm, Gu Zheng
From: Gu Zheng <guz.fnst@cn.fujitsu.com>
This patch finishes step 3.
There are four mappings in the kernel:
1. nodeid (logical node id) <-> pxm
2. apicid (physical cpu id) <-> nodeid
3. cpuid (logical cpu id) <-> apicid
4. cpuid (logical cpu id) <-> nodeid
1. pxm (proximity domain) is provided by ACPI firmware in SRAT, and the nodeid <-> pxm
mapping is set up at boot time. This mapping is persistent and won't change.
2. The apicid <-> nodeid mapping is set up using the info in 1. The mapping is set up at boot
time and CPU hotadd time, and cleared at CPU hotremove time. This mapping is also
persistent.
3. The cpuid <-> apicid mapping is set up at boot time and CPU hotadd time. cpuids are
allocated lowest id first, released at CPU hotremove time, and reused for other
hotadded CPUs. So this mapping is not persistent.
4. The cpuid <-> nodeid mapping is also set up at boot time and CPU hotadd time, and
cleared at CPU hotremove time. As a result of 3, this mapping is not persistent.
So, in order to set up a persistent cpuid <-> nodeid mapping for all possible CPUs,
we should:
1. Set up the cpuid <-> apicid mapping for all possible CPUs, which has been done in step 1.
2. Set up the cpuid <-> nodeid mapping for all possible CPUs. But before that, we should
obtain all apicids from the MADT.
All processors' apicids can be obtained via the _MAT method or from the MADT in ACPI.
The current code ignores disabled processors and returns -ENODEV.
This patch adds a new parameter to the MADT APIs so that the caller
can control whether disabled processors are ignored.
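The effect of the new parameter can be sketched in plain userspace C. This is a
simplified illustration under assumed names, not the code in
drivers/acpi/processor_core.c: struct lapic_entry and lookup_apic_id() are
hypothetical, and the table is a flat array rather than real MADT subtables.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, flattened view of one local APIC entry. */
struct lapic_entry {
	uint32_t acpi_id;	/* ACPI processor id */
	uint32_t apic_id;	/* physical (APIC) id */
	bool enabled;		/* models the "enabled" flag of the entry */
};

/*
 * Find the apic id for acpi_id. When ignore_disabled is true, entries
 * that are not enabled are skipped, so a disabled processor yields
 * -ENODEV. When it is false, disabled entries are matched as well.
 */
int lookup_apic_id(const struct lapic_entry *tbl, size_t n, uint32_t acpi_id,
		   bool ignore_disabled, uint32_t *apic_id)
{
	assert(tbl != NULL && apic_id != NULL);

	for (size_t i = 0; i < n; i++) {
		if (ignore_disabled && !tbl[i].enabled)
			continue;
		if (tbl[i].acpi_id != acpi_id)
			continue;
		*apic_id = tbl[i].apic_id;
		return 0;
	}
	return -ENODEV;
}
```

Existing callers would keep the old behaviour by passing true, in the same way
acpi_get_phys_id() wraps __acpi_get_phys_id() in the diff below, while the
boot-time mapping code would pass false to see every possible processor.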
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
drivers/acpi/acpi_processor.c | 5 +++-
drivers/acpi/processor_core.c | 57 +++++++++++++++++++++++++++----------------
2 files changed, 40 insertions(+), 22 deletions(-)
diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
index 6979186..d30111a 100644
--- a/drivers/acpi/acpi_processor.c
+++ b/drivers/acpi/acpi_processor.c
@@ -300,8 +300,11 @@ static int acpi_processor_get_info(struct acpi_device *device)
* Extra Processor objects may be enumerated on MP systems with
* less than the max # of CPUs. They should be ignored _iff
* they are physically not present.
+ *
+ * NOTE: Even if the processor has a cpuid, it may not present because
+ * cpuid <-> apicid mapping is persistent now.
*/
- if (invalid_logical_cpuid(pr->id)) {
+ if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
int ret = acpi_processor_hotadd_init(pr);
if (ret)
return ret;
diff --git a/drivers/acpi/processor_core.c b/drivers/acpi/processor_core.c
index 33a38d6..824b98b 100644
--- a/drivers/acpi/processor_core.c
+++ b/drivers/acpi/processor_core.c
@@ -32,12 +32,12 @@ static struct acpi_table_madt *get_madt_table(void)
}
static int map_lapic_id(struct acpi_subtable_header *entry,
- u32 acpi_id, phys_cpuid_t *apic_id)
+ u32 acpi_id, phys_cpuid_t *apic_id, bool ignore_disabled)
{
struct acpi_madt_local_apic *lapic =
container_of(entry, struct acpi_madt_local_apic, header);
- if (!(lapic->lapic_flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(lapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
if (lapic->processor_id != acpi_id)
@@ -48,12 +48,13 @@ static int map_lapic_id(struct acpi_subtable_header *entry,
}
static int map_x2apic_id(struct acpi_subtable_header *entry,
- int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+ int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+ bool ignore_disabled)
{
struct acpi_madt_local_x2apic *apic =
container_of(entry, struct acpi_madt_local_x2apic, header);
- if (!(apic->lapic_flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(apic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
if (device_declaration && (apic->uid == acpi_id)) {
@@ -65,12 +66,13 @@ static int map_x2apic_id(struct acpi_subtable_header *entry,
}
static int map_lsapic_id(struct acpi_subtable_header *entry,
- int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id)
+ int device_declaration, u32 acpi_id, phys_cpuid_t *apic_id,
+ bool ignore_disabled)
{
struct acpi_madt_local_sapic *lsapic =
container_of(entry, struct acpi_madt_local_sapic, header);
- if (!(lsapic->lapic_flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(lsapic->lapic_flags & ACPI_MADT_ENABLED))
return -ENODEV;
if (device_declaration) {
@@ -87,12 +89,13 @@ static int map_lsapic_id(struct acpi_subtable_header *entry,
* Retrieve the ARM CPU physical identifier (MPIDR)
*/
static int map_gicc_mpidr(struct acpi_subtable_header *entry,
- int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr)
+ int device_declaration, u32 acpi_id, phys_cpuid_t *mpidr,
+ bool ignore_disabled)
{
struct acpi_madt_generic_interrupt *gicc =
container_of(entry, struct acpi_madt_generic_interrupt, header);
- if (!(gicc->flags & ACPI_MADT_ENABLED))
+ if (ignore_disabled && !(gicc->flags & ACPI_MADT_ENABLED))
return -ENODEV;
/* device_declaration means Device object in DSDT, in the
@@ -108,7 +111,7 @@ static int map_gicc_mpidr(struct acpi_subtable_header *entry,
return -EINVAL;
}
-static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
+static phys_cpuid_t map_madt_entry(int type, u32 acpi_id, bool ignore_disabled)
{
unsigned long madt_end, entry;
phys_cpuid_t phys_id = PHYS_CPUID_INVALID; /* CPU hardware ID */
@@ -128,16 +131,20 @@ static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
struct acpi_subtable_header *header =
(struct acpi_subtable_header *)entry;
if (header->type == ACPI_MADT_TYPE_LOCAL_APIC) {
- if (!map_lapic_id(header, acpi_id, &phys_id))
+ if (!map_lapic_id(header, acpi_id, &phys_id,
+ ignore_disabled))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC) {
- if (!map_x2apic_id(header, type, acpi_id, &phys_id))
+ if (!map_x2apic_id(header, type, acpi_id, &phys_id,
+ ignore_disabled))
break;
} else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC) {
- if (!map_lsapic_id(header, type, acpi_id, &phys_id))
+ if (!map_lsapic_id(header, type, acpi_id, &phys_id,
+ ignore_disabled))
break;
} else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT) {
- if (!map_gicc_mpidr(header, type, acpi_id, &phys_id))
+ if (!map_gicc_mpidr(header, type, acpi_id, &phys_id,
+ ignore_disabled))
break;
}
entry += header->length;
@@ -145,7 +152,8 @@ static phys_cpuid_t map_madt_entry(int type, u32 acpi_id)
return phys_id;
}
-static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id,
+ bool ignore_disabled)
{
struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
union acpi_object *obj;
@@ -166,30 +174,37 @@ static phys_cpuid_t map_mat_entry(acpi_handle handle, int type, u32 acpi_id)
header = (struct acpi_subtable_header *)obj->buffer.pointer;
if (header->type == ACPI_MADT_TYPE_LOCAL_APIC)
- map_lapic_id(header, acpi_id, &phys_id);
+ map_lapic_id(header, acpi_id, &phys_id, ignore_disabled);
else if (header->type == ACPI_MADT_TYPE_LOCAL_SAPIC)
- map_lsapic_id(header, type, acpi_id, &phys_id);
+ map_lsapic_id(header, type, acpi_id, &phys_id, ignore_disabled);
else if (header->type == ACPI_MADT_TYPE_LOCAL_X2APIC)
- map_x2apic_id(header, type, acpi_id, &phys_id);
+ map_x2apic_id(header, type, acpi_id, &phys_id, ignore_disabled);
else if (header->type == ACPI_MADT_TYPE_GENERIC_INTERRUPT)
- map_gicc_mpidr(header, type, acpi_id, &phys_id);
+ map_gicc_mpidr(header, type, acpi_id, &phys_id,
+ ignore_disabled);
exit:
kfree(buffer.pointer);
return phys_id;
}
-phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+static phys_cpuid_t __acpi_get_phys_id(acpi_handle handle, int type,
+ u32 acpi_id, bool ignore_disabled)
{
phys_cpuid_t phys_id;
- phys_id = map_mat_entry(handle, type, acpi_id);
+ phys_id = map_mat_entry(handle, type, acpi_id, ignore_disabled);
if (invalid_phys_cpuid(phys_id))
- phys_id = map_madt_entry(type, acpi_id);
+ phys_id = map_madt_entry(type, acpi_id, ignore_disabled);
return phys_id;
}
+phys_cpuid_t acpi_get_phys_id(acpi_handle handle, int type, u32 acpi_id)
+{
+ return __acpi_get_phys_id(handle, type, acpi_id, true);
+}
+
int acpi_map_cpuid(phys_cpuid_t phys_id, u32 acpi_id)
{
#ifdef CONFIG_SMP
--
1.9.3
^ permalink raw reply related [flat|nested] 9+ messages in thread
Thread overview: 9+ messages
2015-07-01 4:45 [PATCH 0/5] Make cpuid <-> nodeid mapping persistent Tang Chen
2015-07-01 4:45 ` [PATCH 1/5] x86, gfp: Cache best near node for memory allocation Tang Chen
2015-07-01 4:45 ` [PATCH 2/5] x86, acpi, cpu-hotplug: Enable acpi to register all possible cpus at boot time Tang Chen
2015-07-01 4:45 ` [PATCH 3/5] x86, acpi, cpu-hotplug: Introduce apicid_to_cpuid[] array to store persistent cpuid <-> apicid mapping Tang Chen
2015-07-01 4:45 ` [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Tang Chen
2015-07-01 4:45 ` [PATCH 5/5] x86, acpi, cpu-hotplug: Set persistent cpuid <-> nodeid mapping when booting Tang Chen
-- strict thread matches above, loose matches on Subject: below --
2015-07-07 9:30 [PATCH 0/5] Make cpuid <-> nodeid mapping persistent Tang Chen
2015-07-07 9:30 ` [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Tang Chen
2015-07-15 22:06 ` Tejun Heo
2016-01-07 4:20 [PATCH v4 0/5] Make cpuid <-> nodeid mapping persistent Tang Chen
2016-01-07 4:20 ` [PATCH 4/5] x86, acpi, cpu-hotplug: Enable MADT APIs to return disabled apicid Tang Chen