* [PATCHv7 0/4] enable nr_cpus for powerpc
@ 2023-09-25 7:53 Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32 Pingfan Liu
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25 7:53 UTC (permalink / raw)
To: linuxppc-dev
Cc: Baoquan He, Pingfan Liu, Mahesh Salgaonkar, kexec, Ming Lei,
Nicholas Piggin, Wen Xiong
Since v4 [1], the code has changed substantially. The paca[] array has
been reorganized and is now indexed through paca_ptrs[], which greatly
reduces memory consumption even when there are many non-present cpus in
the middle of the range.
However, reordering the logical cpu numbers can further reduce the
size of paca_ptrs[] in the kdump case. So I keep patch [2/4], which
rotate-shifts each cpu's sequence number in the device tree to obtain
its logical cpu id.
Patches [3/4] and [4/4] make it possible to lower nr_cpus to two or
fewer.
[1]: https://lore.kernel.org/linuxppc-dev/1520829790-14029-1-git-send-email-kernelfans@gmail.com/
---
v6 -> v7
Add [1/4], which fixes a compilation error on PPC32
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
Pingfan Liu (4):
powerpc/setup : Enable boot_cpu_hwid for PPC32
powerpc/setup: Loosen the mapping between cpu logical id and its seq
in dt
powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
arch/powerpc/include/asm/smp.h | 2 +-
arch/powerpc/kernel/paca.c | 10 +--
arch/powerpc/kernel/prom.c | 29 +++++---
arch/powerpc/kernel/setup-common.c | 108 +++++++++++++++++++++++------
4 files changed, 114 insertions(+), 35 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32
2023-09-25 7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
@ 2023-09-25 7:53 ` Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt Pingfan Liu
` (2 subsequent siblings)
3 siblings, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25 7:53 UTC (permalink / raw)
To: linuxppc-dev
Cc: Baoquan He, Pingfan Liu, kexec, Mahesh Salgaonkar, Ming Lei,
kernel test robot, Nicholas Piggin, Wen Xiong
In order to identify the boot cpu, its intserv[] entry should be recorded
and checked in smp_setup_cpu_maps().
smp_setup_cpu_maps() is shared between PPC64 and PPC32. PPC64 already
uses boot_cpu_hwid to carry that information, so enable this variable on
PPC32 as well; a later patch in this series uses it there for the same
purpose.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309130232.N2REwHBv-lkp@intel.com/
---
arch/powerpc/include/asm/smp.h | 2 +-
arch/powerpc/kernel/prom.c | 3 +--
arch/powerpc/kernel/setup-common.c | 2 --
3 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index aaaa576d0e15..5db9178cc800 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -26,7 +26,7 @@
#include <asm/percpu.h>
extern int boot_cpuid;
-extern int boot_cpu_hwid; /* PPC64 only */
+extern int boot_cpu_hwid;
extern int spinning_secondaries;
extern u32 *cpu_to_phys_id;
extern bool coregroup_enabled;
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 0b5878c3125b..ec82f5bda908 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -372,8 +372,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
be32_to_cpu(intserv[found_thread]));
boot_cpuid = found;
- if (IS_ENABLED(CONFIG_PPC64))
- boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
+ boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
/*
* PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index d2a446216444..1b19a9815672 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -87,9 +87,7 @@ EXPORT_SYMBOL(machine_id);
int boot_cpuid = -1;
EXPORT_SYMBOL_GPL(boot_cpuid);
-#ifdef CONFIG_PPC64
int boot_cpu_hwid = -1;
-#endif
/*
* These are used in binfmt_elf.c to put aux entries on the stack
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
2023-09-25 7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32 Pingfan Liu
@ 2023-09-25 7:53 ` Pingfan Liu
2023-09-28 20:36 ` Wen Xiong
2023-09-25 7:53 ` [PATCHv7 3/4] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid Pingfan Liu
3 siblings, 1 reply; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25 7:53 UTC (permalink / raw)
To: linuxppc-dev
Cc: Baoquan He, Pingfan Liu, kexec, Mahesh Salgaonkar, Ming Lei,
Nicholas Piggin, Wen Xiong
*** Idea ***
For kexec -p, the boot cpu may not be cpu0, which causes a problem when
allocating memory for paca_ptrs[]. In theory, nothing requires a cpu's
logical id to follow its order of appearance in the device tree.
However, helpers such as cpu_first_thread_sibling() make assumptions
about the mapping inside a core. Hence this patch loosens the mapping
only partially: it unbinds the mapping between cores while keeping the
mapping inside a core.
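To illustrate the in-core assumption: a helper in the style of
cpu_first_thread_sibling() finds the first thread of a core by masking
the low bits of the logical id, which only works if the threads of one
core occupy an aligned, contiguous range of logical ids. A minimal
sketch, assuming threads_per_core is a power of two;
first_thread_sibling() and its parameters are illustrative names, not
the kernel's definitions:

```c
#include <assert.h>

/*
 * Illustrative stand-in for a cpu_first_thread_sibling()-style helper.
 * Assumption: threads_per_core is a power of two and the threads of a
 * core occupy an aligned, contiguous block of logical ids.
 */
static int first_thread_sibling(int cpu, int threads_per_core)
{
	assert(threads_per_core > 0 &&
	       (threads_per_core & (threads_per_core - 1)) == 0);

	/* Clear the in-core thread bits to get the core's first thread. */
	return cpu & ~(threads_per_core - 1);
}
```

If logical ids were reassigned thread by thread rather than core by
core, the mask would no longer land on a thread of the same core, which
is why only whole cores are moved.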
*** Implement ***
At this early stage, there is plenty of memory to use. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorders the cpus by changing the list head. As a result, there is a
rotate shift between the sequence number in the dt and the cpu logical
number.
*** Result ***
After this patch, the boot cpu's logical id is always mapped into the
range [0,threads_per_core).
Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.
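The effect of the rotation can be sketched as a pure function from a
thread's device-tree sequence number to its logical id. This is only an
illustration under simplifying assumptions: every core has the same
number of threads, and nr_cpus does not truncate the list.
dt_seq_to_logical() and its parameters are hypothetical names, not part
of the patch:

```c
#include <assert.h>

/*
 * Map a thread's sequence number in the device tree to a logical cpu id
 * after rotating the core list so that the boot core comes first.
 * Assumptions for this sketch: every core has tpc threads, and nr_cpus
 * does not truncate the list.
 */
static int dt_seq_to_logical(int dt_seq, int boot_dt_seq, int tpc, int ncores)
{
	int total = tpc * ncores;
	int shift;

	assert(tpc > 0 && ncores > 0);
	assert(dt_seq >= 0 && dt_seq < total);
	assert(boot_dt_seq >= 0 && boot_dt_seq < total);

	/* The shift is a whole number of cores, so in-core order is kept. */
	shift = (boot_dt_seq / tpc) * tpc;
	return (dt_seq - shift + total) % total;
}
```

With 4 cores of 8 threads and the boot cpu at dt sequence 14, the shift
is 8: the boot cpu gets logical id 6, its primary thread gets logical
id 0, and the core that was first in the dt moves to the end.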
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
arch/powerpc/kernel/prom.c | 25 +++++----
arch/powerpc/kernel/setup-common.c | 87 +++++++++++++++++++++++-------
2 files changed, 85 insertions(+), 27 deletions(-)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..87272a2d8c10 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
unsigned int boot_cpu_node_count __ro_after_init;
#endif
static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
static int __initdata boot_cpu_count;
+#endif
static int __init early_parse_mem(char *p)
{
@@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
const __be32 *intserv;
int i, nthreads;
int len;
- int found = -1;
- int found_thread = 0;
+ bool found = false;
/* We are scanning "cpu" nodes only */
if (type == NULL || strcmp(type, "cpu") != 0)
@@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
for (i = 0; i < nthreads; i++) {
if (be32_to_cpu(intserv[i]) ==
fdt_boot_cpuid_phys(initial_boot_params)) {
- found = boot_cpu_count;
- found_thread = i;
+ /*
+ * always map the boot-cpu logical id into the
+ * range of [0, thread_per_core)
+ */
+ boot_cpuid = i;
+ found = true;
+ /* This works around the hole in paca_ptrs[]. */
+ if (nr_cpu_ids < nthreads)
+ set_nr_cpu_ids(nthreads);
}
#ifdef CONFIG_SMP
/* logical cpu id is always 0 on UP kernels */
@@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
}
/* Not the boot CPU */
- if (found < 0)
+ if (!found)
return 0;
- DBG("boot cpu: logical %d physical %d\n", found,
- be32_to_cpu(intserv[found_thread]));
- boot_cpuid = found;
+ DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
+ be32_to_cpu(intserv[boot_cpuid]));
- boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
+ boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
/*
* PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 1b19a9815672..f6d32324b5a5 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -36,6 +36,7 @@
#include <linux/of_platform.h>
#include <linux/hugetlb.h>
#include <linux/pgtable.h>
+#include <linux/list.h>
#include <asm/io.h>
#include <asm/paca.h>
#include <asm/processor.h>
@@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
u32 *cpu_to_phys_id = NULL;
+struct interrupt_server_node {
+ struct list_head node;
+ bool avail;
+ int len;
+ __be32 *intserv;
+};
+
/**
* setup_cpu_maps - initialize the following cpu maps:
* cpu_possible_mask
@@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
void __init smp_setup_cpu_maps(void)
{
struct device_node *dn;
- int cpu = 0;
- int nthreads = 1;
+ int shift = 0, cpu = 0;
+ int j, nthreads = 1;
+ int len;
+ struct interrupt_server_node *intserv_node, *n;
+ struct list_head *bt_node, head;
+ bool avail, found_boot_cpu = false;
DBG("smp_setup_cpu_maps()\n");
+ INIT_LIST_HEAD(&head);
cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
__alignof__(u32));
if (!cpu_to_phys_id)
@@ -460,7 +473,6 @@ void __init smp_setup_cpu_maps(void)
for_each_node_by_type(dn, "cpu") {
const __be32 *intserv;
__be32 cpu_be;
- int j, len;
DBG(" * %pOF...\n", dn);
@@ -480,29 +492,68 @@ void __init smp_setup_cpu_maps(void)
}
}
- nthreads = len / sizeof(int);
-
- for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
- bool avail;
+ avail = of_device_is_available(dn);
+ if (!avail)
+ avail = !of_property_match_string(dn,
+ "enable-method", "spin-table");
- DBG(" thread %d -> cpu %d (hard id %d)\n",
- j, cpu, be32_to_cpu(intserv[j]));
- avail = of_device_is_available(dn);
- if (!avail)
- avail = !of_property_match_string(dn,
- "enable-method", "spin-table");
+ intserv_node = memblock_alloc(sizeof(struct interrupt_server_node) + len,
+ __alignof__(u32));
+ if (!intserv_node)
+ panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
+ __func__,
+ sizeof(struct interrupt_server_node) + len,
+ __alignof__(u32));
+ intserv_node->intserv = (__be32 *)((char *)intserv_node +
+ sizeof(struct interrupt_server_node));
+ intserv_node->len = len;
+ memcpy(intserv_node->intserv, intserv, len);
+ intserv_node->avail = avail;
+ INIT_LIST_HEAD(&intserv_node->node);
+ list_add_tail(&intserv_node->node, &head);
+
+ if (!found_boot_cpu) {
+ nthreads = len / sizeof(int);
+ for (j = 0 ; j < nthreads; j++) {
+ if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
+ bt_node = &intserv_node->node;
+ found_boot_cpu = true;
+ /*
+ * Record the round-shift between dt
+ * seq and cpu logical number
+ */
+ shift = cpu - j;
+ break;
+ }
+
+ cpu++;
+ }
+ }
+ }
+ cpu = 0;
+ list_del_init(&head);
+ /* Select the primary thread, the boot cpu's slibing, as the logic 0 */
+ list_add_tail(&head, bt_node);
+ pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
+ list_for_each_entry(intserv_node, &head, node) {
+
+ avail = intserv_node->avail;
+ nthreads = intserv_node->len / sizeof(int);
+ for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
set_cpu_present(cpu, avail);
set_cpu_possible(cpu, true);
- cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
+ cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
+ DBG(" thread %d -> cpu %d (hard id %d)\n",
+ j, cpu, be32_to_cpu(intserv[j]));
cpu++;
}
+ }
- if (cpu >= nr_cpu_ids) {
- of_node_put(dn);
- break;
- }
+ list_for_each_entry_safe(intserv_node, n, &head, node) {
+ len = sizeof(struct interrupt_server_node) + intserv_node->len;
+ memblock_free(intserv_node, len);
}
/* If no SMT supported, nthreads is forced to 1 */
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCHv7 3/4] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
2023-09-25 7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32 Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt Pingfan Liu
@ 2023-09-25 7:53 ` Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid Pingfan Liu
3 siblings, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25 7:53 UTC (permalink / raw)
To: linuxppc-dev
Cc: Baoquan He, Pingfan Liu, kexec, Mahesh Salgaonkar, Ming Lei,
Nicholas Piggin, Wen Xiong
If boot_cpuid is not smaller than nr_cpus, extra effort is required to
ensure that the boot cpu is in cpu_present_mask. This is achieved by
reserving the last slot for the boot cpu.
Note: the restriction on nr_cpus will be lifted with more effort in the
next patch.
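The reservation can be sketched for the boot core alone as follows. This
is a simplified model of the intent, not the patch's code: it returns a
bitmask of the thread ids in the boot core that end up marked present.
boot_core_present_mask() is a hypothetical helper, and the sketch
assumes at most 32 threads per core:

```c
#include <assert.h>

/*
 * Return a bitmask of the boot core's thread ids that get marked present.
 * Hypothetical helper for illustration; assumes tpc <= 32.
 */
static unsigned int boot_core_present_mask(int nr_cpu_ids, int bt_thread,
					   int tpc)
{
	unsigned int mask = 0;
	int terminate = nr_cpu_ids;
	int j;

	assert(nr_cpu_ids >= 1 && tpc >= 1 && tpc <= 32);
	assert(bt_thread >= 0 && bt_thread < tpc);

	/* Boot thread is beyond the limit: keep the last slot for it. */
	if (bt_thread > nr_cpu_ids - 1)
		terminate = nr_cpu_ids - 1;

	/* Threads are still handed out in order, starting from thread 0. */
	for (j = 0; j < tpc && j < terminate; j++)
		mask |= 1u << j;

	/* The reserved slot goes to the boot thread itself. */
	if (bt_thread > nr_cpu_ids - 1)
		mask |= 1u << bt_thread;

	return mask;
}
```

For nr_cpu_ids = 2 and a boot thread id of 6, threads 0 and 6 are marked
present, so the total stays within the limit while the boot cpu is
covered.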
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
arch/powerpc/kernel/setup-common.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index f6d32324b5a5..a72d00a6cff2 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -454,8 +454,8 @@ struct interrupt_server_node {
void __init smp_setup_cpu_maps(void)
{
struct device_node *dn;
- int shift = 0, cpu = 0;
- int j, nthreads = 1;
+ int terminate, shift = 0, cpu = 0;
+ int j, bt_thread = 0, nthreads = 1;
int len;
struct interrupt_server_node *intserv_node, *n;
struct list_head *bt_node, head;
@@ -518,6 +518,7 @@ void __init smp_setup_cpu_maps(void)
for (j = 0 ; j < nthreads; j++) {
if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
bt_node = &intserv_node->node;
+ bt_thread = j;
found_boot_cpu = true;
/*
* Record the round-shift between dt
@@ -537,11 +538,21 @@ void __init smp_setup_cpu_maps(void)
/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
list_add_tail(&head, bt_node);
pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
+ terminate = nr_cpu_ids;
list_for_each_entry(intserv_node, &head, node) {
+ j = 0;
+ /* Choose a start point to cover the boot cpu */
+ if (nr_cpu_ids - 1 < bt_thread) {
+ /*
+ * The processor core puts assumption on the thread id,
+ * not to breach the assumption.
+ */
+ terminate = nr_cpu_ids - 1;
+ }
avail = intserv_node->avail;
nthreads = intserv_node->len / sizeof(int);
- for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
+ for (; j < nthreads && cpu < terminate; j++) {
set_cpu_present(cpu, avail);
set_cpu_possible(cpu, true);
cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
@@ -549,6 +560,14 @@ void __init smp_setup_cpu_maps(void)
j, cpu, be32_to_cpu(intserv[j]));
cpu++;
}
+ /* Online the boot cpu */
+ if (nr_cpu_ids - 1 < bt_thread) {
+ set_cpu_present(bt_thread, avail);
+ set_cpu_possible(bt_thread, true);
+ cpu_to_phys_id[bt_thread] = be32_to_cpu(intserv_node->intserv[bt_thread]);
+ DBG(" thread %d -> cpu %d (hard id %d)\n",
+ bt_thread, bt_thread, be32_to_cpu(intserv[bt_thread]));
+ }
}
list_for_each_entry_safe(intserv_node, n, &head, node) {
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
2023-09-25 7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
` (2 preceding siblings ...)
2023-09-25 7:53 ` [PATCHv7 3/4] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus Pingfan Liu
@ 2023-09-25 7:53 ` Pingfan Liu
2023-10-03 18:06 ` Mahesh J Salgaonkar
3 siblings, 1 reply; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25 7:53 UTC (permalink / raw)
To: linuxppc-dev
Cc: Baoquan He, Pingfan Liu, kexec, Mahesh Salgaonkar, Ming Lei,
Nicholas Piggin, Wen Xiong
paca_ptrs[] should be large enough to hold boot_cpuid; hence its lower
bound is set to the larger of boot_cpuid + 1 and nr_cpus.
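The sizing rule can be written as a small standalone sketch.
paca_slots() and paca_ptrs_bytes() are illustrative names, and
sizeof(void *) stands in for sizeof(struct paca_struct *):

```c
#include <assert.h>
#include <stddef.h>

/* Number of paca_ptrs[] slots: enough for nr_cpu_ids and for boot_cpuid. */
static int paca_slots(int boot_cpuid, int nr_cpu_ids)
{
	assert(boot_cpuid >= 0 && nr_cpu_ids >= 1);
	return (boot_cpuid + 1 > nr_cpu_ids) ? boot_cpuid + 1 : nr_cpu_ids;
}

/* Size in bytes of the pointer array for that many slots. */
static size_t paca_ptrs_bytes(int boot_cpuid, int nr_cpu_ids)
{
	return sizeof(void *) * (size_t)paca_slots(boot_cpuid, nr_cpu_ids);
}
```

For example, boot_cpuid = 6 with nr_cpu_ids = 2 needs 7 slots, so that
paca_ptrs[6] is a valid index.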
On the other hand, some kernel components make assumptions about cpu0
or the primary thread:
1. The timer code assumes cpu0 is online, since the timer_list->flags
subfield 'TIMER_CPUMASK' is zero unless it is initialized to a proper
present cpu.
2. power9_idle_stop() assumes the primary thread's paca is allocated.
Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined if the
boot cpu is not cpu0.
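The adjustment amounts to the following rule, shown here as a
standalone sketch. effective_nr_cpu_ids() is a hypothetical name; the
patch itself applies the change through set_nr_cpu_ids():

```c
#include <assert.h>

/*
 * Hypothetical helper: the nr_cpu_ids value used after the adjustment.
 * When the boot cpu is not logical cpu0, at least two ids are needed so
 * that both cpu0 and the boot cpu can be brought up.
 */
static int effective_nr_cpu_ids(int boot_cpuid, int nr_cpu_ids)
{
	assert(boot_cpuid >= 0 && nr_cpu_ids >= 1);

	if (boot_cpuid != 0 && nr_cpu_ids < 2)
		return 2;
	return nr_cpu_ids;
}
```

With nr_cpus=1, a boot cpu whose logical id is non-zero yields two
cpus, and a boot cpu that is already logical cpu0 yields one.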
Result:
With nr_cpus=1, after taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger',
the kdump kernel brings up two cpus.
Whereas after taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger',
the kdump kernel brings up one cpu.
Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
arch/powerpc/kernel/paca.c | 10 ++++++----
arch/powerpc/kernel/prom.c | 9 ++++++---
2 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index cda4e00b67c1..91e2401de1bd 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -242,9 +242,10 @@ static int __initdata paca_struct_size;
void __init allocate_paca_ptrs(void)
{
- paca_nr_cpu_ids = nr_cpu_ids;
+ int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
- paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+ paca_nr_cpu_ids = n;
+ paca_ptrs_size = sizeof(struct paca_struct *) * n;
paca_ptrs = memblock_alloc_raw(paca_ptrs_size, SMP_CACHE_BYTES);
if (!paca_ptrs)
panic("Failed to allocate %d bytes for paca pointers\n",
@@ -287,13 +288,14 @@ void __init allocate_paca(int cpu)
void __init free_unused_pacas(void)
{
int new_ptrs_size;
+ int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
- new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+ new_ptrs_size = sizeof(struct paca_struct *) * n;
if (new_ptrs_size < paca_ptrs_size)
memblock_phys_free(__pa(paca_ptrs) + new_ptrs_size,
paca_ptrs_size - new_ptrs_size);
- paca_nr_cpu_ids = nr_cpu_ids;
+ paca_nr_cpu_ids = n;
paca_ptrs_size = new_ptrs_size;
#ifdef CONFIG_PPC_64S_HASH_MMU
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 87272a2d8c10..15c994f54bf9 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -362,9 +362,12 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
*/
boot_cpuid = i;
found = true;
- /* This works around the hole in paca_ptrs[]. */
- if (nr_cpu_ids < nthreads)
- set_nr_cpu_ids(nthreads);
+ /*
+ * Ideally, nr_cpus=1 can be achieved if each kernel
+ * component does not assume cpu0 is onlined.
+ */
+ if (boot_cpuid != 0 && nr_cpu_ids < 2)
+ set_nr_cpu_ids(2);
}
#ifdef CONFIG_SMP
/* logical cpu id is always 0 on UP kernels */
--
2.31.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* RE: [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
2023-09-25 7:53 ` [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt Pingfan Liu
@ 2023-09-28 20:36 ` Wen Xiong
2023-09-29 3:19 ` Pingfan Liu
2023-09-29 6:43 ` Christophe Leroy
0 siblings, 2 replies; 10+ messages in thread
From: Wen Xiong @ 2023-09-28 20:36 UTC (permalink / raw)
To: Pingfan Liu, linuxppc-dev@lists.ozlabs.org
Cc: Pingfan Liu, Baoquan He, kexec@lists.infradead.org,
Mahesh Salgaonkar, Ming Lei, Nicholas Piggin
Hi Pingfan,
+ avail = intserv_node->avail;
+ nthreads = intserv_node->len / sizeof(int);
+ for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
set_cpu_present(cpu, avail);
set_cpu_possible(cpu, true);
- cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
+ cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
+ DBG(" thread %d -> cpu %d (hard id %d)\n",
+ j, cpu, be32_to_cpu(intserv[j]));
intserv is not defined here. Should it be "be32_to_cpu(intserv_node->intserv[j])"?
cpu++;
}
+ }
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
2023-09-28 20:36 ` Wen Xiong
@ 2023-09-29 3:19 ` Pingfan Liu
2023-09-29 6:43 ` Christophe Leroy
1 sibling, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-09-29 3:19 UTC (permalink / raw)
To: Wen Xiong
Cc: Baoquan He, kexec@lists.infradead.org, Mahesh Salgaonkar,
Ming Lei, Nicholas Piggin, linuxppc-dev@lists.ozlabs.org
On Fri, Sep 29, 2023 at 4:36 AM Wen Xiong <wenxiong@us.ibm.com> wrote:
>
> Hi Pingfan,
>
> + avail = intserv_node->avail;
> + nthreads = intserv_node->len / sizeof(int);
> + for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
> set_cpu_present(cpu, avail);
> set_cpu_possible(cpu, true);
> - cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
> + cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
> + DBG(" thread %d -> cpu %d (hard id %d)\n",
> + j, cpu, be32_to_cpu(intserv[j]));
>
> intserv is not defined here. Should it be "be32_to_cpu(intserv_node->intserv[j])"?
Yes, thanks. Sorry, I did not enable the DBG macro, so I did not catch this bug.
Thanks,
Pingfan
> cpu++;
> }
> + }
>
> -----Original Message-----
> From: Pingfan Liu <piliu@redhat.com>
> Sent: Monday, September 25, 2023 2:54 AM
> To: linuxppc-dev@lists.ozlabs.org
> Cc: Pingfan Liu <piliu@redhat.com>; Michael Ellerman <mpe@ellerman.id.au>; Nicholas Piggin <npiggin@gmail.com>; Christophe Leroy <christophe.leroy@csgroup.eu>; Mahesh Salgaonkar <mahesh@linux.ibm.com>; Wen Xiong <wenxiong@us.ibm.com>; Baoquan He <bhe@redhat.com>; Ming Lei <ming.lei@redhat.com>; kexec@lists.infradead.org
> Subject: [EXTERNAL] [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
>
> *** Idea ***
> For kexec -p, the boot cpu can be not the cpu0, this causes the problem of allocating memory for paca_ptrs[]. However, in theory, there is no requirement to assign cpu's logical id as its present sequence in the device tree. But there is something like cpu_first_thread_sibling(), which makes assumption on the mapping inside a core. Hence partially loosening the mapping, i.e. unbind the mapping of core while keep the mapping inside a core.
>
> *** Implement ***
> At this early stage, there are plenty of memory to utilize. Hence, this patch allocates interim memory to link the cpu info on a list, then reorder cpus by changing the list head. As a result, there is a rotate shift between the sequence number in dt and the cpu logical number.
>
> *** Result ***
> After this patch, a boot-cpu's logical id will always be mapped into the range [0,threads_per_core).
>
> Besides this, at this phase, all threads in the boot core are forced to be onlined. This restriction will be lifted in a later patch with extra effort.
>
> Signed-off-by: Pingfan Liu <piliu@redhat.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Wen Xiong <wenxiong@us.ibm.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: kexec@lists.infradead.org
> To: linuxppc-dev@lists.ozlabs.org
> ---
> arch/powerpc/kernel/prom.c | 25 +++++----
> arch/powerpc/kernel/setup-common.c | 87 +++++++++++++++++++++++-------
> 2 files changed, 85 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index ec82f5bda908..87272a2d8c10 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -76,7 +76,9 @@ u64 ppc64_rma_size;
> unsigned int boot_cpu_node_count __ro_after_init;
> #endif
> static phys_addr_t first_memblock_size;
> +#ifdef CONFIG_SMP
> static int __initdata boot_cpu_count;
> +#endif
>
> static int __init early_parse_mem(char *p)
> {
> @@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
> const __be32 *intserv;
> int i, nthreads;
> int len;
> - int found = -1;
> - int found_thread = 0;
> + bool found = false;
>
> /* We are scanning "cpu" nodes only */
> if (type == NULL || strcmp(type, "cpu") != 0)
> @@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
> for (i = 0; i < nthreads; i++) {
> if (be32_to_cpu(intserv[i]) ==
> fdt_boot_cpuid_phys(initial_boot_params)) {
> - found = boot_cpu_count;
> - found_thread = i;
> + /*
> + * always map the boot-cpu logical id into the
> + * range of [0, thread_per_core)
> + */
> + boot_cpuid = i;
> + found = true;
> + /* This works around the hole in paca_ptrs[]. */
> + if (nr_cpu_ids < nthreads)
> + set_nr_cpu_ids(nthreads);
> }
> #ifdef CONFIG_SMP
> /* logical cpu id is always 0 on UP kernels */
> @@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
> }
>
> /* Not the boot CPU */
> - if (found < 0)
> + if (!found)
> return 0;
>
> - DBG("boot cpu: logical %d physical %d\n", found,
> - be32_to_cpu(intserv[found_thread]));
> - boot_cpuid = found;
> + DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
> + be32_to_cpu(intserv[boot_cpuid]));
>
> - boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
> + boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
>
> /*
> * PAPR defines "logical" PVR values for cpus that
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 1b19a9815672..f6d32324b5a5 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -36,6 +36,7 @@
> #include <linux/of_platform.h>
> #include <linux/hugetlb.h>
> #include <linux/pgtable.h>
> +#include <linux/list.h>
> #include <asm/io.h>
> #include <asm/paca.h>
> #include <asm/processor.h>
> @@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
>
> u32 *cpu_to_phys_id = NULL;
>
> +struct interrupt_server_node {
> + struct list_head node;
> + bool avail;
> + int len;
> + __be32 *intserv;
> +};
> +
> /**
> * setup_cpu_maps - initialize the following cpu maps:
> * cpu_possible_mask
> @@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
> void __init smp_setup_cpu_maps(void)
> {
> struct device_node *dn;
> - int cpu = 0;
> - int nthreads = 1;
> + int shift = 0, cpu = 0;
> + int j, nthreads = 1;
> + int len;
> + struct interrupt_server_node *intserv_node, *n;
> + struct list_head *bt_node, head;
> + bool avail, found_boot_cpu = false;
>
> DBG("smp_setup_cpu_maps()\n");
>
> + INIT_LIST_HEAD(&head);
> cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
> __alignof__(u32));
> if (!cpu_to_phys_id)
> @@ -460,7 +473,6 @@ void __init smp_setup_cpu_maps(void)
> for_each_node_by_type(dn, "cpu") {
> const __be32 *intserv;
> __be32 cpu_be;
> - int j, len;
>
> DBG(" * %pOF...\n", dn);
>
> @@ -480,29 +492,68 @@ void __init smp_setup_cpu_maps(void)
> }
> }
>
> - nthreads = len / sizeof(int);
> -
> - for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
> - bool avail;
> + avail = of_device_is_available(dn);
> + if (!avail)
> + avail = !of_property_match_string(dn,
> + "enable-method", "spin-table");
>
> - DBG(" thread %d -> cpu %d (hard id %d)\n",
> - j, cpu, be32_to_cpu(intserv[j]));
>
> - avail = of_device_is_available(dn);
> - if (!avail)
> - avail = !of_property_match_string(dn,
> - "enable-method", "spin-table");
> + intserv_node = memblock_alloc(sizeof(struct interrupt_server_node) + len,
> + __alignof__(u32));
> + if (!intserv_node)
> + panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> + __func__,
> + sizeof(struct interrupt_server_node) + len,
> + __alignof__(u32));
> + intserv_node->intserv = (__be32 *)((char *)intserv_node +
> + sizeof(struct interrupt_server_node));
> + intserv_node->len = len;
> + memcpy(intserv_node->intserv, intserv, len);
> + intserv_node->avail = avail;
> + INIT_LIST_HEAD(&intserv_node->node);
> + list_add_tail(&intserv_node->node, &head);
> +
> + if (!found_boot_cpu) {
> + nthreads = len / sizeof(int);
> + for (j = 0 ; j < nthreads; j++) {
> + if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
> + bt_node = &intserv_node->node;
> + found_boot_cpu = true;
> + /*
> + * Record the round-shift between dt
> + * seq and cpu logical number
> + */
> + shift = cpu - j;
> + break;
> + }
> +
> + cpu++;
> + }
> + }
>
> + }
> + cpu = 0;
> + list_del_init(&head);
> + /* Select the primary thread, the boot cpu's sibling, as logical cpu 0 */
> + list_add_tail(&head, bt_node);
> + pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
> + list_for_each_entry(intserv_node, &head, node) {
> +
> + avail = intserv_node->avail;
> + nthreads = intserv_node->len / sizeof(int);
> + for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
> set_cpu_present(cpu, avail);
> set_cpu_possible(cpu, true);
> - cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
> + cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
> + DBG(" thread %d -> cpu %d (hard id %d)\n",
> + j, cpu, be32_to_cpu(intserv[j]));
> cpu++;
> }
> + }
>
> - if (cpu >= nr_cpu_ids) {
> - of_node_put(dn);
> - break;
> - }
> + list_for_each_entry_safe(intserv_node, n, &head, node) {
> + len = sizeof(struct interrupt_server_node) + intserv_node->len;
> + memblock_free(intserv_node, len);
> }
>
> /* If no SMT supported, nthreads is forced to 1 */
> --
> 2.31.1
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
2023-09-28 20:36 ` Wen Xiong
2023-09-29 3:19 ` Pingfan Liu
@ 2023-09-29 6:43 ` Christophe Leroy
1 sibling, 0 replies; 10+ messages in thread
From: Christophe Leroy @ 2023-09-29 6:43 UTC (permalink / raw)
To: Wen Xiong, Pingfan Liu, linuxppc-dev@lists.ozlabs.org
Cc: Baoquan He, kexec@lists.infradead.org, Mahesh Salgaonkar,
Nicholas Piggin, Ming Lei
On 28/09/2023 at 22:36, Wen Xiong wrote:
> Hi Pingfan,
>
> + avail = intserv_node->avail;
> + nthreads = intserv_node->len / sizeof(int);
> + for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
> set_cpu_present(cpu, avail);
> set_cpu_possible(cpu, true);
> - cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
> + cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
> + DBG(" thread %d -> cpu %d (hard id %d)\n",
> + j, cpu, be32_to_cpu(intserv[j]));
>
> intserv is not defined here. Should it be "be32_to_cpu(intserv_node->intserv[j])"?
> cpu++;
> }
> + }
Please don't top-post; see
https://docs.kernel.org/process/submitting-patches.html#use-trimmed-interleaved-replies-in-email-discussions
Make comments inside the patch directly, making sure that your mail
client is properly configured to add the standard > in front of all
lines of the quoted mail.
Christophe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
2023-09-25 7:53 ` [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid Pingfan Liu
@ 2023-10-03 18:06 ` Mahesh J Salgaonkar
2023-10-07 1:03 ` Pingfan Liu
0 siblings, 1 reply; 10+ messages in thread
From: Mahesh J Salgaonkar @ 2023-10-03 18:06 UTC (permalink / raw)
To: Pingfan Liu
Cc: Baoquan He, kexec, Ming Lei, Nicholas Piggin, linuxppc-dev,
Wen Xiong
On 2023-09-25 15:53:48 Mon, Pingfan Liu wrote:
> paca_ptrs should be large enough to hold the boot_cpuid, hence, its
> lower boundary is set to the bigger one between boot_cpuid+1 and
> nr_cpus.
>
> On the other hand, some kernel components make assumptions:
> 1. the timer code assumes cpu0 is online, since the timer_list->flags
>    subfield 'TIMER_CPUMASK' is zero unless it is initialized to a
>    proper present cpu;
> 2. power9_idle_stop() assumes the primary thread's paca is allocated.
>
> Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the
> boot cpu is not cpu0.
>
> Result:
> With nr_cpus=1, after taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger'
> the kdump kernel brings up two cpus,
> while after taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
> the kdump kernel brings up one cpu.
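
The sizing rule in the commit message above can be sketched as plain
arithmetic. This is a standalone model, not kernel code: struct boot_plan
and plan_boot() are hypothetical names, and boot_cpuid is taken to be the
boot thread's index inside its core, as patch 2/4 arranges.

```c
/* Hypothetical result record; only the arithmetic follows the commit message. */
struct boot_plan {
	int boot_cpuid;		/* boot thread's index inside its core */
	int nr_cpu_ids;		/* nr_cpus= request, possibly lifted to 2 */
	int paca_slots;		/* entries needed in paca_ptrs[] */
};

struct boot_plan plan_boot(int boot_cpuid, int nr_cpus_requested)
{
	struct boot_plan p;

	p.boot_cpuid = boot_cpuid;
	p.nr_cpu_ids = nr_cpus_requested;

	/* Keep cpu0 around whenever the boot cpu is not cpu0. */
	if (p.boot_cpuid != 0 && p.nr_cpu_ids < 2)
		p.nr_cpu_ids = 2;

	/* paca_ptrs[] must be able to hold index boot_cpuid. */
	p.paca_slots = (p.boot_cpuid + 1 > p.nr_cpu_ids) ?
			p.boot_cpuid + 1 : p.nr_cpu_ids;
	return p;
}
```

For boot_cpuid = 2 and nr_cpus=1, the model gives nr_cpu_ids = 2 and three
paca_ptrs[] slots; for boot_cpuid = 0 it stays at one cpu and one slot.
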
I tried your changes on power9 and power10 systems. However, on a power10 lpar I
see the backtrace below during kdump kernel bootup with nr_cpus=1.
$ taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
[...]
[ 0.000000] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
[ 0.000000] printk: bootconsole [udbg0] enabled
[ 0.000000] the round shift between dt seq and the cpu logic number: 8
[ 0.000000] Partition configured for 16 cpus, operating system maximum is 2.
[ 0.000000] CPU maps initialized for 8 threads per core
[...]
[ 0.002249] BUG: Unable to handle kernel data access at 0x88888888888888c0
[ 0.002260] Faulting instruction address: 0xc00000001201226c
[ 0.002268] Oops: Kernel access of bad area, sig: 11 [#1]
[ 0.002274] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 0.002282] Modules linked in:
[ 0.002288] CPU: 4 PID: 1 Comm: swapper/4 Not tainted 6.6.0-rc4 #1
[ 0.002296] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
[ 0.002305] NIP: c00000001201226c LR: c000000012012234 CTR: 0000000000000004
[ 0.002312] REGS: c0000000167ff8f0 TRAP: 0380 Not tainted (6.6.0-rc4)
[ 0.002321] MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 24000844 XER: 0000000a
[ 0.002346] CFAR: c00000001201231c IRQMASK: 0
[ 0.002346] GPR00: c000000012012234 c0000000167ffb90 c000000011b61900 0000000000000002
[ 0.002346] GPR04: 0000000000000000 0000000000000001 0000000000000001 c00000004ffeff80
[ 0.002346] GPR08: 0000000000000000 8888888888888888 0000000000000002 0000000000000000
[ 0.002346] GPR12: 0000000000000000 c000000013141000 c000000010011058 0000000000000000
[ 0.002346] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.002346] GPR20: 0000000000000028 c000000012170968 c0000000120a3e80 0000000000000016
[ 0.002346] GPR24: c00000004ffdcfd0 0000000000000000 c000000012b82058 0000000000000000
[ 0.002346] GPR28: c00000004fc80a68 c000000012bf0350 c0000000120a3e2c 0000000000000000
[ 0.002426] NIP [c00000001201226c] update_mask_from_threadgroup+0x98/0x174
[ 0.002437] LR [c000000012012234] update_mask_from_threadgroup+0x60/0x174
[ 0.002444] Call Trace:
[ 0.002451] [c0000000167ffb90] [c000000012012234] update_mask_from_threadgroup+0x60/0x174 (unreliable)
[ 0.002464] [c0000000167ffbe0] [c0000000120125f8] init_thread_group_cache_map+0x2b0/0x328
[ 0.002477] [c0000000167ffc50] [c00000001201296c] smp_prepare_cpus+0x2fc/0x4f0
[ 0.002497] [c0000000167ffd10] [c000000012004e40] kernel_init_freeable+0x198/0x3cc
[ 0.002509] [c0000000167ffde0] [c000000010011084] kernel_init+0x34/0x1b0
[ 0.002531] [c0000000167ffe50] [c00000001000dd3c] ret_from_kernel_user_thread+0x14/0x1c
[ 0.002547] --- interrupt: 0 at 0x0
[ 0.002553] NIP: 0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
[ 0.002563] REGS: c0000000167ffe80 TRAP: 0000 Not tainted (6.6.0-rc4)
[ 0.002569] MSR: 0000000000000000 <> CR: 00000000 XER: 00000000
[ 0.002576] CFAR: 0000000000000000 IRQMASK: 0
[ 0.002576] GPR00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.002576] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.002576] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.002576] GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.002576] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.002576] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.002576] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.002576] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.002671] NIP [0000000000000000] 0x0
[ 0.002680] LR [0000000000000000] 0x0
[ 0.002689] --- interrupt: 0
[ 0.002697] Code: 7feafb78 813d0000 7d29fa14 7f895000 409d00d4 3ce20102 38e74758 79491f24 e87e0006 39000000 e8e70000 7d27482a <a8890038> 7f834000 79090020 419e005c
[ 0.002727] ---[ end trace 0000000000000000 ]---
[ 0.002739]
[ 1.002749] Kernel panic - not syncing: Fatal exception
[ 1.002795] Rebooting in 10 seconds..
Thanks,
-Mahesh.
>
> Signed-off-by: Pingfan Liu <piliu@redhat.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Wen Xiong <wenxiong@us.ibm.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: kexec@lists.infradead.org
> To: linuxppc-dev@lists.ozlabs.org
> ---
> arch/powerpc/kernel/paca.c | 10 ++++++----
> arch/powerpc/kernel/prom.c | 9 ++++++---
> 2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> index cda4e00b67c1..91e2401de1bd 100644
> --- a/arch/powerpc/kernel/paca.c
> +++ b/arch/powerpc/kernel/paca.c
> @@ -242,9 +242,10 @@ static int __initdata paca_struct_size;
>
> void __init allocate_paca_ptrs(void)
> {
> - paca_nr_cpu_ids = nr_cpu_ids;
> + int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
>
> - paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
> + paca_nr_cpu_ids = n;
> + paca_ptrs_size = sizeof(struct paca_struct *) * n;
> paca_ptrs = memblock_alloc_raw(paca_ptrs_size, SMP_CACHE_BYTES);
> if (!paca_ptrs)
> panic("Failed to allocate %d bytes for paca pointers\n",
> @@ -287,13 +288,14 @@ void __init allocate_paca(int cpu)
> void __init free_unused_pacas(void)
> {
> int new_ptrs_size;
> + int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
>
> - new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
> + new_ptrs_size = sizeof(struct paca_struct *) * n;
> if (new_ptrs_size < paca_ptrs_size)
> memblock_phys_free(__pa(paca_ptrs) + new_ptrs_size,
> paca_ptrs_size - new_ptrs_size);
>
> - paca_nr_cpu_ids = nr_cpu_ids;
> + paca_nr_cpu_ids = n;
> paca_ptrs_size = new_ptrs_size;
>
> #ifdef CONFIG_PPC_64S_HASH_MMU
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 87272a2d8c10..15c994f54bf9 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -362,9 +362,12 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
> */
> boot_cpuid = i;
> found = true;
> - /* This works around the hole in paca_ptrs[]. */
> - if (nr_cpu_ids < nthreads)
> - set_nr_cpu_ids(nthreads);
> + /*
> + * Ideally, nr_cpus=1 can be achieved if each kernel
> + * component does not assume cpu0 is onlined.
> + */
> + if (boot_cpuid != 0 && nr_cpu_ids < 2)
> + set_nr_cpu_ids(2);
> }
> #ifdef CONFIG_SMP
> /* logical cpu id is always 0 on UP kernels */
> --
> 2.31.1
>
--
Mahesh J Salgaonkar
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
2023-10-03 18:06 ` Mahesh J Salgaonkar
@ 2023-10-07 1:03 ` Pingfan Liu
0 siblings, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-10-07 1:03 UTC (permalink / raw)
To: mahesh
Cc: Baoquan He, kexec, Ming Lei, Nicholas Piggin, linuxppc-dev,
Wen Xiong
On Wed, Oct 4, 2023 at 2:07 AM Mahesh J Salgaonkar <mahesh@linux.ibm.com> wrote:
>
> On 2023-09-25 15:53:48 Mon, Pingfan Liu wrote:
> > paca_ptrs should be large enough to hold the boot_cpuid, hence, its
> > lower boundary is set to the bigger one between boot_cpuid+1 and
> > nr_cpus.
> >
> > On the other hand, some kernel components make assumptions:
> > 1. the timer code assumes cpu0 is online, since the timer_list->flags
> >    subfield 'TIMER_CPUMASK' is zero unless it is initialized to a
> >    proper present cpu;
> > 2. power9_idle_stop() assumes the primary thread's paca is allocated.
> >
> > Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the
> > boot cpu is not cpu0.
> >
> > Result:
> > With nr_cpus=1, after taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger'
> > the kdump kernel brings up two cpus,
> > while after taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
> > the kdump kernel brings up one cpu.
>
> I tried your changes on power9 and power10 systems. However, on a power10 lpar I
> see the backtrace below during kdump kernel bootup with nr_cpus=1.
>
Thanks for the testing. I have only tried this series on Power9 bare
metal. I think the bug is related to this code snippet in
update_mask_from_threadgroup():

    for (i = first_thread; i < first_thread + threads_per_core; i++) {
        int i_group_start = get_cpu_thread_group_start(i, tg);
                                                       ^^^

Here it iterates over every thread in the core, but some of them are not online.
I will try to come up with a remedy.
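
A minimal userspace model of the problem described above, for illustration
only. It is not the remedy for update_mask_from_threadgroup(): struct
cpu_rec and count_usable_threads() are hypothetical, and a NULL slot stands
in for a thread whose per-cpu data was never set up. The point is only that
a walk over all threads of a core has to skip the ones that were not
brought up.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-cpu record such as a paca. */
struct cpu_rec {
	int hw_id;
};

/*
 * Walk all threads of one core, as the loop quoted above does, but skip
 * every slot that is out of range or has no record behind it.
 * Returns the number of threads that could safely be inspected.
 */
int count_usable_threads(struct cpu_rec *const *recs, size_t nslots,
			 size_t first_thread, size_t threads_per_core)
{
	size_t i;
	int usable = 0;

	for (i = first_thread; i < first_thread + threads_per_core; i++) {
		/* Guard: never dereference a slot that was not set up. */
		if (i >= nslots || recs[i] == NULL)
			continue;
		usable++;
	}
	return usable;
}
```

With five slots of which only 0 and 4 are populated, a walk over an
8-thread core inspects two threads instead of faulting on the others.
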
Thanks,
Pingfan
^ permalink raw reply [flat|nested] 10+ messages in thread
Thread overview: 10+ messages
2023-09-25 7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32 Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt Pingfan Liu
2023-09-28 20:36 ` Wen Xiong
2023-09-29 3:19 ` Pingfan Liu
2023-09-29 6:43 ` Christophe Leroy
2023-09-25 7:53 ` [PATCHv7 3/4] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus Pingfan Liu
2023-09-25 7:53 ` [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid Pingfan Liu
2023-10-03 18:06 ` Mahesh J Salgaonkar
2023-10-07 1:03 ` Pingfan Liu