linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCHv7 0/4] enable nr_cpus for powerpc
@ 2023-09-25  7:53 Pingfan Liu
  2023-09-25  7:53 ` [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32 Pingfan Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25  7:53 UTC
  To: linuxppc-dev
  Cc: Baoquan He, Pingfan Liu, Mahesh Salgaonkar, kexec, Ming Lei,
	Nicholas Piggin, Wen Xiong

Since v4 [1], the code has changed considerably. The paca[] array has
been reorganized and is now indexed through paca_ptrs[], which
dramatically reduces memory consumption even when there are many
non-present cpus in the middle.

However, reordering the logical cpu numbers can further reduce the size
of paca_ptrs[] in the kdump case. So I keep [2/4], which rotate-shifts
the cpu's sequence number in the device tree to obtain the logical cpu
id.
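
To make the rotate shift concrete, here is a minimal user-space sketch
(it is not part of the series). It assumes that the threads of a core
occupy consecutive sequence numbers in the device tree, and
dt_seq_to_logical() with its parameters are hypothetical names used only
for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper: map a thread's device-tree sequence number to a
 * logical cpu id after rotating the boot core to the front.
 *
 * dt_seq:      sequence number of the thread in device-tree order
 * boot_seq:    sequence number of the boot thread in device-tree order
 * boot_thread: index of the boot thread inside its core
 * total:       total number of threads described by the device tree
 */
static int dt_seq_to_logical(int dt_seq, int boot_seq, int boot_thread,
			     int total)
{
	/* Sequence number of the first thread of the boot core. */
	int shift = boot_seq - boot_thread;

	assert(total > 0);
	assert(shift >= 0 && shift < total);
	assert(dt_seq >= 0 && dt_seq < total);

	/* Rotate left by 'shift' so the boot core starts at logical 0. */
	return (dt_seq - shift + total) % total;
}
```

With 16 threads and the boot thread at sequence number 14 (thread 2 of
its core), the shift is 12, so the boot thread gets logical id 2 and the
thread that used to be first in the device tree gets logical id 4.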

Patches [3/4] and [4/4] make it possible to lower nr_cpus to two or
fewer.

[1]: https://lore.kernel.org/linuxppc-dev/1520829790-14029-1-git-send-email-kernelfans@gmail.com/
---
v6 -> v7
  Add [1/4], which fixes a compilation error on PPC32

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@us.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org


Pingfan Liu (4):
  powerpc/setup : Enable boot_cpu_hwid for PPC32
  powerpc/setup: Loosen the mapping between cpu logical id and its seq
    in dt
  powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
  powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid

 arch/powerpc/include/asm/smp.h     |   2 +-
 arch/powerpc/kernel/paca.c         |  10 +--
 arch/powerpc/kernel/prom.c         |  29 +++++---
 arch/powerpc/kernel/setup-common.c | 108 +++++++++++++++++++++++------
 4 files changed, 114 insertions(+), 35 deletions(-)

-- 
2.31.1



* [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32
  2023-09-25  7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
@ 2023-09-25  7:53 ` Pingfan Liu
  2023-09-25  7:53 ` [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt Pingfan Liu
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25  7:53 UTC
  To: linuxppc-dev
  Cc: Baoquan He, Pingfan Liu, kexec, Mahesh Salgaonkar, Ming Lei,
	kernel test robot, Nicholas Piggin, Wen Xiong

In order to identify the boot cpu, its intserv[] entry should be
recorded and checked in smp_setup_cpu_maps().

smp_setup_cpu_maps() is shared between PPC64 and PPC32. PPC64 already
uses boot_cpu_hwid to carry that information, so enable this variable on
PPC32 as well, so that it can carry the same information there in the
coming patch.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202309130232.N2REwHBv-lkp@intel.com/
---
 arch/powerpc/include/asm/smp.h     | 2 +-
 arch/powerpc/kernel/prom.c         | 3 +--
 arch/powerpc/kernel/setup-common.c | 2 --
 3 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index aaaa576d0e15..5db9178cc800 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -26,7 +26,7 @@
 #include <asm/percpu.h>
 
 extern int boot_cpuid;
-extern int boot_cpu_hwid; /* PPC64 only */
+extern int boot_cpu_hwid;
 extern int spinning_secondaries;
 extern u32 *cpu_to_phys_id;
 extern bool coregroup_enabled;
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 0b5878c3125b..ec82f5bda908 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -372,8 +372,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 	    be32_to_cpu(intserv[found_thread]));
 	boot_cpuid = found;
 
-	if (IS_ENABLED(CONFIG_PPC64))
-		boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
+	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
 
 	/*
 	 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index d2a446216444..1b19a9815672 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -87,9 +87,7 @@ EXPORT_SYMBOL(machine_id);
 int boot_cpuid = -1;
 EXPORT_SYMBOL_GPL(boot_cpuid);
 
-#ifdef CONFIG_PPC64
 int boot_cpu_hwid = -1;
-#endif
 
 /*
  * These are used in binfmt_elf.c to put aux entries on the stack
-- 
2.31.1



* [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
  2023-09-25  7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
  2023-09-25  7:53 ` [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32 Pingfan Liu
@ 2023-09-25  7:53 ` Pingfan Liu
  2023-09-28 20:36   ` Wen Xiong
  2023-09-25  7:53 ` [PATCHv7 3/4] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus Pingfan Liu
  2023-09-25  7:53 ` [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid Pingfan Liu
  3 siblings, 1 reply; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25  7:53 UTC
  To: linuxppc-dev
  Cc: Baoquan He, Pingfan Liu, kexec, Mahesh Salgaonkar, Ming Lei,
	Nicholas Piggin, Wen Xiong

*** Idea ***
For kexec -p, the boot cpu may not be cpu0, which causes a problem when
allocating memory for paca_ptrs[]. In theory, however, there is no
requirement that a cpu's logical id equal its sequence number in the
device tree. But helpers such as cpu_first_thread_sibling() make
assumptions about the mapping inside a core. Hence loosen the mapping
only partially: unbind the mapping between cores while keeping the
mapping inside a core.

*** Implement ***
At this early stage, there is plenty of memory available. Hence, this
patch allocates interim memory to link the cpu info on a list, then
reorders the cpus by changing the list head. As a result, there is a
rotate shift between the sequence number in the dt and the cpu logical
number.
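
The "reorder by changing the list head" step can be sketched in user
space as follows. This is only an illustration under stated assumptions:
the list helpers are minimal stand-ins for the kernel's list_head API,
and struct core, MAX_CORES and rotated_position() are hypothetical names
that do not appear in the patch.

```c
#include <assert.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

struct core {
	struct list_head node;	/* first member, so a node is also a core */
	int dt_index;		/* position of the core in device-tree order */
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert 'entry' immediately before 'pos', like list_add_tail(). */
static void list_insert_before(struct list_head *entry,
			       struct list_head *pos)
{
	entry->prev = pos->prev;
	entry->next = pos;
	pos->prev->next = entry;
	pos->prev = entry;
}

/* Unlink 'entry' from its ring and leave it self-linked. */
static void list_unlink(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

#define MAX_CORES 16

/* Return the position of 'core' after rotating 'boot_core' to the front. */
static int rotated_position(int ncores, int boot_core, int core)
{
	struct core cores[MAX_CORES];
	struct list_head head, *p;
	int i, pos = 0;

	assert(ncores > 0 && ncores <= MAX_CORES);
	assert(boot_core >= 0 && boot_core < ncores);

	list_init(&head);
	for (i = 0; i < ncores; i++) {
		cores[i].dt_index = i;
		list_insert_before(&cores[i].node, &head);
	}

	/* Take the head out of the ring and put it just before the boot core. */
	list_unlink(&head);
	list_insert_before(&head, &cores[boot_core].node);

	for (p = head.next; p != &head; p = p->next, pos++)
		if (((struct core *)p)->dt_index == core)
			return pos;
	return -1;
}
```

No entry is moved or copied; only the head is re-linked, so every core
keeps its neighbours and the order inside a core is untouched.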

*** Result ***
After this patch, the boot cpu's logical id is always mapped into the
range [0, threads_per_core).

Besides this, at this phase, all threads in the boot core are forced to
be onlined. This restriction will be lifted in a later patch with
extra effort.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/prom.c         | 25 +++++----
 arch/powerpc/kernel/setup-common.c | 87 +++++++++++++++++++++++-------
 2 files changed, 85 insertions(+), 27 deletions(-)

diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index ec82f5bda908..87272a2d8c10 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -76,7 +76,9 @@ u64 ppc64_rma_size;
 unsigned int boot_cpu_node_count __ro_after_init;
 #endif
 static phys_addr_t first_memblock_size;
+#ifdef CONFIG_SMP
 static int __initdata boot_cpu_count;
+#endif
 
 static int __init early_parse_mem(char *p)
 {
@@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 	const __be32 *intserv;
 	int i, nthreads;
 	int len;
-	int found = -1;
-	int found_thread = 0;
+	bool found = false;
 
 	/* We are scanning "cpu" nodes only */
 	if (type == NULL || strcmp(type, "cpu") != 0)
@@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 	for (i = 0; i < nthreads; i++) {
 		if (be32_to_cpu(intserv[i]) ==
 			fdt_boot_cpuid_phys(initial_boot_params)) {
-			found = boot_cpu_count;
-			found_thread = i;
+			/*
+			 * always map the boot-cpu logical id into the
+			 * range of [0, thread_per_core)
+			 */
+			boot_cpuid = i;
+			found = true;
+			/* This works around the hole in paca_ptrs[]. */
+			if (nr_cpu_ids < nthreads)
+				set_nr_cpu_ids(nthreads);
 		}
 #ifdef CONFIG_SMP
 		/* logical cpu id is always 0 on UP kernels */
@@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 	}
 
 	/* Not the boot CPU */
-	if (found < 0)
+	if (!found)
 		return 0;
 
-	DBG("boot cpu: logical %d physical %d\n", found,
-	    be32_to_cpu(intserv[found_thread]));
-	boot_cpuid = found;
+	DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
+	    be32_to_cpu(intserv[boot_cpuid]));
 
-	boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
+	boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
 
 	/*
 	 * PAPR defines "logical" PVR values for cpus that
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 1b19a9815672..f6d32324b5a5 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -36,6 +36,7 @@
 #include <linux/of_platform.h>
 #include <linux/hugetlb.h>
 #include <linux/pgtable.h>
+#include <linux/list.h>
 #include <asm/io.h>
 #include <asm/paca.h>
 #include <asm/processor.h>
@@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
 
 u32 *cpu_to_phys_id = NULL;
 
+struct interrupt_server_node {
+	struct list_head node;
+	bool	avail;
+	int	len;
+	__be32 *intserv;
+};
+
 /**
  * setup_cpu_maps - initialize the following cpu maps:
  *                  cpu_possible_mask
@@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
 void __init smp_setup_cpu_maps(void)
 {
 	struct device_node *dn;
-	int cpu = 0;
-	int nthreads = 1;
+	int shift = 0, cpu = 0;
+	int j, nthreads = 1;
+	int len;
+	struct interrupt_server_node *intserv_node, *n;
+	struct list_head *bt_node, head;
+	bool avail, found_boot_cpu = false;
 
 	DBG("smp_setup_cpu_maps()\n");
 
+	INIT_LIST_HEAD(&head);
 	cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
 					__alignof__(u32));
 	if (!cpu_to_phys_id)
@@ -460,7 +473,6 @@ void __init smp_setup_cpu_maps(void)
 	for_each_node_by_type(dn, "cpu") {
 		const __be32 *intserv;
 		__be32 cpu_be;
-		int j, len;
 
 		DBG("  * %pOF...\n", dn);
 
@@ -480,29 +492,68 @@ void __init smp_setup_cpu_maps(void)
 			}
 		}
 
-		nthreads = len / sizeof(int);
-
-		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
-			bool avail;
+		avail = of_device_is_available(dn);
+		if (!avail)
+			avail = !of_property_match_string(dn,
+					"enable-method", "spin-table");
 
-			DBG("    thread %d -> cpu %d (hard id %d)\n",
-			    j, cpu, be32_to_cpu(intserv[j]));
 
-			avail = of_device_is_available(dn);
-			if (!avail)
-				avail = !of_property_match_string(dn,
-						"enable-method", "spin-table");
+		intserv_node = memblock_alloc(sizeof(struct interrupt_server_node) + len,
+					__alignof__(u32));
+		if (!intserv_node)
+			panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
+				__func__,
+				sizeof(struct interrupt_server_node) + len,
+				__alignof__(u32));
+		intserv_node->intserv = (__be32 *)((char *)intserv_node +
+						sizeof(struct interrupt_server_node));
+		intserv_node->len = len;
+		memcpy(intserv_node->intserv, intserv, len);
+		intserv_node->avail = avail;
+		INIT_LIST_HEAD(&intserv_node->node);
+		list_add_tail(&intserv_node->node, &head);
+
+		if (!found_boot_cpu) {
+			nthreads = len / sizeof(int);
+			for (j = 0 ; j < nthreads; j++) {
+				if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
+					bt_node = &intserv_node->node;
+					found_boot_cpu = true;
+					/*
+					 * Record the round-shift between dt
+					 * seq and cpu logical number
+					 */
+					shift = cpu - j;
+					break;
+				}
+
+				cpu++;
+			}
+		}
 
+	}
+	cpu = 0;
+	list_del_init(&head);
+	/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
+	list_add_tail(&head, bt_node);
+	pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
+	list_for_each_entry(intserv_node, &head, node) {
+
+		avail = intserv_node->avail;
+		nthreads = intserv_node->len / sizeof(int);
+		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
 			set_cpu_present(cpu, avail);
 			set_cpu_possible(cpu, true);
-			cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
+			cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
+			DBG("    thread %d -> cpu %d (hard id %d)\n",
+			    j, cpu, be32_to_cpu(intserv[j]));
 			cpu++;
 		}
+	}
 
-		if (cpu >= nr_cpu_ids) {
-			of_node_put(dn);
-			break;
-		}
+	list_for_each_entry_safe(intserv_node, n, &head, node) {
+		len = sizeof(struct interrupt_server_node) + intserv_node->len;
+		memblock_free(intserv_node, len);
 	}
 
 	/* If no SMT supported, nthreads is forced to 1 */
-- 
2.31.1



* [PATCHv7 3/4] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
  2023-09-25  7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
  2023-09-25  7:53 ` [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32 Pingfan Liu
  2023-09-25  7:53 ` [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt Pingfan Liu
@ 2023-09-25  7:53 ` Pingfan Liu
  2023-09-25  7:53 ` [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid Pingfan Liu
  3 siblings, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25  7:53 UTC
  To: linuxppc-dev
  Cc: Baoquan He, Pingfan Liu, kexec, Mahesh Salgaonkar, Ming Lei,
	Nicholas Piggin, Wen Xiong

If boot_cpuid is not smaller than nr_cpus, extra effort is required to
ensure the boot cpu is in cpu_present_mask. This is achieved by
reserving the last slot for the boot cpu.

Note: the restriction on nr_cpus will be lifted with more effort in the
next patch.
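
As a rough model of "reserving the last slot", the following user-space
sketch covers the boot core only, where logical ids start at 0. The
function boot_core_present_mask() and its bitmask return value are
hypothetical and simplify what the patch does with set_cpu_present().

```c
#include <assert.h>

/*
 * Hypothetical model of the boot core: return a bitmask of the logical
 * cpu ids that end up present. Bit n set means cpu n is present.
 */
static unsigned long long boot_core_present_mask(int nr_cpu_ids,
						 int threads_per_core,
						 int bt_thread)
{
	unsigned long long mask = 0;
	int limit = nr_cpu_ids;
	int cpu;

	assert(nr_cpu_ids >= 1);
	assert(threads_per_core >= 1 && threads_per_core <= 64);
	assert(bt_thread >= 0 && bt_thread < threads_per_core);

	/* The boot thread would not fit: keep one slot back for it. */
	if (bt_thread >= nr_cpu_ids)
		limit = nr_cpu_ids - 1;

	for (cpu = 0; cpu < threads_per_core && cpu < limit; cpu++)
		mask |= 1ULL << cpu;

	/* Spend the reserved slot on the boot thread itself. */
	if (bt_thread >= nr_cpu_ids)
		mask |= 1ULL << bt_thread;

	return mask;
}
```

For nr_cpu_ids = 2, eight threads per core and the boot thread at index
5, this marks cpu 0 and cpu 5, so the number of present cpus still
equals nr_cpu_ids.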

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/setup-common.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index f6d32324b5a5..a72d00a6cff2 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -454,8 +454,8 @@ struct interrupt_server_node {
 void __init smp_setup_cpu_maps(void)
 {
 	struct device_node *dn;
-	int shift = 0, cpu = 0;
-	int j, nthreads = 1;
+	int terminate, shift = 0, cpu = 0;
+	int j, bt_thread = 0, nthreads = 1;
 	int len;
 	struct interrupt_server_node *intserv_node, *n;
 	struct list_head *bt_node, head;
@@ -518,6 +518,7 @@ void __init smp_setup_cpu_maps(void)
 			for (j = 0 ; j < nthreads; j++) {
 				if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
 					bt_node = &intserv_node->node;
+					bt_thread = j;
 					found_boot_cpu = true;
 					/*
 					 * Record the round-shift between dt
@@ -537,11 +538,21 @@ void __init smp_setup_cpu_maps(void)
 	/* Select the primary thread, the boot cpu's slibing, as the logic 0 */
 	list_add_tail(&head, bt_node);
 	pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
+	terminate = nr_cpu_ids;
 	list_for_each_entry(intserv_node, &head, node) {
 
+		j = 0;
+		/* Choose a start point to cover the boot cpu */
+		if (nr_cpu_ids - 1 < bt_thread) {
+			/*
+			 * The processor core puts assumption on the thread id,
+			 * not to breach the assumption.
+			 */
+			terminate = nr_cpu_ids - 1;
+		}
 		avail = intserv_node->avail;
 		nthreads = intserv_node->len / sizeof(int);
-		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
+		for (; j < nthreads && cpu < terminate; j++) {
 			set_cpu_present(cpu, avail);
 			set_cpu_possible(cpu, true);
 			cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
@@ -549,6 +560,14 @@ void __init smp_setup_cpu_maps(void)
 			    j, cpu, be32_to_cpu(intserv[j]));
 			cpu++;
 		}
+		/* Online the boot cpu */
+		if (nr_cpu_ids - 1 < bt_thread) {
+			set_cpu_present(bt_thread, avail);
+			set_cpu_possible(bt_thread, true);
+			cpu_to_phys_id[bt_thread] = be32_to_cpu(intserv_node->intserv[bt_thread]);
+			DBG("    thread %d -> cpu %d (hard id %d)\n",
+			    bt_thread, bt_thread, be32_to_cpu(intserv[bt_thread]));
+		}
 	}
 
 	list_for_each_entry_safe(intserv_node, n, &head, node) {
-- 
2.31.1



* [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
  2023-09-25  7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
                   ` (2 preceding siblings ...)
  2023-09-25  7:53 ` [PATCHv7 3/4] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus Pingfan Liu
@ 2023-09-25  7:53 ` Pingfan Liu
  2023-10-03 18:06   ` Mahesh J Salgaonkar
  3 siblings, 1 reply; 10+ messages in thread
From: Pingfan Liu @ 2023-09-25  7:53 UTC
  To: linuxppc-dev
  Cc: Baoquan He, Pingfan Liu, kexec, Mahesh Salgaonkar, Ming Lei,
	Nicholas Piggin, Wen Xiong

paca_ptrs[] should be large enough to hold the entry for boot_cpuid.
Hence its minimum size is set to the larger of boot_cpuid + 1 and
nr_cpus.
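
The sizing rule can be written as a small stand-alone function. This is
a sketch only; paca_ptrs_count() is a hypothetical name, while the
expression mirrors the one used in the diff below.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of paca_ptrs[] entries needed so that
 * index boot_cpuid is always valid.
 */
static int paca_ptrs_count(int boot_cpuid, int nr_cpu_ids)
{
	int need = boot_cpuid + 1;	/* indices 0..boot_cpuid */

	assert(boot_cpuid >= 0 && nr_cpu_ids >= 1);
	return need > nr_cpu_ids ? need : nr_cpu_ids;
}
```

For example, boot_cpuid = 5 with nr_cpu_ids = 2 needs six entries,
whereas boot_cpuid = 3 with nr_cpu_ids = 8 keeps the usual eight.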

On the other hand, some kernel components assume that cpu0 is usable:
 1. The timer code assumes cpu0 is online, since the 'TIMER_CPUMASK'
    subfield of timer_list->flags is zero unless it is initialized to
    a proper present cpu.
 2. power9_idle_stop() assumes the primary thread's paca is allocated.

Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined if the
boot cpu is not cpu0.
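
The adjustment amounts to the following rule, shown here as a user-space
sketch. effective_nr_cpu_ids() is a hypothetical name; the patch itself
calls set_nr_cpu_ids(2) from early_init_dt_scan_cpus().

```c
#include <assert.h>

/*
 * Hypothetical helper: the nr_cpu_ids value in effect after the
 * adjustment, given the requested value and the boot cpu's logical id.
 */
static int effective_nr_cpu_ids(int requested, int boot_cpuid)
{
	assert(requested >= 1 && boot_cpuid >= 0);

	/* cpu0 must come up alongside a boot cpu that is not cpu0. */
	if (boot_cpuid != 0 && requested < 2)
		return 2;
	return requested;
}
```

A request of nr_cpus=1 therefore stays at one when the crash happens on
the first thread of a core, and becomes two otherwise.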

Result:
With nr_cpus=1, after
  taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger'
the kdump kernel brings up two cpus, whereas after
  taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
the kdump kernel brings up one cpu.

Signed-off-by: Pingfan Liu <piliu@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Wen Xiong <wenxiong@us.ibm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: kexec@lists.infradead.org
To: linuxppc-dev@lists.ozlabs.org
---
 arch/powerpc/kernel/paca.c | 10 ++++++----
 arch/powerpc/kernel/prom.c |  9 ++++++---
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index cda4e00b67c1..91e2401de1bd 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -242,9 +242,10 @@ static int __initdata paca_struct_size;
 
 void __init allocate_paca_ptrs(void)
 {
-	paca_nr_cpu_ids = nr_cpu_ids;
+	int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
 
-	paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+	paca_nr_cpu_ids = n;
+	paca_ptrs_size = sizeof(struct paca_struct *) * n;
 	paca_ptrs = memblock_alloc_raw(paca_ptrs_size, SMP_CACHE_BYTES);
 	if (!paca_ptrs)
 		panic("Failed to allocate %d bytes for paca pointers\n",
@@ -287,13 +288,14 @@ void __init allocate_paca(int cpu)
 void __init free_unused_pacas(void)
 {
 	int new_ptrs_size;
+	int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
 
-	new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
+	new_ptrs_size = sizeof(struct paca_struct *) * n;
 	if (new_ptrs_size < paca_ptrs_size)
 		memblock_phys_free(__pa(paca_ptrs) + new_ptrs_size,
 				   paca_ptrs_size - new_ptrs_size);
 
-	paca_nr_cpu_ids = nr_cpu_ids;
+	paca_nr_cpu_ids = n;
 	paca_ptrs_size = new_ptrs_size;
 
 #ifdef CONFIG_PPC_64S_HASH_MMU
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index 87272a2d8c10..15c994f54bf9 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -362,9 +362,12 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
 			 */
 			boot_cpuid = i;
 			found = true;
-			/* This works around the hole in paca_ptrs[]. */
-			if (nr_cpu_ids < nthreads)
-				set_nr_cpu_ids(nthreads);
+			/*
+			 * Ideally, nr_cpus=1 can be achieved if each kernel
+			 * component does not assume cpu0 is onlined.
+			 */
+			if (boot_cpuid != 0 && nr_cpu_ids < 2)
+				set_nr_cpu_ids(2);
 		}
 #ifdef CONFIG_SMP
 		/* logical cpu id is always 0 on UP kernels */
-- 
2.31.1



* RE:  [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
  2023-09-25  7:53 ` [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt Pingfan Liu
@ 2023-09-28 20:36   ` Wen Xiong
  2023-09-29  3:19     ` Pingfan Liu
  2023-09-29  6:43     ` Christophe Leroy
  0 siblings, 2 replies; 10+ messages in thread
From: Wen Xiong @ 2023-09-28 20:36 UTC
  To: Pingfan Liu, linuxppc-dev@lists.ozlabs.org
  Cc: Pingfan Liu, Baoquan He, kexec@lists.infradead.org,
	Mahesh Salgaonkar, Ming Lei, Nicholas Piggin

Hi Pingfan,

+		avail = intserv_node->avail;
+		nthreads = intserv_node->len / sizeof(int);
+		for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
 			set_cpu_present(cpu, avail);
 			set_cpu_possible(cpu, true);
-			cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
+			cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
+			DBG("    thread %d -> cpu %d (hard id %d)\n",
+			    j, cpu, be32_to_cpu(intserv[j]));

intserv is not defined here. Should it be "be32_to_cpu(intserv_node->intserv[j])"?
 			cpu++;
 		}
+	}
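
A stray identifier like this can go unnoticed when a debug macro expands
to nothing. The user-space sketch below illustrates the effect; the DBG
definition is an assumption modelled on the usual "empty unless DEBUG is
defined" pattern, and map_threads() is a hypothetical function.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed pattern: DBG() drops its arguments unless DEBUG is defined. */
#ifdef DEBUG
#define DBG(...) printf(__VA_ARGS__)
#else
#define DBG(...) do { } while (0)
#endif

/* Hypothetical function: record the hardware id of each thread. */
static int map_threads(const unsigned int *hwids, int nthreads,
		       unsigned int *cpu_to_phys)
{
	int cpu = 0;
	int j;

	for (j = 0; j < nthreads; j++) {
		cpu_to_phys[cpu] = hwids[j];
		/*
		 * 'intserv' is not declared in this function. Without
		 * -DDEBUG the preprocessor discards the arguments, so the
		 * compiler never sees the name and the build succeeds.
		 */
		DBG("thread %d -> cpu %d (hard id %u)\n", j, cpu, intserv[j]);
		cpu++;
	}
	return cpu;
}
```

Building the same file with -DDEBUG makes the compiler reject the
undeclared name, so a debug build is enough to expose the problem.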



* Re: [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
  2023-09-28 20:36   ` Wen Xiong
@ 2023-09-29  3:19     ` Pingfan Liu
  2023-09-29  6:43     ` Christophe Leroy
  1 sibling, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-09-29  3:19 UTC
  To: Wen Xiong
  Cc: Baoquan He, kexec@lists.infradead.org, Mahesh Salgaonkar,
	Ming Lei, Nicholas Piggin, linuxppc-dev@lists.ozlabs.org

On Fri, Sep 29, 2023 at 4:36 AM Wen Xiong <wenxiong@us.ibm.com> wrote:
>
> Hi Pingfan,
>
> +               avail = intserv_node->avail;
> +               nthreads = intserv_node->len / sizeof(int);
> +               for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
>                         set_cpu_present(cpu, avail);
>                         set_cpu_possible(cpu, true);
> -                       cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
> +                       cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
> +                       DBG("    thread %d -> cpu %d (hard id %d)\n",
> +                           j, cpu, be32_to_cpu(intserv[j]));
>
> intserv is not defined here. Should it be "be32_to_cpu(intserv_node->intserv[j])"?

Yes, thanks. Sorry, I did not turn on the DBG macro, so I did not catch this bug.

Thanks,

Pingfan
>                         cpu++;
>                 }
> +       }
>
> -----Original Message-----
> From: Pingfan Liu <piliu@redhat.com>
> Sent: Monday, September 25, 2023 2:54 AM
> To: linuxppc-dev@lists.ozlabs.org
> Cc: Pingfan Liu <piliu@redhat.com>; Michael Ellerman <mpe@ellerman.id.au>; Nicholas Piggin <npiggin@gmail.com>; Christophe Leroy <christophe.leroy@csgroup.eu>; Mahesh Salgaonkar <mahesh@linux.ibm.com>; Wen Xiong <wenxiong@us.ibm.com>; Baoquan He <bhe@redhat.com>; Ming Lei <ming.lei@redhat.com>; kexec@lists.infradead.org
> Subject: [EXTERNAL] [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
>
> *** Idea ***
> For kexec -p, the boot cpu can be not the cpu0, this causes the problem of allocating memory for paca_ptrs[]. However, in theory, there is no requirement to assign cpu's logical id as its present sequence in the device tree. But there is something like cpu_first_thread_sibling(), which makes assumption on the mapping inside a core. Hence partially loosening the mapping, i.e. unbind the mapping of core while keep the mapping inside a core.
>
> *** Implement ***
> At this early stage, there are plenty of memory to utilize. Hence, this patch allocates interim memory to link the cpu info on a list, then reorder cpus by changing the list head. As a result, there is a rotate shift between the sequence number in dt and the cpu logical number.
>
> *** Result ***
> After this patch, a boot-cpu's logical id will always be mapped into the range [0,threads_per_core).
>
> Besides this, at this phase, all threads in the boot core are forced to be onlined. This restriction will be lifted in a later patch with extra effort.
>
> Signed-off-by: Pingfan Liu <piliu@redhat.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Wen Xiong <wenxiong@us.ibm.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: kexec@lists.infradead.org
> To: linuxppc-dev@lists.ozlabs.org
> ---
>  arch/powerpc/kernel/prom.c         | 25 +++++----
>  arch/powerpc/kernel/setup-common.c | 87 +++++++++++++++++++++++-------
>  2 files changed, 85 insertions(+), 27 deletions(-)
>
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index ec82f5bda908..87272a2d8c10 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -76,7 +76,9 @@ u64 ppc64_rma_size;
>  unsigned int boot_cpu_node_count __ro_after_init;
>  #endif
>  static phys_addr_t first_memblock_size;
> +#ifdef CONFIG_SMP
>  static int __initdata boot_cpu_count;
> +#endif
>
>  static int __init early_parse_mem(char *p)
>  {
> @@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>         const __be32 *intserv;
>         int i, nthreads;
>         int len;
> -       int found = -1;
> -       int found_thread = 0;
> +       bool found = false;
>
>         /* We are scanning "cpu" nodes only */
>         if (type == NULL || strcmp(type, "cpu") != 0)
> @@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>         for (i = 0; i < nthreads; i++) {
>                 if (be32_to_cpu(intserv[i]) ==
>                         fdt_boot_cpuid_phys(initial_boot_params)) {
> -                       found = boot_cpu_count;
> -                       found_thread = i;
> +                       /*
> +                        * always map the boot-cpu logical id into the
> +                        * range of [0, thread_per_core)
> +                        */
> +                       boot_cpuid = i;
> +                       found = true;
> +                       /* This works around the hole in paca_ptrs[]. */
> +                       if (nr_cpu_ids < nthreads)
> +                               set_nr_cpu_ids(nthreads);
>                 }
>  #ifdef CONFIG_SMP
>                 /* logical cpu id is always 0 on UP kernels */
> @@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>         }
>
>         /* Not the boot CPU */
> -       if (found < 0)
> +       if (!found)
>                 return 0;
>
> -       DBG("boot cpu: logical %d physical %d\n", found,
> -           be32_to_cpu(intserv[found_thread]));
> -       boot_cpuid = found;
> +       DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
> +           be32_to_cpu(intserv[boot_cpuid]));
>
> -       boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
> +       boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
>
>         /*
>          * PAPR defines "logical" PVR values for cpus that
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 1b19a9815672..f6d32324b5a5 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -36,6 +36,7 @@
>  #include <linux/of_platform.h>
>  #include <linux/hugetlb.h>
>  #include <linux/pgtable.h>
> +#include <linux/list.h>
>  #include <asm/io.h>
>  #include <asm/paca.h>
>  #include <asm/processor.h>
> @@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
>
>  u32 *cpu_to_phys_id = NULL;
>
> +struct interrupt_server_node {
> +       struct list_head node;
> +       bool    avail;
> +       int     len;
> +       __be32 *intserv;
> +};
> +
>  /**
>   * setup_cpu_maps - initialize the following cpu maps:
>   *                  cpu_possible_mask
> @@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
>  void __init smp_setup_cpu_maps(void)
>  {
>         struct device_node *dn;
> -       int cpu = 0;
> -       int nthreads = 1;
> +       int shift = 0, cpu = 0;
> +       int j, nthreads = 1;
> +       int len;
> +       struct interrupt_server_node *intserv_node, *n;
> +       struct list_head *bt_node, head;
> +       bool avail, found_boot_cpu = false;
>
>         DBG("smp_setup_cpu_maps()\n");
>
> +       INIT_LIST_HEAD(&head);
>         cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
>                                         __alignof__(u32));
>         if (!cpu_to_phys_id)
> @@ -460,7 +473,6 @@ void __init smp_setup_cpu_maps(void)
>         for_each_node_by_type(dn, "cpu") {
>                 const __be32 *intserv;
>                 __be32 cpu_be;
> -               int j, len;
>
>                 DBG("  * %pOF...\n", dn);
>
> @@ -480,29 +492,68 @@ void __init smp_setup_cpu_maps(void)
>                         }
>                 }
>
> -               nthreads = len / sizeof(int);
> -
> -               for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
> -                       bool avail;
> +               avail = of_device_is_available(dn);
> +               if (!avail)
> +                       avail = !of_property_match_string(dn,
> +                                       "enable-method", "spin-table");
>
> -                       DBG("    thread %d -> cpu %d (hard id %d)\n",
> -                           j, cpu, be32_to_cpu(intserv[j]));
>
> -                       avail = of_device_is_available(dn);
> -                       if (!avail)
> -                               avail = !of_property_match_string(dn,
> -                                               "enable-method", "spin-table");
> +               intserv_node = memblock_alloc(sizeof(struct interrupt_server_node) + len,
> +                                       __alignof__(u32));
> +               if (!intserv_node)
> +                       panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> +                               __func__,
> +                               sizeof(struct interrupt_server_node) + len,
> +                               __alignof__(u32));
> +               intserv_node->intserv = (__be32 *)((char *)intserv_node +
> +                                               sizeof(struct interrupt_server_node));
> +               intserv_node->len = len;
> +               memcpy(intserv_node->intserv, intserv, len);
> +               intserv_node->avail = avail;
> +               INIT_LIST_HEAD(&intserv_node->node);
> +               list_add_tail(&intserv_node->node, &head);
> +
> +               if (!found_boot_cpu) {
> +                       nthreads = len / sizeof(int);
> +                       for (j = 0 ; j < nthreads; j++) {
> +                               if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
> +                                       bt_node = &intserv_node->node;
> +                                       found_boot_cpu = true;
> +                                       /*
> +                                        * Record the round-shift between dt
> +                                        * seq and cpu logical number
> +                                        */
> +                                       shift = cpu - j;
> +                                       break;
> +                               }
> +
> +                               cpu++;
> +                       }
> +               }
>
> +       }
> +       cpu = 0;
> +       list_del_init(&head);
> +       /* Select the primary thread, the boot cpu's slibing, as the logic 0 */
> +       list_add_tail(&head, bt_node);
> +       pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
> +       list_for_each_entry(intserv_node, &head, node) {
> +
> +               avail = intserv_node->avail;
> +               nthreads = intserv_node->len / sizeof(int);
> +               for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
>                         set_cpu_present(cpu, avail);
>                         set_cpu_possible(cpu, true);
> -                       cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
> +                       cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
> +                       DBG("    thread %d -> cpu %d (hard id %d)\n",
> +                           j, cpu, be32_to_cpu(intserv[j]));
>                         cpu++;
>                 }
> +       }
>
> -               if (cpu >= nr_cpu_ids) {
> -                       of_node_put(dn);
> -                       break;
> -               }
> +       list_for_each_entry_safe(intserv_node, n, &head, node) {
> +               len = sizeof(struct interrupt_server_node) + intserv_node->len;
> +               memblock_free(intserv_node, len);
>         }
>
>         /* If no SMT supported, nthreads is forced to 1 */
> --
> 2.31.1
>



* Re: [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
  2023-09-28 20:36   ` Wen Xiong
  2023-09-29  3:19     ` Pingfan Liu
@ 2023-09-29  6:43     ` Christophe Leroy
  1 sibling, 0 replies; 10+ messages in thread
From: Christophe Leroy @ 2023-09-29  6:43 UTC (permalink / raw)
  To: Wen Xiong, Pingfan Liu, linuxppc-dev@lists.ozlabs.org
  Cc: Baoquan He, kexec@lists.infradead.org, Mahesh Salgaonkar,
	Nicholas Piggin, Ming Lei



Le 28/09/2023 à 22:36, Wen Xiong a écrit :
> Hi Pingfan,
> 
> +               avail = intserv_node->avail;
> +               nthreads = intserv_node->len / sizeof(int);
> +               for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
>                          set_cpu_present(cpu, avail);
>                          set_cpu_possible(cpu, true);
> -                       cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
> +                       cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
> +                       DBG("    thread %d -> cpu %d (hard id %d)\n",
> +                           j, cpu, be32_to_cpu(intserv[j]));
> 
> Intserv is not defined. Should "be32_to_cpu(intserv_node->intserv[j])?
>                          cpu++;
>                  }
> +       }

Please don't top-post, see
https://docs.kernel.org/process/submitting-patches.html#use-trimmed-interleaved-replies-in-email-discussions

Make comments inside the patch directly, making sure that your mail 
client is properly configured to add the standard > in front of all 
lines of the quoted mail.

Christophe

> 
> -----Original Message-----
> From: Pingfan Liu <piliu@redhat.com>
> Sent: Monday, September 25, 2023 2:54 AM
> To: linuxppc-dev@lists.ozlabs.org
> Cc: Pingfan Liu <piliu@redhat.com>; Michael Ellerman <mpe@ellerman.id.au>; Nicholas Piggin <npiggin@gmail.com>; Christophe Leroy <christophe.leroy@csgroup.eu>; Mahesh Salgaonkar <mahesh@linux.ibm.com>; Wen Xiong <wenxiong@us.ibm.com>; Baoquan He <bhe@redhat.com>; Ming Lei <ming.lei@redhat.com>; kexec@lists.infradead.org
> Subject: [EXTERNAL] [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt
> 
> *** Idea ***
> For kexec -p, the boot cpu can be not the cpu0, this causes the problem of allocating memory for paca_ptrs[]. However, in theory, there is no requirement to assign cpu's logical id as its present sequence in the device tree. But there is something like cpu_first_thread_sibling(), which makes assumption on the mapping inside a core. Hence partially loosening the mapping, i.e. unbind the mapping of core while keep the mapping inside a core.
> 
> *** Implement ***
> At this early stage, there are plenty of memory to utilize. Hence, this patch allocates interim memory to link the cpu info on a list, then reorder cpus by changing the list head. As a result, there is a rotate shift between the sequence number in dt and the cpu logical number.
> 
> *** Result ***
> After this patch, a boot-cpu's logical id will always be mapped into the range [0,threads_per_core).
> 
> Besides this, at this phase, all threads in the boot core are forced to be onlined. This restriction will be lifted in a later patch with extra effort.
> 
> Signed-off-by: Pingfan Liu <piliu@redhat.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Wen Xiong <wenxiong@us.ibm.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: kexec@lists.infradead.org
> To: linuxppc-dev@lists.ozlabs.org
> ---
>   arch/powerpc/kernel/prom.c         | 25 +++++----
>   arch/powerpc/kernel/setup-common.c | 87 +++++++++++++++++++++++-------
>   2 files changed, 85 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index ec82f5bda908..87272a2d8c10 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -76,7 +76,9 @@ u64 ppc64_rma_size;
>   unsigned int boot_cpu_node_count __ro_after_init;
>   #endif
>   static phys_addr_t first_memblock_size;
> +#ifdef CONFIG_SMP
>   static int __initdata boot_cpu_count;
> +#endif
> 
>   static int __init early_parse_mem(char *p)
>   {
> @@ -331,8 +333,7 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>          const __be32 *intserv;
>          int i, nthreads;
>          int len;
> -       int found = -1;
> -       int found_thread = 0;
> +       bool found = false;
> 
>          /* We are scanning "cpu" nodes only */
>          if (type == NULL || strcmp(type, "cpu") != 0)
> @@ -355,8 +356,15 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>          for (i = 0; i < nthreads; i++) {
>                  if (be32_to_cpu(intserv[i]) ==
>                          fdt_boot_cpuid_phys(initial_boot_params)) {
> -                       found = boot_cpu_count;
> -                       found_thread = i;
> +                       /*
> +                        * always map the boot-cpu logical id into the
> +                        * range of [0, thread_per_core)
> +                        */
> +                       boot_cpuid = i;
> +                       found = true;
> +                       /* This works around the hole in paca_ptrs[]. */
> +                       if (nr_cpu_ids < nthreads)
> +                               set_nr_cpu_ids(nthreads);
>                  }
>   #ifdef CONFIG_SMP
>                  /* logical cpu id is always 0 on UP kernels */
> @@ -365,14 +373,13 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>          }
> 
>          /* Not the boot CPU */
> -       if (found < 0)
> +       if (!found)
>                  return 0;
> 
> -       DBG("boot cpu: logical %d physical %d\n", found,
> -           be32_to_cpu(intserv[found_thread]));
> -       boot_cpuid = found;
> +       DBG("boot cpu: logical %d physical %d\n", boot_cpuid,
> +           be32_to_cpu(intserv[boot_cpuid]));
> 
> -       boot_cpu_hwid = be32_to_cpu(intserv[found_thread]);
> +       boot_cpu_hwid = be32_to_cpu(intserv[boot_cpuid]);
> 
>          /*
>           * PAPR defines "logical" PVR values for cpus that
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 1b19a9815672..f6d32324b5a5 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -36,6 +36,7 @@
>   #include <linux/of_platform.h>
>   #include <linux/hugetlb.h>
>   #include <linux/pgtable.h>
> +#include <linux/list.h>
>   #include <asm/io.h>
>   #include <asm/paca.h>
>   #include <asm/processor.h>
> @@ -425,6 +426,13 @@ static void __init cpu_init_thread_core_maps(int tpc)
> 
>   u32 *cpu_to_phys_id = NULL;
> 
> +struct interrupt_server_node {
> +       struct list_head node;
> +       bool    avail;
> +       int     len;
> +       __be32 *intserv;
> +};
> +
>   /**
>    * setup_cpu_maps - initialize the following cpu maps:
>    *                  cpu_possible_mask
> @@ -446,11 +454,16 @@ u32 *cpu_to_phys_id = NULL;
>   void __init smp_setup_cpu_maps(void)
>   {
>          struct device_node *dn;
> -       int cpu = 0;
> -       int nthreads = 1;
> +       int shift = 0, cpu = 0;
> +       int j, nthreads = 1;
> +       int len;
> +       struct interrupt_server_node *intserv_node, *n;
> +       struct list_head *bt_node, head;
> +       bool avail, found_boot_cpu = false;
> 
>          DBG("smp_setup_cpu_maps()\n");
> 
> +       INIT_LIST_HEAD(&head);
>          cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
>                                          __alignof__(u32));
>          if (!cpu_to_phys_id)
> @@ -460,7 +473,6 @@ void __init smp_setup_cpu_maps(void)
>          for_each_node_by_type(dn, "cpu") {
>                  const __be32 *intserv;
>                  __be32 cpu_be;
> -               int j, len;
> 
>                  DBG("  * %pOF...\n", dn);
> 
> @@ -480,29 +492,68 @@ void __init smp_setup_cpu_maps(void)
>                          }
>                  }
> 
> -               nthreads = len / sizeof(int);
> -
> -               for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
> -                       bool avail;
> +               avail = of_device_is_available(dn);
> +               if (!avail)
> +                       avail = !of_property_match_string(dn,
> +                                       "enable-method", "spin-table");
> 
> -                       DBG("    thread %d -> cpu %d (hard id %d)\n",
> -                           j, cpu, be32_to_cpu(intserv[j]));
> 
> -                       avail = of_device_is_available(dn);
> -                       if (!avail)
> -                               avail = !of_property_match_string(dn,
> -                                               "enable-method", "spin-table");
> +               intserv_node = memblock_alloc(sizeof(struct interrupt_server_node) + len,
> +                                       __alignof__(u32));
> +               if (!intserv_node)
> +                       panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> +                               __func__,
> +                               sizeof(struct interrupt_server_node) + len,
> +                               __alignof__(u32));
> +               intserv_node->intserv = (__be32 *)((char *)intserv_node +
> +                                               sizeof(struct interrupt_server_node));
> +               intserv_node->len = len;
> +               memcpy(intserv_node->intserv, intserv, len);
> +               intserv_node->avail = avail;
> +               INIT_LIST_HEAD(&intserv_node->node);
> +               list_add_tail(&intserv_node->node, &head);
> +
> +               if (!found_boot_cpu) {
> +                       nthreads = len / sizeof(int);
> +                       for (j = 0 ; j < nthreads; j++) {
> +                               if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) {
> +                                       bt_node = &intserv_node->node;
> +                                       found_boot_cpu = true;
> +                                       /*
> +                                        * Record the round-shift between dt
> +                                        * seq and cpu logical number
> +                                        */
> +                                       shift = cpu - j;
> +                                       break;
> +                               }
> +
> +                               cpu++;
> +                       }
> +               }
> 
> +       }
> +       cpu = 0;
> +       list_del_init(&head);
> +       /* Select the primary thread, the boot cpu's slibing, as the logic 0 */
> +       list_add_tail(&head, bt_node);
> +       pr_info("the round shift between dt seq and the cpu logic number: %d\n", shift);
> +       list_for_each_entry(intserv_node, &head, node) {
> +
> +               avail = intserv_node->avail;
> +               nthreads = intserv_node->len / sizeof(int);
> +               for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) {
>                          set_cpu_present(cpu, avail);
>                          set_cpu_possible(cpu, true);
> -                       cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]);
> +                       cpu_to_phys_id[cpu] = be32_to_cpu(intserv_node->intserv[j]);
> +                       DBG("    thread %d -> cpu %d (hard id %d)\n",
> +                           j, cpu, be32_to_cpu(intserv[j]));
>                          cpu++;
>                  }
> +       }
> 
> -               if (cpu >= nr_cpu_ids) {
> -                       of_node_put(dn);
> -                       break;
> -               }
> +       list_for_each_entry_safe(intserv_node, n, &head, node) {
> +               len = sizeof(struct interrupt_server_node) + intserv_node->len;
> +               memblock_free(intserv_node, len);
>          }
> 
>          /* If no SMT supported, nthreads is forced to 1 */
> --
> 2.31.1
> 


* Re: [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
  2023-09-25  7:53 ` [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid Pingfan Liu
@ 2023-10-03 18:06   ` Mahesh J Salgaonkar
  2023-10-07  1:03     ` Pingfan Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Mahesh J Salgaonkar @ 2023-10-03 18:06 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Baoquan He, kexec, Ming Lei, Nicholas Piggin, linuxppc-dev,
	Wen Xiong

On 2023-09-25 15:53:48 Mon, Pingfan Liu wrote:
> paca_ptrs should be large enough to hold the boot_cpuid, hence, its
> lower boundary is set to the bigger one between boot_cpuid+1 and
> nr_cpus.
> 
> On the other hand, some kernel component: -1. the timer assumes cpu0
> online since the timer_list->flags subfield 'TIMER_CPUMASK' is zero if
> not initialized to a proper present cpu.  -2. power9_idle_stop() assumes
> the primary thread's paca is allocated.
> 
> Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the
> boot cpu is not cpu0.
> 
> Result:
> When nr_cpus=1, taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger'
> the kdump kernel brings up two cpus.
> While when taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger',
> the kdump kernel brings up one cpu.

I tried your changes on Power9 and Power10 systems. However, on a Power10 LPAR I
see the backtrace below during kdump kernel bootup with nr_cpus=1.

$ taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
[...]
[    0.000000] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
[    0.000000] printk: bootconsole [udbg0] enabled
[    0.000000] the round shift between dt seq and the cpu logic number: 8
[    0.000000] Partition configured for 16 cpus, operating system maximum is 2.
[    0.000000] CPU maps initialized for 8 threads per core
[...]
[    0.002249] BUG: Unable to handle kernel data access at 0x88888888888888c0
[    0.002260] Faulting instruction address: 0xc00000001201226c
[    0.002268] Oops: Kernel access of bad area, sig: 11 [#1]
[    0.002274] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[    0.002282] Modules linked in:
[    0.002288] CPU: 4 PID: 1 Comm: swapper/4 Not tainted 6.6.0-rc4 #1
[    0.002296] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
[    0.002305] NIP:  c00000001201226c LR: c000000012012234 CTR: 0000000000000004
[    0.002312] REGS: c0000000167ff8f0 TRAP: 0380   Not tainted  (6.6.0-rc4)
[    0.002321] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24000844  XER: 0000000a
[    0.002346] CFAR: c00000001201231c IRQMASK: 0
[    0.002346] GPR00: c000000012012234 c0000000167ffb90 c000000011b61900 0000000000000002
[    0.002346] GPR04: 0000000000000000 0000000000000001 0000000000000001 c00000004ffeff80
[    0.002346] GPR08: 0000000000000000 8888888888888888 0000000000000002 0000000000000000
[    0.002346] GPR12: 0000000000000000 c000000013141000 c000000010011058 0000000000000000
[    0.002346] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.002346] GPR20: 0000000000000028 c000000012170968 c0000000120a3e80 0000000000000016
[    0.002346] GPR24: c00000004ffdcfd0 0000000000000000 c000000012b82058 0000000000000000
[    0.002346] GPR28: c00000004fc80a68 c000000012bf0350 c0000000120a3e2c 0000000000000000
[    0.002426] NIP [c00000001201226c] update_mask_from_threadgroup+0x98/0x174
[    0.002437] LR [c000000012012234] update_mask_from_threadgroup+0x60/0x174
[    0.002444] Call Trace:
[    0.002451] [c0000000167ffb90] [c000000012012234] update_mask_from_threadgroup+0x60/0x174 (unreliable)
[    0.002464] [c0000000167ffbe0] [c0000000120125f8] init_thread_group_cache_map+0x2b0/0x328
[    0.002477] [c0000000167ffc50] [c00000001201296c] smp_prepare_cpus+0x2fc/0x4f0
[    0.002497] [c0000000167ffd10] [c000000012004e40] kernel_init_freeable+0x198/0x3cc
[    0.002509] [c0000000167ffde0] [c000000010011084] kernel_init+0x34/0x1b0
[    0.002531] [c0000000167ffe50] [c00000001000dd3c] ret_from_kernel_user_thread+0x14/0x1c
[    0.002547] --- interrupt: 0 at 0x0
[    0.002553] NIP:  0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
[    0.002563] REGS: c0000000167ffe80 TRAP: 0000   Not tainted  (6.6.0-rc4)
[    0.002569] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000
[    0.002576] CFAR: 0000000000000000 IRQMASK: 0
[    0.002576] GPR00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.002576] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.002576] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.002576] GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.002576] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.002576] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.002576] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.002576] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    0.002671] NIP [0000000000000000] 0x0
[    0.002680] LR [0000000000000000] 0x0
[    0.002689] --- interrupt: 0
[    0.002697] Code: 7feafb78 813d0000 7d29fa14 7f895000 409d00d4 3ce20102 38e74758 79491f24 e87e0006 39000000 e8e70000 7d27482a <a8890038> 7f834000 79090020 419e005c
[    0.002727] ---[ end trace 0000000000000000 ]---
[    0.002739]
[    1.002749] Kernel panic - not syncing: Fatal exception
[    1.002795] Rebooting in 10 seconds..

Thanks,
-Mahesh.

> 
> Signed-off-by: Pingfan Liu <piliu@redhat.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Cc: Wen Xiong <wenxiong@us.ibm.com>
> Cc: Baoquan He <bhe@redhat.com>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: kexec@lists.infradead.org
> To: linuxppc-dev@lists.ozlabs.org
> ---
>  arch/powerpc/kernel/paca.c | 10 ++++++----
>  arch/powerpc/kernel/prom.c |  9 ++++++---
>  2 files changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> index cda4e00b67c1..91e2401de1bd 100644
> --- a/arch/powerpc/kernel/paca.c
> +++ b/arch/powerpc/kernel/paca.c
> @@ -242,9 +242,10 @@ static int __initdata paca_struct_size;
>  
>  void __init allocate_paca_ptrs(void)
>  {
> -	paca_nr_cpu_ids = nr_cpu_ids;
> +	int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
>  
> -	paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
> +	paca_nr_cpu_ids = n;
> +	paca_ptrs_size = sizeof(struct paca_struct *) * n;
>  	paca_ptrs = memblock_alloc_raw(paca_ptrs_size, SMP_CACHE_BYTES);
>  	if (!paca_ptrs)
>  		panic("Failed to allocate %d bytes for paca pointers\n",
> @@ -287,13 +288,14 @@ void __init allocate_paca(int cpu)
>  void __init free_unused_pacas(void)
>  {
>  	int new_ptrs_size;
> +	int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
>  
> -	new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
> +	new_ptrs_size = sizeof(struct paca_struct *) * n;
>  	if (new_ptrs_size < paca_ptrs_size)
>  		memblock_phys_free(__pa(paca_ptrs) + new_ptrs_size,
>  				   paca_ptrs_size - new_ptrs_size);
>  
> -	paca_nr_cpu_ids = nr_cpu_ids;
> +	paca_nr_cpu_ids = n;
>  	paca_ptrs_size = new_ptrs_size;
>  
>  #ifdef CONFIG_PPC_64S_HASH_MMU
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index 87272a2d8c10..15c994f54bf9 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -362,9 +362,12 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
>  			 */
>  			boot_cpuid = i;
>  			found = true;
> -			/* This works around the hole in paca_ptrs[]. */
> -			if (nr_cpu_ids < nthreads)
> -				set_nr_cpu_ids(nthreads);
> +			/*
> +			 * Ideally, nr_cpus=1 can be achieved if each kernel
> +			 * component does not assume cpu0 is onlined.
> +			 */
> +			if (boot_cpuid != 0 && nr_cpu_ids < 2)
> +				set_nr_cpu_ids(2);
>  		}
>  #ifdef CONFIG_SMP
>  		/* logical cpu id is always 0 on UP kernels */
> -- 
> 2.31.1
> 

-- 
Mahesh J Salgaonkar


* Re: [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid
  2023-10-03 18:06   ` Mahesh J Salgaonkar
@ 2023-10-07  1:03     ` Pingfan Liu
  0 siblings, 0 replies; 10+ messages in thread
From: Pingfan Liu @ 2023-10-07  1:03 UTC (permalink / raw)
  To: mahesh
  Cc: Baoquan He, kexec, Ming Lei, Nicholas Piggin, linuxppc-dev,
	Wen Xiong

On Wed, Oct 4, 2023 at 2:07 AM Mahesh J Salgaonkar <mahesh@linux.ibm.com> wrote:
>
> On 2023-09-25 15:53:48 Mon, Pingfan Liu wrote:
> > paca_ptrs should be large enough to hold the boot_cpuid, hence, its
> > lower boundary is set to the bigger one between boot_cpuid+1 and
> > nr_cpus.
> >
> > On the other hand, some kernel component: -1. the timer assumes cpu0
> > online since the timer_list->flags subfield 'TIMER_CPUMASK' is zero if
> > not initialized to a proper present cpu.  -2. power9_idle_stop() assumes
> > the primary thread's paca is allocated.
> >
> > Hence lift nr_cpu_ids from one to two to ensure cpu0 is onlined, if the
> > boot cpu is not cpu0.
> >
> > Result:
> > When nr_cpus=1, taskset -c 14 bash -c 'echo c > /proc/sysrq-trigger'
> > the kdump kernel brings up two cpus.
> > While when taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger',
> > the kdump kernel brings up one cpu.
>
> I tried your changes on power9 and power10 systems. However, on power10 lpar I
> see bellow backtrace in kdump kernel bootup with nr_cpus=1.
>

Thanks for testing. I have only tried this series on Power9 bare
metal. I think the bug is related to this code snippet in
update_mask_from_threadgroup():
  for (i = first_thread; i < first_thread + threads_per_core; i++) {
    int i_group_start = get_cpu_thread_group_start(i, tg);
                                  ^^^

Here it iterates over each thread in the core, but some of them are not online.

I will try to come up with a fix.
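
For illustration only, here is a minimal user-space sketch of the
failure mode described above. It is not kernel code: every model_* name
is hypothetical, the paca is reduced to a single field, and it assumes
that only logical cpus below nr_cpu_ids plus the boot cpu have a paca
behind their slot. The pointer array is sized as in patch 4/4,
max(boot_cpuid + 1, nr_cpu_ids), so with boot_cpuid=4 and nr_cpu_ids=2 a
walk over all 8 threads of the core reaches slots that are either
unallocated or past the end of the array. That would fit the oops, where
the faulting address 0x88888888888888c0 is GPR09 (0x8888888888888888)
plus 0x38. The guard shown is only one possible direction for a fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space model; these names are not kernel API. */
#define MODEL_THREADS_PER_CORE 8

struct model_paca {
	int hw_cpu_id;
};

static struct model_paca **model_paca_ptrs;	/* stands in for paca_ptrs[] */
static int model_paca_nr;			/* stands in for paca_nr_cpu_ids */

/* Same sizing rule as patch 4/4: max(boot_cpuid + 1, nr_cpu_ids). */
static int model_paca_slots(int boot_cpuid, int nr_cpu_ids)
{
	return (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
}

/* Release the model state so model_setup() can be called again. */
static void model_teardown(void)
{
	int cpu;

	for (cpu = 0; cpu < model_paca_nr; cpu++)
		free(model_paca_ptrs[cpu]);
	free(model_paca_ptrs);
	model_paca_ptrs = NULL;
	model_paca_nr = 0;
}

/*
 * Assumption: only logical cpus below nr_cpu_ids and the boot cpu get a
 * paca. Unallocated slots are NULL in this model; the oops suggests the
 * kernel has a 0x88 byte pattern in such slots instead.
 */
static void model_setup(int boot_cpuid, int nr_cpu_ids)
{
	int cpu;

	model_teardown();
	model_paca_nr = model_paca_slots(boot_cpuid, nr_cpu_ids);
	model_paca_ptrs = calloc(model_paca_nr, sizeof(*model_paca_ptrs));
	assert(model_paca_ptrs);

	for (cpu = 0; cpu < model_paca_nr; cpu++) {
		if (cpu >= nr_cpu_ids && cpu != boot_cpuid)
			continue;	/* hole: no paca behind this slot */
		model_paca_ptrs[cpu] = calloc(1, sizeof(**model_paca_ptrs));
		assert(model_paca_ptrs[cpu]);
		model_paca_ptrs[cpu]->hw_cpu_id = cpu;
	}
}

/* True only if the slot exists and has a paca behind it. */
static bool model_has_paca(int cpu)
{
	return cpu >= 0 && cpu < model_paca_nr && model_paca_ptrs[cpu] != NULL;
}

/*
 * Walk every thread of the core that contains @cpu, as the loop in
 * update_mask_from_threadgroup() does, but skip threads without a paca.
 * Returns the number of threads that were safe to examine.
 */
static int model_walk_core(int cpu)
{
	int first_thread = cpu - (cpu % MODEL_THREADS_PER_CORE);
	int i, visited = 0;

	for (i = first_thread; i < first_thread + MODEL_THREADS_PER_CORE; i++) {
		if (!model_has_paca(i))
			continue;	/* an unguarded loop would dereference here */
		assert(model_paca_ptrs[i]->hw_cpu_id == i);
		visited++;
	}
	return visited;
}
```

With model_setup(4, 2), model_paca_slots() yields 5 slots and
model_walk_core(4) examines 3 of the 8 threads (cpus 0, 1 and 4) under
the stated assumption; the other five are skipped rather than
dereferenced.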

Thanks,

Pingfan


> $ taskset -c 4 bash -c 'echo c > /proc/sysrq-trigger'
> [...]
> [    0.000000] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
> [    0.000000] printk: bootconsole [udbg0] enabled
> [    0.000000] the round shift between dt seq and the cpu logic number: 8
> [    0.000000] Partition configured for 16 cpus, operating system maximum is 2.
> [    0.000000] CPU maps initialized for 8 threads per core
> [...]
> [    0.002249] BUG: Unable to handle kernel data access at 0x88888888888888c0
> [    0.002260] Faulting instruction address: 0xc00000001201226c
> [    0.002268] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.002274] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [    0.002282] Modules linked in:
> [    0.002288] CPU: 4 PID: 1 Comm: swapper/4 Not tainted 6.6.0-rc4 #1
> [    0.002296] Hardware name: IBM,9105-22A POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1040.00 (NL1040_005) hv:phyp pSeries
> [    0.002305] NIP:  c00000001201226c LR: c000000012012234 CTR: 0000000000000004
> [    0.002312] REGS: c0000000167ff8f0 TRAP: 0380   Not tainted  (6.6.0-rc4)
> [    0.002321] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24000844  XER: 0000000a
> [    0.002346] CFAR: c00000001201231c IRQMASK: 0
> [    0.002346] GPR00: c000000012012234 c0000000167ffb90 c000000011b61900 0000000000000002
> [    0.002346] GPR04: 0000000000000000 0000000000000001 0000000000000001 c00000004ffeff80
> [    0.002346] GPR08: 0000000000000000 8888888888888888 0000000000000002 0000000000000000
> [    0.002346] GPR12: 0000000000000000 c000000013141000 c000000010011058 0000000000000000
> [    0.002346] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002346] GPR20: 0000000000000028 c000000012170968 c0000000120a3e80 0000000000000016
> [    0.002346] GPR24: c00000004ffdcfd0 0000000000000000 c000000012b82058 0000000000000000
> [    0.002346] GPR28: c00000004fc80a68 c000000012bf0350 c0000000120a3e2c 0000000000000000
> [    0.002426] NIP [c00000001201226c] update_mask_from_threadgroup+0x98/0x174
> [    0.002437] LR [c000000012012234] update_mask_from_threadgroup+0x60/0x174
> [    0.002444] Call Trace:
> [    0.002451] [c0000000167ffb90] [c000000012012234] update_mask_from_threadgroup+0x60/0x174 (unreliable)
> [    0.002464] [c0000000167ffbe0] [c0000000120125f8] init_thread_group_cache_map+0x2b0/0x328
> [    0.002477] [c0000000167ffc50] [c00000001201296c] smp_prepare_cpus+0x2fc/0x4f0
> [    0.002497] [c0000000167ffd10] [c000000012004e40] kernel_init_freeable+0x198/0x3cc
> [    0.002509] [c0000000167ffde0] [c000000010011084] kernel_init+0x34/0x1b0
> [    0.002531] [c0000000167ffe50] [c00000001000dd3c] ret_from_kernel_user_thread+0x14/0x1c
> [    0.002547] --- interrupt: 0 at 0x0
> [    0.002553] NIP:  0000000000000000 LR: 0000000000000000 CTR: 0000000000000000
> [    0.002563] REGS: c0000000167ffe80 TRAP: 0000   Not tainted  (6.6.0-rc4)
> [    0.002569] MSR:  0000000000000000 <>  CR: 00000000  XER: 00000000
> [    0.002576] CFAR: 0000000000000000 IRQMASK: 0
> [    0.002576] GPR00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR12: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002576] GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [    0.002671] NIP [0000000000000000] 0x0
> [    0.002680] LR [0000000000000000] 0x0
> [    0.002689] --- interrupt: 0
> [    0.002697] Code: 7feafb78 813d0000 7d29fa14 7f895000 409d00d4 3ce20102 38e74758 79491f24 e87e0006 39000000 e8e70000 7d27482a <a8890038> 7f834000 79090020 419e005c
> [    0.002727] ---[ end trace 0000000000000000 ]---
> [    0.002739]
> [    1.002749] Kernel panic - not syncing: Fatal exception
> [    1.002795] Rebooting in 10 seconds..
>
> Thanks,
> -Mahesh.
>
> >
> > Signed-off-by: Pingfan Liu <piliu@redhat.com>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Nicholas Piggin <npiggin@gmail.com>
> > Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> > Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> > Cc: Wen Xiong <wenxiong@us.ibm.com>
> > Cc: Baoquan He <bhe@redhat.com>
> > Cc: Ming Lei <ming.lei@redhat.com>
> > Cc: kexec@lists.infradead.org
> > To: linuxppc-dev@lists.ozlabs.org
> > ---
> >  arch/powerpc/kernel/paca.c | 10 ++++++----
> >  arch/powerpc/kernel/prom.c |  9 ++++++---
> >  2 files changed, 12 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> > index cda4e00b67c1..91e2401de1bd 100644
> > --- a/arch/powerpc/kernel/paca.c
> > +++ b/arch/powerpc/kernel/paca.c
> > @@ -242,9 +242,10 @@ static int __initdata paca_struct_size;
> >
> >  void __init allocate_paca_ptrs(void)
> >  {
> > -     paca_nr_cpu_ids = nr_cpu_ids;
> > +     int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
> >
> > -     paca_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
> > +     paca_nr_cpu_ids = n;
> > +     paca_ptrs_size = sizeof(struct paca_struct *) * n;
> >       paca_ptrs = memblock_alloc_raw(paca_ptrs_size, SMP_CACHE_BYTES);
> >       if (!paca_ptrs)
> >               panic("Failed to allocate %d bytes for paca pointers\n",
> > @@ -287,13 +288,14 @@ void __init allocate_paca(int cpu)
> >  void __init free_unused_pacas(void)
> >  {
> >       int new_ptrs_size;
> > +     int n = (boot_cpuid + 1) > nr_cpu_ids ? (boot_cpuid + 1) : nr_cpu_ids;
> >
> > -     new_ptrs_size = sizeof(struct paca_struct *) * nr_cpu_ids;
> > +     new_ptrs_size = sizeof(struct paca_struct *) * n;
> >       if (new_ptrs_size < paca_ptrs_size)
> >               memblock_phys_free(__pa(paca_ptrs) + new_ptrs_size,
> >                                  paca_ptrs_size - new_ptrs_size);
> >
> > -     paca_nr_cpu_ids = nr_cpu_ids;
> > +     paca_nr_cpu_ids = n;
> >       paca_ptrs_size = new_ptrs_size;
> >
> >  #ifdef CONFIG_PPC_64S_HASH_MMU
> > diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> > index 87272a2d8c10..15c994f54bf9 100644
> > --- a/arch/powerpc/kernel/prom.c
> > +++ b/arch/powerpc/kernel/prom.c
> > @@ -362,9 +362,12 @@ static int __init early_init_dt_scan_cpus(unsigned long node,
> >                        */
> >                       boot_cpuid = i;
> >                       found = true;
> > -                     /* This works around the hole in paca_ptrs[]. */
> > -                     if (nr_cpu_ids < nthreads)
> > -                             set_nr_cpu_ids(nthreads);
> > +                     /*
> > +                      * Ideally, nr_cpus=1 can be achieved if each kernel
> > +                      * component does not assume cpu0 is onlined.
> > +                      */
> > +                     if (boot_cpuid != 0 && nr_cpu_ids < 2)
> > +                             set_nr_cpu_ids(2);
> >               }
> >  #ifdef CONFIG_SMP
> >               /* logical cpu id is always 0 on UP kernels */
> > --
> > 2.31.1
> >
>
> --
> Mahesh J Salgaonkar
>




Thread overview: 10+ messages
2023-09-25  7:53 [PATCHv7 0/4] enable nr_cpus for powerpc Pingfan Liu
2023-09-25  7:53 ` [PATCHv7 1/4] powerpc/setup : Enable boot_cpu_hwid for PPC32 Pingfan Liu
2023-09-25  7:53 ` [PATCHv7 2/4] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt Pingfan Liu
2023-09-28 20:36   ` Wen Xiong
2023-09-29  3:19     ` Pingfan Liu
2023-09-29  6:43     ` Christophe Leroy
2023-09-25  7:53 ` [PATCHv7 3/4] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus Pingfan Liu
2023-09-25  7:53 ` [PATCHv7 4/4] powerpc/setup: alloc extra paca_ptrs to hold boot_cpuid Pingfan Liu
2023-10-03 18:06   ` Mahesh J Salgaonkar
2023-10-07  1:03     ` Pingfan Liu
