From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.ozlabs.org (lists.ozlabs.org [112.213.38.117]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 1E987C46CD2 for ; Tue, 9 Jan 2024 04:30:23 +0000 (UTC) Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=otpE4p8F; dkim-atps=neutral Received: from boromir.ozlabs.org (localhost [IPv6:::1]) by lists.ozlabs.org (Postfix) with ESMTP id 4T8HzK20D6z30g5 for ; Tue, 9 Jan 2024 15:30:21 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=ibm.com header.i=@ibm.com header.a=rsa-sha256 header.s=pp1 header.b=otpE4p8F; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=linux.ibm.com (client-ip=148.163.158.5; helo=mx0b-001b2d01.pphosted.com; envelope-from=hbathini@linux.ibm.com; receiver=lists.ozlabs.org) Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 4T8HyL0KLWz2yRS for ; Tue, 9 Jan 2024 15:29:29 +1100 (AEDT) Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 40932MAF012645; Tue, 9 Jan 2024 04:27:17 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=message-id : date : mime-version : subject : to : cc : references : from : in-reply-to : content-type : content-transfer-encoding; s=pp1; bh=wLLMbDJ953G7butPR3ZbjArDyMXZRswBxvP3SBqhwds=; b=otpE4p8F3LbW/oQ9OKtounhsSbvSLlBRDDfJoMjOJm3QDOseXoHhhMjosVBGpHD2WVd1 WDtfRyTUx9dDdIaiN3rp0HvNUtkt3a71oS/K9d/JCax+SNSJ6oTOA1BtcsVClmzfwwPi z8REcZ9VpxwEG4IHxqOBSymgWAMRSd9pcT2lm9njgUK/YYOfnrTNqdYLIPzkcrLUfQCz MQYYkvlcnX176mS6UCQeVM+OceIaiDkyaG5MrBUeLwYcZgYxjv5mBwjj6Bwyf7MwyK5g z4vplHwD+7A6yMo95BZW9aKzb0Yblxpfo7hggBMM+u4YdJ54ocU4sDYvwbxLiG0MWoSl Xg== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3vgwvthrfp-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 09 Jan 2024 04:27:17 +0000 Received: from m0360072.ppops.net (m0360072.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 4094PTRu007715; Tue, 9 Jan 2024 04:27:16 GMT Received: from ppma22.wdc07v.mail.ibm.com (5c.69.3da9.ip4.static.sl-reverse.com [169.61.105.92]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3vgwvthrfd-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 09 Jan 2024 04:27:16 +0000 Received: from pps.filterd (ppma22.wdc07v.mail.ibm.com [127.0.0.1]) by ppma22.wdc07v.mail.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 4093AoCY022808; Tue, 9 Jan 2024 04:27:15 GMT Received: from smtprelay06.fra02v.mail.ibm.com ([9.218.2.230]) by ppma22.wdc07v.mail.ibm.com (PPS) with ESMTPS id 3vfhjyckey-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Tue, 09 Jan 2024 04:27:15 +0000 Received: from smtpav07.fra02v.mail.ibm.com (smtpav07.fra02v.mail.ibm.com [10.20.54.106]) by smtprelay06.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 4094RCMl31588852 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Tue, 9 Jan 2024 04:27:12 GMT Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 3535720043; Tue, 9 Jan 2024 04:27:12 +0000 (GMT) Received: from smtpav07.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D4EE920040; Tue, 9 Jan 2024 04:27:09 +0000 (GMT) Received: from [9.43.111.32] (unknown [9.43.111.32]) by smtpav07.fra02v.mail.ibm.com (Postfix) with ESMTP; Tue, 9 Jan 2024 04:27:09 +0000 (GMT) Message-ID: <4fb48637-be68-4e53-bb8a-f0988c03819b@linux.ibm.com> Date: Tue, 9 Jan 2024 09:57:08 +0530 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCHv9 2/2] powerpc/setup: Loosen the mapping between cpu logical id and its seq in dt To: Michael Ellerman References: <20231017022806.4523-1-piliu@redhat.com> <20231017022806.4523-3-piliu@redhat.com> <064ae2ee-ad15-c0ab-f78b-7b3e7dc6612d@linux.ibm.com> <40ab9119-fa6c-4211-98b8-478a84d46e64@linux.ibm.com> Content-Language: en-US From: Hari Bathini In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-GUID: gva_Iu1qN9L_Bp60kMfOcLUMSy_nM4Fo X-Proofpoint-ORIG-GUID: A9y-NReluTZC8nK7X7-IGn-jKBE6dB3k X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.997,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-01-08_11,2024-01-08_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 mlxscore=0 phishscore=0 suspectscore=0 impostorscore=0 lowpriorityscore=0 spamscore=0 adultscore=0 malwarescore=0 priorityscore=1501 bulkscore=0 mlxlogscore=999 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2401090028 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Pingfan Liu , Baoquan He , kexec@lists.infradead.org, Mahesh Salgaonkar , Sourabh Jain , Ming Lei , Nicholas Piggin , linuxppc-dev , Wen Xiong Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" Hi Michael, I am fine with either approach. I was trying to address your concerns in my way. Looking for your inputs here on how to go about this now.. On 29/11/23 7:00 am, Pingfan Liu wrote: > Hi Hari, > > > On Mon, Nov 27, 2023 at 12:30 PM Hari Bathini wrote: >> >> Hi Pingfan, Michael, >> >> On 17/10/23 4:03 pm, Hari Bathini wrote: >>> >>> >>> On 17/10/23 7:58 am, Pingfan Liu wrote: >>>> *** Idea *** >>>> For kexec -p, the boot cpu can be not the cpu0, this causes the problem >>>> of allocating memory for paca_ptrs[]. However, in theory, there is no >>>> requirement to assign cpu's logical id as its present sequence in the >>>> device tree. But there is something like cpu_first_thread_sibling(), >>>> which makes assumption on the mapping inside a core. Hence partially >>>> loosening the mapping, i.e. unbind the mapping of core while keep the >>>> mapping inside a core. >>>> >>>> *** Implement *** >>>> At this early stage, there are plenty of memory to utilize. Hence, this >>>> patch allocates interim memory to link the cpu info on a list, then >>>> reorder cpus by changing the list head. As a result, there is a rotate >>>> shift between the sequence number in dt and the cpu logical number. >>>> >>>> *** Result *** >>>> After this patch, a boot-cpu's logical id will always be mapped into the >>>> range [0,threads_per_core). >>>> >>>> Besides this, at this phase, all threads in the boot core are forced to >>>> be onlined. This restriction will be lifted in a later patch with >>>> extra effort. >>>> >>>> Signed-off-by: Pingfan Liu >>>> Cc: Michael Ellerman >>>> Cc: Nicholas Piggin >>>> Cc: Christophe Leroy >>>> Cc: Mahesh Salgaonkar >>>> Cc: Wen Xiong >>>> Cc: Baoquan He >>>> Cc: Ming Lei >>>> Cc: Sourabh Jain >>>> Cc: Hari Bathini >>>> Cc: kexec@lists.infradead.org >>>> To: linuxppc-dev@lists.ozlabs.org >>> >>> Thanks for working on this, Pingfan. >>> Looks good to me. >>> >>> Acked-by: Hari Bathini >>> >> >> On second thoughts, probably better off with no impact for >> bootcpu < nr_cpu_ids case and changing only two cores logical >> numbering otherwise. Something like the below (Please share >> your thoughts): >> > > I am afraid that it may not be as ideal as it looks, considering the > following factors: > -1. For the case of 'bootcpu < nr_cpu_ids', crash can happen evenly > across any cpu in the system, which seriously undermines the > protection intended here (Under the most optimistic scenario, there is > a 50% chance of success) > > -2. For the re-ordering of logical numbering, IMHO, if there is > concern that re-ordering will break something, the partial re-ordering > can not avoid that. We ought to spot probable hazards so as to ease > worries. > > > Thanks, > > Pingfan > >> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c >> index ec82f5bda908..78a8312aa8c4 100644 >> --- a/arch/powerpc/kernel/prom.c >> +++ b/arch/powerpc/kernel/prom.c >> @@ -76,7 +76,9 @@ u64 ppc64_rma_size; >> unsigned int boot_cpu_node_count __ro_after_init; >> #endif >> static phys_addr_t first_memblock_size; >> +#ifdef CONFIG_SMP >> static int __initdata boot_cpu_count; >> +#endif >> >> static int __init early_parse_mem(char *p) >> { >> @@ -357,6 +359,25 @@ static int __init early_init_dt_scan_cpus(unsigned >> long node, >> fdt_boot_cpuid_phys(initial_boot_params)) { >> found = boot_cpu_count; >> found_thread = i; >> + /* >> + * Map boot-cpu logical id into the range >> + * of [0, thread_per_core) if it can't be >> + * accommodated within nr_cpu_ids. >> + */ >> + if (i != boot_cpu_count && boot_cpu_count >= nr_cpu_ids) { >> + boot_cpuid = i; >> + DBG("Logical CPU number for boot CPU changed from %d to %d\n", >> + boot_cpu_count, i); >> + } else { >> + boot_cpuid = boot_cpu_count; >> + } >> + >> + /* Ensure boot thread is acconted for in nr_cpu_ids */ >> + if (boot_cpuid >= nr_cpu_ids) { >> + set_nr_cpu_ids(boot_cpuid + 1); >> + DBG("Adjusted nr_cpu_ids to %u, to include boot CPU.\n", >> + nr_cpu_ids); >> + } >> } >> #ifdef CONFIG_SMP >> /* logical cpu id is always 0 on UP kernels */ >> @@ -368,9 +389,8 @@ static int __init early_init_dt_scan_cpus(unsigned >> long node, >> if (found < 0) >> return 0; >> >> - DBG("boot cpu: logical %d physical %d\n", found, >> + DBG("boot cpu: logical %d physical %d\n", boot_cpuid, >> be32_to_cpu(intserv[found_thread])); >> - boot_cpuid = found; >> >> boot_cpu_hwid = be32_to_cpu(intserv[found_thread]); >> >> diff --git a/arch/powerpc/kernel/setup-common.c >> b/arch/powerpc/kernel/setup-common.c >> index b7b733474b60..f7179525c774 100644 >> --- a/arch/powerpc/kernel/setup-common.c >> +++ b/arch/powerpc/kernel/setup-common.c >> @@ -409,6 +409,12 @@ static void __init cpu_init_thread_core_maps(int tpc) >> >> u32 *cpu_to_phys_id = NULL; >> >> +struct interrupt_server_node { >> + bool avail; >> + int len; >> + __be32 intserv[]; >> +}; >> + >> /** >> * setup_cpu_maps - initialize the following cpu maps: >> * cpu_possible_mask >> @@ -429,9 +435,13 @@ u32 *cpu_to_phys_id = NULL; >> */ >> void __init smp_setup_cpu_maps(void) >> { >> + struct interrupt_server_node *core0_node = NULL, *bt_node = NULL; >> + int orig_boot_cpu = -1, orig_boot_thread = -1; >> + bool found_boot_cpu = false; >> struct device_node *dn; >> - int cpu = 0; >> int nthreads = 1; >> + int cpu = 0; >> + int j, len; >> >> DBG("smp_setup_cpu_maps()\n"); >> >> @@ -442,9 +452,9 @@ void __init smp_setup_cpu_maps(void) >> __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32)); >> >> for_each_node_by_type(dn, "cpu") { >> + bool avail, skip = false; >> const __be32 *intserv; >> __be32 cpu_be; >> - int j, len; >> >> DBG(" * %pOF...\n", dn); >> >> @@ -466,29 +476,121 @@ void __init smp_setup_cpu_maps(void) >> >> nthreads = len / sizeof(int); >> >> - for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) { >> - bool avail; >> + avail = of_device_is_available(dn); >> + if (!avail) >> + avail = !of_property_match_string(dn, >> + "enable-method", "spin-table"); >> + >> + for (j = 0; (cpu == 0 || !found_boot_cpu) && j < nthreads; j++) { >> + if (be32_to_cpu(intserv[j]) == boot_cpu_hwid) { >> + found_boot_cpu = true; >> + if (cpu == 0) >> + break; >> + >> + /* Original logical CPU number of thread0 in boot core */ >> + orig_boot_cpu = cpu; >> + orig_boot_thread = j; >> + bt_node = memblock_alloc(sizeof(struct interrupt_server_node) + len, >> + __alignof__(u32)); >> + if (!bt_node) >> + panic("%s: Failed to allocate %zu bytes align=0x%zx\n", >> + __func__, >> + sizeof(struct interrupt_server_node) + len, >> + __alignof__(u32)); >> + bt_node->len = len; >> + memcpy(bt_node->intserv, intserv, len); >> + bt_node->avail = avail; >> + skip = true; >> + break; >> + } >> + } >> >> + /* >> + * Boot CPU not on core0. >> + * Hold off adding core0 until boot core is found as core0 >> + * may have to be replaced with boot core if boot core can >> + * not be accommodated within nr_cpu_ids with its original >> + * logical CPU numbering. >> + */ >> + if (cpu == 0 && !found_boot_cpu) { >> + core0_node = memblock_alloc(sizeof(struct interrupt_server_node) + len, >> + __alignof__(u32)); >> + if (!core0_node) >> + panic("%s: Failed to allocate %zu bytes align=0x%zx\n", >> + __func__, >> + sizeof(struct interrupt_server_node) + len, >> + __alignof__(u32)); >> + core0_node->len = len; >> + memcpy(core0_node->intserv, intserv, len); >> + core0_node->avail = avail; >> + skip = true; >> + } >> + >> + if (skip) { >> + /* Assumes same number of threads for all cores */ >> + cpu += nthreads; >> + continue; >> + } >> + >> + for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) { >> DBG(" thread %d -> cpu %d (hard id %d)\n", >> j, cpu, be32_to_cpu(intserv[j])); >> >> - avail = of_device_is_available(dn); >> - if (!avail) >> - avail = !of_property_match_string(dn, >> - "enable-method", "spin-table"); >> - >> set_cpu_present(cpu, avail); >> set_cpu_possible(cpu, true); >> cpu_to_phys_id[cpu] = be32_to_cpu(intserv[j]); >> cpu++; >> } >> >> - if (cpu >= nr_cpu_ids) { >> + if (found_boot_cpu && cpu >= nr_cpu_ids) { >> of_node_put(dn); >> break; >> } >> } >> >> + /* >> + * Boot CPU not on core0. >> + * >> + * If nr_cpu_ids does not accommodate the original logical CPU >> numbering for >> + * boot CPU core, use logical CPU numbers 0 to nthreads for boot CPU core. >> + * Note that boot cpu is already assigned with logical CPU number >> somewhere >> + * between 0 to nthreads (depending on the boot thread within the core) in >> + * early_init_dt_scan_cpus() for this case. >> + * >> + * Otherwise, stick with the original logical CPU numbering. >> + */ >> + if (bt_node) { >> + int core0_cpu; >> + >> + if (orig_boot_cpu + orig_boot_thread >= nr_cpu_ids) { >> + cpu = 0; >> + core0_cpu = orig_boot_cpu; >> + } else { >> + cpu = orig_boot_cpu; >> + core0_cpu = 0; >> + } >> + >> + for (j = 0; j < nthreads && core0_cpu < nr_cpu_ids; j++) { >> + DBG(" thread %d -> cpu %d (hard id %d)\n", >> + j, core0_cpu, be32_to_cpu(core0_node->intserv[j])); >> + >> + set_cpu_present(core0_cpu, core0_node->avail); >> + set_cpu_possible(core0_cpu, true); >> + cpu_to_phys_id[core0_cpu] = be32_to_cpu(core0_node->intserv[j]); >> + core0_cpu++; >> + } >> + >> + for (j = 0; j < nthreads && cpu < nr_cpu_ids; j++) { >> + DBG(" thread %d -> cpu %d (hard id %d)\n", >> + j, cpu, be32_to_cpu(bt_node->intserv[j])); >> + >> + set_cpu_present(cpu, bt_node->avail); >> + set_cpu_possible(cpu, true); >> + cpu_to_phys_id[cpu] = be32_to_cpu(bt_node->intserv[j]); >> + cpu++; >> + } >> + } >> + >> /* If no SMT supported, nthreads is forced to 1 */ >> if (!cpu_has_feature(CPU_FTR_SMT)) { >> DBG(" SMT disabled ! nthreads forced to 1\n"); >> > Thanks Hari