Date: Tue, 17 Oct 2023 10:12:53 +0800
From: Pingfan Liu
To: Sourabh Jain
Cc: Baoquan He, kexec@lists.infradead.org, Mahesh Salgaonkar, Nicholas Piggin, Ming Lei, linuxppc-dev@lists.ozlabs.org, Wen Xiong
Subject: Re: [PATCHv8 1/5] powerpc/setup : Enable boot_cpu_hwid for PPC32
References: <20231009113036.45988-1-piliu@redhat.com> <20231009113036.45988-2-piliu@redhat.com> <36b2aa11-dfd9-529d-0386-aa181caeb93f@linux.ibm.com> <7da645c3-2899-fb01-bf11-392b9bd52e38@linux.ibm.com> <4bb6a4a0-525b-d36d-c5ac-127357e96eb3@linux.ibm.com>

On Mon, Oct 16, 2023 at 12:13:53PM +0530, Sourabh Jain wrote:
> Hello Pingfan,
>
> > > > > > With this patch series applied, the kdump kernel fails to boot on
> > > > > > powerpc with nr_cpus=1.
> > > > > >
> > > > > > Console logs:
> > > > > > -------------------
> > > > > > [root]# echo c > /proc/sysrq-trigger
> > > > > > [ 74.783235] sysrq: Trigger a crash
> > > > > > [ 74.783244] Kernel panic - not syncing: sysrq triggered crash
> > > > > > [ 74.783252] CPU: 58 PID: 3838 Comm: bash Kdump: loaded Not tainted
> > > > > > 6.6.0-rc5pf-nr-cpus+ #3
> > > > > > [ 74.783259] Hardware name: POWER10 (raw) phyp pSeries
> > > > > > [ 74.783275] Call Trace:
> > > > > > [ 74.783280] [c00000020f4ebac0] [c000000000ed9f38]
> > > > > > dump_stack_lvl+0x6c/0x9c (unreliable)
> > > > > > [ 74.783291] [c00000020f4ebaf0] [c000000000150300] panic+0x178/0x438
> > > > > > [ 74.783298] [c00000020f4ebb90] [c000000000936d48]
> > > > > > sysrq_handle_crash+0x28/0x30
> > > > > > [ 74.783304] [c00000020f4ebbf0] [c00000000093773c]
> > > > > > __handle_sysrq+0x10c/0x250
> > > > > > [ 74.783309] [c00000020f4ebc90] [c000000000937fa8]
> > > > > > write_sysrq_trigger+0xc8/0x168
> > > > > > [ 74.783314] [c00000020f4ebcd0] [c000000000665d8c]
> > > > > > proc_reg_write+0x10c/0x1b0
> > > > > > [ 74.783321] [c00000020f4ebd00] [c00000000058da54]
> > > > > > vfs_write+0x104/0x4b0
> > > > > > [ 74.783326] [c00000020f4ebdc0] [c00000000058dfdc]
> > > > > > ksys_write+0x7c/0x140
> > > > > > [ 74.783331] [c00000020f4ebe10] [c000000000033a64]
> > > > > > system_call_exception+0x144/0x3a0
> > > > > > [ 74.783337] [c00000020f4ebe50] [c00000000000c554]
> > > > > > system_call_common+0xf4/0x258
> > > > > > [ 74.783343] --- interrupt: c00 at
0x7fffa0721594
> > > > > > [ 74.783352] NIP: 00007fffa0721594 LR: 00007fffa0697bf4 CTR:
> > > > > > 0000000000000000
> > > > > > [ 74.783364] REGS: c00000020f4ebe80 TRAP: 0c00 Not tainted
> > > > > > (6.6.0-rc5pf-nr-cpus+)
> > > > > > [ 74.783376] MSR: 800000000280f033
> > > > > > CR: 28222202 XER: 00000000
> > > > > > [ 74.783394] IRQMASK: 0
> > > > > > [ 74.783394] GPR00: 0000000000000004 00007ffffc4b6800 00007fffa0807300
> > > > > > 0000000000000001
> > > > > > [ 74.783394] GPR04: 000000013549ea60 0000000000000002 0000000000000010
> > > > > > 0000000000000000
> > > > > > [ 74.783394] GPR08: 0000000000000000 0000000000000000 0000000000000000
> > > > > > 0000000000000000
> > > > > > [ 74.783394] GPR12: 0000000000000000 00007fffa0abaf70 0000000040000000
> > > > > > 000000011a0f9798
> > > > > > [ 74.783394] GPR16: 000000011a0f9724 000000011a097688 000000011a02ff70
> > > > > > 000000011a0fd568
> > > > > > [ 74.783394] GPR20: 0000000135554bf0 0000000000000001 000000011a0aa478
> > > > > > 00007ffffc4b6a24
> > > > > > [ 74.783394] GPR24: 00007ffffc4b6a20 000000011a0faf94 0000000000000002
> > > > > > 000000013549ea60
> > > > > > [ 74.783394] GPR28: 0000000000000002 00007fffa08017a0 000000013549ea60
> > > > > > 0000000000000002
> > > > > > [ 74.783440] NIP [00007fffa0721594] 0x7fffa0721594
> > > > > > [ 74.783443] LR [00007fffa0697bf4] 0x7fffa0697bf4
> > > > > > [ 74.783447] --- interrupt: c00
> > > > > > I'm in purgatory
> > > > > > [ 0.000000] radix-mmu: Page sizes from device-tree:
> > > > > > [ 0.000000] radix-mmu: Page size shift = 12 AP=0x0
> > > > > > [ 0.000000] radix-mmu: Page size shift = 16 AP=0x5
> > > > > > [ 0.000000] radix-mmu: Page size shift = 21 AP=0x1
> > > > > > [ 0.000000] radix-mmu: Page size shift = 30 AP=0x2
> > > > > > [ 0.000000] Activating Kernel Userspace Access Prevention
> > > > > > [ 0.000000] Activating Kernel Userspace Execution Prevention
> > > > > > [ 0.000000] radix-mmu: Mapped
0x0000000000000000-0x0000000000010000
> > > > > > with 64.0 KiB pages (exec)
> > > > > > [ 0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
> > > > > > with 64.0 KiB pages
> > > > > > [ 0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
> > > > > > with 2.00 MiB pages
> > > > > > [ 0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
> > > > > > with 2.00 MiB pages (exec)
> > > > > > [ 0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
> > > > > > with 2.00 MiB pages
> > > > > > [ 0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000180000000
> > > > > > with 1.00 GiB pages
> > > > > > [ 0.000000] radix-mmu: Mapped 0x0000000180000000-0x00000001a0000000
> > > > > > with 2.00 MiB pages
> > > > > > [ 0.000000] lpar: Using radix MMU under hypervisor
> > > > > > [ 0.000000] Linux version 6.6.0-rc5pf-nr-cpus+
> > > > > > (root@ltcever7x0-lp1.aus.stglabs.ibm.com) (gcc (GCC) 8.5.0 20210514 (Red
> > > > > > Hat 8.5.0-20), GNU ld version 2.30-123.el8) #3 SMP Mon Oct 9 11:07:
> > > > > > 41 CDT 2023
> > > > > > [ 0.000000] Found initrd at 0xc000000022e60000:0xc0000000248f08d8
> > > > > > [ 0.000000] Hardware name: IBM,9043-MRX POWER10 (raw) 0x800200
> > > > > > 0xf000006 of:IBM,FW1060.00 (NM1060_016) hv:phyp pSeries
> > > > > > [ 0.000000] printk: bootconsole [udbg0] enabled
> > > > > > [ 0.000000] the round shift between dt seq and the cpu logic number:
> > > > > > 56
> > > > > > [ 0.000000] BUG: Unable to handle kernel data access on write at
> > > > > > 0xc0000001a0000000
> > > > > > [ 0.000000] Faulting instruction address: 0xc000000022009c64
> > > > > > [ 0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
> > > > > > [ 0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > > > > > [ 0.000000] Modules linked in:
> > > > > > [ 0.000000] CPU: 2 PID: 0 Comm: swapper Not tainted
> > > > > > 6.6.0-rc5pf-nr-cpus+ #3
> > > > > > [ 0.000000] Hardware name: POWER10
(raw) hv:phyp pSeries
> > > > > > [ 0.000000] NIP: c000000022009c64 LR: c000000022009c54 CTR:
> > > > > > c0000000201ff348
> > > > > > [ 0.000000] REGS: c000000022aebb00 TRAP: 0300 Not tainted
> > > > > > (6.6.0-rc5pf-nr-cpus+)
> > > > > > [ 0.000000] MSR: 8000000000001033 CR: 28222824
> > > > > > XER: 00000001
> > > > > > [ 0.000000] CFAR: c000000020031574 DAR: c0000001a0000000 DSISR:
> > > > > > 42000000 IRQMASK: 1
> > > > > > [ 0.000000] GPR00: c000000022009ba0 c000000022aebda0 c0000000213d1300
> > > > > > 0000000000000004
> > > > > > [ 0.000000] GPR04: 0000000000000001 c000000022aebbc0 c000000022aebbb8
> > > > > > 0000000000000000
> > > > > > [ 0.000000] GPR08: 0000000000000001 c00000019ffffff8 000000000000003a
> > > > > > c0000000229c8a78
> > > > > > [ 0.000000] GPR12: 0000000000002000 c000000022e4a800 c0000000211d34b8
> > > > > > c0000000211d3aa8
> > > > > > [ 0.000000] GPR16: c0000000211d75a0 c0000000211d75b0 c0000000225f3b98
> > > > > > 0000000000000000
> > > > > > [ 0.000000] GPR20: 0000000000000001 0000000000000001 0000000000000001
> > > > > > 0000000000000001
> > > > > > [ 0.000000] GPR24: 0000000000000008 0000000000000000 0000000000000001
> > > > > > c00000019ffffdc0
> > > > > > [ 0.000000] GPR28: 0000000000000002 c000000022b368e0 c000000022aebe08
> > > > > > 0000000000000008
> > > > > > [ 0.000000] NIP [c000000022009c64] smp_setup_cpu_maps+0x420/0x724
> > > > > > [ 0.000000] LR [c000000022009c54] smp_setup_cpu_maps+0x410/0x724
> > > > > > [ 0.000000] Call Trace:
> > > > > > [ 0.000000] [c000000022aebda0] [c000000022009ba0]
> > > > > > smp_setup_cpu_maps+0x35c/0x724 (unreliable)
> > > > > > [ 0.000000] [c000000022aebeb0] [c00000002200a19c]
> > > > > > setup_arch+0x1b8/0x54c
> > > > > > [ 0.000000] [c000000022aebf30] [c000000022003f88]
> > > > > > start_kernel+0xb0/0x768
> > > > > > [ 0.000000] [c000000022aebfe0] [c00000002000d888]
> > > > > > start_here_common+0x1c/0x20
> > > > > > [ 0.000000] Code: 3929ffff 7f89e040 409c002c
7ec4b378 7f83e378
> > > > > > 4a027939 7f83e378 4a0278e5 e95b0018 3d22017d e929f028 7d4ac42c
> > > > > > <7d49c12e> eb7b0000 7e99a378 4bffff3c
> > > > > The faulting instruction address, 0xc000000022009c64, corresponds to the code
> > > > > below:
> > > > >
> > > > > File:
> > > > > arch/powerpc/kernel/setup-common.c
> > > > >
> > > > > Function
> > > > > void __init smp_setup_cpu_maps(void)
> > > > > {
> > > > > ...
> > > > > cpu_to_phys_id[bt_thread] =
> > > > > be32_to_cpu(intserv_node->intserv[bt_thread]);
> > > > > ...
> > > > > }
> > > > >
> > > > > Hope it helps.
> > > > >
> > > > Appreciate your help.
> > > >
> > > > This issue should be linked to the capacity of cpu_to_phys_id[].
> > > >
> > > > Could you please try the fix suggested at the end of the email?
> > > > It should be a fix for
> > > > [PATCHv8 3/5] powerpc/setup: Handle the case when boot_cpuid greater than nr_cpus
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Pingfan
> > > >
> > > > ---
> > > >
> > > > diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> > > > index bd7853a4bc91..849adc7a4b47 100644
> > > > --- a/arch/powerpc/kernel/setup-common.c
> > > > +++ b/arch/powerpc/kernel/setup-common.c
> > > > @@ -464,12 +464,6 @@ void __init smp_setup_cpu_maps(void)
> > > > DBG("smp_setup_cpu_maps()\n");
> > > >
> > > > INIT_LIST_HEAD(&head);
> > > > - cpu_to_phys_id = memblock_alloc(nr_cpu_ids * sizeof(u32),
> > > > - __alignof__(u32));
> > > > - if (!cpu_to_phys_id)
> > > > - panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> > > > - __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
> > > > -
> > > > for_each_node_by_type(dn, "cpu") {
> > > > const __be32 *intserv;
> > > > __be32 cpu_be;
> > > > @@ -533,6 +527,16 @@ void __init smp_setup_cpu_maps(void)
> > > > }
> > > >
> > > > }
> > > > +
> > > > + /* There may be hole between cpu0 and boot cpu */
> > > > + j = (bt_thread + 1) > nr_cpu_ids ?
(bt_thread + 1) : nr_cpu_ids;
> > > > + cpu_to_phys_id = memblock_alloc(j * sizeof(u32),
> > > > + __alignof__(u32));
> > > > + if (!cpu_to_phys_id)
> > > > + panic("%s: Failed to allocate %zu bytes align=0x%zx\n",
> > > > + __func__, nr_cpu_ids * sizeof(u32), __alignof__(u32));
> > > > +
> > > > +
> > > > cpu = 0;
> > > > list_del_init(&head);
> > > > /* Select the primary thread, the boot cpu's slibing, as the logic 0 */
> > > With the above changes applied, kdump kernel boots fine with a WARNING:
> > >
> > > [root]# echo c > /proc/sysrq-trigger
> > > [ 310.748248] sysrq: Trigger a crash
> > > [ 310.748256] Kernel panic - not syncing: sysrq triggered crash
> > > [ 310.748266] CPU: 26 PID: 2610 Comm: bash Kdump: loaded Not tainted
> > > 6.6.0-rc5-fix-setup-common+ #3
> > > [ 310.748273] Hardware name: IBM,9043-MRX POWER10 hv:phyp pSeries
> > > [ 310.748280] Call Trace:
> > > [ 310.748284] [c000000184717ac0] [c000000000ecf8d8]
> > > dump_stack_lvl+0x6c/0x9c (unreliable)
> > > [ 310.748298] [c000000184717af0] [c000000000150310] panic+0x178/0x438
> > > [ 310.748307] [c000000184717b90] [c00000000092c8b8]
> > > sysrq_handle_crash+0x28/0x30
> > > [ 310.748316] [c000000184717bf0] [c00000000092d2ac]
> > > __handle_sysrq+0x10c/0x250
> > > [ 310.748330] [c000000184717c90] [c00000000092db18]
> > > write_sysrq_trigger+0xc8/0x168
> > > [ 310.748339] [c000000184717cd0] [c00000000065c21c]
> > > proc_reg_write+0x10c/0x1b0
> > > [ 310.748349] [c000000184717d00] [c000000000583f94] vfs_write+0x104/0x4b0
> > > [ 310.748356] [c000000184717dc0] [c00000000058451c] ksys_write+0x7c/0x140
> > > [ 310.748365] [c000000184717e10] [c000000000033a54]
> > > system_call_exception+0x144/0x3a0
> > > [ 310.748377] [c000000184717e50] [c00000000000c554]
> > > system_call_common+0xf4/0x258
> > > [ 310.748389] --- interrupt: c00 at 0x7fff97720c34
> > > [ 310.748395] NIP: 00007fff97720c34 LR: 00007fff97697c74 CTR:
> > > 0000000000000000
> > > [ 310.748404] REGS: c000000184717e80
TRAP: 0c00 Not tainted
> > > (6.6.0-rc5-fix-setup-common+)
> > > [ 310.748413] MSR: 800000000280f033
> > > CR: 28222202 XER: 00000000
> > > [ 310.748430] IRQMASK: 0
> > > [ 310.748430] GPR00: 0000000000000004 00007fffffabc510 00007fff97807300
> > > 0000000000000001
> > > [ 310.748430] GPR04: 00000001624f7910 0000000000000002 0000000000000010
> > > 00007fff97669724
> > > [ 310.748430] GPR08: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 310.748430] GPR12: 0000000000000000 00007fff97a5aee0 0000000040000000
> > > 0000000125c39798
> > > [ 310.748430] GPR16: 0000000125c39724 0000000125bd8128 0000000125b70370
> > > 0000000125c3d568
> > > [ 310.748430] GPR20: 0000000162551030 0000000000000001 0000000125beaf18
> > > 00007fffffabc734
> > > [ 310.748430] GPR24: 00007fffffabc730 0000000125c3af94 0000000000000002
> > > 00000001624f7910
> > > [ 310.748430] GPR28: 0000000000000002 00007fff97801798 00000001624f7910
> > > 0000000000000002
> > > [ 310.748475] NIP [00007fff97720c34] 0x7fff97720c34
> > > [ 310.748478] LR [00007fff97697c74] 0x7fff97697c74
> > > [ 310.748482] --- interrupt: c00
> > > I'm in purgatory
> > > [ 0.000000] radix-mmu: Page sizes from device-tree:
> > > [ 0.000000] radix-mmu: Page size shift = 12 AP=0x0
> > > [ 0.000000] radix-mmu: Page size shift = 16 AP=0x5
> > > [ 0.000000] radix-mmu: Page size shift = 21 AP=0x1
> > > [ 0.000000] radix-mmu: Page size shift = 30 AP=0x2
> > > [ 0.000000] Activating Kernel Userspace Access Prevention
> > > [ 0.000000] Activating Kernel Userspace Execution Prevention
> > > [ 0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000000010000
> > > with 64.0 KiB pages (exec)
> > > [ 0.000000] radix-mmu: Mapped 0x0000000000010000-0x0000000000200000
> > > with 64.0 KiB pages
> > > [ 0.000000] radix-mmu: Mapped 0x0000000000200000-0x0000000020000000
> > > with 2.00 MiB pages
> > > [ 0.000000] radix-mmu: Mapped 0x0000000020000000-0x0000000022600000
> > > with 2.00 MiB
pages (exec)
> > > [ 0.000000] radix-mmu: Mapped 0x0000000022600000-0x0000000040000000
> > > with 2.00 MiB pages
> > >
> > > Trimmed logs ....
> > >
> > > [ 0.001738] Mount-cache hash table entries: 16384 (order: 1, 131072
> > > bytes, linear)
> > > [ 0.001751] Mountpoint-cache hash table entries: 16384 (order: 1,
> > > 131072 bytes, linear)
> > > [ 0.007339] ------------[ cut here ]------------
> > > [ 0.007356] WARNING: CPU: 2 PID: 1 at arch/powerpc/kernel/smp.c:941
> > > update_mask_from_threadgroup+0x128/0x1a0
> > > [ 0.007371] Modules linked in:
> > > [ 0.007377] CPU: 2 PID: 1 Comm: swapper/2 Not tainted
> > > 6.6.0-rc5-fix-setup-common+ #3
> > > [ 0.007385] Hardware name: IBM,9043-MRX POWER10 hv:phyp pSeries
> > > [ 0.007393] NIP: c000000022011ed8 LR: c000000022011e10 CTR:
> > > 0000000000000000
> > > [ 0.007411] REGS: c0000000256338f0 TRAP: 0700 Not tainted
> > > (6.6.0-rc5-fix-setup-common+)
> > > [ 0.007425] MSR: 8000000002029033 CR:
> > > 44000842 XER: 0000000c
> > > [ 0.007444] CFAR: c000000022011e78 IRQMASK: 0
> > > [ 0.007444] GPR00: c000000022011e10 c000000025633b90 c0000000213c1300
> > > 0000000000000002
> > > [ 0.007444] GPR04: 0000000000000000 0000000000000005 0000000000000001
> > > 0000000000000002
> > > [ 0.007444] GPR08: 0000000000000008 0000000000000001 0000000000000002
> > > 0000000000000004
> > > [ 0.007444] GPR12: 0000000000000000 c000000022e3ac00 c000000020010138
> > > 0000000000000000
> > > [ 0.007444] GPR16: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 0.007444] GPR20: 0000000000000018 c000000022150968 c000000022093580
> > > c0000000253df000
> > > [ 0.007444] GPR24: 0000000000000002 0000000000000000 c000000022b32058
> > > 0000000000000000
> > > [ 0.007444] GPR28: c00000015fca0a68 c000000022ba0330 c00000002209352c
> > > 0000000000000000
> > > [ 0.007520] NIP [c000000022011ed8]
> > > update_mask_from_threadgroup+0x128/0x1a0
> > > [ 0.007528] LR [c000000022011e10]
update_mask_from_threadgroup+0x60/0x1a0
> > > [ 0.007536] Call Trace:
> > > [ 0.007539] [c000000025633b90] [c000000022011e10]
> > > update_mask_from_threadgroup+0x60/0x1a0 (unreliable)
> > > [ 0.007550] [c000000025633be0] [c000000022012210]
> > > init_thread_group_cache_map+0x2c0/0x338
> > > [ 0.007559] [c000000025633c50] [c0000000220125a0]
> > > smp_prepare_cpus+0x318/0x510
> > > [ 0.007568] [c000000025633d10] [c000000022004874]
> > > kernel_init_freeable+0x198/0x3cc
> > > [ 0.007578] [c000000025633de0] [c000000020010164] kernel_init+0x34/0x1b0
> > > [ 0.007586] [c000000025633e50] [c00000002000cd94]
> > > ret_from_kernel_user_thread+0x14/0x1c
> > > [ 0.007596] --- interrupt: 0 at 0x0
> > > [ 0.007601] NIP: 0000000000000000 LR: 0000000000000000 CTR:
> > > 0000000000000000
> > > [ 0.007608] REGS: c000000025633e80 TRAP: 0000 Not tainted
> > > (6.6.0-rc5-fix-setup-common+)
> > > [ 0.007632] MSR: 0000000000000000 <> CR: 00000000 XER: 00000000
> > > [ 0.007651] CFAR: 0000000000000000 IRQMASK: 0
> > > [ 0.007651] GPR00: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 0.007651] GPR04: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 0.007651] GPR08: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 0.007651] GPR12: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 0.007651] GPR16: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 0.007651] GPR20: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 0.007651] GPR24: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 0.007651] GPR28: 0000000000000000 0000000000000000 0000000000000000
> > > 0000000000000000
> > > [ 0.007742] NIP [0000000000000000] 0x0
> > > [ 0.007756] LR [0000000000000000] 0x0
> > > [ 0.007769] --- interrupt: 0
> > > [ 0.007779] Code: 7ca507b4 79081764 7d1e4214
8108000c 7f882000
> > > 409effdc 48000010 38e70001 7ce707b4 4bffffa4 2f8affff 409e0010
> > > <0fe00000> 3860ffc3 4800004c 7f9b5000
> > > [ 0.007805] ---[ end trace 0000000000000000 ]---
> > > [ 0.007997] RCU Tasks Rude: Setting shift to 2 and lim to 1
> > > rcu_task_cb_adjust=1.
> > > [ 0.008018] RCU Tasks Trace: Setting shift to 2 and lim to 1
> > > rcu_task_cb_adjust=1.
> > > [ 0.008043] POWER10 performance monitor hardware support registered
> > > [ 0.008071] rcu: Hierarchical SRCU implementation.
> > > [ 0.008078] rcu: Max phase no-delay instances is 1000.
> > > [ 0.008516] smp: Bringing up secondary CPUs ...
> > > [ 0.008735] smp: Brought up 1 node, 2 CPUs
> > > ...
> > >
> > > Note: no warning observed if crashing CPU is 0, 8, 16, 24, 32, ....
> > >
> > > Code that generates warning:
> > >
> > > File: arch/powerpc/kernel/smp.c
> > > Function: update_mask_from_threadgroup
> > > ...
> > > if (unlikely(i_group_start == -1)) {
> > > WARN_ON_ONCE(1);
> > > return -ENODATA;
> > > }
> > >
> > It seems that the crash cpu passed the statements in
> > init_thread_group_cache_map()
> > {
> > if (unlikely(cpu_group_start == -1)) {
> > WARN_ON_ONCE(1);
> > return -ENODATA;
> > }
> >
> > }
> >
> > But the warning is raised in the snippet above, which means that
> > get_cpu_thread_group_start(i, tg) for the @first_thread failed in
> > update_mask_from_threadgroup(). At present, I have no idea about it.
> >
> >
> > And is this warning observed if only applying [1-2/5] ?
>
> No warning observed with just 1-2/5 patches.
>

Good to know. I think that [1-2/5] can be a first step. I will post V9
later, with [3-5/5] trimmed.

> >
> > According to my collected data, percpu area will cost 1792 kB per cpu.
> > Forcing all eight threads in a core online will cost 10752KB more than
> > the result if applying the whole series. Maybe I can put [3-5/5] aside
> > as Hari suggested, and try them later if needed.
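
For reference, the 10752KB figure quoted above can be reproduced with a few lines of C. This is only a back-of-the-envelope sketch, not kernel code: percpu_extra_kb() is a hypothetical helper, and it assumes that with the whole series only two threads of the core (cpu0 and the boot cpu) need a percpu area, against all eight otherwise.

```c
#include <assert.h>

/* Per-CPU cost quoted in this thread, in kB. */
#define PERCPU_KB_PER_CPU 1792u

/*
 * Hypothetical helper (not kernel code): extra percpu memory, in kB,
 * paid when all threads_per_core threads get a percpu area instead of
 * only cpus_needed of them.
 */
static unsigned int percpu_extra_kb(unsigned int threads_per_core,
				    unsigned int cpus_needed)
{
	/* Precondition: cannot need more CPUs than the core has threads. */
	assert(cpus_needed <= threads_per_core);
	return (threads_per_core - cpus_needed) * PERCPU_KB_PER_CPU;
}
```

With eight threads per core and two CPUs needed, percpu_extra_kb(8, 2) gives (8 - 2) * 1792 = 10752.
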
> In my experiment 7MB was allocated for Percpu for both nr_cpus=1 and
> nr_cpus=8 if only 1-2/5 patches are applied.
>
> Trimmed output of lscpu and cat /proc/meminfo
>
> With nr_cpus=1
> ============
>
> kdump:/# lscpu
> Architecture:            ppc64le
>   Byte Order:            Little Endian
> CPU(s):                  8
>   On-line CPU(s) list:   0,3
>   Off-line CPU(s) list:  1,2,4-7

nr_cpus has the semantics of possible CPUs, not online CPUs. It covers
both the on-line and the off-line CPU(s) lists, so the percpu area
allocates memory for eight CPUs.

> Model name:              POWER10
> kdump:/#
> kdump:/# cat /proc/meminfo | grep Percpu
> Percpu:             7168 kB
> kdump:/#
>
>
> with nr_cpus=8
> ============
>
> kdump:/# lscpu
> Architecture:            ppc64le
>   Byte Order:            Little Endian
> CPU(s):                  8
>   On-line CPU(s) list:   0,2
>   Off-line CPU(s) list:  1,3-7
> Model name:              POWER10
>
> kdump:/#
> kdump:/# cat /proc/meminfo | grep Percpu
> Percpu:             7168 kB
>

Here too the percpu area allocates memory for eight CPUs, hence the size
is identical to the previous case.

Again, thank you for your help and precious time.

Regards,

Pingfan
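
P.S. The cpu_to_phys_id[] sizing problem discussed earlier in this thread can be modelled in plain userspace C. The sketch below is an illustration only: phys_id_entries() and index_fits() are hypothetical helpers, and the example values (bt_thread = 2, nr_cpu_ids = 1) are assumptions chosen to resemble the nr_cpus=1 report, not values read from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not kernel code): number of u32 entries to
 * allocate so that index bt_thread is always valid, mirroring
 *   j = (bt_thread + 1) > nr_cpu_ids ? (bt_thread + 1) : nr_cpu_ids;
 * from the proposed fix.
 */
static size_t phys_id_entries(unsigned int bt_thread, unsigned int nr_cpu_ids)
{
	size_t need = (size_t)bt_thread + 1;

	/* Precondition: at least one possible CPU. */
	assert(nr_cpu_ids > 0);
	return need > nr_cpu_ids ? need : nr_cpu_ids;
}

/* Returns 1 when idx is a valid index into an array of 'entries' items. */
static int index_fits(size_t entries, unsigned int idx)
{
	return idx < entries;
}
```

Under these assumptions, an array sized by nr_cpu_ids alone has one entry, so index_fits(1, 2) is 0 and the store to cpu_to_phys_id[bt_thread] lands outside the allocation. phys_id_entries(2, 1) returns 3, which makes the same store in bounds.
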