Date: Wed, 04 Dec 2024 19:13:09 +0000
Message-ID: <87h67js1mi.wl-maz@kernel.org>
From: Marc Zyngier
To: Vitaly Chikunov
Cc: Shameerali Kolothum Thodi, Will Deacon, "james.morse@arm.com",
	"linux-arm-kernel@lists.infradead.org", Catalin Marinas,
	"linux-kernel@vger.kernel.org", "oliver.upton@linux.dev",
	"mark.rutland@arm.com", "Wangzhou (B)", Gleb Fotengauer-Malinovskiy
Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
In-Reply-To: <20241204182231.ovvj6rpvcs2f5gv7@altlinux.org>
References: <20241202045830.e4yy3nkvxtzaybxk@altlinux.org>
	<20241202153618.GA6834@willie-the-truck>
	<86ttbmt71k.wl-maz@kernel.org>
	<20241202155940.p267a3tz5ypj4sog@altlinux.org>
	<86ser6t6fs.wl-maz@kernel.org>
	<20241202223119.k3uod4ksnlf7gqh2@altlinux.org>
	<20241203092721.j473dthkbq6wzez7@altlinux.org>
	<1847e34fa7724d28aeb22d93752f64f2@huawei.com>
	<20241203221453.mwh6sozyczi4ec2k@altlinux.org>
	<87jzcfsuep.wl-maz@kernel.org>
	<20241204182231.ovvj6rpvcs2f5gv7@altlinux.org>

On Wed, 04 Dec 2024 18:34:53 +0000,
Vitaly Chikunov wrote:
>
> Marc,
>
> On Wed, Dec 04, 2024 at 08:51:26AM +0000, Marc Zyngier wrote:
> > On Tue, 03 Dec 2024 22:14:53 +0000,
> > Vitaly Chikunov wrote:
> > >
> > > Shameer, Marc, Oliver, Will,
> > >
> > > On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote:
> > > > > -----Original Message-----
> > > > > From: linux-arm-kernel On
> > > > > Behalf Of Vitaly Chikunov
> > > > > Sent: Tuesday, December 3, 2024 9:27 AM
> > > > > To: Marc Zyngier
> > > > > Cc: Will Deacon; james.morse@arm.com;
> > > > > linux-arm-kernel@lists.infradead.org; Catalin Marinas;
> > > > > linux-kernel@vger.kernel.org; oliver.upton@linux.dev;
> > > > > mark.rutland@arm.com
> > > > > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction:
> > > > > 0000000002000000 [#1] SMP
> > > > >
> > > > > Marc,
> > > > >
> > > > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote:
> > > > > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote:
> > > > > > > On Mon, 02 Dec 2024 15:59:40 +0000,
> > > > > > > Vitaly Chikunov wrote:
> > > > > > > >
> > > > > > > > Marc,
> > > > > > > >
> > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote:
> > > > > > > > >
> > > > > > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well?
> > > > > > > >
> > > > > > > > No, host is 6.6.60.
> > > > > > >
> > > > > > > Right. I wouldn't be surprised if:
> > > > > > >
> > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and
> > > > > > >   that's probably something we should backport)
> > > > > >
> > > > > > How to confirm this? Currently I cannot find any (case-insensitive)
> > > > > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM
> > > > > > strings in `strace -v` (as it decodes some KVM ioctls) of qemu process.
> > > > > >
> > > > > > >
> > > > > > > - you get a nastygram in the host log telling you that the guest has
> > > > > > >   executed something it shouldn't (you'll get the encoding of the
> > > > > > >   instruction)
> > > > > >
> > > > > > I requested admins of the box for dmesg output since I don't have root
> > > > > > access myself and nowadays dmesg is not accessible for a user.
> > > > >
> > > > > This is what they reported:
> > > > >
> > > > > kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0
> > > > > [000000c5]
> > > > > { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read },
> > > > >
> > > >
> > > > As Will pointed out I think this is access to MPAMIDR_EL1 and is from this
> > > > code here,
> > > >
> > > > +++ b/arch/arm64/kernel/cpuinfo.c
> > > > @@ -478,6 +478,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
> > > >         if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
> > > >                 __cpuinfo_store_cpu_32bit(&info->aarch32);
> > > >
> > > > +       if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0))
> > > > +               info->reg_mpamidr = read_cpuid(MPAMIDR_EL1);
> > > > +
> > > >         cpuinfo_detect_icache_policy(info);
> > > >  }
> > > >
> > > > I did manage to boot my setup in 6.6 and this is what happens,
> > > >
> > > > Host kernel 6.6
> > > > Guest Kernel 6.13-rc1
> > > >
> > > > [    0.195392] smp: Brought up 1 node, 8 CPUs
> > > > [    0.219000] SMP: Total of 8 processors activated.
> > > > [    0.219629] CPU: All CPU(s) started at EL1
> > > > ...
> > > > [    0.223212] CPU features: detected: RAS Extension Support
> > > > [    0.223927] CPU features: detected: Memory Partitioning And Monitoring
> > > > [    0.224796] CPU features: detected: Memory Partitioning And Monitoring Virtualisation
> > > > [    0.225961] alternatives: applying system-wide alternatives
> > > > ...
> > > >
> > > > Guest detects MPAM and boots fine.
> > > >
> > > > Host kernel 6.13-rc1
> > > > Guest Kernel 6.13-rc1
> > > >
> > > > [    0.196625] smp: Brought up 1 node, 8 CPUs
> > > > [    0.222093] SMP: Total of 8 processors activated.
> > > > [    0.222769] CPU: All CPU(s) started at EL1
> > > > ...
> > > > [    0.226620] CPU features: detected: RAS Extension Support
> > > > [    0.227453] alternatives: applying system-wide alternatives
> > > >
> > > > MPAM is not visible to Guest in this case.
> > > >
> > > > So as I pointed out earlier could it be a case where the ID register reports MPAM support
> > > > but the firmware has not enabled MPAM?
> > > >
> > > > James seems to be mentioning that case here,
> > > >
> > > > " (If you have a boot failure that bisects here its likely your CPUs
> > > > advertise MPAM in the id registers, but firmware failed to either enable
> > > > or MPAM, or emulate the trap as if it were disabled)"
> > >
> > > I tried to verify that MPAM is advertised with qemu+gdb method, as
> > > suggested by Oliver, but ID_AA64PFR0_EL1 register is not there.
> > >
> > > (gdb) i r ID_AA64PFR0_EL1
> > > Invalid register `ID_AA64PFR0_EL1'
> >
> > Then there is a bug in either QEMU or the GDB stubs. This register
> > exists, or you wouldn't be here.
> >
> In case this is useful:
>
> builder@aarch64:/.in$ qemu-system-aarch64 --version
> QEMU emulator version 9.1.1 (qemu-9.1.1-alt2)
> Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
> builder@aarch64:/.in$ gdb --version
> GNU gdb (GDB) 14.1.0.56.d739d4fd457-alt1 (ALT Sisyphus)
> Copyright (C) 2023 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Is there a way to get the content of this register with these possible
> gdb/qemu bugs?

I have no idea. And frankly, I don't think this matters.

> Perhaps, we can add some debugging print in guest kernel.
>
> > >
> > > Are there other suggestions?
> >
> > Mark has described what the problem is likely to be. 6.6-stable needs
> > to have 6685f5d572c22e10 backported, and it probably should have been
> > Cc: to stable. Can you please apply the following patch to your *host*
> > machine and retest?
>
> Unfortunately I cannot. But I can apply patches to the guest kernel. [I
> will try to convince admins of the server to apply the patch, though, but
> this can take time, and they can refuse since this is production build
> server and its update procedure is complicated.]

Then I really cannot help you. I'm not going to paper over a hypervisor
bug in the guest kernel, and if you/they are happy to run with critical
bugs in your production machine, that's about it then.

	M.

-- 
Without deviation from the norm, progress is not possible.
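
For reference, the register named in the quoted KVM log can be checked by hand. The sketch below is not from the thread. It packs the Op0/Op1/CRn/CRm/Op2 tuple with the same shift layout as the arm64 sys_reg() macro in Linux, compares it against MPAMIDR_EL1 (S3_0_C10_C4_4), and extracts the 4-bit MPAM field at bits [43:40] of ID_AA64PFR0_EL1. The function names are hypothetical, and this is plain user-space C rather than kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack a system register tuple using the shift layout of the arm64
 * sys_reg() macro: Op0 at bit 19, Op1 at 16, CRn at 12, CRm at 8,
 * Op2 at 5. The helper name is hypothetical.
 */
static uint32_t sysreg_encode(uint32_t op0, uint32_t op1, uint32_t crn,
                              uint32_t crm, uint32_t op2)
{
	return (op0 << 19) | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5);
}

/* True if the tuple names MPAMIDR_EL1, i.e. S3_0_C10_C4_4. */
static bool is_mpamidr_el1(uint32_t op0, uint32_t op1, uint32_t crn,
                           uint32_t crm, uint32_t op2)
{
	return sysreg_encode(op0, op1, crn, crm, op2) ==
	       sysreg_encode(3, 0, 10, 4, 4);
}

/*
 * Extract ID_AA64PFR0_EL1.MPAM, the 4-bit field at bits [43:40].
 * A non-zero value means the CPU advertises MPAM.
 */
static unsigned int pfr0_mpam_field(uint64_t pfr0)
{
	return (unsigned int)((pfr0 >> 40) & 0xf);
}
```

With the values from the log, is_mpamidr_el1(3, 0, 10, 4, 4) returns true, which agrees with the identification of MPAMIDR_EL1 made in the thread. pfr0_mpam_field() would only be useful once the raw ID_AA64PFR0_EL1 value has been obtained by some other means.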