From: Marc Zyngier <maz@kernel.org>
To: Vitaly Chikunov <vt@altlinux.org>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
Will Deacon <will@kernel.org>,
"james.morse@arm.com" <james.morse@arm.com>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
Catalin Marinas <catalin.marinas@arm.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"oliver.upton@linux.dev" <oliver.upton@linux.dev>,
"mark.rutland@arm.com" <mark.rutland@arm.com>,
"Wangzhou (B)" <wangzhou1@hisilicon.com>,
Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
Date: Wed, 04 Dec 2024 19:13:09 +0000
Message-ID: <87h67js1mi.wl-maz@kernel.org>
In-Reply-To: <20241204182231.ovvj6rpvcs2f5gv7@altlinux.org>

On Wed, 04 Dec 2024 18:34:53 +0000,
Vitaly Chikunov <vt@altlinux.org> wrote:
>
> Marc,
>
> On Wed, Dec 04, 2024 at 08:51:26AM +0000, Marc Zyngier wrote:
> > On Tue, 03 Dec 2024 22:14:53 +0000,
> > Vitaly Chikunov <vt@altlinux.org> wrote:
> > >
> > > Shameer, Marc, Oliver, Will,
> > >
> > > On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote:
> > > > > -----Original Message-----
> > > > > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On
> > > > > Behalf Of Vitaly Chikunov
> > > > > Sent: Tuesday, December 3, 2024 9:27 AM
> > > > > To: Marc Zyngier <maz@kernel.org>
> > > > > Cc: Will Deacon <will@kernel.org>; james.morse@arm.com; linux-arm-
> > > > > kernel@lists.infradead.org; Catalin Marinas <catalin.marinas@arm.com>;
> > > > > linux-kernel@vger.kernel.org; oliver.upton@linux.dev;
> > > > > mark.rutland@arm.com
> > > > > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction:
> > > > > 0000000002000000 [#1] SMP
> > > > >
> > > > > Marc,
> > > > >
> > > > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote:
> > > > > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote:
> > > > > > > On Mon, 02 Dec 2024 15:59:40 +0000,
> > > > > > > Vitaly Chikunov <vt@altlinux.org> wrote:
> > > > > > > >
> > > > > > > > Marc,
> > > > > > > >
> > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote:
> > > > > > > > >
> > > > > > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well?
> > > > > > > >
> > > > > > > > No, host is 6.6.60.
> > > > > > >
> > > > > > > Right. I wouldn't be surprised if:
> > > > > > >
> > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and
> > > > > > > > that's probably something we should backport)
> > > > > >
> > > > > > How to confirm this? Currently I cannot find any (case-insensitive)
> > > > > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM
> > > > > > strings in `strace -v` (as it decodes some KVM ioctls) of qemu process.
> > > > > >
> > > > > > >
> > > > > > > - you get a nastygram in the host log telling you that the guest has
> > > > > > > executed something it shouldn't (you'll get the encoding of the
> > > > > > > instruction)
> > > > > >
> > > > > > I requested admins of the box for dmesg output since I don't have root
> > > > > > access myself and nowadays dmesg is not accessible for a user.
> > > > >
> > > > > This is what they reported:
> > > > >
> > > > > kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0
> > > > > [000000c5]
> > > > > { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read },
> > > > >
> > > >
> > > > As Will pointed out I think this is access to MPAMIDR_EL1 and is from this
> > > > code here,
> > > >
> > > > +++ b/arch/arm64/kernel/cpuinfo.c
> > > > @@ -478,6 +478,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
> > > >         if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
> > > >                 __cpuinfo_store_cpu_32bit(&info->aarch32);
> > > >
> > > > +       if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0))
> > > > +               info->reg_mpamidr = read_cpuid(MPAMIDR_EL1);
> > > > +
> > > >         cpuinfo_detect_icache_policy(info);
> > > >  }
> > > >
> > > > I did manage to boot my setup in 6.6 and this is what happens,
> > > >
> > > > Host kernel 6.6
> > > > Guest Kernel 6.13-rc1
> > > >
> > > > [ 0.195392] smp: Brought up 1 node, 8 CPUs
> > > > [ 0.219000] SMP: Total of 8 processors activated.
> > > > [ 0.219629] CPU: All CPU(s) started at EL1
> > > > ...
> > > > [ 0.223212] CPU features: detected: RAS Extension Support
> > > > [ 0.223927] CPU features: detected: Memory Partitioning And Monitoring
> > > > [ 0.224796] CPU features: detected: Memory Partitioning And Monitoring Virtualisation
> > > > [ 0.225961] alternatives: applying system-wide alternatives
> > > > ...
> > > >
> > > > Guest detects MPAM and boots fine.
> > > >
> > > > Host kernel 6.13-rc1
> > > > Guest Kernel 6.13-rc1
> > > >
> > > > [ 0.196625] smp: Brought up 1 node, 8 CPUs
> > > > [ 0.222093] SMP: Total of 8 processors activated.
> > > > [ 0.222769] CPU: All CPU(s) started at EL1
> > > > ...
> > > > [ 0.226620] CPU features: detected: RAS Extension Support
> > > > [ 0.227453] alternatives: applying system-wide alternatives
> > > >
> > > > MPAM is not visible to Guest in this case.
> > > >
> > > > So as I pointed out earlier could it be a case where the ID register reports MPAM support
> > > > but the firmware has not enabled MPAM?
> > > >
> > > > James seems to be mentioning that case here,
> > > >
> > > > " (If you have a boot failure that bisects here its likely your CPUs
> > > > advertise MPAM in the id registers, but firmware failed to either enable
> > > > or MPAM, or emulate the trap as if it were disabled)"
> > >
> > > I tried to verify that MPAM is advertised with qemu+gdb method, as
> > > suggested by Oliver, but ID_AA64PFR0_EL1 register is not there.
> > >
> > > (gdb) i r ID_AA64PFR0_EL1
> > > Invalid register `ID_AA64PFR0_EL1'
> >
> > Then there is a bug in either QEMU or the GDB stubs. This register
> > exists, or you wouldn't be here.
>
>
> In case this is useful:
>
> builder@aarch64:/.in$ qemu-system-aarch64 --version
> QEMU emulator version 9.1.1 (qemu-9.1.1-alt2)
> Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
> builder@aarch64:/.in$ gdb --version
> GNU gdb (GDB) 14.1.0.56.d739d4fd457-alt1 (ALT Sisyphus)
> Copyright (C) 2023 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Is there a way to get the content of this register despite these
> possible gdb/qemu bugs?

I have no idea. And frankly, I don't think this matters.

> Perhaps we can add a debugging print to the guest kernel.
>
> > >
> > > Are there other suggestions?
> >
> > Mark has described what the problem is likely to be. 6.6-stable needs
> > to have 6685f5d572c22e10 backported, and it probably should have been
> > Cc: to stable. Can you please apply the following patch to your *host*
> > machine and retest?
>
> Unfortunately I cannot. But I can apply patches to the guest kernel. [I
> will try to convince admins of the server to apply the patch, though, but
> this can take time, and they can refuse since this is a production build
> server and its update procedure is complicated.]

Then I really cannot help you. I'm not going to paper over a
hypervisor bug in the guest kernel, and if you/they are happy to run
with critical bugs in your production machine, that's about it then.

M.

--
Without deviation from the norm, progress is not possible.
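
For reference, the host log quoted above reports the trapped access as
{ Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read }, which the thread
identifies as a read of MPAMIDR_EL1, guarded by the MPAM field of
ID_AA64PFR0_EL1. The following is only a minimal C sketch of that decoding,
not kernel code: the helper names sysreg_encode, sysreg_name and
id_aa64pfr0_mpam_field are hypothetical, and the sketch assumes the standard
AArch64 MRS/MSR field positions and that the MPAM field occupies bits [43:40]
of ID_AA64PFR0_EL1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Pack an (Op0, Op1, CRn, CRm, Op2) tuple into the bit positions these
 * fields occupy in an AArch64 MRS/MSR instruction.
 * Hypothetical helper, written for illustration only.
 */
static inline uint32_t sysreg_encode(unsigned op0, unsigned op1,
                                     unsigned crn, unsigned crm,
                                     unsigned op2)
{
    /* Reject values that do not fit their field widths. */
    assert(op0 < 4 && op1 < 8 && crn < 16 && crm < 16 && op2 < 8);
    return (op0 << 19) | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5);
}

/* Format the generic S<op0>_<op1>_C<n>_C<m>_<op2> name of a register. */
static inline void sysreg_name(char *buf, size_t len,
                               unsigned op0, unsigned op1,
                               unsigned crn, unsigned crm, unsigned op2)
{
    snprintf(buf, len, "S%u_%u_C%u_C%u_%u", op0, op1, crn, crm, op2);
}

/*
 * Extract the MPAM field (assumed to be bits [43:40]) from a raw
 * ID_AA64PFR0_EL1 value. A nonzero result means the CPU advertises MPAM,
 * which is the condition under which the guest goes on to read MPAMIDR_EL1.
 */
static inline unsigned id_aa64pfr0_mpam_field(uint64_t pfr0)
{
    return (unsigned)((pfr0 >> 40) & 0xf);
}
```

With the tuple from the host log, sysreg_encode(3, 0, 10, 4, 4) yields
0x18a480 and sysreg_name() produces "S3_0_C10_C4_4". A raw ID_AA64PFR0_EL1
value obtained some other way (for example from a temporary debug print in
the guest) could be passed to id_aa64pfr0_mpam_field() to check whether MPAM
is being advertised to the guest.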