From: Marc Zyngier <maz@kernel.org>
To: Vitaly Chikunov <vt@altlinux.org>
Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>,
	Will Deacon <will@kernel.org>,
	"james.morse@arm.com" <james.morse@arm.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"oliver.upton@linux.dev" <oliver.upton@linux.dev>,
	"mark.rutland@arm.com" <mark.rutland@arm.com>,
	"Wangzhou (B)" <wangzhou1@hisilicon.com>,
	Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
Date: Wed, 04 Dec 2024 19:13:09 +0000
Message-ID: <87h67js1mi.wl-maz@kernel.org>
In-Reply-To: <20241204182231.ovvj6rpvcs2f5gv7@altlinux.org>

On Wed, 04 Dec 2024 18:34:53 +0000,
Vitaly Chikunov <vt@altlinux.org> wrote:
> 
> Marc,
> 
> On Wed, Dec 04, 2024 at 08:51:26AM +0000, Marc Zyngier wrote:
> > On Tue, 03 Dec 2024 22:14:53 +0000,
> > Vitaly Chikunov <vt@altlinux.org> wrote:
> > > 
> > > Shameer, Marc, Oliver, Will,
> > > 
> > > On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote:
> > > > > -----Original Message-----
> > > > > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On
> > > > > Behalf Of Vitaly Chikunov
> > > > > Sent: Tuesday, December 3, 2024 9:27 AM
> > > > > To: Marc Zyngier <maz@kernel.org>
> > > > > Cc: Will Deacon <will@kernel.org>; james.morse@arm.com; linux-arm-
> > > > > kernel@lists.infradead.org; Catalin Marinas <catalin.marinas@arm.com>;
> > > > > linux-kernel@vger.kernel.org; oliver.upton@linux.dev;
> > > > > mark.rutland@arm.com
> > > > > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction:
> > > > > 0000000002000000 [#1] SMP
> > > > > 
> > > > > Marc,
> > > > > 
> > > > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote:
> > > > > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote:
> > > > > > > On Mon, 02 Dec 2024 15:59:40 +0000,
> > > > > > > Vitaly Chikunov <vt@altlinux.org> wrote:
> > > > > > > >
> > > > > > > > Marc,
> > > > > > > >
> > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote:
> > > > > > > > >
> > > > > > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well?
> > > > > > > >
> > > > > > > > No, host is 6.6.60.
> > > > > > >
> > > > > > > Right. I wouldn't be surprised if:
> > > > > > >
> > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and
> > > > > > > >   that's probably something we should backport)
> > > > > >
> > > > > > How to confirm this? Currently I cannot find any (case-insensitive)
> > > > > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM
> > > > > > strings in `strace -v` (as it decodes some KVM ioctls) of qemu process.
> > > > > >
> > > > > > >
> > > > > > > - you get a nastygram in the host log telling you that the guest has
> > > > > > >   executed something it shouldn't (you'll get the encoding of the
> > > > > > >   instruction)
> > > > > >
> > > > > > I requested admins of the box for dmesg output since I don't have root
> > > > > > access myself and nowadays dmesg is not accessible for a user.
> > > > > 
> > > > > This is what they reported:
> > > > > 
> > > > > >   kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0 [000000c5]
> > > > >                    { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read },
> > > > > 
> > > > 
> > > > As Will pointed out I think this is access to MPAMIDR_EL1 and is from this
> > > > code here,
> > > > 
> > > > +++ b/arch/arm64/kernel/cpuinfo.c
> > > > @@ -478,6 +478,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info)
> > > >  	if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0))
> > > >  		__cpuinfo_store_cpu_32bit(&info->aarch32);
> > > >  
> > > > +	if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0))
> > > > +		info->reg_mpamidr = read_cpuid(MPAMIDR_EL1);
> > > > +
> > > >  	cpuinfo_detect_icache_policy(info);
> > > >  }
> > > > 
> > > > I did manage to boot my setup in 6.6 and this is what happens,
> > > > 
> > > > Host kernel 6.6
> > > > Guest Kernel 6.13-rc1
> > > > 
> > > > [    0.195392] smp: Brought up 1 node, 8 CPUs
> > > > [    0.219000] SMP: Total of 8 processors activated.
> > > > [    0.219629] CPU: All CPU(s) started at EL1
> > > > ...
> > > > [    0.223212] CPU features: detected: RAS Extension Support
> > > > [    0.223927] CPU features: detected: Memory Partitioning And Monitoring
> > > > [    0.224796] CPU features: detected: Memory Partitioning And Monitoring Virtualisation
> > > > [    0.225961] alternatives: applying system-wide alternatives
> > > > ...
> > > > 
> > > > Guest detects MPAM and boots fine.
> > > > 
> > > > Host kernel 6.13-rc1
> > > > Guest Kernel 6.13-rc1
> > > > 
> > > > [    0.196625] smp: Brought up 1 node, 8 CPUs
> > > > [    0.222093] SMP: Total of 8 processors activated.
> > > > [    0.222769] CPU: All CPU(s) started at EL1
> > > > ...
> > > > [    0.226620] CPU features: detected: RAS Extension Support
> > > > [    0.227453] alternatives: applying system-wide alternatives
> > > > 
> > > > MPAM is not visible to Guest in this case.
> > > > 
> > > > So as I pointed out earlier could it be a case where the ID register reports MPAM support
> > > > but the firmware has not enabled MPAM?
> > > > 
> > > > James seems to be mentioning that case here,
> > > > 
> > > > " (If you have a boot failure that bisects here its likely your CPUs
> > > > advertise MPAM in the id registers, but firmware failed to either enable
> > > > or MPAM, or emulate the trap as if it were disabled)"
> > > 
> > > I tried to verify that MPAM is advertised with qemu+gdb method, as
> > > suggested by Oliver, but ID_AA64PFR0_EL1 register is not there.
> > > 
> > >   (gdb) i r ID_AA64PFR0_EL1
> > >   Invalid register `ID_AA64PFR0_EL1'
> > 
> > Then there is a bug in either QEMU or the GDB stubs. This register
> > exists, or you wouldn't be here.
> 
> 
> In case this is useful:
> 
>   builder@aarch64:/.in$ qemu-system-aarch64 --version
>   QEMU emulator version 9.1.1 (qemu-9.1.1-alt2)
>   Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
>   builder@aarch64:/.in$ gdb --version
>   GNU gdb (GDB) 14.1.0.56.d739d4fd457-alt1 (ALT Sisyphus)
>   Copyright (C) 2023 Free Software Foundation, Inc.
>   License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
>   This is free software: you are free to change and redistribute it.
>   There is NO WARRANTY, to the extent permitted by law.
> 
> Is there a way to get the content of this register despite these possible
> gdb/qemu bugs?

I have no idea. And frankly, I don't think this matters.

> Perhaps, we can add some debugging print in guest kernel.
> 
> > > 
> > > Are there other suggestions?
> > 
> > Mark has described what the problem is likely to be. 6.6-stable needs
> > to have 6685f5d572c22e10 backported, and it probably should have been
> > Cc: to stable. Can you please apply the following patch to your *host*
> > machine and retest?
> 
> Unfortunately I cannot. But I can apply patches to the guest kernel. [I
> will try to convince the admins of the server to apply the patch, though,
> but this can take time, and they can refuse since this is a production
> build server and its update procedure is complicated.]

Then I really cannot help you. I'm not going to paper over a
hypervisor bug in the guest kernel, and if you/they are happy to run
with critical bugs in your production machine, that's about it then.

	M.

-- 
Without deviation from the norm, progress is not possible.

