* v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP @ 2024-12-02 4:58 Vitaly Chikunov 2024-12-02 15:36 ` Will Deacon 0 siblings, 1 reply; 22+ messages in thread From: Vitaly Chikunov @ 2024-12-02 4:58 UTC (permalink / raw) To: linux-arm-kernel, Catalin Marinas, Will Deacon; +Cc: linux-kernel Hi, v6.13-rc1 exhibits a boot failure on aarch64 under KVM. (QEMU 9.1.1, CPU Kunpeng-920). Boot log: + time qemu-system-aarch64 -M accel=kvm:tcg -smp cores=8 -m 4096 -serial mon:stdio -nodefaults -nographic -no-reboot -fsdev local,id=root,path=/,security_model=none,multidevs=remap -device virtio-9p-pci,fsdev=root,mount_tag=virtio-9p:/ -device virtio-rng-pci -kernel /usr/src/tmp/kernel-image-6.13-buildroot/boot/vmlinuz-6.13.0-6.13-alt0.rc1 -initrd /usr/src/tmp/initramfs-6.13.0-6.13-alt0.rc1.img -sandbox on,spawn=deny -M virt,gic-version=3 -cpu max -append 'console=ttyAMA0 mitigations=off nokaslr panic=-1 SCRIPT=/usr/src/tmp/vm.SchsIm2FjB earlycon earlyprintk=serial ignore_loglevel debug rddebug' [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x481fd010] [ 0.000000] Linux version 6.13.0-6.13-alt0.rc1 (builder@localhost.localdomain) (gcc-14 (GCC) 14.2.1 20241028 (ALT Sisyphus 14.2.1-alt1), GNU ld (GNU Binutils) 2.43.1.20241025) #1 SMP PREEMPT_DYNAMIC Mon Dec 2 03:33:29 UTC 2024 [ 0.000000] KASLR disabled on command line [ 0.000000] random: crng init done [ 0.000000] Machine model: linux,dummy-virt [ 0.000000] printk: debug: ignoring loglevel setting. [ 0.000000] efi: UEFI not found. 
[ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '') [ 0.000000] printk: legacy bootconsole [pl11] enabled [ 0.000000] OF: reserved mem: Reserved memory: No reserved-memory node in the DT [ 0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x000000013fffffff] [ 0.000000] NODE_DATA(0) allocated [mem 0x13f7f3540-0x13f7f947f] [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff] [ 0.000000] DMA32 empty [ 0.000000] Normal [mem 0x0000000100000000-0x000000013fffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000040000000-0x000000013fffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000013fffffff] [ 0.000000] cma: Reserved 256 MiB at 0x00000000f0000000 on node -1 [ 0.000000] psci: probing for conduit method from DT. [ 0.000000] psci: PSCIv1.1 detected in firmware. [ 0.000000] psci: Using standard PSCI v0.2 function IDs [ 0.000000] psci: Trusted OS migration not required [ 0.000000] psci: SMC Calling Convention v1.1 [ 0.000000] smccc: KVM: hypervisor services detected (0x00000000 0x00000000 0x00000000 0x00000003) [ 0.000000] percpu: Embedded 34 pages/cpu s100632 r8192 d30440 u139264 [ 0.000000] pcpu-alloc: s100632 r8192 d30440 u139264 alloc=34*4096 [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 [ 0.000000] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP [ 0.000000] Modules linked in: [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.13.0-6.13-alt0.rc1 #1 [ 0.000000] Hardware name: linux,dummy-virt (DT) [ 0.000000] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 0.000000] pc : __cpuinfo_store_cpu+0xe8/0x240 [ 0.000000] lr : cpuinfo_store_boot_cpu+0x34/0x88 [ 0.000000] sp : ffff800082013df0 [ 0.000000] x29: ffff800082013df0 x28: 000000000000008e x27: ffff800081e38128 [ 0.000000] x26: ffff800081702190 x25: ffff80008201f040 x24: ffff0000ff7d1d00 [ 0.000000] 
x23: ffff80008201ec00 x22: ffff800081e39100 x21: ffff8000816f9750 [ 0.000000] x20: ffff800081f55280 x19: ffff0000ff6be2e0 x18: 0000000000000000 [ 0.000000] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 0.000000] x14: 000000000000002f x13: 000000013f7f9490 x12: 0000008000000000 [ 0.000000] x11: 0000000000000000 x10: 00000000007f8000 x9 : 000000013f808000 [ 0.000000] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000013f7f94c0 [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 1100010011111111 [ 0.000000] x2 : 0000000000000001 x1 : 0000000084448004 x0 : ffff0000ff6be2e0 [ 0.000000] Call trace: [ 0.000000] __cpuinfo_store_cpu+0xe8/0x240 (P) [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88 (L) [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88 [ 0.000000] smp_prepare_boot_cpu+0x30/0x58 [ 0.000000] start_kernel+0x514/0x9d0 [ 0.000000] __primary_switched+0x88/0x98 [ 0.000000] Code: f100085f 54000600 f2580c7f 54000060 (d538a482) [ 0.000000] ---[ end trace 0000000000000000 ]--- [ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task! [ 0.000000] Rebooting in 600 seconds.. Thanks, ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-02 4:58 v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP Vitaly Chikunov @ 2024-12-02 15:36 ` Will Deacon 2024-12-02 15:53 ` Marc Zyngier 2024-12-02 16:06 ` Shameerali Kolothum Thodi 0 siblings, 2 replies; 22+ messages in thread From: Will Deacon @ 2024-12-02 15:36 UTC (permalink / raw) To: Vitaly Chikunov, james.morse Cc: linux-arm-kernel, Catalin Marinas, linux-kernel, maz, oliver.upton, mark.rutland [+ usual suspects] On Mon, Dec 02, 2024 at 07:58:30AM +0300, Vitaly Chikunov wrote: > v6.13-rc1 exhibits a boot failure on aarch64 under KVM. (QEMU 9.1.1, CPU > Kunpeng-920). Boot log: I've not tried to repro this locally, but from the log: > + time qemu-system-aarch64 -M accel=kvm:tcg -smp cores=8 -m 4096 -serial mon:stdio -nodefaults -nographic -no-reboot -fsdev local,id=root,path=/,security_model=none,multidevs=remap -device virtio-9p-pci,fsdev=root,mount_tag=virtio-9p:/ -device virtio-rng-pci -kernel /usr/src/tmp/kernel-image-6.13-buildroot/boot/vmlinuz-6.13.0-6.13-alt0.rc1 -initrd /usr/src/tmp/initramfs-6.13.0-6.13-alt0.rc1.img -sandbox on,spawn=deny -M virt,gic-version=3 -cpu max -append 'console=ttyAMA0 mitigations=off nokaslr panic=-1 SCRIPT=/usr/src/tmp/vm.SchsIm2FjB earlycon earlyprintk=serial ignore_loglevel debug rddebug' > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x481fd010] > [ 0.000000] Linux version 6.13.0-6.13-alt0.rc1 (builder@localhost.localdomain) (gcc-14 (GCC) 14.2.1 20241028 (ALT Sisyphus 14.2.1-alt1), GNU ld (GNU Binutils) 2.43.1.20241025) #1 SMP PREEMPT_DYNAMIC Mon Dec 2 03:33:29 UTC 2024 > [ 0.000000] KASLR disabled on command line > [ 0.000000] random: crng init done > [ 0.000000] Machine model: linux,dummy-virt > [ 0.000000] printk: debug: ignoring loglevel setting. > [ 0.000000] efi: UEFI not found. 
> [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '') > [ 0.000000] printk: legacy bootconsole [pl11] enabled > [ 0.000000] OF: reserved mem: Reserved memory: No reserved-memory node in the DT > [ 0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x000000013fffffff] > [ 0.000000] NODE_DATA(0) allocated [mem 0x13f7f3540-0x13f7f947f] > [ 0.000000] Zone ranges: > [ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff] > [ 0.000000] DMA32 empty > [ 0.000000] Normal [mem 0x0000000100000000-0x000000013fffffff] > [ 0.000000] Movable zone start for each node > [ 0.000000] Early memory node ranges > [ 0.000000] node 0: [mem 0x0000000040000000-0x000000013fffffff] > [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000013fffffff] > [ 0.000000] cma: Reserved 256 MiB at 0x00000000f0000000 on node -1 > [ 0.000000] psci: probing for conduit method from DT. > [ 0.000000] psci: PSCIv1.1 detected in firmware. > [ 0.000000] psci: Using standard PSCI v0.2 function IDs > [ 0.000000] psci: Trusted OS migration not required > [ 0.000000] psci: SMC Calling Convention v1.1 > [ 0.000000] smccc: KVM: hypervisor services detected (0x00000000 0x00000000 0x00000000 0x00000003) > [ 0.000000] percpu: Embedded 34 pages/cpu s100632 r8192 d30440 u139264 > [ 0.000000] pcpu-alloc: s100632 r8192 d30440 u139264 alloc=34*4096 > [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 > [ 0.000000] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP We take an undefined instruction exception in the kernel early during boot... 
> [ 0.000000] Modules linked in:
> [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.13.0-6.13-alt0.rc1 #1
> [ 0.000000] Hardware name: linux,dummy-virt (DT)
> [ 0.000000] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [ 0.000000] pc : __cpuinfo_store_cpu+0xe8/0x240
> [ 0.000000] lr : cpuinfo_store_boot_cpu+0x34/0x88
> [ 0.000000] sp : ffff800082013df0
> [ 0.000000] x29: ffff800082013df0 x28: 000000000000008e x27: ffff800081e38128
> [ 0.000000] x26: ffff800081702190 x25: ffff80008201f040 x24: ffff0000ff7d1d00
> [ 0.000000] x23: ffff80008201ec00 x22: ffff800081e39100 x21: ffff8000816f9750
> [ 0.000000] x20: ffff800081f55280 x19: ffff0000ff6be2e0 x18: 0000000000000000
> [ 0.000000] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [ 0.000000] x14: 000000000000002f x13: 000000013f7f9490 x12: 0000008000000000
> [ 0.000000] x11: 0000000000000000 x10: 00000000007f8000 x9 : 000000013f808000
> [ 0.000000] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000013f7f94c0
> [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 1100010011111111
> [ 0.000000] x2 : 0000000000000001 x1 : 0000000084448004 x0 : ffff0000ff6be2e0
> [ 0.000000] Call trace:
> [ 0.000000] __cpuinfo_store_cpu+0xe8/0x240 (P)
> [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88 (L)
> [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88
> [ 0.000000] smp_prepare_boot_cpu+0x30/0x58
> [ 0.000000] start_kernel+0x514/0x9d0
> [ 0.000000] __primary_switched+0x88/0x98
> [ 0.000000] Code: f100085f 54000600 f2580c7f 54000060 (d538a482)

... and that's:

   0:   f100085f    cmp   x2, #0x2
   4:   54000600    b.eq  0xc4  // b.none
   8:   f2580c7f    tst   x3, #0xf0000000000
   c:   54000060    b.eq  0x18  // b.none
  10:*  d538a482    mrs   x2, s3_0_c10_c4_4    <-- trapping instruction

Which I think corresponds to a read of MPAMIDR_EL1.

It looks like James routed accesses to this register to undef_access()
in 31ff96c38ea3 ("KVM: arm64: Fix missing traps of guest accesses to the
MPAM register") so I'm not really sure how this is supposed to work
given that it's an ID register.

James?

Will

^ permalink raw reply [flat|nested] 22+ messages in thread
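The decoding of the trapping instruction can be reproduced by hand. The following is a minimal Python sketch, not something posted in the thread: it splits the instruction word from the "Code:" line into its system register fields, assuming the A64 MRS layout (bits 31:20 equal to 0xD53, op0 in bits 20:19, op1 in 18:16, CRn in 15:12, CRm in 11:8, op2 in 7:5, Rt in 4:0). The names decode_mrs and sysreg_name are made up for this illustration.

```python
def decode_mrs(insn: int) -> dict:
    """Split an AArch64 MRS instruction word into its system register fields."""
    # An MRS (system register read) has bits 31:20 equal to 0xD53.
    if (insn >> 20) != 0xD53:
        raise ValueError(f"{insn:#010x} is not an MRS encoding")
    return {
        "op0": (insn >> 19) & 0x3,
        "op1": (insn >> 16) & 0x7,
        "crn": (insn >> 12) & 0xF,
        "crm": (insn >> 8) & 0xF,
        "op2": (insn >> 5) & 0x7,
        "rt": insn & 0x1F,
    }


def sysreg_name(fields: dict) -> str:
    """Format the fields in the generic s<op0>_<op1>_c<CRn>_c<CRm>_<op2> form."""
    return (f"s{fields['op0']}_{fields['op1']}"
            f"_c{fields['crn']}_c{fields['crm']}_{fields['op2']}")


# 0xd538a482 is the word shown in parentheses on the oops "Code:" line.
fields = decode_mrs(0xD538A482)
print(f"mrs x{fields['rt']}, {sysreg_name(fields)}")  # prints "mrs x2, s3_0_c10_c4_4"
```

The resulting op0=3, op1=0, CRn=10, CRm=4, op2=4 tuple is the encoding that the disassembly above shows as s3_0_c10_c4_4.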
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-02 15:36 ` Will Deacon @ 2024-12-02 15:53 ` Marc Zyngier 2024-12-02 15:59 ` Vitaly Chikunov 2024-12-02 16:06 ` Shameerali Kolothum Thodi 1 sibling, 1 reply; 22+ messages in thread From: Marc Zyngier @ 2024-12-02 15:53 UTC (permalink / raw) To: Will Deacon Cc: Vitaly Chikunov, james.morse, linux-arm-kernel, Catalin Marinas, linux-kernel, oliver.upton, mark.rutland On Mon, 02 Dec 2024 15:36:19 +0000, Will Deacon <will@kernel.org> wrote: > > [+ usual suspects] > > On Mon, Dec 02, 2024 at 07:58:30AM +0300, Vitaly Chikunov wrote: > > v6.13-rc1 exhibits a boot failure on aarch64 under KVM. (QEMU 9.1.1, CPU > > Kunpeng-920). Boot log: > > I've not tried to repro this locally, but from the log: > > > + time qemu-system-aarch64 -M accel=kvm:tcg -smp cores=8 -m 4096 -serial mon:stdio -nodefaults -nographic -no-reboot -fsdev local,id=root,path=/,security_model=none,multidevs=remap -device virtio-9p-pci,fsdev=root,mount_tag=virtio-9p:/ -device virtio-rng-pci -kernel /usr/src/tmp/kernel-image-6.13-buildroot/boot/vmlinuz-6.13.0-6.13-alt0.rc1 -initrd /usr/src/tmp/initramfs-6.13.0-6.13-alt0.rc1.img -sandbox on,spawn=deny -M virt,gic-version=3 -cpu max -append 'console=ttyAMA0 mitigations=off nokaslr panic=-1 SCRIPT=/usr/src/tmp/vm.SchsIm2FjB earlycon earlyprintk=serial ignore_loglevel debug rddebug' > > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x481fd010] > > [ 0.000000] Linux version 6.13.0-6.13-alt0.rc1 (builder@localhost.localdomain) (gcc-14 (GCC) 14.2.1 20241028 (ALT Sisyphus 14.2.1-alt1), GNU ld (GNU Binutils) 2.43.1.20241025) #1 SMP PREEMPT_DYNAMIC Mon Dec 2 03:33:29 UTC 2024 > > [ 0.000000] KASLR disabled on command line > > [ 0.000000] random: crng init done > > [ 0.000000] Machine model: linux,dummy-virt > > [ 0.000000] printk: debug: ignoring loglevel setting. > > [ 0.000000] efi: UEFI not found. 
> > [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '') > > [ 0.000000] printk: legacy bootconsole [pl11] enabled > > [ 0.000000] OF: reserved mem: Reserved memory: No reserved-memory node in the DT > > [ 0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x000000013fffffff] > > [ 0.000000] NODE_DATA(0) allocated [mem 0x13f7f3540-0x13f7f947f] > > [ 0.000000] Zone ranges: > > [ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff] > > [ 0.000000] DMA32 empty > > [ 0.000000] Normal [mem 0x0000000100000000-0x000000013fffffff] > > [ 0.000000] Movable zone start for each node > > [ 0.000000] Early memory node ranges > > [ 0.000000] node 0: [mem 0x0000000040000000-0x000000013fffffff] > > [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000013fffffff] > > [ 0.000000] cma: Reserved 256 MiB at 0x00000000f0000000 on node -1 > > [ 0.000000] psci: probing for conduit method from DT. > > [ 0.000000] psci: PSCIv1.1 detected in firmware. > > [ 0.000000] psci: Using standard PSCI v0.2 function IDs > > [ 0.000000] psci: Trusted OS migration not required > > [ 0.000000] psci: SMC Calling Convention v1.1 > > [ 0.000000] smccc: KVM: hypervisor services detected (0x00000000 0x00000000 0x00000000 0x00000003) > > [ 0.000000] percpu: Embedded 34 pages/cpu s100632 r8192 d30440 u139264 > > [ 0.000000] pcpu-alloc: s100632 r8192 d30440 u139264 alloc=34*4096 > > [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 > > [ 0.000000] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP > > We take an undefined instruction exception in the kernel early during > boot... 
> > > [ 0.000000] Modules linked in: > > [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.13.0-6.13-alt0.rc1 #1 > > [ 0.000000] Hardware name: linux,dummy-virt (DT) > > [ 0.000000] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > [ 0.000000] pc : __cpuinfo_store_cpu+0xe8/0x240 > > [ 0.000000] lr : cpuinfo_store_boot_cpu+0x34/0x88 > > [ 0.000000] sp : ffff800082013df0 > > [ 0.000000] x29: ffff800082013df0 x28: 000000000000008e x27: ffff800081e38128 > > [ 0.000000] x26: ffff800081702190 x25: ffff80008201f040 x24: ffff0000ff7d1d00 > > [ 0.000000] x23: ffff80008201ec00 x22: ffff800081e39100 x21: ffff8000816f9750 > > [ 0.000000] x20: ffff800081f55280 x19: ffff0000ff6be2e0 x18: 0000000000000000 > > [ 0.000000] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 > > [ 0.000000] x14: 000000000000002f x13: 000000013f7f9490 x12: 0000008000000000 > > [ 0.000000] x11: 0000000000000000 x10: 00000000007f8000 x9 : 000000013f808000 > > [ 0.000000] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000013f7f94c0 > > [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 1100010011111111 > > [ 0.000000] x2 : 0000000000000001 x1 : 0000000084448004 x0 : ffff0000ff6be2e0 > > [ 0.000000] Call trace: > > [ 0.000000] __cpuinfo_store_cpu+0xe8/0x240 (P) > > [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88 (L) > > [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88 > > [ 0.000000] smp_prepare_boot_cpu+0x30/0x58 > > [ 0.000000] start_kernel+0x514/0x9d0 > > [ 0.000000] __primary_switched+0x88/0x98 > > [ 0.000000] Code: f100085f 54000600 f2580c7f 54000060 (d538a482) > > ... and that's: > > 0: f100085f cmp x2, #0x2 > 4: 54000600 b.eq 0xc4 // b.none > 8: f2580c7f tst x3, #0xf0000000000 > c: 54000060 b.eq 0x18 // b.none > 10:* d538a482 mrs x2, s3_0_c10_c4_4 <-- trapping instruction > > Which I think corresponds to a read of MPAMIDR_EL1. 
>
> It looks like James routed accesses to this register to undef_access()
> in 31ff96c38ea3 ("KVM: arm64: Fix missing traps of guest accesses to the
> MPAM register") so I'm not really sure how this is supposed to work
> given that it's an ID register.

It's not. Or rather, it is an IDREG that is only valid when MPAM is
advertised and implemented. From the spec:

"This register is present only when FEAT_MPAM is implemented.
Otherwise, direct accesses to MPAMIDR_EL1 are UNDEFINED."

So from a KVM perspective, I think this is doing the right thing.

What the log doesn't say is what the host is. Is it 6.13-rc1 as well?

Thanks,

    M.

--
Without deviation from the norm, progress is not possible.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-02 15:53 ` Marc Zyngier @ 2024-12-02 15:59 ` Vitaly Chikunov 2024-12-02 16:07 ` Marc Zyngier 0 siblings, 1 reply; 22+ messages in thread From: Vitaly Chikunov @ 2024-12-02 15:59 UTC (permalink / raw) To: Marc Zyngier Cc: Will Deacon, james.morse, linux-arm-kernel, Catalin Marinas, linux-kernel, oliver.upton, mark.rutland Marc, On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote: > On Mon, 02 Dec 2024 15:36:19 +0000, > Will Deacon <will@kernel.org> wrote: > > > > [+ usual suspects] > > > > On Mon, Dec 02, 2024 at 07:58:30AM +0300, Vitaly Chikunov wrote: > > > v6.13-rc1 exhibits a boot failure on aarch64 under KVM. (QEMU 9.1.1, CPU > > > Kunpeng-920). Boot log: > > > > I've not tried to repro this locally, but from the log: > > > > > + time qemu-system-aarch64 -M accel=kvm:tcg -smp cores=8 -m 4096 -serial mon:stdio -nodefaults -nographic -no-reboot -fsdev local,id=root,path=/,security_model=none,multidevs=remap -device virtio-9p-pci,fsdev=root,mount_tag=virtio-9p:/ -device virtio-rng-pci -kernel /usr/src/tmp/kernel-image-6.13-buildroot/boot/vmlinuz-6.13.0-6.13-alt0.rc1 -initrd /usr/src/tmp/initramfs-6.13.0-6.13-alt0.rc1.img -sandbox on,spawn=deny -M virt,gic-version=3 -cpu max -append 'console=ttyAMA0 mitigations=off nokaslr panic=-1 SCRIPT=/usr/src/tmp/vm.SchsIm2FjB earlycon earlyprintk=serial ignore_loglevel debug rddebug' > > > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x481fd010] > > > [ 0.000000] Linux version 6.13.0-6.13-alt0.rc1 (builder@localhost.localdomain) (gcc-14 (GCC) 14.2.1 20241028 (ALT Sisyphus 14.2.1-alt1), GNU ld (GNU Binutils) 2.43.1.20241025) #1 SMP PREEMPT_DYNAMIC Mon Dec 2 03:33:29 UTC 2024 > > > [ 0.000000] KASLR disabled on command line > > > [ 0.000000] random: crng init done > > > [ 0.000000] Machine model: linux,dummy-virt > > > [ 0.000000] printk: debug: ignoring loglevel setting. > > > [ 0.000000] efi: UEFI not found. 
> > > [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '') > > > [ 0.000000] printk: legacy bootconsole [pl11] enabled > > > [ 0.000000] OF: reserved mem: Reserved memory: No reserved-memory node in the DT > > > [ 0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x000000013fffffff] > > > [ 0.000000] NODE_DATA(0) allocated [mem 0x13f7f3540-0x13f7f947f] > > > [ 0.000000] Zone ranges: > > > [ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff] > > > [ 0.000000] DMA32 empty > > > [ 0.000000] Normal [mem 0x0000000100000000-0x000000013fffffff] > > > [ 0.000000] Movable zone start for each node > > > [ 0.000000] Early memory node ranges > > > [ 0.000000] node 0: [mem 0x0000000040000000-0x000000013fffffff] > > > [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000013fffffff] > > > [ 0.000000] cma: Reserved 256 MiB at 0x00000000f0000000 on node -1 > > > [ 0.000000] psci: probing for conduit method from DT. > > > [ 0.000000] psci: PSCIv1.1 detected in firmware. > > > [ 0.000000] psci: Using standard PSCI v0.2 function IDs > > > [ 0.000000] psci: Trusted OS migration not required > > > [ 0.000000] psci: SMC Calling Convention v1.1 > > > [ 0.000000] smccc: KVM: hypervisor services detected (0x00000000 0x00000000 0x00000000 0x00000003) > > > [ 0.000000] percpu: Embedded 34 pages/cpu s100632 r8192 d30440 u139264 > > > [ 0.000000] pcpu-alloc: s100632 r8192 d30440 u139264 alloc=34*4096 > > > [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 > > > [ 0.000000] Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP > > > > We take an undefined instruction exception in the kernel early during > > boot... 
> > > > > [ 0.000000] Modules linked in: > > > [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.13.0-6.13-alt0.rc1 #1 > > > [ 0.000000] Hardware name: linux,dummy-virt (DT) > > > [ 0.000000] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) > > > [ 0.000000] pc : __cpuinfo_store_cpu+0xe8/0x240 > > > [ 0.000000] lr : cpuinfo_store_boot_cpu+0x34/0x88 > > > [ 0.000000] sp : ffff800082013df0 > > > [ 0.000000] x29: ffff800082013df0 x28: 000000000000008e x27: ffff800081e38128 > > > [ 0.000000] x26: ffff800081702190 x25: ffff80008201f040 x24: ffff0000ff7d1d00 > > > [ 0.000000] x23: ffff80008201ec00 x22: ffff800081e39100 x21: ffff8000816f9750 > > > [ 0.000000] x20: ffff800081f55280 x19: ffff0000ff6be2e0 x18: 0000000000000000 > > > [ 0.000000] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 > > > [ 0.000000] x14: 000000000000002f x13: 000000013f7f9490 x12: 0000008000000000 > > > [ 0.000000] x11: 0000000000000000 x10: 00000000007f8000 x9 : 000000013f808000 > > > [ 0.000000] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000013f7f94c0 > > > [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 1100010011111111 > > > [ 0.000000] x2 : 0000000000000001 x1 : 0000000084448004 x0 : ffff0000ff6be2e0 > > > [ 0.000000] Call trace: > > > [ 0.000000] __cpuinfo_store_cpu+0xe8/0x240 (P) > > > [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88 (L) > > > [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88 > > > [ 0.000000] smp_prepare_boot_cpu+0x30/0x58 > > > [ 0.000000] start_kernel+0x514/0x9d0 > > > [ 0.000000] __primary_switched+0x88/0x98 > > > [ 0.000000] Code: f100085f 54000600 f2580c7f 54000060 (d538a482) > > > > ... and that's: > > > > 0: f100085f cmp x2, #0x2 > > 4: 54000600 b.eq 0xc4 // b.none > > 8: f2580c7f tst x3, #0xf0000000000 > > c: 54000060 b.eq 0x18 // b.none > > 10:* d538a482 mrs x2, s3_0_c10_c4_4 <-- trapping instruction > > > > Which I think corresponds to a read of MPAMIDR_EL1. 
> >
> > It looks like James routed accesses to this register to undef_access()
> > in 31ff96c38ea3 ("KVM: arm64: Fix missing traps of guest accesses to the
> > MPAM register") so I'm not really sure how this is supposed to work
> > given that it's an ID register.
>
> It's not. Or rather, it is an IDREG that is only valid when MPAM is
> advertised and implemented. From the spec:
>
> "This register is present only when FEAT_MPAM is implemented.
> Otherwise, direct accesses to MPAMIDR_EL1 are UNDEFINED."
>
> So from a KVM perspective, I think this is doing the right thing.
>
> What the log doesn't say is what the host is. Is it 6.13-rc1 as well?

No, host is 6.6.60.

Thanks,

>
> Thanks,
>
>     M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
  2024-12-02 15:59 ` Vitaly Chikunov
@ 2024-12-02 16:07 ` Marc Zyngier
  2024-12-02 17:53 ` Mark Rutland
  2024-12-02 22:31 ` Vitaly Chikunov
  0 siblings, 2 replies; 22+ messages in thread
From: Marc Zyngier @ 2024-12-02 16:07 UTC (permalink / raw)
  To: Vitaly Chikunov
  Cc: Will Deacon, james.morse, linux-arm-kernel, Catalin Marinas,
      linux-kernel, oliver.upton, mark.rutland

On Mon, 02 Dec 2024 15:59:40 +0000,
Vitaly Chikunov <vt@altlinux.org> wrote:
>
> Marc,
>
> On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote:
> >
> > What the log doesn't say is what the host is. Is it 6.13-rc1 as well?
>
> No, host is 6.6.60.

Right. I wouldn't be surprised if:

- this v6.6 kernel doesn't hide the MPAM feature as it should (and
  that's probably something we should backport)

- you get a nastygram in the host log telling you that the guest has
  executed something it shouldn't (you'll get the encoding of the
  instruction)

Can you confirm these two things?

    M.

--
Without deviation from the norm, progress is not possible.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
  2024-12-02 16:07 ` Marc Zyngier
@ 2024-12-02 17:53 ` Mark Rutland
  2024-12-02 22:31 ` Vitaly Chikunov
  1 sibling, 0 replies; 22+ messages in thread
From: Mark Rutland @ 2024-12-02 17:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Vitaly Chikunov, Will Deacon, james.morse, linux-arm-kernel,
      Catalin Marinas, linux-kernel, oliver.upton

On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote:
> On Mon, 02 Dec 2024 15:59:40 +0000,
> Vitaly Chikunov <vt@altlinux.org> wrote:
> >
> > Marc,
> >
> > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote:
> > >
> > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well?
> >
> > No, host is 6.6.60.
>
> Right. I wouldn't be surprised if:
>
> - this v6.6 kernel doesn't hide the MPAM feature as it should (and
>   that's probably something we should backport)

Looks like v6.6.60 is missing:

  6685f5d572c22e10 ("KVM: arm64: Disable MPAM visibility by default and ignore VMM writes")

... which is a fix for:

  011e5f5bf529f8ec ("arm64/cpufeature: Add remaining feature bits in ID_AA64PFR0 register")

... which unintentionally exposed ID_AA64PFR0.MPAM to guests, and *is*
in v6.6.60.

Mark.

> - you get a nastygram in the host log telling you that the guest has
>   executed something it shouldn't (you'll get the encoding of the
>   instruction)
>
> Can you confirm these two things?
>
>     M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
  2024-12-02 16:07 ` Marc Zyngier
  2024-12-02 17:53 ` Mark Rutland
@ 2024-12-02 22:31 ` Vitaly Chikunov
  2024-12-03 1:19 ` Oliver Upton
  2024-12-03 9:27 ` Vitaly Chikunov
  1 sibling, 2 replies; 22+ messages in thread
From: Vitaly Chikunov @ 2024-12-02 22:31 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, james.morse, linux-arm-kernel, Catalin Marinas,
      linux-kernel, oliver.upton, mark.rutland

Marc,

On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote:
> On Mon, 02 Dec 2024 15:59:40 +0000,
> Vitaly Chikunov <vt@altlinux.org> wrote:
> >
> > Marc,
> >
> > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote:
> > >
> > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well?
> >
> > No, host is 6.6.60.
>
> Right. I wouldn't be surprised if:
>
> - this v6.6 kernel doesn't hide the MPAM feature as it should (and
>   that's probably something we should backport)

How to confirm this? Currently I cannot find any (case-insensitive)
"MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM strings
in `strace -v` (as it decodes some KVM ioctls) of qemu process.

>
> - you get a nastygram in the host log telling you that the guest has
>   executed something it shouldn't (you'll get the encoding of the
>   instruction)

I requested admins of the box for dmesg output since I don't have root
access myself and nowadays dmesg is not accessible for a user.

>
> Can you confirm these two things?

Also, I tried to reproduce on another Kunpeng box with slightly
different HiSilicon CPU (presenting to the system as Cortex-A72) and the
problem is not reproducible there.

While things are not resolved, is it possible to work around the problem
with some QEMU option, kernel command line, config option, or a patch?

Thanks,

>
>     M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
  2024-12-02 22:31 ` Vitaly Chikunov
@ 2024-12-03 1:19 ` Oliver Upton
  2024-12-03 4:03 ` Vitaly Chikunov
  2024-12-03 9:27 ` Vitaly Chikunov
  1 sibling, 1 reply; 22+ messages in thread
From: Oliver Upton @ 2024-12-03 1:19 UTC (permalink / raw)
  To: Vitaly Chikunov
  Cc: Marc Zyngier, Will Deacon, james.morse, linux-arm-kernel,
      Catalin Marinas, linux-kernel, mark.rutland

On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote:
> Marc,
>
> On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote:
> > On Mon, 02 Dec 2024 15:59:40 +0000,
> > Vitaly Chikunov <vt@altlinux.org> wrote:
> > >
> > > Marc,
> > >
> > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote:
> > > >
> > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well?
> > >
> > > No, host is 6.6.60.
> >
> > Right. I wouldn't be surprised if:
> >
> > - this v6.6 kernel doesn't hide the MPAM feature as it should (and
> >   that's probably something we should backport)
>
> How to confirm this? Currently I cannot find any (case-insensitive)
> "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM strings
> in `strace -v` (as it decodes some KVM ioctls) of qemu process.

If you can attach to the QEMU gdbstub of the VM, info registers will
dump ~everything.

If the value of ID_AA64PFR0_EL1.MPAM (bits 43:40) is nonzero then the
host KVM is erroneously advertising MPAM to the guest.

--
Thanks,
Oliver

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
  2024-12-03 1:19 ` Oliver Upton
@ 2024-12-03 4:03 ` Vitaly Chikunov
  2024-12-05 2:09 ` Vitaly Chikunov
  0 siblings, 1 reply; 22+ messages in thread
From: Vitaly Chikunov @ 2024-12-03 4:03 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, Will Deacon, james.morse, linux-arm-kernel,
      Catalin Marinas, linux-kernel, mark.rutland

Oliver, Marc,

On Mon, Dec 02, 2024 at 05:19:54PM -0800, Oliver Upton wrote:
> On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote:
> > Marc,
> >
> > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote:
> > > On Mon, 02 Dec 2024 15:59:40 +0000,
> > > Vitaly Chikunov <vt@altlinux.org> wrote:
> > > >
> > > > Marc,
> > > >
> > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote:
> > > > >
> > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well?
> > > >
> > > > No, host is 6.6.60.
> > >
> > > Right. I wouldn't be surprised if:
> > >
> > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and
> > >   that's probably something we should backport)
> >
> > How to confirm this? Currently I cannot find any (case-insensitive)
> > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM strings
> > in `strace -v` (as it decodes some KVM ioctls) of qemu process.
>
> If you can attach to the QEMU gdbstub of the VM, info registers will
> dump ~everything.
>
> If the value of ID_AA64PFR0_EL1.MPAM (bits 43:40) is nonzero then the
> host KVM is erroneously advertising MPAM to the guest.

I don't find such register. Here is what I get:

  (gdb) target remote :1234
  Remote debugging using :1234
  0x0000000040000000 in ?? ()
  (gdb) pipe i r | grep ID_AA64PFR
  ID_AA64PFR1_EL1 0x0 0
  ID_AA64PFR2_EL1_RESERVED 0x0 0
  ID_AA64PFR3_EL1_RESERVED 0x0 0
  ID_AA64PFR6_EL1_RESERVED 0x0 0
  ID_AA64PFR7_EL1_RESERVED 0x0 0
  (gdb)

This seems to be MPAM_frac, and it's 0, so "MPAM Extension not
implemented"[1].

Thanks,

[1] https://developer.arm.com/documentation/ddi0595/2021-06/AArch64-Registers/ID-AA64PFR1-EL1--AArch64-Processor-Feature-Register-1?lang=en#fieldset_0-19_16

>
> --
> Thanks,
> Oliver

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-03 4:03 ` Vitaly Chikunov @ 2024-12-05 2:09 ` Vitaly Chikunov 0 siblings, 0 replies; 22+ messages in thread From: Vitaly Chikunov @ 2024-12-05 2:09 UTC (permalink / raw) To: Oliver Upton Cc: Marc Zyngier, Will Deacon, james.morse, linux-arm-kernel, Catalin Marinas, linux-kernel, mark.rutland Oliver, On Tue, Dec 03, 2024 at 07:03:50AM +0300, Vitaly Chikunov wrote: > On Mon, Dec 02, 2024 at 05:19:54PM -0800, Oliver Upton wrote: > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote: > > > Marc, > > > > > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote: > > > > On Mon, 02 Dec 2024 15:59:40 +0000, > > > > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > > > > > > > Marc, > > > > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote: > > > > > > > > > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well? > > > > > > > > > > No, host is 6.6.60. > > > > > > > > Right. I wouldn't be surprised if: > > > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and > > > > that's proably something we should backport) > > > > > > How to confirm this? Currently I cannot find any (case-insensitive) > > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM strings > > > in `strace -v` (as it decodes some KVM ioctls) of qemu process. > > > > If you can attach to the QEMU gdbstub of the VM, info registers will > > dump ~everything. > > > > If the value of ID_AA64PFR0_EL1.MPAM (bits 43:40) is nonzero then the > > host KVM is erroneously advertising MPAM to the guest. > > I don't find such register. There is what I get: Thanks to ArmCpuInfo.efi I can confirm MPAM is advertised. Shell> ArmCpuInfo.efi ID_AA64PFR0_EL1 = 0x1100010011111111 ... PFR0 | MPAM | 43:40 | 0001 | FEAT_MPAM v1.0 implemented. 
I prepared the kernel with Marc's patch (backport of 6685f5d572c22e10 to 6.6) and am waiting for the admins to boot it, hopefully today or tomorrow. Thanks, ^ permalink raw reply [flat|nested] 22+ messages in thread
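For reference, the ArmCpuInfo.efi result above can be cross-checked by hand. The following is a minimal sketch in C, assuming only the architectural field position quoted earlier in the thread (ID_AA64PFR0_EL1.MPAM, bits 43:40) and the register value printed by ArmCpuInfo.efi. The helper names `extract_field` and `id_aa64pfr0_mpam_field` are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return bits hi..lo (inclusive) of a 64-bit register value. */
static uint64_t extract_field(uint64_t reg, unsigned int hi, unsigned int lo)
{
	assert(hi < 64 && lo <= hi);

	unsigned int width = hi - lo + 1;
	/* Avoid an undefined 64-bit shift when the whole register is requested. */
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return (reg >> lo) & mask;
}

/* ID_AA64PFR0_EL1.MPAM is bits 43:40; a nonzero value means MPAM is advertised. */
static uint64_t id_aa64pfr0_mpam_field(uint64_t pfr0)
{
	return extract_field(pfr0, 43, 40);
}
```

With the value reported in this thread, `id_aa64pfr0_mpam_field(0x1100010011111111ULL)` evaluates to 1, which agrees with the "FEAT_MPAM v1.0 implemented" line printed by the tool.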
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-02 22:31 ` Vitaly Chikunov 2024-12-03 1:19 ` Oliver Upton @ 2024-12-03 9:27 ` Vitaly Chikunov 2024-12-03 10:03 ` Shameerali Kolothum Thodi 1 sibling, 1 reply; 22+ messages in thread From: Vitaly Chikunov @ 2024-12-03 9:27 UTC (permalink / raw) To: Marc Zyngier Cc: Will Deacon, james.morse, linux-arm-kernel, Catalin Marinas, linux-kernel, oliver.upton, mark.rutland Marc, On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote: > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote: > > On Mon, 02 Dec 2024 15:59:40 +0000, > > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > > > Marc, > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote: > > > > > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well? > > > > > > No, host is 6.6.60. > > > > Right. I wouldn't be surprised if: > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and > > that's proably something we should backport) > > How to confirm this? Currently I cannot find any (case-insensitive) > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM strings > in `strace -v` (as it decodes some KVM ioctls) of qemu process. > > > > > - you get a nastygram in the host log telling you that the guest has > > executed something it shouldn't (you'll get the encoding of the > > instruction) > > I requested admins of the box for dmesg output since I don't have root > access myself and nowadays dmesg is not accessible for a user. This is what they reported: kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0 [000000c5] { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read }, Thanks, > > > > > Can you confirm these two things? > > Also, I tried to reproduce on another Kunpeng box with slightly > different HiSilicon CPU (presenting to the system as Cortex-A72) and the > problem is not reproducible there. 
> > While things are not resolved, is it possible to workaround the problem > with some QEMU option, kernel command line, config option, or a patch? > > Thanks, > > > > > M. > > > > -- > > Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-03 9:27 ` Vitaly Chikunov @ 2024-12-03 10:03 ` Shameerali Kolothum Thodi 2024-12-03 22:14 ` Vitaly Chikunov 0 siblings, 1 reply; 22+ messages in thread From: Shameerali Kolothum Thodi @ 2024-12-03 10:03 UTC (permalink / raw) To: Vitaly Chikunov, Marc Zyngier Cc: Will Deacon, james.morse@arm.com, linux-arm-kernel@lists.infradead.org, Catalin Marinas, linux-kernel@vger.kernel.org, oliver.upton@linux.dev, mark.rutland@arm.com, Wangzhou (B) > -----Original Message----- > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On > Behalf Of Vitaly Chikunov > Sent: Tuesday, December 3, 2024 9:27 AM > To: Marc Zyngier <maz@kernel.org> > Cc: Will Deacon <will@kernel.org>; james.morse@arm.com; linux-arm- > kernel@lists.infradead.org; Catalin Marinas <catalin.marinas@arm.com>; > linux-kernel@vger.kernel.org; oliver.upton@linux.dev; > mark.rutland@arm.com > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: > 0000000002000000 [#1] SMP > > Marc, > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote: > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote: > > > On Mon, 02 Dec 2024 15:59:40 +0000, > > > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > > > > > Marc, > > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote: > > > > > > > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well? > > > > > > > > No, host is 6.6.60. > > > > > > Right. I wouldn't be surprised if: > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and > > > that's proably something we should backport) > > > > How to confirm this? Currently I cannot find any (case-insensitive) > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM > > strings in `strace -v` (as it decodes some KVM ioctls) of qemu process. 
> > > > > > > > - you get a nastygram in the host log telling you that the guest has > > > executed something it shouldn't (you'll get the encoding of the > > > instruction) > > > > I requested admins of the box for dmesg output since I don't have root > > access myself and nowadays dmesg is not accessible for a user. > > This is what they reported: > > kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0 > [000000c5] > { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read }, > As Will pointed out I think this is access to MPAMIDR_EL1 and is from this code here, +++ b/arch/arm64/kernel/cpuinfo.c @@ -478,6 +478,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) __cpuinfo_store_cpu_32bit(&info->aarch32); + if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0)) + info->reg_mpamidr = read_cpuid(MPAMIDR_EL1); + cpuinfo_detect_icache_policy(info); } I did manage to boot my setup in 6.6 and this is what happens, Host kernel 6.6 Guest Kernel 6.13-rc1 [ 0.195392] smp: Brought up 1 node, 8 CPUs [ 0.219000] SMP: Total of 8 processors activated. [ 0.219629] CPU: All CPU(s) started at EL1 ... [ 0.223212] CPU features: detected: RAS Extension Support [ 0.223927] CPU features: detected: Memory Partitioning And Monitoring [ 0.224796] CPU features: detected: Memory Partitioning And Monitoring Virtualisation [ 0.225961] alternatives: applying system-wide alternatives ... Guest detects MPAM and boots fine. Host kernel 6.13-rc1 Guest Kernel 6.13-rc1 [ 0.196625] smp: Brought up 1 node, 8 CPUs [ 0.222093] SMP: Total of 8 processors activated. [ 0.222769] CPU: All CPU(s) started at EL1 ... [ 0.226620] CPU features: detected: RAS Extension Support [ 0.227453] alternatives: applying system-wide alternatives MPAM is not visible to Guest in this case. So as I pointed out earlier could it be a case where the ID register reports MPAM support but the firmware has not enabled MPAM? 
James seems to be mentioning that case here, " (If you have a boot failure that bisects here its likely your CPUs advertise MPAM in the id registers, but firmware failed to either enable or MPAM, or emulate the trap as if it were disabled)" https://lore.kernel.org/all/20241030160317.2528209-4-joey.gouly@arm.com/ Is there a way you can find out the BIOS version on that board? Thanks, Shameer ^ permalink raw reply [flat|nested] 22+ messages in thread
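The mapping from the host's "Unsupported guest sys_reg access" line to MPAMIDR_EL1 can be reproduced mechanically. Below is a small sketch in C, assuming the same field shifts that the kernel's sys_reg() macro uses (Op0 at bit 19, Op1 at 16, CRn at 12, CRm at 8, Op2 at 5). The function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Field shifts of the A64 system-register encoding (assumed to match sys_reg()). */
enum { OP0_SHIFT = 19, OP1_SHIFT = 16, CRN_SHIFT = 12, CRM_SHIFT = 8, OP2_SHIFT = 5 };

/* Pack the five operands from the KVM trace into one encoding value. */
static uint32_t sysreg_encode(unsigned int op0, unsigned int op1, unsigned int crn,
			      unsigned int crm, unsigned int op2)
{
	assert(op0 < 4 && op1 < 8 && crn < 16 && crm < 16 && op2 < 8);

	return (op0 << OP0_SHIFT) | (op1 << OP1_SHIFT) | (crn << CRN_SHIFT) |
	       (crm << CRM_SHIFT) | (op2 << OP2_SHIFT);
}

/* Write the generic assembler name, e.g. "S3_0_C10_C4_4"; returns snprintf()'s result. */
static int sysreg_generic_name(char *buf, size_t len, unsigned int op0, unsigned int op1,
			       unsigned int crn, unsigned int crm, unsigned int op2)
{
	return snprintf(buf, len, "S%u_%u_C%u_C%u_%u", op0, op1, crn, crm, op2);
}

/* MPAMIDR_EL1 is encoded as Op0=3, Op1=0, CRn=10, CRm=4, Op2=4. */
static int sysreg_is_mpamidr_el1(uint32_t enc)
{
	return enc == sysreg_encode(3, 0, 10, 4, 4);
}
```

Feeding in the operands from the trace (3, 0, 10, 4, 4) gives the generic name S3_0_C10_C4_4, the register that the new read_cpuid(MPAMIDR_EL1) call in __cpuinfo_store_cpu() touches.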
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-03 10:03 ` Shameerali Kolothum Thodi @ 2024-12-03 22:14 ` Vitaly Chikunov 2024-12-04 8:51 ` Marc Zyngier 0 siblings, 1 reply; 22+ messages in thread From: Vitaly Chikunov @ 2024-12-03 22:14 UTC (permalink / raw) To: Shameerali Kolothum Thodi Cc: Marc Zyngier, Will Deacon, james.morse@arm.com, linux-arm-kernel@lists.infradead.org, Catalin Marinas, linux-kernel@vger.kernel.org, oliver.upton@linux.dev, mark.rutland@arm.com, Wangzhou (B), Dmitry V. Levin Shameer, Marc, Oliver, Will, On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote: > > -----Original Message----- > > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On > > Behalf Of Vitaly Chikunov > > Sent: Tuesday, December 3, 2024 9:27 AM > > To: Marc Zyngier <maz@kernel.org> > > Cc: Will Deacon <will@kernel.org>; james.morse@arm.com; linux-arm- > > kernel@lists.infradead.org; Catalin Marinas <catalin.marinas@arm.com>; > > linux-kernel@vger.kernel.org; oliver.upton@linux.dev; > > mark.rutland@arm.com > > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: > > 0000000002000000 [#1] SMP > > > > Marc, > > > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote: > > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote: > > > > On Mon, 02 Dec 2024 15:59:40 +0000, > > > > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > > > > > > > Marc, > > > > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote: > > > > > > > > > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well? > > > > > > > > > > No, host is 6.6.60. > > > > > > > > Right. I wouldn't be surprised if: > > > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and > > > > that's proably something we should backport) > > > > > > How to confirm this? 
Currently I cannot find any (case-insensitive) > > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM > > > strings in `strace -v` (as it decodes some KVM ioctls) of qemu process. > > > > > > > > > > > - you get a nastygram in the host log telling you that the guest has > > > > executed something it shouldn't (you'll get the encoding of the > > > > instruction) > > > > > > I requested admins of the box for dmesg output since I don't have root > > > access myself and nowadays dmesg is not accessible for a user. > > > > This is what they reported: > > > > kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0 > > [000000c5] > > { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read }, > > > > As Will pointed out I think this is access to MPAMIDR_EL1 and is from this > code here, > > +++ b/arch/arm64/kernel/cpuinfo.c > @@ -478,6 +478,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) > if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) > __cpuinfo_store_cpu_32bit(&info->aarch32); > > + if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0)) > + info->reg_mpamidr = read_cpuid(MPAMIDR_EL1); > + > cpuinfo_detect_icache_policy(info); > } > > I did manage to boot my setup in 6.6 and this is what happens, > > Host kernel 6.6 > Guest Kernel 6.13-rc1 > > [ 0.195392] smp: Brought up 1 node, 8 CPUs > [ 0.219000] SMP: Total of 8 processors activated. > [ 0.219629] CPU: All CPU(s) started at EL1 > ... > [ 0.223212] CPU features: detected: RAS Extension Support > [ 0.223927] CPU features: detected: Memory Partitioning And Monitoring > [ 0.224796] CPU features: detected: Memory Partitioning And Monitoring Virtualisation > [ 0.225961] alternatives: applying system-wide alternatives > ... > > Guest detects MPAM and boots fine. > > Host kernel 6.13-rc1 > Guest Kernel 6.13-rc1 > > [ 0.196625] smp: Brought up 1 node, 8 CPUs > [ 0.222093] SMP: Total of 8 processors activated. > [ 0.222769] CPU: All CPU(s) started at EL1 > ... 
> [ 0.226620] CPU features: detected: RAS Extension Support > [ 0.227453] alternatives: applying system-wide alternatives > > MPAM is not visible to Guest in this case. > > So as I pointed out earlier could it be a case where the ID register reports MPAM support > but the firmware has not enabled MPAM? > > James seems to be mentioning that case here, > > " (If you have a boot failure that bisects here its likely your CPUs > advertise MPAM in the id registers, but firmware failed to either enable > or MPAM, or emulate the trap as if it were disabled)" I tried to verify that MPAM is advertised with the qemu+gdb method, as suggested by Oliver, but the ID_AA64PFR0_EL1 register is not there. (gdb) i r ID_AA64PFR0_EL1 Invalid register `ID_AA64PFR0_EL1' Are there other suggestions? > > https://lore.kernel.org/all/20241030160317.2528209-4-joey.gouly@arm.com/ > > Is there a way you can find out the BIOS version on that board? Unfortunately, the admins of the server have not provided me with this info. For such cases, when MPAM is incorrectly advertised, can we have a kernel command line parameter like mpam=0 to override its detection? I think that with "If you have a boot failure that bisects here" it's an acknowledged possibility, and it's confirmed by our server. Thanks, > > Thanks, > Shameer ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-03 22:14 ` Vitaly Chikunov @ 2024-12-04 8:51 ` Marc Zyngier 2024-12-04 18:34 ` Vitaly Chikunov ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Marc Zyngier @ 2024-12-04 8:51 UTC (permalink / raw) To: Vitaly Chikunov Cc: Shameerali Kolothum Thodi, Will Deacon, james.morse@arm.com, linux-arm-kernel@lists.infradead.org, Catalin Marinas, linux-kernel@vger.kernel.org, oliver.upton@linux.dev, mark.rutland@arm.com, Wangzhou (B), Dmitry V. Levin On Tue, 03 Dec 2024 22:14:53 +0000, Vitaly Chikunov <vt@altlinux.org> wrote: > > Shameer, Marc, Oliver, Will, > > On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote: > > > -----Original Message----- > > > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On > > > Behalf Of Vitaly Chikunov > > > Sent: Tuesday, December 3, 2024 9:27 AM > > > To: Marc Zyngier <maz@kernel.org> > > > Cc: Will Deacon <will@kernel.org>; james.morse@arm.com; linux-arm- > > > kernel@lists.infradead.org; Catalin Marinas <catalin.marinas@arm.com>; > > > linux-kernel@vger.kernel.org; oliver.upton@linux.dev; > > > mark.rutland@arm.com > > > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: > > > 0000000002000000 [#1] SMP > > > > > > Marc, > > > > > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote: > > > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote: > > > > > On Mon, 02 Dec 2024 15:59:40 +0000, > > > > > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > > > > > > > > > Marc, > > > > > > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote: > > > > > > > > > > > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well? > > > > > > > > > > > > No, host is 6.6.60. > > > > > > > > > > Right. 
I wouldn't be surprised if: > > > > > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and > > > > > that's proably something we should backport) > > > > > > > > How to confirm this? Currently I cannot find any (case-insensitive) > > > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM > > > > strings in `strace -v` (as it decodes some KVM ioctls) of qemu process. > > > > > > > > > > > > > > - you get a nastygram in the host log telling you that the guest has > > > > > executed something it shouldn't (you'll get the encoding of the > > > > > instruction) > > > > > > > > I requested admins of the box for dmesg output since I don't have root > > > > access myself and nowadays dmesg is not accessible for a user. > > > > > > This is what they reported: > > > > > > kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0 > > > [000000c5] > > > { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read }, > > > > > > > As Will pointed out I think this is access to MPAMIDR_EL1 and is from this > > code here, > > > > +++ b/arch/arm64/kernel/cpuinfo.c > > @@ -478,6 +478,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) > > if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) > > __cpuinfo_store_cpu_32bit(&info->aarch32); > > > > + if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0)) > > + info->reg_mpamidr = read_cpuid(MPAMIDR_EL1); > > + > > cpuinfo_detect_icache_policy(info); > > } > > > > I did manage to boot my setup in 6.6 and this is what happens, > > > > Host kernel 6.6 > > Guest Kernel 6.13-rc1 > > > > [ 0.195392] smp: Brought up 1 node, 8 CPUs > > [ 0.219000] SMP: Total of 8 processors activated. > > [ 0.219629] CPU: All CPU(s) started at EL1 > > ... 
> > [ 0.223212] CPU features: detected: RAS Extension Support > > [ 0.223927] CPU features: detected: Memory Partitioning And Monitoring > > [ 0.224796] CPU features: detected: Memory Partitioning And Monitoring Virtualisation > > [ 0.225961] alternatives: applying system-wide alternatives > > ... > > > > Guest detects MPAM and boots fine. > > > > Host kernel 6.13-rc1 > > Guest Kernel 6.13-rc1 > > > > [ 0.196625] smp: Brought up 1 node, 8 CPUs > > [ 0.222093] SMP: Total of 8 processors activated. > > [ 0.222769] CPU: All CPU(s) started at EL1 > > ... > > [ 0.226620] CPU features: detected: RAS Extension Support > > [ 0.227453] alternatives: applying system-wide alternatives > > > > MPAM is not visible to Guest in this case. > > > > So as I pointed out earlier could it be a case where the ID register reports MPAM support > > but the firmware has not enabled MPAM? > > > > James seems to be mentioning that case here, > > > > " (If you have a boot failure that bisects here its likely your CPUs > > advertise MPAM in the id registers, but firmware failed to either enable > > or MPAM, or emulate the trap as if it were disabled)" > > I tried to verify that MPAM is advertised with qemu+gdb method, as > suggested by Oliver, but ID_AA64PFR0_EL1 register is not there. > > (gdb) i r ID_AA64PFR0_EL1 > Invalid register `ID_AA64PFR0_EL1' Then there is a bug in either QEMU or the GDB stubs. This register exists, or you wouldn't be here. > > Are there other suggestions? Mark has described what the problem is likely to be. 6.6-stable needs to have 6685f5d572c22e10 backported, and it probably should have been Cc: to stable. Can you please apply the following patch to your *host* machine and retest? 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index 370a1a7bd369..258a39bcd3c7 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -1330,6 +1330,7 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu, val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE); val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME); + val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MPAM_frac); break; case SYS_ID_AA64ISAR1_EL1: if (!vcpu_has_ptrauth(vcpu)) @@ -1472,6 +1473,13 @@ static u64 read_sanitised_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, val &= ~ID_AA64PFR0_EL1_AMU_MASK; + /* + * MPAM is disabled by default as KVM also needs a set of PARTID to + * program the MPAMVPMx_EL2 PARTID remapping registers with. But some + * older kernels let the guest see the ID bit. + */ + val &= ~ID_AA64PFR0_EL1_MPAM_MASK; + return val; } @@ -1560,6 +1568,29 @@ static int set_id_dfr0_el1(struct kvm_vcpu *vcpu, return set_id_reg(vcpu, rd, val); } +static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, + const struct sys_reg_desc *rd, u64 user_val) +{ + u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1); + u64 mpam_mask = ID_AA64PFR0_EL1_MPAM_MASK; + + /* + * Commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits + * in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to + * guests, but didn't add trap handling. KVM doesn't support MPAM and + * always returns an UNDEF for these registers. The guest must see 0 + * for this field. + * + * But KVM must also accept values from user-space that were provided + * by KVM. On CPUs that support MPAM, permit user-space to write + * the sanitizied value to ID_AA64PFR0_EL1.MPAM, but ignore this field. 
+ */ + if ((hw_val & mpam_mask) == (user_val & mpam_mask)) + user_val &= ~ID_AA64PFR0_EL1_MPAM_MASK; + + return set_id_reg(vcpu, rd, user_val); +} + /* * cpufeature ID register user accessors * @@ -2018,7 +2049,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { { SYS_DESC(SYS_ID_AA64PFR0_EL1), .access = access_id_reg, .get_user = get_id_reg, - .set_user = set_id_reg, + .set_user = set_id_aa64pfr0_el1, .reset = read_sanitised_id_aa64pfr0_el1, .val = ID_AA64PFR0_EL1_CSV2_MASK | ID_AA64PFR0_EL1_CSV3_MASK, }, ID_SANITISED(ID_AA64PFR1_EL1), > > https://lore.kernel.org/all/20241030160317.2528209-4-joey.gouly@arm.com/ > > > > Is there a way you can find out the BIOS version on that board? > > Unfortunately, admins of the server do not provide me with this > info. This doesn't really help, I'm afraid. > For such cases, when MPAM is incorrectly advertised, can we have kernel > command line parameter like mpam=0 to override it's detection? We could, but only when we can confirm what the problem is. > I think with "If you have a boot failure that bisects here" it's > acknowledged possibility and it's confirmed by our server. Not really. This talks about firmware. We are debugging the hypervisor here. This might be closely related, but these are not the same things. Thanks, M. -- Without deviation from the norm, progress is not possible. ^ permalink raw reply related [flat|nested] 22+ messages in thread
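The ID-register handling in the patch above can be modelled outside the kernel to see what it does to the value reported in this thread. This is a simplified sketch, assuming the MPAM field sits at bits 43:40. The `model_*` names are hypothetical stand-ins; the other fields masked by read_sanitised_id_aa64pfr0_el1() and the validation later done by set_id_reg() are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* ID_AA64PFR0_EL1.MPAM, bits 43:40 (stand-in for ID_AA64PFR0_EL1_MPAM_MASK). */
#define MODEL_MPAM_MASK (0xfULL << 40)

/* Read side: the guest-visible value always has the MPAM field cleared. */
static uint64_t model_guest_visible_pfr0(uint64_t hw_val)
{
	return hw_val & ~MODEL_MPAM_MASK;
}

/*
 * Write side: if user-space writes back the MPAM value that an older KVM
 * exposed (equal to the hardware value), drop the field silently instead
 * of failing. Any other MPAM value is passed through untouched.
 */
static uint64_t model_filter_user_pfr0(uint64_t hw_val, uint64_t user_val)
{
	if ((hw_val & MODEL_MPAM_MASK) == (user_val & MODEL_MPAM_MASK))
		user_val &= ~MODEL_MPAM_MASK;

	return user_val;
}

/* Self-check using the register value reported by ArmCpuInfo.efi in this thread. */
static void model_self_check(void)
{
	const uint64_t hw = 0x1100010011111111ULL;

	assert(model_guest_visible_pfr0(hw) == 0x1100000011111111ULL);
	assert(model_filter_user_pfr0(hw, hw) == 0x1100000011111111ULL);
	/* A mismatching MPAM value (2 here) is left for later validation. */
	assert(model_filter_user_pfr0(hw, 0x1100020011111111ULL) == 0x1100020011111111ULL);
}
```

The design point the sketch illustrates is that the guest always reads zero for MPAM, while a VM state saved on an older host that still carries the hardware MPAM value can be restored without an error.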
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-04 8:51 ` Marc Zyngier @ 2024-12-04 18:34 ` Vitaly Chikunov 2024-12-04 19:13 ` Marc Zyngier 2024-12-05 8:53 ` Shameerali Kolothum Thodi 2024-12-04 18:53 ` Vitaly Chikunov 2024-12-06 20:56 ` Vitaly Chikunov 2 siblings, 2 replies; 22+ messages in thread From: Vitaly Chikunov @ 2024-12-04 18:34 UTC (permalink / raw) To: Marc Zyngier Cc: Shameerali Kolothum Thodi, Will Deacon, james.morse@arm.com, linux-arm-kernel@lists.infradead.org, Catalin Marinas, linux-kernel@vger.kernel.org, oliver.upton@linux.dev, mark.rutland@arm.com, Wangzhou (B), Gleb Fotengauer-Malinovskiy Marc, On Wed, Dec 04, 2024 at 08:51:26AM +0000, Marc Zyngier wrote: > On Tue, 03 Dec 2024 22:14:53 +0000, > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > Shameer, Marc, Oliver, Will, > > > > On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote: > > > > -----Original Message----- > > > > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On > > > > Behalf Of Vitaly Chikunov > > > > Sent: Tuesday, December 3, 2024 9:27 AM > > > > To: Marc Zyngier <maz@kernel.org> > > > > Cc: Will Deacon <will@kernel.org>; james.morse@arm.com; linux-arm- > > > > kernel@lists.infradead.org; Catalin Marinas <catalin.marinas@arm.com>; > > > > linux-kernel@vger.kernel.org; oliver.upton@linux.dev; > > > > mark.rutland@arm.com > > > > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: > > > > 0000000002000000 [#1] SMP > > > > > > > > Marc, > > > > > > > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote: > > > > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote: > > > > > > On Mon, 02 Dec 2024 15:59:40 +0000, > > > > > > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > > > > > > > > > > > Marc, > > > > > > > > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote: > > > > > > > > > > > > > > > > What the log doesn't say is what 
the host is. Is it 6.13-rc1 as well? > > > > > > > > > > > > > > No, host is 6.6.60. > > > > > > > > > > > > Right. I wouldn't be surprised if: > > > > > > > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and > > > > > > that's proably something we should backport) > > > > > > > > > > How to confirm this? Currently I cannot find any (case-insensitive) > > > > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM > > > > > strings in `strace -v` (as it decodes some KVM ioctls) of qemu process. > > > > > > > > > > > > > > > > > - you get a nastygram in the host log telling you that the guest has > > > > > > executed something it shouldn't (you'll get the encoding of the > > > > > > instruction) > > > > > > > > > > I requested admins of the box for dmesg output since I don't have root > > > > > access myself and nowadays dmesg is not accessible for a user. > > > > > > > > This is what they reported: > > > > > > > > kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0 > > > > [000000c5] > > > > { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read }, > > > > > > > > > > As Will pointed out I think this is access to MPAMIDR_EL1 and is from this > > > code here, > > > > > > +++ b/arch/arm64/kernel/cpuinfo.c > > > @@ -478,6 +478,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) > > > if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) > > > __cpuinfo_store_cpu_32bit(&info->aarch32); > > > > > > + if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0)) > > > + info->reg_mpamidr = read_cpuid(MPAMIDR_EL1); > > > + > > > cpuinfo_detect_icache_policy(info); > > > } > > > > > > I did manage to boot my setup in 6.6 and this is what happens, > > > > > > Host kernel 6.6 > > > Guest Kernel 6.13-rc1 > > > > > > [ 0.195392] smp: Brought up 1 node, 8 CPUs > > > [ 0.219000] SMP: Total of 8 processors activated. > > > [ 0.219629] CPU: All CPU(s) started at EL1 > > > ... 
> > > [ 0.223212] CPU features: detected: RAS Extension Support > > > [ 0.223927] CPU features: detected: Memory Partitioning And Monitoring > > > [ 0.224796] CPU features: detected: Memory Partitioning And Monitoring Virtualisation > > > [ 0.225961] alternatives: applying system-wide alternatives > > > ... > > > > > > Guest detects MPAM and boots fine. > > > > > > Host kernel 6.13-rc1 > > > Guest Kernel 6.13-rc1 > > > > > > [ 0.196625] smp: Brought up 1 node, 8 CPUs > > > [ 0.222093] SMP: Total of 8 processors activated. > > > [ 0.222769] CPU: All CPU(s) started at EL1 > > > ... > > > [ 0.226620] CPU features: detected: RAS Extension Support > > > [ 0.227453] alternatives: applying system-wide alternatives > > > > > > MPAM is not visible to Guest in this case. > > > > > > So as I pointed out earlier could it be a case where the ID register reports MPAM support > > > but the firmware has not enabled MPAM? > > > > > > James seems to be mentioning that case here, > > > > > > " (If you have a boot failure that bisects here its likely your CPUs > > > advertise MPAM in the id registers, but firmware failed to either enable > > > or MPAM, or emulate the trap as if it were disabled)" > > > > I tried to verify that MPAM is advertised with qemu+gdb method, as > > suggested by Oliver, but ID_AA64PFR0_EL1 register is not there. > > > > (gdb) i r ID_AA64PFR0_EL1 > > Invalid register `ID_AA64PFR0_EL1' > > Then there is a bug in either QEMU or the GDB stubs. This register > exists, or you wouldn't be here. In case this is useful: builder@aarch64:/.in$ qemu-system-aarch64 --version QEMU emulator version 9.1.1 (qemu-9.1.1-alt2) Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers builder@aarch64:/.in$ gdb --version GNU gdb (GDB) 14.1.0.56.d739d4fd457-alt1 (ALT Sisyphus) Copyright (C) 2023 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law. Is there a way to get the content of this register despite these possible gdb/qemu bugs? Perhaps we can add a debugging print to the guest kernel. > > > > Are there other suggestions? > > Mark has described what the problem is likely to be. 6.6-stable needs > to have 6685f5d572c22e10 backported, and it probably should have been > Cc: to stable. Can you please apply the following patch to your *host* > machine and retest? Unfortunately I cannot. But I can apply patches to the guest kernel. [I will try to convince the admins of the server to apply the patch, though this can take time, and they may refuse, since this is a production build server and its update procedure is complicated.] > > > https://lore.kernel.org/all/20241030160317.2528209-4-joey.gouly@arm.com/ > > > > > > Is there a way you can find out the BIOS version on that board? > > > > Unfortunately, admins of the server do not provide me with this > > info. > > This doesn't really help, I'm afraid. They provided me with this dmidecode output: Handle 0x0000, DMI type 0, 26 bytes BIOS Information Vendor: Huawei Corp. Version: 1.25 Release Date: 01/17/2020 Address: 0x5F000 Runtime Size: 644 kB ROM Size: 6 MB Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: Huawei Product Name: TaiShan 200 (Model 2280) The admins said they are in the process of requesting a firmware upgrade from the manufacturer. > > > For such cases, when MPAM is incorrectly advertised, can we have kernel > > command line parameter like mpam=0 to override it's detection? > > We could, but only when we can confirm what the problem is. > > > I think with "If you have a boot failure that bisects here" it's > > acknowledged possibility and it's confirmed by our server. > > Not really. This talks about firmware. We are debugging the hypervisor > here. This might be closely related, but these are not the same > things. Understandable. Thanks, > > Thanks, > > M.
> > -- > Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-04 18:34 ` Vitaly Chikunov @ 2024-12-04 19:13 ` Marc Zyngier 2024-12-05 8:53 ` Shameerali Kolothum Thodi 1 sibling, 0 replies; 22+ messages in thread From: Marc Zyngier @ 2024-12-04 19:13 UTC (permalink / raw) To: Vitaly Chikunov Cc: Shameerali Kolothum Thodi, Will Deacon, james.morse@arm.com, linux-arm-kernel@lists.infradead.org, Catalin Marinas, linux-kernel@vger.kernel.org, oliver.upton@linux.dev, mark.rutland@arm.com, Wangzhou (B), Gleb Fotengauer-Malinovskiy On Wed, 04 Dec 2024 18:34:53 +0000, Vitaly Chikunov <vt@altlinux.org> wrote: > > Marc, > > On Wed, Dec 04, 2024 at 08:51:26AM +0000, Marc Zyngier wrote: > > On Tue, 03 Dec 2024 22:14:53 +0000, > > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > > > Shameer, Marc, Oliver, Will, > > > > > > On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote: > > > > > -----Original Message----- > > > > > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On > > > > > Behalf Of Vitaly Chikunov > > > > > Sent: Tuesday, December 3, 2024 9:27 AM > > > > > To: Marc Zyngier <maz@kernel.org> > > > > > Cc: Will Deacon <will@kernel.org>; james.morse@arm.com; linux-arm- > > > > > kernel@lists.infradead.org; Catalin Marinas <catalin.marinas@arm.com>; > > > > > linux-kernel@vger.kernel.org; oliver.upton@linux.dev; > > > > > mark.rutland@arm.com > > > > > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: > > > > > 0000000002000000 [#1] SMP > > > > > > > > > > Marc, > > > > > > > > > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote: > > > > > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote: > > > > > > > On Mon, 02 Dec 2024 15:59:40 +0000, > > > > > > > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > > > > > > > > > > > > > Marc, > > > > > > > > > > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote: > > > > > > > > > > > > > > 
> > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well? > > > > > > > > > > > > > > > > No, host is 6.6.60. > > > > > > > > > > > > > > Right. I wouldn't be surprised if: > > > > > > > > > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and > > > > > > > that's proably something we should backport) > > > > > > > > > > > > How to confirm this? Currently I cannot find any (case-insensitive) > > > > > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM > > > > > > strings in `strace -v` (as it decodes some KVM ioctls) of qemu process. > > > > > > > > > > > > > > > > > > > > - you get a nastygram in the host log telling you that the guest has > > > > > > > executed something it shouldn't (you'll get the encoding of the > > > > > > > instruction) > > > > > > > > > > > > I requested admins of the box for dmesg output since I don't have root > > > > > > access myself and nowadays dmesg is not accessible for a user. > > > > > > > > > > This is what they reported: > > > > > > > > > > kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0 > > > > > [000000c5] > > > > > { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read }, > > > > > > > > > > > > > As Will pointed out I think this is access to MPAMIDR_EL1 and is from this > > > > code here, > > > > > > > > +++ b/arch/arm64/kernel/cpuinfo.c > > > > @@ -478,6 +478,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) > > > > if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) > > > > __cpuinfo_store_cpu_32bit(&info->aarch32); > > > > > > > > + if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0)) > > > > + info->reg_mpamidr = read_cpuid(MPAMIDR_EL1); > > > > + > > > > cpuinfo_detect_icache_policy(info); > > > > } > > > > > > > > I did manage to boot my setup in 6.6 and this is what happens, > > > > > > > > Host kernel 6.6 > > > > Guest Kernel 6.13-rc1 > > > > > > > > [ 0.195392] smp: Brought up 1 node, 8 CPUs > > > > [ 0.219000] SMP: Total of 8 
processors activated. > > > > [ 0.219629] CPU: All CPU(s) started at EL1 > > > > ... > > > > [ 0.223212] CPU features: detected: RAS Extension Support > > > > [ 0.223927] CPU features: detected: Memory Partitioning And Monitoring > > > > [ 0.224796] CPU features: detected: Memory Partitioning And Monitoring Virtualisation > > > > [ 0.225961] alternatives: applying system-wide alternatives > > > > ... > > > > > > > > Guest detects MPAM and boots fine. > > > > > > > > Host kernel 6.13-rc1 > > > > Guest Kernel 6.13-rc1 > > > > > > > > [ 0.196625] smp: Brought up 1 node, 8 CPUs > > > > [ 0.222093] SMP: Total of 8 processors activated. > > > > [ 0.222769] CPU: All CPU(s) started at EL1 > > > > ... > > > > [ 0.226620] CPU features: detected: RAS Extension Support > > > > [ 0.227453] alternatives: applying system-wide alternatives > > > > > > > > MPAM is not visible to Guest in this case. > > > > > > > > So as I pointed out earlier could it be a case where the ID register reports MPAM support > > > > but the firmware has not enabled MPAM? > > > > > > > > James seems to be mentioning that case here, > > > > > > > > " (If you have a boot failure that bisects here its likely your CPUs > > > > advertise MPAM in the id registers, but firmware failed to either enable > > > > or MPAM, or emulate the trap as if it were disabled)" > > > > > > I tried to verify that MPAM is advertised with qemu+gdb method, as > > > suggested by Oliver, but ID_AA64PFR0_EL1 register is not there. > > > > > > (gdb) i r ID_AA64PFR0_EL1 > > > Invalid register `ID_AA64PFR0_EL1' > > > > Then there is a bug in either QEMU or the GDB stubs. This register > > exists, or you wouldn't be here. 
> >
> In case this is useful:
>
> builder@aarch64:/.in$ qemu-system-aarch64 --version
> QEMU emulator version 9.1.1 (qemu-9.1.1-alt2)
> Copyright (c) 2003-2024 Fabrice Bellard and the QEMU Project developers
> builder@aarch64:/.in$ gdb --version
> GNU gdb (GDB) 14.1.0.56.d739d4fd457-alt1 (ALT Sisyphus)
> Copyright (C) 2023 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
>
> Is there a way to get the content of this register despite these possible
> gdb/qemu bugs?

I have no idea. And frankly, I don't think this matters.

> Perhaps we can add some debugging prints to the guest kernel.
>
> > >
> > > Are there other suggestions?
> >
> > Mark has described what the problem is likely to be. 6.6-stable needs
> > to have 6685f5d572c22e10 backported, and it probably should have been
> > Cc: to stable. Can you please apply the following patch to your *host*
> > machine and retest?
>
> Unfortunately I cannot. But I can apply patches to the guest kernel. [I
> will try to convince the admins of the server to apply the patch, but
> this can take time, and they can refuse, since this is a production build
> server and its update procedure is complicated.]

Then I really cannot help you. I'm not going to paper over a hypervisor
bug in the guest kernel, and if you/they are happy to run with critical
bugs in your production machine, that's about it then.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
  2024-12-04 18:34 ` Vitaly Chikunov
  2024-12-04 19:13 ` Marc Zyngier
@ 2024-12-05 8:53 ` Shameerali Kolothum Thodi
  1 sibling, 0 replies; 22+ messages in thread
From: Shameerali Kolothum Thodi @ 2024-12-05 8:53 UTC (permalink / raw)
  To: Vitaly Chikunov, Marc Zyngier
  Cc: Will Deacon, james.morse@arm.com, linux-arm-kernel@lists.infradead.org, Catalin Marinas, linux-kernel@vger.kernel.org, oliver.upton@linux.dev, mark.rutland@arm.com, Wangzhou (B), Gleb Fotengauer-Malinovskiy, wanghuiqiang

> -----Original Message-----
> From: Vitaly Chikunov <vt@altlinux.org>
> Sent: Wednesday, December 4, 2024 6:35 PM
> To: Marc Zyngier <maz@kernel.org>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> Will Deacon <will@kernel.org>; james.morse@arm.com;
> linux-arm-kernel@lists.infradead.org;
> Catalin Marinas <catalin.marinas@arm.com>;
> linux-kernel@vger.kernel.org; oliver.upton@linux.dev;
> mark.rutland@arm.com; Wangzhou (B) <wangzhou1@hisilicon.com>;
> Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
> Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction:
> 0000000002000000 [#1] SMP
>
> > > > Is there a way you can find out the BIOS version on that board?
> > >
> > > Unfortunately, admins of the server do not provide me with this
> > > info.
> >
> > This doesn't really help, I'm afraid.
>
> They provided me with this dmidecode output:
>
> Handle 0x0000, DMI type 0, 26 bytes
> BIOS Information
> Vendor: Huawei Corp.
> Version: 1.25

The information I have from our BIOS team is that you need version 1.70
or above to support MPAM.

Thanks,
Shameer

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-04 8:51 ` Marc Zyngier 2024-12-04 18:34 ` Vitaly Chikunov @ 2024-12-04 18:53 ` Vitaly Chikunov 2024-12-06 20:56 ` Vitaly Chikunov 2 siblings, 0 replies; 22+ messages in thread From: Vitaly Chikunov @ 2024-12-04 18:53 UTC (permalink / raw) To: Marc Zyngier Cc: Shameerali Kolothum Thodi, Will Deacon, james.morse@arm.com, linux-arm-kernel@lists.infradead.org, Catalin Marinas, linux-kernel@vger.kernel.org, oliver.upton@linux.dev, mark.rutland@arm.com, Wangzhou (B), Gleb Fotengauer-Malinovskiy Marc, On Wed, Dec 04, 2024 at 08:51:26AM +0000, Marc Zyngier wrote: > On Tue, 03 Dec 2024 22:14:53 +0000, > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > I tried to verify that MPAM is advertised with qemu+gdb method, as > > suggested by Oliver, but ID_AA64PFR0_EL1 register is not there. > > > > (gdb) i r ID_AA64PFR0_EL1 > > Invalid register `ID_AA64PFR0_EL1' > > Then there is a bug in either QEMU or the GDB stubs. This register > exists, or you wouldn't be here. > > > > > Are there other suggestions? > > Mark has described what the problem is likely to be. 6.6-stable needs > to have 6685f5d572c22e10 backported, and it probably should have been > Cc: to stable. Can you please apply the following patch to your *host* > machine and retest? Looks like I will be able to test this. 
Thanks, > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c > index 370a1a7bd369..258a39bcd3c7 100644 > --- a/arch/arm64/kvm/sys_regs.c > +++ b/arch/arm64/kvm/sys_regs.c > @@ -1330,6 +1330,7 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu, > val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE); > > val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME); > + val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MPAM_frac); > break; > case SYS_ID_AA64ISAR1_EL1: > if (!vcpu_has_ptrauth(vcpu)) > @@ -1472,6 +1473,13 @@ static u64 read_sanitised_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, > > val &= ~ID_AA64PFR0_EL1_AMU_MASK; > > + /* > + * MPAM is disabled by default as KVM also needs a set of PARTID to > + * program the MPAMVPMx_EL2 PARTID remapping registers with. But some > + * older kernels let the guest see the ID bit. > + */ > + val &= ~ID_AA64PFR0_EL1_MPAM_MASK; > + > return val; > } > > @@ -1560,6 +1568,29 @@ static int set_id_dfr0_el1(struct kvm_vcpu *vcpu, > return set_id_reg(vcpu, rd, val); > } > > +static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, > + const struct sys_reg_desc *rd, u64 user_val) > +{ > + u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1); > + u64 mpam_mask = ID_AA64PFR0_EL1_MPAM_MASK; > + > + /* > + * Commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits > + * in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to > + * guests, but didn't add trap handling. KVM doesn't support MPAM and > + * always returns an UNDEF for these registers. The guest must see 0 > + * for this field. > + * > + * But KVM must also accept values from user-space that were provided > + * by KVM. On CPUs that support MPAM, permit user-space to write > + * the sanitizied value to ID_AA64PFR0_EL1.MPAM, but ignore this field. 
> + */ > + if ((hw_val & mpam_mask) == (user_val & mpam_mask)) > + user_val &= ~ID_AA64PFR0_EL1_MPAM_MASK; > + > + return set_id_reg(vcpu, rd, user_val); > +} > + > /* > * cpufeature ID register user accessors > * > @@ -2018,7 +2049,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { > { SYS_DESC(SYS_ID_AA64PFR0_EL1), > .access = access_id_reg, > .get_user = get_id_reg, > - .set_user = set_id_reg, > + .set_user = set_id_aa64pfr0_el1, > .reset = read_sanitised_id_aa64pfr0_el1, > .val = ID_AA64PFR0_EL1_CSV2_MASK | ID_AA64PFR0_EL1_CSV3_MASK, }, > ID_SANITISED(ID_AA64PFR1_EL1), > ^ permalink raw reply [flat|nested] 22+ messages in thread
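The masking rule in the patch above can be modelled outside the kernel. This is only a minimal sketch, not the kernel code: it assumes the MPAM field occupies bits [43:40] of ID_AA64PFR0_EL1 (consistent with the `tst x3, #0xf0000000000` in the oops disassembly), and the names `MPAM_MASK`, `guest_visible_pfr0()` and `filter_user_pfr0()` are hypothetical stand-ins for `ID_AA64PFR0_EL1_MPAM_MASK`, the read path and `set_id_aa64pfr0_el1()`.

```c
#include <stdint.h>

/* Assumed position of ID_AA64PFR0_EL1.MPAM: bits [43:40]. */
#define MPAM_SHIFT 40
#define MPAM_MASK  (UINT64_C(0xf) << MPAM_SHIFT)

/*
 * Hypothetical model of the read side: whatever the hardware reports,
 * the guest always sees the MPAM field as 0.
 */
static uint64_t guest_visible_pfr0(uint64_t hw_val)
{
    return hw_val & ~MPAM_MASK;
}

/*
 * Hypothetical model of the write side: if userspace writes back the
 * same MPAM value the hardware reports (as a VMM restoring state saved
 * on an older kernel would), the field is silently cleared. Any other
 * value is passed through unchanged for the generic checks to judge.
 */
static uint64_t filter_user_pfr0(uint64_t hw_val, uint64_t user_val)
{
    if ((hw_val & MPAM_MASK) == (user_val & MPAM_MASK))
        user_val &= ~MPAM_MASK;

    return user_val;
}
```

With a hardware value whose MPAM field is 1, writing that same value back yields a value with the field cleared, while writing a different non-zero MPAM value is left untouched by this filter.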
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-04 8:51 ` Marc Zyngier 2024-12-04 18:34 ` Vitaly Chikunov 2024-12-04 18:53 ` Vitaly Chikunov @ 2024-12-06 20:56 ` Vitaly Chikunov 2024-12-10 2:51 ` Vitaly Chikunov 2 siblings, 1 reply; 22+ messages in thread From: Vitaly Chikunov @ 2024-12-06 20:56 UTC (permalink / raw) To: Marc Zyngier Cc: Shameerali Kolothum Thodi, Will Deacon, james.morse@arm.com, linux-arm-kernel@lists.infradead.org, Catalin Marinas, linux-kernel@vger.kernel.org, oliver.upton@linux.dev, mark.rutland@arm.com, Wangzhou (B), Gleb Fotengauer-Malinovskiy Marc, On Wed, Dec 04, 2024 at 08:51:26AM +0000, Marc Zyngier wrote: > On Tue, 03 Dec 2024 22:14:53 +0000, > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > Shameer, Marc, Oliver, Will, > > > > On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote: > > > > -----Original Message----- > > > > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On > > > > Behalf Of Vitaly Chikunov > > > > Sent: Tuesday, December 3, 2024 9:27 AM > > > > To: Marc Zyngier <maz@kernel.org> > > > > Cc: Will Deacon <will@kernel.org>; james.morse@arm.com; linux-arm- > > > > kernel@lists.infradead.org; Catalin Marinas <catalin.marinas@arm.com>; > > > > linux-kernel@vger.kernel.org; oliver.upton@linux.dev; > > > > mark.rutland@arm.com > > > > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: > > > > 0000000002000000 [#1] SMP > > > > > > > > Marc, > > > > > > > > On Tue, Dec 03, 2024 at 01:31:19AM +0300, Vitaly Chikunov wrote: > > > > > On Mon, Dec 02, 2024 at 04:07:03PM +0000, Marc Zyngier wrote: > > > > > > On Mon, 02 Dec 2024 15:59:40 +0000, > > > > > > Vitaly Chikunov <vt@altlinux.org> wrote: > > > > > > > > > > > > > > Marc, > > > > > > > > > > > > > > On Mon, Dec 02, 2024 at 03:53:59PM +0000, Marc Zyngier wrote: > > > > > > > > > > > > > > > > What the log doesn't say is what the host is. Is it 6.13-rc1 as well? 
> > > > > > > > > > > > > > No, host is 6.6.60. > > > > > > > > > > > > Right. I wouldn't be surprised if: > > > > > > > > > > > > - this v6.6 kernel doesn't hide the MPAM feature as it should (and > > > > > > that's proably something we should backport) > > > > > > > > > > How to confirm this? Currently I cannot find any (case-insensitive) > > > > > "MPAM" files in /sys, nor mpam string in /proc/cpuinfo, nor MPAM > > > > > strings in `strace -v` (as it decodes some KVM ioctls) of qemu process. > > > > > > > > > > > > > > > > > - you get a nastygram in the host log telling you that the guest has > > > > > > executed something it shouldn't (you'll get the encoding of the > > > > > > instruction) > > > > > > > > > > I requested admins of the box for dmesg output since I don't have root > > > > > access myself and nowadays dmesg is not accessible for a user. > > > > > > > > This is what they reported: > > > > > > > > kvm [2502822]: Unsupported guest sys_reg access at: ffff80008003e9f0 > > > > [000000c5] > > > > { Op0( 3), Op1( 0), CRn(10), CRm( 4), Op2( 4), func_read }, > > > > > > > > > > As Will pointed out I think this is access to MPAMIDR_EL1 and is from this > > > code here, > > > > > > +++ b/arch/arm64/kernel/cpuinfo.c > > > @@ -478,6 +478,9 @@ static void __cpuinfo_store_cpu(struct cpuinfo_arm64 *info) > > > if (id_aa64pfr0_32bit_el0(info->reg_id_aa64pfr0)) > > > __cpuinfo_store_cpu_32bit(&info->aarch32); > > > > > > + if (id_aa64pfr0_mpam(info->reg_id_aa64pfr0)) > > > + info->reg_mpamidr = read_cpuid(MPAMIDR_EL1); > > > + > > > cpuinfo_detect_icache_policy(info); > > > } > > > > > > I did manage to boot my setup in 6.6 and this is what happens, > > > > > > Host kernel 6.6 > > > Guest Kernel 6.13-rc1 > > > > > > [ 0.195392] smp: Brought up 1 node, 8 CPUs > > > [ 0.219000] SMP: Total of 8 processors activated. > > > [ 0.219629] CPU: All CPU(s) started at EL1 > > > ... 
> > > [ 0.223212] CPU features: detected: RAS Extension Support > > > [ 0.223927] CPU features: detected: Memory Partitioning And Monitoring > > > [ 0.224796] CPU features: detected: Memory Partitioning And Monitoring Virtualisation > > > [ 0.225961] alternatives: applying system-wide alternatives > > > ... > > > > > > Guest detects MPAM and boots fine. > > > > > > Host kernel 6.13-rc1 > > > Guest Kernel 6.13-rc1 > > > > > > [ 0.196625] smp: Brought up 1 node, 8 CPUs > > > [ 0.222093] SMP: Total of 8 processors activated. > > > [ 0.222769] CPU: All CPU(s) started at EL1 > > > ... > > > [ 0.226620] CPU features: detected: RAS Extension Support > > > [ 0.227453] alternatives: applying system-wide alternatives > > > > > > MPAM is not visible to Guest in this case. > > > > > > So as I pointed out earlier could it be a case where the ID register reports MPAM support > > > but the firmware has not enabled MPAM? > > > > > > James seems to be mentioning that case here, > > > > > > " (If you have a boot failure that bisects here its likely your CPUs > > > advertise MPAM in the id registers, but firmware failed to either enable > > > or MPAM, or emulate the trap as if it were disabled)" > > > > I tried to verify that MPAM is advertised with qemu+gdb method, as > > suggested by Oliver, but ID_AA64PFR0_EL1 register is not there. > > > > (gdb) i r ID_AA64PFR0_EL1 > > Invalid register `ID_AA64PFR0_EL1' > > Then there is a bug in either QEMU or the GDB stubs. This register > exists, or you wouldn't be here. > > > > > Are there other suggestions? > > Mark has described what the problem is likely to be. 6.6-stable needs > to have 6685f5d572c22e10 backported, and it probably should have been > Cc: to stable. Can you please apply the following patch to your *host* > machine and retest? We tested the host with this patch applied over 6.6.63 and 6.13-rc1 guest does not Oops anymore. I'd suggest this is also get backported to 6.12.y branch. 
Thanks, > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c > index 370a1a7bd369..258a39bcd3c7 100644 > --- a/arch/arm64/kvm/sys_regs.c > +++ b/arch/arm64/kvm/sys_regs.c > @@ -1330,6 +1330,7 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu, > val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE); > > val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME); > + val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MPAM_frac); > break; > case SYS_ID_AA64ISAR1_EL1: > if (!vcpu_has_ptrauth(vcpu)) > @@ -1472,6 +1473,13 @@ static u64 read_sanitised_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, > > val &= ~ID_AA64PFR0_EL1_AMU_MASK; > > + /* > + * MPAM is disabled by default as KVM also needs a set of PARTID to > + * program the MPAMVPMx_EL2 PARTID remapping registers with. But some > + * older kernels let the guest see the ID bit. > + */ > + val &= ~ID_AA64PFR0_EL1_MPAM_MASK; > + > return val; > } > > @@ -1560,6 +1568,29 @@ static int set_id_dfr0_el1(struct kvm_vcpu *vcpu, > return set_id_reg(vcpu, rd, val); > } > > +static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, > + const struct sys_reg_desc *rd, u64 user_val) > +{ > + u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1); > + u64 mpam_mask = ID_AA64PFR0_EL1_MPAM_MASK; > + > + /* > + * Commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits > + * in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to > + * guests, but didn't add trap handling. KVM doesn't support MPAM and > + * always returns an UNDEF for these registers. The guest must see 0 > + * for this field. > + * > + * But KVM must also accept values from user-space that were provided > + * by KVM. On CPUs that support MPAM, permit user-space to write > + * the sanitizied value to ID_AA64PFR0_EL1.MPAM, but ignore this field. 
> + */ > + if ((hw_val & mpam_mask) == (user_val & mpam_mask)) > + user_val &= ~ID_AA64PFR0_EL1_MPAM_MASK; > + > + return set_id_reg(vcpu, rd, user_val); > +} > + > /* > * cpufeature ID register user accessors > * > @@ -2018,7 +2049,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { > { SYS_DESC(SYS_ID_AA64PFR0_EL1), > .access = access_id_reg, > .get_user = get_id_reg, > - .set_user = set_id_reg, > + .set_user = set_id_aa64pfr0_el1, > .reset = read_sanitised_id_aa64pfr0_el1, > .val = ID_AA64PFR0_EL1_CSV2_MASK | ID_AA64PFR0_EL1_CSV3_MASK, }, > ID_SANITISED(ID_AA64PFR1_EL1), > > > > https://lore.kernel.org/all/20241030160317.2528209-4-joey.gouly@arm.com/ > > > > > > Is there a way you can find out the BIOS version on that board? > > > > Unfortunately, admins of the server do not provide me with this > > info. > > This doesn't really help, I'm afraid. > > > For such cases, when MPAM is incorrectly advertised, can we have kernel > > command line parameter like mpam=0 to override it's detection? > > We could, but only when we can confirm what the problem is. > > > I think with "If you have a boot failure that bisects here" it's > > acknowledged possibility and it's confirmed by our server. > > Not really. This talks about firmware. We are debugging the hypervisor > here. This might be closely related, but these are not the same > things. > > Thanks, > > M. > > -- > Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
  2024-12-06 20:56 ` Vitaly Chikunov
@ 2024-12-10 2:51 ` Vitaly Chikunov
  2024-12-10 9:55 ` Marc Zyngier
  0 siblings, 1 reply; 22+ messages in thread
From: Vitaly Chikunov @ 2024-12-10 2:51 UTC (permalink / raw)
  To: linux-arm-kernel, kvmarm, Marc Zyngier, Oliver Upton
  Cc: Shameerali Kolothum Thodi, Will Deacon, james.morse@arm.com, Catalin Marinas, linux-kernel@vger.kernel.org, Mark Rutland, Wangzhou (B), Gleb Fotengauer-Malinovskiy

On Fri, Dec 06, 2024 at 11:56:02PM +0300, Vitaly Chikunov wrote:
> On Wed, Dec 04, 2024 at 08:51:26AM +0000, Marc Zyngier wrote:
> > On Tue, 03 Dec 2024 22:14:53 +0000,
> > Vitaly Chikunov <vt@altlinux.org> wrote:
> > > On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote:
> >
> > Mark has described what the problem is likely to be. 6.6-stable needs
> > to have 6685f5d572c22e10 backported, and it probably should have been
> > Cc: to stable. Can you please apply the following patch to your *host*
> > machine and retest?
>
> We tested the host with this patch applied over 6.6.63, and the 6.13-rc1
> guest does not Oops anymore.
>
> I'd suggest this also gets backported to the 6.12.y branch.

Please, can someone backport this patch to v6.12 and send it to stable?
It would be really useful to have this fixed, and it has been noted that
this is a critical bug.

Thanks, > > Thanks, > > > > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c > > index 370a1a7bd369..258a39bcd3c7 100644 > > --- a/arch/arm64/kvm/sys_regs.c > > +++ b/arch/arm64/kvm/sys_regs.c > > @@ -1330,6 +1330,7 @@ static u64 __kvm_read_sanitised_id_reg(const struct kvm_vcpu *vcpu, > > val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MTE); > > > > val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_SME); > > + val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_EL1_MPAM_frac); > > break; > > case SYS_ID_AA64ISAR1_EL1: > > if (!vcpu_has_ptrauth(vcpu)) > > @@ -1472,6 +1473,13 @@ static u64 read_sanitised_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, > > > > val &= ~ID_AA64PFR0_EL1_AMU_MASK; > > > > + /* > > + * MPAM is disabled by default as KVM also needs a set of PARTID to > > + * program the MPAMVPMx_EL2 PARTID remapping registers with. But some > > + * older kernels let the guest see the ID bit. > > + */ > > + val &= ~ID_AA64PFR0_EL1_MPAM_MASK; > > + > > return val; > > } > > > > @@ -1560,6 +1568,29 @@ static int set_id_dfr0_el1(struct kvm_vcpu *vcpu, > > return set_id_reg(vcpu, rd, val); > > } > > > > +static int set_id_aa64pfr0_el1(struct kvm_vcpu *vcpu, > > + const struct sys_reg_desc *rd, u64 user_val) > > +{ > > + u64 hw_val = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1); > > + u64 mpam_mask = ID_AA64PFR0_EL1_MPAM_MASK; > > + > > + /* > > + * Commit 011e5f5bf529f ("arm64/cpufeature: Add remaining feature bits > > + * in ID_AA64PFR0 register") exposed the MPAM field of AA64PFR0_EL1 to > > + * guests, but didn't add trap handling. KVM doesn't support MPAM and > > + * always returns an UNDEF for these registers. The guest must see 0 > > + * for this field. > > + * > > + * But KVM must also accept values from user-space that were provided > > + * by KVM. On CPUs that support MPAM, permit user-space to write > > + * the sanitizied value to ID_AA64PFR0_EL1.MPAM, but ignore this field. 
> > + */ > > + if ((hw_val & mpam_mask) == (user_val & mpam_mask)) > > + user_val &= ~ID_AA64PFR0_EL1_MPAM_MASK; > > + > > + return set_id_reg(vcpu, rd, user_val); > > +} > > + > > /* > > * cpufeature ID register user accessors > > * > > @@ -2018,7 +2049,7 @@ static const struct sys_reg_desc sys_reg_descs[] = { > > { SYS_DESC(SYS_ID_AA64PFR0_EL1), > > .access = access_id_reg, > > .get_user = get_id_reg, > > - .set_user = set_id_reg, > > + .set_user = set_id_aa64pfr0_el1, > > .reset = read_sanitised_id_aa64pfr0_el1, > > .val = ID_AA64PFR0_EL1_CSV2_MASK | ID_AA64PFR0_EL1_CSV3_MASK, }, > > ID_SANITISED(ID_AA64PFR1_EL1), > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP
  2024-12-10 2:51 ` Vitaly Chikunov
@ 2024-12-10 9:55 ` Marc Zyngier
  0 siblings, 0 replies; 22+ messages in thread
From: Marc Zyngier @ 2024-12-10 9:55 UTC (permalink / raw)
  To: Vitaly Chikunov
  Cc: linux-arm-kernel, kvmarm, Oliver Upton, Shameerali Kolothum Thodi, Will Deacon, james.morse@arm.com, Catalin Marinas, linux-kernel@vger.kernel.org, Mark Rutland, Wangzhou (B), Gleb Fotengauer-Malinovskiy

On Tue, 10 Dec 2024 02:51:27 +0000,
Vitaly Chikunov <vt@altlinux.org> wrote:
>
> On Fri, Dec 06, 2024 at 11:56:02PM +0300, Vitaly Chikunov wrote:
> > On Wed, Dec 04, 2024 at 08:51:26AM +0000, Marc Zyngier wrote:
> > > On Tue, 03 Dec 2024 22:14:53 +0000,
> > > Vitaly Chikunov <vt@altlinux.org> wrote:
> > > > On Tue, Dec 03, 2024 at 10:03:11AM +0000, Shameerali Kolothum Thodi wrote:
> > >
> > > Mark has described what the problem is likely to be. 6.6-stable needs
> > > to have 6685f5d572c22e10 backported, and it probably should have been
> > > Cc: to stable. Can you please apply the following patch to your *host*
> > > machine and retest?
> >
> > We tested the host with this patch applied over 6.6.63, and the 6.13-rc1
> > guest does not Oops anymore.
> >
> > I'd suggest this also gets backported to the 6.12.y branch.
>
> Please, can someone backport this patch to v6.12 and send it to stable?
> It would be really useful to have this fixed, and it has been noted that
> this is a critical bug.

Look: it took you a few good days to simply *report* that the hack I
gave you was helping. Surely you can similarly wait for a few days for
the interested parties to spin proper patches and *test* them?

And if you really cannot wait, feel free to backport this change to
whatever stable release you feel needs to be fixed and post the
result. I promise that I will review it. After all, you are so far the
only person with both the HW and the problem.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply [flat|nested] 22+ messages in thread
* RE: v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP 2024-12-02 15:36 ` Will Deacon 2024-12-02 15:53 ` Marc Zyngier @ 2024-12-02 16:06 ` Shameerali Kolothum Thodi 1 sibling, 0 replies; 22+ messages in thread From: Shameerali Kolothum Thodi @ 2024-12-02 16:06 UTC (permalink / raw) To: Will Deacon, Vitaly Chikunov, james.morse@arm.com, joey.gouly@arm.com Cc: linux-arm-kernel@lists.infradead.org, Catalin Marinas, linux-kernel@vger.kernel.org, maz@kernel.org, oliver.upton@linux.dev, mark.rutland@arm.com > -----Original Message----- > From: linux-arm-kernel <linux-arm-kernel-bounces@lists.infradead.org> On > Behalf Of Will Deacon > Sent: Monday, December 2, 2024 3:36 PM > To: Vitaly Chikunov <vt@altlinux.org>; james.morse@arm.com > Cc: linux-arm-kernel@lists.infradead.org; Catalin Marinas > <catalin.marinas@arm.com>; linux-kernel@vger.kernel.org; > maz@kernel.org; oliver.upton@linux.dev; mark.rutland@arm.com > Subject: Re: v6.13-rc1: Internal error: Oops - Undefined instruction: > 0000000002000000 [#1] SMP > > [+ usual suspects] > > On Mon, Dec 02, 2024 at 07:58:30AM +0300, Vitaly Chikunov wrote: > > v6.13-rc1 exhibits a boot failure on aarch64 under KVM. (QEMU 9.1.1, > > CPU Kunpeng-920). 
Boot log: > > I've not tried to repro this locally, but from the log: > > > + time qemu-system-aarch64 -M accel=kvm:tcg -smp cores=8 -m 4096 - > serial mon:stdio -nodefaults -nographic -no-reboot -fsdev > local,id=root,path=/,security_model=none,multidevs=remap -device virtio- > 9p-pci,fsdev=root,mount_tag=virtio-9p:/ -device virtio-rng-pci -kernel > /usr/src/tmp/kernel-image-6.13-buildroot/boot/vmlinuz-6.13.0-6.13-alt0.rc1 > -initrd /usr/src/tmp/initramfs-6.13.0-6.13-alt0.rc1.img -sandbox > on,spawn=deny -M virt,gic-version=3 -cpu max -append 'console=ttyAMA0 > mitigations=off nokaslr panic=-1 SCRIPT=/usr/src/tmp/vm.SchsIm2FjB > earlycon earlyprintk=serial ignore_loglevel debug rddebug' > > [ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x481fd010] > > [ 0.000000] Linux version 6.13.0-6.13-alt0.rc1 > (builder@localhost.localdomain) (gcc-14 (GCC) 14.2.1 20241028 (ALT > Sisyphus 14.2.1-alt1), GNU ld (GNU Binutils) 2.43.1.20241025) #1 SMP > PREEMPT_DYNAMIC Mon Dec 2 03:33:29 UTC 2024 > > [ 0.000000] KASLR disabled on command line > > [ 0.000000] random: crng init done > > [ 0.000000] Machine model: linux,dummy-virt > > [ 0.000000] printk: debug: ignoring loglevel setting. > > [ 0.000000] efi: UEFI not found. 
> > [ 0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '') > > [ 0.000000] printk: legacy bootconsole [pl11] enabled > > [ 0.000000] OF: reserved mem: Reserved memory: No reserved-memory > node in the DT > > [ 0.000000] NUMA: Faking a node at [mem 0x0000000040000000- > 0x000000013fffffff] > > [ 0.000000] NODE_DATA(0) allocated [mem 0x13f7f3540-0x13f7f947f] > > [ 0.000000] Zone ranges: > > [ 0.000000] DMA [mem 0x0000000040000000-0x00000000ffffffff] > > [ 0.000000] DMA32 empty > > [ 0.000000] Normal [mem 0x0000000100000000-0x000000013fffffff] > > [ 0.000000] Movable zone start for each node > > [ 0.000000] Early memory node ranges > > [ 0.000000] node 0: [mem 0x0000000040000000-0x000000013fffffff] > > [ 0.000000] Initmem setup node 0 [mem 0x0000000040000000- > 0x000000013fffffff] > > [ 0.000000] cma: Reserved 256 MiB at 0x00000000f0000000 on node -1 > > [ 0.000000] psci: probing for conduit method from DT. > > [ 0.000000] psci: PSCIv1.1 detected in firmware. > > [ 0.000000] psci: Using standard PSCI v0.2 function IDs > > [ 0.000000] psci: Trusted OS migration not required > > [ 0.000000] psci: SMC Calling Convention v1.1 > > [ 0.000000] smccc: KVM: hypervisor services detected (0x00000000 > 0x00000000 0x00000000 0x00000003) > > [ 0.000000] percpu: Embedded 34 pages/cpu s100632 r8192 d30440 > u139264 > > [ 0.000000] pcpu-alloc: s100632 r8192 d30440 u139264 alloc=34*4096 > > [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [0] 4 [0] 5 [0] 6 [0] 7 > > [ 0.000000] Internal error: Oops - Undefined instruction: > 0000000002000000 [#1] SMP > > We take an undefined instruction exception in the kernel early during > boot... 
>
> > [ 0.000000] Modules linked in:
> > [ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.13.0-6.13-alt0.rc1 #1
> > [ 0.000000] Hardware name: linux,dummy-virt (DT)
> > [ 0.000000] pstate: 004000c5 (nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 0.000000] pc : __cpuinfo_store_cpu+0xe8/0x240
> > [ 0.000000] lr : cpuinfo_store_boot_cpu+0x34/0x88
> > [ 0.000000] sp : ffff800082013df0
> > [ 0.000000] x29: ffff800082013df0 x28: 000000000000008e x27: ffff800081e38128
> > [ 0.000000] x26: ffff800081702190 x25: ffff80008201f040 x24: ffff0000ff7d1d00
> > [ 0.000000] x23: ffff80008201ec00 x22: ffff800081e39100 x21: ffff8000816f9750
> > [ 0.000000] x20: ffff800081f55280 x19: ffff0000ff6be2e0 x18: 0000000000000000
> > [ 0.000000] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > [ 0.000000] x14: 000000000000002f x13: 000000013f7f9490 x12: 0000008000000000
> > [ 0.000000] x11: 0000000000000000 x10: 00000000007f8000 x9 : 000000013f808000
> > [ 0.000000] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000000013f7f94c0
> > [ 0.000000] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 1100010011111111
> > [ 0.000000] x2 : 0000000000000001 x1 : 0000000084448004 x0 : ffff0000ff6be2e0
> > [ 0.000000] Call trace:
> > [ 0.000000] __cpuinfo_store_cpu+0xe8/0x240 (P)
> > [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88 (L)
> > [ 0.000000] cpuinfo_store_boot_cpu+0x34/0x88
> > [ 0.000000] smp_prepare_boot_cpu+0x30/0x58
> > [ 0.000000] start_kernel+0x514/0x9d0
> > [ 0.000000] __primary_switched+0x88/0x98
> > [ 0.000000] Code: f100085f 54000600 f2580c7f 54000060 (d538a482)
>
> ... and that's:
>
>    0:   f100085f   cmp  x2, #0x2
>    4:   54000600   b.eq 0xc4  // b.none
>    8:   f2580c7f   tst  x3, #0xf0000000000
>    c:   54000060   b.eq 0x18  // b.none
>   10:*  d538a482   mrs  x2, s3_0_c10_c4_4   <-- trapping instruction
>
> Which I think corresponds to a read of MPAMIDR_EL1.
>
> It looks like James routed accesses to this register to undef_access() in
> 31ff96c38ea3 ("KVM: arm64: Fix missing traps of guest accesses to the
> MPAM register") so I'm not really sure how this is supposed to work given
> that it's an ID register.

It probably requires a firmware update as noted here,
https://lore.kernel.org/all/20241030160317.2528209-4-joey.gouly@arm.com/

I just tried it on a similar system with MPAM enabled and was not able to
replicate the above error.

Thanks,
Shameer

^ permalink raw reply [flat|nested] 22+ messages in thread
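The decoding of the trapping instruction above can be checked by hand. The following is a minimal sketch in plain C, assuming the standard A64 MRS/MSR (register) layout (op0 in bits [20:19], op1 in [18:16], CRn in [15:12], CRm in [11:8], op2 in [7:5], Rt in [4:0], bit 21 set for a read); the helper names `decode_sysreg_insn()` and `id_aa64pfr0_mpam_field()` are hypothetical, and treating x3 from the register dump as the ID_AA64PFR0_EL1 value is an assumption based on the `tst x3, #0xf0000000000` that precedes the trap.

```c
#include <stdint.h>

/* Fields of an A64 MRS/MSR (register) instruction word. */
struct sysreg_access {
    unsigned op0, op1, crn, crm, op2;
    unsigned rt;      /* general-purpose register number */
    int is_read;      /* 1 for MRS, 0 for MSR */
};

/*
 * Hypothetical helper: return 1 and fill *out if insn is an MRS/MSR
 * (register) encoding, else return 0. Bits [31:22] must be
 * 0b1101010100 and bit 20 must be set (op0 is 2 or 3).
 */
static int decode_sysreg_insn(uint32_t insn, struct sysreg_access *out)
{
    if ((insn & 0xffd00000u) != 0xd5100000u)
        return 0;

    out->is_read = (insn >> 21) & 0x1;
    out->op0     = (insn >> 19) & 0x3;
    out->op1     = (insn >> 16) & 0x7;
    out->crn     = (insn >> 12) & 0xf;
    out->crm     = (insn >> 8)  & 0xf;
    out->op2     = (insn >> 5)  & 0x7;
    out->rt      = insn & 0x1f;
    return 1;
}

/*
 * Hypothetical helper: extract bits [43:40], the field that the
 * "tst x3, #0xf0000000000" in the disassembly tests.
 */
static unsigned id_aa64pfr0_mpam_field(uint64_t pfr0)
{
    return (unsigned)((pfr0 >> 40) & 0xf);
}
```

Fed the word `0xd538a482` from the `Code:` line, the decoder gives a read with op0=3, op1=0, CRn=10, CRm=4, op2=4 into x2, which is the `s3_0_c10_c4_4` shown in the disassembly. Applied to the dumped x3 value `0x1100010011111111`, the field helper returns 1, which would explain why the branch around the `mrs` was not taken.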
end of thread, other threads:[~2024-12-10 10:09 UTC | newest]

Thread overview: 22+ messages
2024-12-02 4:58 v6.13-rc1: Internal error: Oops - Undefined instruction: 0000000002000000 [#1] SMP Vitaly Chikunov
2024-12-02 15:36 ` Will Deacon
2024-12-02 15:53 ` Marc Zyngier
2024-12-02 15:59 ` Vitaly Chikunov
2024-12-02 16:07 ` Marc Zyngier
2024-12-02 17:53 ` Mark Rutland
2024-12-02 22:31 ` Vitaly Chikunov
2024-12-03 1:19 ` Oliver Upton
2024-12-03 4:03 ` Vitaly Chikunov
2024-12-05 2:09 ` Vitaly Chikunov
2024-12-03 9:27 ` Vitaly Chikunov
2024-12-03 10:03 ` Shameerali Kolothum Thodi
2024-12-03 22:14 ` Vitaly Chikunov
2024-12-04 8:51 ` Marc Zyngier
2024-12-04 18:34 ` Vitaly Chikunov
2024-12-04 19:13 ` Marc Zyngier
2024-12-05 8:53 ` Shameerali Kolothum Thodi
2024-12-04 18:53 ` Vitaly Chikunov
2024-12-06 20:56 ` Vitaly Chikunov
2024-12-10 2:51 ` Vitaly Chikunov
2024-12-10 9:55 ` Marc Zyngier
2024-12-02 16:06 ` Shameerali Kolothum Thodi