From: Marc Zyngier <maz@kernel.org>
To: Mark Brown <broonie@kernel.org>, Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev, Joey Gouly <joey.gouly@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Zenghui Yu <yuzenghui@huawei.com>
Subject: Re: [PATCH v3 18/27] KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
Date: Sun, 20 Jul 2025 11:36:30 +0100
Message-ID: <87v7nnup1t.wl-maz@kernel.org>
In-Reply-To: <18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk>
[Adding Zenghui to the list. In the future, please CC all reviewers
listed in MAINTAINERS, not just an arbitrary selection]
On Fri, 18 Jul 2025 23:01:46 +0100,
Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, Jul 08, 2025 at 10:25:23AM -0700, Oliver Upton wrote:
> > HCRX_EL2.TMEA further modifies the external abort behavior where
> > unmasked aborts are taken to EL1 and masked aborts are taken to EL2.
> > It's rather weird when you consider that SEAs are, well, *synchronous*
> > and therefore not actually maskable. However, for the purposes of
> > exception routing, they're considered "masked" if the A flag is set.
>
> For the past few days the external_aborts KVM selftest has been failing
> in -next on a number of platforms with:
>
> # selftests: kvm: external_aborts
> # Random seed: 0x6b8b4567
> # ==== Test Assertion Failure ====
> # arm64/external_aborts.c:19: regs->pc == expected_abort_pc
> # pid=2598 tid=2598 errno=4 - Interrupted system call
> # 1 0x0000000000402f93: __vcpu_run_expect at external_aborts.c:85
> # 2 0x0000000000402197: vcpu_run_expect_done at external_aborts.c:97
> # 3 (inlined by) test_mmio_abort at external_aborts.c:136
> # 4 (inlined by) main at external_aborts.c:323
> # 5 0x0000ffffacbd7543: ?? ??:0
> # 6 0x0000ffffacbd7617: ?? ??:0
> # 7 0x000000000040272f: _start at ??:?
> # 0x0 != 0x4028f8 (regs->pc != expected_abort_pc)
> not ok 14 selftests: kvm: external_aborts # exit=254
>
> This appears to be happening on many, possibly all, VHE platforms being
> tested - nVHE appears fine. I ran a bisect, fixing the selftests
> version at the one in -next due to the renaming of this test, which
> pointed at this commit.
Thanks for reporting this.
It turns out that the exception entry code fully expects everything to
be loaded on the CPU when processing the exception. However, this is
no longer true when things are injected from userspace. But really,
this is a pretty fragile expectation, and this absolutely needs
fixing.
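To make the failure mode concrete, here is a minimal, purely illustrative
sketch of that kind of assumption. Every name in it (toy_vcpu, hw_elr,
toy_read_elr and friends) is hypothetical; this is not the KVM code, nor
the fix posted at [1]. It only contrasts an accessor that blindly trusts
the CPU copy of a register with one that checks whether the vcpu state is
actually loaded:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hardware system register (not a KVM name). */
static uint64_t hw_elr;

/* Hypothetical, heavily simplified vcpu context. */
struct toy_vcpu {
	bool loaded;        /* true while the state lives in the CPU registers */
	uint64_t saved_elr; /* in-memory copy, authoritative while !loaded */
};

/* Fragile accessor: assumes the vcpu state is always loaded on the CPU. */
static uint64_t toy_read_elr_fragile(const struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);
	return hw_elr;
}

/* Robust accessor: picks the authoritative copy based on the loaded flag. */
static uint64_t toy_read_elr(const struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);
	return vcpu->loaded ? hw_elr : vcpu->saved_elr;
}
```

With a vcpu that is not loaded (as can be the case for an injection coming
from a userspace ioctl), the fragile accessor returns whatever happens to
sit in the CPU register, while the robust one returns the saved value.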
I've posted a potential fix at [1], but it appears that this series has
further issues. If I run this very test in a nested guest (patch
applied on both L0 and L1), I get this:
bash-5.2# /host/home/maz/external_aborts
Random seed: 0x6b8b4567
[ 5.936631] PC = 402764
[ 5.942221] PC = 4027f4
[ 5.961351] SError Interrupt on CPU1, code 0x00000000be000000 -- SError
[ 5.961355] CPU: 1 UID: 0 PID: 64 Comm: external_aborts Not tainted 6.16.0-rc6-00163-ga03b7055c54b-dirty #4690 PREEMPT
[ 5.961357] Hardware name: linux,dummy-virt (DT)
[ 5.961358] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5.961359] pc : __kvm_vcpu_run+0x30/0x70
[ 5.961364] lr : __kvm_vcpu_run+0x24/0x70
[ 5.961365] sp : ffff800080633ad0
[ 5.961366] x29: ffff800080633ad0 x28: ffff000004391280 x27: 0000000000000000
[ 5.961368] x26: 0000000000000000 x25: 0000000000000000 x24: ffff000004438048
[ 5.961369] x23: 0000000000000000 x22: ffff000002efc000 x21: 0000000000402820
[ 5.961371] x20: ffff000004391280 x19: ffff000004438000 x18: 0000000000000000
[ 5.961372] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 5.961373] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 5.961374] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffbf8cfc0a78f0
[ 5.961375] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 5.961377] x5 : 0000000000001000 x4 : ffff407342580000 x3 : 0000000000000000
[ 5.961378] x2 : 0000000000000000 x1 : 00000000000000c0 x0 : 0000000000000000
[ 5.961379] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 5.961381] CPU: 1 UID: 0 PID: 64 Comm: external_aborts Not tainted 6.16.0-rc6-00163-ga03b7055c54b-dirty #4690 PREEMPT
[ 5.961382] Hardware name: linux,dummy-virt (DT)
[ 5.961383] Call trace:
[ 5.961384] show_stack+0x20/0x38 (C)
[ 5.961388] dump_stack_lvl+0xc8/0xf8
[ 5.961392] dump_stack+0x18/0x28
[ 5.961393] panic+0x380/0x3e8
[ 5.961395] nmi_panic+0x48/0xa0
[ 5.961396] arm64_serror_panic+0x6c/0x88
[ 5.961398] arm64_is_fatal_ras_serror+0x8c/0x98
[ 5.961399] do_serror+0x3c/0x68
[ 5.961401] el1h_64_error_handler+0x38/0x60
[ 5.961404] el1h_64_error+0x80/0x88
[ 5.961405] __kvm_vcpu_run+0x30/0x70 (P)
[ 5.961406] kvm_arm_vcpu_enter_exit+0x64/0x98
[ 5.961408] kvm_arch_vcpu_ioctl_run+0x208/0x620
[ 5.961411] kvm_vcpu_ioctl+0x14c/0x9f8
[ 5.961414] __arm64_sys_ioctl+0x9c/0x100
[ 5.961416] invoke_syscall+0x50/0x120
[ 5.961417] el0_svc_common.constprop.0+0x48/0xf0
[ 5.961418] do_el0_svc+0x24/0x38
[ 5.961419] el0_svc+0x34/0xd8
[ 5.961421] el0t_64_sync_handler+0x10c/0x138
[ 5.961422] el0t_64_sync+0x1ac/0x1b0
[ 5.961425] SMP: stopping secondary CPUs
[ 5.961432] Kernel Offset: 0x3f8c7c000000 from 0xffff800080000000
[ 5.961433] PHYS_OFFSET: 0x80000000
[ 5.961433] CPU features: 0x01000,00000700,094f0d61,556ffea7
[ 5.961436] Memory Limit: none
[ 5.995295] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
where the L1 crashes due to L0 having reinjected the exception in L1.
Another thing is that an E2H==0 L1 never completes this test at all.
Nothing bad happens, but this is another indication of something being
off.
I'll try to investigate both issues if I get the time.
M.
[1] https://lore.kernel.org/r/20250720102229.179114-1-maz@kernel.org
--
Without deviation from the norm, progress is not possible.