From: Marc Zyngier <maz@kernel.org>
To: Mark Brown <broonie@kernel.org>, Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev, Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>
Subject: Re: [PATCH v3 18/27] KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
Date: Sun, 20 Jul 2025 12:45:08 +0100
Message-ID: <87tt37ulvf.wl-maz@kernel.org>
In-Reply-To: <87v7nnup1t.wl-maz@kernel.org>

On Sun, 20 Jul 2025 11:36:30 +0100,
Marc Zyngier <maz@kernel.org> wrote:

[...]

> If I run this very test in a nested guest (patch applied on both L0
> and L1), I get this:
> 
> bash-5.2# /host/home/maz/external_aborts 
> Random seed: 0x6b8b4567
> [    5.936631] PC = 402764
> [    5.942221] PC = 4027f4
> [    5.961351] SError Interrupt on CPU1, code 0x00000000be000000 -- SError
> [    5.961355] CPU: 1 UID: 0 PID: 64 Comm: external_aborts Not tainted 6.16.0-rc6-00163-ga03b7055c54b-dirty #4690 PREEMPT 
> [    5.961357] Hardware name: linux,dummy-virt (DT)
> [    5.961358] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    5.961359] pc : __kvm_vcpu_run+0x30/0x70
> [    5.961364] lr : __kvm_vcpu_run+0x24/0x70
> [    5.961365] sp : ffff800080633ad0
> [    5.961366] x29: ffff800080633ad0 x28: ffff000004391280 x27: 0000000000000000
> [    5.961368] x26: 0000000000000000 x25: 0000000000000000 x24: ffff000004438048
> [    5.961369] x23: 0000000000000000 x22: ffff000002efc000 x21: 0000000000402820
> [    5.961371] x20: ffff000004391280 x19: ffff000004438000 x18: 0000000000000000
> [    5.961372] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [    5.961373] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [    5.961374] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffbf8cfc0a78f0
> [    5.961375] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
> [    5.961377] x5 : 0000000000001000 x4 : ffff407342580000 x3 : 0000000000000000
> [    5.961378] x2 : 0000000000000000 x1 : 00000000000000c0 x0 : 0000000000000000
> [    5.961379] Kernel panic - not syncing: Asynchronous SError Interrupt
> [    5.961381] CPU: 1 UID: 0 PID: 64 Comm: external_aborts Not tainted 6.16.0-rc6-00163-ga03b7055c54b-dirty #4690 PREEMPT 
> [    5.961382] Hardware name: linux,dummy-virt (DT)
> [    5.961383] Call trace:
> [    5.961384]  show_stack+0x20/0x38 (C)
> [    5.961388]  dump_stack_lvl+0xc8/0xf8
> [    5.961392]  dump_stack+0x18/0x28
> [    5.961393]  panic+0x380/0x3e8
> [    5.961395]  nmi_panic+0x48/0xa0
> [    5.961396]  arm64_serror_panic+0x6c/0x88
> [    5.961398]  arm64_is_fatal_ras_serror+0x8c/0x98
> [    5.961399]  do_serror+0x3c/0x68
> [    5.961401]  el1h_64_error_handler+0x38/0x60
> [    5.961404]  el1h_64_error+0x80/0x88
> [    5.961405]  __kvm_vcpu_run+0x30/0x70 (P)
> [    5.961406]  kvm_arm_vcpu_enter_exit+0x64/0x98
> [    5.961408]  kvm_arch_vcpu_ioctl_run+0x208/0x620
> [    5.961411]  kvm_vcpu_ioctl+0x14c/0x9f8
> [    5.961414]  __arm64_sys_ioctl+0x9c/0x100
> [    5.961416]  invoke_syscall+0x50/0x120
> [    5.961417]  el0_svc_common.constprop.0+0x48/0xf0
> [    5.961418]  do_el0_svc+0x24/0x38
> [    5.961419]  el0_svc+0x34/0xd8
> [    5.961421]  el0t_64_sync_handler+0x10c/0x138
> [    5.961422]  el0t_64_sync+0x1ac/0x1b0
> [    5.961425] SMP: stopping secondary CPUs
> [    5.961432] Kernel Offset: 0x3f8c7c000000 from 0xffff800080000000
> [    5.961433] PHYS_OFFSET: 0x80000000
> [    5.961433] CPU features: 0x01000,00000700,094f0d61,556ffea7
> [    5.961436] Memory Limit: none
> [    5.995295] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> 
> where the L1 crashes due to L0 having reinjected the exception in L1.
> 
> Another thing is that a E2H==0 L1 never completes this test at all.
> Nothing bad happens, but this is another indication of something being
> off.

Having dug into this, the bug is rather obvious: we unconditionally
take the L1's HCR_EL2.VSE and merge it into the host's, instead of
only merging it when running the L2. As a result, it is extremely
likely that the L1 guest, rather than the L2, will observe the SError
and take it as if it were a physical one. All it takes is a trap, and
there is no shortage of those.
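
To make the distinction concrete, here is a minimal standalone sketch of
the conditional merge described above. It is not the actual fix and not
KVM code: the helper name effective_hcr() and its parameters are
hypothetical, chosen only for illustration. The one architectural fact it
relies on is that VSE is bit 8 of HCR_EL2.

```c
#include <stdbool.h>
#include <stdint.h>

/* HCR_EL2.VSE (virtual SError pending) is bit 8 of the register. */
#define HCR_VSE (UINT64_C(1) << 8)

/*
 * Hypothetical helper, for illustration only: compute the HCR_EL2 value
 * used to run a vCPU.
 *
 * host_hcr:   the value the host (L0) would program on its own
 * l1_hcr:     the guest hypervisor's (L1) view of HCR_EL2
 * running_l2: true only when the vCPU is executing the nested guest
 */
static uint64_t effective_hcr(uint64_t host_hcr, uint64_t l1_hcr,
			      bool running_l2)
{
	/*
	 * Merging L1's VSE unconditionally would deliver the virtual
	 * SError to L1 itself. Only honor it while L2 is running.
	 */
	if (running_l2)
		host_hcr |= l1_hcr & HCR_VSE;

	return host_hcr;
}
```

With this shape, effective_hcr(0, HCR_VSE, false) leaves VSE clear, so an
L1 that has a vSError pending for its guest does not take it itself, while
effective_hcr(0, HCR_VSE, true) sets it for the L2.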

I've posted the equally obvious fix at [1].

With this and the previous fix, the external_aborts test runs
correctly in both VHE and nVHE guests on my setup (tested on the QC
X1E box).

	M.

[1] https://lore.kernel.org/r/20250720113334.218099-1-maz@kernel.org

-- 
Without deviation from the norm, progress is not possible.
