Date: Sun, 20 Jul 2025 12:45:08 +0100
Message-ID: <87tt37ulvf.wl-maz@kernel.org>
From: Marc Zyngier
To: Mark Brown, Oliver Upton
Cc: kvmarm@lists.linux.dev, Joey Gouly, Suzuki K Poulose, Zenghui Yu
Subject: Re: [PATCH v3 18/27] KVM: arm64: nv: Take "masked" aborts to EL2 when HCRX_EL2.TMEA is set
In-Reply-To: <87v7nnup1t.wl-maz@kernel.org>
References: <20250708172532.1699409-1-oliver.upton@linux.dev> <20250708172532.1699409-19-oliver.upton@linux.dev> <18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk> <87v7nnup1t.wl-maz@kernel.org>

On Sun, 20 Jul 2025 11:36:30 +0100,
Marc Zyngier wrote:
> 
> [...]
> If I run this very test in a nested guest (patch applied on both L0
> and L1), I get this:
>
> bash-5.2# /host/home/maz/external_aborts
> Random seed: 0x6b8b4567
> [    5.936631] PC = 402764
> [    5.942221] PC = 4027f4
> [    5.961351] SError Interrupt on CPU1, code 0x00000000be000000 -- SError
> [    5.961355] CPU: 1 UID: 0 PID: 64 Comm: external_aborts Not tainted 6.16.0-rc6-00163-ga03b7055c54b-dirty #4690 PREEMPT
> [    5.961357] Hardware name: linux,dummy-virt (DT)
> [    5.961358] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    5.961359] pc : __kvm_vcpu_run+0x30/0x70
> [    5.961364] lr : __kvm_vcpu_run+0x24/0x70
> [    5.961365] sp : ffff800080633ad0
> [    5.961366] x29: ffff800080633ad0 x28: ffff000004391280 x27: 0000000000000000
> [    5.961368] x26: 0000000000000000 x25: 0000000000000000 x24: ffff000004438048
> [    5.961369] x23: 0000000000000000 x22: ffff000002efc000 x21: 0000000000402820
> [    5.961371] x20: ffff000004391280 x19: ffff000004438000 x18: 0000000000000000
> [    5.961372] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> [    5.961373] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
> [    5.961374] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffbf8cfc0a78f0
> [    5.961375] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
> [    5.961377] x5 : 0000000000001000 x4 : ffff407342580000 x3 : 0000000000000000
> [    5.961378] x2 : 0000000000000000 x1 : 00000000000000c0 x0 : 0000000000000000
> [    5.961379] Kernel panic - not syncing: Asynchronous SError Interrupt
> [    5.961381] CPU: 1 UID: 0 PID: 64 Comm: external_aborts Not tainted 6.16.0-rc6-00163-ga03b7055c54b-dirty #4690 PREEMPT
> [    5.961382] Hardware name: linux,dummy-virt (DT)
> [    5.961383] Call trace:
> [    5.961384]  show_stack+0x20/0x38 (C)
> [    5.961388]  dump_stack_lvl+0xc8/0xf8
> [    5.961392]  dump_stack+0x18/0x28
> [    5.961393]  panic+0x380/0x3e8
> [    5.961395]  nmi_panic+0x48/0xa0
> [    5.961396]  arm64_serror_panic+0x6c/0x88
> [    5.961398]  arm64_is_fatal_ras_serror+0x8c/0x98
> [    5.961399]  do_serror+0x3c/0x68
> [    5.961401]  el1h_64_error_handler+0x38/0x60
> [    5.961404]  el1h_64_error+0x80/0x88
> [    5.961405]  __kvm_vcpu_run+0x30/0x70 (P)
> [    5.961406]  kvm_arm_vcpu_enter_exit+0x64/0x98
> [    5.961408]  kvm_arch_vcpu_ioctl_run+0x208/0x620
> [    5.961411]  kvm_vcpu_ioctl+0x14c/0x9f8
> [    5.961414]  __arm64_sys_ioctl+0x9c/0x100
> [    5.961416]  invoke_syscall+0x50/0x120
> [    5.961417]  el0_svc_common.constprop.0+0x48/0xf0
> [    5.961418]  do_el0_svc+0x24/0x38
> [    5.961419]  el0_svc+0x34/0xd8
> [    5.961421]  el0t_64_sync_handler+0x10c/0x138
> [    5.961422]  el0t_64_sync+0x1ac/0x1b0
> [    5.961425] SMP: stopping secondary CPUs
> [    5.961432] Kernel Offset: 0x3f8c7c000000 from 0xffff800080000000
> [    5.961433] PHYS_OFFSET: 0x80000000
> [    5.961433] CPU features: 0x01000,00000700,094f0d61,556ffea7
> [    5.961436] Memory Limit: none
> [    5.995295] ---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
>
> where the L1 crashes due to L0 having reinjected the exception in L1.
>
> Another thing is that an E2H==0 L1 never completes this test at all.
> Nothing bad happens, but this is another indication of something being
> off.

Having dug into this, the bug is rather obvious: we unconditionally
take the L1's HCR_EL2.VSE and merge it into the host's, instead of
only merging it when running the L2.

As a result, it is extremely likely that the L1 guest will observe the
SError instead of the L2, taking it as if it was a physical one. All
it takes is a trap, and there is no shortage of those.

I've posted the equally obvious fix at [1]. With this and the previous
fix, the external_aborts test runs correctly in both VHE and nVHE
guests on my setup (tested on the QC X1E box).

	M.

[1] https://lore.kernel.org/r/20250720113334.218099-1-maz@kernel.org

-- 
Without deviation from the norm, progress is not possible.