From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 24CDFC369AB for ; Tue, 15 Apr 2025 18:36:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Type:MIME-Version: References:In-Reply-To:Subject:Cc:To:From:Message-ID:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=j5gQNmHjMfbv0bE1Nn1nWnLpz9P0GOXKqyGQsGpwpYs=; b=lfw16H4xtXaGHhsxtQBBxVDGRy 2TcCr38DGfUH+DDkLbypAJD0PIIgL5lXmNBNglJ5R8IJwmCcEbJSjsawRY0ro4yiEvbROS829l6oF HGxLxkJIgaXd5On+njOkArdEDcWfBwsvPDDYqjOpQG5yPNDuVxO+y2FR5rcpdauNFXLKQhENW0vDb J62IF+n3/sEDevrBFu3lGODzOsnlmgIhjcM+FBsUPmVputRbCe3nHnr/iAN4HkbA/Fy/CbV/6PSx9 L5HbgAyYJOQg11gUuUXkeaNKxM83tggAXuX290pqbE7OqBMXkLaKBMwoFXHypMCanWHBUDKe/bwiD GNz28Pjg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98.2 #2 (Red Hat Linux)) id 1u4l9I-00000006joP-2sKg; Tue, 15 Apr 2025 18:36:44 +0000 Received: from dfw.source.kernel.org ([139.178.84.217]) by bombadil.infradead.org with esmtps (Exim 4.98.2 #2 (Red Hat Linux)) id 1u4l64-00000006jRI-0SfO for linux-arm-kernel@lists.infradead.org; Tue, 15 Apr 2025 18:33:25 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 7146D5C424B; Tue, 15 Apr 2025 18:31:06 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1C453C4CEE9; Tue, 15 Apr 2025 18:33:23 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1744742003; bh=xElVJVESsbAJLuZAcsQnEEywNdvO8tPQ/vxvtd6rd7k=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=WtGnlnAvI3tuTHiCKrLLalMgm3W3aNAjF0ckRuz97uxYAxOlRwNgZM3lal+61e1Ov qYumvuC/C2pEtg/QTfFmorwtyWsouoOtw0G46f8BzlLSBT9HGODIa3kbpKdzZXJmAt NaHBUpMzbRDfdNf/Hvr5mvgJoR8yDpD+zzBJpMK9q7TTLmuP5rU0L1MAR5nSX4Y+J7 cTabiXFNcLC4MqN5TdW7VKlE32tS1a5ALs7nWp+C+AVcqNXmVPxEnVdLMJV5Wz6aRA 4fjjVyqy05ORKU1cjm+L3dgsFHVX0I/2uw2/kYIsO46/H6fEvWhKdQnXQ+PJvEMZEQ 8mHdiFddBGOLg== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1u4l60-005mPp-TI; Tue, 15 Apr 2025 19:33:20 +0100 Date: Tue, 15 Apr 2025 19:33:19 +0100 Message-ID: <861pttl1g0.wl-maz@kernel.org> From: Marc Zyngier To: D Scott Phillips Cc: Catalin Marinas , Joey Gouly , Oliver Upton , Suzuki K Poulose , Will Deacon , Zenghui Yu , kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/2] KVM: arm64: Avoid blocking irqs when tlb flushing/ATing with HCR.TGE=0 In-Reply-To: <20250415154656.1698522-2-scott@os.amperecomputing.com> References: <20250415154656.1698522-1-scott@os.amperecomputing.com> <20250415154656.1698522-2-scott@os.amperecomputing.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.4 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: scott@os.amperecomputing.com, catalin.marinas@arm.com, joey.gouly@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, will@kernel.org, yuzenghui@huawei.com, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250415_113324_236901_ECB81FB4 X-CRM114-Status: GOOD ( 32.10 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Tue, 15 Apr 2025 16:46:56 +0100, D Scott Phillips wrote: > > pNMIs are intended to be deliverable during operations like guest > tlb flushing or nested AT, however the setting of HCR_EL2 controls > are accidentally blocking async exceptions. > > You can see this by doing: > > # perf record -e cycles:Hk -g ./dirty_log_perf_test -m 3 \ > -i 4 -s anonymous -b 4G -v 32 > > Where no samples will be collected in __kvm_tlb_flush_vmid_ipa_nsh() > between enter_vmid_context() and exit_vmid_context() then many > samples are collected right after the write to HCR_EL2 in > exit_vmid_context(), where pNMIs actually get unmasked. > > Set HCR_EL2.IMO so that pNMIs are not blocked during guest tlb > flushing or nested AT. > > Signed-off-by: D Scott Phillips > --- > arch/arm64/include/asm/hardirq.h | 3 ++- > arch/arm64/kvm/at.c | 4 +++- > arch/arm64/kvm/hyp/vhe/tlb.c | 10 ++++++++++ > 3 files changed, 15 insertions(+), 2 deletions(-) > > diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h > index cbfa7b6f2e098..6eb3f93851023 100644 > --- a/arch/arm64/include/asm/hardirq.h > +++ b/arch/arm64/include/asm/hardirq.h > @@ -41,7 +41,8 @@ do { \ > \ > ___hcr = read_sysreg(hcr_el2); \ > if (!(___hcr & HCR_TGE)) { \ > - write_sysreg(___hcr | HCR_TGE, hcr_el2); \ > + write_sysreg((___hcr & ~(HCR_AMO | HCR_IMO | HCR_FMO)) |\ > + HCR_TGE, hcr_el2); \ > isb(); \ > } \ > /* \ > diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c > index ff4b06ce661af..f31f0d78c5813 100644 > --- a/arch/arm64/kvm/at.c > +++ b/arch/arm64/kvm/at.c > @@ -1269,7 +1269,9 @@ static u64 __kvm_at_s1e01_fast(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) > > skip_mmu_switch: > /* Clear TGE, enable S2 translation, we're rolling */ > - write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM, hcr_el2); > + write_sysreg((config.hcr & ~HCR_TGE) | > + HCR_AMO | HCR_IMO | HCR_FMO | HCR_VM, > + hcr_el2); > isb(); > > switch (op) { > diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c > index 3d50a1bd2bdbc..ecb700bab3b8f 100644 > --- a/arch/arm64/kvm/hyp/vhe/tlb.c > +++ b/arch/arm64/kvm/hyp/vhe/tlb.c > @@ -55,6 +55,15 @@ static void enter_vmid_context(struct kvm_s2_mmu *mmu, > * bits. Changing E2H is impossible (goodbye TTBR1_EL2), so > * let's flip TGE before executing the TLB operation. > * > + * One other fun complication to consider is the target EL for > + * asynchronous exceptions. We want to allow NMIs during tlb flushing, > + * so we need to ensure that the target EL for IRQs remains as EL2. > + * HCR_EL2.{E2H,TGE,IMO} = {1,0,0} would set the target EL for IRQs as > + * EL1, and IRQs at EL2 would be "C" (Interrupts not taken regardless > + * of the value of interrupt masks). So we need to set > + * HCR_EL2.{E2H,TGE,IMO} = {1,0,1} so that NMIs will still be > + * delivered. > + * > * ARM erratum 1165522 requires some special handling (again), > * as we need to make sure both stages of translation are in > * place before clearing TGE. __load_stage2() already > @@ -63,6 +72,7 @@ static void enter_vmid_context(struct kvm_s2_mmu *mmu, > __load_stage2(mmu, mmu->arch); > val = read_sysreg(hcr_el2); > val &= ~HCR_TGE; > + val |= HCR_AMO | HCR_IMO | HCR_FMO; > write_sysreg(val, hcr_el2); > isb(); > } This seems terribly complicated for no good reasons. Why can't we run with HCR_xMO set at all times? i.e. like this: diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 974d72b5905b8..bba4b0e930915 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -100,7 +100,7 @@ HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 | HCR_TID1) #define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA) #define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC) -#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H) +#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H | HCR_AMO | HCR_IMO | HCR_FMO) #define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En | HCRX_EL2_EnFPM) #define MPAMHCR_HOST_FLAGS 0 There should never be a case where we don't want physical interrupts to be taken at EL2 when running VHE, and we should never use these bits to mask interrupts. This has been relaxed in the architecture to have an IMPDEF behaviour anyway, as it cannot be implemented on NV. Thanks, M. -- Without deviation from the norm, progress is not possible.