linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Marc Zyngier <maz@kernel.org>
To: D Scott Phillips <scott@os.amperecomputing.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	Joey Gouly <joey.gouly@arm.com>,
	Oliver Upton <oliver.upton@linux.dev>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Will Deacon <will@kernel.org>, Zenghui Yu <yuzenghui@huawei.com>,
	kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] KVM: arm64: Avoid blocking irqs when tlb flushing/ATing with HCR.TGE=0
Date: Tue, 15 Apr 2025 19:33:19 +0100	[thread overview]
Message-ID: <861pttl1g0.wl-maz@kernel.org> (raw)
In-Reply-To: <20250415154656.1698522-2-scott@os.amperecomputing.com>

On Tue, 15 Apr 2025 16:46:56 +0100,
D Scott Phillips <scott@os.amperecomputing.com> wrote:
> 
> pNMIs are intended to be deliverable during operations like guest
> tlb flushing or nested AT, however the setting of HCR_EL2 controls
> are accidentally blocking async exceptions.
> 
> You can see this by doing:
> 
>      # perf record -e cycles:Hk -g ./dirty_log_perf_test -m 3 \
>        -i 4 -s anonymous -b 4G -v 32
> 
> Where no samples will be collected in __kvm_tlb_flush_vmid_ipa_nsh()
> between enter_vmid_context() and exit_vmid_context() then many
> samples are collected right after the write to HCR_EL2 in
> exit_vmid_context(), where pNMIs actually get unmasked.
> 
> Set HCR_EL2.IMO so that pNMIs are not blocked during guest tlb
> flushing or nested AT.
>
> Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
> ---
>  arch/arm64/include/asm/hardirq.h |  3 ++-
>  arch/arm64/kvm/at.c              |  4 +++-
>  arch/arm64/kvm/hyp/vhe/tlb.c     | 10 ++++++++++
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/hardirq.h b/arch/arm64/include/asm/hardirq.h
> index cbfa7b6f2e098..6eb3f93851023 100644
> --- a/arch/arm64/include/asm/hardirq.h
> +++ b/arch/arm64/include/asm/hardirq.h
> @@ -41,7 +41,8 @@ do {									\
>  									\
>  	___hcr = read_sysreg(hcr_el2);					\
>  	if (!(___hcr & HCR_TGE)) {					\
> -		write_sysreg(___hcr | HCR_TGE, hcr_el2);		\
> +		write_sysreg((___hcr & ~(HCR_AMO | HCR_IMO | HCR_FMO)) |\
> +			     HCR_TGE, hcr_el2);				\
>  		isb();							\
>  	}								\
>  	/*								\
> diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> index ff4b06ce661af..f31f0d78c5813 100644
> --- a/arch/arm64/kvm/at.c
> +++ b/arch/arm64/kvm/at.c
> @@ -1269,7 +1269,9 @@ static u64 __kvm_at_s1e01_fast(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>  
>  skip_mmu_switch:
>  	/* Clear TGE, enable S2 translation, we're rolling */
> -	write_sysreg((config.hcr & ~HCR_TGE) | HCR_VM,	hcr_el2);
> +	write_sysreg((config.hcr & ~HCR_TGE) |
> +		     HCR_AMO | HCR_IMO | HCR_FMO | HCR_VM,
> +		     hcr_el2);
>  	isb();
>  
>  	switch (op) {
> diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c
> index 3d50a1bd2bdbc..ecb700bab3b8f 100644
> --- a/arch/arm64/kvm/hyp/vhe/tlb.c
> +++ b/arch/arm64/kvm/hyp/vhe/tlb.c
> @@ -55,6 +55,15 @@ static void enter_vmid_context(struct kvm_s2_mmu *mmu,
>  	 * bits. Changing E2H is impossible (goodbye TTBR1_EL2), so
>  	 * let's flip TGE before executing the TLB operation.
>  	 *
> +	 * One other fun complication to consider is the target EL for
> +	 * asynchronous exceptions. We want to allow NMIs during tlb flushing,
> +	 * so we need to ensure that the target EL for IRQs remains as EL2.
> +	 * HCR_EL2.{E2H,TGE,IMO} = {1,0,0} would set the target EL for IRQs as
> +	 * EL1, and IRQs at EL2 would be "C" (Interrupts not taken regardless
> +	 * of the value of interrupt masks). So we need to set
> +	 * HCR_EL2.{E2H,TGE,IMO} = {1,0,1} so that NMIs will still be
> +	 * delivered.
> +	 *
>  	 * ARM erratum 1165522 requires some special handling (again),
>  	 * as we need to make sure both stages of translation are in
>  	 * place before clearing TGE. __load_stage2() already
> @@ -63,6 +72,7 @@ static void enter_vmid_context(struct kvm_s2_mmu *mmu,
>  	__load_stage2(mmu, mmu->arch);
>  	val = read_sysreg(hcr_el2);
>  	val &= ~HCR_TGE;
> +	val |= HCR_AMO | HCR_IMO | HCR_FMO;
>  	write_sysreg(val, hcr_el2);
>  	isb();
>  }

This seems terribly complicated for no good reasons. Why can't we run
with HCR_xMO set at all times? i.e. like this:

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 974d72b5905b8..bba4b0e930915 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -100,7 +100,7 @@
 			 HCR_FMO | HCR_IMO | HCR_PTW | HCR_TID3 | HCR_TID1)
 #define HCR_HOST_NVHE_FLAGS (HCR_RW | HCR_API | HCR_APK | HCR_ATA)
 #define HCR_HOST_NVHE_PROTECTED_FLAGS (HCR_HOST_NVHE_FLAGS | HCR_TSC)
-#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
+#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H | HCR_AMO | HCR_IMO | HCR_FMO)
 
 #define HCRX_HOST_FLAGS (HCRX_EL2_MSCEn | HCRX_EL2_TCR2En | HCRX_EL2_EnFPM)
 #define MPAMHCR_HOST_FLAGS	0

There should never be a case where we don't want physical interrupts
to be taken at EL2 when running VHE, and we should never use these
bits to mask interrupts. This has been relaxed in the architecture to
have an IMPDEF behaviour anyway, as it cannot be implemented on NV.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


  reply	other threads:[~2025-04-15 18:36 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-15 15:46 [PATCH 1/2] KVM: arm64: fix config.hcr used uninitialized in __kvm_at_s1e01_fast D Scott Phillips
2025-04-15 15:46 ` [PATCH 2/2] KVM: arm64: Avoid blocking irqs when tlb flushing/ATing with HCR.TGE=0 D Scott Phillips
2025-04-15 18:33   ` Marc Zyngier [this message]
2025-04-16 22:59     ` D Scott Phillips
2025-04-15 18:22 ` [PATCH 1/2] KVM: arm64: fix config.hcr used uninitialized in __kvm_at_s1e01_fast Marc Zyngier
2025-04-16 23:00   ` D Scott Phillips
2025-04-17 16:13     ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=861pttl1g0.wl-maz@kernel.org \
    --to=maz@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=joey.gouly@arm.com \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=scott@os.amperecomputing.com \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).