linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/2] KVM: arm: nv: fix AT S* behaviour
@ 2025-08-06 14:17 Volodymyr Babchuk
  2025-08-06 14:17 ` [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*' Volodymyr Babchuk
  2025-08-06 14:17 ` [PATCH v1 1/2] KVM: arm64: nv: fix S2 translation for nVHE guests Volodymyr Babchuk
  0 siblings, 2 replies; 10+ messages in thread
From: Volodymyr Babchuk @ 2025-08-06 14:17 UTC (permalink / raw)
  To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
  Cc: Volodymyr Babchuk, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon

These two small patches fix bug in AT S12*/AT S1* emulation, which
returned IPA (aka result of S1 translation) regardles of what nested
hypervisor asked.

This was a tricky issue, as it worked for VHE nested hypervisors by
pure chance. These hypervisors require S1 translation only, and they
got it becase KVM tried fast path first, which left correct values in
PAR_EL1 almost by accident.

Also, this worked fine for Xen with Dom0 only, because Dom0 is
identity mapped (IPA=PA). So again, returning result of S1 translation
when emulating AT S12 worked for Dom0.

I stumbled on this issue only when I tried to launch DomU under
Xen. In that case IPA!=PA and we need real S2 translation.

The first patch ensures that nVHE hypevisor will got its S2
translation, the second patch ensures that the hypervisor will really
see result of this translation.

Volodymyr Babchuk (2):
  KVM: arm64: nv: fix S2 translation for nVHE guests
  KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*'

 arch/arm64/kvm/at.c       | 4 ++--
 arch/arm64/kvm/sys_regs.c | 6 ++++++
 2 files changed, 8 insertions(+), 2 deletions(-)

-- 
2.50.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v1 1/2] KVM: arm64: nv: fix S2 translation for nVHE guests
  2025-08-06 14:17 [PATCH v1 0/2] KVM: arm: nv: fix AT S* behaviour Volodymyr Babchuk
  2025-08-06 14:17 ` [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*' Volodymyr Babchuk
@ 2025-08-06 14:17 ` Volodymyr Babchuk
  2025-08-06 17:37   ` Oliver Upton
  2025-08-06 17:45   ` Marc Zyngier
  1 sibling, 2 replies; 10+ messages in thread
From: Volodymyr Babchuk @ 2025-08-06 14:17 UTC (permalink / raw)
  To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
  Cc: Volodymyr Babchuk, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon

According to ARM architecture specification (ARM DDI 0487 L.a, section
C5.4.3), Stage 2 translation should be skipped when VHE is active, or,
in other words, E2H bit is set. Fix the code by inverting both check
and comment.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 arch/arm64/kvm/at.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
index a25be111cd8f8..5e7c3fb01273c 100644
--- a/arch/arm64/kvm/at.c
+++ b/arch/arm64/kvm/at.c
@@ -1412,10 +1412,10 @@ void __kvm_at_s12(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
 		return;
 
 	/*
-	 * If we only have a single stage of translation (E2H=0 or
+	 * If we only have a single stage of translation (E2H=1 or
 	 * TGE=1), exit early. Same thing if {VM,DC}=={0,0}.
 	 */
-	if (!vcpu_el2_e2h_is_set(vcpu) || vcpu_el2_tge_is_set(vcpu) ||
+	if (vcpu_el2_e2h_is_set(vcpu) || vcpu_el2_tge_is_set(vcpu) ||
 	    !(vcpu_read_sys_reg(vcpu, HCR_EL2) & (HCR_VM | HCR_DC)))
 		return;
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*'
  2025-08-06 14:17 [PATCH v1 0/2] KVM: arm: nv: fix AT S* behaviour Volodymyr Babchuk
@ 2025-08-06 14:17 ` Volodymyr Babchuk
  2025-08-06 18:40   ` Oliver Upton
  2025-08-06 18:56   ` Marc Zyngier
  2025-08-06 14:17 ` [PATCH v1 1/2] KVM: arm64: nv: fix S2 translation for nVHE guests Volodymyr Babchuk
  1 sibling, 2 replies; 10+ messages in thread
From: Volodymyr Babchuk @ 2025-08-06 14:17 UTC (permalink / raw)
  To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org
  Cc: Volodymyr Babchuk, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon

Previously this code update only vCPU's in-memory value, which is good,
but not enough, as there will be no context switch after exiting
exception handler, so in-memory value will not get into actual
register.

It worked good enough for VHE guests because KVM code tried fast path,
which of course updated real PAR_EL1.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 arch/arm64/kvm/sys_regs.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 7b8a0a6f26468..ab2b5e261d312 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -3463,6 +3463,9 @@ static bool handle_at_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 
 	__kvm_at_s1e2(vcpu, op, p->regval);
 
+	/* No context switch happened, so we need to update PAR_EL1 manually */
+	write_sysreg(vcpu_read_sys_reg(vcpu, PAR_EL1), par_el1);
+
 	return true;
 }
 
@@ -3473,6 +3476,9 @@ static bool handle_at_s12(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 
 	__kvm_at_s12(vcpu, op, p->regval);
 
+	/* No context switch happened, so we need to update PAR_EL1 manually */
+	write_sysreg(vcpu_read_sys_reg(vcpu, PAR_EL1), par_el1);
+
 	return true;
 }
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 1/2] KVM: arm64: nv: fix S2 translation for nVHE guests
  2025-08-06 14:17 ` [PATCH v1 1/2] KVM: arm64: nv: fix S2 translation for nVHE guests Volodymyr Babchuk
@ 2025-08-06 17:37   ` Oliver Upton
  2025-08-06 18:17     ` Marc Zyngier
  2025-08-06 17:45   ` Marc Zyngier
  1 sibling, 1 reply; 10+ messages in thread
From: Oliver Upton @ 2025-08-06 17:37 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, Marc Zyngier, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon

Hi Volodymyr,

Thanks for catching this.

On Wed, Aug 06, 2025 at 02:17:55PM +0000, Volodymyr Babchuk wrote:
> According to ARM architecture specification (ARM DDI 0487 L.a, section
> C5.4.3), Stage 2 translation should be skipped when VHE is active, or,
> in other words, E2H bit is set. Fix the code by inverting both check
> and comment.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>  arch/arm64/kvm/at.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> index a25be111cd8f8..5e7c3fb01273c 100644
> --- a/arch/arm64/kvm/at.c
> +++ b/arch/arm64/kvm/at.c
> @@ -1412,10 +1412,10 @@ void __kvm_at_s12(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>  		return;
>  
>  	/*
> -	 * If we only have a single stage of translation (E2H=0 or
> +	 * If we only have a single stage of translation (E2H=1 or
>  	 * TGE=1), exit early. Same thing if {VM,DC}=={0,0}.
>  	 */
> -	if (!vcpu_el2_e2h_is_set(vcpu) || vcpu_el2_tge_is_set(vcpu) ||
> +	if (vcpu_el2_e2h_is_set(vcpu) || vcpu_el2_tge_is_set(vcpu) ||

The check should be HCR_EL2.<E2H,TGE> == '11'. Maybe instead:

	/*
	 * Exit early if we only have a single stage of translation
	 * either because we're in the EL2&0 translation regime or
	 * stage-2 translation is disabled (i.e. HCR_EL2.{VM,DC}=={0,0}).
	 */
	 if (compute_translation_regime(vcpu, op) == TR_EL20 ||
	     !(vcpu_read_sys_reg(vcpu, HCR_EL2) & (HCR_VM | HCR_DC)))
	     return;

Thanks,
Oliver


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 1/2] KVM: arm64: nv: fix S2 translation for nVHE guests
  2025-08-06 14:17 ` [PATCH v1 1/2] KVM: arm64: nv: fix S2 translation for nVHE guests Volodymyr Babchuk
  2025-08-06 17:37   ` Oliver Upton
@ 2025-08-06 17:45   ` Marc Zyngier
  1 sibling, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2025-08-06 17:45 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon

Hi Volodymyr,

Thanks for looking into this.

On Wed, 06 Aug 2025 15:17:55 +0100,
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:
> 
> According to ARM architecture specification (ARM DDI 0487 L.a, section
> C5.4.3), Stage 2 translation should be skipped when VHE is active, or,
> in other words, E2H bit is set. Fix the code by inverting both check
> and comment.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>  arch/arm64/kvm/at.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> index a25be111cd8f8..5e7c3fb01273c 100644
> --- a/arch/arm64/kvm/at.c
> +++ b/arch/arm64/kvm/at.c
> @@ -1412,10 +1412,10 @@ void __kvm_at_s12(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
>  		return;
>  
>  	/*
> -	 * If we only have a single stage of translation (E2H=0 or
> +	 * If we only have a single stage of translation (E2H=1 or
>  	 * TGE=1), exit early. Same thing if {VM,DC}=={0,0}.
>  	 */
> -	if (!vcpu_el2_e2h_is_set(vcpu) || vcpu_el2_tge_is_set(vcpu) ||
> +	if (vcpu_el2_e2h_is_set(vcpu) || vcpu_el2_tge_is_set(vcpu) ||
>  	    !(vcpu_read_sys_reg(vcpu, HCR_EL2) & (HCR_VM | HCR_DC)))
>  		return;

The code we have here is clearly bogus, but what you are suggesting
doesn't look correct to me either. Here's what the spec says:

<quote>

Performs stage 1 and 2 address translation, with permissions as if
reading from the given virtual address from EL1, or from EL2 if the
Effective value of HCR_EL2.{E2H, TGE} is {1, 1}, using the following
translation regime:

  * When EL2 is implemented and enabled in the Security state
    described by the current Effective value of SCR_EL3.{NSE, NS}:

    - If the Effective value of HCR_EL2.{E2H, TGE} is not {1, 1}, the
      EL1&0 translation regime, accessed from EL1.

    - If the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}, the
      EL2&0 translation regime, accessed from EL2.

  * Otherwise, the EL1&0 translation regime, accessed from EL1.

</quote>

We're obviously in the first bullet, so we need to work out whether
this is applying to the EL1&0 or the EL2&0 translation regime. By the
letter of the spec, we need to check for E2H+TGE being both set to
elide the S2 final walk.

I suspect what you really want is the hack below, though I think the
DC handling has always been broken (who cares anyway?).

Thanks,

	M.

diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
index 0e56105339493..3e591979f947e 100644
--- a/arch/arm64/kvm/at.c
+++ b/arch/arm64/kvm/at.c
@@ -1420,10 +1420,10 @@ void __kvm_at_s12(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
 		return;
 
 	/*
-	 * If we only have a single stage of translation (E2H=0 or
-	 * TGE=1), exit early. Same thing if {VM,DC}=={0,0}.
+	 * If we only have a single stage of translation ({E2H,TGE}={1,1}),
+	 * exit early. Same thing if {VM,DC}=={0,0}.
 	 */
-	if (!vcpu_el2_e2h_is_set(vcpu) || vcpu_el2_tge_is_set(vcpu) ||
+	if ((vcpu_el2_e2h_is_set(vcpu) && vcpu_el2_tge_is_set(vcpu)) ||
 	    !(vcpu_read_sys_reg(vcpu, HCR_EL2) & (HCR_VM | HCR_DC)))
 		return;
 

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 1/2] KVM: arm64: nv: fix S2 translation for nVHE guests
  2025-08-06 17:37   ` Oliver Upton
@ 2025-08-06 18:17     ` Marc Zyngier
  0 siblings, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2025-08-06 18:17 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Volodymyr Babchuk, linux-arm-kernel@lists.infradead.org,
	kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon

On Wed, 06 Aug 2025 18:37:34 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hi Volodymyr,
> 
> Thanks for catching this.
> 
> On Wed, Aug 06, 2025 at 02:17:55PM +0000, Volodymyr Babchuk wrote:
> > According to ARM architecture specification (ARM DDI 0487 L.a, section
> > C5.4.3), Stage 2 translation should be skipped when VHE is active, or,
> > in other words, E2H bit is set. Fix the code by inverting both check
> > and comment.
> > 
> > Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> > ---
> >  arch/arm64/kvm/at.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> > index a25be111cd8f8..5e7c3fb01273c 100644
> > --- a/arch/arm64/kvm/at.c
> > +++ b/arch/arm64/kvm/at.c
> > @@ -1412,10 +1412,10 @@ void __kvm_at_s12(struct kvm_vcpu *vcpu, u32 op, u64 vaddr)
> >  		return;
> >  
> >  	/*
> > -	 * If we only have a single stage of translation (E2H=0 or
> > +	 * If we only have a single stage of translation (E2H=1 or
> >  	 * TGE=1), exit early. Same thing if {VM,DC}=={0,0}.
> >  	 */
> > -	if (!vcpu_el2_e2h_is_set(vcpu) || vcpu_el2_tge_is_set(vcpu) ||
> > +	if (vcpu_el2_e2h_is_set(vcpu) || vcpu_el2_tge_is_set(vcpu) ||
> 
> The check should be HCR_EL2.<E2H,TGE> == '11'. Maybe instead:
> 
> 	/*
> 	 * Exit early if we only have a single stage of translation
> 	 * either because we're in the EL2&0 translation regime or
> 	 * stage-2 translation is disabled (i.e. HCR_EL2.{VM,DC}=={0,0}).
> 	 */
> 	 if (compute_translation_regime(vcpu, op) == TR_EL20 ||
> 	     !(vcpu_read_sys_reg(vcpu, HCR_EL2) & (HCR_VM | HCR_DC)))
> 	     return;

Ah, same solution, just way better written!

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*'
  2025-08-06 14:17 ` [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*' Volodymyr Babchuk
@ 2025-08-06 18:40   ` Oliver Upton
  2025-08-06 20:30     ` Volodymyr Babchuk
  2025-08-06 18:56   ` Marc Zyngier
  1 sibling, 1 reply; 10+ messages in thread
From: Oliver Upton @ 2025-08-06 18:40 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, Marc Zyngier, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon

On Wed, Aug 06, 2025 at 02:17:55PM +0000, Volodymyr Babchuk wrote:
> Previously this code update only vCPU's in-memory value, which is good,
> but not enough, as there will be no context switch after exiting
> exception handler, so in-memory value will not get into actual
> register.
> 
> It worked good enough for VHE guests because KVM code tried fast path,
> which of course updated real PAR_EL1.
> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7b8a0a6f26468..ab2b5e261d312 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -3463,6 +3463,9 @@ static bool handle_at_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  
>  	__kvm_at_s1e2(vcpu, op, p->regval);
>  
> +	/* No context switch happened, so we need to update PAR_EL1 manually */
> +	write_sysreg(vcpu_read_sys_reg(vcpu, PAR_EL1), par_el1);
> +

Ok, this had me thoroughly confused for a moment. The bug is actually in
kvm_write_sys_reg() which is supposed to update the sysreg value when
things are loaded on the CPU. __kvm_at_s1e2() is doing the right thing
by calling this accessor.

For registers like PAR_EL1 that don't have an EL2->EL1 mapping we assume
they belong to the EL1 context and thus are in-memory when in a hyp
context. TPIDR(RO)_EL0 is similarly affected.

This is a bit of an ugly hack, but something like the following should
get things working if you're able to test it:

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ad2548477257..32f8d1de8f1a 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -149,6 +149,22 @@ static bool get_el2_to_el1_mapping(unsigned int reg,
 	}
 }
 
+/*
+ * Special-cased registers that do not have an ELx mapping and are always
+ * loaded on the CPU.
+ */
+static bool reg_has_elx_mapping(int reg)
+{
+	switch (reg) {
+	case TPIDR_EL0:
+	case TPIDRRO_EL0:
+	case PAR_EL1:
+		return false;
+	default:
+		return true;
+	}
+}
+
 u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 {
 	u64 val = 0x8badf00d8badf00d;
@@ -158,6 +174,9 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 	if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
 		goto memory_read;
 
+	if (!reg_has_elx_mapping(reg))
+		goto sysreg_read;
+
 	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
 		if (!is_hyp_ctxt(vcpu))
 			goto memory_read;
@@ -204,6 +223,7 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 	if (unlikely(is_hyp_ctxt(vcpu)))
 		goto memory_read;
 
+sysreg_read:
 	if (__vcpu_read_sys_reg_from_cpu(reg, &val))
 		return val;
 
@@ -219,6 +239,9 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 	if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
 		goto memory_write;
 
+	if (!reg_has_elx_mapping(reg))
+		goto sysreg_write;
+
 	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
 		if (!is_hyp_ctxt(vcpu))
 			goto memory_write;
@@ -259,6 +282,7 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 	if (unlikely(is_hyp_ctxt(vcpu)))
 		goto memory_write;
 
+sysreg_write:
 	if (__vcpu_write_sys_reg_to_cpu(val, reg))
 		return;
 
-- 
Thanks,
Oliver


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*'
  2025-08-06 14:17 ` [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*' Volodymyr Babchuk
  2025-08-06 18:40   ` Oliver Upton
@ 2025-08-06 18:56   ` Marc Zyngier
  2025-08-06 20:00     ` Volodymyr Babchuk
  1 sibling, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2025-08-06 18:56 UTC (permalink / raw)
  To: Volodymyr Babchuk
  Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon

On Wed, 06 Aug 2025 15:17:55 +0100,
Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:
> 
> Previously this code update only vCPU's in-memory value, which is good,
> but not enough, as there will be no context switch after exiting
> exception handler, so in-memory value will not get into actual
> register.
> 
> It worked good enough for VHE guests because KVM code tried fast path,
> which of course updated real PAR_EL1.

Nothing to do with VHE, I'm afraid. We can take the slow path for any
odd reason, even on VHE. This is more of a structural problem, see
below.

> 
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>  arch/arm64/kvm/sys_regs.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 7b8a0a6f26468..ab2b5e261d312 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -3463,6 +3463,9 @@ static bool handle_at_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>  
>  	__kvm_at_s1e2(vcpu, op, p->regval);
>  
> +	/* No context switch happened, so we need to update PAR_EL1 manually */
> +	write_sysreg(vcpu_read_sys_reg(vcpu, PAR_EL1), par_el1);
> +

This looks like the wrong fix, as it papers over another issue.

The core problem is vcpu_write_sys_reg() (resp. read) does the wrong
thing with registers such as PAR_EL1, which are not translated between
EL1 and EL2, and therefore are always live, no matter what.

Can you please try the hack below? I don't like it, but at least it
shows where the actual problem lies.

Thanks,

	M.

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index ad25484772574..167f0d411a708 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -95,7 +95,13 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
 		return true;						\
 	}
 
-static bool get_el2_to_el1_mapping(unsigned int reg,
+#define COMMON_SYSREG(r)						\
+	case r: {							\
+		 *el1r = __INVALID_SYSREG__;				\
+		 return is_hyp_ctxt(vcpu);					\
+	}
+
+static bool get_el2_to_el1_mapping(const struct kvm_vcpu *vcpu, unsigned int reg,
 				   unsigned int *el1r, u64 (**xlate)(u64))
 {
 	switch (reg) {
@@ -119,6 +125,7 @@ static bool get_el2_to_el1_mapping(unsigned int reg,
 		PURE_EL2_SYSREG(  HAFGRTR_EL2	);
 		PURE_EL2_SYSREG(  CNTVOFF_EL2	);
 		PURE_EL2_SYSREG(  CNTHCTL_EL2	);
+		COMMON_SYSREG(	  PAR_EL1	);
 		MAPPED_EL2_SYSREG(SCTLR_EL2,   SCTLR_EL1,
 				  translate_sctlr_el2_to_sctlr_el1	     );
 		MAPPED_EL2_SYSREG(CPTR_EL2,    CPACR_EL1,
@@ -158,7 +165,7 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
 	if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
 		goto memory_read;
 
-	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+	if (unlikely(get_el2_to_el1_mapping(vcpu, reg, &el1r, &xlate))) {
 		if (!is_hyp_ctxt(vcpu))
 			goto memory_read;
 
@@ -219,7 +226,7 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
 	if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
 		goto memory_write;
 
-	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
+	if (unlikely(get_el2_to_el1_mapping(vcpu, reg, &el1r, &xlate))) {
 		if (!is_hyp_ctxt(vcpu))
 			goto memory_write;
 

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*'
  2025-08-06 18:56   ` Marc Zyngier
@ 2025-08-06 20:00     ` Volodymyr Babchuk
  0 siblings, 0 replies; 10+ messages in thread
From: Volodymyr Babchuk @ 2025-08-06 20:00 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon


Hi Marc,

Marc Zyngier <maz@kernel.org> writes:

> On Wed, 06 Aug 2025 15:17:55 +0100,
> Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> wrote:
>> 
>> Previously this code update only vCPU's in-memory value, which is good,
>> but not enough, as there will be no context switch after exiting
>> exception handler, so in-memory value will not get into actual
>> register.
>> 
>> It worked good enough for VHE guests because KVM code tried fast path,
>> which of course updated real PAR_EL1.
>
> Nothing to do with VHE, I'm afraid. We can take the slow path for any
> odd reason, even on VHE.

Yeah, I just didn't conveyed my idea well. As VHE guest does not require
S2 translation, calling real "at s1*" was enough to get correct value in
PAR_EL1 in that case. That would hide that issue if using VHE
guests. Also, I believe that most hypervisors will try to do own
software pagewalk if "at s1" fails...

Anyways, I'll rewrite commit message to make this more clear.

> This is more of a structural problem, see
> below.
>
>> 
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> ---
>>  arch/arm64/kvm/sys_regs.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 7b8a0a6f26468..ab2b5e261d312 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -3463,6 +3463,9 @@ static bool handle_at_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>>  
>>  	__kvm_at_s1e2(vcpu, op, p->regval);
>>  
>> +	/* No context switch happened, so we need to update PAR_EL1 manually */
>> +	write_sysreg(vcpu_read_sys_reg(vcpu, PAR_EL1), par_el1);
>> +
>
> This looks like the wrong fix, as it papers over another issue.
>
> The core problem is vcpu_write_sys_reg() (resp. read) does the wrong
> thing with registers such as PAR_EL1, which are not translated between
> EL1 and EL2, and therefore are always live, no matter what.
>
> Can you please try the hack below? I don't like it, but at least it
> shows where the actual problem lies.

It does no work, see below.

Also, what do you think about Oliver's approach? I'll try it next.

>
> Thanks,
>
> 	M.
>
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index ad25484772574..167f0d411a708 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -95,7 +95,13 @@ static bool write_to_read_only(struct kvm_vcpu *vcpu,
>  		return true;						\
>  	}
>  
> -static bool get_el2_to_el1_mapping(unsigned int reg,
> +#define COMMON_SYSREG(r)						\
> +	case r: {							\
> +		 *el1r = __INVALID_SYSREG__;
> \
This didn't worked right away, as code in vcpu_read_sys_reg()
will do just __vcpu_read_sys_reg_from_cpu(el1r, &val);

(the same true for write counterpart of course)

I tried to put  *el1r = r, here, but this didn't worked also, because of
this check:

		/*
		 * If this register does not have an EL1 counterpart,
		 * then read the stored EL2 version.
		 */
		if (reg == el1r)
			goto memory_read;

So, either we need to add one more check, like

if (el1r == INVALID_REG)
    __vcpu_read_sys_reg_from_cpu(reg, &val);

Or use Oliver's approach.


> +		 return is_hyp_ctxt(vcpu);					\
> +	}
> +
> +static bool get_el2_to_el1_mapping(const struct kvm_vcpu *vcpu, unsigned int reg,
>  				   unsigned int *el1r, u64 (**xlate)(u64))
>  {
>  	switch (reg) {
> @@ -119,6 +125,7 @@ static bool get_el2_to_el1_mapping(unsigned int reg,
>  		PURE_EL2_SYSREG(  HAFGRTR_EL2	);
>  		PURE_EL2_SYSREG(  CNTVOFF_EL2	);
>  		PURE_EL2_SYSREG(  CNTHCTL_EL2	);
> +		COMMON_SYSREG(	  PAR_EL1	);
>  		MAPPED_EL2_SYSREG(SCTLR_EL2,   SCTLR_EL1,
>  				  translate_sctlr_el2_to_sctlr_el1	     );
>  		MAPPED_EL2_SYSREG(CPTR_EL2,    CPACR_EL1,
> @@ -158,7 +165,7 @@ u64 vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
>  	if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
>  		goto memory_read;
>  
> -	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
> +	if (unlikely(get_el2_to_el1_mapping(vcpu, reg, &el1r, &xlate))) {
>  		if (!is_hyp_ctxt(vcpu))
>  			goto memory_read;
>  
> @@ -219,7 +226,7 @@ void vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
>  	if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU))
>  		goto memory_write;
>  
> -	if (unlikely(get_el2_to_el1_mapping(reg, &el1r, &xlate))) {
> +	if (unlikely(get_el2_to_el1_mapping(vcpu, reg, &el1r, &xlate))) {
>  		if (!is_hyp_ctxt(vcpu))
>  			goto memory_write;

-- 
WBR, Volodymyr

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*'
  2025-08-06 18:40   ` Oliver Upton
@ 2025-08-06 20:30     ` Volodymyr Babchuk
  0 siblings, 0 replies; 10+ messages in thread
From: Volodymyr Babchuk @ 2025-08-06 20:30 UTC (permalink / raw)
  To: Oliver Upton
  Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-kernel@vger.kernel.org, Marc Zyngier, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon


Hi Oliver,

Oliver Upton <oliver.upton@linux.dev> writes:

> On Wed, Aug 06, 2025 at 02:17:55PM +0000, Volodymyr Babchuk wrote:
>> Previously this code update only vCPU's in-memory value, which is good,
>> but not enough, as there will be no context switch after exiting
>> exception handler, so in-memory value will not get into actual
>> register.
>> 
>> It worked good enough for VHE guests because KVM code tried fast path,
>> which of course updated real PAR_EL1.
>> 
>> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
>> ---
>>  arch/arm64/kvm/sys_regs.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
>> index 7b8a0a6f26468..ab2b5e261d312 100644
>> --- a/arch/arm64/kvm/sys_regs.c
>> +++ b/arch/arm64/kvm/sys_regs.c
>> @@ -3463,6 +3463,9 @@ static bool handle_at_s1e2(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
>>  
>>  	__kvm_at_s1e2(vcpu, op, p->regval);
>>  
>> +	/* No context switch happened, so we need to update PAR_EL1 manually */
>> +	write_sysreg(vcpu_read_sys_reg(vcpu, PAR_EL1), par_el1);
>> +
>
> Ok, this had me thoroughly confused for a moment. The bug is actually in
> kvm_write_sys_reg() which is supposed to update the sysreg value when
> things are loaded on the CPU. __kvm_at_s1e2() is doing the right thing
> by calling this accessor.
>
> For registers like PAR_EL1 that don't have an EL2->EL1 mapping we assume
> they belong to the EL1 context and thus are in-memory when in a hyp
> context. TPIDR(RO)_EL0 is similarly affected.
>
> This is a bit of an ugly hack, but something like the following should
> get things working if you're able to test it:

Yep, it is working pretty good. Will you send the patch? I'll add my
Tested-by.

Or will you prefer that I'll send the patch with your Suggested-by: ?

[...]

-- 
WBR, Volodymyr

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-08-06 22:03 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-06 14:17 [PATCH v1 0/2] KVM: arm: nv: fix AT S* behaviour Volodymyr Babchuk
2025-08-06 14:17 ` [PATCH v1 2/2] KVM: arm64: nv: update CPU register PAR_EL1 after 'at s*' Volodymyr Babchuk
2025-08-06 18:40   ` Oliver Upton
2025-08-06 20:30     ` Volodymyr Babchuk
2025-08-06 18:56   ` Marc Zyngier
2025-08-06 20:00     ` Volodymyr Babchuk
2025-08-06 14:17 ` [PATCH v1 1/2] KVM: arm64: nv: fix S2 translation for nVHE guests Volodymyr Babchuk
2025-08-06 17:37   ` Oliver Upton
2025-08-06 18:17     ` Marc Zyngier
2025-08-06 17:45   ` Marc Zyngier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).