[PATCH v2 0/2] KVM: arm64: nv: Fix permission checks for S1PTW faults

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/2] KVM: arm64: nv: Fix permission checks for S1PTW faults
@ 2026-06-24 20:24 Oliver Upton
  2026-06-24 20:24 ` [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set Oliver Upton
  2026-06-24 20:24 ` [PATCH v2 2/2] KVM: arm64: nv: Treat S1PTW permission faults specially Oliver Upton
  0 siblings, 2 replies; 7+ messages in thread
From: Oliver Upton @ 2026-06-24 20:24 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Wei-Lin Chang, Steffen Eiden, Oliver Upton

Small series that papers over arch ambiguity around S1PTW permission
faults.

As kvm_s2_handle_perm_fault() wasn't checking for S1PTW instruction
aborts, it was incorrectly evaluating the execute permissions to decide
where to send the fault.

Fixing that uncovers another issue in that kvm_is_write_fault() assumes
any S1PTW permission fault was due to write. Nested screws this up since
an L1 hypervisor could use write-only permissions at stage-2.

We end up papering over architecture ambiguity by potentially evaluating
*both* read and write permissions for S1PTW, assuming any fault with HA
set to require write permission (in addition to read).

Applies to kvmarm/fixes.

v1: https://lore.kernel.org/kvmarm/20260623211310.1529760-1-oupton@kernel.org/

Oliver Upton (2):
  KVM: arm64: Only consider S1PTW a write fault if HA is set
  KVM: arm64: nv: Treat S1PTW permission faults specially

 arch/arm64/include/asm/kvm_emulate.h | 22 +++++----------
 arch/arm64/include/asm/kvm_nested.h  |  2 ++
 arch/arm64/kvm/at.c                  | 42 +++++++++++++++++++++-------
 arch/arm64/kvm/nested.c              | 20 +++++++++++--
 4 files changed, 58 insertions(+), 28 deletions(-)

base-commit: d098bb75d14fde2f12155f1a95ec0168160867ce
-- 
2.47.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set
  2026-06-24 20:24 [PATCH v2 0/2] KVM: arm64: nv: Fix permission checks for S1PTW faults Oliver Upton
@ 2026-06-24 20:24 ` Oliver Upton
  2026-06-24 20:40   ` sashiko-bot
  2026-06-24 20:24 ` [PATCH v2 2/2] KVM: arm64: nv: Treat S1PTW permission faults specially Oliver Upton
  1 sibling, 1 reply; 7+ messages in thread
From: Oliver Upton @ 2026-06-24 20:24 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Wei-Lin Chang, Steffen Eiden, Oliver Upton, Sashiko

In yet another example where the architecture is awesome, S1PTW faults
may not have a valid ESR_ELx.WnR. kvm_is_write_fault() worked around
this by relying on a KVM implementation detail that canonical stage-2
translations must have at least read-only permissions.

That assumption no longer holds for nested virt where an L1 hypervisor
could construct write-only mappings that propagate to the shadow
stage-2.

Since there's no exact science to this, assume that the S1PTW fault was
for write if HA is enabled at stage-1.

Fixes: fd276e71d1e7 ("KVM: arm64: nv: Handle shadow stage 2 page faults")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/kvmarm/20260623213225.A89CF1F000E9@smtp.kernel.org/
Signed-off-by: Oliver Upton <oupton@kernel.org>
---
 arch/arm64/include/asm/kvm_emulate.h | 22 +++++----------
 arch/arm64/include/asm/kvm_nested.h  |  2 ++
 arch/arm64/kvm/at.c                  | 42 +++++++++++++++++++++-------
 3 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 5bf3d7e1d92c..8e208ce2597e 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -479,21 +479,13 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 
 static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
 {
-	if (kvm_vcpu_abt_iss1tw(vcpu)) {
-		/*
-		 * Only a permission fault on a S1PTW should be
-		 * considered as a write. Otherwise, page tables baked
-		 * in a read-only memslot will result in an exception
-		 * being delivered in the guest.
-		 *
-		 * The drawback is that we end-up faulting twice if the
-		 * guest is using any of HW AF/DB: a translation fault
-		 * to map the page containing the PT (read only at
-		 * first), then a permission fault to allow the flags
-		 * to be set.
-		 */
-		return kvm_vcpu_trap_is_permission_fault(vcpu);
-	}
+	/*
+	 * The architecture sucks; assume that the S1PTW fetched for write if
+	 * HA is enabled at stage-1. Note that hardware updates to dirty state
+	 * and table AF are predicated on HA=1 (DDI0487 M.a D24.2.194; R_SNVTX).
+	 */
+	if (kvm_vcpu_abt_iss1tw(vcpu))
+		return effective_tcr_ha(vcpu);
 
 	if (kvm_vcpu_trap_is_iabt(vcpu))
 		return false;
diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
index cbdaaa2a2903..e9f48f94a77f 100644
--- a/arch/arm64/include/asm/kvm_nested.h
+++ b/arch/arm64/include/asm/kvm_nested.h
@@ -417,4 +417,6 @@ u16 get_asid_by_regime(struct kvm_vcpu *vcpu, enum trans_regime regime);
 
 int __kvm_at_swap_desc(struct kvm *kvm, gpa_t ipa, u64 old, u64 new);
 
+bool effective_tcr_ha(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM64_KVM_NESTED_H */
diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
index 8263c648207b..91154654210e 100644
--- a/arch/arm64/kvm/at.c
+++ b/arch/arm64/kvm/at.c
@@ -219,6 +219,36 @@ static unsigned int tcr_tg_pgshift(struct kvm *kvm, u64 tcr, bool upper_range)
 	return shift;
 }
 
+static bool __effective_tcr_ha(struct kvm_vcpu *vcpu, enum trans_regime regime)
+{
+	if (!kvm_has_feat(vcpu->kvm, ID_AA64MMFR1_EL1, HAFDBS, AF))
+		return false;
+
+	switch (regime) {
+	case TR_EL10:
+		return vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA;
+	case TR_EL20:
+		return vcpu_read_sys_reg(vcpu, TCR_EL2) & TCR_HA;
+	case TR_EL2:
+		return vcpu_read_sys_reg(vcpu, TCR_EL2) & TCR_EL2_HA;
+	default:
+		BUG();
+	}
+}
+
+static enum trans_regime vcpu_trans_regime(struct kvm_vcpu *vcpu)
+{
+	if (!is_hyp_ctxt(vcpu))
+		return TR_EL10;
+
+	return vcpu_el2_e2h_is_set(vcpu) ? TR_EL20 : TR_EL2;
+}
+
+bool effective_tcr_ha(struct kvm_vcpu *vcpu)
+{
+	return __effective_tcr_ha(vcpu, vcpu_trans_regime(vcpu));
+}
+
 static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
 			 struct s1_walk_result *wr, u64 va)
 {
@@ -407,12 +437,7 @@ static int setup_s1_walk(struct kvm_vcpu *vcpu, struct s1_walk_info *wi,
 		goto addrsz;
 
 	wi->baddr &= GENMASK_ULL(wi->max_oa_bits - 1, x);
-
-	wi->ha  = kvm_has_feat(vcpu->kvm, ID_AA64MMFR1_EL1, HAFDBS, AF);
-	wi->ha &= (wi->regime == TR_EL2 ?
-		  FIELD_GET(TCR_EL2_HA, tcr) :
-		  FIELD_GET(TCR_HA, tcr));
-
+	wi->ha = __effective_tcr_ha(vcpu, wi->regime);
 	return 0;
 
 addrsz:
@@ -1723,10 +1748,7 @@ int __kvm_find_s1_desc_level(struct kvm_vcpu *vcpu, u64 va, u64 ipa, int *level)
 	struct s1_walk_result wr = {};
 	int ret;
 
-	if (is_hyp_ctxt(vcpu))
-		wi.regime = vcpu_el2_e2h_is_set(vcpu) ? TR_EL20 : TR_EL2;
-	else
-		wi.regime = TR_EL10;
+	wi.regime = vcpu_trans_regime(vcpu);
 
 	ret = setup_s1_walk(vcpu, &wi, &wr, va);
 	if (ret)
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set
  2026-06-24 20:24 ` [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set Oliver Upton
@ 2026-06-24 20:40   ` sashiko-bot
  2026-06-24 21:00     ` Oliver Upton
  0 siblings, 1 reply; 7+ messages in thread
From: sashiko-bot @ 2026-06-24 20:40 UTC (permalink / raw)
  To: Oliver Upton; +Cc: Marc Zyngier, kvmarm

Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] Unconditionally treating all S1PTW faults as write faults when HA=1 breaks pre-populated guest page tables in read-only memslots.
- [High] `effective_tcr_ha()` incorrectly evaluates the Hardware Access (HA) bit for unprivileged memory accesses (e.g., `LDTR`) at EL2, leading to potential infinite fault loops.
--

commit 05ab0c07bd3e6099e1182b29afb0219565246eb7
Author: Oliver Upton <oupton@kernel.org>

KVM: arm64: Only consider S1PTW a write fault if HA is set

This commit works around the hardware not always setting the WnR bit for 
stage-1 page table walk (S1PTW) faults. It introduces effective_tcr_ha() 
to assume an S1PTW fault is a write if the Hardware Access (HA) flag is 
enabled at stage-1.

Link: https://lore.kernel.org/kvmarm/20260623213225.A89CF1F000E9@smtp.kernel.org/

> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 5bf3d7e1d92c7..8e208ce2597e5 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h

[ ... ]

> @@ -479,21 +479,13 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>  
>  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  {
> -	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> -		/*
> -		 * Only a permission fault on a S1PTW should be
> -		 * considered as a write. Otherwise, page tables baked
> -		 * in a read-only memslot will result in an exception
> -		 * being delivered in the guest.
> -		 *
> -		 * The drawback is that we end-up faulting twice if the
> -		 * guest is using any of HW AF/DB: a translation fault
> -		 * to map the page containing the PT (read only at
> -		 * first), then a permission fault to allow the flags
> -		 * to be set.
> -		 */
> -		return kvm_vcpu_trap_is_permission_fault(vcpu);
> -	}
> +	/*
> +	 * The architecture sucks; assume that the S1PTW fetched for write if
> +	 * HA is enabled at stage-1. Note that hardware updates to dirty state
> +	 * and table AF are predicated on HA=1 (DDI0487 M.a D24.2.194; R_SNVTX).
> +	 */
> +	if (kvm_vcpu_abt_iss1tw(vcpu))
> +		return effective_tcr_ha(vcpu);

[Severity: High]
Does unconditionally treating S1PTW faults as writes when HA is enabled break
guests that use pre-populated page tables in read-only memslots?

If a guest populates its page tables with the Access Flag already set, places
them in an RO memslot, and enables HA in its TCR, the hardware only needs read
access during a walk. However, if a translation fault occurs,
kvm_is_write_fault() will unconditionally evaluate to true because of
effective_tcr_ha().

This will cause kvm_handle_guest_abort() to see a write fault on an RO memslot
and inject a spurious Synchronous External Abort (SEA) into the guest, even
though the hardware only needed to read the page table. Could we retain the
conditional check for permission faults to avoid this?

> diff --git a/arch/arm64/kvm/at.c b/arch/arm64/kvm/at.c
> index 8263c648207b0..91154654210ec 100644
> --- a/arch/arm64/kvm/at.c
> +++ b/arch/arm64/kvm/at.c

[ ... ]

> +static bool __effective_tcr_ha(struct kvm_vcpu *vcpu, enum trans_regime regime)
> +{
> +	if (!kvm_has_feat(vcpu->kvm, ID_AA64MMFR1_EL1, HAFDBS, AF))
> +		return false;
> +
> +	switch (regime) {
> +	case TR_EL10:
> +		return vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA;
> +	case TR_EL20:
> +		return vcpu_read_sys_reg(vcpu, TCR_EL2) & TCR_HA;
> +	case TR_EL2:
> +		return vcpu_read_sys_reg(vcpu, TCR_EL2) & TCR_EL2_HA;
> +	default:
> +		BUG();
> +	}
> +}
> +
> +static enum trans_regime vcpu_trans_regime(struct kvm_vcpu *vcpu)
> +{
> +	if (!is_hyp_ctxt(vcpu))
> +		return TR_EL10;
> +
> +	return vcpu_el2_e2h_is_set(vcpu) ? TR_EL20 : TR_EL2;
> +}

[Severity: High]
Does vcpu_trans_regime() correctly handle unprivileged memory accesses 
like LDTR/STTR executed from EL2?

By returning TR_EL2 based strictly on the vCPU's execution context, KVM
disregards that unprivileged instructions at EL2 use the EL1 and 0 translation
regime (TCR_EL1) when HCR_EL2.E2H is 0 or TGE is 0.

Since KVM cannot distinguish an LDTR fault from an LDR fault via the ESR
syndrome, it will read TCR_EL2.HA in __effective_tcr_ha() instead of
TCR_EL1.HA. If TCR_EL1.HA is 1 but TCR_EL2.HA is 0, KVM incorrectly evaluates
HA as disabled.

This causes KVM to treat a legitimate S1PTW permission fault (generated to
update the Access Flag) as a read fault, refusing to grant write permissions.
The hardware will then fault again on retry, creating an infinite stage-2 fault
loop that hangs the guest vCPU.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624202446.1698535-1-oupton@kernel.org?part=1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set
  2026-06-24 20:40   ` sashiko-bot
@ 2026-06-24 21:00     ` Oliver Upton
  0 siblings, 0 replies; 7+ messages in thread
From: Oliver Upton @ 2026-06-24 21:00 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: Marc Zyngier, kvmarm

On Wed, Jun 24, 2026 at 08:40:24PM +0000, sashiko-bot@kernel.org wrote:
> > +	/*
> > +	 * The architecture sucks; assume that the S1PTW fetched for write if
> > +	 * HA is enabled at stage-1. Note that hardware updates to dirty state
> > +	 * and table AF are predicated on HA=1 (DDI0487 M.a D24.2.194; R_SNVTX).
> > +	 */
> > +	if (kvm_vcpu_abt_iss1tw(vcpu))
> > +		return effective_tcr_ha(vcpu);
> 
> [Severity: High]
> Does unconditionally treating S1PTW faults as writes when HA is enabled break
> guests that use pre-populated page tables in read-only memslots?
> 
> If a guest populates its page tables with the Access Flag already set, places
> them in an RO memslot, and enables HA in its TCR, the hardware only needs read
> access during a walk. However, if a translation fault occurs,
> kvm_is_write_fault() will unconditionally evaluate to true because of
> effective_tcr_ha().
> 
> This will cause kvm_handle_guest_abort() to see a write fault on an RO memslot
> and inject a spurious Synchronous External Abort (SEA) into the guest, even
> though the hardware only needed to read the page table. Could we retain the
> conditional check for permission faults to avoid this?

Gah, I tested with QEMU+EDK2 for this exact purpose but did so on an
implementation without HAFDBS.

> > +static enum trans_regime vcpu_trans_regime(struct kvm_vcpu *vcpu)
> > +{
> > +	if (!is_hyp_ctxt(vcpu))
> > +		return TR_EL10;
> > +
> > +	return vcpu_el2_e2h_is_set(vcpu) ? TR_EL20 : TR_EL2;
> > +}
> 
> [Severity: High]
> Does vcpu_trans_regime() correctly handle unprivileged memory accesses 
> like LDTR/STTR executed from EL2?
> 
> By returning TR_EL2 based strictly on the vCPU's execution context, KVM
> disregards that unprivileged instructions at EL2 use the EL1 and 0 translation
> regime (TCR_EL1) when HCR_EL2.E2H is 0 or TGE is 0.
> 
> Since KVM cannot distinguish an LDTR fault from an LDR fault via the ESR
> syndrome, it will read TCR_EL2.HA in __effective_tcr_ha() instead of
> TCR_EL1.HA. If TCR_EL1.HA is 1 but TCR_EL2.HA is 0, KVM incorrectly evaluates
> HA as disabled.
> 
> This causes KVM to treat a legitimate S1PTW permission fault (generated to
> update the Access Flag) as a read fault, refusing to grant write permissions.
> The hardware will then fault again on retry, creating an infinite stage-2 fault
> loop that hangs the guest vCPU.

This is wrong, the LDTR instruction does not use an out-of-context
translation regime.

If HCR_EL2.{E2H,TGE} != {1,1}, LDTR silently upgrades to a privileged
access.

"""
func AArch64_IsUnprivAccessPriv() => boolean
begin
    var privileged : boolean;

    case PSTATE.EL of
        when EL0 =>
            privileged = FALSE;
        when EL1 => privileged = EffectiveHCR_EL2_NVx()[1:0] == '11';
        when EL2 => privileged = !ELIsInHost(EL0);
        when EL3 =>
            privileged = TRUE;
    end;

    if IsFeatureImplemented(FEAT_UAO) && PSTATE.UAO == '1' then
        privileged = PSTATE.EL != EL0;
    end;

    return privileged;
end;
"""

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 2/2] KVM: arm64: nv: Treat S1PTW permission faults specially
  2026-06-24 20:24 [PATCH v2 0/2] KVM: arm64: nv: Fix permission checks for S1PTW faults Oliver Upton
  2026-06-24 20:24 ` [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set Oliver Upton
@ 2026-06-24 20:24 ` Oliver Upton
  2026-06-24 20:35   ` sashiko-bot
  1 sibling, 1 reply; 7+ messages in thread
From: Oliver Upton @ 2026-06-24 20:24 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Wei-Lin Chang, Steffen Eiden, Oliver Upton, Sashiko

Permission faults reported as instruction aborts with S1PTW set are due
to missing read/write permissions for the table walk, not execute
permissions on the output of translation.

As there's no way to directly tell which of the two permissions failed,
evaluate both and forward the fault to the L1 hypervisor if either
fails.

Fixes: fd276e71d1e7 ("KVM: arm64: nv: Handle shadow stage 2 page faults")
Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://lore.kernel.org/kvmarm/20260623190607.7106B1F000E9@smtp.kernel.org/
Signed-off-by: Oliver Upton <oupton@kernel.org>
---
 arch/arm64/kvm/nested.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index 94df26de6990..4c9123cb2e1c 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -911,6 +911,7 @@ void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
  */
 int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
 {
+	bool write_fault = kvm_is_write_fault(vcpu);
 	bool forward_fault = false;
 
 	trans->esr = 0;
@@ -918,14 +919,27 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
 	if (!kvm_vcpu_trap_is_permission_fault(vcpu))
 		return 0;
 
-	if (kvm_vcpu_trap_is_iabt(vcpu)) {
+	/*
+	 * S1PTW permission faults do not provide sufficient syndrome information
+	 * to determine if the fault was for read or write permissions. Perform a
+	 * read permission check and an optional write permission check, relying
+	 * on the fact that:
+	 *
+	 *  - The table walker at minimum requires read permission
+	 *
+	 *  - The L1 hypervisor also needs to deal with the architecture and
+	 *    cannot directly infer the failing permission from the fault context
+	 */
+	if (kvm_vcpu_abt_iss1tw(vcpu)) {
+		forward_fault = !trans->readable;
+		if (write_fault)
+			forward_fault |= !trans->writable;
+	} else if (kvm_vcpu_trap_is_iabt(vcpu)) {
 		if (vcpu_mode_priv(vcpu))
 			forward_fault = !kvm_s2_trans_exec_el1(vcpu->kvm, trans);
 		else
 			forward_fault = !kvm_s2_trans_exec_el0(vcpu->kvm, trans);
 	} else {
-		bool write_fault = kvm_is_write_fault(vcpu);
-
 		forward_fault = ((write_fault && !trans->writable) ||
 				 (!write_fault && !trans->readable));
 	}
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/2] KVM: arm64: nv: Treat S1PTW permission faults specially
  2026-06-24 20:24 ` [PATCH v2 2/2] KVM: arm64: nv: Treat S1PTW permission faults specially Oliver Upton
@ 2026-06-24 20:35   ` sashiko-bot
  2026-06-24 21:22     ` Oliver Upton
  0 siblings, 1 reply; 7+ messages in thread
From: sashiko-bot @ 2026-06-24 20:35 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvmarm, Marc Zyngier

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] An L1 hypervisor can cause an infinite page fault loop in the L0 hypervisor by creating an L1 S2 PTE with XN=0 (execution allowed) but AP=00 (read/write denied). L0 fails to forward the resulting instruction abort to L1, repeatedly mapping the shadow PTE without read permissions, which the hardware continually rejects as non-executable.
--

commit f64f1ef4967d6c0954790535ee60d6e909805163
Author: Oliver Upton <oupton@kernel.org>

KVM: arm64: nv: Treat S1PTW permission faults specially

Permission faults reported as instruction aborts with S1PTW set are due
to missing read/write permissions for the table walk, not execute
permissions on the output of translation. This patch adds logic to evaluate
both and forward the fault to the L1 hypervisor if either fails.

Closes: https://lore.kernel.org/kvmarm/20260623190607.7106B1F000E9@smtp.kernel.org/

> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> index 94df26de6990c..4c9123cb2e1c5 100644
> --- a/arch/arm64/kvm/nested.c
> +++ b/arch/arm64/kvm/nested.c
[ ... ]
> @@ -918,14 +919,27 @@ int kvm_s2_handle_perm_fault(struct kvm_vcpu *vcpu, struct kvm_s2_trans *trans)
>  	if (!kvm_vcpu_trap_is_permission_fault(vcpu))
>  		return 0;
>  
> -	if (kvm_vcpu_trap_is_iabt(vcpu)) {
> +	/*
> +	 * S1PTW permission faults do not provide sufficient syndrome information
> +	 * to determine if the fault was for read or write permissions. Perform a
> +	 * read permission check and an optional write permission check, relying
> +	 * on the fact that:
> +	 *
> +	 *  - The table walker at minimum requires read permission
> +	 *
> +	 *  - The L1 hypervisor also needs to deal with the architecture and
> +	 *    cannot directly infer the failing permission from the fault context
> +	 */
> +	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> +		forward_fault = !trans->readable;
> +		if (write_fault)
> +			forward_fault |= !trans->writable;
> +	} else if (kvm_vcpu_trap_is_iabt(vcpu)) {
>  		if (vcpu_mode_priv(vcpu))
>  			forward_fault = !kvm_s2_trans_exec_el1(vcpu->kvm, trans);
>  		else
>  			forward_fault = !kvm_s2_trans_exec_el0(vcpu->kvm, trans);

[Severity: High]
This is a pre-existing issue, but does this logic miss a read-permission
dependency check for normal instruction aborts?

If an L1 hypervisor maps a page for L2 with Stage-2 PTE AP=00 (no read/write)
and XN=0 (executable), and L2 attempts to fetch an instruction, the hardware
generates a Stage-2 Instruction Abort (Permission Fault) because execution
requires read permission.

Here, kvm_s2_handle_perm_fault() intercepts the IABT and checks only the XN
bits via kvm_s2_trans_exec_el1() and kvm_s2_trans_exec_el0(). Since they
only check the XN bits and ignore trans->readable, forward_fault is set to
false.

L0 then handles the fault by mapping the shadow S2 PTE. The shadow fault
handler will clear the read protection bit, leaving the execute protection
bit intact. The hardware will reject this shadow mapping as non-executable
due to missing read permissions, triggering another IABT upon guest
resumption and causing an infinite page fault loop in the L0 hypervisor.

>  	} else {
> -		bool write_fault = kvm_is_write_fault(vcpu);
> -
>  		forward_fault = ((write_fault && !trans->writable) ||
>  				 (!write_fault && !trans->readable));
>  	}

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624202446.1698535-1-oupton@kernel.org?part=2

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/2] KVM: arm64: nv: Treat S1PTW permission faults specially
  2026-06-24 20:35   ` sashiko-bot
@ 2026-06-24 21:22     ` Oliver Upton
  0 siblings, 0 replies; 7+ messages in thread
From: Oliver Upton @ 2026-06-24 21:22 UTC (permalink / raw)
  To: sashiko-reviews; +Cc: kvmarm, Marc Zyngier

On Wed, Jun 24, 2026 at 08:35:16PM +0000, sashiko-bot@kernel.org wrote:
> > +	/*
> > +	 * S1PTW permission faults do not provide sufficient syndrome information
> > +	 * to determine if the fault was for read or write permissions. Perform a
> > +	 * read permission check and an optional write permission check, relying
> > +	 * on the fact that:
> > +	 *
> > +	 *  - The table walker at minimum requires read permission
> > +	 *
> > +	 *  - The L1 hypervisor also needs to deal with the architecture and
> > +	 *    cannot directly infer the failing permission from the fault context
> > +	 */
> > +	if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > +		forward_fault = !trans->readable;
> > +		if (write_fault)
> > +			forward_fault |= !trans->writable;
> > +	} else if (kvm_vcpu_trap_is_iabt(vcpu)) {
> >  		if (vcpu_mode_priv(vcpu))
> >  			forward_fault = !kvm_s2_trans_exec_el1(vcpu->kvm, trans);
> >  		else
> >  			forward_fault = !kvm_s2_trans_exec_el0(vcpu->kvm, trans);
> 
> [Severity: High]
> This is a pre-existing issue, but does this logic miss a read-permission
> dependency check for normal instruction aborts?
> 
> If an L1 hypervisor maps a page for L2 with Stage-2 PTE AP=00 (no read/write)
> and XN=0 (executable), and L2 attempts to fetch an instruction, the hardware
> generates a Stage-2 Instruction Abort (Permission Fault) because execution
> requires read permission.

This does not match my understanding of stage-2 permissions. S2AP
controls data permissions, whereas the XN field controls instruction
fetches. The pseudocode also supports this view, where execute
permissions are computed separately from S2AP:

AArch64.S2DirectBasePermissions()
"""
    S2AccessControls s2perms;
    bit w;
    constant bit r = permissions.s2ap<0>;
    if permissions.s2ap<1> == '1' then
        w = '1';
    // Descriptors marked with DBM set have the effective value of S2AP[1] set.
    // This implies no Permission faults caused by lack of write permissions are
    // reported, and the Dirty bit can be set.
    elsif permissions.dbm == '1' && walkparams.hd == '1' then
        // An update occurs here, conditional to being able to append to HDBSS
        if walkparams.hdbss == '1' then
            w = if CanAppendToHDBSS() then '1' else '0';
        else
            w = '1';
    else
        w = '0';

    bit px, ux;
    case (permissions.s2xn:permissions.s2xnx) of
        when '00' (px,ux) = ('1','1');
        when '01' (px,ux) = ('0','1');
        when '10' (px,ux) = ('0','0');
        when '11' (px,ux) = ('1','0');

    x = if accdesc.el == EL0 then ux else px;
    s2perms.r = r;
    s2perms.w = w;
    s2perms.x = x;
    s2perms.r_rcw = r;
    s2perms.w_rcw = w;
    s2perms.r_mmu = r;
    s2perms.w_mmu = w;
    s2perms.toplevel0 = '0';
    s2perms.toplevel1 = '0';
    s2perms.overlay = FALSE;

    return s2perms;
"""

AArch64.S2CheckPermissions()
"""
    elsif accdesc.acctype == AccessType_IFETCH then
        if s2perms.overlay && s2perms.ox == '0' then
            fault.statuscode = Fault_Permission;
            fault.overlay    = TRUE;
        elsif (memtype == MemType_Device &&
                ConstrainUnpredictable(Unpredictable_INSTRDEVICE) == Constraint_FAULT) then
            fault.statuscode = Fault_Permission;

        // Prevent execution from Non-secure space by Realm state
        elsif accdesc.ss == SS_Realm && walkstate.baseaddress.paspace != PAS_Realm then
            fault.statuscode = Fault_Permission;
        elsif s2perms.x == '0' then
            fault.statuscode = Fault_Permission;
"""

Thanks,
Oliver

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-06-24 21:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-24 20:24 [PATCH v2 0/2] KVM: arm64: nv: Fix permission checks for S1PTW faults Oliver Upton
2026-06-24 20:24 ` [PATCH v2 1/2] KVM: arm64: Only consider S1PTW a write fault if HA is set Oliver Upton
2026-06-24 20:40   ` sashiko-bot
2026-06-24 21:00     ` Oliver Upton
2026-06-24 20:24 ` [PATCH v2 2/2] KVM: arm64: nv: Treat S1PTW permission faults specially Oliver Upton
2026-06-24 20:35   ` sashiko-bot
2026-06-24 21:22     ` Oliver Upton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.