public inbox for linux-arm-kernel@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes
@ 2026-05-01 11:21 Fuad Tabba
  2026-05-01 11:21 ` [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events Fuad Tabba
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
  To: maz, oliver.upton
  Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
	tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
	linux-kernel, stable

Hi folks,

V2 of the kvm/arm64 audit fixes [1].

Changes since v1:

    Patch 1 (SCTLR_EL2.EIS|EOS): Fixes: tag corrected to 0a35bd285f43
    ("arm64: Convert SCTLR_EL2 to sysreg infrastructure"); the commit
    message now explains that the conversion auto-generated
    SCTLR_EL2_RES1 to UL(0).  Code unchanged.

    Patches 2-3 (NULL vcpu guard, __deactivate_fgt typo): unchanged.

    Patch 4 (new): Seed selftest_vcpu's memcache to mirror
    hyp-main.c's pkvm_refill_memcache() flow; required by the
    pre-check in patches 5-6.

    Patches 5-6 (host->guest share/donate, formerly v1 patches 5-6):
    reworked to pre-check the vcpu memcache against
    kvm_mmu_cache_min_pages() during the existing pre-check pass,
    before any state mutation.  The WARN_ON() around
    kvm_pgtable_stage2_map() then asserts an invariant the pre-check
    pass establishes, rather than swallowing a reachable -ENOMEM.

Dropped since v1:

    - Patch 2 (HCR_EL2 sync): failure path not reachable.
    - Patches 7-8 (guest->host share/unshare): the stage-2 map cannot
      fail at those call sites (the leaf already exists).

Carried the `Reviewed-by:` tag (thanks!) and added `Assisted-by:` tags.

Note that with `review-prompts` in the `Assisted-by:` tags, I am
referring to subsystem guides that I added to the base prompts [2],
which I plan to submit for upstreaming.

Cheers,
/fuad

[1] https://lore.kernel.org/all/20260428103008.696141-1-tabba@google.com/
[2] https://github.com/masoncl/review-prompts

Fuad Tabba (6):
  KVM: arm64: Make EL2 exception entry and exit context-synchronization
    events
  KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
  KVM: arm64: Fix __deactivate_fgt macro parameter typo
  KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
  KVM: arm64: Pre-check vcpu memcache for host->guest share
  KVM: arm64: Pre-check vcpu memcache for host->guest donate

 arch/arm64/include/asm/sysreg.h         |  2 +-
 arch/arm64/kvm/hyp/include/hyp/switch.h |  2 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c   | 24 ++++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/pkvm.c          | 16 +++++++++++++++-
 arch/arm64/kvm/hyp/vhe/switch.c         |  3 ++-
 5 files changed, 43 insertions(+), 4 deletions(-)

-- 
2.54.0.545.g6539524ca2-goog



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events
  2026-05-01 11:21 [PATCH v2 0/6] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes Fuad Tabba
@ 2026-05-01 11:21 ` Fuad Tabba
  2026-05-01 13:47   ` Ben Horgan
  2026-05-01 11:21 ` [PATCH v2 2/6] KVM: arm64: Guard against NULL vcpu on VHE hyp panic path Fuad Tabba
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 10+ messages in thread
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
  To: maz, oliver.upton
  Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
	tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
	linux-kernel, stable

SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
0487 M.b D24.2.175 (p. D24-9754):

  - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
    a CSE.
  - FEAT_ExS: the reset value is architecturally UNKNOWN; software
    must set the bit to make the entry/exit a CSE.

INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
architecturally backed unless EOS=1 (and, for entry, EIS=1).

Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
was setting both unconditionally. The conversion made
SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
models unconditionally-RES1 fields and EIS/EOS are RES1 only when
FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
are the only bits whose semantics changed for FEAT_ExS hardware
and where the kernel relies on the value being 1.

Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
unconditionally CSEs regardless of whether FEAT_ExS is implemented.
This matches the pairing in arch/arm64/kvm/config.c which treats EIS
and EOS together as RES1 under !FEAT_ExS.

Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure")
Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com>
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/include/asm/sysreg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 736561480f36..7aa08d59d494 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -844,7 +844,7 @@
 #define INIT_SCTLR_EL2_MMU_ON						\
 	(SCTLR_ELx_M  | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I |	\
 	 SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 |		\
-	 SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)
+	 SCTLR_ELx_ITFSB | SCTLR_ELx_EIS | SCTLR_ELx_EOS | SCTLR_EL2_RES1)
 
 #define INIT_SCTLR_EL2_MMU_OFF \
 	(SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
-- 
2.54.0.545.g6539524ca2-goog



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/6] KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
  2026-05-01 11:21 [PATCH v2 0/6] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes Fuad Tabba
  2026-05-01 11:21 ` [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events Fuad Tabba
@ 2026-05-01 11:21 ` Fuad Tabba
  2026-05-01 11:21 ` [PATCH v2 3/6] KVM: arm64: Fix __deactivate_fgt macro parameter typo Fuad Tabba
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
  To: maz, oliver.upton
  Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
	tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
	linux-kernel, stable

On VHE, __hyp_call_panic() unconditionally calls __deactivate_traps(vcpu)
on the vcpu pointer read from host_ctxt->__hyp_running_vcpu. That pointer
is cleared after every guest exit (and is never set when no guest is
running), so an unexpected EL2 exception landing in _guest_exit_panic
(e.g. via the el2t*_invalid / el2h_irq_invalid vectors) reaches this
function with vcpu == NULL. __deactivate_traps() then dereferences vcpu
via ___deactivate_traps() -> vserror_state_is_nested() -> vcpu_has_nv()
-> vcpu->arch.features, faulting inside the panic handler and obscuring
the original failure.

The nVHE counterpart (hyp_panic() in arch/arm64/kvm/hyp/nvhe/switch.c)
already guards its vcpu-using cleanup with "if (vcpu)"; mirror that
here. sysreg_restore_host_state_vhe() does not depend on vcpu and
continues to run unconditionally, preserving panic forensics. The
trailing panic("...VCPU:%p", vcpu) prints "(null)" safely via printk's
%p handling.

Fixes: 6a0259ed29bb ("KVM: arm64: Remove hyp_panic arguments")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/vhe/switch.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 9db3f11a4754..1e8995add14f 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -663,7 +663,8 @@ static void __noreturn __hyp_call_panic(u64 spsr, u64 elr, u64 par)
 	host_ctxt = host_data_ptr(host_ctxt);
 	vcpu = host_ctxt->__hyp_running_vcpu;
 
-	__deactivate_traps(vcpu);
+	if (vcpu)
+		__deactivate_traps(vcpu);
 	sysreg_restore_host_state_vhe(host_ctxt);
 
 	panic("HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n",
-- 
2.54.0.545.g6539524ca2-goog



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/6] KVM: arm64: Fix __deactivate_fgt macro parameter typo
  2026-05-01 11:21 [PATCH v2 0/6] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes Fuad Tabba
  2026-05-01 11:21 ` [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events Fuad Tabba
  2026-05-01 11:21 ` [PATCH v2 2/6] KVM: arm64: Guard against NULL vcpu on VHE hyp panic path Fuad Tabba
@ 2026-05-01 11:21 ` Fuad Tabba
  2026-05-01 11:21 ` [PATCH v2 4/6] KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache Fuad Tabba
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
  To: maz, oliver.upton
  Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
	tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
	linux-kernel, stable

__deactivate_fgt() declares its first parameter as "htcxt" but the body
references "hctxt". The parameter is unused; the macro silently captures
"hctxt" from the enclosing scope. Both existing callers
(__deactivate_traps_hfgxtr() and __deactivate_traps_ich_hfgxtr()) happen
to define a local "struct kvm_cpu_context *hctxt", so the macro works
by coincidence.

A future caller without an "hctxt" local in scope, or naming it
differently, would compile but bind to the wrong context. Align the
parameter name with the sibling __activate_fgt() macro.

The "vcpu" parameter remains unused in the body, kept for API symmetry
with __activate_fgt() (which uses it).

Fixes: f5a5a406b4b8 ("KVM: arm64: Propagate and handle Fine-Grained UNDEF bits")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/include/hyp/switch.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 98b2976837b1..bf0eb5e43427 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -245,7 +245,7 @@ static inline void __activate_traps_ich_hfgxtr(struct kvm_vcpu *vcpu)
 	__activate_fgt(hctxt, vcpu, ICH_HFGITR_EL2);
 }
 
-#define __deactivate_fgt(htcxt, vcpu, reg)				\
+#define __deactivate_fgt(hctxt, vcpu, reg)				\
 	do {								\
 		write_sysreg_s(ctxt_sys_reg(hctxt, reg),		\
 			       SYS_ ## reg);				\
-- 
2.54.0.545.g6539524ca2-goog



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 4/6] KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
  2026-05-01 11:21 [PATCH v2 0/6] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes Fuad Tabba
                   ` (2 preceding siblings ...)
  2026-05-01 11:21 ` [PATCH v2 3/6] KVM: arm64: Fix __deactivate_fgt macro parameter typo Fuad Tabba
@ 2026-05-01 11:21 ` Fuad Tabba
  2026-05-01 11:21 ` [PATCH v2 5/6] KVM: arm64: Pre-check vcpu memcache for host->guest share Fuad Tabba
  2026-05-01 11:21 ` [PATCH v2 6/6] KVM: arm64: Pre-check vcpu memcache for host->guest donate Fuad Tabba
  5 siblings, 0 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
  To: maz, oliver.upton
  Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
	tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
	linux-kernel, stable

The hypercall handlers call pkvm_refill_memcache() to top up the
hyp_vcpu memcache before invoking __pkvm_host_{share,donate}_guest().
pkvm_ownership_selftest invokes those functions directly with a
static selftest_vcpu that has an empty memcache.

Seed selftest_vcpu's memcache from the prepopulated selftest
pages, leaving the remainder for selftest_vm.pool. Required by
the memcache-sufficiency pre-check added in the following
patches.

Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/nvhe/pkvm.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 7ed96d64d611..deee7947d694 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -751,16 +751,30 @@ static struct pkvm_hyp_vcpu selftest_vcpu = {
 struct pkvm_hyp_vcpu *init_selftest_vm(void *virt)
 {
 	struct hyp_page *p = hyp_virt_to_page(virt);
+	unsigned long min_pages, seeded = 0;
 	int i;
 
 	selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
 	WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));
 
+	/*
+	 * Mirror pkvm_refill_memcache() for the share/donate pre-checks;
+	 * the selftest invokes those functions directly and would
+	 * otherwise see an empty memcache.
+	 */
+	min_pages = kvm_mmu_cache_min_pages(&selftest_vm.kvm.arch.mmu);
+
 	for (i = 0; i < pkvm_selftest_pages(); i++) {
 		if (p[i].refcount)
 			continue;
 		p[i].refcount = 1;
-		hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
+		if (seeded < min_pages) {
+			push_hyp_memcache(&selftest_vcpu.vcpu.arch.pkvm_memcache,
+					  hyp_page_to_virt(&p[i]), hyp_virt_to_phys);
+			seeded++;
+		} else {
+			hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
+		}
 	}
 
 	selftest_vm.kvm.arch.pkvm.handle = __pkvm_reserve_vm();
-- 
2.54.0.545.g6539524ca2-goog



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 5/6] KVM: arm64: Pre-check vcpu memcache for host->guest share
  2026-05-01 11:21 [PATCH v2 0/6] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes Fuad Tabba
                   ` (3 preceding siblings ...)
  2026-05-01 11:21 ` [PATCH v2 4/6] KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache Fuad Tabba
@ 2026-05-01 11:21 ` Fuad Tabba
  2026-05-01 11:21 ` [PATCH v2 6/6] KVM: arm64: Pre-check vcpu memcache for host->guest donate Fuad Tabba
  5 siblings, 0 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
  To: maz, oliver.upton
  Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
	tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
	linux-kernel, stable

__pkvm_host_share_guest() ends with kvm_pgtable_stage2_map() to
install the guest stage-2 mapping, after a forward pass that mutates
the host vmemmap (sets PKVM_PAGE_SHARED_OWNED and increments
host_share_guest_count) for every page in the range. The map's
return value is wrapped in WARN_ON() and otherwise discarded,
asserting that the call cannot fail.

WARN_ON() at nVHE EL2 panics, so this assertion is only correct if
the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail
with -ENOMEM when the stage-2 walker exhausts the caller's
memcache, and the host controls the vcpu memcache via the topup
interface, so an under-provisioned share request would otherwise
turn a recoverable -ENOMEM into a fatal hyp panic.

Bound the worst-case walker allocation in the existing pre-check
pass so that kvm_pgtable_stage2_map() cannot fail at the call
site, using kvm_mmu_cache_min_pages() -- the same bound host EL1
uses for its own stage-2 maps. If the vcpu memcache holds fewer
pages, return -ENOMEM before any state mutation.

Fixes: d0bd3e6570ae ("KVM: arm64: Introduce __pkvm_host_share_guest()")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 28a471d1927c..e428304f94f2 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1369,6 +1369,22 @@ int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
 	return ret && ret != -EHWPOISON ? ret : 0;
 }
 
+/*
+ * share/donate install at most one stage-2 leaf (PAGE_SIZE, or one
+ * KVM_PGTABLE_LAST_LEVEL - 1 block for share). kvm_mmu_cache_min_pages()
+ * bounds the worst-case allocation: exact for the PAGE_SIZE leaf,
+ * conservative by one for the block.
+ */
+static int __guest_check_pgtable_memcache(struct pkvm_hyp_vcpu *vcpu)
+{
+	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+
+	if (vcpu->vcpu.arch.pkvm_memcache.nr_pages < kvm_mmu_cache_min_pages(vm->pgt.mmu))
+		return -ENOMEM;
+
+	return 0;
+}
+
 int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
 {
 	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
@@ -1453,6 +1469,10 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu
 		}
 	}
 
+	ret = __guest_check_pgtable_memcache(vcpu);
+	if (ret)
+		goto unlock;
+
 	for_each_hyp_page(page, phys, size) {
 		set_host_state(page, PKVM_PAGE_SHARED_OWNED);
 		page->host_share_guest_count++;
-- 
2.54.0.545.g6539524ca2-goog



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 6/6] KVM: arm64: Pre-check vcpu memcache for host->guest donate
  2026-05-01 11:21 [PATCH v2 0/6] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes Fuad Tabba
                   ` (4 preceding siblings ...)
  2026-05-01 11:21 ` [PATCH v2 5/6] KVM: arm64: Pre-check vcpu memcache for host->guest share Fuad Tabba
@ 2026-05-01 11:21 ` Fuad Tabba
  5 siblings, 0 replies; 10+ messages in thread
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
  To: maz, oliver.upton
  Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
	tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
	linux-kernel, stable

__pkvm_host_donate_guest() flips the host stage-2 PTE for the
donated page to a non-valid annotation via
host_stage2_set_owner_metadata_locked() and then calls
kvm_pgtable_stage2_map() to install the matching guest stage-2
mapping. The map's return value is wrapped in WARN_ON() and
otherwise discarded, asserting that the call cannot fail.

WARN_ON() at nVHE EL2 panics, so this assertion is only correct
if the call genuinely cannot fail. kvm_pgtable_stage2_map() can
fail with -ENOMEM even at PAGE_SIZE granularity: the donate path
verifies PKVM_NOPAGE for the guest IPA before the map, so the
walker must allocate fresh page-table pages from the vcpu
memcache, and the host controls the vcpu memcache via the topup
interface. An under-provisioned donation request would otherwise
turn a recoverable -ENOMEM into a fatal hyp panic.

Bound the worst-case walker allocation alongside the existing
__host_check_page_state_range() / __guest_check_page_state_range()
pre-checks, using the helper introduced for host->guest share. If
the vcpu memcache holds fewer pages than kvm_mmu_cache_min_pages(),
return -ENOMEM before any state mutation.

Fixes: 1e579adca177 ("KVM: arm64: Introduce __pkvm_host_donate_guest()")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index e428304f94f2..c7f7149c4796 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1404,6 +1404,10 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
 	if (ret)
 		goto unlock;
 
+	ret = __guest_check_pgtable_memcache(vcpu);
+	if (ret)
+		goto unlock;
+
 	meta = host_stage2_encode_gfn_meta(vm, gfn);
 	WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
 						      PKVM_ID_GUEST, meta));
-- 
2.54.0.545.g6539524ca2-goog



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events
  2026-05-01 11:21 ` [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events Fuad Tabba
@ 2026-05-01 13:47   ` Ben Horgan
  2026-05-01 15:01     ` Fuad Tabba
  0 siblings, 1 reply; 10+ messages in thread
From: Ben Horgan @ 2026-05-01 13:47 UTC (permalink / raw)
  To: Fuad Tabba, maz, oliver.upton
  Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
	catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
	linux-kernel, stable

Hi Fuad,

On 5/1/26 12:21, Fuad Tabba wrote:
> SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
> exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
> 0487 M.b D24.2.175 (p. D24-9754):
> 
>   - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
>     a CSE.
>   - FEAT_ExS: the reset value is architecturally UNKNOWN; software
>     must set the bit to make the entry/exit a CSE.
> 
> INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
> bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
> synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
> MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
> ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
> architecturally backed unless EOS=1 (and, for entry, EIS=1).
> 
> Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
> infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
> included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
> was setting both unconditionally. The conversion made
> SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
> models unconditionally-RES1 fields and EIS/EOS are RES1 only when
> FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
> other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
> 28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
> DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
> are the only bits whose semantics changed for FEAT_ExS hardware
> and where the kernel relies on the value being 1.
> 
> Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
> INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
> unconditionally CSEs regardless of whether FEAT_ExS is implemented.
> This matches the pairing in arch/arm64/kvm/config.c which treats EIS
> and EOS together as RES1 under !FEAT_ExS.

In v1 you also had this sentence:

"INIT_SCTLR_EL2_MMU_OFF is left unchanged: that path is used during
very early EL2 init and the EL2 MMU-off transition, neither of which
relies on these bits in the same way."

To me, it seems useful to keep that sentence as it makes it clear that INIT_SCTLR_EL2_MMU_OFF is purposely not changed.
Or is there a reason why you dropped it? Perhaps it's just obvious to people more familiar with this code.

Thanks,

Ben

> 
> Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure")
> Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com>
> Assisted-by: Gemini:gemini-3.1-pro review-prompts
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
>  arch/arm64/include/asm/sysreg.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 736561480f36..7aa08d59d494 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -844,7 +844,7 @@
>  #define INIT_SCTLR_EL2_MMU_ON						\
>  	(SCTLR_ELx_M  | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I |	\
>  	 SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 |		\
> -	 SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)
> +	 SCTLR_ELx_ITFSB | SCTLR_ELx_EIS | SCTLR_ELx_EOS | SCTLR_EL2_RES1)
>  
>  #define INIT_SCTLR_EL2_MMU_OFF \
>  	(SCTLR_EL2_RES1 | ENDIAN_SET_EL2)



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events
  2026-05-01 13:47   ` Ben Horgan
@ 2026-05-01 15:01     ` Fuad Tabba
  2026-05-01 15:07       ` Ben Horgan
  0 siblings, 1 reply; 10+ messages in thread
From: Fuad Tabba @ 2026-05-01 15:01 UTC (permalink / raw)
  To: Ben Horgan
  Cc: maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	qperret, vdonnefort, catalin.marinas, will, yaoyuan,
	linux-arm-kernel, kvmarm, linux-kernel, stable

Hi Ben,

On Fri, 1 May 2026 at 14:47, Ben Horgan <ben.horgan@arm.com> wrote:
>
> Hi Fuad,
>
> On 5/1/26 12:21, Fuad Tabba wrote:
> > SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
> > exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
> > 0487 M.b D24.2.175 (p. D24-9754):
> >
> >   - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
> >     a CSE.
> >   - FEAT_ExS: the reset value is architecturally UNKNOWN; software
> >     must set the bit to make the entry/exit a CSE.
> >
> > INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
> > bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
> > synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
> > MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
> > ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
> > architecturally backed unless EOS=1 (and, for entry, EIS=1).
> >
> > Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
> > infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
> > included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
> > was setting both unconditionally. The conversion made
> > SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
> > models unconditionally-RES1 fields and EIS/EOS are RES1 only when
> > FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
> > other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
> > 28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
> > DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
> > are the only bits whose semantics changed for FEAT_ExS hardware
> > and where the kernel relies on the value being 1.
> >
> > Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
> > INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
> > unconditionally CSEs regardless of whether FEAT_ExS is implemented.
> > This matches the pairing in arch/arm64/kvm/config.c which treats EIS
> > and EOS together as RES1 under !FEAT_ExS.
>
> In v1 you also had this sentence:
>
> "INIT_SCTLR_EL2_MMU_OFF is left unchanged: that path is used during
> very early EL2 init and the EL2 MMU-off transition, neither of which
> relies on these bits in the same way."
>
> To me, it seems useful to keep that sentence as it makes it clear that INIT_SCTLR_EL2_MMU_OFF is purposely not changed.
> Or is there a reason why you dropped it? Perhaps it's just obvious to people more familiar with this code.

To be honest, I thought the commit message was quite long, and I
wanted to make it a bit more concise. I could re-introduce it if you
think it's helpful.

Cheers,
/fuad

> Thanks,
>
> Ben
>
> >
> > Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure")
> > Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com>
> > Assisted-by: Gemini:gemini-3.1-pro review-prompts
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> >  arch/arm64/include/asm/sysreg.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > index 736561480f36..7aa08d59d494 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -844,7 +844,7 @@
> >  #define INIT_SCTLR_EL2_MMU_ON                                                \
> >       (SCTLR_ELx_M  | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I |      \
> >        SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 |              \
> > -      SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)
> > +      SCTLR_ELx_ITFSB | SCTLR_ELx_EIS | SCTLR_ELx_EOS | SCTLR_EL2_RES1)
> >
> >  #define INIT_SCTLR_EL2_MMU_OFF \
> >       (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events
  2026-05-01 15:01     ` Fuad Tabba
@ 2026-05-01 15:07       ` Ben Horgan
  0 siblings, 0 replies; 10+ messages in thread
From: Ben Horgan @ 2026-05-01 15:07 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
	qperret, vdonnefort, catalin.marinas, will, yaoyuan,
	linux-arm-kernel, kvmarm, linux-kernel, stable

Hi Fuad,

On 5/1/26 16:01, Fuad Tabba wrote:
> Hi Ben,
> 
> On Fri, 1 May 2026 at 14:47, Ben Horgan <ben.horgan@arm.com> wrote:
>>
>> Hi Fuad,
>>
>> On 5/1/26 12:21, Fuad Tabba wrote:
>>> SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
>>> exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
>>> 0487 M.b D24.2.175 (p. D24-9754):
>>>
>>>   - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
>>>     a CSE.
>>>   - FEAT_ExS: the reset value is architecturally UNKNOWN; software
>>>     must set the bit to make the entry/exit a CSE.
>>>
>>> INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
>>> bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
>>> synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
>>> MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
>>> ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
>>> architecturally backed unless EOS=1 (and, for entry, EIS=1).
>>>
>>> Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
>>> infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
>>> included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
>>> was setting both unconditionally. The conversion made
>>> SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
>>> models unconditionally-RES1 fields and EIS/EOS are RES1 only when
>>> FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
>>> other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
>>> 28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
>>> DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
>>> are the only bits whose semantics changed for FEAT_ExS hardware
>>> and where the kernel relies on the value being 1.
>>>
>>> Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
>>> INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
>>> unconditionally CSEs regardless of whether FEAT_ExS is implemented.
>>> This matches the pairing in arch/arm64/kvm/config.c which treats EIS
>>> and EOS together as RES1 under !FEAT_ExS.
>>
>> In v1 you also had this sentence:
>>
>> "INIT_SCTLR_EL2_MMU_OFF is left unchanged: that path is used during
>> very early EL2 init and the EL2 MMU-off transition, neither of which
>> relies on these bits in the same way."
>>
>> To me, it seems useful to keep that sentence as it makes it clear that INIT_SCTLR_EL2_MMU_OFF is purposely not changed.
>> Or is there a reason why you dropped it? Perhaps it's just obvious to people more familiar with this code.
> 
> To be honest, I thought the commit message was quite long, and I
> wanted to make it a bit more concise. I could re-introduce it if you
> think it's helpful.

I don't really mind but it was useful in helping me understand your change.

Thanks,

Ben


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2026-05-01 15:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-05-01 11:21 [PATCH v2 0/6] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes Fuad Tabba
2026-05-01 11:21 ` [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events Fuad Tabba
2026-05-01 13:47   ` Ben Horgan
2026-05-01 15:01     ` Fuad Tabba
2026-05-01 15:07       ` Ben Horgan
2026-05-01 11:21 ` [PATCH v2 2/6] KVM: arm64: Guard against NULL vcpu on VHE hyp panic path Fuad Tabba
2026-05-01 11:21 ` [PATCH v2 3/6] KVM: arm64: Fix __deactivate_fgt macro parameter typo Fuad Tabba
2026-05-01 11:21 ` [PATCH v2 4/6] KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache Fuad Tabba
2026-05-01 11:21 ` [PATCH v2 5/6] KVM: arm64: Pre-check vcpu memcache for host->guest share Fuad Tabba
2026-05-01 11:21 ` [PATCH v2 6/6] KVM: arm64: Pre-check vcpu memcache for host->guest donate Fuad Tabba

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox