* [PATCH v2 0/6] KVM: arm64: EL2 synchronisation and pKVM stage-2 error propagation fixes
@ 2026-05-01 11:21 Fuad Tabba
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
To: maz, oliver.upton
Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
linux-kernel, stable
Hi folks,
V2 of the kvm/arm64 audit fixes [1].
Changes since v1:
Patch 1 (SCTLR_EL2.EIS|EOS): Fixes: tag corrected to 0a35bd285f43
("arm64: Convert SCTLR_EL2 to sysreg infrastructure"); the commit
message now explains that the conversion auto-generated
SCTLR_EL2_RES1 to UL(0). Code unchanged.
Patches 2-3 (NULL vcpu guard, __deactivate_fgt typo): unchanged.
Patch 4 (new): Seed selftest_vcpu's memcache to mirror
hyp-main.c's pkvm_refill_memcache() flow; required by the
pre-check in patches 5-6.
Patches 5-6 (host->guest share/donate, formerly v1 patches 5-6):
reworked to pre-check the vcpu memcache against
kvm_mmu_cache_min_pages() during the existing pre-check pass,
before any state mutation. The WARN_ON() around
kvm_pgtable_stage2_map() then asserts an invariant the pre-check
pass establishes, rather than swallowing a reachable -ENOMEM.
Dropped since v1:
- Patch 2 (HCR_EL2 sync): failure path not reachable.
- Patches 7-8 (guest->host share/unshare): the stage-2 map cannot
fail at those call sites (the leaf already exists).
Carried the `Reviewed-by:` tag (thanks!) and added `Assisted-by:` tags.
Note that with `review-prompts` in the `Assisted-by:` tags, I am
referring to the subsystem guides that I added to the base prompts [2],
which I plan to submit for upstreaming.
Cheers,
/fuad
[1] https://lore.kernel.org/all/20260428103008.696141-1-tabba@google.com/
[2] https://github.com/masoncl/review-prompts
Fuad Tabba (6):
KVM: arm64: Make EL2 exception entry and exit context-synchronization
events
KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
KVM: arm64: Fix __deactivate_fgt macro parameter typo
KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
KVM: arm64: Pre-check vcpu memcache for host->guest share
KVM: arm64: Pre-check vcpu memcache for host->guest donate
arch/arm64/include/asm/sysreg.h | 2 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 2 +-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 24 ++++++++++++++++++++++++
arch/arm64/kvm/hyp/nvhe/pkvm.c | 16 +++++++++++++++-
arch/arm64/kvm/hyp/vhe/switch.c | 3 ++-
5 files changed, 43 insertions(+), 4 deletions(-)
--
2.54.0.545.g6539524ca2-goog
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
To: maz, oliver.upton
Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
linux-kernel, stable
SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
0487 M.b D24.2.175 (p. D24-9754):
- !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
a CSE.
- FEAT_ExS: the reset value is architecturally UNKNOWN; software
must set the bit to make the entry/exit a CSE.
INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
architecturally backed unless EOS=1 (and, for entry, EIS=1).
Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
was setting both unconditionally. The conversion made
SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
models unconditionally-RES1 fields and EIS/EOS are RES1 only when
FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
are the only bits whose semantics changed for FEAT_ExS hardware
and where the kernel relies on the value being 1.
Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
unconditionally CSEs regardless of whether FEAT_ExS is implemented.
This matches the pairing in arch/arm64/kvm/config.c which treats EIS
and EOS together as RES1 under !FEAT_ExS.
Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure")
Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com>
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/include/asm/sysreg.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 736561480f36..7aa08d59d494 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -844,7 +844,7 @@
#define INIT_SCTLR_EL2_MMU_ON \
(SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I | \
SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 | \
- SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)
+ SCTLR_ELx_ITFSB | SCTLR_ELx_EIS | SCTLR_ELx_EOS | SCTLR_EL2_RES1)
#define INIT_SCTLR_EL2_MMU_OFF \
(SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
--
2.54.0.545.g6539524ca2-goog
* [PATCH v2 2/6] KVM: arm64: Guard against NULL vcpu on VHE hyp panic path
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
To: maz, oliver.upton
Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
linux-kernel, stable
On VHE, __hyp_call_panic() unconditionally calls __deactivate_traps(vcpu)
on the vcpu pointer read from host_ctxt->__hyp_running_vcpu. That pointer
is cleared after every guest exit (and is never set when no guest is
running), so an unexpected EL2 exception landing in __guest_exit_panic
(e.g. via the el2t*_invalid / el2h_irq_invalid vectors) reaches this
function with vcpu == NULL. __deactivate_traps() then dereferences vcpu
via ___deactivate_traps() -> vserror_state_is_nested() -> vcpu_has_nv()
-> vcpu->arch.features, faulting inside the panic handler and obscuring
the original failure.
The nVHE counterpart (hyp_panic() in arch/arm64/kvm/hyp/nvhe/switch.c)
already guards its vcpu-using cleanup with "if (vcpu)"; mirror that
here. sysreg_restore_host_state_vhe() does not depend on vcpu and
continues to run unconditionally, preserving panic forensics. The
trailing panic("...VCPU:%p", vcpu) prints "(null)" safely via printk's
%p handling.
Fixes: 6a0259ed29bb ("KVM: arm64: Remove hyp_panic arguments")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/hyp/vhe/switch.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 9db3f11a4754..1e8995add14f 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -663,7 +663,8 @@ static void __noreturn __hyp_call_panic(u64 spsr, u64 elr, u64 par)
host_ctxt = host_data_ptr(host_ctxt);
vcpu = host_ctxt->__hyp_running_vcpu;
- __deactivate_traps(vcpu);
+ if (vcpu)
+ __deactivate_traps(vcpu);
sysreg_restore_host_state_vhe(host_ctxt);
panic("HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n",
--
2.54.0.545.g6539524ca2-goog
* [PATCH v2 3/6] KVM: arm64: Fix __deactivate_fgt macro parameter typo
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
To: maz, oliver.upton
Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
linux-kernel, stable
__deactivate_fgt() declares its first parameter as "htcxt" but the body
references "hctxt". The parameter is unused; the macro silently captures
"hctxt" from the enclosing scope. Both existing callers
(__deactivate_traps_hfgxtr() and __deactivate_traps_ich_hfgxtr()) happen
to define a local "struct kvm_cpu_context *hctxt", so the macro works
by coincidence.
A future caller without an "hctxt" local in scope, or naming it
differently, would compile but bind to the wrong context. Align the
parameter name with the sibling __activate_fgt() macro.
The "vcpu" parameter remains unused in the body, kept for API symmetry
with __activate_fgt() (which uses it).
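The bug class is easy to demonstrate in isolation. Below is a minimal,
hypothetical C sketch (invented names; not the kernel macro) of how a
misspelled macro parameter silently binds a same-named variable from
the caller's scope:

```c
#include <assert.h>

struct ctx { int val; };

/* Parameter is spelled "htcxt" but the body uses "hctxt": the actual
 * argument is discarded, and the macro binds whatever "hctxt" happens
 * to be in scope at the expansion site. */
#define read_ctx_buggy(htcxt) ((hctxt)->val)

static int demo(void)
{
	struct ctx a = { .val = 1 }, b = { .val = 2 };
	struct ctx *hctxt = &a;		/* caller happens to use this name */

	/* Compiles cleanly, but reads a.val (1), not b.val (2). */
	return read_ctx_buggy(&b);
}
```

Renaming the parameter makes the macro expand its actual argument, so a
future caller without a local `hctxt` fails to compile instead of
silently reading the wrong context.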
Fixes: f5a5a406b4b8 ("KVM: arm64: Propagate and handle Fine-Grained UNDEF bits")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/hyp/include/hyp/switch.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 98b2976837b1..bf0eb5e43427 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -245,7 +245,7 @@ static inline void __activate_traps_ich_hfgxtr(struct kvm_vcpu *vcpu)
__activate_fgt(hctxt, vcpu, ICH_HFGITR_EL2);
}
-#define __deactivate_fgt(htcxt, vcpu, reg) \
+#define __deactivate_fgt(hctxt, vcpu, reg) \
do { \
write_sysreg_s(ctxt_sys_reg(hctxt, reg), \
SYS_ ## reg); \
--
2.54.0.545.g6539524ca2-goog
* [PATCH v2 4/6] KVM: arm64: Seed pkvm_ownership_selftest vcpu memcache
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
To: maz, oliver.upton
Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
linux-kernel, stable
The hypercall handlers call pkvm_refill_memcache() to top up the
hyp_vcpu memcache before invoking __pkvm_host_{share,donate}_guest().
pkvm_ownership_selftest invokes those functions directly with a
static selftest_vcpu that has an empty memcache.
Seed selftest_vcpu's memcache from the prepopulated selftest
pages, leaving the remainder for selftest_vm.pool. Required by
the memcache-sufficiency pre-check added in the following
patches.
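For readers unfamiliar with the memcache, the idea is an intrusive
stack of free pages that the stage-2 walker pops from when it needs a
fresh table page. A hypothetical C sketch of that idea (illustrative;
not the kernel's push_hyp_memcache() API, which additionally tracks
physical addresses via a conversion callback):

```c
#include <assert.h>
#include <stddef.h>

/* A free page doubles as a stack node: its first word stores the
 * pointer to the next free page. */
struct memcache {
	void *head;
	unsigned long nr_pages;
};

static void mc_push(struct memcache *mc, void *page)
{
	*(void **)page = mc->head;	/* link to previous head */
	mc->head = page;
	mc->nr_pages++;
}

static void *mc_pop(struct memcache *mc)
{
	void *page = mc->head;

	if (!page)
		return NULL;		/* empty cache: allocation fails */
	mc->head = *(void **)page;
	mc->nr_pages--;
	return page;
}
```

Seeding is then just a push of each spare selftest page, so that the
pre-check in the following patches sees a non-empty cache.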
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/hyp/nvhe/pkvm.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 7ed96d64d611..deee7947d694 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -751,16 +751,30 @@ static struct pkvm_hyp_vcpu selftest_vcpu = {
struct pkvm_hyp_vcpu *init_selftest_vm(void *virt)
{
struct hyp_page *p = hyp_virt_to_page(virt);
+ unsigned long min_pages, seeded = 0;
int i;
selftest_vm.kvm.arch.mmu.vtcr = host_mmu.arch.mmu.vtcr;
WARN_ON(kvm_guest_prepare_stage2(&selftest_vm, virt));
+ /*
+ * Mirror pkvm_refill_memcache() for the share/donate pre-checks;
+ * the selftest invokes those functions directly and would
+ * otherwise see an empty memcache.
+ */
+ min_pages = kvm_mmu_cache_min_pages(&selftest_vm.kvm.arch.mmu);
+
for (i = 0; i < pkvm_selftest_pages(); i++) {
if (p[i].refcount)
continue;
p[i].refcount = 1;
- hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
+ if (seeded < min_pages) {
+ push_hyp_memcache(&selftest_vcpu.vcpu.arch.pkvm_memcache,
+ hyp_page_to_virt(&p[i]), hyp_virt_to_phys);
+ seeded++;
+ } else {
+ hyp_put_page(&selftest_vm.pool, hyp_page_to_virt(&p[i]));
+ }
}
selftest_vm.kvm.arch.pkvm.handle = __pkvm_reserve_vm();
--
2.54.0.545.g6539524ca2-goog
* [PATCH v2 5/6] KVM: arm64: Pre-check vcpu memcache for host->guest share
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
To: maz, oliver.upton
Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
linux-kernel, stable
__pkvm_host_share_guest() ends with kvm_pgtable_stage2_map() to
install the guest stage-2 mapping, after a forward pass that mutates
the host vmemmap (sets PKVM_PAGE_SHARED_OWNED and increments
host_share_guest_count) for every page in the range. The map's
return value is wrapped in WARN_ON() and otherwise discarded,
asserting that the call cannot fail.
WARN_ON() at nVHE EL2 panics, so this assertion is only correct if
the call genuinely cannot fail. kvm_pgtable_stage2_map() can fail
with -ENOMEM when the stage-2 walker exhausts the caller's
memcache, and the host controls the vcpu memcache via the topup
interface, so an under-provisioned share request would otherwise
turn a recoverable -ENOMEM into a fatal hyp panic.
Bound the worst-case walker allocation in the existing pre-check
pass so that kvm_pgtable_stage2_map() cannot fail at the call
site, using kvm_mmu_cache_min_pages() -- the same bound host EL1
uses for its own stage-2 maps. If the vcpu memcache holds fewer
pages, return -ENOMEM before any state mutation.
Fixes: d0bd3e6570ae ("KVM: arm64: Introduce __pkvm_host_share_guest()")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 28a471d1927c..e428304f94f2 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1369,6 +1369,22 @@ int __pkvm_host_reclaim_page_guest(u64 gfn, struct pkvm_hyp_vm *vm)
return ret && ret != -EHWPOISON ? ret : 0;
}
+/*
+ * share/donate install at most one stage-2 leaf (PAGE_SIZE, or one
+ * KVM_PGTABLE_LAST_LEVEL - 1 block for share). kvm_mmu_cache_min_pages()
+ * bounds the worst-case allocation: exact for the PAGE_SIZE leaf,
+ * conservative by one for the block.
+ */
+static int __guest_check_pgtable_memcache(struct pkvm_hyp_vcpu *vcpu)
+{
+ struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
+
+ if (vcpu->vcpu.arch.pkvm_memcache.nr_pages < kvm_mmu_cache_min_pages(vm->pgt.mmu))
+ return -ENOMEM;
+
+ return 0;
+}
+
int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
{
struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
@@ -1453,6 +1469,10 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu
}
}
+ ret = __guest_check_pgtable_memcache(vcpu);
+ if (ret)
+ goto unlock;
+
for_each_hyp_page(page, phys, size) {
set_host_state(page, PKVM_PAGE_SHARED_OWNED);
page->host_share_guest_count++;
--
2.54.0.545.g6539524ca2-goog
* [PATCH v2 6/6] KVM: arm64: Pre-check vcpu memcache for host->guest donate
From: Fuad Tabba @ 2026-05-01 11:21 UTC (permalink / raw)
To: maz, oliver.upton
Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
tabba, catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
linux-kernel, stable
__pkvm_host_donate_guest() flips the host stage-2 PTE for the
donated page to a non-valid annotation via
host_stage2_set_owner_metadata_locked() and then calls
kvm_pgtable_stage2_map() to install the matching guest stage-2
mapping. The map's return value is wrapped in WARN_ON() and
otherwise discarded, asserting that the call cannot fail.
WARN_ON() at nVHE EL2 panics, so this assertion is only correct
if the call genuinely cannot fail. kvm_pgtable_stage2_map() can
fail with -ENOMEM even at PAGE_SIZE granularity: the donate path
verifies PKVM_NOPAGE for the guest IPA before the map, so the
walker must allocate fresh page-table pages from the vcpu
memcache, and the host controls the vcpu memcache via the topup
interface. An under-provisioned donation request would otherwise
turn a recoverable -ENOMEM into a fatal hyp panic.
Bound the worst-case walker allocation alongside the existing
__host_check_page_state_range() / __guest_check_page_state_range()
pre-checks, using the helper introduced for host->guest share. If
the vcpu memcache holds fewer pages than kvm_mmu_cache_min_pages(),
return -ENOMEM before any state mutation.
Fixes: 1e579adca177 ("KVM: arm64: Introduce __pkvm_host_donate_guest()")
Assisted-by: Gemini:gemini-3.1-pro review-prompts
Signed-off-by: Fuad Tabba <tabba@google.com>
---
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index e428304f94f2..c7f7149c4796 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1404,6 +1404,10 @@ int __pkvm_host_donate_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu)
if (ret)
goto unlock;
+ ret = __guest_check_pgtable_memcache(vcpu);
+ if (ret)
+ goto unlock;
+
meta = host_stage2_encode_gfn_meta(vm, gfn);
WARN_ON(host_stage2_set_owner_metadata_locked(phys, PAGE_SIZE,
PKVM_ID_GUEST, meta));
--
2.54.0.545.g6539524ca2-goog
* Re: [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events
From: Ben Horgan @ 2026-05-01 13:47 UTC (permalink / raw)
To: Fuad Tabba, maz, oliver.upton
Cc: james.morse, suzuki.poulose, yuzenghui, qperret, vdonnefort,
catalin.marinas, will, yaoyuan, linux-arm-kernel, kvmarm,
linux-kernel, stable
Hi Fuad,
On 5/1/26 12:21, Fuad Tabba wrote:
> SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
> exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
> 0487 M.b D24.2.175 (p. D24-9754):
>
> - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
> a CSE.
> - FEAT_ExS: the reset value is architecturally UNKNOWN; software
> must set the bit to make the entry/exit a CSE.
>
> INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
> bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
> synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
> MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
> ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
> architecturally backed unless EOS=1 (and, for entry, EIS=1).
>
> Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
> infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
> included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
> was setting both unconditionally. The conversion made
> SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
> models unconditionally-RES1 fields and EIS/EOS are RES1 only when
> FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
> other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
> 28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
> DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
> are the only bits whose semantics changed for FEAT_ExS hardware
> and where the kernel relies on the value being 1.
>
> Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
> INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
> unconditionally CSEs regardless of whether FEAT_ExS is implemented.
> This matches the pairing in arch/arm64/kvm/config.c which treats EIS
> and EOS together as RES1 under !FEAT_ExS.
In v1 you also had this sentence:
"INIT_SCTLR_EL2_MMU_OFF is left unchanged: that path is used during
very early EL2 init and the EL2 MMU-off transition, neither of which
relies on these bits in the same way."
To me, it seems useful to keep that sentence as it makes it clear that INIT_SCTLR_EL2_MMU_OFF is purposely not changed.
Or is there a reason why you dropped it? Perhaps it's just obvious to people more familiar with this code.
Thanks,
Ben
>
> Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure")
> Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com>
> Assisted-by: Gemini:gemini-3.1-pro review-prompts
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> arch/arm64/include/asm/sysreg.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 736561480f36..7aa08d59d494 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -844,7 +844,7 @@
> #define INIT_SCTLR_EL2_MMU_ON \
> (SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I | \
> SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 | \
> - SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)
> + SCTLR_ELx_ITFSB | SCTLR_ELx_EIS | SCTLR_ELx_EOS | SCTLR_EL2_RES1)
>
> #define INIT_SCTLR_EL2_MMU_OFF \
> (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
* Re: [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events
From: Fuad Tabba @ 2026-05-01 15:01 UTC (permalink / raw)
To: Ben Horgan
Cc: maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
qperret, vdonnefort, catalin.marinas, will, yaoyuan,
linux-arm-kernel, kvmarm, linux-kernel, stable
Hi Ben,
On Fri, 1 May 2026 at 14:47, Ben Horgan <ben.horgan@arm.com> wrote:
>
> Hi Fuad,
>
> On 5/1/26 12:21, Fuad Tabba wrote:
> > SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
> > exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
> > 0487 M.b D24.2.175 (p. D24-9754):
> >
> > - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
> > a CSE.
> > - FEAT_ExS: the reset value is architecturally UNKNOWN; software
> > must set the bit to make the entry/exit a CSE.
> >
> > INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
> > bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
> > synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
> > MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
> > ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
> > architecturally backed unless EOS=1 (and, for entry, EIS=1).
> >
> > Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
> > infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
> > included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
> > was setting both unconditionally. The conversion made
> > SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
> > models unconditionally-RES1 fields and EIS/EOS are RES1 only when
> > FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
> > other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
> > 28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
> > DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
> > are the only bits whose semantics changed for FEAT_ExS hardware
> > and where the kernel relies on the value being 1.
> >
> > Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
> > INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
> > unconditionally CSEs regardless of whether FEAT_ExS is implemented.
> > This matches the pairing in arch/arm64/kvm/config.c which treats EIS
> > and EOS together as RES1 under !FEAT_ExS.
>
> In v1 you also had this sentence:
>
> "INIT_SCTLR_EL2_MMU_OFF is left unchanged: that path is used during
> very early EL2 init and the EL2 MMU-off transition, neither of which
> relies on these bits in the same way."
>
> To me, it seems useful to keep that sentence as it makes it clear that INIT_SCTLR_EL2_MMU_OFF is purposely not changed.
> Or is there a reason why you dropped it? Perhaps it's just obvious to people more familiar with this code.
To be honest, I thought the commit message was quite long, and I
wanted to make it a bit more concise. I could re-introduce it if you
think it's helpful.
Cheers,
/fuad
> Thanks,
>
> Ben
>
> >
> > Fixes: 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg infrastructure")
> > Reviewed-by: Yuan Yao <yaoyuan@linux.alibaba.com>
> > Assisted-by: Gemini:gemini-3.1-pro review-prompts
> > Signed-off-by: Fuad Tabba <tabba@google.com>
> > ---
> > arch/arm64/include/asm/sysreg.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> > index 736561480f36..7aa08d59d494 100644
> > --- a/arch/arm64/include/asm/sysreg.h
> > +++ b/arch/arm64/include/asm/sysreg.h
> > @@ -844,7 +844,7 @@
> > #define INIT_SCTLR_EL2_MMU_ON \
> > (SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I | \
> > SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 | \
> > - SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)
> > + SCTLR_ELx_ITFSB | SCTLR_ELx_EIS | SCTLR_ELx_EOS | SCTLR_EL2_RES1)
> >
> > #define INIT_SCTLR_EL2_MMU_OFF \
> > (SCTLR_EL2_RES1 | ENDIAN_SET_EL2)
>
* Re: [PATCH v2 1/6] KVM: arm64: Make EL2 exception entry and exit context-synchronization events
From: Ben Horgan @ 2026-05-01 15:07 UTC (permalink / raw)
To: Fuad Tabba
Cc: maz, oliver.upton, james.morse, suzuki.poulose, yuzenghui,
qperret, vdonnefort, catalin.marinas, will, yaoyuan,
linux-arm-kernel, kvmarm, linux-kernel, stable
Hi Fuad,
On 5/1/26 16:01, Fuad Tabba wrote:
> Hi Ben,
>
> On Fri, 1 May 2026 at 14:47, Ben Horgan <ben.horgan@arm.com> wrote:
>>
>> Hi Fuad,
>>
>> On 5/1/26 12:21, Fuad Tabba wrote:
>>> SCTLR_EL2.EIS and SCTLR_EL2.EOS control whether exception entry and
>>> exit at EL2 are Context Synchronisation Events (CSEs). Per ARM DDI
>>> 0487 M.b D24.2.175 (p. D24-9754):
>>>
>>> - !FEAT_ExS: the bit is RES1, so the entry/exit is unconditionally
>>> a CSE.
>>> - FEAT_ExS: the reset value is architecturally UNKNOWN; software
>>> must set the bit to make the entry/exit a CSE.
>>>
>>> INIT_SCTLR_EL2_MMU_ON in arch/arm64/include/asm/sysreg.h sets neither
>>> bit. KVM/arm64 hot paths rely on ERET from EL2 being a CSE, and on
>>> synchronous EL1->EL2 entry being a CSE, to elide explicit ISBs after
>>> MSRs to context-switching system registers (HCR_EL2, ZCR_EL2,
>>> ptrauth keys, etc.). On FEAT_ExS hardware those reliances are not
>>> architecturally backed unless EOS=1 (and, for entry, EIS=1).
>>>
>>> Until commit 0a35bd285f43 ("arm64: Convert SCTLR_EL2 to sysreg
>>> infrastructure"), SCTLR_EL2_RES1 was a hand-rolled mask that
>>> included BIT(11) (EOS) and BIT(22) (EIS), so INIT_SCTLR_EL2_MMU_ON
>>> was setting both unconditionally. The conversion made
>>> SCTLR_EL2_RES1 auto-generated; because the sysreg tooling only
>>> models unconditionally-RES1 fields and EIS/EOS are RES1 only when
>>> FEAT_ExS is absent, the auto-generated mask is UL(0). The seven
>>> other bits dropped from the old mask (positions 4, 5, 16, 18, 23,
>>> 28, 29) are unconditionally RES1 in the E2H=0 SCTLR_EL2 layout per
>>> DDI 0487 M.b D24.2.175, so dropping them is harmless. EIS and EOS
>>> are the only bits whose semantics changed for FEAT_ExS hardware
>>> and where the kernel relies on the value being 1.
>>>
>>> Make the guarantee explicit: include SCTLR_ELx_EIS | SCTLR_ELx_EOS in
>>> INIT_SCTLR_EL2_MMU_ON so that EL2 exception entry and exit are
>>> unconditionally CSEs regardless of whether FEAT_ExS is implemented.
>>> This matches the pairing in arch/arm64/kvm/config.c which treats EIS
>>> and EOS together as RES1 under !FEAT_ExS.
>>
>> In v1 you also had this sentence:
>>
>> "INIT_SCTLR_EL2_MMU_OFF is left unchanged: that path is used during
>> very early EL2 init and the EL2 MMU-off transition, neither of which
>> relies on these bits in the same way."
>>
>> To me, it seems useful to keep that sentence as it makes it clear that INIT_SCTLR_EL2_MMU_OFF is purposely not changed.
>> Or is there a reason why you dropped it? Perhaps it's just obvious to people more familiar with this code.
>
> To be honest, I thought the commit message was quite long, and I
> wanted to make it a bit more concise. I could re-introduce it if you
> think it's helpful.
I don't really mind but it was useful in helping me understand your change.
Thanks,
Ben