public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [kvm-unit-tests PATCH v2 0/7] arm64: Add Stage-2 MMU and Nested Guest Framework
@ 2026-04-13 20:46 Jing Zhang
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 1/7] lib: arm64: Generalize ESR exception class definitions for EL2 support Jing Zhang
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Jing Zhang @ 2026-04-13 20:46 UTC (permalink / raw)
  To: KVM, KVMARM, Marc Zyngier, Joey Gouly, Wei-Lin Chang, Yao Yuan
  Cc: Oliver Upton, Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis, Jing Zhang

This patch series introduces a lightweight infrastructure for managing ARM64
Stage-2 translation tables and executing nested guests. These components are
essential for testing advanced virtualization features such as nested
virtualization (NV) and GICv4 direct interrupt injection.

The series provides a generic Stage-2 MMU library supporting multiple
translation granules (4K, 16K, 64K) and dynamic page table management.
Building on this, it adds a guest execution framework that handles guest
lifecycle management, context switching, and guest exit routing. A new test
case exercises Stage-2 MMU demand paging to verify fault handling.

Please note that this is a preliminary implementation intended as a starting
baseline for future work in virtualization testing. Because it is an
early-stage baseline, some portions of the code may only happen to work in
their current state, and critical architectural elements or edge-case
handling may be missing that will need to be addressed as the framework
matures.

---

* v1 -> v2
  - Used generated `struct guest` offset instead of hardcoding.
  - Cleaned up register definitions.
  - Refined EL1/EL2 exception vector tables.
  - Split monolithic patches into a series of granular commits.
  - Addressed other review feedback.

[1] https://lore.kernel.org/kvmarm/20260316224349.2360482-1-jingzhangos@google.com/

---

Jing Zhang (7):
  lib: arm64: Generalize ESR exception class definitions for EL2 support
  lib: arm64: Add stage2 page table management library
  lib: arm64: Generalize exception vector definitions for EL2 support
  lib: arm64: Add foundational guest execution framework
  lib: arm64: Add support for guest exit exception handling
  lib: arm64: Add guest-internal exception handling (EL1)
  arm64: Add Stage-2 MMU demand paging test

 arm/Makefile.arm64         |   4 +
 arm/debug.c                |   6 +-
 arm/gic.c                  |   2 +-
 arm/micro-bench.c          |   4 +-
 arm/mte.c                  |   6 +-
 arm/pl031.c                |   2 +-
 arm/pmu.c                  |   2 +-
 arm/psci.c                 |   2 +-
 arm/selftest.c             |   6 +-
 arm/stage2-mmu-test.c      | 107 ++++++++++
 arm/timer.c                |   6 +-
 lib/arm64/asm-offsets.c    |  15 ++
 lib/arm64/asm/esr.h        |   5 +-
 lib/arm64/asm/guest.h      |  91 +++++++++
 lib/arm64/asm/processor.h  |  32 +--
 lib/arm64/asm/stage2_mmu.h |  70 +++++++
 lib/arm64/asm/sysreg.h     |   7 +
 lib/arm64/guest.c          | 158 +++++++++++++++
 lib/arm64/guest_arch.S     | 248 +++++++++++++++++++++++
 lib/arm64/processor.c      |  18 +-
 lib/arm64/stage2_mmu.c     | 403 +++++++++++++++++++++++++++++++++++++
 21 files changed, 1149 insertions(+), 45 deletions(-)
 create mode 100644 arm/stage2-mmu-test.c
 create mode 100644 lib/arm64/asm/guest.h
 create mode 100644 lib/arm64/asm/stage2_mmu.h
 create mode 100644 lib/arm64/guest.c
 create mode 100644 lib/arm64/guest_arch.S
 create mode 100644 lib/arm64/stage2_mmu.c


base-commit: 86e53277ac80dabb04f4fa5fa6a6cc7649392bdc
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [kvm-unit-tests PATCH v2 1/7] lib: arm64: Generalize ESR exception class definitions for EL2 support
  2026-04-13 20:46 [kvm-unit-tests PATCH v2 0/7] arm64: Add Stage-2 MMU and Nested Guest Framework Jing Zhang
@ 2026-04-13 20:46 ` Jing Zhang
  2026-04-16 15:27   ` Joey Gouly
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 2/7] lib: arm64: Add stage2 page table management library Jing Zhang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Jing Zhang @ 2026-04-13 20:46 UTC (permalink / raw)
  To: KVM, KVMARM, Marc Zyngier, Joey Gouly, Wei-Lin Chang, Yao Yuan
  Cc: Oliver Upton, Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis, Jing Zhang

Generalize some Exception Syndrome Register (ESR) definitions by
renaming EL1-specific macros to ELx equivalents. This allows these
constants to be shared between EL1 and EL2, supporting the upcoming
S2MMU library implementation.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 lib/arm64/asm/esr.h   |  5 +++--
 lib/arm64/processor.c | 10 +++++-----
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/lib/arm64/asm/esr.h b/lib/arm64/asm/esr.h
index 335343c5..8437916f 100644
--- a/lib/arm64/asm/esr.h
+++ b/lib/arm64/asm/esr.h
@@ -12,7 +12,7 @@
 #define ESR_EL1_CM		(1 << 8)
 #define ESR_EL1_IL		(1 << 25)
 
-#define ESR_EL1_EC_SHIFT	(26)
+#define ESR_ELx_EC_SHIFT	(26)
 #define ESR_EL1_EC_UNKNOWN	(0x00)
 #define ESR_EL1_EC_WFI		(0x01)
 #define ESR_EL1_EC_CP15_32	(0x03)
@@ -25,12 +25,13 @@
 #define ESR_EL1_EC_ILL_ISS	(0x0E)
 #define ESR_EL1_EC_SVC32	(0x11)
 #define ESR_EL1_EC_SVC64	(0x15)
+#define ESR_ELx_EC_HVC64	(0x16)
 #define ESR_EL1_EC_SYS64	(0x18)
 #define ESR_EL1_EC_SVE		(0x19)
 #define ESR_EL1_EC_IABT_EL0	(0x20)
 #define ESR_EL1_EC_IABT_EL1	(0x21)
 #define ESR_EL1_EC_PC_ALIGN	(0x22)
-#define ESR_EL1_EC_DABT_EL0	(0x24)
+#define ESR_ELx_EC_DABT_LOW	(0x24)
 #define ESR_EL1_EC_DABT_EL1	(0x25)
 #define ESR_EL1_EC_SP_ALIGN	(0x26)
 #define ESR_EL1_EC_FP_EXC32	(0x28)
diff --git a/lib/arm64/processor.c b/lib/arm64/processor.c
index f9fea519..bde3caa5 100644
--- a/lib/arm64/processor.c
+++ b/lib/arm64/processor.c
@@ -48,7 +48,7 @@ static const char *ec_names[EC_MAX] = {
 	[ESR_EL1_EC_IABT_EL0]		= "IABT_EL0",
 	[ESR_EL1_EC_IABT_EL1]		= "IABT_EL1",
 	[ESR_EL1_EC_PC_ALIGN]		= "PC_ALIGN",
-	[ESR_EL1_EC_DABT_EL0]		= "DABT_EL0",
+	[ESR_ELx_EC_DABT_LOW]		= "DABT_EL0",
 	[ESR_EL1_EC_DABT_EL1]		= "DABT_EL1",
 	[ESR_EL1_EC_SP_ALIGN]		= "SP_ALIGN",
 	[ESR_EL1_EC_FP_EXC32]		= "FP_EXC32",
@@ -82,7 +82,7 @@ void show_regs(struct pt_regs *regs)
 
 bool get_far(unsigned int esr, unsigned long *far)
 {
-	unsigned int ec = esr >> ESR_EL1_EC_SHIFT;
+	unsigned int ec = esr >> ESR_ELx_EC_SHIFT;
 
 	asm volatile("mrs %0, far_el1": "=r" (*far));
 
@@ -90,7 +90,7 @@ bool get_far(unsigned int esr, unsigned long *far)
 	case ESR_EL1_EC_IABT_EL0:
 	case ESR_EL1_EC_IABT_EL1:
 	case ESR_EL1_EC_PC_ALIGN:
-	case ESR_EL1_EC_DABT_EL0:
+	case ESR_ELx_EC_DABT_LOW:
 	case ESR_EL1_EC_DABT_EL1:
 	case ESR_EL1_EC_WATCHPT_EL0:
 	case ESR_EL1_EC_WATCHPT_EL1:
@@ -108,7 +108,7 @@ static void bad_exception(enum vector v, struct pt_regs *regs,
 {
 	unsigned long far;
 	bool far_valid = get_far(esr, &far);
-	unsigned int ec = esr >> ESR_EL1_EC_SHIFT;
+	unsigned int ec = esr >> ESR_ELx_EC_SHIFT;
 	uintptr_t text = (uintptr_t)&_text;
 
 	printf("Load address: %" PRIxPTR "\n", text);
@@ -158,7 +158,7 @@ void default_vector_sync_handler(enum vector v, struct pt_regs *regs,
 				 unsigned int esr)
 {
 	struct thread_info *ti = thread_info_sp(regs->sp);
-	unsigned int ec = esr >> ESR_EL1_EC_SHIFT;
+	unsigned int ec = esr >> ESR_ELx_EC_SHIFT;
 
 	if (ti->flags & TIF_USER_MODE) {
 		if (ec < EC_MAX && ti->exception_handlers[v][ec]) {
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [kvm-unit-tests PATCH v2 2/7] lib: arm64: Add stage2 page table management library
  2026-04-13 20:46 [kvm-unit-tests PATCH v2 0/7] arm64: Add Stage-2 MMU and Nested Guest Framework Jing Zhang
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 1/7] lib: arm64: Generalize ESR exception class definitions for EL2 support Jing Zhang
@ 2026-04-13 20:46 ` Jing Zhang
  2026-04-16 15:19   ` Joey Gouly
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 3/7] lib: arm64: Generalize exception vector definitions for EL2 support Jing Zhang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Jing Zhang @ 2026-04-13 20:46 UTC (permalink / raw)
  To: KVM, KVMARM, Marc Zyngier, Joey Gouly, Wei-Lin Chang, Yao Yuan
  Cc: Oliver Upton, Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis, Jing Zhang

Tests running at EL2 (hypervisor level) often need to manage Stage 2
translation tables to control the translation of Intermediate Physical
Addresses (IPAs, i.e. guest physical addresses) to Physical Addresses (PAs).

Add a generic Stage 2 MMU library that provides software management of
ARM64 Stage 2 translation tables.

The library features include:
- Support for 4K, 16K, and 64K translation granules.
- Dynamic page table allocation using the allocator.
- Support for 2M block mappings where applicable.
- APIs for mapping, unmapping, enabling, and disabling the Stage 2 MMU.
- Basic fault info reporting (ESR, FAR, HPFAR).

This infrastructure is necessary for upcoming virtualization and
hypervisor-mode tests.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arm/Makefile.arm64         |   1 +
 lib/arm64/asm/stage2_mmu.h |  70 +++++++
 lib/arm64/stage2_mmu.c     | 403 +++++++++++++++++++++++++++++++++++++
 3 files changed, 474 insertions(+)
 create mode 100644 lib/arm64/asm/stage2_mmu.h
 create mode 100644 lib/arm64/stage2_mmu.c

diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
index a40c830d..5e50f5ba 100644
--- a/arm/Makefile.arm64
+++ b/arm/Makefile.arm64
@@ -40,6 +40,7 @@ cflatobjs += lib/arm64/stack.o
 cflatobjs += lib/arm64/processor.o
 cflatobjs += lib/arm64/spinlock.o
 cflatobjs += lib/arm64/gic-v3-its.o lib/arm64/gic-v3-its-cmd.o
+cflatobjs += lib/arm64/stage2_mmu.o
 
 ifeq ($(CONFIG_EFI),y)
 cflatobjs += lib/acpi.o
diff --git a/lib/arm64/asm/stage2_mmu.h b/lib/arm64/asm/stage2_mmu.h
new file mode 100644
index 00000000..a5324108
--- /dev/null
+++ b/lib/arm64/asm/stage2_mmu.h
@@ -0,0 +1,70 @@
+/*
+ * Copyright (C) 2026, Google LLC.
+ * Author: Jing Zhang <jingzhangos@google.com>
+ *
+ * SPDX-License-Identifier: LGPL-2.0-or-later
+ */
+#ifndef _ASMARM64_STAGE2_MMU_H_
+#define _ASMARM64_STAGE2_MMU_H_
+
+#include <libcflat.h>
+#include <asm/page.h>
+#include <asm/pgtable.h>
+
+#define pte_is_table(pte)	(pte_val(pte) & PTE_TABLE_BIT)
+
+/* Stage-2 Memory Attributes (MemAttr[3:0]) */
+#define S2_MEMATTR_NORMAL	(0xFUL << 2) /* Normal Memory, Outer/Inner Write-Back */
+#define S2_MEMATTR_DEVICE	(0x0UL << 2) /* Device-nGnRnE */
+
+/* Stage-2 Access Permissions (S2AP[1:0]) */
+#define S2AP_NONE	(0UL << 6)
+#define S2AP_RO		(1UL << 6) /* Read-only */
+#define S2AP_WO		(2UL << 6) /* Write-only */
+#define S2AP_RW		(3UL << 6) /* Read-Write */
+
+/* Flags for mapping */
+#define S2_MAP_RW	(S2AP_RW | S2_MEMATTR_NORMAL | PTE_AF | PTE_SHARED)
+#define S2_MAP_DEVICE	(S2AP_RW | S2_MEMATTR_DEVICE | PTE_AF)
+
+enum s2_granule {
+	S2_PAGE_4K,
+	S2_PAGE_16K,
+	S2_PAGE_64K,
+};
+
+/* Main Stage-2 MMU Structure */
+struct s2_mmu {
+	pgd_t *pgd;
+	int vmid;
+
+	/* Configuration */
+	enum s2_granule granule;
+	bool allow_block_mappings;
+
+	/* Internal helpers calculated from granule & VA_BITS */
+	unsigned int page_shift;
+	unsigned int level_shift;
+	int root_level; /* 0, 1, or 2 */
+	unsigned long page_size;
+	unsigned long block_size;
+};
+
+/* API */
+/* Initialize an s2_mmu struct with specific settings */
+struct s2_mmu *s2mmu_init(int vmid, enum s2_granule granule, bool allow_block_mappings);
+
+/* Management */
+void s2mmu_destroy(struct s2_mmu *mmu);
+void s2mmu_map(struct s2_mmu *mmu, unsigned long ipa, unsigned long pa,
+	       unsigned long size, unsigned long flags);
+void s2mmu_unmap(struct s2_mmu *mmu, unsigned long ipa, unsigned long size);
+
+/* Activation */
+void s2mmu_enable(struct s2_mmu *mmu);
+void s2mmu_disable(struct s2_mmu *mmu);
+
+/* Debug */
+void s2mmu_print_fault_info(void);
+
+#endif /* _ASMARM64_STAGE2_MMU_H_ */
diff --git a/lib/arm64/stage2_mmu.c b/lib/arm64/stage2_mmu.c
new file mode 100644
index 00000000..cf419e28
--- /dev/null
+++ b/lib/arm64/stage2_mmu.c
@@ -0,0 +1,403 @@
+/*
+ * Copyright (C) 2026, Google LLC.
+ * Author: Jing Zhang <jingzhangos@google.com>
+ *
+ * SPDX-License-Identifier: LGPL-2.0-or-later
+ */
+#include <libcflat.h>
+#include <alloc.h>
+#include <asm/stage2_mmu.h>
+#include <asm/sysreg.h>
+#include <asm/io.h>
+#include <asm/barrier.h>
+#include <alloc_page.h>
+
+/* VTCR_EL2 Definitions */
+#define VTCR_SH0_INNER		(3UL << 12)
+#define VTCR_ORGN0_WBWA		(1UL << 10)
+#define VTCR_IRGN0_WBWA		(1UL << 8)
+
+/* TG0 Encodings */
+#define VTCR_TG0_SHIFT		14
+#define VTCR_TG0_4K		(0UL << VTCR_TG0_SHIFT)
+#define VTCR_TG0_64K		(1UL << VTCR_TG0_SHIFT)
+#define VTCR_TG0_16K		(2UL << VTCR_TG0_SHIFT)
+
+/* Physical Address Size (PS) - Derive from VA_BITS for simplicity or max */
+#define VTCR_PS_SHIFT		16
+#if VA_BITS > 40
+#define VTCR_PS_VAL		(5UL << VTCR_PS_SHIFT) /* 48-bit PA */
+#else
+#define VTCR_PS_VAL		(2UL << VTCR_PS_SHIFT) /* 40-bit PA */
+#endif
+
+struct s2_mmu *s2mmu_init(int vmid, enum s2_granule granule, bool allow_block_mappings)
+{
+	struct s2_mmu *mmu = calloc(1, sizeof(struct s2_mmu));
+	int order = 0;
+
+	mmu->vmid = vmid;
+	mmu->granule = granule;
+	mmu->allow_block_mappings = allow_block_mappings;
+
+	/* Configure shifts based on granule */
+	switch (granule) {
+	case S2_PAGE_4K:
+		mmu->page_shift = 12;
+		mmu->level_shift = 9;
+		/*
+		 * Determine Root Level for 4K:
+		 * VA_BITS > 39 (e.g. 48) -> Start L0
+		 * VA_BITS <= 39 (e.g. 32, 36) -> Start L1
+		 */
+		mmu->root_level = (VA_BITS > 39) ? 0 : 1;
+		break;
+	case S2_PAGE_16K:
+		mmu->page_shift = 14;
+		mmu->level_shift = 11;
+		/*
+		 * 16K: L1 covers 47 bits. L0 not valid for 16K
+		 * Start L1 for 47 bits. Start L2 for 36 bits.
+		 */
+		mmu->root_level = (VA_BITS > 36) ? 1 : 2;
+		break;
+	case S2_PAGE_64K:
+		mmu->page_shift = 16;
+		mmu->level_shift = 13;
+		/* 64K: L1 covers 52 bits. L2 covers 42 bits. */
+		mmu->root_level = (VA_BITS > 42) ? 1 : 2;
+		break;
+	}
+
+	mmu->page_size = 1UL << mmu->page_shift;
+	mmu->block_size = 1UL << (mmu->page_shift + mmu->level_shift);
+
+	/* Alloc PGD. Use order for allocation size */
+	if (mmu->page_size > PAGE_SIZE) {
+		order = __builtin_ctz(mmu->page_size / PAGE_SIZE);
+	}
+	mmu->pgd = (pgd_t *)alloc_pages(order);
+	if (mmu->pgd) {
+		memset(mmu->pgd, 0, mmu->page_size);
+	} else {
+		free(mmu);
+		return NULL;
+	}
+
+	return mmu;
+}
+
+static unsigned long s2mmu_get_addr_mask(struct s2_mmu *mmu)
+{
+	switch (mmu->granule) {
+	case S2_PAGE_16K:
+		return GENMASK_ULL(47, 14);
+	case S2_PAGE_64K:
+		return GENMASK_ULL(47, 16);
+	default:
+		return GENMASK_ULL(47, 12); /* 4K */
+	}
+}
+
+static void s2mmu_free_tables(struct s2_mmu *mmu, pte_t *table, int level)
+{
+	unsigned long entries = 1UL << mmu->level_shift;
+	unsigned long mask = s2mmu_get_addr_mask(mmu);
+	unsigned long i;
+
+	/*
+	 * Recurse if not leaf level
+	 * Level 3 is always leaf page. Levels 0-2 can be Table or Block.
+	 */
+	if (level < 3) {
+		for (i = 0; i < entries; i++) {
+			pte_t entry = table[i];
+			if ((pte_valid(entry) && pte_is_table(entry))) {
+				pte_t *next = (pte_t *)phys_to_virt(pte_val(entry) & mask);
+				s2mmu_free_tables(mmu, next, level + 1);
+			}
+		}
+	}
+
+	free_pages(table);
+}
+
+void s2mmu_destroy(struct s2_mmu *mmu)
+{
+	if (mmu->pgd)
+		s2mmu_free_tables(mmu, (pte_t *)mmu->pgd, mmu->root_level);
+	free(mmu);
+}
+
+void s2mmu_enable(struct s2_mmu *mmu)
+{
+	unsigned long vtcr = VTCR_PS_VAL | VTCR_SH0_INNER |
+			     VTCR_ORGN0_WBWA | VTCR_IRGN0_WBWA;
+	unsigned long t0sz = 64 - VA_BITS;
+	unsigned long vttbr;
+
+	switch (mmu->granule) {
+	case S2_PAGE_4K:
+		vtcr |= VTCR_TG0_4K;
+		/* SL0 Encodings for 4K: 0=L2, 1=L1, 2=L0 */
+		if (mmu->root_level == 0)
+			vtcr |= (2UL << 6); /* Start L0 */
+		else if (mmu->root_level == 1)
+			vtcr |= (1UL << 6); /* Start L1 */
+		else
+			vtcr |= (0UL << 6); /* Start L2 */
+		break;
+	case S2_PAGE_16K:
+		vtcr |= VTCR_TG0_16K;
+		/* SL0 Encodings for 16K: 0=L3(Res), 1=L2, 2=L1, 3=L0(Res) */
+		if (mmu->root_level == 1)
+			vtcr |= (2UL << 6); /* Start L1 */
+		else
+			vtcr |= (1UL << 6); /* Start L2 */
+		break;
+	case S2_PAGE_64K:
+		vtcr |= VTCR_TG0_64K;
+		/* SL0 Encodings for 64K: 0=L3(Res), 1=L2, 2=L1, 3=L0(Res) */
+		if (mmu->root_level == 1)
+			vtcr |= (2UL << 6); /* Start L1 */
+		else
+			vtcr |= (1UL << 6); /* Start L2 */
+		break;
+	}
+
+	vtcr |= t0sz;
+
+	write_sysreg(vtcr, vtcr_el2);
+
+	/* Setup VTTBR */
+	vttbr = virt_to_phys(mmu->pgd);
+	vttbr |= ((unsigned long)mmu->vmid << 48);
+	write_sysreg(vttbr, vttbr_el2);
+
+	asm volatile("tlbi vmalls12e1is");
+
+	dsb(ish);
+	isb();
+}
+
+void s2mmu_disable(struct s2_mmu *mmu)
+{
+	write_sysreg(0, vttbr_el2);
+	isb();
+}
+
+static pte_t *get_pte(struct s2_mmu *mmu, pte_t *table, unsigned long idx, bool alloc)
+{
+	unsigned long mask = s2mmu_get_addr_mask(mmu);
+	pte_t entry = table[idx];
+	pte_t *next_table;
+	int order = 0;
+
+	if (pte_valid(entry)) {
+		if (pte_is_table(entry))
+			return (pte_t *)phys_to_virt(pte_val(entry) & mask);
+		/* Block Entry */
+		return NULL;
+	}
+
+	if (!alloc)
+		return NULL;
+
+	/* Allocate table memory covering the Stage-2 Granule size */
+	if (mmu->page_size > PAGE_SIZE)
+		order = __builtin_ctz(mmu->page_size / PAGE_SIZE);
+
+	next_table = (pte_t *)alloc_pages(order);
+	if (next_table)
+		memset(next_table, 0, mmu->page_size);
+
+	pte_val(entry) = virt_to_phys(next_table) | PTE_TABLE_BIT | PTE_VALID;
+	WRITE_ONCE(table[idx], entry);
+
+	return next_table;
+}
+
+void s2mmu_map(struct s2_mmu *mmu, unsigned long ipa, unsigned long pa,
+	       unsigned long size, unsigned long flags)
+{
+	unsigned long level_mask, level_shift, level_size, level;
+	unsigned long start_ipa, end_ipa, idx;
+	pte_t entry, *table, *next_table;
+	bool is_block_level;
+
+	start_ipa = ipa;
+	end_ipa = ipa + size;
+	level_mask = (1UL << mmu->level_shift) - 1;
+
+	while (start_ipa < end_ipa) {
+		table = (pte_t *)mmu->pgd;
+
+		/* Walk from Root to Leaf */
+		for (level = mmu->root_level; level < 3; level++) {
+			level_shift = mmu->page_shift + (3 - level) * mmu->level_shift;
+			idx = (start_ipa >> level_shift) & level_mask;
+			level_size = 1UL << level_shift;
+
+			/*
+			 * Check for Block Mapping
+			 * Valid Block Levels:
+			 * 4K:  L1 (1G), L2 (2MB)
+			 * 16K: L2 (32MB)
+			 * 64K: L2 (512MB)
+			 */
+			is_block_level = (level == 2) ||
+				(mmu->granule == S2_PAGE_4K && level == 1);
+
+			if (mmu->allow_block_mappings && is_block_level) {
+				if ((start_ipa & (level_size - 1)) == 0 &&
+				    (pa & (level_size - 1)) == 0 &&
+				    (start_ipa + level_size) <= end_ipa) {
+					/* Map Block */
+					pte_val(entry) = (pa & ~(level_size - 1)) |
+							 flags | PTE_VALID;
+					WRITE_ONCE(table[idx], entry);
+					start_ipa += level_size;
+					pa += level_size;
+					goto next_chunk; /* Continue outer loop */
+				}
+			}
+
+			/* Move to next level */
+			next_table = get_pte(mmu, table, idx, true);
+			if (!next_table) {
+				printf("Error allocating or existing block conflict.\n");
+				return;
+			}
+			table = next_table;
+		}
+
+		/* Leaf Level (Level 3 PTE) */
+		if (level == 3) {
+			idx = (start_ipa >> mmu->page_shift) & level_mask;
+			pte_val(entry) = (pa & ~(mmu->page_size - 1)) | flags | PTE_TYPE_PAGE;
+			WRITE_ONCE(table[idx], entry);
+			start_ipa += mmu->page_size;
+			pa += mmu->page_size;
+		}
+
+next_chunk:
+		continue;
+	}
+
+	asm volatile("tlbi vmalls12e1is");
+	dsb(ish);
+	isb();
+}
+
+/*
+ * Recursive helper to unmap a range within a specific table.
+ * Returns true if the table at this level is now completely empty
+ * and should be freed by the caller.
+ */
+static bool s2mmu_unmap_level(struct s2_mmu *mmu, pte_t *table,
+			      unsigned long current_ipa, int level,
+			      unsigned long start_ipa, unsigned long end_ipa,
+			      unsigned long mask)
+{
+	unsigned long level_size, entry_ipa, entry_end;
+	bool child_empty, table_empty = true;
+	pte_t entry, *next_table;
+	unsigned int level_shift;
+	unsigned long i;
+
+	/* Calculate shift and size for this level */
+	if (level == 3) {
+		level_shift = mmu->page_shift;
+	} else {
+		level_shift = mmu->page_shift + (3 - level) * mmu->level_shift;
+	}
+	level_size = 1UL << level_shift;
+
+	/* Iterate over all entries in this table */
+	for (i = 0; i < (1UL << mmu->level_shift); i++) {
+		entry = table[i];
+		entry_ipa = current_ipa + (i * level_size);
+		entry_end = entry_ipa + level_size;
+
+		/* Skip entries completely outside our target range */
+		if (entry_end <= start_ipa || entry_ipa >= end_ipa) {
+			if (pte_valid(entry))
+				table_empty = false;
+			continue;
+		}
+
+		/*
+		 * If the entry is fully covered by the unmap range,
+		 * we can clear it (leaf) or recurse and free (table).
+		 */
+		if (entry_ipa >= start_ipa && entry_end <= end_ipa) {
+			if (pte_valid(entry)) {
+				if (pte_is_table(entry) && level < 3) {
+					/* Recurse to free children first */
+					next_table = (pte_t *)phys_to_virt(pte_val(entry) & mask);
+					s2mmu_free_tables(mmu, next_table, level + 1);
+				}
+				/* Invalidate the entry */
+				WRITE_ONCE(table[i], __pte(0));
+			}
+			continue;
+		}
+
+		/*
+		 * Partial overlap: This must be a table (split required).
+		 * If it's a Block, we can't split easily in this context
+		 * without complex logic, so we generally skip or fail.
+		 * Assuming standard breakdown: recurse into the table.
+		 */
+		if (pte_valid(entry) && pte_is_table(entry) && level < 3) {
+			next_table = (pte_t *)phys_to_virt(pte_val(entry) & mask);
+			child_empty = s2mmu_unmap_level(mmu, next_table, entry_ipa, level + 1,
+							start_ipa, end_ipa, mask);
+
+			if (child_empty) {
+				free_pages(next_table);
+				WRITE_ONCE(table[i], __pte(0));
+			} else {
+				table_empty = false;
+			}
+		} else if (pte_valid(entry)) {
+			/*
+			 * Overlap on a leaf/block entry that extends
+			 * beyond the unmap range. We cannot simply clear it.
+			 */
+			table_empty = false;
+		}
+	}
+
+	return table_empty;
+}
+
+void s2mmu_unmap(struct s2_mmu *mmu, unsigned long ipa, unsigned long size)
+{
+	unsigned long end_ipa = ipa + size;
+	unsigned long mask = s2mmu_get_addr_mask(mmu);
+
+	if (!mmu->pgd)
+		return;
+
+	/*
+	 * Start recursion from the root level.
+	 * We rarely free the PGD itself unless destroying the MMU,
+	 * so we ignore the return value here.
+	 */
+	s2mmu_unmap_level(mmu, (pte_t *)mmu->pgd, 0, mmu->root_level,
+			  ipa, end_ipa, mask);
+
+	/* Ensure TLB invalidation occurs after page table updates */
+	asm volatile("tlbi vmalls12e1is");
+	dsb(ish);
+	isb();
+}
+
+void s2mmu_print_fault_info(void)
+{
+	unsigned long esr = read_sysreg(esr_el2);
+	unsigned long far = read_sysreg(far_el2);
+	unsigned long hpfar = read_sysreg(hpfar_el2);
+	printf("Stage-2 Fault Info: ESR=0x%lx FAR=0x%lx HPFAR=0x%lx\n", esr, far, hpfar);
+}
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [kvm-unit-tests PATCH v2 3/7] lib: arm64: Generalize exception vector definitions for EL2 support
  2026-04-13 20:46 [kvm-unit-tests PATCH v2 0/7] arm64: Add Stage-2 MMU and Nested Guest Framework Jing Zhang
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 1/7] lib: arm64: Generalize ESR exception class definitions for EL2 support Jing Zhang
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 2/7] lib: arm64: Add stage2 page table management library Jing Zhang
@ 2026-04-13 20:46 ` Jing Zhang
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 4/7] lib: arm64: Add foundational guest execution framework Jing Zhang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Jing Zhang @ 2026-04-13 20:46 UTC (permalink / raw)
  To: KVM, KVMARM, Marc Zyngier, Joey Gouly, Wei-Lin Chang, Yao Yuan
  Cc: Oliver Upton, Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis, Jing Zhang

Rename EL1-specific exception vector enumerations to ELx equivalents
(e.g., EL1H to ELxH and EL0 to ELx_LOW) to make them level-agnostic.
This allows these identifiers to be reused when running at other
exception levels, such as EL2, supporting the upcoming guest
management library.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arm/debug.c               |  6 +++---
 arm/gic.c                 |  2 +-
 arm/micro-bench.c         |  4 ++--
 arm/mte.c                 |  6 +++---
 arm/pl031.c               |  2 +-
 arm/pmu.c                 |  2 +-
 arm/psci.c                |  2 +-
 arm/selftest.c            |  6 +++---
 arm/timer.c               |  6 +++---
 lib/arm64/asm/processor.h | 32 ++++++++++++++++----------------
 lib/arm64/processor.c     |  8 ++++----
 11 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/arm/debug.c b/arm/debug.c
index 0299fd28..9ba93ac2 100644
--- a/arm/debug.c
+++ b/arm/debug.c
@@ -272,7 +272,7 @@ static noinline void test_hw_bp(bool migrate)
 		return;
 	}
 
-	install_exception_handler(EL1H_SYNC, ESR_EC_HW_BP_CURRENT, hw_bp_handler);
+	install_exception_handler(ELxH_SYNC, ESR_EC_HW_BP_CURRENT, hw_bp_handler);
 
 	reset_debug_state();
 
@@ -324,7 +324,7 @@ static noinline void test_wp(bool migrate)
 		return;
 	}
 
-	install_exception_handler(EL1H_SYNC, ESR_EC_WP_CURRENT, wp_handler);
+	install_exception_handler(ELxH_SYNC, ESR_EC_WP_CURRENT, wp_handler);
 
 	reset_debug_state();
 
@@ -365,7 +365,7 @@ static noinline void test_ss(bool migrate)
 	extern unsigned char ss_start;
 	uint32_t mdscr;
 
-	install_exception_handler(EL1H_SYNC, ESR_EC_SSTEP_CURRENT, ss_handler);
+	install_exception_handler(ELxH_SYNC, ESR_EC_SSTEP_CURRENT, ss_handler);
 
 	reset_debug_state();
 
diff --git a/arm/gic.c b/arm/gic.c
index 256dd80d..cccc5ae1 100644
--- a/arm/gic.c
+++ b/arm/gic.c
@@ -180,7 +180,7 @@ static void setup_irq(irq_handler_fn handler)
 #ifdef __arm__
 	install_exception_handler(EXCPTN_IRQ, handler);
 #else
-	install_irq_handler(EL1H_IRQ, handler);
+	install_irq_handler(ELxH_IRQ, handler);
 #endif
 	local_irq_enable();
 }
diff --git a/arm/micro-bench.c b/arm/micro-bench.c
index a6a78f20..6ca36c55 100644
--- a/arm/micro-bench.c
+++ b/arm/micro-bench.c
@@ -52,7 +52,7 @@ static void gic_irq_handler(struct pt_regs *regs)
 
 static void gic_secondary_entry(void *data)
 {
-	install_irq_handler(EL1H_IRQ, gic_irq_handler);
+	install_irq_handler(ELxH_IRQ, gic_irq_handler);
 	gic_enable_defaults();
 	local_irq_enable();
 	irq_ready = true;
@@ -212,7 +212,7 @@ static void lpi_exec(void)
 static bool timer_prep(void)
 {
 	gic_enable_defaults();
-	install_irq_handler(EL1H_IRQ, gic_irq_handler);
+	install_irq_handler(ELxH_IRQ, gic_irq_handler);
 	local_irq_enable();
 
 	if (current_level() == CurrentEL_EL1)
diff --git a/arm/mte.c b/arm/mte.c
index a1bed8a7..38940d7e 100644
--- a/arm/mte.c
+++ b/arm/mte.c
@@ -192,7 +192,7 @@ static void mte_sync_test(void)
 
 	mte_exception = false;
 
-	install_exception_handler(EL1H_SYNC, ESR_EL1_EC_DABT_EL1, mte_fault_handler);
+	install_exception_handler(ELxH_SYNC, ESR_EL1_EC_DABT_EL1, mte_fault_handler);
 
 	mem_read(tagged(mem, 2), &val);
 
@@ -218,12 +218,12 @@ static void mte_asymm_test(void)
 	mte_set_tcf(MTE_TCF_ASYMM);
 	mte_exception = false;
 
-	install_exception_handler(EL1H_SYNC, ESR_EL1_EC_DABT_EL1, mte_fault_handler);
+	install_exception_handler(ELxH_SYNC, ESR_EL1_EC_DABT_EL1, mte_fault_handler);
 
 	mem_read(tagged(mem, 3), &val);
 	report((val == 0) && mte_exception && (get_clear_tfsr() == 0), "read");
 
-	install_exception_handler(EL1H_SYNC, ESR_EL1_EC_DABT_EL1, NULL);
+	install_exception_handler(ELxH_SYNC, ESR_EL1_EC_DABT_EL1, NULL);
 
 	mem_write(tagged(mem, 4), 0xaaaaaaaa);
 	report((*mem == 0xaaaaaaaa) && (get_clear_tfsr() == TFSR_EL1_TF0), "write");
diff --git a/arm/pl031.c b/arm/pl031.c
index c56805e4..3c739d4d 100644
--- a/arm/pl031.c
+++ b/arm/pl031.c
@@ -162,7 +162,7 @@ static int check_rtc_irq(void)
 	writel(before + seconds_to_wait, &pl031->mr);
 
 #ifdef __aarch64__
-	install_irq_handler(EL1H_IRQ, irq_handler);
+	install_irq_handler(ELxH_IRQ, irq_handler);
 #else
 	install_exception_handler(EXCPTN_IRQ, irq_handler);
 #endif
diff --git a/arm/pmu.c b/arm/pmu.c
index 2fcec71a..90878641 100644
--- a/arm/pmu.c
+++ b/arm/pmu.c
@@ -968,7 +968,7 @@ static void test_overflow_interrupt(bool overflow_at_64bits)
 		return;
 
 	gic_enable_defaults();
-	install_irq_handler(EL1H_IRQ, irq_handler);
+	install_irq_handler(ELxH_IRQ, irq_handler);
 	local_irq_enable();
 	gic_enable_irq(23);
 
diff --git a/arm/psci.c b/arm/psci.c
index 55308c8f..a52a1701 100644
--- a/arm/psci.c
+++ b/arm/psci.c
@@ -36,7 +36,7 @@ static void install_invalid_function_handler(exception_fn handler)
 #ifdef __arm__
 	install_exception_handler(EXCPTN_UND, handler);
 #else
-	install_exception_handler(EL1H_SYNC, ESR_EL1_EC_UNKNOWN, handler);
+	install_exception_handler(ELxH_SYNC, ESR_EL1_EC_UNKNOWN, handler);
 #endif
 }
 
diff --git a/arm/selftest.c b/arm/selftest.c
index 33f4cf42..7a626980 100644
--- a/arm/selftest.c
+++ b/arm/selftest.c
@@ -305,7 +305,7 @@ static enum vector check_vector_prep(void)
 	unsigned long daif;
 
 	if (is_user())
-		return EL0_SYNC_64;
+		return ELx_LOW_SYNC_64;
 
 	asm volatile("mrs %0, daif" : "=r" (daif) ::);
 	expected_regs.pstate = daif;
@@ -313,7 +313,7 @@ static enum vector check_vector_prep(void)
 		expected_regs.pstate |= PSR_MODE_EL1h;
 	else
 		expected_regs.pstate |= PSR_MODE_EL2h;
-	return EL1H_SYNC;
+	return ELxH_SYNC;
 }
 
 static void unknown_handler(struct pt_regs *regs, unsigned int esr __unused)
@@ -400,7 +400,7 @@ static void check_vectors(void *arg __unused)
 #ifdef __arm__
 		install_exception_handler(EXCPTN_UND, user_psci_system_off);
 #else
-		install_exception_handler(EL0_SYNC_64, ESR_EL1_EC_UNKNOWN,
+		install_exception_handler(ELx_LOW_SYNC_64, ESR_EL1_EC_UNKNOWN,
 					  user_psci_system_off);
 #endif
 	} else {
diff --git a/arm/timer.c b/arm/timer.c
index 43fb6d88..0f77f8d8 100644
--- a/arm/timer.c
+++ b/arm/timer.c
@@ -356,9 +356,9 @@ static void test_init(void)
 		vtimer_info.irq = TIMER_HVTIMER_IRQ;
 	}
 
-	install_exception_handler(EL1H_SYNC, ESR_EL1_EC_UNKNOWN, ptimer_unsupported_handler);
+	install_exception_handler(ELxH_SYNC, ESR_EL1_EC_UNKNOWN, ptimer_unsupported_handler);
 	ptimer_info.read_ctl();
-	install_exception_handler(EL1H_SYNC, ESR_EL1_EC_UNKNOWN, NULL);
+	install_exception_handler(ELxH_SYNC, ESR_EL1_EC_UNKNOWN, NULL);
 
 	if (ptimer_unsupported && !ERRATA(7b6b46311a85)) {
 		report_skip("Skipping ptimer tests. Set ERRATA_7b6b46311a85=y to enable.");
@@ -369,7 +369,7 @@ static void test_init(void)
 
 	gic_enable_defaults();
 
-	install_irq_handler(EL1H_IRQ, irq_handler);
+	install_irq_handler(ELxH_IRQ, irq_handler);
 	set_timer_irq_enabled(&ptimer_info, true);
 	set_timer_irq_enabled(&vtimer_info, true);
 	local_irq_enable();
diff --git a/lib/arm64/asm/processor.h b/lib/arm64/asm/processor.h
index 32ddc1b3..c20dd599 100644
--- a/lib/arm64/asm/processor.h
+++ b/lib/arm64/asm/processor.h
@@ -13,22 +13,22 @@
 #include <asm/barrier.h>
 
 enum vector {
-	EL1T_SYNC,
-	EL1T_IRQ,
-	EL1T_FIQ,
-	EL1T_ERROR,
-	EL1H_SYNC,
-	EL1H_IRQ,
-	EL1H_FIQ,
-	EL1H_ERROR,
-	EL0_SYNC_64,
-	EL0_IRQ_64,
-	EL0_FIQ_64,
-	EL0_ERROR_64,
-	EL0_SYNC_32,
-	EL0_IRQ_32,
-	EL0_FIQ_32,
-	EL0_ERROR_32,
+	ELxT_SYNC,
+	ELxT_IRQ,
+	ELxT_FIQ,
+	ELxT_ERROR,
+	ELxH_SYNC,
+	ELxH_IRQ,
+	ELxH_FIQ,
+	ELxH_ERROR,
+	ELx_LOW_SYNC_64,
+	ELx_LOW_IRQ_64,
+	ELx_LOW_FIQ_64,
+	ELx_LOW_ERROR_64,
+	ELx_LOW_SYNC_32,
+	ELx_LOW_IRQ_32,
+	ELx_LOW_FIQ_32,
+	ELx_LOW_ERROR_32,
 	VECTOR_MAX,
 };
 
diff --git a/lib/arm64/processor.c b/lib/arm64/processor.c
index bde3caa5..98e26912 100644
--- a/lib/arm64/processor.c
+++ b/lib/arm64/processor.c
@@ -198,10 +198,10 @@ void default_vector_irq_handler(enum vector v, struct pt_regs *regs,
 
 void vector_handlers_default_init(vector_fn *handlers)
 {
-	handlers[EL1H_SYNC]	= default_vector_sync_handler;
-	handlers[EL1H_IRQ]	= default_vector_irq_handler;
-	handlers[EL0_SYNC_64]	= default_vector_sync_handler;
-	handlers[EL0_IRQ_64]	= default_vector_irq_handler;
+	handlers[ELxH_SYNC]	= default_vector_sync_handler;
+	handlers[ELxH_IRQ]	= default_vector_irq_handler;
+	handlers[ELx_LOW_SYNC_64]	= default_vector_sync_handler;
+	handlers[ELx_LOW_IRQ_64]	= default_vector_irq_handler;
 }
 
 /* Needed to compile with -Wmissing-prototypes */
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [kvm-unit-tests PATCH v2 4/7] lib: arm64: Add foundational guest execution framework
  2026-04-13 20:46 [kvm-unit-tests PATCH v2 0/7] arm64: Add Stage-2 MMU and Nested Guest Framework Jing Zhang
                   ` (2 preceding siblings ...)
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 3/7] lib: arm64: Generalize exception vector definitions for EL2 support Jing Zhang
@ 2026-04-13 20:46 ` Jing Zhang
  2026-04-16 16:16   ` Joey Gouly
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 5/7] lib: arm64: Add support for guest exit exception handling Jing Zhang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Jing Zhang @ 2026-04-13 20:46 UTC (permalink / raw)
  To: KVM, KVMARM, Marc Zyngier, Joey Gouly, Wei-Lin Chang, Yao Yuan
  Cc: Oliver Upton, Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis, Jing Zhang

Introduce the infrastructure to manage and execute guests when running
at EL2. This provides the basis for testing advanced features like
nested virtualization and GICv4 direct interrupt injection.

The framework includes:
- 'struct guest': Encapsulates vCPU state (GPRs, EL1/EL2 sysregs) and
  Stage-2 MMU context.
- guest_create() / guest_destroy(): Handle lifecycle management and
  Stage-2 MMU setup, including identity mappings for guest code,
  stack, and UART.
- guest_run(): Assembly entry point that saves host callee-saved
  registers, caches the guest context pointer in TPIDR_EL2, and
  performs the exception return (ERET) to the guest.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arm/Makefile.arm64      |  2 +
 lib/arm64/asm-offsets.c | 13 ++++++
 lib/arm64/asm/guest.h   | 46 +++++++++++++++++++++
 lib/arm64/asm/sysreg.h  |  6 +++
 lib/arm64/guest.c       | 89 +++++++++++++++++++++++++++++++++++++++++
 lib/arm64/guest_arch.S  | 59 +++++++++++++++++++++++++++
 6 files changed, 215 insertions(+)
 create mode 100644 lib/arm64/asm/guest.h
 create mode 100644 lib/arm64/guest.c
 create mode 100644 lib/arm64/guest_arch.S

diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
index 5e50f5ba..9026fd71 100644
--- a/arm/Makefile.arm64
+++ b/arm/Makefile.arm64
@@ -41,6 +41,8 @@ cflatobjs += lib/arm64/processor.o
 cflatobjs += lib/arm64/spinlock.o
 cflatobjs += lib/arm64/gic-v3-its.o lib/arm64/gic-v3-its-cmd.o
 cflatobjs += lib/arm64/stage2_mmu.o
+cflatobjs += lib/arm64/guest.o
+cflatobjs += lib/arm64/guest_arch.o
 
 ifeq ($(CONFIG_EFI),y)
 cflatobjs += lib/acpi.o
diff --git a/lib/arm64/asm-offsets.c b/lib/arm64/asm-offsets.c
index 80de023c..ceeecce5 100644
--- a/lib/arm64/asm-offsets.c
+++ b/lib/arm64/asm-offsets.c
@@ -8,6 +8,7 @@
 #include <libcflat.h>
 #include <kbuild.h>
 #include <asm/ptrace.h>
+#include <asm/guest.h>
 
 int main(void)
 {
@@ -30,5 +31,17 @@ int main(void)
 	DEFINE(S_FP, sizeof(struct pt_regs));
 	DEFINE(S_FRAME_SIZE, (sizeof(struct pt_regs) + 16));
 
+	OFFSET(GUEST_X_OFFSET, guest, x);
+	OFFSET(GUEST_ELR_OFFSET, guest, elr_el2);
+	OFFSET(GUEST_SPSR_OFFSET, guest, spsr_el2);
+	OFFSET(GUEST_HCR_OFFSET, guest, hcr_el2);
+	OFFSET(GUEST_VTTBR_OFFSET, guest, vttbr_el2);
+	OFFSET(GUEST_SCTLR_OFFSET, guest, sctlr_el1);
+	OFFSET(GUEST_SP_EL1_OFFSET, guest, sp_el1);
+	OFFSET(GUEST_ESR_OFFSET, guest, esr_el2);
+	OFFSET(GUEST_FAR_OFFSET, guest, far_el2);
+	OFFSET(GUEST_HPFAR_OFFSET, guest, hpfar_el2);
+	OFFSET(GUEST_EXIT_CODE_OFFSET, guest, exit_code);
+
 	return 0;
 }
diff --git a/lib/arm64/asm/guest.h b/lib/arm64/asm/guest.h
new file mode 100644
index 00000000..826c44f8
--- /dev/null
+++ b/lib/arm64/asm/guest.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2026, Google LLC.
+ * Author: Jing Zhang <jingzhangos@google.com>
+ *
+ * SPDX-License-Identifier: LGPL-2.0-or-later
+ */
+#ifndef _ASMARM64_GUEST_H_
+#define _ASMARM64_GUEST_H_
+
+#include <libcflat.h>
+#include <asm/processor.h>
+#include <asm/stage2_mmu.h>
+
+#define HCR_GUEST_FLAGS (HCR_EL2_VM | HCR_EL2_FMO | HCR_EL2_IMO | \
+			 HCR_EL2_AMO | HCR_EL2_RW | HCR_EL2_E2H)
+/* Guest stack size */
+#define GUEST_STACK_SIZE		SZ_64K
+
+struct guest {
+	/* General Purpose Registers */
+	unsigned long x[31]; /* x0..x30 */
+
+	/* Execution State */
+	unsigned long elr_el2;
+	unsigned long spsr_el2;
+
+	/* Control Registers */
+	unsigned long hcr_el2;
+	unsigned long vttbr_el2;
+	unsigned long sctlr_el1;
+	unsigned long sp_el1;
+
+	/* Exit Information */
+	unsigned long esr_el2;
+	unsigned long far_el2;
+	unsigned long hpfar_el2;
+	unsigned long exit_code;
+
+	struct s2_mmu *s2mmu;
+};
+
+struct guest *guest_create(int vmid, void (*guest_func)(void), enum s2_granule granule);
+void guest_destroy(struct guest *guest);
+void guest_run(struct guest *guest);
+
+#endif /* _ASMARM64_GUEST_H_ */
diff --git a/lib/arm64/asm/sysreg.h b/lib/arm64/asm/sysreg.h
index f2d05018..857bee98 100644
--- a/lib/arm64/asm/sysreg.h
+++ b/lib/arm64/asm/sysreg.h
@@ -118,6 +118,10 @@ asm(
 #define SCTLR_EL1_TCF0_SHIFT	38
 #define SCTLR_EL1_TCF0_MASK	GENMASK_ULL(39, 38)
 
+#define HCR_EL2_VM		_BITULL(0)
+#define HCR_EL2_FMO		_BITULL(3)
+#define HCR_EL2_IMO		_BITULL(4)
+#define HCR_EL2_AMO		_BITULL(5)
 #define HCR_EL2_TGE		_BITULL(27)
 #define HCR_EL2_RW		_BITULL(31)
 #define HCR_EL2_E2H		_BITULL(34)
@@ -132,6 +136,8 @@ asm(
 #define SYS_HFGWTR2_EL2		sys_reg(3, 4, 3, 1, 3)
 #define SYS_HFGITR2_EL2		sys_reg(3, 4, 3, 1, 7)
 
+#define SYS_SCTLR_EL1		sys_reg(3, 5, 1, 0, 0)
+
 #define INIT_SCTLR_EL1_MMU_OFF	\
 			(SCTLR_EL1_ITD | SCTLR_EL1_SED | SCTLR_EL1_EOS | \
 			 SCTLR_EL1_TSCXT | SCTLR_EL1_EIS | SCTLR_EL1_SPAN | \
diff --git a/lib/arm64/guest.c b/lib/arm64/guest.c
new file mode 100644
index 00000000..68dd449d
--- /dev/null
+++ b/lib/arm64/guest.c
@@ -0,0 +1,89 @@
+/*
+ * Copyright (C) 2026, Google LLC.
+ * Author: Jing Zhang <jingzhangos@google.com>
+ *
+ * SPDX-License-Identifier: LGPL-2.0-or-later
+ */
+#include <libcflat.h>
+#include <asm/guest.h>
+#include <asm/io.h>
+#include <asm/sysreg.h>
+#include <asm/barrier.h>
+#include <alloc_page.h>
+#include <alloc.h>
+
+static struct guest *__guest_create(struct s2_mmu *s2_ctx, void *entry_point)
+{
+	struct guest *guest = calloc(1, sizeof(struct guest));
+
+	guest->elr_el2 = (unsigned long)entry_point;
+	guest->spsr_el2 = 0x3C5; /* M=EL1h, DAIF=Masked */
+	guest->hcr_el2 = HCR_GUEST_FLAGS;
+
+	if (s2_ctx) {
+		guest->vttbr_el2 = virt_to_phys(s2_ctx->pgd);
+		guest->vttbr_el2 |= ((unsigned long)s2_ctx->vmid << 48);
+	} else {
+		printf("Stage 2 MMU context missing!\n");
+	}
+
+	guest->sctlr_el1 = read_sysreg(sctlr_el1);
+	/* Disable guest stage 1 translation */
+	guest->sctlr_el1 &= ~(SCTLR_EL1_M | SCTLR_EL1_C);
+	guest->sctlr_el1 |= SCTLR_EL1_I;
+
+	guest->s2mmu = s2_ctx;
+
+	return guest;
+}
+
+struct guest *guest_create(int vmid, void (*guest_func)(void), enum s2_granule granule)
+{
+	unsigned long guest_pa, code_base, stack_pa;
+	unsigned long *stack_page;
+	struct guest *guest;
+	struct s2_mmu *ctx;
+
+	ctx = s2mmu_init(vmid, granule, true);
+	/*
+	 * Map the Host's code segment Identity Mapped (IPA=PA).
+	 * To be safe, we map a large chunk (e.g., 2MB) around the function
+	 * to capture any helper functions the compiler might generate calls to.
+	 */
+	guest_pa = virt_to_phys((void *)guest_func);
+	code_base = guest_pa & ~(SZ_2M - 1);
+	s2mmu_map(ctx, code_base, code_base, SZ_2M, S2_MAP_RW);
+
+	/*
+	 * Map Stack
+	 * Allocate 16 pages (64K) in Host, get its PA, and map it for Guest.
+	 */
+	stack_page = alloc_pages(get_order(GUEST_STACK_SIZE >> PAGE_SHIFT));
+	stack_pa = virt_to_phys(stack_page);
+	/* Identity Map it (IPA = PA) */
+	s2mmu_map(ctx, stack_pa, stack_pa, GUEST_STACK_SIZE, S2_MAP_RW);
+
+	s2mmu_enable(ctx);
+
+	/* Create Guest */
+	/* Entry point is the PA of the function (Identity Mapped) */
+	guest = __guest_create(ctx, (void *)guest_pa);
+
+	/*
+	 * Setup Guest Stack Pointer
+	 * Must match where we mapped the stack + Offset
+	 */
+	guest->sp_el1 = stack_pa + GUEST_STACK_SIZE;
+
+	/* Identity map the UART (QEMU virt PL011 base) so the guest can printf() */
+	s2mmu_map(ctx, 0x09000000, 0x09000000, PAGE_SIZE, S2_MAP_DEVICE);
+
+	return guest;
+}
+
+void guest_destroy(struct guest *guest)
+{
+	s2mmu_disable(guest->s2mmu);
+	s2mmu_destroy(guest->s2mmu);
+	free(guest);
+}
diff --git a/lib/arm64/guest_arch.S b/lib/arm64/guest_arch.S
new file mode 100644
index 00000000..70c19507
--- /dev/null
+++ b/lib/arm64/guest_arch.S
@@ -0,0 +1,59 @@
+/*
+ * Copyright (C) 2026, Google LLC.
+ * Author: Jing Zhang <jingzhangos@google.com>
+ *
+ * SPDX-License-Identifier: LGPL-2.0-or-later
+ */
+#define __ASSEMBLY__
+#include <asm/asm-offsets.h>
+#include <asm/sysreg.h>
+
+.global guest_run
+guest_run:
+	/* x0 = struct guest pointer */
+
+	/* Save Host Callee-Saved Regs */
+	stp	x29, x30, [sp, #-16]!
+	stp	x27, x28, [sp, #-16]!
+	stp	x25, x26, [sp, #-16]!
+	stp	x23, x24, [sp, #-16]!
+	stp	x21, x22, [sp, #-16]!
+	stp	x19, x20, [sp, #-16]!
+
+	/* Cache Guest Pointer in TPIDR_EL2 */
+	msr	tpidr_el2, x0
+
+	/* Load Guest System Registers */
+	ldr	x1, [x0, #GUEST_ELR_OFFSET]
+	msr	elr_el2, x1
+	ldr	x1, [x0, #GUEST_SPSR_OFFSET]
+	msr	spsr_el2, x1
+	ldr	x1, [x0, #GUEST_HCR_OFFSET]
+	msr	hcr_el2, x1
+	ldr	x1, [x0, #GUEST_VTTBR_OFFSET]
+	msr	vttbr_el2, x1
+	ldr	x1, [x0, #GUEST_SCTLR_OFFSET]
+	msr_s	SYS_SCTLR_EL1, x1
+	ldr	x1, [x0, #GUEST_SP_EL1_OFFSET]
+	msr	sp_el1, x1
+
+	/* Load Guest GPRs */
+	ldp	x1, x2, [x0, #8]
+	ldp	x3, x4, [x0, #24]
+	ldp	x5, x6, [x0, #40]
+	ldp	x7, x8, [x0, #56]
+	ldp	x9, x10, [x0, #72]
+	ldp	x11, x12, [x0, #88]
+	ldp	x13, x14, [x0, #104]
+	ldp	x15, x16, [x0, #120]
+	ldp	x17, x18, [x0, #136]
+	ldp	x19, x20, [x0, #152]
+	ldp	x21, x22, [x0, #168]
+	ldp	x23, x24, [x0, #184]
+	ldp	x25, x26, [x0, #200]
+	ldp	x27, x28, [x0, #216]
+	ldp	x29, x30, [x0, #232]
+	ldr	x0, [x0, #0]
+
+	isb
+	eret
-- 
2.53.0.1213.gd9a14994de-goog



* [kvm-unit-tests PATCH v2 5/7] lib: arm64: Add support for guest exit exception handling
  2026-04-13 20:46 [kvm-unit-tests PATCH v2 0/7] arm64: Add Stage-2 MMU and Nested Guest Framework Jing Zhang
                   ` (3 preceding siblings ...)
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 4/7] lib: arm64: Add foundational guest execution framework Jing Zhang
@ 2026-04-13 20:46 ` Jing Zhang
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 6/7] lib: arm64: Add guest-internal exception handling (EL1) Jing Zhang
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 7/7] arm64: Add Stage-2 MMU demand paging test Jing Zhang
  6 siblings, 0 replies; 11+ messages in thread
From: Jing Zhang @ 2026-04-13 20:46 UTC (permalink / raw)
  To: KVM, KVMARM, Marc Zyngier, Joey Gouly, Wei-Lin Chang, Yao Yuan
  Cc: Oliver Upton, Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis, Jing Zhang

Implement the logic to intercept and handle exceptions from managed
guests at EL2. This completes the context switch loop by providing a
mechanism to save guest state, route exceptions to C handlers, and
either resume the guest or return to the host.

Key additions:
- 'hyp_el2_vectors': An EL2 vector table installed during guest
  execution to trap guest exits (Synchronous, IRQ, etc.).
- 'guest_c_exception_handler': A C-level dispatcher that updates the
  guest context with exception info (ESR, FAR, HPFAR) and invokes
  registered per-vector handlers.
- Integrated save/restore logic: guest_run() now handles the full
  cycle of saving host state, loading guest state, trapping exits,
  and optionally resuming or returning.
- Handler API: Added guest_install_handler() to allow tests to register
  custom logic for specific exception types.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 lib/arm64/asm/guest.h  |  19 +++++++
 lib/arm64/guest.c      |  30 +++++++++++
 lib/arm64/guest_arch.S | 111 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 160 insertions(+)

diff --git a/lib/arm64/asm/guest.h b/lib/arm64/asm/guest.h
index 826c44f8..b99641ef 100644
--- a/lib/arm64/asm/guest.h
+++ b/lib/arm64/asm/guest.h
@@ -16,6 +16,19 @@
 /* Guest stack size */
 #define GUEST_STACK_SIZE		SZ_64K
 
+/*
+ * Result from Handler:
+ * RESUME: Keep guest running (ERET immediately)
+ * EXIT:   Return to Host C caller
+ */
+enum guest_handler_result {
+	GUEST_ACTION_RESUME,
+	GUEST_ACTION_EXIT
+};
+
+struct guest;
+typedef enum guest_handler_result (*guest_handler_t)(struct guest *guest);
+
 struct guest {
 	/* General Purpose Registers */
 	unsigned long x[31]; /* x0..x30 */
@@ -36,6 +49,9 @@ struct guest {
 	unsigned long hpfar_el2;
 	unsigned long exit_code;
 
+	/* Exception Handlers in EL2 */
+	guest_handler_t handlers[VECTOR_MAX];
+
 	struct s2_mmu *s2mmu;
 };
 
@@ -43,4 +59,7 @@ struct guest *guest_create(int vmid, void (*guest_func)(void), enum s2_granule g
 void guest_destroy(struct guest *guest);
 void guest_run(struct guest *guest);
 
+unsigned long guest_c_exception_handler(struct guest *guest, unsigned long vector_offset);
+void guest_install_handler(struct guest *guest, enum vector v, guest_handler_t handler);
+
 #endif /* _ASMARM64_GUEST_H_ */
diff --git a/lib/arm64/guest.c b/lib/arm64/guest.c
index 68dd449d..dd42ccd6 100644
--- a/lib/arm64/guest.c
+++ b/lib/arm64/guest.c
@@ -12,6 +12,30 @@
 #include <alloc_page.h>
 #include <alloc.h>
 
+/*
+ * C-Entry for Exception Handling
+ * Returns 0 to Resume Guest, 1 to Exit to Host Caller
+ */
+unsigned long guest_c_exception_handler(struct guest *guest, unsigned long vector_offset)
+{
+	enum vector vector = (enum vector)guest->exit_code;
+
+	/* Save Trap Info */
+	guest->esr_el2 = read_sysreg(esr_el2);
+	guest->far_el2 = read_sysreg(far_el2);
+	guest->hpfar_el2 = read_sysreg(hpfar_el2);
+
+	/* Invoke Handler if registered */
+	if (guest->handlers[vector]) {
+		if (guest->handlers[vector](guest) == GUEST_ACTION_RESUME) {
+			return 0; /* ASM stub will restore and ERET */
+		}
+	}
+
+	/* Default: Exit to caller */
+	return 1;
+}
+
 static struct guest *__guest_create(struct s2_mmu *s2_ctx, void *entry_point)
 {
 	struct guest *guest = calloc(1, sizeof(struct guest));
@@ -87,3 +111,9 @@ void guest_destroy(struct guest *guest)
 	s2mmu_destroy(guest->s2mmu);
 	free(guest);
 }
+
+void guest_install_handler(struct guest *guest, enum vector v, guest_handler_t handler)
+{
+	if (v < VECTOR_MAX)
+		guest->handlers[v] = handler;
+}
diff --git a/lib/arm64/guest_arch.S b/lib/arm64/guest_arch.S
index 70c19507..81874fd0 100644
--- a/lib/arm64/guest_arch.S
+++ b/lib/arm64/guest_arch.S
@@ -55,5 +55,116 @@ guest_run:
 	ldp	x29, x30, [x0, #232]
 	ldr	x0, [x0, #0]
 
+	/* Install Trap Handler */
+	adrp	x29, hyp_el2_vectors
+	add	x29, x29, :lo12:hyp_el2_vectors
+	msr	vbar_el2, x29
+
+	/* Restore x29 from struct (via tpidr_el2) */
+	mrs	x29, tpidr_el2
+	ldr	x29, [x29, #232]
+
 	isb
 	eret
+
+/* EL2 Hypervisor Vector Table */
+.macro hyp_el2_vector, vector_num
+	.align 7
+	stp	x0, x1, [sp, #-16]!
+	mrs	x0, tpidr_el2
+	mov	x1, \vector_num
+	str	x1, [x0, #GUEST_EXIT_CODE_OFFSET]
+	b	guest_common_exit
+.endm
+
+	.align 11
+hyp_el2_vectors:
+	.skip 0x200
+
+	hyp_el2_vector #4	/* ELxH_SYNC */
+	hyp_el2_vector #5	/* ELxH_IRQ */
+	hyp_el2_vector #6	/* ELxH_FIQ */
+	hyp_el2_vector #7	/* ELxH_ERROR */
+
+	hyp_el2_vector #8	/* ELx_LOW_SYNC_64 */
+	hyp_el2_vector #9	/* ELx_LOW_IRQ_64 */
+	hyp_el2_vector #10	/* ELx_LOW_FIQ_64 */
+	hyp_el2_vector #11	/* ELx_LOW_ERROR_64 */
+
+	.align 11
+
+guest_common_exit:
+	stp	x2, x3, [x0, #16]
+	stp	x4, x5, [x0, #32]
+	stp	x6, x7, [x0, #48]
+	stp	x8, x9, [x0, #64]
+	stp	x10, x11, [x0, #80]
+	stp	x12, x13, [x0, #96]
+	stp	x14, x15, [x0, #112]
+	stp	x16, x17, [x0, #128]
+	stp	x18, x19, [x0, #144]
+	stp	x20, x21, [x0, #160]
+	stp	x22, x23, [x0, #176]
+	stp	x24, x25, [x0, #192]
+	stp	x26, x27, [x0, #208]
+	stp	x28, x29, [x0, #224]
+	str	x30, [x0, #240]
+
+	ldp	x2, x3, [sp], #16
+	stp	x2, x3, [x0, #0]
+
+	mrs	x1, elr_el2
+	str	x1, [x0, #GUEST_ELR_OFFSET]
+	mrs	x1, spsr_el2
+	str	x1, [x0, #GUEST_SPSR_OFFSET]
+	mrs	x1, esr_el2
+	str	x1, [x0, #GUEST_ESR_OFFSET]
+	mrs	x1, far_el2
+	str	x1, [x0, #GUEST_FAR_OFFSET]
+	mrs	x1, hpfar_el2
+	str	x1, [x0, #GUEST_HPFAR_OFFSET]
+	mrs	x1, sp_el1
+	str	x1, [x0, #GUEST_SP_EL1_OFFSET]
+
+	/* Pass the exit code recorded by the vector stub as the vector argument */
+	ldr	x1, [x0, #GUEST_EXIT_CODE_OFFSET]
+	bl	guest_c_exception_handler
+	cbz	x0, guest_resume_guest
+
+	/* EXIT */
+	/* Restore Host Callee-Saved Regs */
+	ldp	x19, x20, [sp], #16
+	ldp	x21, x22, [sp], #16
+	ldp	x23, x24, [sp], #16
+	ldp	x25, x26, [sp], #16
+	ldp	x27, x28, [sp], #16
+	ldp	x29, x30, [sp], #16
+	ret
+
+	/* RESUME */
+guest_resume_guest:
+	mrs	x0, tpidr_el2
+	ldr	x1, [x0, #GUEST_ELR_OFFSET]
+	msr	elr_el2, x1
+	ldr	x1, [x0, #GUEST_SPSR_OFFSET]
+	msr	spsr_el2, x1
+	ldr	x1, [x0, #GUEST_SP_EL1_OFFSET]
+	msr	sp_el1, x1
+
+	ldp	x1, x2, [x0, #8]
+	ldp	x3, x4, [x0, #24]
+	ldp	x5, x6, [x0, #40]
+	ldp	x7, x8, [x0, #56]
+	ldp	x9, x10, [x0, #72]
+	ldp	x11, x12, [x0, #88]
+	ldp	x13, x14, [x0, #104]
+	ldp	x15, x16, [x0, #120]
+	ldp	x17, x18, [x0, #136]
+	ldp	x19, x20, [x0, #152]
+	ldp	x21, x22, [x0, #168]
+	ldp	x23, x24, [x0, #184]
+	ldp	x25, x26, [x0, #200]
+	ldp	x27, x28, [x0, #216]
+	ldp	x29, x30, [x0, #232]
+	ldr	x0, [x0, #0]
+	eret
-- 
2.53.0.1213.gd9a14994de-goog



* [kvm-unit-tests PATCH v2 6/7] lib: arm64: Add guest-internal exception handling (EL1)
  2026-04-13 20:46 [kvm-unit-tests PATCH v2 0/7] arm64: Add Stage-2 MMU and Nested Guest Framework Jing Zhang
                   ` (4 preceding siblings ...)
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 5/7] lib: arm64: Add support for guest exit exception handling Jing Zhang
@ 2026-04-13 20:46 ` Jing Zhang
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 7/7] arm64: Add Stage-2 MMU demand paging test Jing Zhang
  6 siblings, 0 replies; 11+ messages in thread
From: Jing Zhang @ 2026-04-13 20:46 UTC (permalink / raw)
  To: KVM, KVMARM, Marc Zyngier, Joey Gouly, Wei-Lin Chang, Yao Yuan
  Cc: Oliver Upton, Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis, Jing Zhang

Implement the infrastructure for managed guests to handle their own
exceptions at EL1. This allows guests to manage internal traps
independently of the EL2 host.

Changes include:
- 'guest_el1_vectors': A dedicated EL1 vector table for guests,
  supporting synchronous exceptions, IRQs, FIQs, and SErrors.
- 'guest_context': A per-guest metadata structure tracked via
  TPIDR_EL1, containing a dispatch table for guest-internal handlers.
- 'guest_el1_c_handler': C-level dispatcher that executes registered
  guest handlers or triggers an HVC trap back to the host for unhandled
  exceptions.
- Integrated setup: guest_create() now allocates and maps the guest
  context, while the context switch logic ensures VBAR_EL1 and
  TPIDR_EL1 are correctly saved and restored.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 lib/arm64/asm-offsets.c |  2 ++
 lib/arm64/asm/guest.h   | 26 ++++++++++++++
 lib/arm64/asm/sysreg.h  |  1 +
 lib/arm64/guest.c       | 39 +++++++++++++++++++++
 lib/arm64/guest_arch.S  | 78 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 146 insertions(+)

diff --git a/lib/arm64/asm-offsets.c b/lib/arm64/asm-offsets.c
index ceeecce5..fc9f28fb 100644
--- a/lib/arm64/asm-offsets.c
+++ b/lib/arm64/asm-offsets.c
@@ -37,11 +37,13 @@ int main(void)
 	OFFSET(GUEST_HCR_OFFSET, guest, hcr_el2);
 	OFFSET(GUEST_VTTBR_OFFSET, guest, vttbr_el2);
 	OFFSET(GUEST_SCTLR_OFFSET, guest, sctlr_el1);
+	OFFSET(GUEST_VBAR_OFFSET, guest, vbar_el1);
 	OFFSET(GUEST_SP_EL1_OFFSET, guest, sp_el1);
 	OFFSET(GUEST_ESR_OFFSET, guest, esr_el2);
 	OFFSET(GUEST_FAR_OFFSET, guest, far_el2);
 	OFFSET(GUEST_HPFAR_OFFSET, guest, hpfar_el2);
 	OFFSET(GUEST_EXIT_CODE_OFFSET, guest, exit_code);
+	OFFSET(GUEST_TPIDR_EL1_OFFSET, guest, tpidr_el1);
 
 	return 0;
 }
diff --git a/lib/arm64/asm/guest.h b/lib/arm64/asm/guest.h
index b99641ef..8b40f3a5 100644
--- a/lib/arm64/asm/guest.h
+++ b/lib/arm64/asm/guest.h
@@ -29,6 +29,26 @@ enum guest_handler_result {
 struct guest;
 typedef enum guest_handler_result (*guest_handler_t)(struct guest *guest);
 
+/*
+ * Guest EL1 Exception Frame (pushed to guest stack by asm stub)
+ * We use a simplified frame: x0-x30, elr, spsr. size = 33*8
+ */
+struct guest_el1_regs {
+	unsigned long regs[31];
+	unsigned long elr;
+	unsigned long spsr;
+};
+
+typedef void (*guest_el1_handler_t)(struct guest_el1_regs *regs, unsigned int esr);
+
+/*
+ * Guest Context Structure
+ * This will be pointed to by TPIDR_EL1 while the guest is running.
+ */
+struct guest_context {
+	guest_el1_handler_t handlers[VECTOR_MAX];
+};
+
 struct guest {
 	/* General Purpose Registers */
 	unsigned long x[31]; /* x0..x30 */
@@ -42,15 +62,18 @@ struct guest {
 	unsigned long vttbr_el2;
 	unsigned long sctlr_el1;
 	unsigned long sp_el1;
+	unsigned long vbar_el1;
 
 	/* Exit Information */
 	unsigned long esr_el2;
 	unsigned long far_el2;
 	unsigned long hpfar_el2;
 	unsigned long exit_code;
+	unsigned long tpidr_el1;
 
 	/* Exception Handlers in EL2 */
 	guest_handler_t handlers[VECTOR_MAX];
+	struct guest_context *guest_context;
 
 	struct s2_mmu *s2mmu;
 };
@@ -62,4 +85,7 @@ void guest_run(struct guest *guest);
 unsigned long guest_c_exception_handler(struct guest *guest, unsigned long vector_offset);
 void guest_install_handler(struct guest *guest, enum vector v, guest_handler_t handler);
 
+void guest_el1_c_handler(struct guest_el1_regs *regs, unsigned int vector);
+void guest_install_el1_handler(struct guest *guest, enum vector v, guest_el1_handler_t handler);
+
 #endif /* _ASMARM64_GUEST_H_ */
diff --git a/lib/arm64/asm/sysreg.h b/lib/arm64/asm/sysreg.h
index 857bee98..a298ceff 100644
--- a/lib/arm64/asm/sysreg.h
+++ b/lib/arm64/asm/sysreg.h
@@ -137,6 +137,7 @@ asm(
 #define SYS_HFGITR2_EL2		sys_reg(3, 4, 3, 1, 7)
 
 #define SYS_SCTLR_EL1		sys_reg(3, 5, 1, 0, 0)
+#define SYS_VBAR_EL1		sys_reg(3, 5, 12, 0, 0)
 
 #define INIT_SCTLR_EL1_MMU_OFF	\
 			(SCTLR_EL1_ITD | SCTLR_EL1_SED | SCTLR_EL1_EOS | \
diff --git a/lib/arm64/guest.c b/lib/arm64/guest.c
index dd42ccd6..54c8b429 100644
--- a/lib/arm64/guest.c
+++ b/lib/arm64/guest.c
@@ -36,9 +36,45 @@ unsigned long guest_c_exception_handler(struct guest *guest, unsigned long vecto
 	return 1;
 }
 
+/* --- EL1 (Guest-Internal) Vector Handling --- */
+
+void guest_install_el1_handler(struct guest *guest, enum vector v, guest_el1_handler_t handler)
+{
+	if (guest && guest->guest_context && v < VECTOR_MAX)
+		guest->guest_context->handlers[v] = handler;
+}
+
+void guest_el1_c_handler(struct guest_el1_regs *regs, unsigned int vector)
+{
+	struct guest_context *ctx = (struct guest_context *)read_sysreg(tpidr_el1);
+	unsigned int esr = read_sysreg(esr_el1);
+
+	if (ctx && vector < VECTOR_MAX && ctx->handlers[vector]) {
+		ctx->handlers[vector](regs, esr);
+	} else {
+		printf("Guest: Unhandled Exception Vector %d, ESR=0x%x\n", vector, esr);
+		asm volatile("hvc #0xFFFF");
+	}
+}
+
+extern void guest_el1_vectors(void);
+
 static struct guest *__guest_create(struct s2_mmu *s2_ctx, void *entry_point)
 {
 	struct guest *guest = calloc(1, sizeof(struct guest));
+	struct guest_context *guest_ctx;
+	unsigned long guest_ctx_pa;
+
+	/* Allocate the internal context table */
+	guest_ctx = (void *)alloc_page();
+	memset(guest_ctx, 0, PAGE_SIZE);
+	guest->guest_context = guest_ctx;
+
+	guest_ctx_pa = virt_to_phys(guest_ctx);
+	if (s2_ctx)
+		s2mmu_map(s2_ctx, guest_ctx_pa, guest_ctx_pa, PAGE_SIZE, S2_MAP_RW);
+
+	guest->tpidr_el1 = guest_ctx_pa;
 
 	guest->elr_el2 = (unsigned long)entry_point;
 	guest->spsr_el2 = 0x3C5; /* M=EL1h, DAIF=Masked */
@@ -56,6 +92,7 @@ static struct guest *__guest_create(struct s2_mmu *s2_ctx, void *entry_point)
 	guest->sctlr_el1 &= ~(SCTLR_EL1_M | SCTLR_EL1_C);
 	guest->sctlr_el1 |= SCTLR_EL1_I;
 
+	guest->vbar_el1 = (unsigned long)guest_el1_vectors;
 	guest->s2mmu = s2_ctx;
 
 	return guest;
@@ -109,6 +146,8 @@ void guest_destroy(struct guest *guest)
 {
 	s2mmu_disable(guest->s2mmu);
 	s2mmu_destroy(guest->s2mmu);
+	if (guest->guest_context)
+		free_page(guest->guest_context);
 	free(guest);
 }
 
diff --git a/lib/arm64/guest_arch.S b/lib/arm64/guest_arch.S
index 81874fd0..32d640e6 100644
--- a/lib/arm64/guest_arch.S
+++ b/lib/arm64/guest_arch.S
@@ -34,8 +34,12 @@ guest_run:
 	msr	vttbr_el2, x1
 	ldr	x1, [x0, #GUEST_SCTLR_OFFSET]
 	msr_s	SYS_SCTLR_EL1, x1
+	ldr	x1, [x0, #GUEST_VBAR_OFFSET]
+	msr_s	SYS_VBAR_EL1, x1
 	ldr	x1, [x0, #GUEST_SP_EL1_OFFSET]
 	msr	sp_el1, x1
+	ldr	x1, [x0, #GUEST_TPIDR_EL1_OFFSET]
+	msr	tpidr_el1, x1
 
 	/* Load Guest GPRs */
 	ldp	x1, x2, [x0, #8]
@@ -125,6 +129,8 @@ guest_common_exit:
 	str	x1, [x0, #GUEST_HPFAR_OFFSET]
 	mrs	x1, sp_el1
 	str	x1, [x0, #GUEST_SP_EL1_OFFSET]
+	mrs_s	x1, SYS_VBAR_EL1
+	str	x1, [x0, #GUEST_VBAR_OFFSET]
 
 	/* Pass the exit code recorded by the vector stub as the vector argument */
 	ldr	x1, [x0, #GUEST_EXIT_CODE_OFFSET]
@@ -168,3 +174,75 @@ guest_resume_guest:
 	ldp	x29, x30, [x0, #232]
 	ldr	x0, [x0, #0]
 	eret
+
+/* EL1 Vector Table */
+.macro guest_el1_vector, vector_num
+	.align 7
+	stp     x29, x30, [sp, #-16]!
+	mov	x29, \vector_num
+	b       guest_el1_common
+.endm
+
+.align 11
+.global guest_el1_vectors
+guest_el1_vectors:
+	.skip 0x200
+
+	guest_el1_vector #4	/* Sync */
+	guest_el1_vector #5	/* IRQ */
+	guest_el1_vector #6	/* FIQ */
+	guest_el1_vector #7	/* SError */
+
+	.align 11
+
+guest_el1_common:
+	sub	sp, sp, #264
+	stp	x0, x1, [sp, #0]
+	stp	x2, x3, [sp, #16]
+	stp	x4, x5, [sp, #32]
+	stp	x6, x7, [sp, #48]
+	stp	x8, x9, [sp, #64]
+	stp	x10, x11, [sp, #80]
+	stp	x12, x13, [sp, #96]
+	stp	x14, x15, [sp, #112]
+	stp	x16, x17, [sp, #128]
+	stp	x18, x19, [sp, #144]
+	stp	x20, x21, [sp, #160]
+	stp	x22, x23, [sp, #176]
+	stp	x24, x25, [sp, #192]
+	stp	x26, x27, [sp, #208]
+	stp	x28, x30, [sp, #224]
+
+	mrs	x0, elr_el1
+	str	x0, [sp, #248]
+	mrs	x0, spsr_el1
+	str	x0, [sp, #256]
+
+	mov	x0, sp
+	mov	x1, x29
+	bl	guest_el1_c_handler
+
+	ldr	x0, [sp, #248]
+	msr	elr_el1, x0
+	ldr	x0, [sp, #256]
+	msr	spsr_el1, x0
+
+	ldp	x0, x1, [sp, #0]
+	ldp	x2, x3, [sp, #16]
+	ldp	x4, x5, [sp, #32]
+	ldp	x6, x7, [sp, #48]
+	ldp	x8, x9, [sp, #64]
+	ldp	x10, x11, [sp, #80]
+	ldp	x12, x13, [sp, #96]
+	ldp	x14, x15, [sp, #112]
+	ldp	x16, x17, [sp, #128]
+	ldp	x18, x19, [sp, #144]
+	ldp	x20, x21, [sp, #160]
+	ldp	x22, x23, [sp, #176]
+	ldp	x24, x25, [sp, #192]
+	ldp	x26, x27, [sp, #208]
+	ldp	x28, x30, [sp, #224]
+
+	add	sp, sp, #264
+	ldp	x29, x30, [sp], #16
+	eret
-- 
2.53.0.1213.gd9a14994de-goog



* [kvm-unit-tests PATCH v2 7/7] arm64: Add Stage-2 MMU demand paging test
  2026-04-13 20:46 [kvm-unit-tests PATCH v2 0/7] arm64: Add Stage-2 MMU and Nested Guest Framework Jing Zhang
                   ` (5 preceding siblings ...)
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 6/7] lib: arm64: Add guest-internal exception handling (EL1) Jing Zhang
@ 2026-04-13 20:46 ` Jing Zhang
  6 siblings, 0 replies; 11+ messages in thread
From: Jing Zhang @ 2026-04-13 20:46 UTC (permalink / raw)
  To: KVM, KVMARM, Marc Zyngier, Joey Gouly, Wei-Lin Chang, Yao Yuan
  Cc: Oliver Upton, Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis, Jing Zhang

Introduce a new test case to validate Stage-2 MMU fault handling. The
test verifies that the hypervisor correctly identifies and handles
Stage-2 data aborts triggered by a guest accessing unmapped memory.

The test performs the following:
- Sets up a guest with Stage-1 disabled, using identity-mapped host
   code and shared data in the Stage-2 page tables.
- Triggers a Stage-2 data abort by accessing a specific unmapped IPA.
- Catches the exception in the host, verifies the fault address,
   and dynamically maps a new page to resolve the fault.
- Resumes the guest to confirm the memory access completes successfully
   and the fault handler functioned as expected.

Signed-off-by: Jing Zhang <jingzhangos@google.com>
---
 arm/Makefile.arm64    |   1 +
 arm/stage2-mmu-test.c | 107 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+)
 create mode 100644 arm/stage2-mmu-test.c

diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
index 9026fd71..e547f92d 100644
--- a/arm/Makefile.arm64
+++ b/arm/Makefile.arm64
@@ -67,6 +67,7 @@ tests += $(TEST_DIR)/cache.$(exe)
 tests += $(TEST_DIR)/debug.$(exe)
 tests += $(TEST_DIR)/fpu.$(exe)
 tests += $(TEST_DIR)/mte.$(exe)
+tests += $(TEST_DIR)/stage2-mmu-test.$(exe)
 
 include $(SRCDIR)/$(TEST_DIR)/Makefile.common
 
diff --git a/arm/stage2-mmu-test.c b/arm/stage2-mmu-test.c
new file mode 100644
index 00000000..0df4704b
--- /dev/null
+++ b/arm/stage2-mmu-test.c
@@ -0,0 +1,107 @@
+/*
+ * ARM64 Stage-2 MMU Demand Paging Test
+ *
+ * This test validates stage-2 data abort handling by purposefully
+ * accessing unmapped memory in the guest and verifying that the
+ * host correctly handles the fault by mapping the page.
+ *
+ * Copyright (C) 2026 Google LLC.
+ * Author: Jing Zhang <jingzhangos@google.com>
+ *
+ * SPDX-License-Identifier: LGPL-2.0-or-later
+ */
+#include <libcflat.h>
+#include <alloc_page.h>
+#include <asm/io.h>
+#include <asm/smp.h>
+#include <asm/guest.h>
+#include <asm/stage2_mmu.h>
+
+#define TEST_PAGE_IPA		0x40000000UL
+#define FAULT_ADDR_IPA		0x50000000UL
+#define TEST_DATA		0xBEEFCAFEUL
+
+static volatile bool handled = false;
+
+static void guest_code(void)
+{
+	volatile unsigned long *test_va = (void *)TEST_PAGE_IPA;
+	volatile unsigned long *fault_va = (void *)FAULT_ADDR_IPA;
+
+	*fault_va = *test_va;
+
+	if (*fault_va == *test_va)
+		handled = true;
+
+	asm("hvc #0");
+}
+
+static enum guest_handler_result guest_exception_handler(struct guest *guest)
+{
+	unsigned long far, ec;
+	unsigned long *fixup_page;
+
+	ec = guest->esr_el2 >> ESR_ELx_EC_SHIFT;
+
+	if (ec == ESR_ELx_EC_HVC64) {
+		report_info("CPU%d: Guest exited via HVC.", smp_processor_id());
+		return GUEST_ACTION_EXIT;
+	}
+
+	if (ec == ESR_ELx_EC_DABT_LOW) {
+		far = guest->far_el2;
+		if (far == FAULT_ADDR_IPA) {
+			fixup_page = alloc_page();
+			s2mmu_map(guest->s2mmu, FAULT_ADDR_IPA,
+				  virt_to_phys(fixup_page), PAGE_SIZE, S2_MAP_RW);
+			report(true, "Caught stage-2 fault at 0x%lx", far);
+			return GUEST_ACTION_RESUME;
+		}
+		report(false, "Unexpected fault address: 0x%lx", far);
+	} else {
+		report(false, "Unexpected exception class: 0x%lx", ec);
+	}
+
+	return GUEST_ACTION_EXIT;
+}
+
+int main(int argc, char **argv)
+{
+	struct guest *guest;
+	unsigned long *test_page;
+	unsigned long code_va_base, code_pa_base, data_base;
+
+	report_prefix_push("stage2-mmu");
+
+	guest = guest_create(smp_processor_id(), guest_code, S2_PAGE_4K);
+
+	/* Map host code: IPA(VA) -> PA */
+	/* We use the host VA as the Guest IPA because guest stage 1 is disabled. */
+	code_va_base = (unsigned long)guest_code;
+	code_pa_base = virt_to_phys((void *)guest_code);
+
+	/* Align to 2MB to use block descriptors where possible */
+	code_va_base = code_va_base & ~(SZ_2M - 1);
+	code_pa_base = code_pa_base & ~(SZ_2M - 1);
+	s2mmu_map(guest->s2mmu, code_va_base, code_pa_base, SZ_2M, S2_MAP_RW);
+
+	/* Identity map the shared variable */
+	data_base = virt_to_phys((void *)&handled) & PAGE_MASK;
+	s2mmu_map(guest->s2mmu, data_base, data_base, PAGE_SIZE, S2_MAP_RW);
+
+	/* Map test data page */
+	test_page = alloc_page();
+	*test_page = TEST_DATA;
+	s2mmu_map(guest->s2mmu, TEST_PAGE_IPA, virt_to_phys(test_page), PAGE_SIZE, S2_MAP_RW);
+
+	guest_install_handler(guest, ELx_LOW_SYNC_64, guest_exception_handler);
+
+	report_info("CPU%d: entering guest...", smp_processor_id());
+
+	guest_run(guest);
+
+	report(handled, "Stage-2 fault handling test completed");
+	guest_destroy(guest);
+
+	return report_summary();
+}
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [kvm-unit-tests PATCH v2 2/7] lib: arm64: Add stage2 page table management library
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 2/7] lib: arm64: Add stage2 page table management library Jing Zhang
@ 2026-04-16 15:19   ` Joey Gouly
  0 siblings, 0 replies; 11+ messages in thread
From: Joey Gouly @ 2026-04-16 15:19 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, Marc Zyngier, Wei-Lin Chang, Yao Yuan, Oliver Upton,
	Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis

Hi Jing,

On Mon, Apr 13, 2026 at 01:46:25PM -0700, Jing Zhang wrote:
> Tests running at EL2 (hypervisor level) often require the ability to
> manage Stage 2 translation tables to control Guest Physical Address (IPA)
> to Host Physical Address (PA) translation.
> 
> Add a generic Stage 2 MMU library that provides software management of
> ARM64 Stage 2 translation tables.
> 
> The library features include:
> - Support for 4K, 16K, and 64K translation granules.
> - Dynamic page table allocation using the allocator.
> - Support for 2M block mappings where applicable.
> - APIs for mapping, unmapping, enabling, and disabling the Stage 2 MMU.
> - Basic fault info reporting (ESR, FAR, HPFAR).
> 
> This infrastructure is necessary for upcoming virtualization and
> hypervisor-mode tests.
> 
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  arm/Makefile.arm64         |   1 +
>  lib/arm64/asm/stage2_mmu.h |  70 +++++++
>  lib/arm64/stage2_mmu.c     | 403 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 474 insertions(+)
>  create mode 100644 lib/arm64/asm/stage2_mmu.h
>  create mode 100644 lib/arm64/stage2_mmu.c
> 
> diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
> index a40c830d..5e50f5ba 100644
> --- a/arm/Makefile.arm64
> +++ b/arm/Makefile.arm64
> @@ -40,6 +40,7 @@ cflatobjs += lib/arm64/stack.o
>  cflatobjs += lib/arm64/processor.o
>  cflatobjs += lib/arm64/spinlock.o
>  cflatobjs += lib/arm64/gic-v3-its.o lib/arm64/gic-v3-its-cmd.o
> +cflatobjs += lib/arm64/stage2_mmu.o
>  
>  ifeq ($(CONFIG_EFI),y)
>  cflatobjs += lib/acpi.o
> diff --git a/lib/arm64/asm/stage2_mmu.h b/lib/arm64/asm/stage2_mmu.h
> new file mode 100644
> index 00000000..a5324108
> --- /dev/null
> +++ b/lib/arm64/asm/stage2_mmu.h
> @@ -0,0 +1,70 @@
> +/*
> + * Copyright (C) 2026, Google LLC.
> + * Author: Jing Zhang <jingzhangos@google.com>
> + *
> + * SPDX-License-Identifier: LGPL-2.0-or-later
> + */
> +#ifndef _ASMARM64_STAGE2_MMU_H_
> +#define _ASMARM64_STAGE2_MMU_H_
> +
> +#include <libcflat.h>
> +#include <asm/page.h>
> +#include <asm/pgtable.h>
> +
> +#define pte_is_table(pte)	(pte_val(pte) & PTE_TABLE_BIT)

This can go in lib/arm64/asm/pgtable.h.

> +
> +/* Stage-2 Memory Attributes (MemAttr[3:0]) */
> +#define S2_MEMATTR_NORMAL	(0xFUL << 2) /* Normal Memory, Outer/Inner Write-Back */
> +#define S2_MEMATTR_DEVICE	(0x0UL << 2) /* Device-nGnRnE */
> +
> +/* Stage-2 Access Permissions (S2AP[1:0]) */
> +#define S2AP_NONE	(0UL << 6)
> +#define S2AP_RO		(1UL << 6) /* Read-only */
> +#define S2AP_WO		(2UL << 6) /* Write-only */
> +#define S2AP_RW		(3UL << 6) /* Read-Write */

Do we need S2AP_NONE? It's just 0. Maybe an S2AP_MASK would be useful
for something (it would have the same value as S2AP_RW).

Could you do:

#define S2AP_RO		BIT(6) /* Read-only */
#define S2AP_WO		BIT(7) /* Write-only */
#define S2AP_RW		(S2AP_RO | S2AP_WO) /* Read-Write */

Maybe even drop the comments, I think the suffixes are understandable.
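
For reference, the two encodings are bit-for-bit identical. A tiny
standalone check (plain C, not kvm-unit-tests code; BIT() assumed to be
(1UL << n), and the compound macro parenthesized):

```c
#include <assert.h>

#define BIT(n)		(1UL << (n))

/* Shift-based encoding as posted */
#define S2AP_RO_OLD	(1UL << 6)
#define S2AP_WO_OLD	(2UL << 6)
#define S2AP_RW_OLD	(3UL << 6)

/* Suggested BIT()-based encoding */
#define S2AP_RO		BIT(6)
#define S2AP_WO		BIT(7)
#define S2AP_RW		(S2AP_RO | S2AP_WO)
#define S2AP_MASK	S2AP_RW		/* S2AP occupies bits [7:6] */
```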

> +
> +/* Flags for mapping */
> +#define S2_MAP_RW	(S2AP_RW | S2_MEMATTR_NORMAL | PTE_AF | PTE_SHARED)
> +#define S2_MAP_DEVICE	(S2AP_RW | S2_MEMATTR_DEVICE | PTE_AF)
> +
> +enum s2_granule {
> +	S2_PAGE_4K,
> +	S2_PAGE_16K,
> +	S2_PAGE_64K,
> +};
> +
> +/* Main Stage-2 MMU Structure */
> +struct s2_mmu {
> +	pgd_t *pgd;
> +	int vmid;
> +
> +	/* Configuration */
> +	enum s2_granule granule;
> +	bool allow_block_mappings;
> +
> +	/* Internal helpers calculated from granule & VA_BITS */
> +	unsigned int page_shift;
> +	unsigned int level_shift;
> +	int root_level; /* 0, 1, or 2 */
> +	unsigned long page_size;
> +	unsigned long block_size;
> +};
> +
> +/* API */
> +/* Initialize an s2_mmu struct with specific settings */
> +struct s2_mmu *s2mmu_init(int vmid, enum s2_granule granule, bool allow_block_mappings);
> +
> +/* Management */
> +void s2mmu_destroy(struct s2_mmu *mmu);
> +void s2mmu_map(struct s2_mmu *mmu, unsigned long ipa, unsigned long pa,
> +	       unsigned long size, unsigned long flags);
> +void s2mmu_unmap(struct s2_mmu *mmu, unsigned long ipa, unsigned long size);
> +
> +/* Activation */
> +void s2mmu_enable(struct s2_mmu *mmu);
> +void s2mmu_disable(struct s2_mmu *mmu);
> +
> +/* Debug */
> +void s2mmu_print_fault_info(void);
> +
> +#endif /* _ASMARM64_STAGE2_MMU_H_ */
> diff --git a/lib/arm64/stage2_mmu.c b/lib/arm64/stage2_mmu.c
> new file mode 100644
> index 00000000..cf419e28
> --- /dev/null
> +++ b/lib/arm64/stage2_mmu.c
> @@ -0,0 +1,403 @@
> +/*
> + * Copyright (C) 2026, Google LLC.
> + * Author: Jing Zhang <jingzhangos@google.com>
> + *
> + * SPDX-License-Identifier: LGPL-2.0-or-later
> + */
> +#include <libcflat.h>
> +#include <alloc.h>
> +#include <asm/stage2_mmu.h>
> +#include <asm/sysreg.h>
> +#include <asm/io.h>
> +#include <asm/barrier.h>
> +#include <alloc_page.h>
> +
> +/* VTCR_EL2 Definitions */
> +#define VTCR_SH0_INNER		(3UL << 12)
> +#define VTCR_ORGN0_WBWA		(1UL << 10)
> +#define VTCR_IRGN0_WBWA		(1UL << 8)
> +
> +/* TG0 Encodings */
> +#define VTCR_TG0_SHIFT		14
> +#define VTCR_TG0_4K		(0UL << VTCR_TG0_SHIFT)
> +#define VTCR_TG0_64K		(1UL << VTCR_TG0_SHIFT)
> +#define VTCR_TG0_16K		(2UL << VTCR_TG0_SHIFT)
> +
> +/* Physical Address Size (PS) - Derive from VA_BITS for simplicity or max */
> +#define VTCR_PS_SHIFT		16
> +#if VA_BITS > 40
> +#define VTCR_PS_VAL		(5UL << VTCR_PS_SHIFT) /* 48-bit PA */
> +#else
> +#define VTCR_PS_VAL		(2UL << VTCR_PS_SHIFT) /* 40-bit PA */
> +#endif

These definitions could go in headers?

> +
> +struct s2_mmu *s2mmu_init(int vmid, enum s2_granule granule, bool allow_block_mappings)
> +{
> +	struct s2_mmu *mmu = calloc(1, sizeof(struct s2_mmu));
> +	int order = 0;
> +
> +	mmu->vmid = vmid;
> +	mmu->granule = granule;
> +	mmu->allow_block_mappings = allow_block_mappings;
> +
> +	/* Configure shifts based on granule */
> +	switch (granule) {
> +	case S2_PAGE_4K:
> +		mmu->page_shift = 12;
> +		mmu->level_shift = 9;
> +		/*
> +		 * Determine Root Level for 4K:
> +		 * VA_BITS > 39 (e.g. 48) -> Start L0
> +		 * VA_BITS <= 39 (e.g. 32, 36) -> Start L1
> +		 */
> +		mmu->root_level = (VA_BITS > 39) ? 0 : 1;
> +		break;
> +	case S2_PAGE_16K:
> +		mmu->page_shift = 14;
> +		mmu->level_shift = 11;
> +		/*
> +		 * 16K: L1 covers 47 bits. L0 not valid for 16K
> +		 * Start L1 for 47 bits. Start L2 for 36 bits.
> +		 */
> +		mmu->root_level = (VA_BITS > 36) ? 1 : 2;
> +		break;
> +	case S2_PAGE_64K:
> +		mmu->page_shift = 16;
> +		mmu->level_shift = 13;
> +		/* 64K: L1 covers 52 bits. L2 covers 42 bits. */
> +		mmu->root_level = (VA_BITS > 42) ? 1 : 2;
> +		break;
> +	}
> +
> +	mmu->page_size = 1UL << mmu->page_shift;
> +	mmu->block_size = 1UL << (mmu->page_shift + mmu->level_shift);
> +
> +	/* Alloc PGD. Use order for allocation size */
> +	if (mmu->page_size > PAGE_SIZE) {
> +		order = __builtin_ctz(mmu->page_size / PAGE_SIZE);
> +	}
> +	mmu->pgd = (pgd_t *)alloc_pages(order);
> +	if (mmu->pgd) {
> +		memset(mmu->pgd, 0, mmu->page_size);
> +	} else {
> +		free(mmu);
> +		return NULL;
> +	}
> +
> +	return mmu;
> +}
> +
> +static unsigned long s2mmu_get_addr_mask(struct s2_mmu *mmu)
> +{
> +	switch (mmu->granule) {
> +	case S2_PAGE_16K:
> +		return GENMASK_ULL(47, 14);
> +	case S2_PAGE_64K:
> +		return GENMASK_ULL(47, 16);
> +	default:
> +		return GENMASK_ULL(47, 12); /* 4K */
> +	}
> +}
> +
> +static void s2mmu_free_tables(struct s2_mmu *mmu, pte_t *table, int level)
> +{
> +	unsigned long entries = 1UL << mmu->level_shift;
> +	unsigned long mask = s2mmu_get_addr_mask(mmu);
> +	unsigned long i;
> +
> +	/*
> +	 * Recurse if not leaf level
> +	 * Level 3 is always leaf page. Levels 0-2 can be Table or Block.
> +	 */
> +	if (level < 3) {
> +		for (i = 0; i < entries; i++) {
> +			pte_t entry = table[i];
> +			if ((pte_valid(entry) && pte_is_table(entry))) {
> +				pte_t *next = (pte_t *)phys_to_virt(pte_val(entry) & mask);
> +				s2mmu_free_tables(mmu, next, level + 1);
> +			}
> +		}
> +	}
> +
> +	free_pages(table);
> +}
> +
> +void s2mmu_destroy(struct s2_mmu *mmu)
> +{
> +	if (mmu->pgd)
> +		s2mmu_free_tables(mmu, (pte_t *)mmu->pgd, mmu->root_level);
> +	free(mmu);
> +}
> +
> +void s2mmu_enable(struct s2_mmu *mmu)
> +{
> +	unsigned long vtcr = VTCR_PS_VAL | VTCR_SH0_INNER |
> +			     VTCR_ORGN0_WBWA | VTCR_IRGN0_WBWA;
> +	unsigned long t0sz = 64 - VA_BITS;
> +	unsigned long vttbr;
> +
> +	switch (mmu->granule) {
> +	case S2_PAGE_4K:
> +		vtcr |= VTCR_TG0_4K;
> +		/* SL0 Encodings for 4K: 0=L2, 1=L1, 2=L0 */
> +		if (mmu->root_level == 0)
> +			vtcr |= (2UL << 6); /* Start L0 */
> +		else if (mmu->root_level == 1)
> +			vtcr |= (1UL << 6); /* Start L1 */
> +		else
> +			vtcr |= (0UL << 6); /* Start L2 */
> +		break;
> +	case S2_PAGE_16K:
> +		vtcr |= VTCR_TG0_16K;
> +		/* SL0 Encodings for 16K: 0=L3(Res), 1=L2, 2=L1, 3=L0(Res) */
> +		if (mmu->root_level == 1)
> +			vtcr |= (2UL << 6); /* Start L1 */
> +		else
> +			vtcr |= (1UL << 6); /* Start L2 */
> +		break;
> +	case S2_PAGE_64K:
> +		vtcr |= VTCR_TG0_64K;
> +		/* SL0 Encodings for 64K: 0=L3(Res), 1=L2, 2=L1, 3=L0(Res) */
> +		if (mmu->root_level == 1)
> +			vtcr |= (2UL << 6); /* Start L1 */
> +		else
> +			vtcr |= (1UL << 6); /* Start L2 */
> +		break;
> +	}

This could use a VTCR_EL2_SL0_SHIFT to remove the hardcoded 6.
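
Something along these lines (VTCR_EL2_SL0_SHIFT is a suggested name; the
encodings below are taken from the comments already in the patch):

```c
#include <assert.h>

#define VTCR_EL2_SL0_SHIFT	6

/* SL0 field for a given starting (root) level; is_4k selects the 4K
 * encoding (0=L2, 1=L1, 2=L0) vs the 16K/64K one (1=L2, 2=L1). */
static unsigned long vtcr_sl0(int root_level, int is_4k)
{
	unsigned long sl0 = is_4k ? (unsigned long)(2 - root_level)
				  : (unsigned long)(3 - root_level);

	return sl0 << VTCR_EL2_SL0_SHIFT;
}
```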

> +
> +	vtcr |= t0sz;
> +
> +	write_sysreg(vtcr, vtcr_el2);
> +
> +	/* Setup VTTBR */
> +	vttbr = virt_to_phys(mmu->pgd);
> +	vttbr |= ((unsigned long)mmu->vmid << 48);

VTTBR_VMID_SHIFT instead of the bare 48.
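
For example (helper and macro names hypothetical):

```c
#include <assert.h>

#define VTTBR_VMID_SHIFT	48

/* Build a VTTBR_EL2 value: BADDR in the low bits, VMID in [63:48] */
static unsigned long make_vttbr(unsigned long pgd_pa, int vmid)
{
	return pgd_pa | ((unsigned long)vmid << VTTBR_VMID_SHIFT);
}
```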

> +	write_sysreg(vttbr, vttbr_el2);
> +
> +	asm volatile("tlbi vmalls12e1is");
> +
> +	dsb(ish);
> +	isb();
> +}
> +
> +void s2mmu_disable(struct s2_mmu *mmu)
> +{
> +	write_sysreg(0, vttbr_el2);
> +	isb();
> +}
> +
> +static pte_t *get_pte(struct s2_mmu *mmu, pte_t *table, unsigned long idx, bool alloc)
> +{
> +	unsigned long mask = s2mmu_get_addr_mask(mmu);
> +	pte_t entry = table[idx];
> +	pte_t *next_table;
> +	int order = 0;
> +
> +	if (pte_valid(entry)) {
> +		if (pte_is_table(entry))
> +			return (pte_t *)phys_to_virt(pte_val(entry) & mask);
> +		/* Block Entry */
> +		return NULL;
> +	}
> +
> +	if (!alloc)
> +		return NULL;
> +
> +	/* Allocate table memory covering the Stage-2 Granule size */
> +	if (mmu->page_size > PAGE_SIZE)
> +		order = __builtin_ctz(mmu->page_size / PAGE_SIZE);
> +
> +	next_table = (pte_t *)alloc_pages(order);
> +	if (next_table)
> +		memset(next_table, 0, mmu->page_size);
> +
> +	pte_val(entry) = virt_to_phys(next_table) | PTE_TABLE_BIT | PTE_VALID;
> +	WRITE_ONCE(table[idx], entry);

Should these two lines be inside `if (next_table)`?

> +
> +	return next_table;
> +}
> +
> +void s2mmu_map(struct s2_mmu *mmu, unsigned long ipa, unsigned long pa,
> +	       unsigned long size, unsigned long flags)
> +{
> +	unsigned long level_mask, level_shift, level_size, level;
> +	unsigned long start_ipa, end_ipa, idx;
> +	pte_t entry, *table, *next_table;
> +	bool is_block_level;
> +
> +	start_ipa = ipa;
> +	end_ipa = ipa + size;
> +	level_mask = (1UL << mmu->level_shift) - 1;
> +
> +	while (start_ipa < end_ipa) {
> +		table = (pte_t *)mmu->pgd;
> +
> +		/* Walk from Root to Leaf */
> +		for (level = mmu->root_level; level < 3; level++) {
> +			level_shift = mmu->page_shift + (3 - level) * mmu->level_shift;
> +			idx = (start_ipa >> level_shift) & level_mask;
> +			level_size = 1UL << level_shift;
> +
> +			/*
> +			 * Check for Block Mapping
> +			 * Valid Block Levels:
> +			 * 4K:  L1 (1G), L2 (2MB)
> +			 * 16K: L2 (32MB)
> +			 * 64K: L2 (512MB)
> +			 */
> +			is_block_level = (level == 2) ||
> +				(mmu->granule == S2_PAGE_4K && level == 1);
> +
> +			if (mmu->allow_block_mappings && is_block_level) {
> +				if ((start_ipa & (level_size - 1)) == 0 &&
> +				    (pa & (level_size - 1)) == 0 &&
> +				    (start_ipa + level_size) <= end_ipa) {
> +					/* Map Block */
> +					pte_val(entry) = (pa & ~(level_size - 1)) |
> +							 flags | PTE_VALID;
> +					WRITE_ONCE(table[idx], entry);

Should this check if there's some mapping here already?

If table[idx] is an invalid pte, we can overwrite it.
If `table[idx] == entry`, do nothing.
If `table[idx] != entry`, I think we can assert. We could add
Break-Before-Make handling, but I think it makes sense to keep it simple
for now.

What do you think?
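
The check I have in mind would look roughly like this (a standalone
sketch of the decision only; the pte values here are plain integers, not
real descriptors):

```c
#include <assert.h>

#define PTE_VALID	(1UL << 0)

enum map_action { MAP_WRITE, MAP_SKIP, MAP_ASSERT };

/* What to do when installing `new` over an existing entry `old` */
static enum map_action map_check(unsigned long old, unsigned long new)
{
	if (!(old & PTE_VALID))
		return MAP_WRITE;	/* invalid entry: safe to overwrite */
	if (old == new)
		return MAP_SKIP;	/* identical mapping: nothing to do */
	return MAP_ASSERT;		/* conflict: would need Break-Before-Make */
}
```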

> +					start_ipa += level_size;
> +					pa += level_size;
> +					goto next_chunk; /* Continue outer loop */
> +				}
> +			}
> +
> +			/* Move to next level */
> +			next_table = get_pte(mmu, table, idx, true);
> +			if (!next_table) {
> +				printf("Error allocating or existing block conflict.\n");
> +				return;
> +			}
> +			table = next_table;
> +		}
> +
> +		/* Leaf Level (Level 3 PTE) */
> +		if (level == 3) {
> +			idx = (start_ipa >> mmu->page_shift) & level_mask;
> +			pte_val(entry) = (pa & ~(mmu->page_size - 1)) | flags | PTE_TYPE_PAGE;
> +			WRITE_ONCE(table[idx], entry);

Same comment as above.

> +			start_ipa += mmu->page_size;
> +			pa += mmu->page_size;
> +		}
> +
> +next_chunk:
> +		continue;
> +	}
> +
> +	asm volatile("tlbi vmalls12e1is");

This invalidates the current vmid, which might not be the vmid of `mmu` (see
enter_vmid_context() in Linux for example).

s2mmu_enable() is what sets the vmid, but there are some calls to s2mmu_map()
before that. Either map/unmap could save/restore the vmid, or maybe they could
assert that the current vmid equals `mmu`'s vmid?
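
The save/restore variant would have roughly this shape (sysreg accesses
are mocked with a plain variable so the sketch is self-contained; the
real code would use read_sysreg()/write_sysreg() plus the barriers):

```c
#include <assert.h>

static unsigned long mock_vttbr_el2;	/* stand-in for the real register */

/* Invalidate the stage-2 TLB entries for `mmu_vttbr`'s vmid without
 * assuming it is the currently installed one. */
static void s2mmu_tlbi_vmid(unsigned long mmu_vttbr)
{
	unsigned long saved = mock_vttbr_el2;	/* read_sysreg(vttbr_el2) */

	mock_vttbr_el2 = mmu_vttbr;	/* enter the target vmid context */
	/* tlbi vmalls12e1is; dsb(ish); isb(); would go here */
	mock_vttbr_el2 = saved;		/* restore the previous context */
}
```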

> +	dsb(ish);
> +	isb();
> +}
> +
> +/*
> + * Recursive helper to unmap a range within a specific table.
> + * Returns true if the table at this level is now completely empty
> + * and should be freed by the caller.
> + */
> +static bool s2mmu_unmap_level(struct s2_mmu *mmu, pte_t *table,
> +			      unsigned long current_ipa, int level,
> +			      unsigned long start_ipa, unsigned long end_ipa,
> +			      unsigned long mask)
> +{
> +	unsigned long level_size, entry_ipa, entry_end;
> +	bool child_empty, table_empty = true;
> +	pte_t entry, *next_table;
> +	unsigned int level_shift;
> +	unsigned long i;
> +
> +	/* Calculate shift and size for this level */
> +	if (level == 3) {
> +		level_shift = mmu->page_shift;
> +	} else {
> +		level_shift = mmu->page_shift + (3 - level) * mmu->level_shift;
> +	}

We don't really need the conditional: if level is 3, the subtraction term is
0. Either way is fine by me.
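
i.e. the unconditional form gives the same shift at every level (4K
values used for illustration):

```c
#include <assert.h>

#define PAGE_SHIFT_4K	12
#define LEVEL_SHIFT_4K	9

/* (3 - level) is 0 at level 3, so the formula already yields page_shift */
static unsigned int level_shift_4k(int level)
{
	return PAGE_SHIFT_4K + (3 - level) * LEVEL_SHIFT_4K;
}
```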

> +	level_size = 1UL << level_shift;
> +
> +	/* Iterate over all entries in this table */
> +	for (i = 0; i < (1UL << mmu->level_shift); i++) {
> +		entry = table[i];
> +		entry_ipa = current_ipa + (i * level_size);
> +		entry_end = entry_ipa + level_size;
> +
> +		/* Skip entries completely outside our target range */
> +		if (entry_end <= start_ipa || entry_ipa >= end_ipa) {
> +			if (pte_valid(entry))
> +				table_empty = false;
> +			continue;
> +		}
> +
> +		/*
> +		 * If the entry is fully covered by the unmap range,
> +		 * we can clear it (leaf) or recurse and free (table).
> +		 */
> +		if (entry_ipa >= start_ipa && entry_end <= end_ipa) {
> +			if (pte_valid(entry)) {
> +				if (pte_is_table(entry) && level < 3) {
> +					/* Recurse to free children first */
> +					next_table = (pte_t *)phys_to_virt(pte_val(entry) & mask);
> +					s2mmu_free_tables(mmu, next_table, level + 1);
> +				}
> +				/* Invalidate the entry */
> +				WRITE_ONCE(table[i], __pte(0));
> +			}
> +			continue;
> +		}
> +
> +		/*
> +		 * Partial overlap: This must be a table (split required).
> +		 * If it's a Block, we can't split easily in this context
> +		 * without complex logic, so we generally skip or fail.
> +		 * Assuming standard breakdown: recurse into the table.
> +		 */
> +		if (pte_valid(entry) && pte_is_table(entry) && level < 3) {
> +			next_table = (pte_t *)phys_to_virt(pte_val(entry) & mask);
> +			child_empty = s2mmu_unmap_level(mmu, next_table, entry_ipa, level + 1,
> +							start_ipa, end_ipa, mask);
> +
> +			if (child_empty) {
> +				free_pages(next_table);
> +				WRITE_ONCE(table[i], __pte(0));
> +			} else {
> +				table_empty = false;
> +			}
> +		} else if (pte_valid(entry)) {
> +			/*
> +			 * Overlap on a leaf/block entry that extends
> +			 * beyond the unmap range. We cannot simply clear it.

Can we overlap a leaf here, or is it definitely a block? I'm just wondering
whether it makes sense to assert() here, since we're in full control of the
code.

> +			 */
> +			table_empty = false;
> +		}
> +	}
> +
> +	return table_empty;
> +}
> +
> +void s2mmu_unmap(struct s2_mmu *mmu, unsigned long ipa, unsigned long size)
> +{
> +	unsigned long end_ipa = ipa + size;
> +	unsigned long mask = s2mmu_get_addr_mask(mmu);
> +
> +	if (!mmu->pgd)
> +		return;
> +
> +	/*
> +	 * Start recursion from the root level.
> +	 * We rarely free the PGD itself unless destroying the MMU,
> +	 * so we ignore the return value here.
> +	 */
> +	s2mmu_unmap_level(mmu, (pte_t *)mmu->pgd, 0, mmu->root_level,
> +			  ipa, end_ipa, mask);
> +
> +	/* Ensure TLB invalidation occurs after page table updates */
> +	asm volatile("tlbi vmalls12e1is");

Same as other comment earlier about vmid.

> +	dsb(ish);
> +	isb();
> +}
> +
> +void s2mmu_print_fault_info(void)
> +{
> +	unsigned long esr = read_sysreg(esr_el2);
> +	unsigned long far = read_sysreg(far_el2);
> +	unsigned long hpfar = read_sysreg(hpfar_el2);
> +	printf("Stage-2 Fault Info: ESR=0x%lx FAR=0x%lx HPFAR=0x%lx\n", esr, far, hpfar);
> +}

Thanks,
Joey


* Re: [kvm-unit-tests PATCH v2 1/7] lib: arm64: Generalize ESR exception class definitions for EL2 support
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 1/7] lib: arm64: Generalize ESR exception class definitions for EL2 support Jing Zhang
@ 2026-04-16 15:27   ` Joey Gouly
  0 siblings, 0 replies; 11+ messages in thread
From: Joey Gouly @ 2026-04-16 15:27 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, Marc Zyngier, Wei-Lin Chang, Yao Yuan, Oliver Upton,
	Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis

Hi,

On Mon, Apr 13, 2026 at 01:46:24PM -0700, Jing Zhang wrote:
> Generalize some Exception Syndrome Register (ESR) definitions by
> renaming EL1-specific macros to ELx equivalents. This allows these
> constants to be shared between EL1 and EL2, supporting the upcoming
> S2MMU library implementation.
> 
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  lib/arm64/asm/esr.h   |  5 +++--
>  lib/arm64/processor.c | 10 +++++-----
>  2 files changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/arm64/asm/esr.h b/lib/arm64/asm/esr.h
> index 335343c5..8437916f 100644
> --- a/lib/arm64/asm/esr.h
> +++ b/lib/arm64/asm/esr.h
> @@ -12,7 +12,7 @@
>  #define ESR_EL1_CM		(1 << 8)
>  #define ESR_EL1_IL		(1 << 25)
>  
> -#define ESR_EL1_EC_SHIFT	(26)
> +#define ESR_ELx_EC_SHIFT	(26)
>  #define ESR_EL1_EC_UNKNOWN	(0x00)
>  #define ESR_EL1_EC_WFI		(0x01)
>  #define ESR_EL1_EC_CP15_32	(0x03)
> @@ -25,12 +25,13 @@
>  #define ESR_EL1_EC_ILL_ISS	(0x0E)
>  #define ESR_EL1_EC_SVC32	(0x11)
>  #define ESR_EL1_EC_SVC64	(0x15)
> +#define ESR_ELx_EC_HVC64	(0x16)
>  #define ESR_EL1_EC_SYS64	(0x18)
>  #define ESR_EL1_EC_SVE		(0x19)
>  #define ESR_EL1_EC_IABT_EL0	(0x20)
>  #define ESR_EL1_EC_IABT_EL1	(0x21)
>  #define ESR_EL1_EC_PC_ALIGN	(0x22)
> -#define ESR_EL1_EC_DABT_EL0	(0x24)
> +#define ESR_ELx_EC_DABT_LOW	(0x24)
>  #define ESR_EL1_EC_DABT_EL1	(0x25)
>  #define ESR_EL1_EC_SP_ALIGN	(0x26)
>  #define ESR_EL1_EC_FP_EXC32	(0x28)

More work, but it probably makes sense to align all of them to Linux's naming?

So ESR_ELx_EC_DABT_CUR instead of ESR_EL1_EC_DABT_EL1 etc

You can check arch/arm64/include/asm/esr.h
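
For reference, a few of the Linux names this could align to (values as
in arch/arm64/include/asm/esr.h):

```c
#include <assert.h>

#define ESR_ELx_EC_UNKNOWN	(0x00)
#define ESR_ELx_EC_WFx		(0x01)	/* was ESR_EL1_EC_WFI */
#define ESR_ELx_EC_SVC64	(0x15)
#define ESR_ELx_EC_HVC64	(0x16)
#define ESR_ELx_EC_DABT_LOW	(0x24)	/* was ESR_EL1_EC_DABT_EL0 */
#define ESR_ELx_EC_DABT_CUR	(0x25)	/* was ESR_EL1_EC_DABT_EL1 */
```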

Thanks,
Joey

> diff --git a/lib/arm64/processor.c b/lib/arm64/processor.c
> index f9fea519..bde3caa5 100644
> --- a/lib/arm64/processor.c
> +++ b/lib/arm64/processor.c
> @@ -48,7 +48,7 @@ static const char *ec_names[EC_MAX] = {
>  	[ESR_EL1_EC_IABT_EL0]		= "IABT_EL0",
>  	[ESR_EL1_EC_IABT_EL1]		= "IABT_EL1",
>  	[ESR_EL1_EC_PC_ALIGN]		= "PC_ALIGN",
> -	[ESR_EL1_EC_DABT_EL0]		= "DABT_EL0",
> +	[ESR_ELx_EC_DABT_LOW]		= "DABT_EL0",
>  	[ESR_EL1_EC_DABT_EL1]		= "DABT_EL1",
>  	[ESR_EL1_EC_SP_ALIGN]		= "SP_ALIGN",
>  	[ESR_EL1_EC_FP_EXC32]		= "FP_EXC32",
> @@ -82,7 +82,7 @@ void show_regs(struct pt_regs *regs)
>  
>  bool get_far(unsigned int esr, unsigned long *far)
>  {
> -	unsigned int ec = esr >> ESR_EL1_EC_SHIFT;
> +	unsigned int ec = esr >> ESR_ELx_EC_SHIFT;
>  
>  	asm volatile("mrs %0, far_el1": "=r" (*far));
>  
> @@ -90,7 +90,7 @@ bool get_far(unsigned int esr, unsigned long *far)
>  	case ESR_EL1_EC_IABT_EL0:
>  	case ESR_EL1_EC_IABT_EL1:
>  	case ESR_EL1_EC_PC_ALIGN:
> -	case ESR_EL1_EC_DABT_EL0:
> +	case ESR_ELx_EC_DABT_LOW:
>  	case ESR_EL1_EC_DABT_EL1:
>  	case ESR_EL1_EC_WATCHPT_EL0:
>  	case ESR_EL1_EC_WATCHPT_EL1:
> @@ -108,7 +108,7 @@ static void bad_exception(enum vector v, struct pt_regs *regs,
>  {
>  	unsigned long far;
>  	bool far_valid = get_far(esr, &far);
> -	unsigned int ec = esr >> ESR_EL1_EC_SHIFT;
> +	unsigned int ec = esr >> ESR_ELx_EC_SHIFT;
>  	uintptr_t text = (uintptr_t)&_text;
>  
>  	printf("Load address: %" PRIxPTR "\n", text);
> @@ -158,7 +158,7 @@ void default_vector_sync_handler(enum vector v, struct pt_regs *regs,
>  				 unsigned int esr)
>  {
>  	struct thread_info *ti = thread_info_sp(regs->sp);
> -	unsigned int ec = esr >> ESR_EL1_EC_SHIFT;
> +	unsigned int ec = esr >> ESR_ELx_EC_SHIFT;
>  
>  	if (ti->flags & TIF_USER_MODE) {
>  		if (ec < EC_MAX && ti->exception_handlers[v][ec]) {
> -- 
> 2.53.0.1213.gd9a14994de-goog
> 


* Re: [kvm-unit-tests PATCH v2 4/7] lib: arm64: Add foundational guest execution framework
  2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 4/7] lib: arm64: Add foundational guest execution framework Jing Zhang
@ 2026-04-16 16:16   ` Joey Gouly
  0 siblings, 0 replies; 11+ messages in thread
From: Joey Gouly @ 2026-04-16 16:16 UTC (permalink / raw)
  To: Jing Zhang
  Cc: KVM, KVMARM, Marc Zyngier, Wei-Lin Chang, Yao Yuan, Oliver Upton,
	Andrew Jones, Alexandru Elisei, Mingwei Zhang,
	Raghavendra Rao Ananta, Colton Lewis

Hi,

First-look comments below,

On Mon, Apr 13, 2026 at 01:46:27PM -0700, Jing Zhang wrote:
> Introduce the infrastructure to manage and execute guest when running
> at EL2. This provides the basis for testing advanced features like
> nested virtualization and GICv4 direct interrupt injection.
> 
> The framework includes:
> - 'struct guest': Encapsulates vCPU state (GPRs, EL1/EL2 sysregs) and
>   Stage-2 MMU context.
> - guest_create() / guest_destroy(): Handle lifecycle management and
>   Stage-2 MMU setup, including identity mappings for guest code,
>   stack, and UART.
> - guest_run(): Assembly entry point that saves host callee-saved
>   registers, caches the guest context pointer in TPIDR_EL2, and
>   performs the exception return (ERET) to the guest.
> 
> Signed-off-by: Jing Zhang <jingzhangos@google.com>
> ---
>  arm/Makefile.arm64      |  2 +
>  lib/arm64/asm-offsets.c | 13 ++++++
>  lib/arm64/asm/guest.h   | 46 +++++++++++++++++++++
>  lib/arm64/asm/sysreg.h  |  6 +++
>  lib/arm64/guest.c       | 89 +++++++++++++++++++++++++++++++++++++++++
>  lib/arm64/guest_arch.S  | 59 +++++++++++++++++++++++++++
>  6 files changed, 215 insertions(+)
>  create mode 100644 lib/arm64/asm/guest.h
>  create mode 100644 lib/arm64/guest.c
>  create mode 100644 lib/arm64/guest_arch.S
> 
> diff --git a/arm/Makefile.arm64 b/arm/Makefile.arm64
> index 5e50f5ba..9026fd71 100644
> --- a/arm/Makefile.arm64
> +++ b/arm/Makefile.arm64
> @@ -41,6 +41,8 @@ cflatobjs += lib/arm64/processor.o
>  cflatobjs += lib/arm64/spinlock.o
>  cflatobjs += lib/arm64/gic-v3-its.o lib/arm64/gic-v3-its-cmd.o
>  cflatobjs += lib/arm64/stage2_mmu.o
> +cflatobjs += lib/arm64/guest.o
> +cflatobjs += lib/arm64/guest_arch.o
>  
>  ifeq ($(CONFIG_EFI),y)
>  cflatobjs += lib/acpi.o
> diff --git a/lib/arm64/asm-offsets.c b/lib/arm64/asm-offsets.c
> index 80de023c..ceeecce5 100644
> --- a/lib/arm64/asm-offsets.c
> +++ b/lib/arm64/asm-offsets.c
> @@ -8,6 +8,7 @@
>  #include <libcflat.h>
>  #include <kbuild.h>
>  #include <asm/ptrace.h>
> +#include <asm/guest.h>
>  
>  int main(void)
>  {
> @@ -30,5 +31,17 @@ int main(void)
>  	DEFINE(S_FP, sizeof(struct pt_regs));
>  	DEFINE(S_FRAME_SIZE, (sizeof(struct pt_regs) + 16));
>  
> +	OFFSET(GUEST_X_OFFSET, guest, x);
> +	OFFSET(GUEST_ELR_OFFSET, guest, elr_el2);
> +	OFFSET(GUEST_SPSR_OFFSET, guest, spsr_el2);
> +	OFFSET(GUEST_HCR_OFFSET, guest, hcr_el2);
> +	OFFSET(GUEST_VTTBR_OFFSET, guest, vttbr_el2);
> +	OFFSET(GUEST_SCTLR_OFFSET, guest, sctlr_el1);
> +	OFFSET(GUEST_SP_EL1_OFFSET, guest, sp_el1);
> +	OFFSET(GUEST_ESR_OFFSET, guest, esr_el2);
> +	OFFSET(GUEST_FAR_OFFSET, guest, far_el2);
> +	OFFSET(GUEST_HPFAR_OFFSET, guest, hpfar_el2);
> +	OFFSET(GUEST_EXIT_CODE_OFFSET, guest, exit_code);
> +
>  	return 0;
>  }
> diff --git a/lib/arm64/asm/guest.h b/lib/arm64/asm/guest.h
> new file mode 100644
> index 00000000..826c44f8
> --- /dev/null
> +++ b/lib/arm64/asm/guest.h
> @@ -0,0 +1,46 @@
> +/*
> + * Copyright (C) 2026, Google LLC.
> + * Author: Jing Zhang <jingzhangos@google.com>
> + *
> + * SPDX-License-Identifier: LGPL-2.0-or-later
> + */
> +#ifndef _ASMARM64_GUEST_H_
> +#define _ASMARM64_GUEST_H_
> +
> +#include <libcflat.h>
> +#include <asm/processor.h>
> +#include <asm/stage2_mmu.h>
> +
> +#define HCR_GUEST_FLAGS (HCR_EL2_VM | HCR_EL2_FMO | HCR_EL2_IMO | \
> +			 HCR_EL2_AMO | HCR_EL2_RW | HCR_EL2_E2H)
> +/* Guest stack size */
> +#define GUEST_STACK_SIZE		SZ_64K
> +
> +struct guest {
> +	/* General Purpose Registers */
> +	unsigned long x[31]; /* x0..x30 */
> +
> +	/* Execution State */
> +	unsigned long elr_el2;
> +	unsigned long spsr_el2;
> +
> +	/* Control Registers */
> +	unsigned long hcr_el2;
> +	unsigned long vttbr_el2;
> +	unsigned long sctlr_el1;
> +	unsigned long sp_el1;
> +
> +	/* Exit Information */
> +	unsigned long esr_el2;
> +	unsigned long far_el2;
> +	unsigned long hpfar_el2;
> +	unsigned long exit_code;
> +
> +	struct s2_mmu *s2mmu;
> +};
> +
> +struct guest *guest_create(int vmid, void (*guest_func)(void), enum s2_granule granule);
> +void guest_destroy(struct guest *guest);
> +void guest_run(struct guest *guest);
> +
> +#endif /* _ASMARM64_GUEST_H_ */
> diff --git a/lib/arm64/asm/sysreg.h b/lib/arm64/asm/sysreg.h
> index f2d05018..857bee98 100644
> --- a/lib/arm64/asm/sysreg.h
> +++ b/lib/arm64/asm/sysreg.h
> @@ -118,6 +118,10 @@ asm(
>  #define SCTLR_EL1_TCF0_SHIFT	38
>  #define SCTLR_EL1_TCF0_MASK	GENMASK_ULL(39, 38)
>  
> +#define HCR_EL2_VM		_BITULL(0)
> +#define HCR_EL2_FMO		_BITULL(3)
> +#define HCR_EL2_IMO		_BITULL(4)
> +#define HCR_EL2_AMO		_BITULL(5)
>  #define HCR_EL2_TGE		_BITULL(27)
>  #define HCR_EL2_RW		_BITULL(31)
>  #define HCR_EL2_E2H		_BITULL(34)
> @@ -132,6 +136,8 @@ asm(
>  #define SYS_HFGWTR2_EL2		sys_reg(3, 4, 3, 1, 3)
>  #define SYS_HFGITR2_EL2		sys_reg(3, 4, 3, 1, 7)
>  
> +#define SYS_SCTLR_EL1		sys_reg(3, 5, 1, 0, 0)

This isn't needed; the asm below can use `msr sctlr_el1` directly.

> +
>  #define INIT_SCTLR_EL1_MMU_OFF	\
>  			(SCTLR_EL1_ITD | SCTLR_EL1_SED | SCTLR_EL1_EOS | \
>  			 SCTLR_EL1_TSCXT | SCTLR_EL1_EIS | SCTLR_EL1_SPAN | \
> diff --git a/lib/arm64/guest.c b/lib/arm64/guest.c
> new file mode 100644
> index 00000000..68dd449d
> --- /dev/null
> +++ b/lib/arm64/guest.c
> @@ -0,0 +1,89 @@
> +/*
> + * Copyright (C) 2026, Google LLC.
> + * Author: Jing Zhang <jingzhangos@google.com>
> + *
> + * SPDX-License-Identifier: LGPL-2.0-or-later
> + */
> +#include <libcflat.h>
> +#include <asm/guest.h>
> +#include <asm/io.h>
> +#include <asm/sysreg.h>
> +#include <asm/barrier.h>
> +#include <alloc_page.h>
> +#include <alloc.h>
> +
> +static struct guest *__guest_create(struct s2_mmu *s2_ctx, void *entry_point)
> +{
> +	struct guest *guest = calloc(1, sizeof(struct guest));

Missing check if `guest` is NULL.

> +
> +	guest->elr_el2 = (unsigned long)entry_point;
> +	guest->spsr_el2 = 0x3C5; /* M=EL1h, DAIF=Masked */

Check lib/arm64/asm/ptrace.h, we can build this with macros, PSR_MODE_EL1h etc.
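
i.e. the magic number decomposes as (macro values as in ptrace.h;
SPSR_GUEST_INIT is a hypothetical name):

```c
#include <assert.h>

#define PSR_MODE_EL1h	0x00000005UL
#define PSR_F_BIT	0x00000040UL
#define PSR_I_BIT	0x00000080UL
#define PSR_A_BIT	0x00000100UL
#define PSR_D_BIT	0x00000200UL

/* EL1h with all of DAIF masked == the 0x3C5 literal in the patch */
#define SPSR_GUEST_INIT	(PSR_MODE_EL1h | PSR_D_BIT | PSR_A_BIT | \
			 PSR_I_BIT | PSR_F_BIT)
```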

> +	guest->hcr_el2 = HCR_GUEST_FLAGS;
> +
> +	if (s2_ctx) {
> +		guest->vttbr_el2 = virt_to_phys(s2_ctx->pgd);
> +		guest->vttbr_el2 |= ((unsigned long)s2_ctx->vmid << 48);

Maybe this should be extracted into a helper in stage2_mmu, to build vttbr.

> +	} else {
> +		printf("Stage 2 MMU context missing!");
> +	}

Can probably just assert(s2_ctx) instead of this conditional; I don't think
you can use the guest without one.

> +
> +	guest->sctlr_el1 = read_sysreg(sctlr_el1);
> +	/* Disable guest stage 1 translation */
> +	guest->sctlr_el1 &= ~(SCTLR_EL1_M | SCTLR_EL1_C);
> +	guest->sctlr_el1 |= SCTLR_EL1_I;
> +
> +	guest->s2mmu = s2_ctx;
> +
> +	return guest;
> +}
> +
> +struct guest *guest_create(int vmid, void (*guest_func)(void), enum s2_granule granule)
> +{
> +	unsigned long guest_pa, code_base, stack_pa;
> +	unsigned long *stack_page;
> +	struct guest *guest;
> +	struct s2_mmu *ctx;
> +
> +	ctx = s2mmu_init(vmid, granule, true);
> +	/*
> +	 * Map the Host's code segment Identity Mapped (IPA=PA).
> +	 * To be safe, we map a large chunk (e.g., 2MB) around the function
> +	 * to capture any helper functions the compiler might generate calls to.
> +	 */
> +	guest_pa = virt_to_phys((void *)guest_func);
> +	code_base = guest_pa & ~(SZ_2M - 1);
> +	s2mmu_map(ctx, code_base, code_base, SZ_2M, S2_MAP_RW);
> +
> +	/*
> +	 * Map Stack
> +	 * Allocate 16 pages (64K) in Host, get its PA, and map it for Guest.

It's not always 16 pages (it depends on PAGE_SIZE); the comment could just say
`Allocate pages in the Host..`

> +	 */
> +	stack_page = alloc_pages(get_order(GUEST_STACK_SIZE >> PAGE_SHIFT));
> +	stack_pa = virt_to_phys(stack_page);
> +	/* Identity Map it (IPA = PA) */
> +	s2mmu_map(ctx, stack_pa, stack_pa, GUEST_STACK_SIZE, S2_MAP_RW);
> +
> +	s2mmu_enable(ctx);
> +
> +	/* Create Guest */
> +	/* Entry point is the PA of the function (Identity Mapped) */
> +	guest = __guest_create(ctx, (void *)guest_pa);
> +
> +	/*
> +	 * Setup Guest Stack Pointer
> +	 * Must match where we mapped the stack + Offset
> +	 */
> +	guest->sp_el1 = stack_pa + GUEST_STACK_SIZE;
> +
> +	/* Map UART identity mapped, printf() available to guest */
> +	s2mmu_map(ctx, 0x09000000, 0x09000000, PAGE_SIZE, S2_MAP_DEVICE);

This is not portable.

> +
> +	return guest;
> +}

I'm actually not sure what I think of this function as a whole; maybe it does
too much, and the caller should be expected to map things in. Different types
of tests could have different specific guest_create helpers.

> +
> +void guest_destroy(struct guest *guest)
> +{
> +	s2mmu_disable(guest->s2mmu);
> +	s2mmu_destroy(guest->s2mmu);
> +	free(guest);
> +}
> diff --git a/lib/arm64/guest_arch.S b/lib/arm64/guest_arch.S
> new file mode 100644
> index 00000000..70c19507
> --- /dev/null
> +++ b/lib/arm64/guest_arch.S
> @@ -0,0 +1,59 @@
> +/*
> + * Copyright (C) 2026, Google LLC.
> + * Author: Jing Zhang <jingzhangos@google.com>
> + *
> + * SPDX-License-Identifier: LGPL-2.0-or-later
> + */
> +#define __ASSEMBLY__
> +#include <asm/asm-offsets.h>
> +#include <asm/sysreg.h>
> +
> +.global guest_run
> +guest_run:
> +	/* x0 = struct guest pointer */
> +
> +	/* Save Host Callee-Saved Regs */
> +	stp	x29, x30, [sp, #-16]!
> +	stp	x27, x28, [sp, #-16]!
> +	stp	x25, x26, [sp, #-16]!
> +	stp	x23, x24, [sp, #-16]!
> +	stp	x21, x22, [sp, #-16]!
> +	stp	x19, x20, [sp, #-16]!
> +
> +	/* Cache Guest Pointer in TPIDR_EL2 */
> +	msr	tpidr_el2, x0
> +
> +	/* Load Guest System Registers */
> +	ldr	x1, [x0, #GUEST_ELR_OFFSET]
> +	msr	elr_el2, x1
> +	ldr	x1, [x0, #GUEST_SPSR_OFFSET]
> +	msr	spsr_el2, x1
> +	ldr	x1, [x0, #GUEST_HCR_OFFSET]
> +	msr	hcr_el2, x1
> +	ldr	x1, [x0, #GUEST_VTTBR_OFFSET]
> +	msr	vttbr_el2, x1
> +	ldr	x1, [x0, #GUEST_SCTLR_OFFSET]
> +	msr_s	SYS_SCTLR_EL1, x1

This part can use a plain `msr sctlr_el1, x1`; SCTLR_EL1 is a register the
assembler already knows by name, so the `msr_s`/`SYS_SCTLR_EL1` encoding
isn't needed.

> +	ldr	x1, [x0, #GUEST_SP_EL1_OFFSET]
> +	msr	sp_el1, x1
> +
> +	/* Load Guest GPRs */
> +	ldp	x1, x2, [x0, #8]
> +	ldp	x3, x4, [x0, #24]
> +	ldp	x5, x6, [x0, #40]
> +	ldp	x7, x8, [x0, #56]
> +	ldp	x9, x10, [x0, #72]
> +	ldp	x11, x12, [x0, #88]
> +	ldp	x13, x14, [x0, #104]
> +	ldp	x15, x16, [x0, #120]
> +	ldp	x17, x18, [x0, #136]
> +	ldp	x19, x20, [x0, #152]
> +	ldp	x21, x22, [x0, #168]
> +	ldp	x23, x24, [x0, #184]
> +	ldp	x25, x26, [x0, #200]
> +	ldp	x27, x28, [x0, #216]
> +	ldp	x29, x30, [x0, #232]
> +	ldr	x0, [x0, #0]
> +
> +	isb
> +	eret
> -- 
> 2.53.0.1213.gd9a14994de-goog
> 

Thanks,
Joey


Thread overview: 11+ messages
2026-04-13 20:46 [kvm-unit-tests PATCH v2 0/7] arm64: Add Stage-2 MMU and Nested Guest Framework Jing Zhang
2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 1/7] lib: arm64: Generalize ESR exception class definitions for EL2 support Jing Zhang
2026-04-16 15:27   ` Joey Gouly
2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 2/7] lib: arm64: Add stage2 page table management library Jing Zhang
2026-04-16 15:19   ` Joey Gouly
2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 3/7] lib: arm64: Generalize exception vector definitions for EL2 support Jing Zhang
2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 4/7] lib: arm64: Add foundational guest execution framework Jing Zhang
2026-04-16 16:16   ` Joey Gouly
2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 5/7] lib: arm64: Add support for guest exit exception handling Jing Zhang
2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 6/7] lib: arm64: Add guest-internal exception handling (EL1) Jing Zhang
2026-04-13 20:46 ` [kvm-unit-tests PATCH v2 7/7] arm64: Add Stage-2 MMU demand paging test Jing Zhang
