linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests
@ 2025-05-09 13:16 Vincent Donnefort
  2025-05-09 13:16 ` [PATCH v4 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs Vincent Donnefort
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:16 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

Hi all,

This series adds support for stage-2 huge mappings (PMD_SIZE) for pKVM
np-guests, that is, installing PMD-level mappings in the stage-2
whenever the stage-1 is backed by either Hugetlbfs or THPs.

The last patch of the series optimizes CMOs by using a shared
PMD_SIZE fixmap.
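
For reference, the size arithmetic at play is simple (illustration only,
assuming 4K pages; the macro names below are made up for this cover
letter and are not part of the series):

	/*
	 * Illustration: one PMD-level block covers PMD_SIZE / PAGE_SIZE base
	 * pages, i.e. 512 pages (2MiB) with 4K pages.
	 */
	#define EXAMPLE_BLOCK_PAGES	(1UL << (PMD_SHIFT - PAGE_SHIFT))	/* 512 */
	#define EXAMPLE_BLOCK_SIZE	(EXAMPLE_BLOCK_PAGES * PAGE_SIZE)	/* 2MiB */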

Changes since v3 (https://lore.kernel.org/all/20250407082706.1239603-1-vdonnefort@google.com/)

  - Rebase on kvmarm/next

Changes since v2 (https://lore.kernel.org/all/20250306110038.3733649-1-vdonnefort@google.com/)

  - Fix PUD_SIZE -> PMD_SIZE enforcement (Quentin)
  - Rework pkvm_host_share_guest() to remove one hyp_page walk (Quentin)
  - Remove one pgtable walk into __check_host_shared_guest() (Quentin)
  - Return EBUSY on host_share_guest_count overflow

Changes since v1 (https://lore.kernel.org/all/20250228102530.1229089-1-vdonnefort@google.com/)

  - WARN_ON() on !PAGE_ALIGNED size for guest CMOs (Quentin)
  - check_range_allowed_memory() before accessing the Vmemmap (Quentin)

Quentin Perret (2):
  KVM: arm64: Convert pkvm_mappings to interval tree
  KVM: arm64: Add a range to pkvm_mappings

Vincent Donnefort (8):
  KVM: arm64: Handle huge mappings for np-guest CMOs
  KVM: arm64: Introduce for_each_hyp_page
  KVM: arm64: Add a range to __pkvm_host_share_guest()
  KVM: arm64: Add a range to __pkvm_host_unshare_guest()
  KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
  KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
  KVM: arm64: Stage-2 huge mappings for np-guests
  KVM: arm64: np-guest CMOs with PMD_SIZE fixmap

 arch/arm64/include/asm/kvm_pgtable.h          |   7 +-
 arch/arm64/include/asm/kvm_pkvm.h             |   2 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   8 +-
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  16 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |   4 +-
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  16 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 276 ++++++++++++------
 arch/arm64/kvm/hyp/nvhe/mm.c                  |  86 +++++-
 arch/arm64/kvm/hyp/nvhe/setup.c               |  15 +-
 arch/arm64/kvm/hyp/pgtable.c                  |   6 -
 arch/arm64/kvm/mmu.c                          |   5 +-
 arch/arm64/kvm/pkvm.c                         | 129 ++++----
 12 files changed, 375 insertions(+), 195 deletions(-)


base-commit: c4e91ea0cc7e6345a4f7b8167e838d728ca86c30
-- 
2.49.0.1015.ga840276032-goog




* [PATCH v4 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
@ 2025-05-09 13:16 ` Vincent Donnefort
  2025-05-16 12:15   ` Marc Zyngier
  2025-05-09 13:16 ` [PATCH v4 02/10] KVM: arm64: Introduce for_each_hyp_page Vincent Donnefort
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:16 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
size as an argument. But they also rely on the fixmap, which can only
map a single PAGE_SIZE page at a time.

With the upcoming stage-2 huge mappings for pKVM np-guests, those
callbacks will be passed a size > PAGE_SIZE. Loop the CMOs on a
PAGE_SIZE basis until the whole range has been covered.
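
For instance (illustration only, not part of this patch), cleaning a
2MiB block now simply becomes PMD_SIZE / PAGE_SIZE map/clean/unmap
iterations of the fixmap:

	/* Illustration: 512 PAGE_SIZE iterations on a 4K-pages system */
	clean_dcache_guest_page(va, PMD_SIZE);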

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 31173c694695..23544928a637 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -219,14 +219,28 @@ static void guest_s2_put_page(void *addr)
 
 static void clean_dcache_guest_page(void *va, size_t size)
 {
-	__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
-	hyp_fixmap_unmap();
+	WARN_ON(!PAGE_ALIGNED(size));
+
+	while (size) {
+		__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
+					  PAGE_SIZE);
+		hyp_fixmap_unmap();
+		va += PAGE_SIZE;
+		size -= PAGE_SIZE;
+	}
 }
 
 static void invalidate_icache_guest_page(void *va, size_t size)
 {
-	__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
-	hyp_fixmap_unmap();
+	WARN_ON(!PAGE_ALIGNED(size));
+
+	while (size) {
+		__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
+					       PAGE_SIZE);
+		hyp_fixmap_unmap();
+		va += PAGE_SIZE;
+		size -= PAGE_SIZE;
+	}
 }
 
 int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
-- 
2.49.0.1015.ga840276032-goog




* [PATCH v4 02/10] KVM: arm64: Introduce for_each_hyp_page
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
  2025-05-09 13:16 ` [PATCH v4 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs Vincent Donnefort
@ 2025-05-09 13:16 ` Vincent Donnefort
  2025-05-16 12:57   ` Marc Zyngier
  2025-05-09 13:16 ` [PATCH v4 03/10] KVM: arm64: Add a range to __pkvm_host_share_guest() Vincent Donnefort
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:16 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

Add a helper to iterate over the hypervisor vmemmap. This will be
particularly handy with the introduction of huge mapping support
for the np-guest stage-2.
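
As a usage sketch (the actual conversions are in the diff below),
updating the host state for a physical range then simply becomes:

	struct hyp_page *page;

	for_each_hyp_page(phys, size, page)
		set_host_state(page, state);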

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index eb0c2ebd1743..676a0a13c741 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -96,24 +96,24 @@ static inline struct hyp_page *hyp_phys_to_page(phys_addr_t phys)
 #define hyp_page_to_virt(page)	__hyp_va(hyp_page_to_phys(page))
 #define hyp_page_to_pool(page)	(((struct hyp_page *)page)->pool)
 
-static inline enum pkvm_page_state get_host_state(phys_addr_t phys)
+static inline enum pkvm_page_state get_host_state(struct hyp_page *p)
 {
-	return (enum pkvm_page_state)hyp_phys_to_page(phys)->__host_state;
+	return (enum pkvm_page_state)p->__host_state;
 }
 
-static inline void set_host_state(phys_addr_t phys, enum pkvm_page_state state)
+static inline void set_host_state(struct hyp_page *p, enum pkvm_page_state state)
 {
-	hyp_phys_to_page(phys)->__host_state = state;
+	p->__host_state = state;
 }
 
-static inline enum pkvm_page_state get_hyp_state(phys_addr_t phys)
+static inline enum pkvm_page_state get_hyp_state(struct hyp_page *p)
 {
-	return hyp_phys_to_page(phys)->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;
+	return p->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;
 }
 
-static inline void set_hyp_state(phys_addr_t phys, enum pkvm_page_state state)
+static inline void set_hyp_state(struct hyp_page *p, enum pkvm_page_state state)
 {
-	hyp_phys_to_page(phys)->__hyp_state_comp = state ^ PKVM_PAGE_STATE_MASK;
+	p->__hyp_state_comp = state ^ PKVM_PAGE_STATE_MASK;
 }
 
 /*
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 23544928a637..4d269210dae0 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -60,6 +60,9 @@ static void hyp_unlock_component(void)
 	hyp_spin_unlock(&pkvm_pgd_lock);
 }
 
+#define for_each_hyp_page(start, size, page)   \
+	for (page = hyp_phys_to_page(start); page < hyp_phys_to_page((start) + (size)); page++)
+
 static void *host_s2_zalloc_pages_exact(size_t size)
 {
 	void *addr = hyp_alloc_pages(&host_s2_pool, get_order(size));
@@ -481,7 +484,8 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 		return -EAGAIN;
 
 	if (pte) {
-		WARN_ON(addr_is_memory(addr) && get_host_state(addr) != PKVM_NOPAGE);
+		WARN_ON(addr_is_memory(addr) &&
+			get_host_state(hyp_phys_to_page(addr)) != PKVM_NOPAGE);
 		return -EPERM;
 	}
 
@@ -507,10 +511,10 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
 
 static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state)
 {
-	phys_addr_t end = addr + size;
+	struct hyp_page *page;
 
-	for (; addr < end; addr += PAGE_SIZE)
-		set_host_state(addr, state);
+	for_each_hyp_page(addr, size, page)
+		set_host_state(page, state);
 }
 
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
@@ -632,16 +636,17 @@ static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
 static int __host_check_page_state_range(u64 addr, u64 size,
 					 enum pkvm_page_state state)
 {
-	u64 end = addr + size;
+	struct hyp_page *page;
 	int ret;
 
-	ret = check_range_allowed_memory(addr, end);
+	ret = check_range_allowed_memory(addr, addr + size);
 	if (ret)
 		return ret;
 
 	hyp_assert_lock_held(&host_mmu.lock);
-	for (; addr < end; addr += PAGE_SIZE) {
-		if (get_host_state(addr) != state)
+
+	for_each_hyp_page(addr, size, page) {
+		if (get_host_state(page) != state)
 			return -EPERM;
 	}
 
@@ -651,7 +656,7 @@ static int __host_check_page_state_range(u64 addr, u64 size,
 static int __host_set_page_state_range(u64 addr, u64 size,
 				       enum pkvm_page_state state)
 {
-	if (get_host_state(addr) == PKVM_NOPAGE) {
+	if (get_host_state(hyp_phys_to_page(addr)) == PKVM_NOPAGE) {
 		int ret = host_stage2_idmap_locked(addr, size, PKVM_HOST_MEM_PROT);
 
 		if (ret)
@@ -665,18 +670,18 @@ static int __host_set_page_state_range(u64 addr, u64 size,
 
 static void __hyp_set_page_state_range(phys_addr_t phys, u64 size, enum pkvm_page_state state)
 {
-	phys_addr_t end = phys + size;
+	struct hyp_page *page;
 
-	for (; phys < end; phys += PAGE_SIZE)
-		set_hyp_state(phys, state);
+	for_each_hyp_page(phys, size, page)
+		set_hyp_state(page, state);
 }
 
 static int __hyp_check_page_state_range(phys_addr_t phys, u64 size, enum pkvm_page_state state)
 {
-	phys_addr_t end = phys + size;
+	struct hyp_page *page;
 
-	for (; phys < end; phys += PAGE_SIZE) {
-		if (get_hyp_state(phys) != state)
+	for_each_hyp_page(phys, size, page) {
+		if (get_hyp_state(page) != state)
 			return -EPERM;
 	}
 
@@ -927,7 +932,7 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
 		goto unlock;
 
 	page = hyp_phys_to_page(phys);
-	switch (get_host_state(phys)) {
+	switch (get_host_state(page)) {
 	case PKVM_PAGE_OWNED:
 		WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
 		break;
@@ -979,9 +984,9 @@ static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ip
 	if (WARN_ON(ret))
 		return ret;
 
-	if (get_host_state(phys) != PKVM_PAGE_SHARED_OWNED)
-		return -EPERM;
 	page = hyp_phys_to_page(phys);
+	if (get_host_state(page) != PKVM_PAGE_SHARED_OWNED)
+		return -EPERM;
 	if (WARN_ON(!page->host_share_guest_count))
 		return -EINVAL;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 6d513a4b3763..c19860fc8183 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -190,6 +190,7 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
 				     enum kvm_pgtable_walk_flags visit)
 {
 	enum pkvm_page_state state;
+	struct hyp_page *page;
 	phys_addr_t phys;
 
 	if (!kvm_pte_valid(ctx->old))
@@ -202,6 +203,8 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	if (!addr_is_memory(phys))
 		return -EINVAL;
 
+	page = hyp_phys_to_page(phys);
+
 	/*
 	 * Adjust the host stage-2 mappings to match the ownership attributes
 	 * configured in the hypervisor stage-1, and make sure to propagate them
@@ -210,15 +213,15 @@ static int fix_host_ownership_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(ctx->old));
 	switch (state) {
 	case PKVM_PAGE_OWNED:
-		set_hyp_state(phys, PKVM_PAGE_OWNED);
+		set_hyp_state(page, PKVM_PAGE_OWNED);
 		return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
 	case PKVM_PAGE_SHARED_OWNED:
-		set_hyp_state(phys, PKVM_PAGE_SHARED_OWNED);
-		set_host_state(phys, PKVM_PAGE_SHARED_BORROWED);
+		set_hyp_state(page, PKVM_PAGE_SHARED_OWNED);
+		set_host_state(page, PKVM_PAGE_SHARED_BORROWED);
 		break;
 	case PKVM_PAGE_SHARED_BORROWED:
-		set_hyp_state(phys, PKVM_PAGE_SHARED_BORROWED);
-		set_host_state(phys, PKVM_PAGE_SHARED_OWNED);
+		set_hyp_state(page, PKVM_PAGE_SHARED_BORROWED);
+		set_host_state(page, PKVM_PAGE_SHARED_OWNED);
 		break;
 	default:
 		return -EINVAL;
-- 
2.49.0.1015.ga840276032-goog




* [PATCH v4 03/10] KVM: arm64: Add a range to __pkvm_host_share_guest()
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
  2025-05-09 13:16 ` [PATCH v4 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs Vincent Donnefort
  2025-05-09 13:16 ` [PATCH v4 02/10] KVM: arm64: Introduce for_each_hyp_page Vincent Donnefort
@ 2025-05-09 13:16 ` Vincent Donnefort
  2025-05-16 13:10   ` Marc Zyngier
  2025-05-09 13:17 ` [PATCH v4 04/10] KVM: arm64: Add a range to __pkvm_host_unshare_guest() Vincent Donnefort
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:16 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

In preparation for supporting stage-2 huge mappings for np-guests, add a
nr_pages argument to the __pkvm_host_share_guest() hypercall. This
argument supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is,
512 on a system with 4K pages).
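
From the EL1 side, the call simply gains the extra argument (sketch
only, mirroring the pkvm.c hunk below; block-sized sharing only becomes
reachable once huge mappings are enabled later in the series):

	/* Sketch: nr_pages is either 1 or PMD_SIZE / PAGE_SIZE */
	ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, nr_pages, prot);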

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 26016eb9323f..47aa7b01114f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -39,7 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
 int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
-int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
 			    enum kvm_pgtable_prot prot);
 int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 59db9606e6e1..4d3d215955c3 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -245,7 +245,8 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(u64, pfn, host_ctxt, 1);
 	DECLARE_REG(u64, gfn, host_ctxt, 2);
-	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
+	DECLARE_REG(u64, nr_pages, host_ctxt, 3);
+	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
 	struct pkvm_hyp_vcpu *hyp_vcpu;
 	int ret = -EINVAL;
 
@@ -260,7 +261,7 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
 	if (ret)
 		goto out;
 
-	ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu, prot);
+	ret = __pkvm_host_share_guest(pfn, gfn, nr_pages, hyp_vcpu, prot);
 out:
 	cpu_reg(host_ctxt, 1) =  ret;
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 4d269210dae0..f0f7c6f83e57 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -696,10 +696,9 @@ static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
 	return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
 }
 
-static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr,
+static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr,
 					  u64 size, enum pkvm_page_state state)
 {
-	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
 	struct check_walk_data d = {
 		.desired	= state,
 		.get_page_state	= guest_get_page_state,
@@ -908,48 +907,81 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
 	return ret;
 }
 
-int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
+static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *size)
+{
+	if (nr_pages == 1) {
+		*size = PAGE_SIZE;
+		return 0;
+	}
+
+	/* We solely support PMD_SIZE huge-pages */
+	if (nr_pages != (1 << (PMD_SHIFT - PAGE_SHIFT)))
+		return -EINVAL;
+
+	if (!IS_ALIGNED(phys | ipa, PMD_SIZE))
+		return -EINVAL;
+
+	*size = PMD_SIZE;
+	return 0;
+}
+
+int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
 			    enum kvm_pgtable_prot prot)
 {
 	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
 	u64 phys = hyp_pfn_to_phys(pfn);
 	u64 ipa = hyp_pfn_to_phys(gfn);
 	struct hyp_page *page;
+	u64 size;
 	int ret;
 
 	if (prot & ~KVM_PGTABLE_PROT_RWX)
 		return -EINVAL;
 
-	ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
+	ret = __guest_check_transition_size(phys, ipa, nr_pages, &size);
+	if (ret)
+		return ret;
+
+	ret = check_range_allowed_memory(phys, phys + size);
 	if (ret)
 		return ret;
 
 	host_lock_component();
 	guest_lock_component(vm);
 
-	ret = __guest_check_page_state_range(vcpu, ipa, PAGE_SIZE, PKVM_NOPAGE);
+	ret = __guest_check_page_state_range(vm, ipa, size, PKVM_NOPAGE);
 	if (ret)
 		goto unlock;
 
-	page = hyp_phys_to_page(phys);
-	switch (get_host_state(page)) {
-	case PKVM_PAGE_OWNED:
-		WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_SHARED_OWNED));
-		break;
-	case PKVM_PAGE_SHARED_OWNED:
-		if (page->host_share_guest_count)
-			break;
-		/* Only host to np-guest multi-sharing is tolerated */
-		fallthrough;
-	default:
-		ret = -EPERM;
-		goto unlock;
+	for_each_hyp_page(phys, size, page) {
+		switch (get_host_state(page)) {
+		case PKVM_PAGE_OWNED:
+			continue;
+		case PKVM_PAGE_SHARED_OWNED:
+			if (page->host_share_guest_count == U32_MAX) {
+				ret = -EBUSY;
+				goto unlock;
+			}
+
+			/* Only host to np-guest multi-sharing is tolerated */
+			if (page->host_share_guest_count)
+				continue;
+
+			fallthrough;
+		default:
+			ret = -EPERM;
+			goto unlock;
+		}
+	}
+
+	for_each_hyp_page(phys, size, page) {
+		set_host_state(page, PKVM_PAGE_SHARED_OWNED);
+		page->host_share_guest_count++;
 	}
 
-	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, PAGE_SIZE, phys,
+	WARN_ON(kvm_pgtable_stage2_map(&vm->pgt, ipa, size, phys,
 				       pkvm_mkstate(prot, PKVM_PAGE_SHARED_BORROWED),
 				       &vcpu->vcpu.arch.pkvm_memcache, 0));
-	page->host_share_guest_count++;
 
 unlock:
 	guest_unlock_component(vm);
@@ -1170,6 +1202,9 @@ static void assert_page_state(void)
 	struct pkvm_hyp_vcpu *vcpu = &selftest_vcpu;
 	u64 phys = hyp_virt_to_phys(virt);
 	u64 ipa[2] = { selftest_ipa(), selftest_ipa() + PAGE_SIZE };
+	struct pkvm_hyp_vm *vm;
+
+	vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
 
 	host_lock_component();
 	WARN_ON(__host_check_page_state_range(phys, size, selftest_state.host));
@@ -1180,8 +1215,8 @@ static void assert_page_state(void)
 	hyp_unlock_component();
 
 	guest_lock_component(&selftest_vm);
-	WARN_ON(__guest_check_page_state_range(vcpu, ipa[0], size, selftest_state.guest[0]));
-	WARN_ON(__guest_check_page_state_range(vcpu, ipa[1], size, selftest_state.guest[1]));
+	WARN_ON(__guest_check_page_state_range(vm, ipa[0], size, selftest_state.guest[0]));
+	WARN_ON(__guest_check_page_state_range(vm, ipa[1], size, selftest_state.guest[1]));
 	guest_unlock_component(&selftest_vm);
 }
 
@@ -1219,7 +1254,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_share_ffa, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_unshare_ffa, pfn, 1);
 	assert_transition_res(-EPERM,	hyp_pin_shared_mem, virt, virt + size);
-	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, vm);
 
 	selftest_state.host = PKVM_PAGE_OWNED;
@@ -1238,7 +1273,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_donate_hyp, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_share_ffa, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
-	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, vm);
 
 	assert_transition_res(0,	hyp_pin_shared_mem, virt, virt + size);
@@ -1250,7 +1285,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_donate_hyp, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_share_ffa, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
-	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, vm);
 
 	hyp_unpin_shared_mem(virt, virt + size);
@@ -1269,7 +1304,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_share_hyp, pfn);
 	assert_transition_res(-EPERM,	__pkvm_host_unshare_hyp, pfn);
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
-	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, vm);
 	assert_transition_res(-EPERM,	hyp_pin_shared_mem, virt, virt + size);
 
@@ -1280,8 +1315,8 @@ void pkvm_ownership_selftest(void *base)
 
 	selftest_state.host = PKVM_PAGE_SHARED_OWNED;
 	selftest_state.guest[0] = PKVM_PAGE_SHARED_BORROWED;
-	assert_transition_res(0,	__pkvm_host_share_guest, pfn, gfn, vcpu, prot);
-	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, vcpu, prot);
+	assert_transition_res(0,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
+	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
 	assert_transition_res(-EPERM,	__pkvm_host_share_ffa, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_donate_hyp, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_share_hyp, pfn);
@@ -1290,7 +1325,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	hyp_pin_shared_mem, virt, virt + size);
 
 	selftest_state.guest[1] = PKVM_PAGE_SHARED_BORROWED;
-	assert_transition_res(0,	__pkvm_host_share_guest, pfn, gfn + 1, vcpu, prot);
+	assert_transition_res(0,	__pkvm_host_share_guest, pfn, gfn + 1, 1, vcpu, prot);
 	WARN_ON(hyp_virt_to_page(virt)->host_share_guest_count != 2);
 
 	selftest_state.guest[0] = PKVM_NOPAGE;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 83a737484046..0285e2cd2e7f 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -347,7 +347,7 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		return -EINVAL;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-	ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, prot);
+	ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, 1, prot);
 	if (ret) {
 		/* Is the gfn already mapped due to a racing vCPU? */
 		if (ret == -EPERM)
-- 
2.49.0.1015.ga840276032-goog




* [PATCH v4 04/10] KVM: arm64: Add a range to __pkvm_host_unshare_guest()
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
                   ` (2 preceding siblings ...)
  2025-05-09 13:16 ` [PATCH v4 03/10] KVM: arm64: Add a range to __pkvm_host_share_guest() Vincent Donnefort
@ 2025-05-09 13:17 ` Vincent Donnefort
  2025-05-09 13:17 ` [PATCH v4 05/10] KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() Vincent Donnefort
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:17 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

In preparation for supporting stage-2 huge mappings for np-guests, add a
nr_pages argument to the __pkvm_host_unshare_guest() hypercall. This
argument supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is,
512 on a system with 4K pages).

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 47aa7b01114f..19671edbe18f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -41,7 +41,7 @@ int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
 int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
 			    enum kvm_pgtable_prot prot);
-int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
+int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
 int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 4d3d215955c3..5c03bd1db873 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -270,6 +270,7 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
 	DECLARE_REG(u64, gfn, host_ctxt, 2);
+	DECLARE_REG(u64, nr_pages, host_ctxt, 3);
 	struct pkvm_hyp_vm *hyp_vm;
 	int ret = -EINVAL;
 
@@ -280,7 +281,7 @@ static void handle___pkvm_host_unshare_guest(struct kvm_cpu_context *host_ctxt)
 	if (!hyp_vm)
 		goto out;
 
-	ret = __pkvm_host_unshare_guest(gfn, hyp_vm);
+	ret = __pkvm_host_unshare_guest(gfn, nr_pages, hyp_vm);
 	put_pkvm_hyp_vm(hyp_vm);
 out:
 	cpu_reg(host_ctxt, 1) =  ret;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index f0f7c6f83e57..ae9a91a21a61 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -990,7 +990,7 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu
 	return ret;
 }
 
-static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa)
+static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ipa, u64 size)
 {
 	enum pkvm_page_state state;
 	struct hyp_page *page;
@@ -1004,7 +1004,7 @@ static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ip
 		return ret;
 	if (!kvm_pte_valid(pte))
 		return -ENOENT;
-	if (level != KVM_PGTABLE_LAST_LEVEL)
+	if (kvm_granule_size(level) != size)
 		return -E2BIG;
 
 	state = guest_get_page_state(pte, ipa);
@@ -1012,43 +1012,50 @@ static int __check_host_shared_guest(struct pkvm_hyp_vm *vm, u64 *__phys, u64 ip
 		return -EPERM;
 
 	phys = kvm_pte_to_phys(pte);
-	ret = check_range_allowed_memory(phys, phys + PAGE_SIZE);
+	ret = check_range_allowed_memory(phys, phys + size);
 	if (WARN_ON(ret))
 		return ret;
 
-	page = hyp_phys_to_page(phys);
-	if (get_host_state(page) != PKVM_PAGE_SHARED_OWNED)
-		return -EPERM;
-	if (WARN_ON(!page->host_share_guest_count))
-		return -EINVAL;
+	for_each_hyp_page(phys, size, page) {
+		if (get_host_state(page) != PKVM_PAGE_SHARED_OWNED)
+			return -EPERM;
+		if (WARN_ON(!page->host_share_guest_count))
+			return -EINVAL;
+	}
 
 	*__phys = phys;
 
 	return 0;
 }
 
-int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm)
 {
 	u64 ipa = hyp_pfn_to_phys(gfn);
 	struct hyp_page *page;
-	u64 phys;
+	u64 size, phys;
 	int ret;
 
+	ret = __guest_check_transition_size(0, ipa, nr_pages, &size);
+	if (ret)
+		return ret;
+
 	host_lock_component();
 	guest_lock_component(vm);
 
-	ret = __check_host_shared_guest(vm, &phys, ipa);
+	ret = __check_host_shared_guest(vm, &phys, ipa, size);
 	if (ret)
 		goto unlock;
 
-	ret = kvm_pgtable_stage2_unmap(&vm->pgt, ipa, PAGE_SIZE);
+	ret = kvm_pgtable_stage2_unmap(&vm->pgt, ipa, size);
 	if (ret)
 		goto unlock;
 
-	page = hyp_phys_to_page(phys);
-	page->host_share_guest_count--;
-	if (!page->host_share_guest_count)
-		WARN_ON(__host_set_page_state_range(phys, PAGE_SIZE, PKVM_PAGE_OWNED));
+	for_each_hyp_page(phys, size, page) {
+		/* __check_host_shared_guest() protects against underflow */
+		page->host_share_guest_count--;
+		if (!page->host_share_guest_count)
+			set_host_state(page, PKVM_PAGE_OWNED);
+	}
 
 unlock:
 	guest_unlock_component(vm);
@@ -1068,7 +1075,7 @@ static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa)
 	host_lock_component();
 	guest_lock_component(vm);
 
-	ret = __check_host_shared_guest(vm, &phys, ipa);
+	ret = __check_host_shared_guest(vm, &phys, ipa, PAGE_SIZE);
 
 	guest_unlock_component(vm);
 	host_unlock_component();
@@ -1255,7 +1262,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_unshare_ffa, pfn, 1);
 	assert_transition_res(-EPERM,	hyp_pin_shared_mem, virt, virt + size);
 	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
-	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, vm);
+	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, 1, vm);
 
 	selftest_state.host = PKVM_PAGE_OWNED;
 	selftest_state.hyp = PKVM_NOPAGE;
@@ -1263,7 +1270,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_unshare_hyp, pfn);
 	assert_transition_res(-EPERM,	__pkvm_host_unshare_ffa, pfn, 1);
-	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, vm);
+	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, 1, vm);
 	assert_transition_res(-EPERM,	hyp_pin_shared_mem, virt, virt + size);
 
 	selftest_state.host = PKVM_PAGE_SHARED_OWNED;
@@ -1274,7 +1281,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_share_ffa, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
-	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, vm);
+	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, 1, vm);
 
 	assert_transition_res(0,	hyp_pin_shared_mem, virt, virt + size);
 	assert_transition_res(0,	hyp_pin_shared_mem, virt, virt + size);
@@ -1286,7 +1293,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_share_ffa, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
-	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, vm);
+	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, 1, vm);
 
 	hyp_unpin_shared_mem(virt, virt + size);
 	assert_page_state();
@@ -1305,7 +1312,7 @@ void pkvm_ownership_selftest(void *base)
 	assert_transition_res(-EPERM,	__pkvm_host_unshare_hyp, pfn);
 	assert_transition_res(-EPERM,	__pkvm_hyp_donate_host, pfn, 1);
 	assert_transition_res(-EPERM,	__pkvm_host_share_guest, pfn, gfn, 1, vcpu, prot);
-	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, vm);
+	assert_transition_res(-ENOENT,	__pkvm_host_unshare_guest, gfn, 1, vm);
 	assert_transition_res(-EPERM,	hyp_pin_shared_mem, virt, virt + size);
 
 	selftest_state.host = PKVM_PAGE_OWNED;
@@ -1329,11 +1336,11 @@ void pkvm_ownership_selftest(void *base)
 	WARN_ON(hyp_virt_to_page(virt)->host_share_guest_count != 2);
 
 	selftest_state.guest[0] = PKVM_NOPAGE;
-	assert_transition_res(0,	__pkvm_host_unshare_guest, gfn, vm);
+	assert_transition_res(0,	__pkvm_host_unshare_guest, gfn, 1, vm);
 
 	selftest_state.guest[1] = PKVM_NOPAGE;
 	selftest_state.host = PKVM_PAGE_OWNED;
-	assert_transition_res(0,	__pkvm_host_unshare_guest, gfn + 1, vm);
+	assert_transition_res(0,	__pkvm_host_unshare_guest, gfn + 1, 1, vm);
 
 	selftest_state.host = PKVM_NOPAGE;
 	selftest_state.hyp = PKVM_PAGE_OWNED;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 0285e2cd2e7f..f77c5157a8d7 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -371,7 +371,7 @@ int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
 	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
-		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
+		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn, 1);
 		if (WARN_ON(ret))
 			break;
 		rb_erase(&mapping->node, &pgt->pkvm_mappings);
-- 
2.49.0.1015.ga840276032-goog




* [PATCH v4 05/10] KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
                   ` (3 preceding siblings ...)
  2025-05-09 13:17 ` [PATCH v4 04/10] KVM: arm64: Add a range to __pkvm_host_unshare_guest() Vincent Donnefort
@ 2025-05-09 13:17 ` Vincent Donnefort
  2025-05-09 13:17 ` [PATCH v4 06/10] KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() Vincent Donnefort
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:17 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

In preparation for supporting stage-2 huge mappings for np-guests, add a
nr_pages argument to the __pkvm_host_wrprotect_guest() hypercall. This
argument supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is,
512 on a system with 4K pages).

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 19671edbe18f..64d4f3bf6269 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -43,8 +43,8 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu
 			    enum kvm_pgtable_prot prot);
 int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
-int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
+int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu);
 
 bool addr_is_memory(phys_addr_t phys);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 5c03bd1db873..fa7e2421d359 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -310,6 +310,7 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
 {
 	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
 	DECLARE_REG(u64, gfn, host_ctxt, 2);
+	DECLARE_REG(u64, nr_pages, host_ctxt, 3);
 	struct pkvm_hyp_vm *hyp_vm;
 	int ret = -EINVAL;
 
@@ -320,7 +321,7 @@ static void handle___pkvm_host_wrprotect_guest(struct kvm_cpu_context *host_ctxt
 	if (!hyp_vm)
 		goto out;
 
-	ret = __pkvm_host_wrprotect_guest(gfn, hyp_vm);
+	ret = __pkvm_host_wrprotect_guest(gfn, nr_pages, hyp_vm);
 	put_pkvm_hyp_vm(hyp_vm);
 out:
 	cpu_reg(host_ctxt, 1) = ret;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index ae9a91a21a61..887848408e1b 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1064,7 +1064,7 @@ int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm)
 	return ret;
 }
 
-static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa)
+static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa, u64 size)
 {
 	u64 phys;
 	int ret;
@@ -1075,7 +1075,7 @@ static void assert_host_shared_guest(struct pkvm_hyp_vm *vm, u64 ipa)
 	host_lock_component();
 	guest_lock_component(vm);
 
-	ret = __check_host_shared_guest(vm, &phys, ipa, PAGE_SIZE);
+	ret = __check_host_shared_guest(vm, &phys, ipa, size);
 
 	guest_unlock_component(vm);
 	host_unlock_component();
@@ -1095,7 +1095,7 @@ int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_
 	if (prot & ~KVM_PGTABLE_PROT_RWX)
 		return -EINVAL;
 
-	assert_host_shared_guest(vm, ipa);
+	assert_host_shared_guest(vm, ipa, PAGE_SIZE);
 	guest_lock_component(vm);
 	ret = kvm_pgtable_stage2_relax_perms(&vm->pgt, ipa, prot, 0);
 	guest_unlock_component(vm);
@@ -1103,17 +1103,21 @@ int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_
 	return ret;
 }
 
-int __pkvm_host_wrprotect_guest(u64 gfn, struct pkvm_hyp_vm *vm)
+int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm)
 {
-	u64 ipa = hyp_pfn_to_phys(gfn);
+	u64 size, ipa = hyp_pfn_to_phys(gfn);
 	int ret;
 
 	if (pkvm_hyp_vm_is_protected(vm))
 		return -EPERM;
 
-	assert_host_shared_guest(vm, ipa);
+	ret = __guest_check_transition_size(0, ipa, nr_pages, &size);
+	if (ret)
+		return ret;
+
+	assert_host_shared_guest(vm, ipa, size);
 	guest_lock_component(vm);
-	ret = kvm_pgtable_stage2_wrprotect(&vm->pgt, ipa, PAGE_SIZE);
+	ret = kvm_pgtable_stage2_wrprotect(&vm->pgt, ipa, size);
 	guest_unlock_component(vm);
 
 	return ret;
@@ -1127,7 +1131,7 @@ int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *
 	if (pkvm_hyp_vm_is_protected(vm))
 		return -EPERM;
 
-	assert_host_shared_guest(vm, ipa);
+	assert_host_shared_guest(vm, ipa, PAGE_SIZE);
 	guest_lock_component(vm);
 	ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, PAGE_SIZE, mkold);
 	guest_unlock_component(vm);
@@ -1143,7 +1147,7 @@ int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu)
 	if (pkvm_hyp_vm_is_protected(vm))
 		return -EPERM;
 
-	assert_host_shared_guest(vm, ipa);
+	assert_host_shared_guest(vm, ipa, PAGE_SIZE);
 	guest_lock_component(vm);
 	kvm_pgtable_stage2_mkyoung(&vm->pgt, ipa, 0);
 	guest_unlock_component(vm);
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index f77c5157a8d7..daab4a00790a 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -390,7 +390,7 @@ int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 
 	lockdep_assert_held(&kvm->mmu_lock);
 	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
-		ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn);
+		ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn, 1);
 		if (WARN_ON(ret))
 			break;
 	}
-- 
2.49.0.1015.ga840276032-goog




* [PATCH v4 06/10] KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
                   ` (4 preceding siblings ...)
  2025-05-09 13:17 ` [PATCH v4 05/10] KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() Vincent Donnefort
@ 2025-05-09 13:17 ` Vincent Donnefort
  2025-05-09 13:17 ` [PATCH v4 07/10] KVM: arm64: Convert pkvm_mappings to interval tree Vincent Donnefort
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:17 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

In preparation for supporting stage-2 huge mappings for np-guests, add a
nr_pages argument to the __pkvm_host_test_clear_young_guest() hypercall.
This argument supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that
is, 512 on a system with 4K pages).

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 64d4f3bf6269..5f9d56754e39 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -43,8 +43,8 @@ int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu
 			    enum kvm_pgtable_prot prot);
 int __pkvm_host_unshare_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
 int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
-int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm);
 int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *hyp_vm);
+int __pkvm_host_test_clear_young_guest(u64 gfn, u64 nr_pages, bool mkold, struct pkvm_hyp_vm *vm);
 int __pkvm_host_mkyoung_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu);
 
 bool addr_is_memory(phys_addr_t phys);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index fa7e2421d359..8e8848de4d47 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -331,7 +331,8 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
 {
 	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
 	DECLARE_REG(u64, gfn, host_ctxt, 2);
-	DECLARE_REG(bool, mkold, host_ctxt, 3);
+	DECLARE_REG(u64, nr_pages, host_ctxt, 3);
+	DECLARE_REG(bool, mkold, host_ctxt, 4);
 	struct pkvm_hyp_vm *hyp_vm;
 	int ret = -EINVAL;
 
@@ -342,7 +343,7 @@ static void handle___pkvm_host_test_clear_young_guest(struct kvm_cpu_context *ho
 	if (!hyp_vm)
 		goto out;
 
-	ret = __pkvm_host_test_clear_young_guest(gfn, mkold, hyp_vm);
+	ret = __pkvm_host_test_clear_young_guest(gfn, nr_pages, mkold, hyp_vm);
 	put_pkvm_hyp_vm(hyp_vm);
 out:
 	cpu_reg(host_ctxt, 1) = ret;
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 887848408e1b..78fb9cea2034 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -1123,17 +1123,21 @@ int __pkvm_host_wrprotect_guest(u64 gfn, u64 nr_pages, struct pkvm_hyp_vm *vm)
 	return ret;
 }
 
-int __pkvm_host_test_clear_young_guest(u64 gfn, bool mkold, struct pkvm_hyp_vm *vm)
+int __pkvm_host_test_clear_young_guest(u64 gfn, u64 nr_pages, bool mkold, struct pkvm_hyp_vm *vm)
 {
-	u64 ipa = hyp_pfn_to_phys(gfn);
+	u64 size, ipa = hyp_pfn_to_phys(gfn);
 	int ret;
 
 	if (pkvm_hyp_vm_is_protected(vm))
 		return -EPERM;
 
-	assert_host_shared_guest(vm, ipa, PAGE_SIZE);
+	ret = __guest_check_transition_size(0, ipa, nr_pages, &size);
+	if (ret)
+		return ret;
+
+	assert_host_shared_guest(vm, ipa, size);
 	guest_lock_component(vm);
-	ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, PAGE_SIZE, mkold);
+	ret = kvm_pgtable_stage2_test_clear_young(&vm->pgt, ipa, size, mkold);
 	guest_unlock_component(vm);
 
 	return ret;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index daab4a00790a..057874bbe3e1 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -420,7 +420,7 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
 	lockdep_assert_held(&kvm->mmu_lock);
 	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
 		young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
-					   mkold);
+					   1, mkold);
 
 	return young;
 }
-- 
2.49.0.1015.ga840276032-goog




* [PATCH v4 07/10] KVM: arm64: Convert pkvm_mappings to interval tree
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
                   ` (5 preceding siblings ...)
  2025-05-09 13:17 ` [PATCH v4 06/10] KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() Vincent Donnefort
@ 2025-05-09 13:17 ` Vincent Donnefort
  2025-05-16 13:15   ` Marc Zyngier
  2025-05-09 13:17 ` [PATCH v4 08/10] KVM: arm64: Add a range to pkvm_mappings Vincent Donnefort
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:17 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

From: Quentin Perret <qperret@google.com>

In preparation for supporting stage-2 huge mappings for np-guests, let's
convert pgt.pkvm_mappings to an interval tree.

No functional change intended.
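
For instance (sketch only), looking up the mapping covering a single IPA
with the helpers generated by INTERVAL_TREE_DEFINE() becomes:

	struct pkvm_mapping *mapping;

	/* Sketch: find the mapping (if any) overlapping [ipa, ipa + PAGE_SIZE) */
	mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, ipa, ipa + PAGE_SIZE - 1);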

Suggested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 6b9d274052c7..1b43bcd2a679 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -413,7 +413,7 @@ static inline bool kvm_pgtable_walk_lock_held(void)
  */
 struct kvm_pgtable {
 	union {
-		struct rb_root					pkvm_mappings;
+		struct rb_root_cached				pkvm_mappings;
 		struct {
 			u32					ia_bits;
 			s8					start_level;
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index d91bfcf2db56..da75d41c948c 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -173,6 +173,7 @@ struct pkvm_mapping {
 	struct rb_node node;
 	u64 gfn;
 	u64 pfn;
+	u64 __subtree_last;	/* Internal member for interval tree */
 };
 
 int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 057874bbe3e1..6febddbec69e 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/init.h>
+#include <linux/interval_tree_generic.h>
 #include <linux/kmemleak.h>
 #include <linux/kvm_host.h>
 #include <asm/kvm_mmu.h>
@@ -256,80 +257,63 @@ static int __init finalize_pkvm(void)
 }
 device_initcall_sync(finalize_pkvm);
 
-static int cmp_mappings(struct rb_node *node, const struct rb_node *parent)
+static u64 __pkvm_mapping_start(struct pkvm_mapping *m)
 {
-	struct pkvm_mapping *a = rb_entry(node, struct pkvm_mapping, node);
-	struct pkvm_mapping *b = rb_entry(parent, struct pkvm_mapping, node);
-
-	if (a->gfn < b->gfn)
-		return -1;
-	if (a->gfn > b->gfn)
-		return 1;
-	return 0;
+	return m->gfn * PAGE_SIZE;
 }
 
-static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
+static u64 __pkvm_mapping_end(struct pkvm_mapping *m)
 {
-	struct rb_node *node = root->rb_node, *prev = NULL;
-	struct pkvm_mapping *mapping;
-
-	while (node) {
-		mapping = rb_entry(node, struct pkvm_mapping, node);
-		if (mapping->gfn == gfn)
-			return node;
-		prev = node;
-		node = (gfn < mapping->gfn) ? node->rb_left : node->rb_right;
-	}
-
-	return prev;
+	return (m->gfn + 1) * PAGE_SIZE - 1;
 }
 
-/*
- * __tmp is updated to rb_next(__tmp) *before* entering the body of the loop to allow freeing
- * of __map inline.
- */
+INTERVAL_TREE_DEFINE(struct pkvm_mapping, node, u64, __subtree_last,
+		     __pkvm_mapping_start, __pkvm_mapping_end, static,
+		     pkvm_mapping);
+
 #define for_each_mapping_in_range_safe(__pgt, __start, __end, __map)				\
-	for (struct rb_node *__tmp = find_first_mapping_node(&(__pgt)->pkvm_mappings,		\
-							     ((__start) >> PAGE_SHIFT));	\
+	for (struct pkvm_mapping *__tmp = pkvm_mapping_iter_first(&(__pgt)->pkvm_mappings,	\
+								  __start, __end - 1);		\
 	     __tmp && ({									\
-				__map = rb_entry(__tmp, struct pkvm_mapping, node);		\
-				__tmp = rb_next(__tmp);						\
+				__map = __tmp;							\
+				__tmp = pkvm_mapping_iter_next(__map, __start, __end - 1);	\
 				true;								\
 		       });									\
-	    )											\
-		if (__map->gfn < ((__start) >> PAGE_SHIFT))					\
-			continue;								\
-		else if (__map->gfn >= ((__end) >> PAGE_SHIFT))					\
-			break;									\
-		else
+	    )
 
 int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 			     struct kvm_pgtable_mm_ops *mm_ops)
 {
-	pgt->pkvm_mappings	= RB_ROOT;
+	pgt->pkvm_mappings	= RB_ROOT_CACHED;
 	pgt->mmu		= mmu;
 
 	return 0;
 }
 
-void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 end)
 {
 	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
 	pkvm_handle_t handle = kvm->arch.pkvm.handle;
 	struct pkvm_mapping *mapping;
-	struct rb_node *node;
+	int ret;
 
 	if (!handle)
-		return;
+		return 0;
 
-	node = rb_first(&pgt->pkvm_mappings);
-	while (node) {
-		mapping = rb_entry(node, struct pkvm_mapping, node);
-		kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn);
-		node = rb_next(node);
-		rb_erase(&mapping->node, &pgt->pkvm_mappings);
+	for_each_mapping_in_range_safe(pgt, start, end, mapping) {
+		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn, 1);
+		if (WARN_ON(ret))
+			return ret;
+		pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
 		kfree(mapping);
 	}
+
+	return 0;
+}
+
+void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+{
+	__pkvm_pgtable_stage2_unmap(pgt, 0, ~(0ULL));
 }
 
 int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
@@ -357,28 +341,16 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	swap(mapping, cache->mapping);
 	mapping->gfn = gfn;
 	mapping->pfn = pfn;
-	WARN_ON(rb_find_add(&mapping->node, &pgt->pkvm_mappings, cmp_mappings));
+	pkvm_mapping_insert(mapping, &pgt->pkvm_mappings);
 
 	return ret;
 }
 
 int pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 {
-	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
-	pkvm_handle_t handle = kvm->arch.pkvm.handle;
-	struct pkvm_mapping *mapping;
-	int ret = 0;
+	lockdep_assert_held_write(&kvm_s2_mmu_to_kvm(pgt->mmu)->mmu_lock);
 
-	lockdep_assert_held_write(&kvm->mmu_lock);
-	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
-		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn, 1);
-		if (WARN_ON(ret))
-			break;
-		rb_erase(&mapping->node, &pgt->pkvm_mappings);
-		kfree(mapping);
-	}
-
-	return ret;
+	return __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
 }
 
 int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
-- 
2.49.0.1015.ga840276032-goog




* [PATCH v4 08/10] KVM: arm64: Add a range to pkvm_mappings
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
                   ` (6 preceding siblings ...)
  2025-05-09 13:17 ` [PATCH v4 07/10] KVM: arm64: Convert pkvm_mappings to interval tree Vincent Donnefort
@ 2025-05-09 13:17 ` Vincent Donnefort
  2025-05-09 13:17 ` [PATCH v4 09/10] KVM: arm64: Stage-2 huge mappings for np-guests Vincent Donnefort
  2025-05-09 13:17 ` [PATCH v4 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap Vincent Donnefort
  9 siblings, 0 replies; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:17 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

From: Quentin Perret <qperret@google.com>

In preparation for supporting stage-2 huge mappings for np-guests, add a
nr_pages member to struct pkvm_mapping so that EL1 can track the size of
the stage-2 mapping.

Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index da75d41c948c..ea58282f59bb 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -173,6 +173,7 @@ struct pkvm_mapping {
 	struct rb_node node;
 	u64 gfn;
 	u64 pfn;
+	u64 nr_pages;
 	u64 __subtree_last;	/* Internal member for interval tree */
 };
 
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 6febddbec69e..0e30f16149d5 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -264,7 +264,7 @@ static u64 __pkvm_mapping_start(struct pkvm_mapping *m)
 
 static u64 __pkvm_mapping_end(struct pkvm_mapping *m)
 {
-	return (m->gfn + 1) * PAGE_SIZE - 1;
+	return (m->gfn + m->nr_pages) * PAGE_SIZE - 1;
 }
 
 INTERVAL_TREE_DEFINE(struct pkvm_mapping, node, u64, __subtree_last,
@@ -301,7 +301,8 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
 		return 0;
 
 	for_each_mapping_in_range_safe(pgt, start, end, mapping) {
-		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn, 1);
+		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_guest, handle, mapping->gfn,
+					mapping->nr_pages);
 		if (WARN_ON(ret))
 			return ret;
 		pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
@@ -331,16 +332,32 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 		return -EINVAL;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-	ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, 1, prot);
-	if (ret) {
-		/* Is the gfn already mapped due to a racing vCPU? */
-		if (ret == -EPERM)
+
+	/*
+	 * Calling stage2_map() on top of existing mappings is either happening because of a race
+	 * with another vCPU, or because we're changing between page and block mappings. As per
+	 * user_mem_abort(), same-size permission faults are handled in the relax_perms() path.
+	 */
+	mapping = pkvm_mapping_iter_first(&pgt->pkvm_mappings, addr, addr + size - 1);
+	if (mapping) {
+		if (size == (mapping->nr_pages * PAGE_SIZE))
 			return -EAGAIN;
+
+		/* Remove _any_ pkvm_mapping overlapping with the range, bigger or smaller. */
+		ret = __pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+		if (ret)
+			return ret;
+		mapping = NULL;
 	}
 
+	ret = kvm_call_hyp_nvhe(__pkvm_host_share_guest, pfn, gfn, size / PAGE_SIZE, prot);
+	if (WARN_ON(ret))
+		return ret;
+
 	swap(mapping, cache->mapping);
 	mapping->gfn = gfn;
 	mapping->pfn = pfn;
+	mapping->nr_pages = size / PAGE_SIZE;
 	pkvm_mapping_insert(mapping, &pgt->pkvm_mappings);
 
 	return ret;
@@ -362,7 +379,8 @@ int pkvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
 
 	lockdep_assert_held(&kvm->mmu_lock);
 	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping) {
-		ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn, 1);
+		ret = kvm_call_hyp_nvhe(__pkvm_host_wrprotect_guest, handle, mapping->gfn,
+					mapping->nr_pages);
 		if (WARN_ON(ret))
 			break;
 	}
@@ -377,7 +395,8 @@ int pkvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
 
 	lockdep_assert_held(&kvm->mmu_lock);
 	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
-		__clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn), PAGE_SIZE);
+		__clean_dcache_guest_page(pfn_to_kaddr(mapping->pfn),
+					  PAGE_SIZE * mapping->nr_pages);
 
 	return 0;
 }
@@ -392,7 +411,7 @@ bool pkvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr, u64
 	lockdep_assert_held(&kvm->mmu_lock);
 	for_each_mapping_in_range_safe(pgt, addr, addr + size, mapping)
 		young |= kvm_call_hyp_nvhe(__pkvm_host_test_clear_young_guest, handle, mapping->gfn,
-					   1, mkold);
+					   mapping->nr_pages, mkold);
 
 	return young;
 }
-- 
2.49.0.1015.ga840276032-goog



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 09/10] KVM: arm64: Stage-2 huge mappings for np-guests
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
                   ` (7 preceding siblings ...)
  2025-05-09 13:17 ` [PATCH v4 08/10] KVM: arm64: Add a range to pkvm_mappings Vincent Donnefort
@ 2025-05-09 13:17 ` Vincent Donnefort
  2025-05-16 13:28   ` Marc Zyngier
  2025-05-09 13:17 ` [PATCH v4 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap Vincent Donnefort
  9 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:17 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

Now that np-guest hypercalls with a range are supported, we can let the
hypervisor install block mappings whenever the Stage-1 allows it, that
is when the memory is backed by either Hugetlbfs or THPs. The size of
those block mappings is limited to PMD_SIZE.

Signed-off-by: Vincent Donnefort <vdonnefort@google.com>

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 78fb9cea2034..97e0fea9db4e 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -167,7 +167,7 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
 				      enum kvm_pgtable_prot prot)
 {
-	return true;
+	return false;
 }
 
 static void *guest_s2_zalloc_pages_exact(size_t size)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 754f2fe0cc67..7c8be22e81f9 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1537,7 +1537,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * logging_active is guaranteed to never be true for VM_PFNMAP
 	 * memslots.
 	 */
-	if (logging_active || is_protected_kvm_enabled()) {
+	if (logging_active) {
 		force_pte = true;
 		vma_shift = PAGE_SHIFT;
 	} else {
@@ -1547,7 +1547,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	switch (vma_shift) {
 #ifndef __PAGETABLE_PMD_FOLDED
 	case PUD_SHIFT:
-		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
+		if (!is_protected_kvm_enabled() &&
+		    fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
 			break;
 		fallthrough;
 #endif
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 0e30f16149d5..9504169fb37f 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -328,7 +328,7 @@ int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	u64 pfn = phys >> PAGE_SHIFT;
 	int ret;
 
-	if (size != PAGE_SIZE)
+	if (size != PAGE_SIZE && size != PMD_SIZE)
 		return -EINVAL;
 
 	lockdep_assert_held_write(&kvm->mmu_lock);
-- 
2.49.0.1015.ga840276032-goog



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  2025-05-09 13:16 [PATCH v4 00/10] Stage-2 huge mappings for pKVM np-guests Vincent Donnefort
                   ` (8 preceding siblings ...)
  2025-05-09 13:17 ` [PATCH v4 09/10] KVM: arm64: Stage-2 huge mappings for np-guests Vincent Donnefort
@ 2025-05-09 13:17 ` Vincent Donnefort
  2025-05-16 13:55   ` Marc Zyngier
  9 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-09 13:17 UTC (permalink / raw)
  To: maz, oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will
  Cc: qperret, linux-arm-kernel, kvmarm, linux-kernel, kernel-team,
	Vincent Donnefort

With the introduction of stage-2 huge mappings in the pKVM hypervisor,
CMOs for guest pages can now cover a PMD_SIZE range. The fixmap only
supports PAGE_SIZE, and iterating over the huge page is time consuming
(mostly due to the TLBI in hyp_fixmap_unmap()), which is a problem for
EL2 latency.

Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
to improve guest page CMOs when stage-2 huge mappings are installed.

On a Pixel6, the iterative solution resulted in a latency of ~700us,
while the PMD_SIZE fixmap reduces it to ~100us.

Because of the horrendous private range allocation that would be
necessary, this is disabled on 64KiB-page systems.

Suggested-by: Quentin Perret <qperret@google.com>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 1b43bcd2a679..2888b5d03757 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -59,6 +59,11 @@ typedef u64 kvm_pte_t;
 
 #define KVM_PHYS_INVALID		(-1ULL)
 
+#define KVM_PTE_TYPE			BIT(1)
+#define KVM_PTE_TYPE_BLOCK		0
+#define KVM_PTE_TYPE_PAGE		1
+#define KVM_PTE_TYPE_TABLE		1
+
 #define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
 
 #define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX	GENMASK(4, 2)
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 230e4f2527de..b0c72bc2d5ba 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -13,9 +13,11 @@
 extern struct kvm_pgtable pkvm_pgtable;
 extern hyp_spinlock_t pkvm_pgd_lock;
 
-int hyp_create_pcpu_fixmap(void);
+int hyp_create_fixmap(void);
 void *hyp_fixmap_map(phys_addr_t phys);
 void hyp_fixmap_unmap(void);
+void *hyp_fixblock_map(phys_addr_t phys);
+void hyp_fixblock_unmap(void);
 
 int hyp_create_idmap(u32 hyp_va_bits);
 int hyp_map_vectors(void);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 97e0fea9db4e..9f3ffa4e0690 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -220,16 +220,52 @@ static void guest_s2_put_page(void *addr)
 	hyp_put_page(&current_vm->pool, addr);
 }
 
+static void *__fixmap_guest_page(void *va, size_t *size)
+{
+	if (IS_ALIGNED(*size, PMD_SIZE)) {
+		void *addr = hyp_fixblock_map(__hyp_pa(va));
+
+		if (addr)
+			return addr;
+
+		*size = PAGE_SIZE;
+	}
+
+	if (IS_ALIGNED(*size, PAGE_SIZE))
+		return hyp_fixmap_map(__hyp_pa(va));
+
+	WARN_ON(1);
+
+	return NULL;
+}
+
+static void __fixunmap_guest_page(size_t size)
+{
+	switch (size) {
+	case PAGE_SIZE:
+		hyp_fixmap_unmap();
+		break;
+	case PMD_SIZE:
+		hyp_fixblock_unmap();
+		break;
+	default:
+		WARN_ON(1);
+	}
+}
+
 static void clean_dcache_guest_page(void *va, size_t size)
 {
 	WARN_ON(!PAGE_ALIGNED(size));
 
 	while (size) {
-		__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
-					  PAGE_SIZE);
-		hyp_fixmap_unmap();
-		va += PAGE_SIZE;
-		size -= PAGE_SIZE;
+		size_t fixmap_size = size == PMD_SIZE ? size : PAGE_SIZE;
+		void *addr = __fixmap_guest_page(va, &fixmap_size);
+
+		__clean_dcache_guest_page(addr, fixmap_size);
+		__fixunmap_guest_page(fixmap_size);
+
+		size -= fixmap_size;
+		va += fixmap_size;
 	}
 }
 
@@ -238,11 +274,14 @@ static void invalidate_icache_guest_page(void *va, size_t size)
 	WARN_ON(!PAGE_ALIGNED(size));
 
 	while (size) {
-		__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
-					       PAGE_SIZE);
-		hyp_fixmap_unmap();
-		va += PAGE_SIZE;
-		size -= PAGE_SIZE;
+		size_t fixmap_size = size == PMD_SIZE ? size : PAGE_SIZE;
+		void *addr = __fixmap_guest_page(va, &fixmap_size);
+
+		__invalidate_icache_guest_page(addr, fixmap_size);
+		__fixunmap_guest_page(fixmap_size);
+
+		size -= fixmap_size;
+		va += fixmap_size;
 	}
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index f41c7440b34b..e3b1bece8504 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -229,9 +229,8 @@ int hyp_map_vectors(void)
 	return 0;
 }
 
-void *hyp_fixmap_map(phys_addr_t phys)
+static void *fixmap_map_slot(struct hyp_fixmap_slot *slot, phys_addr_t phys)
 {
-	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
 	kvm_pte_t pte, *ptep = slot->ptep;
 
 	pte = *ptep;
@@ -243,10 +242,21 @@ void *hyp_fixmap_map(phys_addr_t phys)
 	return (void *)slot->addr;
 }
 
+void *hyp_fixmap_map(phys_addr_t phys)
+{
+	return fixmap_map_slot(this_cpu_ptr(&fixmap_slots), phys);
+}
+
 static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
 {
 	kvm_pte_t *ptep = slot->ptep;
 	u64 addr = slot->addr;
+	u32 level;
+
+	if (FIELD_GET(KVM_PTE_TYPE, *ptep) == KVM_PTE_TYPE_PAGE)
+		level = KVM_PGTABLE_LAST_LEVEL;
+	else
+		level = KVM_PGTABLE_LAST_LEVEL - 1; /* create_fixblock() guarantees PMD level */
 
 	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
 
@@ -260,7 +270,7 @@ static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
 	 * https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
 	 */
 	dsb(ishst);
-	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), KVM_PGTABLE_LAST_LEVEL);
+	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
 	dsb(ish);
 	isb();
 }
@@ -273,9 +283,9 @@ void hyp_fixmap_unmap(void)
 static int __create_fixmap_slot_cb(const struct kvm_pgtable_visit_ctx *ctx,
 				   enum kvm_pgtable_walk_flags visit)
 {
-	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)ctx->arg);
+	struct hyp_fixmap_slot *slot = (struct hyp_fixmap_slot *)ctx->arg;
 
-	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_LAST_LEVEL)
+	if (!kvm_pte_valid(ctx->old) || (ctx->end - ctx->start) != kvm_granule_size(ctx->level))
 		return -EINVAL;
 
 	slot->addr = ctx->addr;
@@ -296,13 +306,73 @@ static int create_fixmap_slot(u64 addr, u64 cpu)
 	struct kvm_pgtable_walker walker = {
 		.cb	= __create_fixmap_slot_cb,
 		.flags	= KVM_PGTABLE_WALK_LEAF,
-		.arg = (void *)cpu,
+		.arg = (void *)per_cpu_ptr(&fixmap_slots, cpu),
 	};
 
 	return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
 }
 
-int hyp_create_pcpu_fixmap(void)
+#ifndef CONFIG_ARM64_64K_PAGES
+static struct hyp_fixmap_slot hyp_fixblock_slot;
+static DEFINE_HYP_SPINLOCK(hyp_fixblock_lock);
+
+void *hyp_fixblock_map(phys_addr_t phys)
+{
+	hyp_spin_lock(&hyp_fixblock_lock);
+	return fixmap_map_slot(&hyp_fixblock_slot, phys);
+}
+
+void hyp_fixblock_unmap(void)
+{
+	fixmap_clear_slot(&hyp_fixblock_slot);
+	hyp_spin_unlock(&hyp_fixblock_lock);
+}
+
+static int create_fixblock(void)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= __create_fixmap_slot_cb,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+		.arg = (void *)&hyp_fixblock_slot,
+	};
+	unsigned long addr;
+	phys_addr_t phys;
+	int ret, i;
+
+	/* Find a RAM phys address, PMD aligned */
+	for (i = 0; i < hyp_memblock_nr; i++) {
+		phys = ALIGN(hyp_memory[i].base, PMD_SIZE);
+		if (phys + PMD_SIZE < (hyp_memory[i].base + hyp_memory[i].size))
+			break;
+	}
+
+	if (i >= hyp_memblock_nr)
+		return -EINVAL;
+
+	hyp_spin_lock(&pkvm_pgd_lock);
+	addr = ALIGN(__io_map_base, PMD_SIZE);
+	ret = __pkvm_alloc_private_va_range(addr, PMD_SIZE);
+	if (ret)
+		goto unlock;
+
+	ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PMD_SIZE, phys, PAGE_HYP);
+	if (ret)
+		goto unlock;
+
+	ret = kvm_pgtable_walk(&pkvm_pgtable, addr, PMD_SIZE, &walker);
+
+unlock:
+	hyp_spin_unlock(&pkvm_pgd_lock);
+
+	return ret;
+}
+#else
+void hyp_fixblock_unmap(void) { WARN_ON(1); }
+void *hyp_fixblock_map(phys_addr_t phys) { return NULL; }
+static int create_fixblock(void) { return 0; }
+#endif
+
+int hyp_create_fixmap(void)
 {
 	unsigned long addr, i;
 	int ret;
@@ -322,7 +392,7 @@ int hyp_create_pcpu_fixmap(void)
 			return ret;
 	}
 
-	return 0;
+	return create_fixblock();
 }
 
 int hyp_create_idmap(u32 hyp_va_bits)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index c19860fc8183..a48d3f5a5afb 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -312,7 +312,7 @@ void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
-	ret = hyp_create_pcpu_fixmap();
+	ret = hyp_create_fixmap();
 	if (ret)
 		goto out;
 
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index df5cc74a7dd0..c351b4abd5db 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -11,12 +11,6 @@
 #include <asm/kvm_pgtable.h>
 #include <asm/stage2_pgtable.h>
 
-
-#define KVM_PTE_TYPE			BIT(1)
-#define KVM_PTE_TYPE_BLOCK		0
-#define KVM_PTE_TYPE_PAGE		1
-#define KVM_PTE_TYPE_TABLE		1
-
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable_walker	*walker;
 
-- 
2.49.0.1015.ga840276032-goog



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs
  2025-05-09 13:16 ` [PATCH v4 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs Vincent Donnefort
@ 2025-05-16 12:15   ` Marc Zyngier
  2025-05-16 17:53     ` Vincent Donnefort
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2025-05-16 12:15 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Fri, 09 May 2025 14:16:57 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
> size as an argument. But they also rely on fixmap, which can only map a
> single PAGE_SIZE page.
> 
> With the upcoming stage-2 huge mappings for pKVM np-guests, those
> callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis
> until the whole range is done.
> 
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 31173c694695..23544928a637 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -219,14 +219,28 @@ static void guest_s2_put_page(void *addr)
>  
>  static void clean_dcache_guest_page(void *va, size_t size)
>  {
> -	__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
> -	hyp_fixmap_unmap();
> +	WARN_ON(!PAGE_ALIGNED(size));

What if "va" isn't aligned?

> +
> +	while (size) {
> +		__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> +					  PAGE_SIZE);
> +		hyp_fixmap_unmap();
> +		va += PAGE_SIZE;
> +		size -= PAGE_SIZE;
> +	}

I know pKVM dies on WARN, but this code "looks" unsafe. Can you align
va and size to be on page boundaries, so that we are 100% sure the
loop terminates?

>  }
>  
>  static void invalidate_icache_guest_page(void *va, size_t size)
>  {
> -	__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
> -	hyp_fixmap_unmap();
> +	WARN_ON(!PAGE_ALIGNED(size));
> +
> +	while (size) {
> +		__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> +					       PAGE_SIZE);
> +		hyp_fixmap_unmap();
> +		va += PAGE_SIZE;
> +		size -= PAGE_SIZE;
> +	}

Same here.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 02/10] KVM: arm64: Introduce for_each_hyp_page
  2025-05-09 13:16 ` [PATCH v4 02/10] KVM: arm64: Introduce for_each_hyp_page Vincent Donnefort
@ 2025-05-16 12:57   ` Marc Zyngier
  2025-05-19 14:46     ` Quentin Perret
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2025-05-16 12:57 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Fri, 09 May 2025 14:16:58 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> Add a helper to iterate over the hypervisor vmemmap. This will be
> particularly handy with the introduction of huge mapping support
> for the np-guest stage-2.
> 
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> index eb0c2ebd1743..676a0a13c741 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> @@ -96,24 +96,24 @@ static inline struct hyp_page *hyp_phys_to_page(phys_addr_t phys)
>  #define hyp_page_to_virt(page)	__hyp_va(hyp_page_to_phys(page))
>  #define hyp_page_to_pool(page)	(((struct hyp_page *)page)->pool)
>  
> -static inline enum pkvm_page_state get_host_state(phys_addr_t phys)
> +static inline enum pkvm_page_state get_host_state(struct hyp_page *p)
>  {
> -	return (enum pkvm_page_state)hyp_phys_to_page(phys)->__host_state;
> +	return (enum pkvm_page_state)p->__host_state;

I'm not quite sure why we have this cast in the first place. If we are
so keen on type consistency, why isn't __host_state an enum
pkvm_page_state in the first place?

>  }
>  
> -static inline void set_host_state(phys_addr_t phys, enum pkvm_page_state state)
> +static inline void set_host_state(struct hyp_page *p, enum pkvm_page_state state)
>  {
> -	hyp_phys_to_page(phys)->__host_state = state;
> +	p->__host_state = state;
>  }
>  
> -static inline enum pkvm_page_state get_hyp_state(phys_addr_t phys)
> +static inline enum pkvm_page_state get_hyp_state(struct hyp_page *p)
>  {
> -	return hyp_phys_to_page(phys)->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;
> +	return p->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;

And we don't seem that bothered here.

>  }
>  
> -static inline void set_hyp_state(phys_addr_t phys, enum pkvm_page_state state)
> +static inline void set_hyp_state(struct hyp_page *p, enum pkvm_page_state state)
>  {
> -	hyp_phys_to_page(phys)->__hyp_state_comp = state ^ PKVM_PAGE_STATE_MASK;
> +	p->__hyp_state_comp = state ^ PKVM_PAGE_STATE_MASK;
>  }
>  
>  /*
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 23544928a637..4d269210dae0 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -60,6 +60,9 @@ static void hyp_unlock_component(void)
>  	hyp_spin_unlock(&pkvm_pgd_lock);
>  }
>  
> +#define for_each_hyp_page(start, size, page)   \
> +	for (page = hyp_phys_to_page(start); page < hyp_phys_to_page((start) + (size)); page++)
> +

Huh. This really should be something like:

#define for_each_hyp_page(__p, __st, __sz)   			\
	for (struct hyp_page *__p = hyp_phys_to_page(__st),	\
	     *__e = __p + ((__sz) >> PAGE_SHIFT);		\
	     __p < __e; __p++)

so that:

- the iterator variable comes first and looks like a normal for-loop
- the iterator has a scope limited to the loop
- we don't evaluate parameters multiple times
- we don't evaluate the end of the loop multiple times
- we try not to reuse common variable names

>  static void *host_s2_zalloc_pages_exact(size_t size)
>  {
>  	void *addr = hyp_alloc_pages(&host_s2_pool, get_order(size));
> @@ -481,7 +484,8 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
>  		return -EAGAIN;
>  
>  	if (pte) {
> -		WARN_ON(addr_is_memory(addr) && get_host_state(addr) != PKVM_NOPAGE);
> +		WARN_ON(addr_is_memory(addr) &&
> +			get_host_state(hyp_phys_to_page(addr)) != PKVM_NOPAGE);
>  		return -EPERM;
>  	}
>  
> @@ -507,10 +511,10 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
>  
>  static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_state state)
>  {
> -	phys_addr_t end = addr + size;
> +	struct hyp_page *page;

and then these extra declarations can go.
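
The end result at the call site would then look something like (just to
illustrate the intent, untested):

	for_each_hyp_page(page, addr, size)
		set_host_state(page, state);

with "page" scoped to the loop itself.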

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 03/10] KVM: arm64: Add a range to __pkvm_host_share_guest()
  2025-05-09 13:16 ` [PATCH v4 03/10] KVM: arm64: Add a range to __pkvm_host_share_guest() Vincent Donnefort
@ 2025-05-16 13:10   ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2025-05-16 13:10 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Fri, 09 May 2025 14:16:59 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> In preparation for supporting stage-2 huge mappings for np-guest. Add a
> nr_pages argument to the __pkvm_host_share_guest hypercall. This range
> supports only two values: 1 or PMD_SIZE / PAGE_SIZE (that is 512 on a
> 4K-pages system).
> 
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index 26016eb9323f..47aa7b01114f 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -39,7 +39,7 @@ int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
>  int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
>  int __pkvm_host_share_ffa(u64 pfn, u64 nr_pages);
>  int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages);
> -int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
> +int __pkvm_host_share_guest(u64 pfn, u64 gfn, u64 nr_pages, struct pkvm_hyp_vcpu *vcpu,
>  			    enum kvm_pgtable_prot prot);
>  int __pkvm_host_unshare_guest(u64 gfn, struct pkvm_hyp_vm *hyp_vm);
>  int __pkvm_host_relax_perms_guest(u64 gfn, struct pkvm_hyp_vcpu *vcpu, enum kvm_pgtable_prot prot);
> diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> index 59db9606e6e1..4d3d215955c3 100644
> --- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> +++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
> @@ -245,7 +245,8 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
>  {
>  	DECLARE_REG(u64, pfn, host_ctxt, 1);
>  	DECLARE_REG(u64, gfn, host_ctxt, 2);
> -	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 3);
> +	DECLARE_REG(u64, nr_pages, host_ctxt, 3);
> +	DECLARE_REG(enum kvm_pgtable_prot, prot, host_ctxt, 4);
>  	struct pkvm_hyp_vcpu *hyp_vcpu;
>  	int ret = -EINVAL;
>  
> @@ -260,7 +261,7 @@ static void handle___pkvm_host_share_guest(struct kvm_cpu_context *host_ctxt)
>  	if (ret)
>  		goto out;
>  
> -	ret = __pkvm_host_share_guest(pfn, gfn, hyp_vcpu, prot);
> +	ret = __pkvm_host_share_guest(pfn, gfn, nr_pages, hyp_vcpu, prot);
>  out:
>  	cpu_reg(host_ctxt, 1) =  ret;
>  }
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 4d269210dae0..f0f7c6f83e57 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -696,10 +696,9 @@ static enum pkvm_page_state guest_get_page_state(kvm_pte_t pte, u64 addr)
>  	return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
>  }
>  
> -static int __guest_check_page_state_range(struct pkvm_hyp_vcpu *vcpu, u64 addr,
> +static int __guest_check_page_state_range(struct pkvm_hyp_vm *vm, u64 addr,
>  					  u64 size, enum pkvm_page_state state)
>  {
> -	struct pkvm_hyp_vm *vm = pkvm_hyp_vcpu_to_hyp_vm(vcpu);
>  	struct check_walk_data d = {
>  		.desired	= state,
>  		.get_page_state	= guest_get_page_state,
> @@ -908,48 +907,81 @@ int __pkvm_host_unshare_ffa(u64 pfn, u64 nr_pages)
>  	return ret;
>  }
>  
> -int __pkvm_host_share_guest(u64 pfn, u64 gfn, struct pkvm_hyp_vcpu *vcpu,
> +static int __guest_check_transition_size(u64 phys, u64 ipa, u64 nr_pages, u64 *size)
> +{
> +	if (nr_pages == 1) {
> +		*size = PAGE_SIZE;
> +		return 0;
> +	}
> +
> +	/* We solely support PMD_SIZE huge-pages */
> +	if (nr_pages != (1 << (PMD_SHIFT - PAGE_SHIFT)))
> +		return -EINVAL;

I'm not really keen on the whole PxD nomenclature. What we really care
about is a mapping level (level 2 in this instance). Can we instead
use kvm_granule_size()?  Something like:

	if ((nr_pages * PAGE_SIZE) != kvm_granule_size(2))
		return -EINVAL;

> +
> +	if (!IS_ALIGNED(phys | ipa, PMD_SIZE))
> +		return -EINVAL;
> +
> +	*size = PMD_SIZE;

Similar things here. But also, should this level-2 block checking be
moved to the patch that actually allows block mapping?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 07/10] KVM: arm64: Convert pkvm_mappings to interval tree
  2025-05-09 13:17 ` [PATCH v4 07/10] KVM: arm64: Convert pkvm_mappings to interval tree Vincent Donnefort
@ 2025-05-16 13:15   ` Marc Zyngier
  2025-05-19 14:22     ` Quentin Perret
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2025-05-16 13:15 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Fri, 09 May 2025 14:17:03 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> From: Quentin Perret <qperret@google.com>
> 
> In preparation for supporting stage-2 huge mappings for np-guest, let's
> convert pgt.pkvm_mappings to an interval tree.
> 
> No functional change intended.
> 
> Suggested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 6b9d274052c7..1b43bcd2a679 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -413,7 +413,7 @@ static inline bool kvm_pgtable_walk_lock_held(void)
>   */
>  struct kvm_pgtable {
>  	union {
> -		struct rb_root					pkvm_mappings;
> +		struct rb_root_cached				pkvm_mappings;
>  		struct {
>  			u32					ia_bits;
>  			s8					start_level;
> diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
> index d91bfcf2db56..da75d41c948c 100644
> --- a/arch/arm64/include/asm/kvm_pkvm.h
> +++ b/arch/arm64/include/asm/kvm_pkvm.h
> @@ -173,6 +173,7 @@ struct pkvm_mapping {
>  	struct rb_node node;
>  	u64 gfn;
>  	u64 pfn;
> +	u64 __subtree_last;	/* Internal member for interval tree */
>  };
>  
>  int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> index 057874bbe3e1..6febddbec69e 100644
> --- a/arch/arm64/kvm/pkvm.c
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -5,6 +5,7 @@
>   */
>  
>  #include <linux/init.h>
> +#include <linux/interval_tree_generic.h>
>  #include <linux/kmemleak.h>
>  #include <linux/kvm_host.h>
>  #include <asm/kvm_mmu.h>
> @@ -256,80 +257,63 @@ static int __init finalize_pkvm(void)
>  }
>  device_initcall_sync(finalize_pkvm);
>  
> -static int cmp_mappings(struct rb_node *node, const struct rb_node *parent)
> +static u64 __pkvm_mapping_start(struct pkvm_mapping *m)
>  {
> -	struct pkvm_mapping *a = rb_entry(node, struct pkvm_mapping, node);
> -	struct pkvm_mapping *b = rb_entry(parent, struct pkvm_mapping, node);
> -
> -	if (a->gfn < b->gfn)
> -		return -1;
> -	if (a->gfn > b->gfn)
> -		return 1;
> -	return 0;
> +	return m->gfn * PAGE_SIZE;
>  }
>  
> -static struct rb_node *find_first_mapping_node(struct rb_root *root, u64 gfn)
> +static u64 __pkvm_mapping_end(struct pkvm_mapping *m)
>  {
> -	struct rb_node *node = root->rb_node, *prev = NULL;
> -	struct pkvm_mapping *mapping;
> -
> -	while (node) {
> -		mapping = rb_entry(node, struct pkvm_mapping, node);
> -		if (mapping->gfn == gfn)
> -			return node;
> -		prev = node;
> -		node = (gfn < mapping->gfn) ? node->rb_left : node->rb_right;
> -	}
> -
> -	return prev;
> +	return (m->gfn + 1) * PAGE_SIZE - 1;
>  }
>  
> -/*
> - * __tmp is updated to rb_next(__tmp) *before* entering the body of the loop to allow freeing
> - * of __map inline.
> - */
> +INTERVAL_TREE_DEFINE(struct pkvm_mapping, node, u64, __subtree_last,
> +		     __pkvm_mapping_start, __pkvm_mapping_end, static,
> +		     pkvm_mapping);
> +
>  #define for_each_mapping_in_range_safe(__pgt, __start, __end, __map)				\
> -	for (struct rb_node *__tmp = find_first_mapping_node(&(__pgt)->pkvm_mappings,		\
> -							     ((__start) >> PAGE_SHIFT));	\
> +	for (struct pkvm_mapping *__tmp = pkvm_mapping_iter_first(&(__pgt)->pkvm_mappings,	\
> +								  __start, __end - 1);		\
>  	     __tmp && ({									\
> -				__map = rb_entry(__tmp, struct pkvm_mapping, node);		\
> -				__tmp = rb_next(__tmp);						\
> +				__map = __tmp;							\
> +				__tmp = pkvm_mapping_iter_next(__map, __start, __end - 1);	\
>  				true;								\
>  		       });									\
> -	    )											\
> -		if (__map->gfn < ((__start) >> PAGE_SHIFT))					\
> -			continue;								\
> -		else if (__map->gfn >= ((__end) >> PAGE_SHIFT))					\
> -			break;									\
> -		else
> +	    )

The removal of the comment worries me a bit. Is this iterator still
safe wrt freeing of the iterator in the loop?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 09/10] KVM: arm64: Stage-2 huge mappings for np-guests
  2025-05-09 13:17 ` [PATCH v4 09/10] KVM: arm64: Stage-2 huge mappings for np-guests Vincent Donnefort
@ 2025-05-16 13:28   ` Marc Zyngier
  2025-05-19 14:34     ` Quentin Perret
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2025-05-16 13:28 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Fri, 09 May 2025 14:17:05 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> Now that np-guest hypercalls with a range are supported, we can let the
> hypervisor install block mappings whenever the Stage-1 allows it, that
> is when the memory is backed by either Hugetlbfs or THPs. The size of
> those block mappings is limited to PMD_SIZE.
> 
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 78fb9cea2034..97e0fea9db4e 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -167,7 +167,7 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
>  static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
>  				      enum kvm_pgtable_prot prot)
>  {
> -	return true;
> +	return false;
>  }

Can we get rid of this callback now? And of the .force_pte_cb field in
the kvm_pgtable struct?

>  
>  static void *guest_s2_zalloc_pages_exact(size_t size)
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 754f2fe0cc67..7c8be22e81f9 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1537,7 +1537,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * logging_active is guaranteed to never be true for VM_PFNMAP
>  	 * memslots.
>  	 */
> -	if (logging_active || is_protected_kvm_enabled()) {
> +	if (logging_active) {
>  		force_pte = true;
>  		vma_shift = PAGE_SHIFT;
>  	} else {
> @@ -1547,7 +1547,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	switch (vma_shift) {
>  #ifndef __PAGETABLE_PMD_FOLDED
>  	case PUD_SHIFT:
> -		if (fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> +		if (!is_protected_kvm_enabled() &&
> +		    fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))

Can you move this new condition into the fault_supports...() helper instead?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  2025-05-09 13:17 ` [PATCH v4 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap Vincent Donnefort
@ 2025-05-16 13:55   ` Marc Zyngier
  2025-05-16 18:03     ` Vincent Donnefort
  0 siblings, 1 reply; 24+ messages in thread
From: Marc Zyngier @ 2025-05-16 13:55 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Fri, 09 May 2025 14:17:06 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> With the introduction of stage-2 huge mappings in the pKVM hypervisor,
> CMOs for guest pages can now cover a PMD_SIZE range. The fixmap only
> supports PAGE_SIZE, and iterating over the huge page is time consuming
> (mostly due to the TLBI in hyp_fixmap_unmap()), which is a problem for
> EL2 latency.
> 
> Introduce a shared PMD_SIZE fixmap (hyp_fixblock_map/hyp_fixblock_unmap)
> to improve guest page CMOs when stage-2 huge mappings are installed.
> 
> On a Pixel6, the iterative solution resulted in a latency of ~700us,
> while the PMD_SIZE fixmap reduces it to ~100us.
> 
> Because of the horrendous private range allocation that would be
> necessary, this is disabled on 64KiB-page systems.
> 
> Suggested-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Quentin Perret <qperret@google.com>
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 1b43bcd2a679..2888b5d03757 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -59,6 +59,11 @@ typedef u64 kvm_pte_t;
>  
>  #define KVM_PHYS_INVALID		(-1ULL)
>  
> +#define KVM_PTE_TYPE			BIT(1)
> +#define KVM_PTE_TYPE_BLOCK		0
> +#define KVM_PTE_TYPE_PAGE		1
> +#define KVM_PTE_TYPE_TABLE		1
> +
>  #define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
>  
>  #define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX	GENMASK(4, 2)
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> index 230e4f2527de..b0c72bc2d5ba 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -13,9 +13,11 @@
>  extern struct kvm_pgtable pkvm_pgtable;
>  extern hyp_spinlock_t pkvm_pgd_lock;
>  
> -int hyp_create_pcpu_fixmap(void);
> +int hyp_create_fixmap(void);
>  void *hyp_fixmap_map(phys_addr_t phys);
>  void hyp_fixmap_unmap(void);
> +void *hyp_fixblock_map(phys_addr_t phys);
> +void hyp_fixblock_unmap(void);
>  
>  int hyp_create_idmap(u32 hyp_va_bits);
>  int hyp_map_vectors(void);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 97e0fea9db4e..9f3ffa4e0690 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -220,16 +220,52 @@ static void guest_s2_put_page(void *addr)
>  	hyp_put_page(&current_vm->pool, addr);
>  }
>  
> +static void *__fixmap_guest_page(void *va, size_t *size)
> +{
> +	if (IS_ALIGNED(*size, PMD_SIZE)) {
> +		void *addr = hyp_fixblock_map(__hyp_pa(va));
> +
> +		if (addr)
> +			return addr;
> +
> +		*size = PAGE_SIZE;
> +	}
> +
> +	if (IS_ALIGNED(*size, PAGE_SIZE))
> +		return hyp_fixmap_map(__hyp_pa(va));
> +
> +	WARN_ON(1);
> +
> +	return NULL;
> +}
> +
> +static void __fixunmap_guest_page(size_t size)
> +{
> +	switch (size) {
> +	case PAGE_SIZE:
> +		hyp_fixmap_unmap();
> +		break;
> +	case PMD_SIZE:
> +		hyp_fixblock_unmap();
> +		break;
> +	default:
> +		WARN_ON(1);
> +	}

This is pretty ugly. How can we end up there in the first place? I'd
rather you make sure we can't reach this default path at all. See also
towards the end of this patch (tl;dr: hyp_fixblock_unmap() should
never explode).

> +}
> +
>  static void clean_dcache_guest_page(void *va, size_t size)
>  {
>  	WARN_ON(!PAGE_ALIGNED(size));
>  
>  	while (size) {
> -		__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> -					  PAGE_SIZE);
> -		hyp_fixmap_unmap();
> -		va += PAGE_SIZE;
> -		size -= PAGE_SIZE;
> +		size_t fixmap_size = size == PMD_SIZE ? size : PAGE_SIZE;
> +		void *addr = __fixmap_guest_page(va, &fixmap_size);
> +
> +		__clean_dcache_guest_page(addr, fixmap_size);
> +		__fixunmap_guest_page(fixmap_size);
> +
> +		size -= fixmap_size;
> +		va += fixmap_size;

Can this ever be called with a *multiple* of PMD_SIZE? In this case
you'd still end-up doing PAGE_SIZEd-bite CMOs until there is only
PMD_SIZE left, ruining the optimisation.

I think this needs fixing.

>  	}
>  }
>  
> @@ -238,11 +274,14 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>  	WARN_ON(!PAGE_ALIGNED(size));
>  
>  	while (size) {
> -		__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> -					       PAGE_SIZE);
> -		hyp_fixmap_unmap();
> -		va += PAGE_SIZE;
> -		size -= PAGE_SIZE;
> +		size_t fixmap_size = size == PMD_SIZE ? size : PAGE_SIZE;
> +		void *addr = __fixmap_guest_page(va, &fixmap_size);
> +
> +		__invalidate_icache_guest_page(addr, fixmap_size);
> +		__fixunmap_guest_page(fixmap_size);
> +
> +		size -= fixmap_size;
> +		va += fixmap_size;
>  	}
>  }
>  
> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> index f41c7440b34b..e3b1bece8504 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> @@ -229,9 +229,8 @@ int hyp_map_vectors(void)
>  	return 0;
>  }
>  
> -void *hyp_fixmap_map(phys_addr_t phys)
> +static void *fixmap_map_slot(struct hyp_fixmap_slot *slot, phys_addr_t phys)
>  {
> -	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
>  	kvm_pte_t pte, *ptep = slot->ptep;
>  
>  	pte = *ptep;
> @@ -243,10 +242,21 @@ void *hyp_fixmap_map(phys_addr_t phys)
>  	return (void *)slot->addr;
>  }
>  
> +void *hyp_fixmap_map(phys_addr_t phys)
> +{
> +	return fixmap_map_slot(this_cpu_ptr(&fixmap_slots), phys);
> +}
> +
>  static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
>  {
>  	kvm_pte_t *ptep = slot->ptep;
>  	u64 addr = slot->addr;
> +	u32 level;
> +
> +	if (FIELD_GET(KVM_PTE_TYPE, *ptep) == KVM_PTE_TYPE_PAGE)
> +		level = KVM_PGTABLE_LAST_LEVEL;
> +	else
> +		level = KVM_PGTABLE_LAST_LEVEL - 1; /* create_fixblock() guarantees PMD level */

Seeing this, (KVM_PGTABLE_LAST_LEVEL - 1) looks nicer than the "2" I
suggested in one of the previous patches.

>
>  	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
>  
> @@ -260,7 +270,7 @@ static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
>  	 * https://lore.kernel.org/kvm/20221017115209.2099-1-will@kernel.org/T/#mf10dfbaf1eaef9274c581b81c53758918c1d0f03
>  	 */
>  	dsb(ishst);
> -	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), KVM_PGTABLE_LAST_LEVEL);
> +	__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
>  	dsb(ish);
>  	isb();
>  }
> @@ -273,9 +283,9 @@ void hyp_fixmap_unmap(void)
>  static int __create_fixmap_slot_cb(const struct kvm_pgtable_visit_ctx *ctx,
>  				   enum kvm_pgtable_walk_flags visit)
>  {
> -	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)ctx->arg);
> +	struct hyp_fixmap_slot *slot = (struct hyp_fixmap_slot *)ctx->arg;
>  
> -	if (!kvm_pte_valid(ctx->old) || ctx->level != KVM_PGTABLE_LAST_LEVEL)
> +	if (!kvm_pte_valid(ctx->old) || (ctx->end - ctx->start) != kvm_granule_size(ctx->level))
>  		return -EINVAL;
>  
>  	slot->addr = ctx->addr;
> @@ -296,13 +306,73 @@ static int create_fixmap_slot(u64 addr, u64 cpu)
>  	struct kvm_pgtable_walker walker = {
>  		.cb	= __create_fixmap_slot_cb,
>  		.flags	= KVM_PGTABLE_WALK_LEAF,
> -		.arg = (void *)cpu,
> +		.arg = (void *)per_cpu_ptr(&fixmap_slots, cpu),

Do you really need this cast?

>  	};
>  
>  	return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
>  }
>  
> -int hyp_create_pcpu_fixmap(void)
> +#ifndef CONFIG_ARM64_64K_PAGES

I don't have much faith in this symbol. We have changed the config
stuff so often over the years that I wouldn't trust it long term.

Using something like PAGE_SIZE or PAGE_SHIFT is likely to be more
robust.
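
Something like:

#if PAGE_SHIFT != 16 /* no 64KiB pages */

(or any equivalent check on PAGE_SIZE) would keep working regardless of
how the Kconfig options get reshuffled.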

> +static struct hyp_fixmap_slot hyp_fixblock_slot;
> +static DEFINE_HYP_SPINLOCK(hyp_fixblock_lock);
> +
> +void *hyp_fixblock_map(phys_addr_t phys)
> +{
> +	hyp_spin_lock(&hyp_fixblock_lock);
> +	return fixmap_map_slot(&hyp_fixblock_slot, phys);
> +}
> +
> +void hyp_fixblock_unmap(void)
> +{
> +	fixmap_clear_slot(&hyp_fixblock_slot);
> +	hyp_spin_unlock(&hyp_fixblock_lock);
> +}
> +
> +static int create_fixblock(void)
> +{
> +	struct kvm_pgtable_walker walker = {
> +		.cb	= __create_fixmap_slot_cb,
> +		.flags	= KVM_PGTABLE_WALK_LEAF,
> +		.arg = (void *)&hyp_fixblock_slot,
> +	};
> +	unsigned long addr;
> +	phys_addr_t phys;
> +	int ret, i;
> +
> +	/* Find a RAM phys address, PMD aligned */
> +	for (i = 0; i < hyp_memblock_nr; i++) {
> +		phys = ALIGN(hyp_memory[i].base, PMD_SIZE);
> +		if (phys + PMD_SIZE < (hyp_memory[i].base + hyp_memory[i].size))
> +			break;
> +	}
> +
> +	if (i >= hyp_memblock_nr)
> +		return -EINVAL;
> +
> +	hyp_spin_lock(&pkvm_pgd_lock);
> +	addr = ALIGN(__io_map_base, PMD_SIZE);
> +	ret = __pkvm_alloc_private_va_range(addr, PMD_SIZE);
> +	if (ret)
> +		goto unlock;
> +
> +	ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PMD_SIZE, phys, PAGE_HYP);
> +	if (ret)
> +		goto unlock;
> +
> +	ret = kvm_pgtable_walk(&pkvm_pgtable, addr, PMD_SIZE, &walker);
> +
> +unlock:
> +	hyp_spin_unlock(&pkvm_pgd_lock);
> +
> +	return ret;
> +}
> +#else
> +void hyp_fixblock_unmap(void) { WARN_ON(1); }
> +void *hyp_fixblock_map(phys_addr_t phys) { return NULL; }
> +static int create_fixblock(void) { return 0; }
> +#endif

I can't say I like this. Can't you have a fallback that does the
iteration rather than these placeholders that are only there to make
things catch fire?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs
  2025-05-16 12:15   ` Marc Zyngier
@ 2025-05-16 17:53     ` Vincent Donnefort
  2025-05-17  8:51       ` Marc Zyngier
  0 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-16 17:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

Hi,

Thanks for having a look at the series.

On Fri, May 16, 2025 at 01:15:00PM +0100, Marc Zyngier wrote:
> On Fri, 09 May 2025 14:16:57 +0100,
> Vincent Donnefort <vdonnefort@google.com> wrote:
> > 
> > clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
> > size as an argument. But they also rely on fixmap, which can only map a
> > single PAGE_SIZE page.
> > 
> > With the upcoming stage-2 huge mappings for pKVM np-guests, those
> > callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis
> > until the whole range is done.
> > 
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > 
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 31173c694695..23544928a637 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -219,14 +219,28 @@ static void guest_s2_put_page(void *addr)
> >  
> >  static void clean_dcache_guest_page(void *va, size_t size)
> >  {
> > -	__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
> > -	hyp_fixmap_unmap();
> > +	WARN_ON(!PAGE_ALIGNED(size));
> 
> What if "va" isn't aligned?

So the only callers are either for PAGE_SIZE or PMD_SIZE, with the
right addr alignment.

But happy to make this more future-proof, after all an ALIGN() is quite cheap.
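
Something along those lines then (sketch only, untested):

static void clean_dcache_guest_page(void *va, size_t size)
{
	size += (unsigned long)va & ~PAGE_MASK;
	va = (void *)ALIGN_DOWN((unsigned long)va, PAGE_SIZE);
	size = ALIGN(size, PAGE_SIZE);

	while (size) {
		__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
					  PAGE_SIZE);
		hyp_fixmap_unmap();
		va += PAGE_SIZE;
		size -= PAGE_SIZE;
	}
}

and the same for invalidate_icache_guest_page().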

> 
> > +
> > +	while (size) {
> > +		__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> > +					  PAGE_SIZE);
> > +		hyp_fixmap_unmap();
> > +		va += PAGE_SIZE;
> > +		size -= PAGE_SIZE;
> > +	}
> 
> I know pKVM dies on WARN, but this code "looks" unsafe. Can you align
> va and size to be on page boundaries, so that we are 100% sure the
> loop terminates?
> 
> >  }
> >  
> >  static void invalidate_icache_guest_page(void *va, size_t size)
> >  {
> > -	__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
> > -	hyp_fixmap_unmap();
> > +	WARN_ON(!PAGE_ALIGNED(size));
> > +
> > +	while (size) {
> > +		__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> > +					       PAGE_SIZE);
> > +		hyp_fixmap_unmap();
> > +		va += PAGE_SIZE;
> > +		size -= PAGE_SIZE;
> > +	}
> 
> Same here.
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  2025-05-16 13:55   ` Marc Zyngier
@ 2025-05-16 18:03     ` Vincent Donnefort
  2025-05-17  8:53       ` Marc Zyngier
  0 siblings, 1 reply; 24+ messages in thread
From: Vincent Donnefort @ 2025-05-16 18:03 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

> > +}
> > +
> >  static void clean_dcache_guest_page(void *va, size_t size)
> >  {
> >  	WARN_ON(!PAGE_ALIGNED(size));
> >  
> >  	while (size) {
> > -		__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> > -					  PAGE_SIZE);
> > -		hyp_fixmap_unmap();
> > -		va += PAGE_SIZE;
> > -		size -= PAGE_SIZE;
> > +		size_t fixmap_size = size == PMD_SIZE ? size : PAGE_SIZE;
> > +		void *addr = __fixmap_guest_page(va, &fixmap_size);
> > +
> > +		__clean_dcache_guest_page(addr, fixmap_size);
> > +		__fixunmap_guest_page(fixmap_size);
> > +
> > +		size -= fixmap_size;
> > +		va += fixmap_size;
> 
> Can this ever be called with a *multiple* of PMD_SIZE? In this case
> you'd still end-up doing PAGE_SIZEd-bite CMOs until there is only
> PMD_SIZE left, ruining the optimisation.
> 
> I think this needs fixing.

So this can only be called with a size equal to either PAGE_SIZE or
PMD_SIZE. I wasn't sure it was worth making it more generic than it
needs to be.

But like for the first patch, I can make it more future-proof by handling size >
PMD_SIZE.
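
I.e. something like (untested):

	while (size) {
		size_t fixmap_size = PAGE_SIZE;
		void *addr;

		if (IS_ALIGNED((unsigned long)va, PMD_SIZE) && size >= PMD_SIZE)
			fixmap_size = PMD_SIZE;

		addr = __fixmap_guest_page(va, &fixmap_size);

		__clean_dcache_guest_page(addr, fixmap_size);
		__fixunmap_guest_page(fixmap_size);

		size -= fixmap_size;
		va += fixmap_size;
	}

so a caller passing a multiple of PMD_SIZE keeps using the block fixmap
for as long as possible.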

> 
> >  	}
> >  }
> >

[...]


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 01/10] KVM: arm64: Handle huge mappings for np-guest CMOs
  2025-05-16 17:53     ` Vincent Donnefort
@ 2025-05-17  8:51       ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2025-05-17  8:51 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Fri, 16 May 2025 18:53:27 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> Hi,
> 
> Thanks for having a look at the series.
> 
> On Fri, May 16, 2025 at 01:15:00PM +0100, Marc Zyngier wrote:
> > On Fri, 09 May 2025 14:16:57 +0100,
> > Vincent Donnefort <vdonnefort@google.com> wrote:
> > > 
> > > clean_dcache_guest_page() and invalidate_icache_guest_page() accept a
> > > size as an argument. But they also rely on fixmap, which can only map a
> > > single PAGE_SIZE page.
> > > 
> > > With the upcoming stage-2 huge mappings for pKVM np-guests, those
> > > callbacks will get size > PAGE_SIZE. Loop the CMOs on a PAGE_SIZE basis
> > > until the whole range is done.
> > > 
> > > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > index 31173c694695..23544928a637 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > > @@ -219,14 +219,28 @@ static void guest_s2_put_page(void *addr)
> > >  
> > >  static void clean_dcache_guest_page(void *va, size_t size)
> > >  {
> > > -	__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
> > > -	hyp_fixmap_unmap();
> > > +	WARN_ON(!PAGE_ALIGNED(size));
> > 
> > What if "va" isn't aligned?
> 
> So the only callers are either for PAGE_SIZE or PMD_SIZE, with the
> right addr alignment.
> 
> But happy to make this more future-proof, after all an ALIGN() is quite cheap.

Exactly. I'd rather have too many of those than something that may not
terminate. If anything, it makes it easier to reason about.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 10/10] KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
  2025-05-16 18:03     ` Vincent Donnefort
@ 2025-05-17  8:53       ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2025-05-17  8:53 UTC (permalink / raw)
  To: Vincent Donnefort
  Cc: oliver.upton, joey.gouly, suzuki.poulose, yuzenghui,
	catalin.marinas, will, qperret, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Fri, 16 May 2025 19:03:14 +0100,
Vincent Donnefort <vdonnefort@google.com> wrote:
> 
> > > +}
> > > +
> > >  static void clean_dcache_guest_page(void *va, size_t size)
> > >  {
> > >  	WARN_ON(!PAGE_ALIGNED(size));
> > >  
> > >  	while (size) {
> > > -		__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)),
> > > -					  PAGE_SIZE);
> > > -		hyp_fixmap_unmap();
> > > -		va += PAGE_SIZE;
> > > -		size -= PAGE_SIZE;
> > > +		size_t fixmap_size = size == PMD_SIZE ? size : PAGE_SIZE;
> > > +		void *addr = __fixmap_guest_page(va, &fixmap_size);
> > > +
> > > +		__clean_dcache_guest_page(addr, fixmap_size);
> > > +		__fixunmap_guest_page(fixmap_size);
> > > +
> > > +		size -= fixmap_size;
> > > +		va += fixmap_size;
> > 
> > Can this ever be called with a *multiple* of PMD_SIZE? In this case
> > you'd still end-up doing PAGE_SIZEd-bite CMOs until there is only
> > PMD_SIZE left, ruining the optimisation.
> > 
> > I think this needs fixing.
> 
> So this can only be called with a size equal to either PAGE_SIZE or
> PMD_SIZE. I wasn't sure it was worth making it more generic than it
> needs to be.
> 
> But like for the first patch, I can make it more future-proof by handling size >
> PMD_SIZE.

Yup. These things are hard enough to debug that we should try and make
it fool-proof, even if that's not immediately used (and fixing that is
pretty easy).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 07/10] KVM: arm64: Convert pkvm_mappings to interval tree
  2025-05-16 13:15   ` Marc Zyngier
@ 2025-05-19 14:22     ` Quentin Perret
  0 siblings, 0 replies; 24+ messages in thread
From: Quentin Perret @ 2025-05-19 14:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Vincent Donnefort, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, catalin.marinas, will, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Friday 16 May 2025 at 14:15:45 (+0100), Marc Zyngier wrote:
> > -/*
> > - * __tmp is updated to rb_next(__tmp) *before* entering the body of the loop to allow freeing
> > - * of __map inline.
> > - */
> > +INTERVAL_TREE_DEFINE(struct pkvm_mapping, node, u64, __subtree_last,
> > +		     __pkvm_mapping_start, __pkvm_mapping_end, static,
> > +		     pkvm_mapping);
> > +
> >  #define for_each_mapping_in_range_safe(__pgt, __start, __end, __map)				\
> > -	for (struct rb_node *__tmp = find_first_mapping_node(&(__pgt)->pkvm_mappings,		\
> > -							     ((__start) >> PAGE_SHIFT));	\
> > +	for (struct pkvm_mapping *__tmp = pkvm_mapping_iter_first(&(__pgt)->pkvm_mappings,	\
> > +								  __start, __end - 1);		\
> >  	     __tmp && ({									\
> > -				__map = rb_entry(__tmp, struct pkvm_mapping, node);		\
> > -				__tmp = rb_next(__tmp);						\
> > +				__map = __tmp;							\
> > +				__tmp = pkvm_mapping_iter_next(__map, __start, __end - 1);	\
> >  				true;								\
> >  		       });									\
> > -	    )											\
> > -		if (__map->gfn < ((__start) >> PAGE_SHIFT))					\
> > -			continue;								\
> > -		else if (__map->gfn >= ((__end) >> PAGE_SHIFT))					\
> > -			break;									\
> > -		else
> > +	    )
> 
> The removal of the comment worries me a bit. Is this iterator still
> safe wrt freeing of the iterator in the loop?

Yep it is still safe (we're still caching the next value in __tmp before
entering the body of the loop). But we shouldn't remove the comment
altogether, it just needs an update.
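
Something like:

/*
 * __tmp is updated to pkvm_mapping_iter_next(__tmp, ...) *before* entering
 * the body of the loop to allow freeing of __map inline.
 */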

Cheers,
Quentin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 09/10] KVM: arm64: Stage-2 huge mappings for np-guests
  2025-05-16 13:28   ` Marc Zyngier
@ 2025-05-19 14:34     ` Quentin Perret
  0 siblings, 0 replies; 24+ messages in thread
From: Quentin Perret @ 2025-05-19 14:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Vincent Donnefort, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, catalin.marinas, will, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Friday 16 May 2025 at 14:28:27 (+0100), Marc Zyngier wrote:
> On Fri, 09 May 2025 14:17:05 +0100,
> Vincent Donnefort <vdonnefort@google.com> wrote:
> > 
> > Now that np-guest hypercalls with a range are supported, we can let the
> > hypervisor install block mappings whenever the Stage-1 allows it, that
> > is when the memory is backed by either Hugetlbfs or THPs. The size of
> > those block mappings is limited to PMD_SIZE.
> > 
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > 
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > index 78fb9cea2034..97e0fea9db4e 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> > @@ -167,7 +167,7 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
> >  static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
> >  				      enum kvm_pgtable_prot prot)
> >  {
> > -	return true;
> > +	return false;
> >  }
> 
> Can we get rid of this callback now? And of the .force_pte_cb field in
> the kvm_pgtable struct?

I think it is still needed for the host side -- see the comment in
host_stage2_force_pte_cb(). In short, it is not safe to put down a
RO block mapping for the host. Re-mapping one page RW in the middle of
the block, for example, would cause us to 'lose' the RO permission for
the 511 other pages (assuming 4K pages obv) when we zap the block, and
we have no way to reconstruct that lazily as we do for guests.

One way to solve this is to teach pgtable.c to inherit the block
permission/attributes/sw bits/ ... for all the PTEs in the newly
installed table when we zap a block. That is actually quite cheap
and easy to do in particular for KVM_PGTABLE_S2_IDMAP page-tables
(a.k.a the host) -- Android does that downstream FWIW.
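
Just to illustrate the semantics (completely untested, and the
downstream version does this directly in the pgtable.c walker rather
than with a standalone helper like the hypothetical one below): when the
host breaks a RO block, re-create the page mappings with the prot
inherited from the block before applying the new mapping:

static int host_stage2_inherit_block_prot(kvm_pte_t block_pte)
{
	enum kvm_pgtable_prot prot = kvm_pgtable_stage2_pte_prot(block_pte);
	u64 phys = kvm_pte_to_phys(block_pte);
	u64 end = phys + PMD_SIZE;
	int ret;

	/* Put down PAGE_SIZE mappings carrying the old block's prot */
	for (; phys < end; phys += PAGE_SIZE) {
		ret = host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
		if (ret)
			return ret;
	}

	return 0;
}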

Thanks,
Quentin


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 02/10] KVM: arm64: Introduce for_each_hyp_page
  2025-05-16 12:57   ` Marc Zyngier
@ 2025-05-19 14:46     ` Quentin Perret
  0 siblings, 0 replies; 24+ messages in thread
From: Quentin Perret @ 2025-05-19 14:46 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Vincent Donnefort, oliver.upton, joey.gouly, suzuki.poulose,
	yuzenghui, catalin.marinas, will, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Friday 16 May 2025 at 13:57:52 (+0100), Marc Zyngier wrote:
> On Fri, 09 May 2025 14:16:58 +0100,
> Vincent Donnefort <vdonnefort@google.com> wrote:
> > 
> > Add a helper to iterate over the hypervisor vmemmap. This will be
> > particularly handy with the introduction of huge mapping support
> > for the np-guest stage-2.
> > 
> > Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
> > 
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > index eb0c2ebd1743..676a0a13c741 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
> > @@ -96,24 +96,24 @@ static inline struct hyp_page *hyp_phys_to_page(phys_addr_t phys)
> >  #define hyp_page_to_virt(page)	__hyp_va(hyp_page_to_phys(page))
> >  #define hyp_page_to_pool(page)	(((struct hyp_page *)page)->pool)
> >  
> > -static inline enum pkvm_page_state get_host_state(phys_addr_t phys)
> > +static inline enum pkvm_page_state get_host_state(struct hyp_page *p)
> >  {
> > -	return (enum pkvm_page_state)hyp_phys_to_page(phys)->__host_state;
> > +	return (enum pkvm_page_state)p->__host_state;
> 
> I'm not quite sure why we have this cast in the first place. If we are
> so keen on type consistency, why isn't __host_state an enum
> pkvm_page_state in the first place?

Purely for cosmetic reasons TBH. The *hyp* state (as you've seen below)
really needs accessors with some logic to decode whatever we store in
hyp_page (the _complement_ of the hyp state in practice). So we
introduced accessors for the *host* state for consistency and obfuscated
the __host_state type in hyp_page to encourage the usage of these
accessors. None of that is strictly necessary, though. And the explicit
cast can go; it is already implied by the return type of the accessor,
and the compiler should be happy enough I think.
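
I.e. just:

static inline enum pkvm_page_state get_host_state(struct hyp_page *p)
{
	return p->__host_state;
}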

> >  }
> >  
> > -static inline void set_host_state(phys_addr_t phys, enum pkvm_page_state state)
> > +static inline void set_host_state(struct hyp_page *p, enum pkvm_page_state state)
> >  {
> > -	hyp_phys_to_page(phys)->__host_state = state;
> > +	p->__host_state = state;
> >  }
> >  
> > -static inline enum pkvm_page_state get_hyp_state(phys_addr_t phys)
> > +static inline enum pkvm_page_state get_hyp_state(struct hyp_page *p)
> >  {
> > -	return hyp_phys_to_page(phys)->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;
> > +	return p->__hyp_state_comp ^ PKVM_PAGE_STATE_MASK;
> 
> And we don't seem that bothered here.

Cheers,
Quentin


^ permalink raw reply	[flat|nested] 24+ messages in thread
